Writing your first flow with Prompt Flow in Visual Studio Code

In this blog post, we will create a flow with Prompt Flow in Visual Studio Code. Prompt Flow is a suite of development tools to build LLM-based AI applications. It tries to cover the end-to-end development cycle, including prototyping, testing and deployment to production.

In Prompt Flow, you create flows. Flows link LLMs (large language models), prompts and tools together in an executable workflow. An example of such a flow is shown below:

Sample flow

The flow above (basically a directed acyclic graph – DAG – of functions) sends its input, a description to search for an image, to a tool that embeds the description with an Azure OpenAI embedding model. The embedding is used as input to a Python tool that does a similarity search in Azure AI Search. The search returns three results. The original input, together with the query results, is subsequently handed to an LLM (above, the final_result node) that hopefully picks the correct image url.

Although you could write your own API that does all of the above, Prompt Flow allows you to visually build, run and debug a flow that has input and output. When you are happy with the flow, you can convert it to an API. One of the ways to host the API is via a container.

We will build this flow on our local machine and host it as a container. Note that Prompt Flow can also be used from the portal using Azure Machine Learning or Azure AI Studio.

👉 Another blog post will describe how to build and run the container

Installing Prompt Flow on your machine

To install Prompt Flow you will need Python on your machine. Use Python 3.9 or higher. I use Python 3.11 on an Apple M2. Check the full installation instructions here. Without using a Python virtual environment, you can just run the following command to install Prompt Flow:

pip install promptflow promptflow-tools

Next, run pf -v to check the installation.

⚠️ Do not forget to install promptflow-tools because it enables the embedding tool, the LLM tool and other tools to be used as nodes in the flow; also ensure this package is installed in the container image that will be created for this flow

In Visual Studio Code, install the Prompt flow for VS Code extension. It has the VS Code Python Extension as a prerequisite. Be sure to check the Prompt Flow Quick Start for full instructions.

We will mainly use the VS Code extension. Note that the pf command can be used to perform many of the tasks we will discuss below (e.g., creating connections, running a flow, etc.).

Creating an empty flow

In VS Code, open an empty folder or create a new one. Right-click in the Explorer and select New flow in this directory. You will get the following question:

Flow selection

Select Empty Flow. This creates a file called flow.dag.yaml with the following content:

Empty flow.dag.yaml

If you look closely, you will see a link to open a Visual editor. Click that link:

Visual editor with empty input and output and blank canvas

We can now add input(s) and output(s) and add the nodes in between.

Inputs and outputs

Inputs have a type and a value. Add a string input called description:

One string input: a description (of an image, like creature or fruit)

When you later run the flow, you can type the description in the Value textbox. When the flow is converted to an API, the API will accept a description in the POST body.
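
To make this concrete, here is a rough sketch of what such a call could look like once the flow is served locally (for example with pf flow serve). The port and the /score route below are assumptions; check the output of the serve command for the actual endpoint:

import requests

# Hypothetical call to the flow once it is served locally, e.g. with
# `pf flow serve --source . --port 8080`; the port and route are assumptions
response = requests.post(
    "http://localhost:8080/score",
    json={"description": "a cat lying in the grass"},
)

# The flow's outputs come back as JSON, e.g. {"url": "..."}
print(response.json())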

Next, add an output called url. In the end, the flow returns a url to an image that matches the description:

One output: the url to a matching image

The value of the output will come from another node, which we still have to add. If you click the Value dropdown list, you will only be able to select the input value for now. You can do that and click the run icon. Save your flow before running it.

Running the flow with output set to the input

When you click the run button, a command will be run in the terminal that runs the flow:

python3 -m promptflow._cli._pf.entry flow test --flow /Users/geertbaeke/projects/promptflow/images/blogpost --user-agent "prompt-flow-extension/1.6.0 (darwin; arm64) VSCode/1.85.0"

The output of this command is:

Output of the flow is JSON, here with just the url

Although this is not very useful, the flow runs and produces a result. The output is our input. We can now add nodes to do something useful.

Creating an embedding from the description

We need to embed the description to search for similar descriptions in an Azure AI Search index. If you are not sure what embeddings are, check Microsoft Learn for a quick intro. In short, an embedding is a bunch of numbers that represents the meaning of a piece of text. We can compare the numbers of the description with the sets of numbers of image descriptions to see how close they are.

To create an embedding, we need access to an Azure OpenAI embedding model. Such a model takes text as input and returns the bunch of numbers we talked about. This model returns 1536 numbers, aka dimensions.
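
If you want to see what such a vector looks like and how two descriptions are compared, the sketch below calls the embedding deployment directly with the openai package and computes the cosine similarity between two descriptions. The endpoint, key and API version are placeholders; the deployment name matches the embedding deployment used in this post:

import numpy as np
from openai import AzureOpenAI

# Endpoint, key and API version are placeholders for your own Azure OpenAI resource
client = AzureOpenAI(
    azure_endpoint="https://OPENAIRESOURCENAME.openai.azure.com/",
    api_key="YOUR_AZURE_OPENAI_KEY",
    api_version="2023-05-15",
)

def embed(text: str) -> list:
    # text-embedding-ada-002 returns a vector of 1536 floats
    return client.embeddings.create(model="embedding", input=text).data[0].embedding

# Cosine similarity: close to 1 means the descriptions are semantically similar
a = np.array(embed("a cat lying in the grass"))
b = np.array(embed("a kitten sleeping outside"))
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))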

To use the model, we will need an Azure OpenAI resource’s endpoint and key. If you do not have an Azure OpenAI resource in Azure, create one and deploy the text-embedding-ada-002 model. In my example, the deployment is called embedding:

Embedding model in Azure OpenAI

With the Azure resources created, we can add a connection in Prompt Flow that holds the OpenAI endpoint and key:

Click the Prompt Flow extension icon and click + next to Azure OpenAI in the Connections section:

Azure OpenAI connection

A document will open that looks like the one below:

Connection information

Fill in the name and api_base only. The api_base is the https url to your Azure OpenAI instance. It’s something like https://OPENAIRESOURCENAME.openai.azure.com/. Do not provide the api_key. When you click Create connection (the smallish link at the bottom), you will be asked for the key.

After providing the key, the connection should appear under the Azure OpenAI section. You will need this connection in the embedding tool to point to the embedding model to use.

In the Prompt Flow extension pane, now click + next to Embedding in the TOOLS section:

Embedding tool

You will be asked for the tool’s name (top of VS Code window). Provide a name (e.g., embedding) and press enter. Select the connection you just created, the deployment name of your embedding model and the input. The input is the description we configured in the flow’s input node. We want to embed that description. The output of this tool will be a list of floating point numbers, a vector, of 1536 dimensions.

Embedding tool

The moment you set the input of the embedding, the input node will be connected to the embedding node on the canvas. To check if embedding works, you can connect the output of the embedding node to the url output and run the flow. You should then see the vector as output. The canvas looks like:

Show the embedding as output

Of course, we will need to supply the embedding to a vector search engine, not to the output. In our case, that is Azure AI Search. Let’s try that…

⚠️ Instead of connecting the embedding to the output, you can simply debug the embedding by clicking the debug icon in the embedding tool. The tool will be executed with the value of the input. The result should be a bunch of numbers in your terminal:

Debug output of the embedding tool

Searching for similar images

This section is a bit more tricky because you need an Azure AI Search index that allows you to search for images using a description of an image. To create such an index, see https://atomic-temporary-16150886.wpcomstaging.com/2023/12/09/building-an-azure-ai-search-index-with-a-custom-skill/.

Although you can use a Vector DB Lookup tool that supports Azure AI Search, we will create a custom Python tool that does the same thing. The Python tool uses the azure-search-documents Python library to perform the search. Learning how to use Python tools is important because they let you implement logic for which there is no specific tool.

First, we will create a custom connection that holds the name of our Azure AI Search instance and a key to authenticate.

Similar to the Azure OpenAI connection, create a custom connection:

Custom connection

After clicking +, a document opens. Modify it as follows:

Custom connection content

Like before, set a name. In a custom connection, you can have configs and secrets. In configs, add the Azure AI Search endpoint and index name. In secrets, set key to <user-input>. When you click Create connection, you will be asked to supply the key.

⚠️ Connection information is saved to a local SQLite database in the .promptflow folder in your home folder

We can now add a Python tool. In TOOLS, click + next to Python. Give the tool a name and select new file. You should get a new Python file in your folder with the filename set to <YOURTOOLNAME>.py. The code without comments is below:

from promptflow import tool

@tool
def my_python_tool(input1: str) -> str:
    return 'hello ' + input1

This tool takes a string input and returns a string. The @tool decorator is required.

We need to change this code to get the custom connection information, query Azure AI Search and return search results as a list. The code is below:

from promptflow import tool
from promptflow.connections import CustomConnection
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

@tool
def my_python_tool(vector: list, ai_conn: CustomConnection) -> list:
    ai_conn_dict = dict(ai_conn)
    endpoint = ai_conn_dict['endpoint']
    key = ai_conn_dict['key']
    index = ai_conn_dict['index']

    # query azure ai search
    credential = AzureKeyCredential(key)
    client = SearchClient(endpoint=endpoint,
                          index_name=index,
                          credential=credential)
    vector_query = VectorizedQuery(vector=vector, k_nearest_neighbors=3, fields="textVector", exhaustive=True)
    results = client.search(
        search_text=None,
        vector_queries=[vector_query],
        select=["name", "description", "url"]
    )

    # convert results to json list
    results = [dict(result) for result in results]

    return results

The function has two parameters: a vector of type list to match the output of the embedding tool, and a variable of type CustomConnection. The custom connection can be converted to a dict to retrieve both the configs and the secret.

Next, we use the configs and secret to perform the query with a SearchClient. The query only returns three fields from our index: name, description and url. The result returned from Azure AI Search is converted to a list and returned.

When you save the Python file and go back to your flow, you should see the Python tool (aisearch) with the vector and ai_conn fields. If not, click the regenerate link. Set the inputs as below:

Python tool

The input to the Python tool is the output from the embedding tool. We also pass in the custom connection to provide the configs and key to the tool.

You can set the output of the entire flow (url) to the output of the Python tool to check the results of the search when you run the flow:

Running the flow with Python tool’s output as output

I ran the flow with a description equal to cat. A list of three JSON objects is returned. The first search result is the url to cat.jpg but there are other results as well (not shown above).
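
For reference, the list handed to the next node has roughly the shape below. The values are made up; the actual names, descriptions and URLs depend on your index:

results = [
    {"name": "cat.jpg", "description": "A cat lying in the grass...", "url": "https://<storage>/images/cat.jpg"},
    {"name": "dog.jpg", "description": "A dog playing with a ball...", "url": "https://<storage>/images/dog.jpg"},
    {"name": "fruit.jpg", "description": "An apple and a banana on a table...", "url": "https://<storage>/images/fruit.jpg"},
]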

Adding an LLM tool

Although we could just pick the first result from the search, that would not work very well. Azure AI Search will always return a result, even if it does not make much sense. In a search for nearest neighbors, your nearest neighbor could be very far away! 😀

For example, if I search for person with a hat, I will get a result even though I do not have such a picture in my index. It simply finds vectors that are “closest” but semantically “far” away from my description. That is bound to happen with just a few images in the index.

An LLM can look at the original description and see if it matches one of the search results. It might pick the 3rd result if it fits better. It might also decide to return nothing if there is no match. In order to do so, we will need a good prompt.

Click + LLM at the top left of the flow to add an LLM tool:

Adding an LLM

Give the LLM tool a name and select new file. In the flow editor, set the LLM model information:

LLM settings

You can reuse the connection that was used for the embedding. Ensure you have deployed a chat model in your Azure OpenAI resource. I deployed gpt-4 and called the deployment gpt-4 as well. I also set temperature to 0.

The default inputs of the node do not make much sense for our flow. We do not need chat history, for instance. The inputs come from a .jinja2 file that was created for you. The file has the name of the LLM tool. Following the example above, the name is pick_result.jinja2. Open that file, replace it with the following contents and save it:

system:
You return the url to an image that best matches the user's question. Use the provided context to select the image. Only return the url. When no
matching url is found, simply return NO_IMAGE

user:
{{description}}

context : {{search_results}}

The file defines a system message to tell the LLM what to do. The input from the user is the description from the input node. We provide extra context to the LLM as well (the output from search). The {{…}} serve as placeholders to inject data into the prompt.
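
Under the hood this is just jinja2 templating. The snippet below is not Prompt Flow’s internal code, merely an illustration of how the two placeholders get filled in before the prompt is sent to the model:

from jinja2 import Template

# Illustration only: render the same placeholders used in pick_result.jinja2
prompt_template = Template(
    "system:\n"
    "You return the url to an image that best matches the user's question. "
    "Use the provided context to select the image. Only return the url.\n\n"
    "user:\n{{description}}\n\ncontext : {{search_results}}"
)

rendered = prompt_template.render(
    description="cat",
    search_results=[{"name": "cat.jpg", "url": "https://<storage>/images/cat.jpg"}],
)
print(rendered)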

When you save the file and go back to the flow designer, you should see description and search_results as parameters. Set them as follows:

Inputs to the LLM node

In addition, set the output of the flow output node to the output of the LLM node:

Setting the output

Save your flow and run it. In my case, with a description of cat I get the following output:

Output is just a URL from the LLM node

If I use man with a hat as input, I get:

LLM did not find a URL to match the description

Using a prompt variant

Suppose we want to try a different prompt that returns JSON instead of text. To try that, we can create a prompt variant.

In the LLM node, click the variants icon:

Variants

You will see a + icon to create a new variant. Click it.

New variant

The variant appears under the original variant and is linked to a new file: pick_result_variant_1.jinja2. I have also set the variant as default. Let’s click the new file to open it. Add the following prompt:

system:
You return the url to an image that best matches the user's question. Use the provided context to select the image. 
Return the url and name of the file as JSON. Here is an example of a response. Do not use markdown in the response. Use pure JSON.
{
  "url": "http://www.example.com/images/1.jpg",
  "name": "1.jpg"
}

If there is no matching image, return an empty string in the JSON:
{
  "url": ""
}

user:
{{description}}

context : {{search_results}}

This prompt should return JSON instead of just the url or NO_IMAGE. To test this, run the flow and select Use default variant for all nodes. When I run the flow with description cat, I get the following output:

JSON output

Because the flow’s output is already JSON, the string representation of the JSON result is used. Adding an extra Python tool that parses the JSON and outputs both the URL and file name might be a good idea here.
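
Such a tool could look like the sketch below (not part of the flow built above): it takes the LLM’s answer, parses the JSON and returns the url and name as separate fields.

import json
from promptflow import tool

@tool
def parse_llm_json(llm_output: str) -> dict:
    # Parse the JSON answer produced by the variant prompt
    try:
        parsed = json.loads(llm_output)
    except json.JSONDecodeError:
        # Fall back to an empty result if the model did not return valid JSON
        return {"url": "", "name": ""}
    return {"url": parsed.get("url", ""), "name": parsed.get("name", "")}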

You can modify and switch between the prompts and see which one works best. This is especially handy when you are prototyping your flow.

Conclusion

On your local machine, Prompt Flow is easy to install and get started with. In this post we built a relatively simple flow that did not require a lot of custom code. We also touched on using variants to test different prompts and compare their outcomes.

In a follow-up post, we will take a look at turning this flow into a container. Stay tuned! 📺

Building an Azure AI Search index with a custom skill

In this post, we will take a look at building an Azure AI Search index with a custom skill. We will use the Azure AI Search Python SDK to do the following:

  • create a search index: a search index contains content to be searched
  • create a data source: a datasource tells an Azure AI Search indexer where to get input data
  • create a skillset: a skillset is a collection of skills that process the input data during the indexing process; you can use built-in skills but also build your own skills
  • create an indexer: the indexer creates a search index from input data in the data source; it can transform the data with skills

If you are more into videos, I already created a video about this topic. In the video, I use the REST API to define the resources above. In this post, I will use the Python SDK.

Azure AI Search with custom GPT-4 vision skill

What do we want to achieve?

We want to build an application that allows a user to search for images with text or a similar image like in the diagram below:

Search application

The application uses an Azure AI Search index to provide search results. An index is basically a collection of JSON documents that can be searched with various techniques.

The input data to create the index is just a bunch of .jpg files in Azure Blob Storage. The index will need fields to support the two different types of searches (text and image search):

  • a text description of the image: we will need to generate the description from the image; we will use GPT-4 Vision to do so; the description supports keyword-based searches
  • a text vector of the description: with text vectors, we can search for descriptions similar to the user’s query; it can provide better results than keyword-based searches alone
  • an image vector of the image: with image vectors, we can supply an image and search for similar images in the index

I described building this application in a previous blog post. In that post, we pushed the index content to the index. In this post, we create an indexer that pulls in the data, potentially on a schedule. Using an indexer is recommended.

Creating the index

If you have an Azure subscription, first create an Azure AI Search resource. The code we write requires at least the basic tier.

Although you can create the index in the portal, we will create it using the Python SDK. At the time of writing (December 2023), you have to use a preview version of the SDK to support integrated vectorization. The notebook we use contains instructions about installing this version. The notebook is here: https://github.com/gbaeke/vision/blob/main/image_index/indexer-sdk.ipynb

The notebook starts with the necessary imports and also loads environment variables via a .env file. See the README of the repo to learn about the required variables.

To create the index, we define a blog_index function that returns an index definition. Here’s the start of the function:

def blog_index(name: str):
    fields = [
        SearchField(name="path", type=SearchFieldDataType.String, key=True),
        SearchField(name="name", type=SearchFieldDataType.String),
        SearchField(name="url", type=SearchFieldDataType.String),
        SearchField(name="description", type=SearchFieldDataType.String),
        SimpleField(name="enriched", type=SearchFieldDataType.String, searchable=False),  
        SearchField(
            name="imageVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1024,
            vector_search_profile="myHnswProfile"
        ),
        SearchField(
            name="textVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1536,
            vector_search_profile="myHnswProfile"
        ),
    ]

Above, we define an array of fields for the index. We will have 7 fields. The first three fields will be retrieved from blob storage metadata:

  • path: base64-encoded url of the file; will be used as unique key
  • name: name of the file
  • url: full url of the file in Azure blob storage

The link between these fields and the metadata is defined in the indexer we will create later.

Next, we have the description field. We will generate the image description via GPT-4 Vision during indexing. The indexer will use a custom skill to do so.

The enriched field is there for debugging. It will show the enrichments by custom or built-in skills. You can remove that field if you wish.

To finish, we have vector fields. These fields are designed to hold arrays of a specific size:

  • imageVector: a vector field that can hold 1024 values; the image vector model we use outputs 1024 dimensions
  • textVector: a vector field that can hold 1536 values; the text vector model we use outputs that number of dimensions

Note that the vector fields reference a vector search profile. We create that in the next block of code in the blog_index function:

vector_config = VectorSearch(  
        algorithms=[  
            HnswVectorSearchAlgorithmConfiguration(  
                name="myHnsw",  
                kind=VectorSearchAlgorithmKind.HNSW,  
                parameters=HnswParameters(  
                    m=4,  
                    ef_construction=400,  
                    ef_search=500,  
                    metric=VectorSearchAlgorithmMetric.COSINE,  
                ),  
            ),  
            ExhaustiveKnnVectorSearchAlgorithmConfiguration(  
                name="myExhaustiveKnn",  
                kind=VectorSearchAlgorithmKind.EXHAUSTIVE_KNN,  
                parameters=ExhaustiveKnnParameters(  
                    metric=VectorSearchAlgorithmMetric.COSINE,  
                ),  
            ),  
        ],  
        profiles=[  
            VectorSearchProfile(  
                name="myHnswProfile",  
                algorithm="myHnsw",  
                vectorizer="myOpenAI",  
            ),  
            VectorSearchProfile(  
                name="myExhaustiveKnnProfile",  
                algorithm="myExhaustiveKnn",  
                vectorizer="myOpenAI",  
            ),  
        ],  
        vectorizers=[  
            AzureOpenAIVectorizer(  
                name="myOpenAI",  
                kind="azureOpenAI",  
                azure_open_ai_parameters=AzureOpenAIParameters(  
                    resource_uri="AZURE_OPEN_AI_RESOURCE",  
                    deployment_id="EMBEDDING_MODEL_NAME",  
                    api_key=os.getenv('AZURE_OPENAI_KEY'),  
                ),  
            ),  
        ],  
    )

Above, vector_config is an instance of the VectorSearch object, which holds algorithms, profiles and vectorizers:

  • algorithms: Azure AI Search supports both HNSW and exhaustive KNN to search for the nearest neighbors of an input vector; above, both algorithms are defined; they both use cosine similarity as the distance metric
  • vectorizers: this defines the integrated vectorizer and points to an Azure OpenAI resource and embedding model. You need to deploy that model in Azure OpenAI and give it a name; at the time of writing (December 2023), this feature was in public preview
  • profiles: a profile combines an algorithm and a vectorizer; we create two profiles, one for each algorithm; the vector fields use the myHnswProfile profile.

Note: using HNSW on a vector field, which is designed for approximate nearest neighbor searches, still allows you to do an exhaustive search; the notebook contains sample searches at the bottom that use exhaustive search to scan the entire vector space. The reverse is not possible: you cannot use HNSW on a field that is configured for exhaustive KNN only.

We finish the function with the code below:

    semantic_config = SemanticConfiguration(  
        name="my-semantic-config",  
        prioritized_fields=PrioritizedFields(  
            prioritized_content_fields=[SemanticField(field_name="description")]  
        ),  
    )

    semantic_settings = SemanticSettings(configurations=[semantic_config])

    return SearchIndex(name=name, fields=fields, vector_search=vector_config, semantic_settings=semantic_settings)

Above, we specify a semantic_config. It is used to inform the semantic reranker about the fields in our index that contain valuable data. Here, we use the description field. The config is used to create an instance of type SemanticSettings. You also have to enable the semantic reranker in Azure AI Search to enable this feature.

The function ends by returning an instance of type SearchIndex, which contains the fields array, the vector configuration and the semantic configuration.

Now we can use the output of this function to create the index:

service_endpoint = "https://acs-geba.search.windows.net"
index_name = "images-sdk"
key = os.getenv("AZURE_AI_SEARCH_KEY")


index_client = SearchIndexClient(service_endpoint, AzureKeyCredential(key))
search_client = SearchClient(service_endpoint, index_name, AzureKeyCredential(key))
index = blog_index(index_name)

# create the index
try:
    index_client.create_or_update_index(index)
    print("Index created or updated successfully")
except Exception as e:
    print("Index creation error", e)

The important part here is the creation of a SearchIndexClient that authenticates to our Azure AI Search resource. We use that client to create_or_update our index. That function requires a SearchIndex parameter, provided by the blog_index function.

When that call succeeds, you should see the index in the portal. Text and vector fields are searchable.

Index in the portal

The vector profiles should be present:

Vector profiles

Click on an algorithm or vectorizer. It should match the definition in our code.

Now we can define the data source, skillset and indexer.

Data source

Our images are stored in Azure Blob Storage. The data source needs to point to that resource and specify a container. We can use the following code:

# Create a data source 
ds_client = SearchIndexerClient(service_endpoint, AzureKeyCredential(key))
container = SearchIndexerDataContainer(name="images")
data_source_connection = SearchIndexerDataSourceConnection(
    name=f"{index_name}-blob",
    type="azureblob",
    connection_string=os.getenv("STORAGE_CONNNECTION_STRING"),
    container=container
)
data_source = ds_client.create_or_update_data_source_connection(data_source_connection)

print(f"Data source '{data_source.name}' created or updated")

The code is pretty self-explanatory. The data source is shown in the portal as below:

Azure AI Search data source

Skillset with two skills

Before we create the indexer, we define a skillset with two skills:

  • AzureOpenAIEmbeddingSkill: a built-in skill that uses an Azure OpenAI embedding model and takes text as input; it returns a vector (embedding) of 1536 dimensions; this skill is not free; you will be billed for the vectors you create via your Azure OpenAI resource
  • WebApiSkill: a custom skill that points to an endpoint that you need to build and host; you define the inputs and outputs of the custom skill; my custom skill runs in Azure Container Apps but it can run anywhere. Often, skills are implemented as an Azure Function.

The code starts as follows:

skillset_name = f"{index_name}-skillset"

embedding_skill = AzureOpenAIEmbeddingSkill(  
    description="Skill to generate embeddings via Azure OpenAI",  
    context="/document",  
    resource_uri="https://OPEN_AI_RESOURCE.openai.azure.com",  
    deployment_id="DEPLOYMENT_NAME_OF_EMBEDDING MODEL",  
    api_key=os.getenv('AZURE_OPENAI_KEY'),  
    inputs=[  
        InputFieldMappingEntry(name="text", source="/document/description"),  
    ],  
    outputs=[  
        OutputFieldMappingEntry(name="embedding", target_name="textVector")  
    ],  
)

Above, we define the skillset and the embedding_skill. The AzureOpenAIEmbeddingSkill points to a deployed text-embedding-ada-002 embedding model. Use the name of your deployment, not the model name.

A skillset operates within a context. The context above is the entire document (/document) but that’s not necessarily the case for other skills. The input to the embedding skill is our description field (/document/description). The output will be a vector. The target_name above is some sort of a temporary name used during the so-called enrichment process of the indexer. We will need to configure the indexer to write this field to the index.

The question is: “Where does the description come from?”. The description comes from the WebApiSkill. Because the embedding skill needs the description field generated by the WebApiSkill, the WebApiSkill will run first. Here is the custom web api skill:

custom_skill = WebApiSkill(
    description="A custom skill that creates an image vector and description",
    uri="YOUR_ENDPOINT",
    http_method="POST",
    timeout="PT60S",
    batch_size=4,
    degree_of_parallelism=4,
    context="/document",
    inputs=[
        InputFieldMappingEntry(name="url", source="/document/url"),
    ],
    outputs=[
        OutputFieldMappingEntry(name="embedding", target_name="imageVector"),
        OutputFieldMappingEntry(name="description", target_name="description"),
    ],
)

The input to the custom skill is the url to our image. That url is posted to the endpoint you define in the uri field. You can control how many inputs are sent in one batch and how many batches are sent concurrently. The inputs have to be sent in a specific format.
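
The contract is the standard custom skill Web API format: the indexer POSTs a values array with one record per document and expects a values array back with the outputs per recordId. Below is a minimal FastAPI sketch of that contract with placeholder logic; the real skill in the repo calls GPT-4 Vision and the image embedding API instead, and the route name is made up:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SkillRequest(BaseModel):
    values: list

@app.post("/api/skill")
async def custom_skill(request: SkillRequest):
    # Each record has a recordId and a data object with the skill inputs (here: url)
    output_values = []
    for record in request.values:
        url = record["data"]["url"]
        # Placeholder logic: the real skill generates a GPT-4 Vision description
        # and a 1024-dimensional image vector for the image at this url
        description = f"Description of {url}"
        embedding = [0.0] * 1024
        output_values.append({
            "recordId": record["recordId"],
            "data": {"description": description, "embedding": embedding},
            "errors": None,
            "warnings": None,
        })
    return {"values": output_values}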

This skill also operates at the document level and creates two new fields. The contents of those fields are generated by your custom endpoint and returned as embedding and description. They are mapped to imageVector and description. Again, those fields are temporary and need to be written to the index by the indexer.

To see the code of the custom skill, check https://github.com/gbaeke/vision/tree/main/img_vector_skill. That skill is written for demo purposes and was not thoroughly vetted to be used in production. Use at your own risk. In addition, GPT-4 Vision requires an OpenAI key (not Azure OpenAI) and currently (December 2023) allows 100 calls per day! You currently cannot use this at scale. Azure also provides image captioning models that might fit the purpose.

Now we can create the skillset:

skillset = SearchIndexerSkillset(  
    name=skillset_name,  
    description="Skillset to generate embeddings",  
    skills=[embedding_skill, custom_skill],  
)

client = SearchIndexerClient(service_endpoint, AzureKeyCredential(key))
client.create_or_update_skillset(skillset)
print(f"Skillset '{skillset.name}' created or updated")

The above code results in the following:

Skillset with two skills

Indexer

The indexer is the final piece of the puzzle and brings the data source, index and skillset together:

indexer_name = f"{index_name}-indexer"

indexer = SearchIndexer(  
    name=indexer_name,  
    description="Indexer to index documents and generate description and embeddings",  
    skillset_name=skillset_name,  
    target_index_name=index_name,
    parameters=IndexingParameters(
        max_failed_items=-1
    ),
    data_source_name=data_source.name,  
    # Map the metadata_storage_name field to the title field in the index to display the PDF title in the search results  
    field_mappings=[
        FieldMapping(source_field_name="metadata_storage_path", target_field_name="path", 
            mapping_function=FieldMappingFunction(name="base64Encode")),
        FieldMapping(source_field_name="metadata_storage_name", target_field_name="name"),
        FieldMapping(source_field_name="metadata_storage_path", target_field_name="url"),
    ],
    output_field_mappings=[
        FieldMapping(source_field_name="/document/textVector", target_field_name="textVector"),
        FieldMapping(source_field_name="/document/imageVector", target_field_name="imageVector"),
        FieldMapping(source_field_name="/document/description", target_field_name="description"),
    ],
)

indexer_client = SearchIndexerClient(service_endpoint, AzureKeyCredential(key))  
indexer_result = indexer_client.create_or_update_indexer(indexer)  

Above, we create an instance of type SearchIndexer and set the indexer’s name, the data source name, the skillset name and the target index.

The most important parts are the field mappings and the output field mappings.

Field mappings take data from the indexer’s data source and map them to a field in the index. In our case, that’s content and metadata from Azure Blob Storage. The metadata fields in the code above are described in the documentation. In a field mapping, you can configure a mapping function. We use the base64Encode mapping function for the path field.

Output field mappings take new fields created during the enrichment process and map them to fields in the index. You can see that the fields created by the skills are mapped to fields in the index. Without these mappings, the skillset would generate the data internally but the data would never appear in the index.

Once the indexer is defined, it gets created (or updated) using an instance of type SearchIndexerClient.

Note that we set an indexer parameter, max_failed_items, to -1. This means that the indexer keeps going, no matter how many errors it produces. In the indexer screen below, you can see there was one error:

Indexer with one error

The error happened because the image vectorizer in the custom web skill threw an error on one of the images.

Using an indexer has several advantages:

  • Indexing is a background process and can run on a schedule; there is no need to schedule your own indexing process
  • Indexers keep track of what they indexed and can index only new data; with your own code, you have to maintain that state; failed documents like above are not reprocessed
  • Depending on the source, indexers see deletions and will remove entries from the index
  • Indexers can be easily reset to trigger a full index
  • Indexing errors are reported and errors can be sent to a debugger to inspect what went wrong

Testing the index

We can test the index by performing a text-based search that uses the integrated vectorizer:

# Pure Vector Search
query = "city"  
  
search_client = SearchClient(service_endpoint, index_name, credential=AzureKeyCredential(key))
vector_query = VectorizableTextQuery(text=query, k=1, fields="textVector", exhaustive=True)

  
results = search_client.search(  
    search_text=None,  
    vector_queries= [vector_query],
    select=["name", "description", "url"],
    top=1
)  

# print selected fields from dictionary
for result in results:
    print(result["name"])
    print(result["description"])
    print(result["url"])
    print("")

Above, we search for city (in the query variable). The VectorizableTextQuery class (in preview) takes the plain text in the query variable and vectorizes it for us with the embedding model defined in the integrated vectorizer. In addition, we specify how many results to return (one nearest neighbor) and that we want to search all vectors (exhaustive).

Note: remember that the vector field was configured for HNSW; we can switch to exhaustive as shown above

Next, search_client.search performs the actual search. It only provides the vector query, which results in a pure similarity search with the query vector. search_text is set to None. Set search_text to the query if you want to do a hybrid search. The notebook contains additional examples that also do a keyword and semantic search with highlighting.
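
A hybrid search is a small variation of the query above; a sketch:

# Hybrid search (sketch): keyword search and vector search combined.
# Compared to the pure vector search above, search_text is now set to the query.
results = search_client.search(
    search_text=query,
    vector_queries=[vector_query],
    select=["name", "description", "url"],
    top=1
)

for result in results:
    print(result["name"], result["url"])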

The search gives the following result (selected fields: name, description, url):

city.jpg
This is an image of the London skyline, featuring a mix of modern skyscrapers and historical buildings. Prominent among the skyscrapers are the Leadenhall Building, also known as the "Cheesegrater," and the rounded, distinctive shape of 30 St Mary Axe, commonly referred to as "The Gherkin." Further in the background, the towers of Canary Wharf can be seen. The view is clear and taken on a day with excellent visibility.
https://stgebaoai883.blob.core.windows.net/images/city.jpg

The image the URL points to is:

yep, a city (London)

In the repo’s search-client folder, you can find a Streamlit app to search for and display images and dump the entire search result object. Make sure you install all the packages in requirements.txt and the preview Azure AI Search package from the whl folder. Simply type streamlit run app.py to run the app:

Streamlit Query app

Conclusion

In this post, we demonstrated the use of the Azure AI Search Python SDK to create an indexer that takes images as input, creates new fields with skills, and writes those fields plus metadata to an index.

We touched on the advantages of using an indexer versus your own indexing code (pull versus push).

With this code and some sample images, you should be able to build an image search application yourself.

Finding images with text and image queries with the help of GPT-4 Vision

With the gpt-4-vision-preview model available at OpenAI, it was time to build something with it. I decided to use it as part of a quick solution that can search for images with text, or by providing a similar image.

We will do the following:

  • Describe a collection of images. To generate the description, GPT-4 Vision is used
  • Create a text embedding of the description with the text-embedding-ada-002 model
  • Create an image embedding using the vectorizeImage API, part of Azure AI Computer Vision
  • Save the description and both embeddings to an Azure AI Search index
  • Search for images with either text or a similar image

The end result should be that when I search for desert plant, I get an image of a cactus or similar plant. When I provide a picture of an apple, I should get an apple or other fruit as a result. It’s basically Google image search and reverse image search.

Let’s see how it works and if it is easy to do. The code is here: https://github.com/gbaeke/vision. The main code is in a Jupyter notebook in the image_index folder.

A word on vectors and vectorization

When we want to search for images using text or find similar images, we use a technique that involves turning both text and images into a form that a computer can understand easily. This is done by creating vectors. Think of vectors as a list of numbers that describe the important parts of a text or an image.

For text, we use a tool called ‘text-embedding-ada-002’ which changes the words and sentences into a vector. This vector is like a unique fingerprint of the text. For images, we use something like Azure’s multi-modal embedding API, which does the same thing but for pictures. It turns the image into a vector that represents what’s in the picture.

After we have these vectors, we store them in a place where they can be searched. We will use Azure AI Search. When you search, the system looks for the closest matching vectors – it’s like finding the most similar fingerprint, whether it’s from text or an image. This helps the computer to give you the right image when you search with words or find images that are similar to the one you have.

Getting a description from an image

Although Azure has Computer Vision APIs to describe images, GPT-4 with vision can do the same thing. It is more flexible and easier to use because you have the ability to ask for what you want with a prompt.

To provide an image to the model, you can provide a URL or the base64 encoding of an image file. The code below uses the latter approach:

def describe_image(image_file: str) -> str:
    with open(f'{image_file}', 'rb') as f:
        image_base64 = base64.b64encode(f.read()).decode('utf-8')
        print(image_base64[:100] + '...')

    print(f"Describing {image_file}...")

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image in detail"},
                {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_base64}",
                },
                },
            ],
            }
        ],
        max_tokens=500,  # default max tokens is low so set higher
    )

    return response.choices[0].message.content

As usual, the OpenAI API is very easy to use. Above, we open and read the image and base64-encode it. The base64 encoded file is provided in the url field. A simple prompt is all you need to get the description. Let’s look at the result for the picture below:

The generated description is below:

The image displays a single, healthy-looking cactus planted in a terracotta-colored pot against a pale pink background. The cactus is elongated, predominantly green with some modest blue hues, and has evenly spaced spines covering its surface. The spines are white or light yellow, quite long, and arranged in rows along the cactus’s ridges. The pot has a classic, cylindrical shape with a slight lip at the top and appears to be a typical pot for houseplants. The overall scene is minimalistic, with a focus on the cactus and the pot due to the plain background, which provides a soft contrast to the vibrant colors of the plant and its container.

Description generated by GPT-4 Vision

Embedding of the image

To create an embedding for the image, I decided to use Azure’s multi-modal embedding API. Take a look at the code below:

def get_image_vector(image_path: str) -> list:
    # Define the URL, headers, and data
    url = "https://AI_ACCOUNT.cognitiveservices.azure.com//computervision/retrieval:vectorizeImage?api-version=2023-02-01-preview&modelVersion=latest"
    headers = {
        "Content-Type": "application/octet-stream",
        "Ocp-Apim-Subscription-Key": os.getenv("AZURE_AI_KEY")
    }

    with open(image_path, 'rb') as image_file:
        # Read the contents of the image file
        image_data = image_file.read()

    print(f"Getting vector for {image_path}...")

    # Send a POST request
    response = requests.post(url, headers=headers, data=image_data)

    # return the vector
    return response.json().get('vector')

The code uses an environment variable to get the key to an Azure AI Services multi-service endpoint. Check the README.md in the repository for a sample .env file.

The API generates a vector with 1024 dimensions. We will need that number when we create the Azure AI Search index.

Note that this API can accept a url or the raw image data (not base64-encoded). Above, we provide the raw image data and set the Content-Type properly.
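
For completeness, the url-based variant would look roughly like this (the account name is a placeholder):

import os
import requests

# Sketch of the url-based call: send JSON with a url field and use application/json
url = "https://AI_ACCOUNT.cognitiveservices.azure.com/computervision/retrieval:vectorizeImage?api-version=2023-02-01-preview&modelVersion=latest"
headers = {
    "Content-Type": "application/json",
    "Ocp-Apim-Subscription-Key": os.getenv("AZURE_AI_KEY")
}
body = {"url": "https://example.com/images/cactus.jpg"}

response = requests.post(url, headers=headers, json=body)
print(len(response.json().get("vector", [])))  # 1024 dimensions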

Generating the data to index

In the next step, we will get all .jpg files from a folder and do the following:

  • create the description
  • create the image vector
  • create the text vector of the description

Check the code below for the details:

# get all *.jpg files in the images folder
image_files = [file for file in os.listdir('./images') if file.endswith('.jpg')]

# describe each image and store filename and description in a list of dicts
descriptions = []
for image_file in image_files:
    try:
        description = describe_image(f"./images/{image_file}")
        image_vector = get_image_vector(f"./images/{image_file}")
        text_vector = get_text_vector(description)
        
        descriptions.append({
            'id': image_file.split('.')[0], # remove file extension
            'fileName': image_file,
            'imageDescription': description,
            'imageVector': image_vector,
            'textVector': text_vector
        })
    except Exception as e:
        print(f"Error describing {image_file}: {e}")

# print the descriptions but only show first 5 numbers in vector
for description in descriptions:
    print(f"{description['fileName']}: {description['imageDescription'][:50]}... {description['imageVector'][:5]}... {description['textVector'][:5]}...")

The important part is the descriptions list, which is a list of JSON objects with fields that match the fields in the Azure AI Search index we will build in the next step.

The text vector is calculated with the get_text_vector function, which uses OpenAI’s text-embedding-ada-002 model.
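
The function is not shown in this post; a possible implementation looks like the sketch below (the actual code is in the notebook and may differ in detail):

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

def get_text_vector(text: str) -> list:
    # text-embedding-ada-002 returns a vector of 1536 floats
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return response.data[0].embedding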

Building the index

The code below uses the Azure AI Search Python SDK to build and populate the index in code. You will need an AZURE_AI_SEARCH_KEY environment variable to authenticate to your Azure AI Search instance.

def blog_index(name: str):
    from azure.search.documents.indexes.models import (
        SearchIndex,
        SearchField,
        SearchFieldDataType,
        SimpleField,
        SearchableField,
        VectorSearch,
        VectorSearchProfile,
        HnswAlgorithmConfiguration,
    )

    fields = [
        SimpleField(name="Id", type=SearchFieldDataType.String, key=True), # key
        SearchableField(name="fileName", type=SearchFieldDataType.String),
        SearchableField(name="imageDescription", type=SearchFieldDataType.String),
        SearchField(
            name="imageVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1024,
            vector_search_profile_name="vector_config"
        ),
        SearchField(
            name="textVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1536,
            vector_search_profile_name="vector_config"
        ),

    ]

    vector_search = VectorSearch(
        profiles=[VectorSearchProfile(name="vector_config", algorithm_configuration_name="algo_config")],
        algorithms=[HnswAlgorithmConfiguration(name="algo_config")],
    )
    return SearchIndex(name=name, fields=fields, vector_search=vector_search)

#  create the index
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.models import VectorizedQuery

service_endpoint = "https://YOUR_SEARCH_INSTANCE.search.windows.net"
index_name = "image-index"
key = os.getenv("AZURE_AI_SEARCH_KEY")

index_client = SearchIndexClient(service_endpoint, AzureKeyCredential(key))
index = blog_index(index_name)

# create the index
try:
    index_client.create_index(index)
    print("Index created")
except Exception as e:
    print("Index probably already exists", e)

The code above creates an index with some string fields and two vector fields:

  • imageVector: 1024 dimensions (as defined by the Azure AI Computer Vision image embedder)
  • textVector: 1536 dimensions (as defined by the OpenAI embedding model)

Although not specified in the code, the index will use cosine similarity to perform similarity searches. It’s the default. It will return approximate nearest neighbour (ANN) results unless you create a search client that uses exhaustive search. An exhaustive search searches the entire vector space. The queries near the end of this post use the exhaustive setting.

When the index is created, we can upload documents:

# now upload the documents
try:
    search_client = SearchClient(service_endpoint, index_name, AzureKeyCredential(key))
    search_client.upload_documents(descriptions)
    print("Documents uploaded successfully")
except Exception as e:
    print("Error uploading documents", e)

The upload_documents method uploads the documents in the descriptions Python list to the search index. The upload is actually an upsert. You can run this code multiple times without creating duplicate documents in the index.

Search images with text

To search an image with a text description, a vector query on the textVector is used. The function below takes a text query string as input, vectorizes the query, and performs a similarity search returning the first nearest neighbour. The function displays the description and the image in the notebook:

# now search based on text
def single_vector_search(query: str):
    vector_query = VectorizedQuery(vector=get_text_vector(query), k_nearest_neighbors=1, fields="textVector", exhaustive=True)

    results = search_client.search(
        vector_queries=[vector_query],
        select=["fileName", "imageDescription"],
        
    )

    for result in results:
        print(result['fileName'], result["imageDescription"], sep=": ")

        # show the image
        from IPython.display import Image
        display(Image(f"./images/{result['fileName']}"))
    
single_vector_search("desert plant")

The code searches for an image based on the query desert plant. It returns the picture of the cactus shown earlier. Note that if you search for something there is no image for, like blue car, you will still get a result because we always return a nearest neighbor. Even if your nearest neighbor lives 100km away, it’s still your nearest neighbor. 😀

Return similar images

Since our index contains an image vector, we can search for images similar to a vector of a reference image. The function below takes an image file path as input, calculates the vector for that image, and performs a nearest neighbor search. The function displays the description and image of each document returned. In this case, the code returns two similar documents:

def image_search(image_file: str):

    vector_query = VectorizedQuery(vector=get_image_vector(image_file), k_nearest_neighbors=2, fields="imageVector", exhaustive=True)

    results = search_client.search(
        vector_queries=[vector_query],
        select=["fileName", "imageDescription"],
    )

    for result in results:
        print(result['fileName'], result["imageDescription"], sep=": ")

        # show the image
        from IPython.display import Image
        display(Image(f"./images/{result['fileName']}"))
 
# get vector of another image and find closest match
image_search('rotten-apple.jpg')
image_search('flower.jpeg')

At the bottom, the function is called with filenames of pictures that contain a rotten apple and a flower. The result of the first query is a picture of the apple and banana. The result of the second query is the cactus and the rose. You can debate whether the cactus should be in the results. Some cacti have flowers but some don’t. 😀

Conclusion

The GPT-4 Vision API, like most OpenAI APIs, is very easy to use. In this post, we used it to generate image descriptions to build a simple query engine that can search for images via text or a reference image. Together with OpenAI’s text embedding API and Microsoft’s multi-modal embedding API to create image embeddings, it is relatively straightforward to build these types of systems.

As usual, this is a tutorial with quick sample code to illustrate the basic principle. If you need help building these systems in production, simply reach out. Help is around the corner! 😉

Using Integrated Vectorization in Azure AI Search

The vector search capability of Azure AI Search became generally available in mid-November 2023. With that release, the developer is responsible for creating embeddings and storing them in a vector field in the index.

However, Microsoft also released integrated vectorization in preview. Integrated vectorization is useful in two ways:

  • You can define a vectorizer in the index schema. It can be used to automatically convert a query to a vector. This is useful in the Search Explorer in the portal but can also be used programmatically.
  • You can use an embedding skill for your indexer that automatically vectorizes index fields for you.

First, let’s look at defining a vectorizer in the index definition and using it in the portal for search.

Vector search in the portal

Below is a screenshot of an index with a title and a titleVector field. The index stores information about movies:

Index with a vector field

The integrated vectorizer is defined in the Vector profiles section:

Vector profile

When you add the profile, you configure the algorithm and vectorizer. The vectorizer simply points to an embedding model in Azure OpenAI. For example:

Vectorizer

Note: it’s recommended to use managed identity

Now, from JSON View in Search Explorer, you can perform a vector search. If you see a search field at the top, you can remove that. It’s for full-text search.

Vector search in the portal

Above, the query commencement is converted to a vector by the integrated vectorizer. The vector search comes up with Inception as the first match. I am not sure you would want to search for movies this way, but it proves the point. 😛
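
The same query can be run programmatically with the preview SDK; a sketch (endpoint, key and index name are placeholders for the movies index shown above):

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

# Placeholders: replace with your own search endpoint, index name and key
client = SearchClient("https://YOUR_SEARCH.search.windows.net", "movies-index",
                      AzureKeyCredential("YOUR_AI_SEARCH_KEY"))

# The integrated vectorizer turns the plain text into a vector on the service side
vector_query = VectorizableTextQuery(text="commencement", k=1, fields="titleVector")

for result in client.search(search_text=None, vector_queries=[vector_query], select=["title"]):
    print(result["title"])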

Using an embedding skill during indexing

Suppose you have several JSON documents about movies. Below is one example:

{
    "title": "Inception",
    "year": 2010,
    "director": "Christopher Nolan",
    "genre": ["Action", "Adventure", "Sci-Fi"],
    "starring": ["Leonardo DiCaprio", "Joseph Gordon-Levitt", "Ellen Page"],
    "imdb_rating": 8.8
  }

When you have a bunch of these files in Azure Blob Storage, you can use the Import Data wizard to construct an index from these files.

Import Data Wizard

This wizard, at the time of writing, does not create vectors for you. There is another wizard, Import and vectorize data, but it treats the JSON like any other document and stores it in a content field. A vector is created from the content field.

We will stick to the first wizard. It will do several things:

  • create a data source to access the JSON documents in an Azure Storage Account container
  • infer the schema from the JSON files
  • propose an index definition that you can alter
  • create an indexer that indexes the documents on the schedule that you set
  • add skills like entity extraction; select a simple skill here like translation so you are sure there will be a skillset that the indexer will use

In the step to customize the index definition, ensure you make fields searchable and retrievable as needed. In addition, define a vector field. In my case, I created a titleVector field:

titleVector

When the wizard is finished, the indexer will run and populate the index. Of course, the titleVector field will be empty because there is no process in place that calculates the vectors during indexing.

Let’s fix that. In Skillsets, go to the skillset created by the wizard and click it.

Skillset created by the wizard

Replace the Skillset JSON definition with the content below and change resourceUri, apiKey and deploymentId as needed. You can also add the embedding skill to the existing array of skills if you want to keep them.

{
  "@odata.context": "https://acs-geba.search.windows.net/$metadata#skillsets/$entity",
  "@odata.etag": "\"0x8DBF01523E9A94D\"",
  "name": "azureblob-skillset",
  "description": "Skillset created from the portal. skillsetName: azureblob-skillset; contentField: title; enrichmentGranularity: document; knowledgeStoreStorageAccount: ;",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "embed",
      "description": null,
      "context": "/document",
      "resourceUri": "https://OPENAI_INSTANCE.openai.azure.com",
      "apiKey": "AZURE_OPENAI_KEY",
      "deploymentId": "EMBEDDING_MODEL",
      "inputs": [
        {
          "name": "text",
          "source": "/document/title"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "titleVector"
        }
      ],
      "authIdentity": null
    }
  ],
  "cognitiveServices": null,
  "knowledgeStore": null,
  "indexProjections": null,
  "encryptionKey": null
}

Above, we want to embed the title field in our document and create a vector for it. The context is set to /document which means that this skill is executed for each document once.

Now save the skillset. This skill on its own will create the vectors but will not save them in the index. You need to update the indexer to write the vector to a field.

Let’s navigate to the indexer:

Indexer

Click the indexer and go to the Indexer Definition (JSON) tab. Ensure you have an outputFieldMappings section like below:

{
  "@odata.context": "https://acs-geba.search.windows.net/$metadata#indexers/$entity",
  "@odata.etag": "\"0x8DBF01561D9E97F\"",
  "name": "movies-indexer",
  "description": "",
  "dataSourceName": "movies",
  "skillsetName": "azureblob-skillset",
  "targetIndexName": "movies-index",
  "disabled": null,
  "schedule": null,
  "parameters": {
    "batchSize": null,
    "maxFailedItems": 0,
    "maxFailedItemsPerBatch": 0,
    "base64EncodeKeys": null,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "json"
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_path",
      "targetFieldName": "metadata_storage_path",
      "mappingFunction": {
        "name": "base64Encode",
        "parameters": null
      }
    }
  ],
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/titleVector",
      "targetFieldName": "titleVector"
    }
  ],
  "cache": null,
  "encryptionKey": null
}

Above, we map the titleVector enrichment (think of it as something temporary during indexing) to the real titleVector field in the index.

Reset and run the indexer

Reset the indexer so it will index all documents again:

Resetting the indexer

Next, click the Run button to start the indexing process. When it finishes, do a search with Search Explorer and check that there are vectors in the titleVector field. It’s an array of 1536 floating point numbers.
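If you prefer to verify this from code instead of Search Explorer, a minimal sketch against the Azure AI Search REST API (the service and index names are the ones used in this post, the environment variable is a placeholder, and your field names may differ):

import os
import requests

service = "acs-geba"         # search service used in this post
index = "movies-index"       # target index created by the wizard
key = os.environ["SEARCH_ADMIN_KEY"]   # placeholder env var holding an admin or query key

url = f"https://{service}.search.windows.net/indexes/{index}/docs/search?api-version=2023-11-01"
body = {"search": "*", "select": "title,titleVector", "top": 1}   # titleVector must be retrievable

response = requests.post(url, json=body, headers={"api-key": key})
response.raise_for_status()

doc = response.json()["value"][0]
print(doc["title"], len(doc["titleVector"]))   # expect 1536 floats per vector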

Conclusion

Integrated vectorization is a welcome extra feature in Azure AI Search. Using it in searches is very easy, especially in the portal.

Using the embedding skill is a bit harder, because you need to work with skillset and indexer definitions in JSON and you have to know exactly what you have to add. But once you get it right, the indexer does all the vectorization work for you.

Creating a custom GPT to query any knowledge base with actions

A while ago, OpenAI introduced GPTs. A GPT is a custom version of ChatGPT that combines instructions, extra knowledge, and any combination of skills.

In this tutorial, we are going to create a custom GPT that can answer questions about articles on this blog. In order to achieve that, we will do the following:

  • create an Azure AI Search index
  • populate the index with content of the last 50 blog posts (via its RSS feed)
  • create a custom API with FastAPI (Python) that uses the Azure OpenAI “add your data” APIs to provide relevant content to the user’s query
  • add the custom API as an action to the custom GPT

The image below shows the properties of the GPT. You need to be a ChatGPT Plus subscriber to create a GPT.

Part of the custom GPT definition

To implement a custom action for the GPT, you need an API with an OpenAPI spec. When you use FastAPI, an OpenAPI JSON document can easily be downloaded and provided to the GPT. You will need to modify the JSON document with a servers section to specify the URL the GPT has to use.

In what follows, we will look at all of the different pieces that make this work. Beware: long post! 😀

Azure AI Search Index

Azure AI Search is a search service you create in Azure. Although there is a free tier, I used the basic tier. The basic tier allows you to use the semantic reranker to optimise search results.

To create the index and populate it with content, I used the following notebook: https://github.com/gbaeke/custom-gpt/blob/main/blog-index/website-index.ipynb.

The result is an index like below:

Index in Azure AI Search

The index contains 292 documents although I only retrieve the last 50 blog posts. This is the result of chunking each post into smaller pieces of about 500 tokens with 100 tokens of overlap for each chunk. We use smaller chunks because we do not want to send entire blog posts as content to the large language model (LLM).
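The notebook linked above handles the chunking; a minimal sketch of such a token-based splitting step with LangChain (the exact code in the notebook may differ, and post_text is a placeholder variable):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# ~500 tokens per chunk with 100 tokens of overlap (requires the tiktoken package)
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500,
    chunk_overlap=100,
)

post_text = "full text of one blog post goes here"   # placeholder
chunks = splitter.split_text(post_text)
print(len(chunks))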

Note that the index supports similarity searches using vectors. The contentVector field contains the OpenAI embedding of the text in the content field.

Although vectors are available, we do not have to use vector search. Azure AI Search supports simple keyword search as well. Together with the semantic ranker, it can provide more relevant results than keyword search on its own.

Note: in general, vector search will provide better results, especially when combined with keyword search and the semantic ranker

Use the index with Azure OpenAI “add your data”

I have written about the Azure OpenAI “add your data” features before. It provides a wizard experience to add an Azure AI Search index to the Azure OpenAI playground and directly test your index with the model of your choice.

From your Azure OpenAI instance, first open Azure OpenAI Studio:

Go to OpenAI Studio from the Overview page of your Azure OpenAI instance

Note: you still need to complete a form to get access to Azure OpenAI. Currently, it can take around a day before you are allowed to create Azure OpenAI instances in your subscription.

In Azure OpenAI Studio, click Bring your own data from the Home screen:

Bring your own data

Select the Azure AI Search index and click Next.

Azure AI Search index selection

Note: I created the index using the generally available API that supports vector search. The Add your data wizard, at the time of writing, was not updated yet to support these new indexes. That is the reason why vector search cannot be enabled. We will use keyword + semantic search instead. I expect this functionality to be available soon (November/December 2023).

Next, provide field mappings:

Field Mappings

These mappings are required because the Add your data feature expects these standard fields. You should have at least a content field to search. Above, I do not have a file name field because I have indexed blog posts. It’s ok to leave that field blank.

After clicking Next, we get to data management:

Data Management

Here, we specify the type of search. Semantic means keyword + semantic. In the dropdown list, you can also select keyword search on its own. However, that might give you less relevant results.

Note: for Semantic to work, you need to turn on the Semantic ranker on the Azure AI Search resource. Additionally, you need to create a semantic profile on the index.

Now you can click Next, followed by Save and close. The Azure OpenAI Chat Playground appears with the index added:

Index added as a data source

You can now start chatting with your data. Select a chat model like gpt-4 or gpt-35-turbo. In Azure OpenAI, you have to deploy these models first and give the deployment a name.

Chat session with your data

Above, I asked about the OpenAI Assistants API, which is one of the posts on my blog. In the background, the playground performs a search on the Azure AI Search index and provides the results as context to the model. The gpt-35-turbo model answers the user’s question, based on the context coming from the index.

When you are happy with the result, you can export this experience to an Azure Web App or Copilot Studio (Power Virtual Agents):

Export the “chat with data” experience

In our case, we want to use this configuration from code and provide an API we can add to the custom GPT.

⚠️ It’s important to realise that, with this approach, we will send the final answer, generated by an Azure OpenAI model, to the custom GPT. An alternate approach would be to hand the results of the Azure AI Search query to the custom GPT and let it formulate the answer on its own. That would be faster and less costly. If you also provide the blog post’s URL, ChatGPT can refer to it. However, the focus here is on using any API with a custom GPT, so let’s continue with the API that uses the “add your data” APIs.

If you want to hand over Azure AI search results directly to ChatGPT, check out the code in the azure-ai-search folder in the Github repo.

Creating the API

To create an API that uses the index with the model, as configured in the playground, we can use some code. In fact, the playground provides sample code to work with:

Sample code from the playground

‼️ Sadly, this code will not work due to changes to the openai Python package. However, the principle is still the same:

  • call the chat completion extension API, which is specific to Azure; in the code you will see this as a Python f-string: f"{openai.api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={openai.api_version}"
  • the JSON payload for this API needs to include the Azure AI Search configuration in a dataSources array.

The extension API will query Azure AI Search for you and create the prompt for the chat completion with context from the search result.

To create a FastAPI API that does this for the custom GPT, I decided to not use the openai package and simply use the REST API. Here is the code:

from fastapi import FastAPI, HTTPException, Depends, Header
from pydantic import BaseModel
import httpx, os
import dotenv
import re

# Load environment variables
dotenv.load_dotenv()

# Initialize FastAPI app
app = FastAPI()

# Constants (replace with your actual values)
api_base = "https://oa-geba-france.openai.azure.com/"
api_key = os.getenv("OPENAI_API_KEY")
deployment_id = "gpt-35-turbo"
search_endpoint = "https://acs-geba.search.windows.net"
search_key = os.getenv("SEARCH_KEY")
search_index = "blog"
api_version = "2023-08-01-preview"

# Pydantic model for request body
class RequestBody(BaseModel):
    query: str

# Define the API key dependency
def get_api_key(api_key: str = Header(None)):
    if api_key is None or api_key != os.getenv("API_KEY"):
        raise HTTPException(status_code=401, detail="Invalid API Key")
    return api_key

# Endpoint to generate response
@app.post("/generate_response", dependencies=[Depends(get_api_key)])
async def generate_response(request_body: RequestBody):
    url = f"{api_base}openai/deployments/{deployment_id}/extensions/chat/completions?api-version={api_version}"
    headers = {
        "Content-Type": "application/json",
        "api-key": api_key
    }
    data = {
        "dataSources": [
            {
                "type": "AzureCognitiveSearch",
                "parameters": {
                    "endpoint": search_endpoint,
                    "key": search_key,
                    "indexName": search_index
                }
            }
        ],
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant"
            },
            {
                "role": "user",
                "content": request_body.query
            }
        ]
    }

    async with httpx.AsyncClient() as client:
        response = await client.post(url, json=data, headers=headers, timeout=60)

    if response.status_code != 200:
        raise HTTPException(status_code=response.status_code, detail=response.text)

    response_json = response.json()

    # get the assistant response
    assistant_content = response_json['choices'][0]['message']['content']
    assistant_content = re.sub(r'\[doc.\]', '', assistant_content)
    
    # return assistant_content as json
    return {
        "response": assistant_content
    }

# Run the server
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000, timeout_keep_alive=60)

This API has one endpoint, /generate_response, that takes { "query": "your query" } as input and returns { "response": assistant_content } as output. Note that the original response from the model contains references like [doc1], [doc2], etc… The regex in the code removes those references. I do not particularly like how the references are handled by the API, so I decided to not include them and simplify the response.
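If you want the cleanup to also catch multi-digit references such as [doc10], a slightly broader pattern (a small variation on the code above, not what the repo uses) would be:

import re

# matches [doc1], [doc2], ... as well as [doc10] and beyond
answer = "The Assistants API is in beta [doc1][doc12]."
print(re.sub(r'\[doc\d+\]', '', answer))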

The endpoint expects an api-key header. If it is not present, the API returns an error.

The endpoint calls the Azure OpenAI chat completion extension API, which looks very similar to a regular OpenAI chat completion. The request does, however, contain a dataSources field with the Azure AI Search information.

The environment variables like the OPENAI_API_KEY and the SEARCH_KEY are retrieved from a .env file.

Note: to stress this again, this API returns the answer to the query as generated by the chosen Azure OpenAI model. This allows it to be used in any application, not just a custom GPT. For a custom GPT in ChatGPT, an alternate approach would be to hand over the search results from Azure AI search directly, allowing the model in the custom GPT to generate the response. It would be faster and avoid Azure OpenAI costs. We are effectively using the custom GPT as a UI and as a way to maintain history between action calls. 😀

If you want to see the code in GitHub, check this URL: https://github.com/gbaeke/custom-gpt.

Running the API in Azure Container Apps

To run the API in the cloud, I decided to use Azure Container Apps. That means we need a Dockerfile to build the container image locally or in the cloud:

# Use an official Python runtime as a parent image
FROM python:3.9-slim-buster

# Set the working directory in the container to /app
WORKDIR /app

# Add the current directory contents into the container at /app
ADD . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Run app.py when the container launches
CMD ["python3", "app.py"]

We also need a requirements.txt file:

fastapi==0.104.1
pydantic==2.5.2
pydantic_core==2.14.5
httpx==0.25.2
python-dotenv==1.0.0
uvicorn==0.24.0.post1

I use the following shell script to build and run the container locally. The script can also push the container to Azure Container Apps.

#!/bin/bash

# Load environment variables from .env file
export $(grep -v '^#' .env | xargs)

# Check the command line argument
if [ "$1" == "build" ]; then
    # Build the Docker image
    docker build -t myblog .
elif [ "$1" == "run" ]; then
    # Run the Docker container, mapping port 8000 to 8000 and setting environment variables
    docker run -p 8000:8000 -e OPENAI_API_KEY=$OPENAI_API_KEY -e SEARCH_KEY=$SEARCH_KEY -e API_KEY=$API_KEY myblog
elif [ "$1" == "up" ]; then
    az containerapp up -n myblog --ingress external --target-port 8000 \
        --env-vars OPENAI_API_KEY=$OPENAI_API_KEY SEARCH_KEY=$SEARCH_KEY API_KEY=$API_KEY \
        --source .
else
    echo "Usage: $0 {build|run|up}"
fi

The shell script extracts the environment variables defined in .env and sets them in the session. Next, we check the first parameter given to the script (Docker is required on your machine for build and run):

  • build: build the Docker image
  • run: run the Docker image locally on port 8000 and specify the environment variables to authenticate to Azure OpenAI and Azure AI Search
  • up: build the Docker image in the cloud and run it in Container Apps; if you do not have a Container Apps Environment or Azure Container Registry, they will be created for you. In the end, you will get an https endpoint to your API in the cloud.

Note: you should not put secrets in environment variables in Azure Container Apps directly; use Container Apps secrets or Key Vault instead; the above is just quick and easy to simplify the deployment

To test the API locally, use the REST Client extension in VS Code with an .http file:

POST http://localhost:8000/generate_response HTTP/1.1
Host: localhost:8000
Content-Type: application/json
api-key: API_KEY_FROM_DOTENV

{
  "query": "what is the openai assistants api?"
}

###

POST https://AZURE_CONTAINER_APPS_ENDPOINT/generate_response HTTP/1.1
Host: AZURE_CONTAINER_APPS_ENDPOINT
Content-Type: application/json
api-key: API_KEY_FROM_DOTENV

{
  "query": "Can I use Redis as a vector db?"
}

When you get something like below, you are good to go. Note again that we return a final answer and not the relevant chunks from Azure AI search.

Successful response from .http file

Getting the OpenAPI spec and adding it to the GPT

With your API running, you can go to its URL, like this one if the API runs locally: http://localhost:8000/openapi.json. The result is a JSON document you can copy to your GPT. I recommend copying the JSON to VS Code and formatting it before you paste it into the GPT.

In the GPT, modify the OpenAPI spec with a servers section that includes your Azure Container Apps ingress URL:

Adding the URL to the GPT Action definition
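If you prefer to prepare the spec from code instead of editing it by hand, a minimal sketch (the local URL matches the FastAPI app above; the Container Apps URL is a placeholder):

import json
import requests

# download the spec generated by FastAPI (API running locally)
spec = requests.get("http://localhost:8000/openapi.json").json()

# add a servers section; replace the placeholder with your Container Apps ingress URL
spec["servers"] = [{"url": "https://AZURE_CONTAINER_APPS_ENDPOINT"}]

# paste the formatted result into the GPT action editor
print(json.dumps(spec, indent=2))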

If you want to give the user the ability to trust the action so it can be called without approval (after the first call), also add the following:

Allowing the user to say Always Allow when action is used the first time

Take a look at the video below that shows how to create the GPT, including the configuration of the action and testing it.

Conclusion

Custom GPTs in ChatGPT open up a world of possibilities to offer personalised ChatGPT experiences. With custom actions, you can let the GPT do anything you want. In this tutorial, the custom action is an API call that answers the user’s question using Azure OpenAI with Azure AI Search as the provider of relevant context.

As long as you build and host an API and have an OpenAPI spec for your API, the possibilities are virtually limitless.

Note that custom GPTs with actions are not available in the ChatGPT app on mobile yet (end November, 2023). When that happens, it will open up all these capabilities on the go, including enabling voice chat. Fun stuff! 😀

Trying the OpenAI Assistants API

If you have ever tried to build an AI assistant, you know that it is not a simple task. In almost all cases, your assistant needs access to external knowledge such as documents or APIs. You might even want to provide your assistant with a code sandbox to solve user queries with code. When your assistant is accessed via a chat application, you also have to implement chat history.

Although there are several frameworks like LangChain and Semantic Kernel that can help, OpenAI recently released the Assistants API. It is their own API, tied to their models. The primitives of an assistant are Assistants, Threads and Runs. Let’s start by creating an assistant.

Note: this post contains code snippets in Python. You can find the full example in this gist: https://gist.github.com/gbaeke/e6e88c0dc68af3aa4a89b1228012ae53

Note: although I expect this API to become available in Azure OpenAI, I am not quite sure it will happen fast, if at all. So for now, try it out at OpenAI directly. It is still in beta!

Creating an assistant

You can create an assistant using the portal or from code. An assistant has several parameters:

  • Instructions: how should the assistant behave or respond; think of it as the system message
  • Model: use any supported model, including fine-tuned models; to support retrieval from documents, you need the 1106 version of gpt-3.5-turbo/gpt-4
  • Tools: currently, the API supports Code Interpreter and Retrieval; these are fully hosted by OpenAI
  • Functions: define custom functions to call to integrate with external APIs for instance

Note that the retrieval tool supports uploaded files. There is no need for your own search solution (e.g., vector database with support for vector search, hybrid search, etc…). This is great in simpler scenarios where a full-fledged search system is not required. More control over retrieval will come later.

In this post, we will focus on an assistant that uses Code Interpreter. You can simply create the assistant in the portal. You can see the instructions, model, tools and files:

Assistant with only the Code interpreter tool using the latest gpt-4 model

To create this assistant, make sure you have an account at https://platform.openai.com. Create the assistant from the Assistants section:

Creating an assistant

Assistants have an id. For example, my assistant has this id: asst_VljToh6vQ1Mbu6Ct5L6qgpfy. I can use this id in my code to start creating threads.

Before talking about threads, let’s look at creating the assistant with code:

assistant = client.beta.assistants.create(
                name="Math Tutor",
                instructions="You are a personal math tutor. Write and run code to answer math questions.",
                tools=[{"type": "code_interpreter"}],
                model="gpt-4-1106-preview"
  )

To run this code, make sure you use the most recent version of the openai package (>=1.2). Note that if you run this code multiple times, you will create an assistant at each run. You should save the assistant id after creation and implement some logic to only run the above code when you do not have an id.

Above, we create an assistant with one tool: code interpreter.
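One way to avoid duplicate assistants is to cache the id, for example in a small text file. A minimal sketch (the file name and the caching approach are my own, not part of the official API):

import os
from openai import OpenAI

client = OpenAI()                      # uses OPENAI_API_KEY from the environment
ASSISTANT_FILE = "assistant_id.txt"    # hypothetical file to cache the id

if os.path.exists(ASSISTANT_FILE):
    with open(ASSISTANT_FILE) as f:
        assistant_id = f.read().strip()
else:
    assistant = client.beta.assistants.create(
        name="Math Tutor",
        instructions="You are a personal math tutor. Write and run code to answer math questions.",
        tools=[{"type": "code_interpreter"}],
        model="gpt-4-1106-preview",
    )
    assistant_id = assistant.id
    with open(ASSISTANT_FILE, "w") as f:
        f.write(assistant_id)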

Threads

After creating an assistant, you can create threads. Although somewhat unintuitive, a thread is not associated with an assistant. They exist on their own. After a thread is created, you can add messages to a thread, for instance a user message:

# we use streamlit so we save the thread in session state
if 'thread' not in st.session_state:
        st.session_state.thread = client.beta.threads.create()

# user_input contains a question like 'solve x^2 + 100 = 200'
# here we add a message to the thread, using the thread id
client.beta.threads.messages.create(
            thread_id=st.session_state.thread.id,
            role="user",
            content=user_input
 )

To get a completion from the assistant for our thread, we need to create a run. The run tells the assistant to look at the messages in the thread and provide a response.

Runs

Below, we create the run:

run = client.beta.threads.runs.create(
            thread_id=st.session_state.thread.id,
            assistant_id=st.session_state.assistant_id, # refer to assistant in session state
            instructions="Please address the user as Geert. Only answer math questions."
  )

Above, both the thread_id and assistant_id are passed to the run, tying both together. If you did not create the assistant in your code, ensure you pass the id of a valid assistant created in your OpenAI account. Note that the run can be passed extra instructions. You can also override the model and tools that the assistant uses.

Creating a run is an asynchronous operation. It returns the metadata of the run immediately. The metadata includes fields like the run’s id, the created_at date and more.

You will need to manually check the run’s status in your code. For example:

# display a streamlit spinner while we check the run
with st.spinner('Waiting for completion...'):
    run_status = 'pending'
    while run_status != 'completed':
        run = client.beta.threads.runs.retrieve(
            thread_id=st.session_state.thread.id,
            run_id=run.id
        )
        run_status = run.status
        
        if run_status == 'failed' or run_status == "cancelled":
            st.error("Run failed or cancelled")
            st.stop()

        time.sleep(0.5)

When the run is finished, we can retrieve messages:

messages = client.beta.threads.messages.list(
    thread_id=st.session_state.thread.id
)

The messages data field contains all messages. Each message has a role like user or assistant. Assistant messages can have different content, like text or image_file.

For example, if I ask Plot y=x^3 + 2x, there will be both text and image_file responses. It’s up to the developer to properly display them in the app. Below is a naive approach, which only works with text and image responses, not downloads (Code Interpreter can give download links):

try:
    # no support for file download yet, just text and image_file
    for message in messages.data:
        if message.role == 'user':
            st.markdown(f"**User:** {message.content[0].text.value}")
        if message.role == 'assistant':
            for content in message.content:
                if hasattr(content, 'text'):
                    st.markdown(f"**Assistant:** {content.text.value}")
                elif hasattr(content, 'image_file'):
                    # image Id = content.image_file.file_id
                    content = get_content(content.image_file.file_id)
                    image = Image.open(BytesIO(content))
                    st.image(image, caption="Downloaded Image", use_column_width=True)                    
except Exception as e:
    st.error(e)

The above should be pretty clear:

  • if the assistant responds with text, display the text
  • if the assistant responds with an image, there is an image Id; I use a get_content function to download the image from OpenAI; get_content also implements some straightforward caching logic to avoid having to download images over and over again in the same thread

The get_content function uses client.files.content(file_id).response.content to retrieve the file (client is the OpenAI client). The returned result can be used by PIL to open the image and subsequently display it with Streamlit’s st.image:

Assistant in a Streamlit app
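The gist linked earlier contains the actual implementation; a minimal sketch of what such a get_content helper could look like, with a simple in-memory cache (the cache dictionary is my own assumption):

# naive in-memory cache: file_id -> bytes
file_cache = {}

def get_content(file_id: str) -> bytes:
    """Download a file from OpenAI by id and cache the bytes for reuse."""
    if file_id not in file_cache:
        # client is the OpenAI client created earlier
        file_cache[file_id] = client.files.content(file_id).response.content
    return file_cache[file_id]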

Note that I can keep asking questions, which adds messages to the same thread, based on the thread’s Id in Streamlit’s session state. When the user refreshes the browser, session state is cleared so a new thread is started. For example, when I ask to change 2x into 3x:

Asking to change the function

In the code, I do not have to worry about chat history at all. I just add messages to the thread, which is managed by OpenAI. At the next run, all those messages are sent to the assistant’s model, which responds appropriately. Note that you do pay for the tokens that all those messages consume.

Conclusion

Compared to the synchronous and stateless ChatCompletion API, the Assistants API is asynchronous and stateful. As a developer, you create an assistant with tools, functions and content for retrieval purposes. Interacting with the assistant is easy: simply add messages to a thread and create a run.

Obviously, it is early days for this API as it is still in beta. Personally, I think it’s a great step forward, making it easier to create quite sophisticated assistants. Most orchestration frameworks and AI tools like LangChain, Semantic Kernel, Flowise, etc… already have support or will support assistants and will add extra capabilities or ease of use on top of the base functionality.

Using Azure Database for PostgreSQL as a vector store

When we build LLM applications, there is always a recurring question: “What vector store will we use?”. In Azure, there are several native solutions. Some of them were discussed in previous posts.

  • Azure Cognitive Search: supports vector search but also hybrid search with semantic reranking as discussed here
  • Azure Redis Cache Enterprise: check my blog post
  • Azure Cosmos DB for MongoDB Core: see Microsoft Learn

In addition to the above, you can of course host your own vector database in a container such as Qdrant or others. You can install these on any service that supports containers such as App Service, Container Instances, Container Apps, or Kubernetes.

Using PostgreSQL

If you are familiar with Azure Database for PostgreSQL flexible servers, you can use it as a vector store, as long as you install the vector extension. This extension can be enabled in all compute tiers. I installed PostgreSQL and set the compute tier to Burstable with size Standard_B1ms (1 vCore, 2GB). This is great for testing and will cost around 20 euros per month with 32GB of storage. For production use, the monthly cost will start from about 150 euros at the lowest General Purpose tier.

PostgreSQL flexible server with lowest compute tier

After deployment, you need to enable the vector extension. In Server Parameters, search for azure.extensions and select VECTOR from the list. Then click Save.

VECTOR extension added

When done, grab the connection details from the Connect pane:

Connection details

In pgAdmin, register the server with the above details. Connect to the server and create a database. Ensure you configure the firewall settings in Azure to allow your IP address to connect to the server.

Database creation in pgAdmin

Inside the database, go to Extensions and add the vector extension:

vector extension added to the database

Note: in the code below, we will use LangChain. LangChain will try to enable the vector extension if it is not enabled

Note: if you do not want to install pgAdmin, you can create the database from the Azure Portal or use the Azure CLI.

Working with the vector store

Although you can create and query tables that contain vectors with plain SQL, we will use LangChain as a higher-level library that takes care of many of the details for us.

Take a look at the following Python code that creates a few embeddings (vectors) and then uses RAG (retrieval augmented generation) to answer a question with OpenAI’s text-davinci-003 model.

Note: the code is on Github as well

import os
import getpass

# read from .env file
from dotenv import load_dotenv
load_dotenv()

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PGVector
from langchain.document_loaders import TextLoader
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI


loader = TextLoader("./state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()


pgpassword = os.getenv("PGPASSWORD", "")
if not pgpassword:
    pgpassword = getpass.getpass("Enter pgpassword: ")

CONNECTION_STRING = f"postgresql+psycopg2://pgadmin:{pgpassword}@pg-vec-geba.postgres.database.azure.com:5432/pgvector"

COLLECTION_NAME = "state_of_the_union_test"

# if you run this code more than once, you will duplicate vectors
# no upserts
db = PGVector.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING
)

retriever = db.as_retriever()

query = "What did the president say about Ketanji Brown Jackson"

# LLM will default to text-davinci-003 because we are using a completion endpoint
# versus a chat endpoint
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)

answer = qa.run(query)

print(answer)

The code above requires a .env file with the following content:

OPENAI_API_KEY=OPENAI_API_KEY
PGPASSWORD=PASSWORD_TO_POSTGRES

You will also need the State of the Union text file from here.

Before running the code, install the following packages:

pip install pgvector
pip install openai
pip install psycopg2-binary
pip install tiktoken

The code does the following:

  • Import required libraries: important here is the PGVector import
  • Loads the text file and splits it into chunks: chunking strategy is not too important here; just use chunks of 1000 characters
  • Create an instance of type OpenAIEmbeddings, later used to create a vector per chunk for storage in PostgreSQL; it will also be used when doing queries with a retrieval QA chain (see below); uses text-embedding-ada-002 embedding model
  • Construct the connection string for later use and set a collection name: collections are a way to store vectors together; the collections you create are kept in a table and each vector references the collection
  • Create an instance of PGVector with PGVector.from_documents: this will create/use tables to hold the collection(s) and vectors for you; all chunks will be vectorized and stored in a table; we will take a look at those tables in a moment; in a real application, you would reference existing tables and another process would create/update the vectors
  • Create a retriever (qa) from the PGVector instance for use in a retrieval QA chain
  • Run a query and print the answer: the qa.run(query) line does the nearest-neighbor vector search in PostgreSQL (via the retriever), creates a meta-prompt with the relevant context, and returns the OpenAI model response in one step

In the PostgreSQL database, the above code creates two tables:

Tables created by LangChain to store the vectors

The collection table contains the collections you create from code. Each collection has a unique ID. The embedding table contains the vectors. Each vector has a unique ID and belongs to a collection. The fields of the embedding table are:

  • uuid: unique ID of the vector
  • collection_id: collection ID referencing the collection table
  • embedding: a field of type vector that stores the embedding (1536 dimensions)
  • document: the chunk of text that was vectorized
  • cmetadata: a JSON field with a link to the source file
  • custom_id: an id that is unique for each run

Note that when you run the sample Python code multiple times, you will have duplicated content. In a real application, you should avoid that. The process that creates and stores the vectors will typically be separate from the process that queries them.

⚠️ Important: Today, LangChain cannot search over all vectors in all collections. You always need to specify the collection to search. If you do need to search over all vectors, you can use SQL statements instead.
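A minimal sketch of such a raw SQL query with psycopg2, searching across all collections by cosine distance (the table name is an assumption based on what LangChain creates; verify the exact name in pgAdmin):

import psycopg2
from langchain.embeddings.openai import OpenAIEmbeddings

# embed the query with the same model used during indexing
query_vector = OpenAIEmbeddings().embed_query(
    "What did the president say about Ketanji Brown Jackson"
)
vector_literal = "[" + ",".join(str(x) for x in query_vector) + "]"

# psycopg2 wants a plain postgresql:// URI, not the SQLAlchemy one used above
conn = psycopg2.connect(CONNECTION_STRING.replace("postgresql+psycopg2", "postgresql"))
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT document, embedding <=> %s::vector AS distance
        FROM langchain_pg_embedding
        ORDER BY distance
        LIMIT 5
        """,
        (vector_literal,),
    )
    for document, distance in cur.fetchall():
        print(round(distance, 4), document[:80])
conn.close()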

The search has the following properties:

  • Distance strategy: cosine similarity; the pgvector extension also supports L2 distance and inner product; the code above uses the text-embedding-ada-002 embeddings model by default; with that model, you should use cosine similarity; LangChain uses cosine similarity as the default for PGVector so that’s a match! 👏
  • Exact nearest neighbor search: although this provides perfect recall, it can get slow when there are many vectors because the entire table is scanned; the extension supports the creation of indexes to perform an approximate nearest neighbor search using IVFFlat or HNSW; see pgvector on GitHub for more details and also this article from Crunchy Data.

Note: most other vector databases use HNSW as the index type (e.g., Azure Cognitive Search, Qdrant, …); unlike IVFFlat, you can create this index without having any vectors in your database table; at the time of writing (end of September 2023), the version of the vector extension on Azure was 0.4.1 and did not support HNSW; HNSW requires version 0.5.0 or higher

Conclusion

Azure Database for PostgreSQL with the vector extension is an interesting alternative to other vector database solutions in Azure. This is especially the case when PostgreSQL is your database of choice! In this post, we have shown how LangChain supports it with a simple example. If you do not use LangChain or other libraries, you can simply use SQL statements to create and search indexes as documented here.

The drawback of using PostgreSQL is that you need to know a bit more about exact and approximate nearest neighbor searches and the different index mechanisms. That’s actually a good thing if you want to create production applications with good performance. For a simple POC with not a lot of data, you can skip all of this and perform exact searches.

Besides the free tier of Azure Cognitive Search, the configuration above is the service with the lowest cost for POCs that need vector search. On top of that, the cheapest PostgreSQL option has more storage than Cognitive Search’s free tier (32GB vs. 50MB). Adding more storage is easy and relatively cheap as well. Give it a go and tell me what you think!

Use Azure OpenAI Add your data vector search from code

In the previous post, we looked at using Azure OpenAI Add your data from the Azure OpenAI Chat Playground. It is an easy-to-follow wizard to add documents to a storage account and start asking questions about them. From the playground, you can deploy a chat app to an Azure web app and you are good to go. The vector search is performed by an Azure Cognitive Search resource via an index that includes a vector next to other fields such as the actual content, the original URL, etc…

In this post, we will look at using this index from code and build a chat app using the Python Streamlit library.

All code can be found here: https://github.com/gbaeke/azure-cog-search

Requirements

You need an Azure Cognitive Search resource with an index that supports vector search. Use this post to create one. Besides Azure Cognitive Search, you will need Azure OpenAI deployed with both gpt-4 (or 3.5) and the text-embedding-ada-002 embedding model. The embedding model is required to support vector search. In Europe, use France Central as the region.

Next, you need Python installed. I use Python 3.11.4 64-bit on an M1 Mac. You will need to install the following libraries with pip:

  • streamlit
  • requests

You do not need the OpenAI library because we will use the Azure OpenAI REST APIs to be able to use the extension that enables the Add your data feature.

Configuration

We need several configuration settings. They can be divided into two big blocks:

  • Azure Cognitive Search settings: name of the resource, access key, index name, columns, type of search (vector), and more…
  • Azure OpenAI settings: name of the model (e.g., gpt-4), OpenAI access key, embedding model, and more…

You should create a .env file with the following content:

AZURE_SEARCH_SERVICE = "AZURE_COG_SEARCH_SHORT_NAME"
AZURE_SEARCH_INDEX = "INDEX_NAME"
AZURE_SEARCH_KEY = "AZURE_COG_SEARCH_AUTH_KEY"
AZURE_SEARCH_USE_SEMANTIC_SEARCH = "false"
AZURE_SEARCH_TOP_K = "5"
AZURE_SEARCH_ENABLE_IN_DOMAIN = "true"
AZURE_SEARCH_CONTENT_COLUMNS = "content"
AZURE_SEARCH_FILENAME_COLUMN = "filepath"
AZURE_SEARCH_TITLE_COLUMN = "title"
AZURE_SEARCH_URL_COLUMN = "url"
AZURE_SEARCH_QUERY_TYPE = "vector"
AZURE_SEARCH_VECTOR_COLUMNS = "VECTOR_COLUMN_NAME"

# AOAI Integration Settings
AZURE_OPENAI_RESOURCE = "AZURE_OPENAI_SHORT_NAME"
AZURE_OPENAI_MODEL = "gpt-4"
AZURE_OPENAI_KEY = "AZURE_OPENAI_AUTH_KEY"
AZURE_OPENAI_TEMPERATURE = 0
AZURE_OPENAI_TOP_P = 1.0
AZURE_OPENAI_MAX_TOKENS = 1000
AZURE_OPENAI_STOP_SEQUENCE = ""
AZURE_OPENAI_SYSTEM_MESSAGE = "You are an AI assistant that helps people find information."
AZURE_OPENAI_PREVIEW_API_VERSION = "2023-06-01-preview"
AZURE_OPENAI_STREAM = "false"
AZURE_OPENAI_MODEL_NAME = "gpt-4"
AZURE_OPENAI_EMBEDDING_ENDPOINT = "https://AZURE_OPENAI_SHORT_NAME.openai.azure.com/openai/deployments/embedding/EMBEDDING_MODEL_NAME?api-version=2023-03-15-preview"
AZURE_OPENAI_EMBEDDING_KEY = "AZURE_OPENAI_AUTH_KEY"

Now we can create a config.py that reads these settings.

from dotenv import load_dotenv
import os
load_dotenv()

# ACS Integration Settings
AZURE_SEARCH_SERVICE = os.environ.get("AZURE_SEARCH_SERVICE")
AZURE_SEARCH_INDEX = os.environ.get("AZURE_SEARCH_INDEX")
AZURE_SEARCH_KEY = os.environ.get("AZURE_SEARCH_KEY")
AZURE_SEARCH_USE_SEMANTIC_SEARCH = os.environ.get("AZURE_SEARCH_USE_SEMANTIC_SEARCH", "false")
AZURE_SEARCH_TOP_K = os.environ.get("AZURE_SEARCH_TOP_K", 5)
AZURE_SEARCH_ENABLE_IN_DOMAIN = os.environ.get("AZURE_SEARCH_ENABLE_IN_DOMAIN", "true")
AZURE_SEARCH_CONTENT_COLUMNS = os.environ.get("AZURE_SEARCH_CONTENT_COLUMNS")
AZURE_SEARCH_FILENAME_COLUMN = os.environ.get("AZURE_SEARCH_FILENAME_COLUMN")
AZURE_SEARCH_TITLE_COLUMN = os.environ.get("AZURE_SEARCH_TITLE_COLUMN")
AZURE_SEARCH_URL_COLUMN = os.environ.get("AZURE_SEARCH_URL_COLUMN")
AZURE_SEARCH_VECTOR_COLUMNS = os.environ.get("AZURE_SEARCH_VECTOR_COLUMNS")
AZURE_SEARCH_QUERY_TYPE = os.environ.get("AZURE_SEARCH_QUERY_TYPE")

# AOAI Integration Settings
AZURE_OPENAI_RESOURCE = os.environ.get("AZURE_OPENAI_RESOURCE")
AZURE_OPENAI_MODEL = os.environ.get("AZURE_OPENAI_MODEL")
AZURE_OPENAI_KEY = os.environ.get("AZURE_OPENAI_KEY")
AZURE_OPENAI_TEMPERATURE = os.environ.get("AZURE_OPENAI_TEMPERATURE", 0)
AZURE_OPENAI_TOP_P = os.environ.get("AZURE_OPENAI_TOP_P", 1.0)
AZURE_OPENAI_MAX_TOKENS = os.environ.get("AZURE_OPENAI_MAX_TOKENS", 1000)
AZURE_OPENAI_STOP_SEQUENCE = os.environ.get("AZURE_OPENAI_STOP_SEQUENCE")
AZURE_OPENAI_SYSTEM_MESSAGE = os.environ.get("AZURE_OPENAI_SYSTEM_MESSAGE", "You are an AI assistant that helps people find information about jobs.")
AZURE_OPENAI_PREVIEW_API_VERSION = os.environ.get("AZURE_OPENAI_PREVIEW_API_VERSION", "2023-06-01-preview")
AZURE_OPENAI_STREAM = os.environ.get("AZURE_OPENAI_STREAM", "true")
AZURE_OPENAI_MODEL_NAME = os.environ.get("AZURE_OPENAI_MODEL_NAME", "gpt-35-turbo")
AZURE_OPENAI_EMBEDDING_ENDPOINT = os.environ.get("AZURE_OPENAI_EMBEDDING_ENDPOINT")
AZURE_OPENAI_EMBEDDING_KEY = os.environ.get("AZURE_OPENAI_EMBEDDING_KEY")

Writing the chat app

Now we will create chat.py. The diagram below summarizes the architecture:

Chat app architecture (high level)

Here is the first section of the code with explanations:

import requests
import streamlit as st
from config import *
import json

# Azure OpenAI REST endpoint
endpoint = f"https://{AZURE_OPENAI_RESOURCE}.openai.azure.com/openai/deployments/{AZURE_OPENAI_MODEL}/extensions/chat/completions?api-version={AZURE_OPENAI_PREVIEW_API_VERSION}"
    
# endpoint headers with Azure OpenAI key
headers = {
    'Content-Type': 'application/json',
    'api-key': AZURE_OPENAI_KEY
}

# Streamlit app title
st.title("🤖 Azure Add Your Data Bot")

# Keep messages array in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display previous chat messages from history on app rerun
# Add your data messages include tool responses and assistant responses
# Exclude the tool responses from the chat display
for message in st.session_state.messages:
    if message["role"] != "tool":
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

A couple of things happen here:

  • We import all the variables from config.py
  • We construct the Azure OpenAI REST endpoint and store it in endpoint; we use the extensions/chat endpoint here which supports the Add your data feature in API version 2023-06-01-preview and higher
  • We configure the HTTP headers to send to the endpoint; the headers include the Azure OpenAI authentication key
  • We print a title with Streamlit (st.title) and define a messages array that we store in Streamlit’s session state
  • Because of the way Streamlit works, we have to print the previous messages of the chat each time the page reloads. We do that in the last part but we exclude the tool role. The extensions/chat endpoint returns a tool response that contains the data returned by Azure Cognitive Search. We do not want to print the tool response. Together with the tool response, the endpoint returns an assistant response which is the response from the gpt model. We do want to print that response.

Now we can look at the code that gets executed each time the user asks a question. In the UI, the question box is at the bottom:

Streamlit chat UI

Whenever you type a question, the following code gets executed:

# if user provides chat input, get and display response
# add user question and response to previous chat messages
if user_prompt := st.chat_input():
    st.chat_message("user").write(user_prompt)
    with st.chat_message("assistant"):
        with st.spinner("🧠 thinking..."):
            # add the user query to the messages array
            st.session_state.messages.append({"role": "user", "content": user_prompt})
            body = {
                "messages": st.session_state.messages,
                "temperature": float(AZURE_OPENAI_TEMPERATURE),
                "max_tokens": int(AZURE_OPENAI_MAX_TOKENS),
                "top_p": float(AZURE_OPENAI_TOP_P),
                "stop": AZURE_OPENAI_STOP_SEQUENCE.split("|") if AZURE_OPENAI_STOP_SEQUENCE else None,
                "stream": False,
                "dataSources": [
                    {
                        "type": "AzureCognitiveSearch",
                        "parameters": {
                            "endpoint": f"https://{AZURE_SEARCH_SERVICE}.search.windows.net",
                            "key": AZURE_SEARCH_KEY,
                            "indexName": AZURE_SEARCH_INDEX,
                            "fieldsMapping": {
                                "contentField": AZURE_SEARCH_CONTENT_COLUMNS.split("|") if AZURE_SEARCH_CONTENT_COLUMNS else [],
                                "titleField": AZURE_SEARCH_TITLE_COLUMN if AZURE_SEARCH_TITLE_COLUMN else None,
                                "urlField": AZURE_SEARCH_URL_COLUMN if AZURE_SEARCH_URL_COLUMN else None,
                                "filepathField": AZURE_SEARCH_FILENAME_COLUMN if AZURE_SEARCH_FILENAME_COLUMN else None,
                                "vectorFields": AZURE_SEARCH_VECTOR_COLUMNS.split("|") if AZURE_SEARCH_VECTOR_COLUMNS else []
                            },
                            "inScope": True if AZURE_SEARCH_ENABLE_IN_DOMAIN.lower() == "true" else False,
                            "topNDocuments": AZURE_SEARCH_TOP_K,
                            "queryType":  AZURE_SEARCH_QUERY_TYPE,
                            "roleInformation": AZURE_OPENAI_SYSTEM_MESSAGE,
                            "embeddingEndpoint": AZURE_OPENAI_EMBEDDING_ENDPOINT,
                            "embeddingKey": AZURE_OPENAI_EMBEDDING_KEY
                        }
                    }   
                ]
            }  

            # send request to chat completion endpoint
            try:
                response = requests.post(endpoint, headers=headers, json=body)

                # there will be a tool response and assistant response
                tool_response = response.json()["choices"][0]["messages"][0]["content"]
                tool_response_json = json.loads(tool_response)
                assistant_response = response.json()["choices"][0]["messages"][1]["content"]

                # get urls for the JSON tool response
                urls = [citation["url"] for citation in tool_response_json["citations"]]


            except Exception as e:
                st.error(e)
                st.stop()
            
           
            # replace [docN] with urls and use 0-based indexing
            for i, url in enumerate(urls):
                assistant_response = assistant_response.replace(f"[doc{i+1}]", f"[[{i}]({url})]")
            

            # write the response to the chat
            st.write(assistant_response)

            # write the urls to the chat; gpt response might not refer to all
            st.write(urls)

            # add both responses to the messages array
            st.session_state.messages.append({"role": "tool", "content": tool_response})
            st.session_state.messages.append({"role": "assistant", "content": assistant_response})
            

When there is input, we write the input to the chat history on the screen and add it to the messages array. The OpenAI APIs expect a messages array that includes user and assistant roles. In other words, user questions and assistant (here gpt-4) responses.

With a valid messages array, we can send our payload to the Azure OpenAI extensions/chat endpoint. If you have ever worked with the OpenAI or Azure OpenAI APIs, many of the settings in the JSON body will be familiar. For example: temperature, max_tokens, and of course the messages themselves.

What’s new here is the dataSources field. It contains all the information required to perform a vector search in Azure Cognitive Search. The search finds content relevant to the user’s question (the one added last to the messages array). Because queryType is set to vector, we also need to provide the embedding endpoint and key. This is required because the user question has to be vectorized in order to compare it with the stored vectors.

It’s important to note that the extensions/chat endpoint, together with the dataSources configuration takes care of a lot of the details:

  • Perform a k-nearest neighbor search (k=5 here) to find 5 documents closely related to the user’s question
  • It uses vector search for this query (could be combined with keyword and semantic search to perform a hybrid search but that is not used here)
  • It stuffs the prompt to the GPT model with the relevant content
  • It returns the GPT model response (assistant response) together with a tool response. The tool response contains citations that include URLs to the original content and the content itself.

In the UI, we print the URLs from these citations after modifying the assistant response to return hyperlinked numbers like [0] and [1] for the citations instead of unlinked [doc1], [doc2], etc… That looks like this:

Printing the URLs from the citations

Note that this chat app is a prototype and does not include management of the messages array. Long interactions will reach the model’s token limit!
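A naive way to cap the history before sending the request (the cutoff value is arbitrary and this ignores token counting):

MAX_MESSAGES = 20   # arbitrary cutoff; tune it to your model's context window

def trim_history(messages: list) -> list:
    """Keep only the most recent messages so the payload stays within the token limit."""
    return messages[-MAX_MESSAGES:]

# before building the request body:
# st.session_state.messages = trim_history(st.session_state.messages)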

You can find this code on GitHub here: https://github.com/gbaeke/azure-cog-search.

Conclusion

Although still in preview, you now have an Azure-native solution that enables the RAG pattern with vector search. RAG stands for Retrieval Augmented Generation. Azure Cognitive Search is a fully managed service that stores the vectors and performs similarity searches. There is no need to deploy a 3rd party vector database.

There is no need for specific libraries to implement this feature because it is all part of the Azure OpenAI API. Microsoft simply extended that API to add data sources and takes care of all the behind-the-scenes work that finds relevant content and adds it to your prompt.

If, for some reason, you do not want to use the Azure OpenAI API directly and use something like LangChain or Semantic Kernel, you can of course still do that. Both solutions support Azure Cognitive Search as a vector store.

Improvements in Azure OpenAI Add your data

In a previous post, I talked about the Add your data feature in the Azure OpenAI Chat playground. Recently, there have been some updates to this feature, including vector search. Let’s take a look at the updated experience and focus on vector search.

Starting point

I have some PDF documents in a storage account container. They are PDFs containing job descriptions for a select group of companies. You can use .txt, .md, .html, Microsoft Word files, Microsoft PowerPoint files, or PDFs.

PDFs in a storage account

At the storage account level, the CORS settings should allow GET from all origins (*). This can also be set from the Add your data wizard in the OpenAI Playground.

CORS settings

In addition to the storage account, you need an OpenAI resource deployed to a region of choice. I have chosen France Central which provides access to gpt-4 and the text-embedding-ada-002 embedding model (a text embedding model is required for vector search). Ensure those models are deployed. For example:

Deployed models in France Central

Running the wizard

In OpenAI Chat Playground, you will find the Add your data (preview) tab. Use the + Add a data source button to start.

There are several sources to start from. Because I already have my files in a storage account, I will select Azure Blob Storage as the source and pick the name of the storage account and the container with my files. You can also upload files or use an existing index in Cognitive Search. Whatever option you choose, you will always end up with a Cognitive Search index that serves relevant content to the chat.

Data source options

As you can see from the above screenshot, in addition to the storage account, you have to select an Azure Cognitive Search instance. It will not be created for you. If you do not have such an instance, either click the link under the Select Azure Cognitive Search resource dropdown or create one yourself and use the refresh icon. I already have such a resource called acs-geba. Use the Basic pricing tier as a minimum. This gives you a vector quota of 1GB.

After selecting the Azure Cognitive Search resource, enter an index name. The documents in the storage account will be added to this index so we can search via this index later. The index will be created for you. I will use oai as the index name and also set a schedule to Hourly to update the index automatically. The schedule can be adjusted afterward in Azure Cognitive Search.

We now have the following in the wizard:

Add your data, data source config

You can now add vector search, in addition to keyword and semantic search. To use vector search, you need to specify an embedding model. If you do not have text-embedding-ada-002 deployed in your region, you will not be able to turn on vector search. This feature requires the Basic SKU or higher.

Turning on vector search (still in the first page of the wizard)

Above, I called my deployment of the text-embedding-ada-002 model embedding but you can use any name you like. It’s just a deployment name.

Now we can press Next, to be presented with the Data management page:

Data management page

You can find more information about those options here. In most cases, using Vector search alone is sufficient but it depends a bit on your dataset and use-case. I will just use Vector search. When we use Redis, Qdrant, Pinecone, or other vector stores, we also use vectorized search exclusively, which works very well.

After clicking Next, review what will happen and click Save and Close. The data will be added to the index:

Data is being added

Asking questions about the data

When the data has been added to Azure Cognitive Search, you can start asking questions. If you want to limit the chat to only your data, ensure that Limit Responses to your data content is checked.

Ready to go

In the Chat Playground, I selected gpt-4 and asked the following question: “Who are MBarQ and do they need AI translators?”. The answer is as follows:

Asking a question

This answer comes from one of the PDFs containing the job description.

Behind the scenes

For the above interaction to work, the question “Who are MBarQ and do they need AI translators?” is vectorized using the selected embedding model. Let’s call this the query vector. The selected embedding model creates a vector with 1536 dimensions that represents the text within a vector space. The nice thing here is that the embedding of the query is created automatically as part of the extended Azure OpenAI API.
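For reference, that embedding call is a regular Azure OpenAI embeddings request. A minimal sketch with the REST API (the resource name and environment variable are placeholders; the deployment name embedding matches what I configured in the wizard above):

import os
import requests

resource = "AZURE_OPENAI_SHORT_NAME"   # placeholder for your Azure OpenAI resource name
deployment = "embedding"               # deployment name of text-embedding-ada-002

url = f"https://{resource}.openai.azure.com/openai/deployments/{deployment}/embeddings?api-version=2023-05-15"
headers = {"api-key": os.environ["AZURE_OPENAI_KEY"]}
body = {"input": "Who are MBarQ and do they need AI translators?"}

response = requests.post(url, json=body, headers=headers)
query_vector = response.json()["data"][0]["embedding"]
print(len(query_vector))   # 1536 dimensions for text-embedding-ada-002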

The vectors for your documents are stored in an index that ends with the word chunks. Here’s my index and its defined fields. This is all the result of the wizard. No changes have been made to Azure Cognitive Search manually.

Index used for vector search and its fields

As you can see, there is a field for the contentVector which also notes the number of dimensions. The embedding model we used just happens to output 1536 numbers. Other embedding models use a different number of dimensions. Next to the contentVector, the content field contains the actual text that the vector was created from. That text will later be injected, behind the scenes, in the gpt-4 prompt. But we first have to find these pieces of text!

With the query vector in hand, Add your data searches for pieces of text with vectors that are close to the query vector. Cognitive Search uses cosine similarity to do that but there is no real need to know that. Note that we only use vector search in this scenario. When you do hybrid and/or semantic searches, the query process is different. Also, note that the index with vectors works on chunks of text coming from your documents. This chunking happens transparently in the background when the indexer runs.

Once the top N (usually n=5 but can be adjusted in code) vectors that are closest to the query vector are found, we also have the pieces of text closest to the query (from the content field). The original pieces of text that the vectors were calculated from get added to the query and sent to gpt-4. The prompt sent to gpt-4 could be something like the one below (just an example):

Who are MBarQ and do they need AI translators?

Only answer based on the context below after ---

---

First piece of text (no vectors here, just plain text!!!)

Second piece of text

...

Based on this prompt and hopefully relevant context below the --- marker, the model can answer the original question.

Note that the Add your data experience also returns references. In the UI, you can click these to see the source text:

References

Deployment

From the Playground, you can deploy the chat experience to a web app or a Power Virtual Agent bot:

Deployment

At this moment (September 2023) the Power Virtual Agent deployment does not work if your default environment is not in the United States. When you click A new Power Virtual Agent bot…, you should quickly copy the URL and replace the environment ID with another one that is in the United States. Navigate to the modified URL to create the bot.

Deploying to a web app is a bit more straightforward because that is just a web app in Azure. No Power Platform madness here… 😀

Note that if you enable chat history, CosmosDB is used. Here’s the app with chat history visible at the right, similar to chat history in ChatGPT or Bing Chat. This app uses Azure Active Directory (Microsoft Entra ID) for authentication.

Chat in web app

Conclusion

The main addition to Add your data surely is vector search. That capability was already a part of Cognitive Search but the Add your data feature did not use it. When you do use it, a lot of stuff happens in the background automatically:

  • An index that supports vectors is created; if selected, the index is automatically updated based on the contents of the storage account container
  • Documents are chunked and vectors are created for each of these chunks based on the selected embeddings model
  • There is no need to vectorize the user’s query yourself, perform a nearest neighbor search, or stuff the gpt prompt with content; everything is handled by the underlying API

It will be interesting to see how it evolves further.

Use the Azure Add your data feature from code

In a previous post, we looked at Azure Add your data (preview) that can be configured from the Azure OpenAI Chat playground. In a couple of steps, you can point to or upload files of a supported file type (PDF, Word, …) and create an Azure Cognitive Search index. That index can subsequently be queried to find relevant content to inject in the prompt of an Azure OpenAI model such as gpt-35-turbo or gpt-4. The injection is done for you by the Add your data feature!

Although this feature is fantastic and easy to configure in the portal, you probably want to use it in a custom application. So in this post, we will take a look at how to use this feature in code.

You can find a notebook with some sample code here: https://github.com/gbaeke/azure-cog-search. The code is a subset of the code used by Microsoft to create a web application from the Azure OpenAI playground. That code can be found here. Although the code is written in Python, it does not use specific OpenAI or Azure OpenAI libraries. The code uses the Azure OpenAI REST APIs only, which means you can easily replicate this in any other language or framework.

Before you run the code, make sure you clone the repo and add a .env file as described in README.md. The code itself is in a Python notebook, so make sure you can run such notebooks. I use Visual Studio Code to do that:

Running the notebook in VS Code

To configure Visual Studio Code, check https://code.visualstudio.com/docs/datascience/jupyter-notebooks.

The code itself does the following:

  • Create Python variables from environment variables to configure both Azure Cognitive Search and Azure OpenAI. Make sure you have read the previous blog post and/or watched the video so that both of these resources are in place. You will need to add the short names of these services and their authentication keys. The video at the end of this post discusses the options in some more detail.
  • Prepare a messages array with a user question: I am an Azure architect. Is there a job for me? In the previous post, I added documents with job descriptions. You can add any documents you want and modify the user question accordingly.
  • Prepare the JSON body for an HTTP POST. The JSON body will be sent to the Azure OpenAI chat completion (extensions) API and will include information about the Azure Cognitive Search data source.
  • Create the endpoint URL and HTTP headers to send to the chat completion API. The headers contain the Azure OpenAI API key.
  • Send the JSON body to the chat completion endpoint with the Python requests module and extract the full response (which includes citations) as well as the response from the OpenAI model. In this code, gpt-4 is used. A sketch of these steps follows below.
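
Putting those steps together, the core of the notebook looks roughly like the sketch below. Variable names such as AZURE_SEARCH_SERVICE, AZURE_SEARCH_INDEX, AZURE_SEARCH_KEY and AZURE_OPENAI_KEY are assumptions based on the sample .env file; check the repo for the exact names and the complete JSON body.

import os
import requests

# Short names and keys for both services, loaded from the .env file (names are assumptions)
AZURE_OPENAI_RESOURCE = os.environ["AZURE_OPENAI_RESOURCE"]
AZURE_OPENAI_MODEL = os.environ["AZURE_OPENAI_MODEL"]  # e.g. the gpt-4 deployment name
AZURE_OPENAI_KEY = os.environ["AZURE_OPENAI_KEY"]
AZURE_OPENAI_PREVIEW_API_VERSION = os.environ.get("AZURE_OPENAI_PREVIEW_API_VERSION", "2023-06-01-preview")
AZURE_SEARCH_SERVICE = os.environ["AZURE_SEARCH_SERVICE"]
AZURE_SEARCH_INDEX = os.environ["AZURE_SEARCH_INDEX"]
AZURE_SEARCH_KEY = os.environ["AZURE_SEARCH_KEY"]

# Messages array with the user question
messages = [{"role": "user", "content": "I am an Azure architect. Is there a job for me?"}]

# JSON body: the chat messages plus the Azure Cognitive Search data source
body = {
    "messages": messages,
    "temperature": 0,
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": f"https://{AZURE_SEARCH_SERVICE}.search.windows.net",
                "key": AZURE_SEARCH_KEY,
                "indexName": AZURE_SEARCH_INDEX,
            },
        }
    ],
}

# Endpoint and headers; note the extensions/chat/completions path and the preview API version
endpoint = (
    f"https://{AZURE_OPENAI_RESOURCE}.openai.azure.com/openai/deployments/"
    f"{AZURE_OPENAI_MODEL}/extensions/chat/completions"
    f"?api-version={AZURE_OPENAI_PREVIEW_API_VERSION}"
)
headers = {"Content-Type": "application/json", "api-key": AZURE_OPENAI_KEY}

# Send the request and parse the JSON response
response = requests.post(endpoint, headers=headers, json=body)
response.raise_for_status()
result = response.json()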

To see the response from the code, check https://github.com/gbaeke/azure-cog-search/blob/main/cog-search-openai.ipynb. If you have worked with the OpenAI APIs before, you will notice that the response in the choices field contains two messages:

  • a message with the role of tool that includes content from Azure Cognitive Search, including metadata such as the URL and file path
  • a message with the role of assistant that contains the actual answer from the OpenAI model (here gpt-4)
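
Continuing from the sketch above, extracting both messages could look like this. The exact shape of the tool message content may differ slightly, so inspect the notebook output for the real structure:

import json

choice = result["choices"][0]

# The extensions API returns multiple messages per choice
for message in choice["messages"]:
    if message["role"] == "tool":
        # The tool message content is a JSON string with citations and metadata
        citations = json.loads(message["content"])
        print("Citations:", citations)
    elif message["role"] == "assistant":
        # The actual answer from the OpenAI model (here gpt-4)
        print("Answer:", message["content"])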

It’s important to note that you need to use the chat completions extensions API with the correct API version, 2023-06-01-preview. This is reflected in the constructed endpoint. Also note that it’s extensions/chat/completions in the URL!

endpoint = f"https://{AZURE_OPENAI_RESOURCE}.openai.azure.com/openai/deployments/{AZURE_OPENAI_MODEL}/extensions/chat/completions?api-version={AZURE_OPENAI_PREVIEW_API_VERSION}"

If you are used to LangChain or other libraries that aim to make it easy to work with large language models, prompts, vector databases and so on, you know that the document queries are separated from the actual call to the LLM and that the retrieved documents are injected into the prompt. Of course, LangChain and similar tools have higher-level APIs that make this possible in just a few lines of code.

Microsoft has gone for direct integration of Azure Cognitive Search in the Azure OpenAI APIs version 2023-06-01-preview and above. You can find the OpenAPI specification (swagger) here. One API call is all that is needed!

Note that in this code, we are not using semantic search. In the sample .env file, AZURE_SEARCH_USE_SEMANTIC_SEARCH is set to false. You cannot simply turn on that setting because it also requires turning on the feature in Azure Cognitive Search. The feature is also in preview:

Semantic search in Azure Cognitive Search (preview)

To understand the differences between lexical search and semantic search, see this article. In general, semantic search should return better results when using natural language queries. By turning on the feature in Cognitive Search (Free tier) and setting AZURE_SEARCH_USE_SEMANTIC_SEARCH to true, you should be good to go.
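
When semantic search is enabled on the Cognitive Search side, the data source parameters in the request can ask for it. The sketch below is hedged: parameter names such as queryType and semanticConfiguration come from the preview API and the Microsoft sample code, so verify them against the OpenAPI specification linked above.

import os

AZURE_SEARCH_USE_SEMANTIC_SEARCH = os.environ.get("AZURE_SEARCH_USE_SEMANTIC_SEARCH", "false").lower() == "true"

# Data source parameters for the dataSources entry in the request body
search_parameters = {
    "endpoint": f"https://{os.environ['AZURE_SEARCH_SERVICE']}.search.windows.net",
    "key": os.environ["AZURE_SEARCH_KEY"],
    "indexName": os.environ["AZURE_SEARCH_INDEX"],
    # Switch between plain (lexical) and semantic search
    "queryType": "semantic" if AZURE_SEARCH_USE_SEMANTIC_SEARCH else "simple",
}
if AZURE_SEARCH_USE_SEMANTIC_SEARCH:
    # Name of the semantic configuration defined on the index (hypothetical env var name)
    search_parameters["semanticConfiguration"] = os.environ.get("AZURE_SEARCH_SEMANTIC_SEARCH_CONFIG", "default")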

⚠️ Note that semantic search is not the same as vector search in the context of Azure Cognitive Search. With vector search, you have to generate vectors (embeddings) using an embedding model and store these vectors in your index together with your content. Although Azure Cognitive Search supports vector search, it is not used here. This can be confusing because, outside of Azure, the term semantic search is often used to describe vector-based search in general.

Here’s a video with more information about this blog post: