Use Azure OpenAI on your data with Semantic Kernel

I have written before about Azure OpenAI on your data. For a refresher, see Microsoft Learn. In short, Azure OpenAI on your data tries to make it easy to create an Azure AI Search index that supports advanced search mechanisms like vector search, potentially enhanced with semantic reranking.

One of the things you can do is simply upload your documents and start asking questions about them, right from within the Azure OpenAI Chat playground. The screenshot below shows the starting screen of a step-by-step wizard to get your documents into an index:

Upload your documents to Azure OpenAI on your data

Note that whatever option you choose in the wizard, you will always end up with an index in Azure AI Search. When the index is created, you can start asking questions about your data:

Your questions are answered with links to source documents (citations)

Instead of uploading your documents, you can use any Azure AI Search index. You will have the ability to map the fields from your index to the fields Azure OpenAI expects. You will see an example in the Semantic Kernel code later and in the next section.

Extensions to the OpenAI APIs

To make this feature work, Microsoft extended the OpenAI APIs. By providing extra information about Azure AI Search, mapped fields, the type of search, etc… the API retrieves relevant content, adds it to the prompt and lets the model answer. It is retrieval augmented generation (RAG), but completely API driven.

The question I asked in the last screenshot was: “Does Redis on Azure support vector queries?”. The API creates an embedding for that question to find similar vectors. The vectors are stored together with their source text (from your documents). That text is added as context to the prompt, allowing the chosen model to answer as shown above.

Under the hood, the UI makes a call to the URL below:

{openai.api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={openai.api_version}

This looks similar to a regular chat completions call except for the extensions part. When you use this extension API, you can supply extra information. Using the Python OpenAI packages, the extra information looks like below:

dataSources=[
  {
    "type": "AzureCognitiveSearch",
    "parameters": {
      "endpoint": "'$search_endpoint'",
      "indexName": "'$search_index'",
      "semanticConfiguration": "default",
      "queryType": "vectorSimpleHybrid",
      "fieldsMapping": {
        "contentFieldsSeparator": "\n",
        "contentFields": [
          "Content"
        ],
        "filepathField": null,
        "titleField": "Title",
        "urlField": "Url",
        "vectorFields": [
          "contentVector"
        ]
   ... many more settings (shortened here)

The dataSources section is used by the extension API to learn about the Azure AI Search resource, the API key to use (not shown above), the type of search to perform (hybrid) and how to map the fields in your index to the fields this API expects. For example, we can tell the API about one or more contentFields. Above, there is only one such field named Content. That’s the name of a field in your chosen index.
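For illustration, a bare-bones call to this endpoint with the Python requests library could look like the sketch below. All endpoints, keys and names are placeholders and the dataSources parameters are trimmed to a minimum; a real call would also include the fieldsMapping and, for vector search, the embedding settings shown above.

import requests

# Placeholders: replace with your own endpoint, deployment, key and index details
api_base = "https://<your-aoai-resource>.openai.azure.com"
deployment_id = "<your-gpt-deployment>"
api_version = "2023-12-01-preview"

url = f"{api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={api_version}"
headers = {"api-key": "<your-azure-openai-key>", "Content-Type": "application/json"}
body = {
    "messages": [{"role": "user", "content": "Does Redis on Azure support vector queries?"}],
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "https://<your-search-service>.search.windows.net",
                "key": "<your-search-api-key>",
                "indexName": "<your-index>",
                "queryType": "simple",  # keyword search; hybrid needs the extra settings shown above
            },
        }
    ],
}

response = requests.post(url, headers=headers, json=body)
print(response.json())  # assistant answer plus citations; exact shape depends on the api-version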

You can easily get a Python code example to use this API from the Chat Completions playground:

Get sample code by clicking View code in the playground

How to do this in Semantic Kernel?

In what follows, I will show snippets of a full sample you can find on GitHub. The sample uses Streamlit to provide the following UI:

Sample Streamlit app

Above, (1) is the original user question. Using Azure OpenAI on your data, we use Semantic Kernel to provide a response with citations (2). As an extra, all URLs returned by the vector search are shown in (3). Not all of them appear in the response because not all retrieved results are relevant.

Let’s look at the code now…

st.session_state.kernel = sk.Kernel()

# Azure AI Search integration
azure_ai_search_settings = sk.azure_aisearch_settings_from_dot_env_as_dict()
azure_ai_search_settings["fieldsMapping"] = {
    "titleField": "Title",
    "urlField": "Url",
    "contentFields": ["Content"],
    "vectorFields": ["contentVector"], 
}
azure_ai_search_settings["embeddingDependency"] = {
    "type": "DeploymentName",
    "deploymentName": "embedding"  # you need an embedding model with this deployment name is same region as AOAI
}
az_source = AzureAISearchDataSources(**azure_ai_search_settings, queryType="vectorSimpleHybrid", system_message=system_message) # set queryType to "simple" for text-only search or "vector" for vector-only search
az_data = AzureDataSources(type="AzureCognitiveSearch", parameters=az_source)
extra = ExtraBody(dataSources=[az_data]) if search_data else None

Above we create a (semantic) kernel. Don’t bother with the session state stuff, that’s specific to Streamlit. After that, the code effectively puts together the Azure AI Search information to be added to the extension API:

  • get Azure AI Search settings from a .env file: contains the Azure AI Search endpoint, API key and index name
  • add fieldsMapping to the Azure AI Search settings: contentFields and vectorFields are arrays; we need to map the fields in our index to the fields that the API expects
  • add embedding information: the deploymentName is set to embedding; you need an embedding model with that name in the same region as the OpenAI model you will use
  • create an instance of the AzureAISearchDataSources class: this wraps the Azure AI Search settings and adds additional settings such as queryType (hybrid search here)
  • create an instance of class AzureDataSources: this will tell the extension API that the data source is AzureCognitiveSearch with the settings provided via the AzureAISearchDataSources class; other datasources are supported
  • the call to the extension API needs the dataSources field as discussed earlier: the ExtraBody class allows us to define what needs to be added to the POST body of a chat completions call; multiple dataSources can be provided but here, we have only one datasource (of type AzureCognitiveSearch); we will need this extra variable later in our request settings

Note: I have a parameter in my code, search_data. Azure OpenAI on your data is only enabled when search_data is True. If it is False, the variable extra is None. You will see this variable pop up in other places as well.

In Semantic Kernel, you can add one or more services to the kernel. In this case, we only add a chat completions service that points to a gpt-4-preview deployment. A .env file is used to get the Azure OpenAI endpoint, key and deployment.

service_id = "gpt"
deployment, api_key, endpoint = azure_openai_settings_from_dot_env(include_api_version=False)
chat_service = sk_oai.AzureChatCompletion(
    service_id=service_id,
    deployment_name=deployment,
    api_key=api_key,
    endpoint=endpoint,
    api_version="2023-12-01-preview" if search_data else "2024-02-01",  # azure openai on your data in SK only supports 2023-12-01-preview
    use_extensions=True if search_data else False # extensions are required for data search
)
st.session_state.kernel.add_service(chat_service)

Above, there are two important settings to make Azure OpenAI on your data work:

  • api_version: needs to be set to 2023-12-01-preview; Semantic Kernel does not support the newer versions at the time of this writing (end of March, 2024). However, this will be resolved soon.
  • use_extensions: required to use the extension API; without it the call to the chat completions API will not have the extension part.

We are not finished yet. We also need to supply the ExtraBody data (extra variable) to the call. That is done via the AzureChatPromptExecutionSettings:

req_settings = AzureChatPromptExecutionSettings(
    service_id=service_id,
    extra_body=extra,
    tool_choice="none" if search_data else "auto", # no tool calling for data search
    temperature=0,
    max_tokens=1000
)

In Semantic Kernel, we can create a function from a prompt with chat history and use that prompt to effectively create the chat experience:

prompt_template_config = PromptTemplateConfig(
    template="{{$chat_history}}{{$user_input}}",
    name="chat",
    template_format="semantic-kernel",
    input_variables=[
        InputVariable(name="chat_history", description="The history of the conversation", is_required=True),
        InputVariable(name="user_input", description="The user input", is_required=True),
    ],
)

# create the chat function
if "chat_function" not in st.session_state:
    st.session_state.chat_function = st.session_state.kernel.create_function_from_prompt(
        plugin_name="chat",
        function_name="chat",
        prompt_template_config=prompt_template_config,
    )

Later, we can call our chat function and provide KernelArguments that contain the request settings we defined earlier, plus the user input and the chat history:

arguments = KernelArguments(settings=req_settings)

arguments["chat_history"] = history
arguments["user_input"] = prompt
response = await st.session_state.kernel.invoke(st.session_state.chat_function, arguments=arguments)

The important part here is that we invoke our chat function. With the kernel’s chat completion service configured to use extensions, and the extra request body field added to the request settings, you effectively use the Azure OpenAI on your data APIs as mentioned earlier.

Conclusion

Semantic Kernel supports Azure OpenAI on your data. To use the feature effectively, you need to:

  • Prepare the extra configuration (ExtraBody) to send to the extension API
  • Enable the extension API in your Azure chat completion service and ensure you use the supported API version
  • Add the ExtraBody data to your AzureChatPromptExecutionSettings together with settings like temperature etc…

Although it should be possible to use Azure OpenAI on your data together with function calling, I could not get that to work. Function calling requires a higher API version, which is not supported by Semantic Kernel in combination with Azure OpenAI on your data yet!

The code on GitHub can be toggled to function mode by setting MODE in .env to anything but search. In that case though, Azure OpenAI on your data is not used. Be sure to restart the Streamlit app after you change that setting in the .env file. In function mode you can ask about the current time and date. If you provide a Bing API key, you can also ask questions that require a web search.

So you want a chat bot to talk to your SharePoint data?

It’s a common request we hear from clients: “We want a chatbot that can interact with our data in SharePoint!” The idea is compelling – instead of relying on traditional search methods or sifting through hundreds of pages and documents, users could simply ask the bot a question and receive an instant, accurate answer. It promises to be a much more efficient and user-friendly experience.

The appeal is clear:

  • Improved user experience
  • Time savings
  • Increased productivity

But how easy is it to implement a chatbot for SharePoint and what are some of the challenges? Let’s try and find out.

The easy way: Copilot Studio

I have talked about Copilot Studio in previous blog posts. One of the features of Copilot Studio is generative answers. With generative answers, your copilot can find and present information from different sources like websites or SharePoint. The high-level steps to work with SharePoint data are below:

  • Configure your copilot to use Microsoft Entra ID authentication
  • In the Create generative answers node, in the Data sources field, add the SharePoint URLs you want to work with

At a high level, this is all you need to start asking questions. One advantage of using this feature is that the SharePoint data is accessed on behalf of the user. When generative answers searches for SharePoint data, it only returns information that the user has access to.

It is important to note that the search relies on a call to the Graph API search endpoint (https://graph.microsoft.com/v1.0/search/query) and that only the top three results returned by this call are used. Generative answers also only works with files up to 3MB in size. If the search returns documents larger than 3MB, they are not processed; if all results are above 3MB, generative answers returns an empty response.

In addition, the user’s question is rewritten to only send the main keywords to the search. The type of search is a keyword search. It is not a similarity search based on vectors.
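For reference, a direct keyword search against that Graph endpoint could look roughly like the sketch below. This is only an illustration of the endpoint: the exact request generative answers sends on behalf of the user is not documented, and the access token is a placeholder.

import requests

# Illustration only: a keyword search against the Graph search endpoint
headers = {"Authorization": "Bearer <access-token>", "Content-Type": "application/json"}
body = {
    "requests": [
        {
            "entityTypes": ["driveItem", "listItem"],
            "query": {"queryString": "redis vector queries"},  # main keywords, not the full question
            "from": 0,
            "size": 3,  # only a handful of top results are used
        }
    ]
}

response = requests.post("https://graph.microsoft.com/v1.0/search/query", headers=headers, json=body)
print(response.json())  # hits only include items the calling user has access to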

Note: the type of search will change when Microsoft enables Semantic Index for Copilot for your tenant. Other limitations, like the 3MB size limit, will be removed as well.

Pros:

  • easy to configure (UI)
  • uses only documents the user has access to (Entra ID integration)
  • no need to create a pipeline to process SharePoint data; simply point at SharePoint URLs 🔥
  • an LLM is used “under the hood”; there is no need to set up an Azure OpenAI instance

Cons:

  • uses keyword search which can result in less relevant results
  • does not use vector search and/or semantic reranking (e.g., like in Azure AI Search)
  • number of search results that can provide context is not configurable (maximum 3)
  • documents are not chunked; search cannot retrieve relevant pieces of text from a document
  • maximum size is 3MB; if the document is highly relevant to answer the user’s query, it might be dropped because of its size

Although your mileage may vary, the limitations make it hard to build a chat bot that provides relevant, high-quality answers. What can we do to fix that?

Copilot Studio with Azure OpenAI on your data

Copilot Studio has integration with Azure OpenAI on your data. Azure OpenAI on your data makes it easy to create an Azure AI Search index based on your documents. Such an index creates chunks of larger documents and uses vectors to match a user’s query to similar chunks. Such queries usually result in more relevant pieces of text from multiple documents. In addition to vector search, you can combine vector search with keyword search and optionally rerank the search results semantically. In most cases, you want these advanced search options because relevant context is key for the LLM to work with!

The diagram below shows the big picture:

Using AI Search to query documents with vectors

The diagram above shows documents in a storage account (not SharePoint, we will get to that). With Azure OpenAI on your data, you simply point to the storage account, allowing Azure AI Search to build an index that contains one or more document chunks per document. The index contains the text in the chunk and a vector of that text. Via the Azure OpenAI APIs, chat applications (including Copilot Studio) can send user questions to the service together with information about the index that contains relevant content. Behind the scenes, the API searches for similar chunks and uses them in the prompt to answer the user’s question. You can configure the number of chunks that should be put in the prompt. The number is only limited by the OpenAI model’s context limit (8k, 16k, 32k or 128k tokens).

You do not need to write code to create this index. Azure OpenAI on your data provides a wizard to create the index. The image below shows the wizard in Azure AI Studio (https://ai.azure.com):

Azure OpenAI add your data

Above, instead of pointing to a storage account, I selected the Upload files/folder feature. This allows you to upload files to a storage account first, and then create the index from that storage account.

Azure OpenAI on your data is great, but there is this one tiny issue: there is no easy way to point it to your SharePoint data!

It would be fantastic if SharePoint was a supported datasource. However, it is important to realise that SharePoint is not a simple datasource:

  • What credentials are used to create the index?
  • How do you ensure that queries use only the data the user has access to?
  • How do you keep the SharePoint data in sync with the Azure AI Search index? And not just the data, the ACLs (access control lists) too.
  • What SharePoint data do you support? Just documents? List items? Web pages?

The question now becomes: “How do you get SharePoint data into AI Search to improve search results?” Let’s find out.

Creating an AI Search index with SharePoint data

Azure AI Search offers support for SharePoint as a data source. However, it’s important to note that this feature is currently in preview and has been in that state for an extended period of time. Additionally, there are several limitations associated with this functionality:

  • SharePoint .ASPX site content is not supported.
  • Permissions are not automatically ingested into the index. To enable security trimming, you will need to add permission-related information to the index manually, which is a non-trivial task.

In the official documentation, Microsoft clearly states that if you require SharePoint content indexing in a production environment, you should consider creating a custom connector that utilizes SharePoint webhooks in conjunction with the Microsoft Graph API to export data to an Azure Blob container. Subsequently, you can leverage the Azure Blob indexer to index the exported content. This approach essentially means that you are responsible for developing and maintaining your own custom solution.

Note: we do not follow the webhook approach because of its limitations.

What to do?

When developing chat applications that leverage retrieval-augmented generation (RAG) with SharePoint data, we typically use a Logic App or custom job to process the SharePoint data in bulk. This Logic App or job ingests various types of content, including documents and site pages.

To maintain data integrity and ensure that the system remains up-to-date, we also utilize a separate Logic App or job that monitors for changes within the SharePoint environment and updates the index accordingly.
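One possible mechanism for such a change-detection job, assuming you track changes via Microsoft Graph, is a delta query on the document library’s drive. A minimal sketch with placeholder values:

import requests

# Minimal sketch of a Graph delta query to detect changes in a SharePoint document library
# drive_id and the access token are placeholders
graph = "https://graph.microsoft.com/v1.0"
drive_id = "<drive-id>"
headers = {"Authorization": "Bearer <access-token>"}

url = f"{graph}/drives/{drive_id}/root/delta"
delta_link = None
while url:
    page = requests.get(url, headers=headers).json()
    for item in page.get("value", []):
        # Each entry is an added, updated or deleted item; deletions carry a "deleted" facet
        print(item.get("name"), "deleted" if "deleted" in item else "added/updated")
    delta_link = page.get("@odata.deltaLink", delta_link)
    url = page.get("@odata.nextLink")

# Persist delta_link; calling it on the next run returns only the changes since this run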

However, implementing this solution in a production environment is not a trivial task, as there are numerous factors to consider:

  • Logic Apps have limitations when it comes to processing large volumes of data. Custom code can be used as a workaround.
  • Determining the appropriate account credentials for retrieving the data securely.
  • Identifying the types of changes to monitor: file modifications, additions, deletions, metadata updates, access control list (ACL) changes, and more.
  • Ensuring that the index is updated correctly based on the detected changes.
  • Implementing a mechanism to completely rebuild the index when the data chunking strategy changes, typically involving the creation of a new index and updating the bot to utilize the new index. Index aliases can be helpful in this regard.

In summary, building a custom solution to index SharePoint data for chat applications with RAG capabilities is a complex undertaking that requires careful consideration of various technical and operational aspects.

Security trimming

Azure AI Search does not provide document-level permissions. There is also no concept of user authentication. This means that you have to add security information to an Azure AI Search index yourself and, in code, ensure that AI Search only returns results that the logged on user has access to.

Full details are here with the gist of it below:

  • add a security field of type collection of strings to your index; the field should allow filtering
  • in that field, store group Ids (e.g., Entra ID group oid’s) in the array
  • while creating the index, retrieve the group Ids that have at least read access to the document you are indexing; add each group Id to the security field

When you query the index, retrieve the logged on user’s list of groups. In your query, use a filter like the one below:

{
"filter":"group_ids/any(g:search.in(g, 'group_id1, group_id2'))"
}

Above, group_ids is the security field and group_id1 etc… are the groups the user belongs to.
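As a hedged sketch, applying such a filter with the azure-search-documents Python SDK could look like the snippet below; the endpoint, index, key and field names are assumptions for illustration.

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Placeholders: endpoint, index, key and field names are assumptions
search_client = SearchClient(
    "https://<your-search-service>.search.windows.net",
    "<your-index>",
    AzureKeyCredential("<search-api-key>"),
)

# Group IDs of the logged on user, e.g. retrieved via Microsoft Graph
user_groups = ["group_id1", "group_id2"]
group_filter = "group_ids/any(g:search.in(g, '{}'))".format(", ".join(user_groups))

results = search_client.search(
    search_text="what is our travel policy?",
    filter=group_filter,  # security trimming: only documents the user can read
    select=["Title", "Url", "Content"],
    top=5,
)
for doc in results:
    print(doc["Title"], doc["Url"])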

For more detailed steps and example C# code, see https://learn.microsoft.com/en-us/azure/search/search-security-trimming-for-azure-search-with-aad.

If you want changes in ACLs in SharePoint to be reflected in your index as quickly as possible, you need a process to update the security field in your index that is triggered by ACL changes.

Conclusion

Crafting a chat bot that works seamlessly with SharePoint data and delivers precise answers is no simple feat. If you get satisfactory results with generative answers in Copilot Studio, it’s advisable to go that route. Even if you do not use Copilot Studio, you can use Graph API search from custom code.

If you want more accurate search results and switch to Azure AI Search, be mindful that establishing and maintaining the Azure AI Search index, encompassing both SharePoint data and access control lists, can be quite involved.

It seems Microsoft is relying on the upcoming Semantic Index capability to tackle these hurdles, potentially in combination with Copilot for Microsoft 365. When Semantic Index ultimately becomes available, executing a search through the Graph API could potentially fulfill your requirements.

Embedding flows created with Microsoft Prompt Flow in your own applications

A while ago, I wrote about creating your first Prompt Flow in Visual Studio Code. In this post, we will embed such a flow in a Python application built with Streamlit. The application allows you to search for images based on a description. Check the screenshot below:

Streamlit app to search for images based on a description

There are a few things we need to make this work:

  • An index in Azure AI Search that contains descriptions of images, a vector of these descriptions and a link to the image
  • A flow in Prompt Flow that takes a description as input and returns the image link or the entire image as output
  • A Python application (the Streamlit app above) that uses the flow to return an image based on the description

Let’s look at each component in turn.

Azure AI Search Index

Azure AI Search is a search service that supports keyword search, vector search and semantic reranking. You can combine keyword and vector search in what is called a hybrid search. The hybrid search results can optionally be reranked further using a state-of-the-art semantic reranker.

The index we use is represented below:

Index in Azure AI Search
  • Description: contains the description of the image; the image description was generated with the gpt-4-vision model and is longer than just a few words
  • URL: the link to the actual image; the image is not stored in the index, it’s just shown for reference
  • Vector: vector generated by the Azure OpenAI embedding model; it generates 1536 floating point numbers that represent the meaning of the description

Using vectors and vector search allows us to search not just for cat but also for words like kat (in Dutch) or even feline creature.

The flow we will create in Prompt Flow uses the Azure AI Search index to find the URL based on the description. However, because Azure AI Search might return images that are not relevant, we also use a GPT model to make the final call about what image to return.

Flow

In Prompt Flow in Visual Studio Code, we will create the flow below:

Flow we will embed in the Streamlit app

It all starts from the input node:

Input node

The flow takes one input: description. In order to search for this description, we need to convert it to a vector. Note that we could skip this and just do a text search. However, that will not get us the best results.

To embed the input, we use the embedding node:

Embedding node

The embedding node uses a connection called open_ai_connection. This connection contains connection information to an Azure OpenAI resource that hosts the embedding model. The model deployment’s name is embedding. The input to the embedding node is the description from the input. The output is a vector:

Output of embedding node

Now that we have the embedding, we can use a Vector DB Lookup node to perform a vector search in Azure AI Search:

Azure AI Search

Above, we use another connection (acs-geba) that holds the credentials to connect to the Azure AI Search resource. We specify the following to perform the search:

  • index name to search: images-sdk here
  • what text to put in the text_field: the description from the input; this search will be a hybrid search; we search with both text and a vector
  • vector field: the name of the field that holds the vector (textVector field in the images-sdk index)
  • search_params: here we specify the fields we want to return in the search results; name, description and url
  • vector to find similar vectors for: the output from the embedding node
  • the number of similar items to return: top_k is 3

The result of the search node is shown below:

Search results

The result contains three entries from the search index. The first result is the closest to the description from our input node. In this case, we could just take the first result and be done with it. But what if we get results that do not match the description?

To make the final judgement about what picture to return, let’s add an LLM node:

LLM Node

The LLM node uses the same OpenAI connection and is configured to use the chat completions API with the gpt-4 model. We want this node to return proper JSON by setting the response_format to json_object. We also need a prompt, which is a Jinja2 template, best_image.jinja2:

system:
You return the url to an image that best matches the user's question. Use the provided context to select the image. Return the URL in JSON like so:
{ "url": "the_url_from_search" }

Only return an image when the user question matches the context. If not found, return JSON with the url empty like { "url": "" }

user question:
{{description}}

context : {{search_results}}

The template above sets the system prompt and specifically asks to return JSON. With the response format set to JSON, the word JSON (in uppercase) needs to be in the prompt or you will get an error.

The prompt defines two parameters:

  • description: we connect the description from the input to this parameter
  • search_results: we connect the results from the aisearch node to this parameter

In the screenshot above, you can see this mapping being made. It’s all done in the UI, no code required.

When this node returns an output, it will be in the JSON format we specified. However, that still does not mean that the URL will be correct. The model might return an incorrect URL, although we try to mitigate that in the prompt.

Below is an example of the LLM output when the description is cat:

Model picked the cat picture

Now that we have the URL, I want the flow to output two values:

  • the URL: the URL as a string, not wrapped in JSON
  • the base64 representation of the image that can be used directly in an HTML IMG tag

We use two Python tools for this and bring the results to the output node. Python tools use custom Python code:

Setting the output

The code in get_image is below:

from promptflow import tool
import json, base64, requests

def url_to_base64(image_url):
    response = requests.get(image_url)
    return 'data:image/jpg;base64,' + base64.b64encode(response.content).decode('utf-8')

@tool
def my_python_tool(image_json: str) -> str:
    url = json.loads(image_json)["url"]

    if url:
        base64_string = url_to_base64(url)
    else:
        base64_string = url_to_base64("https://placehold.co/400/jpg?text=No+image")

    return base64_string

The node executes the function that is marked with the @tool decorator and sends it the output from the LLM node. The code grabs the url and downloads and transforms the image to its base64 representation. You can see how the output from the LLM node is mapped to the image_json parameter below:

linking the function parameter to the LLM output

The code in get_url is similar. It just extracts the url as a plain string from the JSON coming from the LLM node.
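A minimal sketch of what get_url could look like (the actual code in the repo may differ slightly):

from promptflow import tool
import json

@tool
def my_python_tool(image_json: str) -> str:
    # The LLM node returns JSON like {"url": "..."}; return the url as a plain string
    return json.loads(image_json).get("url", "")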

The output node is the following:

Output node

The output has two properties: data (the base64-encoded image) and the url to the image. Later, in the Python code that uses this flow, the output will be a Python dict with a data and url entry.

Using the flow in your application

Although you can host this flow as an API using either an Azure Machine Learning endpoint or a Docker container, we will simply embed the flow in our Python application and call it like a regular Python function.

Here is the code, which uses Streamlit for the UI:

from promptflow import load_flow
import streamlit as st

# load Prompt Flow from parent folder
flow_path = "../."
f = load_flow(flow_path)

# Streamlit UI
st.title('Search for an image')

# User input
user_query = st.text_input('Enter your query and press enter:')

if user_query:
    # extract url from dict and wrap in img tag
    flow_result = f(description=user_query)
    image = flow_result["data"]
    url = flow_result["url"]

    img_tag = f'<a href="{url}"><img src="{image}" alt="image" width="300"></a>'
     
    # just use markdown to display the image
    st.markdown(f"🌆 Image URL: {url}")
    st.markdown(img_tag, unsafe_allow_html=True)

To load the flow in your Python app as a function:

  • import load_flow from the promptflow module
  • set a path to your flow (relative or absolute): here we load the flow that is in the parent directory that contains flow.dag.yaml.
  • use load_flow to create the function: above the function is called f

When the user enters the query, you can simply use f(description="user's query...") to obtain the output. The output is a Python dict with a data and url entry.

In Streamlit, we can use markdown to display HTML directly using unsafe_allow_html=True. The HTML is simply an <img> tag with the src attribute set to the base64 representation of the image.

Connections

Note that the flow on my system uses two connections: one to connect to OpenAI and one to connect to Azure AI Search. By default, Prompt Flow stores these connections in a SQLite database in the .promptflow folder of your home folder. This means that the Streamlit app works on my machine but will not work anywhere else.

To solve this, you can override the connections in your app. See https://github.com/microsoft/promptflow/blob/main/examples/tutorials/get-started/flow-as-function.ipynb for more information about these overrides.

Conclusion

Embedding a flow as a function in a Python app is one of the easiest ways to use a flow in your applications. Although we used a straightforward Streamlit app here, you could build a FastAPI server that provides endpoints to multiple flows from one API. Such an API can easily be hosted as a container on Container Apps or Kubernetes as part of a larger application.

Give it a try and let me know what you think! 😉

Building an Azure AI Search index with a custom skill

In this post, we will take a look at building an Azure AI Search index with a custom skill. We will use the Azure AI Search Python SDK to do the following:

  • create a search index: a search index contains content to be searched
  • create a data source: a datasource tells an Azure AI Search indexer where to get input data
  • create a skillset: a skillset is a collection of skills that process the input data during the indexing process; you can use built-in skills but also build your own skills
  • create an indexer: the indexer creates a search index from input data in the data source; it can transform the data with skills

If you are more into videos, I already created a video about this topic. In the video, I use the REST API to define the resources above. In this post, I will use the Python SDK.

Azure AI Search with custom GPT-4 vision skill

What do we want to achieve?

We want to build an application that allows a user to search for images with text or a similar image like in the diagram below:

Search application

The application uses an Azure AI Search index to provide search results. An index is basically a collection of JSON documents that can be searched with various techniques.

The input data to create the index is just a bunch of .jpg files in Azure Blob Storage. The index will need fields to support the two different types of searches (text and image search):

  • a text description of the image: we will need to generate the description from the image; we will use GPT-4 Vision to do so; the description supports keyword-based searches
  • a text vector of the description: with text vectors, we can search for descriptions similar to the user’s query; it can provide better results than keyword-based searches alone
  • an image vector of the image: with image vectors, we can supply an image and search for similar images in the index

I described building this application in a previous blog post. In that post, we pushed the index content to the index. In this post, we create an indexer that pulls in the data, potentially on a schedule. Using an indexer is recommended.

Creating the index

If you have an Azure subscription, first create an Azure AI Search resource. The code we write requires at least the basic tier.

Although you can create the index in the portal, we will create it using the Python SDK. At the time of writing (December 2023), you have to use a preview version of the SDK to support integrated vectorization. The notebook we use contains instructions about installing this version. The notebook is here: https://github.com/gbaeke/vision/blob/main/image_index/indexer-sdk.ipynb

The notebook starts with the necessary imports and also loads environment variables via a .env file. See the README of the repo to learn about the required variables.

To create the index, we define a blog_index function that returns an index definition. Here’s the start of the function:

def blog_index(name: str):
    fields = [
        SearchField(name="path", type=SearchFieldDataType.String, key=True),
        SearchField(name="name", type=SearchFieldDataType.String),
        SearchField(name="url", type=SearchFieldDataType.String),
        SearchField(name="description", type=SearchFieldDataType.String),
        SimpleField(name="enriched", type=SearchFieldDataType.String, searchable=False),  
        SearchField(
            name="imageVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1024,
            vector_search_profile="myHnswProfile"
        ),
        SearchField(
            name="textVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1536,
            vector_search_profile="myHnswProfile"
        ),
    ]

Above, we define an array of fields for the index. We will have 7 fields. The first three fields will be retrieved from blob storage metadata:

  • path: base64-encoded url of the file; will be used as unique key
  • name: name of the file
  • url: full url of the file in Azure blob storage

The link between these fields and the metadata is defined in the indexer we will create later.

Next, we have the description field. We will generate the image description via GPT-4 Vision during indexing. The indexer will use a custom skill to do so.

The enriched field is there for debugging. It will show the enrichments by custom or built-in skills. You can remove that field if you wish.

To finish, we have vector fields. These fields are designed to hold arrays of a specific size:

  • imageVector: a vector field that can hold 1024 values; the image vector model we use outputs 1024 dimensions
  • textVector: a vector field that can hold 1536 values; the text vector model we use outputs that number of dimensions

Note that the vector fields reference a vector search profile. We create that in the next block of code in the blog_index function:

vector_config = VectorSearch(  
        algorithms=[  
            HnswVectorSearchAlgorithmConfiguration(  
                name="myHnsw",  
                kind=VectorSearchAlgorithmKind.HNSW,  
                parameters=HnswParameters(  
                    m=4,  
                    ef_construction=400,  
                    ef_search=500,  
                    metric=VectorSearchAlgorithmMetric.COSINE,  
                ),  
            ),  
            ExhaustiveKnnVectorSearchAlgorithmConfiguration(  
                name="myExhaustiveKnn",  
                kind=VectorSearchAlgorithmKind.EXHAUSTIVE_KNN,  
                parameters=ExhaustiveKnnParameters(  
                    metric=VectorSearchAlgorithmMetric.COSINE,  
                ),  
            ),  
        ],  
        profiles=[  
            VectorSearchProfile(  
                name="myHnswProfile",  
                algorithm="myHnsw",  
                vectorizer="myOpenAI",  
            ),  
            VectorSearchProfile(  
                name="myExhaustiveKnnProfile",  
                algorithm="myExhaustiveKnn",  
                vectorizer="myOpenAI",  
            ),  
        ],  
        vectorizers=[  
            AzureOpenAIVectorizer(  
                name="myOpenAI",  
                kind="azureOpenAI",  
                azure_open_ai_parameters=AzureOpenAIParameters(  
                    resource_uri="AZURE_OPEN_AI_RESOURCE",  
                    deployment_id="EMBEDDING_MODEL_NAME",  
                    api_key=os.getenv('AZURE_OPENAI_KEY'),  
                ),  
            ),  
        ],  
    )

Above, vector_config is an instance of the VectorSearch object, which holds algorithms, profiles and vectorizers:

  • algorithms: Azure AI Search supports both HNSW and exhaustive KNN to search for nearest neighbors to an input vector; above, both algorithms are defined; they both use cosine similarity as the distance metric
  • vectorizers: this defines the integrated vectorizer and points to an Azure OpenAI resource and embedding model. You need to deploy that model in Azure OpenAI and give it a name; at the time of writing (December 2023), this feature was in public preview
  • profiles: a profile combines an algorithm and a vectorizer; we create two profiles, one for each algorithm; the vector fields use the myHnswProfile profile.

Note: a vector field configured for HNSW, which performs approximate nearest neighbor searches, still allows you to do an exhaustive search; the notebook contains sample searches at the bottom that search the entire vector space exhaustively. The reverse is not possible: you cannot use HNSW on a field configured for exhaustive KNN.

We finish the function with the code below:

    semantic_config = SemanticConfiguration(  
        name="my-semantic-config",  
        prioritized_fields=PrioritizedFields(  
            prioritized_content_fields=[SemanticField(field_name="description")]  
        ),  
    )

    semantic_settings = SemanticSettings(configurations=[semantic_config])

    return SearchIndex(name=name, fields=fields, vector_search=vector_config, semantic_settings=semantic_settings)

Above, we specify a semantic_config. It informs the semantic reranker about the fields in our index that contain valuable data. Here, we use the description field. The config is used to create an instance of type SemanticSettings. You also have to enable the semantic reranker on your Azure AI Search resource to use this feature.

The function ends by returning an instance of type SearchIndex, which contains the fields array, the vector configuration and the semantic configuration.

Now we can use the output of this function to create the index:

service_endpoint = "https://acs-geba.search.windows.net"
index_name = "images-sdk"
key = os.getenv("AZURE_AI_SEARCH_KEY")


index_client = SearchIndexClient(service_endpoint, AzureKeyCredential(key))
search_client = SearchClient(service_endpoint, index_name, AzureKeyCredential(key))
index = blog_index(index_name)

# create the index
try:
    index_client.create_or_update_index(index)
    print("Index created or updated successfully")
except Exception as e:
    print("Index creation error", e)

The important part here is the creation of a SearchIndexClient that authenticates to our Azure AI Search resource. We use that client to create_or_update our index. That function requires a SearchIndex parameter, provided by the blog_index function.

When that call succeeds, you should see the index in the portal. Text and vector fields are searchable.

Index in the portal

The vector profiles should be present:

Vector profiles

Click on an algorithm or vectorizer. It should match the definition in our code.

Now we can define the data source, skillset and indexer.

Data source

Our images are stored in Azure Blob Storage. The data source needs to point to that resource and specify a container. We can use the following code:

# Create a data source 
ds_client = SearchIndexerClient(service_endpoint, AzureKeyCredential(key))
container = SearchIndexerDataContainer(name="images")
data_source_connection = SearchIndexerDataSourceConnection(
    name=f"{index_name}-blob",
    type="azureblob",
    connection_string=os.getenv("STORAGE_CONNNECTION_STRING"),
    container=container
)
data_source = ds_client.create_or_update_data_source_connection(data_source_connection)

print(f"Data source '{data_source.name}' created or updated")

The code is pretty self-explanatory. The data source is shown in the portal as below:

Azure AI Search data source

Skillset with two skills

Before we create the indexer, we define a skillset with two skills:

  • AzureOpenAIEmbeddingSkill: a built-in skill that uses an Azure OpenAI embedding model and takes text as input; it returns a vector (embedding) of 1536 dimensions; this skill is not free; you will be billed for the vectors you create via your Azure OpenAI resource
  • WebApiSkill: a custom skill that points to an endpoint that you need to build and host; you define the inputs and outputs of the custom skill; my custom skill runs in Azure Container Apps but it can run anywhere. Often, skills are implemented as an Azure Function.

The code starts as follows:

skillset_name = f"{index_name}-skillset"

embedding_skill = AzureOpenAIEmbeddingSkill(  
    description="Skill to generate embeddings via Azure OpenAI",  
    context="/document",  
    resource_uri="https://OPEN_AI_RESOURCE.openai.azure.com",  
    deployment_id="DEPLOYMENT_NAME_OF_EMBEDDING MODEL",  
    api_key=os.getenv('AZURE_OPENAI_KEY'),  
    inputs=[  
        InputFieldMappingEntry(name="text", source="/document/description"),  
    ],  
    outputs=[  
        OutputFieldMappingEntry(name="embedding", target_name="textVector")  
    ],  
)

Above, we define the skillset and the embedding_skill. The AzureOpenAIEmbeddingSkill points to a deployed text-embedding-ada-002 embedding model. Use the name of your deployment, not the model name.

A skillset operates within a context. The context above is the entire document (/document) but that’s not necessarily the case for other skills. The input to the embedding skill is our description field (/document/description). The output will be a vector. The target_name above is a temporary name used during the indexer’s so-called enrichment process. We will need to configure the indexer to write this field to the index.

The question is: “Where does the description come from?”. The description comes from the WebApiSkill. Because the embedding skill needs the description field generated by the WebApiSkill, the WebApiSkill will run first. Here is the custom web api skill:

custom_skill = WebApiSkill(
    description="A custom skill that creates an image vector and description",
    uri="YOUR_ENDPOINT",
    http_method="POST",
    timeout="PT60S",
    batch_size=4,
    degree_of_parallelism=4,
    context="/document",
    inputs=[
        InputFieldMappingEntry(name="url", source="/document/url"),
    ],
    outputs=[
        OutputFieldMappingEntry(name="embedding", target_name="imageVector"),
        OutputFieldMappingEntry(name="description", target_name="description"),
    ],
)

The input to the custom skill is the url to our image. That url is posted to the endpoint you define in the uri field. You can control how many inputs are sent in one batch and how many batches are sent concurrently. The inputs have to be sent in a specific format.
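That format is the custom skill interface: a values array with one record per document. Roughly, a request from the indexer to the endpoint looks like this (the url value is illustrative):

{
  "values": [
    {
      "recordId": "0",
      "data": {
        "url": "https://<storage-account>.blob.core.windows.net/images/city.jpg"
      }
    }
  ]
}

The endpoint must reply with the same values/recordId structure and return the declared outputs (embedding and description) inside data, optionally with errors and warnings per record. A shortened example:

{
  "values": [
    {
      "recordId": "0",
      "data": {
        "description": "An image of the London skyline...",
        "embedding": [0.012, -0.034, 0.056]
      },
      "errors": [],
      "warnings": []
    }
  ]
}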

This skill also operates at the document level and creates two new fields. The contents of those fields are generated by your custom endpoint and returned as embedding and description. They are mapped to imageVector and description. Again, those fields are temporary and need to be written to the index by the indexer.

To see the code of the custom skill, check https://github.com/gbaeke/vision/tree/main/img_vector_skill. That skill is written for demo purposes and was not thoroughly vetted to be used in production. Use at your own risk. In addition, GPT-4 Vision requires an OpenAI key (not Azure OpenAI) and currently (December 2023) allows 100 calls per day! You currently cannot use this at scale. Azure also provides image captioning models that might fit the purpose.

Now we can create the skillset:

skillset = SearchIndexerSkillset(  
    name=skillset_name,  
    description="Skillset to generate embeddings",  
    skills=[embedding_skill, custom_skill],  
)

client = SearchIndexerClient(service_endpoint, AzureKeyCredential(key))
client.create_or_update_skillset(skillset)
print(f"Skillset '{skillset.name}' created or updated")

The above code results in the following:

Skillset with two skills

Indexer

The indexer is the final piece of the puzzle and brings the data source, index and skillset together:

indexer_name = f"{index_name}-indexer"

indexer = SearchIndexer(  
    name=indexer_name,  
    description="Indexer to index documents and generate description and embeddings",  
    skillset_name=skillset_name,  
    target_index_name=index_name,
    parameters=IndexingParameters(
        max_failed_items=-1
    ),
    data_source_name=data_source.name,  
    # Map blob storage metadata fields to the path, name and url fields in the index
    field_mappings=[
        FieldMapping(source_field_name="metadata_storage_path", target_field_name="path", 
            mapping_function=FieldMappingFunction(name="base64Encode")),
        FieldMapping(source_field_name="metadata_storage_name", target_field_name="name"),
        FieldMapping(source_field_name="metadata_storage_path", target_field_name="url"),
    ],
    output_field_mappings=[
        FieldMapping(source_field_name="/document/textVector", target_field_name="textVector"),
        FieldMapping(source_field_name="/document/imageVector", target_field_name="imageVector"),
        FieldMapping(source_field_name="/document/description", target_field_name="description"),
    ],
)

indexer_client = SearchIndexerClient(service_endpoint, AzureKeyCredential(key))  
indexer_result = indexer_client.create_or_update_indexer(indexer)  

Above, we create an instance of type SearchIndexer and set the indexer’s name, the data source name, the skillset name and the target index.

The most important parts are the field mappings and the output field mappings.

Field mappings take data from the indexer’s data source and map them to a field in the index. In our case, that’s content and metadata from Azure Blob Storage. The metadata fields in the code above are described in the documentation. In a field mapping, you can configure a mapping function. We use the base64Encode mapping function for the path field.

Output field mappings take new fields created during the enrichment process and map them to fields in the index. You can see that the fields created by the skills are mapped to fields in the index. Without these mappings, the skillsets would generate the data internally but the data would never appear in the index.

Once the indexer is defined, it gets created (or updated) using an instance of type SearchIndexerClient.

Note that we set an indexer parameter, max_failed_items, to -1. This means that the indexer process keeps going, no matter how many errors it produces. In the indexer screen below, you can see there was one error:

Indexer with one error

The error happened because the image vectorizer in the custom web skill threw an error on one of the images.

Using an indexer has several advantages:

  • Indexing is a background process and can run on a schedule; there is no need to schedule your own indexing process
  • Indexers keep track of what they indexed and can index only new data; with your own code, you have to maintain that state; failed documents like above are not reprocessed
  • Depending on the source, indexers see deletions and will remove entries from the index
  • Indexers can be easily reset to trigger a full index (see the sketch after this list)
  • Indexing errors are reported and errors can be sent to a debugger to inspect what went wrong
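As a small illustration of running and resetting an indexer with the SDK (reusing the service_endpoint, key and indexer_name variables from the notebook):

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient

indexer_client = SearchIndexerClient(service_endpoint, AzureKeyCredential(key))

indexer_client.run_indexer(indexer_name)    # trigger an on-demand indexing run
status = indexer_client.get_indexer_status(indexer_name)
print(status.last_result.status if status.last_result else status.status)

# indexer_client.reset_indexer(indexer_name)  # clear change tracking; the next run re-indexes everything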

Testing the index

We can test the index by performing a text-based search that uses the integrated vectorizer:

# Pure Vector Search
query = "city"  
  
search_client = SearchClient(service_endpoint, index_name, credential=AzureKeyCredential(key))
vector_query = VectorizableTextQuery(text=query, k=1, fields="textVector", exhaustive=True)

  
results = search_client.search(  
    search_text=None,  
    vector_queries= [vector_query],
    select=["name", "description", "url"],
    top=1
)  

# print selected fields from dictionary
for result in results:
    print(result["name"])
    print(result["description"])
    print(result["url"])
    print("")

Above, we search for city (in the query variable). The VectorizableTextQuery class (in preview) takes the plain text in the query variable and vectorizes it for us with the embedding model defined in the integrated vectorizer. In addition, we specify how many results to return (1 nearest neighbor) and that we want to search all vectors (exhaustive).

Note: remember that the vector field was configured for HNSW; we can switch to exhaustive as shown above

Next, search_client.search performs the actual search. It only provides the vector query, which results in a pure similarity search with the query vector. search_text is set to None. Set search_text to the query if you want to do a hybrid search. The notebook contains additional examples that also perform keyword and semantic searches with highlighting.

The search gives the following result (selected fields: name, description, url):

city.jpg
This is an image of the London skyline, featuring a mix of modern skyscrapers and historical buildings. Prominent among the skyscrapers are the Leadenhall Building, also known as the "Cheesegrater," and the rounded, distinctive shape of 30 St Mary Axe, commonly referred to as "The Gherkin." Further in the background, the towers of Canary Wharf can be seen. The view is clear and taken on a day with excellent visibility.
https://stgebaoai883.blob.core.windows.net/images/city.jpg

The image the URL points to is:

yep, a city (London)

In the repo’s search-client folder, you can find a Streamlit app to search for and display images and dump the entire search result object. Make sure you install all the packages in requirements.txt and the preview Azure AI Search package from the whl folder. Simply type streamlit run app.py to run the app:

Streamlit Query app

Conclusion

In this post, we demonstrated the use of the Azure AI Search Python SDK to create an indexer that takes images as input, creates new fields with skills, and writes those fields plus metadata to an index.

We touched on the advantages of using an indexer versus your own indexing code (pull versus push).

With this code and some sample images, you should be able to build an image search application yourself.

Finding images with text and image queries with the help of GPT-4 Vision

With the gpt-4-vision-preview model available at OpenAI, it was time to build something with it. I decided to use it as part of a quick solution that can search for images with text, or by providing a similar image.

We will do the following:

  • Describe a collection of images. To generate the description, GPT-4 Vision is used
  • Create a text embedding of the description with the text-embedding-ada-002 model
  • Create an image embedding using the vectorizeImage API, part of Azure AI Computer Vision
  • Save the description and both embeddings to an Azure AI Search index
  • Search for images with either text or a similar image

The end result should be that when I search for desert plant, I get an image of a cactus or similar plant. When I provide a picture of an apple, I should get an apple or other fruit as a result. It’s basically Google Images and reverse image search.

Let’s see how it works and if it is easy to do. The code is here: https://github.com/gbaeke/vision. The main code is in a Jupyter notebook in the image_index folder.

A word on vectors and vectorization

When we want to search for images using text or find similar images, we use a technique that involves turning both text and images into a form that a computer can understand easily. This is done by creating vectors. Think of vectors as a list of numbers that describe the important parts of a text or an image.

For text, we use a tool called ‘text-embedding-ada-002’ which changes the words and sentences into a vector. This vector is like a unique fingerprint of the text. For images, we use something like Azure’s multi-modal embedding API, which does the same thing but for pictures. It turns the image into a vector that represents what’s in the picture.

After we have these vectors, we store them in a place where they can be searched. We will use Azure AI Search. When you search, the system looks for the closest matching vectors – it’s like finding the most similar fingerprint, whether it’s from text or an image. This helps the computer to give you the right image when you search with words or find images that are similar to the one you have.
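To make the idea of “closest matching fingerprint” concrete, here is a tiny, made-up example; real embeddings have 1536 (text) or 1024 (image) dimensions, not 3:

import numpy as np

def cosine_similarity(a, b):
    # Higher score means the vectors point in a similar direction (similar meaning)
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vector = [0.1, 0.8, 0.3]      # made-up vector for the search query
cactus_vector = [0.2, 0.7, 0.4]     # made-up vector for a cactus description
skyline_vector = [0.9, -0.2, 0.1]   # made-up vector for a city skyline description

print(cosine_similarity(query_vector, cactus_vector))   # high score: similar meaning
print(cosine_similarity(query_vector, skyline_vector))  # low score: unrelated meaning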

Getting a description from an image

Although Azure has Computer Vision APIs to describe images, GPT-4 with vision can do the same thing. It is more flexible and easier to use because you have the ability to ask for what you want with a prompt.

To provide an image to the model, you can provide a URL or the base64 encoding of an image file. The code below uses the latter approach:

def describe_image(image_file: str) -> str:
    with open(f'{image_file}', 'rb') as f:
        image_base64 = base64.b64encode(f.read()).decode('utf-8')
        print(image_base64[:100] + '...')

    print(f"Describing {image_file}...")

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image in detail"},
                {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_base64}",
                },
                },
            ],
            }
        ],
        max_tokens=500,  # default max tokens is low so set higher
    )

    return response.choices[0].message.content

As usual, the OpenAI API is very easy to use. Above, we open and read the image and base64-encode it. The base64 encoded file is provided in the url field. A simple prompt is all you need to get the description. Let’s look at the result for the picture below:

The generated description is below:

The image displays a single, healthy-looking cactus planted in a terracotta-colored pot against a pale pink background. The cactus is elongated, predominantly green with some modest blue hues, and has evenly spaced spines covering its surface. The spines are white or light yellow, quite long, and arranged in rows along the cactus’s ridges. The pot has a classic, cylindrical shape with a slight lip at the top and appears to be a typical pot for houseplants. The overall scene is minimalistic, with a focus on the cactus and the pot due to the plain background, which provides a soft contrast to the vibrant colors of the plant and its container.

Description generated by GPT-4 Vision

Embedding of the image

To create an embedding for the image, I decided to use Azure’s multi-modal embedding API. Take a look at the code below:

def get_image_vector(image_path: str) -> list:
    # Define the URL, headers, and data
    url = "https://AI_ACCOUNT.cognitiveservices.azure.com//computervision/retrieval:vectorizeImage?api-version=2023-02-01-preview&modelVersion=latest"
    headers = {
        "Content-Type": "application/octet-stream",
        "Ocp-Apim-Subscription-Key": os.getenv("AZURE_AI_KEY")
    }

    with open(image_path, 'rb') as image_file:
        # Read the contents of the image file
        image_data = image_file.read()

    print(f"Getting vector for {image_path}...")

    # Send a POST request
    response = requests.post(url, headers=headers, data=image_data)

    # return the vector
    return response.json().get('vector')

The code uses an environment variable to get the key to an Azure AI Services multi-service endpoint. Check the README.md in the repository for a sample .env file.

The API generates a vector with 1024 dimensions. We will need that number when we create the Azure AI Search index.

Note that this API accepts either a URL or the raw image data (not base64-encoded). Above, we provide the raw image data and set the Content-Type to application/octet-stream accordingly.
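
For completeness, a minimal sketch of the URL variant is shown below. It assumes the same placeholder endpoint and AZURE_AI_KEY environment variable and a publicly reachable image URL:

import os
import requests

def get_image_vector_from_url(image_url: str) -> list:
    url = "https://AI_ACCOUNT.cognitiveservices.azure.com/computervision/retrieval:vectorizeImage?api-version=2023-02-01-preview&modelVersion=latest"
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": os.getenv("AZURE_AI_KEY")
    }

    # instead of raw bytes, send a JSON body that points to the image
    response = requests.post(url, headers=headers, json={"url": image_url})
    return response.json().get('vector')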

Generating the data to index

In the next step, we will get all .jpg files from a folder and do the following:

  • create the description
  • create the image vector
  • create the text vector of the description

Check the code below for the details:

# get all *.jpg files in the images folder
image_files = [file for file in os.listdir('./images') if file.endswith('.jpg')]

# describe each image and store filename and description in a list of dicts
descriptions = []
for image_file in image_files:
    try:
        description = describe_image(f"./images/{image_file}")
        image_vector = get_image_vector(f"./images/{image_file}")
        text_vector = get_text_vector(description)
        
        descriptions.append({
            'id': image_file.split('.')[0], # remove file extension
            'fileName': image_file,
            'imageDescription': description,
            'imageVector': image_vector,
            'textVector': text_vector
        })
    except Exception as e:
        print(f"Error describing {image_file}: {e}")

# print the descriptions but only show first 5 numbers in vector
for description in descriptions:
    print(f"{description['fileName']}: {description['imageDescription'][:50]}... {description['imageVector'][:5]}... {description['textVector'][:5]}...")

The important part is the descriptions list, which is a list of dictionaries with fields that match the fields in the Azure AI Search index we will build in the next step.

The text vector is calculated with the get_text_vector function. It uses OpenAI’s text-embedding-ada-002 model.
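
The actual implementation is in the repository; a minimal sketch of what such a function could look like with the openai 1.x package is shown below. With Azure OpenAI, the model parameter would be the name of your embedding deployment:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; the sample reuses the client created earlier

def get_text_vector(text: str) -> list:
    # returns a 1536-dimensional embedding for the given text
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return response.data[0].embedding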

Building the index

The code below uses the Azure AI Search Python SDK to build and populate the index in code. You will need an AZURE_AI_SEARCH_KEY environment variable to authenticate to your Azure AI Search instance.

def blog_index(name: str):
    from azure.search.documents.indexes.models import (
        SearchIndex,
        SearchField,
        SearchFieldDataType,
        SimpleField,
        SearchableField,
        VectorSearch,
        VectorSearchProfile,
        HnswAlgorithmConfiguration,
    )

    fields = [
        SimpleField(name="Id", type=SearchFieldDataType.String, key=True), # key
        SearchableField(name="fileName", type=SearchFieldDataType.String),
        SearchableField(name="imageDescription", type=SearchFieldDataType.String),
        SearchField(
            name="imageVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1024,
            vector_search_profile_name="vector_config"
        ),
        SearchField(
            name="textVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1536,
            vector_search_profile_name="vector_config"
        ),

    ]

    vector_search = VectorSearch(
        profiles=[VectorSearchProfile(name="vector_config", algorithm_configuration_name="algo_config")],
        algorithms=[HnswAlgorithmConfiguration(name="algo_config")],
    )
    return SearchIndex(name=name, fields=fields, vector_search=vector_search)

# set up the clients to create and query the index
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.models import VectorizedQuery

service_endpoint = "https://YOUR_SEARCH_INSTANCE.search.windows.net"
index_name = "image-index"
key = os.getenv("AZURE_AI_SEARCH_KEY")

index_client = SearchIndexClient(service_endpoint, AzureKeyCredential(key))
index = blog_index(index_name)

# create the index
try:
    index_client.create_index(index)
    print("Index created")
except Exception as e:
    print("Index probably already exists", e)

The code above creates an index with some string fields and two vector fields:

  • imageVector: 1024 dimensions (as defined by the Azure AI Computer Vision image embedder)
  • textVector: 1536 dimensions (as defined by the OpenAI embedding model)

Although not specified in the code, the index will use cosine similarity to perform similarity searches because that is the default. It will return approximate nearest neighbor (ANN) results unless you issue a vector query with the exhaustive option, which searches the entire vector space. The queries near the end of this post set exhaustive=True.

When the index is created, we can upload documents:

# now upload the documents
try:
    search_client = SearchClient(service_endpoint, index_name, AzureKeyCredential(key))
    search_client.upload_documents(descriptions)
    print("Documents uploaded successfully")
except Exception as e:
    print("Error uploading documents", e)

The upload_documents method uploads the documents in the descriptions Python list to the search index. The upload is actually an upsert. You can run this code multiple times without creating duplicate documents in the index.
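
If you want to make the upsert behavior explicit, the SDK also offers merge_or_upload_documents. A minimal sketch, reusing the search_client and descriptions list from above:

try:
    result = search_client.merge_or_upload_documents(documents=descriptions)
    print("All documents upserted:", all(r.succeeded for r in result))
except Exception as e:
    print("Error upserting documents", e)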

Search images with text

To search an image with a text description, a vector query on the textVector is used. The function below takes a text query string as input, vectorizes the query, and performs a similarity search returning the first nearest neighbour. The function displays the description and the image in the notebook:

# now search based on text
def single_vector_search(query: str):
    vector_query = VectorizedQuery(vector=get_text_vector(query), k_nearest_neighbors=1, fields="textVector", exhaustive=True)

    results = search_client.search(
        vector_queries=[vector_query],
        select=["fileName", "imageDescription"],
    )

    for result in results:
        print(result['fileName'], result["imageDescription"], sep=": ")

        # show the image
        from IPython.display import Image
        display(Image(f"./images/{result['fileName']}"))
    
single_vector_search("desert plant")

The code searches for an image based on the query desert plant. It returns the picture of the cactus shown earlier. Note that if you search for something for which there is no image, like blue car, you will still get a result because we always return a nearest neighbor. Even if your nearest neighbor lives 100km away, it’s still your nearest neighbor. 😀
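
If you want to suppress such irrelevant matches, you could inspect the search score that Azure AI Search returns with each result and only show results above a threshold. A minimal sketch (the 0.8 threshold is an arbitrary example; tune it for your data):

def single_vector_search_with_threshold(query: str, min_score: float = 0.8):
    vector_query = VectorizedQuery(vector=get_text_vector(query), k_nearest_neighbors=1, fields="textVector", exhaustive=True)

    results = search_client.search(
        vector_queries=[vector_query],
        select=["fileName", "imageDescription"],
    )

    for result in results:
        # '@search.score' is returned with every result; low scores indicate weak matches
        if result["@search.score"] < min_score:
            print(f"No relevant image found for '{query}' (score {result['@search.score']:.2f})")
            continue
        print(result['fileName'], result["imageDescription"], sep=": ")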

Return similar images

Since our index contains an image vector, we can search for images similar to a vector of a reference image. The function below takes an image file path as input, calculates the vector for that image, and performs a nearest neighbor search. The function displays the description and image of each document returned. In this case, the code returns two similar documents:

def image_search(image_file: str):

    vector_query = VectorizedQuery(vector=get_image_vector(image_file), k_nearest_neighbors=2, fields="imageVector", exhaustive=True)

    results = search_client.search(
        vector_queries=[vector_query],
        select=["fileName", "imageDescription"],
    )

    for result in results:
        print(result['fileName'], result["imageDescription"], sep=": ")

        # show the image
        from IPython.display import Image
        display(Image(f"./images/{result['fileName']}"))
 
# get vector of another image and find closest match
image_search('rotten-apple.jpg')
image_search('flower.jpeg')

At the bottom, the function is called with filenames of pictures that contain a rotten apple and a flower. The result of the first query is a picture of the apple and banana. The result of the second query is the cactus and the rose. You can debate whether the cactus should be in the results. Some cacti have flowers but some don’t. 😀

Conclusion

The GPT-4 Vision API, like most OpenAI APIs, is very easy to use. In this post, we used it to generate image descriptions to build a simple query engine that can search for images via text or a reference image. Together with OpenAI’s text embedding API and Microsoft’s multi-modal embedding API to create image embeddings, it is relatively straightforward to build these types of systems.

As usual, this is a tutorial with quick sample code to illustrate the basic principle. If you need help building these systems in production, simply reach out. Help is around the corner! 😉

Using Integrated Vectorization in Azure AI Search

The vector search capability of Azure AI Search became generally available mid November 2023. With that release, the developer is responsible for creating embeddings and storing them in a vector field in the index.

However, Microsoft also released integrated vectorization in preview. Integrated vectorization is useful in two ways:

  • You can define a vectorizer in the index schema. It can be used to automatically convert a query to a vector. This is useful in the Search Explorer in the portal but can also be used programmatically.
  • You can use an embedding skill for your indexer that automatically vectorizes index fields for you.

First, let’s look at defining a vectorizer in the index definition and using it in the portal for search.

Vector search in the portal

Below is a screenshot of an index with a title and a titleVector field. The index stores information about movies:

Index with a vector field

The integrated vectorizer is defined in the Vector profiles section:

Vector profile

When you add the profile, you configure the algorithm and vectorizer. The vectorizer simply points to an embedding model in Azure OpenAI. For example:

Vectorizer

Note: it’s recommended to use a managed identity instead of an API key to connect to Azure OpenAI
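
If you prefer to define this in the index JSON instead of the portal, the vectorSearch section could look roughly like the sketch below. This is based on the 2023-10-01-Preview schema; the names and the Azure OpenAI details are placeholders:

"vectorSearch": {
  "algorithms": [
    { "name": "algo-config", "kind": "hnsw" }
  ],
  "vectorizers": [
    {
      "name": "openai-vectorizer",
      "kind": "azureOpenAI",
      "azureOpenAIParameters": {
        "resourceUri": "https://OPENAI_INSTANCE.openai.azure.com",
        "deploymentId": "EMBEDDING_MODEL",
        "apiKey": "AZURE_OPENAI_KEY"
      }
    }
  ],
  "profiles": [
    {
      "name": "vector-profile",
      "algorithm": "algo-config",
      "vectorizer": "openai-vectorizer"
    }
  ]
}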

Now, from JSON View in Search Explorer, you can perform a vector search. If you see a search field at the top, you can remove that. It’s for full-text search.

Vector search in the portal

Above, the query commencement is converted to a vector by the integrated vectorizer. The vector search comes up with Inception as the first match. I am not sure if you would want to search for movies this way but it proves the point. 😛
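
A text-based vector query in JSON View looks roughly like the sketch below (assuming the 2023-10-01-Preview query syntax; titleVector is the vector field from the index above). The integrated vectorizer turns the text into a vector before the search runs:

{
  "select": "title",
  "vectorQueries": [
    {
      "kind": "text",
      "text": "commencement",
      "fields": "titleVector",
      "k": 3
    }
  ]
}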

Using an embedding skill during indexing

Suppose you have several JSON documents about movies. Below is one example:

{
    "title": "Inception",
    "year": 2010,
    "director": "Christopher Nolan",
    "genre": ["Action", "Adventure", "Sci-Fi"],
    "starring": ["Leonardo DiCaprio", "Joseph Gordon-Levitt", "Ellen Page"],
    "imdb_rating": 8.8
  }

When you have a bunch of these files in Azure Blob Storage, you can use the Import Data wizard to construct an index from these files.

Import Data Wizard

This wizard, at the time of writing, does not create vectors for you. There is another wizard, Import and vectorize data, but it will treat the JSON as any document and store it in a content field. A vector is created from the content field.

We will stick to the first wizard. It will do several things:

  • create a data source to access the JSON documents in an Azure Storage Account container
  • infer the schema from the JSON files
  • propose an index definition that you can alter
  • create an indexer that indexes the documents on the schedule that you set
  • add skills like entity extraction; select at least one simple skill here, like translation, so you are sure the wizard creates a skillset for the indexer to use

In the step to customize the index definition, ensure you make fields searchable and retrievable as needed. In addition, define a vector field. In my case, I created a titleVector field:

titleVector

When the wizard is finished, the indexer will run and populate the index. Of course, the titleVector field will be empty because there is no process in place that calculates the vectors during indexing.

Let’s fix that. In Skillsets, go to the skillset created by the wizard and click it.

Skillset created by the wizard

Replace the Skillset JSON definition with the content below and change resourceUri, apiKey and deploymentId as needed. You can also add the embedding skill to the existing array of skills if you want to keep them.

{
  "@odata.context": "https://acs-geba.search.windows.net/$metadata#skillsets/$entity",
  "@odata.etag": "\"0x8DBF01523E9A94D\"",
  "name": "azureblob-skillset",
  "description": "Skillset created from the portal. skillsetName: azureblob-skillset; contentField: title; enrichmentGranularity: document; knowledgeStoreStorageAccount: ;",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "embed",
      "description": null,
      "context": "/document",
      "resourceUri": "https://OPENAI_INSTANCE.openai.azure.com",
      "apiKey": "AZURE_OPENAI_KEY",
      "deploymentId": "EMBEDDING_MODEL",
      "inputs": [
        {
          "name": "text",
          "source": "/document/title"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "titleVector"
        }
      ],
      "authIdentity": null
    }
  ],
  "cognitiveServices": null,
  "knowledgeStore": null,
  "indexProjections": null,
  "encryptionKey": null
}

Above, we want to embed the title field in our document and create a vector for it. The context is set to /document, which means that this skill is executed once per document.

Now save the skillset. This skill on its own will create the vectors but will not save them in the index. You need to update the indexer to write the vector to a field.

Let’s navigate to the indexer:

Indexer

Click the indexer and go to the Indexer Definition (JSON) tab. Ensure you have an outputFieldMappings section like below:

{
  "@odata.context": "https://acs-geba.search.windows.net/$metadata#indexers/$entity",
  "@odata.etag": "\"0x8DBF01561D9E97F\"",
  "name": "movies-indexer",
  "description": "",
  "dataSourceName": "movies",
  "skillsetName": "azureblob-skillset",
  "targetIndexName": "movies-index",
  "disabled": null,
  "schedule": null,
  "parameters": {
    "batchSize": null,
    "maxFailedItems": 0,
    "maxFailedItemsPerBatch": 0,
    "base64EncodeKeys": null,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "json"
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_path",
      "targetFieldName": "metadata_storage_path",
      "mappingFunction": {
        "name": "base64Encode",
        "parameters": null
      }
    }
  ],
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/titleVector",
      "targetFieldName": "titleVector"
    }
  ],
  "cache": null,
  "encryptionKey": null
}

Above, we map the titleVector enrichment (think of it as something temporary during indexing) to the real titleVector field in the index.

Reset and run the indexer

Reset the indexer so it will index all documents again:

Resetting the indexer

Next, click the Run button to start the indexing process. When it finishes, do a search with Search Explorer and check that there are vectors in the titleVector field. It’s an array of 1536 floating point numbers.

Conclusion

Integrated vectorization is a welcome extra feature in Azure AI Search. Using it in searches is very easy, especially in the portal.

Using the embedding skill is a bit harder because you need to work with skillset and indexer definitions in JSON and you need to know exactly what to add. But once you get it right, the indexer does all the vectorization work for you.

Creating a custom GPT to query any knowledge base with actions

A while ago, OpenAI introduced GPTs. A GPT is a custom version of ChatGPT that combines instructions, extra knowledge, and any combination of skills.

In this tutorial, we are going to create a custom GPT that can answer questions about articles on this blog. In order to achieve that, we will do the following:

  • create an Azure AI Search index
  • populate the index with content of the last 50 blog posts (via its RSS feed)
  • create a custom API with FastAPI (Python) that uses the Azure OpenAI “add your data” APIs to provide relevant content to the user’s query
  • add the custom API as an action to the custom GPT

The image below shows the properties of the GPT. You need to be a ChatGPT Plus subscriber to create a GPT.

Part of the custom GPT definition

To implement a custom action for the GPT, you need an API with an OpenAPI spec. When you use FastAPI, an OpenAPI JSON document can easily be downloaded and provided to the GPT. You will need to modify the JSON document with a servers section to specify the URL the GPT has to use.

In what follows, we will look at all of the different pieces that make this work. Beware: long post! 😀

Azure AI Search Index

Azure AI Search is a search service you create in Azure. Although there is a free tier, I used the basic tier. The basic tier allows you to use the semantic reranker to optimise search results.

To create the index and populate it with content, I used the following notebook: https://github.com/gbaeke/custom-gpt/blob/main/blog-index/website-index.ipynb.

The result is an index like below:

Index in Azure AI Search

The index contains 292 documents although I only retrieve the last 50 blog posts. This is the result of chunking each post into smaller pieces of about 500 tokens with 100 tokens of overlap for each chunk. We use smaller chunks because we do not want to send entire blog posts as content to the large language model (LLM).
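
The notebook takes care of the chunking. A minimal sketch of that kind of token-based splitting, assuming the tiktoken package, could look like this:

import tiktoken

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list:
    # cl100k_base is the tokenizer used by text-embedding-ada-002 and the gpt-3.5/4 models
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)

    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(enc.decode(tokens[start:start + chunk_size]))
        start += chunk_size - overlap
    return chunks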

Note that the index supports similarity searches using vectors. The contentVector field contains the OpenAI embedding of the text in the content field.

Although vectors are available, we do not have to use vector search. Azure AI Search supports simple keyword search as well. Together with the semantic ranker, keyword search can provide more relevant results than keyword search on its own.

Note: in general, vector search will provide better results, especially when combined with keyword search and the semantic ranker

Use the index with Azure OpenAI “add your data”

I have written about the Azure OpenAI “add your data” features before. It provides a wizard experience to add an Azure AI Search index to the Azure OpenAI playground and directly test your index with the model of your choice.

From your Azure OpenAI instance, first open Azure OpenAI Studio:

Go to OpenAI Studio from the Overview page of your Azure OpenAI instance

Note: you still need to complete a form to get access to Azure OpenAI. Currently, it can take around a day before you are allowed to create Azure OpenAI instances in your subscription.

In Azure OpenAI Studio, click Bring your own data from the Home screen:

Bring your own data

Select the Azure AI Search index and click Next.

Azure AI Search index selection

Note: I created the index using the generally available API that supports vector search. The Add your data wizard, at the time of writing, was not updated yet to support these new indexes. That is the reason why vector search cannot be enabled. We will use keyword + semantic search instead. I expect this functionality to be available soon (November/December 2023).

Next, provide field mappings:

Field Mappings

These mappings are required because the Add your data feature expects these standard fields. You should have at least a content field to search. Above, I do not have a file name field because I have indexed blog posts. It’s ok to leave that field blank.

After clicking Next, we get to data management:

Data Management

Here, we specify the type of search. Semantic means keyword + semantic. In the dropdown list, you can also select keyword search on its own. However, that might give you less relevant results.

Note: for Semantic to work, you need to turn on the Semantic ranker on the Azure AI Search resource. Additionally, you need to create a semantic profile on the index.

Now you can click Next, followed by Save and close. The Azure OpenAI Chat Playground appears with the index added:

Index added as a data source

You can now start chatting with your data. Select a chat model like gpt-4 or gpt-35-turbo. In Azure OpenAI, you have to deploy these models first and give the deployment a name.

Chat session with your data

Above, I asked about the OpenAI Assistants API, which is one of the posts on my blog. In the background, the playground performs a search on the Azure AI Search index and provides the results as context to the model. The gpt-35-turbo model answers the user’s question, based on the context coming from the index.

When you are happy with the result, you can export this experience to an Azure Web App or Copilot Studio (Power Virtual Agents):

Export the “chat with data” experience

In our case, we want to use this configuration from code and provide an API we can add to the custom GPT.

⚠️ It’s important to realise that, with this approach, we will send the final answer, generated by an Azure OpenAI model, to the custom GPT. An alternate approach would be to hand the results of the Azure AI Search query to the custom GPT and let it formulate the answer on its own. That would be faster and less costly. If you also provide the blog post’s URL, ChatGPT can refer to it. However, the focus here is on using any API with a custom GPT, so let’s continue with the API that uses the “add your data” APIs.

If you want to hand over Azure AI Search results directly to ChatGPT, check out the code in the azure-ai-search folder in the GitHub repo.
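
That alternative boils down to a plain query against the index and returning the top chunks. The sketch below illustrates the idea; the endpoint, index name, and field names are assumptions here and the real implementation is in that folder:

import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    "https://acs-geba.search.windows.net",
    "blog",
    AzureKeyCredential(os.getenv("SEARCH_KEY"))
)

def get_chunks(query: str, top: int = 5) -> list:
    # keyword search; the custom GPT formulates the answer from these chunks itself
    results = search_client.search(search_text=query, top=top, select=["Title", "Content", "Url"])
    return [{"title": r["Title"], "content": r["Content"], "url": r["Url"]} for r in results]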

Creating the API

To create an API that uses the index with the model, as configured in the playground, we can use some code. In fact, the playground provides sample code to work with:

Sample code from the playground

‼️ Sadly, this code will not work due to changes to the openai Python package. However, the principle is still the same:

  • call the chat completion extension API which is specific to Azure; in the code you will see this as a Python f-string: f"{openai.api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={openai.api_version}"
  • the JSON payload for this API needs to include the Azure AI Search configuration in a dataSources array.

The extension API will query Azure AI Search for you and create the prompt for the chat completion with context from the search result.

To create a FastAPI API that does this for the custom GPT, I decided to not use the openai package and simply use the REST API. Here is the code:

from fastapi import FastAPI, HTTPException, Depends, Header
from pydantic import BaseModel
import httpx, os
import dotenv
import re

# Load environment variables
dotenv.load_dotenv()

# Initialize FastAPI app
app = FastAPI()

# Constants (replace with your actual values)
api_base = "https://oa-geba-france.openai.azure.com/"
api_key = os.getenv("OPENAI_API_KEY")
deployment_id = "gpt-35-turbo"
search_endpoint = "https://acs-geba.search.windows.net"
search_key = os.getenv("SEARCH_KEY")
search_index = "blog"
api_version = "2023-08-01-preview"

# Pydantic model for request body
class RequestBody(BaseModel):
    query: str

# Define the API key dependency
def get_api_key(api_key: str = Header(None)):
    if api_key is None or api_key != os.getenv("API_KEY"):
        raise HTTPException(status_code=401, detail="Invalid API Key")
    return api_key

# Endpoint to generate response
@app.post("/generate_response", dependencies=[Depends(get_api_key)])
async def generate_response(request_body: RequestBody):
    url = f"{api_base}openai/deployments/{deployment_id}/extensions/chat/completions?api-version={api_version}"
    headers = {
        "Content-Type": "application/json",
        "api-key": api_key
    }
    data = {
        "dataSources": [
            {
                "type": "AzureCognitiveSearch",
                "parameters": {
                    "endpoint": search_endpoint,
                    "key": search_key,
                    "indexName": search_index
                }
            }
        ],
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant"
            },
            {
                "role": "user",
                "content": request_body.query
            }
        ]
    }

    async with httpx.AsyncClient() as client:
        response = await client.post(url, json=data, headers=headers, timeout=60)

    if response.status_code != 200:
        raise HTTPException(status_code=response.status_code, detail=response.text)

    response_json = response.json()

    # get the assistant response
    assistant_content = response_json['choices'][0]['message']['content']
    assistant_content = re.sub(r'\[doc.\]', '', assistant_content)
    
    # return assistant_content as json
    return {
        "response": assistant_content
    }

# Run the server
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000, timeout_keep_alive=60)

This API has one endpoint, /generate_response, that takes { "query": "your query" } as input and returns { "response": assistant_content } as output. Note that the original response from the model contains references like [doc1], [doc2], etc… The regex in the code removes those references. I do not particularly like how the references are handled by the API, so I decided to not include them and simplify the response.

The endpoint expects an api-key header. If it is not present or incorrect, it returns an error.

The endpoint makes a call to the Azure OpenAI chat completion extension API, which looks very similar to a regular OpenAI chat completion. The request does, however, contain a dataSources field with the Azure AI Search information.

The environment variables like the OPENAI_API_KEY and the SEARCH_KEY are retrieved from a .env file.

Note: to stress this again, this API returns the answer to the query as generated by the chosen Azure OpenAI model. This allows it to be used in any application, not just a custom GPT. For a custom GPT in ChatGPT, an alternate approach would be to hand over the search results from Azure AI Search directly, allowing the model in the custom GPT to generate the response. That would be faster and avoid Azure OpenAI costs. We are effectively using the custom GPT as a UI and as a way to maintain history between action calls. 😀

If you want to see the code in GitHub, check this URL: https://github.com/gbaeke/custom-gpt.

Running the API in Azure Container Apps

To run the API in the cloud, I decided to use Azure Container Apps. That means we need a Dockerfile to build the container image locally or in the cloud:

# Use an official Python runtime as a parent image
FROM python:3.9-slim-buster

# Set the working directory in the container to /app
WORKDIR /app

# Add the current directory contents into the container at /app
ADD . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Run app.py when the container launches
CMD ["python3", "app.py"]

We also need a requirements.txt file:

fastapi==0.104.1
pydantic==2.5.2
pydantic_core==2.14.5
httpx==0.25.2
python-dotenv==1.0.0
uvicorn==0.24.0.post1

I use the following shell script to build and run the container locally. The script can also deploy the container to Azure Container Apps.

#!/bin/bash

# Load environment variables from .env file
export $(grep -v '^#' .env | xargs)

# Check the command line argument
if [ "$1" == "build" ]; then
    # Build the Docker image
    docker build -t myblog .
elif [ "$1" == "run" ]; then
    # Run the Docker container, mapping port 8000 to 8000 and setting environment variables
    docker run -p 8000:8000 -e OPENAI_API_KEY=$OPENAI_API_KEY -e SEARCH_KEY=$SEARCH_KEY -e API_KEY=$API_KEY myblog
elif [ "$1" == "up" ]; then
    az containerapp up -n myblog --ingress external --target-port 8000 \
        --env-vars OPENAI_API_KEY=$OPENAI_API_KEY SEARCH_KEY=$SEARCH_KEY API_KEY=$API_KEY \
        --source .
else
    echo "Usage: $0 {build|run|up}"
fi

The shell script extracts the environment variables defined in .env and sets them in the session. Next, we check the first parameter given to the script (Docker is required on your machine for build and run):

  • build: build the Docker image
  • run: run the Docker image locally on port 8000 and specify the environment variables to authenticate to Azure OpenAI and Azure AI Search
  • up: build the Docker image in the cloud and run it in Container Apps; if you do not have a Container Apps Environment or Azure Container Registry, they will be created for you. In the end, you will get an https endpoint to your API in the cloud.

Note: you should not put secrets in environment variables in Azure Container Apps directly; use Container Apps secrets or Key Vault instead; the above is just quick and easy to simplify the deployment

To test the API locally, use the REST Client extension in VS Code with an .http file:

POST http://localhost:8000/generate_response HTTP/1.1
Host: localhost:8000
Content-Type: application/json
api-key: API_KEY_FROM_DOTENV

{
  "query": "what is the openai assistants api?"
}

###

POST https://AZURE_CONTAINER_APPS_ENDPOINT/generate_response HTTP/1.1
Host: AZURE_CONTAINER_APPS_ENDPOINT
Content-Type: application/json
api-key: API_KEY_FROM_DOTENV

{
  "query": "Can I use Redis as a vector db?"
}

When you get something like below, you are good to go. Note again that we return a final answer and not the relevant chunks from Azure AI search.

Successful response from .http file

Getting the OpenAPI spec and adding it to the GPT

With your API running, you can go to its URL, like this one if the API runs locally: http://localhost:8000/openapi.json. The result is a JSON document you can copy to your GPT. I recommend copying the JSON to VS Code and formatting it before you paste it into the GPT.

In the GPT, modify the OpenAPI spec with a servers section that includes your Azure Container Apps ingress URL:

Adding the URL to the GPT Action definition
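
For reference, the servers section is a small addition at the top level of the OpenAPI JSON; the hostname below is a placeholder for your Container Apps ingress FQDN:

"servers": [
  {
    "url": "https://YOUR_CONTAINER_APP.REGION.azurecontainerapps.io"
  }
]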

If you want to give the user the ability to trust the action so it can be called without approval (after the first call), also add the following:

Allowing the user to say Always Allow when action is used the first time

Take a look at the video below that shows how to create the GPT, including the configuration of the action and testing it.

Conclusion

Custom GPTs in ChatGPT open up a world of possibilities to offer personalised ChatGPT experiences. With custom actions, you can let the GPT do anything you want. In this tutorial, the custom action is an API call that answers the user’s question using Azure OpenAI with Azure AI Search as the provider of relevant context.

As long as you build and host an API and have an OpenAPI spec for your API, the possibilities are virtually limitless.

Note that custom GPTs with actions are not available in the ChatGPT app on mobile yet (end November, 2023). When that happens, it will open up all these capabilities on the go, including enabling voice chat. Fun stuff! 😀

Use Azure OpenAI Add your data vector search from code

In the previous post, we looked at using Azure OpenAI Add your data from the Azure OpenAI Chat Playground. It is an easy-to-follow wizard to add documents to a storage account and start asking questions about them. From the playground, you can deploy a chat app to an Azure web app and you are good to go. The vector search is performed by an Azure Cognitive Search resource via an index that includes a vector next to other fields such as the actual content, the original URL, etc…

In this post, we will look at using this index from code and build a chat app using the Python Streamlit library.

All code can be found here: https://github.com/gbaeke/azure-cog-search

Requirements

You need an Azure Cognitive Search resource with an index that supports vector search. Use this post to create one. Besides Azure Cognitive Search, you will need Azure OpenAI deployed with both gpt-4 (or 3.5) and the text-embedding-ada-002 embedding model. The embedding model is required to support vector search. In Europe, use France Central as the region.

Next, you need Python installed. I use Python 3.11.4 64-bit on an M1 Mac. You will need to install the following libraries with pip:

  • streamlit
  • requests

You do not need the OpenAI library because we will use the Azure OpenAI REST APIs to be able to use the extension that enables the Add your data feature.

Configuration

We need several configuration settings. They can be divided into two big blocks:

  • Azure Cognitive Search settings: name of the resource, access key, index name, columns, type of search (vector), and more…
  • Azure OpenAI settings: name of the model (e.g., gpt-4), OpenAI access key, embedding model, and more…

You should create a .env file with the following content:

AZURE_SEARCH_SERVICE = "AZURE_COG_SEARCH_SHORT_NAME"
AZURE_SEARCH_INDEX = "INDEX_NAME"
AZURE_SEARCH_KEY = "AZURE_COG_SEARCH_AUTH_KEY"
AZURE_SEARCH_USE_SEMANTIC_SEARCH = "false"
AZURE_SEARCH_TOP_K = "5"
AZURE_SEARCH_ENABLE_IN_DOMAIN = "true"
AZURE_SEARCH_CONTENT_COLUMNS = "content"
AZURE_SEARCH_FILENAME_COLUMN = "filepath"
AZURE_SEARCH_TITLE_COLUMN = "title"
AZURE_SEARCH_URL_COLUMN = "url"
AZURE_SEARCH_QUERY_TYPE = "vector"

# AOAI Integration Settings
AZURE_OPENAI_RESOURCE = "AZURE_OPENAI_SHORT_NAME"
AZURE_OPENAI_MODEL = "gpt-4"
AZURE_OPENAI_KEY = "AZURE_OPENAI_AUTH_KEY"
AZURE_OPENAI_TEMPERATURE = 0
AZURE_OPENAI_TOP_P = 1.0
AZURE_OPENAI_MAX_TOKENS = 1000
AZURE_OPENAI_STOP_SEQUENCE = ""
AZURE_OPENAI_SYSTEM_MESSAGE = "You are an AI assistant that helps people find information."
AZURE_OPENAI_PREVIEW_API_VERSION = "2023-06-01-preview"
AZURE_OPENAI_STREAM = "false"
AZURE_OPENAI_MODEL_NAME = "gpt-4"
AZURE_OPENAI_EMBEDDING_ENDPOINT = "https://AZURE_OPENAI_SHORT_NAME.openai.azure.com/openai/deployments/EMBEDDING_DEPLOYMENT_NAME/embeddings?api-version=2023-03-15-preview"
AZURE_OPENAI_EMBEDDING_KEY = "AZURE_OPENAI_AUTH_KEY"

Now we can create a config.py that reads these settings.

from dotenv import load_dotenv
import os
load_dotenv()

# ACS Integration Settings
AZURE_SEARCH_SERVICE = os.environ.get("AZURE_SEARCH_SERVICE")
AZURE_SEARCH_INDEX = os.environ.get("AZURE_SEARCH_INDEX")
AZURE_SEARCH_KEY = os.environ.get("AZURE_SEARCH_KEY")
AZURE_SEARCH_USE_SEMANTIC_SEARCH = os.environ.get("AZURE_SEARCH_USE_SEMANTIC_SEARCH", "false")
AZURE_SEARCH_TOP_K = os.environ.get("AZURE_SEARCH_TOP_K", 5)
AZURE_SEARCH_ENABLE_IN_DOMAIN = os.environ.get("AZURE_SEARCH_ENABLE_IN_DOMAIN", "true")
AZURE_SEARCH_CONTENT_COLUMNS = os.environ.get("AZURE_SEARCH_CONTENT_COLUMNS")
AZURE_SEARCH_FILENAME_COLUMN = os.environ.get("AZURE_SEARCH_FILENAME_COLUMN")
AZURE_SEARCH_TITLE_COLUMN = os.environ.get("AZURE_SEARCH_TITLE_COLUMN")
AZURE_SEARCH_URL_COLUMN = os.environ.get("AZURE_SEARCH_URL_COLUMN")
AZURE_SEARCH_VECTOR_COLUMNS = os.environ.get("AZURE_SEARCH_VECTOR_COLUMNS")
AZURE_SEARCH_QUERY_TYPE = os.environ.get("AZURE_SEARCH_QUERY_TYPE")

# AOAI Integration Settings
AZURE_OPENAI_RESOURCE = os.environ.get("AZURE_OPENAI_RESOURCE")
AZURE_OPENAI_MODEL = os.environ.get("AZURE_OPENAI_MODEL")
AZURE_OPENAI_KEY = os.environ.get("AZURE_OPENAI_KEY")
AZURE_OPENAI_TEMPERATURE = os.environ.get("AZURE_OPENAI_TEMPERATURE", 0)
AZURE_OPENAI_TOP_P = os.environ.get("AZURE_OPENAI_TOP_P", 1.0)
AZURE_OPENAI_MAX_TOKENS = os.environ.get("AZURE_OPENAI_MAX_TOKENS", 1000)
AZURE_OPENAI_STOP_SEQUENCE = os.environ.get("AZURE_OPENAI_STOP_SEQUENCE")
AZURE_OPENAI_SYSTEM_MESSAGE = os.environ.get("AZURE_OPENAI_SYSTEM_MESSAGE", "You are an AI assistant that helps people find information about jobs.")
AZURE_OPENAI_PREVIEW_API_VERSION = os.environ.get("AZURE_OPENAI_PREVIEW_API_VERSION", "2023-06-01-preview")
AZURE_OPENAI_STREAM = os.environ.get("AZURE_OPENAI_STREAM", "true")
AZURE_OPENAI_MODEL_NAME = os.environ.get("AZURE_OPENAI_MODEL_NAME", "gpt-35-turbo")
AZURE_OPENAI_EMBEDDING_ENDPOINT = os.environ.get("AZURE_OPENAI_EMBEDDING_ENDPOINT")
AZURE_OPENAI_EMBEDDING_KEY = os.environ.get("AZURE_OPENAI_EMBEDDING_KEY")

Writing the chat app

Now we will create chat.py. The diagram below summarizes the architecture:

Chat app architecture (high level)

Here is the first section of the code with explanations:

import requests
import streamlit as st
from config import *
import json

# Azure OpenAI REST endpoint
endpoint = f"https://{AZURE_OPENAI_RESOURCE}.openai.azure.com/openai/deployments/{AZURE_OPENAI_MODEL}/extensions/chat/completions?api-version={AZURE_OPENAI_PREVIEW_API_VERSION}"
    
# endpoint headers with Azure OpenAI key
headers = {
    'Content-Type': 'application/json',
    'api-key': AZURE_OPENAI_KEY
}

# Streamlit app title
st.title("🤖 Azure Add Your Data Bot")

# Keep messages array in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display previous chat messages from history on app rerun
# Add your data messages include tool responses and assistant responses
# Exclude the tool responses from the chat display
for message in st.session_state.messages:
    if message["role"] != "tool":
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

A couple of things happen here:

  • We import all the variables from config.py
  • We construct the Azure OpenAI REST endpoint and store it in endpoint; we use the extensions/chat endpoint here which supports the Add your data feature in API version 2023-06-01-preview and higher
  • We configure the HTTP headers to send to the endpoint; the headers include the Azure OpenAI authentication key
  • We print a title with Streamlit (st.title) and define a messages array that we store in Streamlit’s session state
  • Because of the way Streamlit works, we have to print the previous messages of the chat each time the page reloads. We do that in the last part but we exclude the tool role. The extensions/chat endpoint returns a tool response that contains the data returned by Azure Cognitive Search. We do not want to print the tool response. Together with the tool response, the endpoint returns an assistant response which is the response from the gpt model. We do want to print that response.

Now we can look at the code that gets executed each time the user asks a question. In the UI, the question box is at the bottom:

Streamlit chat UI

Whenever you type a question, the following code gets executed:

# if user provides chat input, get and display response
# add user question and response to previous chat messages
if user_prompt := st.chat_input():
    st.chat_message("user").write(user_prompt)
    with st.chat_message("assistant"):
        with st.spinner("🧠 thinking..."):
            # add the user query to the messages array
            st.session_state.messages.append({"role": "user", "content": user_prompt})
            body = {
                "messages": st.session_state.messages,
                "temperature": float(AZURE_OPENAI_TEMPERATURE),
                "max_tokens": int(AZURE_OPENAI_MAX_TOKENS),
                "top_p": float(AZURE_OPENAI_TOP_P),
                "stop": AZURE_OPENAI_STOP_SEQUENCE.split("|") if AZURE_OPENAI_STOP_SEQUENCE else None,
                "stream": False,
                "dataSources": [
                    {
                        "type": "AzureCognitiveSearch",
                        "parameters": {
                            "endpoint": f"https://{AZURE_SEARCH_SERVICE}.search.windows.net",
                            "key": AZURE_SEARCH_KEY,
                            "indexName": AZURE_SEARCH_INDEX,
                            "fieldsMapping": {
                                "contentField": AZURE_SEARCH_CONTENT_COLUMNS.split("|") if AZURE_SEARCH_CONTENT_COLUMNS else [],
                                "titleField": AZURE_SEARCH_TITLE_COLUMN if AZURE_SEARCH_TITLE_COLUMN else None,
                                "urlField": AZURE_SEARCH_URL_COLUMN if AZURE_SEARCH_URL_COLUMN else None,
                                "filepathField": AZURE_SEARCH_FILENAME_COLUMN if AZURE_SEARCH_FILENAME_COLUMN else None,
                                "vectorFields": AZURE_SEARCH_VECTOR_COLUMNS.split("|") if AZURE_SEARCH_VECTOR_COLUMNS else []
                            },
                            "inScope": True if AZURE_SEARCH_ENABLE_IN_DOMAIN.lower() == "true" else False,
                            "topNDocuments": AZURE_SEARCH_TOP_K,
                            "queryType":  AZURE_SEARCH_QUERY_TYPE,
                            "roleInformation": AZURE_OPENAI_SYSTEM_MESSAGE,
                            "embeddingEndpoint": AZURE_OPENAI_EMBEDDING_ENDPOINT,
                            "embeddingKey": AZURE_OPENAI_EMBEDDING_KEY
                        }
                    }   
                ]
            }  

            # send request to chat completion endpoint
            try:
                response = requests.post(endpoint, headers=headers, json=body)

                # there will be a tool response and assistant response
                tool_response = response.json()["choices"][0]["messages"][0]["content"]
                tool_response_json = json.loads(tool_response)
                assistant_response = response.json()["choices"][0]["messages"][1]["content"]

                # get urls for the JSON tool response
                urls = [citation["url"] for citation in tool_response_json["citations"]]


            except Exception as e:
                st.error(e)
                st.stop()
            
           
            # replace [docN] with urls and use 0-based indexing
            for i, url in enumerate(urls):
                assistant_response = assistant_response.replace(f"[doc{i+1}]", f"[[{i}]({url})]")
            

            # write the response to the chat
            st.write(assistant_response)

            # write the urls to the chat; gpt response might not refer to all
            st.write(urls)

            # add both responses to the messages array
            st.session_state.messages.append({"role": "tool", "content": tool_response})
            st.session_state.messages.append({"role": "assistant", "content": assistant_response})
            

When there is input, we write the input to the chat history on the screen and add it to the messages array. The OpenAI APIs expect a messages array that includes user and assistant roles. In other words, user questions and assistant (here gpt-4) responses.

With a valid messages array, we can send our payload to the Azure OpenAI extensions/chat endpoint. If you have ever worked with the OpenAI or Azure OpenAI APIs, many of the settings in the JSON body will be familiar. For example: temperature, max_tokens, and of course the messages themselves.

What’s new here is the dataSources field. It contains all the information required to perform a vector search in Azure Cognitive Search. The search finds content relevant to the user’s question (which was added last to the messages array). Because queryType is set to vector, we also need to provide the embedding endpoint and key. They are required because the user question has to be vectorized in order to compare it with the stored vectors.

It’s important to note that the extensions/chat endpoint, together with the dataSources configuration, takes care of a lot of the details:

  • It performs a k-nearest neighbor search (k=5 here) to find the 5 documents most closely related to the user’s question
  • It uses vector search for this query (could be combined with keyword and semantic search to perform a hybrid search but that is not used here)
  • It stuffs the prompt to the GPT model with the relevant content
  • It returns the GPT model response (assistant response) together with a tool response. The tool response contains citations that include URLs to the original content and the content itself.

We print the URLs from these citations after modifying the assistant response to show hyperlinked numbers like [0] and [1] instead of unlinked [doc1], [doc2], etc… In the UI, that looks like:

Printing the URLs from the citations

Note that this chat app is a prototype and does not include management of the messages array. Long interactions will reach the model’s token limit!

You can find this code on GitHub here: https://github.com/gbaeke/azure-cog-search.

Conclusion

Although still in preview, this feature gives you an Azure-native solution that enables the RAG pattern with vector search. RAG stands for Retrieval Augmented Generation. Azure Cognitive Search is a fully managed service that stores the vectors and performs similarity searches. There is no need to deploy a third-party vector database.

There is no need for specific libraries to implement this feature because it is all part of the Azure OpenAI API. Microsoft simply extended that API with data sources and takes care of all the behind-the-scenes work of finding relevant content and adding it to your prompt.

If, for some reason, you do not want to use the Azure OpenAI API directly and use something like LangChain or Semantic Kernel, you can of course still do that. Both solutions support Azure Cognitive Search as a vector store.