baeke.info

A look at the Azure OpenAI Assistants API

Introduction

A while ago, I looked at the OpenAI Assistants API. In February of 2024, Microsoft released their Assistants API in public preview. It works in the same way as the OpenAI Assistants API but lets you use Azure OpenAI models, deployed to a region of your choice.

The goal of the Assistants API is to make it easier for developers to create applications with copilot-like experiences. It should be easier to provide the assistant with extra knowledge or allow the assistant to interact with the world by calling external APIs.

If you have ever created a chat-based copilot with the standard Azure OpenAI chat completions API, you know that it is stateless. It does not know about the conversation history. As a developer, you have to maintain and manage conversation history and pass it to the completions API. With the Assistants API, that is not necessary. The API is stateful. Conversation history is automatically managed via threads. There is no need to manage conversation state to ensure you do not break the model’s context window limits.
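To make the difference concrete, here is a minimal sketch of the bookkeeping a stateless API pushes onto you. The ChatHistory class is hypothetical and not part of any SDK; only the role/content message shape comes from the chat completions API.

```python
class ChatHistory:
    """Hypothetical client-side store for a stateless chat API."""

    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})


history = ChatHistory("You are a math tutor.")
history.add("user", "What is 2 + 2?")
# ...send history.messages to the chat completions API, then store the reply
history.add("assistant", "2 + 2 = 4")
history.add("user", "And times 3?")
# The second call must include all four messages, or the model loses the context
print(len(history.messages))  # 4
```

With threads, this list lives on the service side and you only add the new message.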

In addition to threads, the Assistants API also supports tools. One of these tools is Code Interpreter, a sandboxed Python environment that can help solve complex questions. If you are a ChatGPT Plus subscriber, you probably know that tool already. Code Interpreter is often used to solve math questions, something that LLMs are not terribly good at. However, it is not limited to math. Next to Code Interpreter, you can define your own functions. A function could call an API that queries a database and returns the results to the assistant.

Before diving into a code example you should understand the following components:

Assistant: custom AI that uses Azure OpenAI models and has access to files and tools
Thread: conversation between the assistant and the user
Message: message created by the assistant or a user; a message does not have to be text; it could be an image or a file; messages are stored on a thread
Run: you run a thread to elicit a response from the model; for instance if you just placed a user question on the thread and you run the thread, the model can respond with text or perform a tool call
Run Step: detailed list of steps the assistant took as part of a run; this could include a tool call

Enough talk, let’s look at some code. The code can be found on GitHub in a Python notebook: https://github.com/gbaeke/azure-assistants-api/blob/main/getting-started.ipynb

Initialising the OpenAI client and creating the assistant

We will use a .env file to load the Azure OpenAI API key, the endpoint and the API version. You will need an Azure OpenAI resource in a supported region such as Sweden Central. The API version should be 2024-02-15-preview.

import os
from dotenv import load_dotenv
from openai import AzureOpenAI

load_dotenv()

# Create Azure OpenAI client
client = AzureOpenAI(
    api_key=os.getenv('AZURE_OPENAI_API_KEY'),
    azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT'),
    api_version=os.getenv('AZURE_OPENAI_API_VERSION')
)

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="""You are a math tutor that helps users solve math problems. 
    You have access to a sandboxed environment for writing and testing code. 
    Explain to the user why you used the code and how it works
    """,
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-preview" # ensure you have a deployment in the region you are using
)

Above, we create an assistant with the client.beta.assistants.create method. Indeed, OpenAI Assistants as developed by OpenAI are still in beta so the OpenAI library reflects that.

Note that an assistant is given specific instructions and, in this case, a tool. We will use the built-in Code Interpreter tool. It can help us solve math questions, including the generation of plots.

Ensure that the model refers to a deployed model in your region. I use the gpt-4-turbo preview here.

Note that the assistants you create are shown in the Azure OpenAI Assistant Playground. For example, I created the Math Tutor assistant a few times by running the same code:

When you click on one of the assistants, it opens in the Assistant Playground. In that playground, you can start chatting right away. For example:

In the screenshot above, I have asked the assistant to plot a sine wave. It explains how it did that because that is what the Instructions tell the assistant to do. At the end, Code Interpreter creates the plot and generates an image file. That image file is picked up in the playground and displayed.

Also note the panel on the right with API instructions. You can click on those instructions to execute them and see the JSON response.

Note that you can reuse an assistant by simply using its id. You can also create the assistant directly in the portal. You do not have to create it in code, like we are doing.
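Since running the creation code repeatedly produces duplicate assistants, a small helper can look for an existing one first. This is a sketch, not the notebook's code: get_or_create_assistant is a hypothetical name, and it only inspects the first page that client.beta.assistants.list() returns. If you stored the id, client.beta.assistants.retrieve(assistant_id) fetches the assistant directly.

```python
def get_or_create_assistant(client, name: str, **create_kwargs):
    """Return an existing assistant with this name, or create a new one."""
    # list() returns a page object; its .data attribute holds the assistants
    for existing in client.beta.assistants.list().data:
        if existing.name == name:
            return existing
    return client.beta.assistants.create(name=name, **create_kwargs)

# Usage with the client from the post (not executed here):
# assistant = get_or_create_assistant(
#     client, "Math Tutor",
#     instructions="You are a math tutor...",
#     tools=[{"type": "code_interpreter"}],
#     model="gpt-4-preview",
# )
```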

Let’s now create a thread in code and ask some math questions.

Creating a thread and adding a message

Below, a thread is created which results in a thread id. Subsequently, a message is added to the thread with role set to user. This is the first user question in the thread.

# Create a thread
thread = client.beta.threads.create()

# print the thread id
print("Thread id: ", thread.id)

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Solve the equation y = x^2 + 3 for x = 3 and plot the function graph."
)

# Show the messages
thread_messages = client.beta.threads.messages.list(thread.id)
print(thread_messages.model_dump_json(indent=2))

The JSON dump of the messages contains a data array. In our case the single item in the data array contains a content array next to other information such as role, the thread id, the creation timestamp and more. The content array can contain multiple pieces of content of different types. In this case, we simply have the user question which is of type text.

"content": [
        {
          "text": {
            "annotations": [],
            "value": "Solve the equation y = x^2 + 3 for x = 3 and plot the function graph."
          },
          "type": "text"
        }
      ]

Running the thread

A message on a thread is great but does not do all that much. We want a response from the assistant. In order to get a response, we need to run the thread:

import time
from IPython.display import clear_output

# Start a run: this links the thread to the assistant
run = client.beta.threads.runs.create(
  thread_id=thread.id,
  assistant_id=assistant.id
)

status = run.status

# Poll until the run reaches a terminal state
while status not in ["completed", "cancelled", "expired", "failed"]:
    time.sleep(2)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id,run_id=run.id)
    status = run.status
    print(f'Status: {status}')
    clear_output(wait=True)

print(f'Status: {status}')

The run is where the assistant and the thread come together via their ids. As you can probably tell, the run does not directly return the result. You need to check the run status yourself and act accordingly.

When the status is completed, the run was successful. That means that there should be some response from the assistant.
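Outside a notebook, you may want the polling wrapped in a function with a timeout. The sketch below is one way to do that; wait_for_run and its parameters are hypothetical names. It also stops on requires_action, the status a run takes when it waits for the output of a custom function, because a loop that only checks the four terminal states would keep polling in that case.

```python
import time

TERMINAL = {"completed", "cancelled", "expired", "failed"}

def wait_for_run(get_status, poll_seconds: float = 2.0, timeout: float = 120.0) -> str:
    """Poll get_status() until the run stops, needs tool output, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        # requires_action means the run waits for function output, so stop polling
        if status in TERMINAL or status == "requires_action":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout} seconds")
        time.sleep(poll_seconds)

# Usage with the objects from the post (not executed here):
# status = wait_for_run(
#     lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id).status
# )
```

Passing a callable keeps the helper independent of the client, which also makes it easy to test.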

Interpreting the messages after the run

After a completed run in response to a message with role = user, there should be a response from the model. There are all sorts of responses, including responses that indicate you should run a function. Our assistant does not have custom functions defined so the response can be one of the following:

a response from the model without using Code Interpreter
a response from the model, interpreting the response from Code Interpreter and possibly including images and text

Note that you do not have to call Code Interpreter specifically. The assistant will decide to use Code Interpreter (you can also be explicit) and use the Code Interpreter response in its final answer.

The code below shows one way of dealing with the assistant response:

import json
from IPython.display import display, Markdown
from PIL import Image

messages = client.beta.threads.messages.list(
    thread_id=thread.id
)

messages_json = json.loads(messages.model_dump_json())

# The API returns the newest message first; reverse to read oldest first
for item in reversed(messages_json['data']):
    # Check the content array
    for content in reversed(item['content']):
        # If there is text in the content array, print it as Markdown
        if 'text' in content:
            display(Markdown(content['text']['value']))
        # If there is an image_file in the content, download and display the image
        if 'image_file' in content:
            file_id = content['image_file']['file_id']
            file_content = client.files.content(file_id)
            # use PIL with the file_content
            img = Image.open(file_content)
            img = img.resize((400, 400))
            display(img)

Above, the following happens:

all messages from the thread are retrieved: this includes the original user question in addition to the assistant response; the most recent messages come first in the array
we loop through the reversed array and check for a content field: if there is a content field (an array) we loop over that and check for a text or image_file field
if we find content of type text, we display it with markdown (we are using a Notebook here)
if we find content of type image_file, we retrieve the image from Azure OpenAI using its files API and display it in the notebook with some help of PIL.
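If you are not in a notebook, the same traversal can produce plain data instead of rendered output. The following sketch works on the dumped JSON only; summarize_messages is a hypothetical helper and the sample data, including the file id, is made up to mirror the structure shown earlier.

```python
def summarize_messages(messages_json: dict) -> list[dict]:
    """Flatten a dumped message list into oldest-first role, texts and image ids."""
    summary = []
    # The API returns newest first, so reverse to get chronological order
    for item in reversed(messages_json["data"]):
        texts, image_ids = [], []
        for content in item["content"]:
            if content["type"] == "text":
                texts.append(content["text"]["value"])
            elif content["type"] == "image_file":
                image_ids.append(content["image_file"]["file_id"])
        summary.append({"role": item["role"], "texts": texts, "image_file_ids": image_ids})
    return summary


# Made-up sample in the shape of the JSON dump
sample = {
    "data": [
        {"role": "assistant", "content": [
            {"type": "image_file", "image_file": {"file_id": "file-abc"}},
            {"type": "text", "text": {"annotations": [], "value": "For x = 3, y = 12."}},
        ]},
        {"role": "user", "content": [
            {"type": "text", "text": {"annotations": [], "value": "Solve y = x^2 + 3 for x = 3."}},
        ]},
    ]
}
print(summarize_messages(sample))
```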

Here is the response I got in my notebook. Note that there are only two messages. The assistant response contains two pieces of content.

All messages in the thread visualised from 1st to last

Follow-up questions

One of the advantages of the Assistants API is that we do not have to maintain chat history. We only have to add follow-up questions to the thread and run it again. Below is the model response after adding this question: “Is this a concave function?”:

Above, I print the entire thread in reverse order again. The answer of the assistant is that this is clearly not a concave function but a convex one.
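The follow-up itself is just the two calls we already used: add a message, then create a run on the same thread. A minimal sketch, with ask_follow_up as a hypothetical helper name:

```python
def ask_follow_up(client, thread_id: str, assistant_id: str, question: str):
    """Append a user message to an existing thread and start a new run."""
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=question,
    )
    # The run sees the whole thread, so earlier questions and answers are context
    return client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
    )

# Usage with the objects from the post (not executed here):
# run = ask_follow_up(client, thread.id, assistant.id, "Is this a concave function?")
```

The returned run still has to be polled until it completes before you list the messages again.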

You should know that at present (February 2024), the Assistants API simply tries to fit the messages in the model’s context window. If the context window is large, long conversations might cost you a lot in tokens. At present, there is no way that I know of to change this mechanism. OpenAI, and Microsoft, are planning to add some extra capabilities. For example:

control token count regardless of the chosen model (e.g. set token count to 2000 even if the model allows for 8000)
generate summaries of previous messages and pass the summaries as context during a thread run

In most production applications that are used at scale, you really need to control token usage by managing chat history meticulously. Today, that is only possible with the chat completions API and/or abstractions on top of it like LangChain.
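As an illustration of what managing history can look like with the chat completions API, here is a sketch that keeps the system message plus the newest messages that fit a token budget. Both function names are hypothetical, and the token count is a crude assumption of about four characters per token; a real tokenizer such as tiktoken would be more accurate.

```python
def approx_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token; use a real tokenizer in practice
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the system message plus the most recent messages that fit the budget."""
    system, rest = messages[0], messages[1:]
    used = approx_tokens(system["content"])
    kept = []
    # Walk backwards so the newest messages are kept first
    for message in reversed(rest):
        cost = approx_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return [system] + kept[::-1]
```

Dropping the oldest messages is the simplest policy; summarizing them instead is the other option mentioned above.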

Conclusion

With the arrival of the Assistants API in Azure OpenAI, it is easier to write assistants that work with tools like Code Interpreter or custom functions. This post has focused on the basics of using the API with only the Code Interpreter tool.

In follow-up posts, we will look at custom functions and how to work with uploaded files.

Keep in mind that this is all in public preview and should not be used in production.

Deploy a flow created in Prompt Flow with Docker

Update: this post used an older version of Prompt Flow, which had some issues with building and running the Docker image. In version 1.5.0, it should work fine because the Dockerfile now also installs gcc.

In the previous post, we created a flow with Prompt Flow in Visual Studio Code. The Prompt Flow extension for VS Code has a visual flow editor to test the flow. You simply provide the input and click the Run button. When the flow is finished, the result can be seen in the Outputs node, including a trace of the flow:

Now it’s time to deploy the flow. One of the options is creating a container image with Docker.

Before we start, we will first convert this flow into a chat flow. Chat does not make much sense for this flow. However, the Docker container includes a UI to run your flow via a chat interface. You will also be able to test your flow locally in a web app.

Convert the flow to a chat flow

To convert the flow to a chat flow, enable chat mode and add chat_history to the Inputs node:

To include the chat history in your conversations, modify the .jinja2 template in the LLM node:

system:
You return the url to an image that best matches the user's question. Use the provided context to select the image. Only return the url. When no
matching url is found, simply return NO_IMAGE

{% for item in chat_history %}
user:
{{item.inputs.description}}
assistant:
{{item.outputs.url}}
{% endfor %}

user:
{{description}}

context : {{search_results}}
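To see what the loop in this template works on, the sketch below mirrors it in plain Python. It assumes chat_history is a list of earlier turns, each with an inputs and an outputs dictionary, which is what the item.inputs.description and item.outputs.url references imply. The function name and the sample url are made up.

```python
def history_to_messages(chat_history: list[dict], description: str) -> list[dict]:
    """Mirror the template loop: replay earlier turns, then add the new question."""
    messages = []
    for item in chat_history:
        messages.append({"role": "user", "content": item["inputs"]["description"]})
        messages.append({"role": "assistant", "content": item["outputs"]["url"]})
    messages.append({"role": "user", "content": description})
    return messages


# Sample data; the url is a placeholder, not a real image
chat_history = [
    {"inputs": {"description": "cat"}, "outputs": {"url": "https://example.com/cat.jpg"}},
]
print(history_to_messages(chat_history, "fruit"))
```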

Enabling chat history allows you to loop over its content and reconstruct the user/assistant interactions before adding the most recent description. When you run the flow, you get:

The third option will give you a GUI to test your flow:

As you can probably tell, this requires Streamlit. The first time you run this flow, check the terminal for instructions about the packages to install. When you are finished, press CTRL-C in the terminal.

Now that we know the chat flow works, we can create the Docker image.

⚠️ Important: a chat flow is not required to build the Docker image; we only add it here to illustrate the user interface that the Docker image can present to the user; you can always call your flow using an HTTP endpoint, chat flow or not

Generating the Docker image

Before creating the Docker image, ensure your Python requirements.txt file in your flow’s folder has the following content:

promptflow
promptflow-tools
azure-search-documents

We need promptflow-tools to support tools like the embedding tool in the container. We also need azure-search-documents to use in the custom Python tool.

To build the flow as a Docker image, you should be able to use the build icon and select Build as Docker:

However, in my case, that did not result in any output to build a Docker image. This is a temporary issue from the 1.6 version of the extension and will be fixed. For now, I recommend building the image with the command line tool:

pf flow build --source <path-to-your-flow-folder> --output <your-output-dir> --format docker

I ran the following command in my flow folder:

pf flow build --source .  --output ./docker --format docker

That resulted in a docker folder like below:

Note that this copies your flow’s files to a flow folder under the docker folder. Ensure that requirements.txt in the docker/flow folder matches requirements.txt in your original flow folder (it should).

You can now cd into the docker folder and run the following command. Don’t forget the . at the end:

docker build -t YOURTAG .

In my case, I used:

docker build -t gbaeke/pfimage .

After running the above command, you might get an error. I got: ERROR: failed to solve... I fixed that by modifying the Dockerfile. Move the RUN apt-get line above the RUN conda create line and add gcc:

# syntax=docker/dockerfile:1
FROM docker.io/continuumio/miniconda3:latest

WORKDIR /

COPY ./flow /flow

RUN apt-get update && apt-get install -y runit gcc

# create conda environment
RUN conda create -n promptflow-serve python=3.9.16 pip=23.0.1 -q -y && \
    conda run -n promptflow-serve \
.......

After this modification, the docker build command ran successfully.

Running the image

The image contains the connections you created. Remember we created an Azure OpenAI connection and a custom connection. Connections contain both config and secrets. Although the config is available in the image, the secrets are not. You need to provide the secrets as environment variables.

You can find the names of the environment variables in the settings.json file:

{
  "OPEN_AI_CONNECTION_API_KEY": "",
  "AZURE_AI_SEARCH_CONNECTION_KEY": ""
}

Run the container as shown below and replace OPENAIKEY and AISEARCHKEY with the key to your Azure OpenAI resource and Azure AI Search resource. In the container, the code listens on port 8080 so we map that port to port 8080 on the host:

docker run -itp 8080:8080 -e OPEN_AI_CONNECTION_API_KEY=OPENAIKEY \
  -e AZURE_AI_SEARCH_CONNECTION_KEY=AISEARCHKEY gbaeke/pfimage

When you run the above command, you get the following output (some parts removed):

finish  run  supervise
Azure_AI_Search_Connection.yaml  open_ai_connection.yaml
{
    "name": "open_ai_connection",
    "module": "promptflow.connections", 
    ......
    "api_type": "azure",
    "api_version": "2023-07-01-preview"
}
{
    "name": "Azure AI Search Connection",
    "module": "promptflow.connections",
    ....
    },
    "secrets": {
        "key": "******"
    }
}
start promptflow serving with worker_num: 8, worker_threads: 1
[2023-12-14 12:55:09 +0000] [51] [INFO] Starting gunicorn 20.1.0
[2023-12-14 12:55:09 +0000] [51] [INFO] Listening at: http://0.0.0.0:8080 (51)
[2023-12-14 12:55:09 +0000] [51] [INFO] Using worker: sync
...

You should now be able to send requests to the score endpoint. The screenshot below shows a .http file with the call config and result:

Calling the flow via the container’s score endpoint
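If you prefer Python over a .http file, the call can be sketched with the standard library. This assumes the endpoint path is /score, that the flow input is called description as in this post, and that chat_history can be left out; build_score_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_score_request(description: str, base_url: str = "http://localhost:8080"):
    """Build a POST request for the container's score endpoint."""
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_score_request("cat")
print(request.full_url)  # http://localhost:8080/score

# With the container running, send it and read the JSON reply:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```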

When you browse to http://localhost:8080, you get a chat interface like the one below:

In my case, the chat UI did not work. Although I could enter a description and press ENTER, I did not see the response. The flow was triggered in the background, but the response was missing. Remember that these features, and Prompt Flow on your local machine, are still experimental at the time of writing (December 2023). They will probably change quite a lot in the future or have changed by the time you read this.

Conclusion

Although you can create a flow in the cloud and deploy that flow to an online endpoint, you might want more control over the deployment. Developing the flow locally and building a container image gives you that control. Once the image is built and pushed to a container registry, you can deploy to your environment of choice. That could be Kubernetes, Azure Container Apps or any other environment that supports containers.

Writing your first flow with Prompt Flow in Visual Studio Code

In this blog post, we will create a flow with Prompt Flow in Visual Studio Code. Prompt Flow is a suite of development tools to build LLM-based AI applications. It tries to cover the end-to-end development cycle, including prototyping, testing and deployment to production.

In Prompt Flow, you create flows. Flows link LLMs (large language models), prompts and tools together in an executable workflow. An example of such a flow is shown below:

The flow above (basically a directed acyclic graph, or DAG, of functions) sends its input, a description to search for an image, to a tool that embeds the description with an Azure OpenAI embedding model. The embedding is used as input to a Python tool that does a similarity search in Azure AI Search. The search returns three results. The original input, together with the query results, is subsequently handed to an LLM (above, the final_result node) that hopefully picks the correct image url.
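The idea of a flow as a graph of functions can be sketched in a few lines of plain Python. This is not how Prompt Flow is implemented; the node functions are stubs with made-up return values, and the standard library's graphlib only serves to run each node after the nodes it depends on.

```python
from graphlib import TopologicalSorter

# Stub nodes standing in for the real tools; return values are made up
def embedding(inputs):
    return [0.1, 0.2, 0.3]

def aisearch(inputs):
    return [{"name": "cat.jpg", "url": "https://example.com/cat.jpg"}]

def final_result(inputs):
    return inputs["aisearch"][0]["url"]

nodes = {"embedding": embedding, "aisearch": aisearch, "final_result": final_result}
# Each node maps to the set of nodes whose output it needs
depends_on = {
    "embedding": {"description"},
    "aisearch": {"embedding"},
    "final_result": {"description", "aisearch"},
}

def run_flow(description: str) -> str:
    outputs = {"description": description}
    # static_order() yields every node after its dependencies
    for name in TopologicalSorter(depends_on).static_order():
        if name in nodes:
            outputs[name] = nodes[name]({dep: outputs[dep] for dep in depends_on[name]})
    return outputs["final_result"]

print(run_flow("cat"))
```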

Although you could write your own API that does all of the above, Prompt Flow allows you to visually build, run and debug a flow that has input and output. When you are happy with the flow, you can convert it to an API. One of the ways to host the API is via a container.

We will build this flow on our local machine and host it as a container. Note that Prompt Flow can also be used from the portal using Azure Machine Learning or Azure AI Studio.

👉 Another blog post will describe how to build and run the container

Installing Prompt Flow on your machine

To install Prompt Flow you will need Python on your machine. Use Python 3.9 or higher. I use Python 3.11 on an Apple M2. Check the full installation instructions here. Without using a Python virtual environment, you can just run the following command to install Prompt Flow:

pip install promptflow promptflow-tools

Next, run pf -v to check the installation.

⚠️ Do not forget to install promptflow-tools because it enables the embedding tool, llm tool and other tools to be used as nodes in the flow; also ensure this package is installed in the container image that will be created for this flow

In Visual Studio Code, install the Prompt flow for VS Code extension. It has the VS Code Python Extension as a prerequisite. Be sure to check the Prompt Flow Quick Start for full instructions.

We will mainly use the VS Code extension. Note that the pf command can be used to perform many of the tasks we will discuss below (e.g., creating connections, running a flow, etc.).

Creating an empty flow

In VS Code, ensure you opened an empty folder or create a new folder. Right click and select New flow in this directory. You will get the following question:

Select Empty Flow. This creates a file called flow.dag.yaml with the following content:

If you look closely, you will see a link to open a Visual editor. Click that link:

Visual editor with empty input and output and blank canvas

We can now add input(s) and output(s) and add the nodes in between.

Inputs and outputs

Inputs have a type and a value. Add a string input called description:

One string input: a description (of an image, like **creature** or **fruit**)

When you later run the flow, you can type the description in the Value textbox. When the flow is converted to an API, the API will accept a description in the POST body.

Next, add an output called url. In the end, the flow returns a url to an image that matches the description:

The value of the output will come from another node. We still have to add those. If you click the Value dropdown list, you will only be able to select the input value for now. You can do that and click the run icon. Save your flow before running it.

Running the flow with output set to the input

When you click the run button, a command will be run in the terminal that runs the flow:

python3 -m promptflow._cli._pf.entry flow test --flow /Users/geertbaeke/projects/promptflow/images/blogpost --user-agent "prompt-flow-extension/1.6.0 (darwin; arm64) VSCode/1.85.0"

The output of this command is:

Output of the flow is JSON, here with just the **url**

Although this is not very useful, the flow runs and produces a result. The output is our input. We can now add nodes to do something useful.

Creating an embedding from the description

We need to embed the description to search for similar descriptions in an Azure AI Search index. If you are not sure what embeddings are, check Microsoft Learn for a quick intro. In short, it’s a bunch of numbers that represents the meaning of a piece of text. We can use the numbers of the description to compare it to the sets of numbers of image descriptions to see how close they are.
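To give a feel for what "how close they are" means, here is a small sketch using cosine similarity, one common way to compare two vectors. The vectors are toy values with three dimensions, and which metric a search index actually uses depends on how the index is configured.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; real embeddings have many more dimensions
query = [1.0, 0.0, 1.0]
cat_image = [0.9, 0.1, 0.8]
car_image = [0.0, 1.0, 0.1]

print(cosine_similarity(query, cat_image) > cosine_similarity(query, car_image))  # True
```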

To create an embedding, we need access to an Azure OpenAI embedding model. Such a model takes text as input and returns the bunch of numbers we talked about. This model returns 1536 numbers, aka dimensions.

To use the model, we will need an Azure OpenAI resource’s endpoint and key. If you do not have an Azure OpenAI resource in Azure, create one and deploy the text-embedding-ada-002 model. In my example, the deployment is called embedding:

With the Azure resources created, we can add a connection in Prompt Flow that holds the OpenAI endpoint and key:

Click the Prompt Flow extension icon and click + next to Azure OpenAI in the Connections section:

A document will open that looks like the one below:

Fill in the name and api_base only. The api_base is the https url to your Azure OpenAI instance. It’s something like https://OPENAIRESOURCENAME.openai.azure.com/. Do not provide the api_key. When you click Create connection (the smallish link at the bottom), you will be asked for the key.

After providing the key, the connection should appear under the Azure OpenAI section. You will need this connection in the embedding tool to point to the embedding model to use.

In the Prompt Flow extension pane, now click + next to Embedding in the TOOLS section:

You will be asked for the tool’s name (top of VS Code window). Provide a name (e.g., embedding) and press enter. Select the connection you just created, the deployment name of your embedding model and the input. The input is the description we configured in the flow’s input node. We want to embed that description. The output of this tool will be a list of floating point numbers, a vector, of 1536 dimensions.

The moment you set the input of the embedding, the input node will be connected to the embedding node on the canvas. To check if embedding works, you can connect the output of the embedding node to the url output and run the flow. You should then see the vector as output. The canvas looks like:

Of course, we will need to supply the embedding to a vector search engine, not to the output. In our case, that is Azure AI Search. Let’s try that…

⚠️ Instead of connecting the embedding to the output, you can simply debug the embedding by clicking the debug icon in the embedding tool. The tool will be executed with the value of the input. The result should be a bunch of numbers in your terminal:

Searching for similar images

This section is a bit more tricky because you need an Azure AI Search index that allows you to search for images using a description of an image. To create such an index, see https://atomic-temporary-16150886.wpcomstaging.com/2023/12/09/building-an-azure-ai-search-index-with-a-custom-skill/.

Although you can use a Vector DB Lookup tool that supports Azure AI Search, we will create a custom Python tool that does the same thing. The Python tool uses the azure-search-documents Python library to perform the search. Learning how to use Python tools is important to implement logic there is no specific tool for.

First, we will create a custom connection that holds the name of our Azure AI Search instance and a key to authenticate.

Similar to the Azure OpenAI connection, create a custom connection:

After clicking +, a document opens. Modify it as follows:

Like before, set a name. In a custom connection, you can have configs and secrets. In configs add the Azure AI Search endpoint and index name. In the secrets set key to <user-input>. When you click Create connection, you will be asked to supply the key.

⚠️ Connection information is saved to a local SQLite database in the .promptflow folder in your home folder

We can now add a Python tool. In TOOLS, next to Python click +. Give the tool a name and select new file. You should get a new Python file in your code with the filename set to <YOURTOOLNAME>.py. The code without comments is below:

from promptflow import tool

@tool
def my_python_tool(input1: str) -> str:
    return 'hello ' + input1

This tool takes a string input and returns a string. The @tool decorator is required.

We need to change this code to get the custom connection information, query Azure AI Search and return search results as a list. The code is below:

from promptflow import tool
from promptflow.connections import CustomConnection
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

@tool
def my_python_tool(vector: list, ai_conn: CustomConnection) -> list:
    ai_conn_dict = dict(ai_conn)
    endpoint = ai_conn_dict['endpoint']
    key = ai_conn_dict['key']
    index = ai_conn_dict['index']

    # query azure ai search
    credential = AzureKeyCredential(key)
    client = SearchClient(endpoint=endpoint,
                          index_name=index,
                          credential=credential)
    vector_query = VectorizedQuery(vector=vector, k_nearest_neighbors=3, fields="textVector", exhaustive=True)
    results = client.search(
        search_text=None,
        vector_queries=[vector_query],
        select=["name", "description", "url"]
    )

    # convert results to json list
    results = [dict(result) for result in results]

    return results

The function has two parameters: a vector of type list to match the output of the embedding tool, and a variable of type CustomConnection. The custom connection can be converted to a dict to retrieve both the configs and the secret.

Next, we use the configs and secret to perform the query with a SearchClient. The query only returns three fields from our index: name, description and url. The result returned from Azure AI Search is converted to a list and returned.

When you save the Python file and go back to your flow, you should see the Python tool (aisearch) with the vector and ai_conn field. If not, click the regenerate link. Set it as below:

The input to the Python tool is the output from the embedding tool. We also pass in the custom connection to provide the configs and key to the tool.

You can set the output of the entire flow (url) to the output of the Python tool to check the results of the search when you run the flow:

Running the flow with Python tool’s output as output

I ran the flow with a description equal to cat. A list of three JSON objects is returned. The first search result is the url to cat.jpg but there are other results as well (not shown above).

Adding an LLM tool

Although we could just pick the first result from the search, that would not work very well. Azure AI Search will always return a result, even if it does not make much sense. In a search for nearest neighbors, your nearest neighbor could be very far away! 😀

For example, if I search for person with a hat, I will get a result even though I do not have such a picture in my index. It simply finds vectors that are “closest” but semantically “far” away from my description. That is bound to happen with just a few images in the index.

An LLM can look at the original description and see if it matches one of the search results. It might pick the 3rd result if it fits better. It might also decide to return nothing if there is no match. In order to do so, we will need a good prompt.

Click + LLM at the top left of the flow to add an LLM tool:

Give the LLM tool a name and select new file. In the flow editor, set the LLM model information:

You can reuse the connection that was used for the embedding. Ensure you have deployed a chat model in your Azure OpenAI resource. I deployed gpt-4 and called the deployment gpt-4 as well. I also set temperature to 0.

The inputs of the node do not make much sense. We do not need chat history for instance. The inputs come from a .jinja2 file that was created for you. The file has the name of the LLM tool. Following the example above, the name is pick_result.jinja2. Open that file and replace it with the following contents and save it:

system:
You return the url to an image that best matches the user's question. Use the provided context to select the image. Only return the url. When no
matching url is found, simply return NO_IMAGE

user:
{{description}}

context : {{search_results}}

The file defines a system message to tell the LLM what to do. The input from the user is the description from the input node. We provide extra context to the LLM as well (the output from search). The {{…}} serve as placeholders to inject data into the prompt.

When you save the file and go back to the flow designer, you should see description and search_results as parameters. Set them as follows:

In addition, set the output of the flow output node to the output of the LLM node:

Save your flow and run it. In my case, with a description of cat I get the following output:

If I use man with a hat as input, I get:

LLM did not find a URL to match the description

Using a prompt variant

Suppose we want to try a different prompt that returns JSON instead of text. To try that, we can create a prompt variant.

In the LLM node, click the variants icon:

You will see a + icon to create a new variant. Click it.

The variant appears under the original variant and is linked to a new file: pick_result_variant_1.jinja2. I have also set the variant as default. Let’s click the new file to open it. Add the following prompt:

system:
You return the url to an image that best matches the user's question. Use the provided context to select the image. 
Return the url and name of the file as JSON. Here is an example of a response. Do not use markdown in the response. Use pure JSON.
{
  "url": "http://www.example.com/images/1.jpg",
  "name": "1.jpg"
}

If there is not matching image, return an empty string in the JSON:
{
  "url": ""
}

user:
{{description}}

context : {{search_results}}

This prompt should return JSON instead of just the url or NO_IMAGE. To test this, run the flow and select Use default variant for all nodes. When I run the flow with description cat, I get the following output:

Because the flow’s output is already JSON, the string representation of the JSON result is used. Adding an extra Python tool that parses the JSON and outputs both the URL and file name might be a good idea here.

You can modify and switch between the prompts and see which one works best. This is especially handy when you are prototyping your flow.

Conclusion

On your local machine, Prompt Flow is easy to install and get started with. In this post we built a relatively simple flow that did not require a lot of custom code. We also touched on using variants, to test different prompts and their outcome.

In a follow-up post, we will take a look at turning this flow into a container. Stay tuned! 📺

Building an Azure AI Search index with a custom skill

In this post, we will take a look at building an Azure AI Search index with a custom skill. We will use the Azure AI Search Python SDK to do the following:

create a search index: a search index contains content to be searched
create a data source: a datasource tells an Azure AI Search indexer where to get input data
create a skillset: a skillset is a collection of skills that process the input data during the indexing process; you can use built-in skills but also build your own skills
create an indexer: the indexer creates a search index from input data in the data source; it can transform the data with skills

If you are more into videos, I already created a video about this topic. In the video, I use the REST API to define the resources above. In this post, I will use the Python SDK.

Azure AI Search with custom GPT-4 vision skill

What do we want to achieve?

We want to build an application that allows a user to search for images with text or a similar image like in the diagram below:

The application uses an Azure AI Search index to provide search results. An index is basically a collection of JSON documents that can be searched with various techniques.

The input data to create the index is just a bunch of .jpg files in Azure Blob Storage. The index will need fields to support the two different types of searches (text and image search):

a text description of the image: we will need to generate the description from the image; we will use GPT-4 Vision to do so; the description supports keyword-based searches
a text vector of the description: with text vectors, we can search for descriptions similar to the user’s query; it can provide better results than keyword-based searches alone
an image vector of the image: with image vectors, we can supply an image and search for similar images in the index

I described building this application in a previous blog post. In that post, we pushed the index content to the index. In this post, we create an indexer that pulls in the data, potentially on a schedule. Using an indexer is recommended.

Creating the index

If you have an Azure subscription, first create an Azure AI Search resource. The code we write requires at least the basic tier.

Although you can create the index in the portal, we will create it using the Python SDK. At the time of writing (December 2023), you have to use a preview version of the SDK to support integrated vectorization. The notebook we use contains instructions about installing this version. The notebook is here: https://github.com/gbaeke/vision/blob/main/image_index/indexer-sdk.ipynb

The notebook starts with the necessary imports and also loads environment variables via a .env file. See the README of the repo to learn about the required variables.

To create the index, we define a blog_index function that returns an index definition. Here’s the start of the function:

def blog_index(name: str):
    fields = [
        SearchField(name="path", type=SearchFieldDataType.String, key=True),
        SearchField(name="name", type=SearchFieldDataType.String),
        SearchField(name="url", type=SearchFieldDataType.String),
        SearchField(name="description", type=SearchFieldDataType.String),
        SimpleField(name="enriched", type=SearchFieldDataType.String, searchable=False),  
        SearchField(
            name="imageVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1024,
            vector_search_profile="myHnswProfile"
        ),
        SearchField(
            name="textVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1536,
            vector_search_profile="myHnswProfile"
        ),
    ]

Above, we define an array of fields for the index. We will have 7 fields. The first three fields will be retrieved from blob storage metadata:

path: base64-encoded url of the file; will be used as unique key
name: name of the file
url: full url of the file in Azure blob storage

The link between these fields and the metadata is defined in the indexer we will create later.

Next, we have the description field. We will generate the image description via GPT-4 Vision during indexing. The indexer will use a custom skill to do so.

The enriched field is there for debugging. It will show the enrichments by custom or built-in skills. You can remove that field if you wish.

To finish, we have vector fields. These fields are designed to hold arrays of a specific size:

imageVector: a vector field that can hold 1024 values; the image vector model we use outputs 1024 dimensions
textVector: a vector field that can hold 1536 values; the text vector model we use outputs that number of dimensions

Note that the vector fields references a search profile. We create that in the next block of code in the blog_index function:

vector_config = VectorSearch(  
        algorithms=[  
            HnswVectorSearchAlgorithmConfiguration(  
                name="myHnsw",  
                kind=VectorSearchAlgorithmKind.HNSW,  
                parameters=HnswParameters(  
                    m=4,  
                    ef_construction=400,  
                    ef_search=500,  
                    metric=VectorSearchAlgorithmMetric.COSINE,  
                ),  
            ),  
            ExhaustiveKnnVectorSearchAlgorithmConfiguration(  
                name="myExhaustiveKnn",  
                kind=VectorSearchAlgorithmKind.EXHAUSTIVE_KNN,  
                parameters=ExhaustiveKnnParameters(  
                    metric=VectorSearchAlgorithmMetric.COSINE,  
                ),  
            ),  
        ],  
        profiles=[  
            VectorSearchProfile(  
                name="myHnswProfile",  
                algorithm="myHnsw",  
                vectorizer="myOpenAI",  
            ),  
            VectorSearchProfile(  
                name="myExhaustiveKnnProfile",  
                algorithm="myExhaustiveKnn",  
                vectorizer="myOpenAI",  
            ),  
        ],  
        vectorizers=[  
            AzureOpenAIVectorizer(  
                name="myOpenAI",  
                kind="azureOpenAI",  
                azure_open_ai_parameters=AzureOpenAIParameters(  
                    resource_uri="AZURE_OPEN_AI_RESOURCE",  
                    deployment_id="EMBEDDING_MODEL_NAME",  
                    api_key=os.getenv('AZURE_OPENAI_KEY'),  
                ),  
            ),  
        ],  
    )

Above, vector_config is an instance of the VectorSearch object, which holds algorithms, profiles and vectorizers:

algorithms: Azure AI search supports both HNSW and exhaustive to search for nearest neighbors to an input vector; above, both algorithms are defined; they both use cosine similarity as the distance metric
vectorizers: this defines the integrated vectorizer and points to an Azure OpenAI resource and embedding model. You need to deploy that model in Azure OpenAI and give it a name; at the time of writing (December 2023), this feature was in public preview
profiles: a profile combines an algorithm and a vectorizer; we create two profiles, one for each algorithm; the vector fields use the myHnswProfile profile.

Note: using HNSW on a vector field, designed to perform approximate nearest neighbor searches, still allows you to do an exhaustive search; the notebook contains sample searches at the bottom, which use exhaustive searches to search the entire vector space; note that the reverse is not possible (using HNSW when index on field is set as exhaustive).

We finish the function with the code below:

    semantic_config = SemanticConfiguration(  
        name="my-semantic-config",  
        prioritized_fields=PrioritizedFields(  
            prioritized_content_fields=[SemanticField(field_name="description")]  
        ),  
    )

    semantic_settings = SemanticSettings(configurations=[semantic_config])

    return SearchIndex(name=name, fields=fields, vector_search=vector_config, semantic_settings=semantic_settings)

Above, we specify a semantic_config. It is used to inform the semantic reranker abiut the fields in our index with valuable data. Here, we use the description field. The config is used to create an instance of type Semantic_Settings. You also have to enable the semantic reranker in Azure AI Search to enable this feature.

The function ends by returning an instance of type SearchIndex, which contains the fields array, the vector configuration and the semantic configuration.

Now we can use the output of this function to create the index:

service_endpoint = "https://acs-geba.search.windows.net"
index_name = "images-sdk"
key = os.getenv("AZURE_AI_SEARCH_KEY")


index_client = SearchIndexClient(service_endpoint, AzureKeyCredential(key))
search_client = SearchClient(service_endpoint, index_name, AzureKeyCredential(key))
index = blog_index(index_name)

# create the index
try:
    index_client.create_or_update_index(index)
    print("Index created or updated successfully")
except Exception as e:
    print("Index creation error", e)

The important part here is the creation of a SearchIndexClient that authenticates to our Azure AI Search resource. We use that client to create_or_update our index. That function requires a SearchIndex parameter, provided by the blog_index function.

When that call succeeds, you should see the index in the portal. Text and vector fields are searchable.

The vector profiles should be present:

Click on an algorithm or vectorizer. It should match the definition in our code.

Now we can define the data source, skillset and indexer.

Data source

Our images are stored in Azure Blob Storage. The data source needs to point to that resource and specify a container. We can use the following code:

# Create a data source 
ds_client = SearchIndexerClient(service_endpoint, AzureKeyCredential(key))
container = SearchIndexerDataContainer(name="images")
data_source_connection = SearchIndexerDataSourceConnection(
    name=f"{index_name}-blob",
    type="azureblob",
    connection_string=os.getenv("STORAGE_CONNNECTION_STRING"),
    container=container
)
data_source = ds_client.create_or_update_data_source_connection(data_source_connection)

print(f"Data source '{data_source.name}' created or updated")

The code is pretty self-explanatory. The data source is shown in the portal as below:

Skillset with two skills

Before we create the indexer, we define a skillset with two skills:

AzureOpenAIEmbeddingSkill: a built-in skill that uses an Azure OpenAI embedding model and takes text as input; it returns a vector (embedding) of 1536 dimensions; this skill is not free; you will be billed for the vectors you create via your Azure OpenAI resource
WebApiSkill: a custom skill that points to an endpoint that you need to build and host; you define the inputs and outputs of the custom skill; my custom skill runs in Azure Container Apps but it can run anywhere. Often, skills are implemented as an Azure Function.

The code starts as follows:

skillset_name = f"{index_name}-skillset"

embedding_skill = AzureOpenAIEmbeddingSkill(  
    description="Skill to generate embeddings via Azure OpenAI",  
    context="/document",  
    resource_uri="https://OPEN_AI_RESOURCE.openai.azure.com",  
    deployment_id="DEPLOYMENT_NAME_OF_EMBEDDING MODEL",  
    api_key=os.getenv('AZURE_OPENAI_KEY'),  
    inputs=[  
        InputFieldMappingEntry(name="text", source="/document/description"),  
    ],  
    outputs=[  
        OutputFieldMappingEntry(name="embedding", target_name="textVector")  
    ],  
)

Above, we define the skillset and the embedding_skill. The AzureOpenAIEmbeddingSkill points to a deployed text-embedding-ada-002 embedding model. Use the name of your deployment, not the model name.

A skillset operates within a context. The context above is the entire document (/document) but that’s not necessarily the case for other skills. The input to the embedding skill is our description field (/document/description). The output will be a vector. The target_name above is some sort of a temporary name used during the so-called enrichment process of the indexer. We will need to configure the indexer to write this field to the index.

The question is: “Where does the description come from?”. The description comes from the WebApiSkill. Because the embedding skill needs the description field generated by the WebApiSkill, the WebApiSkill will run first. Here is the custom web api skill:

custom_skill = WebApiSkill(
    description="A custom skill that creates an image vector and description",
    uri="YOUR_ENDPOINT",
    http_method="POST",
    timeout="PT60S",
    batch_size=4,
    degree_of_parallelism=4,
    context="/document",
    inputs=[
        InputFieldMappingEntry(name="url", source="/document/url"),
    ],
    outputs=[
        OutputFieldMappingEntry(name="embedding", target_name="imageVector"),
        OutputFieldMappingEntry(name="description", target_name="description"),
    ],
)

The input to the custom skill is the url to our image. That url is posted to the endpoint you define in the uri field. You can control how many inputs are sent in one batch and how many batches are sent concurrently. The inputs have to be sent in a specific format.

This skill also operates at the document level and creates two new fields. The contents of those fields are generated by your custom endpoint and returned as embedding and description. They are mapped to imageVector and description. Again, those fields are temporary and need to be written to the index by the indexer.

To see the code of the custom skill, check https://github.com/gbaeke/vision/tree/main/img_vector_skill. That skill is written for demo purposes and was not thoroughly vetted to be used in production. Use at your own risk. In addition, GPT-4 Vision requires an OpenAI key (not Azure OpenAI) and currently (December 2023) allows 100 calls per day! You currently cannot use this at scale. Azure also provides image captioning models that might fit the purpose.

Now we can create the skillset:

skillset = SearchIndexerSkillset(  
    name=skillset_name,  
    description="Skillset to generate embeddings",  
    skills=[embedding_skill, custom_skill],  
)

client = SearchIndexerClient(service_endpoint, AzureKeyCredential(key))
client.create_or_update_skillset(skillset)
print(f"Skillset '{skillset.name}' created or updated")

The above code results in the following:

Indexer

The indexer is the final piece of the puzzle and brings the data source, index and skillset together:

indexer_name = f"{index_name}-indexer"

indexer = SearchIndexer(  
    name=indexer_name,  
    description="Indexer to index documents and generate description and embeddings",  
    skillset_name=skillset_name,  
    target_index_name=index_name,
    parameters=IndexingParameters(
        max_failed_items=-1
    ),
    data_source_name=data_source.name,  
    # Map the metadata_storage_name field to the title field in the index to display the PDF title in the search results  
    field_mappings=[
        FieldMapping(source_field_name="metadata_storage_path", target_field_name="path", 
            mapping_function=FieldMappingFunction(name="base64Encode")),
        FieldMapping(source_field_name="metadata_storage_name", target_field_name="name"),
        FieldMapping(source_field_name="metadata_storage_path", target_field_name="url"),
    ],
    output_field_mappings=[
        FieldMapping(source_field_name="/document/textVector", target_field_name="textVector"),
        FieldMapping(source_field_name="/document/imageVector", target_field_name="imageVector"),
        FieldMapping(source_field_name="/document/description", target_field_name="description"),
    ],
)

indexer_client = SearchIndexerClient(service_endpoint, AzureKeyCredential(key))  
indexer_result = indexer_client.create_or_update_indexer(indexer)

Above, we create an instance of type SearchIndexer and set the indexer’s name, the data source name, the skillset name and the target index.

The most important parts are the field mappings and the output field mappings.

Field mappings take data from the indexer’s data source and map them to a field in the index. In our case, that’s content and metadata from Azure Blob Storage. The metadata fields in the code above are described in the documentation. In a field mapping, you can configure a mapping function. We use the base64Encode mapping function for the path field.

Output field mappings take new fields created during the enrichment process and map them to fields in the index. You can see that the fields created by the skills are mapped to fields in the index. Without these mappings, the skillsets would generate the data internally but the data would never appear in the index.

Once the indexer is defined, it gets created (or updated) using an instance of type SearchIndexerClient.

Note that we set a parameter in the index, max_failed_items, to -1. This means that the indexer process keeps going, no matter how many errors it produces. In the indexer screen below, you can see there was one error:

The error happened because the image vectorizer in the custom web skill threw an error on one of the images.

Using an indexer has several advantages:

Indexing is a background process and can run on a schedule; there is no need to schedule your own indexing process
Indexers keep track of what they indexed and can index only new data; with your own code, you have to maintain that state; failed documents like above are not reprocessed
Depending on the source, indexers see deletions and will remove entries from the index
Indexers can be easily reset to trigger a full index
Indexing errors are reported and errors can be sent to a debugger to inspect what went wrong

Testing the index

We can test the index by performing a text-based search that uses the integrated vectorizer:

# Pure Vector Search
query = "city"  
  
search_client = SearchClient(service_endpoint, index_name, credential=AzureKeyCredential(key))
vector_query = VectorizableTextQuery(text=query, k=1, fields="textVector", exhaustive=True)

  
results = search_client.search(  
    search_text=None,  
    vector_queries= [vector_query],
    select=["name", "description", "url"],
    top=1
)  

# print selected fields from dictionary
for result in results:
    print(result["name"])
    print(result["description"])
    print(result["url"])
    print("")

Above, we search for city (in the query variable). The VectorizableTextQuery class (in preview) takes the plain text in the query variable and vectorizes it for us with the embedding model defined in the integrated vectorizer. In addition, we specify how many results to return (1 nearest neighbors) and that we want to search all vectors (exhaustive).

Note: remember that the vector field was configured for HNSW; we can switch to exhaustive as shown above

Next, search_client.search performs the actual search. It only provides the vector query, which results in a pure similarity search with the query vector. search_text is set to None. Set search string to the query if you want to do a hybrid search. The notebook contains additional examples that also does a keyword and semantic search with highlighting.

The search gives the following result (selected fields: name, description, url):

city.jpg
This is an image of the London skyline, featuring a mix of modern skyscrapers and historical buildings. Prominent among the skyscrapers are the Leadenhall Building, also known as the "Cheesegrater," and the rounded, distinctive shape of 30 St Mary Axe, commonly referred to as "The Gherkin." Further in the background, the towers of Canary Wharf can be seen. The view is clear and taken on a day with excellent visibility.
https://stgebaoai883.blob.core.windows.net/images/city.jpg

The image the URL points to is:

In the repo’s search-client folder, you can find a Streamlit app to search for and display images and dump the entire search result object. Make sure you install all the packages in requirements.txt and the preview Azure AI Search package from the whl folder. Simply type streamlit run app.py to run the app:

Conclusion

In this post, we demonstrated the use of the Azure AI Search Python SDK to create an indexer that takes images as input, create new fields with skills, and write those fields + metadata to an index.

We touched on the advantages of using an indexer versus your own indexing code (pull versus push).

With this code and some sample images, you should be able to build an image search application yourself.

Finding images with text and image queries with the help of GPT-4 Vision

With the gpt-4-vision-preview model available at OpenAI, it was time to build something with it. I decided to use it as part of a quick solution that can search for images with text, or by providing a similar image.

We will do the following:

Describe a collection of images. To generate the description, GPT-4 Vision is used
Create a text embedding of the description with the text-embedding-ada-002 model
Create an image embedding using the vectorizeImage API, part of Azure AI Computer Vision
Save the description and both embeddings to an Azure AI Search index
Search for images with either text or a similar image

The end result should be that when I search for desert plant, I get an image of a cactus or similar plant. When I provide a picture of an apple, I should get an apple or other fruit as a result. It’s basically Google image and reverse image search.

Let’s see how it works and if it is easy to do. The code is here: https://github.com/gbaeke/vision. The main code is in a Jupyter notebook in the image_index folder.

A word on vectors and vectorization

When we want to search for images using text or find similar images, we use a technique that involves turning both text and images into a form that a computer can understand easily. This is done by creating vectors. Think of vectors as a list of numbers that describe the important parts of a text or an image.

For text, we use a tool called ‘text-embedding-ada-002’ which changes the words and sentences into a vector. This vector is like a unique fingerprint of the text. For images, we use something like Azure’s multi-modal embedding API, which does the same thing but for pictures. It turns the image into a vector that represents what’s in the picture.

After we have these vectors, we store them in a place where they can be searched. We will use Azure AI Search. When you search, the system looks for the closest matching vectors – it’s like finding the most similar fingerprint, whether it’s from text or an image. This helps the computer to give you the right image when you search with words or find images that are similar to the one you have.

Getting a description from an image

Although Azure has Computer Vision APIs to describe images, GPT-4 with vision can do the same thing. It is more flexible and easier to use because you have the ability to ask for what you want with a prompt.

To provide an image to the model, you can provide a URL or the base64 encoding of an image file. The code below uses the latter approach:

def describe_image(image_file: str) -> str:
    with open(f'{image_file}', 'rb') as f:
        image_base64 = base64.b64encode(f.read()).decode('utf-8')
        print(image_base64[:100] + '...')

    print(f"Describing {image_file}...")

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image in detail"},
                {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_base64}",
                },
                },
            ],
            }
        ],
        max_tokens=500,  # default max tokens is low so set higher
    )

    return response.choices[0].message.content

As usual, the OpenAI API is very easy to use. Above, we open and read the image and base64-encode it. The base64 encoded file is provided in the url field. A simple prompt is all you need to get the description. Let’s look at the result for the picture below:

The generated description is below:

The image displays a single, healthy-looking cactus planted in a terracotta-colored pot against a pale pink background. The cactus is elongated, predominantly green with some modest blue hues, and has evenly spaced spines covering its surface. The spines are white or light yellow, quite long, and arranged in rows along the cactus’s ridges. The pot has a classic, cylindrical shape with a slight lip at the top and appears to be a typical pot for houseplants. The overall scene is minimalistic, with a focus on the cactus and the pot due to the plain background, which provides a soft contrast to the vibrant colors of the plant and its container.
Description generated by GPT-4 Vision

Embedding of the image

To create an embedding for the image, I decided to use Azure’s multi-modal embedding API. Take a look at the code below:

def get_image_vector(image_path: str) -> list:
    # Define the URL, headers, and data
    url = "https://AI_ACCOUNT.cognitiveservices.azure.com//computervision/retrieval:vectorizeImage?api-version=2023-02-01-preview&modelVersion=latest"
    headers = {
        "Content-Type": "application/octet-stream",
        "Ocp-Apim-Subscription-Key": os.getenv("AZURE_AI_KEY")
    }

    with open(image_path, 'rb') as image_file:
        # Read the contents of the image file
        image_data = image_file.read()

    print(f"Getting vector for {image_path}...")

    # Send a POST request
    response = requests.post(url, headers=headers, data=image_data)

    # return the vector
    return response.json().get('vector')

The code uses an environment variable to get the key to an Azure AI Services multi-service endpoint. Check the README.md in the repository for a sample .env file.

The API generates a vector with 1024 dimensions. We will need that number when we create the Azure AI Search index.

Note that this API can accept a url or the raw image data (not base64-encoded). Above, we provide the raw image data and set the Content-Type properly.

Generating the data to index

In the next step, we will get all .jpg files from a folder and do the following:

create the description
create the image vector
create the text vector of the description

Check the code below for the details:

# get all *.jpg files in the images folder
image_files = [file for file in os.listdir('./images') if file.endswith('.jpg')]

# describe each image and store filename and description in a list of dicts
descriptions = []
for image_file in image_files:
    try:
        description = describe_image(f"./images/{image_file}")
        image_vector = get_image_vector(f"./images/{image_file}")
        text_vector = get_text_vector(description)
        
        descriptions.append({
            'id': image_file.split('.')[0], # remove file extension
            'fileName': image_file,
            'imageDescription': description,
            'imageVector': image_vector,
            'textVector': text_vector
        })
    except Exception as e:
        print(f"Error describing {image_file}: {e}")

# print the descriptions but only show first 5 numbers in vector
for description in descriptions:
    print(f"{description['fileName']}: {description['imageDescription'][:50]}... {description['imageVector'][:5]}... {description['textVector'][:5]}...")

The important part is the descriptions list, which is a list of JSON objects with fields that match the fields in the Azure AI Search index we will build in the next step.

The text vector is calculated with the get_text_vector function. It uses OpenAI’s text-embedding-ada-002 model.

Building the index

The code below uses the Azure AI Search Python SDK to build and populate the index in code. You.will need an AZURE_AI_SEARCH_KEY environment variable to authenticate to your Azure AI Search instance.

def blog_index(name: str):
    from azure.search.documents.indexes.models import (
        SearchIndex,
        SearchField,
        SearchFieldDataType,
        SimpleField,
        SearchableField,
        VectorSearch,
        VectorSearchProfile,
        HnswAlgorithmConfiguration,
    )

    fields = [
        SimpleField(name="Id", type=SearchFieldDataType.String, key=True), # key
        SearchableField(name="fileName", type=SearchFieldDataType.String),
        SearchableField(name="imageDescription", type=SearchFieldDataType.String),
        SearchField(
            name="imageVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1024,
            vector_search_profile_name="vector_config"
        ),
        SearchField(
            name="textVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1536,
            vector_search_profile_name="vector_config"
        ),

    ]

    vector_search = VectorSearch(
        profiles=[VectorSearchProfile(name="vector_config", algorithm_configuration_name="algo_config")],
        algorithms=[HnswAlgorithmConfiguration(name="algo_config")],
    )
    return SearchIndex(name=name, fields=fields, vector_search=vector_search)

#  create the index
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.models import VectorizedQuery

service_endpoint = "https://YOUR_SEARCH_INSTANCE.search.windows.net"
index_name = "image-index"
key = os.getenv("AZURE_AI_SEARCH_KEY")

index_client = SearchIndexClient(service_endpoint, AzureKeyCredential(key))
index = blog_index(index_name)

# create the index
try:
    index_client.create_index(index)
    print("Index created")
except Exception as e:
    print("Index probably already exists", e)

The code above creates an index with some string fields and two vector fields:

imageVector: 1024 dimensions (as defined by the Azure AI Computer Vision image embedder)
textVector: 1536 dimensions (as defined by the OpenAI embedding model)

Although not specified in the code, the index will use cosine similarity to perform similarity searches. It’s the default. It will return approximate nearest neighbour (ANN) results unless you create a search client that uses exhaustive search. An exhaustive search searches the entire vector space. The queries near the end of this post use the exhaustive setting.

When the index is created, we can upload documents:

# now upload the documents
try:
    search_client = SearchClient(service_endpoint, index_name, AzureKeyCredential(key))
    search_client.upload_documents(descriptions)
    print("Documents uploaded successfully")
except Exception as e:
    print("Error uploading documents", e)

The upload_documents method uploads the documents in the descriptions Python list to the search index. The upload is actually an upsert. You can run this code multiple times without creating duplicate documents in the index.

Search images with text

To search an image with a text description, a vector query on the textVector is used. The function below takes a text query string as input, vectorizes the query, and performs a similarity search returning the first nearest neighbour. The function displays the description and the image in the notebook:

# now search based on text
def single_vector_search(query: str):
    vector_query = VectorizedQuery(vector=get_text_vector(query), k_nearest_neighbors=1, fields="textVector", exhaustive=True)

    results = search_client.search(
        vector_queries=[vector_query],
        select=["fileName", "imageDescription"],
        
    )

    for result in results:
        print(result['fileName'], result["imageDescription"], sep=": ")

        # show the image
        from IPython.display import Image
        display(Image(f"./images/{result['fileName']}"))
    
single_vector_search("desert plant")

The code searches for an image based on the query desert plant. It returns the picture of the cactus shown earlier. Note that if you search for something there is no image for, like blue car, you will still get a result because we always return a nearest neighbor. Even if your nearest neighbor lives 100km away, it’s still your nearest neighbor. 😀

Return similar images

Since our index contains an image vector, we can search for images similar to a vector of a reference image. The function below takes an image file path as input, calculates the vector for that image, and performs a nearest neighbor search. The function displays the description and image of each document returned. In this case, the code returns two similar documents:

def image_search(image_file: str):

    vector_query = VectorizedQuery(vector=get_image_vector(image_file), k_nearest_neighbors=2, fields="imageVector", exhaustive=True)

    results = search_client.search(
        vector_queries=[vector_query],
        select=["fileName", "imageDescription"],
    )

    for result in results:
        print(result['fileName'], result["imageDescription"], sep=": ")

        # show the image
        from IPython.display import Image
        display(Image(f"./images/{result['fileName']}"))
 
# get vector of another image and find closest match
image_search('rotten-apple.jpg')
image_search('flower.jpeg')

At the bottom, the function is called with filenames of pictures that contain a rotten apple and a flower. The result of the first query is a picture of the apple and banana. The result of the second query is the cactus and the rose. You can debate whether the cactus should be in the results. Some cacti have flowers but some don’t. 😀

Conclusion

The GPT-4 Vision API, like most OpenAI APIs, is very easy to use. In this post, we used it to generate image descriptions to build a simple query engine that can search for images via text or a reference image. Together with their text embedding API and Microsoft’s multi-modal embedding API to create an image embedding, it is relatively straightforward to build these type of systems.

As usual, this is a tutorial with quick sample code to illustrate the basic principle. If you need help building these systems in production, simply reach out. Help is around the corner! 😉

Using Integrated Vectorization in Azure AI Search

The vector search capability of Azure AI Search became generally available mid November 2023. With that release, the developer is responsible for creating embeddings and storing them in a vector field in the index.

However, Microsoft also released integrated vectorization in preview. Integrated vectorization is useful in two ways:

You can define a vectorizer in the index schema. It can be used to automatically convert a query to a vector. This is useful in the Search Explorer in the portal but can also be used programmatically.
You can use an embedding skill for your indexer that automatically vectorizes index fields for you.

First, let’s look at defining a vectorizer in the index definition and using it in the portal for search.

Vector search in the portal

Below is a screenshot of an index with a title and a titleVector field. The index stores information about movies:

The integrated vectorizer is defined in the Vector profiles section:

When you add the profile, you configure the algorithm and vectorizer. The vectorizer simply points to an embedding model in Azure OpenAI. For example:

Note: it’s recommended to use managed identity

Now, from JSON View in Search Explorer, you can perform a vector search. If you see a search field at the top, you can remove that. It’s for full-text search.

Above, the query commencement is converted to a vector by the integrated vectorizer. The vector search comes up with Inception as the first match. I am not sure if you would want to search for movies this way but it proves the point. 😛

Using an embedding skill during indexing

Suppose you have several JSON documents about movies. Below is one example:

{
    "title": "Inception",
    "year": 2010,
    "director": "Christopher Nolan",
    "genre": ["Action", "Adventure", "Sci-Fi"],
    "starring": ["Leonardo DiCaprio", "Joseph Gordon-Levitt", "Ellen Page"],
    "imdb_rating": 8.8
  }

When you have a bunch of these files in Azure Blob Storage, you can use the Import Data wizard to construct an index from these files.

This wizard, at the time of writing, does not create vectors for you. There is another wizard, Import and vectorize data, but it will treat the JSON as any document and store it in a content field. A vector is created from the content field.

We will stick to the first wizard. It will do several things:

create a data source to access the JSON documents in an Azure Storage Account container
infer the schema from the JSON files
propose an index definition that you can alter
create an indexer that indexes the documents on the schedule that you set
add skills like entity extraction; select a simple skill here like translation so you are sure there will be a skillset that the indexer will use

In the step to customize the index definition, ensure you make fields searchable and retrievable as needed. In addition, define a vector field. In my case, I created a titleVector field:

titleVector

When the wizard is finished, the indexer will run and populate the index. Of course, the titleVector field will be empty because there is no process in place that calculates the vectors during indexing.

Let’s fix that. In Skillsets, go the the skillset created by the wizard and click it.

Replace the Skillset JSON definition with the content below and change resourceUri, apiKey and deploymentId as needed. You can also add the embedding skill to the existing array of skills if you want to keep them.

{
  "@odata.context": "https://acs-geba.search.windows.net/$metadata#skillsets/$entity",
  "@odata.etag": "\"0x8DBF01523E9A94D\"",
  "name": "azureblob-skillset",
  "description": "Skillset created from the portal. skillsetName: azureblob-skillset; contentField: title; enrichmentGranularity: document; knowledgeStoreStorageAccount: ;",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "embed",
      "description": null,
      "context": "/document",
      "resourceUri": "https://OPENAI_INSTANCE.openai.azure.com",
      "apiKey": "AZURE_OPENAI_KEY",
      "deploymentId": "EMBEDDING_MODEL",
      "inputs": [
        {
          "name": "text",
          "source": "/document/title"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "titleVector"
        }
      ],
      "authIdentity": null
    }
  ],
  "cognitiveServices": null,
  "knowledgeStore": null,
  "indexProjections": null,
  "encryptionKey": null
}

Above, we want to embed the title field in our document and create a vector for it. The context is set to /document which means that this skill is executed for each document once.

Now save the skillset. This skill on its own will create the vectors but will not save them in the index. You need to update the indexer to write the vector to a field.

Let’s navigate to the indexer:

Click the indexer and go to the Indexer Definition (JSON) tab. Ensure you have an outputFieldMappings section like below:

{
  "@odata.context": "https://acs-geba.search.windows.net/$metadata#indexers/$entity",
  "@odata.etag": "\"0x8DBF01561D9E97F\"",
  "name": "movies-indexer",
  "description": "",
  "dataSourceName": "movies",
  "skillsetName": "azureblob-skillset",
  "targetIndexName": "movies-index",
  "disabled": null,
  "schedule": null,
  "parameters": {
    "batchSize": null,
    "maxFailedItems": 0,
    "maxFailedItemsPerBatch": 0,
    "base64EncodeKeys": null,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "json"
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_path",
      "targetFieldName": "metadata_storage_path",
      "mappingFunction": {
        "name": "base64Encode",
        "parameters": null
      }
    }
  ],
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/titleVector",
      "targetFieldName": "titleVector"
    }
  ],
  "cache": null,
  "encryptionKey": null
}

Above, we map the titleVector enrichment (think of it as something temporary during indexing) to the real titleVector field in the index.

Reset and run the indexer

Reset the indexer so it will index all documents again:

Next, click the Run button to start the indexing process. When it finishes, do a search with Search Explorer and check that there are vectors in the titleVector field. It’s an array of 1536 floating point numbers.

Conclusion

Integrated vectorization is a welcome extra feature in Azure AI Search. Using it in searches is very easy, especially in the portal.

Using the embedding skill is a bit harder, because you need to work with skillset and indexer definitions in JSON and you have to know exactly what you have to add. But once you get it right, the indexer does all the vectorization work for you.

Creating a custom GPT to query any knowledge base with actions

A while ago, OpenAI introduced GPTs. A GPT is a custom version of ChatGPT that combine instructions, extra knowledge, and any combination of skills.

In this tutorial, we are going to create a custom GPT that can answer questions about articles on this blog. In order to achieve that, we will do the following:

create an Azure AI Search index
populate the index with content of the last 50 blog posts (via its RSS feed)
create a custom API with FastAPI (Python) that uses the Azure OpenAI “add your data” APIs to provide relevant content to the user’s query
add the custom API as an action to the custom GPT

The image below shows the properties of the GPT. You need to be a ChatGPT Plus subscriber to create a GPT.

To implement a custom action for the GPT, you need an API with an OpenAPI spec. When you use FastAPI, an OpenAPI JSON document can easily be downloaded and provided to the GPT. You will need to modify the JSON document with a servers section to specify the URL the GPT has to use.

In what follows, we will look at all of the different pieces that make this work. Beware: long post! 😀

Azure AI Search Index

Azure AI Search is a search service you create in Azure. Although there is a free tier, I used the basic tier. The basic tiers allows you to use its semantic reranker to optimise search results.

To create the index and populate it with content, I used the following notebook: https://github.com/gbaeke/custom-gpt/blob/main/blog-index/website-index.ipynb.

The result is an index like below:

The index contains 292 documents although I only retrieve the last 50 blog posts. This is the result of chunking each post into smaller pieces of about 500 tokens with 100 tokens of overlap for each chunk. We use smaller chunks because we do not want to send entire blog posts as content to the large language model (LLM).

Note that the index supports similarity searches using vectors. The contentVector field contains the OpenAI embedding of the text in the content field.

Although vectors are available, we do not have to use vector search. Azure AI search supports simple keyword search as well. Together with the semantic ranker, it can provide more relevant results than keyword search on its own.

Note: in general, vector search will provide better results, especially when combined with keyword search and the semantic ranker

Use the index with Azure OpenAI “add your data”

I have written about the Azure OpenAI “add your data” features before. It provides a wizard experience to add an Azure AI Search index to the Azure OpenAI playground and directly test your index with the model of your choice.

From you Azure OpenAI instance, first open Azure OpenAI Studio:

Go to OpenAI Studio from the Overview page of your Azure OpenAI instance

Note: you still need to complete a form to get access to Azure OpenAI. Currently, it can take around a day before you are allowed to create Azure OpenAI instances in your subscription.

In Azure OpenAI Studio, click Bring your own data from the Home screen:

Select the Azure AI Search index and click Next.

Note: I created the index using the generally available API that supports vector search. The Add your data wizard, at the time of writing, was not updated yet to support these new indexes. That is the reason why vector search cannot be enabled. We will use keyword + semantic search instead. I expect this functionality to be available soon (November/December 2023).

Next, provide field mappings:

These mappings are required because the Add your data feature excepts these standard fields. You should have at least a content field to search. Above, I do not have a file name field because I have indexed blog posts. It’s ok to leave that field blank.

After clicking Next, we get to data management:

Here, we specify the type of search. Semantic means keyword + semantic. In the dropdown list, you can also select keyword search on its own. However, that might give you less relevant results.

Note: for Semantic to work, you need to turn on the Semantic ranker on the Azure AI Search resource. Additionally, you need to create a semantic profile on the index.

Now you can click Next, followed by Save and close. The Azure OpenAI Chat Playground appears with the index added:

You can now start chatting with your data. Select a chat model like gpt-4 or gpt-35-turbo. In Azure OpenAI, you have to deploy these models first and give the deployment a name.

Above, I asked about the OpenAI Assistants API, which is one of the posts on my blog. In the background, the playground performs a search on the Azure AI Search index and provides the results as context to the model. The gpt-35-turbo model answers the user’s question, based on the context coming from the index.

When you are happy with the result, you can export this experience to an Azure Web App of CoPilot Studio (Power Virtual Agents):

In our case, we want to use this configuration from code and provide an API we can add to the custom GPT.

⚠️ It’s import to realise that, with this approach, we will send the final answer, generated by an Azure OpenAI model, to the custom GPT. An alternate approach would be to hand the results of the Azure AI Search query to the custom GPT and let it formulate the answer on its own. That would be faster and less costly. If you also provide the blog post’s URL, ChatGPT can refer to it. However, the focus here is on using any API with a custom GPT so let’s continue with the API that uses the “add your data” APIs.

If you want to hand over Azure AI search results directly to ChatGPT, check out the code in the azure-ai-search folder in the Github repo.

Creating the API

To create an API that uses the index with the model, as configured in the playground, we can use some code. In fact, the playground provides sample code to work with:

‼️ Sadly, this code will not work due to changes to the openai Python package. However, the principle is still the same:

call the chat completion extension API which is specific to Azure; in the code you will see this is as a Python f-string: f"{openai.api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={openai.api_version}"
the JSON payload for this API needs to include the Azure AI Search configuration in a dataSources array.

The extension API will query Azure AI Search for you and create the prompt for the chat completion with context from the search result.

To create a FastAPI API that does this for the custom GPT, I decided to not use the openai package and simply use the REST API. Here is the code:

from fastapi import FastAPI, HTTPException, Depends, Header
from pydantic import BaseModel
import httpx, os
import dotenv
import re

# Load environment variables
dotenv.load_dotenv()

# Initialize FastAPI app
app = FastAPI()

# Constants (replace with your actual values)
api_base = "https://oa-geba-france.openai.azure.com/"
api_key = os.getenv("OPENAI_API_KEY")
deployment_id = "gpt-35-turbo"
search_endpoint = "https://acs-geba.search.windows.net"
search_key = os.getenv("SEARCH_KEY")
search_index = "blog"
api_version = "2023-08-01-preview"

# Pydantic model for request body
class RequestBody(BaseModel):
    query: str

# Define the API key dependency
def get_api_key(api_key: str = Header(None)):
    if api_key is None or api_key != os.getenv("API_KEY"):
        raise HTTPException(status_code=401, detail="Invalid API Key")
    return api_key

# Endpoint to generate response
@app.post("/generate_response", dependencies=[Depends(get_api_key)])
async def generate_response(request_body: RequestBody):
    url = f"{api_base}openai/deployments/{deployment_id}/extensions/chat/completions?api-version={api_version}"
    headers = {
        "Content-Type": "application/json",
        "api-key": api_key
    }
    data = {
        "dataSources": [
            {
                "type": "AzureCognitiveSearch",
                "parameters": {
                    "endpoint": search_endpoint,
                    "key": search_key,
                    "indexName": search_index
                }
            }
        ],
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant"
            },
            {
                "role": "user",
                "content": request_body.query
            }
        ]
    }

    async with httpx.AsyncClient() as client:
        response = await client.post(url, json=data, headers=headers, timeout=60)

    if response.status_code != 200:
        raise HTTPException(status_code=response.status_code, detail=response.text)

    response_json = response.json()

    # get the assistant response
    assistant_content = response_json['choices'][0]['message']['content']
    assistant_content = re.sub(r'\[doc.\]', '', assistant_content)
    
    # return assistant_content as json
    return {
        "response": assistant_content
    }

# Run the server
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000, timeout_keep_alive=60)

This API has one endpoint: /generate_response that takes { "query": "your query" }as input and returns { "response": assistant_content }as output. Note that the original response from the model contains references like [doc1], [doc2], etc… The regex in the code removes those references. I don not particularly like how the references are handled by the API so I decided to not include them and simplify the response.

The endpoint expects an api-key header. It it is not present, it returns an error.

The endpoint does a call to the Azure OpenAI chat completion extension API which looks very similar to a regular OpenAI chat completion. The request does however, contain a dataSources field with the Azure AI Search information.

The environment variables like the OPENAI_API_KEY and the SEARCH_KEY are retrieved from a .env file.

Note: to stress this again, this API returns the answer to the query as generated by the chosen Azure OpenAI model. This allows it to be used in any application, not just a custom GPT. For a custom GPT in ChatGPT, an alternate approach would be to hand over the search results from Azure AI search directly, allowing the model in the custom GPT to generate the response. It would be faster and avoid Azure OpenAI costs. We are effectively using the custom GPT as a UI and as a way to maintain history between action calls. 😀

If you want to see the code in GitHub, check this URL: https://github.com/gbaeke/custom-gpt.

Running the API in Azure Container Apps

To run the API in the cloud, I decided to use Azure Container Apps. That means we need a Dockerfile to build the container image locally or in the cloud:

# Use an official Python runtime as a parent image
FROM python:3.9-slim-buster

# Set the working directory in the container to /app
WORKDIR /app

# Add the current directory contents into the container at /app
ADD . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Run app.py when the container launches
CMD ["python3", "app.py"]

We also need a requirements.txt file:

fastapi==0.104.1
pydantic==2.5.2
pydantic_core==2.14.5
httpx==0.25.2
python-dotenv==1.0.0
uvicorn==0.24.0.post1

I use the following shell script to build and run the container locally. The script can also push the container to Azure Container Apps.

#!/bin/bash

# Load environment variables from .env file
export $(grep -v '^#' .env | xargs)

# Check the command line argument
if [ "$1" == "build" ]; then
    # Build the Docker image
    docker build -t myblog .
elif [ "$1" == "run" ]; then
    # Run the Docker container, mapping port 8000 to 8000 and setting environment variables
    docker run -p 8000:8000 -e OPENAI_API_KEY=$OPENAI_API_KEY -e SEARCH_KEY=$SEARCH_KEY -e API_KEY=$API_KEY myblog
elif [ "$1" == "up" ]; then
    az containerapp up -n myblog --ingress external --target-port 8000 \
        --env-vars OPENAI_API_KEY=$OPENAI_API_KEY SEARCH_KEY=$SEARCH_KEY API_KEY=$API_KEY \
        --source .
else
    echo "Usage: $0 {build|run|up}"
fi

The shell script extracts the environment variables defined in .env and sets them in the session. Next, we check the first parameter given to the script (Docker is required on your machine for build and run):

build: build the Docker image
run: run the Docker image locally on port 8000 and specify the environment variables to authenticate to Azure OpenAI and Azure AI Search
up: build the Docker image in the cloud and run it in Container Apps; if you do not have a Container Apps Environment or Azure Container Registry, they will be created for you. In the end, you will get an https endpoint to your API in the cloud.

Note: you should not put secrets in environment variables in Azure Container Apps directly; use Container Apps secrets or Key Vault instead; the above is just quick and easy to simplify the deployment

To test the API locally, use the REST Client extension in VS Code with an .http file:

POST http://localhost:8000/generate_response HTTP/1.1
Host: localhost:8000
Content-Type: application/json
api-key: API_KEY_FROM_DOTENV

{
  "query": "what is the openai assistants api?"
}

###

POST https://AZURE_CONTAINER_APPS_ENDPOINT/generate_response HTTP/1.1
Host: AZURE_CONTAINER_APPS_ENDPOINT
Content-Type: application/json
api-key: API_KEY_FROM_DOTENV

{
  "query": "Can I use Redis as a vector db?"
}

When you get something like below, you are good to go. Note again that we return a final answer and not the relevant chunks from Azure AI search.

Getting the OpenAPI spec and adding it to the GPT

With your API running, you can go to its URL, like this one if the API runs locally: http://localhost:8000/openapi.json. The result is a JSON document you can copy to your GPT. I recommend to copy the JSON to VS Code and format it before you paste it in the GPT.

In the GPT, modify the OpenAPI spec with a servers section that includes your Azure Container Apps ingress URL:

Adding the URL to the GPT Action definition

If you want to give the ability to the user to trust the action to be called without approval (after a first call), also add the following:

Allowing the user to say Always Allow when action is used the first time

Take a look at the video below that shows how to create the GPT, including the configuration of the action and testing it.

Conclusion

Custom GPTs in ChatGPT open up a world of possibilities to offer personalised ChatGPT experiences. With custom actions, you can let the GPT do anything you want. In this tutorial, the custom action is an API call that answers the user’s question using Azure OpenAI with Azure AI Search as the provider of relevant context.

As long as you build and host an API and have an OpenAPI spec for your API, the possibilities are virtually limitless.

Note that custom GPTs with actions are not available in the ChatGPT app on mobile yet (end November, 2023). When that happens, it will open up all these capabilities on the go, including enabling voice chat. Fun stuff! 😀

Trying the OpenAI Assistants API

If you have ever tried to build an AI assistant, you know that is not a simple task. In almost all cases, your assistant needs access to external knowledge such as documents or APIs. You might even want to provide your assistant a code sandbox to solve user queries with code. When your assistant is accessed via a chat application, you also have to implement chat history.

Although there are several frameworks like LangChain and Semantic Kernel that can help, OpenAI recently released the Assistants API. It is their own API, tied to their models. The primitives of an assistant are Assistants, Threads and Runs. Let’s start by creating an assistant.

Note: this post contains code snippets in Python. You can find the full example in this gist: https://gist.github.com/gbaeke/e6e88c0dc68af3aa4a89b1228012ae53

Note: although I except this API to become available in Azure OpenAI, I am not quite sure it will happen fast, if at all. So for now, try it out at OpenAI directly. It is still in beta!

Creating an assistant

You can create an assistant using the portal or from code. An assistant has several parameters:

Instructions: how should the assistant behave or respond; think of it as the system message
Model: use any supported model, including fine-tuned models; to support retrieval from documents, you need the 1106 version of gpt-3.5-turbo/gpt-4
Tools: currently, the API supports Code Interpreter and Retrieval; these are fully hosted by OpenAI
Functions: define custom functions to call to integrate with external APIs for instance

Note that the retrieval tool supports uploaded files. There is no need for your own search solution (e.g., vector database with support for vector search, hybrid search, etc…). This is great in simpler scenarios where a full-fledged search system is not required. More control over retrieval will come later.

In this post, we will focus on an assistant that uses Code Interpreter. You can simply create the assistant in the portal. You can see the instructions, model, tools and files:

Assistant with only the Code interpreter tool using the latest gpt-4 model

To create this assistant, make sure you have an account at https://platform.openai.com. Create the assistant from the Assistants section:

Assistants have an id. For example, my assistant has this id: asst_VljToh6vQ1Mbu6Ct5L6qgpfy. I can use this id in my code to start creating threads.

Before talking about threads, let’s look at creating the assistant with code:

assistant = client.beta.assistants.create(
                name="Math Tutor",
                instructions="You are a personal math tutor. Write and run code to answer math questions.",
                tools=[{"type": "code_interpreter"}],
                model="gpt-4-1106-preview"
  )

To run this code, make sure you use the most recent version of the openai package (>=1.2). Note that if you run this code multiple times, you will create an assistant at each run. You should save the assistant id after creation and implement some logic to only run the above code when you do not have an id.

Above, we create an assistant with one tool: code interpreter.

Threads

After creating an assistant, you can create threads. Although somewhat unintuitive, a thread is not associated with an assistant. They exist on their own. After a thread is created, you can add messages to a thread, for instance a user message:

# we use streamlit so we save the thread in session state
if 'thread' not in st.session_state:
        st.session_state.thread = client.beta.threads.create()

# user_input contains a quesion like 'solve x^2 + 100 =200'
# here we add a message to the thread, using the thread id
client.beta.threads.messages.create(
            thread_id=st.session_state.thread.id,
            role="user",
            content=user_input
 )

To get a completion from the assistant for our thread, we need to create a run. The run tells the assistant to look at the messages in the thread and provide a response.

Runs

Below, we create the run:

run = client.beta.threads.runs.create(
            thread_id=st.session_state.thread.id,
            assistant_id=st.session_state.assistant_id, # refer to assistant in session state
            instructions="Please address the user as Geert. Only answer math questions."
  )

Above, both the thread_id and assistant_id are passed to the run, tying both together. If you did not create the assistant in your code, ensure you pass the id of a valid assistant created in your OpenAI account. Note that the run can be passed extra instructions. You can also override the model and tools that the assistant uses.

Creating a run is an asynchronous operation. It returns the metadata of the run immediately. The metadata includes fields like the run’s id, the created_at date and more.

You will need to manually check the run’s status in your code. For example:

# display a streamlit spinner while we check the run
with st.spinner('Waiting for completion...'):
    run_status = 'pending'
    while run_status != 'completed':
        run = client.beta.threads.runs.retrieve(
            thread_id=st.session_state.thread.id,
            run_id=run.id
        )
        run_status = run.status
        
        if run_status == 'failed' or run_status == "cancelled":
            st.error("Run failed or cancelled")
            st.stop()

        time.sleep(0.5)

When the run is finished, we can retrieve messages:

messages = client.beta.threads.messages.list(
    thread_id=st.session_state.thread.id
)

The messages data field contains all messages. Each message has a role like user or assistant. Assistant messages can have different content, like text or image_file.

For example, if I ask Plot y=x^3 + 2x, there will be both text and image_file responses. It’s up to the developer to properly display them in the app. Below is a naive approach, which only works with text and image responses, not downloads (Code Interpreter can give download links):

try:
    # no support for file download yet, just text and image_file
    for message in messages.data:
        if message.role == 'user':
            st.markdown(f"**User:** {message.content[0].text.value}")
        if message.role == 'assistant':
            for content in message.content:
                if hasattr(content, 'text'):
                    st.markdown(f"**Assistant:** {content.text.value}")
                elif hasattr(content, 'image_file'):
                    # image Id = content.image_file.file_id
                    content = get_content(content.image_file.file_id)
                    image = Image.open(BytesIO(content))
                    st.image(image, caption="Downloaded Image", use_column_width=True)                    
except Exception as e:
    st.error(e)

The above should be pretty clear:

if the assistant responds with text, display the text
if the assistant responds with an image, there is an image Id; I use a get_content function to download the image from OpenIA; get_content also implements some straightforward caching logic to avoid having to download images over and over again in the same thread

The get_content function uses client.files.content(file_id).response.content to retrieve the file (client is OpenAI client). The returned result can be used by PIL to open the image and subsequently display it with Streamlit’s st.image:

Note that I can keep asking questions, which adds messages to the same thread, based on the thread’s Id in Streamlit’s session state. When the user refreshes the browser, session state is cleared so a new thread is started. For example, when I ask change 2x in 3x:

In the code, I do not have to worry about chat history at all. I just add messages to the thread, which is managed by OpenAI. At the next run, all those messages are sent to the assistant’s model, which responds appropriately. Note that you do pay for the tokens that all those messages consume.

Conclusion

Compared to the synchronous and stateless ChatCompletion API, the Assistants API is asynchronous and stateful. As a developer, you create an assistant with tools, functions and content for retrieval purposes. Interacting with the assistant is easy: simply add messages to a thread and create a run.

Obviously, it is early days for this API as it is still in beta. Personally, I think it’s a great step forward, making it easier to create quite sophisticated assistants. Most orchestration frameworks and AI tools like LangChain, Semantic Kernel, Flowise, etc… already have support or will support assistants and will add extra capabilities or ease of use on top of the base functionality.

Working with Recipes and Gateways in Microsoft’s Radius

In a previous post, we looked at the basics of deploying a multi-container app that uses Dapr with Radius. In this post, we will add two things:

a recipe that deploys Redis
a gateway: to make the app available to the outside world

Find the full code and app.bicep in the following branch: https://github.com/gbaeke/raddemo/tree/radius-step1.

Recipes

When a developer chooses a resource they would like to use in their app, like a database or queue, that type of resource needs to be deployed somehow. In my sample app, the api saves data to Redis.

From an operator point of view, and possibly depending on the environment, Redis needs to be deployed and configured properly. For instance in dev, you could opt for Redis in a container without a password. In production, you could go for Azure Redis Cache instead with TLS and authentication.

This is where recipes come in. They deploy the needed resources and provide the proper connections to allow applications to connect. Let’s look at a recipe that deploys Redis in Kubernetes:

resource redis 'Applications.Datastores/redisCaches@2023-10-01-preview' = {
  name: 'redis'
  properties: {
    application: app.id
    environment: environment
  }
}

Note: you can get a list of recipes with rad recipe list; next to Redis, there are recipes for sqlDatabases, rabbitMQQueues and more; depending on how you initialised Radius, the recipe list might be empty

The above recipe deploys Redis as a container to the underlying Kubernetes cluster. The deployment is linked to an environment. Just like in the previous blog post, we just use the default environment. Working with environments and workspaces will be for another post.

In fact, this recipe does not specify the recipe explicitly. This means that the default recipe is used, which in this case is a Redis container in Kubernetes.

Note that the above recipe actually deploys the resource. It is quite possible that your Redis Cache is already deployed without a recipe. In that case, you can set resourceProvisioning to manual and set hostname, port and other properties manually, via secret integration or with references to another Bicep resource. For example:

resource redis 'Applications.Datastores/redisCaches@2023-10-01-preview' = {
  name: 'redis'
  properties: {
    environment: environment
    application:app.id
    resourceProvisioning: 'manual'
    resources: [{
      id: azureRedis.id
    }]
    username: 'myusername'
    host: azureRedis.properties.hostName
    port: azureRedis.properties.port
    secrets: {
      password: azureRedis.listKeys().primaryKey
    }
  }
}

Note: above, references are made to azureRedis, a symbolic name for a Bicep resource, implying that Azure Redis Cache is deployed from the same Bicep file but without a recipe

In either case (deployment or reference), when a connection from a container is made to this Redis resource, a number of environment variables are set inside the container. For example:

CONNECTION_CONNECTIONNAME_HOSTNAME
CONNECTION_CONNECTIONNAME_PORT
…

Connecting the api to Redis

To connect the api container to Redis, we use the following app.bicep (please read the previous article for the full context):

import radius as radius

@description('Specifies the environment for resources.')
param environment string

resource app 'Applications.Core/applications@2023-10-01-preview' = {
  name: 'raddemo'
  properties: {
    environment: environment
  }
}

resource redis 'Applications.Datastores/redisCaches@2023-10-01-preview' = {
  name: 'redis'
  properties: {
    application: app.id
    environment: environment
  }
}

resource ui 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'ui'
  properties: {
    application: app.id
    container: {
      image: 'gbaeke/radius-ui:latest'
      ports: {
        web: {
          containerPort: 8001
        }
      }
      env: {
        DAPR_APP: api.name  // api name is the same as the Dapr app id here
      }
    }
    extensions: [
      {
        kind: 'daprSidecar'
        appId: 'ui'
      }
    ]
  }
}

resource api 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'api'
  properties: {
    application: app.id
    container: {
      image: 'gbaeke/radius-api:latest'
      ports: {
        web: {
          containerPort: 8000
        }
      }
      env: {
          REDIS_HOST: redis.properties.host
          REDIS_PORT: string(redis.properties.port)
      }
    }
    extensions: [
      {
        kind: 'daprSidecar'
        appId: 'api'
        appPort: 8000
      }
    ]
    connections: {
      redis: {
        source: redis.id  // this creates environment variables in the container
      }
    }
  }
}

Note the connections array in the api resource. In that array, we added redis and we reference the redis recipe’s id.

Because our api expects the Redis host and port in environment variables different from the ones provided by the connection, we set the variables the api expects ourselves and reference the Redis recipe’s properties.

The environment variables in the api container set by the connection will be CONNECTION_REDIS_HOSTNAME etc… but we do not use them here because that would require a code change.

When you run this app with rad run app.bicep, Redis will be deployed. When the user submits a question via the ui, the logs will show that the Redis call succeeded:

api-c8686c8ff-bwf7l api INFO:root:Stored result for question Hello in Redis

redis-hjo6ha3uqagio-64949758b7-td7c8 redis-monitor 1698071908.381740 [0 10.244.0.24:59750] "SET" "Hello" "This is a fake result for question Hello"

Because rad run streams all logs, the redis-monitor logs are also shown. They clearly state a Redis SET operation was performed.

There is much more to say about recipes. You can even create your own recipes. They are just bicep (or Terraform) modules you publish to a registry. See authoring recipes for more information.

Adding a gateway

So far, we have accessed the ui of our application via port forwarding. The ui listens on port 8001 which is mapped to http://localhost:8001 by rad run. What if we want to make the application available to the outside world?

To make the ui available to the outside world, we can add the following to app.bicep:

resource gateway 'Applications.Core/gateways@2023-10-01-preview' = {
  name: 'gateway'
  properties: {
    application: app.id
    routes: [
      {
        path: '/'
        destination: 'http://ui:8001'
      }
    ]
  }
}

The above adds a gateway to our app and adds one route: http://ui:8001.

During deployment of the Radius control plane, Radius deployed Contour. Contour uses Envoy as the data plane and a service of type LoadBalancer makes the data plane available to the outside world.

In k9s, you should see the following pods in the Radius control plane namespace (radius-system):

Contour

You will also find the service of type LoadBalancer:

LoadBalancer with public IP address

When you create a gateway with Radius, it creates a Kubernetes resource of kind HTTPProxy with apiVersion projectcontour.io/v1 in the same namespace as your app. The spec of the resource refers to another HTTPProxy (ui here) and sets the fqdn (fully qualified domain name) to gateway.raddemo.4.175.112.144.nip.io.

nip.io is a service that resolves a name to the IP address in that name, in this case 4.175.122.144. That IP address is the IP address used by the Azure Load Balancer.

The HTTPProxy ui defines the service and port it routes to. Here that is a service called ui and port 8001.

gateway and ui Contour HTTPProxy resources

You can set your own fully qualified domain name if you wish, in addition to specifying a certificate to enable TLS.

The HTTPProxy resources instruct Contour to configure itself to accept traffic on the configured FQDN and forward it to the ui service.

The full Bicep code to deploy the containers, Redis and the gateway is below:

import radius as radius

@description('Specifies the environment for resources.')
param environment string

resource app 'Applications.Core/applications@2023-10-01-preview' = {
  name: 'raddemo'
  properties: {
    environment: environment
  }
}

resource redis 'Applications.Datastores/redisCaches@2023-10-01-preview' = {
  name: 'redis'
  properties: {
    application: app.id
    environment: environment
  }
}

resource gateway 'Applications.Core/gateways@2023-10-01-preview' = {
  name: 'gateway'
  properties: {
    application: app.id
    routes: [
      {
        path: '/'
        destination: 'http://ui:8001'
      }
    ]
  }
}


resource ui 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'ui'
  properties: {
    application: app.id
    container: {
      image: 'gbaeke/radius-ui:latest'
      ports: {
        web: {
          containerPort: 8001
        }
      }
      env: {
        DAPR_APP: api.name  // api name is the same as the Dapr app id here
      }
    }
    extensions: [
      {
        kind: 'daprSidecar'
        appId: 'ui'
      }
    ]
  }
}

resource api 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'api'
  properties: {
    application: app.id
    container: {
      image: 'gbaeke/radius-api:latest'
      ports: {
        web: {
          containerPort: 8000
        }
      }
      env: {
          REDIS_HOST: redis.properties.host
          REDIS_PORT: string(redis.properties.port)
      }
    }
    extensions: [
      {
        kind: 'daprSidecar'
        appId: 'api'
        appPort: 8000
      }
    ]
    connections: {
      redis: {
        source: redis.id  // this creates environment variables in the container
      }
    }
  }
}

To see the app’s URL, use rad app status.

Note: there is a discussion ongoing to use recipes instead of a pre-installed ingress controller like Contour. With recipes, you could install the ingress solution you prefer such as nginx ingress or any other solution.

Conclusion

In this post we added a Redis database and connected the api to Redis via a connection. We did not use the environment variables that the connection creates. Instead, we provided values for the Redis host name and port to the environment variables the api expects.

To make the application available via the built-in Contour ingress, we created a gateway resource that routes to the ui service on port 8001. The gateway creates a nip.io hostname but you can set the hostname to something different as long as that name resolves to the IP address of the Contour LoadBalancer service.

Giving Microsoft’s Radius a spin

Microsoft recently announced Radius. As stated in their inaugural blog post, it is “a tool to describe, deploy, and manage your entire application”. With Radius, you describe your application in a bicep file. This can include containers, databases, the connections between those and much more. Radius is an open-source solution started from within Microsoft. The name is somewhat confusing because of RADIUS, a network authentication solution developed in 1991!

Starting point: app running locally

Instead of talking about it, let’s start with an application that runs locally on a development workstation and uses Dapr:

The ui is a Flask app that presents a text area and a button. When the user clicks the button, the code that handles the event calls the api using Dapr invoke. If you do not know what Dapr is, have a look at docs.dapr.io. The api saves the user’s question and a fake response to Redis. If Redis cannot be found, the api will simply log it could not save the data. The response is returned to the ui.

To run the application with Dapr on a development machine, I use a dapr.yaml file in combination with dapr run -f . See multi-app run for more details.

Here’s the yaml file:

version: 1
apps:
  - appID: ui
    appDirPath: ./ui
    appPort: 8001
    daprHTTPPort: 3510
    env:
      DAPR_APP: api
    command: ["python3","app.py"]
  - appID: api
    appDirPath: ./api
    appPort: 8000
    daprHTTPPort: 3511
    env:
      REDIS_HOST: localhost
      REDIS_PORT: 6379
      REDIS_DB: 0
    command: ["python3","app.py"]

Note that the api needs a couple of environment variables to find the Redis instance. The ui needs one environment variable DAPR_APP that holds the Dapr appId of the api. The Dapr invoke call needs this appId in order to find the api on the network.

In Python, the Dapr invoke call looks like this:

with DaprClient() as d:
        log.info(f"Making call to {dapr_app}")
        resp = d.invoke_method(dapr_app, 'generate', data=bytes_data,
                                 http_verb='POST', content_type='application/json')
        log.info(f"Response from API: {resp}")

The app runs fine locally if you have Python and the dependencies as listed in both the ui’s and api’s requirements.txt file. Let’s try to deploy the app with Radius.

Deploying the app with Radius

Before we can deploy the app with Radius, you need to install a couple of things:

rad CLI: I installed the CLI on MacOS; see the installation instructions for more details
VS Code extension: Radius uses a forked version of Bicep that is older than the current version of Bicep. The two will eventually converge but for now, you need to disable the official Bicep extension in VS Code and install the Radius Bicep extension. This is needed to support code like import radius as radius, which is not supported in the current version of Bicep.
Kubernetes cluster: Radius uses Kubernetes and requires the installation of the Radius control plane on that cluster. I deployed a test & dev AKS cluster in Azure and ensured it was set as my current context. Use kubectl config current-context to check that.
Install Dapr: our app uses Dapr and Radius supports it; however, Dapr needs to be manually installed on the cluster; if you have Dapr on your local machine, run dapr init -k to install it on Kubernetes

Now you can clone my raddemo repo. Use git clone https://github.com/gbaeke/raddemo.git. In the raddemo folder, you will see two folders: api and ui. In the root folder, run the following command:

rad init

Select Yes to use the current folder.

Running rad init does the following:

Installs Radius to the cluster in the radius-system namespace
Creates a new environment and workspace (called default)
Sets up a local-dev recipe pack: recipes allow you to install resources your app needs like Redis, MySQL, etc…

After installation, this is the view on the radius-system Kubernetes namespace with k9s:

There should also be a .rad folder with a rad.yaml file:

workspace:
  application: "raddemo"

The file defines a workspace with our application name raddemo. raddemo is the name of the folder where I ran rad init. You can have multiple workspaces defined with one selected as the default. For instance, you could have a dev and prod workspace where each workspace uses a different Kubernetes cluster and environment. The default could be set to dev but you can easily switch to prod using the rad CLI. Check this overview of workspaces for more information. I am going to work with just one workspace called default, which uses an environment called default. When you just run rad init, those are the defaults.

You also get a default app.bicep file:

import radius as radius
param application string

resource demo 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'demo'
  properties: {
    application: application
    container: {
      image: 'radius.azurecr.io/samples/demo:latest'
      ports: {
        web: {
          containerPort: 3000
        }
      }
    }
  }
}

This is deployable code. If you run rad run app.bicep, a Kubernetes pod will be deployed to your cluster, using the image above. Radius would also setup port forwarding to access the app on it’s containerPort (3000).

We will change this file to deploy the ui. We will remove the application parameter and define our own application. That application needs an environment which we will pass in via a parameter:

import radius as radius

@description('Specifies the environment for resources.')
param environment string

resource app 'Applications.Core/applications@2023-10-01-preview' = {
  name: 'raddemo'
  properties: {
    environment: environment
  }
}

resource ui 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'ui'
  properties: {
    application: app.id
    container: {
      image: 'gbaeke/radius-ui:latest'
      ports: {
        web: {
          containerPort: 8001
        }
      }
    }
    extensions: [
      {
        kind: 'daprSidecar'
        appId: 'ui'
      }
    ]
  }
}

Above, we define the following:

a resource of type Applications.Core/applications: because applications run on Kubernetes, you can use a different namespace than the default and also set labels and annotations. All labels and annotations would be set on all resources belonging to the app, such as containers
the app resource needs an environment: the environment parameter is defined in the Bicep file and is set automatically by the rad CLI; it will match the environment used by your current workspace; environments can also have cloud credentials attached to deploy resources in Azure or AWS; we are not using that here
a resource of type Applications.Core/containers: this will create a pod in a Kubernetes namespace; the container belongs to the app we defined (application property) and uses the image gbaeke/ui-radius:latest on Docker Hub. Radius supports Dapr via extensions. The Dapr sidecar is added via these extensions with the app Id of ui.

In Kubernetes, this results in a pod with two containers: the ui container and the Dapr sidecar.

ui and Dapr sidecar

When you run rad run app.bicep, you should see the resources in namespace default-raddemo. The logs of all containers should stream to your console and local port 8001 should be mapped to the pod’s port 8001. http://localhost:8001 should show:

We will end this post by also deploying the api. It also needs Dapr and we need to update the definition of the ui container by adding an environment variable:

import radius as radius

@description('Specifies the environment for resources.')
param environment string

resource app 'Applications.Core/applications@2023-10-01-preview' = {
  name: 'raddemo'
  properties: {
    environment: environment
  }
}

resource ui 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'ui'
  properties: {
    application: app.id
    container: {
      image: 'gbaeke/radius-ui:latest'
      ports: {
        web: {
          containerPort: 8001
        }
      }
      env: {
        DAPR_APP: api.name  // api name is the same as the Dapr app id here
      }
    }
    extensions: [
      {
        kind: 'daprSidecar'
        appId: 'ui'
      }
    ]
  }
}

resource api 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'api'
  properties: {
    application: app.id
    container: {
      image: 'gbaeke/radius-api:latest'
      ports: {
        web: {
          containerPort: 8000
        }
      }
    }
    extensions: [
      {
        kind: 'daprSidecar'
        appId: 'api'
        appPort: 8000
      }
    ]
  }
}

Above, we added the api container, enabled Dapr, and set the Dapr appId to api. In the ui, we set environment variable DAPR_APP to api.name. We can do this because the name of the api resource is the same as the appId. This also makes Radius deploy the api before the ui. Note that the api does not have Redis environment variables. It will default to finding Redis at localhost, which will fail. But that’s ok.

You now have two pods in your namespace:

Yes, there are three here but Redis will be added in a later post.

Note that instead of running rad run app.bicep, you can also run rad deploy app.bicep. The latter simply deploys the application. It does not forward ports or stream logs.

Summary

In this post, we touched on the basics of using Radius to deploy an application that uses Dapr. Under the hood, Radius uses Kubernetes to deploy container resources specified in the Bicep file. To run the application, simply run rad run app.bicep to deploy the app, stream all logs and set up port forwarding.

We barely scratched the surface here so in a next post, we will add Redis via a recipe, and make the application available publicly via a gateway. Stay tuned!

Introduction

Initialising the OpenAI client and creating the assistant

Creating a thread and adding a message

Running the thread

Interpreting the messages after the run

Follow-up questions

Conclusion

Share this:

Convert the flow to a chat flow

Generating the Docker image

Running the image

Conclusion

Share this:

Installing Prompt Flow on your machine

Creating an empty flow

Inputs and outputs

Creating an embedding from the description

Searching for similar images

Adding an LLM tool

Using a prompt variant

Conclusion

Share this:

What do we want to achieve?

Creating the index

Data source

Skillset with two skills

Indexer

Testing the index

Conclusion

Share this:

A word on vectors and vectorization

Getting a description from an image

Embedding of the image

Generating the data to index

Building the index

Search images with text

Return similar images

Conclusion

Share this:

Vector search in the portal

Using an embedding skill during indexing

Reset and run the indexer

Conclusion

Share this:

Azure AI Search Index

Use the index with Azure OpenAI “add your data”

Creating the API

Running the API in Azure Container Apps

Getting the OpenAPI spec and adding it to the GPT

Conclusion

Share this:

Creating an assistant

Threads

Runs

Conclusion

Share this:

Recipes

Connecting the api to Redis

Adding a gateway

Conclusion

Share this:

Starting point: app running locally

Deploying the app with Radius

Summary

Share this: