Using Azure Database for PostgreSQL as a vector store

When we build LLM applications, there is always a recurring question: “What vector store will we use?”. In Azure, there are several native solutions. Some of them were discussed in previous posts.

  • Azure Cognitive Search: supports vector search but also hybrid search with semantic reranking as discussed here
  • Azure Redis Cache Enterprise: check my blog post
  • Azure Cosmos DB for MongoDB Core: see Microsoft Learn

In addition to the above, you can of course host your own vector database in a container such as Qdrant or others. You can install these on any service that supports containers such as App Service, Container Instances, Container Apps, or Kubernetes.

Using PostgreSQL

If you are familiar with Azure Database for PostgreSQL flexible servers, you can use it as a vector store, as long as you install the vector extension. This extension can be enabled in all compute tiers. I installed PostgreSQL and set the compute tier to Burstable with size Standard_B1ms (1 vCore, 2GB). This is great for testing and will cost around 20 euros per month with 32GB of storage. For production use, the monthly cost will start from about 150 euros at the lowest General Purpose tier.

PostrgreSQL flexible server with lowest compute tier

After deployment, you need to enable the vector extension. In Server Parameters, search for azure.extensions and select VECTOR from the list. Then click Save.

VECTOR extension added

When done, grab the connection details from the Connect pane:

Connection details

In pgAdmin, register the server with the above details. Connect to the server and create a database. Ensure you configure the firewall settings in Azure to allow your IP address to connect to the server.

Database creation in pgAdmin

Inside the database, go to Extensions and add the vector extension:

vector extension added to the database

Note: in the code below, we will use LangChain. LangChain will try to enable the vector extension if it is not enabled

Note: if you do not want to install pgAdmin, you can create the database from the Azure Portal or use the Azure CLI.

Working with the vector store

Although you can create and query tables that contain vectors with plain SQL, we will use LangChain as a higher-level library that takes care of many of the details for us.

Take a look at the following Python code that creates a few embeddings (vectors) and then uses RAG (retrieval augmented generation) to answer a question with OpenAI’s text-davinci-003 model.

Note: the code is on Github as well

import os
import getpass

# read from .env file
from dotenv import load_dotenv
load_dotenv()

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PGVector
from langchain.document_loaders import TextLoader
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI


loader = TextLoader("./state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()


pgpassword = os.getenv("PGPASSWORD", "")
if not pgpassword:
    pgpassword = getpass.getpass("Enter pgpassword: ")

CONNECTION_STRING = f"postgresql+psycopg2://pgadmin:{pgpassword}@pg-vec-geba.postgres.database.azure.com:5432/pgvector"

COLLECTION_NAME = "state_of_the_union_test"

# if you run this code more than once, you will duplicated vectors
# no upserts
db = PGVector.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING
)

retriever = db.as_retriever()

query = "What did the president say about Ketanji Brown Jackson"

# LLM will default to text-davinci-003 because we are using a completion endpoint
# versus a chat endpoint
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)

answer = qa.run(query)

print(answer)

The code above requires a .env file with the following content:

OPENAI_API_KEY=OPENAI_API_KEY
PGPASSWORD=PASSWORD_TO_POSTGRES

You will also need the State of the Union text file from here.

Before running the code, install the following packages:

pip install pgvector
pip install openai
pip install psycopg2-binary
pip install tiktoken

The code does the following:

  • Import required libraries: important here is the PGVector import
  • Loads the text file and splits it into chunks: chunking strategy is not too important here; just use chunks of 1000 characters
  • Create an instance of type OpenAIEmbeddings, later used to create a vector per chunk for storage in PostgreSQL; it will also be used when doing queries with a retrieval QA chain (see below); uses text-embedding-ada-002 embedding model
  • Construct the connection string for later use and set a collection name: collections are a way to store vectors together; the collections you create are kept in a table and each vector references the collection
  • Create an instance of PGVector with PGVector.from_documents: this will create/use tables to hold the collection(s) and vectors for you; all chunks will be vectorized and stored in a table; we will take a look at those tables in a moment; in a real application, you would reference existing tables and another process would create/update the vectors
  • Create a retriever (qa) from the PGVector instance for use in a retrieval QA chain
  • Run a query and print the answer: the qa.run (query) line does the n-nearest neighbor vector search in PostgreSQL (via the retriever), creates a meta-prompt with the relevant context, and returns the OpenAI model response in one step

In the PostgreSQL database, the above code creates two tables:

Tables created by LangChain to store the vectors

The collection table contains the collections you create from code. Each collection has a unique ID. The embedding table contains the vectors. Each vector has a unique ID and belongs to a collection. The fields of the embedding table are:

  • uuid: unique ID of the vector
  • collection_id: collection ID referencing the collection table
  • embedding: a field of type vector that stores the embedding (1536 dimensions)
  • document: the chunk of text that was vectorized
  • cmetadata: a JSON field with a link to the source file
  • custom_id: an id that is unique for each run

Note that when you run the sample Python code multiple times, you will have duplicated content. In a real application, you should avoid that. The process that creates and stores the vectors will typically be separate from the process that queries them.

⚠️ Important: Today, LangChain cannot search over all vectors in all collections. You always need to specify the collection to search. If you do need to search over all vectors, you can use SQL statements instead.

The search has the following properties:

  • Distance strategy: cosine similarity; the pgvector extension also supports L2 distance and inner product; the code above uses the text-embedding-ada-002 embeddings model by default; with that model, you should use cosine similarity; LangChain uses cosine similarity as the default for PGVector so that’s a match! 👏
  • Exact nearest neighbor search: although this provides perfect recall, it can get slow when there are many vectors because the entire table is scanned; the extension supports the creation of indexes to perform an approximate nearest neighbor search using IVFFLat or HNSW; see pgvector on GitHub for more details and also this article from Crunchy Data.

Note: most other vector databases use HNSW as the index type (e.g., Azure Cognitive Search, Qdrant, …); unlike IVFFLat you can create this index without having any vectors in your database table; at the time of writing (end of September 2023), the version of the vector extension on Azure was 0.4.1 and did not support HNSW; HNSW requires version 0.5.0 or higher

Conclusion

Azure Database for PostgreSQL with the vector extension is an interesting alternative to other vector database solutions in Azure. This is especially the case when PostgreSQL is your database of choice! In this post, we have shown how LangChain supports it with a simple example. If you do not use LangChain or other libraries, you can simply use SQL statements to create and search indexes as documented here.

The drawback of using PostgreSQL is that you need to know a bit more about exact and approximate nearest neighbor searches and the different index mechanisms. That’s actually a good thing if you want to create production applications with good performance. For a simple POC with not a lot of data, you can skip all of this and perform exact searches.

Besides the free tier of Azure Cognitive Search, the configuration above is the service with the lowest cost for POCs that need vector search. On top of that, the cheapest PostgreSQL option has more storage than Cognitive Search’s free tier (32GB vs. 50MB). Adding more storage is easy and relatively cheap as well. Give it a go and tell me what you think!

Use Azure OpenAI Add your data vector search from code

In the previous post, we looked at using Azure OpenAI Add your data from the Azure OpenAI Chat Playground. It is an easy-to-follow wizard to add documents to a storage account and start asking questions about them. From the playground, you can deploy a chat app to an Azure web app and you are good to go. The vector search is performed by an Azure Cognitive Search resource via an index that includes a vector next to other fields such as the actual content, the original URL, etc…

In this post, we will look at using this index from code and build a chat app using the Python Streamlit library.

All code can be found here: https://github.com/gbaeke/azure-cog-search

Requirements

You need an Azure Cognitive Search resource with an index that supports vector search. Use this post to create one. Besides Azure Cognitive Search, you will need Azure OpenAI deployed with both gpt-4 (or 3.5) and the text-embedding-ada-002 embedding model. The embedding model is required to support vector search. In Europe, use France Central as the region.

Next, you need Python installed. I use Python 3.11.4 64-bit on an M1 Mac. You will need to install the following libraries with pip:

  • streamlit
  • requests

You do not need the OpenAI library because we will use the Azure OpenAI REST APIs to be able to use the extension that enables the Add your data feature.

Configuration

We need several configuration settings. The can be divided into two big blocks:

  • Azure Cognitive Search settings: name of the resource, access key, index name, columns, type of search (vector), and more…
  • Azure OpenAI settings: name of the model (e.g., gpt-4), OpenAI access key, embedding model, and more…

You should create a .env file with the following content:

AZURE_SEARCH_SERVICE = "AZURE_COG_SEARCH_SHORT_NAME"
AZURE_SEARCH_INDEX = "INDEX_NAME"
AZURE_SEARCH_KEY = "AZURE_COG_SEARCH_AUTH_KEY"
AZURE_SEARCH_USE_SEMANTIC_SEARCH = "false"
AZURE_SEARCH_TOP_K = "5"
AZURE_SEARCH_ENABLE_IN_DOMAIN = "true"
AZURE_SEARCH_CONTENT_COLUMNS = "content"
AZURE_SEARCH_FILENAME_COLUMN = "filepath"
AZURE_SEARCH_TITLE_COLUMN = "title"
AZURE_SEARCH_URL_COLUMN = "url"
AZURE_SEARCH_QUERY_TYPE = "vector"

# AOAI Integration Settings
AZURE_OPENAI_RESOURCE = "AZURE_OPENAI_SHORT_NAME"
AZURE_OPENAI_MODEL = "gpt-4"
AZURE_OPENAI_KEY = "AZURE_OPENAI_AUTH_KEY"
AZURE_OPENAI_TEMPERATURE = 0
AZURE_OPENAI_TOP_P = 1.0
AZURE_OPENAI_MAX_TOKENS = 1000
AZURE_OPENAI_STOP_SEQUENCE = ""
AZURE_OPENAI_SYSTEM_MESSAGE = "You are an AI assistant that helps people find information."
AZURE_OPENAI_PREVIEW_API_VERSION = "2023-06-01-preview"
AZURE_OPENAI_STREAM = "false"
AZURE_OPENAI_MODEL_NAME = "gpt-4"
AZURE_OPENAI_EMBEDDING_ENDPOINT = "https://AZURE_OPENAI_SHORT_NAME.openai.azure.com/openai/deployments/embedding/EMBEDDING_MODEL_NAME?api-version=2023-03-15-preview"
AZURE_OPENAI_EMBEDDING_KEY = "AZURE_OPENAI_AUTH_KEY"

Now we can create a config.py that reads these settings.

from dotenv import load_dotenv
import os
load_dotenv()

# ACS Integration Settings
AZURE_SEARCH_SERVICE = os.environ.get("AZURE_SEARCH_SERVICE")
AZURE_SEARCH_INDEX = os.environ.get("AZURE_SEARCH_INDEX")
AZURE_SEARCH_KEY = os.environ.get("AZURE_SEARCH_KEY")
AZURE_SEARCH_USE_SEMANTIC_SEARCH = os.environ.get("AZURE_SEARCH_USE_SEMANTIC_SEARCH", "false")
AZURE_SEARCH_TOP_K = os.environ.get("AZURE_SEARCH_TOP_K", 5)
AZURE_SEARCH_ENABLE_IN_DOMAIN = os.environ.get("AZURE_SEARCH_ENABLE_IN_DOMAIN", "true")
AZURE_SEARCH_CONTENT_COLUMNS = os.environ.get("AZURE_SEARCH_CONTENT_COLUMNS")
AZURE_SEARCH_FILENAME_COLUMN = os.environ.get("AZURE_SEARCH_FILENAME_COLUMN")
AZURE_SEARCH_TITLE_COLUMN = os.environ.get("AZURE_SEARCH_TITLE_COLUMN")
AZURE_SEARCH_URL_COLUMN = os.environ.get("AZURE_SEARCH_URL_COLUMN")
AZURE_SEARCH_VECTOR_COLUMNS = os.environ.get("AZURE_SEARCH_VECTOR_COLUMNS")
AZURE_SEARCH_QUERY_TYPE = os.environ.get("AZURE_SEARCH_QUERY_TYPE")

# AOAI Integration Settings
AZURE_OPENAI_RESOURCE = os.environ.get("AZURE_OPENAI_RESOURCE")
AZURE_OPENAI_MODEL = os.environ.get("AZURE_OPENAI_MODEL")
AZURE_OPENAI_KEY = os.environ.get("AZURE_OPENAI_KEY")
AZURE_OPENAI_TEMPERATURE = os.environ.get("AZURE_OPENAI_TEMPERATURE", 0)
AZURE_OPENAI_TOP_P = os.environ.get("AZURE_OPENAI_TOP_P", 1.0)
AZURE_OPENAI_MAX_TOKENS = os.environ.get("AZURE_OPENAI_MAX_TOKENS", 1000)
AZURE_OPENAI_STOP_SEQUENCE = os.environ.get("AZURE_OPENAI_STOP_SEQUENCE")
AZURE_OPENAI_SYSTEM_MESSAGE = os.environ.get("AZURE_OPENAI_SYSTEM_MESSAGE", "You are an AI assistant that helps people find information about jobs.")
AZURE_OPENAI_PREVIEW_API_VERSION = os.environ.get("AZURE_OPENAI_PREVIEW_API_VERSION", "2023-06-01-preview")
AZURE_OPENAI_STREAM = os.environ.get("AZURE_OPENAI_STREAM", "true")
AZURE_OPENAI_MODEL_NAME = os.environ.get("AZURE_OPENAI_MODEL_NAME", "gpt-35-turbo")
AZURE_OPENAI_EMBEDDING_ENDPOINT = os.environ.get("AZURE_OPENAI_EMBEDDING_ENDPOINT")
AZURE_OPENAI_EMBEDDING_KEY = os.environ.get("AZURE_OPENAI_EMBEDDING_KEY")

Writing the chat app

Now we will create chat.py. The diagram below summarizes the architecture:

Chat app architecture (high level)

Here is the first section of the code with explanations:

import requests
import streamlit as st
from config import *
import json

# Azure OpenAI REST endpoint
endpoint = f"https://{AZURE_OPENAI_RESOURCE}.openai.azure.com/openai/deployments/{AZURE_OPENAI_MODEL}/extensions/chat/completions?api-version={AZURE_OPENAI_PREVIEW_API_VERSION}"
    
# endpoint headers with Azure OpenAI key
headers = {
    'Content-Type': 'application/json',
    'api-key': AZURE_OPENAI_KEY
}

# Streamlit app title
st.title("🤖 Azure Add Your Data Bot")

# Keep messages array in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display previous chat messages from history on app rerun
# Add your data messages include tool responses and assistant responses
# Exclude the tool responses from the chat display
for message in st.session_state.messages:
    if message["role"] != "tool":
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

A couple of things happen here:

  • We import all the variables from config.py
  • We construct the Azure OpenAI REST endpoint and store it in endpoint; we use the extensions/chat endpoint here which supports the Add your data feature in API version 2023-06-01-preview and higher
  • We configure the HTTP headers to send to the endpoint; the headers include the Azure OpenAI authentication key
  • We print a title with Streamlit (st.title) and define a messages array that we store in Streamlit’s session state
  • Because of the way Streamlit works, we have to print the previous messages of the chat each time the page reloads. We do that in the last part but we exclude the tool role. The extensions/chat endpoint returns a tool response that contains the data returned by Azure Cognitive Search. We do not want to print the tool response. Together with the tool response, the endpoint returns an assistant response which is the response from the gpt model. We do want to print that response.

Now we can look at the code that gets executed each time the user asks a question. In the UI, the question box is at the bottom:

Streamlit chat UI

Whenever you type a question, the following code gets executed:

# if user provides chat input, get and display response
# add user question and response to previous chat messages
if user_prompt := st.chat_input():
    st.chat_message("user").write(user_prompt)
    with st.chat_message("assistant"):
        with st.spinner("🧠 thinking..."):
            # add the user query to the messages array
            st.session_state.messages.append({"role": "user", "content": user_prompt})
            body = {
                "messages": st.session_state.messages,
                "temperature": float(AZURE_OPENAI_TEMPERATURE),
                "max_tokens": int(AZURE_OPENAI_MAX_TOKENS),
                "top_p": float(AZURE_OPENAI_TOP_P),
                "stop": AZURE_OPENAI_STOP_SEQUENCE.split("|") if AZURE_OPENAI_STOP_SEQUENCE else None,
                "stream": False,
                "dataSources": [
                    {
                        "type": "AzureCognitiveSearch",
                        "parameters": {
                            "endpoint": f"https://{AZURE_SEARCH_SERVICE}.search.windows.net",
                            "key": AZURE_SEARCH_KEY,
                            "indexName": AZURE_SEARCH_INDEX,
                            "fieldsMapping": {
                                "contentField": AZURE_SEARCH_CONTENT_COLUMNS.split("|") if AZURE_SEARCH_CONTENT_COLUMNS else [],
                                "titleField": AZURE_SEARCH_TITLE_COLUMN if AZURE_SEARCH_TITLE_COLUMN else None,
                                "urlField": AZURE_SEARCH_URL_COLUMN if AZURE_SEARCH_URL_COLUMN else None,
                                "filepathField": AZURE_SEARCH_FILENAME_COLUMN if AZURE_SEARCH_FILENAME_COLUMN else None,
                                "vectorFields": AZURE_SEARCH_VECTOR_COLUMNS.split("|") if AZURE_SEARCH_VECTOR_COLUMNS else []
                            },
                            "inScope": True if AZURE_SEARCH_ENABLE_IN_DOMAIN.lower() == "true" else False,
                            "topNDocuments": AZURE_SEARCH_TOP_K,
                            "queryType":  AZURE_SEARCH_QUERY_TYPE,
                            "roleInformation": AZURE_OPENAI_SYSTEM_MESSAGE,
                            "embeddingEndpoint": AZURE_OPENAI_EMBEDDING_ENDPOINT,
                            "embeddingKey": AZURE_OPENAI_EMBEDDING_KEY
                        }
                    }   
                ]
            }  

            # send request to chat completion endpoint
            try:
                response = requests.post(endpoint, headers=headers, json=body)

                # there will be a tool response and assistant response
                tool_response = response.json()["choices"][0]["messages"][0]["content"]
                tool_response_json = json.loads(tool_response)
                assistant_response = response.json()["choices"][0]["messages"][1]["content"]

                # get urls for the JSON tool response
                urls = [citation["url"] for citation in tool_response_json["citations"]]


            except Exception as e:
                st.error(e)
                st.stop()
            
           
            # replace [docN] with urls and use 0-based indexing
            for i, url in enumerate(urls):
                assistant_response = assistant_response.replace(f"[doc{i+1}]", f"[[{i}]({url})]")
            

            # write the response to the chat
            st.write(assistant_response)

            # write the urls to the chat; gpt response might not refer to all
            st.write(urls)

            # add both responses to the messages array
            st.session_state.messages.append({"role": "tool", "content": tool_response})
            st.session_state.messages.append({"role": "assistant", "content": assistant_response})
            

When there is input, we write the input to the chat history on the screen and add it to the messages array. The OpenAI APIs expect a messages array that includes user and assistant roles. In other words, user questions and assistant (here gpt-4) responses.

With a valid messages array, we can send our payload to the Azure OpenAI extensions/chat endpoint. If you have ever worked with the OpenAI or Azure OpenAI APIs, many of the settings in the JSON body will be familiar. For example: temperature, max_tokens, and of course the messages themselves.

What’s new here is the dataSources field. It contains all the information required to perform a vector search in Azure Cognitive Services. The search finds content relevant to the user’s question (that was added last to the messages array). Because queryType is set to vector, we also need to provide the embedding endpoint and key. It’s required because the user question has to be vectorized in order to compare it with the stored vectors.

It’s important to note that the extensions/chat endpoint, together with the dataSources configuration takes care of a lot of the details:

  • Perform a k-nearest neighbor search (k=5 here) to find 5 documents closely related to the user’s question
  • It uses vector search for this query (could be combined with keyword and semantic search to perform a hybrid search but that is not used here)
  • It stuffs the prompt to the GPT model with the relevant content
  • It returns the GPT model response (assistant response) together with a tool response. The tool response contains citations that include URLs to the original content and the content itself.

In the UI, we print the URLs from these citations after modifying the assistant response to just return hyperlinked numbers like [0] and [1] for the citations instead of unlinked [doc1], [doc2], etc… In the UI, that looks like:

Printing the URLs from the citations

Note that this chat app is a prototype and does not include management of the messages array. Long interactions will reach the model’s token limit!

You can find this code on GitHub here: https://github.com/gbaeke/azure-cog-search.

Conclusion

Although still in preview, you now have an Azure-native solution that enables the RAG pattern with vector search. RAG stands for Retrieval Augmented Generation. Azure Cognitive Search is a fully managed service that stores the vectors and performs similarity searches. There is no need to deploy a 3rd party vector database.

There is no need for specific libraries to implement this feature because it is all part of the Azure OpenAI API. Microsoft simply extended that API to add data sources and takes care of all the behind-the-scenes work that finds relevant content and adds it to your prompt.

If, for some reason, you do not want to use the Azure OpenAI API directly and use something like LangChain or Semantic Kernel, you can of course still do that. Both solutions support Azure Cognitive Search as a vector store.

Improvements in Azure OpenAI Add your data

In a previous post, I talked about the Add your data feature in the Azure OpenAI Chat playground. Recently, there have been some updates to this feature, including vector search. Let’s take a look at the updated experience and focus on vector search.

Starting point

I have some PDF documents in a storage account container. They are PDFs containing job descriptions for a select group of companies. You can use .txt, .md, .html, Microsoft Word files, Microsoft PowerPoint files, or PDFs.

PDFs in a storage account

At the storage account level, CORS settings should be GET from all origins (*). This can also be set from the Add your data wizard in the OpenAI Playground.

CORS settings

In addition to the storage account, you need an OpenAI resource deployed to a region of choice. I have chosen France Central which provides access to gpt-4 and the text-embedding-ada-002 embedding model (a text embedding model is required for vector search). Ensure those models are deployed. For example:

Deployed models in France Central

Running the wizard

In OpenAI Chat Playground, you will find the Add your data (preview) tab. Use the + Add a data source button to start.

There are several sources to start from. Because I already have my files in a storage account, I will select Azure Blob Storage as the source and select the name of the storage account and the container with my files. You can also upload files or use an existing index in Cognitive Search. Whatever the option you choose, you will always end up with a Cognitive Search index that serves relevant content to the chat.

Data source options

As you can see from the above screenshot, in addition to the storage account, you have to select an Azure Cognitive Search instance. It will not be created for you. If you do not have such an instance, either click the link under the Select Azure Cognitive Search resource dropdown or create one yourself and use the refresh icon. I already have such a resource called acs-geba. Use the Basic pricing tier as a minimum. This gives you a vector quota of 1GB.

After selecting the Azure Cognitive Search resource, enter an index name. The documents in the storage account will be added to this index so we can search via this index later. The index will be created for you. I will use oai as the index name and also set a schedule to Hourly to update the index automatically. The schedule can be adjusted afterward in Azure Cognitive Search.

We now have the following in the wizard:

Add your data, data source config

You can now add vector search. This is in addition to keyword and semantic search. To use vector search, you need to specify an embedding model. If you do not have text-embedding-ada-002 deployed in your region, you will not be able to turn on vector search. This feature requires at least the Basic or higher SKU.

Turning on vector search (still in the first page of the wizard)

Above, I called my deployment of the text-embedding-ada-002 model embedding but you can use any name you like. It’s just a deployment name.

Now we can press Next, to be presented with the Data management page:

Data management page

You can find more information about those options here. In most cases, using Vector search alone is sufficient but it depends a bit on your dataset and use-case. I will just use Vector search. When we use Redis, Qdrant, Pinecone, or other vector stores, we also use vectorized search exclusively, which works very well.

After clicking Next, review what will happen and click Save and Close. The data will be added to the index:

Data is being added

Asking questions about the data

When the data has been added to Azure Cognitive Search, you can start asking questions. If you want to limit the chat to only your data, ensure that Limit Responses to your data content is checked.

Ready to go

In the Chat Platground, I selected gpt-4 and asked the following question: “Who are MBarQ and do they need AI translators?”. The answer is as follows:

Asking a question

This answer comes from one of the PDFs containing the job description.

Behind the scenes

For the above interaction to work, the question “Who are MBarQ and do they need AI translators?” is vectorized using the selected embedding model. Let’s call this the query vector. The selected embedding model creates a vector with 1536 dimensions that represent the text within a vector space. The nice thing here is that the embedding of the query is created automatically as part of the extended Azure OpenAI API.

The vectors for your documents are stored in an index that ends with the word chunks. Here’s my index and its defined fields. This is all the result of the wizard. No changes have been made to Azure Cognitive Search manually.

Index used for vector search and its fields

As you can see, there is a field for the contentVector which also notes the number of dimensions. The embedding model we used just happens to output 1536 numbers. Other embedding models use a different number of dimensions. Next to the contentVector, the content field contains the actual text that the vector was created from. That text will later be injected, behind the scenes, in the gpt-4 prompt. But we first have to find these pieces of text!

With the query vector in hand, Add your data searches for pieces of text with vectors that are close to the query vector. Cognitive Search uses cosine similarity to do that but there is no real need to know that. Note that we only use vector search in this scenario. When you do hybrid and/or semantic searches, the query process is different. Also, note that the index with vectors works on chunks of text coming from your documents. This chunking happens transparently in the background when the indexer runs.

Once the top N (usually n=5 but can be adjusted in code) vectors that are closest to the query vector are found, we also have the pieces of text closest to the query (from the content field). The original pieces of text that the vectors were calculated from get added to the query and sent to gpt-4. The prompt sent to gpt-4 could be something like the one below (just an example):

Who are MBarQ and do they need AI translators?

Only answer based on the context below after ---

---

First piece of text (no vectors here, just plain text!!!)

Second piece of text

...

Based on this prompt and hopefully relevant context below the — mark, the model can answer the original question.

Note that the Add your data experience also returns references. In the UI, you can click these to see the source text:

References

Deployment

From the Playground, you can deploy the chat experience to a web app or a Power Virtual Agent bot:

Deployment

At this moment (September 2023) the Power Virtual Agent deployment does not work if your default environment is not in the United States. When you click A new Power Virtual Agent bot…, you should quickly copy the URL and replace the environment ID with another one that is in the United States. Navigate to the modified URL to create the bot.

Deploying to a web app is a bit more straightforward because that is just a web app in Azure. No Power Platform madness here… 😀

Note that if you enable chat history, CosmosDB is used. Here’s the app with chat history visible at the right, similar to chat history in ChatGPT or Bing Chat. This app uses Azure Active Directory (Microsoft Entra ID) for authentication.

Chat in web app

Conclusion

The main addition to Add your data surely is vector search. That capability was already a part of Cognitive Search but the Add your data feature did not use it. When you do use it, a lot of stuff happens in the background automatically:

  • An index that supports vectors is created; if selected the index is automatically updated based on the contents in the storage account container
  • Documents are chunked and vectors are created for each of these chunks based on the selected embeddings model
  • There is no need to vectorize the user’s query yourself, performing a nearest neighbor search and stuffing the gpt prompt with content; everything is handled by the underlying API

It will be interesting to see how it evolves further.