docker – baeke.info

Building an AI Agent Server with AG-UI and Microsoft Agent Framework

In this post, I want to talk about the Python backend I built for an AG-UI demo project. The backend is an AG-UI server built with Microsoft Agent Framework. It is part of a larger project that also includes a frontend that uses CopilotKit.

All code is on GitHub: https://github.com/gbaeke/agui. Most of the code for this demo was written with GitHub Copilot with the help of Microsoft Docs MCP and Context7. 🤷

What is AG-UI?

Before we dive into the code, let’s talk about AG-UI. AG-UI is a standardized protocol for building AI agent interfaces. Think of it as a common language that lets your frontend talk to any backend agent that supports it, no matter what technology you use.

The protocol gives you some nice features out of the box:

Remote Agent Hosting: deploy your agents as web services (e.g. FastAPI)
Real-time Streaming: stream responses using Server-Sent Events (SSE)
Standardized Communication: consistent message format for reliable interactions (e.g. tool started, tool arguments, tool end, …)
Thread Management: keep conversation context across multiple requests

Why does this matter? Well, without a standard like AG-UI, every frontend needs custom code to talk to different backends. With AG-UI, you build your frontend once and it works with any AG-UI compatible backend. The same goes for backends – build it once and any AG-UI client can use it.

Under the hood, AG-UI uses simple HTTP POST requests for sending messages and Server-Sent Events (SSE) for streaming responses back. It’s not complicated, but it’s standardized. And that’s the point.
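To make the transport a bit more concrete, here is a minimal sketch of how a client could decode an SSE stream into events. It assumes every event arrives as a data: line that holds a JSON object with a type field, and that a blank line ends the event. The sample payloads are made up for illustration and were not captured from the demo server.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON event per SSE message."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("data:"):
            # A message may spread its payload over several data: lines
            data_parts.append(line[5:].removeprefix(" "))
        elif line == "" and data_parts:
            # A blank line terminates the message
            yield json.loads("\n".join(data_parts))
            data_parts = []
    if data_parts:  # stream ended without a trailing blank line
        yield json.loads("\n".join(data_parts))


# Illustrative stream; the payloads are invented for this example
sample = [
    'data: {"type": "RUN_STARTED"}',
    "",
    'data: {"type": "TEXT_MESSAGE_CONTENT", "delta": "Hello"}',
    "",
    'data: {"type": "RUN_FINISHED"}',
    "",
]
events = list(parse_sse(sample))
print([e["type"] for e in events])
```

A real client would feed this function the lines of the HTTP response body as they arrive instead of a list.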

AG-UI has many more features than the ones discussed in this post. Check https://docs.ag-ui.com/introduction for the full picture.

Microsoft Agent Framework

Now, you could implement AG-UI from scratch but that’s a lot of work. This is where Microsoft Agent Framework comes in. It’s a Python (and C#) framework that makes building AI agents really easy.

The framework handles the heavy lifting when it comes to agent building:

Managing chat with LLMs like Azure OpenAI
Function calling (tools)
Streaming responses
Multi-turn conversations
And a lot more

The key concept is the ChatAgent. You give it:

A chat client (like Azure OpenAI)
Instructions (the system prompt)
Tools (functions the agent can call)

And you’re done. The agent knows how to talk to the LLM, when to call tools, and how to stream responses back.

What’s nice about Agent Framework is that it integrates with AG-UI out of the box, similar to other frameworks like LangGraph, Google ADK and others. You write your agent code and expose it via AG-UI with basically one line of code. The framework translates everything automatically – your agent’s responses become AG-UI events, tool calls get streamed correctly, etc…

The integration with Microsoft Agent Framework was announced on the blog of CopilotKit, the team behind AG-UI. The blog included the diagram below to illustrate the capabilities:

From https://www.copilotkit.ai/blog/microsoft-agent-framework-is-now-ag-ui-compatible

The Code

Let’s look at how this actually works in practice. The code is pretty simple. Most of the code is Microsoft Agent Framework code. AG-UI gets exposed with one line of code.

The Server (server.py)

The main server file is really short:

import uvicorn
from api import app
from config import SERVER_HOST, SERVER_PORT

def main():
    print(f"🚀 Starting AG-UI server at http://{SERVER_HOST}:{SERVER_PORT}")
    uvicorn.run(app, host=SERVER_HOST, port=SERVER_PORT)

if __name__ == "__main__":
    main()

That’s it. We run a FastAPI server on port 8888. The interesting part is in api/app.py:

from fastapi import FastAPI
from agent_framework.ag_ui.fastapi import add_agent_framework_fastapi_endpoint
from agents.main_agent import agent

app = FastAPI(title="AG-UI Demo Server")

# This single line exposes your agent via AG-UI protocol
add_agent_framework_fastapi_endpoint(app, agent, "/")

See that add_agent_framework_fastapi_endpoint() call? That’s all you need. This function from Agent Framework takes your agent and exposes it as an AG-UI endpoint. It handles HTTP requests, SSE streaming, protocol translation – everything.

You just pass in your FastAPI app, your agent, and the route path. Done.

The Main Agent (agents/main_agent.py)

Here’s where we define the actual agent with standard Microsoft Agent Framework abstractions:

from agent_framework import ChatAgent
from agent_framework.azure import AzureOpenAIChatClient
from azure.identity import DefaultAzureCredential
from config import AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT_NAME
from tools import get_weather, get_current_time, calculate, bedtime_story_tool

# Create Azure OpenAI chat client
chat_client = AzureOpenAIChatClient(
    credential=DefaultAzureCredential(),
    endpoint=AZURE_OPENAI_ENDPOINT,
    deployment_name=AZURE_OPENAI_DEPLOYMENT_NAME,
)

# Create the AI agent with tools
agent = ChatAgent(
    name="AGUIAssistant",
    instructions="You are a helpful assistant with access to tools...",
    chat_client=chat_client,
    tools=[get_weather, get_current_time, calculate, bedtime_story_tool],
)

This is the heart of the backend. We create a ChatAgent with:

A name: “AGUIAssistant”
Instructions: the system prompt that tells the agent how to behave
A chat client: AzureOpenAIChatClient that handles communication with Azure OpenAI
Tools: a list of functions the agent can call

The code implements a few toy tools and a sub-agent to illustrate how AG-UI handles tool calls. The tools are discussed below:

The Tools (tools/)

In Agent Framework, tools can be Python functions with a decorator:

from agent_framework import ai_function
import httpx
import json

@ai_function(description="Get the current weather for a location")
def get_weather(location: str) -> str:
    """Get real weather information for a location using Open-Meteo API."""
    # Step 1: Geocode the location
    geocode_url = "https://geocoding-api.open-meteo.com/v1/search"
    # ... make HTTP request ...
    
    # Step 2: Get weather data
    weather_url = "https://api.open-meteo.com/v1/forecast"
    # ... make HTTP request ...
    
    # Return JSON string
    return json.dumps({
        "location": resolved_name,
        "temperature": current["temperature_2m"],
        "condition": condition,
        # ...
    })

The @ai_function decorator tells Agent Framework “this is a tool the LLM can use”. The framework automatically:

Generates a schema from the function signature
Makes it available to the LLM
Handles calling the function when needed
Passes the result back to the LLM

You just write normal Python code. The function takes typed parameters (location: str) and returns a string. Agent Framework does the rest.
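If you wonder how a schema can come out of a plain function signature, the sketch below shows the general idea with nothing but the standard library. This is not how Agent Framework implements it. The describe_tool helper and the JSON_TYPES mapping are hypothetical names I made up for this example.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python types to JSON Schema type names
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func) -> dict:
    """Build a JSON-Schema-like description from a function signature."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string"
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the argument is required
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(location: str, units: str = "metric") -> str:
    """Get the current weather for a location."""
    return "{}"


schema = describe_tool(get_weather)
print(schema["parameters"])
```

The typed parameters are what make this possible: without the location: str annotation there would be nothing to tell the LLM what kind of value to pass.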

The weather tool calls the Open-Meteo API to get real weather data. In an AG-UI compatible client, you can intercept the tool result and visualize it any way you want before the LLM generates an answer from the tool result:

Above, when the user asks for weather information, AG-UI events inform the client that a tool call has started and ended. It also streams the tool result back to the client which uses a custom component to render the information. This happens before the chat client generates the answer based on the tool result.

The Subagent (tools/storyteller.py)

This is where it gets interesting. In Agent Framework, a ChatAgent can become a tool with .as_tool():

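from agent_framework import ChatAgent
from agent_framework.azure import AzureOpenAIChatClient
from azure.identity import DefaultAzureCredential
from config import AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT_NAME

# The subagent gets its own chat client, created the same way as in main_agent.py
chat_client = AzureOpenAIChatClient(
    credential=DefaultAzureCredential(),
    endpoint=AZURE_OPENAI_ENDPOINT,
    deployment_name=AZURE_OPENAI_DEPLOYMENT_NAME,
)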

# Create a specialized agent for bedtime stories
bedtime_story_agent = ChatAgent(
    name="BedTimeStoryTeller",
    description="A creative storyteller that writes engaging bedtime stories",
    instructions="""You are a gentle and creative bedtime story teller.
When given a topic, create a short, soothing bedtime story for children.
Your stories should be 3-5 paragraphs long, calming, and end peacefully.""",
    chat_client=chat_client,
)

# Convert the agent to a tool
bedtime_story_tool = bedtime_story_agent.as_tool(
    name="tell_bedtime_story",
    description="Generate a calming bedtime story based on a theme",
    arg_name="theme",
    arg_description="The theme for the story (e.g., 'a brave rabbit')",
)

This creates a subagent – another ChatAgent with different instructions. When the main agent needs to tell a bedtime story, it calls tell_bedtime_story which delegates to the subagent.

Why is this useful? Because you can give each agent specialized instructions. The main agent handles general questions and decides which tool to use. The storyteller agent focuses only on creating good stories. Clean separation of concerns.

The subagent has its own chat client and can have its own tools too if you want. It’s a full agent, just exposed as a tool.

And because it is a tool, you can render it with the standard AG-UI tool events:

Testing with a client

In src/backend there is a Python client client_raw.py. When you run that client against the server and invoke a tool, you will see something like below:

This client simply uses httpx to talk to the AG-UI server and inspects and renders the AG-UI events as they come in.
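The sketch below is not the code from client_raw.py. It only illustrates what such a client has to do with tool events: group them per tool call and join the streamed argument fragments. The event and field names (TOOL_CALL_START, toolCallId, toolCallName, delta) are my reading of the AG-UI docs, so treat them as assumptions and check them against the events your server actually emits.

```python
import json


def collect_tool_calls(events: list[dict]) -> dict[str, dict]:
    """Group streamed tool-call events by toolCallId."""
    calls: dict[str, dict] = {}
    for event in events:
        kind = event.get("type")
        if kind == "TOOL_CALL_START":
            calls[event["toolCallId"]] = {"name": event["toolCallName"], "args": "", "done": False}
        elif kind == "TOOL_CALL_ARGS":
            # Arguments arrive as string fragments that form JSON once joined
            calls[event["toolCallId"]]["args"] += event["delta"]
        elif kind == "TOOL_CALL_END":
            call = calls[event["toolCallId"]]
            call["args"] = json.loads(call["args"] or "{}")
            call["done"] = True
    return calls


# Illustrative events for a single weather tool call
events = [
    {"type": "TOOL_CALL_START", "toolCallId": "call_1", "toolCallName": "get_weather"},
    {"type": "TOOL_CALL_ARGS", "toolCallId": "call_1", "delta": '{"location": '},
    {"type": "TOOL_CALL_ARGS", "toolCallId": "call_1", "delta": '"Brussels"}'},
    {"type": "TOOL_CALL_END", "toolCallId": "call_1"},
]
calls = collect_tool_calls(events)
print(calls)
```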

Why This Works

Let me tell you what I like about this setup:

Separation of concerns: The frontend doesn’t know about Python, Azure OpenAI, or any backend details. It just speaks AG-UI. You could swap the backend for a C# implementation or something else entirely and the frontend wouldn’t care, apart from the handling of specific tool calls.

Standard protocol: Because we use AG-UI, any AG-UI client can talk to this backend. We use CopilotKit in the frontend but you could use anything that speaks AG-UI. Take the Python client as an example.

Framework handles complexity: Streaming, tool calls, conversation history, protocol translation – Agent Framework does all of this. You just write business logic.

Easy to extend: Want a new tool? Write a function with @ai_function. Want a specialized agent? Create a ChatAgent and call .as_tool(). That’s it.

The AG-UI documentation explains that the protocol supports 7 different features including human-in-the-loop, generative UI, and shared state. Our simple backend gets all of these capabilities because Agent Framework implements the protocol.

Note that there are many more capabilities. Check the AG-UI interactive Dojo to find out: https://dojo.ag-ui.com/microsoft-agent-framework-python

Wrap Up

This is a simple but powerful pattern for building AI agent backends. You write minimal code and get a lot of functionality. AG-UI gives you a standard way to expose your agent, and Microsoft Agent Framework handles the implementation details.

If you want to try this yourself, the code is in the repo. You’ll need an Azure OpenAI deployment and you’ll have to follow the OAuth setup. After that, just run the code as instructed in the repo README!

The beauty is in the simplicity. Sometimes the best code is the code you don’t have to write.

Load balancing OpenAI API calls with LiteLLM

If you have ever created an application that makes calls to Azure OpenAI models, you know there are limits to the amount of calls you can make per minute. Take a look at the settings of a GPT model below:

Above, the tokens per minute (TPM) rate limit is set to 60 000 tokens. This translates to about 360 requests per minute. When you exceed these limits, you get 429 Too Many Requests errors.
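The arithmetic behind that number is simple. The sketch below assumes the ratio of 6 requests per minute for every 1000 tokens per minute that Azure OpenAI applied to deployments like this one. The function name is made up for this example.

```python
def rpm_from_tpm(tpm: int, rpm_per_1000_tpm: int = 6) -> int:
    """Derive the requests-per-minute limit from a tokens-per-minute quota."""
    return tpm // 1000 * rpm_per_1000_tpm


print(rpm_from_tpm(60_000))  # 360
print(rpm_from_tpm(50_000))  # 300
```

Under the same ratio, a 50 000 TPM deployment would be limited to 300 requests per minute.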

There are many ways to deal with these limits. A few of the main ones are listed below:

You can ask for a PAYGO quota increase: remember that high quotas do not necessarily lead to consistent lower-latency responses
You can use PTUs (provisioned throughput units): highly recommended if you want consistently quick responses with the lowest latency. Don’t we all? 😉
Your application can use retries with backoffs. Note that OpenAI libraries use automatic retries by default. For Python, it is set to two but that is configurable.
You can use multiple Azure OpenAI instances and load balance between them
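To illustrate the retry option, here is a minimal sketch of retries with exponential backoff and a bit of jitter. It is not what the OpenAI library does internally. RateLimitError is a stand-in exception defined in the example itself, and the delay values are arbitrary.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for the HTTP 429 error raised by an API client."""


def call_with_backoff(func, max_retries=2, base_delay=0.5, sleep=time.sleep):
    """Call func, retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


# Usage: a fake call that fails twice before it succeeds
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


# The sleep function is replaced so the example finishes immediately
result = call_with_backoff(flaky_call, sleep=lambda seconds: None)
print(result, attempts["count"])
```

With max_retries=2 there are at most three attempts in total, which matches the default of two retries mentioned above.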

In this post, we will take a look at implementing load balancing between OpenAI resources with an open source solution called LiteLLM. Note that, in Azure, you can also use Azure API Management. One example is discussed here. Use it if you must but know it is not simple to configure.

A look at LiteLLM

LiteLLM has many features. In this post, I will be implementing it as a standalone proxy, running as a container in Azure Kubernetes Service (AKS). The proxy is part of a larger application illustrated in the diagram below:

The application above has an upload service that allows users to upload a PDF or other supported document. After storing the document in an Azure Storage Account container, the upload service sends a message to an Azure Service Bus topic. The process service uses those messages to process each file. One part of the process is the use of Azure OpenAI to extract fields from the document. For example, a supplier, document number or anything else.

To support the processing of many documents, multiple Azure OpenAI resources are used: one in France and one in Sweden. Both regions have the gpt-4-turbo model that we require.

The process service uses the Python OpenAI library in combination with the instructor library. Instructor is great for getting structured output from documents based on Pydantic classes. Below is a snippet of code:

from openai import OpenAI
import instructor

client = instructor.from_openai(OpenAI(
        base_url=azure_openai_endpoint,
        api_key=azure_openai_key
))

The only thing we need to do is to set the base_url to the LiteLLM proxy. The api_key is configurable. By default it is empty but you can configure a master key or even virtual keys for different teams and report on the use of these keys. More about that later.

The key point here is that LiteLLM is a transparent proxy that fully supports the OpenAI API. Your code does not have to change. The actual LLM does not have to be an OpenAI LLM. It can be Gemini, Claude and many others.

Let’s take a look at deploying the proxy in AKS.

Deploying LiteLLM on Kubernetes

Before deploying LiteLLM, we need to configure it via a config file. In true Kubernetes style, let’s do that with a ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-config-file
data:
  config.yaml: |
      model_list: 
        - model_name: gpt-4-preview
          litellm_params:
            model: azure/gpt-4-preview
            api_base: os.environ/SWE_AZURE_OPENAI_ENDPOINT
            api_key: os.environ/SWE_AZURE_OPENAI_KEY
          rpm: 300
        - model_name: gpt-4-preview
          litellm_params:
            model: azure/gpt-4-preview
            api_base: os.environ/FRA_AZURE_OPENAI_ENDPOINT
            api_key: os.environ/FRA_AZURE_OPENAI_KEY
          rpm: 360
      router_settings:
        routing_strategy: least-busy
        num_retries: 2
        timeout: 60
        redis_host: redis
        redis_password: os.environ/REDIS_PASSWORD
        redis_port: 6379
      general_settings:
        master_key: os.environ/MASTER_KEY

The configuration contains a list of models. Above, there are two models with the same name: gpt-4-preview. Each model points to a deployed model in Azure with the same name (can be different) and its own API base and key. For example, the first model uses the API base and API key for my instance in Sweden. Instead of hard-coding those values, you prefix the name of an environment variable with os.environ/ to tell LiteLLM to read the value from that variable. Of course, that means we have to set these environment variables in the LiteLLM container. We will do that later.
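The os.environ/ convention is easy to picture with a few lines of Python. This is only a sketch of the idea and not LiteLLM's own code. The resolve function, the fake endpoint and the fake key are all made up for this example.

```python
import os

PREFIX = "os.environ/"


def resolve(value, env=os.environ):
    """Replace 'os.environ/NAME' strings with the value of NAME; recurse into dicts and lists."""
    if isinstance(value, dict):
        return {k: resolve(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, env) for v in value]
    if isinstance(value, str) and value.startswith(PREFIX):
        return env[value[len(PREFIX):]]  # KeyError if the variable is not set
    return value


params = {
    "model": "azure/gpt-4-preview",
    "api_base": "os.environ/SWE_AZURE_OPENAI_ENDPOINT",
    "api_key": "os.environ/SWE_AZURE_OPENAI_KEY",
}
# A fake environment so the example does not depend on real variables
fake_env = {
    "SWE_AZURE_OPENAI_ENDPOINT": "https://example-swe.openai.azure.com",
    "SWE_AZURE_OPENAI_KEY": "not-a-real-key",
}
resolved = resolve(params, fake_env)
print(resolved["api_base"])
```

The nice part of this approach is that the ConfigMap never contains a secret. It only contains the names of the variables that hold them.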

When the code in the process service uses the gpt-4-preview model via the proxy, the proxy will perform load balancing based on the router settings.

To spin up more than one instance of LiteLLM, a Redis instance is required. Redis is used to share information between the instances to make routing decisions. The routing strategy is set to least-busy.

Note that num_retries is set to 2. You can turn off retries in your code and let the proxy handle this for you.

To support mounting the secrets as environment variables, I use a .env file in combination with a secretGenerator in Kustomize:

STORAGE_CONNECTION_STRING=<placeholder for storage connection string>
CONTAINER=<placeholder for container name>
AZURE_AI_ENDPOINT=<placeholder for Azure AI endpoint>
AZURE_AI_KEY=<placeholder for Azure AI key>
AZURE_OPENAI_ENDPOINT=<placeholder for Azure OpenAI endpoint>
AZURE_OPENAI_KEY=<placeholder for Azure OpenAI key>

LLM_LITE_SWE_AZURE_OPENAI_ENDPOINT=<placeholder for LLM Lite SWE Azure OpenAI endpoint>
LLM_LITE_SWE_AZURE_OPENAI_KEY=<placeholder for LLM Lite SWE Azure OpenAI key>

LLM_LITE_FRA_AZURE_OPENAI_ENDPOINT=<placeholder for LLM Lite FRA Azure OpenAI endpoint>
LLM_LITE_FRA_AZURE_OPENAI_KEY=<placeholder for LLM Lite FRA Azure OpenAI key>

TOPIC_KEY=<placeholder for topic key>
TOPIC_ENDPOINT=<placeholder for topic endpoint>
PUBSUB_NAME=<placeholder for pubsub name>
TOPIC_NAME=<placeholder for topic name>
SB_CONNECTION_STRING=<placeholder for Service Bus connection string>

REDIS_PASSWORD=<placeholder for Redis password>
MASTER_KEY=<placeholder for LiteLLM master key>

POSTGRES_DB_URL=postgresql://USER:PASSWORD@SERVERNAME-pg.postgres.database.azure.com:5432/postgres

There are many secrets here. Some are for LiteLLM, although they are weirdly prefixed with LLM_LITE instead of LITELLM. I do that sometimes! The others are to support the upload and process services.

To get these values into secrets, I use the following kustomization.yaml:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: inv-demo


resources:
- namespace.yaml
- pubsub.yaml
- upload.yaml
- process.yaml
- llmproxy.yaml
- redis.yaml

secretGenerator:
- name: invoices-secrets
  envs:
  - .env
  
generatorOptions:
  disableNameSuffixHash: true

The secretGenerator will create a secret called invoices-secrets in the inv-demo namespace. We can reference the secrets in the LiteLLM Kubernetes deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-deployment
  labels:
    app: litellm
spec:
  replicas: 2
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      containers:
      - name: litellm
        image: ghcr.io/berriai/litellm:main-latest
        args:
        - "--config"
        - "/app/proxy_server_config.yaml"
        ports:
        - containerPort: 4000
        volumeMounts:
        - name: config-volume
          mountPath: /app/proxy_server_config.yaml
          subPath: config.yaml
        env:
        - name: SWE_AZURE_OPENAI_ENDPOINT
          valueFrom:
            secretKeyRef:
              name: invoices-secrets
              key: LLM_LITE_SWE_AZURE_OPENAI_ENDPOINT
        - name: SWE_AZURE_OPENAI_KEY
          valueFrom:
            secretKeyRef:
              name: invoices-secrets
              key: LLM_LITE_SWE_AZURE_OPENAI_KEY
        - name: FRA_AZURE_OPENAI_ENDPOINT
          valueFrom:
            secretKeyRef:
              name: invoices-secrets
              key: LLM_LITE_FRA_AZURE_OPENAI_ENDPOINT
        - name: FRA_AZURE_OPENAI_KEY
          valueFrom:
            secretKeyRef:
              name: invoices-secrets
              key: LLM_LITE_FRA_AZURE_OPENAI_KEY
        - name: MASTER_KEY
          valueFrom:
            secretKeyRef:
              name: invoices-secrets
              key: MASTER_KEY
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: invoices-secrets
              key: POSTGRES_DB_URL
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: invoices-secrets
              key: REDIS_PASSWORD
      volumes:
        - name: config-volume
          configMap:
            name: litellm-config-file

The ConfigMap content is mounted as /app/proxy_server_config.yaml. You need to specify the config file via the --config parameter, supplied in args.

Next, we simply mount all the environment variables that we need. The LiteLLM ConfigMap uses several of those via the os.environ references. There is also a DATABASE_URL that is not mentioned in the ConfigMap. The URL points to a PostgreSQL instance in Azure where information is kept to support the LiteLLM dashboard and other settings. If you do not want the dashboard feature, you can omit the database URL.

There’s one last thing: the process service needs to connect to LiteLLM via Kubernetes internal networking. Of course, that means we need a service:

apiVersion: v1
kind: Service
metadata:
  name: litellm-service
spec:
  selector:
    app: litellm
  ports:
  - protocol: TCP
    port: 80
    targetPort: 4000
  type: ClusterIP

With this service definition, the process service can set the OpenAI base URL as http://litellm-service to route all requests to the proxy via its internal IP address.
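To show what a call through the service looks like without the OpenAI library, the sketch below builds an OpenAI-style chat completion request with only the standard library. It builds the request but does not send it. The key and the prompt are placeholders, and build_chat_request is a name I made up.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("http://litellm-service", "sk-placeholder", "gpt-4-preview", "Extract the supplier")
print(req.get_method(), req.full_url)
# From a pod in the same namespace, urllib.request.urlopen(req) would send it
```

The short service name only resolves from within the same namespace. From another namespace you would use the fully qualified name of the service instead.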

As you can probably tell from the kustomization.yaml file, the ConfigMap, Deployment and Service are in llmproxy.yaml. The other YAML files do the following:

namespace.yaml: creates the inv-demo namespace
upload.yaml: deploys the upload service (written in Python with FastAPI, 1 replica)
process.yaml: deploys the process service (written in Python as a Dapr gRPC service, 2 replicas)
pubsub.yaml: creates a Dapr pubsub component that uses Azure Service Bus
redis.yaml: creates a standalone Redis instance to support multiple replicas of the LiteLLM proxy

To deploy all of the above, you just need to run the command below:

kubectl apply -k .

⚠️ Although this can be used in production, several shortcuts are taken. One thing that would be different is secrets management. Secrets would be in a Key Vault and made available to applications via the Secret Store CSI driver or other solutions.

With everything deployed, I see the following in k9s:

As a side note, I also use Diagrid to provide insights about the use of Dapr on the cluster:

upload and process are communicating via pubsub (Service Bus)

Dapr is only used between process and upload. The other services do not use Dapr and, as a result, are not visible here. The above is from Diagrid Conductor Free. As I said…. total side note! 🤷‍♂️

Back to the main topic…

The proxy in action

Let’s see if the proxy uses both Azure OpenAI instances. The dashboard below presents a view of the metrics after processing several documents:

It’s clear that the proxy uses both resources. Remember that this is the least-busy routing option. It picks the deployment with the least number of ongoing calls. Both these instances are only used by the process service so the expectation is a more or less even distribution.
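The idea behind least-busy routing fits in a few lines. The sketch below is not LiteLLM's implementation. It keeps the counters in memory, while the proxy shares that kind of state between replicas via Redis. The Deployment class and pick_least_busy are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    in_flight: int = 0  # calls currently being processed


def pick_least_busy(deployments: list[Deployment]) -> Deployment:
    """Return the deployment with the fewest ongoing calls (the first one wins ties)."""
    return min(deployments, key=lambda d: d.in_flight)


sweden = Deployment("sweden")
france = Deployment("france")
pool = [sweden, france]

chosen = []
for _ in range(4):
    target = pick_least_busy(pool)
    target.in_flight += 1  # the call starts; a real router decrements this when it completes
    chosen.append(target.name)

print(chosen)  # ['sweden', 'france', 'sweden', 'france']
```

When calls take roughly the same time on both resources, this alternates between them, which is why I expect a more or less even split.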

LiteLLM Dashboard

If you configured authentication and provided the URL of a PostgreSQL database, you can access the dashboard. To see the dashboard in action without deploying it, see https://litellm.vercel.app/docs/proxy/demo.

One of the things you can do is create teams. Below, you see a team called dev which has access to only the gpt-4-preview model with unlimited TPM and RPM:

In addition to the team, a virtual key is created and assigned to the team. This virtual key starts with sk- and is used as the OpenAI API key in the process service:

We can now report on the use of OpenAI by the dev team:

Above, there’s a small section that’s unassigned because I used LiteLLM without a key, and then with the master key, before switching to a team-based key.

The idea here is that you can deploy the LiteLLM proxy centrally and hand out virtual keys to teams so they can all access their models via the proxy. We have not tested this in a production setting yet but it is certainly something worth exploring.

Conclusion

I have only scratched the surface of LiteLLM here but my experience with it so far is pretty good. If you want to deploy it as a central proxy server that developers can use to access models, deployment to Kubernetes and other environments with the container image is straightforward.

In this post I used Kubernetes but that is not required. It runs in Container Apps and other container runtimes as well. In fact, you do not need to run it in a container at all. It also works as a standalone application or can be used directly in your Python apps.

There is much more to explore but for now, if you need a transparent OpenAI-based proxy that works with many different models, take a look at LiteLLM.

Giving Microsoft’s Radius a spin

Microsoft recently announced Radius. As stated in their inaugural blog post, it is “a tool to describe, deploy, and manage your entire application”. With Radius, you describe your application in a Bicep file. This can include containers, databases, the connections between those and much more. Radius is an open-source solution started from within Microsoft. The name is somewhat confusing because of RADIUS, a network authentication solution developed in 1991!

Starting point: app running locally

Instead of talking about it, let’s start with an application that runs locally on a development workstation and uses Dapr:

The ui is a Flask app that presents a text area and a button. When the user clicks the button, the code that handles the event calls the api using Dapr invoke. If you do not know what Dapr is, have a look at docs.dapr.io. The api saves the user’s question and a fake response to Redis. If Redis cannot be found, the api will simply log it could not save the data. The response is returned to the ui.

To run the application with Dapr on a development machine, I use a dapr.yaml file in combination with dapr run -f . See multi-app run for more details.

Here’s the yaml file:

version: 1
apps:
  - appID: ui
    appDirPath: ./ui
    appPort: 8001
    daprHTTPPort: 3510
    env:
      DAPR_APP: api
    command: ["python3","app.py"]
  - appID: api
    appDirPath: ./api
    appPort: 8000
    daprHTTPPort: 3511
    env:
      REDIS_HOST: localhost
      REDIS_PORT: 6379
      REDIS_DB: 0
    command: ["python3","app.py"]

Note that the api needs a couple of environment variables to find the Redis instance. The ui needs one environment variable DAPR_APP that holds the Dapr appId of the api. The Dapr invoke call needs this appId in order to find the api on the network.

In Python, the Dapr invoke call looks like this:

with DaprClient() as d:
        log.info(f"Making call to {dapr_app}")
        resp = d.invoke_method(dapr_app, 'generate', data=bytes_data,
                                 http_verb='POST', content_type='application/json')
        log.info(f"Response from API: {resp}")
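Under the hood, an invoke call like this goes to the local Dapr sidecar. As a rough sketch, the same request over Dapr's HTTP API looks like the code below. It builds the request without sending it. Port 3510 is the ui's daprHTTPPort from dapr.yaml, and the question field in the payload is a made-up example.

```python
import json
import urllib.request


def build_invoke_request(app_id: str, method: str, payload: dict,
                         dapr_http_port: int = 3510) -> urllib.request.Request:
    """Build (but do not send) a Dapr service invocation request for the local sidecar."""
    url = f"http://localhost:{dapr_http_port}/v1.0/invoke/{app_id}/method/{method}"
    return urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_invoke_request("api", "generate", {"question": "What is Radius?"})
print(req.full_url)
```

The appId in the URL is how the sidecar finds the api, which is why the ui needs the DAPR_APP environment variable.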

The app runs fine locally if you have Python and the dependencies as listed in both the ui’s and api’s requirements.txt file. Let’s try to deploy the app with Radius.

Deploying the app with Radius

Before we can deploy the app with Radius, you need to install a couple of things:

rad CLI: I installed the CLI on macOS; see the installation instructions for more details
VS Code extension: Radius uses a forked version of Bicep that is older than the current version of Bicep. The two will eventually converge but for now, you need to disable the official Bicep extension in VS Code and install the Radius Bicep extension. This is needed to support code like import radius as radius, which is not supported in the current version of Bicep.
Kubernetes cluster: Radius uses Kubernetes and requires the installation of the Radius control plane on that cluster. I deployed a test & dev AKS cluster in Azure and ensured it was set as my current context. Use kubectl config current-context to check that.
Install Dapr: our app uses Dapr and Radius supports it; however, Dapr needs to be manually installed on the cluster; if you have Dapr on your local machine, run dapr init -k to install it on Kubernetes

Now you can clone my raddemo repo. Use git clone https://github.com/gbaeke/raddemo.git. In the raddemo folder, you will see two folders: api and ui. In the root folder, run the following command:

rad init

Select Yes to use the current folder.

Running rad init does the following:

Installs Radius to the cluster in the radius-system namespace
Creates a new environment and workspace (called default)
Sets up a local-dev recipe pack: recipes allow you to install resources your app needs like Redis, MySQL, etc…

After installation, this is the view on the radius-system Kubernetes namespace with k9s:

There should also be a .rad folder with a rad.yaml file:

workspace:
  application: "raddemo"

The file defines a workspace with our application name raddemo. raddemo is the name of the folder where I ran rad init. You can have multiple workspaces defined with one selected as the default. For instance, you could have a dev and prod workspace where each workspace uses a different Kubernetes cluster and environment. The default could be set to dev but you can easily switch to prod using the rad CLI. Check this overview of workspaces for more information. I am going to work with just one workspace called default, which uses an environment called default. When you just run rad init, those are the defaults.

You also get a default app.bicep file:

import radius as radius
param application string

resource demo 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'demo'
  properties: {
    application: application
    container: {
      image: 'radius.azurecr.io/samples/demo:latest'
      ports: {
        web: {
          containerPort: 3000
        }
      }
    }
  }
}

This is deployable code. If you run rad run app.bicep, a Kubernetes pod will be deployed to your cluster, using the image above. Radius will also set up port forwarding to access the app on its containerPort (3000).

We will change this file to deploy the ui. We will remove the application parameter and define our own application. That application needs an environment which we will pass in via a parameter:

import radius as radius

@description('Specifies the environment for resources.')
param environment string

resource app 'Applications.Core/applications@2023-10-01-preview' = {
  name: 'raddemo'
  properties: {
    environment: environment
  }
}

resource ui 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'ui'
  properties: {
    application: app.id
    container: {
      image: 'gbaeke/radius-ui:latest'
      ports: {
        web: {
          containerPort: 8001
        }
      }
    }
    extensions: [
      {
        kind: 'daprSidecar'
        appId: 'ui'
      }
    ]
  }
}

Above, we define the following:

a resource of type Applications.Core/applications: because applications run on Kubernetes, you can use a different namespace than the default and also set labels and annotations. All labels and annotations would be set on all resources belonging to the app, such as containers
the app resource needs an environment: the environment parameter is defined in the Bicep file and is set automatically by the rad CLI; it will match the environment used by your current workspace; environments can also have cloud credentials attached to deploy resources in Azure or AWS; we are not using that here
a resource of type Applications.Core/containers: this will create a pod in a Kubernetes namespace; the container belongs to the app we defined (application property) and uses the image gbaeke/radius-ui:latest on Docker Hub. Radius supports Dapr via extensions. The Dapr sidecar is added via these extensions with the app Id of ui.

In Kubernetes, this results in a pod with two containers: the ui container and the Dapr sidecar.

ui and Dapr sidecar

When you run rad run app.bicep, you should see the resources in namespace default-raddemo. The logs of all containers should stream to your console and local port 8001 should be mapped to the pod’s port 8001. http://localhost:8001 should show:

We will end this post by also deploying the api. It also needs Dapr and we need to update the definition of the ui container by adding an environment variable:

import radius as radius

@description('Specifies the environment for resources.')
param environment string

resource app 'Applications.Core/applications@2023-10-01-preview' = {
  name: 'raddemo'
  properties: {
    environment: environment
  }
}

resource ui 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'ui'
  properties: {
    application: app.id
    container: {
      image: 'gbaeke/radius-ui:latest'
      ports: {
        web: {
          containerPort: 8001
        }
      }
      env: {
        DAPR_APP: api.name  // api name is the same as the Dapr app id here
      }
    }
    extensions: [
      {
        kind: 'daprSidecar'
        appId: 'ui'
      }
    ]
  }
}

resource api 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'api'
  properties: {
    application: app.id
    container: {
      image: 'gbaeke/radius-api:latest'
      ports: {
        web: {
          containerPort: 8000
        }
      }
    }
    extensions: [
      {
        kind: 'daprSidecar'
        appId: 'api'
        appPort: 8000
      }
    ]
  }
}

Above, we added the api container, enabled Dapr, and set the Dapr appId to api. In the ui, we set environment variable DAPR_APP to api.name. We can do this because the name of the api resource is the same as the appId. This also makes Radius deploy the api before the ui. Note that the api does not have Redis environment variables. It will default to finding Redis at localhost, which will fail. But that’s ok.
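
How the ui container uses DAPR_APP is not shown in this post. As a minimal sketch, assuming the ui calls the api through the HTTP service invocation API of its local Dapr sidecar, the URL could be built like this. The method name generate is hypothetical and only there for illustration:

```python
import os


def dapr_invoke_url(method: str) -> str:
    """Build the sidecar URL for calling `method` on the app named in DAPR_APP."""
    # DAPR_APP is set in the Bicep file above; the "api" fallback is an assumption
    app_id = os.getenv("DAPR_APP", "api")
    # the Dapr sidecar sets DAPR_HTTP_PORT; 3500 is Dapr's default HTTP port
    port = os.getenv("DAPR_HTTP_PORT", "3500")
    return f"http://localhost:{port}/v1.0/invoke/{app_id}/method/{method.lstrip('/')}"


if __name__ == "__main__":
    # "generate" is a hypothetical method name
    print(dapr_invoke_url("generate"))
```

The ui never needs to know the address of the api pod. It only talks to its own sidecar on localhost, and Dapr resolves the app Id.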

You now have two pods in your namespace:

Yes, there are three here but Redis will be added in a later post.

Note that instead of running rad run app.bicep, you can also run rad deploy app.bicep. The latter simply deploys the application. It does not forward ports or stream logs.

Summary

In this post, we touched on the basics of using Radius to deploy an application that uses Dapr. Under the hood, Radius uses Kubernetes to deploy container resources specified in the Bicep file. To run the application, simply run rad run app.bicep to deploy the app, stream all logs and set up port forwarding.

We barely scratched the surface here, so in the next post we will add Redis via a recipe and make the application available publicly via a gateway. Stay tuned!

Enhancing Semantic Search with a Streamlit UI

In a previous blog post, we discussed two Python programs, upload_vectors.py and search_vectors.py. These programs were used to create and search vectors, respectively. The upload_vectors.py script created vectors from chunks of a larger text and stored them in Pinecone, while the search_vectors.py script enabled semantic search on the text. In this blog post, we will discuss how to create a user interface (UI) for these two programs using Streamlit.

🚀 I kickstarted the Streamlit app by handing over the text-based version to ChatGPT and asking it to work its magic ✨💻. Yes, it was that easy! Afterwards, I made several manual changes to make it look the way I wanted.

Pinecone, Vectors, Embeddings, and Semantic Search: What’s all that about?

Pinecone is a vector database service that allows for easy storage and retrieval of high-dimensional vectors. It is optimized for similarity search, which makes it a perfect fit for tasks like semantic search. Our script stores vectors in Pinecone by parsing an RSS feed, chunking the blog posts, and creating the vectors with OpenAI’s embedding APIs.

Vectors are mathematical representations of data in the form of an array of numbers. In our case, we use vectors to represent chunks of text retrieved from blog posts. These vectors are generated using a process called embedding, which is a way of representing complex data, like text, in a lower-dimensional space while preserving the essential information.

Semantic search is a type of search that goes beyond keyword matching to understand the meaning and context of the query. By using vector embeddings, we can compare the similarity between queries and stored texts to find the most relevant results. Pinecone does that search for us and simply returns a number of matching chunks (pieces of text).
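
To make the idea of comparing vectors a bit more concrete, here is a toy sketch in plain Python. It assumes cosine similarity as the metric (a common choice; the metric Pinecone really uses depends on how the index was created) and uses made-up three-dimensional vectors instead of real embeddings, which are much longer:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=2):
    """Return the k (score, text) pairs that are most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in chunks.items()]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    # made-up vectors; real embeddings come from the embedding API
    chunks = {
        "Pinecone stores vectors": [0.9, 0.1, 0.0],
        "Streamlit builds UIs": [0.0, 0.2, 0.9],
        "Vector search basics": [0.8, 0.3, 0.1],
    }
    for score, text in top_k([1.0, 0.2, 0.0], chunks):
        print(f"{score:.3f} {text}")
```

A vector database does the same kind of ranking, but over many vectors and with indexes that avoid comparing the query to every stored vector.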

What is Streamlit?

Streamlit is a Python library that makes it easy to create custom web apps for machine learning and data science projects. You can build interactive UIs with minimal code, allowing you to focus on the core logic of your application.

Here’s an example of creating an extremely simple Streamlit app:

import streamlit as st

st.title('Hello, Streamlit!')
st.write('This is a simple Streamlit app.')

This code would generate a web app with a title and a text output. You can also create more complex UIs with user input, like sliders, text inputs, and buttons.

Creating a Streamlit UI for Semantic Search

Now let’s examine the provided code for creating a Streamlit UI for the search_vectors.py program. The code can be broken down into the following sections:

Import necessary libraries and check environment variables.
Set up the tokenizer and define the tiktoken_len function.
Create the UI elements, including the title, text input, dropdown, sliders, and buttons.
Define the main search functionality that is triggered when the user clicks the “Search” button.

Here is the full code:

import os
import pinecone
import openai
import tiktoken
import streamlit as st

# check environment variables
if os.getenv('PINECONE_API_KEY') is None:
    st.error("PINECONE_API_KEY not set. Please set this environment variable and restart the app.")
if os.getenv('PINECONE_ENVIRONMENT') is None:
    st.error("PINECONE_ENVIRONMENT not set. Please set this environment variable and restart the app.")
if os.getenv('OPENAI_API_KEY') is None:
    st.error("OPENAI_API_KEY not set. Please set this environment variable and restart the app.")

# use cl100k_base tokenizer for gpt-3.5-turbo and gpt-4
tokenizer = tiktoken.get_encoding('cl100k_base')


def tiktoken_len(text):
    tokens = tokenizer.encode(
        text,
        disallowed_special=()
    )
    return len(tokens)

# create a title for the app
st.title("Search blog feed 🔎")

# create a text input for the user query
your_query = st.text_input("What would you like to know?")
model = st.selectbox("Model", ["gpt-3.5-turbo", "gpt-4"])

with st.expander("Options"):

    max_chunks = 5
    if model == "gpt-4":
        max_chunks = 15

    max_reply_tokens = 1250
    if model == "gpt-4":
        max_reply_tokens = 2000

    col1, col2 = st.columns(2)

    # sliders for the number of chunks and the temperature
    with col1:
        chunks = st.slider("Number of chunks", 1, max_chunks, 5)
        temperature = st.slider("Temperature", 0.0, 1.0, 0.0)

    with col2:
        reply_tokens = st.slider("Reply tokens", 750, max_reply_tokens, 750)
    

# create a submit button
if st.button("Search"):
    # get the Pinecone API key and environment
    pinecone_api = os.getenv('PINECONE_API_KEY')
    pinecone_env = os.getenv('PINECONE_ENVIRONMENT')

    pinecone.init(api_key=pinecone_api, environment=pinecone_env)

    # set index
    index = pinecone.Index('blog-index')


    # vectorize your query with openai
    try:
        query_vector = openai.Embedding.create(
            input=your_query,
            model="text-embedding-ada-002"
        )["data"][0]["embedding"]
    except Exception as e:
        st.error(f"Error calling OpenAI Embedding API: {e}")
        st.stop()

    # search for the most similar vector in Pinecone
    search_response = index.query(
        top_k=chunks,
        vector=query_vector,
        include_metadata=True)

    # create a list of urls from search_response['matches']['metadata']['url']
    urls = [item["metadata"]['url'] for item in search_response['matches']]

    # make urls unique
    urls = list(set(urls))

    # create a list of texts from search_response['matches']['metadata']['text']
    chunk_texts = [item["metadata"]['text'] for item in search_response['matches']]

    # combine texts into one string to insert in prompt
    all_chunks = "\n".join(chunk_texts)

    # show urls of the chunks
    with st.expander("URLs", expanded=True):
        for url in urls:
            st.markdown(f"* {url}")
    

    with st.expander("Chunks"):
        for i, t in enumerate(chunk_texts):
            # remove newlines from chunk
            tokens = tiktoken_len(t)
            t = t.replace("\n", " ")
            st.write("Chunk ", i, "(Tokens: ", tokens, ") - ", t[:50] + "...")
    with st.spinner("Summarizing..."):
        try:
            prompt = f"""Answer the following query based on the context below ---: {your_query}
                                                        Do not answer beyond this context!
                                                        ---
                                                        {all_chunks}"""


            # openai chatgpt with article as context
            # chat api is cheaper than gpt: 0.002 / 1000 tokens
            response = openai.ChatCompletion.create(
                model=model,
                messages=[
                    { "role": "system", "content":  "You are a truthful assistant!" },
                    { "role": "user", "content": prompt }
                ],
                temperature=temperature,
                max_tokens=reply_tokens  # value of the "Reply tokens" slider
            )

            st.markdown("### Answer:")
            st.write(response.choices[0]['message']['content'])

            with st.expander("More information"):
                st.write("Query: ", your_query)
                st.write("Full Response: ", response)

            with st.expander("Full Prompt"):
                st.write(prompt)

            st.balloons()
        except Exception as e:
            st.error(f"Error with OpenAI Completion: {e}")

A closer look

The code first imports the necessary libraries and checks if the required environment variables are set, displaying an error message if they are not. The libraries you need are in requirements.txt on GitHub. You can install them with:

pip3 install -r requirements.txt

ℹ️ I recommend using a Python virtual environment when you install these dependencies; see poetry (just one example)

The tiktoken_len function calculates the token length of a given text using the tokenizer. This is used to display the tokens of each chunk of text we send to the ChatCompletion API. Depending on the model, 4096 or 8192 tokens are supported.
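
The app itself only displays the token counts. As a rough sketch of what you could do with them, the helpers below check whether the chunks, the rest of the prompt and the reply fit in the context window, since prompt and reply share that limit. The function names and the prompt_overhead parameter are my own and not part of the app:

```python
def fits_context(chunk_tokens, prompt_overhead, reply_tokens, context_limit=4096):
    """True if chunks + rest of the prompt + reply fit in the context window."""
    return sum(chunk_tokens) + prompt_overhead + reply_tokens <= context_limit


def trim_to_budget(chunk_tokens, prompt_overhead, reply_tokens, context_limit=4096):
    """Keep chunks in their original order until the next one would exceed the limit."""
    budget = context_limit - prompt_overhead - reply_tokens
    kept, used = [], 0
    for count in chunk_tokens:
        if used + count > budget:
            break
        kept.append(count)
        used += count
    return kept


if __name__ == "__main__":
    counts = [400] * 15  # pretend every chunk is 400 tokens
    print(fits_context(counts, 50, 2000, context_limit=4096))
    print(len(trim_to_budget(counts, 50, 750, context_limit=4096)))
```

In the app, the counts would come from calling tiktoken_len on each chunk.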

The UI is built using Streamlit functions, such as st.title, st.text_input, st.selectbox, and st.columns. These functions create various UI elements that the user can interact with to input their query and set search parameters. If you look at the code, you will see how easy it is to add those elements.

With the UI elements, you can set:

the number of text chunks to return from Pinecone and to forward to the ChatCompletion API (using st.slider)
the number of tokens to reply with (using st.slider)
the model: gpt-3.5-turbo or gpt-4 (ensure you have access to the gpt-4 API)
the temperature (using st.slider)

The options are shown in two columns with st.columns.

The main search functionality is triggered when the user clicks the “Search” button. The code then vectorizes the query, searches for the most similar vectors in Pinecone, and displays the URLs and chunks found. Finally, the selected model is used to generate an answer based on the chunks found and the user’s query. Often, gpt-4 will provide the best answer. It seems to be able to better understand all the chunks of text thrown at it.
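
One small detail in the search code: list(set(urls)) removes duplicate URLs but does not keep the order in which Pinecone returned the matches. If you want the URL of the best match first, an alternative (not what the app does today) is to deduplicate with a dict, which preserves insertion order. The sample matches below are made up but follow the same shape as the response used in the code:

```python
def unique_in_order(urls):
    """Remove duplicates while keeping the first occurrence of each URL."""
    return list(dict.fromkeys(urls))


if __name__ == "__main__":
    # made-up matches, shaped like search_response['matches'] in the app
    matches = [
        {"metadata": {"url": "https://example.com/post-b", "text": "chunk 1"}},
        {"metadata": {"url": "https://example.com/post-a", "text": "chunk 2"}},
        {"metadata": {"url": "https://example.com/post-b", "text": "chunk 3"}},
    ]
    urls = [item["metadata"]["url"] for item in matches]
    print(unique_in_order(urls))
```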

Running the code

To run the code you need the following:

A Pinecone API key and environment
An OpenAI API key

It is easiest to run the code with Docker. If you have it installed, run the following command:

docker run -p 8501:8501 -e OPENAI_API_KEY="YOURKEY" \
    -e PINECONE_API_KEY="YOURKEY" \
    -e PINECONE_ENVIRONMENT="YOURENV" gbaeke/blogsearch

The gbaeke/blogsearch image is available on Docker Hub. You can also build your own with the Dockerfile provided on GitHub.

After running the image, go to http://localhost:8501 and first use the Upload page to create your Pinecone index and store vectors in it. You can use my blog’s feed or any other feed. You can experiment with the chunk size and chunk overlap.
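
If chunk size and chunk overlap are new to you, the sketch below shows the general idea. It splits on words to keep things simple; the Upload page may well count tokens or characters instead, so treat this as an illustration and not as the code behind that page:

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into chunks of chunk_size words; neighbours share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

A larger overlap makes it less likely that a sentence is cut in two without context, at the cost of storing more vectors.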

You can add multiple RSS feeds one-by-one as long as you turn off Recreate index before each new upload. After you have populated the index, use the Search page to start searching:

Above, we ask what we can do with Pinecone and let gpt-4 do the answering. The similarity search will search for 5 similar items and return them. We show the original URLs these results come from. In the Chunks section, you can see the original chunks because they are also in Pinecone as metadata. After the answer, you can find the full JSON returned by the ChatCompletion API and the full prompt we sent to that API.

Conclusion

In this blog post, we showed you how to create a Streamlit UI for the search_vectors.py script we talked about in a previous post. Streamlit allows you to easily build interactive web applications for your machine learning and data science projects. We also created a UI to upload posts to Pinecone. The full program allows you to add as much data as you want and query that data with semantic search, summarized and synthesized by the GPT model of choice. Give it a try and let me know what you think.

Draft 2 and Ingress with Web Application Routing

If you read the previous article on Draft 2, we went from source code to deployed application in a few steps:

az aks draft create: creates a Dockerfile and Kubernetes manifests (deployment and service manifests)
az aks draft setup-gh: sets up GitHub OIDC
az aks draft generate-workflow: creates a GitHub workflow that builds and pushes the container image and deploys the application to Kubernetes

If you answer the questions from the commands above correctly, you should be up and running fairly quickly! 🚀

The manifests default to a Kubernetes service that uses the type LoadBalancer to configure an Azure public load balancer to access your app. But maybe you want to test your app with TLS and you do not want to configure a certificate in your container image? That is where the ingress configuration comes in.

You will need to do two things:

Configure web application routing: configures Ingress Nginx Controller and relies on Open Service Mesh (OSM) and the Secret Store CSI Driver for Azure Key Vault. That way, you are shielded from having to do all that yourself. I did have some issues with web application routing as described below.
Use az aks draft update to configure your service to work with web application routing; this command will ask you for two things:
- the hostname for your service: you decide this but the name should resolve to the public IP of the Nginx Ingress Controller installed by web application routing
- a URI to a certificate on Azure Key Vault: you will need to deploy a Key Vault and upload or create the certificate

Configure web application routing

Although it should be supported, I could not enable the add-on on one of my existing clusters. On another one, it did work. I decided to create a new cluster with the add-on by running the following command:

az aks create --resource-group myResourceGroup --name myAKSCluster --enable-addons web_application_routing

⚠️ Make sure you use the most recent version of the Azure CLI aks-preview extension.

On my cluster, that gave me a namespace app-routing-system with two pods:

Although the add-on should also install Secrets Store CSI Driver, Open Service Mesh, and External DNS, that did not happen in my case. I installed the first two from the portal. I did not bother installing External DNS.

Create a certificate

I created a Key Vault in the same resource group as my AKS cluster. I configured the access policies to use Azure RBAC (role-based access control). It did not work with the traditional access policies. I granted myself and the identity used by web application routing full access:

Key Vault Administrator for myself and the user-assigned managed id of web app routing add-on

You need to grant the user-assigned managed identity of web application routing access because a SecretProviderClass will be created automatically for that identity. The Secret Store CSI Driver uses that SecretProviderClass to grab a certificate from Key Vault and generate a Kubernetes secret for it. The secret will later be used by the Kubernetes Ingress resource to encrypt HTTP traffic. How you link the Ingress resource to the certificate is for a later step.

Now, in Key Vault, generate a certificate:

In Key Vault, click Certificates and create a new one

Above, I use nip.io with the IP address of the Ingress Controller to generate a name that resolves to the IP. For example, 10.2.3.4.nip.io will resolve to 10.2.3.4. Try it with ping. It’s truly a handy service. Use kubectl get svc -n app-routing-system to find the Ingress Controller public (external) IP.
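
If you script this, deriving the hostname and the certificate subject from the IP address is trivial. A small sketch with helper names of my own choosing:

```python
import ipaddress


def nip_hostname(ip: str) -> str:
    """Return the nip.io hostname that resolves to the given IPv4 address."""
    ipaddress.IPv4Address(ip)  # raises ValueError if ip is not a valid IPv4 address
    return f"{ip}.nip.io"


def cert_subject(ip: str) -> str:
    """Subject to use for the certificate in Key Vault."""
    return f"CN={nip_hostname(ip)}"


if __name__ == "__main__":
    print(nip_hostname("10.2.3.4"))
    print(cert_subject("10.2.3.4"))
```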

Now we have everything in place for draft to modify our Kubernetes service to use the ingress controller and certificate.

Using az aks draft update

Back on your machine, in the repo that you used in the previous article, run az aks draft update. You will be asked two questions:

Hostname: use <IP Address of Nginx>.nip.io (same as in the common name of the cert without CN=)
URI to the certificate in Key Vault: you can find the URI in the properties of the certificate

There will be a copy button at the right of the certificate identifier

Draft will now update your service to something like:

apiVersion: v1
kind: Service
metadata:
  annotations:
    kubernetes.azure.com/ingress-host: IPADDRESS.nip.io
    kubernetes.azure.com/tls-cert-keyvault-uri: https://kvdraft.vault.azure.net/certificates/mycert/IDENTIFIER
  creationTimestamp: null
  name: super-api
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: super-api
  type: ClusterIP
status:
  loadBalancer: {}

The service type is now ClusterIP. The annotations will be used for several things:

to create a placeholder deployment that mounts the certificate from Key Vault in a volume AND creates a secret from the certificate; the Secret Store CSI Driver always needs to mount secrets and certs in a volume; rather than using your application pod, they use a placeholder pod to create the secret
to create an Ingress resource that routes to the service and uses the certificate in the secret created via the placeholder pod
to create an IngressBackend resource in Open Service Mesh

In my default namespace, I see two pods after deployment:

the placeholder pod starts with keyvault and creates the secret; the other pod is my app

Note that above, I actually used a Helm deployment instead of a manifest-based deployment. That’s why you see release-name in the pod names.

The placeholder pod creates a csi volume that uses a SecretProviderClass to mount the certificate:

The SecretProviderClass references your Key Vault and managed identity to access the Key Vault:

If you have not assigned the correct access policy on Key Vault for the userAssignedIdentityID, the certificate cannot be retrieved and the pod will not start. The secret will not be created either.

I also have a secret with the cert inside:

Secret created by Secret Store CSI Driver; referenced by the Ingress

And here is the Ingress:

Ingress; note it says 8080 instead of the service port 80; do not change it! Never mind the app. in front of the IP; your config will not have that if you followed the instructions

All of this gets created for you but only after running az aks draft update and when you commit the changes to GitHub, triggering the workflow.

Did all this work smoothly from the first time?

The short answer is NO! 😉 At first I thought Draft would take care of installing the Ingress components for me. That is not the case. You need to install and configure web application routing on your cluster and configure the necessary access rights.

I also thought web application routing would install and configure Open Service Mesh and Secret Store CSI driver. That did not happen although that is easily fixed by installing them yourself.

I thought there would be some help with certificate generation. That is not the case. Generating a self-signed certificate with Key Vault is easy enough though.

Once you have web application routing installed and you have a Key Vault and certificate, it is simple to run az aks draft update. That changes your Kubernetes service definition. After pushing that change to your repo, the updated service with the web application routing annotations can be deployed.

I got some 502 Bad Gateway errors from Nginx at first. I removed the OSM-related annotations from the Ingress object and tried some other things. Finally, I just redeployed the entire app and then it just started working. I did not spend more time trying to find out why it did not work from the start. The fact that Open Service Mesh is used, which has extra configuration like IngressBackends, will complicate troubleshooting somewhat. Especially if you have never worked with OSM, which is what I expect for most people.

Conclusion

Although this looks promising, it’s all still a bit rough around the edges. Adding OSM to the mix makes things somewhat more complicated.

Remember that all of this is in preview and we are meant to test drive it and provide feedback. However, I fear that, because of the complexity of Kubernetes, these tools will never truly make it super simple to get started as a developer. It’s just a tough nut to crack!

My own point of view here is that Draft v2 without az aks draft update is very useful. In most cases though, it’s enough to use standard Kubernetes services. And if you do need an ingress controller, most are easy to install and configure, even with TLS.

Quick Guide to Azure Container Apps

Now that Azure Container Apps (ACA) is generally available, it is time for a quick guide. These quick guides show how to work with a service from the command line and walk through its main features.

Prerequisites

All commands are run from bash in WSL 2 (Windows Subsystem for Linux 2 on Windows 11)
Azure CLI and logged in to an Azure subscription with an Owner role (use az login)
ACA extension for Azure CLI: az extension add --name containerapp --upgrade
Microsoft.App namespace registered: az provider register --namespace Microsoft.App; this namespace has been in use since March
If you have never used Log Analytics, also register Microsoft.OperationalInsights: az provider register --namespace Microsoft.OperationalInsights
jq, curl, sed, git

With that out of the way, let’s go… 🚀

Step 1: Create an ACA environment

First, create a resource group, Log Analytics workspace, and the ACA environment. An ACA environment runs multiple container apps and these apps can talk to each other. You can create multiple environments, for example for different applications or customers. We will create an environment that will not integrate with an Azure Virtual Network.

RG=rg-aca
LOCATION=westeurope
ENVNAME=env-aca
LA=la-aca # log analytics workspace name

# create the resource group
az group create --name $RG --location $LOCATION

# create the log analytics workspace
az monitor log-analytics workspace create \
  --resource-group $RG \
  --workspace-name $LA

# retrieve workspace ID and secret
LA_ID=`az monitor log-analytics workspace show --query customerId -g $RG -n $LA -o tsv | tr -d '[:space:]'`

LA_SECRET=`az monitor log-analytics workspace get-shared-keys --query primarySharedKey -g $RG -n $LA -o tsv | tr -d '[:space:]'`

# check workspace ID and secret; if empty, something went wrong
# in previous two steps
echo $LA_ID
echo $LA_SECRET

# create the ACA environment; no integration with a virtual network
az containerapp env create \
  --name $ENVNAME \
  --resource-group $RG \
  --logs-workspace-id $LA_ID \
  --logs-workspace-key $LA_SECRET \
  --location $LOCATION \
  --tags env=test owner=geert

# check the ACA environment
az containerapp env list -o table

Step 2: Create a front-end container app

The front-end container app accepts requests that allow users to store some data. Data storage will be handled by a back-end container app that talks to Cosmos DB.

The front-end and back-end use Dapr. This does the following:

Name resolution: the front-end can find the back-end via the Dapr Id of the back-end
Encryption: traffic between the front-end and back-end is encrypted
Simplify saving state to Cosmos DB: using a Dapr component, the back-end can easily save state to Cosmos DB without getting bogged down in Cosmos DB specifics and libraries

Check the source code on GitHub. For example, the code that saves to Cosmos DB is here.

For a container app to use Dapr, two parameters are needed:

--enable-dapr: enables the Dapr sidecar container next to the application container
--dapr-app-id: provides a unique Dapr Id to your service

APPNAME=frontend
DAPRID=frontend # could be different
IMAGE="ghcr.io/gbaeke/super:1.0.5" # image to deploy
PORT=8080 # port that the container accepts requests on

# create the container app and make it available on the internet
# with --ingress external; the envoy proxy used by container apps
# will proxy incoming requests to port 8080

az containerapp create --name $APPNAME --resource-group $RG \
--environment $ENVNAME --image $IMAGE \
--min-replicas 0 --max-replicas 5 --enable-dapr \
--dapr-app-id $DAPRID --target-port $PORT --ingress external

# check the app
az containerapp list -g $RG -o table

# grab the resource id of the container app
APPID=$(az containerapp list -g $RG | jq .[].id -r)

# show the app via its id
az containerapp show --ids $APPID

# because the app has an ingress type of external, it has an FQDN
# let's grab the FQDN (fully qualified domain name)
FQDN=$(az containerapp show --ids $APPID | jq .properties.configuration.ingress.fqdn -r)

# curl the URL; it should return "Hello from Super API"
curl https://$FQDN

# container apps work with revisions; you are now at revision 1
az containerapp revision list -g $RG -n $APPNAME -o table

# let's deploy a newer version
IMAGE="ghcr.io/gbaeke/super:1.0.7"

# use update to change the image
# you could also run the create command again (same as above but image will be newer)
az containerapp update -g $RG --ids $APPID --image $IMAGE

# look at the revisions again; the new revision uses the new
# image and 100% of traffic
# NOTE: in the portal you would only see the last revision because
# by default, single revision mode is used; switch to multiple 
# revision mode and check "Show inactive revisions"

az containerapp revision list -g $RG -n $APPNAME -o table
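
If you prefer Python over jq, the path used above to grab the FQDN translates directly. This is a minimal sketch that parses the JSON printed by az containerapp show; the sample document is heavily trimmed and the FQDN in it is made up:

```python
import json


def ingress_fqdn(show_output: str) -> str:
    """Extract the FQDN from the JSON printed by `az containerapp show`."""
    app = json.loads(show_output)
    # same path as the jq expression: .properties.configuration.ingress.fqdn
    return app["properties"]["configuration"]["ingress"]["fqdn"]


if __name__ == "__main__":
    # trimmed sample; the FQDN is hypothetical
    sample = json.dumps({
        "name": "frontend",
        "properties": {
            "configuration": {
                "ingress": {
                    "external": True,
                    "fqdn": "frontend.example.westeurope.azurecontainerapps.io",
                }
            }
        },
    })
    print(ingress_fqdn(sample))
```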

Step 3: Deploy Cosmos DB

We will not get bogged down in Cosmos DB specifics and how Dapr interacts with it. The commands below create an account, database, and collection. Note that I switched the write replica to eastus because of capacity issues in westeurope at the time of writing. That’s ok. Our app will write data to Cosmos DB in that region.

uniqueId=$RANDOM
LOCATION=eastus # changed because of capacity issues in westeurope at the time of writing

# create the account; will take some time
az cosmosdb create \
  --name aca-$uniqueId \
  --resource-group $RG \
  --locations regionName=$LOCATION \
  --default-consistency-level Strong

# create the database
az cosmosdb sql database create \
  -a aca-$uniqueId \
  -g $RG \
  -n aca-db

# create the collection; the partition key is set to a 
# field in the document called partitionKey; Dapr uses the
# document id as the partition key
az cosmosdb sql container create \
  -a aca-$uniqueId \
  -g $RG \
  -d aca-db \
  -n statestore \
  -p '/partitionKey' \
  --throughput 400

Step 4: Deploy the back-end

The back-end, like the front-end, uses Dapr. However, the back-end uses Dapr to connect to Cosmos DB and this requires extra information:

a Dapr Cosmos DB component
a secret with the connection string to Cosmos DB

Both the component and the secret are defined at the Container Apps environment level via a component file.

# grab the Cosmos DB documentEndpoint
ENDPOINT=$(az cosmosdb list -g $RG | jq .[0].documentEndpoint -r)

# grab the Cosmos DB primary key
KEY=$(az cosmosdb keys list -g $RG -n aca-$uniqueId | jq .primaryMasterKey -r)

# update variables, IMAGE and PORT are the same
APPNAME=backend
DAPRID=backend # could be different

# create the Cosmos DB component file
# it uses the ENDPOINT above + database name + collection name
# IMPORTANT: scopes is required so that you can scope components
# to the container apps that use them

cat << EOF > cosmosdb.yaml
componentType: state.azure.cosmosdb
version: v1
metadata:
- name: url
  value: "$ENDPOINT"
- name: masterkey
  secretRef: cosmoskey
- name: database
  value: aca-db
- name: collection
  value: statestore
secrets:
- name: cosmoskey
  value: "$KEY"
scopes:
- $DAPRID
EOF

# create Dapr component at the environment level
# this used to be at the container app level
az containerapp env dapr-component set \
    --name $ENVNAME --resource-group $RG \
    --dapr-component-name cosmosdb \
    --yaml cosmosdb.yaml

# create the container app; the app needs an environment 
# variable STATESTORE with a value that is equal to the 
# dapr-component-name used above
# ingress is internal; there is no need to connect to the backend from the internet

az containerapp create --name $APPNAME --resource-group $RG \
--environment $ENVNAME --image $IMAGE \
--min-replicas 1 --max-replicas 1 --enable-dapr \
--dapr-app-port $PORT --dapr-app-id $DAPRID \
--target-port $PORT --ingress internal \
--env-vars STATESTORE=cosmosdb

Step 5: Verify end-to-end connectivity

We will use curl to call the following endpoint on the front-end: /call. The endpoint expects the following JSON:

{
 "appId": <DAPR Id to call method on>,
 "method": <method to call>,
 "httpMethod": <HTTP method to use e.g., POST>,
 "payload": <payload with key and data field as expected by Dapr state component>
}

As you may have noticed, both container apps use the same image. The app was written in Go and implements both the /call and /savestate endpoints. It uses the Dapr SDK to interface with the Dapr sidecar that Azure Container Apps has added to our deployment.

To make the curl commands less horrible, we will use jq to generate the JSON to send in the payload field. Do not pay too much attention to the details. The important thing is that we save some data to Cosmos DB and that you can use Cosmos DB Data Explorer to verify.

# create some string data to send
STRINGDATA="'$(jq --null-input --arg appId "backend" --arg method "savestate" --arg httpMethod "POST" --arg payload '{"key": "mykey", "data": "123"}' '{"appId": $appId, "method": $method, "httpMethod": $httpMethod, "payload": $payload}' -c -r)'"

# check the string data (double quotes should be escaped in payload)
# payload should be a string and not JSON, hence the quoting
echo $STRINGDATA

# call the front end to save some data
# in Cosmos DB data explorer, look for a document with id 
# backend||mykey; content is base64 encoded because 
# the data is not json

echo curl -X POST -d $STRINGDATA https://$FQDN/call | bash

# create some real JSON data to save; now we need to escape the
# double quotes and jq will add extra escapes
JSONDATA="'$(jq --null-input --arg appId "backend" --arg method "savestate" --arg httpMethod "POST" --arg payload '{"key": "myjson", "data": "{\"name\": \"geert\"}"}' '{"appId": $appId, "method": $method, "httpMethod": $httpMethod, "payload": $payload}' -c -r)'"

# call the front end to save the data
# look for a document id backend||myjson; data is json

echo curl -v -X POST -d $JSONDATA https://$FQDN/call | bash
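
The quoting in the jq commands above is easier to follow in Python. The sketch below builds the same request bodies: the payload field is a string that contains JSON, so it is serialized once on its own and then again as part of the outer document. The function name call_body is mine and only there for illustration:

```python
import json


def call_body(app_id, method, http_method, key, data):
    """Build the JSON body for /call; payload is a JSON document stored as a string."""
    payload = json.dumps({"key": key, "data": data})
    return json.dumps({
        "appId": app_id,
        "method": method,
        "httpMethod": http_method,
        "payload": payload,
    })


if __name__ == "__main__":
    # plain string data, like STRINGDATA above
    print(call_body("backend", "savestate", "POST", "mykey", "123"))
    # JSON data, like JSONDATA above: data is JSON serialized to a string
    print(call_body("backend", "savestate", "POST", "myjson", json.dumps({"name": "geert"})))
```

You can send the resulting string to https://$FQDN/call with any HTTP client in a POST request.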

Step 6: Check the logs

Although you can use the Log Stream option in the portal, let’s use the command line to check the logs of both containers.

# check frontend logs
az containerapp logs show -n frontend -g $RG

# I want to see the dapr logs of the container app
az containerapp logs show -n frontend -g $RG --container daprd

# if you do not see log entries about our earlier calls, save data again
# the log stream does not show all logs; log analytics contains more log data
echo curl -v -X POST -d $JSONDATA https://$FQDN/call | bash

# now let's check the logs again but show more earlier logs and follow
# there should be an entry method with custom content; that's the
# result of saving the JSON data

az containerapp logs show -n frontend -g $RG --tail 300 --follow

Step 7: Use az containerapp up

In the previous steps, we used a pre-built image stored in GitHub container registry. As a developer, you might want to quickly go from code to deployed container to verify if it all works in the cloud. The command az containerapp up lets you do that. It can do the following things automatically:

Create an Azure Container Registry (ACR) to store container images
Send your source code to ACR and build and push the image in the cloud; you do not need Docker on your computer
Alternatively, you can point to a GitHub repository and start from there; below, we first clone a repo and start from local sources with the --source parameter
Create the container app in a new environment or use an existing environment; below, we use the environment created in previous steps

# clone the super-api repo and cd into it
git clone https://github.com/gbaeke/super-api.git && cd super-api

# checkout the quickguide branch
git checkout quickguide

# bring up the app; container build will take some time
# add the --location parameter to allow az containerapp up to 
# create resources in the specified location; otherwise it uses
# the default location used by the Azure CLI
az containerapp up -n super-api --source . --ingress external --target-port 8080 --environment env-aca

# list apps; super-api has been added with a new external Fqdn
az containerapp list -g $RG -o table

# check ACR in the resource group
az acr list -g $RG -o table

# grab the ACR name
ACR=$(az acr list -g $RG | jq -r '.[0].name')

# list repositories
az acr repository list --name $ACR

# more details about the repository
az acr repository show --name $ACR --repository super-api

# show tags; az containerapp up uses numbers based on date and time
az acr repository show-tags --name $ACR --repository super-api

# make a small change to the code; ensure you are still in the
# root of the cloned repo; instead of Hello from Super API we
# will say Hi from Super API when curl hits the /
sed -i s/Hello/Hi/g cmd/app/main.go

# run az containerapp up again; a new container image will be
# built and pushed to ACR and deployed to the container app
az containerapp up -n super-api --source . --ingress external --target-port 8080 --environment env-aca

# check the image tags; there are two
az acr repository show-tags --name $ACR --repository super-api

# curl the endpoint; should say "Hi from Super API"
curl https://$(az containerapp show -g $RG -n super-api | jq .properties.configuration.ingress.fqdn -r)

Conclusion

In this quick guide (well, maybe not 😉) you have seen how to create an Azure Container Apps environment, add two container apps that use Dapr and use az containerapp up for a great inner loop dev experience.

I hope this was useful. If you spot errors, please let me know. Also check the quick guides on GitHub: https://github.com/gbaeke/quick-guides

Quick Guide to the Secret Store CSI driver for Azure Key Vault on AKS

Yesterday, I posted the Quick Guide to Kubernetes Workload Identity on AKS. This post contains a similar guide to enabling and using the Secret Store CSI driver for Azure Key Vault on AKS.

All commands assume bash. You should have the Azure CLI installed and logged in to the subscription as the owner (because you need to configure RBAC in the scripts below).

Step 1: Enable the driver

The command to enable the driver on an existing cluster is below. Please set the variables to point to your cluster and resource group:

RG=YOUR_RESOURCE_GROUP
CLUSTER=YOUR_CLUSTER_NAME

az aks enable-addons --addons=azure-keyvault-secrets-provider --name=$CLUSTER --resource-group=$RG

If the driver is already enabled, you will simply get a message stating that.

Step 2: Create a Key Vault

In this step, we create a Key Vault and configure RBAC. We will also add a sample secret.

# replace <SOMETHING> with a value like your initials for example
KV=<SOMETHING>$RANDOM

# name of the key vault secret
SECRET=demosecret

# value of the secret
VALUE=demovalue

# create the key vault and turn on Azure RBAC; we will grant a managed identity access to this key vault below
az keyvault create --name $KV --resource-group $RG --location westeurope --enable-rbac-authorization true

# get the subscription id
SUBSCRIPTION_ID=$(az account show --query id -o tsv)

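# get your user object id
# note: Azure CLI versions that use Microsoft Graph (2.37 and later)
# return this property as id instead of objectId
USER_OBJECT_ID=$(az ad signed-in-user show --query objectId -o tsv)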

# grant yourself access to key vault
az role assignment create --assignee-object-id $USER_OBJECT_ID --role "Key Vault Administrator" --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RG/providers/Microsoft.KeyVault/vaults/$KV

# add a secret to the key vault
az keyvault secret set --vault-name $KV --name $SECRET --value $VALUE
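
The --scope value above is the full resource ID of the Key Vault, and we will need the same string again when we grant access to the managed identity. To avoid typing it twice, you could build it once. This is just a sketch: keyvault_scope is a made-up helper, and SUBID, rg-demo and kvdemo are placeholder defaults that only apply when the variables are not set.

```shell
# keyvault_scope is a hypothetical helper name
# usage: keyvault_scope <subscription id> <resource group> <vault name>
keyvault_scope() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.KeyVault/vaults/%s\n' "$1" "$2" "$3"
}

# build the scope once; the values after :- are placeholders
SCOPE=$(keyvault_scope "${SUBSCRIPTION_ID:-SUBID}" "${RG:-rg-demo}" "${KV:-kvdemo}")
echo "$SCOPE"
```

You can then pass --scope "$SCOPE" to az role assignment create.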

You can use the portal to check the Key Vault and see the secret:

If you go to Access Policies, you will notice that the Key Vault uses Azure RBAC:

Step 3: Grant a managed identity access to Key Vault

In the previous step, your account was granted access to Key Vault. In this step, we will grant the same access to the managed identity that the secret store csi provider will use. We will need to configure the managed identity we want to use in later steps.

This guide uses the managed identity created by the secret store provider. It lives in the resource group associated with your cluster. By default, that group starts with MC_. The account is called azurekeyvaultsecretsprovider-<CLUSTER-NAME>.

# grab the managed identity principalId assuming it is in the default
# MC_ group for your cluster and resource group
IDENTITY_ID=$(az identity show -g "MC_${RG}_${CLUSTER}_westeurope" --name azurekeyvaultsecretsprovider-$CLUSTER --query principalId -o tsv)

# grant access rights on Key Vault
az role assignment create --assignee-object-id $IDENTITY_ID --role "Key Vault Administrator" --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RG/providers/Microsoft.KeyVault/vaults/$KV

Above, we grant the Key Vault Administrator role. In production, that should be a role with fewer privileges.
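
The default name of that MC_ group follows the pattern MC_<resource group>_<cluster>_<location>. The sketch below only assembles that default name; node_resource_group is a made-up helper and rg-aks and clu-demo are example values. If your cluster was created with a custom node resource group, the pattern does not apply and az aks show -g $RG -n $CLUSTER --query nodeResourceGroup -o tsv is the safer way to look it up.

```shell
# node_resource_group is a hypothetical helper that assembles the default name
# usage: node_resource_group <resource group> <cluster> <location>
node_resource_group() {
  local rg=$1 cluster=$2 location=$3
  # the braces matter: without them, "$rg_" is read as a variable named rg_
  echo "MC_${rg}_${cluster}_${location}"
}

node_resource_group rg-aks clu-demo westeurope
```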

Step 4: Create a SecretProviderClass

Let’s create and apply the SecretProviderClass in one step.

AZURE_TENANT_ID=$(az account show --query tenantId -o tsv)
CLIENT_ID=$(az aks show -g $RG -n $CLUSTER --query addonProfiles.azureKeyvaultSecretsProvider.identity.clientId -o tsv)

cat <<EOF | kubectl apply -f -
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: demo-secret
  namespace: default
spec:
  provider: azure
  secretObjects:
  - secretName: demosecret
    type: Opaque
    data:
    - objectName: "demosecret"
      key: demosecret
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "true"
    userAssignedIdentityID: "$CLIENT_ID"
    keyvaultName: "$KV"
    objects: |
      array:
        - |
          objectName: "demosecret"
          objectType: secret
    tenantId: "$AZURE_TENANT_ID"
EOF

After retrieving the Azure AD tenant Id and managed identity client Id, the SecretProviderClass is created. Pay special attention to the following fields:

userAssignedIdentityID: the clientId (⚠️ not the principalId we retrieved earlier) of the managed identity used by the secret store provider; you can use other user-assigned managed identities or even a system-assigned managed identity assigned to the virtual machine scale set that runs your agent pool; I recommend using user-assigned identity
- above, the clientId is retrieved via the az aks command
keyvaultName: the name you assigned your Key Vault
tenantId: the Azure AD tenant Id where your identities live
usePodIdentity: not recommended because pod identity will be replaced by workload identity
useVMManagedIdentity: set to true even if you use user-assigned managed identity

Step 5: Mount the secrets in pods

Create pods that use the secrets.

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: secretpods
  name: secretpods
spec:
  replicas: 1
  selector:
    matchLabels:
      app: secretpods
  template:
    metadata:
      labels:
        app: secretpods
    spec:
      containers:
      - image: nginx
        name: nginx
        env:
          - name:  demosecret
            valueFrom:
              secretKeyRef:
                name:  demosecret
                key:  demosecret
        volumeMounts:
          - name:  secret-store
            mountPath:  "/mnt/secret-store"
            readOnly: true
      volumes:
        - name:  secret-store
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: "demo-secret"
EOF

The above command creates a deployment that runs nginx. The Key Vault secrets are mounted in a volume that is mounted at /mnt/secret-store. The Key Vault secret is also available as an environment variable demosecret.

Step 6: Verify

Issue the commands below to get a shell to the pods of the nginx deployment and check the mount path and environment variable:

export POD_NAME=$(kubectl get pods -l "app=secretpods" -o jsonpath="{.items[0].metadata.name}")

# if this does not work, check the status of the pod
# if still in ContainerCreating there might be an issue
kubectl exec -it $POD_NAME -- sh

cd /mnt/secret-store
ls # the file containing the secret is listed
cat demosecret; echo # demovalue is revealed

# echo the value of the environment variable
echo $demosecret # demovalue is revealed

Important: the secret store CSI provider always mounts secrets in a volume. A Kubernetes secret (here used to populate the environment variable) is not created by default. It is created here because of the secretObjects field in the SecretProviderClass.

Conclusion

The above commands should make it relatively straightforward to try the secret store CSI provider and understand what it does. It works especially well in GitOps scenarios where you cannot store secrets in Git and you do not have pipelines that can retrieve Azure Key Vault secrets at deploy time.

If you spot errors in the above commands, please let me know!

Quick Guide to Kubernetes Workload Identity on AKS

IMPORTANT: the steps below are not relevant anymore; the steps in the quick guide have been updated; see https://github.com/gbaeke/quick-guides/blob/main/workload-identity/README.md for the correct steps.

Some things that have changed:

you can now use managed identities instead of app registrations; federated token configuration is at the managed identity level
you do not need to install the webhook
the azwi CLI is not required

I recently had to do a demo about Workload Identity on AKS and threw together some commands to enable and verify the setup. It contains bits and pieces from the documentation plus some extras. I wrote another post some time ago with more background.

All commands are for bash and should be run sequentially in the same shell to re-use the variables.

Step 1: Enable OIDC issuer on AKS

You need an existing AKS cluster for this. You can quickly deploy one from the portal. Note that workload identity is not exclusive to AKS.

CLUSTER=<AKS_CLUSTER_NAME>
RG=<AKS_CLUSTER_RESOURCE_GROUP>

az aks update -n $CLUSTER -g $RG --enable-oidc-issuer

After enabling OIDC, retrieve the issuer URL with ISSUER_URL=$(az aks show -n $CLUSTER -g $RG --query oidcIssuerProfile.issuerUrl -o tsv). To check, run echo $ISSUER_URL. It contains a URL like https://oidc.prod-aks.azure.com/GUID/. You can issue the command below to obtain the OpenID configuration. It will list other URLs that can be used to retrieve keys that allow Azure AD to verify tokens it receives from Kubernetes.

curl $ISSUER_URL/.well-known/openid-configuration
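
Because the issuer URL ends with a slash, the command above results in a double slash in the path. If you prefer a clean URL, you can strip the trailing slash first. A minimal sketch, with oidc_config_url as a made-up helper name:

```shell
# oidc_config_url is a hypothetical helper name
# usage: oidc_config_url <issuer url, with or without trailing slash>
oidc_config_url() {
  # ${1%/} removes one trailing slash, if present
  printf '%s/.well-known/openid-configuration\n' "${1%/}"
}

oidc_config_url 'https://oidc.prod-aks.azure.com/GUID/'
```

With the helper in place, the request becomes curl "$(oidc_config_url "$ISSUER_URL")".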

Step 2: Install the webhook on AKS

Use the Helm chart to install the webhook. First, save the Azure AD tenant Id to a variable. The tenantId will be retrieved with the Azure CLI so make sure you are properly logged in. You also need Helm installed and a working Kube config for your cluster.

AZURE_TENANT_ID=$(az account show --query tenantId -o tsv)
 
helm repo add azure-workload-identity https://azure.github.io/azure-workload-identity/charts
 
helm repo update
 
helm install workload-identity-webhook azure-workload-identity/workload-identity-webhook \
   --namespace azure-workload-identity-system \
   --create-namespace \
   --set azureTenantID="${AZURE_TENANT_ID}"

Step 3: Create an Azure AD application

Although you can create the application directly in the portal or with the Azure CLI, workload identity has a CLI to make the whole process less error-prone and easier to script. Install azwi with brew: brew install Azure/azure-workload-identity/azwi.

Run the following commands. First, we save the application name in a variable. Use any name you like.

APPLICATION_NAME=WorkloadDemo
azwi serviceaccount create phase app --aad-application-name $APPLICATION_NAME

You can now check the application registrations in Azure AD. In my case, WorkloadDemo was created.

If you want to grant this application access rights to resources in Azure, first grab the appId:

APPLICATION_CLIENT_ID="$(az ad sp list --display-name $APPLICATION_NAME --query '[0].appId' -otsv)"

Now you can use commands such as az role assignment create to grant access rights. For example, here is how to grant the Reader role on your current Azure CLI subscription:

SUBSCRIPTION_ID=$(az account show --query id -o tsv)

az role assignment create --assignee $APPLICATION_CLIENT_ID --role "Reader" --scope /subscriptions/$SUBSCRIPTION_ID

Step 4: Create a Kubernetes service account

Although you can create the service account with kubectl or via a YAML manifest, azwi can help here as well:

SERVICE_ACCOUNT_NAME=sademo
SERVICE_ACCOUNT_NAMESPACE=default

azwi serviceaccount create phase sa \
  --aad-application-name "$APPLICATION_NAME" \
  --service-account-namespace "$SERVICE_ACCOUNT_NAMESPACE" \
  --service-account-name "$SERVICE_ACCOUNT_NAME"

This creates a service account that looks like the below YAML manifest:

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    azure.workload.identity/client-id: <value of APPLICATION_CLIENT_ID>
  labels:
    azure.workload.identity/use: "true"
  name: sademo
  namespace: default

This is a regular Kubernetes service account. Later, you will configure your pod to use the service account.

The label is important because the webhook we installed earlier acts on service accounts with this label to perform all the behind-the-scenes magic! 😉

Note that workload identity does not use the Kubernetes service account token. That token is used to authenticate to the Kubernetes API server. The webhook ensures that there is another token, whose path is in $AZURE_FEDERATED_TOKEN_FILE. That is the token sent to Azure AD.
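
The federated token is a JWT, so once you have a shell in a pod you can peek at its payload to see fields such as the issuer and subject. The function below is a sketch under a few assumptions: decode_jwt_payload is a made-up name, the token has the usual three dot-separated parts, and base64 -d is available in the container. It does not verify the signature.

```shell
# decode_jwt_payload is a hypothetical helper; it only decodes, it does not verify
# usage: decode_jwt_payload <jwt>
decode_jwt_payload() {
  local payload
  # take the middle part and convert base64url to regular base64
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # JWTs drop the padding, so add = until the length is a multiple of 4
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# made-up sample token; its payload is {"sub":"a"}
decode_jwt_payload 'x.eyJzdWIiOiJhIn0.y'; echo
```

In the pod, you would run decode_jwt_payload "$(cat $AZURE_FEDERATED_TOKEN_FILE)".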

Step 5: Configure the Azure AD app for token federation

The application created in step 3 needs to be configured to trust specific tokens issued by your Kubernetes cluster. When AAD receives such a token, it returns an Azure AD token that your application in Kubernetes can use to authenticate to Azure.

Although you can manually configure the Azure AD app, azwi can be used here as well:

SERVICE_ACCOUNT_NAMESPACE=default

azwi serviceaccount create phase federated-identity \
  --aad-application-name "$APPLICATION_NAME" \
  --service-account-namespace "$SERVICE_ACCOUNT_NAMESPACE" \
  --service-account-name "$SERVICE_ACCOUNT_NAME" \
  --service-account-issuer-url "$ISSUER_URL"

In the AAD app, you will see:

Azure AD app federated credentials config

You find the above by clicking Certificates & Secrets and then Federated credentials.

Step 6: Deploy a workload

Run the following command to create a deployment and apply it in one step:

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: azcli-deployment
  namespace: default
  labels:
    app: azcli
spec:
  replicas: 1
  selector:
    matchLabels:
      app: azcli
  template:
    metadata:
      labels:
        app: azcli
    spec:
      serviceAccountName: sademo
      containers:
        - name: azcli
          image: mcr.microsoft.com/azure-cli:latest
          command:
            - "/bin/bash"
            - "-c"
            - "sleep infinity"
EOF

This runs the latest version of the Azure CLI in Kubernetes.

Grab the first pod name (there is only one) and exec into the pod’s container:

POD_NAME=$(kubectl get pods -l "app=azcli" -o jsonpath="{.items[0].metadata.name}")

kubectl exec -it $POD_NAME -- bash

Step 7: Test the setup

In the container, issue the following commands:

echo $AZURE_CLIENT_ID
echo $AZURE_TENANT_ID
echo $AZURE_FEDERATED_TOKEN_FILE
cat $AZURE_FEDERATED_TOKEN_FILE
echo $AZURE_AUTHORITY_HOST

# list the standard Kubernetes service account secrets
cd /var/run/secrets/kubernetes.io/serviceaccount
ls 

# check the folder containing the AZURE_FEDERATED_TOKEN_FILE
cd /var/run/secrets/azure/tokens
ls

# you can use the AZURE_FEDERATED_TOKEN_FILE with the Azure CLI
# together with $AZURE_CLIENT_ID and $AZURE_TENANT_ID
# a password is not required since we are doing federated token exchange

az login --federated-token "$(cat $AZURE_FEDERATED_TOKEN_FILE)" \
--service-principal -u $AZURE_CLIENT_ID -t $AZURE_TENANT_ID

# list resource groups
az group list

If the last command works, that means you successfully logged on with workload identity on AKS. You can list resource groups because you granted the Azure AD app the Reader role on your subscription.

Note that the option to use token federation was added to Azure CLI quite recently. At the time of this writing, May 2022, the image mcr.microsoft.com/azure-cli:latest surely has that capability.

Conclusion

I hope the above commands are useful if you want to quickly test or demo Kubernetes workload identity on AKS. If you spot errors, be sure to let me know!

Taking Azure Container Apps for a spin

At Ignite November 2021, Microsoft released Azure Container Apps as a public preview. It allows you to run containerized applications on a serverless platform, in the sense that you do not have to worry about the underlying infrastructure.

The underlying infrastructure is Kubernetes (AKS) as the control plane with additional software such as:

Dapr: distributed application runtime to easily work with state, pub/sub and other Dapr building blocks
KEDA: Kubernetes event-driven autoscaler so you can use any KEDA supported scaler, in addition to scaling based on HTTP traffic, CPU and memory
Envoy: used to provide ingress functionality and traffic splitting for blue-green deployment, A/B testing, etc…

Your apps actually run on Azure Container Instances (ACI). ACI was always meant to be used as raw compute to build platforms with and this is a great use case.

Note: there is some discussion in the community whether ACI (via AKS virtual nodes) is used or not; I will leave it in for now but in the end, it does not matter too much as the service is meant to hide this complexity anyway

Azure Container Apps does not care about the runtime or programming model you use. Just use whatever feels most comfortable and package it as a container image.

In this post, we will deploy an application that uses Dapr to save state to Cosmos DB. Along the way, we will explain most of the concepts you need to understand to use Azure Container Apps in your own scenarios. The code I am using is on GitHub and written in Go.

Configure the Azure CLI

In this post, we will use the Azure CLI exclusively to perform all the steps. Instead of the Azure CLI, you can also use ARM templates or Bicep. If you want to play with a sample that deploys multiple container apps and uses Bicep, be sure to check out this great Azure sample.

You will need to have the Azure CLI installed and also add the Container Apps extension:

az extension add \
  --source https://workerappscliextension.blob.core.windows.net/azure-cli-extension/containerapp-0.2.0-py2.py3-none-any.whl

The extension allows you to use commands like az containerapp create and az containerapp update.

Create an environment

An environment runs one or more container apps. A container app can run multiple containers and can have revisions. If you know how Kubernetes works, each revision of a container app is actually a scaled collection of Kubernetes pods, using the scalers discussed above. Each revision can be thought of as a separate Kubernetes Deployment/ReplicaSet that runs a specific version of your app. Whenever you modify your app, depending on the type of modification, you get a new revision. You can have multiple active revisions and set traffic weights to distribute traffic as you wish.

Container apps, revisions, pods, and containers

Note that above, although you see multiple containers in a pod in a revision, that is not the most common use case. Most of the time, a pod will have only one application container. That is entirely up to you and the rationale behind using one or more containers is similar to multi-container pods in Kubernetes.

To create an environment, be sure to register or re-register the Microsoft.Web provider. That provider has the kubeEnvironments resource type, which represents a Container App environment.

az provider register --namespace Microsoft.Web

Next, create a resource group:

az group create --name rg-dapr --location northeurope

I have chosen North Europe here, but the location of the resource group does not really matter. What does matter is that you create the environment in either North Europe or Canada Central at this point in time (November 2021).

Every environment needs to be associated with a Log Analytics workspace. You can use that workspace later to view the logs of your container apps. Let’s create such a workspace in the resource group we just created:

az monitor log-analytics workspace create \
  --resource-group rg-dapr \
  --workspace-name dapr-logs

Next, we want to retrieve the workspace client id and secret. We will need that when we create the Container Apps environment. Commands below expect the use of bash:

LOG_ANALYTICS_WORKSPACE_CLIENT_ID=`az monitor log-analytics workspace show --query customerId -g rg-dapr -n dapr-logs --out tsv`
LOG_ANALYTICS_WORKSPACE_CLIENT_SECRET=`az monitor log-analytics workspace get-shared-keys --query primarySharedKey -g rg-dapr -n dapr-logs --out tsv`

Now we can create the environment in North Europe:

az containerapp env create \
  --name dapr-ca \
  --resource-group rg-dapr \
  --logs-workspace-id $LOG_ANALYTICS_WORKSPACE_CLIENT_ID \
  --logs-workspace-key $LOG_ANALYTICS_WORKSPACE_CLIENT_SECRET \
  --location northeurope

The Container App environment shows up in the portal like so:

There is not a lot you can do in the portal, besides listing the apps in the environment. Provisioning an environment is extremely quick, in my case a matter of seconds.

Deploying Cosmos DB

We will deploy a container app that uses Dapr to write key/value pairs to Cosmos DB. Let’s deploy Cosmos DB:

uniqueId=$RANDOM
az cosmosdb create \
  --name dapr-cosmosdb-$uniqueId \
  --resource-group rg-dapr \
  --locations regionName='northeurope'

az cosmosdb sql database create \
    -a dapr-cosmosdb-$uniqueId \
    -g rg-dapr \
    -n dapr-db

az cosmosdb sql container create \
    -a dapr-cosmosdb-$uniqueId \
    -g rg-dapr \
    -d dapr-db \
    -n statestore \
    -p '/partitionKey' \
    --throughput 400

The above commands create the following resources:

A Cosmos DB account in North Europe: note that this uses session-level consistency (remember that for later in this post 😉)
A Cosmos DB database that uses the SQL API
A Cosmos DB container in that database, called statestore (can be anything you want)

In Cosmos DB Data Explorer, you should see:

statestore collection will be used as a State Store in Dapr

Deploying the Container App

We can use the following command to deploy the container app and enable Dapr on it:

az containerapp create \
  --name daprstate \
  --resource-group rg-dapr \
  --environment dapr-ca \
  --image gbaeke/dapr-state:1.0.0 \
  --min-replicas 1 \
  --max-replicas 1 \
  --enable-dapr \
  --dapr-app-id daprstate \
  --dapr-components ./components-cosmosdb.yaml \
  --target-port 8080 \
  --ingress external

Let’s unpack what happens when you run the above command:

A container app daprstate is created in environment dapr-ca
The container app will have an initial revision (revision 1) that runs one container in its pod; the container uses image gbaeke/dapr-state:1.0.0
We turn off scaling by setting min and max replicas to 1
We enable ingress with the type set to external. That configures a public IP address and DNS name to reach our container app on the Internet; Envoy proxy is used under the hood to achieve this; TLS is automatically configured but we do need to tell the proxy the port our app listens on (--target-port 8080)
Dapr is enabled and requires that our app gets a Dapr id (--enable-dapr and --dapr-app-id daprstate)

Because this app uses the Dapr SDK to write key/value pairs to a state store, we need to configure this. That is where the --dapr-components parameter comes in. The component is actually defined in a file components-cosmosdb.yaml:

- name: statestore
  type: state.azure.cosmosdb
  version: v1
  metadata:
    - name: url
      value: YOURURL
    - name: masterkey
      value: YOURMASTERKEY
    - name: database
      value: YOURDB
    - name: collection
      value: YOURCOLLECTION

In the file, the name of our state store is statestore but you can choose any name. The type has to be state.azure.cosmosdb which requires the use of several metadata fields to specify the URL to your Cosmos DB account, the key to authenticate, the database, and collection.

In the Go code, the name of the state store is configurable via environment variables or arguments and, by total coincidence, defaults to statestore 😉.

func main() {
	fmt.Printf("Welcome to super api\n\n")

	// flags
	... code omitted for brevity
	// State store name
	f.String("statestore", "statestore", "State store name")

The flag is used in the code that writes to Cosmos DB with the Dapr SDK (s.config.Statestore in the call to daprClient.SaveState below):

// write data to Dapr statestore
	ctx := r.Context()
	if err := s.daprClient.SaveState(ctx, s.config.Statestore, state.Key, []byte(state.Data)); err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		fmt.Fprintf(w, "Error writing to statestore: %v\n", err)
		return
	} else {
		w.WriteHeader(http.StatusOK)
		fmt.Fprintf(w, "Successfully wrote to statestore\n")
	}

After running the az containerapp create command, you should see the following output (redacted):

{
  "configuration": {
    "activeRevisionsMode": "Multiple",
    "ingress": {
      "allowInsecure": false,
      "external": true,
      "fqdn": "daprstate.politegrass-37c1a51f.northeurope.azurecontainerapps.io",
      "targetPort": 8080,
      "traffic": [
        {
          "latestRevision": true,
          "revisionName": null,
          "weight": 100
        }
      ],
      "transport": "Auto"
    },
    "registries": null,
    "secrets": null
  },
  "id": "/subscriptions/SUBID/resourceGroups/rg-dapr/providers/Microsoft.Web/containerApps/daprstate",
  "kind": null,
  "kubeEnvironmentId": "/subscriptions/SUBID/resourceGroups/rg-dapr/providers/Microsoft.Web/kubeEnvironments/dapr-ca",
  "latestRevisionFqdn": "daprstate--6sbsmip.politegrass-37c1a51f.northeurope.azurecontainerapps.io",
  "latestRevisionName": "daprstate--6sbsmip",
  "location": "North Europe",
  "name": "daprstate",
  "provisioningState": "Succeeded",
  "resourceGroup": "rg-dapr",
  "tags": null,
  "template": {
    "containers": [
      {
        "args": null,
        "command": null,
        "env": null,
        "image": "gbaeke/dapr-state:1.0.0",
        "name": "daprstate",
        "resources": {
          "cpu": 0.5,
          "memory": "1Gi"
        }
      }
    ],
    "dapr": {
      "appId": "daprstate",
      "appPort": null,
      "components": [
        {
          "metadata": [
            {
              "name": "url",
              "secretRef": "",
              "value": "https://ACCOUNTNAME.documents.azure.com:443/"
            },
            {
              "name": "masterkey",
              "secretRef": "",
              "value": "MASTERKEY"
            },
            {
              "name": "database",
              "secretRef": "",
              "value": "dapr-db"
            },
            {
              "name": "collection",
              "secretRef": "",
              "value": "statestore"
            }
          ],
          "name": "statestore",
          "type": "state.azure.cosmosdb",
          "version": "v1"
        }
      ],
      "enabled": true
    },
    "revisionSuffix": "",
    "scale": {
      "maxReplicas": 1,
      "minReplicas": 1,
      "rules": null
    }
  },
  "type": "Microsoft.Web/containerApps"
}

The output above gives you a hint on how to define the Container App in an ARM template. Note the template section. It defines the containers that are part of this app. We have only one container with default resource allocations. It is possible to set environment variables for your containers but there are none in this case. We will set one later.

Also note the dapr section. It defines the app’s Dapr id and the components it can use.

Note: it is not a good practice to enter secrets in configuration files as we did above. To fix that:

add a secret to the Container App in the az containerapp create command via the --secrets flag. E.g. --secrets cosmosdb='YOURCOSMOSDBKEY'
in components-cosmosdb.yaml, replace value: YOURMASTERKEY with secretRef: cosmosdb
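
To make those two steps concrete, the sketch below prints a component definition that uses secretRef: cosmosdb instead of the key itself. print_component is a made-up helper and the arguments in the example call are placeholders; the YAML is the same component file shown earlier with only the masterkey entry changed.

```shell
# print_component is a hypothetical helper that prints the component YAML
# usage: print_component <cosmos db url> <database> <collection>
print_component() {
  cat <<EOF
- name: statestore
  type: state.azure.cosmosdb
  version: v1
  metadata:
    - name: url
      value: $1
    - name: masterkey
      secretRef: cosmosdb
    - name: database
      value: $2
    - name: collection
      value: $3
EOF
}

# placeholder values; redirect the output to components-cosmosdb.yaml to use it
print_component 'https://ACCOUNTNAME.documents.azure.com:443/' dapr-db statestore
```

The key itself then only appears in the --secrets cosmosdb='YOURCOSMOSDBKEY' argument of az containerapp create.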

The URL for the app is https://daprstate.politegrass-37c1a51f.northeurope.azurecontainerapps.io. When I browse to it, I just get a welcome message: Hello from Super API on Container Apps.

Every revision also gets a URL. The revision URL is https://daprstate--6sbsmip.politegrass-37c1a51f.northeurope.azurecontainerapps.io. Of course, this revision URL gives the same result. Our app has only one revision.

Save state

The application has a /state endpoint you can post a JSON payload to in the form of:

{
  "key": "keyname",
  "data": "datatostoreinkey"
}

We can use curl to try this:

curl -v -H "Content-type: application/json" -d '{ "key": "cool","data": "somedata"}' 'https://daprstate.politegrass-37c1a51f.northeurope.azurecontainerapps.io/state'

Trying the curl command will result in an error because Dapr wants to use strong consistency with Cosmos DB and we configured it for session-level consistency. That is not very relevant for now as that is related to Dapr and not Container Apps. Switching the Cosmos DB account to strong consistency will fix the error.

Update the container app

Let’s see what happens when we update the container app. We will add an environment variable WELCOME to change the welcome message that the app displays. Run the following command:

az containerapp update \
  --name daprstate \
  --resource-group rg-dapr \
  --environment-variables WELCOME='Hello from new revision'

The template section in the JSON output is now:

"template": {
    "containers": [
      {
        "args": null,
        "command": null,
        "env": [
          {
            "name": "WELCOME",
            "secretRef": null,
            "value": "Hello from new revision"
          }
        ],
        "image": "gbaeke/dapr-state:1.0.0",
        "name": "daprstate",
        "resources": {
          "cpu": 0.5,
          "memory": "1Gi"
        }
      }
    ]

It is important to realize that, when the template changes, a new revision will be created. We now have two revisions, reflected in the portal as below:

The new revision is active and receives 100% of the traffic. When we hit the / endpoint, we get Hello from new revision.

The idea here is that you deploy a new revision and test it before you make it active. Another option is to send a small part of the traffic to the new revision and see how that goes. It’s not entirely clear to me how you can automate this, including automated tests, similar to how progressive delivery controllers like Argo Rollouts and Flagger work. Tip to the team to include this! 😉

The az containerapp create and update commands can take a lot of parameters. Use az containerapp update --help to check what is supported. You will also see several examples.

Check the logs

Let’s check the container app logs that are sent to the Log Analytics workspace attached to the Container App environment. Make sure you still have the log analytics id in $LOG_ANALYTICS_WORKSPACE_CLIENT_ID:

az monitor log-analytics query \
  --workspace $LOG_ANALYTICS_WORKSPACE_CLIENT_ID \
  --analytics-query "ContainerAppConsoleLogs_CL | where ContainerAppName_s == 'daprstate' | project ContainerAppName_s, Log_s, TimeGenerated | take 50" \
  --out table

This will display both logs from the application container and the Dapr logs. One of the log entries shows that the statestore was successfully initialized:

... msg="component loaded. name: statestore, type: state.azure.cosmosdb/v1"

Conclusion

We have only scratched the surface here but I hope this post gave you some insights into concepts such as environments, container apps, revisions, ingress, the use of Dapr and logging. There is much more to look at such as virtual network integration, setting up scale rules (e.g. KEDA), automated deployments, and much more… Stay tuned!

Kubernetes Blue-Green deployments with Argo Rollouts

In this post, we will take a look at 🟦/🟩 blue-green deployments in Kubernetes. With blue-green deployments, you deploy a new version of an application or service next to the live and stable version. After manual or automatic checks, you promote the new version to become the live version. Switching between versions is simply a networking change. This could be a change in a router configuration or, in the case of Kubernetes, a change in a Kubernetes service.

Note: there is often confusion about which service is 🟦 blue and which is 🟩 green. Usually, the green service is the live and stable one and the blue service is the newly deployed preview service you intend to promote. Some documents switch it around, and I sometimes do that as well, for instance on my YouTube channel 😉

A Kubernetes deployment resource does not have a StrategyType for blue-green deployments. It only supports RollingUpdate or Recreate. You can easily work around that with multiple deployments and services, as discussed by Nills Franssens here: Simple Kubernetes blue-green deployments.

When I need to do blue-green, I prefer using a progressive delivery controller such as Argo Rollouts or Flagger. They are both excellent pieces of software that make it easy to do blue-green deployments, in addition to canary deployments and automated tests. In this post, we will look at Argo Rollouts.

Installing Argo Rollouts

Installing Argo Rollouts is documented here. For a quick install, just do:

kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

Argo Rollouts comes with a kubectl plugin for its CLI. Install it with brew install argoproj/tap/kubectl-argo-rollouts. That allows you to run the CLI with kubectl argo rollouts. If you do not use brew, install the plugin manually.

Deploy your application with a Rollout

Argo Rollouts uses a replacement for a Deployment resource: a Rollout. The YAML for a Rollout is almost identical to a Deployment except that the apiVersion and kind are different. In the spec, you can add a strategy section to specify whether you want a blueGreen or a canary rollout. Below is an example of a rollout for a simple API:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: superapi
spec:
  replicas: 2
  selector:
    matchLabels:
      app: superapi
  template:
    metadata:
      labels:
        app: superapi
    spec:
      containers:
      - name: superapi
        image: ghcr.io/gbaeke/super:1.0.2
        resources:
          requests:
            memory: "128Mi"
            cpu: "50m"
          limits:
            memory: "128Mi"
            cpu: "50m"
        env:
          - name: WELCOME
            valueFrom:
              configMapKeyRef:
                name: superapi-config
                key: WELCOME
        ports:
        - containerPort: 8080
  strategy:
    blueGreen:
      activeService: superapi-svc-active
      previewService: superapi-svc-preview
      autoPromotionEnabled: false

You will notice that the blueGreen strategy requires two services: an activeService and a previewService. Both settings refer to a Kubernetes service resource. Below is the activeService (previewService is similar and uses the same selector):

kind: Service
apiVersion: v1
metadata:
  name:  superapi-svc-active
spec:
  selector:
    app:  superapi
  type:  ClusterIP
  ports:
  - name:  http
    port:  80
    targetPort:  8080

The only thing we have to do, in this example, is to deploy the rollout and the two services with kubectl apply. In this post, however, we will use Kustomize to deploy everything.

Deploying a rollout with Kustomize

To deploy the rollout and its services with Kustomize, we can use the kustomization.yaml below:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: blue-green

nameSuffix: -geba
namePrefix: dev-

commonLabels:
  app: superapi
  version: v1
  env: dev


configurations:
  - https://argoproj.github.io/argo-rollouts/features/kustomize/rollout-transform.yaml

resources:
  - namespace.yaml
  - rollout.yaml
  - service-active.yaml
  - service-preview.yaml

configMapGenerator:
- name: superapi-config
  literals:
    - WELCOME=Hello from v1!
    - PORT=8080

With Kustomize, we can ensure we deploy our resources to a specific namespace. Above, that is the blue-green namespace. We also add a prefix and suffix to the names of the Kubernetes resources we create, and we add labels as well (commonLabels). For this to work properly with a rollout, you have to add the configurations section. Without it, Kustomize will not know what to do with the rollout resource (kind: Rollout).
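To make the effect of namePrefix and nameSuffix concrete, here is a small Python sketch that mimics the renaming for the rollout and its two services. It only illustrates the naming pattern and is not how Kustomize is implemented; kustomized_name is a hypothetical helper.

```python
NAME_PREFIX = "dev-"   # namePrefix from kustomization.yaml
NAME_SUFFIX = "-geba"  # nameSuffix from kustomization.yaml


def kustomized_name(name: str, prefix: str = NAME_PREFIX, suffix: str = NAME_SUFFIX) -> str:
    """Return the resource name after the prefix and suffix are applied."""
    return f"{prefix}{name}{suffix}"


# Names as written in rollout.yaml, service-active.yaml and service-preview.yaml.
for base in ("superapi", "superapi-svc-active", "superapi-svc-preview"):
    print(f"{base} -> {kustomized_name(base)}")
```

The rollout therefore ends up as dev-superapi-geba, which is the name to pass to the kubectl argo rollouts commands.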

Note that we also use a configMapGenerator that creates a ConfigMap that sets a welcome message. If you look at the rollout spec, you will see that the pod template uses it to set the WELCOME environment variable. The API that we deploy will respond with that message when you hit the root, for instance with curl.

To deploy with Kustomize, we can run kubectl apply -k . from the folder holding kustomization.yaml and the manifests in the resources list.

Checking the initial rollout with the UI

When we initially deploy our application, there is only one version of our app. The rollout uses a ReplicaSet to deploy two pods, similarly to a Deployment. Both the activeService and the previewService point to these two pods.

Argo Rollouts has a UI you can start with kubectl argo rollouts dashboard -n blue-green. The rollout is visualized as below:

In a tool like Octant, the resource viewer shows the relationships between the actual Kubernetes resources:

Above, you can clearly see the Rollout creates a ReplicaSet which, in turn, creates the Pods. Both services point to the same pods.

Upgrading to a new version

We will now upgrade to a new version of the application: v2. To simulate this, we can simply modify the WELCOME message in the configMapGenerator in kustomization.yaml. When we run kubectl apply -k . again, Kustomize will create a new ConfigMap with a different name (containing a hash) and will update that name in the pod template of the rollout. When you update the pod template of the rollout, the rollout knows it needs to upgrade with the blue-green strategy. This, again, is identical to how a Deployment behaves. In the UI, we now see:

There are now two revisions, both backed by a ReplicaSet. Each ReplicaSet controls two pods. One set of pods is for the active service, the other set for the preview. We can click on the rollout to see those details:

Above, we can clearly see that revision one is the stable and active service. That is our initial v1 deployment. Revision 2 is the preview service, the v2 deployment. We can port forward to that service and view the welcome message:

In Octant, this is what we see in Resource Viewer:

Above, we can clearly see the rollout now uses two ReplicaSets to run the active and preview pods. The rollout also modified the service selectors and the labels on the pods by adding a label like rollouts-pod-template-hash:758d6b4845. Each revision has its own hash.
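The selector change is what makes blue-green a pure networking switch. The Python sketch below models label matching the way a Kubernetes service selector works: a pod is selected when it carries every label in the selector. Pod names and hash values are hypothetical placeholders, and select_pods is an illustrative helper, not an Argo Rollouts or Kubernetes API.

```python
HASH_LABEL = "rollouts-pod-template-hash"

# Four pods: two per revision. Hash values are placeholders, not real hashes.
pods = {
    "superapi-v1-a": {"app": "superapi", HASH_LABEL: "hash-of-v1"},
    "superapi-v1-b": {"app": "superapi", HASH_LABEL: "hash-of-v1"},
    "superapi-v2-a": {"app": "superapi", HASH_LABEL: "hash-of-v2"},
    "superapi-v2-b": {"app": "superapi", HASH_LABEL: "hash-of-v2"},
}


def select_pods(selector: dict, all_pods: dict) -> list:
    """Return the names of pods whose labels contain every selector entry."""
    return sorted(
        name
        for name, labels in all_pods.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# While paused, each service selects the pods of one revision.
active_selector = {"app": "superapi", HASH_LABEL: "hash-of-v1"}
preview_selector = {"app": "superapi", HASH_LABEL: "hash-of-v2"}
print("active :", select_pods(active_selector, pods))
print("preview:", select_pods(preview_selector, pods))

# Promotion modeled as pointing the active selector at the preview hash.
active_selector[HASH_LABEL] = preview_selector[HASH_LABEL]
print("active after promotion:", select_pods(active_selector, pods))
```

Without the hash label, a selector of only app: superapi would match all four pods, which is why the rollout adds the label to both the pods and the service selectors.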

Promotion

Currently, the rollout is in a paused state. The Argo Rollouts UI shows this but you can also view this with the CLI by running kubectl argo rollouts get rollout dev-superapi-geba:

Getting the status of the rollout with the CLI

Above, the status is Paused with a message of BlueGreenPause. You can clearly see the green service is the stable and active one (v1) and the blue service is the preview service (v2). We can now promote the preview service to become stable and active.

To promote the service, in the web UI, click Promote and then Sure?. With the CLI, just run kubectl argo rollouts promote dev-superapi-geba. When you run the get command again, you will see:

Above, you can see the status as ✔️ Healthy. Revision 2 is now stable and active. Revision 1 will be scaled down by setting the number of pods in the ReplicaSet to 0. In the web UI, you now see:

Note that it is still possible to roll back to revision one by clicking the Rollback button or using the CLI. That will keep Revision 2 active and create a Revision 3 for you to preview. After clicking Promote and Sure? again, Revision 3, which runs the initial v1 service, becomes active.
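The revision numbering during a rollback can be a bit surprising, so here is a toy Python model of the sequence described in this post. It is a simplified sketch of the observed behavior, not Argo Rollouts code; ToyRollout and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRollout:
    """Tracks revisions plus which revision is active and which is in preview."""
    revisions: dict = field(default_factory=dict)  # revision number -> template
    active: int = 0
    preview: int = 0

    def update(self, template: str) -> int:
        """A pod template change creates a new revision that becomes the preview."""
        revision = len(self.revisions) + 1
        self.revisions[revision] = template
        if self.active == 0:
            # Initial deployment: active and preview both point to revision 1.
            self.active = revision
        self.preview = revision
        return revision

    def promote(self) -> None:
        """Make the previewed revision the active one."""
        self.active = self.preview

    def rollback(self, to_revision: int) -> int:
        """Rolling back re-applies an old template as a brand new revision."""
        return self.update(self.revisions[to_revision])


rollout = ToyRollout()
rollout.update("v1")   # revision 1, active immediately
rollout.update("v2")   # revision 2, preview only
rollout.promote()      # revision 2 becomes active
rollout.rollback(1)    # revision 3 with the v1 template, preview only
print(rollout.active, rollout.preview, rollout.revisions[rollout.preview])
rollout.promote()      # revision 3 (v1 again) becomes active
print(rollout.active, rollout.revisions[rollout.active])
```

The point of the model is that a rollback never reactivates Revision 1 directly: it creates a new revision with the old template, which then goes through the same pause and promote step as any other upgrade.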

Conclusion

If you have the need for blue-green deployments, it is highly recommended to use a progressive delivery controller like Argo Rollouts. It makes the whole process more intuitive and gives you fine control over upgrade, abort, promote and rollback operations. Above, we looked at blue-green with a manual pause, check, and promote. There are other options, such as analysis based on metrics with an automatic promotion that we will look at in later posts.
