Hosting an AG-UI Agent on Microsoft Foundry

A while back I wrote about building an AI agent server with AG-UI and the Microsoft Agent Framework. That post walked through what AG-UI is, how you stand up a small agent server, and how a web client talks to it. Quick recap before we move on:

  • AG-UI is a protocol between a frontend and an agent backend. The agent emits a stream of typed events — RUN_STARTED, TEXT_MESSAGE_START/CONTENT/END, TOOL_CALL_*, RUN_FINISHED — over Server-Sent Events. The UI decides how to render them.
  • The runtime sits between the browser and the agent. In the previous post (and again here) that runtime is CopilotKit: the browser talks to a Next.js route, that route forwards to the agent over HttpAgent, and CopilotKit handles the SSE wire format on both ends.
  • The agent itself is small: an HTTP endpoint that takes a message history and streams AG-UI events back. Last time I built it with Microsoft Agent Framework. The principle is the same regardless of framework — what matters is that the events on the wire are AG-UI.

In the previous post the agent ran as a plain Python service. That’s fine for a demo, but the obvious next question is: where do you actually run this thing? You need a host that gives you a stable HTTPS endpoint, sane authentication, autoscale, telemetry, and ideally something that knows it’s running an agent and not a generic web app.

That brings us to Microsoft Foundry hosted agents.

What are Foundry hosted agents?

Microsoft Foundry (the Azure AI platform formerly known as Azure AI Foundry) has a feature called hosted agents. You bring a container image (your agent server), and Foundry runs it behind a managed endpoint inside your Foundry project. Foundry handles:

  • The public or private HTTPS endpoint, scoped under your project
  • Entra ID authentication on that endpoint
  • A managed identity inside the container so the agent can call Azure OpenAI without secrets; your code can use DefaultAzureCredential() to pick it up
  • Auto-injection of FOUNDRY_PROJECT_ENDPOINT and APPLICATIONINSIGHTS_CONNECTION_STRING into the container’s environment
  • Telemetry into Application Insights

What you provide is the container and a small manifest that tells Foundry which protocol the agent speaks, how much CPU/memory it needs, and which model deployment to bind. There is no agent SDK you must adopt — Foundry only cares that your container exposes the right HTTP contract.

Note that this post uses the new version of hosted agents, announced on April 22nd 2026. The previous version ran on Azure Container Apps. It is unclear what infrastructure this new version runs on, but your agents now run in MicroVMs, with one MicroVM per session. A MicroVM can have its own state, such as files saved to the MicroVM's filesystem.

Two protocols: Responses vs. Invocations

To interact with Foundry hosted agents you speak one of two protocols, and the choice matters:

  • Responses protocol: Foundry speaks the well-known OpenAI Responses protocol. It owns conversation history and can store it server-side on the platform.
  • Invocations protocol: Foundry is transport-only for the request body. Each request is a POST /invocations to your container, your agent owns the turn end-to-end, and you stream whatever response format you like, including SSE. You manage state yourself; in this post, with CopilotKit, state is managed on the client.

For an AG-UI agent, Invocations is the one you want. AG-UI is itself an SSE event stream with its own well-defined event shapes. Invocations lets the AG-UI bytes flow through Foundry untouched. The browser sends a turn, Foundry routes it to your container, your agent emits AG-UI events, and the runtime/browser consumes them.

Foundry hosted agents always work with sessions, regardless of protocol. In a chat-based app like the one we are building, you would not want each turn in a conversation to be handled by a new session. Our client therefore sets a session id per conversation. By passing the same session id on every call to the invocations endpoint, we ensure that a new session is not created at every turn.
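As a concrete sketch, pinning a conversation to one session boils down to generating an id once and reusing it in the query string on every turn. The URL shape below follows the endpoint format shown later in this post; treat the exact host, project, and parameter names as illustrative assumptions:

```python
import uuid
from urllib.parse import urlencode

# One id per conversation, generated once and reused on every turn.
session_id = str(uuid.uuid4())

# Made-up account/project names; the real endpoint comes from your deployment.
base_url = (
    "https://myaccount.services.ai.azure.com/api/projects/myproject"
    "/agents/ag-ui-invocations/endpoint/protocols/invocations"
)

# Reusing the same agent_session_id on each POST keeps every turn in the
# same Foundry session instead of spinning up a new one per turn.
agent_url = f"{base_url}?{urlencode({'api-version': 'v1', 'agent_session_id': session_id})}"
print(agent_url)
```

A new browser tab (or page refresh) would generate a fresh id and therefore land in a fresh session.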

⚠️ You cannot simply choose your endpoints freely. With AG-UI for example, you typically choose the endpoint yourself (e.g. /chat). Not so with Foundry Hosted Agents!

What we will build

The diagram below shows what we will build end-to-end:

There is not a lot of difference from the previous post. The main differences are that the agent now runs as a Microsoft Foundry hosted agent and that the runtime connects to the invocations endpoint exposed by that agent.

Check https://github.com/gbaeke/hostedagentv2 for the source code. If you want to deploy this yourself, it is best to follow the README, together with the explanations in this post.

Writing the agent

Here is the entire agent loop for this sample (paraphrased from src/ag-ui-invocations/main.py):

from azure.ai.agentserver.invocations import InvocationAgentServerHost
from pydantic_ai import Agent
from pydantic_ai.ag_ui import handle_ag_ui_request

agent = Agent(model, instructions="You are a helpful assistant. ...")

@agent.tool_plain
async def get_weather(location: str) -> dict:
    ...  # call Open-Meteo, return dict

app = InvocationAgentServerHost()

@app.invoke_handler
async def handle_invoke(request):
    return await handle_ag_ui_request(agent, request)

if __name__ == "__main__":
    app.run()

Three libraries do the heavy lifting:

  • azure-ai-agentserver-invocations — a thin Starlette host that exposes POST /invocations on port 8088. Every call from the CopilotKit runtime will be handled by this host.
  • pydantic-ai — defines the Agent, the model binding, and the @agent.tool_plain decorator for tools
  • pydantic_ai.ag_ui.handle_ag_ui_request — translates a Pydantic AI agent run into an AG-UI SSE stream.

Authentication to Azure OpenAI is DefaultAzureCredential → bearer token for https://ai.azure.com/.default. No API keys, no AOAI endpoint env var — the AOAI base URL is derived by stripping the path off FOUNDRY_PROJECT_ENDPOINT.
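In code, that URL derivation is just string surgery on the injected endpoint. A minimal sketch (the fallback endpoint value below is a made-up example for local experimentation; the real value is injected by Foundry):

```python
import os
from urllib.parse import urlparse

# Foundry injects FOUNDRY_PROJECT_ENDPOINT into the container's environment;
# the fallback here is a made-up example.
endpoint = os.environ.get(
    "FOUNDRY_PROJECT_ENDPOINT",
    "https://myaccount.services.ai.azure.com/api/projects/myproject",
)

# Strip the /api/projects/<project> path to get the AOAI-style base URL.
parsed = urlparse(endpoint)
aoai_base_url = f"{parsed.scheme}://{parsed.netloc}"
print(aoai_base_url)  # e.g. https://myaccount.services.ai.azure.com
```

The bearer token itself would come from `DefaultAzureCredential().get_token("https://ai.azure.com/.default")`, picked up from the managed identity inside the container.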

The container needs nothing exotic — python:3.12-slim, pip install -r requirements.txt, CMD ["python", "main.py"], expose 8088. The full Dockerfile is six lines.
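For reference, a Dockerfile along these lines does the job (a sketch matching the description above, not copied verbatim from the repo):

```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "main.py"]
```

Port 8088 is what the invocations host listens on; an EXPOSE line is optional since it is documentation only.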

In summary, this is a simple agent with a weather tool written with Pydantic AI. Pydantic AI supports AG-UI out of the box. With a few extra lines of code, the AG-UI event stream is wired to the invocations endpoint. Nothing more, nothing less…

⚠️ Important: this agent is based on a Microsoft sample and then edited to wire up a weather tool. See Microsoft Learn for more information on how to create the sample, including the Bicep to deploy all required Azure resources, with azd ai agent init. I used this sample:

Make sure you use the latest version of azd. If you are using an old version, you might end up with the older version of Microsoft Foundry hosted agents!

Publishing to Foundry

Two manifests describe the agent to Foundry. They look similar; their audiences differ.

agent.yaml is the runtime ContainerAgent spec read by Foundry itself. It’s hard-coded:

kind: hosted
name: ag-ui-invocations
protocols:
  - protocol: invocations
    version: 1.0.0
resources:
  cpu: "0.25"
  memory: 0.5Gi
environment_variables:
  - name: AZURE_AI_MODEL_DEPLOYMENT_NAME
    value: gpt-4.1-mini
agent.manifest.yaml is the template azd reads to drive deployment. It uses {{AZURE_AI_MODEL_DEPLOYMENT_NAME}} placeholders that azd substitutes at deploy time. FOUNDRY_PROJECT_ENDPOINT and APPLICATIONINSIGHTS_CONNECTION_STRING are not declared here — Foundry injects them into the container automatically. Declaring them would shadow the platform values.
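Based on that description, agent.manifest.yaml is essentially the same spec with template placeholders. A sketch (field names mirrored from agent.yaml; the exact template syntax is azd's):

```yaml
kind: hosted
name: ag-ui-invocations
protocols:
  - protocol: invocations
    version: 1.0.0
environment_variables:
  - name: AZURE_AI_MODEL_DEPLOYMENT_NAME
    value: "{{AZURE_AI_MODEL_DEPLOYMENT_NAME}}"
```

Note the absence of FOUNDRY_PROJECT_ENDPOINT and APPLICATIONINSIGHTS_CONNECTION_STRING, which Foundry supplies at runtime.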

What you need on the Foundry side before azd up:

  • A Foundry project (the unit that owns agents, model deployments, and connections).
  • A model deployment — here, gpt-4.1-mini, declared in azure.yaml under services.ag-ui-invocations.config.deployments. azd creates it on first provision.
  • An Azure Container Registry connection on the project. Hosted agents pull their image from ACR. The Bicep in infra/ creates one if you don’t bring your own.
  • An identity with Cognitive Services OpenAI User (or equivalent) on the AOAI account, so the container’s managed identity can call the model.

The full deploy is one command:

azd up

That provisions the resources, builds the image with remoteBuild: true (azd ships the build to ACR rather than building locally), pushes it, and registers the agent with Foundry. After it returns, the agent is live at:

https://<account>.services.ai.azure.com/api/projects/<project>/agents/ag-ui-invocations/endpoint/protocols/invocations?api-version=v1

Full details with step-by-step generic instructions can be found on Microsoft Learn. If you want to create the AG-UI example, use the AG-UI (Invocations) (Bring Your Own) template.

The web client

The web app is a small Next.js 14 App Router project under web/. Two pieces matter:

  • app/api/copilotkit/route.ts runs CopilotRuntime server-side. CopilotRuntime is the runtime layer — it accepts a request from the browser and forwards it to the Foundry agent over HttpAgent:
  const runtime = new CopilotRuntime({
    agents: {
      ag_ui_invocations: new HttpAgent({ url: agentUrl, headers }),
    },
  });
  • app/page.tsx renders <CopilotChat> and registers a render-only useCopilotAction({ name: "get_weather", available: "frontend", render }). CopilotKit intercepts the agent’s get_weather tool call by name and draws a WeatherCard instead of dumping JSON into the chat.

The interesting bit is how the runtime authenticates to the Foundry-hosted agent. The browser doesn’t hold any credentials. Authentication happens server-side in the Next.js route, with Entra ID:

const credential = new DefaultAzureCredential();
const token = await credential.getToken("https://ai.azure.com/.default");
return {
  Authorization: `Bearer ${token.token}`,
  "Foundry-Features": "HostedAgents=V1Preview",
};

In this example, we do not care about user authentication. DefaultAzureCredential() is simply used by the CopilotKit runtime to authenticate to the Foundry hosted agent.

Two more wiring details worth flagging:

  • The route appends agent_session_id=<uuid> to the agent URL. The UUID is generated once per browser tab in app/providers.tsx and passed via <CopilotKit headers={{ "X-Foundry-Session-Id": ... }}>. Same tab → same Foundry sandbox until idle. Page refresh → new sandbox.
  • No history is stored anywhere. The browser keeps the message array in React state and replays the full transcript on every turn. Runtime, agent, and Azure OpenAI are all stateless. A page refresh loses the conversation. If you want durable history, that’s where the Responses protocol comes in — but then you’re not on AG-UI anymore, and that’s a different post.
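To make that statelessness concrete, the replay pattern boils down to something like this (a hedged sketch; in the real app the transcript lives in React state and the reply is assembled from the streamed AG-UI events):

```python
# The client owns the transcript and replays ALL of it on every turn;
# runtime, agent, and model keep no conversation state between requests.
transcript = []

def send_turn(user_text: str) -> str:
    transcript.append({"role": "user", "content": user_text})
    # In the real app: POST the full transcript to the runtime here and
    # build the assistant reply from the streamed AG-UI events.
    reply = f"(assistant reply to: {user_text})"  # placeholder
    transcript.append({"role": "assistant", "content": reply})
    return reply

send_turn("Hi there")
send_turn("What's the weather in Ghent?")
print(len(transcript))  # grows by two entries per turn
```

Losing `transcript` (a page refresh) loses the conversation, which is exactly the trade-off described above.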

What does it look like?

The web client is a simple Next.js app that uses the CopilotKit React components:

CopilotKit web UI with a component to render a weather tool call result

By default, you can check what is going on under the hood. For example, the screenshot below shows how the AG-UI events can be inspected:

CopilotKit inspector at the left with live view on events

As noted above, this application connects to the CopilotKit runtime. The runtime connects to the agent with Entra ID.

In Microsoft Foundry, the deployed agent shows up in the list of agents next to other types such as prompt agents:

The hosted agent, listed under a no code prompt agent

When you open the hosted agent, you see:

Hosted agent with v2 as the last version (version with the weather tool)

The playground user interface is of limited use here because this agent requires a payload that follows the invocations protocol. The protocol is shown in the upper-right part of the screen.

Note that every agent has an identity. This identity was granted the Azure AI User role on both resource and project level. The agent uses this identity to connect to the model (gpt-4.1-mini) for inference.

Roles for the agent identity (AgentIdentity)

You can find these identities in Entra ID via the Microsoft Entra admin center:

Agent identity in the Entra ID admin center

When you run this agent, you get out of the box telemetry. For example, sessions:

Sessions for this agent

One session is still active. Without activity, a session goes idle after 15 minutes. Remember that we set a session id per conversation so that a new session is not created on every conversation turn.

The session contains the log stream of the agent:

Logs from the agent as reported by the session

We also get traces:

Automatic tracing to App Insights (auto-provisioned and linked by the Bicep from the template)

Traces for the invocations protocol are not very useful out of the box:

Contents of a trace

Wrapping up

The shape of the system hasn’t changed since the previous post: browser → runtime → agent → model. What changed is where the agent lives. By packaging the same kind of AG-UI agent as a Foundry hosted agent with the Invocations protocol (a container image and two short manifests), you get a managed, Entra-authenticated endpoint with telemetry and session affinity, without giving up the AG-UI event stream the UI already knows how to consume.

There is still much to learn about Microsoft Foundry hosted agents. This post was written a few days after the release of the second, MicroVM-based version. If you spot any errors or if anything is unclear, drop a comment!

Building an AI Agent Server with AG-UI and Microsoft Agent Framework

In this post, I want to talk about the Python backend I built for an AG-UI demo project. It is part of a larger project that also includes a frontend that uses CopilotKit.

This post discusses the Python AG-UI server that is built with Microsoft Agent Framework.

All code is on GitHub: https://github.com/gbaeke/agui. Most of the code for this demo was written with GitHub Copilot with the help of Microsoft Docs MCP and Context7. 🤷

What is AG-UI?

Before we dive into the code, let’s talk about AG-UI. AG-UI is a standardized protocol for building AI agent interfaces. Think of it as a common language that lets your frontend talk to any backend agent that supports it, no matter what technology you use.

The protocol gives you some nice features out of the box:

  • Remote Agent Hosting: deploy your agents as web services (e.g. FastAPI)
  • Real-time Streaming: stream responses using Server-Sent Events (SSE)
  • Standardized Communication: consistent message format for reliable interactions (e.g. tool started, tool arguments, tool end, …)
  • Thread Management: keep conversation context across multiple requests

Why does this matter? Well, without a standard like AG-UI, every frontend needs custom code to talk to different backends. With AG-UI, you build your frontend once and it works with any AG-UI compatible backend. The same goes for backends – build it once and any AG-UI client can use it.

Under the hood, AG-UI uses simple HTTP POST requests for sending messages and Server-Sent Events (SSE) for streaming responses back. It’s not complicated, but it’s standardized. And that’s the point.
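To make that concrete, here is a minimal sketch of parsing such an SSE stream. The raw lines below are illustrative, and the exact event payload fields should be checked against the AG-UI docs:

```python
import json

# Illustrative AG-UI SSE stream (not captured from a real server): each
# "data:" line carries one JSON-encoded, typed event.
raw_sse = (
    'data: {"type": "RUN_STARTED", "threadId": "t1", "runId": "r1"}\n'
    "\n"
    'data: {"type": "TEXT_MESSAGE_CONTENT", "messageId": "m1", "delta": "Hello"}\n'
    "\n"
    'data: {"type": "RUN_FINISHED", "threadId": "t1", "runId": "r1"}\n'
)

# Keep only the data lines and decode each event.
events = [
    json.loads(line[len("data: "):])
    for line in raw_sse.splitlines()
    if line.startswith("data: ")
]
for event in events:
    print(event["type"])
```

A real client reads these lines incrementally off the HTTP response and renders text deltas and tool events as they arrive.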

AG-UI has many more features than the ones discussed in this post. Check https://docs.ag-ui.com/introduction for the full picture.

Microsoft Agent Framework

Now, you could implement AG-UI from scratch but that’s a lot of work. This is where Microsoft Agent Framework comes in. It’s a Python (and C#) framework that makes building AI agents really easy.

The framework handles the heavy lifting when it comes to agent building:

  • Managing chat with LLMs like Azure OpenAI
  • Function calling (tools)
  • Streaming responses
  • Multi-turn conversations
  • And a lot more

The key concept is the ChatAgent. You give it:

  1. chat client (like Azure OpenAI)
  2. Instructions (the system prompt)
  3. Tools (functions the agent can call)

And you’re done. The agent knows how to talk to the LLM, when to call tools, and how to stream responses back.

What’s nice about Agent Framework is that it integrates with AG-UI out of the box, similar to other frameworks like LangGraph, Google ADK and others. You write your agent code and expose it via AG-UI with basically one line of code. The framework translates everything automatically – your agent’s responses become AG-UI events, tool calls get streamed correctly, etc…

The integration with Microsoft Agent Framework was announced on the blog of CopilotKit, the team behind AG-UI. The blog included the diagram below to illustrate the capabilities:

From https://www.copilotkit.ai/blog/microsoft-agent-framework-is-now-ag-ui-compatible

The Code

Let’s look at how this actually works in practice. The code is pretty simple. Most of the code is Microsoft Agent Framework code. AG-UI gets exposed with one line of code.

The Server (server.py)

The main server file is really short:

import uvicorn
from api import app
from config import SERVER_HOST, SERVER_PORT

def main():
    print(f"🚀 Starting AG-UI server at http://{SERVER_HOST}:{SERVER_PORT}")
    uvicorn.run(app, host=SERVER_HOST, port=SERVER_PORT)

if __name__ == "__main__":
    main()

That’s it. We run a FastAPI server on port 8888. The interesting part is in api/app.py:

from fastapi import FastAPI
from agent_framework.ag_ui.fastapi import add_agent_framework_fastapi_endpoint
from agents.main_agent import agent

app = FastAPI(title="AG-UI Demo Server")

# This single line exposes your agent via AG-UI protocol
add_agent_framework_fastapi_endpoint(app, agent, "/")

See that add_agent_framework_fastapi_endpoint() call? That’s all you need. This function from Agent Framework takes your agent and exposes it as an AG-UI endpoint. It handles HTTP requests, SSE streaming, protocol translation – everything.

You just pass in your FastAPI app, your agent, and the route path. Done.

The Main Agent (agents/main_agent.py)

Here’s where we define the actual agent with standard Microsoft Agent Framework abstractions:

from agent_framework import ChatAgent
from agent_framework.azure import AzureOpenAIChatClient
from azure.identity import DefaultAzureCredential
from config import AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT_NAME
from tools import get_weather, get_current_time, calculate, bedtime_story_tool

# Create Azure OpenAI chat client
chat_client = AzureOpenAIChatClient(
    credential=DefaultAzureCredential(),
    endpoint=AZURE_OPENAI_ENDPOINT,
    deployment_name=AZURE_OPENAI_DEPLOYMENT_NAME,
)

# Create the AI agent with tools
agent = ChatAgent(
    name="AGUIAssistant",
    instructions="You are a helpful assistant with access to tools...",
    chat_client=chat_client,
    tools=[get_weather, get_current_time, calculate, bedtime_story_tool],
)

This is the heart of the backend. We create a ChatAgent with:

  1. A name: “AGUIAssistant”
  2. Instructions: the system prompt that tells the agent how to behave
  3. A chat client: an AzureOpenAIChatClient that handles communication with Azure OpenAI
  4. Tools: a list of functions the agent can call

The code implements a few toy tools and a sub-agent to illustrate how AG-UI handles tool calls. The tools are discussed below:

The Tools (tools/)

In Agent Framework, tools can be Python functions with a decorator:

from agent_framework import ai_function
import httpx
import json

@ai_function(description="Get the current weather for a location")
def get_weather(location: str) -> str:
    """Get real weather information for a location using Open-Meteo API."""
    # Step 1: Geocode the location
    geocode_url = "https://geocoding-api.open-meteo.com/v1/search"
    # ... make HTTP request ...
    
    # Step 2: Get weather data
    weather_url = "https://api.open-meteo.com/v1/forecast"
    # ... make HTTP request ...
    
    # Return JSON string
    return json.dumps({
        "location": resolved_name,
        "temperature": current["temperature_2m"],
        "condition": condition,
        # ...
    })

The @ai_function decorator tells Agent Framework “this is a tool the LLM can use”. The framework automatically:

  • Generates a schema from the function signature
  • Makes it available to the LLM
  • Handles calling the function when needed
  • Passes the result back to the LLM

You just write normal Python code. The function takes typed parameters (location: str) and returns a string. Agent Framework does the rest.
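As an illustration of how schema generation from a signature can work (this is not Agent Framework's internal code, just a sketch of the idea using the standard library):

```python
import inspect

def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return f"Sunny in {location}"

def tool_schema(fn):
    """Derive a simple tool schema from a function's signature and docstring."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": fn.__doc__,
        "parameters": {
            name: param.annotation.__name__
            for name, param in sig.parameters.items()
        },
    }

print(tool_schema(get_weather))
```

The typed parameters and docstring are all the framework needs to tell the LLM what the tool is called, what it does, and what arguments it takes.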

The weather tool calls the Open-Meteo API to get real weather data. In an AG-UI compatible client, you can intercept the tool result and visualize it any way you want before the LLM generates an answer from the tool result:

React client with CopilotKit

Above, when the user asks for weather information, AG-UI events inform the client that a tool call has started and ended. The tool result is also streamed back to the client, which uses a custom component to render the information. This happens before the chat client generates the answer based on the tool result.

The Subagent (tools/storyteller.py)

This is where it gets interesting. In Agent Framework, a ChatAgent can become a tool with .as_tool():

from agent_framework import ChatAgent
from agent_framework.azure import AzureOpenAIChatClient

# Create a specialized agent for bedtime stories
bedtime_story_agent = ChatAgent(
    name="BedTimeStoryTeller",
    description="A creative storyteller that writes engaging bedtime stories",
    instructions="""You are a gentle and creative bedtime story teller.
When given a topic, create a short, soothing bedtime story for children.
Your stories should be 3-5 paragraphs long, calming, and end peacefully.""",
    chat_client=chat_client,
)

# Convert the agent to a tool
bedtime_story_tool = bedtime_story_agent.as_tool(
    name="tell_bedtime_story",
    description="Generate a calming bedtime story based on a theme",
    arg_name="theme",
    arg_description="The theme for the story (e.g., 'a brave rabbit')",
)

This creates a subagent – another ChatAgent with different instructions. When the main agent needs to tell a bedtime story, it calls tell_bedtime_story which delegates to the subagent.

Why is this useful? Because you can give each agent specialized instructions. The main agent handles general questions and decides which tool to use. The storyteller agent focuses only on creating good stories. Clean separation of concerns.

The subagent has its own chat client and can have its own tools too if you want. It’s a full agent, just exposed as a tool.

And because it is a tool, you can render it with the standard AG-UI tool events:

Testing with a client

In src/backend there is a Python client client_raw.py. When you run that client against the server and invoke a tool, you will see something like below:

AG-UI client in Python

This client simply uses httpx to talk to the AG-UI server and inspects and renders the AG-UI events as they come in.
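For reference, the request body such a client POSTs looks roughly like this. Field names follow the AG-UI docs, but treat the exact shape here as an assumption rather than the authoritative schema:

```python
import json
import uuid

# Illustrative AG-UI run input: a thread id, a run id, and the message history.
payload = {
    "threadId": str(uuid.uuid4()),
    "runId": str(uuid.uuid4()),
    "messages": [
        {
            "id": str(uuid.uuid4()),
            "role": "user",
            "content": "What's the weather in Ghent?",
        }
    ],
    "tools": [],
    "state": {},
}

body = json.dumps(payload)  # POSTed with Content-Type: application/json
```

The server answers with the SSE event stream described earlier, which the client parses line by line.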

Why This Works

Let me tell you what I like about this setup:

Separation of concerns: The frontend doesn’t know about Python, Azure OpenAI, or any backend details. It just speaks AG-UI. You could swap the backend for a C# implementation or something else entirely – the frontend wouldn’t care, except of course for the handling of specific tool calls.

Standard protocol: Because we use AG-UI, any AG-UI client can talk to this backend. We use CopilotKit in the frontend but you could use anything that speaks AG-UI. Take the Python client as an example.

Framework handles complexity: Streaming, tool calls, conversation history, protocol translation – Agent Framework does all of this. You just write business logic.

Easy to extend: Want a new tool? Write a function with @ai_function. Want a specialized agent? Create a ChatAgent and call .as_tool(). That’s it.

The AG-UI documentation explains that the protocol supports 7 different features including human-in-the-loop, generative UI, and shared state. Our simple backend gets all of these capabilities because Agent Framework implements the protocol.

Note that there are many more capabilities. Check the AG-UI interactive Dojo to find out: https://dojo.ag-ui.com/microsoft-agent-framework-python

Wrap Up

This is a simple but powerful pattern for building AI agent backends. You write minimal code and get a lot of functionality. AG-UI gives you a standard way to expose your agent, and Microsoft Agent Framework handles the implementation details.

If you want to try this yourself, the code is in the repo. You’ll need an Azure OpenAI deployment, and you’ll need to follow the OAuth setup. After that, just run the code as instructed in the repo README!

The beauty is in the simplicity. Sometimes the best code is the code you don’t have to write.