Deploying a multi-agent solution with MCP and A2A to Azure Container Apps

In previous posts, we discussed multi-agent scenarios, how A2A servers work (here and here) and how to deploy the infrastructure to host a multi-agent application on Azure with Azure Container Apps and AI Foundry.

In this post, we will take a look at deploying the different components of the solution as containers in Azure Container Apps. This is what we will build:

Multi-agent solution with MCP and A2A

There are four main components:

Conversation Agent: Presents a chat interface to the user. Built with Chainlit and Semantic Kernel. Uses an OpenAI model; this could easily be switched to an Azure OpenAI model. The agent uses two tools, rag and web, hosted by the MCP server.

MCP Tools Server: An MCP server built with Python FastMCP. It exposes two tools, web and rag. The tools use an A2A client to interact with the A2A servers for the web and rag agents. Not exposed to the Internet. Used to demonstrate MCP and A2A together; we could have called the A2A servers directly from the conversation agent without MCP.

A2A Server for Foundry Agent (does RAG): Uses an Azure AI Foundry Agent with a hosted file-based RAG tool to provide answers about Contoso products. Not exposed to the Internet. Communicates privately with the Azure AI Foundry project.

A2A Server for OpenAI Agent (does web searches): Uses an OpenAI Agents SDK agent with the hosted web search tool. Not exposed to the Internet. Communicates over the Internet with the OpenAI backend. This could easily be replaced with an Azure AI Foundry Agent that uses Bing Search, but as this is an example about A2A, using a different technology makes more sense. 😊

Before delving into the four different components, it is important to know that the mcp, web and rag containers do not use their internal ingresses to communicate over TLS. That means that the mcp container for example, will talk to the web container using http://ca-web instead of something like https://ca-web.internal.ACA_environment_default_domain.
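For clarity, the two addressing modes can be sketched as follows. Note that internal_url and the ENV_DEFAULT_DOMAIN placeholder are illustrative helpers, not code from the repo:

```python
def internal_url(app_name: str, tls: bool = False,
                 env_domain: str = "ENV_DEFAULT_DOMAIN") -> str:
    """Build the address one container app uses to reach another.

    Plain HTTP only needs the container app name; TLS requires the
    internal FQDN on the environment's default domain.
    """
    if tls:
        return f"https://{app_name}.internal.{env_domain}"
    return f"http://{app_name}"

print(internal_url("ca-web"))            # http://ca-web
print(internal_url("ca-web", tls=True))  # https://ca-web.internal.ENV_DEFAULT_DOMAIN
```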

There is something to be said for using messaging to facilitate communication between agents. They are a form of microservices after all. In this example however, all communication is synchronous and uses HTTP.

This is a technical example that could be implemented in a single in-process agent with two tools. However, the emphasis is on multi-agent communication across process boundaries with Google’s Agent2Agent protocol.

Let’s get started with the Conversation Agent!

Conversation Agent

The conversation agent maintains a conversation with the end user and keeps track of chat history. The agent, written in Semantic Kernel, has two tools:

  • web-search: uses the OpenAI Agent A2A server to search the web via OpenAI’s hosted web search tool
  • rag-search: uses the Azure AI Foundry A2A server to search for Contoso projects via a hosted RAG tool

The user interface to the agent is provided by Chainlit:

Chainlit UI

Above, I asked for information about a project. The agent is configured to use the rag-search tool to find project information. Under the hood, an A2A server that wraps an Azure AI Foundry Agent is used to obtain this information. Via a filter, Chainlit supports visualizing when tools are called, as can be seen at the top of the screen. It basically has hooks into the kernel object that Semantic Kernel creates.

The code for this Chainlit-hosted agent is on GitHub. The code in main.py uses an environment variable, MCP_SERVER_URL, that contains the address of the MCP server. As discussed above this will be http://containername/mcp (e.g., http://ca-mcp/mcp).

Following the typical Semantic Kernel approach, a kernel is created. Here is a snippet of code:

# Create the Semantic Kernel
kernel = Kernel()

# Add AI service to kernel
ai_service = OpenAIChatCompletion(ai_model_id="gpt-4o")
kernel.add_service(ai_service)
logger.debug("Kernel and AI service initialized successfully")

# Add MCP tools plugin to kernel (uses global client)
tools_plugin = MCPToolsPlugin()
kernel.add_plugin(tools_plugin, plugin_name="mcp_tools")
logger.debug("MCP tools plugin added to kernel")

Note that we are not using Semantic Kernel’s built-in support for remote MCP servers that use streamable HTTP. Instead, we create a plugin via the MCPToolsPlugin class. That class defines two kernel functions, rag_search and web_search. In such a function, you can do what you want. I did not have to use MCP and could have called the A2A servers directly using the A2A client.

In our functions, we do use the MCP client from FastMCP to call the appropriate tool on the MCP server. The call to the A2A servers is implemented in the MCP server’s tools.

⚠️ This approach was chosen to illustrate that even if your framework does not natively support MCP, under the hood this is always LLM function calling. Kernel functions in Semantic Kernel are simply an abstraction on top of function calling. If you use Semantic Kernel’s native support for MCP, the tools on the MCP server would automatically be created as kernel functions. This native support requires much less code.
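To see why this is "just function calling", here is a framework-free sketch: the model returns a tool name plus arguments, and the host simply looks up and invokes a matching callable. The TOOLS registry and dispatch helper below are hypothetical, purely to illustrate the mechanism:

```python
import json

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS = {
    "rag_search": lambda query: f"rag results for: {query}",
    "web_search": lambda query: f"web results for: {query}",
}

def dispatch(llm_tool_call: str) -> str:
    """Run the tool named in an LLM-style function call payload."""
    call = json.loads(llm_tool_call)
    tool = TOOLS[call["name"]]
    return tool(**call["arguments"])

# What an LLM function call roughly looks like on the wire:
result = dispatch('{"name": "rag_search", "arguments": {"query": "project Astro"}}')
print(result)  # rag results for: project Astro
```

Kernel functions, MCP tools and A2A skills are all layers on top of this basic loop.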

Now that we have the conversation agent up and running with Chainlit and Semantic Kernel, let’s look at the MCP server.

MCP Server

The conversation agent uses an MCP client (from the FastMCP library) to call tools hosted by the MCP server. This illustrates the separation of tool implementation from agent implementation.

The MCP server is implemented in main.py. In its most basic form, an MCP server with a few tools is really simple. This MCP server just defines two tools: a web tool and a rag tool.

The web tool looks like this:

@mcp.tool()
async def web_tool(query: str) -> str:
    """
    Perform a web search for the given query.
    
    Args:
        query: The search query to perform
        
    Returns:
        Search results as a string
    """
    logger.info(f"Web tool called with query: {query}")
    logger.info(f"Using web A2A agent at: {WEB_A2A_BASE_URL}")
    
    try:
        return await _send_a2a_message(query, WEB_A2A_BASE_URL)
    except Exception as e:
        logger.error(f"Error performing web search: {e}")
        return f"Error performing web search: {str(e)}"

This tool only does one thing: send a message to the A2A server on the address in WEB_A2A_BASE_URL. In Azure Container Apps, this URL is http://ca-web. The rag tool is implemented in a similar way. You can check the code of the _send_a2a_message function on GitHub.

⚠️ The addresses of the A2A servers are supplied to the mcp container app via environment variables WEB_A2A_BASE_URL and RAG_A2A_BASE_URL.
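A common pattern is to fall back to local addresses when these variables are not set, so the same code runs both locally and in Container Apps. The local ports below are illustrative:

```python
import os

# In Azure Container Apps these resolve to http://ca-web and http://ca-rag;
# on a developer machine they fall back to local ports (illustrative values).
WEB_A2A_BASE_URL = os.environ.get("WEB_A2A_BASE_URL", "http://localhost:9998")
RAG_A2A_BASE_URL = os.environ.get("RAG_A2A_BASE_URL", "http://localhost:9999")

print(WEB_A2A_BASE_URL, RAG_A2A_BASE_URL)
```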

We now have the following implemented:

conversation --tool call--> MCP Server --run tool--> A2A Server

All traffic is synchronous and over http (not https)! Everything depends on the correct tool call being made by the conversation agent and the agents in the A2A servers. The rest is just plumbing! No magic! 😊

A2A Servers

You can check my earlier posts about A2A servers for background information.

It is important to note that the A2A server (rag) uses Azure AI Foundry. To authenticate to AI Foundry, we need to use a managed identity.

The rag container needs the following environment variables:

  • RAG_A2A_BASE_URL: required to set the correct url in the agent card
  • INTERNAL_PORT: port to run on (e.g., 80)
  • FOUNDRY_PROJECT: url to the Foundry project (e.g., https://FOUNDRY-RESOURCE.services.ai.azure.com/api/projects/FOUNDRY-PROJECT)
  • ASSISTANT_ID: id of the agent you want to use; needs to exist in Foundry project
  • CLIENT_ID: the client id of the user assigned managed identity; this identity is created in the Bicep script; a role is assigned as well

During deployment of the container apps, a managed identity (that has the client id above) is assigned to the container. In the A2A server code that contains the code to talk to Foundry, this identity is used as follows:

if client_id:
    logger.info(f"Using ManagedIdentityCredential with client ID: {client_id}")
    credential = ManagedIdentityCredential(client_id=client_id)
else:
    logger.info("Using DefaultAzureCredential")
    credential = DefaultAzureCredential()

This allows for the use of the Azure CLI identity (via DefaultAzureCredential) when the rag agent is running on your local machine. Full code is in Agent_Executor.py.

⚠️ If you run the rag A2A server on your local machine, ensure you allow your IP address in the firewall settings of the Azure AI Foundry resource.

Full code for the A2A servers is on GitHub.

Deployment

To make it easy to deploy the containers to the Azure Container Apps environment (discussed in previous post), use the following script: https://github.com/gbaeke/multi_agent_aca/blob/main/deploy_containers.sh

At the top of the script, change the variables to match your environment:

ACR_NAME="SHORT_ACR_NAME"
ACR_URL="SHORT_ACR_NAME.azurecr.io"
RESOURCE_GROUP="RESOURCE_GROUP"
CONTAINER_APP_ENV="CONTAINER_APP_ENV_NAME"
MANAGED_IDENTITY="MANAGED_IDENTITY_NAME"

To deploy, simply run deploy_containers.sh --to-build conversation,mcp,web,rag. This does the following:

  • Builds and pushes the four containers using an ACR Task (no local Docker required)
  • Deploys the four containers with appropriate secrets and environment variables; secrets are read from a .env file

Ensure that you have this .env file in the same folder with the following values:

OPENAI_API_KEY="your_openai_api_key_here"
# Replace with your actual OpenAI API key

FOUNDRY_PROJECT="your_foundry_project_url"
# The URL of the Foundry project endpoint you're connecting to
# Find it in the properties of the AI Foundry project

ASSISTANT_ID="your_assistant_id_here"
# The unique ID of the agent you're referencing
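The deploy script reads these values and passes them to the container apps as secrets. A minimal stdlib sketch of parsing such a .env file (the real script may handle this differently):

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values

sample = '''
OPENAI_API_KEY="sk-example"
# comment line
ASSISTANT_ID="asst_123"
'''
print(parse_env(sample))  # {'OPENAI_API_KEY': 'sk-example', 'ASSISTANT_ID': 'asst_123'}
```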

This should deploy the four containers as shown below:

conversation, mcp, web and rag containers

Now grab the ingress URL (aka Application Url) of the conversation container:

Application URL (ingress URL) to the conversation app

Paste that URL in your browser. Hopefully the Chainlit UI is shown. If not, check the following:

  • Chainlit container has MCP_SERVER_URL set to http://ca-mcp/mcp and also has your OpenAI key in OPENAI_API_KEY
  • MCP container has WEB_A2A_BASE_URL and RAG_A2A_BASE_URL set to http://ca-web and http://ca-rag
  • Web container has WEB_A2A_BASE_URL set to http://ca-web and also has an OPENAI_API_KEY
  • Rag container has RAG_A2A_BASE_URL set to http://ca-rag and has environment variables set to use the Azure AI Foundry agent; also check the managed identity of the container has access rights to AI Foundry

Normally these should all be set by the Bicep deployment and the container deployment script.

Wrapping Up

If you’ve made it this far and tried to implement this yourself, you’ve likely realized how much effort it takes to get everything up and running. About 99% of the work is infrastructure and plumbing; only 1% is actual agent code. In more complex agentic applications, the ratio may shift slightly, but infrastructure will still dominate the effort.

We have not even touched on things like logging, metrics, tracing the end-to-end communication path, load balancing, saving agent state and much, much more.

This brings me back to a key point from an earlier post:


If you can build your multi-agent solution in-process, or use an agent PaaS like Azure AI Foundry, do it.


Only choose the approach I described above when no other viable option exists or when you’re building larger solutions where multiple teams develop agents that must coexist within the same system.

Deploying AI Foundry Agents and Azure Container Apps to support an Agent2Agent solution

In previous posts, I discussed multi-agent solutions and the potential use of Google’s Agent2Agent protocol (A2A). In this post, we will deploy the infrastructure for an end-to-end solution that looks as follows:

Multi-agent solution in Azure

Here’s a short description of the components.

Foundry Project: Basic Foundry project with a private endpoint. The private endpoint ensures private communication between the RAG Agent container and the Azure AI Foundry agent.

Virtual Network: Provides a subnet to integrate the Azure Container Apps Environment in a private network. This allows container apps to connect to Azure AI Foundry privately.

Container Apps Environment: Integrated in our private network. Hosts the Container Apps.

Container Apps: Container apps for the conversation agent, MCP server, RAG agent and web agent. Only the conversation agent is publicly available.

Main components of the deployment

In what follows, we will first provide more information about Azure AI Foundry and then proceed to deploy all components except the Azure Container Apps themselves. We will deploy the actual app components in a follow-up post.

Azure AI Foundry Project

Azure AI Foundry is Microsoft’s enterprise platform for building, deploying, and managing AI applications—especially those using large language models (LLMs) and generative AI. It brings together everything you need: production-ready infrastructure, access to powerful models from providers like OpenAI, Mistral, and Meta, and tools for customization, monitoring, and scaling—all in one unified environment.

It’s designed to support the full AI development lifecycle:

  • Explore and test models and services
  • Build and customize applications or agents
  • Deploy to production
  • Monitor, evaluate, and improve performance

You can work either through the Azure AI Foundry portal or directly via SDKs in your preferred development environment.

You will do your work in a project. When you create a project in Azure AI Foundry, you’ll choose between two types:

Foundry Project

This type is recommended for most cases and is what we will use to define our RAG agent. Agents in projects are generally available (GA). You deploy models like gpt-4o directly to the project. There is no need to create a connection to an Azure OpenAI resource. It can be configured with a private endpoint to ensure private communication.

This matches exactly with our needs. Note that we will deploy a basic Foundry environment with a private endpoint and not a standard environment. For more information about basic versus standard, check the Foundry documentation.

Later, when we create the resources via Bicep, two resources will be created:

  • The Azure AI Foundry resource: with private endpoint
  • The Azure AI Foundry Project: used to create our RAG agent

Hub-based Project

This type has some additional options like Prompt Flow. However, agents in hub-based projects are not generally available at the time of writing. A hub-based project is not the best match for our needs here.

⚠️ In general, always use a Foundry Project rather than a Hub-based Project unless you need a specific feature that, at the time of creation, is not yet available in Foundry projects.

As explained above, a Foundry project is part of an AI Foundry resource. Here is the resource in the portal (hub-based projects are under AI Hubs):

AI Foundry resource

Inside the resource, you can create a project. The above resource has one project:

Projects in the Foundry resource: your Foundry Project

To work with your project, you can click Go to Azure AI Foundry portal in the Overview tab:

In the Foundry Portal, you can proceed to create agents. However, if you have enabled a private endpoint, ensure you can access your Azure virtual network via a jump host or VPN. If that is not possible, allow your IP to access the Foundry resource in the Networking section of the resource. When you do not have access, you will see the following error:

No access to manage agents in the project

⚠️ Even after giving access, it will take a while for the change to propagate.

If you have access, you will see the following screen to add and configure agents:

Creating and debugging agents in your AI Foundry Project

Deployment with Bicep

You can check https://github.com/gbaeke/multi_agent_aca/tree/main/bicep to find Bicep files together with a shell script to deploy the resources. Also check the README for more information.

In Bicep, you first create an account (type is Microsoft.CognitiveServices/accounts). This matches the fndry-a2a resource in one of the screenshots above. In a later step, you add the project. The snippet below shows how the account gets created:

resource account 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' = {
  name: aiFoundryName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  kind: 'AIServices'
  sku: {
    name: 'S0'
  }
  properties: {
    // Networking
    publicNetworkAccess: 'Enabled'
    
    networkAcls: {
      bypass: 'AzureServices'
      defaultAction: 'Deny'
      ipRules: [
        {
          value: 'IP address'
        }
      ]
    }

    // Specifies whether this resource supports project management as child resources, used as containers for access management, data isolation, and cost in AI Foundry.
    allowProjectManagement: true

    // Defines developer API endpoint subdomain
    customSubDomainName: aiFoundryName

    // Auth
    disableLocalAuth: false
  }
}

It’s at this level you block public network access. The private endpoint and related network resources are created in other sections of the Bicep file.

Once you have this account, you can create the project. This matches with the fndry-a2a-proj project in one of the screenshots above. Here is the Bicep snippet:

resource project 'Microsoft.CognitiveServices/accounts/projects@2025-04-01-preview' = {
  name: defaultProjectName
  parent: account
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {}
}

Later, we will create agents in this project. However, an agent needs a supported model. In this case, we will use gpt-4o-mini so we need to deploy it:

resource modelDeployment 'Microsoft.CognitiveServices/accounts/deployments@2024-10-01'= {
  parent: account
  name: 'gpt-4o-mini'
  sku : {
    capacity: 1
    name: 'GlobalStandard'
  }
  properties: {
    model:{
      name: 'gpt-4o-mini'
      format: 'OpenAI'
      version: '2024-07-18'
    }
  }
}

⚠️ Above, a capacity of 1 only allows 1000 tokens per minute. You will probably want to increase that; otherwise you will quickly hit the limit when testing your agents.
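To put that warning in numbers (assuming, as stated above, that one capacity unit equals 1000 tokens per minute for this SKU; the multiplier can differ per model):

```python
def tokens_per_minute(capacity: int) -> int:
    # Assumption from the note above: 1 capacity unit = 1,000 tokens/minute.
    return capacity * 1000

# With roughly 800 tokens per agent request (prompt + completion, illustrative),
# capacity 1 sustains barely one request per minute.
print(tokens_per_minute(1) // 800)   # 1
print(tokens_per_minute(50) // 800)  # 62
```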

In the Foundry Portal, the model is shown as follows:

gpt-4o-mini deployment (next to manually deployed gpt-4o)

I will not go into the rest of the Bicep code. Most of it is network related (virtual network, subnets, private endpoint, private DNS, DNS network links, etc.).

Creating the RAG Agent

Although we can create the agent using the Foundry SDK, we will create and test it via the Foundry Portal. As a first step, create or modify an agent. You might get a question first about the model you want to use with your agents.

In your agent, do the following:

  • give the agent a name
  • select a model from the list of deployed models
  • set instructions

I used the following instructions:

You retrieve information about Contoso projects using your knowledge tools. Always use your knowledge tools to answer the user. If you cannot find the answer via tools, respond you do not know.

Name, model and instructions

Next, scroll down and click + Add next to Knowledge. You will see the following screen:

List of agent knowledge tools

Select the Files tool and upload the files from https://github.com/gbaeke/multi_agent_aca/tree/main/project_descriptions. Use git clone https://github.com/gbaeke/multi_agent_aca.git to grab those files.

After selecting the local files, click Upload and Save to upload these files so the agent can search them. Behind the scenes, the files are chunked, chunks are vectorized and stored in a vector database. However, this is all hidden from you. Your agent configuration should now show the knowledge tool:

Knowledge tool added to agent

You should now test your agent. At the top of the configuration section, there is a Try in Playground link.

When I ask about EduForge, I get the following:

Asking about EduForge with a response from the files tool (+ annotation)

When you click View Run Info (at the end of the response), the use of the tool should be shown in the trace:

Tracing shows the tool calls and the file_search tool

If this works, you have a simple agent in Foundry that has access to a file_search tool to perform RAG (retrieval-augmented generation).

Wrapping up

We have now deployed the RAG agent with Azure AI Foundry. We created a Foundry resource in Azure with a private endpoint. The Foundry resource has one project within it. The project contains our RAG agent.

But remember, we want to wrap this agent with Google’s Agent2Agent. To achieve that, we will deploy the A2A server that uses the Foundry agent as a container in the Container Apps Environment.

We will take a look at how that works in a next post. In that post, we will use these agents as tools via MCP and provide the MCP tools to our conversation agent. The conversation agent will use Semantic Kernel.

Stay tuned! 😊

Google’s A2A: taking a closer look

In the previous post, I talked about options to build multi-agent solutions. The last option used Google’s A2A. A2A provides a wrapper around your agent, basically a JSON-RPC API, that standardizes how you talk to your agent. In this post we take a closer look at the basics of A2A with simple synchronous message exchange.

⚠️ A2A is still in development. We do not use it in production yet!

The idea is to build solutions that look like this (just one of the many possibilities):

The conversation agent is an agent that uses tools to get the job done. It wouldn’t be much of an agent without tools, right? The tools are custom tools created by the developer that call other agents to do work. The other agents can be written in any framework and use any development language. How an agent works internally is irrelevant. When the conversation agent detects (via standard function calling) that the RAG tool needs to be executed, that tool will call the RAG agent over A2A and return the results.

A2A does not dictate how you build your agent. In the example below, an Azure AI Foundry Agent sits at the core. That agent can use any of its hosted tools or custom tools to get the job done. Because this is a RAG Agent, it might use the built-in Azure AI Search or SharePoint knowledge source. As a developer, you use the Azure AI Foundry SDK or Semantic Kernel to interact with your agent as you see fit. Although you do not have to, it is common to wrap your agent in a class and provide one or more methods to interact with it. For example, an invoke() method and an invoke_streaming() method.

Here is a minimal example for the AI Foundry Agent (the yellow box):

class RAGAgent:
    def __init__(self):
        # INITIALIZATION CODE NOT SHOWN
        self.project = AIProjectClient(
            credential=DefaultAzureCredential(),
            endpoint=endpoint)
        self.agent = self.project.agents.get_agent(agent_id)

    async def invoke(self, question: str) -> str:
        thread = self.project.agents.threads.create()

        message = self.project.agents.messages.create(
            thread_id=thread.id,
            role="user",
            content=question
        )
        run = self.project.agents.runs.create_and_process(
            thread_id=thread.id,
            agent_id=self.agent.id)
        messages = list(self.project.agents.messages.list(thread_id=thread.id, order=ListSortOrder.ASCENDING))

        # ...

This code has nothing to do with Google A2A and could be implemented in many other ways. That is about to change, because we will now call the above agent from A2A’s AgentExecutor. The AgentExecutor is a key server‑side interface: when a client sends a message, the A2A server calls execute() on your AgentExecutor instance, and your implementation handles the logic and sends updates via an event queue. In other words, a client message works its way down to your agent via several A2A components.

It’s important to understand the different types of message exchange in A2A. This post will not look at all of them. You can find more information in the A2A documentation. This post uses synchronous messaging via message/send where the response is a simple message and not a, potentially longer running, task.
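On the wire, message/send is a plain JSON-RPC 2.0 call. A hand-built request envelope could look like the sketch below; the field names mirror the message shapes used elsewhere in this post, but treat the exact schema as indicative, since the protocol is still evolving:

```python
import json
import uuid

def build_message_send(text: str) -> dict:
    """Build a JSON-RPC 2.0 envelope for A2A's message/send method."""
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "message/send",
        "params": {
            "message": {
                "role": "user",
                "messageId": str(uuid.uuid4()),
                "parts": [{"kind": "text", "text": text}],
            }
        },
    }

payload = build_message_send("Is there a project with the word Astro?")
print(json.dumps(payload, indent=2))
```

The A2A SDK builds this envelope for you, as we will see in the client section below.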

Let’s dive into the AgentExecutor (it processes the message we send) and work our way up to the A2A client.

AgentExecutor

Let’s take a look at a bare-bones implementation of AgentExecutor that works with text/plain input and output messages and without streaming:

Client --message--> A2A Server --> Agent Executor --> Agent

and

Agent --> Agent Executor --> A2A Server --message--> Client

class RAGAgentExecutor(AgentExecutor):

    def __init__(self):
        self.agent = RAGAgent()

    async def execute(self, context: RequestContext, event_queue: EventQueue):
        message_text = context.get_user_input()
        
        result = await self.agent.invoke(message_text)

        await event_queue.enqueue_event(new_agent_text_message(result))
        
    async def cancel(self, context: RequestContext, event_queue: EventQueue):
        raise Exception("Cancel not supported")

When a message is sent to the A2A server via JSON-RPC, the execute() method of the RAGAgentExecutor is called. At server startup, __init__ creates our AI Foundry RAGAgent which does the actual work.

Inside the execute() method, we assume the context contains a message. We use the get_user_input() helper to extract the message text (user query). We then simply call our agent’s invoke() method with that query and return the result via the event_queue. The A2A server uses an event_queue to provide responses back to the caller. In this case, the response will be a simple text/plain message.

This is probably as simple as it gets and is useful to understand A2A’s basic operation. In many cases though, you might want to return a longer running task instead of a message and provide updates to the client via streaming. That would require creating the task and streaming the task updates to the client. The client would need to be modified to handle this.

But wait, we still need to create the server that uses this AgentExecutor. Let’s take a look!

A2A Server

The A2A Python SDK uses starlette and uvicorn to create the JSON-RPC server. You don’t really need to know anything about this because A2A does this under the covers for you. The server needs to do a couple of things:

  • Create one or more skills: skills represent a specific capability or function your agent offers—for instance, “currency conversion,” “document summary” or “meeting scheduling”.
  • Create an agent card: an agent card is like a business card for your agent; it tells others what the agent can do; the above skills are part of the agent card; the agent card is published at /.well-known/agent.json on the agent’s domain (e.g., localhost:9999 on your local machine)
  • Create a request handler: the request handler ties the server to the AgentExecutor you created earlier
  • Create the A2AStarletteApplication: it ties the agent card and the request handler together
  • Serve the A2AStarletteApplication with uvicorn on an address and port of your choosing

This is what it looks like in code:

import logging
import uvicorn
from a2a.server.apps import A2AStarletteApplication
from a2a.server.request_handlers import DefaultRequestHandler
from a2a.server.tasks import InMemoryTaskStore
from a2a.types import AgentCapabilities, AgentCard, AgentSkill
from agent_executor import RAGAgentExecutor


def main():
    skill = AgentSkill(
        id="rag_skill",
        name="RAG Skill",
        description="Search knowledge base for project information",
        tags=["rag", "agent", "information"],
        examples=["What is project Astro and what tech is used in it?"],
    )
    agent_card = AgentCard(
        name="RAG Agent",
        description="A simple agent that searches the knowledge base for information",
        url="http://localhost:9998/",
        defaultInputModes=["text"],
        defaultOutputModes=["text"],
        skills=[skill],
        version="1.0.0",
        capabilities=AgentCapabilities(),
    )
    request_handler = DefaultRequestHandler(
        agent_executor=RAGAgentExecutor(),
        task_store=InMemoryTaskStore(),
    )
    server = A2AStarletteApplication(
        http_handler=request_handler,
        agent_card=agent_card,
    )
    uvicorn.run(server.build(), host="0.0.0.0", port=9998)


if __name__ == "__main__":
    main()

Validating the agent card

When you run the A2A server on your local machine and expose it to the public with ngrok or other tools, you can use https://a2aprotocol.ai/a2a-protocol-validator to validate it. When I do this for the RAG Agent, I get the following:

In JSON, the agent card is as follows:

{
  "capabilities": {},
  "defaultInputModes": [
    "text"
  ],
  "defaultOutputModes": [
    "text"
  ],
  "description": "A simple agent that searches the knowledge base for information",
  "name": "RAG Agent",
  "protocolVersion": "0.2.5",
  "skills": [
    {
      "description": "Search knowledge base for project information",
      "examples": [
        "What is project Astro and what tech is used in it?"
      ],
      "id": "rag_agent",
      "name": "RAG Agent",
      "tags": [
        "rag",
        "agent",
        "information"
      ]
    }
  ],
  "url": "http://Geerts-MacBook-Air-2.local:9998/",
  "version": "1.0.0"
}

Now it is time to actually start talking to the agent.

Using the A2A client to talk to the agent

With the server up and running and the Agent Card verified, how do we exchange messages with the server?

In our case, where the server supports only text and there is no streaming, the client can be quite simple:

  • Create an httpx client and set the timeout higher depending on how long it takes to get a response; this client is used by the A2ACardResolver and A2AClient
  • Retrieve the agent card with the A2ACardResolver
  • Create a client with A2AClient. It needs the agent card as input and will use the url in the agent card to connect to the A2A server
  • Create a Message, include it in a SendMessageRequest and send it with the client. We use the non-streaming send_message() method.
  • Handle the response from the client

The code below shows what this might look like:

import uuid

import httpx
from a2a.client import A2ACardResolver, A2AClient
from a2a.types import (
    AgentCard,
    Message,
    MessageSendParams,
    Part,
    Role,
    SendMessageRequest,
    TextPart,
)

PUBLIC_AGENT_CARD_PATH = "/.well-known/agent.json"
BASE_URL = "http://localhost:9998"


async def main() -> None:
    timeout = httpx.Timeout(200.0, read=200.0, write=30.0, connect=10.0)
    async with httpx.AsyncClient(timeout=timeout) as httpx_client:
        # Initialize A2ACardResolver
        resolver = A2ACardResolver(
            httpx_client=httpx_client,
            base_url=BASE_URL,
        )

        final_agent_card_to_use: AgentCard | None = None

        try:
            print(
                f"Fetching public agent card from: {BASE_URL}{PUBLIC_AGENT_CARD_PATH}"
            )
            _public_card = await resolver.get_agent_card()
            print("Fetched public agent card")
            print(_public_card.model_dump_json(indent=2))

            final_agent_card_to_use = _public_card

        except Exception as e:
            print(f"Error fetching public agent card: {e}")
            raise RuntimeError("Failed to fetch public agent card")

        client = A2AClient(
            httpx_client=httpx_client, agent_card=final_agent_card_to_use
        )
        print("A2AClient initialized")

        message_payload = Message(
            role=Role.user,
            messageId=str(uuid.uuid4()),
            parts=[Part(root=TextPart(text="Is there a project with the word Astro? If so, describe it."))],
        )
        request = SendMessageRequest(
            id=str(uuid.uuid4()),
            params=MessageSendParams(
                message=message_payload,
            ),
        )
        print("Sending message")

        response = await client.send_message(request)
        print("Response:")
        print(response.model_dump_json(indent=2))


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Above, the entire response is printed as JSON. That is useful to learn what the responses look like. This is part of the response:

{
  "id": "6cc795d8-fa84-4734-8b5a-dccd3a22142d",
  "jsonrpc": "2.0",
  "result": {
    "contextId": null,
    "extensions": null,
    "kind": "message",
    "messageId": "fead200d-0ea4-4ccb-bf1c-ed507b38d79d",
    "metadata": null,
    "parts": [
      {
        "kind": "text",
        "metadata": null,
        "text": "RESPONSE FROM RAG AGENT"
      }
    ],
    "referenceTaskIds": null,
    "role": "agent",
    "taskId": null
  }
}

Simply sending the response as a string on the event queue results in a message with one text part. The result from the RAG agent is in the text property. For a longer running task with streaming updates, the response would be quite different.
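To pull just the agent's text out of such a response, you can parse the dumped JSON directly instead of navigating the SDK's wrapper types; a minimal sketch:

```python
import json


def extract_text(response_json: str) -> str:
    # Collect the text parts from a non-streaming message response,
    # like the JSON shown above.
    data = json.loads(response_json)
    parts = data.get("result", {}).get("parts", [])
    return " ".join(p["text"] for p in parts if p.get("kind") == "text")


sample = """
{
  "jsonrpc": "2.0",
  "id": "123",
  "result": {
    "kind": "message",
    "role": "agent",
    "parts": [{"kind": "text", "text": "RESPONSE FROM RAG AGENT"}]
  }
}
"""
print(extract_text(sample))  # RESPONSE FROM RAG AGENT
```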

You can now easily interact with your agent using this client. For example:

  • use the client in any application (need not be an agent)
  • use the client in a workflow engine like LangGraph
  • use the client in an agent tool; the agent can be written in any framework; when the agent identifies a tool call is needed, the tool is run which contains A2AClient code to interact with the A2A Agent
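For the last option, the tool itself can be a plain async function that wraps the A2A client. A minimal sketch, where send_to_a2a_server is a hypothetical stand-in for the A2AClient flow shown earlier:

```python
import asyncio


async def send_to_a2a_server(base_url: str, question: str) -> str:
    # Stand-in for the real A2AClient flow: fetch the agent card,
    # build a SendMessageRequest, call send_message and parse the text part.
    return f"[{base_url}] answer to: {question}"


async def rag_tool(question: str) -> str:
    """Tool a conversation agent can register; delegates to the RAG A2A server."""
    return await send_to_a2a_server("http://ca-rag", question)


print(asyncio.run(rag_tool("Describe project Astro")))
```

Because the A2A details live inside the tool, the conversation agent itself can be written in any framework that supports function tools.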

The entire flow

The diagram below shows the end-to-end flow:

Try it yourself

On GitHub, check https://github.com/gbaeke/multi_agent_aca/tree/main/a2a_simple for a skeleton implementation of a calculator agent. The CalculatorAgent class’s invoke() method always returns “I did not do anything!” It’s up to you to change that!

You can run this A2A server as-is and connect to it with test_client.py. To use an actual agent, update the CalculatorAgent class’s invoke() method with a real agent written in your preferred framework.

Check the README.md for more instructions.

That’s it for this post! In the next one, we will look at a more complex example that streams messages to the client. Stay tuned!

Building multi-agent solutions: what are your options?

When we meet with customers, the topic of a “multi-agent solution” often comes up. This isn’t surprising. There’s a lot of excitement around their potential to transform business processes, strengthen customer relationships, and more.

The first question you have to ask yourself, though, is this: “Do I really need a multi-agent solution?” Often, we find that a single agent with a range of tools, or a workflow, is sufficient. If that’s the case, always go for that option!

On the other hand, if you do need a multi-agent solution, there are several things to think about. Suppose you want to build something like this:

Generic multi-agent setup

Users interact with a main agent that maintains the conversation with the user. When the user asks about a project, a RAG agent retrieves project information. If the user also asks to research or explain the technologies used in the project, the web agent is used to retrieve information from the Internet.

⚠️ If I were to follow my own advice, this would be a single agent with tools. There is no need for multiple agents here. However, let’s use this as an example because it’s easy to reason about.

What are some of your options to build this? The list below is not exhaustive but contains common patterns:

  • Choose a framework (or use the lower-level SDKs) and run everything in the same process
  • Choose an Agent PaaS like Azure AI Foundry Agents: the agents can be defined in the platform; they run independently and can be linked together using the connected agents feature
  • Create the agents in your framework of choice, run them as independent processes and establish a method of communication between these agents; in this post, we will use Google’s A2A (Agent-to-Agent) as an example. Other options are ACP (Agent Communication Protocol, IBM) or “roll your own”

Let’s look at these three in a bit more detail.

In-Process Agents

Running multiple agents in the same process and having them work together is relatively easy. Let’s look at how to do this with the OpenAI Agents SDK. Other frameworks use similar approaches.

Multi-agent in-process using the OpenAI Agents SDK

Above, all agents are written using the OpenAI Agents SDK. In code, you first define the RAG and Web Agent as agents with their own tools. In the OpenAI Agents SDK, both the RAG tool and the web search tool are hosted tools provided by OpenAI. See https://openai.github.io/openai-agents-python/tools/ for more information about the FileSearchTool and the WebSearchTool.

Next, the Conversation Agent gets created using the same approach. This time, however, two tools are added: the RAG Agent Tool and the Web Agent Tool. These tools get called by the Conversation Agent based on their description. This is simply tool calling in action, where each tool calls another agent and returns that agent’s result. The way these agents interact with each other is hidden from you. The SDK simply takes care of it.

You can find an example of this in my agent_config GitHub repo. The sample code below shows how this works:

rag_agent = create_agent_from_config("rag")
web_agent = create_agent_from_config("web")

agent_as_tools = {
    "rag": {
        "agent": rag_agent,
        "name": "rag",
        "description": "Provides information about projects"
    },
    "web": {
        "agent": web_agent,
        "name": "web",
        "description": "Gets information about technologies"
    }
}

conversation_agent = create_agent_from_config("conversation", agent_as_tools)

result = await Runner.run(conversation_agent, user_question)

Note that I am using a helper function here that creates an agent from a configuration file containing the agent’s instructions, model and tools. Check my previous post for more information. The repo used in this post uses slightly different agents, but the concept is the same.
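The configuration-driven approach might look something like this. Note that the keys and validation below are assumptions for illustration, not the repo’s actual format:

```python
import json

# Hypothetical agent config; the real repo's format may differ.
RAG_CONFIG_JSON = """
{
  "name": "rag",
  "model": "gpt-4o-mini",
  "instructions": "Provide information about Contoso projects.",
  "tools": ["file_search"]
}
"""


def load_agent_config(raw: str) -> dict:
    # Parse and minimally validate the config before building the agent.
    cfg = json.loads(raw)
    for key in ("name", "model", "instructions"):
        if key not in cfg:
            raise ValueError(f"Missing required config key: {key}")
    return cfg


cfg = load_agent_config(RAG_CONFIG_JSON)
print(cfg["name"])  # rag
```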

Creating a multi-agent solution in a single process, using a framework that supports calling other agents as tools, is relatively straightforward. However, what if you want to use the RAG Agent in other agents or workflows? In other words, you want reusability! Let’s see how to do this with the other approaches.

Using an Agent PaaS: Azure AI Foundry Agents

Azure AI Foundry Agents is a PaaS solution to create and run agents with enterprise-level features such as isolated networking. After creating an Azure AI Foundry resource and project, you can define agents in the portal:

Agents defined in Azure AI Foundry

⚠️ You can also create these agents from code (e.g., Foundry SDK or Semantic Kernel) which gives you extra flexibility in agent design.

The web and rag agents have their own tools, including hosted tools provided by Foundry, and can run on their own. This is already an improvement compared to the previous approach: agents can be reused from other agents, workflows or any other application.

Azure AI Foundry allows you to connect agents to each other. This uses the same approach as in the OpenAI Agents SDK: agents as tools. Below, the Conversation Agent is connected to the other two agents:

Connected Agents for the Conversation Agent

The configuration of a connected agent is shown below and has a name and description:

It all fits together like in the diagram below:

Multi-agent with Azure AI Foundry

As discussed above, each agent is a standalone entity. You can interact with these agents using the AI Foundry Agents protocol, which is an evolution of the OpenAI Assistants protocol. You can read more about it here. In short, to talk to an agent, you do the following:

  • Create the agent in code or reference an existing agent (e.g., our conversation agent)
  • Create a thread
  • Put a message on the thread (e.g., the user’s question or a question from another agent via the connected agents principle)
  • Run the thread on the agent and grab the response

Below is an example in Python:

from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.agents.models import ListSortOrder

project = AIProjectClient(
    credential=DefaultAzureCredential(),
    endpoint="https://YOUR_FOUNDRY_ENDPOINT")

agent = project.agents.get_agent("YOUR_ASSISTANT_ID")

thread = project.agents.threads.create()
print(f"Created thread, ID: {thread.id}")

message = project.agents.messages.create(
    thread_id=thread.id,
    role="user",
    content="What tech is used in some of Contoso's projects?"
)

run = project.agents.runs.create_and_process(
    thread_id=thread.id,
    agent_id=agent.id)

if run.status == "failed":
    print(f"Run failed: {run.last_error}")
else:
    messages = project.agents.messages.list(thread_id=thread.id, order=ListSortOrder.ASCENDING)

    for message in messages:
        if message.text_messages:
            print(f"{message.role}: {message.text_messages[-1].text.value}")

The connected agents feature uses the same protocol under the hood. Like in the OpenAI Agents SDK, this is hidden from you.

When you mainly use Azure AI Foundry agents, there is no direct need for agent-to-agent protocols like A2A or ACP. In fact, even when you have an agent that is not created in Azure AI Foundry, you can simply create a tool in that agent. The tool can then use the thread/message/run approach to get a response from the agent hosted in Foundry. This can all run isolated in your own network if you wish.

You could argue that the protocol used by Azure AI Foundry is not an industry standard: you cannot simply use it in combination with other frameworks, unless you use something like https://pypi.org/project/llamphouse/, a project written by colleagues of mine that is protocol-compatible with the OpenAI Assistants API.

Let’s take a look at the third approach which uses a protocol that aspires to be a standard and can be used together with any agent framework: Google’s A2A.

Using Google’s A2A in a multi-agent solution

The basic idea of Google’s A2A is the creation of a standard protocol for agent-to-agent communication. Without going into the details of A2A, that’s for another post, the solution looks like this:

A multi-agent solution with A2A

A2A allows you to wrap any agent, written in any framework, in a standard JSON-RPC API. With an A2A client, you can send messages to the API which uses an Agent Executor around your actual agent. Your agent provides the response and a message is sent back to the client.

Above, there are two A2A-based agents:

  • The RAG Agent uses Azure AI Foundry and its built-in vector store tool
  • The Web Agent uses OpenAI Agent SDK and its hosted web search tool

The conversation agent can be written in any framework as long as you define tools for that agent that use the A2A protocol (via an A2A client) to send messages to the other agents. This again is agents as tools in action.

To illustrate this standards-based approach, let’s use the A2A Inspector to send a message to the RAG Agent. As long as your agent has an A2A wrapper, this inspector will be able to talk to it. First, we connect to the agent to get its agent card:

Connecting to the RAG Agent with A2A

The agent card is defined in code and contains information about what the agent can do via skills. Once connected, I can send a message to the agent using the A2A protocol:

Sending a message which results in a task

The message that got sent was the following (JSON-RPC):

{
  "id": "msg-1752245905034-georiakp8",
  "jsonrpc": "2.0",
  "method": "message/send",
  "params": {
    "configuration": {
      "acceptedOutputModes": [
        "text/plain",
        "video/mp4"
      ]
    },
    "message": {
      "contextId": "27effaaa-98af-44c4-b15f-10d682fd6496",
      "kind": "message",
      "messageId": "60f95a30-535a-454f-8a8d-31f52d7957b5",
      "parts": [
        {
          "kind": "text",
          "text": "What is project Astro (I might have the name wrong though)"
        }
      ],
      "role": "user"
    }
  }
}

This was the response:

{
  "artifacts": [
    {
      "artifactId": "d912666b-f9ff-4fa6-8899-b656adf9f09c",
      "parts": [
        {
          "kind": "text",
          "text": "Project \"Astro\" appears to refer to \"Astro Events,\" which is a web platform designed for users to discover, share, and RSVP to astronomy-related events worldwide. The platform includes features such as interactive sky maps, event notifications, and a community forum for both amateur and professional astronomers. If you were thinking about astronomy or space-related projects, this may be the correct project you had in mind【4:0†astro_events.md】. If you're thinking of something else, let me know!"
        }
      ]
    }
  ],
  "contextId": "27effaaa-98af-44c4-b15f-10d682fd6496",
  "history": [
    HISTORY HERE
  ],
  "id": "d5af08b3-93a0-40ec-8236-4269c1ed866d",
  "kind": "task",
  "status": {
    "state": "completed",
    "timestamp": "2025-07-11T14:58:38.029960+00:00"
  },
  "validation_errors": []
}

If you are building complex multi-agent solutions, where multiple teams write their agents in different frameworks and development languages, establishing communication standards pays off in the long run.

However, this approach is much more complex than the other two approaches. We have only scratched the surface of A2A here and have not touched on the following aspects:

  • How to handle authentication?
  • How to handle long running tasks?
  • How to scale your agents to multiple instances and how to preserve state?
  • How to handle logging and tracing across agent boundaries?

⚠️ Most of the above is simply software engineering and has not much to do with LLM-based agents!

Conclusion

In this article, we discussed three approaches to building a multi-agent solution:

| Approach | Complexity | Reusability | Standardization | Best For |
| --- | --- | --- | --- | --- |
| In-process | Low | Limited | No | Simple, single-team use cases |
| Agent PaaS | Medium | Good | No (vendor-specific) | Org-wide, moderate complexity |
| A2A Protocol | High | Excellent | Yes | Cross-team, cross-platform needs |

When you really need a multi-agent solution, I strongly believe that the first two approaches should cover 90% of use cases.

In complex cases, the last option can be considered, although its complexity should not be underestimated. To make this option clearer, a follow-up article will discuss how to create and connect agents with A2A in more detail.

Creating an agent with the Azure AI Agent SDK

Source: Microsoft

Azure AI Agents Service simplifies building intelligent agents by combining advanced AI models, tools, and technology from Microsoft, OpenAI, and partners like Meta and Cohere. It enables integration with knowledge sources such as Bing, SharePoint, and Azure AI Search, and lets agents perform actions across Microsoft and third-party applications using Logic Apps, Azure Functions, and Code Interpreter. With Azure AI Foundry, you get an intuitive agent-building experience, backed by enterprise-grade features like customizable storage, private networking, secure authentication, and detailed observability through OpenTelemetry.

At the time of this writing (December 2024), Azure AI Foundry did not provide a user interface yet to create these agents in the portal. In this post, we will use the Azure AI Foundry SDK to create the agent from code.

You can find the code in this repository: https://github.com/gbaeke/agent_service/tree/main/agentui

How does it work?

The agent service uses the same wire protocol as the Azure OpenAI Assistants API. The Assistants API was developed as an alternative to the chat completions API. The big difference is that the Assistants API is stateful: your interactions with the AI model are saved as messages on a thread. You simply add messages to the thread for the model to respond.

To get started, you need three things:

  • An agent: the agent uses a model and instructions about how it should behave. In addition, you add knowledge sources and tools. Knowledge sources can be files you upload to the agent or existing sources such as files on SharePoint. Tools can be built-in tools like code interpreter or custom tools like any API or custom functions that you write.
  • A thread: threads receive messages from users and the assistant (the model) responds with assistant messages. In a chat application, each of the user’s conversations can be a thread. Note that threads are created independently of an agent; a thread is only associated with an agent when you run it on that agent.
  • Messages: you add messages to a thread and check the thread for new messages. Messages can contain both text and images. For example, if you use the code interpreter tool and you asked for a chart, the chart will be created and handed to you as a file id. To render the chart, you would need to download it first based on its id.

Creating the agent

Before we create the agent, we need to connect to our Azure AI Foundry project. To do that (and more), we need the following imports:

import base64
import os
from typing import Any, Callable, Dict, Set

import requests
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import CodeInterpreterTool, FunctionTool, ToolSet
from azure.identity import DefaultAzureCredential
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

We will use the AIProjectClient to get a reference to an Azure AI Foundry project. We do that with the following code:

# Set up credentials and project client
credential = DefaultAzureCredential()
conn_str = os.environ["PROJECT_CONNECTION_STRING"]
project_client = AIProjectClient.from_connection_string(
    credential=credential, conn_str=conn_str
)

Note that we authenticate with Entra ID. On your local machine, ensure you are logged on via the Azure CLI with az login. Your account needs at least AI Developer access to the Foundry project.

You also need the connection string to your project. The code requires it in the PROJECT_CONNECTION_STRING environment variable. You can find the connection string in Azure AI Foundry:

AI Foundry project connection string

We can now create the agent with the following code:

agent = project_client.agents.create_agent(
    model="gpt-4o-mini",
    name="my-agent",
    instructions="You are helpful agent with functions to turn on/off light and get temperature in a location. If location is not specified, ask the user.",
    toolset=toolset
)

Above, the agent uses gpt-4o-mini. You need to ensure that model is deployed in your Azure AI Foundry Hub. In our example, we also provide the assistant with tools. We will not provide it with knowledge.

What’s inside the toolset?

  • built-in code interpreter tool: provides a way for the model to write Python code, execute it and provide the result back to the model; the result can be text and/or images.
  • custom tools: in our case, custom Python functions to turn on/off lights and look up weather information in a location.

There are other tool types that we will not discuss in this post.

Adding tools

Let’s look at adding our own custom functions first. In the code, three functions are used as tools:

def turn_on_light(room: str) -> str:
    return f"Light in room {room} turned on"

def turn_off_light(room: str) -> str:
    return f"Light in room {room} turned off"

def get_temperature(location: str) -> str:
    # Calls an external weather API; check the GitHub repo for the full code.
    ...

The SDK provides helpers to turn these functions into tools the assistant understands:

user_functions: Set[Callable[..., Any]] = {
    turn_on_light,
    turn_off_light,
    get_temperature
}
functions = FunctionTool(user_functions)
toolset = ToolSet()
toolset.add(functions)

Now we need to add the built-in code interpreter:

code_interpreter = CodeInterpreterTool()
toolset.add(code_interpreter)

Now we have a toolset with three custom functions and the code interpreter. This toolset is given to the agent via the toolset parameter.

Now that we have an agent, we need to provide a way to create a thread and add messages to the thread.

Creating a thread

We are creating an API, so we will create an endpoint to create a thread:

@app.post("/threads")
def create_thread() -> Dict[str, str]:
    thread = project_client.agents.create_thread()
    return {"thread_id": thread.id}

As discussed earlier, a thread is created as a separate entity. It is not associated with an agent when you create it. The association happens later, when we run the thread on the agent that should process its messages.

Working with messages

Next, we will provide an endpoint that accepts a thread id and a message you want to add to it:

@app.post("/threads/{thread_id}/messages")
def send_message(thread_id: str, request: MessageRequest):
    created_msg = project_client.agents.create_message(
        thread_id=thread_id,
        role="user",
        content=request.message  # Now accessing message from the request model
    )
    run = project_client.agents.create_and_process_run(
        thread_id=thread_id,
        assistant_id=agent.id
    )
    if run.status == "failed":
        return {"error": run.last_error or "Unknown error"}

    messages = project_client.agents.list_messages(thread_id=thread_id)
    last_msg = messages.get_last_message_by_sender("assistant")
    
    last_msg_text = last_msg.text_messages[0].text.value if last_msg.text_messages else None
    last_msg_image = last_msg.image_contents[0].image_file if last_msg.image_contents else None
    
    last_msg_image_b64 = None
    if last_msg_image:
        file_stream = project_client.agents.get_file_content(file_id=last_msg_image.file_id)
        base64_encoder = base64.b64encode
        byte_chunks = b"".join(file_stream)  # Concatenate all bytes from the iterator.
        last_msg_image_b64 = base64_encoder(byte_chunks).decode("utf-8")
        
    return {"assistant_text": last_msg_text, 
            "assistant_image": last_msg_image_b64}

The code is pretty self-explanatory. In summary, here is what happens:

  • a message is created with the create_message method; the message is added to the specified thread_id as a user message
  • the thread is run on the agent specified by the agent.id
  • to know if the run is finished, polling is used; the create_and_process_run hides that complexity for you
  • messages are retrieved from the thread but only the last assistant message is used
  • we extract the text and image from the message if it is present
  • when there is an image, we use get_file_content to retrieve the file content from the API; that function returns an iterator of bytes that are joined together and base64-encoded
  • the message and image are returned

Testing the API

When we POST to the threads endpoint, this is the response:

{
  "thread_id": "thread_meYRMrkRtUiI1u0ZGH0z7PEN"
}

We can use that id to post to the messages endpoint. For example in a .http file:

POST http://localhost:8000/threads/thread_meYRMrkRtUiI1u0ZGH0z7PEN/messages
Content-Type: application/json

{
    "message": "Create a sample bar chart"
}

The response to the above request should be something like below:

{
  "assistant_text": "Here is a sample bar chart displaying four categories (A to D) with their corresponding values. If you need any modifications or another type of chart, just let me know!",
  "assistant_image": "iVBORw0KGgoAAAANSUhEUgAABpYAAARNCAYAAABYAnNeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAB7CAAAewgFu0HU+AADWf0lEQ..."
}
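On the client side, the assistant_image field is just a base64 string; decoding it back to a PNG file is straightforward. A minimal sketch:

```python
import base64


def save_assistant_image(b64_png: str, path: str) -> None:
    # Decode the assistant_image field and write it out as a binary PNG file.
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_png))
```

For example, a client could call `save_assistant_image(response["assistant_image"], "chart.png")` after checking that the field is not null.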

In this case, the model determined that the code interpreter should be used to create the sample bar chart. When you ask for something simpler, like the weather, you get the following response:

{
  "assistant_text": "The current temperature in London is 11.4°C. If you need more information or updates, feel free to ask!",
  "assistant_image": null
}

In this case, our custom weather function was used to answer. The assistant determines what tools should be used to provide an answer.

Integration in a web app

The GitHub repository contains a sample UI to try the API:

Sample UI and a chat combining weather and plotting

Beautiful, is it not? 😂

Conclusion

The Azure AI Agent service makes it relatively easy to create an agent that has access to knowledge and tools. The assistant decides on its own how to use the knowledge and tools. However, you can steer the assistant via its instructions and influence how the assistant behaves.

The SDK makes it easy to add your own custom functions as tools, next to the built-in tools that it supports. Soon, there will be an Agent Service user interface in Azure AI Foundry. You will be able to create agents in code that reference the agents you have built in Foundry.

To try it for yourself, use the code in the GitHub repo. Note that the code is demo code with limited error handling. It’s merely meant to demonstrate first steps.

Enjoy and let me know what you build with it! 😉