End-to-end authorization with Entra ID and MCP

Building an MCP server is really easy. Almost every language and framework has MCP support these days, both from a client and a server perspective. For example: FastMCP (Python, TypeScript), csharp-sdk, and many more!

However, in an enterprise scenario where users work with a web-based conversational agent that calls tools on an MCP server, those tools might need to connect to back-end systems using the identity of the user. Consider the following flow:

A couple of things are important here:

  • The user logs on to the app and requests an access token that is valid for the MCP Server; you need app registrations in Entra ID to make this work
  • The MCP Server verifies that this token is from Entra ID and contains the correct audience; we will use FastMCP in Python which has some built-in functionality for token validation
  • When the user asks the agent a question that requires Azure AI Search, the agent decides to make a tool call; the tool on the MCP server does the actual work
  • The tool needs access to the token (there’s a helper in FastMCP for that); next, it exchanges the token for one that is valid for Azure AI Search
  • The tool can now perform a search in Azure AI Search using new functionality discussed here

⚠️ MCP is actually not that important here. This technique which uses OAuth 2.0 and OBO flows in Entra ID is a well established pattern that’s been in use for ages!

🧑‍💻 Full source code here: https://github.com/gbaeke/mcp-obo

Let’s get started and get this working!

Entra ID App Registrations

In this case, we will create two registrations: one for the front-end and one for the back-end, the MCP Server. Note that the front-end here will be a command-line app that uses a device flow to authenticate. It uses a simple token cache to avoid having to log on again and again.

Front-end app registration

We will create this in the Azure Portal. I assume you have some knowledge of this so I will not provide detailed step-by-step instructions:

  • Go to App Registrations
  • Create a new registration, FastMCP Auth Web
  • In Authentication, ensure you enable Allow public client flows

You will need the client ID of this app registration in the MCP client we will build later.

Back-end app registration

This is for the MCP server and needs more settings:

  • Create a new registration, FastMCP Auth API
  • In Certificates and secrets, add a secret. We will need this to implement the on-behalf-of flow to obtain the Azure AI Search token
  • In Expose an API, set the Application ID URI to https://CLIENTIDOFAPP. I also added a scope, execute. In addition, add the front-end app client ID to the list of Authorized client applications:
Expose API of MCP app registration

To exchange a token for this service for an Azure AI Search token, we also need to add API permissions:

Permissions for Azure Cognitive, ehhm, AI Search

When you add the above permission, find it under APIs my organization uses.

MCP Client

We can now write an MCP client that calls the MCP server with an access token for the API we created above. As noted before, this will be a command-line app written in Python.

The code simply uses MSAL to initiate a device flow. It also uses a token cache to avoid repeated logins. The code can be found here.
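
As a rough idea of what that helper could look like, here is a minimal sketch using MSAL’s device flow. The tenant ID, client ID and scope are placeholders; the real implementation (with an on-disk cache) is in the repo:

import msal

# Placeholder values: your tenant and the front-end app registration's client ID
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<front-end-client-id>"
# Scope exposed by the back-end app registration (Application ID URI + scope name)
SCOPES = ["https://<api-client-id>/execute"]

def get_jwt_token() -> str:
    app = msal.PublicClientApplication(
        CLIENT_ID,
        authority=f"https://login.microsoftonline.com/{TENANT_ID}",
        token_cache=msal.SerializableTokenCache(),
    )

    # Try the cache first to avoid repeated logins
    accounts = app.get_accounts()
    if accounts:
        result = app.acquire_token_silent(SCOPES, account=accounts[0])
        if result and "access_token" in result:
            return result["access_token"]

    # Fall back to an interactive device flow
    flow = app.initiate_device_flow(scopes=SCOPES)
    print(flow["message"])  # tells the user where to enter the device code
    result = app.acquire_token_by_device_flow(flow)
    return result["access_token"]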

Once we have a token, we can construct an MCP client (with FastMCP) as follows:

from fastmcp import Client as MCPClient
from fastmcp.client.transports import StreamableHttpTransport

# Acquire a token via the MSAL device flow (see mcp_client.py)
token = get_jwt_token()

# Every request to the MCP server carries the bearer token
headers = {"Authorization": f"Bearer {token}"}

transport_url = "http://localhost:8000/mcp/"

transport = StreamableHttpTransport(
    url=transport_url,
    headers=headers
)

client = MCPClient(transport=transport)

This code ensures that requests to the MCP server have an Authorization header that contains the bearer token acquired by get_jwt_token(). The MCP server will validate this token strictly.

The code to connect to the MCP server looks like this:

try:
    logger.info("Connecting to the MCP server...")
    
    # Use the client as an async context manager
    async with client:
        # List available tools on the server
        tools = await client.list_tools()
        logger.info(f"Found {len(tools)} tools on the server")
        
        # Call search tool
        logger.info("Calling search tool...")
        search_result = await client.call_tool("get_documents", {"query": "*"})
        logger.info(f"Search Result: {search_result.structured_content}")
        documents = search_result.structured_content.get("documents", [])
        if documents:
            logger.info("Documents found:")
            for doc in documents:
                name = doc.get("name", "Unnamed Document")
                logger.info(f"  - {name}")
        else:
            logger.info("No documents found.")
    
except Exception as e:
    logger.error(f"Error connecting to MCP server: {e}")

Note that there is no AI agent involved here. We simply call the tool directly. In my case, the document search only lists three of the five documents in the index because I only have access to those three. Next, we will look at the MCP server code to see how this is implemented.

⚠️ Example code of the client is here: https://github.com/gbaeke/mcp-obo/blob/main/mcp_client.py

MCP Server

We will write an MCP server that uses the streamable-http transport on port 8000. The URL to connect to from the client then becomes http://localhost:8000/mcp/; that is the URL used by the MCP client above.
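
The server wires up token validation when it is created. Here is a minimal sketch of what that could look like with FastMCP’s built-in bearer-token support; class and parameter names may differ slightly between FastMCP versions, so see the repo for the working version:

from fastmcp import FastMCP
from fastmcp.server.auth import BearerAuthProvider

TENANT_ID = "<tenant-id>"

# Validate incoming tokens against Entra ID's signing keys, issuer and audience
auth = BearerAuthProvider(
    jwks_uri=f"https://login.microsoftonline.com/{TENANT_ID}/discovery/v2.0/keys",
    issuer=f"https://login.microsoftonline.com/{TENANT_ID}/v2.0",
    audience="https://<api-client-id>",  # the Application ID URI set earlier
)

mcp = FastMCP("FastMCP Auth API", auth=auth)

@mcp.tool()
async def get_documents(query: str = "*") -> dict:
    ...  # tool body shown below

if __name__ == "__main__":
    mcp.run(transport="streamable-http", port=8000)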

The server has one tool: get_documents that takes a query (string) as parameter. By default, the query is set to * which returns all documents. The tool does the following:

  • Obtains the access token with the get_access_token() helper function from FastMCP
  • Exchanges the token for a token with scope https://search.azure.com/.default
  • Creates a SearchClient for Azure AI Search that includes the AI Search endpoint, index name and credential. Note that this credential has nothing to do with the token obtained above; it’s simply a key provided by Azure AI Search to perform searches. The token is used in the actual search request to filter the results.
  • Performs the search, passing in the token via the x_ms_query_source_authorization parameter. You need to use this version of the Azure AI Search Python library: azure-search-documents==11.6.0b12

Here is the code:

# Get the access token from the request context
# (get_access_token / AccessToken come from fastmcp.server.dependencies)
access_token: AccessToken = get_access_token()
original_token = access_token.token

# Exchange the token for an Azure AI Search token via the OBO flow
logger.info("Exchanging token for Azure AI Search access")
search_result = await exchange_token(original_token, scope="https://search.azure.com/.default")
if not search_result["success"]:
    return {"error": "Could not retrieve documents due to token exchange failure."}

logger.info("Search token exchange successful")
search_token = search_result["access_token"]

# The key credential authenticates the search request; the exchanged token
# is passed separately so the service can filter results per user
search_client = SearchClient(
    endpoint="https://srch-geba.search.windows.net",
    index_name="document-permissions-push-idx",
    credential=AzureKeyCredential(os.getenv("AZURE_SEARCH_KEY")),
)
results = search_client.search(
    search_text=query,
    x_ms_query_source_authorization=search_token,
    select="name,oid,group",
    order_by="id asc",
)
documents = [
    {
        "name": result.get("name"),
        "oid": result.get("oid"),
        "group": result.get("group"),
    }
    for result in results
]
return {"documents": documents}

The most important work is done by the exchange_token() function. It obtains an access token for Azure AI Search that contains the oid (object id) of the user.

Here’s that function:

import requests

async def exchange_token(original_token: str, scope: str = "https://graph.microsoft.com/.default") -> dict:
    # TENANT_ID, CLIENT_ID and CLIENT_SECRET are module-level settings
    # (requests.post is blocking; fine for a demo, use an async client otherwise)
    obo_url = f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token"
    
    data = {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "assertion": original_token,
        "scope": scope,
        "requested_token_use": "on_behalf_of"
    }
    
    try:
        response = requests.post(obo_url, data=data)
        
        if response.status_code == 200:
            token_data = response.json()
            return {
                "success": True,
                "access_token": token_data["access_token"],
                "expires_in": token_data.get("expires_in"),
                "token_type": token_data.get("token_type"),
                "scope_used": scope,
                "method": "OBO"
            }
        else:
            return {
                "success": False,
                "error": response.text,
                "status_code": response.status_code,
                "scope_attempted": scope,
                "method": "OBO"
            }
    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "method": "OBO"
        }

Above, the core is the call to obo_url, which presents the original token to obtain a new one. This will only work if the API permissions are correct on the FastMCP Auth API app registration. When the call is successful, we return a dictionary that contains the access token in the access_token key.

Full code of the server: https://github.com/gbaeke/mcp-obo/blob/main/mcp/main.py

You have now seen the entire flow from client login to calling a method (tool) on the MCP server to connecting to Azure AI Search downstream via an on-behalf-of flow.

But wait! How do we create the Azure AI Search index with support for the permission filter? Let’s take a look…

Azure AI Search Configuration

When you want to use permission filtering with the x_ms_query_source_authorization parameter, do the following:

  • Create the index with support for permission filtering
  • Your index needs fields like oid (user object IDs) and group (group object IDs) with the correct permission filter option
  • When you add documents to an index, for instance with the push APIs, you need to populate the oid and group fields with identifiers of users and groups that have access
  • Perform a search with x_ms_query_source_authorization as shown below:

results = search_client.search(
    search_text="*",
    x_ms_query_source_authorization=token_to_use,
    select="name,oid,group",
    order_by="id asc"
)

Above, token_to_use is the access token for Azure AI Search!
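
For reference, here is a hedged sketch of what the relevant part of such an index definition could look like when created through the preview REST API. The property names and API version reflect my reading of the preview docs; the Microsoft notebook linked below is the authoritative version:

import os
import requests

endpoint = "https://<your-search-service>.search.windows.net"
api_version = "2025-05-01-preview"  # preview API version; verify before use

index_definition = {
    "name": "document-permissions-push-idx",
    "permissionFilterOption": "enabled",  # turn on permission filtering
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "sortable": True},
        {"name": "name", "type": "Edm.String", "searchable": True},
        # oid/group hold the user and group object IDs that grant access
        {"name": "oid", "type": "Collection(Edm.String)",
         "permissionFilter": "userIds", "filterable": True},
        {"name": "group", "type": "Collection(Edm.String)",
         "permissionFilter": "groupIds", "filterable": True},
    ],
}

response = requests.put(
    f"{endpoint}/indexes/{index_definition['name']}?api-version={api_version}",
    headers={"api-key": os.getenv("AZURE_SEARCH_KEY"),
             "Content-Type": "application/json"},
    json=index_definition,
)
response.raise_for_status()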

On GitHub, check this notebook from Microsoft to create and populate the index with your own user’s oid. You will need an Azure Subscription with an Azure AI Search instance. The free tier will do. If you use VS Code, use the Jupyter extension to run this notebook.

At the time of this writing, this feature was in preview. Ensure you use the correct version of the AI Search library for Python: azure-search-documents==11.6.0b12.

Wrapping up

I hope this post gives you some ideas on how to build agents that use MCP tools with end-to-end user authentication and authorization. This is just one possible approach. Authorization in the MCP specification has evolved significantly in early 2025 and works somewhat differently.

For most enterprise scenarios where you control both code and configuration (such as with Entra ID), bearer authentication with OBO is often sufficient.

Also consider whether you need MCP at all. If you aren’t sharing tools across multiple agents or projects, a simple API might be enough. For even less overhead, you can embed the tool code directly in your agent and run everything in the same process. Simple and effective.

If you spot any errors or have questions, feel free to reach out!

Building Configurable AI Agents with the OpenAI Agents SDK

In this post, I will demonstrate how to build an AI agent system where agents can collaborate in different ways. The goal is to create agents that can either work independently with their own tools or collaborate with other agents through two distinct patterns: using agents as tools or handing off control to other agents.

In this post, I work directly with OpenAI models. You can also use Azure OpenAI if you want, but there are some caveats. Check this guide on using Azure OpenAI and potentially APIM with the OpenAI Agents SDK for more details.

All code can be found in this repo: https://github.com/gbaeke/agent_config. Not all code is shown in this post so be sure to check the repo.

Agent Factory and Configuration System

The core of this system is an agent factory that creates agents from JSON configurations stored either on the filesystem or in Redis. The factory reads configuration files that define:

  • Agent name and instructions
  • Which AI model to use (e.g., gpt-4o-mini)
  • Available tools from a centralized tool registry
  • Validation against a JSON schema

For example, a weather agent configuration looks like:

{
    "name": "Weather Agent",
    "instructions": "You are a helpful assistant for weather questions...",
    "model": "gpt-4o-mini",
    "tools": ["get_current_weather", "get_current_temperature", "get_seven_day_forecast"]
}

The agent factory validates each configuration against a schema and can load configurations from either JSON files in the configs/ directory or from Redis when USE_REDIS=True (env var). This flexibility allows for dynamic configuration management in a potential production setting. Besides the configuration above, other options could be useful, such as MCP configuration, model settings like temperature, guardrails, and much more.
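
To make this concrete, here is a simplified sketch of what a factory function along these lines might look like; it omits schema validation, the Redis path and the agent-as-tools/handoff parameters shown later:

import json
from pathlib import Path

from agents import Agent
from tools import all_tools  # the central tool registry (tools.py)

def create_agent_from_config(name: str) -> Agent:
    # Load the JSON configuration from the configs/ directory
    config = json.loads(Path(f"configs/{name}.json").read_text())

    return Agent(
        name=config["name"],
        instructions=config["instructions"],
        model=config["model"],
        # Resolve tool names from the config against the registry
        tools=[all_tools[t] for t in config.get("tools", [])],
    )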

⚠️ Note that this is example code to explore ideas around agent configuration, agent factories, agents-as-tools versus handoffs, etc…

Tool Registry

All available tools are maintained in a centralized tools.py file that exports an all_tools dictionary (a minimal sketch follows the list below). This registry includes:

  • Function-based tools decorated with @function_tool
  • External API integrations (like web search): the built-in web search tool from OpenAI is used as an example here
  • Remote service calls: an example tool that uses a calculator agent exposed via an API (FastAPI); this is the same as the agent-as-tool pattern discussed below, but the agent is remote and served as an API.
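
A minimal sketch of such a registry (the weather tool body is a made-up placeholder):

from agents import function_tool

@function_tool
def get_current_weather(city: str) -> str:
    """Return the current weather for a city."""
    # Placeholder implementation; the repo version calls a real weather API
    return f"It is sunny in {city}."

# Central registry the agent factory uses to resolve tool names from configs
all_tools = {
    "get_current_weather": get_current_weather,
}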

In a production environment, tool management would likely be handled differently – for example, through a dedicated tool registry service implementing the Model Context Protocol (MCP). This would allow tools to be dynamically registered, versioned, and accessed across multiple services while maintaining consistent interfaces and behaviors. The registry service could handle authentication, rate limiting, and monitoring of tool usage across all agents in the system.

Agent-as-Tool vs Handoff Patterns

The system supports two distinct collaboration patterns:

Agent-as-Tool

With this pattern, one agent uses another agent as if it were a regular tool. The main agent remains in control of the conversation flow. For example:

agent_as_tools = {
    "weather": {
        "agent": weather_agent,
        "name": "weather", 
        "description": "Get weather information based on the user's question"
    }
}
conversation_agent = create_agent_from_config("conversation", agent_as_tools)

When the conversation agent needs weather information, it calls the weather agent as a tool, gets the result, and continues processing the conversation. The main agent simply passes what it deems necessary to the agent used as a tool and uses that agent’s response to form an output.

The way you describe the tool is important here. It influences what the conversation agent sends to the weather agent as a parameter (the user’s question).
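
Under the hood, the factory presumably maps each of these entries onto the SDK’s as_tool() helper, which wraps an agent as a callable tool:

# Wrap the weather agent as a tool the conversation agent can call
weather_tool = weather_agent.as_tool(
    tool_name="weather",
    tool_description="Get weather information based on the user's question",
)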

Handoff Pattern

With handoffs, control is transferred to another agent entirely. The receiving agent takes over the conversation until it’s complete or hands control back. This is implemented by passing agents to the handoffs parameter:

agent_handoffs = [simulator_agent]
conversation_agent = create_agent_from_config("conversation", {}, agent_handoffs)

The key difference is control: agent-as-tool keeps the original agent in charge, while handoffs transfer complete control to the receiving agent.

To implement the handoff pattern and to allow transfer back to the original agent, support from the UI is needed. In the code, which uses a simple text-based UI, this is done with a current_agent variable that refers to the agent currently in charge, falling back to the base conversation agent when the user types 'exit'. Note that this pattern is quite tricky to implement correctly. Often, the main agent thinks it can do the simulation on its own. And when the user does not type exit but instead asks to go back to the conversation agent, the simulator agent might seem to comply, but in reality you are still in the simulator. This can be mitigated by prompting both agents properly, but do not expect it to be automatic.
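
For example, a hypothetical instructions fragment for the simulator agent’s config (not the repo’s actual wording):

{
    "name": "Simulator Agent",
    "instructions": "You run the simulation. Never claim the simulation has ended on your own. If the user asks to return to the main assistant, tell them to type 'exit'.",
    "model": "gpt-4o-mini",
    "tools": []
}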

A look at the code

If you look at agent_from_config.py (the main script), you will notice that it is very simple. Most of the agent creation logic is in agent_factory.py which creates the agent from a config file or a config stored in Redis.

# Create specialized agents
weather_agent = create_agent_from_config("weather")
news_agent = create_agent_from_config("news")
simulator_agent = create_agent_from_config("simulator")

# Configure agents as tools
agent_as_tools = {
    "weather": {
        "agent": weather_agent,
        "name": "weather",
        "description": "Get weather information based on the user's full question"
    },
    "news": {
        "agent": news_agent,
        "name": "news", 
        "description": "Get news information based on the user's full question"
    }
}

# Configure handoff agents
agent_handoffs = [simulator_agent]

# Create main agent with both patterns
conversation_agent = create_agent_from_config("conversation", agent_as_tools, agent_handoffs)

Above, we create three agents: weather, news (with OpenAI’s built-in web search) and simulator. These agents are used by the conversation agent created at the end. To provide the conversation agent with two agents as tools and one agent handoff, the create_agent_from_config function, which returns a value of type Agent, has two optional parameters:

  • a dictionary with references to agents and their tool descriptions (used by the main agent to know when to call the agent)
  • a list with agents to handoff to

Here, these structures are built in code. This could also be done via the configuration system, but that was not implemented.

To simulate a chat session, the following code is used:

from agents import Runner, TResponseInputItem

async def chat():
    current_agent = conversation_agent
    convo: list[TResponseInputItem] = []

    while True:
        user_input = input("You: ")

        if user_input == "exit":
            if current_agent != conversation_agent:
                current_agent = conversation_agent  # Return to main agent
                continue  # don't send "exit" to the model
            else:
                break

        convo.append({"content": user_input, "role": "user"})
        result = await Runner.run(current_agent, convo)

        convo = result.to_input_list()
        current_agent = result.last_agent  # Track agent changes

We always start with the conversation agent. When the conversation agent decides to do a handoff, the last_agent property of the result of the last run will be the simulator agent. The current agent is then set to that agent so the conversation stays with the simulator agent. Note that the code also implements callbacks to tell you which agent is answering and what tools are called. Those callbacks are defined in agent_factory.py.

Built-in Tracing

The OpenAI Agents SDK includes tracing capabilities that are enabled by default. Every agent interaction, tool call, and handoff is automatically traced and can be viewed in the OpenAI dashboard. This provides visibility into:

  • Which agent handled each part of a conversation
  • What tools were called and when
  • Performance metrics for each interaction
  • The full conversation flow across multiple agents

Tracing can be customized or disabled if needed, but the default implementation provides comprehensive observability out of the box.
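
For example, assuming the SDK’s top-level helper, disabling tracing is a one-liner:

from agents import set_tracing_disabled

# Turn off tracing entirely, e.g. for local experiments
set_tracing_disabled(True)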

In the OpenAI dashboard, the traces provide detailed insights into a conversation’s flow. Track down issues and adjust agent configs, especially instructions, when things go awry.

Conclusion

In this post, we looked at a simple approach to build multi-agent systems using the OpenAI Agents SDK. The combination of configurable agents, centralized tool management, and flexible collaboration patterns creates a foundation for more complex AI workflows. The agent factory pattern allows for easy deployment and management of different agent configurations, while the built-in tracing provides the observability needed for production systems.

However, much more effort is required to implement this in production with more complex agents. As always, keep things as simple as possible and implement the minimum number of agents. You should also ask yourself whether you even need multiple agents, because state management, chat history, tracing, testing, etc. become increasingly complex in a multi-agent world.