AI Interop in 5 minutes - part 1 - Interop Protocols
Simple explanation of MCP, ACP, ANP, A2A, RAG, and more!
Ok, 4 minutes, 55 seconds left! By the time you finish reading this article, you will have a good, high level understanding of the AI interop landscape.
Ready? Let’s go!
What is Interoperability?
Interoperability means one thing: systems being able to work together.
Systems can be humans, tools, AI agents, browsers, apps, and so on.
Working together means being able to communicate, collaborate, schedule, delegate and use each other’s capabilities.
Human ↔ Human Interop
You don’t need to learn this one, since you are already doing it.
Imagine this:
You’ve got WhatsApp and Telegram.
Your friend has Telegram and Signal.
If the two of you want to chat, you have to use Telegram. That’s your interop protocol.
The “modality”? If you message, it’s “text”.
You can text, call, video chat, send images or files, so it can also be multimodal.
Both of you can reach out to each other, so it’s also bi-directional.
Ok, now let’s apply this to AI:
Human ↔ AI Interop
How do we, humans, communicate with AI?
Well, we use chat, we use voice and video, we upload pictures and files. This is called multimodal communication: the AI can understand information from multiple sources or “modalities”, like text, images, audio, and video. Some models only understand a single modality (e.g. text), while others understand several.
Multimodal Interop
What it is: The ability for humans to interact with AI through images, audio, video, gestures, touch, and so on.
Examples: Drawing something and asking AI to interpret; speaking to an AI while pointing at something on screen.
Interfaces: Media protocols (e.g. WebRTC, WebSockets) and image/audio formats.
Emerging standards: the W3C Multimodal Interaction Framework and Web of Things (WoT), as well as some XR/AR interfaces.
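For example, here is what a simple multimodal request can look like with the OpenAI Python SDK (the model name and image URL below are just placeholders; other providers accept similar shapes):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment

# One message, two modalities: text plus an image.
response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this picture?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```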
Unidirectional vs Bi-directional Communication
Directionality is another important aspect:
LLMs are unidirectional, i.e. you can initiate a chat with an LLM, but an LLM cannot wake you up in the middle of the night because it feels like chatting.
Agents, on the other hand, are bi-directional. They can trigger alerts, respond to events, and schedule workflows.
Note: modalities can also be uni- or bi-directional! Some models only respond in one mode, e.g. Midjourney replies with images only. Others, like ChatGPT, can respond with both text and images. Models like Sora can even generate video.
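Here is a toy sketch of the difference (every name in it is made up for illustration, not a real API):

```python
def chat_with_llm(prompt: str) -> str:
    """Uni-directional: the human initiates, the model can only respond."""
    return f"(model reply to: {prompt!r})"

class MonitoringAgent:
    """Bi-directional: the agent can also initiate contact when an event fires."""

    def __init__(self, notify):
        self.notify = notify  # e.g. a function that sends a Slack message or an email

    def on_event(self, event: str) -> None:
        self.notify(f"Agent here, something just happened: {event}")

agent = MonitoringAgent(notify=print)
agent.on_event("nightly build failed")            # the agent reaches out to *you*
print(chat_with_llm("...and you can still ask it things"))
```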
Ok, let’s keep going!
AI ↔ Tooling Interop
This layer is about how AI agents access, query, or act on other systems, such as APIs, file systems, databases, apps on your laptop, SaaS tools or even physical tools like cameras or sensors.
Here are the established AI ↔ Tooling interoperability mechanisms:
1. MCP – Model Context Protocol
Purpose: Provide a standardised, runtime interface for agents to interact with tools and services.
Approach: Each tool gets an MCP server wrapper. (Although a single MCP server can also reach out to multiple tools!)
Uni-directional, read-write: Supports reading from and writing to tools, but only the agent can initiate communication.
Use case: “Fetch latest sales data from Snowflake” “Push a summary to Slack” “Run SQL query X.” “Open the browser and navigate to bbc.co.uk“
Example:
The agent sends a call to the MCP server, which then reaches out to a database to get the response.
{
"method": "query",
"params": { "sql": "SELECT * FROM users LIMIT 10" }
}
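In the real protocol, MCP frames these calls as JSON-RPC 2.0 messages, and the available tools are discovered via tools/list. Here is a slightly fuller sketch of the same call; the tool name run_sql and its arguments are placeholders:

```python
import json

# The same call, framed the way MCP does it (JSON-RPC 2.0).
# "run_sql" is a made-up tool name; a real MCP server advertises its own tools.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_sql",
        "arguments": {"sql": "SELECT * FROM users LIMIT 10"},
    },
}
print(json.dumps(request, indent=2))
```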
2. RAG – Retrieval-Augmented Generation
Purpose: Give an LLM access to external information (usually documents) without fine-tuning.
Approach: Vector store (like FAISS, Weaviate) + retriever model + prompt injection.
Uni-directional and read-only: AI pulls content into its prompt; it can’t modify the source.
Use case: “Answer this question using our company wiki” “Search legal PDFs to write a brief.”
Example:
Agent sends query → top-k docs retrieved → passed into LLM prompt.
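In code, the whole flow is only a few lines. Here is a minimal, self-contained sketch; the embedding, search, and LLM calls are stubs standing in for whatever embedding model, vector DB, and LLM you actually use:

```python
def embed(text: str) -> list[float]:
    return [float(len(text))]                     # stub: a real embedding model goes here

def search(query_vec: list[float], docs: list[str], k: int = 3) -> list[str]:
    return docs[:k]                               # stub: a real vector similarity search goes here

def call_llm(prompt: str) -> str:
    return f"(LLM answer based on a prompt of {len(prompt)} chars)"  # stub

def rag_answer(question: str, docs: list[str]) -> str:
    top_docs = search(embed(question), docs)      # 1. embed the question, retrieve top-k chunks
    context = "\n\n".join(top_docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)                       # 2. inject into the prompt and generate

print(rag_answer("What is our refund policy?", ["Refunds are issued within 30 days.", "Shipping takes 3 days."]))
```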
3. Function Calling & Tool Use (e.g. OpenAI, Claude)
Purpose: Let AI choose from a set of defined tools using structured arguments.
Approach: Agent receives a tool manifest and decides when/how to use each one.
Dynamic: Function outputs are parsed and may trigger more steps.
Use case: “Calculate sum” “Query weather API” “Create event”
Methods:
OpenAI’s function_calling (now also known as “tools”)
Claude’s tool_use JSON schema
LangChain / CrewAI toolchains
Example:
function_call { "function": "getWeather", "args": { "city": "London" }}
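Here is roughly what that round-trip looks like with the OpenAI Python SDK; the get_weather tool itself is made up for the example:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment

# Describe the tool the model is allowed to call (a made-up example tool).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in London?"}],
    tools=tools,
)

# If the model decided to use the tool, the structured call comes back here:
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
```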
What else is out there?
You can also connect tools with AI systems via plugins, API wrappers, middleware, custom runtimes, and so on.
Also, there are some “new” initiatives for standardised AI ↔ Tooling Interop:
(Given how fast AI evolves, by the time you read this, they will be “old“ initiatives!)
OpenActions: A W3C initiative to describe tool capabilities semantically (for both AI and humans).
AutoGraph (Google): Tool capability graphs for agents.
XLang (OpenAI): A speculative, high-level language for describing agent capabilities.
Service Discovery: Combining ANP or A2A with tool registries (e.g. “who can get stock prices?”).
Ok, let’s move on to how AIs chat with each other!
AI ↔ AI Interop
This layer is about agent-to-agent communication. Generally, this includes discovery, requesting a task to be performed, then streaming the response back.
Why do we need this? Because not all agents are the same. One might be tuned for reasoning, another for image recognition. One might be local and fast, another might be remote and smart. One might have access to proprietary data, another might provide publicly available info.
These are the current standards:
1. A2A – Agent-to-Agent Communication
https://github.com/google/A2A
A2A is an open protocol for structured communication between AI agents, developed by Google and supported across multiple frameworks.
How it works:
Agent Cards: Each agent hosts a .well-known/agent.json file describing its capabilities, endpoints, and security settings. This acts like a business card for machines.
Task Request/Response: Agent A sends a structured task request to Agent B. B processes it and returns a result.
Streaming & Multi-turn: If needed, the response can be streamed or evolve into a dialogue (e.g. asking for more input).
Example:
An internal assistant agent sends a dataset to a third-party charting agent to generate a visual. Once the chart is ready, it is streamed back.
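In code, those two steps are just HTTP. Here is a rough sketch; the URL is made up and the task payload is simplified (the real message schema lives in the A2A spec):

```python
import requests

base = "https://charts.example.com"  # made-up charting agent

# 1. Discover the agent via its card
card = requests.get(f"{base}/.well-known/agent.json").json()
print(card.get("name"), card.get("description"))

# 2. Send it a task (illustrative payload shape, not the exact A2A schema)
task = {
    "message": {
        "role": "user",
        "parts": [{"type": "text", "text": "Plot monthly revenue from the attached dataset"}],
    }
}
result = requests.post(f"{base}/tasks", json=task).json()
print(result)
```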
2. ACP – Agent Communication Protocol
https://agentcommunicationprotocol.dev
ACP is IBM’s open standard, with open governance, for agent interoperability. It comes with the BeeAI platform, a reference implementation of the protocol.
ACP has a flexible topology: agents can be composed, chained, or coordinated in any way you like.
How it works:
ACP defines a standard REST API that agents use for:
Discoverability (registry-based, offline or open, via .well-known/agent.yml)
Receiving and responding to tasks
Handling both synchronous and asynchronous jobs
Streaming results when needed
Example:
Three agents (a file-watcher, a summariser, and a notifier) collaborate to monitor a folder, summarise new content, and send updates to the user. No internet required.
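As a rough illustration, handing a task to an ACP agent is a plain REST call. The endpoint path and field names below are simplified; the actual request/response schema is in the ACP docs:

```python
import requests

# Simplified sketch: ask a locally running "summariser" agent to do a job.
payload = {
    "agent": "summariser",
    "input": "Summarise the new files in the watched folder",
}
run = requests.post("http://localhost:8000/runs", json=payload).json()
print(run)
```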
3. ANP – Agent Network Protocol
https://agent-network-protocol.com
ANP is a decentralized protocol for discovering, verifying, and securely communicating with agents across the internet.
How it works:
Decentralized Identifiers (DIDs): Each agent gets a cryptographically verifiable identity.
Semantic Metadata: Agents publish capabilities using semantic graphs (e.g., JSON-LD).
Discovery: Agents can discover each other without a central registry.
Secure Communication: Once discovered, agents negotiate communication protocols (e.g., via A2A).
Example:
A local agent wants to find a "TranslationAgent" for Japanese legal documents. It searches the network, discovers one with strong credentials, verifies its DID, and sends a task via A2A. Trust, discovery, and routing are all handled by ANP.
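To make that concrete, here is the kind of semantic, DID-based self-description an agent might publish. Every value here is made up; only the idea (a verifiable identity plus machine-readable capabilities) reflects ANP:

```python
import json

agent_description = {
    "@context": "https://example.org/ai-agents/v1",   # made-up JSON-LD context
    "@type": "Agent",
    "id": "did:example:translation-agent-123",        # made-up DID
    "name": "TranslationAgent",
    "capabilities": {
        "languagePair": ["ja", "en"],
        "domain": "legal documents",
    },
    "endpoint": "https://translate.example.com/a2a",  # where to send tasks (e.g. via A2A)
}
print(json.dumps(agent_description, indent=2))
```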
Desktop ↔ Desktop Interop
How do apps on the same desktop, or in the same browser, communicate?
While these standards were not designed with LLMs in mind, it’s important to understand them, as they complement the AI interop standards.
1. Browser Interop
Modern browsers implement several web standards for inter-app communication, such as window.postMessage, the BroadcastChannel API, and SharedWorkers.
2. Desktop Interop
Desktop deployment and integration platforms like Connectifi, Interop.io and Here.io (formerly OpenFin) also use well-defined interop standards.
The most notable one in Finance is:
FDC3 – Financial Desktop Connectivity and Collaboration Consortium
FDC3 is an open standard that lets applications on the financial desktop interoperate and exchange data with each other.
Classic and Academic Agent Protocols
Interoperability in multi-agent systems isn’t a new thing – researchers have worked on agent communication languages for decades. In the 1990s and early 2000s, several foundational protocols were created in academia which laid the groundwork for today’s developments. The most notable are KQML, FIPA-ACL, and ACTP.
While these were also not designed with modern LLMs in mind, they introduced key ideas about how autonomous agents could share knowledge and requests.
ACL – Agent Communication Language (FIPA-ACL)
KQML – Knowledge Query and Manipulation Language
ACTP – Agent Communication Transfer Protocol
They were heavily influenced by Speech Act Theory, where each message has a performative (like inform, request, confirm, or query) indicating the intent of the communication.
This concept of using predefined verbs to communicate intent is common in tech:
GraphQL: query, mutation, subscription
REST: GET, POST, PUT, PATCH, DELETE
Databases: CRUD – Create, Read, Update, and Delete
FDC3: ViewQuote, ViewInstrument, StartCall, …
…and so on.
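Stripped to its essence, a performative-style agent message is just intent plus addressing plus content. A toy, Python-flavoured rendering (the field names are in the spirit of KQML / FIPA-ACL, not their exact syntax):

```python
message = {
    "performative": "request",        # the intent: inform, request, confirm, query, ...
    "sender": "scheduler-agent",
    "receiver": "calendar-agent",
    "content": "book meeting-room-3 for 14:00",
    "language": "plain-text",
}
```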
So much more to cover!
The world of interoperability is large and we have only just scratched the surface. Most of these standards are still emerging and the real complexity starts at the intersection of these protocols.
But that’s for the next post.
Update! Part 2 and Part 3 are out, links are below!
Thank you for reading!
Ok, this may have been a bit more than 5 minutes!
Thank you for reading it, I would love to hear your thoughts in the comments!
🩵