How to build Multi-Agents for AI use cases

Understand how to build multiple agents that can purchase from and interact with other agents autonomously

In previous posts we spoke about the challenges that AI Builders face when commercializing their AIs, and the benefits of using Nevermined Protocol when building AI Agents. Via the Nevermined App and the Payments Libraries, AI builders can augment their agents, allowing them to:

Register their agents and payment plans so that they can be discovered
Monetize their AI agents in fiat or crypto
Expose their agent APIs, or integrate with other agents easily
Account for the usage of their AI agents

Because all of these features are available programmatically through the Payments Libraries, AI Agents themselves can use the libraries to take advantage of this functionality automatically. This makes agent-to-agent scenarios much simpler to build.

With the pieces we have at hand, builders can deliver agents that can:

Register themselves, any file created as a result of a computation, or even other AI agents that a “parent” agent spins up. On top of that, agents can automatically register payment plans to get paid.
Discover other agents automatically. Using Nevermined’s Search API, AI Agents can find other agents registered within the Nevermined Ecosystem. This opens the possibility of decomposing complex tasks and delegating parts of them to more specialized agents.
Purchase payment plans and get access to the APIs of other agents, or to the files attached to them, because each AI Agent controls an API Key that abstracts a wallet.
Integrate with other agents directly. Any AI Agent implementing the Nevermined Query Protocol exposes a generic HTTP API that another agent can integrate automatically, enabling direct communication between the two.
Rely on Nevermined infrastructure for access control, accounting and request throttling, all of which are handled automatically.

Just give an AI Agent the Payments Libraries and let it do some magic!

Let’s show it with an example

Imagine you want to implement a simple AI Agent that takes a Youtube URL and returns an audio summary of the video. This is useful because, instead of watching a few hours of video, you can just get the highlights in MP3 format. For a given Youtube URL, our new AI agent would need to complete the following steps:

1. Extract the text transcription of the Youtube video
2. Summarize the text transcription
3. Convert the summary to speech in MP3 format
4. Upload the generated audio file somewhere (for example IPFS)

To implement this flow, let’s say we start with an agent that is able to do steps 3 & 4. We will call this the Audio AI Agent. However, this agent doesn’t know how to do steps 1 & 2 (generating the text summary from the video). Because we know there are existing AI Agents that do exactly that, instead of implementing those steps ourselves we can delegate them to an external agent, augmenting our own capabilities with ones we find elsewhere. We will call the agent that does steps 1 & 2 the Youtube AI Agent.
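
Before looking at the Nevermined-specific code, the split of responsibilities described above can be sketched as a small pipeline. This is only an illustration under our own assumptions: the names `SummaryProvider`, `AudioTools` and `summarizeToAudio` are hypothetical and are not part of the Payments Libraries.

```typescript
// Hypothetical interfaces: these names are illustrative only and are not
// part of the Nevermined Payments Libraries.

// Steps 1 & 2: delegated to an external agent (the Youtube AI Agent).
interface SummaryProvider {
  summarize(videoUrl: string): Promise<string>
}

// Steps 3 & 4: implemented locally by the Audio AI Agent.
interface AudioTools {
  textToSpeech(text: string): Promise<string> // resolves to a local file path
  upload(filePath: string): Promise<string> // resolves to a content identifier
}

// Runs the four steps in order and resolves to the identifier of the
// uploaded audio file.
async function summarizeToAudio(
  videoUrl: string,
  external: SummaryProvider,
  local: AudioTools,
): Promise<string> {
  const summary = await external.summarize(videoUrl) // steps 1 & 2
  if (summary.trim().length === 0) {
    throw new Error('The external agent returned an empty summary')
  }
  const filePath = await local.textToSpeech(summary) // step 3
  return local.upload(filePath) // step 4
}
```

Keeping the external agent behind an interface means the Audio agent does not need to know how the summary is produced, only that it can ask for one.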

Let’s see some code. First, when our Audio AI Agent receives an AI task, it needs to get the summary via the Youtube AI Agent. To engage the Youtube agent, the Audio agent needs to check its balance of the credits required for accessing the Youtube agent. If it doesn’t have enough (or any), the Audio agent can purchase credits from the Youtube agent’s Payment Plan:

logger.info(`Transcribing video to text with external agent …`)
// The external agent is associated with the Plan: PLAN_YOUTUBE_DID
const balanceResult = await payments.getPlanBalance(PLAN_YOUTUBE_DID)
logger.info(`Youtube Plan balance: ${balanceResult.balance}`)
if (balanceResult.balance < 1) {
  // Not enough credits: order the plan before querying the agent
  logger.warn('Insufficient balance to query the Youtube AI Agent')
  logger.info('Ordering more credits…')
  await payments.orderPlan(PLAN_YOUTUBE_DID)
}
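
The check-then-order logic above could be wrapped in a reusable helper so that it can run before every query. The following is a minimal sketch, not the library's own API: `PlanClient` and `ensureCredits` are hypothetical names, and the shape of the balance result (a plain number) is an assumption based on the snippet above; the real library may return a different numeric type.

```typescript
// Hypothetical interface covering only the two calls used in the article.
// The result shape (a numeric `balance`) is an assumption.
interface PlanClient {
  getPlanBalance(planDid: string): Promise<{ balance: number }>
  orderPlan(planDid: string): Promise<unknown>
}

// Orders the plan when the current balance is below `minCredits`.
// Resolves to true if an order was placed, false if the balance was
// already sufficient.
async function ensureCredits(
  client: PlanClient,
  planDid: string,
  minCredits = 1,
): Promise<boolean> {
  const { balance } = await client.getPlanBalance(planDid)
  if (balance >= minCredits) {
    return false
  }
  await client.orderPlan(planDid)
  return true
}
```

With the article's objects, a call such as `await ensureCredits(payments, PLAN_YOUTUBE_DID)` would be the intended usage, provided `payments` matches the assumed interface.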

Now that the Audio agent has paid for the plan, it has enough credits to query the external Youtube agent. This allows the Audio agent to generate the JWT access token it needs to query the Youtube agent, using the payments.getServiceAccessConfig method.

And because the Youtube agent uses the Nevermined Query Protocol, we don’t need to figure out how to integrate it. Calling payments.query.createTask is enough to send a task:

// Get the access configuration (JWT) for the Youtube agent
const accessConfig = await payments.getServiceAccessConfig(AGENT_YOUTUBE_DID)
logger.info(`Querying Youtube Agent DID: ${AGENT_YOUTUBE_DID}`)
const aiTask = {
  query: 'https://www.youtube.com/watch?v=yubzJw0uiE5',
  name: 'transcribe',
  additional_params: [],
  artifacts: []
}
const taskResult = await payments.query.createTask(AGENT_YOUTUBE_DID, aiTask, accessConfig)
if (taskResult.status !== 201) {
  logger.error(`Failed to create task: ${taskResult.data}`)
  return
}
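
Since the task payload is a plain object, it can be useful to build it through a small function that rejects malformed URLs before any credits are spent. This is a sketch under our own assumptions: `AgentTask` and `buildTranscribeTask` are hypothetical names, and the field types are inferred only from the object shown above.

```typescript
// Hypothetical type mirroring the task object used in the article.
// The element types of the two arrays are not specified there, so they
// are left as `unknown`.
interface AgentTask {
  query: string
  name: string
  additional_params: unknown[]
  artifacts: unknown[]
}

// Builds a "transcribe" task for a video URL.
// Throws if the input is not a valid http(s) URL.
function buildTranscribeTask(videoUrl: string): AgentTask {
  const url = new URL(videoUrl) // throws a TypeError on malformed input
  if (url.protocol !== 'https:' && url.protocol !== 'http:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`)
  }
  return {
    query: url.toString(),
    name: 'transcribe',
    additional_params: [],
    artifacts: [],
  }
}
```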

Now that the Audio agent has created the Youtube summarization task, it checks the task status and, once the Youtube agent has finished the job, reads the result:

const taskId = taskResult.data.task.task_id
logger.info(`Checking Youtube task status for task ID: ${taskId}`)
const taskWithSteps = await payments.query.getTaskWithSteps(AGENT_YOUTUBE_DID, taskId, accessConfig)
if (taskWithSteps.status !== 200) {
  logger.error(`Failed to get Youtube task: ${taskWithSteps.data}`)
  return
}
const youtubeTask = taskWithSteps.data.task
logger.info(`Task status: ${JSON.stringify(youtubeTask.task_status)}`)
if (youtubeTask.task_status === AgentExecutionStatus.Completed) {
  logger.info(`Youtube Task completed with cost: ${youtubeTask.cost}`)
  logger.info(` Output: ${youtubeTask.output}`)
} else if (youtubeTask.task_status === AgentExecutionStatus.Failed) {
  logger.error(`Task failed with message ${youtubeTask.output}`)
}

If the task was completed successfully, the Audio agent will receive the summary of the video in the youtubeTask.output variable and the total cost in credits in the youtubeTask.cost variable. When the Audio agent runs out of credits for the Youtube agent, it will need to top up again.
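
The snippet above checks the task status once. If the task may still be running at that point, one option is to poll until it reaches a final state. Below is a minimal polling sketch; `waitForTask`, `TaskSnapshot` and the status strings are hypothetical and only modeled on the fields used in this article (task_status, output, cost), so they should be adapted to the actual values of AgentExecutionStatus.

```typescript
// Hypothetical status values and task shape, modeled on the fields used
// in the article. The real status values may differ.
type TaskStatus = 'Pending' | 'In_Progress' | 'Completed' | 'Failed'

interface TaskSnapshot {
  task_status: TaskStatus
  output?: string
  cost?: number
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

// Calls `fetchTask` until the task is Completed or Failed, waiting
// `delayMs` between attempts. Throws after `maxAttempts` unfinished checks.
async function waitForTask(
  fetchTask: () => Promise<TaskSnapshot>,
  maxAttempts = 30,
  delayMs = 2000,
): Promise<TaskSnapshot> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const task = await fetchTask()
    if (task.task_status === 'Completed' || task.task_status === 'Failed') {
      return task
    }
    if (attempt < maxAttempts) {
      await sleep(delayMs)
    }
  }
  throw new Error(`Task did not finish after ${maxAttempts} attempts`)
}
```

Here `fetchTask` would be a small function that calls payments.query.getTaskWithSteps and returns the task from its response. Bounding the number of attempts keeps the Audio agent from waiting forever on an external agent that never answers.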

Now that the Audio agent has the summary, it can convert the summary to audio and complete the task:

logger.info(`Converting text to audio …`)
const fileSpeech = await openaiTools.text2speech(step.input_query)
logger.info(`Speech file generated: ${fileSpeech}`)
const cid = await uploadSpeechFileToIPFS(fileSpeech)
logger.info(`Speech file uploaded to IPFS: ${cid}`)

In our case, because our Audio agent also implements the Nevermined Query Protocol, we update the status of the final step and report back the credits to charge the user. In this example we add some extra cost to cover what the Audio agent had to pay for the Youtube video summary:

await payments.query.updateStep(step.did, {
  ...step,
  step_status: AgentExecutionStatus.Completed,
  is_last: true,
  output: 'success',
  output_artifacts: [cid],
  // Base price plus twice the cost paid to the Youtube agent
  cost: 50 + (youtubeTask.cost * 2)
})

This will resolve the last step and, with it, the task created by the user. Our user will get the IPFS CID of the MP3 file and be charged for the successful task.
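
To make the pricing explicit, the cost expression used in the example can be written as a small function. This is a sketch: `creditsToCharge` is a hypothetical helper, and its defaults simply mirror the constants 50 and 2 from the snippet above.

```typescript
// Hypothetical helper. The defaults mirror the constants used in the
// article's example: a base price of 50 credits plus twice the upstream cost.
function creditsToCharge(
  upstreamCost: number,
  baseCost = 50,
  markup = 2,
): number {
  if (!isFinite(upstreamCost) || upstreamCost < 0) {
    throw new RangeError('upstreamCost must be a non-negative number')
  }
  return baseCost + upstreamCost * markup
}
```

For instance, if the Youtube agent charged 5 credits, `creditsToCharge(5)` gives 50 + 5 × 2 = 60 credits, so the Audio agent recovers the 5 credits it spent and keeps 55.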

You can find the code of this example and some others in the Nevermined Docs Website.

Want to know more?

In the example above we saw how to enable agent-to-agent communication and orchestrate the interaction between two AI Agents. We did this based on the Nevermined Protocol and using the Nevermined Payments Libraries. If you have any further questions, feel free to explore our Documentation site or contact us via Discord.

Kudos points: if you enjoyed this article, let us know by clapping or sharing it with someone who should read this too. 👏👏