Create Persistent MCP Servers on Amazon Bedrock AgentCore with Strands Agents Integration

Transforming AI Agents: Enabling Seamless Long-Running Task Management

Introduction to AI’s Evolution in Task Handling

Common Approaches to Handling Long-Running Tasks

  • Context Messaging
  • Async Task Management

Context Messaging: Keeping Connections Alive

  • Implementation Overview
  • When to Use Context Messaging
  • Limitations of Context Messaging

Async Task Management: The Fire-and-Forget Model

  • Implementation Overview
  • Limitations of Async Task Management

Moving Toward a Robust Solution

  • Integrating External Persistence

Implementation with Amazon Bedrock AgentCore and Strands Agents

  • MCP Server Implementation
  • Strands Agents Integration

Conclusion: Building Reliable AI Agents for Complex Tasks

About the Authors

Introduction

Artificial Intelligence (AI) agents are rapidly evolving from simple chat interfaces into sophisticated autonomous workers capable of handling complex, time-intensive tasks across various sectors. As organizations increasingly deploy AI agents to train machine learning (ML) models, process large datasets, and run intricate simulations, a new standard for agent-server integration—the Model Context Protocol (MCP)—has emerged. However, a significant challenge persists: many of these operations can take minutes or hours to complete, far exceeding conventional session timeframes.

Imagine your AI agent kicking off a multi-hour data processing job; you close your laptop, and when you return days later the completed results are waiting. Making that interaction seamless requires managing task state across sessions. By using Amazon Bedrock AgentCore and Strands Agents for persistent state management, organizations can enable reliable task execution in production environments. How can we achieve this?

Achieving Persistent Task Execution: An Overview

This blog post outlines a comprehensive approach to ensure seamless, cross-session task execution. We will:

  1. Introduce a context messaging strategy that maintains ongoing communication between servers and clients during extended operations.
  2. Develop an asynchronous task management framework for AI agents to initiate long-running processes without blocking other operations.
  3. Showcase how to combine these strategies with Amazon Bedrock AgentCore and Strands Agents for robust, production-ready AI agents.

Common Approaches for Handling Long-Running Tasks

When designing MCP servers for long-running tasks, a fundamental architectural decision arises: should the server maintain an active connection with real-time updates, or should it decouple task execution from the initial request? This decision leads to two distinct approaches: context messaging and asynchronous task management.

Using Context Messaging

The context messaging approach keeps continuous communication between the MCP server and client during task execution. It utilizes MCP’s built-in context object to send periodic updates to the client. This method excels for tasks expected to complete within 10–15 minutes, providing several advantages:

  • Straightforward implementation
  • No additional polling logic needed
  • Minimal overhead
  • Simple client integration

Using Asynchronous Task Management

In contrast, the asynchronous task management approach separates task initiation from execution and result retrieval. When invoked, the MCP tool immediately returns a task ID and executes the task in the background. This model is ideal for enterprise scenarios where tasks may run for hours and users need the flexibility to disconnect and reconnect. The benefits include:

  • True fire-and-forget operation
  • Support for long-running tasks (hours)
  • Data loss prevention via persistent storage
  • Resilience against network interruptions

Implementing Context Messaging

Context Messaging serves as a solution for moderately long operations, maintaining active connections. For instance, if a data scientist uses an MCP server to train a complex ML model that takes 10–15 minutes, a proper strategy must be in place to ensure the connection doesn’t drop due to time limits. Here’s the workflow:

from mcp.server.fastmcp import Context, FastMCP
import asyncio

mcp = FastMCP(host="0.0.0.0", stateless_http=True)

@mcp.tool()
async def model_training(model_name: str, epochs: int, ctx: Context) -> str:
    for i in range(epochs):
        progress = (i + 1) / epochs
        await asyncio.sleep(5)  # stand-in for one epoch of real training work
        # Periodic progress notifications keep the client connection alive
        await ctx.report_progress(progress=progress, total=1.0,
                                  message=f"Step {i + 1}/{epochs}")

    return f"{model_name} training completed."

if __name__ == "__main__":
    mcp.run(transport="streamable-http")

In this code sample, the Context object enables progress updates during model training, effectively keeping the connection alive.
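To see the progress-reporting behavior in isolation, the tool body can be exercised with a small stub that mimics the Context interface. The stub class and the shortened sleep below are illustrative only; in the real server, FastMCP supplies the Context object:

```python
import asyncio

# Hypothetical stand-in for MCP's Context: records progress updates locally
# so the tool logic can be exercised without a running MCP server.
class StubContext:
    def __init__(self):
        self.updates = []

    async def report_progress(self, progress, total, message):
        self.updates.append((progress, total, message))

async def model_training(model_name, epochs, ctx):
    for i in range(epochs):
        progress = (i + 1) / epochs
        await asyncio.sleep(0.01)  # shortened sleep for the local run
        await ctx.report_progress(progress=progress, total=1.0,
                                  message=f"Step {i + 1}/{epochs}")
    return f"{model_name} training completed."

ctx = StubContext()
result = asyncio.run(model_training("demo-model", 3, ctx))
print(result)            # demo-model training completed.
print(len(ctx.updates))  # 3
```

Running this confirms that one progress notification is emitted per epoch, with the final update reporting full completion.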

Limitations of Context Messaging

While context messaging has its benefits, it comes with limitations, including:

  • Continuous connection required
  • Resource consumption for open connections
  • Vulnerability to network instability

For truly long-running operations, consider transitioning to asynchronous task management.

Implementing Asynchronous Task Management

The asynchronous task management pattern enables a "fire-and-forget" model, where tasks are initiated, processed in the background, and results can be checked later. The workflow includes:

  1. Task initiation: Client requests a task and receives a task ID.
  2. Background processing: Server executes the task without requiring an active client connection.
  3. Status checking: Clients can reconnect and check progress using the task ID.
  4. Result retrieval: Results can be fetched whenever needed.

from mcp.server.fastmcp import FastMCP
import asyncio
import uuid

mcp = FastMCP(host="0.0.0.0", stateless_http=True)
tasks = {}  # in-memory registry: task_id -> status record

async def _execute_model_training(task_id: str, model_name: str, epochs: int):
    for i in range(epochs):
        tasks[task_id]["progress"] = (i + 1) / epochs
        await asyncio.sleep(2)  # stand-in for one epoch of real work
    tasks[task_id]["status"] = "completed"
    tasks[task_id]["result"] = f"{model_name} training completed."

@mcp.tool()
async def model_training(model_name: str, epochs: int = 10) -> str:
    # Declared async so asyncio.create_task runs on the server's event loop
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"status": "started", "progress": 0.0}
    asyncio.create_task(_execute_model_training(task_id, model_name, epochs))
    return f"Model training initiated with task ID: {task_id}."

@mcp.tool()
async def check_task_status(task_id: str) -> dict:
    return tasks.get(task_id, {"error": "Task not found"})

if __name__ == "__main__":
    mcp.run(transport="streamable-http")

The tasks are stored in-memory, allowing clients to check task status independently.
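The fire-and-forget flow can be exercised locally without the MCP layer. This sketch reuses the same in-memory registry pattern; the names and short sleep durations are illustrative:

```python
import asyncio
import uuid

tasks = {}  # task_id -> status record, as in the server above

async def _run_job(task_id: str, epochs: int):
    for i in range(epochs):
        tasks[task_id]["progress"] = (i + 1) / epochs
        await asyncio.sleep(0.01)
    tasks[task_id]["status"] = "completed"
    tasks[task_id]["result"] = "training completed."

async def main():
    # 1. Task initiation: the caller gets a task ID back immediately
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"status": "started", "progress": 0.0}
    job = asyncio.create_task(_run_job(task_id, epochs=3))

    # 3. Status checking: the record is readable while the job runs
    print(tasks[task_id]["status"])  # started
    await job  # wait here only so the demo can show completion

    # 4. Result retrieval: fetched from the registry whenever needed
    print(tasks[task_id]["status"])  # completed
    print(tasks[task_id]["result"])  # training completed.

asyncio.run(main())
```

Note that initiation returns before the job finishes; the status record is the only coupling between the caller and the background work.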

Limitations and Moving Toward Solutions

However, in-memory task management is fragile. If the server restarts, all task information is lost. Therefore, integrating with external persistent storage—like Amazon Bedrock AgentCore Memory—ensures data is not lost due to server issues.
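The principle can be sketched with a simple file-backed store. This is an illustration only; in the actual architecture, Amazon Bedrock AgentCore Memory plays the role of the external persistence layer:

```python
import json
from pathlib import Path

# Illustrative stand-in for an external persistence layer: task records are
# written to disk, so they survive a process restart.
class FileTaskStore:
    def __init__(self, path: str = "demo_tasks.json"):
        self.path = Path(path)

    def _load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

    def save(self, task_id: str, record: dict) -> None:
        data = self._load()
        data[task_id] = record
        self.path.write_text(json.dumps(data))

    def get(self, task_id: str):
        return self._load().get(task_id)

store = FileTaskStore("demo_tasks.json")
store.save("task-123", {"status": "completed", "result": "model trained"})

# A "restarted server" builds a fresh store object, yet the record survives
restarted = FileTaskStore("demo_tasks.json")
print(restarted.get("task-123")["status"])  # completed
```

Unlike the in-memory dictionary, the record here outlives the object that created it, which is exactly the property a long-running agent needs.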

Amazon Bedrock AgentCore and Strands Agents Implementation

Persistent State Management

By integrating Amazon Bedrock AgentCore with Strands Agents, we can manage persistent states effectively. Here’s how the MCP server uses AgentCore Memory:

from bedrock_agentcore.memory import MemoryClient
import asyncio

agentcore_memory_client = MemoryClient()

async def _execute_model_training(model_name: str, epochs: int, memory_id: str):
    for i in range(epochs):
        await asyncio.sleep(2)  # stand-in for training work
    # Persist the outcome so it survives disconnections and server restarts
    response = agentcore_memory_client.create_event(memory_id=memory_id, ...)

This approach allows users to retrieve task results even after a disconnection by storing task outcomes directly to AgentCore Memory.

Workflow with Strands Agents

Integrating with Strands Agents enhances conversational context management. Users provide session identifiers for each interaction, facilitating a continuous experience even after disconnections.
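The session-identifier pattern can be illustrated with a minimal store. This is a sketch of the idea only; in the actual integration, Strands Agents and AgentCore Memory manage this conversational state:

```python
import uuid

# Hypothetical session store: session_id -> conversation history
sessions = {}

def handle_message(session_id, user_text):
    # Reuse the caller's session ID if provided, else start a new session
    if session_id is None or session_id not in sessions:
        session_id = str(uuid.uuid4())
        sessions[session_id] = []
    sessions[session_id].append({"role": "user", "content": user_text})
    return session_id

# The first interaction creates a session; a later "reconnect" that presents
# the same ID resumes the accumulated conversation context.
sid = handle_message(None, "Start training my model")
handle_message(sid, "What is the status?")
print(len(sessions[sid]))  # 2
```

Because the session ID, not the connection, keys the conversation history, a user who disconnects mid-task can resume exactly where they left off.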

Conclusion

In this post, we explored practical approaches for AI agents to manage long-running tasks effectively. By leveraging context messaging and asynchronous task management combined with persistent state management, organizations can build reliable AI agents capable of performing complex tasks without losing data or frustrating users.

We encourage you to try implementing these strategies in your own AI projects. Think about the enhancements they could bring to your AI assistants and how they could transform user experiences.

To further enhance your understanding, check out the official Amazon Bedrock AgentCore documentation and explore the accompanying sample code.


About the Authors

Haochen Xie, Flora Wang, Yuan Tian, and Hari Prasanna Das are experts at the AWS Generative AI Innovation Center, focusing on making generative AI solutions robust and user-friendly across various industries.
