Bridging the Gap: Creating Custom Model Parsers for Strands Agents on Amazon SageMaker
Organizations are increasingly harnessing the power of custom large language models (LLMs) hosted on Amazon SageMaker AI real-time endpoints. By leveraging preferred serving frameworks like SGLang, vLLM, or TorchServe, they’re optimizing costs and ensuring greater control over their deployments. However, this flexibility brings a notable challenge: response format compatibility with Strands agents.
The Challenge
While many custom serving frameworks return responses in OpenAI-compatible formats, Strands agents expect responses that align with the Bedrock Messages API. This misalignment causes integration issues even though both systems work correctly on their own. Because SageMaker lets you bring virtually any serving stack, different models and servers can introduce their own prompt and response formats, many of which do not conform to a standard API.
Bridging the Gap
The solution to this challenge lies in crafting custom model parsers. By extending the SageMakerAIModel class, organizations can translate their model server's responses into the format Strands agents expect. This approach lets them use their preferred serving frameworks while remaining compatible with the Strands Agents SDK.
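The overall shape of such a provider is sketched below. This is a minimal outline only: it assumes the Strands Agents SDK exposes SageMakerAIModel under strands.models.sagemaker (the import path may differ in your SDK version), and the method body is filled in later in this post.
# Minimal sketch of a custom provider; the import path for SageMakerAIModel
# is an assumption and may vary by Strands SDK version.
from typing import Any, Dict, List, Optional
from strands.models.sagemaker import SageMakerAIModel

class LlamaModelProvider(SageMakerAIModel):
    """Translates OpenAI-style responses from an SGLang endpoint into the
    event format Strands agents expect."""

    def stream(self, messages: List[Dict[str, Any]], tool_specs: list, system_prompt: Optional[str], **kwargs):
        # Convert Strands messages into the model server's request format,
        # invoke the SageMaker endpoint, and yield parsed events (see Step 5).
        ...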
Implementation Overview
This blog will guide you through the process of building custom model parsers for Strands agents while deploying Llama 3.1 with SGLang on SageMaker using the awslabs/ml-container-creator tool.
Implementation Layers
Our implementation consists of three primary layers:
- Model Deployment Layer: Serving Llama 3.1 with SGLang to return OpenAI-compatible responses.
- Parser Layer: Creating a custom `LlamaModelProvider` class that extends `SageMakerAIModel` to handle Llama 3.1's response format.
- Agent Layer: Developing a Strands agent that uses the custom provider for conversational AI, parsing the model's responses.
Step 1: Install ml-container-creator
We’ll begin by installing the necessary tools to create the serving container for our model.
# Install Yeoman globally
npm install -g yo
# Clone and install ml-container-creator
git clone https://github.com/awslabs/ml-container-creator
cd ml-container-creator
npm install && npm link
# Verify installation
yo --generators # Should show ml-container-creator
Step 2: Generate Deployment Project
After the installation, we can generate a deployment project featuring our selected model and serving framework.
# Run the generator
yo ml-container-creator
# Configuration options:
# - Framework: transformers
# - Model Server: sglang
# - Model: meta-llama/Llama-3.1-8B-Instruct
# - Deploy Target: codebuild
# - Instance Type: ml.g6.12xlarge (GPU)
# - Region: us-east-1
This will create a structured project with necessary components, such as the Dockerfile, build configuration, and deployment scripts.
Step 3: Build and Deploy
Now, we can build and deploy the created container to SageMaker.
cd llama-31-deployment
# Build container with CodeBuild
./deploy/submit_build.sh
# Deploy to SageMaker
./deploy/deploy.sh arn:aws:iam::ACCOUNT:role/SageMakerExecutionRole
This process builds the Docker image, pushes it to Amazon Elastic Container Registry (ECR), and finally creates a real-time endpoint on SageMaker.
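Before wiring up the agent, it can help to confirm the endpoint is in service and responds to a raw request. The snippet below is an illustrative check using boto3; it assumes the deployment created an endpoint named llama-31-deployment-endpoint (the name used later in this post), so adjust it to your setup.
import json
import boto3

# Confirm the endpoint reached the InService state (endpoint name is an
# assumption; use whatever name your deployment scripts created).
sm = boto3.client("sagemaker", region_name="us-east-1")
status = sm.describe_endpoint(EndpointName="llama-31-deployment-endpoint")["EndpointStatus"]
print(f"Endpoint status: {status}")

# Send a minimal OpenAI-style chat request to sanity-check the SGLang server.
runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")
response = runtime.invoke_endpoint(
    EndpointName="llama-31-deployment-endpoint",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps({
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "max_tokens": 64,
    }),
)
print(json.loads(response["Body"].read()))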
Step 4: Understanding the Response Format
Llama 3.1 returns responses in an OpenAI-compatible format, while Strands requires adherence to the Bedrock Messages API format. Here’s an example of Llama’s response:
{
"id": "cmpl-abc123",
"object": "chat.completion",
"created": 1704067200,
"model": "meta-llama/Llama-3.1-8B-Instruct",
"choices": [{
"index": 0,
"message": {"role": "assistant", "content": "I'm doing well, thank you for asking!"},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 23,
"completion_tokens": 12,
"total_tokens": 35
}
}
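For comparison, the same reply expressed in the Bedrock Messages API shape that Strands works with looks roughly like this (illustrative only; field names follow the Bedrock Converse response format):
{
  "output": {
    "message": {
      "role": "assistant",
      "content": [{"text": "I'm doing well, thank you for asking!"}]
    }
  },
  "stopReason": "end_turn",
  "usage": {
    "inputTokens": 23,
    "outputTokens": 12,
    "totalTokens": 35
  }
}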
With the difference in formats established, we need to implement a custom model parser to ensure smooth interaction.
Step 5: Implementing a Custom Model Parser
The following is a simplified version of how to create a stream method in the custom model parser:
def stream(self, messages: List[Dict[str, Any]], tool_specs: list, system_prompt: Optional[str], **kwargs):
    # Build the OpenAI-style message list expected by the SGLang server
    payload_messages = []
    if system_prompt:
        payload_messages.append({"role": "system", "content": system_prompt})

    # Flatten Strands messages (Bedrock-style content blocks) into plain text
    for msg in messages:
        payload_messages.append({
            "role": msg.get("role", "user"),
            "content": msg["content"][0]["text"],
        })

    payload = {
        "messages": payload_messages,
        "max_tokens": kwargs.get("max_tokens", self.max_tokens),
        "temperature": kwargs.get("temperature", self.temperature),
        "stream": True,
    }

    try:
        # Invoke the endpoint and stream the response back
        response = self.runtime_client.invoke_endpoint_with_response_stream(
            EndpointName=self.endpoint_name,
            ContentType="application/json",
            Accept="application/json",
            Body=json.dumps(payload),
        )

        # Process the streaming response chunk by chunk
        for event in response["Body"]:
            chunk = event["PayloadPart"]["Bytes"].decode("utf-8")
            # Extract and yield data...
    except Exception as e:
        # Surface endpoint failures to the agent as an error event
        yield {
            "type": "error",
            "error": {
                "message": f"Endpoint invocation failed: {str(e)}",
                "type": "EndpointInvocationError",
            },
        }
This stream method lets the Strands agent consume the model's streamed output and parse it into the format it expects.
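The elided "extract and yield" step depends on exactly what your model server emits. SGLang's OpenAI-compatible streaming typically sends server-sent-event lines of the form data: {...} with text deltas under choices[0].delta.content, terminated by data: [DONE]. The helper below is a sketch under that assumption (the name _extract_text_chunks is hypothetical, and it reuses the module's json import); the text pieces it returns still need to be wrapped into the event objects your Strands version expects.
def _extract_text_chunks(self, chunk: str) -> list:
    """Pull text deltas out of one OpenAI-style SSE chunk (sketch; assumes
    lines look like 'data: {...}' with deltas at choices[0].delta.content)."""
    texts = []
    for line in chunk.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        try:
            parsed = json.loads(data)
        except json.JSONDecodeError:
            continue  # partial JSON can arrive split across payload parts
        delta = parsed.get("choices", [{}])[0].get("delta", {})
        if delta.get("content"):
            texts.append(delta["content"])
    return texts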
Step 6: Initialize and Test Your Agent
Once the custom parser is implemented, initializing a Strands agent becomes straightforward:
from strands.agent import Agent
# Initialize custom provider
provider = LlamaModelProvider(
endpoint_name="llama-31-deployment-endpoint",
region_name="us-east-1",
max_tokens=1000,
temperature=0.7
)
# Create the agent
agent = Agent(
name="llama-assistant",
model=provider,
system_prompt="You are a helpful AI assistant powered by Llama 3.1, deployed on Amazon SageMaker."
)
# Test the agent
response = agent("What are the key benefits of deploying LLMs on SageMaker?")
print(response)
The complete implementation, including a Jupyter notebook in the accompanying GitHub repository, provides detailed explanations and a hands-on walkthrough for building your own custom model parser.
Conclusion
Creating custom model parsers for Strands agents enables seamless integration of various LLM deployments on SageMaker, regardless of their response formats. By extending SageMakerAIModel and implementing the necessary parsing logic, organizations can leverage their chosen serving frameworks without sacrificing compatibility.
Key Takeaways
- The `awslabs/ml-container-creator` tool simplifies the deployment of BYOC models on SageMaker.
- Custom parsers are essential for bridging the gap between diverse model server response formats and Strands' expectations.
- The `stream()` method is the pivotal integration point for custom providers.
By following this guide, you're better equipped to deploy custom LLMs on SageMaker and integrate them with Strands agents in your applications.
About the Author
Dan Ferguson is a Sr. Solutions Architect at AWS, based in New York, USA. As a machine learning services expert, Dan supports customers in effectively integrating ML workflows to achieve sustainable solutions.