
Transforming Quality Assurance: The Evolution of Test Case Generation at Amazon.ae

At Amazon.ae, we proudly serve around 10 million customers monthly across the MENA region, including the UAE, Saudi Arabia, Egypt, Türkiye, and South Africa. Our AMET Payments team is responsible for managing a complex web of transactions and payment experiences, publishing an average of five new features monthly. However, this ambitious pace came with challenges, particularly in the realm of quality assurance (QA) testing.

The Strain of Traditional QA Processes

Historically, generating test cases consumed significant resources: approximately one week of manual effort per project, which added up to roughly one full-time engineer per year devoted solely to this task. The process involved meticulously analyzing business requirement documents (BRDs), design documents, and UI mocks to create comprehensive test cases that would meet the needs of our diverse customer base.

Given shifting customer preferences and compliance requirements across different regulatory environments, we needed a way to streamline this labor-intensive process.

Enter SAARAM: Revolutionizing QA with AI

To address the inefficiencies of manual test case generation, we developed SAARAM (QA Lifecycle App). This multi-agent AI system uses Anthropic's Claude Sonnet on Amazon Bedrock together with the Strands Agents SDK to cut test case generation time from a week to mere hours, all while improving the quality of our test coverage.

A Human-Centric Approach

Our initial attempts at integrating AI into our QA processes relied on traditional single-agent models that often produced generic test cases. Realizing that human cognitive processes are more nuanced, we shifted our focus to studying how experienced testers actually perform their tasks.

By breaking down their cognitive workflows into specific phases—such as journey analysis and scenario identification—we were able to create a multi-agent system designed to mimic these human processes precisely.

The Architecture of SAARAM

Our new approach employs a sophisticated multi-agent workflow where each agent specializes in a different aspect of the testing process. For example, the Customer Segment Creator focuses on analyzing user segments, while the User Journey Mapper creates flow diagrams to visualize customer interactions. This method ensures comprehensive test coverage aligned with specific customer journeys and payment methods.
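The post does not include implementation code, but the hand-off pattern it describes, where each specialized agent's output becomes the next agent's context, can be sketched in plain Python. Everything below (the agent names, the `stub_model` function, `run_pipeline`) is illustrative only, not SAARAM's actual implementation; in production each call would invoke a model such as Claude Sonnet via Amazon Bedrock.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """One specialized stage in a test-generation pipeline."""
    name: str
    role_prompt: str
    model: Callable[[str], str]  # the LLM call; a stub here for illustration

    def run(self, context: str) -> str:
        # Each agent sees the upstream context plus its own role instructions.
        return self.model(f"{self.role_prompt}\n\nContext:\n{context}")

def run_pipeline(agents: list[Agent], brd: str) -> str:
    """Chain agents so each one's output becomes the next one's context."""
    context = brd
    for agent in agents:
        context = agent.run(context)
    return context

def stub_model(prompt: str) -> str:
    # Stand-in for a real model invocation; echoes the role it was given.
    return f"[analysis of: {prompt.splitlines()[0]}]"

pipeline = [
    Agent("CustomerSegmentCreator", "Identify the customer segments affected.", stub_model),
    Agent("UserJourneyMapper", "Map the payment journeys for each segment.", stub_model),
    Agent("TestCaseWriter", "Write concrete test cases for each journey.", stub_model),
]
result = run_pipeline(pipeline, "BRD: new installment payment option")
```

The key design point is that the agents are composed sequentially rather than merged into one prompt, which is what lets each stage specialize in a single phase of the tester's workflow.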

Overcoming Traditional AI Limitations

The challenges with single-agent AI models—such as context limits and high rates of hallucination—prompted us to devise a more complex, interconnected system. Our iterative approach allowed us to refine each agent, ensuring they worked in harmony to produce actionable test cases.

Results: A Transformation in QA

The results of implementing SAARAM have been remarkable:

  • Reduced Test Case Generation Time: From a week to just hours
  • Resource Optimization: Our QA effort decreased from one full-time engineer to 0.2 FTE
  • Increased Coverage: Identifying 40% more edge cases than in manual processes
  • Consistency: Achieving 100% adherence to test case standards

These improvements not only optimize our internal resources but also enhance customer payment experiences, leading to increased satisfaction and faster feature deployment across our services.

Lessons Learned: Insights for the Future

Our journey with SAARAM has imparted several valuable lessons:

  1. Understand Expert Cognitive Processes: AI systems designed by studying human expertise produce more effective outcomes.
  2. Implement Structured Outputs: This reduces hallucinations and enhances the reliability of AI-generated outputs.
  3. Design Multi-Agent Architectures: These systems facilitate specialized roles, resulting in deeper analysis and error mitigation.
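Lesson 2 (structured outputs) can be enforced with a simple schema check on every model response, rejecting anything that drifts from the expected shape instead of passing it downstream. This is a minimal sketch; the field names and schema are hypothetical, not SAARAM's actual test case format.

```python
import json

# Hypothetical required schema for one generated test case.
REQUIRED_FIELDS = {"id": str, "title": str, "steps": list, "expected_result": str}

def parse_test_case(raw: str) -> dict:
    """Parse a model response and reject anything that violates the schema."""
    case = json.loads(raw)  # raises on non-JSON output
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in case:
            raise ValueError(f"missing field: {field}")
        if not isinstance(case[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return case

raw = json.dumps({
    "id": "TC-001",
    "title": "Pay with saved card",
    "steps": ["Add item to cart", "Proceed to checkout", "Select saved card"],
    "expected_result": "Payment succeeds and the order is confirmed",
})
case = parse_test_case(raw)
```

Because malformed output raises immediately, hallucinated or incomplete test cases never reach the test suite; the calling agent can retry instead.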

Looking Ahead: Future Applications

The modular architecture behind SAARAM is not limited to payments QA: it can be extended to other domains within Amazon, such as retail systems testing and customer service scenario generation. Future enhancements might also integrate knowledge bases of historical test cases, providing context that can further improve the quality of AI-generated outputs.

As we look forward, the combination of human expertise, thoughtful design, and advanced AI technology will continue to redefine our quality assurance processes and improve the experiences of millions of customers in the MENA region and beyond.


At Amazon, we’re excited to be at the forefront of transforming traditional QA methods into intelligent, automated processes. If you’re keen to explore similar innovations, don’t hesitate to look into our implementation resources and see how your teams might embark on their own journey toward optimized QA practices.
