Enhancing QA Efficiency with Multi-Agent AI: The SAARAM Solution at Amazon
Transforming Quality Assurance: The Evolution of Test Case Generation at Amazon.ae
At Amazon.ae, we proudly serve around 10 million customers monthly across the Africa, Middle East, and Türkiye (AMET) region, including the UAE, Saudi Arabia, Egypt, Türkiye, and South Africa. Our AMET Payments team manages a complex web of transactions and payment experiences, shipping an average of five new features every month. This ambitious pace, however, came with challenges, particularly in quality assurance (QA) testing.
The Strain of Traditional QA Processes
Historically, generating test cases consumed significant resources: roughly one week of manual effort per project, which across projects added up to a full-time engineer dedicated to this task each year. The process involved meticulously analyzing business requirement documents (BRDs), design documents, and UI mocks to create comprehensive test cases covering the needs of our diverse customer base.
With customer preferences evolving and compliance requirements differing across regulatory environments, we needed a way to streamline this labor-intensive process.
Enter SAARAM: Revolutionizing QA with AI
To address the inefficiencies of manual test case generation, we developed SAARAM (QA Lifecycle App), a multi-agent AI system built on Anthropic's Claude Sonnet via Amazon Bedrock and the Strands Agents SDK. SAARAM cuts test case generation time from a week to mere hours while improving the quality of our test coverage.
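As a rough illustration of the underlying model call (not SAARAM's production code), a request to Claude on Amazon Bedrock uses Anthropic's Messages API format. The prompts below are placeholders, and in a live system the serialized body would be sent with the Bedrock runtime client:

```python
import json

def build_claude_request(system_prompt: str, user_message: str,
                         max_tokens: int = 1024) -> str:
    """Serialize a request body in the Anthropic Messages format used on Bedrock."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system_prompt,
        "messages": [{"role": "user", "content": user_message}],
    })

# In production the body would be sent via boto3, roughly:
#   bedrock = boto3.client("bedrock-runtime")
#   bedrock.invoke_model(modelId="<claude-sonnet-model-id>", body=body)
body = build_claude_request(
    system_prompt="You are a QA test-case generator for payment flows.",
    user_message="Generate test cases for a card payment checkout.",
)
```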
A Human-Centric Approach
Our initial attempts at integrating AI into our QA processes relied on traditional single-agent models that often produced generic test cases. Recognizing that expert testers reason in more nuanced ways, we shifted our focus to studying how experienced testers actually work.
By breaking their cognitive workflow into specific phases, such as journey analysis and scenario identification, we were able to design a multi-agent system that mirrors these human processes.
The Architecture of SAARAM
Our new approach employs a sophisticated multi-agent workflow where each agent specializes in a different aspect of the testing process. For example, the Customer Segment Creator focuses on analyzing user segments, while the User Journey Mapper creates flow diagrams to visualize customer interactions. This method ensures comprehensive test coverage aligned with specific customer journeys and payment methods.
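A minimal sketch of this idea follows. The agent names come from the description above, but the implementations are stand-ins: in SAARAM each agent's `run` step would call the LLM, whereas here stub functions simply show how each specialist enriches a shared context that the next agent builds on:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """One specialist in the pipeline; `run` turns the shared context into new findings."""
    name: str
    run: Callable[[dict], dict]

def segment_creator(ctx: dict) -> dict:
    # Stub: in the real system this agent would analyze the BRD to find user segments.
    return {"segments": ["new customer", "returning customer"]}

def journey_mapper(ctx: dict) -> dict:
    # Stub: maps one flow per segment identified by the previous agent.
    return {"journeys": [f"{s} -> add card -> pay" for s in ctx["segments"]]}

def run_pipeline(agents: list[Agent], brd: str) -> dict:
    context = {"brd": brd}
    for agent in agents:
        context.update(agent.run(context))  # each agent enriches the shared context
    return context

pipeline = [
    Agent("CustomerSegmentCreator", segment_creator),
    Agent("UserJourneyMapper", journey_mapper),
]
result = run_pipeline(pipeline, brd="Card payments BRD ...")
```

Because each agent sees only the context it needs and contributes one well-scoped artifact, the downstream test cases stay aligned with concrete customer journeys rather than generic flows.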
Overcoming Traditional AI Limitations
The challenges with single-agent AI models—such as context limits and high rates of hallucination—prompted us to devise a more complex, interconnected system. Our iterative approach allowed us to refine each agent, ensuring they worked in harmony to produce actionable test cases.
Results: A Transformation in QA
The results of implementing SAARAM have been remarkable:
- Reduced Test Case Generation Time: From a week to just hours
- Resource Optimization: Our QA effort decreased from one full-time engineer to 0.2 FTE
- Increased Coverage: Identifying 40% more edge cases than in manual processes
- Consistency: Achieving 100% adherence to test case standards
These improvements not only optimize our internal resources but also enhance customer payment experiences, leading to increased satisfaction and faster feature deployment across our services.
Lessons Learned: Insights for the Future
Our journey with SAARAM has imparted several valuable lessons:
- Understand Expert Cognitive Processes: AI systems designed by studying human expertise produce more effective outcomes.
- Implement Structured Outputs: This reduces hallucinations and enhances the reliability of AI-generated outputs.
- Design Multi-Agent Architectures: These systems facilitate specialized roles, resulting in deeper analysis and error mitigation.
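The structured-outputs lesson can be sketched with a simple validation gate. The field names below are an assumed schema for illustration, not SAARAM's actual format; the point is that any generated test case missing a required field is rejected instead of silently passed downstream:

```python
import json

# Assumed test-case schema, for illustration only.
REQUIRED_FIELDS = {"id", "title", "steps", "expected_result"}

def parse_test_cases(raw: str) -> list[dict]:
    """Parse model output and reject any test case missing required fields."""
    cases = json.loads(raw)
    valid = []
    for case in cases:
        missing = REQUIRED_FIELDS - case.keys()
        if missing:
            raise ValueError(f"Test case {case.get('id', '?')} "
                             f"missing fields: {sorted(missing)}")
        valid.append(case)
    return valid

raw = ('[{"id": "TC-1", "title": "Pay with saved card", '
       '"steps": ["open checkout", "select saved card", "confirm"], '
       '"expected_result": "payment succeeds"}]')
cases = parse_test_cases(raw)
```

Forcing the model into a schema like this both constrains generation and gives the pipeline a concrete point at which hallucinated or incomplete outputs are caught.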
Looking Ahead: Future Applications
The modular architecture behind SAARAM is not limited to payments QA: it can be extended to other domains within Amazon, such as retail systems testing and customer service scenario generation. Future enhancements may also integrate knowledge bases of historical test cases, providing context that further improves the quality of AI-generated outputs.
As we look forward, the combination of human expertise, thoughtful design, and advanced AI technology will continue to redefine our quality assurance processes and improve the experiences of millions of customers in the AMET region and beyond.
At Amazon, we’re excited to be at the forefront of transforming traditional QA methods into intelligent, automated processes. If you’re keen to explore similar innovations, don’t hesitate to look into our implementation resources and see how your teams might embark on their own journey toward optimized QA practices.