Efficiently Process Multi-Page Documents with Human Review Using Amazon Bedrock Data Automation and SageMaker AI


Streamlining Workflow Efficiency and Accuracy through Intelligent Automation



Streamlining Document Processing with Amazon Bedrock Data Automation

In today’s fast-paced business landscape, organizations across various industries grapple with large volumes of multi-page documents that demand intelligent processing for accurate information extraction. Despite advancements in automation, human expertise remains indispensable in scenarios requiring verification of data accuracy and quality.

The Rise of Amazon Bedrock Data Automation

In March 2025, Amazon Web Services (AWS) unveiled Amazon Bedrock Data Automation, a groundbreaking tool designed to automate the extraction, transformation, and generation of insights from unstructured multimodal content—ranging from documents and images to videos and audio. This innovation streamlines document processing workflows, alleviating time-consuming tasks such as data preparation, model management, and orchestration through a unified multimodal inference API. The result? A solution that offers industry-leading accuracy at a lower cost compared to traditional methods.

Key Features of Amazon Bedrock Data Automation

Amazon Bedrock Data Automation simplifies complex tasks associated with document processing, such as:

  • Document Splitting: Efficiently segments large documents for targeted processing.
  • Classification and Extraction: Automates the identification and retrieval of relevant information.
  • Normalization and Validation: Ensures data consistency and accuracy.
  • Visual Grounding with Confidence Scores: Adds transparency and trustworthiness to the insights generated.

While these advanced capabilities significantly boost automation, certain scenarios still necessitate human intervention. This is where the integration with Amazon SageMaker AI elevates the process, combining automation with an essential human review loop.
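To make the inference API concrete, here is a minimal sketch of how a document might be submitted to the Bedrock Data Automation runtime for asynchronous processing. The ARNs, bucket names, and helper function are illustrative placeholders, and the exact parameter names should be verified against the current boto3 documentation for `bedrock-data-automation-runtime`:

```python
def build_bda_request(input_uri: str, output_uri: str,
                      project_arn: str, profile_arn: str) -> dict:
    """Assemble a request payload for the asynchronous invoke API.

    All ARNs and S3 URIs are placeholders; in a real deployment they
    come from the resources your stack creates.
    """
    return {
        "inputConfiguration": {"s3Uri": input_uri},
        "outputConfiguration": {"s3Uri": output_uri},
        "dataAutomationConfiguration": {
            "dataAutomationProjectArn": project_arn,
            "stage": "LIVE",
        },
        "dataAutomationProfileArn": profile_arn,
    }

# With boto3, the payload would be passed as keyword arguments, e.g.:
#   client = boto3.client("bedrock-data-automation-runtime")
#   client.invoke_data_automation_async(**build_bda_request(...))
request = build_bda_request(
    "s3://input-bucket/doc.pdf",
    "s3://output-bucket/results/",
    "arn:aws:bedrock:us-east-1:111122223333:data-automation-project/example",
    "arn:aws:bedrock:us-east-1:111122223333:data-automation-profile/example",
)
```

The call returns immediately with an invocation ARN; results land in the configured output S3 location once processing completes.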

The Importance of Human Review Loops

Incorporating human review in document processing workflows allows organizations to maintain high accuracy levels while maximizing operational efficiency. Human review loops offer the following benefits:

  • Validation of AI Predictions: Especially when confidence scores are low.
  • Effective Handling of Edge Cases: Ensuring all exceptions are appropriately managed.
  • Regulatory Compliance: Maintaining oversight for legal and ethical standards.
  • Ongoing Model Improvement: Creating feedback loops that enhance model performance over time.

By strategically implementing human review loops, organizations can allocate human resources to uncertain portions of documents while allowing automated systems to manage standard extractions, striking an optimal balance between efficiency and accuracy.

Understanding Confidence Scores

Confidence scores are a critical component in determining when to engage human reviewers. Each score expresses, as a percentage, how certain Amazon Bedrock Data Automation is that a given extracted value is correct.

The scoring can generally be interpreted as follows:

  • High Confidence (90-100%): Indicates a high certainty regarding the extraction.
  • Medium Confidence (70-89%): Suggests reasonable certainty but acknowledges potential errors.
  • Low Confidence (<70%): Indicates high uncertainty, warranting human verification.

Organizations are encouraged to test Amazon Bedrock Data Automation with their specific datasets to identify the confidence threshold that triggers a human review workflow.
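Using the bands above, threshold-based routing can be sketched as a small helper. The 70% review cutoff here is illustrative and should be tuned against your own documents, as recommended above:

```python
REVIEW_THRESHOLD = 0.70  # illustrative; tune per dataset

def confidence_band(score: float) -> str:
    """Map a 0.0-1.0 confidence score to the bands described above."""
    if score >= 0.90:
        return "high"
    if score >= 0.70:
        return "medium"
    return "low"

def fields_needing_review(extractions: dict[str, float]) -> list[str]:
    """Return the field names whose confidence falls below the threshold."""
    return [name for name, score in extractions.items()
            if score < REVIEW_THRESHOLD]

extractions = {"invoice_number": 0.98, "total_amount": 0.65, "due_date": 0.82}
flagged = fields_needing_review(extractions)  # ["total_amount"]
```

Only the flagged fields go to a human reviewer; the high- and medium-confidence extractions flow straight through, which is what keeps the review workload small.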

Solution Architecture Overview

Deploying a serverless solution for processing multi-page documents with human review loops involves several key components:

  1. Document Upload: Files are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket.
  2. Triggering Workflow: An Amazon EventBridge rule detects new uploads and activates the AWS Step Functions workflow.
  3. Lambda Functions: These execute the document processing via Amazon Bedrock Data Automation and evaluate confidence scores.
  4. Human Review Loop: Documents flagged with low confidence scores are routed to SageMaker AI for manual review and correction.
  5. Final Output: The corrected data is stored in an S3 bucket, delivering ready-for-use insights for downstream systems.
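Step 3 above can be sketched as a Lambda handler that inspects confidence scores in the processing result and tells the Step Functions workflow whether to route the document into the human review loop. The event shape and the 70% threshold are assumptions for illustration; the real Bedrock Data Automation output schema may differ:

```python
CONFIDENCE_THRESHOLD = 0.70  # illustrative; choose per dataset

def lambda_handler(event: dict, context=None) -> dict:
    """Decide whether a processed document needs human review.

    `event` is assumed to carry extraction results as a list of
    {"field": ..., "value": ..., "confidence": ...} entries.
    """
    extractions = event.get("extractions", [])
    low = [e["field"] for e in extractions
           if e.get("confidence", 0.0) < CONFIDENCE_THRESHOLD]
    return {
        "needsHumanReview": bool(low),  # drives a Choice state in Step Functions
        "lowConfidenceFields": low,
        "documentKey": event.get("documentKey"),
    }

result = lambda_handler({
    "documentKey": "uploads/sample.pdf",
    "extractions": [
        {"field": "vendor", "value": "Acme", "confidence": 0.97},
        {"field": "total", "value": "120.00", "confidence": 0.55},
    ],
})
```

A Choice state in the Step Functions definition would then branch on `needsHumanReview`, sending flagged documents to the SageMaker AI review task and everything else directly to the output bucket.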

Deploying the Solution

To implement this solution, users must have the AWS Cloud Development Kit (AWS CDK), Node.js, and Docker installed. Following the setup, the build script automates the creation of the necessary AWS resources, including S3 buckets, Amazon Bedrock Data Automation projects, and Lambda functions.

Testing and Validation

After deploying the system, upload a test document to the designated S3 bucket, and monitor the processing via the AWS Step Functions console or CloudWatch logs. Human reviewers will then be prompted to verify any low-confidence extractions, ensuring data quality before final utilization.

Conclusion

By leveraging the combined capabilities of Amazon Bedrock Data Automation and SageMaker AI, organizations can achieve remarkable automation efficiency alongside human-level accuracy in document processing. This solution is adaptable across various document types and customizable to meet specific business needs.

Explore this approach for your document processing challenges, and find the complete implementation in our GitHub repository. For more insights on AWS document intelligence solutions, don’t hesitate to check out the detailed documentation.

If you’ve found success with similar implementations or have insights to share, please leave your comments below or reach out directly with your questions. Happy building!

About the Authors

Joe Morotti, Prashanth Ramanathan, Andy Hall, and Vikas Shah are a team at AWS specializing in areas including financial services, generative AI, and document intelligence solutions, bringing decades of combined industry experience to this work.

