
Leveraging Amazon OpenSearch ML Connector APIs

Enhancing Data Ingestion in Amazon OpenSearch with Machine Learning Connectors

Introduction

When augmenting data for Amazon OpenSearch, various challenges arise. This post explores how to address these through two key third-party ML connectors.

Solution Overview

Utilizing Amazon Comprehend with OpenSearch, this section outlines how to set up the infrastructure and necessary resources.

Prerequisites

Ensure you have the required AWS account access to implement the solution effectively.

Part 1: The Amazon Comprehend ML Connector

Setting Up OpenSearch for Amazon Comprehend

Learn how to link OpenSearch with Amazon Comprehend by configuring IAM roles correctly for API access.

Setting Up the ML Connector

Follow the steps to establish the connection between OpenSearch and Amazon Comprehend.

Registering the Amazon Comprehend API Connector

Understand how to register the API connector within OpenSearch.

Testing the Amazon Comprehend API

Validate the setup by invoking the API and inspecting the results.

Creating an Ingest Pipeline for Language Annotation

Learn how to create a pipeline that leverages Amazon Comprehend’s capabilities during data ingestion.

Part 2: The Amazon Bedrock ML Connector

Load Sentences from JSON Documents

Discover how to load and structure data from JSON files for processing.

Creating the OpenSearch ML Connector to Amazon Bedrock

Establish a connector to access Amazon Bedrock’s Titan embeddings.

Testing the Amazon Titan Embeddings Model

Verify the proper configuration by testing the embedding model.

Creating the Index Pipeline with Titan Embeddings

Learn about setting up a pipeline designed to work with the Titan embeddings.

Creating an Index

Details about how to create an index tailored for semantic searches across multiple languages.

Loading Dataframes into the Index

Step through the process of indexing documents and generating embeddings.

Performing Semantic k-NN Searches

Discover how to execute k-NN searches that utilize the indexed data.

Clean Up

Instructions for terminating resources to avoid unnecessary charges.

Benefits of Using the ML Connector

Explore the advantages of integrating ML connectors in OpenSearch for enhanced functionality and efficiency.

Conclusion

Recap the key benefits and encourage experimentation with the provided GitHub resources for further exploration.

About the Authors

Brief bios of the contributors, highlighting their expertise in AWS and data analytics.

Augmenting Data in Amazon OpenSearch with Third-Party ML Connectors

When working with Amazon OpenSearch, augmenting your data before ingesting it is a common requirement, for example enriching log files with geographic information derived from IP addresses, or identifying the language of customer comments. Typically, this enrichment is handled by external jobs that complicate the data pipeline and introduce failure points. OpenSearch offers a simpler alternative: third-party machine learning (ML) connectors that let the cluster call external ML services directly during ingestion.

This blog post highlights two powerful ML connectors: Amazon Comprehend and Amazon Bedrock.

Using the Amazon Comprehend Connector for Language Detection

The first connector we’ll explore is the Amazon Comprehend connector. With it, you can invoke the Comprehend DetectDominantLanguage API to determine the language of your ingested documents.

Solution Overview

To illustrate the language detection capability, we will use Amazon OpenSearch alongside Amazon Comprehend. We’ve provided the necessary source code, an Amazon SageMaker notebook, and an AWS CloudFormation template in the sample-opensearch-ml-rest-api GitHub repository.

(Insert architecture diagram)

Prerequisites

Before running the full demo, ensure you have an AWS account that grants access to the necessary services.

Part 1: Setting Up the Amazon Comprehend ML Connector

Enabling Access to Amazon Comprehend

To allow OpenSearch to call Amazon Comprehend, you need an IAM role with permission to invoke the DetectDominantLanguage API. The CloudFormation template creates this role, named --SageMaker-OpenSearch-demo-role. Follow these steps to map the role to your OpenSearch cluster:

  1. Open OpenSearch Dashboards and sign in.
  2. Navigate to Security > Roles.
  3. Search for the ml_full_access role and select the Mapped users link.
  4. Add the ARN of the IAM role as a backend role, so that OpenSearch can assume it when calling the necessary AWS APIs.
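The same role mapping can also be scripted. A minimal sketch using the OpenSearch Security REST API, assuming admin credentials for the domain (admin_auth), the domain endpoint in host, and the role ARN created by the CloudFormation stack:

# Map the IAM role ARN as a backend role of ml_full_access
# (illustrative; the console steps above achieve the same result.
# Note: PUT replaces any existing mapping for the role.)
mapping_url = f"https://{host}/_plugins/_security/api/rolesmapping/ml_full_access"
mapping_body = {"backend_roles": [sageMakerOpenSearchRoleArn]}
response = requests.put(mapping_url, auth=admin_auth, json=mapping_body)
print(response.status_code, response.text)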

Configuring the ML Connector

Next, set up the ML connector that links OpenSearch to Amazon Comprehend. Sign your requests with SigV4 using your IAM credentials, then create the connector as follows:

import boto3
import requests
from requests_aws4auth import AWS4Auth

# Sign requests to the OpenSearch domain with SigV4
credentials = boto3.Session().get_credentials()
region = boto3.Session().region_name
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, 'es', session_token=credentials.token)

payload = {
    "name": "Comprehend lang identification",
    "description": "comprehend model",
    "version": 1,
    "protocol": "aws_sigv4",
    "credential": {
        # IAM role created by the CloudFormation template
        "roleArn": sageMakerOpenSearchRoleArn
    },
    "parameters": {
        "region": "us-east-1",
        "service_name": "comprehend",
        "api_version": "20171127",
        "api_name": "DetectDominantLanguage",
        "api": "Comprehend_${parameters.api_version}.${parameters.api_name}",
        "response_filter": "$"
    },
    "actions": [...]  # full action template is available in the sample repository
}

# Create the connector through the ML Commons connector API
host = "<your-opensearch-domain-endpoint>"  # e.g. search-mydomain-xxxx.us-east-1.es.amazonaws.com
url = f"https://{host}/_plugins/_ml/connectors/_create"
comprehend_connector_response = requests.post(url, auth=awsauth, json=payload)
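The response from the _create call contains the new connector’s ID, which the registration step below references as comprehend_connector:

comprehend_connector = comprehend_connector_response.json()["connector_id"]
print(comprehend_connector)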

Registering the Amazon Comprehend API Connector

Once the connector is set up, register it with OpenSearch:

payload = {
    "name": "comprehend lang id API",
    "function_name": "remote",
    "description": "API to detect the language of text",
    "connector_id": comprehend_connector
}
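A sketch of submitting this registration payload and deploying the resulting model, assuming host and awsauth from the connector step:

# Register the connector as a remote model and deploy it in one step
register_url = f"https://{host}/_plugins/_ml/models/_register?deploy=true"
register_response = requests.post(register_url, auth=awsauth, json=payload)
print(register_response.json())

# Depending on the OpenSearch version, the model_id is returned directly
# or can be looked up from the task referenced by task_id
comprehend_model_id = register_response.json()["model_id"]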

Testing the Amazon Comprehend API

After registration, test the API with a Chinese sentence (“Do you know where the bathroom is?”):

payload = {
    "parameters": {
        "Text": "你知道厕所在哪里吗"
    }
}
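A sketch of invoking the model with this payload through the ML Commons predict endpoint, again assuming host, awsauth, and comprehend_model_id from the previous steps:

predict_url = f"https://{host}/_plugins/_ml/models/{comprehend_model_id}/_predict"
predict_response = requests.post(predict_url, auth=awsauth, json=payload)
print(predict_response.json())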

Expected output will show the language code as zh with a high score:

{
   "inference_results":[
      {
         "output":[
            {
               "name":"response",
               "dataAsMap":{
                  "response":{
                     "Languages":[
                        {
                           "LanguageCode":"zh",
                           "Score":1.0
                        }
                     ]
                  }
               }
            }
         ],
         "status_code":200
      }
   ]
}

Creating an Ingest Pipeline

Set up an OpenSearch ingest pipeline that uses the ml_inference processor to call the Amazon Comprehend model and annotate each document with its detected language.

{
  "description": "ingest identify lang with the comprehend API",
  "processors":[
    {
      "ml_inference": {
        "model_id": comprehend_model_id,
        "input_map": [
            {
               "Text": "Text"
            }
        ],
        "output_map": [
            {  
               "detected_language": "response.Languages[0].LanguageCode",
               "language_score": "response.Languages[0].Score"
            }
        ]
      }
    }
  ]
}
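A sketch of creating the pipeline, assuming the definition above is stored in a pipeline_body dict and using an illustrative pipeline name:

# Create (or update) the ingest pipeline
pipeline_url = f"https://{host}/_ingest/pipeline/comprehend_language_pipeline"
response = requests.put(pipeline_url, auth=awsauth, json=pipeline_body)

# Documents indexed through this pipeline get detected_language and
# language_score fields added automatically, for example:
# POST /my-index/_doc?pipeline=comprehend_language_pipeline
# { "Text": "Bonjour tout le monde" }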

Part 2: Utilizing the Amazon Bedrock ML Connector for Semantic Search

Next, we will demonstrate how to enhance OpenSearch capabilities using the Amazon Bedrock connector to access the Amazon Titan Text Embeddings v2 model.

Overview of Amazon Bedrock

Amazon Bedrock provides a single, fully managed API for foundation models from Amazon and other leading AI companies, letting you customize models for your specific needs while adhering to security and responsible AI practices.
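For orientation, the Titan embeddings model can also be invoked directly with the Bedrock runtime API; the ML connector essentially wraps this call so OpenSearch can make it on your behalf. A small sketch with an illustrative input string:

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
resp = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({"inputText": "Hello world"}))
embedding = json.loads(resp["body"].read())["embedding"]
print(len(embedding))  # 1024-dimensional vector by default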

Steps to Set Up the Amazon Bedrock Connector

  1. Creating the OpenSearch ML Connector: As with the Comprehend connector, define the connector parameters for Amazon Bedrock (a sketch of the payload follows this list).

  2. Creating an Index: Configure a k-NN index that stores the sentence vectors alongside the source text.

  3. Setting Up the Ingest Pipeline: Attach an ingest pipeline that calls the embeddings model so vectors are generated as documents are indexed.
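A minimal sketch of the Bedrock connector payload, following the published connector blueprint for Amazon Titan Text Embeddings v2. The region, role ARN, and model ID are values you would adapt, and the IAM role must be allowed to call bedrock:InvokeModel:

bedrock_connector_payload = {
    "name": "Amazon Bedrock Connector: Titan Text Embeddings v2",
    "description": "Connector to the Amazon Titan text embeddings model",
    "version": 1,
    "protocol": "aws_sigv4",
    "credential": {
        "roleArn": sageMakerOpenSearchRoleArn  # role must allow bedrock:InvokeModel
    },
    "parameters": {
        "region": "us-east-1",
        "service_name": "bedrock",
        "model": "amazon.titan-embed-text-v2:0"
    },
    "actions": [
        {
            "action_type": "predict",
            "method": "POST",
            "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
            "headers": {
                "content-type": "application/json",
                "x-amz-content-sha256": "required"
            },
            # Bedrock responds with {"embedding": [...], "inputTextTokenCount": n}
            "request_body": "{ \"inputText\": \"${parameters.inputText}\" }"
        }
    ]
}

bedrock_connector_response = requests.post(
    f"https://{host}/_plugins/_ml/connectors/_create",
    auth=awsauth, json=bedrock_connector_payload)
bedrock_connector = bedrock_connector_response.json()["connector_id"]

Registering and deploying this connector as a remote model then follows the same _register?deploy=true call shown for Comprehend, yielding a bedrock_model_id.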

Example of Indexing Real Data

You can use pandas to load sentences from language-specific JSON Lines files (one JSON object per line) and prepare them for indexing:

import json
import pandas as pd

def load_sentences(file_name):
    sentences = []
    with open(file_name, 'r', encoding='utf-8') as file:
        for line in file:
            try:
                data = json.loads(line)
                sentences.append({
                    'sentence': data['sentence'],
                    'sentence_english': data['sentence_english']
                })
            except json.JSONDecodeError:
                continue
    return pd.DataFrame(sentences)

german_df = load_sentences('german.json')
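The repository covers the index and pipeline setup in detail; the sketch below shows the general shape, using an illustrative pipeline name (titan_embedding_pipeline), index name (sentences), and the bedrock_model_id obtained when registering the Bedrock connector. Titan Text Embeddings v2 produces 1024-dimensional vectors by default.

# Ingest pipeline that calls the Bedrock model to embed each sentence at ingest time
embedding_pipeline = {
    "description": "embed sentences with Amazon Titan Text Embeddings v2",
    "processors": [
        {
            "ml_inference": {
                "model_id": bedrock_model_id,
                "input_map": [{"inputText": "sentence"}],
                "output_map": [{"sentence_vector": "embedding"}]
            }
        }
    ]
}
requests.put(f"https://{host}/_ingest/pipeline/titan_embedding_pipeline",
             auth=awsauth, json=embedding_pipeline)

# k-NN index whose default pipeline generates the vectors during indexing
index_body = {
    "settings": {"index.knn": True, "default_pipeline": "titan_embedding_pipeline"},
    "mappings": {
        "properties": {
            "sentence": {"type": "text"},
            "sentence_english": {"type": "text"},
            "sentence_vector": {"type": "knn_vector", "dimension": 1024}
        }
    }
}
requests.put(f"https://{host}/sentences", auth=awsauth, json=index_body)

# Index the dataframe rows; the pipeline adds sentence_vector automatically
# (individual requests for clarity; the _bulk API is more efficient)
for _, row in german_df.iterrows():
    requests.post(f"https://{host}/sentences/_doc", auth=awsauth,
                  json={"sentence": row["sentence"],
                        "sentence_english": row["sentence_english"]})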

Performing Semantic k-NN Searches

Once the data is indexed, you can run k-NN searches to find semantically similar sentences across multiple languages.

search_query = {
    "query": {
        "knn": {
            "sentence_vector": {
                "vector": query_vector,
                "k": 30
            }
        }
    }
}
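The query assumes query_vector already holds an embedding of the search text. A sketch of producing it with the registered Bedrock model and then running the search (the query text and index name are illustrative, and the field path reflects the raw Titan response passed through by the connector):

query_text = "Where is the nearest train station?"  # example search phrase
predict_url = f"https://{host}/_plugins/_ml/models/{bedrock_model_id}/_predict"
response = requests.post(predict_url, auth=awsauth,
                         json={"parameters": {"inputText": query_text}})
query_vector = (response.json()["inference_results"][0]
                ["output"][0]["dataAsMap"]["embedding"])

# Build search_query as shown above, then run the search against the index
results = requests.post(f"https://{host}/sentences/_search",
                        auth=awsauth, json=search_query)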

This will allow you to retrieve relevant documents based on the context and language of your query.

Conclusion

By leveraging the OpenSearch ML connectors for both Amazon Comprehend and Amazon Bedrock, you can significantly enhance your data ingestion process, making it easier to integrate powerful ML capabilities directly into your data pipeline.

For more hands-on implementation details, be sure to visit the GitHub repository and explore the full demo.

About the Authors

John Trollinger – Principal Solutions Architect specializing in OpenSearch and Data Analytics.

Shwetha Radhakrishnan – Solutions Architect focused on Data Analytics & Machine Learning at AWS.


By utilizing OpenSearch’s ML connectors, not only can you simplify your architecture and reduce operational costs, but you also gain the flexibility to handle complex ML use cases efficiently. Happy analyzing!
