How PDI Developed a Robust Enterprise-Grade RAG System for AI Applications on AWS

Transforming Enterprise Knowledge Accessibility: The PDIQ Solution

PDI Technologies: Revolutionizing Knowledge Access with PDI Intelligence Query

Efficient data management and accessibility are central to running a business well. PDI Technologies, a global leader in the convenience retail and petroleum wholesale industries, draws on 40 years of experience to improve profitability and operational efficiency for clients worldwide. The company has built PDI Intelligence Query (PDIQ), an AI-powered assistant designed to streamline knowledge access within the organization.

The Challenge: Fragmented Knowledge Management

Despite PDI’s experience, a significant internal challenge persisted: information was fragmented across multiple systems, including websites, Confluence pages, SharePoint sites, and various other data sources. Internal teams struggled to retrieve and use that information effectively, and the growing demand for AI-driven insights made the problem worse.

Recognizing the need for a comprehensive solution, PDI Technologies set out to develop PDIQ, a tool that consolidates company knowledge and makes it accessible through a chat interface. PDIQ is engineered to address several challenges:

  • Content Extraction: Automatically pulling data from diverse platforms with various authentication requirements.
  • Model Flexibility: Facilitating the selection and application of the most suitable Large Language Models (LLMs) for varied processing needs.
  • Semantic Processing: Indexing content for contextual and meaningful retrieval.
  • Knowledge Refresh: Keeping information up-to-date through scheduled crawling.
  • Enterprise-Specific Context: Ensuring AI interactions are relevant to specific business scenarios.

Solution Architecture: How PDIQ Works

An Overview of PDIQ’s Architecture

PDIQ combines several managed services on Amazon Web Services (AWS). Its key components are:

  • Scheduler: Managed by Amazon EventBridge, it executes the crawling schedule.
  • Crawlers: Powered by AWS Lambda, these collect data from various sources including web pages, Confluence, Azure DevOps, and SharePoint.
  • Data Storage: Information is stored in Amazon S3 and pertinent metadata is cataloged in Amazon DynamoDB.
  • Notification Services: Amazon SNS and Amazon SQS facilitate communication and queue management among the different services.
  • Embedding Generation: Amazon Bedrock provides access to foundation models for processing data, while Amazon Aurora stores the vector embeddings for retrieval.
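
The hand-off between these components can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PDI's implementation: the in-memory dictionaries and queue stand in for Amazon S3, Amazon DynamoDB, and Amazon SQS, and every name in it (run_crawl, process_queue, fake_embed, the key layout) is hypothetical.

```python
from collections import deque
from datetime import datetime, timezone

# In-memory stand-ins, for illustration only (all hypothetical):
object_store = {}      # plays the role of Amazon S3
metadata_table = {}    # plays the role of Amazon DynamoDB
work_queue = deque()   # plays the role of an Amazon SQS queue


def run_crawl(source_name, fetch_documents):
    """Crawler step: store each document, catalog it, and enqueue it."""
    for doc_id, text in fetch_documents():
        key = f"{source_name}/{doc_id}.md"
        object_store[key] = text
        metadata_table[key] = {
            "source": source_name,
            "crawled_at": datetime.now(timezone.utc).isoformat(),
            "status": "pending",
        }
        work_queue.append(key)


def fake_embed(text):
    """Placeholder for a foundation-model call; returns a toy 2-d vector."""
    return [float(len(text)), float(len(text.split()))]


def process_queue(embed=fake_embed):
    """Consumer step: drain the queue and build one vector per document."""
    vectors = {}
    while work_queue:
        key = work_queue.popleft()
        vectors[key] = embed(object_store[key])
        metadata_table[key]["status"] = "embedded"
    return vectors


# In PDIQ a scheduler (Amazon EventBridge) triggers the crawl; here it is
# called directly with a hypothetical two-document source.
run_crawl("wiki", lambda: [("home", "Welcome to the wiki"), ("faq", "Q and A")])
vectors = process_queue()
```

Decoupling the crawl step from the embedding step through a queue means a slow or failing model call does not block collection, which is one common reason to place a queue between such stages.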

Ensuring Security with a Zero-Trust Model

PDIQ embraces a zero-trust security model to safeguard sensitive information. There are distinct access controls for administrators and end-users:

  • Administrators manage crawlers and data through configured user groups and encrypted credentials.
  • End-users access knowledge bases based on validated group permissions, enhancing security without compromising on usability.
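
The end-user rule above amounts to a deny-by-default check of group membership. A minimal sketch follows, assuming each knowledge base carries a set of allowed groups; the function name, knowledge base names, and group names are all hypothetical and are not taken from PDIQ.

```python
def accessible_knowledge_bases(user_groups, knowledge_bases):
    """Return the names of knowledge bases the user may query.

    Deny by default: a knowledge base with no allowed groups is
    visible to nobody, and a user with no groups sees nothing.
    """
    groups = set(user_groups)
    return sorted(
        name
        for name, allowed in knowledge_bases.items()
        if groups & set(allowed)
    )


# Hypothetical knowledge bases mapped to the groups allowed to read them.
KNOWLEDGE_BASES = {
    "support-runbooks": {"support", "engineering"},
    "hr-policies": {"hr"},
    "unassigned": set(),
}

visible = accessible_knowledge_bases(["support"], KNOWLEDGE_BASES)
```

In a real deployment the group list would come from a validated identity token rather than from the caller, so that a user cannot simply claim membership.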

Step-by-Step Process Flow

PDIQ processes content in stages, from data collection through to embedding generation:

Data Collection via Crawlers

Crawlers, customizable by administrators, are the backbone of data collection. They support various configurations to target specific information sources, ensuring a comprehensive knowledge base.

Types of Crawlers:

  • Web Crawler: Uses Puppeteer to render pages, then converts the HTML to Markdown, capturing full context and relationships.
  • Confluence Crawler: Extracts page content while preserving hierarchy and relationships.
  • Azure DevOps Crawler: Aggregates information about codebases and project documentation.
  • SharePoint Crawler: Uses the Microsoft Graph API to pull documents and maintain version histories.
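
One way to model administrator-configurable crawlers is a small configuration record plus a registry that maps each source type to a handler. The sketch below assumes that design; the CrawlerConfig fields, the registry, and the handler bodies (which only describe the work rather than perform it) are hypothetical illustrations, not PDIQ's code.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlerConfig:
    """Hypothetical administrator-defined settings for one crawler."""
    name: str
    source_type: str               # "web", "confluence", "azure_devops", "sharepoint"
    target: str                    # URL, space, project, or site (illustrative)
    schedule: str = "rate(1 day)"  # EventBridge-style schedule expression
    options: dict = field(default_factory=dict)


CRAWLERS = {}  # source type -> handler function


def crawler(source_type):
    """Decorator that registers a handler for one source type."""
    def register(func):
        CRAWLERS[source_type] = func
        return func
    return register


# Stub handlers: each returns a description of the work it would do.
@crawler("web")
def crawl_web(config):
    return f"render {config.target} and convert HTML to Markdown"


@crawler("confluence")
def crawl_confluence(config):
    return f"export pages of {config.target} with their hierarchy"


@crawler("azure_devops")
def crawl_azure_devops(config):
    return f"collect code and project documentation from {config.target}"


@crawler("sharepoint")
def crawl_sharepoint(config):
    return f"pull documents and version history from {config.target}"


def run(config):
    """Dispatch a configured crawler to the handler for its source type."""
    try:
        handler = CRAWLERS[config.source_type]
    except KeyError:
        raise ValueError(f"unsupported source type: {config.source_type}") from None
    return handler(config)


plan = run(CrawlerConfig("docs", "web", "https://example.com/docs"))
```

With a registry like this, supporting a new data source means adding one handler rather than changing the dispatch logic.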

Image Handling and Document Processing

Images extracted from data sources are stored in Amazon S3, with metadata tags ensuring easy reference. Image captions are generated to enhance searchability and are linked back to the original documents.

The document processing phase generates vector embeddings through a series of steps: captioning images, splitting documents into chunks, summarizing content, and creating embeddings. This multi-step approach enriches the document context and improves retrieval effectiveness.
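
The four steps can be sketched end to end in a few lines. This is a toy version under loud assumptions: summarize and toy_embedding are stand-ins for foundation-model calls, the word-window sizes are arbitrary, and prefixing each chunk with the document summary is one plausible way to enrich context, not a confirmed detail of PDIQ.

```python
import hashlib


def chunk_text(text, size=200, overlap=40):
    """Split text into overlapping windows of `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def summarize(text, max_words=12):
    """Stand-in for an LLM summary: the first few words of the text."""
    return " ".join(text.split()[:max_words])


def toy_embedding(text, dims=8):
    """Stand-in for an embedding model: deterministic values from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dims]]


def process_document(text, image_captions=(), size=200, overlap=40):
    """Caption, chunk, summarize, and embed; returns one record per chunk."""
    # Fold image captions into the text so they become searchable.
    enriched = " ".join([text, *(f"[Image: {c}]" for c in image_captions)])
    summary = summarize(enriched)
    records = []
    for index, chunk in enumerate(chunk_text(enriched, size, overlap)):
        # Assumption: prefix the summary so each chunk carries wider context.
        contextual = f"{summary}\n{chunk}"
        records.append({
            "chunk_index": index,
            "text": contextual,
            "embedding": toy_embedding(contextual),
        })
    return records


sample = " ".join(f"w{i}" for i in range(10))
records = process_document(sample, image_captions=["pump diagram"], size=4, overlap=1)
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some words twice.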

Achieving Business Outcomes

With this architecture in place, PDI Technologies reports several benefits:

  • Efficiency Boost: Support teams resolve inquiries faster, leading to quicker customer responses.
  • Increased Customer Satisfaction: Accurate and relevant information strengthens customer relationships.
  • Cost Reduction: Automation reduces operational overhead and allows staff to focus on complex issues.
  • Business Flexibility: The solution is adaptable for various business units without extensive redesigns.

Future Enhancements

As PDI continues to evolve PDIQ, plans are underway for additional enhancements, including:

  • New crawlers for additional data sources.
  • Multilingual support for global operations.
  • Advanced document understanding features.

Conclusion

With PDIQ, PDI Technologies has built an AI-driven assistant that makes enterprise knowledge easier to access and improves operational efficiency. Built on scalable AWS services, PDIQ is designed to balance performance, cost, and security. As the company continues to enhance the solution, it aims to change how enterprises manage and access their knowledge assets.


About the Authors

Samit Kumbhani is a Senior Solutions Architect at AWS, focusing on scalable cloud solutions. His diverse interests include cricket and traveling.

Jhorlin De Armas leads AI-driven platform design at PDI Technologies and specializes in serverless architectures.

David Mbonu is an AWS Sr. Solutions Architect with extensive experience in enterprise solutions and focuses on AI/ML innovations.
