
Metagenomi Creates Millions of Innovative Enzymes Economically with AWS Inferentia

Expanding Natural Enzyme Diversity Using Generative AI: Cost-Effective Approaches with Progen2 on AWS

Collaborators:

  • Audra Devoto, Owen Janson, Christopher Brown (Metagenomi)
  • Adam Perry (Tennex)

Overview of Generative AI in Enzyme Development

Implementing Progen2 on AWS Inferentia

Scaling Inference with AWS Batch

Cost Comparisons: Progen2 vs. Traditional Approaches

Scaling Generation to Millions of Proteins

Conclusion

About the Authors

Revolutionizing Enzyme Diversity with Generative AI: Insights from Metagenomi and Tennex

This post was written in collaboration with Audra Devoto, Owen Janson, and Christopher Brown of Metagenomi, along with Adam Perry of Tennex.

Introduction

In the quest for innovative biotechnology solutions, the need for diverse and efficient enzymes has never been more acute. At Metagenomi, we believe that augmenting the extensive natural diversity of high-value enzymes through generative AI, specifically protein language models (pLMs), holds the key to unlocking new therapeutic potential. This approach lets us generate orders of magnitude more predicted examples within targeted enzyme classes, offering a path to discovering variants with enhanced stability, specificity, and efficacy.

The Power of Generative AI

Generative AI empowers researchers to expand the natural enzyme diversity available for therapeutic applications. By leveraging a comprehensive database of known enzymes, we utilize pLMs to create a plethora of enzyme variants, filtering them through multi-model workflows to predict their characteristics. This not only streamlines enzyme engineering but also opens up avenues for developing potentially curative therapeutics using CRISPR gene editing enzymes from our proprietary database, MGXdb.

However, generating these enzymes at scale can be expensive, particularly as model complexity and the number of variants increase. To address this, we developed methods that significantly reduce costs while increasing throughput in enzyme generation.

Cost-Effective High-Throughput Workflows with Progen2 on AWS Inferentia

Introducing Progen2

Our journey toward cost-effective, high-throughput protein design involves deploying the Progen2 autoregressive transformer model on AWS Inferentia. EC2 Inf2 instances, powered by AWS Inferentia2 accelerators, are more cost-efficient for this workload and also offer higher availability as Spot Instances. We initially ran Progen2 on NVIDIA L40S GPUs; that work, refined through trial and error, served as the foundation for this larger-scale initiative.

Migrating to AWS Inferentia required tracing the model and bucketing input sequence lengths to get good performance from Progen2. Although these changes could affect model accuracy, they allowed us to significantly reduce the inference time and cost of our enzyme generation workflows.
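
The post does not give its bucket sizes. As a minimal sketch of the bucketing idea, assume the model has been traced once for each of a few fixed sequence lengths; at run time, each input is padded up to the smallest bucket that fits it, so every call matches a pre-compiled shape. During autoregressive generation, a growing sequence would move to the next bucket once it outgrows the current one. The bucket sizes, padding token id, and function names below are hypothetical.

```python
import bisect
from typing import List, Tuple

# Hypothetical bucket sizes (in tokens); each would correspond to one model
# traced ahead of time at that fixed input length.
BUCKETS = [64, 128, 256, 512, 1024]
PAD_ID = 0  # hypothetical padding token id


def select_bucket(length: int, buckets: List[int] = BUCKETS) -> int:
    """Return the smallest bucket that can hold a sequence of `length` tokens."""
    i = bisect.bisect_left(buckets, length)
    if i == len(buckets):
        raise ValueError(f"length {length} exceeds largest bucket {buckets[-1]}")
    return buckets[i]


def pad_to_bucket(token_ids: List[int], pad_id: int = PAD_ID) -> Tuple[List[int], List[int]]:
    """Right-pad token ids to their bucket length and build a matching attention mask."""
    bucket = select_bucket(len(token_ids))
    n_pad = bucket - len(token_ids)
    padded = token_ids + [pad_id] * n_pad
    mask = [1] * len(token_ids) + [0] * n_pad  # 1 = real token, 0 = padding
    return padded, mask
```

More buckets mean less wasted computation on padding, at the cost of more traced models to compile and keep in memory.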

Testing Model Accuracy

To verify accuracy on EC2 Inf2 instances, we compared sequences generated by the traced and bucketed implementation against those from the native implementation on NVIDIA GPUs. We generated 1,000 protein sequences for each of 10 prompts sourced from UniProtKB and assessed the perplexity and sequence integrity of the results.

The results showed that the tracing and bucketing implementation produced sequences with characteristics similar to those from the native approach, giving us confidence in its reliability for further applications.
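
Perplexity, one of the metrics named above, is the exponential of the negative mean per-token log-likelihood. A minimal sketch, assuming per-token natural-log probabilities have already been obtained by scoring each generated sequence with a model; the 5% tolerance used to call two runs comparable is a hypothetical choice, not a threshold from the original study.

```python
import math
from statistics import mean
from typing import List


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity of one sequence: exp of the negative mean per-token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-mean(token_logprobs))


def mean_perplexity(sequences: List[List[float]]) -> float:
    """Average per-sequence perplexity over all sequences generated for one prompt."""
    return mean(perplexity(s) for s in sequences)


def comparable(ppl_native: float, ppl_inf2: float, rel_tol: float = 0.05) -> bool:
    """Hypothetical acceptance check: mean perplexities within rel_tol of each other."""
    return math.isclose(ppl_native, ppl_inf2, rel_tol=rel_tol)
```

Running `mean_perplexity` separately for the GPU and Inf2 outputs of each prompt yields one pair of numbers per prompt to compare.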

Scaling Inference with AWS Batch

To expand our protein generation capabilities, we turned to AWS Batch, which makes it straightforward to scale computational tasks. By running batch jobs on EC2 Inf2 Spot Instances, we reduced costs by up to 56% compared to our previous implementations.

The architecture orchestrates many batch jobs running in parallel, covering tasks such as downloading models and processing generated sequences. This setup lets us run protein generation efficiently, track outcomes through well-structured pipelines, and manage the infrastructure with little overhead.
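
The post does not describe how work is split across jobs. One common pattern, assumed here, is an AWS Batch array job in which each child job reads the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable that Batch sets and processes one shard of the prompts. The function name and round-robin sharding scheme are hypothetical.

```python
import os
from typing import List


def shard_for_this_job(prompts: List[str], num_shards: int) -> List[str]:
    """Select the prompts this AWS Batch array child job should process.

    AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX (0-based) in each child of an
    array job; a standalone local run falls back to index 0 here.
    """
    index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    if not 0 <= index < num_shards:
        raise ValueError(f"array index {index} out of range for {num_shards} shards")
    # Round-robin assignment: shard i gets prompts i, i + num_shards, i + 2*num_shards, ...
    return prompts[index::num_shards]
```

With an array size equal to `num_shards`, every prompt is handled by exactly one child job, and a Spot interruption only forces a retry of the affected shard.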

Cost Comparisons and Savings

Our primary goal is to make protein sequence generation economical while maximizing the diversity of enzyme classes we can explore. In our recent projects, generating 10,000 sequences with Progen2 on EC2 Inf2 Spot Instances cost drastically less than on our previous setup. Keeping expenses low while maintaining high throughput is crucial for biotechnology startups that need to scale.
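
The post does not publish per-hour prices or run times, so this sketch only shows the arithmetic behind a comparison like this one: cost per sequence is the hourly price times the instance-hours consumed, divided by the number of sequences, and savings are the relative drop from a baseline. All inputs in the usage lines are placeholder values, not AWS prices or Metagenomi measurements.

```python
def cost_per_sequence(hourly_price_usd: float, instance_hours: float, sequences: int) -> float:
    """Cost in USD per generated sequence for one run."""
    if sequences <= 0:
        raise ValueError("sequences must be positive")
    return hourly_price_usd * instance_hours / sequences


def percent_savings(baseline_cost: float, new_cost: float) -> float:
    """Percentage reduction of new_cost relative to baseline_cost."""
    return 100.0 * (baseline_cost - new_cost) / baseline_cost


# Placeholder inputs only; substitute real prices and measured run times.
gpu = cost_per_sequence(hourly_price_usd=2.0, instance_hours=5.0, sequences=10_000)
inf2_spot = cost_per_sequence(hourly_price_usd=1.0, instance_hours=6.0, sequences=10_000)
print(f"savings: {percent_savings(gpu, inf2_spot):.1f}%")
```

For Spot Instances, `instance_hours` should include any time spent re-running work lost to interruptions, or the savings will be overstated.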

Further savings are possible by running jobs at half precision, which, surprisingly, produced sequence generation results equivalent to those at full precision.
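
The post does not say which parts of the workload run in half precision. As a rough intuition only, this sketch rounds a vector of logits to IEEE 754 half precision (Python's `struct` `'e'` format) and measures how far the resulting softmax probabilities move. Real half-precision inference rounds weights and activations throughout the network, so this isolates just one source of error; the logit values in the usage line are made up.

```python
import math
import struct
from typing import List


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_prob_shift(logits: List[float]) -> float:
    """Largest absolute change in any token probability after fp16 rounding of the logits."""
    full = softmax(logits)
    half = softmax([to_float16(v) for v in logits])
    return max(abs(a - b) for a, b in zip(full, half))


# Made-up logits for four tokens.
print(max_prob_shift([2.1, 0.3, -1.7, 4.05]))
```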

Pushing the Boundaries: Generating Millions of Proteins

To test our optimized workflows at scale, we ran extensive trials fine-tuning models on enzymes from Metagenomi's extensive database. Using our AWS AI pipeline, we generated over 1 million enzyme sequences, experimenting with parameters such as sampling method and generation temperature.
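
The post does not name the sampling methods it compared. As one common example, this sketch combines temperature scaling with nucleus (top-p) sampling over a vector of next-token logits, such as the amino-acid token logits an autoregressive model like Progen2 produces at each step. The function and its defaults are illustrative, not Progen2's own API.

```python
import math
import random
from typing import List, Optional


def sample_token(logits: List[float], temperature: float = 1.0, top_p: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Sample one token index using temperature scaling plus nucleus (top-p) filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if rng is None:
        rng = random.Random()
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [v / temperature for v in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(v - m) for v in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Keep the smallest set of most probable tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # random.choices accepts unnormalized weights, so no renormalization is needed.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Lower temperatures and smaller `top_p` values concentrate sampling on the most likely residues; higher values increase diversity at the risk of producing less plausible sequences.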

In later phases, we validated the generated sequences with hybrid techniques that combine AI-based and traditional approaches, ensuring that our outputs were both novel and valid.
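
The post does not detail its hybrid validation. A cheap rule-based prefilter is one plausible "traditional" component that could run before more expensive model-based checks; this sketch rejects sequences with non-standard residues, out-of-range lengths, or long single-residue runs (a crude low-complexity signal). All thresholds and names are hypothetical.

```python
from itertools import groupby

# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated residue."""
    return max((sum(1 for _ in run) for _, run in groupby(seq)), default=0)


def passes_basic_checks(seq: str, min_len: int = 50, max_len: int = 2000,
                        max_run: int = 8) -> bool:
    """Hypothetical prefilter applied before any model-based validation."""
    seq = seq.strip().upper()
    if not min_len <= len(seq) <= max_len:
        return False
    if not set(seq) <= STANDARD_AMINO_ACIDS:
        return False
    return longest_homopolymer(seq) <= max_run
```

Filtering cheaply first keeps the slower, AI-based checks focused on sequences that are at least well-formed.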

Conclusion

In summary, we have outlined practical methods to reduce the cost of large-scale protein design projects by up to 56% using EC2 Inf2 instances. This major step has allowed Metagenomi to explore the frontier of enzyme diversity and discover millions of novel enzymes across high-value protein classes.

With AWS Inferentia at our disposal, we aspire to foster innovation in protein generation, making advanced biotechnology applications more accessible and economically viable. To learn more about EC2 Inf instances and implement your own workflows, check out the AWS Neuron documentation.

About the Authors

Audra Devoto: Data Scientist with expertise in metagenomics and large genomics datasets on AWS.

Owen Janson: Bioinformatics Engineer focused on cloud infrastructure for genomic analysis.

Adam Perry: Co-Founder of Tennex, specializing in AWS cloud architecture for biotech startups.

Christopher Brown, PhD: Head of Discovery at Metagenomi, an expert in enzyme systems for gene editing.

Jamal Arif: Senior Solutions Architect at AWS, focusing on AI and cloud-native architectures.

Pavel Novichkov, PhD: Senior Solutions Architect at AWS, specializing in genomics and life sciences.

Explore the future of biotechnology with us as we leverage generative AI to pioneer the next generation of enzyme diversity!
