Mastering einsum for Deep Learning: Building a Transformer with Multi-Head Self-Attention from the Ground Up


If you are a machine learning researcher or engineer nowadays, you should definitely be familiar with einsum operations!

Personally speaking, I used to give up on reading Git repos because of einsum operations. The reason: even though I felt pretty comfortable with tensor operations, einsum was not in my arsenal.

Long story short, I decided to get familiar with the einsum notation. Since I am particularly interested in transformers and self-attention in computer vision, I have a huge playground.

In this article, I will get familiar with einsum (in PyTorch), and in parallel I will implement the famous self-attention layer, and finally a vanilla Transformer.

The code is totally educational! I haven’t trained any large self-attention model yet, but I plan to. Truthfully speaking, I learned much more in the process than I initially expected.

If you want to delve into the theory first, feel free to check my articles on attention and transformers.

If not, let the game begin!

The code of this tutorial is available on GitHub. Show your support with a star!

**Why einsum?**
First, einsum notation is all about elegant and clean code. Many AI industry specialists and researchers use it consistently.

To convince you even more, let’s see an example:

Suppose you want to merge two dimensions of a 4D tensor: the first and the last.

```
x = einops.rearrange(x, 'b c h w -> (b w) c h')
```

Neat and clean!
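For comparison, here is a sketch of the same merge with plain PyTorch calls (the concrete shapes are mine, chosen for illustration):

```python
import torch

x = torch.randn(2, 3, 4, 5)  # (b, c, h, w)

# einops one-liner: einops.rearrange(x, 'b c h w -> (b w) c h')
# Plain PyTorch equivalent: bring w next to b, then flatten them together.
y = x.permute(0, 3, 1, 2)                  # (b, w, c, h)
y = y.reshape(-1, x.shape[1], x.shape[2])  # (b*w, c, h)
print(y.shape)                             # torch.Size([10, 3, 4])
```

With the einops version, the permute and reshape bookkeeping disappears and the axis names document themselves.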

Second reason: if you care about batched implementations of custom layers with multi-dimensional tensors, einsum should definitely be in your arsenal!

Third reason: translating code from PyTorch to TensorFlow or NumPy becomes trivial.

I am completely aware that it takes time to get used to it. That’s why I decided to practice by implementing some self-attention mechanisms.

**The einsum and einops notation basics**
If you already know the basics of einsum and einops, you may skip this section.
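To set the stage, here is a minimal sketch of the einsum patterns used throughout the article; the shapes are illustrative, not from the repo:

```python
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)

# Matrix multiplication: the shared index j is summed over.
mm = torch.einsum('ij,jk->ik', a, b)      # (3, 5)

# Transpose: just permute the output indices.
t = torch.einsum('ij->ji', a)             # (4, 3)

# Batched matrix multiplication: the batch index b is kept in the output.
x = torch.randn(8, 3, 4)
y = torch.randn(8, 4, 5)
bmm = torch.einsum('bij,bjk->bik', x, y)  # (8, 3, 5)

# Elementwise product followed by a full sum (a Frobenius inner product).
s = torch.einsum('ij,ij->', a, a)         # scalar
```

The rule of thumb: indices that appear in the inputs but not in the output are summed over; indices in the output are kept.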

**Scaled dot product self-attention**
Implementation details and code explanation.
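The layer can be sketched with einsum as follows. This is a minimal educational sketch with my own module and variable names (`SelfAttention`, `to_qkv`), not the repo's exact code:

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Scaled dot-product self-attention with a fused qkv projection."""

    def __init__(self, dim):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)

    def forward(self, x):
        # x: (batch, tokens, dim)
        qkv = self.to_qkv(x)                  # (b, t, 3*dim)
        q, k, v = qkv.chunk(3, dim=-1)        # each (b, t, dim)
        # Dot product between every query i and every key j.
        scores = torch.einsum('bid,bjd->bij', q, k) * self.scale
        attn = scores.softmax(dim=-1)         # (b, tokens, tokens)
        # Weighted sum of the values along the key axis.
        return torch.einsum('bij,bjd->bid', attn, v)
```

A quick shape check: `SelfAttention(64)(torch.randn(2, 10, 64))` returns a `(2, 10, 64)` tensor, the same shape as the input.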

**Multi-Head Self-Attention**
Introduction to multiple heads in computations and implementation details.

**TransformerEncoder**
Building Transformer blocks and Transformer Encoder using the implemented modules.
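The overall shape of the encoder can be sketched as below. To keep the sketch self-contained I use PyTorch's built-in `nn.MultiheadAttention` in place of the einsum implementation, and the hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-norm encoder block: attention + MLP, each with a residual."""

    def __init__(self, dim, heads=8, mlp_dim=256):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim), nn.GELU(), nn.Linear(mlp_dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                  # residual around attention
        x = x + self.mlp(self.norm2(x))   # residual around the MLP
        return x

class TransformerEncoder(nn.Module):
    """A stack of identical encoder blocks."""

    def __init__(self, dim, depth=6, heads=8, mlp_dim=256):
        super().__init__()
        self.blocks = nn.ModuleList(
            TransformerBlock(dim, heads, mlp_dim) for _ in range(depth))

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x
```

Since every block maps `(batch, tokens, dim)` to the same shape, the blocks compose freely and the depth is just a hyperparameter.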

**Conclusion**
It took me some time to solidify my understanding of self-attention and einsum, but it was a fun ride. In the next article, I will try to implement more advanced self-attention blocks for computer vision. Meanwhile, use our GitHub repository in your next project and let us know how it goes.

Don’t forget to star our repository to show us your support!

If you feel like your PyTorch fundamentals need some extra practice, learn from the best ones out there. Use the code aisummer35 to get an exclusive 35% discount from your favorite AI blog 🙂

**Acknowledgments**
A huge shout out to Alex Rogozhnikov (@arogozhnikov) for the awesome einops lib.

Here is a list of other resources that significantly accelerated my learning of einsum operations, attention, and transformers.

Deep Learning in Production Book: Learn how to build, train, deploy, scale and maintain deep learning models. Understand ML infrastructure and MLOps using hands-on examples.

