Self-Supervised Learning on CIFAR Images with PyTorch Code: BYOL Tutorial

Unveiling the BYOL Method: A Step-by-Step Tutorial for Self-Supervised Learning

After presenting SimCLR, a contrastive self-supervised learning framework, I decided to demonstrate another well-known method, called BYOL. Bootstrap Your Own Latent (BYOL) is an algorithm for self-supervised learning of image representations. BYOL has two main advantages:

It does not explicitly use negative samples. Instead, it directly minimizes the distance between representations of the same image under different augmented views (the positive pair). Negative samples are images from the batch other than the positive pair.
BYOL is claimed to be more robust to small batch sizes, which makes it an attractive choice.
Below, you can examine the method. Unlike the original paper, I call the online network student and the target network teacher.

Overview of the BYOL method. Source: BYOL paper

Online network, aka student: compared to SimCLR, there is a second MLP, called the predictor, which makes the whole method asymmetric. Asymmetric compared to what? Well, to the teacher model (the target network).

Why is that important?

Because the teacher model is updated only through an exponential moving average (EMA) of the student’s parameters. Ultimately, at each iteration, only a tiny percentage (less than 1%) of the student’s parameters is passed to the teacher. Thus, gradients flow only through the student network.
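To make the update concrete, here is a minimal PyTorch sketch of the EMA step, assuming `teacher` and `student` are modules with identical architectures; the function name and the default decay `tau = 0.996` are illustrative, not the article’s exact code.

```python
import torch


@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, tau: float = 0.996):
    """Update teacher weights as an exponential moving average of the student.

    Only (1 - tau) of each student parameter leaks into the teacher per step,
    so with tau = 0.996 roughly 0.4% is transferred at every iteration.
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.data.mul_(tau).add_(s_param.data, alpha=1 - tau)
```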

Another key difference between SimCLR and BYOL is the loss function.

Loss function
The predictor MLP is only applied to the student, making the architecture asymmetric. This is a key design choice to avoid mode collapse. Mode collapse here would mean outputting the same projection for all inputs.

Finally, the authors defined the following mean squared error between the L2-normalized predictions and target projections:
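In the notation of the BYOL paper, with prediction $q_\theta(z_\theta)$ from the student and target projection $z'_\xi$ from the teacher, the loss is:

$$
\mathcal{L}_{\theta,\xi} \;=\; \left\| \bar{q}_\theta(z_\theta) - \bar{z}'_\xi \right\|_2^2 \;=\; 2 - 2\cdot\frac{\langle q_\theta(z_\theta),\, z'_\xi\rangle}{\left\|q_\theta(z_\theta)\right\|_2\,\left\|z'_\xi\right\|_2},
$$

where the bars denote L2-normalized vectors. The same term is also computed with the two augmented views swapped between student and teacher, and the two terms are summed.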

The L2 loss can be implemented as follows. L2 normalization is applied beforehand.
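A straightforward PyTorch version of this loss might look like the following; `F.normalize` handles the L2 normalization, and the function name is illustrative rather than the article’s exact implementation.

```python
import torch
import torch.nn.functional as F


def byol_loss(prediction: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """MSE between L2-normalized prediction and target projections.

    For unit vectors, ||p - z||^2 = 2 - 2 * <p, z>, so minimizing this loss
    maximizes the cosine similarity of the positive pair.
    """
    prediction = F.normalize(prediction, dim=-1, p=2)
    target = F.normalize(target, dim=-1, p=2)
    return (2 - 2 * (prediction * target).sum(dim=-1)).mean()
```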

Code is available on GitHub

Tracking down what’s happening in self-supervised pretraining: KNN accuracy
However, the loss in self-supervised learning is not a reliable metric to track. The best way I found to monitor what’s happening during training is to measure the KNN accuracy.

The critical advantage of using KNN is that we don’t have to train a linear classifier on top each time, so it’s faster and completely unsupervised.

Note: measuring KNN accuracy only applies to image classification, but you get the idea. For this purpose, I made a class to encapsulate the logic of KNN in our context:
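A possible sketch of such a class, assuming frozen backbone features are classified with scikit-learn’s `KNeighborsClassifier`; the class name and method signatures are illustrative, not the article’s exact code.

```python
import torch
from sklearn.neighbors import KNeighborsClassifier


class KNNEvaluator:
    """Fit a KNN classifier on frozen backbone features and report accuracy.

    The backbone runs in eval mode with no gradients; labels are used only
    for evaluation, so the pretraining itself stays fully unsupervised.
    """

    def __init__(self, k: int = 5):
        self.k = k

    @torch.no_grad()
    def _extract(self, backbone, loader, device):
        feats, labels = [], []
        backbone.eval()
        for images, targets in loader:
            feats.append(backbone(images.to(device)).flatten(1).cpu())
            labels.append(targets)
        return torch.cat(feats).numpy(), torch.cat(labels).numpy()

    def evaluate(self, backbone, train_loader, val_loader, device="cuda"):
        x_train, y_train = self._extract(backbone, train_loader, device)
        x_val, y_val = self._extract(backbone, val_loader, device)
        knn = KNeighborsClassifier(n_neighbors=self.k).fit(x_train, y_train)
        return knn.score(x_val, y_val)  # top-1 accuracy on the validation set
```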

The article then continues to discuss modifying ResNet by adding MLP projection heads and implementing the actual BYOL method. It also details the results of the method in terms of KNN accuracy and pretraining epochs.
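For reference, a projection or prediction head along the lines the article describes could be sketched as below; the hidden and output sizes follow the BYOL paper’s defaults and are assumptions here, since the article’s CIFAR-10 settings may differ.

```python
import torch.nn as nn


def mlp_head(in_dim: int, hidden_dim: int = 4096, out_dim: int = 256) -> nn.Sequential:
    """Projection/prediction head placed on top of the backbone features.

    Dimensions are illustrative; the BYOL paper uses a hidden size of 4096
    and an output size of 256 for its ImageNet-scale experiments.
    """
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim),
    )
```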

In conclusion, the article provides a comprehensive overview of the BYOL method, its implementation, and the results obtained through pretraining on CIFAR-10. It showcases the power of self-supervised learning and the impressive results that can be achieved without any labeled data. The article also provides resources for further learning and exploration in the field of deep learning and self-supervised pretraining.
