Training Llama 3.3 Swallow: A Japanese Sovereign LLM Using Amazon SageMaker HyperPod

A Technical Report Overview by Kazuki Fujii, Lead Developer

The development of Llama 3.3 Swallow, led by Kazuki Fujii, marks a significant milestone in Japanese language processing. This post summarizes the technical report on the project, spearheaded by the Institute of Science Tokyo, which used Amazon SageMaker HyperPod to train a 70-billion-parameter large language model (LLM). The model substantially improves Japanese language capabilities, outperforming industry models such as GPT-4o-mini.

Overview of Llama 3.3 Swallow

Llama 3.3 Swallow builds on Meta’s Llama 3.3 architecture with enhancements tailored for Japanese. It was developed jointly by the Okazaki Laboratory and the Yokota Laboratory at the School of Computing, Institute of Science Tokyo, together with the National Institute of Advanced Industrial Science and Technology (AIST), and is available in two variants (a base model and an instruction-tuned model) on Hugging Face.

Training Methodology

The base model was produced by continual pre-training from Meta’s Llama 3.3 70B Instruct model, using the Swallow Corpus Version 2, a curated Japanese web corpus derived from Common Crawl. The team applied the Swallow Education Classifier to extract high-quality documents, yielding a training set of approximately 314 billion tokens.
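To make the curation step concrete, here is a minimal sketch of classifier-based corpus filtering; the `score_quality` function, file names, and 0.5 threshold are illustrative placeholders, not the team’s actual pipeline.

```python
# Illustrative sketch of classifier-based corpus filtering (not the team's actual code).
# Assumes documents are stored as JSON Lines with a "text" field and that a quality
# classifier returns a score in [0, 1]; the 0.5 threshold is a placeholder.
import json

def score_quality(text: str) -> float:
    """Placeholder for a quality/educational-value classifier such as the
    Swallow Education Classifier; returns a score between 0 and 1."""
    return 1.0 if len(text) > 200 else 0.0  # trivial stand-in heuristic

def filter_corpus(in_path: str, out_path: str, threshold: float = 0.5) -> None:
    kept = total = 0
    with open(in_path, encoding="utf-8") as fin, open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            total += 1
            doc = json.loads(line)
            if score_quality(doc["text"]) >= threshold:
                fout.write(line)
                kept += 1
    print(f"kept {kept}/{total} documents")

filter_corpus("common_crawl_ja.jsonl", "swallow_corpus_filtered.jsonl")
```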

For the instruction-tuned variant, fine-tuning focused solely on Japanese dialogue and code generation tasks. By deliberately excluding English dialogue data, the team maintained a firm focus on enhancing Japanese capabilities.
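As a rough illustration of that data-mix decision, the sketch below keeps code-generation examples and Japanese dialogue while dropping English dialogue; the language heuristic and example records are hypothetical, not the report’s actual filtering rules.

```python
# Illustrative sketch: keeping only Japanese dialogue plus code-generation examples
# in an instruction-tuning mix (the real data sources are described in the report).
import re

def looks_japanese(text: str) -> bool:
    # Heuristic: contains Hiragana, Katakana, or CJK ideographs.
    return re.search(r"[\u3040-\u30ff\u4e00-\u9fff]", text) is not None

examples = [
    {"prompt": "富士山の高さは?", "task": "dialogue"},
    {"prompt": "Explain recursion.", "task": "dialogue"},   # English dialogue: excluded
    {"prompt": "Write a quicksort in Python.", "task": "code"},
]

kept = [ex for ex in examples
        if ex["task"] == "code" or looks_japanese(ex["prompt"])]
print(len(kept), "examples kept")
```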

Performance and Benchmarks

In evaluations, the base model demonstrated strong understanding and generation of Japanese text, consistently outperforming leading models such as OpenAI’s GPT-4o and GPT-3.5 on Japanese benchmarks. The instruction-tuned model performed especially well on Japanese MT-Bench.

Training Infrastructure Architecture

The training infrastructure was built on Amazon SageMaker HyperPod, with an emphasis on performance, scalability, and observability. Using 32 ml.p5.48xlarge instances (each with 8 NVIDIA H100 80 GB GPUs, for 256 GPUs in total), the team ran continual pre-training for 16 days and 6 hours.
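A quick back-of-envelope calculation from those figures, assuming the whole fleet was busy for the full duration:

```python
# Back-of-envelope compute budget from the figures above:
# 32 ml.p5.48xlarge instances x 8 NVIDIA H100 GPUs each, running 16 days 6 hours.
instances = 32
gpus_per_instance = 8
hours = 16 * 24 + 6                              # 390 hours

total_gpus = instances * gpus_per_instance       # 256 GPUs
gpu_hours = total_gpus * hours                   # 99,840 GPU-hours
print(total_gpus, "GPUs,", gpu_hours, "GPU-hours")
```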

High-Performance Networking: The deployment leveraged NCCL over Elastic Fabric Adapter (EFA) for rapid inter-GPU communication, essential for distributed training.
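As an illustration of that networking setup, these are commonly documented environment settings for running NCCL over EFA on AWS; they are typical values, not necessarily the exact configuration used for this training run.

```python
# Typical environment variables for NCCL-over-EFA training jobs on AWS
# (commonly documented settings, not necessarily the exact values used for Swallow).
import os

os.environ["FI_PROVIDER"] = "efa"            # use the EFA libfabric provider
os.environ["FI_EFA_USE_DEVICE_RDMA"] = "1"   # enable GPUDirect RDMA where supported
os.environ["NCCL_DEBUG"] = "INFO"            # log NCCL transport selection for verification
# The AWS-OFI-NCCL plugin maps NCCL collectives onto libfabric/EFA.
```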

Storage Architecture: A hierarchical storage approach was implemented, combining Amazon S3 for long-term storage with FSx for Lustre as a high-performance parallel file system, ensuring efficient data access crucial for training tasks.
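The pattern can be illustrated with a small staging sketch: bulk data lives in Amazon S3 and the training job reads from the FSx for Lustre mount. The bucket, key, and mount path below are hypothetical.

```python
# Illustrative staging step for the hierarchical storage pattern described above:
# long-term data lives in Amazon S3, while training reads from an FSx for Lustre
# mount (here assumed at /fsx). Bucket and key names are hypothetical.
import boto3

s3 = boto3.client("s3")
s3.download_file(
    Bucket="example-swallow-corpus",          # hypothetical bucket
    Key="tokenized/part-00000.bin",           # hypothetical object key
    Filename="/fsx/datasets/part-00000.bin",  # FSx for Lustre mount point
)
# In practice, FSx for Lustre can also be linked to the S3 bucket as a data repository,
# so objects are lazy-loaded into the file system on first access instead of copied manually.
```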

Software Stack and Optimizations

Built on the SageMaker HyperPod DLAMI, the software stack integrated CUDA drivers, NCCL, and the AWS-OFI-NCCL plugin for optimal performance. Megatron-LM served as the primary training framework, providing advanced features for scaling LLM training, including sophisticated model-parallelism techniques.
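A simple way to confirm that stack is wired up correctly is to query it from PyTorch; this diagnostic snippet is illustrative and not part of the Swallow training code.

```python
# Quick sanity check of the GPU software stack from PyTorch
# (an illustrative diagnostic, not part of the Swallow training code).
import torch

print("CUDA available:", torch.cuda.is_available())
print("CUDA version  :", torch.version.cuda)
print("NCCL version  :", torch.cuda.nccl.version())
print("GPUs visible  :", torch.cuda.device_count())
```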

Advanced Parallelism and Communication

The 4D parallelism strategy combined data, tensor, pipeline, and sequence parallelism to maximize GPU utilization. In addition, overlapping communication across these parallelism domains with computation significantly reduced blocking time and improved overall efficiency.
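For orientation, here is how such a layout is typically expressed with Megatron-LM command-line flags; the specific parallel degrees below are placeholders chosen to multiply out to 256 GPUs and may differ from the configuration in the report.

```python
# Illustrative Megatron-LM style flags for a 4D parallel layout on 256 GPUs.
# The degrees below (TP=8, PP=8, DP=4, plus sequence parallelism) are placeholders
# chosen so that 8 * 8 * 4 = 256; the report's actual configuration may differ.
megatron_args = [
    "--tensor-model-parallel-size", "8",     # tensor parallelism within a node
    "--pipeline-model-parallel-size", "8",   # pipeline stages across nodes
    "--sequence-parallel",                   # shard activations along the sequence dimension
    "--use-distributed-optimizer",           # shard optimizer state across data-parallel ranks
    "--overlap-grad-reduce",                 # overlap gradient all-reduce with the backward pass
    "--overlap-param-gather",                # overlap parameter all-gather with the forward pass
]
# Data parallelism is implicit: world_size / (TP * PP) = 256 / 64 = 4 replicas.
print(" ".join(megatron_args))
```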

Checkpointing and Experiment Management

An optimized checkpointing strategy shortened save times and minimized training interruptions. A newly developed memory prediction tool helped the team monitor GPU memory usage and optimize configuration settings.
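In the spirit of such a memory-prediction tool, the sketch below gives a simplified per-GPU memory estimate (bf16 weights and gradients plus sharded fp32 Adam states, ignoring activations); the parallel degrees are placeholders, so the numbers are only indicative.

```python
# Rough per-GPU memory estimate in the spirit of a memory-prediction tool.
# Simplified model: bf16 weights + bf16 gradients + fp32 Adam states (param copy,
# momentum, variance), ignoring activation memory; parallel degrees are placeholders.
params = 70e9                  # 70B parameters
tp, pp, dp = 8, 8, 4           # tensor / pipeline / data parallel degrees (illustrative)

params_per_gpu = params / (tp * pp)                # model weights sharded over TP x PP
weights_grads = params_per_gpu * (2 + 2)           # bf16 weights + bf16 gradients (bytes)
optimizer = params_per_gpu * 12 / dp               # fp32 Adam states, sharded over DP (bytes)
total_gib = (weights_grads + optimizer) / 2**30

print(f"~{total_gib:.0f} GiB per H100 (80 GiB) before activations")
```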

Conclusion

The Llama 3.3 Swallow project showcases innovative methods in large language model training and cloud infrastructure, pushing the boundaries of AI capabilities in the Japanese language. The insights gained from this endeavor offer valuable lessons for future research, development, and applications in various domains.

As the team continues to refine training pipelines and enhance Japanese language capabilities, they plan to open source optimization tools developed during the project, fostering collaboration and innovation within the AI community.


Resources and References

For further reading and access to the model, visit Hugging Face.

About the Authors

The development team includes Kazuki Fujii, a master’s student at the Institute of Science Tokyo, and senior specialists from Amazon Web Services, each contributing their expertise in machine learning and high-performance computing.


This post serves not only as an overview of the technical report but also as a call to action for researchers and engineers enthusiastic about advancing machine learning in Japanese language applications.
