Announcing the Day Zero Availability of NVIDIA Nemotron 3 Nano Omni on Amazon SageMaker JumpStart: Transforming Multimodal Intelligence for Enterprises

We’re excited to announce the day-zero availability of NVIDIA Nemotron 3 Nano Omni on Amazon SageMaker JumpStart. This state-of-the-art model integrates video, audio, image, and text understanding into a single, efficient architecture, enabling enterprise customers to build intelligent applications that can "see," "hear," and "reason" across modalities in a single inference pass.

In this post, we’ll guide you through the architecture and key features of Nemotron 3 Nano Omni, delve into the enterprise applications it enables, and provide insights on how to deploy and run inference using Amazon SageMaker JumpStart.

Overview of NVIDIA Nemotron 3 Nano Omni

The NVIDIA Nemotron 3 Nano Omni is an open multimodal large language model with 30 billion total parameters and 3 billion active parameters. Built on a hybrid Mamba2-Transformer Mixture of Experts (MoE) architecture, it integrates three main components:

  • Nemotron 3 Nano LLM: Serves as the backbone for language processing.
  • CRADIO v4-H: Functions as the vision encoder for understanding images and videos.
  • Parakeet: Acts as the speech encoder for audio transcription and comprehension.

This unified architecture allows the model to process various input types—video, audio, images, and text—while generating text output. The model offers a 131K-token context length and supports chain-of-thought reasoning, tool calling, JSON output, and word-level timestamps for transcription tasks. On SageMaker JumpStart it runs at FP8 precision, balancing accuracy and efficiency for enterprise workloads, and it is available under the NVIDIA Open Model Agreement for commercial use.

Transforming Enterprise Agent Workflows

Modern enterprise workflows are inherently multimodal. Agents must interpret screens, documents, audio, video, and text, often within the same reasoning loop. Traditionally, organizations have relied on separate models for vision, speech, and language, leading to increased latency, complex orchestration, fragmented context, and escalating costs.

Nemotron 3 Nano Omni addresses these challenges by serving as the multimodal perception and context sub-agent in a larger architectural ecosystem. It effectively equips agents with the ability to read screens, interpret documents, transcribe speech, and analyze video, all while maintaining a unified multimodal context across reasoning loops.

Input Capabilities

The model supports an array of input types:

Input Type | Supported Formats | Constraints
-----------|-------------------|------------------------------------
Video      | mp4               | Up to 2 minutes, 256 frames max
Audio      | wav, mp3          | Up to 1 hour, 8 kHz+ sampling rate
Image      | JPEG, PNG (RGB)   | Standard resolution
Text       | String            | Up to 131K-token context
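The limits above can be captured in a small pre-flight check before a request is built. The following is an illustrative sketch: the numeric limits come from the table, but the helper itself is ours and is not part of the SageMaker or NVIDIA SDKs.

```python
# Pre-flight check mirroring the documented input constraints.
# The limits come from the table above; this helper is illustrative only.

MAX_VIDEO_SECONDS = 120       # up to 2 minutes
MAX_VIDEO_FRAMES = 256        # 256 frames max
MAX_AUDIO_SECONDS = 3600      # up to 1 hour
MIN_AUDIO_SAMPLE_RATE = 8000  # 8 kHz or higher

def check_media(kind, **meta):
    """Return True if the described input fits the model's limits."""
    if kind == "video":
        return (meta["seconds"] <= MAX_VIDEO_SECONDS
                and meta["frames"] <= MAX_VIDEO_FRAMES)
    if kind == "audio":
        return (meta["seconds"] <= MAX_AUDIO_SECONDS
                and meta["sample_rate"] >= MIN_AUDIO_SAMPLE_RATE)
    raise ValueError(f"unknown input kind: {kind}")

print(check_media("video", seconds=90, frames=200))           # within limits
print(check_media("audio", seconds=5400, sample_rate=16000))  # too long
```

Running a check like this client-side avoids paying for an endpoint round trip only to have an oversized input rejected.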

Enterprise Use Cases

The multimodal prowess of Nemotron 3 Nano Omni empowers various enterprise applications:

  1. Computer Use Agents: Powering the perception loop for agents navigating user interfaces, it simplifies operations across incident management dashboards, browser automation, and more.
  2. Document Intelligence: Capable of interpreting complex documents, charts, and mixed media, it enhances workflows for compliance involving contracts and financial documents.
  3. Audio and Video Understanding: In customer service or research workflows, it maintains continuous context across audio and video, enriching applications from meeting analysis to verification tasks.

Getting Started with SageMaker JumpStart

Deploying Nemotron 3 Nano Omni with Amazon SageMaker JumpStart is straightforward. You can do this in just a few steps:

Deploy via SageMaker Studio

  1. Open Amazon SageMaker Studio.
  2. In the left navigation panel, select JumpStart.
  3. Search for Nemotron 3 Nano Omni.
  4. Choose the model card and click Deploy.
  5. Configure instance settings and click Deploy to create the endpoint.

Deploy via SageMaker Python SDK

For programmatic deployment, use the SageMaker Python SDK as follows:

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(
    model_id="huggingface-vlm-nvidia-nemotron3-nano-omni-30ba3b-reasoning-fp8",
    role="",  # your SageMaker execution role ARN
)

predictor = model.deploy(
    accept_eula=True,  # deployment requires accepting the model's EULA
)

Running Inference

Once deployed, you can send multimodal requests to the endpoint, starting with image understanding:

import base64

def encode_image(image_path):
    # Read the image bytes and return them as a base64-encoded string
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

image_b64 = encode_image("example.jpg")

# OpenAI-style chat payload: a text prompt plus an inline base64 image
payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    "max_tokens": 1024,
    "temperature": 0.2,
}

response = predictor.predict(payload)
print(response["choices"][0]["message"]["content"])
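Audio requests can follow the same pattern. The sketch below assumes the endpoint accepts an OpenAI-style `audio_url` content part carrying a base64 data URI, mirroring the image example; verify the exact content type and field names against your endpoint's documentation before relying on them.

```python
import base64

def encode_audio(audio_path):
    # Read the audio bytes and return them as a base64-encoded string
    with open(audio_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def build_audio_payload(audio_b64,
                        prompt="Transcribe this audio with word-level timestamps."):
    # Assumed request shape, mirroring the image example above;
    # the "audio_url" content type may differ on your endpoint.
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "audio_url",
                 "audio_url": {"url": f"data:audio/wav;base64,{audio_b64}"}},
            ],
        }],
        "max_tokens": 1024,
        "temperature": 0.2,
    }

# payload = build_audio_payload(encode_audio("meeting.wav"))
# response = predictor.predict(payload)
```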

Recommended Inference Parameters

Depending on your use case, certain hyperparameter settings may enhance your results:

Mode     | Temperature | top_p | max_tokens | Use Case
---------|-------------|-------|------------|-------------------
Thinking | 0.6         | 0.95  | 20480      | Complex reasoning
Instruct | 0.2         | N/A   | 1024       | General tasks, ASR

For reasoning and complex tasks, enable Thinking mode; for straightforward requests, use Instruct mode for quicker responses.
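The table can be encoded as a small lookup used when building payloads. The helper below is ours, not part of any SDK; the values are the recommendations above.

```python
# Recommended sampling parameters from the table above.
# The lookup and helper are illustrative, not part of any SDK.
RECOMMENDED_PARAMS = {
    "thinking": {"temperature": 0.6, "top_p": 0.95, "max_tokens": 20480},
    "instruct": {"temperature": 0.2, "max_tokens": 1024},
}

def inference_params(mode):
    """Return a copy of the recommended parameters for a mode."""
    try:
        return dict(RECOMMENDED_PARAMS[mode])
    except KeyError:
        raise ValueError(f"mode must be one of {sorted(RECOMMENDED_PARAMS)}")

# Merge the recommendations into a request payload:
payload = {"messages": [], **inference_params("instruct")}
print(payload["temperature"])  # 0.2
```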

Conclusion

The NVIDIA Nemotron 3 Nano Omni brings a new level of multimodal AI to Amazon SageMaker JumpStart. By unifying video, audio, image, and text understanding in a single efficient model, it streamlines the development of agentic applications while delivering state-of-the-art accuracy and efficiency.

Start leveraging Nemotron 3 Nano Omni for your next intelligent application by deploying it today from Amazon SageMaker JumpStart. For more details, visit the NVIDIA Nemotron model page on Hugging Face.


About the Authors

Dan Ferguson is a Solutions Architect at AWS based in New York, focusing on supporting customers in efficient and sustainable ML integration.

Malav Shastri is a Software Development Engineer at AWS on the Amazon SageMaker JumpStart and Amazon Bedrock teams, working to broaden access to state-of-the-art models.

Vivek Gangasani leads Solutions Architecture for SageMaker Inference, focusing on deploying and optimizing generative AI models and workflows.

