Announcing the Day Zero Availability of NVIDIA Nemotron 3 Nano Omni on Amazon SageMaker JumpStart
We’re excited to announce the day zero availability of NVIDIA Nemotron 3 Nano Omni on Amazon SageMaker JumpStart. This state-of-the-art model integrates the understanding of video, audio, images, and text into a single, efficient architecture, empowering enterprise customers to build intelligent applications that can "see," "hear," and "reason" across modalities in a single inference pass.
In this post, we’ll guide you through the architecture and key features of Nemotron 3 Nano Omni, delve into the enterprise applications it enables, and provide insights on how to deploy and run inference using Amazon SageMaker JumpStart.
Overview of NVIDIA Nemotron 3 Nano Omni
NVIDIA Nemotron 3 Nano Omni is an open, multimodal large language model with 30 billion total parameters and 3 billion active parameters. Built on a Mamba2-Transformer hybrid Mixture of Experts (MoE) architecture, it integrates three main components:
- Nemotron 3 Nano LLM: Serves as the backbone for language processing.
- CRADIO v4-H: Functions as the vision encoder for understanding images and videos.
- Parakeet: Acts as the speech encoder for audio transcription and comprehension.
This unified architecture allows the model to process various input types—video, audio, images, and text—while generating text output. The model offers a 131K-token context length and supports chain-of-thought reasoning, tool calling, JSON output, and word-level timestamps for transcription tasks. On SageMaker JumpStart it runs at FP8 precision, balancing accuracy and efficiency for enterprise workloads, and it is available under the NVIDIA Open Model Agreement for commercial use.
Transforming Enterprise Agent Workflows
Modern enterprise workflows are inherently multimodal. Agents must interpret screens, documents, audio, video, and text often within the same reasoning loop. Traditionally, organizations have relied on separate models for vision, speech, and language, leading to increased latency, complicated orchestration, fragmented context, and escalating costs.
Nemotron 3 Nano Omni addresses these challenges by serving as the multimodal perception and context sub-agent in a larger architectural ecosystem. It effectively equips agents with the ability to read screens, interpret documents, transcribe speech, and analyze video, all while maintaining a unified multimodal context across reasoning loops.
Input Capabilities
The model supports an array of input types:
| Input Type | Supported Formats | Constraints |
|---|---|---|
| Video | mp4 | Up to 2 minutes, 256 frames max |
| Audio | wav, mp3 | Up to 1 hour, 8kHz+ sampling rate |
| Image | JPEG, PNG (RGB) | Standard resolution |
| Text | String | Up to 131K context |
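As a quick pre-flight check before sending a request, the constraints in the table above can be encoded in a small validator. This is only a sketch: the function name and keyword arguments are hypothetical, and the limits simply mirror the table.

```python
# Limits taken from the input-constraints table above.
MAX_VIDEO_SECONDS = 120       # video: up to 2 minutes
MAX_VIDEO_FRAMES = 256        # video: 256 frames max
MAX_AUDIO_SECONDS = 3600      # audio: up to 1 hour
MIN_AUDIO_SAMPLE_RATE = 8000  # audio: 8 kHz+ sampling rate
MAX_TEXT_TOKENS = 131_072     # text: up to 131K-token context

def validate_input(kind: str, **props) -> bool:
    """Return True if the input satisfies the documented constraints."""
    if kind == "video":
        return (props["seconds"] <= MAX_VIDEO_SECONDS
                and props["frames"] <= MAX_VIDEO_FRAMES)
    if kind == "audio":
        return (props["seconds"] <= MAX_AUDIO_SECONDS
                and props["sample_rate"] >= MIN_AUDIO_SAMPLE_RATE)
    if kind == "text":
        return props["tokens"] <= MAX_TEXT_TOKENS
    if kind == "image":
        # Standard-resolution JPEG/PNG assumed; no hard limit documented.
        return True
    raise ValueError(f"unsupported input type: {kind}")
```

Rejecting an over-limit clip client-side avoids a round trip to the endpoint only to receive an error.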
Enterprise Use Cases
The multimodal prowess of Nemotron 3 Nano Omni empowers various enterprise applications:
- Computer Use Agents: Powering the perception loop for agents navigating user interfaces, it simplifies operations across incident management dashboards, browser automation, and more.
- Document Intelligence: Capable of interpreting complex documents, charts, and mixed media, it enhances workflows for compliance involving contracts and financial documents.
- Audio and Video Understanding: In customer service or research workflows, it maintains continuous context across audio and video, enriching applications from meeting analysis to verification tasks.
Getting Started with SageMaker JumpStart
Deploying Nemotron 3 Nano Omni with Amazon SageMaker JumpStart is straightforward. You can do this in just a few steps:
Deploy via SageMaker Studio
- Open Amazon SageMaker Studio.
- In the left navigation panel, select JumpStart.
- Search for Nemotron 3 Nano Omni.
- Choose the model card and click Deploy.
- Configure instance settings and click Deploy to create the endpoint.
Deploy via SageMaker Python SDK
For programmatic deployment, use the SageMaker Python SDK as follows:
```python
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(
    model_id="huggingface-vlm-nvidia-nemotron3-nano-omni-30ba3b-reasoning-fp8",
    role="",  # set to your SageMaker execution role ARN
)

predictor = model.deploy(
    accept_eula=True,  # the model's EULA must be accepted to deploy
)
```
Running Inference
Once deployed, you can send multimodal requests to the endpoint, starting with image understanding:
```python
import base64

def encode_image(image_path):
    # Read a local image and base64-encode it for the request body.
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

image_b64 = encode_image("example.jpg")

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    "max_tokens": 1024,
    "temperature": 0.2,
}

response = predictor.predict(payload)
print(response["choices"][0]["message"]["content"])
```
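Audio requests follow the same message schema. The sketch below builds a transcription payload; the `audio_url` content type is an assumption based on the OpenAI-style schema used above, and `build_audio_payload` is a hypothetical helper, not part of the SDK.

```python
import base64

def encode_audio(audio_path):
    # Read a local wav/mp3 file and base64-encode it for the request body.
    with open(audio_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def build_audio_payload(audio_b64, prompt="Transcribe this audio with word-level timestamps."):
    # Mirrors the image payload above; "audio_url" is assumed to follow
    # the same content-part convention as "image_url".
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "audio_url", "audio_url": {"url": f"data:audio/wav;base64,{audio_b64}"}},
            ],
        }],
        "max_tokens": 1024,
        "temperature": 0.2,  # Instruct-mode setting recommended for ASR
    }
```

The resulting dictionary is sent the same way as the image payload, via `predictor.predict(...)`.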
Recommended Inference Parameters
Depending on your use case, the following inference parameter settings can improve your results:
| Mode | Temperature | top_p | max_tokens | Use Case |
|---|---|---|---|---|
| Thinking | 0.6 | 0.95 | 20480 | Complex reasoning |
| Instruct | 0.2 | N/A | 1024 | General tasks, ASR |
For reasoning and complex tasks, enable Thinking mode; for straightforward requests, use Instruct mode for quicker responses.
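The table above can be folded into a small helper that merges the recommended settings into a request payload. This is a hypothetical convenience function, not part of the SDK; the preset values simply mirror the table.

```python
# Recommended settings from the table above; Instruct mode does not use top_p.
GENERATION_PRESETS = {
    "thinking": {"temperature": 0.6, "top_p": 0.95, "max_tokens": 20480},
    "instruct": {"temperature": 0.2, "max_tokens": 1024},
}

def with_preset(payload: dict, mode: str = "instruct") -> dict:
    """Return a copy of the request payload with the mode's preset merged in."""
    if mode not in GENERATION_PRESETS:
        raise ValueError(f"unknown mode: {mode}")
    return {**payload, **GENERATION_PRESETS[mode]}
```

For example, `with_preset({"messages": [...]}, "thinking")` prepares a request for complex reasoning, while the default prepares one for general tasks and ASR.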
Conclusion
NVIDIA Nemotron 3 Nano Omni brings a new level of multimodal AI to Amazon SageMaker JumpStart. By unifying video, audio, image, and text understanding into a single efficient model, it streamlines the development of agentic applications while delivering strong accuracy and efficiency.
Start leveraging Nemotron 3 Nano Omni for your next intelligent application by deploying it today from Amazon SageMaker JumpStart. For more details, visit the NVIDIA Nemotron model page on Hugging Face.
About the Authors
Dan Ferguson is a Solutions Architect at AWS based in New York, focusing on supporting customers in efficient and sustainable ML integration.
Malav Shastri is a Software Development Engineer at AWS on the Amazon SageMaker JumpStart and Amazon Bedrock teams, working to broaden access to state-of-the-art models.
Vivek Gangasani leads Solutions Architecture for SageMaker Inference, focusing on deploying and optimizing generative AI models and workflows.