NVIDIA TAO

Looking for a faster, easier way to create highly accurate, customized, and enterprise-ready AI models to power your vision AI applications? The open-source TAO for AI training and optimization delivers everything you need, putting the power of the world’s best Vision Transformers (ViTs) in the hands of every developer and service provider. You can now create state-of-the-art computer vision models and deploy them on any device—GPUs, CPUs, and MCUs—whether at the edge or in the cloud.



Download TAO     Get Started

What Is NVIDIA TAO?

Eliminate the need for mountains of data and an army of data scientists as you create AI/machine learning models and speed up the development process with transfer learning. This powerful technique instantly transfers learned features from an existing neural network model to a new customized one.

The open-source NVIDIA TAO, built on TensorFlow and PyTorch, uses the power of transfer learning while simultaneously simplifying the model training process and optimizing the model for inference throughput on practically any platform. The result is an ultra-streamlined workflow: take one of the pretrained models, adapt it to your own real or synthetic data, then optimize it for inference throughput, all without needing AI expertise or large training datasets.
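Transfer learning itself is easy to see in miniature. The sketch below is not TAO code, just a toy NumPy illustration of the idea: a "pretrained" feature extractor is kept frozen, and only a small task-specific head is trained on the new dataset (all names, shapes, and data here are invented for the example).

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained backbone": stands in for a network whose weights we reuse.
# Here it is just a frozen random projection + ReLU; in TAO it would be a
# real pretrained vision model. The key point: these weights never change.
W_frozen = rng.normal(size=(8, 16))

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)

# Toy "custom dataset": labels are defined to be linearly separable in the
# frozen feature space, so a small trained head is enough for the new task.
X = rng.normal(size=(200, 8))
F = extract_features(X)
w_teacher = rng.normal(size=16)
y = (F @ w_teacher > 0).astype(float)

# Transfer-learning step: train only the new task head (logistic
# regression) on top of the frozen features.
w, b = np.zeros(16), 0.0
for _ in range(2000):
    z = np.clip(F @ w + b, -30.0, 30.0)   # clip to avoid exp overflow
    p = 1.0 / (1.0 + np.exp(-z))
    w -= 0.1 * F.T @ (p - y) / len(y)
    b -= 0.1 * (p - y).mean()

acc = float((((F @ w + b) > 0) == (y > 0.5)).mean())
print(f"accuracy of head trained on frozen features: {acc:.2f}")
```

Because the backbone's learned features are reused rather than relearned, only a handful of parameters (the head) need training, which is why far less data and compute are required than training from scratch.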



What is TAO Toolkit, and how does it fit into the AI model development workflow?

Key Benefits


Train Models Efficiently

Use TAO’s AutoML capability to eliminate the need for manual tuning and get to your solutions faster.


Build Highly Accurate AI

Use SOTA Vision Transformer and NVIDIA pretrained models to create highly accurate and custom AI models for your use case.


Optimize for Inference

Go beyond customization and achieve up to 4X performance by optimizing the model for inference.


Deploy on Any Device

Deploy optimized models on GPUs, CPUs, MCUs, and more.

Faster Time-to-Market With NVIDIA NIM

NVIDIA NIM™ is a set of inference microservices that includes industry-standard APIs, domain-specific code, optimized inference engines, and an enterprise runtime. The following vision foundation models can be used as-is for inference through NVIDIA NIM or fine-tuned for custom vision AI tasks:


  • NV-CLIP is a commercial vision foundation model based on the popular CLIP architecture, trained using self-supervised learning on almost 1B image-text pairs. The model features both a text encoder and a vision encoder for prompt-based inference.
  • NV-DINOv2 is a commercial vision foundation model trained using self-supervised learning on almost 1B images. The model can be quickly fine-tuned for various vision AI tasks with only a small amount of training data.
  • GroundingDINO is a commercial vision model with both a text and vision encoder to enable zero-shot detection and segmentation.

NVIDIA TAO is available as a part of NVIDIA AI Enterprise, an enterprise-ready AI software platform to speed time to value while mitigating the potential risks of open-source software.


Vision AI API Catalog

Why It Matters to Your AI Development

Bring Customized Generative AI to Your Application

Generative AI is a transformative force that will change many industries. Driving this are foundation models that have been trained on large corpora of text, image, sensor, and other data. Now, with TAO, you can fine-tune and customize these foundation models to create domain-specific generative AI applications. TAO enables fine-tuning of multi-modal models such as NV-DINOv2, NV-CLIP, GroundingDINO, Mask-GroundingDINO, FoundationPose, and more.

TAO also integrates with several cloud and third-party MLOps services to provide developers and enterprises with an optimized AI workflow.


Read the Blog

Auto-Labeling Using Text Prompts

New AI-assisted annotation capabilities give you a faster, less expensive way to auto-label object detection and segmentation masks. Developers can detect and segment any object, without training or fine-tuning, simply by using text prompts and descriptors such as "red car" or "box on the conveyor belt." Developers can also fine-tune the model to improve accuracy on specific objects.


Watch the Video

Create Custom Multi-Modal Fusion Models

In many industries, AI systems rely on various sensors to perceive and interact with their environment. Each sensor type, such as cameras, LiDAR, or radar, provides unique information but also has inherent limitations.

Developers can now create custom multi-modal fusion models in TAO for detecting objects and generating 3D bounding boxes by combining image (RGB) and LiDAR point-cloud data. TAO offers the BEVFusion model, which fuses data from multiple sensors, such as LiDAR and camera, into a unified bird's-eye-view (BEV) representation.


Learn More

Deploy Models on Any Platform

NVIDIA TAO can help power AI across billions of devices. It supports model export in ONNX, an open format for better interoperability. This makes it possible to deploy a model trained with NVIDIA TAO on any computing platform.


  • Learn more about the integration with MediaTek
  • Learn more about the integration with STMicroelectronics
  • Learn more about the integration with Arm Ethos-U NPUs
  • Learn more about the integration with Edge Impulse
  • Learn more about the integration with Nota LaunchX
  • Learn more about the integration with Arm CPU and NPU

Inference Performance

Unlock peak inference performance with NVIDIA pretrained models across platforms, from the edge with NVIDIA Jetson™ solutions to the cloud with NVIDIA Ampere architecture GPUs. For details on batch size and other models, see the detailed performance datasheet.



Throughput in frames per second (FPS; higher is better):

| Model | Model Arch | Resolution | Accuracy | Jetson Orin Nano | Jetson Orin NX | Jetson Orin 64GB | A2 | T4 | L4 | L40 | H100 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PeopleSemSegFormer | SegFormer | 512x512 | 91% mIoU | 6.6 | 9.7 | 24.2 | 23 | 40 | 83 | 210 | 454 |
| Retail Object Detection | DINO - FAN-B | 960x544 | 97% | 2.3 | 3.4 | 8.1 | 8.8 | 15.4 | 34 | 89 | 167 |
| DINO COCO | DINO - FAN-S | 960x544 | 72% mAP50 | 3.1 | 4.4 | 11.2 | 11.7 | 20 | 44 | 120 | 213 |
| GC-ViT ImageNet | GC-ViT-Tiny | 224x224 | 84% Top-1 | 75 | 110 | 293 | 336 | 517 | 1266 | 3118 | 6381 |
| OCRNet | ResNet50 - Bi-LSTM | 32x100 | 93% | 935 | 1373 | 3876 | 2094 | 3649 | 8036 | 18970 | 55720 |
| OCDNet | DCN-ResNet18 | 640x640 | 81% Hmean | 31 | 45 | 120 | 93 | 155 | 333 | 940 | 1468 |
| Optical Inspection | Siamese CNN | 2x512x128 | 100% / <1% FP | 399 | 482 | 1538 | 1391 | 2314 | 2821 | 10390 | 24110 |

Customer Stories


OneCup AI

OneCup AI’s computer vision system tracks and classifies animal activity using NVIDIA pretrained models and TAO, significantly reducing their development time from months to weeks.


Learn More

KoiReader

KoiReader developed an AI-powered machine vision solution using NVIDIA developer tools including TAO to help PepsiCo achieve precision and efficiency in dynamic distribution environments.


Read the Blog

Trifork

Trifork jumpstarted their AI model development with NVIDIA pretrained models and TAO Toolkit to develop their AI-based baggage tracking solution for airports.


Learn More
arruga ai
booz allen
Kion group
Inex tech
Lexmark ventures
nota ai
One cup AI
Rocketboots
SmartCow
two i
appen
cvedia
hasty
lexset
lightly
Rendered AI
Sky Engine
Yuva AI
roboflow

General FAQ

What is transfer learning?
Transfer learning is the process of transferring learned features from one application to another. It's a commonly used training technique in which a model trained on one task is retrained for use on a different task. You can apply transfer learning to vision, speech, and language-understanding models.

Can models trained with TAO be deployed on any device?
Yes. With the standard ONNX output, a TAO model can be deployed on any device that supports ONNX-RT or has a compiler to convert ONNX to a hardware runtime.
Is TAO open source?
Yes. The entire TAO codebase is now available as open source on GitHub.

Can TAO models be used commercially?
Yes. For exact licensing terms, refer to the model EULA. Note, however, that unencrypted models are only available with NVIDIA AI Enterprise licenses.

Which model architectures does TAO support?
TAO supports 100+ permutations of NVIDIA-optimized model architectures and backbones. These include state-of-the-art vision foundation models, like NV-CLIP, NV-DINOv2, and GroundingDINO, along with Vision Transformers (ViTs) and efficient CNNs.

You can find the full matrix of supported model architectures here.

Do I need to know TensorFlow or PyTorch to use TAO?
Under the hood, TAO uses the TensorFlow and PyTorch frameworks, but they're completely abstracted away from the user. Users operate TAO through documented spec files, and no prior knowledge of either framework is required.
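For a sense of what such a spec file looks like, here is a rough sketch. The exact keys differ by task, model, and TAO version, so every field name below should be read as an illustrative assumption rather than the authoritative schema:

```yaml
# Illustrative TAO-style training spec (field names are assumptions --
# consult the TAO documentation for the real schema for your task).
train:
  num_gpus: 1
  num_epochs: 50
  optim:
    lr: 0.0001
dataset:
  train_data_sources:
    image_dir: /data/train/images
    label_dir: /data/train/labels
model:
  backbone: fan_small
  pretrained_model_path: /workspace/pretrained/model.pth
```

A typical workflow points a task-specific TAO entry point at a spec like this; the user edits fields such as dataset paths, epochs, and the pretrained model, and never touches framework code directly.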
What is NVIDIA AI Enterprise?
NVIDIA AI Enterprise is an end-to-end, secure, cloud-native AI software platform optimized to accelerate enterprises to the leading edge of AI. Benefits of using TAO with NVIDIA AI Enterprise include:
  • Access to exclusive commercial foundation models for vision AI
  • Validation and integration for NVIDIA AI open-source software
  • Access to AI solution workflows to speed time to production
  • Certifications to deploy AI everywhere
  • Enterprise-grade support, security, manageability, and API stability to mitigate the potential risks of open-source software
Where can I find sample notebooks to get started?
You can download the sample Jupyter notebooks from the NGC catalog.
How do I deploy models trained with TAO?
  • Vision models can be deployed through DeepStream or NVIDIA Triton™.
  • You can also deploy the models in ONNX format on any platform.

Refer to the documentation section for deployment details.

Can TAO run in the cloud?
Yes. TAO can be deployed at the infrastructure level using VMs in the cloud, or it can run in various cloud services such as Amazon EKS, Azure AKS, Google GKE, Google Vertex AI, Azure Machine Learning, and Google Colab. Refer to the documentation to learn more about running TAO on AWS, Azure, or GCP.
Can I train with TAO on a Jetson device?
No. You can only train with TAO on an x86 system. You can, however, deploy the optimized models on a Jetson solution.
What is NVIDIA Metropolis?
NVIDIA Metropolis is an application framework, a set of developer tools, and a partner ecosystem that brings visual data and AI together to improve operational efficiency and safety across a broad range of industries. Learn more here.

Resources

New Blog—TAO 5.5

NVIDIA TAO version 5.5 brings new foundation models and training capabilities.


Read the Blog

Blog—Vision Transformers

Learn how to improve the accuracy and robustness of vision AI apps with Vision Transformers (ViTs) and NVIDIA TAO.


Read the Blog

Blog—Character Detection and Recognition

Learn how to train and deploy a custom optical character detection and recognition model using NVIDIA TAO and NVIDIA Triton.


Read the Blog - Part 1 |   Part 2


Simplify and speed up AI training with TAO.

Get started