Learn about TensorRT 8.2 and the new TensorRT framework integrations, which accelerate inference in PyTorch and TensorFlow with just one line of code.
Today NVIDIA released TensorRT 8.2, with optimizations for billion-parameter NLU models such as T5 and GPT-2, which are used for translation and text generation. These optimizations make it possible to run NLU apps in real time.
TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference for AI applications. TensorRT is used across several industries, including healthcare, automotive, manufacturing, internet/telecom services, financial services, and energy.
PyTorch and TensorFlow are the most popular deep learning frameworks, with millions of users. The new TensorRT framework integrations provide a simple API in PyTorch and TensorFlow, with powerful FP16 and INT8 optimizations that accelerate inference by up to 6x.
Highlights include:
- TensorRT 8.2: Optimizations for T5 and GPT-2 enable real-time translation and summarization, with 21x faster performance compared to CPUs.
- TensorRT 8.2: Simple Python API for developers using Windows.
- Torch-TensorRT: Integration of PyTorch with TensorRT delivers up to 6x faster performance compared to in-framework inference on GPUs with just one line of code.
- TensorFlow-TensorRT: Integration of TensorFlow with TensorRT delivers up to 6x faster performance compared to in-framework inference on GPUs with one line of code.
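To give a feel for what "one line of code" means for Torch-TensorRT, here is a minimal sketch. It assumes the `torch_tensorrt` package from the NGC PyTorch container and a model that is already in eval mode on a CUDA device. The wrapper name `compile_with_torch_tensorrt` and its parameters are hypothetical, chosen for illustration only; the single call to `torch_tensorrt.compile` is the part that does the work.

```python
def compile_with_torch_tensorrt(model, input_shape, use_fp16=True):
    """Return a TensorRT-optimized version of a PyTorch module.

    Hypothetical helper: assumes `model` is in eval mode and on a CUDA device.
    """
    # Imported lazily so this file can be loaded on machines without the GPU stack.
    import torch
    import torch_tensorrt

    # Precisions TensorRT may choose kernels from; FP16 is optional here.
    precisions = {torch.float, torch.half} if use_fp16 else {torch.float}

    # The "one line": pass the module plus a description of its input shape.
    return torch_tensorrt.compile(
        model,
        inputs=[torch_tensorrt.Input(input_shape)],
        enabled_precisions=precisions,
    )
```

A call might look like `trt_model = compile_with_torch_tensorrt(model.eval().cuda(), (1, 3, 224, 224))`, where the input shape is only a placeholder for an image model. The actual speedup depends on the model, the GPU, and the precisions enabled.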
Resources
- Torch-TensorRT is available today in the PyTorch Container from the NGC catalog.
- TensorFlow-TensorRT is available today in the TensorFlow Container from the NGC catalog.
- TensorRT is freely available to members of the NVIDIA Developer Program.
- Learn more on the TensorRT product page.
Learn more
- GTC Session A31336: Accelerate Deep Learning Inference in Production with TensorRT
- GTC Session A31107: Accelerate PyTorch Inference with TensorRT
- Up to 6x Faster Inference in PyTorch on GPUs with Torch-TensorRT
- Getting Started: Torch-TensorRT, TensorFlow-TensorRT
- Real-Time Inference for T5 and GPT-2 with TensorRT
- Notebook: T5 Translation with TensorRT
- Notebook: GPT-2 Text Generation with TensorRT