Categories
Misc

Speech AI Expands Global Reach With Telugu Language Breakthrough

More than 75 million people speak Telugu, predominantly in India’s southern regions, making it one of the most widely spoken languages in the country. Despite such prevalence, Telugu is considered a low-resource language when it comes to speech AI. This means there aren’t enough hours’ worth of speech datasets to easily and accurately create AI…

Categories
Misc

AI at the Point of Care: Startup’s Portable Scanner Diagnoses Brain Stroke in Minutes

For every minute that a stroke is left untreated, the average patient loses nearly 2 million neurons. This means that for each hour in which treatment fails to occur, the brain loses as many neurons as it does in more than three and a half years of normal aging. With one of the world’s first…

Categories
Misc

Hittin’ the Sim: NVIDIA’s Matt Cragun on Conditioning Autonomous Vehicles in Simulation

Training, testing and validating autonomous vehicles requires a continuous pipeline — or data factory — to introduce new scenarios and refine deep neural networks. A key component of this process is simulation. AV developers can test a virtually limitless number of scenarios, repeatably and at scale, with high-fidelity, physically based simulation. And like much of…

Categories
Misc

Banking on AI: Deutsche Bank, NVIDIA to Accelerate Adoption of AI for Financial Services

Deutsche Bank Wednesday announced a partnership with NVIDIA to accelerate the use of AI and machine learning in the financial services sector. The announcement follows months of testing to explore use cases that could support the bank’s strategic ambitions to 2025 and beyond. “Accelerated computing and AI are at a tipping point, and we’re bringing…

Categories
Misc

License for the AI Autobahn: NVIDIA AI Enterprise 3.0 Introduces New Tools to Speed Success

From rapidly fluctuating demand to staffing shortages and supply chain complexity, enterprises have navigated numerous challenges the past few years. Many companies seeking strong starts to 2023 are planning to use AI and accelerated computing to drive growth while saving costs. To support these early adopters — as well as those just beginning their AI…

Categories
Misc

Visual Effects Artist Jay Lippman Takes Viewers Behind the Camera This Week ‘In the NVIDIA Studio’

Time to tackle one of the most challenging tasks for aspiring movie makers — creating aesthetically pleasing visual effects — courtesy of visual effects artist and filmmaker Jay Lippman this week In the NVIDIA Studio.

Categories
Misc

Upcoming Webinar: Using ML Models in ROS2 to Robustly Estimate Distance to Obstacles

Join this webinar on December 13 to learn how to estimate obstacle distances with stereo cameras using the bespoke, pretrained DNN models ESS and Bi3D.

Categories
Offsites

Private Ads Prediction with DP-SGD

Ad technology providers widely use machine learning (ML) models to predict and present users with the most relevant ads, and to measure the effectiveness of those ads. With increasing focus on online privacy, there’s an opportunity to identify ML algorithms that have better privacy-utility trade-offs. Differential privacy (DP) has emerged as a popular framework for developing ML algorithms responsibly with provable privacy guarantees. It has been extensively studied in the privacy literature, deployed in industrial applications, and employed by the U.S. Census Bureau. Intuitively, the DP framework enables ML models to learn population-wide properties, while protecting user-level information.

When training ML models, algorithms take a dataset as their input and produce a trained model as their output. Stochastic gradient descent (SGD) is a commonly used non-private training algorithm that computes the average gradient from a random subset of examples (called a mini-batch), and uses it to indicate the direction towards which the model should move to fit that mini-batch. The most widely used DP training algorithm in deep learning is an extension of SGD called DP stochastic gradient descent (DP-SGD).
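
As a rough illustration, here is a minimal Python sketch of one non-private mini-batch SGD step on a logistic-regression model; the data, dimensions, and learning rate are hypothetical stand-ins, not the setup used in this work.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 16))                  # toy features
    y = (X @ rng.normal(size=16) > 0).astype(float)    # toy binary labels
    w = np.zeros(16)                                   # model weights
    lr = 0.1

    def mean_grad(w, xb, yb):
        # Average gradient of the binary cross-entropy loss over the mini-batch.
        p = 1.0 / (1.0 + np.exp(-xb @ w))
        return xb.T @ (p - yb) / len(yb)

    # One SGD step: sample a mini-batch, average its gradient, step against it.
    idx = rng.choice(len(X), size=256, replace=False)
    w -= lr * mean_grad(w, X[idx], y[idx])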

DP-SGD includes two additional steps: (1) before averaging, the gradient of each example is clipped if its L2 norm exceeds a predefined threshold; and (2) Gaussian noise is added to the average gradient before updating the model. DP-SGD can be adapted to any existing deep learning pipeline with minimal changes by replacing the optimizer, such as SGD or Adam, with its DP variant. However, applying DP-SGD in practice can lead to a significant loss of model utility (i.e., accuracy) and large computational overhead. As a result, various research efforts have attempted to apply DP-SGD training to more practical, large-scale deep learning problems. Recent studies have also shown promising DP training results on computer vision and natural language processing problems.
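
To make the two extra steps concrete, here is a minimal sketch of a single DP-SGD update in the same toy setting: per-example gradients are clipped to an assumed L2 threshold C, averaged, and perturbed with Gaussian noise scaled by an assumed noise multiplier. In practice one would instead swap the optimizer for a DP variant from a library such as TensorFlow Privacy or Opacus.

    import numpy as np

    rng = np.random.default_rng(0)
    C = 1.0                 # clipping threshold on each per-example gradient norm
    noise_multiplier = 1.1  # Gaussian noise std = noise_multiplier * C (on the sum)
    lr = 0.1
    w = np.zeros(16)

    def per_example_grads(w, xb, yb):
        # One logistic-regression gradient per example (shape: batch x dim).
        p = 1.0 / (1.0 + np.exp(-xb @ w))
        return xb * (p - yb)[:, None]

    def dp_sgd_step(w, xb, yb):
        g = per_example_grads(w, xb, yb)
        # Step 1: clip each per-example gradient to L2 norm at most C.
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g * np.minimum(1.0, C / np.maximum(norms, 1e-12))
        # Step 2: average, then add Gaussian noise calibrated to the clip norm.
        noisy_mean = g.mean(axis=0) + rng.normal(
            scale=noise_multiplier * C / len(xb), size=w.shape)
        return w - lr * noisy_mean

    # One private update on a random toy mini-batch.
    xb = rng.normal(size=(256, 16))
    yb = rng.integers(0, 2, size=256).astype(float)
    w = dp_sgd_step(w, xb, yb)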

In “Private Ad Modeling with DP-SGD”, we present a systematic study of DP-SGD training on ads modeling problems, which pose unique challenges compared to vision and language tasks. Ads datasets often have a high imbalance between data classes, and consist of categorical features with large numbers of unique values, leading to models that have large embedding layers and highly sparse gradient updates. With this study, we demonstrate that DP-SGD allows ad prediction models to be trained privately with a much smaller utility gap than previously expected, even in the high privacy regime. Moreover, we demonstrate that with proper implementation, the computation and memory overhead of DP-SGD training can be significantly reduced.

Evaluation

We evaluate private training using three ads prediction tasks: (1) predicting the click-through rate (pCTR) for an ad, (2) predicting the conversion rate (pCVR) for an ad after a click, and (3) predicting the expected number of conversions (pConvs) after an ad click. For pCTR, we use the Criteo dataset, which is a widely used public benchmark for pCTR models. We evaluate pCVR and pConvs using internal Google datasets. pCTR and pCVR are binary classification problems trained with the binary cross-entropy loss, and we report the test AUC loss (i.e., 1 – AUC). pConvs is a regression problem trained with the Poisson log loss (PLL), and we report the test PLL.
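
As a reference point for the metrics, the following sketch computes the test AUC loss and a Poisson log loss on hypothetical predictions and labels; the paper’s exact loss conventions (e.g., constant terms in the PLL) may differ.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Binary tasks (pCTR, pCVR): report AUC loss = 1 - AUC.
    y_true = np.array([0, 1, 1, 0, 1])                # hypothetical labels
    y_prob = np.array([0.2, 0.7, 0.6, 0.4, 0.9])      # predicted probabilities
    auc_loss = 1.0 - roc_auc_score(y_true, y_prob)

    # Regression task (pConvs): report Poisson log loss (PLL).
    counts = np.array([0.0, 2.0, 1.0, 3.0])           # hypothetical conversion counts
    preds = np.array([0.5, 1.8, 0.9, 2.7])            # predicted means (> 0)
    pll = np.mean(preds - counts * np.log(preds))     # up to a model-independent constant

    print(auc_loss, pll)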

For each task, we evaluate the privacy-utility trade-off of DP-SGD by the relative increase in the loss of privately trained models under various privacy budgets (i.e., privacy loss). The privacy budget is characterized by a scalar ε, where a lower ε indicates higher privacy. To measure the utility gap between private and non-private training, we compute the relative increase in loss compared to the non-private model (equivalent to ε = ∞). Our main observation is that on all three common ad prediction tasks, the relative loss increase can be made much smaller than previously expected, even in very high privacy regimes (e.g., ε ≤ 1).

DP-SGD results on three ads prediction tasks. The relative increase in loss is computed against the non-private baseline (i.e., ε = ∞) model of each task.

Improved Privacy Accounting

Privacy accounting estimates the privacy budget (ε) for a DP-SGD trained model, given the Gaussian noise multiplier and other training hyperparameters. Rényi Differential Privacy (RDP) accounting has been the most widely used approach in DP-SGD since the original paper. We explore the latest advances in accounting methods to provide tighter estimates. Specifically, we use connect-the-dots for accounting based on the privacy loss distribution (PLD). The following figure compares this improved accounting with the classical RDP accounting and demonstrates that PLD accounting improves the AUC on the pCTR dataset for all privacy budgets (ε).
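
As a code-level sketch of this comparison, the snippet below composes a Poisson-subsampled Gaussian mechanism over many steps with both an RDP accountant and a PLD accountant. It assumes the open-source dp_accounting package (pip install dp-accounting) and uses illustrative hyperparameters rather than the paper’s.

    from dp_accounting import dp_event
    from dp_accounting import pld
    from dp_accounting import rdp

    noise_multiplier = 1.1
    sampling_prob = 16_384 / 40_000_000    # batch size / dataset size (illustrative)
    steps = 10_000
    delta = 1e-6

    # Each DP-SGD step is a Poisson-subsampled Gaussian mechanism.
    event = dp_event.PoissonSampledDpEvent(
        sampling_prob, dp_event.GaussianDpEvent(noise_multiplier))

    rdp_acct = rdp.RdpAccountant()
    rdp_acct.compose(event, steps)

    pld_acct = pld.PLDAccountant()
    pld_acct.compose(event, steps)

    # PLD-based accounting typically yields a smaller (tighter) epsilon for the
    # same noise, which in turn allows less noise at a fixed privacy budget.
    print("RDP epsilon:", rdp_acct.get_epsilon(delta))
    print("PLD epsilon:", pld_acct.get_epsilon(delta))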

Large Batch Training

Batch size is a hyperparameter that affects different aspects of DP-SGD training. For instance, increasing the batch size can reduce the amount of noise added during training under the same privacy guarantee, which reduces training variance. The batch size also affects the privacy guarantee via other parameters, such as the subsampling probability and the number of training steps. There is no simple formula to quantify the impact of batch size. However, the relationship between batch size and noise scale can be quantified using privacy accounting, which calculates the required noise scale (measured in terms of the standard deviation) under a given privacy budget (ε) for a particular batch size. The figure below plots such relations in two different scenarios. The first scenario uses fixed epochs, where we fix the number of passes over the training dataset. In this case, the number of training steps is reduced as the batch size increases, which could result in undertraining the model. The second, more straightforward scenario uses fixed training steps (fixed steps).

The relationship between batch size and noise scales. Privacy accounting requires a noise standard deviation, which decreases as the batch size increases, to meet a given privacy budget. As a result, by using much larger batch sizes than the non-private baseline (indicated by the vertical dotted line), the scale of Gaussian noise added by DP-SGD can be significantly reduced.
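
The sketch below illustrates how this relationship can be computed in the fixed-steps scenario: for each batch size it bisects over the Gaussian noise multiplier until an assumed target ε is met, again assuming the dp_accounting package and illustrative numbers.

    from dp_accounting import dp_event
    from dp_accounting import pld

    def epsilon(noise_multiplier, batch_size, steps, n, delta=1e-6):
        acct = pld.PLDAccountant()
        acct.compose(dp_event.PoissonSampledDpEvent(
            batch_size / n, dp_event.GaussianDpEvent(noise_multiplier)), steps)
        return acct.get_epsilon(delta)

    def required_noise(target_eps, batch_size, steps, n, lo=0.3, hi=50.0):
        # Bisection: epsilon decreases monotonically as the noise multiplier grows.
        for _ in range(40):
            mid = (lo + hi) / 2
            if epsilon(mid, batch_size, steps, n) > target_eps:
                lo = mid
            else:
                hi = mid
        return hi

    n, steps, target_eps = 40_000_000, 10_000, 3.0
    for batch_size in (1_024, 16_384, 131_072):
        nm = required_noise(target_eps, batch_size, steps, n)
        # The noise std on the *averaged* gradient scales as nm * C / batch_size
        # (for clipping threshold C), so larger batches see far less noise.
        print(batch_size, nm, nm / batch_size)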

In addition to allowing a smaller noise scale, larger batch sizes also allow us to use a larger threshold for the per-example gradient norm clipping required by DP-SGD. Since the norm-clipping step introduces bias into the average gradient estimate, this relaxation mitigates that bias. The table below compares the results on the Criteo dataset for pCTR with a standard batch size (1,024 examples) and a large batch size (16,384 examples), combined with large clipping and increased training epochs. We observe that large batch training significantly improves the model utility. Note that large clipping is only possible with large batch sizes. Large batch training was also found to be essential for DP-SGD training in the language and computer vision domains.

The effects of large batch training. For three different privacy budgets (ε), we observe that when training the pCTR models with large batch size (16,384), the AUC is significantly higher than with regular batch size (1,024).

Fast Per-Example Gradient Norm Computation

The per-example gradient norm calculation used for DP-SGD often incurs computational and memory overhead. It forgoes the efficiency of standard backpropagation on accelerators (like GPUs), which compute the average gradient for a batch without materializing each per-example gradient. However, for certain neural network layer types, an efficient algorithm allows the per-example gradient norm to be computed without materializing the per-example gradient vector. We also note that this algorithm can efficiently handle neural network models that rely on embedding layers and fully connected layers for solving ads prediction problems. Combining the two observations, we use this algorithm to implement a fast version of the DP-SGD algorithm. We show that Fast-DP-SGD on pCTR can handle a similar number of training examples and the same maximum batch size on a single GPU core as a non-private baseline.
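
As a sketch of the underlying idea for a single fully connected layer (the paper’s implementation handles its full embedding-plus-dense models), each example’s weight-gradient norm can be read off from the layer input and the back-propagated output gradient, with no per-example gradient ever materialized:

    import numpy as np

    rng = np.random.default_rng(0)
    B, d_in, d_out = 256, 64, 8
    x = rng.normal(size=(B, d_in))     # per-example layer inputs (activations)
    g = rng.normal(size=(B, d_out))    # per-example gradients w.r.t. layer outputs

    # Naive: materialize every per-example weight gradient (B x d_in x d_out).
    naive = np.einsum('bi,bo->bio', x, g)
    naive_norms = np.linalg.norm(naive.reshape(B, -1), axis=1)

    # Fast: each per-example weight gradient is the outer product x_i g_i^T, whose
    # Frobenius norm factorizes as ||x_i|| * ||g_i||; nothing is materialized.
    fast_norms = np.linalg.norm(x, axis=1) * np.linalg.norm(g, axis=1)

    assert np.allclose(naive_norms, fast_norms)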

The computation efficiency of our fast implementation (Fast-DP-SGD) on pCTR.

Compared to the non-private baseline, the training throughput is similar, except at very small batch sizes. We also compare with an implementation that uses JAX Just-in-Time (JIT) compilation, which is already much faster than vanilla DP-SGD implementations. Our implementation is not only faster but also more memory efficient. The JIT-based implementation cannot handle batch sizes larger than 64, while ours can handle batch sizes up to 500,000. Memory efficiency is important for enabling large-batch training, which was shown above to be important for improving utility.

Conclusion

We have shown that it is possible to train private ads prediction models using DP-SGD with a small utility gap compared to non-private baselines and minimal overhead in both computation and memory consumption. We believe there is room to further reduce the utility gap through techniques such as pretraining. Please see the paper for full details of the experiments.

Acknowledgements

This work was carried out in collaboration with Carson Denison, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, and Avinash Varadarajan. We thank Silvano Bonacina and Samuel Ieong for many useful discussions.

Categories
Misc

AI Models Recap: Scalable Pretrained Models Across Industries

The year 2022 has thus far been a momentous, thrilling, and overwhelming year for AI aficionados. Get3D is pushing the boundaries of generative 3D modeling, an AI model can now diagnose breast cancer from MRIs as accurately as board-certified radiologists, and state-of-the-art speech AI models have widened their horizons to extended reality.

Pretrained models from NVIDIA have redefined performance this year, amused us on the stage of America’s Got Talent, won four global contests, and earned a Best Inventions of 2022 award from Time magazine.

In addition to empowering researchers and data scientists, NVIDIA pretrained models empower developers to create cutting-edge AI applications by offering high-quality starting points and speedier convergence. To enable this, NVIDIA has spearheaded the research behind building and training these pretrained models for use cases like automatic speech recognition, pose estimation, object detection, 3D generation, semantic segmentation, and more.

Model deployment is also streamlined: over the last three months, users have already reaped the benefits of 870 different NVIDIA pretrained models that support more than 50 use cases across several industries.

This post walks through a few of the top pretrained AI models that are behind groundbreaking AI applications.

Speech recognition for all

NVIDIA NeMo is serving a variety of industries with cutting-edge AI application development for speech AI and natural language processing. The use cases include the creation of virtual assistants in Arabic and the facilitation of state-of-the-art automatic speech recognition (ASR) for financial audio.

For language-specific ASR, the NVIDIA NeMo Conformer-Transducer and Conformer-CTC (connectionist temporal classification) pretrained models are popular choices. These models achieve high accuracy, with low word and character error rates, thanks to a robust architecture and pretraining on a range of datasets such as LibriSpeech and Mozilla Common Voice.
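
For illustration, loading one of these checkpoints typically takes only a few lines (a sketch assuming the NeMo Python API; the model name and the audio file are placeholders, so check NGC for the checkpoints actually available):

    # pip install nemo_toolkit[asr]
    import nemo.collections.asr as nemo_asr

    # Load a pretrained Conformer ASR checkpoint and transcribe an audio file.
    asr_model = nemo_asr.models.ASRModel.from_pretrained(
        model_name="stt_en_conformer_transducer_large")
    print(asr_model.transcribe(["sample_audio.wav"]))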

These models are laying the groundwork for state-of-the-art Kinyarwanda, Kabyle, Catalan, and other low-resource language ASR pretrained models, which are bringing enhanced speech AI to low-resource languages, regions, and sectors.

For more information, see NeMo automatic speech recognition models.

Verifying speakers for the greater good

To determine ‘who talked when,’ voice AI enthusiasts and application developers are fusing deep neural network speech recognition with speaker diarization architecture.

Beyond well-known uses like multi-speaker transcription in video conferencing, developers are benefiting from this AI architecture in special use cases:

  • Clinical speech recordings and understanding medical conversations for effective healthcare
  • Captioning and separating teacher-student speech in the education sector

Pretrained embeddings of the modified Emphasized Channel Attention, Propagation, and Aggregation in TDNN (ECAPA-TDNN) model are accessible with the NVIDIA NeMo toolkit. Fisher, VoxCeleb, and real room-response data were used to train this deep neural network model for speaker identification and verification.

One of the best solutions for speaker diarization, ECAPA is based on the time-delay neural network (TDNN) and SE (squeeze and excite) structure with 22.3M parameters. It outperforms traditional TDNNs by emphasizing channel attention, propagation, and aggregation, as well as significantly reducing error rates.
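
A minimal sketch of using these embeddings for verification with NeMo is shown below; the pretrained model name, audio files, and decision threshold are illustrative assumptions.

    import torch
    import nemo.collections.asr as nemo_asr

    # Load pretrained ECAPA-TDNN speaker embeddings (model name is illustrative).
    spk_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
        model_name="ecapa_tdnn")

    emb_a = spk_model.get_embedding("speaker_a.wav").squeeze()
    emb_b = spk_model.get_embedding("speaker_b.wav").squeeze()

    # Cosine similarity between embeddings; same speaker if above a threshold.
    score = torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=0)
    print("same speaker" if score > 0.7 else "different speakers")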

For more information, see Speaker Diarization.

Visionary image segmentation with SegFormer AI models

SegFormer is visionary research that uses AI to pioneer world-class image segmentation. The original model and its variants are thriving in a variety of industries, including manufacturing, healthcare, automotive, and retail. Its enormous potential is best demonstrated by applications like virtual changing rooms, robotic vision, medical imaging and diagnostics, and vision analytics in self-driving cars.

Semantic segmentation, a computer vision method for separating the various objects in an image, is the foundation of SegFormer. To increase performance for particular needs, fine-tuned SegFormer variants are trained on datasets like ADE20K and Cityscapes at several resolutions, such as 512×512, 640×640, and 1024×1024. The design, which draws inspiration from the Transformer model architecture, produces state-of-the-art results on a variety of tasks.
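
As an example of how such a fine-tuned variant can be applied (a sketch using the Hugging Face transformers port of SegFormer; the checkpoint name and input image are illustrative):

    import torch
    from PIL import Image
    from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

    ckpt = "nvidia/segformer-b0-finetuned-ade-512-512"   # illustrative checkpoint
    processor = SegformerImageProcessor.from_pretrained(ckpt)
    model = SegformerForSemanticSegmentation.from_pretrained(ckpt)

    image = Image.open("street_scene.jpg")               # placeholder input image
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits                  # (1, num_classes, H/4, W/4)
    seg_map = logits.argmax(dim=1)[0]                    # per-pixel class indices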

For more information, see the NVlabs/SegFormer GitHub repo.

Purpose-built, pretrained model for automotive low-code developers

By detecting and identifying cars, people, road signs, and two-wheelers to comprehend traffic flow, TrafficCamNet has been driving smart city initiatives and detection technology for the automotive sector.

The model has been thoroughly trained on a vast dataset that includes images of actual traffic crossings in US cities. It uses the NVIDIA DetectNet_v2 detector with ResNet18 as the feature extractor. The architecture, sometimes referred to as GridBox object detection, employs bounding-box regression on a regular grid in the input image. The purpose-built, pretrained TrafficCamNet model can be accessed and further fine-tuned with the NVIDIA TAO Toolkit for best-in-class accuracy.

For more information, see Purpose-Built Models.

Award-winning models

NVIDIA pretrained models have won numerous awards for their cutting-edge performance, extraordinary research, and exemplary ability to solve real-world problems. Here are some notable wins.

World’s largest genomics language model wins Gordon Bell Special Award 2022

Researchers from Argonne National Laboratory, NVIDIA, the Technical University of Munich, the University of Chicago, Caltech, Harvard University, and others developed one of the world’s largest genomics language models, which predicts new COVID variants. For their work, they won the 2022 Gordon Bell Special Award.

The model informs timely public health intervention strategies and downstream vaccine development for emerging viral variants. The research was published in October 2022 and presents GenSLMs (genome-scale language models), which can accurately and rapidly identify variants of concern in the SARS-CoV-2 virus.

The large genomics language models, with 2.5B and 25B trainable parameters, were pretrained on more than 110M gene sequences; a SARS-CoV-2-specific model was then fine-tuned on 1.5M genomes. This research enables programmers to advance genomic language modeling by creating applications that can assist different public health initiatives.

For more information, see Speaking the Language of the Genome: Gordon Bell Winner Applies Large Language Models to Predict New COVID Variants.

State-of-the-art vision model wins Robust Vision Challenge 2022

The Fully Attentional Network (FAN) Transformer model from NVIDIA Research won the Robust Vision Challenge 2022. The team adopted a SegFormer head on top of an ImageNet-22k pretrained FAN-B-Hybrid model, as described in the paper Understanding the Robustness in Vision Transformers. The model was then further fine-tuned on a composed, large-scale dataset, similar to MSeg.

NVIDIA Research developed all the models used. The model achieved a state-of-the-art 87.1% accuracy and 35.8% mCE on ImageNet-1k and ImageNet-C with 76.8M parameters. We also demonstrated state-of-the-art accuracy and robustness in two downstream tasks, semantic segmentation and object detection.

For more information, see the NVlabs/FAN GitHub repo.

Winning the Telugu automatic speech recognition competition

NVIDIA recently won the Telugu ASR challenge conducted by IIIT-Hyderabad, India. The team trained a Conformer-RNNT (recurrent neural network transducer) model from scratch using 2K hours of Telugu-only data provided by the organizers, achieving first place on the closed-track leaderboard with a word error rate (WER) of 13.12%.

For the open track, the team performed transfer learning from a pretrained SSL Conformer-RNNT checkpoint trained on 36K hours of data from 40 Indic languages, winning the competition with a WER of 12.64%. Developers can use the fine-tuned winning model to create automatic speech recognition applications that benefit the 83M Telugu speakers globally.

NVIDIA pretrained models

NVIDIA pretrained models remove the need to construct models from scratch or experiment with other open-source models that may not converge, making high-performing AI development simple, rapid, and accessible.

For more information, see AI models.

Categories
Misc

Just Released: NVIDIA DRIVE OS 6.0.5 Now Available

The latest NVIDIA DRIVE OS release includes customization and safety updates for supercharging autonomous vehicle development.