Categories
Misc

Fast-Track Deploying Machine Learning Models with OctoML CLI and NVIDIA Triton Inference Server

Read how OctoML CLI and NVIDIA Triton automate model optimization and containerization to run models on any cloud or data center, at scale, and at much lower cost.

Categories
Offsites

Mapping Urban Trees Across North America with the Auto Arborist Dataset

Over four billion people live in cities around the globe, and while most people interact daily with others — at the grocery store, on public transit, at work — they may take for granted their frequent interactions with the diverse plants and animals that comprise fragile urban ecosystems. Trees in cities, called urban forests, provide critical benefits for public health and wellbeing and will prove integral to urban climate adaptation. They filter air and water, capture stormwater runoff, sequester atmospheric carbon dioxide, and limit erosion and drought. Shade from urban trees reduces energy-expensive cooling costs and mitigates urban heat islands. In the US alone, urban forests cover 127M acres and produce ecosystem services valued at $18 billion. But as the climate changes these ecosystems are increasingly under threat.

Census data is typically not comprehensive, covering a subset of public trees and not including those in parks.

Urban forest monitoring — measuring the size, health, and species distribution of trees in cities over time — allows researchers and policymakers to (1) quantify ecosystem services, including air quality improvement, carbon sequestration, and benefits to public health; (2) track damage from extreme weather events; and (3) target planting to improve robustness to climate change, disease and infestation.

However, many cities lack even basic data about the location and species of their trees. Collecting such data via a tree census is costly (a recent Los Angeles census cost $2 million and took 18 months) and thus is typically conducted only by cities with substantial resources. Further, lack of access to urban greenery is a key aspect of urban social inequality, including socioeconomic and racial inequality. Urban forest monitoring enables the quantification of this inequality and the pursuit of its improvement, a key aspect of the environmental justice movement. But machine learning could dramatically lower tree census costs using a combination of street-level and aerial imagery. Such an automated system could democratize access to urban forest monitoring, especially for under-resourced cities that are already disproportionately affected by climate change. While there have been prior efforts to develop automated urban tree species recognition from aerial or street-level imagery, a major limitation has been a lack of large-scale labeled datasets.

Today we introduce the Auto Arborist Dataset, a multiview urban tree classification dataset that, at ~2.6 million trees and >320 genera, is two orders of magnitude larger than those in prior work. To build the dataset, we pulled from public tree censuses from 23 North American cities (shown above) and merged these records with Street View and overhead RGB imagery. Because this is the first urban forest dataset to cover multiple cities, we analyze in detail how forest monitoring models can generalize with respect to geographic distribution shifts, which is crucial for building systems that scale. We are releasing all 2.6M tree records publicly, along with aerial and ground-level imagery for 1M trees.

The 23 cities in the dataset are spread across North America, and are categorized into West, Central, and East regions to enable analysis of spatial and hierarchical generalization.
The number of tree records and genera in the dataset, per city and per region. The holdout city (which is never seen during training in any capacity) for each region is in bold.

The Auto Arborist Dataset
To curate Auto Arborist, we started from existing tree censuses which are provided by many cities online. For each tree census considered, we verified that the data contained GPS locations and genus/species labels, and was available for public use. We then parsed these data into a common format, fixing common data entry errors (such as flipped latitude/longitude) and mapping ground-truth genus names (and their common misspellings or alternate names) to a unified taxonomy. We have chosen to focus on genus prediction (instead of species-level prediction) as our primary task to avoid taxonomic complexity arising from hybrids and subspecies, and because there is more universal consensus on genus names than species names.
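
To make this normalization step concrete, here is a minimal Python sketch of mapping raw census labels to a canonical genus. The alias table and fallback rule are illustrative assumptions, not the dataset's actual taxonomy mapping.

# Hypothetical alias table: common misspellings and alternate names -> canonical genus.
GENUS_ALIASES = {
    "liquidamber": "Liquidambar",
    "platnus": "Platanus",
    "douglas fir": "Pseudotsuga",
}

def normalize_genus(raw_label: str) -> str:
    """Map a raw census label (a misspelling or a 'Genus species' string) to a genus name."""
    key = raw_label.strip().lower()
    if key in GENUS_ALIASES:
        return GENUS_ALIASES[key]
    tokens = key.split()
    # Fall back to the first token of a 'Genus species' string, capitalized.
    return tokens[0].capitalize() if tokens else "unknown"

print(normalize_genus("Platanus acerifolia"))  # -> Platanus
print(normalize_genus("liquidamber"))          # -> Liquidambar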

Next, using the provided geolocation for each tree, we queried an RGB aerial image centered on the tree and all street-level images taken within 2-10 meters around it. Finally, we filtered these images to (1) maximize our chances that the tree of interest is visible from each image and (2) preserve user privacy. This latter concern involved a number of steps including the removal of images that included people as determined by semantic segmentation and manual blurring, among others.
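
The camera-to-tree distance check described above could be implemented roughly as in the sketch below, which keeps a street-level image only if its camera position falls within the 2-10 meter range around the tree's recorded location. The record fields and thresholds are illustrative assumptions, not the actual curation pipeline.

import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def keep_street_image(tree, camera, min_m=2.0, max_m=10.0):
    """Keep a street-level image only if the camera sits 2-10 m from the labeled tree."""
    d = haversine_m(tree["lat"], tree["lon"], camera["lat"], camera["lon"])
    return min_m <= d <= max_m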

Selected Street View imagery from the Auto Arborist dataset. Green boxes represent tree detections (using a model trained on Open Images) and blue dots represent projected GPS location of the labeled tree.

One of the most important challenges for urban forest monitoring is to do well in cities that were not part of the training set. Vision models must contend with distribution shifts, where the training distribution differs from the test distribution from a new city. Genus distributions vary geographically (e.g., there are more Douglas fir in western Canada than in California) and can also vary based on city size (LA is much larger than Santa Monica and contains many more genera). Another challenge is the long-tailed, fine-grained nature of tree genera, which can be difficult to disambiguate even for human experts, with many genera being quite rare.

The long-tailed distribution across Auto Arborist categories. Most examples come from a few frequent categories, and many categories have far fewer examples. We characterize each genus as frequent, common, or rare based on the number of training examples. Note that the test data is split spatially from the training data within each city, so not all rare genera are seen in the test set.

Finally, there are a number of ways in which tree images can have noise. For one, there is temporal variation in deciduous trees (for example, when aerial imagery includes leaves, but street-level images are bare). Moreover, public arboreal censuses are not always up-to-date. Thus, sometimes trees have died (and are no longer visible) in the time since the tree census was taken. In addition, aerial data quality can be poor (missing or obscured, e.g., by clouds).

Our curation process sought to minimize these issues by (1) only keeping images with sufficient tree pixels, as determined by a semantic segmentation model, (2) only keeping reasonably recent images, and (3) only keeping images where the tree position was sufficiently close to the street level camera. We considered also optimizing for trees seen in spring and summer, but decided seasonal variation could be a useful cue — we thus also released the date of each image to enable the community to explore the effects of seasonal variability.

Benchmark and Evaluation
To evaluate the dataset, we designed a benchmark to measure domain generalization and performance in the long tail of the distribution. We generated training and test splits at three levels. First, we split within each city (based on latitude or longitude) to see how well a city generalizes to itself. Second, we aggregate city-level training sets into three regions, West, Central, and East, holding out one city from each region. Finally, we merge the training sets across the three regions. For each of these splits, we report both accuracy and class-averaged recall for frequent, common, and rare genera on the corresponding held-out test sets.
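
As a concrete illustration of the class-averaged metric, the sketch below computes mean per-genus recall within each frequency bucket. The rare/common/frequent thresholds used here are placeholders rather than the cutoffs defined in the paper.

from collections import Counter, defaultdict

def frequency_bucket(n_train, common_cutoff=100, frequent_cutoff=1000):
    """Assign a genus to a bucket by its training-example count (hypothetical cutoffs)."""
    if n_train >= frequent_cutoff:
        return "frequent"
    if n_train >= common_cutoff:
        return "common"
    return "rare"

def class_averaged_recall(y_true, y_pred, train_counts):
    """Mean per-genus recall within each frequency bucket."""
    correct, total = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    per_bucket = defaultdict(list)
    for genus, n in total.items():
        per_bucket[frequency_bucket(train_counts.get(genus, 0))].append(correct[genus] / n)
    return {bucket: sum(recalls) / len(recalls) for bucket, recalls in per_bucket.items()}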

Using these metrics, we establish a performance baseline using standard modern convolutional models (ResNet). Our results demonstrate the benefits of a large-scale, geospatially distributed dataset such as Auto Arborist. First, we see that more training data helps — training on the entire dataset is better than training on a region, which is better than training on a single city.

The performance on each city’s test set when training on itself, on the region, and on the full training set.

Second, training on similar cities helps (and thus, having more coverage of cities helps). For example, if focusing on Seattle, then it is better to train on trees in Vancouver than Pittsburgh.

Cross-set performance, looking at the pairwise combination of train and test sets for each city. Note the block-diagonal structure, which highlights regional structure in the dataset.

Third, more data modalities and views help. The best performing models combine inputs from multiple Street View angles and overhead views. There remains much room for improvement, however, and this is where we believe the larger community of researchers can help.

Get Involved
By releasing the Auto Arborist Dataset, we step closer to the goal of affordable urban forest monitoring, enabling the computer vision community to tackle urban forest monitoring at scale for the first time. In the future, we hope to expand coverage to more North American cities (particularly in the South of the US and Mexico) and even worldwide. Further, we are excited to push the dataset to the more fine-grained species level and investigate more nuanced monitoring, including monitoring tree health and growth over time, and studying the effects of environmental factors on urban forests.

For more details, see our CVPR 2022 paper. This dataset is part of Google’s broader efforts to empower cities with data about urban forests, through the Environmental Insights Explorer Tree Canopy Lab and is available on our GitHub repo. If you represent a city that is interested in being included in the dataset, please email [email protected].

Acknowledgements
We would like to thank our co-authors Guanhang Wu, Trevor Edwards, Filip Pavetic, Bo Majewski, Shreyasee Mukherjee, Stanley Chan, John Morgan, Vivek Rathod, and Chris Bauer. We also thank Ruth Alcantara, Tanya Birch, and Dan Morris from Google AI for Nature and Society, John Quintero, Stafford Marquardt, Xiaoqi Yin, Puneet Lall, and Matt Manolides from Google Geo, Karan Gill, Tom Duerig, Abhijit Kundu, David Ross, Vighnesh Birodkar from Google Research (Perception team), and Pietro Perona for their support. This work was supported in part by the Resnick Sustainability Institute and was undertaken while Sara Beery was a Student Researcher at Google.

Categories
Offsites

Quantum Advantage in Learning from Experiments

In efforts to learn about the quantum world, scientists face a big obstacle: their classical experience of the world. Whenever a quantum system is measured, the act of this measurement destroys the “quantumness” of the state. For example, if the quantum state is in a superposition of two locations, where it can seem to be in two places at the same time, once it is measured, it will randomly appear either “here” or “there”, but not both. We only ever see the classical shadows cast by this strange quantum world.

A growing number of experiments are implementing machine learning (ML) algorithms to aid in analyzing data, but these have the same limitations as the people they aim to help: They can’t directly access and learn from quantum information. But what if there were a quantum machine learning algorithm that could directly interact with this quantum data?

In “Quantum Advantage in Learning from Experiments”, a collaboration with researchers at Caltech, Harvard, Berkeley, and Microsoft published in Science, we show that a quantum learning agent can perform exponentially better than a classical learning agent at many tasks. Using Google’s quantum computer, Sycamore, we demonstrate the tremendous advantage that a quantum machine learning (QML) algorithm has over the best possible classical algorithm. Unlike previous quantum advantage demonstrations, no advances in classical computing power could overcome this gap. This is the first demonstration of a provable exponential advantage in learning about quantum systems that is robust even on today’s noisy hardware.

Quantum Speedup
QML combines the best of both quantum computing and the lesser-known field of quantum sensing.

Quantum computers will likely offer exponential improvements over classical systems for certain problems, but to realize their potential, researchers first need to scale up the number of qubits and to improve quantum error correction. What’s more, the exponential speed-up over classical algorithms promised by quantum computers relies on a big, unproven assumption about so-called “complexity classes” of problems — namely, that the class of problems that can be solved on a quantum computer is larger than those that can be solved on a classical computer. It seems like a reasonable assumption, and yet, no one has proven it. Until it’s proven, every claim of quantum advantage will come with an asterisk: that it can do better than any known classical algorithm.

Quantum sensors, on the other hand, are already being used for some high-precision measurements and offer modest (and proven) advantages over classical sensors. Some quantum sensors work by exploiting quantum correlations between particles to extract more information about a system than would otherwise be possible. For example, scientists can use a collection of N atoms to measure aspects of the atoms’ environment like the surrounding magnetic fields. Typically, the sensitivity with which the atoms can measure the field scales with the square root of N. But if one uses quantum entanglement to create a complex web of correlations between the atoms, then one can improve the scaling to be proportional to N. But as with most quantum sensing protocols, this quadratic speed-up over classical sensors is the best one can ever do.
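
Stated compactly (this is standard quantum-metrology shorthand, not notation taken from the paper), the field sensitivity S achievable with N probe atoms scales as:

S_{\mathrm{independent}} \propto \sqrt{N} \quad\text{(unentangled atoms, the standard quantum limit)}, \qquad S_{\mathrm{entangled}} \propto N \quad\text{(entangled atoms, the Heisenberg limit)}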

Enter QML, a technology that straddles the line between quantum computers and quantum sensors. QML algorithms make computations that are aided by quantum data. Instead of measuring the quantum state, a quantum computer can store quantum data and implement a QML algorithm to process the data without collapsing it. And when this data is limited, a QML algorithm can squeeze exponentially more information out of each piece it receives when considering particular tasks.

Comparison of a classical machine learning algorithm and a quantum machine learning algorithm. The classical machine learning algorithm measures a quantum system, then performs classical computations on the classical data it acquires to learn about the system. The quantum machine learning algorithm, on the other hand, interacts with the quantum states produced by the system, giving it a quantum advantage over the CML.

To see how a QML algorithm works, it’s useful to contrast with a standard quantum experiment. If a scientist wants to learn about a quantum system, they might send in a quantum probe, such as an atom or other quantum object whose state is sensitive to the system of interest, let it interact with the system, then measure the probe. They can then design new experiments or make predictions based on the outcome of the measurements. Classical machine learning (CML) algorithms can take over parts of this analysis using an ML model, but the operating principle is the same — it’s a classical device processing classical information.

A QML algorithm instead uses an artificial “quantum learner.” After the quantum learner sends in a probe to interact with the system, it can choose to store the quantum state rather than measure it. Herein lies the power of QML. It can collect multiple copies of these quantum probes, then entangle them to learn more about the system faster.

Suppose, for example, the system of interest produces a quantum superposition state probabilistically by sampling from some distribution of possible states. Each state is composed of n quantum bits, or qubits, where each is a superposition of “0” and “1” — all learners are allowed to know the generic form of the state, but must learn its details.

In a standard experiment, where only classical data is accessible, every measurement provides a snapshot of the distribution of quantum states, but since it’s only a sample, it is necessary to measure many copies of the state to reconstruct it. In fact, it will take on the order of 2^n copies.

A QML agent is more clever. By saving a copy of the n-qubit state, then entangling it with the next copy that comes along, it can learn about the global quantum state more quickly, giving a better idea of what the state looks like sooner.
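
Putting the two scalings side by side (the exponential count above and the roughly linear behavior reported for the QML agent in the experiments below), the number of copies of the n-qubit state needed behaves as:

N_{\mathrm{classical}} = \mathcal{O}\!\left(2^{n}\right) \qquad\text{versus}\qquad N_{\mathrm{quantum}} = \mathcal{O}(n)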

Basic schematic of the QML algorithm. Two copies of a quantum state are saved, then a “Bell measurement” is performed, where each pair is entangled and their correlations measured.

The classical reconstruction is like trying to find an image hiding in a sea of noisy pixels — it could take a very long time to average-out all the noise to know what the image is representing. The quantum reconstruction, on the other hand, uses quantum mechanics to isolate the true image faster by looking for correlations between two different images at once.

Results
To better understand the power of QML, we first looked at three different learning tasks and theoretically proved that in each case, the quantum learning agent would do exponentially better than the classical learning agent. Each task was related to the example given above:

  1. Learning about incompatible observables of the quantum state — i.e., observables that cannot be simultaneously known to arbitrary precision due to the Heisenberg uncertainty principle, like position and momentum. But we showed that this limit can be overcome by entangling multiple copies of a state.
  2. Learning about the dominant components of the quantum state. When noise is present, it can disturb the quantum state. But typically the “principal component” — the part of the superposition with the highest probability — is robust to this noise, so we can still glean information about the original state by finding this dominant part.
  3. Learning about a physical process that acts on a quantum system or probe. Sometimes the state itself is not the object of interest, but a physical process that evolves this state is. We can learn about various fields and interactions by analyzing the evolution of a state over time.

In addition to the theoretical work, we ran some proof-of-principle experiments on the Sycamore quantum processor. We started by implementing a QML algorithm to perform the first task. We fed an unknown quantum mixed state to the algorithm, then asked which of two observables of the state was larger. After training the neural network with simulation data, we found that the quantum learning agent needed exponentially fewer experiments to reach a prediction accuracy of 70% — equating to 10,000 times fewer measurements when the system size was 20 qubits. The total number of qubits used was 40 since two copies were stored at once.

Experimental comparison of QML vs. CML algorithms for predicting a quantum state’s observables. While the number of experiments needed to achieve 70% accuracy with a CML algorithm (“C” above) grows exponentially with the size of the quantum state n, the number of experiments the QML algorithm (“Q”) needs is only linear in n. The dashed line labeled “Rigorous LB (C)” represents the theoretical lower bound (LB) — the best possible performance — of a classical machine learning algorithm.

In a second experiment, relating to task 3 above, we had the algorithm learn about the symmetry of an operator that evolves the quantum state of its qubits. In particular, a quantum state might undergo evolution that is either totally random or random but also time-reversal symmetric, and it can be difficult for a classical learner to tell the difference. In this task, the QML algorithm can separate the operators into two distinct categories, representing two different symmetry classes, while the CML algorithm fails outright. The QML algorithm was completely unsupervised, so this gives us hope that the approach could be used to discover new phenomena without needing to know the right answer beforehand.

Experimental comparison of QML vs. CML algorithms for predicting the symmetry class of an operator. While QML successfully separates the two symmetry classes, the CML fails to accomplish the task.

Conclusion
This experimental work represents the first demonstrated exponential advantage in quantum machine learning. And, unlike a computational advantage, when the number of samples from the quantum state is limited, this type of learning advantage cannot be overcome even with unlimited classical computing resources.

So far, the technique has only been used in a contrived, “proof-of-principle” experiment, where the quantum state is deliberately produced and the researchers pretend not to know what it is. To use these techniques to make quantum-enhanced measurements in a real experiment, we’ll first need to work on current quantum sensor technology and methods to faithfully transfer quantum states to a quantum computer. But the fact that today’s quantum computers can already process this information to squeeze out an exponential advantage in learning bodes well for the future of quantum machine learning.

Acknowledgements
We would like to thank our Quantum Science Communicator Katherine McCormick for writing this blog post. Images reprinted with permission from Huang et al., Science, Vol 376:1182 (2022).

Categories
Misc

Getting datasets labelled

Hello, would you mind sharing your strategies on how you label thousands of images for training for custom datasets you can’t find online?

submitted by /u/jchasinga

Categories
Misc

Artem Cherkasov and Olexandr Isayev on Democratizing Drug Discovery With NVIDIA GPUs

It may seem intuitive that AI and deep learning can speed up workflows — including novel drug discovery, a typically years-long and several-billion-dollar endeavor. But professors Artem Cherkasov and Olexandr Isayev were surprised to find that no recent academic papers provided a comprehensive, global research review of how deep learning and GPU-accelerated computing impact drug discovery…

The post Artem Cherkasov and Olexandr Isayev on Democratizing Drug Discovery With NVIDIA GPUs appeared first on NVIDIA Blog.

Categories
Misc

Make your own neural networks with this Keras cheat sheet to deep learning in Python for beginners, with code samples.

submitted by /u/joanna58

Categories
Misc

Is it against privacy of clients if I have a global tokenizer in Federated Learning (TFF)?

I am currently stuck in a dead end. I am trying to make an image caption generator from a federated approach. My initial idea was to have a different tokenizer for each client. That poses these issues however:

  1. Every client will have a different sized vocabulary, and thus a different shape of y, which will cause issues with the global model configuration.

  2. To counter the above issue, I could make size of y in each client equivalent to the largest size across all clients, and fill the extra columns in each client with 0.
    E.g: [0,1,1,1] mapped to a size of 6 would become [0,1,1,1,0,0]

  3. This brings me to the last possible flaw, which is that the same words in different clients will be having different indices. A word “rock” in client 1 might have an index of 6, while the same can have an index of 9 in another client. While training the global model, it will cause issues since the model is trying to learn different label indices for the same word, which will impact the accuracy?

This brings me to the final question: Is it against the idea of Federated Learning to tokenize all the words of all the training clients in a single tokenizer?

submitted by /u/ChaosAdm

Categories
Misc

Building a Computer Vision Application to Recognize Human Activities

This walkthrough shares how a user can quickly build and deploy a computer vision application with the NVIDIA NGC catalog and Google Vertex AI.

Join us on June 22 for the Build A Computer Vision Application with NVIDIA AI on Google Cloud Vertex AI live webinar, where we walk you step-by-step through using these resources to build your own action recognition application.

Advances in computer vision models are providing deeper insights to make our lives increasingly productive, our communities safer, and our planet cleaner.

We’ve come a long way from object detection that tells us whether a patient is walking or sitting on the floor but can’t alert us if the patient collapsed, for example. New computer vision models are overcoming these types of challenges by processing temporal information and predicting actions.

Building these models from scratch requires AI expertise, large amounts of training data, and loads of compute power. Fortunately, transfer learning enables you to build custom models with a fraction of these resources.

In this post, we walk through each step to build and deploy a computer vision application with NVIDIA AI software from the NGC catalog and run it on Google Cloud Vertex AI Workbench.

Software and infrastructure

The NGC catalog provides GPU-optimized AI frameworks, training and inference SDKs, and pretrained models that can be easily deployed through ready-to-use Jupyter notebooks.

Google Cloud Vertex AI Workbench is a single development environment for the entire AI workflow. It accelerates data engineering by deeply integrating with all of the services necessary to rapidly build and deploy models in production.

Accelerating application development by taking care of the plumbing

NVIDIA and Google Cloud have partnered to enable easy deployment of the software and models from the NGC catalog to Vertex AI Workbench. It’s made easy through ready-to-use Jupyter notebooks with a single click, instead of a dozen complex steps.

This quick deploy feature launches the JupyterLab instance on Vertex AI with an optimal configuration, preloads the software dependencies, and downloads the NGC notebook in one go. This enables you to start executing the code right away without needing any expertise to configure the development environment.

A Google Cloud account with free credits is plenty to build and run this application.

Live webinar

You can also join us on June 22 during our live webinar where we will walk you step-by-step through how to build your computer vision application that recognizes human action, using software from the NGC catalog and Vertex AI Workbench.

Get started

To follow along, you need the following resources:

Software

  • NVIDIA TAO Toolkit:  An AI-model-adaptation framework to fine-tune pretrained models with custom data and produce highly accurate computer vision, speech, and language understanding models.
  • Action Recognition model:  A five-class action recognition network to recognize what people do in an image.
  • Action Recognition Jupyter Notebook:  An example use case of Action_Recognition_Net using TAO Toolkit.

When you sign into the NGC catalog, you’ll see the curated content.

Screenshot of the NGC catalog.
Figure 1. NGC catalog

All Jupyter notebooks on NGC are hosted under Resources on the left pane. Find the TAO Action Recognition notebook.

There are a couple of ways to get started using the sample Jupyter notebooks from this resource:

Screenshot of the collection of AI software to run using the quick deploy feature.
Figure 2. Vertex AI Workbench Collection NGC catalog page

Take the easy route with quick deploy. It takes care of the end-to-end setup requirements like fetching the Jupyter notebook, configuring the GPU instance, installing dependencies, and running a JupyterLab interface to quickly get started with the development! Try it out by choosing Deploy on Vertex AI.

You see a window with detailed information about the resource and AI platform. The Deploy option leads to the Google Cloud Vertex AI platform Workbench.

The following information is preconfigured but can be customized, depending on the requirements of the resource:

  • Name of the notebook
  • Region
  • Docker container environment
  • Machine type, GPU type, Number of GPUs
  • Disk type and data size
Screenshot of the Google Cloud Vertex AI portal with preconfigured instance settings.
Figure 3. Google Cloud interface

You can keep the recommended configuration as-is or change as required before choosing Create. Creating the GPU compute instance and setting up the JupyterLab environment takes about a couple of minutes.

To start up the interface, choose Open, Open JupyterLab. The instance loads up with the resources (Jupyter notebooks) pulled and the environment set up as a kernel in the JupyterLab.

Screenshot of the JupyterLab environment with all the required resource icons.
Figure 4. Action recognition resource in the Vertex AI instance

The JupyterLab interface pulls the resources (custom container and Jupyter notebooks) from NGC. Select the custom kernel tao-toolkit-pyt in the JupyterLab interface.

Run the notebook

This action recognition Jupyter notebook showcases how to fine-tune an action recognition model that identifies five human actions. You use it for two actions in this dataset: fall-floor and ride-bike.

The notebook makes use of the HMDB51 dataset to fine-tune a pretrained model loaded from the NGC catalog. The notebook also showcases how to run inference on the trained model and deploy it into the real-time video analytics framework NVIDIA DeepStream.

Set up the env variables

Set the HOST_DATA_DIR, HOST_SPECS_DIR, HOST_RESULTS_DIR, and KEY environment variables, then execute the cell. The data, specs, results folder, and Jupyter notebook are inside the action-recognition-net folder.

%env HOST_DATA_DIR=/absolute/path/to/your/host/data
# note: You could set the HOST_SPECS_DIR to folder of the experiments specs downloaded with the notebook
%env HOST_SPECS_DIR=/absolute/path/to/your/host/specs
%env HOST_RESULTS_DIR=/absolute/path/to/your/host/results

# Set your encryption key, and use the same key for all commands
%env KEY = nvidia_tao

Run the subsequent cells to download the HMDB51 dataset and unzip it into $HOST_DATA_DIR. The preprocessing scripts clip the video and generate optical flow out of it, which gets stored in the $HOST_DATA_DIR/processed_data directory.

!wget -P $HOST_DATA_DIR "https://github.com/shokoufeh-monjezi/TAOData/releases/download/v1.0/hmdb51_org.zip"
!mkdir -p $HOST_DATA_DIR/videos && unzip  $HOST_DATA_DIR/hmdb51_org.zip -d $HOST_DATA_DIR/videos
!mkdir -p $HOST_DATA_DIR/raw_data
!unzip $HOST_DATA_DIR/videos/hmdb51_org/fall_floor.zip -d $HOST_DATA_DIR/raw_data
!unzip $HOST_DATA_DIR/videos/hmdb51_org/ride_bike.zip -d $HOST_DATA_DIR/raw_data

Finally, split the dataset into train and test sets and verify the contents by running the following code cell, as given in the Jupyter notebook:

# download the split files and unrar
!wget -P $HOST_DATA_DIR https://github.com/shokoufeh-monjezi/TAOData/releases/download/v1.0/test_train_splits.zip

!mkdir -p $HOST_DATA_DIR/splits && unzip  $HOST_DATA_DIR/test_train_splits.zip -d $HOST_DATA_DIR/splits

# run split_HMDB to generate training split

!cd tao_toolkit_recipes/tao_action_recognition/data_generation/ && python3 ./split_dataset.py $HOST_DATA_DIR/processed_data $HOST_DATA_DIR/splits/test_train_splits/testTrainMulti_7030_splits $HOST_DATA_DIR/train  $HOST_DATA_DIR/test

Verify the final test and train datasets:

!ls -l $HOST_DATA_DIR/train
!ls -l $HOST_DATA_DIR/train/ride_bike
!ls -l $HOST_DATA_DIR/test
!ls -l $HOST_DATA_DIR/test/ride_bike
Four photos of falling and bike riding for model training and testing
Figure 5. Example of data for training models and recognizing various actions

Download the pretrained model

You use the NGC CLI to get the pre-trained models. For more information, go to NGC and on the navigation bar, choose SETUP.

!ngc registry model download-version "nvidia/tao/actionrecognitionnet:trainable_v1.0" --dest $HOST_RESULTS_DIR/pretrained

Check the downloaded models. You should see resnet18_3d_rgb_hmdb5_32.tlt and resnet18_2d_rgb_hmdb5_32.tlt.

print("Check that model is downloaded into dir.")
!ls -l $HOST_RESULTS_DIR/pretrained/actionrecognitionnet_vtrainable_v1.0

Training specification

In the specs folder, you can find different specs files related to train, evaluate, infer, and export functions. Choose the train_rgb_3d_finetune.yaml file and you can change hyperparameters, such as the number of epochs, in this specs file.

Make sure that you edit the path in the specs file based on the path to the data and results folders in your system.
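
If you would rather make these edits programmatically from the notebook, a sketch like the following could work. The key names (train_config, dataset_config, and so on) are hypothetical; open the downloaded train_rgb_3d_finetune.yaml to see the real structure before changing anything.

import os
import yaml

spec_path = os.path.join(os.environ["HOST_SPECS_DIR"], "train_rgb_3d_finetune.yaml")

with open(spec_path) as f:
    spec = yaml.safe_load(f)

# The key names below are assumptions for illustration; use the actual keys from the spec file.
spec["train_config"]["num_epochs"] = 20
spec["dataset_config"]["train_dataset_dir"] = os.path.join(os.environ["HOST_DATA_DIR"], "train")

with open(spec_path, "w") as f:
    yaml.safe_dump(spec, f)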

Train the model

We provide a pretrained RGB-only model trained on the HMDB5 dataset. With the pretrained model, you can even get better accuracy with fewer epochs.

print("Train RGB only model with PTM")
!action_recognition train 
                  -e $HOST_SPECS_DIR/train_rgb_3d_finetune.yaml 
                  -r $HOST_RESULTS_DIR/rgb_3d_ptm 
                  -k $KEY 
                  model_config.rgb_pretrained_model_path=$HOST_RESULTS_DIR/pretrained/actionrecognitionnet_vtrainable_v1.0/resnet18_3d_rgb_hmdb5_32.tlt  
                  model_config.rgb_pretrained_num_classes=5

Evaluate the model

We provide two different sampling strategies to evaluate the pretrained model on video clips.

  • center mode: Pick up the middle frames of a sequence to do inference. For example, if the model requires 32 frames as input and a video clip has 128 frames, then choose the frames from index 48 to index 79 to do the inference.
  2. conv mode: Sample 10 sequences out of a single video and do inference. The final results are averaged (see the sketch after this list).
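
Here is a minimal sketch of how frame indices could be chosen under the two strategies; it illustrates the arithmetic described above and is not the TAO Toolkit's actual sampling code.

def center_window(num_frames, window=32):
    """Center mode: indices of the middle `window` frames of a clip.
    For a 128-frame clip and a 32-frame model input this returns 48..79."""
    start = (num_frames - window) // 2
    return list(range(start, start + window))

def conv_starts(num_frames, window=32, n_segments=10):
    """Conv mode: evenly spaced start indices for `n_segments` windows; predictions are averaged."""
    if num_frames <= window:
        return [0]
    step = (num_frames - window) / (n_segments - 1)
    return [round(i * step) for i in range(n_segments)]

assert center_window(128) == list(range(48, 80))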

Next, evaluate the RGB model trained with PTM:

!action_recognition evaluate \
                    -e $HOST_SPECS_DIR/evaluate_rgb.yaml \
                    -k $KEY \
                    model=$HOST_RESULTS_DIR/rgb_3d_ptm/rgb_only_model.tlt \
                    batch_size=1 \
                    test_dataset_dir=$HOST_DATA_DIR/test \
                    video_eval_mode=center

Inference

In this section, you run the action recognition inference tool to generate inferences with the trained RGB models and print the results.

There are also two modes for inference just like evaluation: center mode and conv mode. The final output shows each input sequence label in the videos: [video_sample_path] [labels list for sequences in the video sample]          

!action_recognition inference \
           -e $HOST_SPECS_DIR/infer_rgb.yaml \
           -k $KEY \
           model=$HOST_RESULTS_DIR/rgb_3d_ptm/rgb_only_model.tlt \
           inference_dataset_dir=$HOST_DATA_DIR/test/ride_bike \
           video_inf_mode=center

You can see an example of the results of the inference function on this dataset.

Screenshot of the code output for infer results.
Figure 6. Output results identifying human action (‘ride bike’)

Conclusion

NVIDIA TAO and the pretrained models help you accelerate your custom model development by eliminating the need for building models from scratch.

With the NGC catalog’s quick deploy feature, you can get access to an environment to build and run your computer vision application in a matter of minutes. This enables you to focus on development and avoid spending time on infrastructure setup.

Categories
Misc

Modernize Your Network Using NetDevOps

In part 2 of this series, we focus on solutions that optimize and modernize data center network operations.

In the first installment, Optimizing Your Data Center Network, we looked at updating your networking infrastructure and protocols.

NetDevOps is an ideology that has been permeating the IT infrastructure community for the past five years. In practice, it offers many opportunities to optimize infrastructure operations.

We will discuss some NetDevOps practices that can be applied to your operational workflows.

These include:

  • Centralizing configuration management through Infrastructure as Code (IaC).
  • Automating repetitive operations tasks.
  • Using automation to implement standardization and consistency in configurations.
  • Testing and validating changes using networking digital twin simulations.

Centralizing configuration management with IaC

The principles behind IaC have long been used in software development to let developers contribute code to the same project in parallel. They also create a centralized repository where the code project—including networking configurations for servers, NICs, routers, and switches—can reside and act as a single source of truth.

The decentralized aspect of configuration management makes it fundamentally inefficient to enforce standardization. It also makes it difficult to determine the correct configuration or track changes. 

Using IaC with source control management software like Git can help resolve issues, ensuring the correct network configurations and code are available to all admins, servers, and switches.

Automating repetitive operations tasks

In large-scale infrastructures, components of the configuration will be the same regardless of the device. Configurations like syslog server, NTP server, SNMP settings, and other management settings can be automated with technology such as Zero Touch Provisioning (ZTP). ZTP can apply configurations to a switch on boot to reduce the errors that may happen with manual configurations across many devices. Applying standard configurations and executing repetitive tasks are perfect for ZTP, as they can be enforced consistently across every device.

Leveraging automation to implement standardization in configurations

Automation normally relies on an external tool to drive configurations after the device has fully booted. Automation is more dynamic and can be applied multiple times in a device’s operational cycle, whereas ZTP is used only during the first boot of each device.

Automation tools such as Ansible and Salt apply configurations at scale using templating technologies and scripting. These tools simplify infrastructure management by building standardized templates and relying only on key/value pair data structures to populate the templates. This way, an operator can be confident of the configurations and focus on validating that the correct configurations are going to the right devices.
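
As a rough illustration of this template-plus-data pattern, the Python sketch below renders a switch interface snippet with Jinja2, the templating engine behind tools like Ansible and Salt. The interface names and settings are made up for the example and are not tied to any particular network OS.

from jinja2 import Template  # templating engine used by Ansible and Salt

# Illustrative interface template; real templates depend on the network OS syntax.
INTERFACE_TEMPLATE = """\
{% for iface in interfaces %}
interface {{ iface.name }}
  description {{ iface.description }}
  mtu {{ iface.mtu }}
{% endfor %}
"""

# Key/value data an operator would keep per device (for example, in YAML under Git).
device_vars = {
    "interfaces": [
        {"name": "swp1", "description": "uplink-to-spine01", "mtu": 9216},
        {"name": "swp2", "description": "uplink-to-spine02", "mtu": 9216},
    ]
}

print(Template(INTERFACE_TEMPLATE, trim_blocks=True, lstrip_blocks=True).render(**device_vars))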

Additionally, automation tools can apply configurations at scale. Any fixes for misconfigurations or bugs can be confidently applied to thousands of nodes with minimal effort and no risk of node misconfiguration due to a mistyped command or distracted administrator.

Testing and validating changes in networking Digital Twin simulations

When using automation to apply configurations at scale to multiple nodes, it is critical to understand the larger impact before committing changes. Applying changes to a few nodes as a test often doesn’t reveal what will happen when the changes are applied to every node. The NVIDIA Air infrastructure simulation platform creates a digital twin of the environment for users to test all changes before deploying them.

With a digital twin you can run automation in a safe sandbox, to ensure the changes will not cause any unforeseen outages. Coupling the digital twin with a validation technology, such as NVIDIA NetQ, can create an automated testing pipeline to ensure that all configuration changes do exactly what is expected from each change window.

Conclusion

This series covered ways to optimize your data center network. The first approach was through modernizing the network architecture protocol. The second post focused on delivering operational efficiency gains through NetDevOps. 

Optimization is critical to maintaining a high level of service, peak efficiency, and productivity. By leveraging the topics discussed, you’ll be able to optimize your data center network to be a more resilient platform that improves the overall performance of your business and saves you money.

I encourage you to find additional ways to streamline data center operations and drive further optimization by exploring the additional resource links below.

Categories
Misc

Just Released: cuTENSOR V1.5

The high-performance CUDA library for tensor primitives now features updates to increase support, fix bugs, stop false-positive CUDA API errors, and more.