Categories
Misc

Pinterest Trains Visual Search Faster with Optimized Architecture on NVIDIA GPUs

Pinterest now has more than 440 million reasons to offer the best visual search experience, because that is how many monthly active users its popular image sharing and social media service now counts. Visual search enables Pinterest users to search for images using text, screenshots or camera photos.


Categories
Misc

NVIDIA SimNet v20.12 Released

With this release, use cases such as heat sinks, data center cooling, aerodynamics and deformation of solids in the linear elastic regime can be solved.

NVIDIA recently announced the release of SimNet v20.12 with support for new physics such as Fluid Mechanics, Linear Elasticity, and Conductive as well as Convective Heat Transfer. Systems governed by Ordinary Differential Equations (ODEs) as well as Partial Differential Equations (PDEs) can now be solved. With this release, use cases such as heat sinks, data center cooling, aerodynamics, and deformation of solids in the linear elastic regime can be solved.

Previously announced in September, NVIDIA SimNet is a Physics-Informed Neural Networks (PINNs) toolkit for students and researchers who are either looking to get started with AI-driven physics simulations or looking to leverage a powerful framework to implement their domain knowledge to solve complex nonlinear physics problems with real-world applications.

SimNet v20.12 highlights 

Multi-parameter training of Complex Geometries and Physics:  

As a result of enhancements in network architectures as well as performance improvements, SimNet v20.12 converges to a lower loss faster. This enables training on several parameters in a single run. For a 10-parameter Limerock, training and inference for 59,049 configurations (3 values for each design parameter) took 1,000 V100 GPU hours. For the same number of solver runs, the solver would take over 18.4 million hours (with 26 hours/configuration on a 12-core workstation).
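For reference, sweeping 3 values for each of the 10 design parameters gives $3^{10} = 59{,}049$ configurations.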

Linear Elasticity in Solids:  

Linear elastic solid deformation is now included in the release, in both the Navier-Cauchy and equilibrium forms. The solution shows good agreement with finite element results.

The stresses from SimNet's linear elasticity formulation were used in a digital twin model, developed by the University of Central Florida, that uses an RNN to model fatigue crack growth in an aircraft panel.

Improved STL geometry library: 

The PySDF library for STL geometries has been enhanced, delivering roughly 10x higher performance and better accuracy for complex geometries.

Integral form of Partial Differential Equations:  

Some physics problems have no classical PDE (or strong) form but only a variational (or weak) form. This requires handling the PDEs in a form other than their original (classical) one, especially for interface problems, concave domains, singular problems, and similar cases. In SimNet, PDEs can be solved not only in their strong form, but also in their weak form.

For example, a point source represented by a Dirac delta function cannot be handled by PINNs based on the differential (strong) form of the equations, but an integral (weak) form can capture the singular behavior at the center.
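As a simple illustration (a generic textbook example, not taken from the SimNet release notes), consider the Poisson problem with a point source at $x_0$:

$$-\nabla^2 u = \delta(x - x_0) \;\text{ in } \Omega, \qquad u = 0 \;\text{ on } \partial\Omega.$$

The strong-form residual is not defined pointwise at $x_0$, but multiplying by a smooth test function $v$ and integrating by parts yields the weak form

$$\int_\Omega \nabla u \cdot \nabla v \, dx = v(x_0) \quad \text{for all admissible } v,$$

which involves only integrals of $u$ and remains well defined, so a loss built from such integral residuals can capture the singular behavior at the source.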

Strong Scaling Performance:   

For the multi-GPU cases, the learning rate is gradually increased from the baseline value. This allows the model to train without diverging early on and to converge faster, as a result of the increased global batch size coupled with the increased learning rate. For the NVSwitch heat sink case, the loss function evolution as the number of GPUs is increased from 1 to 16 shows progressive scaling, from 2x for the 2-GPU case to 8x for the 16-GPU case.
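A common heuristic behind this kind of schedule (a general rule of thumb, not necessarily the exact settings used in SimNet) is the linear scaling rule: with $N$ GPUs at a fixed per-GPU batch size, the global batch size grows by a factor of $N$, so the target learning rate is scaled accordingly,

$$\eta_N = N \, \eta_1,$$

and is reached by a gradual warm-up over the first iterations to avoid early divergence.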

SimNet in other news / events: 

Read the paper, NVIDIA SimNet: an AI-accelerated multi-physics simulation framework, here.

Give SimNet v20.12 a try by requesting access today. 

Categories
Misc

Refreshing a Live Service Game

We talked to Haiyong Qian, NetEase Game Engine Development Researcher and Manager of NetEase Thunder Fire Games Technical Center, to see what he’s learned as the Justice team added NVIDIA ray-tracing solutions to their development pipeline.

How NetEase Thunder Fire Games keeps its Massively Multiplayer Online Game (MMO) "Justice" looking new years after release.

Delivering an endless stream of content to players in a live service game is an enormous undertaking. Managing that responsibility while staying graphically competitive is a herculean feat. 

The fidelity bar is constantly being raised. Most games released in 2018 didn’t support real-time ray-tracing. Now, it’s a feature that players expect in cutting-edge games, and it’s been integrated into a wide range of titles. Justice – NetEase’s popular Chinese MMO – runs on an engine that debuted in 2012, but the game is beautiful by 2020 standards. This is thanks to talented artists, smart engine design, and the integration of real-time ray tracing and DLSS.

We talked with Haiyong Qian, NetEase Game Engine Development Researcher and Manager of NetEase Thunder Fire Games Technical Center, to see what he’s learned as the Justice team added NVIDIA ray-tracing solutions to their development pipeline.

NVIDIA: What is the development team size for Justice?

Qian: There are more than 300 members on the whole development team, and 20 members on the game engine tech team.

NVIDIA: Why did you decide to add ray traced effects into the game?

Qian: Applying ray tracing technology to the real-time rendering field, especially gaming, has always been a dream of our game developers, but it was impossible to achieve before due to performance limitations. In 2018, NVIDIA launched the first RTX GPUs, which paved the way for this dream to come true, and we did not hesitate to try it in Justice.

NVIDIA: A lot of developers starting out with real-time ray tracing struggle with performance because they try to make everything reflective. Do you have any advice on materials to use when building an environment that will be ray traced?

Qian: There are many optimization methods. For example, materials with high roughness in the scene do not need to participate in ray tracing. In addition, if the game engine is based on a deferred rendering architecture, rays can be emitted in screen space based on the G-buffer information to reduce the number of ray bounces.

NVIDIA: How long did it take to add RTXGI to your game? What does RTXGI do to improve the look of the game?

Qian: Before integrating RTXGI, we had already completed the DX12 upgrade of our game engine and the RT & DLSS integration. With that work done, adding RTXGI to the game was an easy task, which took about two weeks to finish. RTXGI solves some problems of traditional GI, such as light leaking and excessively long baking times, and it supports dynamic light sources, which greatly improves the expressiveness of the scene.

NVIDIA: What were your team’s biggest personal learnings about real-time ray tracing from working on Justice?

Qian: First of all, a technological breakthrough requires sufficient accumulated expertise. Secondly, the combined team effort is very important. Without the close cooperation among our team members and the dedicated collaboration with the NVIDIA China team, this would not have been possible.

NVIDIA: How were you able to balance computationally expensive real-time ray tracing features with performance?

Qian: Justice is an open-world MMO game. RT features are now enabled in several suitable scenes, which achieves a good balance between image quality and performance. And of course, with the help of the killer app, DLSS, we will gradually enable RT in more and more scenes.

NVIDIA: Did you experience any bottlenecks or challenges when incorporating real-time ray tracing into the demo? If so, how did you overcome them?

Qian: As the first in-house game engine in China to integrate RTX features, we faced tons of difficulties and challenges. For more than two months, our entire team basically slept only three or four hours a day; there was an endless stream of tech issues to solve. I would like to take this chance to thank the NVIDIA China team for their generous help in overcoming these difficulties one by one, which allowed us to reach today's accomplishment.

NVIDIA: What has been made easier for your team with the integration of real-time ray tracing and DLSS into your pipeline?

Qian: It's the advanced architecture of our engine. We could adopt RT and DLSS with only minor modifications to the render pipeline. Instead, the DX12 API upgrade accounted for the largest part of the workload during the whole RTX development process.

NVIDIA: How are real-time ray tracing and DLSS changing game development?

Qian: From the perspective of artists, it brings a brand-new content creation pipeline and better, richer visual quality. And from a game design perspective, RT can bring whole new gameplay elements.

NVIDIA: How has your audience responded to the new look of your game after real-time ray tracing and DLSS have been added?

Qian: Players are very excited. They have fully embraced the performance of RT and DLSS; you can see this in the Weibo and Baidu Tieba gamer communities, and there is also a lot of feedback from overseas players on YouTube. Of course, after our RTX content was shown on stage at GTC China 2018 and in Jensen's CES 2019 keynote, we were confident that Chinese game content would be accepted worldwide.

NVIDIA: If you had built the game from the ground up with real-time ray tracing and DLSS in mind, what would you have done differently?

Qian: We would probably have considered DX12 API support in our engine from the very beginning.

NVIDIA: Are you planning to release any other games with real-time ray tracing and DLSS?

Qian: Yes, there are several games under development by NetEase Thunder Fire studio which will feature RTX technologies. Please stay tuned. 

NVIDIA: What real-time ray tracing effect in Justice are you most excited for your players to see?

Qian: All the RT features bring a more realistic representation of the game world. The most exciting one among them must be ray-traced reflections.

NVIDIA: What advice would you give to other developers who are building live service games, and want to keep their games looking competitive graphically?

Qian: Our strategy is to provide players with the best experience, regardless of whether the game is still in development or has been released to the public. Therefore, as long as a technology can enhance the player's gaming experience, we will go all out to implement it in the game. And of course, first of all, you should have a good engine with relatively good extensibility, because none of us knows what advanced technology will look like in the future.

NVIDIA: Can you talk about any future plan you want to incorporate NVIDIA technology into Justice (such as NVIDIA Real-Time Denoiser, RTX Direct Illumination, etc.)?

Qian: In terms of technology, we have always been radical, and we will constantly push various new tech features, including those you mentioned. As long as these technologies can improve our players' experience, we'll work on them.

NVIDIA: What made you decide to integrate the latest ray tracing technology that no games had tried before, e.g., releasing the first real-time RTX demo in China and the first RTXGI-powered game in the world?

Qian: I will summarize two main points: 1) it fulfills the dream of applying real-time ray tracing to games; 2) we think these technologies can bring our players a better game experience.

NVIDIA: Adding ray tracing effects into an in-house game engine could be more challenging than using existing commercial engines. If so, what are the challenges and what are the strengths of Justice’s engine?

Qian: I think the difficulties are the same, but with commercial engines those difficulties have already been solved by others, while for an in-house engine we must overcome them ourselves. As I said before, the biggest challenge was upgrading the engine to DX12, because when we designed this engine eight years ago, DX12 had not yet been released and its features were unforeseeable at the time. Another big challenge was how to balance RT effects and performance. Fortunately, our team has very rich experience in independent research, and our engine architecture offers full freedom for horizontal and vertical expansion. The NVIDIA China content team also gave us very strong support. Eventually, these tasks were successfully accomplished.

Categories
Misc

Deep regression

Have people been using deep learning to do regression? I noticed
that fitting polynomials using least squares leads to much better
accuracy! Is there any rule of thumb to get arbitrary accuracy with
deep regression?

submitted by /u/matibilkis


Categories
Misc

tutorials out there on object detection and counting using keras in R?

Hello All,

I am new to TensorFlow and I have a problem wherein I need to count boats on a lake using Keras. I have seen this done in two separate papers now, one counting whales and another counting ships in the ocean. However, both are using Python. While I am not opposed to learning another language, I was curious if there are any tutorials out there about using Keras to count objects in R. Does anyone know of anything like this that I could read over? At the moment I am stuck with either trying to muddle my way through building a CNN from scratch without any guidance or learning a new language, neither of which is something I am particularly excited about tackling.

Any help would be greatly appreciated.

submitted by /u/mthompson2100


Categories
Misc

CUDA 11.2 Introduces Improved User Experience and Application Performance

CUDA 11.2 includes improved user experience and application performance through a combination of driver/toolkit compatibility enhancements, a new memory suballocator feature, and compiler upgrades.

CUDA Toolkit is a complete, fully-featured software development platform for building GPU-accelerated applications, providing all the components needed to develop apps targeting every NVIDIA GPU platform. 

CUDA 11 announced support for the new NVIDIA A100 based on the NVIDIA Ampere architecture, and CUDA 11.1 delivered support for NVIDIA GeForce RTX 30 Series and Quadro RTX Series GPU platforms. 

Today, CUDA 11.2 is introducing improved user experience and application performance through a combination of driver/toolkit compatibility enhancements, a new memory suballocator feature, and compiler enhancements, including an LLVM upgrade.

This new 11.2 release also delivers programming model updates to CUDA Graphs and Cooperative Groups, as well as expanded support for the latest generation of operating systems and compilers.

We describe some of these new features in more detail in a new blog, Enhancing Memory Allocation with New NVIDIA CUDA 11.2 Features, and we will publish additional blogs on compiler enhancements shortly. Follow all CUDA Developer Blogs here.
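For readers who want a feel for the new stream-ordered memory allocator, the sketch below is a minimal, hedged illustration of the cudaMallocAsync/cudaFreeAsync calling pattern introduced in CUDA 11.2; the kernel, buffer size, and names are made up for this example, and error checking is omitted.

// Minimal sketch of CUDA 11.2 stream-ordered allocation (cudaMallocAsync / cudaFreeAsync).
// Illustrative only: the kernel and buffer size are hypothetical, and error checks are omitted.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;  // trivial work on the stream-ordered buffer
}

int main() {
    const int n = 1 << 20;
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    float *d_buf = nullptr;
    // The allocation is ordered on the stream and served from the driver's memory pool
    // (the suballocator), rather than by a synchronizing cudaMalloc call.
    cudaMallocAsync(&d_buf, n * sizeof(float), stream);
    cudaMemsetAsync(d_buf, 0, n * sizeof(float), stream);

    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n);

    // The free is also stream-ordered; the memory can be reused by later allocations.
    cudaFreeAsync(d_buf, stream);
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);

    printf("done\n");
    return 0;
}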

Download CUDA 11.2 Toolkit today.

View All CUDA DevBlogs.

Watch [GTC Fall Session] CUDA New Features and Beyond: Ampere Programming for Developers

Categories
Misc

Nsight Compute 2020.3 Simplifies CUDA Kernel Profiling and Optimization

The 2020.3 release of NVIDIA Nsight Compute included in CUDA Toolkit 11.2 introduces several new features that simplify the process of CUDA kernel profiling and optimization.

The 2020.3 release of NVIDIA Nsight Compute included in CUDA Toolkit 11.2 introduces several new features that simplify the process of CUDA kernel profiling and optimization.

Profile Series

The new Profile Series feature allows developers to configure ranges for multiple kernel parameters. Nsight Compute will automatically iterate through the ranges and profile each combination to help you find the best configuration. These parameters include the number of registers per thread, shared memory sizes, and the shared memory configuration. This automates a process that previously would need manual support, and can provide optimized performance configurations with minimal changes to source code.

The Profile Series configuration is available in the UI’s Interactive Profiling activity.

Import Source

This highly requested feature enables users to archive source files within their Nsight Compute results. It allows any user with access to the results to resolve performance data to lines in the source code, even if they don't have access to the original source files. Sharing results with teammates and archiving them for future analysis are just a couple of uses for this new feature. Users can import source files with the --import-source command-line option or via the UI when configuring the profile.

Source Files can also be imported later via the Profile Menu.

Additionally, there are several other new capabilities available in this release. These include Memory Allocation Tracking, support for derived metrics, and additional configurations and advice for the recently released Application Replay feature.

For complete details, check out the Nsight Compute Release Notes.

Download Nsight Compute 2020.3 and check out featured spotlight video demonstrations on Roofline Analysis and Application Replay

View all Nsight DevBlogs.

Categories
Misc

libcu++ Open-Source GPU-enabled C++ Standard Library Updated

libcu++, the NVIDIA C++ Standard Library, provides a C++ Standard Library for your entire system which can be used in and between CPU and GPU codes.

libcu++, the NVIDIA C++ Standard Library, provides a C++ Standard Library for your entire system which can be used in and between CPU and GPU codes. The NVIDIA C++ Standard Library is an open source project. 
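As a minimal sketch of what that heterogeneity looks like in practice (an example written for this post, not an official libcu++ sample), the same cuda::std::atomic object can be updated from both device and host code when it lives in managed memory:

// Minimal sketch: one cuda::std::atomic<int> counter touched by both GPU and CPU code.
// Not an official libcu++ sample; error checking is omitted and a GPU with managed
// memory support is assumed.
#include <cuda/std/atomic>
#include <cuda_runtime.h>
#include <cstdio>
#include <new>

__global__ void bump(cuda::std::atomic<int> *counter) {
    counter->fetch_add(1, cuda::std::memory_order_relaxed);  // device-side increment
}

int main() {
    cuda::std::atomic<int> *counter = nullptr;
    cudaMallocManaged(&counter, sizeof(*counter));
    new (counter) cuda::std::atomic<int>(0);  // construct the atomic in managed memory

    bump<<<4, 64>>>(counter);                 // 256 device-side increments
    cudaDeviceSynchronize();

    counter->fetch_add(1, cuda::std::memory_order_relaxed);  // host-side increment
    printf("count = %d\n", counter->load());                 // expected: 257

    cudaFree(counter);
    return 0;
}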

Version 1.4.0 of libcu++ is a major release providing several feature enhancements and bug fixes. It adds several new features, including new NVCC + MSVC support and backports of C++20 and C++17 features to C++14.

Other enhancements include improved and reorganized documentation, atomics decoupled from the host Standard Library in MSVC, and revamped examples and benchmarks.

Additional information, examples and documentation can be found below.

libcu++ is available on GitHub and is included in the NVIDIA HPC SDK and the CUDA Toolkit.

Learn more:

Categories
Misc

torch 0.2.0 – Initial JIT support and many bug fixes

We are happy to announce that version 0.2.0 of torch just landed on CRAN.

This release includes many bug fixes and some nice new features
that we will present in this blog post. You can see the full
changelog in the NEWS.md
file.

The features that we will discuss in detail are:

  • Initial support for JIT tracing
  • Multi-worker dataloaders
  • Print methods for nn_modules

Multi-worker dataloaders

dataloaders now respond to the num_workers argument and will run
the pre-processing in parallel workers.

For example, say we have the following dummy dataset that does a
long computation:

library(torch)

dat <- dataset(
  "mydataset",
  initialize = function(time, len = 10) {
    self$time <- time
    self$len <- len
  },
  .getitem = function(i) {
    Sys.sleep(self$time)
    torch_randn(1)
  },
  .length = function() {
    self$len
  }
)

ds <- dat(1)
system.time(ds[1])
   user  system elapsed
  0.029   0.005   1.027

We will now create two dataloaders, one that executes
sequentially and another executing in parallel.

seq_dl <- dataloader(ds, batch_size = 5)
par_dl <- dataloader(ds, batch_size = 5, num_workers = 2)

We can now compare the time it takes to process two batches
sequentially to the time it takes in parallel:

seq_it <- dataloader_make_iter(seq_dl)
par_it <- dataloader_make_iter(par_dl)

two_batches <- function(it) {
  dataloader_next(it)
  dataloader_next(it)
  "ok"
}

system.time(two_batches(seq_it))
system.time(two_batches(par_it))
   user  system elapsed
  0.098   0.032  10.086
   user  system elapsed
  0.065   0.008   5.134

Note that it is batches that are obtained in parallel, not individual observations. This way, we will be able to support datasets with variable batch sizes in the future.

Using multiple workers is not necessarily
faster than serial execution because there’s a considerable
overhead when passing tensors from a worker to the main session as
well as when initializing the workers.

This feature is enabled by the powerful callr package and works on all operating systems supported by torch. callr lets us create persistent R sessions, so we only pay the overhead of transferring potentially large dataset objects to workers once.

In the process of implementing this feature we have made
dataloaders behave like coro iterators. This means that
you can now use coro’s syntax for looping
through the dataloaders:

coro::loop(for (batch in par_dl) {
  print(batch$shape)
})
[1] 5 1
[1] 5 1

This is the first torch release including the multi-worker
dataloaders feature, and you might run into edge cases when using
it. Do let us know if you find any problems.

Initial JIT support

Programs that make use of the torch package are inevitably R programs and thus always need an R installation in order to execute.

As of version 0.2.0, torch allows users to JIT trace torch R functions into TorchScript. JIT (just-in-time) tracing will invoke an R function with example inputs, record all operations that occurred when the function was run, and return a script_function object containing the TorchScript representation.

The nice thing about this is that TorchScript programs are
easily serializable, optimizable, and they can be loaded by another
program written in PyTorch or LibTorch without requiring any R
dependency.

Suppose you have the following R function that takes a tensor,
and does a matrix multiplication with a fixed weight matrix and
then adds a bias term:

w <- torch_randn(10, 1)
b <- torch_randn(1)

fn <- function(x) {
  a <- torch_mm(x, w)
  a + b
}

This function can be JIT-traced into TorchScript with jit_trace
by passing the function and example inputs:

x <- torch_ones(2, 10)
tr_fn <- jit_trace(fn, x)
tr_fn(x)
torch_tensor
-0.6880
-0.6880
[ CPUFloatType{2,1} ]

Now all torch operations that happened when computing the result
of this function were traced and transformed into a graph:

tr_fn$graph
graph(%0 : Float(2:10, 10:1, requires_grad=0, device=cpu)):
  %1 : Float(10:1, 1:1, requires_grad=0, device=cpu) = prim::Constant[value=-0.3532 0.6490 -0.9255 0.9452 -1.2844 0.3011 0.4590 -0.2026 -1.2983 1.5800 [ CPUFloatType{10,1} ]]()
  %2 : Float(2:1, 1:1, requires_grad=0, device=cpu) = aten::mm(%0, %1)
  %3 : Float(1:1, requires_grad=0, device=cpu) = prim::Constant[value={-0.558343}]()
  %4 : int = prim::Constant[value=1]()
  %5 : Float(2:1, 1:1, requires_grad=0, device=cpu) = aten::add(%2, %3, %4)
  return (%5)

The traced function can be serialized with jit_save:

jit_save(tr_fn, "linear.pt")

It can be reloaded in R with jit_load, but it can also be
reloaded in Python with torch.jit.load:

import torch
fn = torch.jit.load("linear.pt")
fn(torch.ones(2, 10))
tensor([[-0.6880],
        [-0.6880]])

How cool is that?!

This is just the initial support for JIT in R. We will continue developing this. Specifically, in the next version of torch we plan to support tracing nn_modules directly. Currently, you need to detach all parameters before tracing them; see an example here. This will also allow you to take advantage of TorchScript to make your models run faster!

Also note that tracing has some limitations, especially when
your code has loops or control flow statements that depend on
tensor data. See ?jit_trace to learn more.

New print method for nn_modules

In this release we have also improved the nn_module printing
methods in order to make it easier to understand what’s
inside.

For example, if you create an instance of an nn_linear module
you will see:

nn_linear(10, 1)
An `nn_module` containing 11 parameters.

── Parameters ──────────────────────────────────────────────────────────────────
● weight: Float [1:1, 1:10]
● bias: Float [1:1]

You immediately see the total number of parameters in the module
as well as their names and shapes.

This also works for custom modules (possibly including
sub-modules). For example:

my_module <- nn_module(
  initialize = function() {
    self$linear <- nn_linear(10, 1)
    self$param <- nn_parameter(torch_randn(5, 1))
    self$buff <- nn_buffer(torch_randn(5))
  }
)

my_module()
An `nn_module` containing 16 parameters.

── Modules ─────────────────────────────────────────────────────────────────────
● linear: <nn_linear> #11 parameters

── Parameters ──────────────────────────────────────────────────────────────────
● param: Float [1:5, 1:1]

── Buffers ─────────────────────────────────────────────────────────────────────
● buff: Float [1:5]

We hope this makes it easier to understand nn_module objects. We
have also improved autocomplete support for nn_modules and we will
now show all sub-modules, parameters and buffers while you
type.

torchaudio

torchaudio is an extension for torch developed by Athos Damiani (@athospd), providing audio loading, transformations, common architectures for signal processing, pre-trained weights, and access to commonly used datasets. It is an almost literal translation of PyTorch's Torchaudio library to R.

torchaudio is not yet on CRAN, but you can already try the
development version available here.

You can also visit the pkgdown website for examples
and reference documentation.

Other features and bug fixes

Thanks to community contributions, we have found and fixed many bugs in torch, and we have also added several new features.

You can see the full list of changes in the NEWS.md
file.

Thanks very much for reading this blog post, and feel free to
reach out on GitHub for help or discussions!

The photo used in this post preview is by
Oleg Illarionov
on
Unsplash
