Categories
Misc

Learn to Build Real-Time Video AI Applications

People walking through an office setting with hotspots showing IVA.
Learn the skills to transform raw video data from cameras into deep learning-based insights in real time.

Video analytics rely on computerized processing and automatic analysis of video content to detect and determine temporal and spatial events. The field is anticipated to experience double-digit growth for the next decade, as videos are quickly becoming a primary media form for transferring information. 

As the amount of video data generated grows at unprecedented rates, so does the ability and desire to analyze this information. Intelligent video analytics (IVA), which uses computer vision to extract valuable information from unstructured video data, is at the cutting edge of this emerging field.

The computer vision revolution

Computer vision, which uses deep learning models to help machines understand visual data, has improved drastically over the past few years thanks to HPC and neural networks. It transforms pixels to usable data through a range of tasks such as image classification, object detection, and segmentation. 

Some of its use cases include behavior analysis, enhanced safety measures, operations management, optical inspection, and content filtering. It has also aided new industries such as autonomous vehicles, smart retail, smart cities, and smart healthcare. Recognizing the potential IVA holds, organizations are eager to develop applications that harness this technology.

Developing video AI applications

NVIDIA, through the DeepStream SDK and the TAO Toolkit, makes creating highly performant video AI solutions easy and intuitive. The DeepStream SDK is a streaming analytics toolkit for constructing video processing pipelines. It provides the flexibility to select from various input formats, AI-based inference types, and outputs. Users also determine what to do with the results, such as sending them to cold storage, compositing them for display, or passing them downstream for further analysis. 
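
To give a concrete sense of such a pipeline, here is a minimal Python sketch built with the GStreamer bindings and DeepStream's standard plugins (nvstreammux, nvinfer, nvdsosd). It assumes the DeepStream plugins are installed; the input file and the nvinfer configuration path are placeholders rather than real sample assets.

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Decode an H.264 file, batch it, run AI-based inference, draw the results,
# and composite them for display. Paths below are placeholders.
pipeline = Gst.parse_launch(
    "filesrc location=input.h264 ! h264parse ! nvv4l2decoder ! mux.sink_0 "
    "nvstreammux name=mux batch-size=1 width=1280 height=720 ! "
    "nvinfer config-file-path=config_infer_primary.txt ! "
    "nvvideoconvert ! nvdsosd ! nveglglessink"
)

pipeline.set_state(Gst.State.PLAYING)
loop = GLib.MainLoop()
bus = pipeline.get_bus()
bus.add_signal_watch()
# Stop the loop when the stream ends or an error is reported on the bus.
bus.connect("message::eos", lambda *_: loop.quit())
bus.connect("message::error", lambda *_: loop.quit())
try:
    loop.run()
finally:
    pipeline.set_state(Gst.State.NULL)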

On the other hand, the TAO Toolkit uses transfer learning to efficiently train vision models. The software was designed with an emphasis on acceleration and optimization for video AI applications known to be computationally intensive. It can be deployed on low-power IoT devices for real-time analytics.

A new course to get you started

To help get you started, the NVIDIA Deep Learning Institute is offering a self-paced course titled Building Real-Time Video AI Applications, which covers the entire process of developing IVA applications. 

The course offers an easy progression through foundational understanding, important concepts, and terminology, followed by a lab portion. The hands-on walk-through of the technical components gives you the opportunity to build complete video AI applications. 

It’s complemented by thorough explanations in each step of the development cycle to help you confidently make implementation decisions for your own project. The course also highlights important performance considerations to optimize the video AI application and meet deployment requirements. 

Upon completion, you can earn a certificate of competency and begin to develop custom applications. Intelligent video analytics is an exciting area of AI with great opportunities.

Start learning now. >>  

Categories
Misc

Animator Lets 3D Characters Get Their Groove on With NVIDIA Omniverse and Reallusion

Benjamin Sokomba Dazhi, aka Benny Dee, has learned the ins and outs of the entertainment industry from many angles — first as a rapper, then as a music video director and now as a full-time animator.

The post Animator Lets 3D Characters Get Their Groove on With NVIDIA Omniverse and Reallusion appeared first on The Official NVIDIA Blog.

Categories
Misc

NVIDIA Nsight Graphics 2022.1 Supports Latest Vulkan Ray Tracing Extension

The latest Nsight Graphics 2022.1 release supports Direct3D (11, 12, DXR), Vulkan 1.3 and the latest ray tracing extension, OpenGL, OpenVR, and the Oculus SDK.

Today, NVIDIA announced the latest Nsight Graphics 2022.1, which supports Direct3D (11, 12, DXR), Vulkan 1.3 and the latest ray tracing extension, OpenGL, OpenVR, and the Oculus SDK. 

NVIDIA Nsight Graphics is a standalone developer tool that enables you to debug, profile, and export frames from high-fidelity 3D graphics applications. 

Download NVIDIA Nsight Graphics now.

Key highlights include

  • Support for the Vulkan 1.3 API. 
  • Event Details view can now link to a new Object Browser view. The Object Browser view shows objects alongside their properties, usages, and related objects. 
  • Linux Gaming improvements.
  • Vertex Selection for memory analysis.
  • Nsight Aftermath (crash debugging) improvements.

Vulkan 1.3 support

Nsight Graphics now has day one support for Vulkan 1.3. This version includes many new extensions to support developer productivity. If you’d like to learn more about this new version of Vulkan, check out the Vulkan 1.3 blog post.

Object Browser view

The release adds the ability for you to link from the Event Details view to a new Object Browser view. The Object Browser view shows objects alongside their properties, usages, and related objects. From the Object Browser, you may also navigate to other purpose-built views, like resource viewers. This helps you save time by letting you move quickly between different but related views.

Screenshot of Object Browser view.
Figure 1. D3D12 events showing hyperlinks to Object Browser.

Linux gaming improvements

For Nsight Graphics users who are developing on Linux, you now have support for Ubuntu 21.04 and for Linux games, thanks to major improvements with the Linux Steam Runtime. This should make it easier to develop games that run on Linux by ensuring that the latest OS versions work correctly with Nsight Graphics.

Vertex selection

With this version of Nsight Graphics, you can now select a triangle in the Graphical tab of the Geometry view and see the corresponding vertex data from the Memory tab in the Geometry view. This makes it much easier to find geometry issues as you don’t need to know the specific vertex indices before using the viewer.

Nsight Aftermath improvements

Nsight Aftermath now shows additional information when applications crash in driver-generated shaders, like those used for ray tracing. This will assist you in narrowing down the root cause of GPU instability and allow you to provide information to NVIDIA to help fix the problem.

Resources

Categories
Misc

Vulkan Fan? Six Reasons to Run It on NVIDIA

Many different platforms, same great performance. That’s why Vulkan is a very big deal. With the release Tuesday of Vulkan 1.3, NVIDIA continues its unparalleled record of day one driver support for this cross-platform GPU application programming interface for 3D graphics and computing. Vulkan has been created by experts from across the industry working together.

The post Vulkan Fan? Six Reasons to Run It on NVIDIA appeared first on The Official NVIDIA Blog.

Categories
Misc

Strange behavior of "min_delta" in EarlyStopping callback while tuning

I’m using keras-tuner to do hyperparameter optimization of a neural network.

I’m using Hyperband optimization, and I call the search method as:

tuner.search(training_gen(),
             epochs=50,
             validation_data=valid_gen(),
             callbacks=[stop_early],
             steps_per_epoch=np.round(int(num_samples / batch_size), decimals=0),
             validation_freq=1,
             validation_steps=100)

where the EarlyStopping callback is defined as:

stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.1, mode='min', patience=15) 

The Keras Hyperband algorithm seems to work well: it simulates the various models, and when it reaches the Early Stopping condition on a model, the training of that model stops and the training of a new model starts.
What I noticed is that with this EarlyStopping configuration, when EarlyStopping stops the training of a model, the following two consecutive Python errors occur before the training of the next model starts (the run doesn’t exit or raise an exception; it continues with the next model):

W tensorflow/core/framework/op_kernel.cc:1755] Invalid argument: ValueError: callback pyfunc_2 is not found
Traceback (most recent call last):
  File "/home/username/anaconda3/envs/myenv/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 233, in __call__
    raise ValueError("callback %s is not found" % token)
ValueError: callback pyfunc_2 is not found
W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Invalid argument: ValueError: callback pyfunc_2 is not found
Traceback (most recent call last):
  File "/home/username/anaconda3/envs/myenv/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 233, in __call__
    raise ValueError("callback %s is not found" % token)
ValueError: callback pyfunc_2 is not found
	 [[{{node PyFunc}}]]

If I don’t use the “min_delta” argument, these two errors don’t appear.
I also noted, looking at examples on the Internet, that the “min_delta” argument of the EarlyStopping callback is never set when tuning, so it is always left at its default value.
Do you know why?

PS:
I have another question: I noticed that if I set the Hyperband “max_epochs” to, for example, 100, the training of the model is performed in stages:
first from epoch 1 to epoch 3; then from 4 to 7; then from 8 to 13; then from 14 to 25; then from 26 to 50; and finally from 51 to 100.
If I set “patience=15”, the EarlyStopping callback stops the training right after epoch 66 (the first epoch at which EarlyStopping can act, because 51+15=66); is this a coincidence, or is it the expected behavior when tuning with Keras Hyperband?

submitted by /u/RainbowRedditForum
[visit reddit] [comments]

Categories
Misc

Vulkan 1.3 Broadens Cross-Platform Functionality with Developer-Requested Features

The latest release of Vulkan 1.3 simplifies render passes and makes synchronization easier.

A total of 23 of the most often requested Vulkan extensions developed by NVIDIA and other Khronos members are now incorporated into the brand new Vulkan 1.3 core specification. NVIDIA is ready with zero-day drivers for developers to immediately try out this significant new version of the industry’s only modern, cross-platform GPU API on their own systems.

Some of the most significant new core functionality in Vulkan 1.3 includes:

  • Dynamic rendering for simplified API use without subpasses.
  • Dynamic state to reduce the number of pipeline objects needed to avoid hitching.
  • Streamlined management of shader pipeline compilations.
Screenshot of Doom Eternal running on an NVIDIA RTX GPU.
Figure 1. Doom Eternal running at 200 FPS with every single game setting on a GeForce RTX 2080 Ti.

Nsight tools support

To help developers upgrade to Vulkan 1.3 with ease, developer tools have been upgraded to support the new functionality. This gives Vulkan developers the ability to jump into the new standard quickly and have the right tools to investigate and optimize, saving you time and frustration. 

Nsight Graphics is a powerful debugger and profiler that helps you identify API issues quickly using the events view and API inspector. You can inspect Vulkan ray tracing acceleration structures, as well as look at and edit shaders in real time. The advanced shader profiler helps identify where the GPU is not executing shader instructions with full parallelism, so you can make modifications to your shaders for improved performance. 

With the GPU Trace next-generation profiler you can view frames on a timeline with low-level GPU performance metrics. These metrics can help you fine-tune your Vulkan application and take full advantage of all GPU resources.

Nsight Systems is an application performance analysis tool designed to track GPU workloads to their CPU origins, uncovering bottlenecks. A system-wide view helps you analyze GPU workloads, GPU performance metrics, graphics APIs, compute APIs, frame stutter, and correlate them with each other.

“Vulkan is the cornerstone of Adobe’s multi-platform, multi-vendor rendering strategy for its Adobe Substance 3D products. Thanks to the ray tracing extensions that NVIDIA pioneered and contributed to Khronos, Vulkan gives native access to ray tracing hardware, offering exceptional ray tracing performance on supported devices. In addition, Nsight Graphics and Nsight Systems are invaluable tools when it comes to understanding and improving the performance of Vulkan ray tracing applications.” – Francois Beaune, Lead Software Engineer, Photorealistic Rendering at Adobe 3D & Immersive

Nsight Systems is a great place to start as you can verify if you are CPU or GPU limited. Its integration with Nsight Graphics makes for a seamless experience switching between the two as you performance tune the application.

These tools give you the power to harness the NVIDIA GPUs to their maximum potential and deliver high frame rates in games and other intensive Vulkan applications.

Screenshot of NVIDIA Nsight Graphics workflow.
Figure 2. Correlating Vulkan API calls to WDDM queue packets using NVIDIA Nsight Systems.

Vulkan support for NVIDIA RTX SDKs and DLSS

Vulkan developers can maximize the performance of real-time ray tracing in their applications with support from NVIDIA RTX SDKs. With NVIDIA RTX Direct Illumination, developers can add millions of dynamic lights to game environments without worrying about performance or resource constraints. NVIDIA RTX Global Illumination provides scalable solutions to compute multibounce indirect lighting. NVIDIA Real-Time Denoisers is a spatio-temporal, API-agnostic denoising library designed to work with low ray-per-pixel images, and NVIDIA RTX Memory Utility reduces the memory consumption of acceleration structures. 

“Vulkan has empowered us to deliver bleeding-edge performance on our recent DOOM games running idTech. DOOM and DOOM Eternal showcase how Vulkan may be leveraged to achieve state-of-the-art visuals and gameplay at extremely high frame rates across a wide variety of platforms. The flexibility of the Vulkan API allows us to collaborate closely with our hardware partners to meet the creative vision of our games.  This past year, we brought NVIDIA DLSS and Ray Tracing to DOOM Eternal, made possible by extensions developed by NVIDIA.” – Billy Khan, Director of Engine Technology at id Software

Every Vulkan developer can access DLSS upscaling technology on Windows and Linux. NVIDIA also added DLSS support for Vulkan API games on Proton and has DLSS support for both x86 and ARM-based platforms. With NVIDIA DLSS support for Vulkan, Linux gamers can use the dedicated Tensor Cores in their GeForce RTX GPUs to accelerate frame rates in DOOM Eternal, No Man’s Sky, and Wolfenstein: Youngblood.

Vulkan ray tracing debugging workflow.
Figure 3. Vulkan ray tracing debugging is made easy with NVIDIA Nsight Graphics.

Supporting new Vulkan functionality

NVIDIA ships Vulkan across a breadth of products and is deeply engaged in driving Vulkan’s evolution. In addition to supporting the Khronos Group as President, NVIDIA has chair positions in the Vulkan ray tracing, machine learning, and portability subgroups. 

NVIDIA is often the first to spearhead the development of new Vulkan functionality. This includes the “VKRay” vendor ray tracing extension and the only current implementation of Vulkan Mesh Shaders, along with the first implementation of the new Vulkan Video extensions and the NVIDIA Cooperative Matrix Vulkan extension, which exposes Tensor Cores for inferencing acceleration. 

Learn more about Vulkan 1.3. >>

Categories
Misc

Weights aren’t loading in Tensorflow.js for Node???

When I load a trained model that I have previously saved, the model topology is loaded, but none of the weights are (so I have to train the model from scratch). I am very confused by this, and can find no one else who has had the same problem (here, stackoverflow, …).

I would really appreciate some help, if anyone has any idea what is going on.

I am saving my model as follows:

model.save('file://' + path); 

To load the model, I am using:

model = tf.loadLayersModel('file://' + path + '/model.json'); 

submitted by /u/will0w1sp
[visit reddit] [comments]

Categories
Misc

Python error while doing tf.keras hyperparameter optimization

I’m using keras-tuner to do hyperparameter optimization of a neural network.

I’m using Hyperband optimization, implemented in this Python function:

import keras_tuner as kt

def tuning_function(self):
    objective = kt.Objective('val_loss', direction="min")
    tuner = kt.Hyperband(ann_model,
                         objective=objective,
                         max_epochs=100,
                         factor=2,
                         directory="/path/to/folder",
                         project_name="name",
                         seed=0)
    stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.1,
                                                  mode='min', patience=15)
    tensorboard = TensorBoard(log_dir=log_dir)
    tuner.search(training_gen(),
                 epochs=50,
                 validation_data=valid_gen(),
                 callbacks=[stop_early, tensorboard],
                 steps_per_epoch=np.round(int(total_num_samples / batch_size), decimals=0),
                 validation_freq=1,
                 validation_steps=100)

where ann_model is the compiled model of the Artificial Neural Network under test, while both training_gen() and valid_gen() are Python generators.

As can be seen, EarlyStopping and TensorBoard are the callbacks passed to tuner.search().

The Keras Hyperband algorithm works well: it simulates the various models, and when it reaches the Early Stopping condition on a model, the training of that model stops and the training of a new model starts. The point is that between these two events (stop and start), the following two consecutive Python errors occur (the run doesn’t exit or raise an exception; it continues with the next model):

W tensorflow/core/framework/op_kernel.cc:1755] Invalid argument: ValueError: callback pyfunc_2 is not found
Traceback (most recent call last):
  File "/home/username/anaconda3/envs/myenv/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 233, in __call__
    raise ValueError("callback %s is not found" % token)
ValueError: callback pyfunc_2 is not found
W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Invalid argument: ValueError: callback pyfunc_2 is not found
Traceback (most recent call last):
  File "/home/username/anaconda3/envs/myenv/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 233, in __call__
    raise ValueError("callback %s is not found" % token)
ValueError: callback pyfunc_2 is not found
	 [[{{node PyFunc}}]]

The first error seems to refer to a “callback”, while the second one refers to a “callback” and a “generator”.

What could be the cause of these errors? What is this callback that is “not found”?

submitted by /u/RainbowRedditForum
[visit reddit] [comments]

Categories
Misc

Allocator ran out of memory

I am using the Object Detection API. I did everything in the EXACT way the procedure is described on this site: tensorflow-object-detection-api

I am using this model: SSD ResNet50 V1 FPN 640×640 (RetinaNet50) from the Model Zoo

I am running my training on a 1070 Ti with about 8 GB of VRAM, of which about 6.5 GB are available. Now I am getting this error when I use a batch size greater than 2:

2022-01-24 23:28:40.444781: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory

To me this looks like it is trying to allocate only 4294967296 bytes while I have 8589900000 bytes available, so I’m only trying to allocate about 50%. nvidia-smi shows I’m using 7488 MiB / 8192 MiB of VRAM during training (batch size = 1), and 14.6/16 GB of RAM.

Obviously training with a batch size of 1 is useless; 8 seems just about doable, but I don’t understand why. Most people say a batch size of 64 should be possible with my hardware; please correct me.

submitted by /u/DieGewissePerson
[visit reddit] [comments]

Categories
Offsites

Accurate Alpha Matting for Portrait Mode Selfies on Pixel 6

Image matting is the process of extracting a precise alpha matte that separates foreground and background objects in an image. This technique has been traditionally used in the filmmaking and photography industry for image and video editing purposes, e.g., background replacement, synthetic bokeh and other visual effects. Image matting assumes that an image is a composite of foreground and background images, and hence, the intensity of each pixel is a linear combination of the foreground and the background.
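
Concretely, for an observed pixel intensity I, foreground color F, background color B, and per-pixel opacity α, this compositing assumption is commonly written as I = α·F + (1 − α)·B; matting amounts to estimating α for every pixel.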

In the case of traditional image segmentation, the image is segmented in a binary manner, in which a pixel either belongs to the foreground or background. This type of segmentation, however, is unable to deal with natural scenes that contain fine details, e.g., hair and fur, which require estimating a transparency value for each pixel of the foreground object.

Alpha mattes, unlike segmentation masks, are usually extremely precise, preserving strand-level hair details and accurate foreground boundaries. While recent deep learning techniques have shown their potential in image matting, many challenges remain, such as generating accurate ground truth alpha mattes, improving generalization on in-the-wild images, and performing inference on high-resolution images on mobile devices.

With the Pixel 6, we have significantly improved the appearance of selfies taken in Portrait Mode by introducing a new approach to estimate a high-resolution and accurate alpha matte from a selfie image. When synthesizing the depth-of-field effect, the usage of the alpha matte allows us to extract a more accurate silhouette of the photographed subject and have a better foreground-background separation. This allows users with a wide variety of hairstyles to take great-looking Portrait Mode shots using the selfie camera. In this post, we describe the technology we used to achieve this improvement and discuss how we tackled the challenges mentioned above.

Portrait Mode effect on a selfie shot using a low-resolution and coarse alpha matte compared to using the new high-quality alpha matte.

Portrait Matting
In designing Portrait Matting, we trained a fully convolutional neural network consisting of a sequence of encoder-decoder blocks to progressively estimate a high-quality alpha matte. We concatenate the input RGB image together with a coarse alpha matte (generated using a low-resolution person segmenter) that is passed as an input to the network. The new Portrait Matting model uses a MobileNetV3 backbone and a shallow (i.e., having a low number of layers) decoder to first predict a refined low-resolution alpha matte that operates on a low-resolution image. Then we use a shallow encoder-decoder and a series of residual blocks to process a high-resolution image and the refined alpha matte from the previous step. The shallow encoder-decoder relies more on lower-level features than the previous MobileNetV3 backbone, focusing on high-resolution structural features to predict final transparency values for each pixel. In this way, the model is able to refine an initial foreground alpha matte and accurately extract very fine details like hair strands. The proposed neural network architecture efficiently runs on Pixel 6 using TensorFlow Lite.

The network predicts a high-quality alpha matte from a color image and an initial coarse alpha matte. We use a MobileNetV3 backbone and a shallow decoder to first predict a refined low-resolution alpha matte. Then we use a shallow encoder-decoder and a series of residual blocks to further refine the initially estimated alpha matte.
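
As a rough illustration of this coarse-to-fine design, the Keras sketch below wires a MobileNetV3 backbone and a shallow decoder for the low-resolution stage, followed by a shallow encoder-decoder with residual blocks for the high-resolution stage. Layer counts, channel widths, and input resolutions are assumptions chosen for readability, not the production Pixel 6 network.

import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def portrait_matting_sketch(low_res=(256, 256), high_res=(1024, 1024)):
    # Stage 1: refine a coarse alpha matte at low resolution.
    lr_rgb = layers.Input(low_res + (3,), name="low_res_rgb")
    coarse_alpha = layers.Input(low_res + (1,), name="coarse_alpha")
    x = layers.Concatenate()([lr_rgb, coarse_alpha])
    backbone = tf.keras.applications.MobileNetV3Small(
        input_shape=low_res + (4,), include_top=False, weights=None)
    d = backbone(x)
    # Shallow decoder: upsample back to the low-resolution grid.
    for filters in (128, 64, 32, 16, 8):
        d = layers.UpSampling2D(2, interpolation="bilinear")(d)
        d = conv_block(d, filters)
    refined_lr_alpha = layers.Conv2D(
        1, 3, padding="same", activation="sigmoid", name="refined_lr_alpha")(d)

    # Stage 2: shallow encoder-decoder plus residual blocks at high resolution.
    hr_rgb = layers.Input(high_res + (3,), name="high_res_rgb")
    up_alpha = layers.Resizing(*high_res, interpolation="bilinear")(refined_lr_alpha)
    h = conv_block(layers.Concatenate()([hr_rgb, up_alpha]), 16)
    skip = h
    h = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(h)
    for _ in range(3):
        r = layers.Conv2D(32, 3, padding="same")(conv_block(h, 32))
        h = layers.ReLU()(layers.Add()([h, r]))
    h = layers.UpSampling2D(2, interpolation="bilinear")(h)
    h = conv_block(layers.Concatenate()([h, skip]), 16)
    final_alpha = layers.Conv2D(
        1, 3, padding="same", activation="sigmoid", name="high_res_alpha")(h)
    return Model([lr_rgb, coarse_alpha, hr_rgb], [refined_lr_alpha, final_alpha])

model = portrait_matting_sketch()
model.summary()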

Most recent deep learning work for image matting relies on manually annotated per-pixel alpha mattes used to separate the foreground from the background that are generated with image editing tools or green screens. This process is tedious and does not scale for the generation of large datasets. Also, it often produces inaccurate alpha mattes and foreground images that are contaminated (e.g., by reflected light from the background, or “green spill”). Moreover, this does nothing to ensure that the lighting on the subject appears consistent with the lighting in the new background environment.

To address these challenges, Portrait Matting is trained using a high-quality dataset generated using a custom volumetric capture system, Light Stage. Compared with previous datasets, this is more realistic, as relighting allows the illumination of the foreground subject to match the background. Additionally, we supervise the training of the model using pseudo–ground truth alpha mattes from in-the-wild images to improve model generalization, explained below. This ground truth data generation process is one of the key components of this work.

Ground Truth Data Generation
To generate accurate ground truth data, Light Stage produces near-photorealistic models of people using a geodesic sphere outfitted with 331 custom color LED lights, an array of high-resolution cameras, and a set of custom high-resolution depth sensors. Together with Light Stage data, we compute accurate alpha mattes using time-multiplexed lights and a previously recorded “clean plate”. This technique is also known as ratio matting.

This method works by recording an image of the subject silhouetted against an illuminated background as one of the lighting conditions. In addition, we capture a clean plate of the illuminated background. The silhouetted image, divided by the clean plate image, provides a ground truth alpha matte.
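
A minimal NumPy sketch of that ratio computation, assuming linear-intensity images and treating the silhouette-to-clean-plate ratio as the fraction of background light reaching the camera (so the alpha matte is its complement):

import numpy as np

def ratio_matte(silhouette, clean_plate, eps=1e-6):
    # Fraction of background light transmitted at each pixel.
    transmission = silhouette / np.maximum(clean_plate, eps)
    # Opaque foreground pixels transmit nothing; pure background pixels transmit everything.
    return 1.0 - np.clip(transmission, 0.0, 1.0)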

Then, we extrapolate the recorded alpha mattes to all the camera viewpoints in Light Stage using a deep learning–based matting network that leverages captured clean plates as an input. This approach allows us to extend the alpha mattes computation to unconstrained backgrounds without the need for specialized time-multiplexed lighting or a clean background. This deep learning architecture was solely trained using ground truth mattes generated using the ratio matting approach.

Computed alpha mattes from all camera viewpoints at the Light Stage.

Leveraging the reflectance field for each subject and the alpha matte generated with our ground truth matte generation system, we can relight each portrait using a given HDR lighting environment. We composite these relit subjects into backgrounds corresponding to the target illumination following the alpha blending equation. The background images are then generated from the HDR panoramas by positioning a virtual camera at the center and ray-tracing into the panorama from the camera’s center of projection. We ensure that the projected view into the panorama matches its orientation as used for relighting. We use virtual cameras with different focal lengths to simulate the different fields-of-view of consumer cameras. This pipeline produces realistic composites by handling matting, relighting, and compositing in one system, which we then use to train the Portrait Matting model.

Composited images on different backgrounds (high-resolution HDR maps) using ground truth generated alpha mattes.
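
The blend itself is the simplest part of that pipeline; a hedged NumPy sketch of just this step follows (the relighting and the HDR-panorama ray tracing that produce the relit foreground and the background image are outside its scope, and all arrays are assumed to be float images in linear intensity):

import numpy as np

def composite(relit_fg, background, alpha):
    # Standard alpha blending: alpha * foreground + (1 - alpha) * background.
    if alpha.ndim == 2:
        alpha = alpha[..., np.newaxis]
    return alpha * relit_fg + (1.0 - alpha) * background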

Training Supervision Using In-the-Wild Portraits
To bridge the gap between portraits generated using Light Stage and in-the-wild portraits, we created a pipeline to automatically annotate in-the-wild photos generating pseudo–ground truth alpha mattes. For this purpose, we leveraged the Deep Matting model proposed in Total Relighting to create an ensemble of models that computes multiple high-resolution alpha mattes from in-the-wild images. We ran this pipeline on an extensive dataset of portrait photos captured in-house using Pixel phones. Additionally, during this process we performed test-time augmentation by doing inference on input images at different scales and rotations, and finally aggregating per-pixel alpha values across all estimated alpha mattes.
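
A hedged sketch of this test-time-augmentation aggregation, assuming each ensemble member is a Keras-style model that maps an RGB image to a single-channel alpha matte; the scales and rotations are illustrative values, not the ones used in the production pipeline:

import numpy as np
import tensorflow as tf

def _resize(img, size):
    # Bilinear resize for HxWxC float arrays.
    return tf.image.resize(img, size, method="bilinear").numpy()

def ensemble_alpha(image, models, scales=(0.5, 1.0, 1.5), rotations=(0, 1, 2, 3)):
    """Aggregate per-pixel alpha across models, scales, and 90-degree rotations."""
    h, w = image.shape[:2]
    estimates = []
    for model in models:
        for s in scales:
            scaled = _resize(image, (int(h * s), int(w * s)))
            for k in rotations:
                aug = np.rot90(scaled, k)
                alpha = model.predict(aug[None, ...])[0]  # assumed HxWx1 output
                alpha = np.rot90(alpha, -k)               # undo the rotation
                estimates.append(_resize(alpha, (h, w)))
    return np.mean(np.stack(estimates), axis=0)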

Generated alpha mattes are visually evaluated with respect to the input RGB image. The alpha mattes that are perceptually correct, i.e., following the subject’s silhouette and fine details (e.g., hair), are added to the training set. During training, both datasets are sampled using different weights. Using the proposed supervision strategy exposes the model to a larger variety of scenes and human poses, improving its predictions on photos in the wild (model generalization).

Estimated pseudo–ground truth alpha mattes using an ensemble of Deep Matting models and test-time augmentation.

Portrait Mode Selfies
The Portrait Mode effect is particularly sensitive to errors around the subject boundary (see image below). For example, errors caused by the use of a coarse alpha matte can leave background regions near the subject boundary or hair area in sharp focus. Using a high-quality alpha matte instead allows us to extract a more accurate silhouette of the photographed subject and improve foreground-background separation.

Try It Out Yourself
We have made front-facing camera Portrait Mode on the Pixel 6 better by improving alpha matte quality, resulting in fewer errors in the final rendered image and by improving the look of the blurred background around the hair region and subject boundary. Additionally, our ML model uses diverse training datasets that cover a wide variety of skin tones and hair styles. You can try this improved version of Portrait Mode by taking a selfie shot with the new Pixel 6 phones.

Portrait Mode effect on a selfie shot using a coarse alpha matte compared to using the new high quality alpha matte.

Acknowledgments
This work wouldn’t have been possible without Sergio Orts Escolano, Jana Ehmann, Sean Fanello, Christoph Rhemann, Junlan Yang, Andy Hsu, Hossam Isack, Rohit Pandey, David Aguilar, Yi Jinn, Christian Hane, Jay Busch, Cynthia Herrera, Matt Whalen, Philip Davidson, Jonathan Taylor, Peter Lincoln, Geoff Harvey, Nisha Masharani, Alexander Schiffhauer, Chloe LeGendre, Paul Debevec, Sofien Bouaziz, Adarsh Kowdle, Thabo Beeler, Chia-Kai Liang and Shahram Izadi. Special thanks to our photographers James Adamson, Christopher Farro and Cort Muller who took numerous test photographs for us.