We recently kicked off our NVIDIA Developer Program exclusive series of Connect with Experts Ask Me Anything (AMA) sessions featuring NVIDIA experts and Ray…
During the AMA, the editors offered valuable guidance and tips on how to successfully integrate real-time ray tracing. Check out the top five questions and answers from the AMA:
1. Are there some rules of thumb one should follow when adding ray tracing (RT) applications like translucency, reflections, shadows, GI, or diffuse illumination to games?
Adam: There are many things to take into consideration when adding ray-traced effects to a game’s renderer. The main consideration to keep top of mind is for the ray-traced effects to work hand-in-hand with the goals of your game’s art direction. This will change what performance costs are reasonable for any given effect.
For example, if shadows are an important game mechanic (think of Splinter Cell), then a higher cost for extra-nice ray-traced shadows makes sense, but spending extra performance on RT translucency probably doesn’t make as much sense. For guidance on how to balance ray tracing and performance, we have a variety of webinars and other content that you can learn from. In fact, there’s an event coming up about RTX in Unreal Engine 5. (Note that you can access this content on demand.)
2. When sampling direct lighting, both reservoir sampling and resampled importance sampling can be useful techniques. But it seems difficult to recompute PDFs for the sake of MIS when a light has been sampled through a BSDF sample. Could you provide any insights into this problem?
Ingo: Sampling importance resampling only generates samples relative to an existing PDF (the one you choose to draw those samples from). So it should be possible to evaluate that existing PDF to compute PDF values for other samples (in an MIS context).
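As a rough sketch of that idea (hypothetical function names, not production renderer code): if your light-sampling strategy draws RIS candidates from a known source PDF, you can evaluate that same source PDF at a BSDF-sampled direction to form a balance-heuristic MIS weight.

def balance_heuristic(pdf_a: float, pdf_b: float) -> float:
    """MIS balance heuristic weight for a sample drawn from strategy A."""
    total = pdf_a + pdf_b
    return pdf_a / total if total > 0.0 else 0.0

def mis_weight_for_bsdf_sample(direction, pdf_bsdf, pdf_light_source) -> float:
    # pdf_bsdf and pdf_light_source are hypothetical callables returning each
    # strategy's PDF value for a direction. The light strategy's PDF here is
    # the source PDF the RIS candidates were drawn from, which is the quantity
    # Ingo suggests evaluating.
    p_b = pdf_bsdf(direction)
    p_l = pdf_light_source(direction)
    return balance_heuristic(p_b, p_l)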
3. Do ray tracing and deep learning overlap?
Eric: Yes, in many ways. Deep learning can be used to complement ray tracing, “filling in” missing information with plausible interpolated data, such as with NVIDIA Deep Learning Super Sampling (DLSS). This works today.
Neural rendering and neural graphics primitives are hot areas of research currently. One place to start is with Advances in Neural Rendering from SIGGRAPH 2021. Another good resource is a recent overview of NeRF at CVPR 2022, where ray tracing is used to render radiance fields.
4. What’s the latest scoop on using ML training to help with ray-traced GI? Are there any neat advances in ray tracing that benefit from deep learning? Have you connected lower sampling and filtering using an ML upscaling 2D filter?
Adam: There’s been quite a lot of work in the machine learning space to assist with real-time (and not real-time) graphics. For ray-traced global illumination, check out a paper recently published by Thomas Müller, Real-Time Neural Radiance Caching for Path Tracing. Their approach trains a neural network to learn the light transport characteristics of a scene and then builds a light cache that can be queried at a lower cost than tracing the full paths.
5. What are your top three favorite graphics papers of all time?
Register for GTC 2022 to learn the latest about RTX real-time ray tracing. For a full list of content for game developers including tools and training, visit NVIDIA Game Development.
In four talks over two days, senior NVIDIA engineers will describe innovations in accelerated computing for modern data centers and systems at the edge of the network. Speaking at a virtual Hot Chips event, an annual gathering of processor and system architects, they'll disclose performance numbers and other technical details for NVIDIA's first server CPU…
Graphics pioneer Dr. Donald Greenberg shares the new chapter in digital design and how NVIDIA Omniverse supports the expansion.
Posted by Yutian Chen, Staff Research Scientist, DeepMind, and Xingyou (Richard) Song, Research Scientist, Google Research, Brain Team
One of the most important aspects of machine learning is hyperparameter optimization, as finding the right hyperparameters for a machine learning task can make or break a model's performance. Internally, we regularly use Google Vizier as the default platform for hyperparameter optimization. Throughout its deployment over the last 5 years, Google Vizier has been used more than 10 million times across a vast range of applications, including not only machine learning applications in vision, reinforcement learning, and language, but also scientific applications such as protein discovery and hardware acceleration. Because Google Vizier keeps track of use patterns in its database, this data, usually consisting of optimization trajectories termed studies, contains valuable prior information about realistic hyperparameter tuning objectives and is thus highly attractive for developing better algorithms.
While there have been many previous methods for meta-learning over such data, they share one major drawback: their meta-learning procedures depend heavily on numerical constraints such as the number of hyperparameters and their value ranges, and thus require all tasks to use the exact same total hyperparameter search space (i.e., tuning specifications). Additional textual information in the study, such as its description and parameter names, is also rarely used, yet can hold meaningful information about the type of task being optimized. This drawback becomes even more pronounced for larger datasets, which often contain significant amounts of such meaningful information.
Today in “Towards Learning Universal Hyperparameter Optimizers with Transformers”, we are excited to introduce the OptFormer, one of the first Transformer-based frameworks for hyperparameter tuning, learned from large-scale optimization data using flexible text-based representations. While numerous works have previously demonstrated the Transformer’s strong abilities across various domains, few have touched on its optimization-based capabilities, especially over text space. Our core findings demonstrate for the first time some intriguing algorithmic abilities of Transformers: 1) a single Transformer network is capable of imitating highly complex behaviors from multiple algorithms over long horizons; 2) the network is further capable of predicting objective values very accurately, in many cases surpassing Gaussian Processes, which are commonly used in algorithms such as Bayesian Optimization.
Approach: Representing Studies as Tokens
Rather than only using numerical data, as is common with previous methods, our novel approach instead represents all of the study data as a sequence of tokens, including textual information from the initial metadata. In the animation below, this includes "CIFAR10", "learning rate", "optimizer type", and "Accuracy", which informs the OptFormer that it is dealing with an image classification task. The OptFormer then generates new hyperparameters to try on the task, predicts the task accuracy, and finally receives the true accuracy, which is used to generate the next round's hyperparameters. Using the T5X codebase, the OptFormer is trained in a typical encoder-decoder fashion using standard generative pretraining over a wide range of hyperparameter optimization objectives, including real-world data collected by Google Vizier, as well as public hyperparameter (HPO-B) and blackbox optimization benchmarks (BBOB).
The OptFormer can perform hyperparameter optimization encoder-decoder style, using token-based representations. It initially observes text-based metadata (in the gray box) containing information such as the title, search space parameter names, and metrics to optimize, and repeatedly outputs parameter and objective value predictions.
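As a purely illustrative sketch (not the actual OptFormer tokenization, whose exact format is defined in the paper and the T5X codebase), a study and its metadata might be flattened into a single text sequence like this:

def serialize_study(metadata: dict, trials: list) -> str:
    """Toy text serialization of a hyperparameter study for a text-to-text model."""
    header = (
        f"title:{metadata['title']} "
        f"metric:{metadata['metric']} "
        f"params:{','.join(metadata['parameters'])}"
    )
    rows = [" ".join(f"{k}={v}" for k, v in trial.items()) for trial in trials]
    return header + " | " + " | ".join(rows)

study_metadata = {
    "title": "CIFAR10",
    "parameters": ["learning rate", "optimizer type"],
    "metric": "Accuracy",
}
trials = [
    {"learning rate": 0.1, "optimizer type": "sgd", "Accuracy": 0.85},
    {"learning rate": 0.01, "optimizer type": "adam", "Accuracy": 0.91},
]

print(serialize_study(study_metadata, trials))
# title:CIFAR10 metric:Accuracy params:learning rate,optimizer type | learning rate=0.1 ...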
Imitating Policies
Because the OptFormer is trained on optimization trajectories produced by a variety of algorithms, it can accurately imitate these algorithms simultaneously. By providing a text-based prompt in the metadata for the designated algorithm (e.g., "Regularized Evolution"), the OptFormer will imitate that algorithm's behavior.
Over an unseen test function, the OptFormer produces nearly identical optimization curves as the original algorithm. Mean and standard deviation error bars are shown.
Predicting Objective Values
In addition, the OptFormer can predict the objective value being optimized (e.g., accuracy) and provide uncertainty estimates. We compared the OptFormer's predictions with those of a standard Gaussian Process and found that the OptFormer made significantly more accurate predictions. This can be seen qualitatively below, where the OptFormer's calibration curve closely follows the ideal diagonal line in a goodness-of-fit test, and quantitatively through standard aggregate metrics such as log predictive density.
Combining Both: Model-based Optimization
We can now use the OptFormer's function prediction capability to better guide the imitated policy, similar to techniques found in Bayesian Optimization. Using Thompson Sampling, we rank the imitated policy's suggestions and select only the best according to the function predictor. This produces an augmented policy capable of outperforming our industry-grade Bayesian Optimization algorithm in Google Vizier when optimizing classic synthetic benchmark objectives and tuning the learning rate hyperparameters of a standard CIFAR-10 training pipeline.
Left: Best-so-far optimization curve over a classic Rosenbrock function. Right: Best-so-far optimization curve over hyperparameters for training a ResNet-50 on CIFAR-10 via init2winit. Both cases use 10 seeds per curve, and error bars at 25th and 75th percentiles.
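A minimal sketch of this ranking step (policy_suggest and predict_objective_samples are hypothetical stand-ins for the imitated policy and the objective-value prediction head; this is not the actual OptFormer code):

import numpy as np

def thompson_select(history, policy_suggest, predict_objective_samples,
                    num_candidates: int = 16):
    """Draw several candidate suggestions from the imitated policy, score each
    with one sampled objective prediction (Thompson Sampling), and keep the best."""
    candidates = [policy_suggest(history) for _ in range(num_candidates)]
    sampled_scores = [
        predict_objective_samples(history, c, num_samples=1)[0] for c in candidates
    ]
    return candidates[int(np.argmax(sampled_scores))]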
Conclusion
Throughout this work, we discovered some useful and previously unknown optimization capabilities of the Transformer. In the future, we hope to pave the way for a universal hyperparameter and blackbox optimization interface that uses both numerical and textual data to facilitate optimization over complex search spaces, and to integrate the OptFormer with the rest of the Transformer ecosystem (e.g., language, vision, code) by leveraging Google's vast collection of offline AutoML data.
Acknowledgements
The following members of DeepMind and the Google Research Brain Team conducted this research: Yutian Chen, Xingyou Song, Chansoo Lee, Zi Wang, Qiuyi Zhang, David Dohan, Kazuya Kawakami, Greg Kochanski, Arnaud Doucet, Marc'aurelio Ranzato, Sagi Perel, and Nando de Freitas.
We would like to also thank Chris Dyer, Luke Metz, Kevin Murphy, Yannis Assael, Frank Hutter, and Esteban Real for providing valuable feedback, and further thank Sebastian Pineda Arango, Christof Angermueller, and Zachary Nado for technical discussions on benchmarks. In addition, we thank Daniel Golovin, Daiyi Peng, Yingjie Miao, Jack Parker-Holder, Jie Tan, Lucio Dery, and Aleksandra Faust for multiple useful conversations.
Finally, we thank Tom Small for designing the animation for this post.
LEGO lovers scratching their heads reading assembly instructions could soon have help with complicated builds thanks to a new study from Stanford University, MIT, and Autodesk. The researchers designed a deep learning framework that translates 2D manuals into steps a machine can understand to build 3D LEGO kits. The work could advance research focused on creating machines that aid people while assembling objects.
“LEGO manuals provide a self-contained environment that exemplifies a core human skill: learning to complete tasks under guidance. Leveraging recent advances in visual scene parsing and program synthesis, we aimed to build machines with similar skills, starting with LEGO and eventually aiming for real-world scenarios,” said study senior author Jiajun Wu, an assistant professor in Computer Science at Stanford University.
According to the researchers, translating 2D manuals with AI presents two main challenges. First, the AI must learn the correspondence between the 2D manual images and the 3D shapes being assembled at each step. This includes accounting for the orientation and alignment of the pieces.
It must also be capable of sorting through the bricks and inferring their 3D poses within semi-assembled models. As part of the LEGO build process, small pieces are combined to create larger parts, such as the head, neck, and body of a guitar. When combined, these larger parts create a complete project. This increases the difficulty, as machines must parse out all the LEGO bricks, even those that may not be visible, such as LEGO studs and antistuds.
The team worked to create a model that can translate 2D manuals into machine-executable plans to build a defined object. While there are two current approaches for performing this task—search-based and learning-based—both present limitations.
The search-based method searches over possible 3D poses of the pieces, matching them against the manual images to find the correct pose. This approach is compute intensive and slow, but precise.
Learning-based models rely on neural networks to predict a component’s 3D pose. They are fast, but not as accurate, especially when using unseen 3D shapes.
To address these limitations, the researchers developed the Manual-to-Executable-Plan Network (MEPNet), which, according to the study, uses deep learning and computer vision to integrate "neural 2D keypoint detection modules and 2D-3D projection algorithms."
Working off a sequence of predictions, at each step the model reads the manual, locates the pieces to add, and deduces their 3D positioning. After the model predicts the pose for each piece at each step, it can parse the manual from scratch, creating a building plan a robot could follow to build the LEGO object.
“For each step, the inputs consist of 1) a set of primitive bricks and parts that have been built in previous steps represented in 3D; and 2) a target 2D image showing how components should be connected. The expected output is the (relative) poses of all components involved in this step,” the researchers write in the study.
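A hypothetical sketch of that per-step interface (illustrative data structures only; the actual MEPNet representation differs):

from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

@dataclass
class AssemblyStepInput:
    primitive_bricks: List[np.ndarray]   # candidate bricks, represented in 3D
    built_components: List[np.ndarray]   # parts assembled in previous steps, in 3D
    manual_image: np.ndarray             # target 2D image for this step

@dataclass
class AssemblyStepOutput:
    # (rotation, translation) pose of each component involved in this step,
    # relative to the partially built model
    poses: List[Tuple[np.ndarray, np.ndarray]]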
The researchers first created synthetic training data from a LEGO kit containing 72 types of bricks, rendering the images with LPub3D, an open-source application for "creating LEGO style digital building instructions."
In total, the researchers generated 8,000 training manuals, using 10 sets for validation, and 20 sets for testing. There are around 200 individual steps in each data set accounting for about 200,000 individual steps in training.
“We train MEPNet with full supervision on a synthetically generated dataset where we have the ground truth keypoint, mask, and rotation information,” they write in the study. The MEPNet model was trained for 5 days on four NVIDIA TITAN RTX GPUs powered by NVIDIA Turing architecture.
They also tested the model on a Minecraft house dataset, which has a similar build style to LEGO.
Comparing MEPNet to existing models, the researchers found it outperformed the others in real-world LEGO sets, synthetically generated manuals, and the Minecraft example.
MEPNet was more accurate in pose estimations and better at identifying builds even with unseen pieces. The researchers also found that the model is able to apply learnings from synthetically generated manuals to real-world LEGO manuals.
While producing a robot capable of executing the plans is also needed, the researchers envision this work as a starting point.
“Our long-term goal is to build machines that can assist humans in constructing and assembling complex objects. We are thinking about extending our approach to other assembly domains, such as IKEA furniture,” said lead author Ruocheng Wang, an incoming Ph.D. student in Computer Science at Stanford University.
When Rachel Carpenter and Joseph French founded Intrinio a decade ago, the fintech revolution had only just begun. But they saw an opportunity to apply machine learning to vast amounts of financial filings to create an alternative data provider among the giants. The startup, based in St. Petersburg, Fla., delivers financial data to hedge funds…
Cybersecurity-related risk remains one of the top sources of risk in the enterprise. This has been exacerbated by the global pandemic, which has forced companies to accelerate digitization initiatives to better support a remote workforce.
This includes not only the infrastructure to support a distributed workforce but also automation through robotics, data analytics, and new applications. Unfortunately, this expansive digital footprint has led to an increase in cybercriminal attacks.
If you are considering a new cybersecurity solution for your business, it is important to understand how traditional prevention methods differ from modern AI solutions.
Are traditional cybersecurity methods still feasible for enterprises?
The proliferation of endpoints in today’s more distributed environments makes traditional cybersecurity methods, which create perimeters to secure the infrastructure, much less effective. In fact, it’s estimated that for at least half of all attacks, the intruder is already inside.
Manual data collection and analysis process
Implementing rules-based tools or supervised machine-learning systems to combat cyberattacks is ineffective. The number of logs collected as devices are added to networks continues to increase and can overwhelm traditional collection mechanisms. Petabytes of data are easily amassed and must be sent back to a central data lake for processing.
Due to bandwidth limitations, only a small sample is typically analyzed, often as little as five percent of the data, so the vast majority of packets are never inspected. This is a suboptimal way of analyzing data for cybersecurity threats.
Most enterprises have the means to look at only a small percentage of their data. This means they are likely missing valuable data points that could help identify vulnerabilities and prevent threats. Analysts may look to enrich their view of what is happening in and around the network by integrating tools and data, but this is often a manual process.
Lack of AI capabilities leads to longer threat detection times
It is estimated that it can take up to 277 days to identify and contain a security breach. Being able to quickly triage and iterate on a perceived threat is crucial, but also typically requires human intervention. These problems are magnified by the global shortage of cybersecurity professionals.
Supervised ML systems also can't detect zero-day threats because they rely on a "look back" approach to cybersecurity. Traditional software-driven approaches like these can impede security teams from responding quickly to cybercriminals.
A better way to address threat detection challenges is with AI technology. For example, a bank may implement an AI cybersecurity solution to automatically identify which customer transactions are typical and which are potential threats.
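As a generic, hedged illustration of that idea (a plain scikit-learn anomaly detector, not NVIDIA Morpheus): learn what typical transactions look like, then flag outliers as potential threats.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic "typical" transactions: [amount, hour of day]
typical = rng.normal(loc=[50.0, 12.0], scale=[20.0, 4.0], size=(1000, 2))
# Two new transactions to score: one obviously unusual, one ordinary
new_transactions = np.array([[5000.0, 3.0], [60.0, 13.0]])

detector = IsolationForest(contamination=0.01, random_state=0).fit(typical)
print(detector.predict(new_transactions))  # -1 flags a potential threat, 1 means typical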
How is AI changing modern cybersecurity solutions?
It’s no secret that cybersecurity professionals face an uphill battle to keep their organizations secure. Traditional threat detection methods are costly, reactive, and leave large gaps in security coverage, particularly in operations and globally distributed organizations.
To meet today’s cyberthreats, organizations need solutions that can provide visibility into 100% of the available data to identify malicious activity, along with insights to assist cybersecurity analysts in responding to threats.
AI cybersecurity use cases include:
Analyst augmentation technology using predictive analytics to assist with querying for large datasets.
User behavior risk scoring using AI algorithms to mine network data to identify and stop potential threats.
Reducing the time required to detect threats through faster, automated AI model updates.
Adopt an enterprise AI cybersecurity framework
NVIDIA Morpheus enables enterprises to observe all their data and apply AI inferencing and real-time monitoring of every server and packet across the entire network, at a scale previously impossible to achieve.
The Morpheus pipeline, combined with the NVIDIA accelerated computing platform, enables the analysis of cybersecurity data orders of magnitude faster than traditional solutions that use CPU-only servers.
Additionally, the Morpheus prebuilt use cases enable simplified augmentation of existing security infrastructure:
Digital fingerprinting uses unsupervised AI and time series modeling to create micro-targeted models for every user account and machine account combination running on the network, detecting humans posing as machines and machines as humans.
Phishing detection analyzes the entire raw email to classify it into ham, spam, or phishing.
Sensitive information detection finds and classifies leaked credentials, keys, passwords, credit card numbers, financial account information, and more.
Crypto-mining detection addresses the issue, reported by more than 69% of enterprises, of crypto-mining malware causing malicious DNS traffic and over-utilization of compute resources. This model distinguishes crypto-mining and malware workloads from legitimate machine learning and deep learning workloads, among others.
To get started with Morpheus, see the nvidia/morpheus GitHub repo.
To learn about how Morpheus can help companies leverage AI to improve their cybersecurity posture, register for the free online Morpheus DLI course or check out the following on-demand GTC sessions:
Transform Cybersecurity with Accelerated Data Science: Find out how Morpheus provides the necessary foundations and abstractions to enable any developer to write high-performance pipelines that use the latest machine learning and neural network models.
Learn About the Latest Developments with AI-Powered Cybersecurity [A41142]: Learn about the latest innovations available with NVIDIA Morpheus, being introduced in the Fall 2022 release, and find out how today's security analysts are using Morpheus in their everyday investigations and workflows. – Bartley Richardson, Director of Cybersecurity Engineering, NVIDIA.
Deriving Cyber Resilience from the Data Supply Chain [A41145]: Hear how NVIDIA tackles these challenges by applying zero-trust architectures in combination with AI and data analytics, combating our joint adversaries with a data-first response using DPU, GPU, and AI SDKs and tools. Learn where the promise of cyber-AI is working in application. – Daniel Rohrer, Vice President of Software Product Security, NVIDIA.
Accelerating the Next Generation of Cybersecurity Research [A41120]: Discover how to apply prebuilt models for digital fingerprinting to analyze the behavior of every user and machine, analyze raw emails to automatically detect phishing, find and classify leaked credentials and sensitive information, profile behaviors to detect malicious code and behavior, and leverage graph neural networks to identify fraud. – Killian Sexsmith, Senior Developer Relations Manager, NVIDIA.
For live sessions, join us at GTC, Sept 19 – 22, to explore the latest technology and research across AI, data science, cybersecurity, and more.
Deploying an application using a microservice architecture has several advantages: easier main system integration, simpler testing, and reusable code components. FastAPI has recently become one of the most popular web frameworks used to develop microservices in Python. FastAPI is much faster than Flask (a commonly used web framework in Python) because it is built over an Asynchronous Server Gateway Interface (ASGI) instead of a Web Server Gateway Interface (WSGI).
What are microservices
Microservices define an architectural and organizational approach to building software applications. One key aspect of microservices is that they are distributed and loosely coupled, so implementing changes is unlikely to break the entire application.
You can also think of an application built with a microservice architecture as being composed of several small, independent services that communicate through application programming interfaces (APIs). Typically, each service is owned by a smaller, self-contained team responsible for implementing changes and updates when necessary.
One of the major benefits of using microservices is that they enable teams to build new components for their applications rapidly. This is vital to remain aligned with ever-changing business needs.
Another benefit is how simple they make it to scale applications on demand. Businesses can accelerate the time-to-market to ensure that they are meeting customer needs constantly.
The difference between microservices and monoliths
Monoliths are another type of software architecture that proposes a more traditional, unified structure for designing software applications. Here are some of the differences.
Microservices are decoupled
Think about how a microservice breaks down an application into its core functions. Each function is referred to as a service and performs a single task.
In other words, a service can be independently built and deployed. The advantage of this is that individual services work without impacting the other services. For example, if one service is in more demand than the others, it can be independently scaled.
Monoliths are tightly coupled
On the other hand, a monolith architecture is tightly coupled and runs as a single service. The downside is that when one process experiences a demand spike, the entire application must be scaled to prevent this process from becoming a bottleneck. There’s also the increased risk of application downtime, as a single process failure affects the whole application.
With a monolithic architecture, it is much more complex to update or add new features to an application as the codebase grows. This limits the room for experimentation.
When to use microservices or monoliths?
These differences do not necessarily mean microservices are better than monoliths. In some instances, it still makes more sense to use a monolith, such as building a small application that will not demand much business logic, superior scalability, or flexibility.
However, machine learning (ML) applications are often complex systems with many moving parts and must be able to scale to meet business demands. Using a microservice architecture for ML applications is usually desirable.
Packaging a machine learning model
Before I can get into the specifics of the architecture to use for this microservice, there is an important step to go through: model packaging. You can only truly realize the value of an ML model when its predictions can be served to end users. In most scenarios, that means going from notebooks to scripts so that you can put your models into production.
In this case, you convert the scripts that train the model and make predictions on new data into a Python package. Packages are an essential part of programming. Without them, most of your development time would be wasted rewriting existing code.
To better understand what packages are, it is much easier to start with what scripts are and then introduce modules.
Script: A file expected to be run directly. Each script execution performs a specific behavior defined by the developer. Creating a script is as simple as saving a file with the .py extension to denote a Python file.
Module: A program created to be imported into other scripts or modules. A module typically consists of several classes and functions intended to be used by other files. Another way to think of modules is as code to be reused over and over again.
A package may be defined as a collection of related modules. These modules interact with one another in a specific way so that you can accomplish a task. In Python, packages are typically bundled and distributed through PyPI, and they can be installed using pip, the Python package installer.
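As a tiny, generic illustration of the distinction (hypothetical file and function names; the module and the script live in two separate files, shown together here):

# my_package/metrics.py  (a module inside a package)
def accuracy(correct: int, total: int) -> float:
    """Fraction of correct predictions."""
    return correct / total

# train.py  (a script that imports the module and is run directly)
from my_package.metrics import accuracy

if __name__ == "__main__":
    print(accuracy(correct=90, total=100))  # 0.9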
Figure 1 shows the directory structure for this model.
The package modules include the following:
config.yml: YAML file to define constant variables.
pipeline.py: Pipeline to perform all feature transformations and modeling.
predict.py: To make predictions on new instances with the trained model.
train_pipeline.py: To conduct model training.
VERSION: The current release.
config/core.py: Module used to parse the YAML file so that constant variables can be accessed in Python (see the sketch after this list).
data/: All data used for the project.
models/: The trained serialized model.
processing/data_manager.py: Utility functions for data management.
processing/features.py: Feature transformations to be used in the pipeline.
processing/validation.py: A data validation schema.
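As referenced in the list above, here is a hedged sketch of what config/core.py might look like; the real project may use different libraries (for example, pydantic and strictyaml) and different field names.

from dataclasses import dataclass
from pathlib import Path

import yaml  # PyYAML

PACKAGE_ROOT = Path(__file__).resolve().parent.parent
CONFIG_FILE_PATH = PACKAGE_ROOT / "config.yml"

@dataclass
class Config:
    # Hypothetical constants; the actual keys live in config.yml
    target: str
    features: list
    test_size: float
    random_state: int

def load_config(path: Path = CONFIG_FILE_PATH) -> Config:
    """Parse config.yml so that constant variables can be accessed in Python."""
    with open(path, "r") as f:
        raw = yaml.safe_load(f)
    return Config(**raw)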
The model is not optimized for this problem, as the main focus of this post is to show how to build an ML application with a microservice architecture.
Now the model is ready to be distributed, but there is an issue. Distributing the package through the PyPI index would mean that it is accessible worldwide. This may be okay for a scenario where there's no business value in the model. However, it would be a complete disaster in a real business scenario.
Instead of using PyPi to host private packages, you can use a third-party tool like Gemfury. The steps to do this are beyond the scope of this post. For more information, see Installing private Python packages.
After you have trained and saved your model, you need a way of serving predictions to the end user. REST APIs are a great way to achieve this goal. There are several application architectures you could use to integrate the REST API. Figure 3 shows the embedded architecture that I use in this post.
An embedded architecture refers to a system in which the trained model is embedded into the API and installed as a dependency.
There is a natural trade-off between simplicity and flexibility. The embedded approach is much simpler than other approaches but is less flexible. For example, whenever a model update is made, the entire application would have to be redeployed. If your service were being offered on mobile, then you’d have to release a new version of the software.
Building the API with FastAPI
The first consideration when building the API is dependencies. You won't be creating a virtual environment because you are running the application with tox, a command-line-driven, automated testing tool that is also used for generic virtualenv management. Thus, calling tox creates a virtual environment and runs the application.
There's an extra index: an additional index for pip to search if a package cannot be found on PyPI. This is a public link to the Gemfury account hosting the packaged model, enabling you to install the trained model from Gemfury. In a professional setting, this would be a private package, meaning that the link would be extracted and hidden in an environment variable.
Another thing to take note of is uvicorn. Uvicorn is a web server that implements the ASGI interface. In other words, it is a dedicated web server responsible for handling inbound and outbound requests. It's defined in the Procfile.
Now that the dependencies are specified, you can move on to look at the actual application. The main part of the API application is the main.py script:
from typing import Any
from fastapi import APIRouter, FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import HTMLResponse
from loguru import logger
from app.api import api_router
from app.config import settings, setup_app_logging
# setup logging as early as possible
setup_app_logging(config=settings)
app = FastAPI(
    title=settings.PROJECT_NAME, openapi_url=f"{settings.API_V1_STR}/openapi.json"
)

root_router = APIRouter()


@root_router.get("/")
def index(request: Request) -> Any:
    """Basic HTML response."""
    # Minimal landing page pointing users to the interactive docs
    body = (
        "<html>"
        "<body style='padding: 10px;'>"
        "<h1>Welcome to the API</h1>"
        "<div>"
        "Check the docs: <a href='/docs'>here</a>"
        "</div>"
        "</body>"
        "</html>"
    )

    return HTMLResponse(content=body)


app.include_router(api_router, prefix=settings.API_V1_STR)
app.include_router(root_router)

# Set all CORS enabled origins
if settings.BACKEND_CORS_ORIGINS:
    app.add_middleware(
        CORSMiddleware,
        allow_origins=[str(origin) for origin in settings.BACKEND_CORS_ORIGINS],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

if __name__ == "__main__":
    # Use this for debugging purposes only
    logger.warning("Running in development mode. Do not run like this in production.")
    import uvicorn  # type: ignore

    uvicorn.run(app, host="localhost", port=8001, log_level="debug")
If you are unable to follow along, do not worry about it. The key thing to note is that there are two routers in the main application:
root_router: This router defines the index endpoint, which returns a basic HTML response. You can think of it as the home endpoint.
api_router: This router specifies the more complex endpoints that permit other applications to interact with the ML model.
Dive deeper into the api.py module to understand api_router better. First, there are two endpoints defined in this module: health and predict.
Take a look at the code example:
@api_router.get("/health", response_model=schemas.Health, status_code=200)
def health() -> dict:
    """
    Root Get
    """
    health = schemas.Health(
        name=settings.PROJECT_NAME, api_version=__version__, model_version=model_version
    )

    return health.dict()


@api_router.post("/predict", response_model=schemas.PredictionResults, status_code=200)
async def predict(input_data: schemas.MultipleCarTransactionInputData) -> Any:
    """
    Make predictions with the Fraud detection model
    """
    input_df = pd.DataFrame(jsonable_encoder(input_data.inputs))

    # Advanced: You can improve performance of your API by rewriting the
    # `make prediction` function to be async and using await here.
    logger.info(f"Making prediction on inputs: {input_data.inputs}")
    results = make_prediction(inputs=input_df.replace({np.nan: None}))

    if results["errors"] is not None:
        logger.warning(f"Prediction validation error: {results.get('errors')}")
        raise HTTPException(status_code=400, detail=json.loads(results["errors"]))

    logger.info(f"Prediction results: {results.get('predictions')}")

    return results
The health endpoint is quite straightforward. It returns the health response schema of the model when you access the web server (Figure 4). You defined this schema in the health.py module in the schemas directory.
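For reference, here is a hedged sketch of what that schema might look like; the field names follow the health endpoint above, but the real health.py module may differ.

# app/schemas/health.py
from pydantic import BaseModel

class Health(BaseModel):
    name: str
    api_version: str
    model_version: str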
The predict endpoint is slightly more complex. Here are the steps involved:
Make a prediction using the ML model’s make_prediction function.
Catch any errors made by the model.
Return the results if the model has no errors.
Check that all is functioning well by spinning up a server with the following command from a terminal window:
py -m tox -e run
This should display several logs if the server is running, as shown in Figure 5.
Now, you can navigate to http://localhost:8001 to see the interactive endpoints of the API.
Testing the microservice API
Navigating to the local server takes you to the index endpoint defined in root_router from the main.py script. You can get more information about the API by adding /docs to the end of the local host server URL.
For example, Figure 6 shows that you’ve created the predict endpoint as a POST request, and the health endpoint is a GET request.
First, expand the predict heading to receive information about the endpoint. In this heading, you see an example in the request body. I defined this example in one of the schemas so that you can test the API—this is beyond the scope of this post, but you can browse the schema code.
To try out the model on the request body example, choose Try it out.
Figure 7 shows that the model returns a predicted output class of 1. Internally, you know that 1 refers to the acc class value, but you may want to surface a human-readable label rather than the raw class index when displaying results in a user interface.
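You can also exercise the same endpoint programmatically. The sketch below assumes the API version prefix is /api/v1 and uses a placeholder record, since the exact input fields depend on the schema defined for the model.

import requests

payload = {"inputs": [{"feature_1": "value", "feature_2": 42}]}  # placeholder record

response = requests.post(
    "http://localhost:8001/api/v1/predict",  # assumes API_V1_STR is "/api/v1"
    json=payload,
    timeout=10,
)
print(response.status_code, response.json())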
What’s next?
Congratulations, you have now built your own ML model microservice. The next steps involve deploying it so that it can run in production.
To recap: A microservice is an architectural and organizational design approach that arranges loosely coupled services. One of the main benefits of using the microservice approach for ML applications is independence from the main software product. Having a feature service (the ML application) that is separate from the main software product has two key benefits:
It enables cross-functional teams to engage in distributed development, which results in faster deployments.
The scalability of the software is significantly improved.
Did you find this tutorial helpful? Leave your feedback in the comments or connect with me at kurtispykes (LinkedIn).