Month: October 2024

Misc

An Introduction to Model Merging for LLMs

Post date October 28, 2024

One challenge organizations face when customizing large language models (LLMs) is the need to run multiple experiments, which produces only one useful model. While the cost of experimentation is typically low, and the results well worth the effort, this experimentation process does involve “wasted” resources, such as compute assets spent without their product being utilized…

Misc

Upcoming Webinar: Enhance Generative AI Model Accuracy Through High-Quality Data Processing

Post date October 28, 2024

Learn how to build scalable data processing pipelines to create high-quality datasets.

Misc

NVIDIA GH200 Superchip Accelerates Inference by 2x in Multiturn Interactions with Llama Models

Post date October 28, 2024

Deploying large language models (LLMs) in production environments often requires making hard trade-offs between enhancing user interactivity and increasing system throughput. While enhancing user interactivity requires minimizing time to first token (TTFT), increasing throughput requires increasing tokens per second. Improving one aspect often results in the decline of the other…

Misc

Supercharging Fraud Detection in Financial Services with Graph Neural Networks

Post date October 28, 2024

Fraud in financial services is a massive problem. According to NASDAQ, in 2023, banks faced $442 billion in projected losses from payment, check, and credit card fraud. It’s not just about the money, though. Fraud can tarnish a company’s reputation and frustrate customers when legitimate purchases are blocked, an error known as a false positive. Unfortunately, these errors happen more often than…

Misc

Creating RAG-Based Question-and-Answer LLM Workflows at NVIDIA

Post date October 28, 2024

The rapid development of solutions using retrieval-augmented generation (RAG) for question-and-answer LLM workflows has led to new types of system architectures. Our work at NVIDIA using AI for internal operations has produced several important findings on aligning system capabilities with user expectations. We found that regardless of the intended scope or use case…

Misc

Expert Support case study: Bolstering a RAG app with LLM-as-a-Judge

Post date October 28, 2024

Misc

NVIDIA Ethernet Networking Accelerates World’s Largest AI Supercomputer, Built by xAI

Post date October 28, 2024

NVIDIA today announced that xAI’s Colossus supercomputer cluster in Memphis, Tennessee, comprising 100,000 NVIDIA Hopper Tensor Core GPUs, achieved this massive scale by using the NVIDIA Spectrum-X™ Ethernet networking platform for its Remote Direct Memory Access (RDMA) network. Spectrum-X is designed to deliver superior performance to multi-tenant, hyperscale AI factories using standards-based Ethernet.

Misc

Bring Receipts: New NVIDIA AI Workflow Detects Fraudulent Credit Card Transactions

Post date October 28, 2024

Financial losses from worldwide credit card transaction fraud are expected to reach $43 billion by 2026. A new NVIDIA AI workflow for fraud detection running on Amazon Web Services (AWS) can help combat this burgeoning epidemic, using accelerated data processing and advanced algorithms to improve AI’s ability to detect and prevent credit card transaction…

Misc

Fintech Leaders Tap Generative AI for Safer, Faster, More Accurate Financial Services

Post date October 28, 2024

An overwhelming 91% of financial services industry (FSI) companies are either assessing artificial intelligence or already have it in the bag as a tool that’s driving innovation, improving operational efficiency and enhancing customer experiences. Generative AI, powered by NVIDIA NIM microservices and accelerated computing, can help organizations improve portfolio optimization, fraud detection, customer…

Misc

Advancing Performance with NVIDIA SHARP In-Network Computing

Post date October 25, 2024

AI and scientific computing applications are great examples of distributed computing problems. The problems are too large and the computations too intensive to run on a single machine. These computations are broken down into parallel tasks that are distributed across thousands of compute engines, such as CPUs and GPUs. To achieve scalable performance, the system relies on dividing workloads…
