Amazon proposes a new AI benchmark to measure RAG


An outline of Amazon's proposed benchmarking process for RAG implementations of generative AI.

Amazon AWS

This year is supposed to be the year that generative artificial intelligence (GenAI) takes off in the enterprise, according to many observers. One of the ways this could happen is via retrieval-augmented generation (RAG), a methodology by which an AI large language model is hooked up to a database containing domain-specific content such as company files.

However, RAG is an emerging technology with its pitfalls.


For that reason, researchers at Amazon's AWS propose in a new paper to set a series of benchmarks that will specifically test how well RAG can answer questions about domain-specific content.

"Our method is an automated, cost-efficient, interpretable, and robust strategy to select the optimal components for a RAG system," write lead author Gauthier Guinet and team in the work, "Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation," posted on the arXiv preprint server.

The paper is being presented at the 41st International Conference on Machine Learning, an AI conference that takes place July 21-27 in Vienna.

The basic problem, explain Guinet and team, is that while there are many benchmarks to compare the ability of various large language models (LLMs) on numerous tasks, in the area of RAG, specifically, there is no "canonical" approach to measurement that is "a comprehensive task-specific evaluation" of the many qualities that matter, including "truthfulness" and "factuality."

The authors believe their automated method creates a certain uniformity: "By automatically generating multiple choice exams tailored to the document corpus associated with each task, our approach enables standardized, scalable, and interpretable scoring of different RAG systems."

To set about that task, the authors generate question-answer pairs by drawing on material from four domains: AWS troubleshooting documents on the topic of DevOps; abstracts of scientific papers from the arXiv preprint server; questions on StackExchange; and filings from the US Securities and Exchange Commission, the chief regulator of publicly listed companies.


They then devise multiple-choice tests for the LLMs to evaluate how close each LLM comes to the right answer. They subject two families of open-source LLMs to these exams -- Mistral, from the French company of the same name, and Meta's Llama.
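Scoring a multiple-choice exam of this kind comes down to counting correct answers. The following is a minimal sketch of that idea, not the authors' code: the `ExamQuestion` type, the `exam_accuracy` function, and the `answer_fn` callback (which stands in for a call to an LLM) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ExamQuestion:
    """One multiple-choice item (hypothetical structure, for illustration)."""
    question: str
    choices: Sequence[str]
    correct_index: int  # position of the right answer within `choices`


def exam_accuracy(
    exam: Sequence[ExamQuestion],
    answer_fn: Callable[[ExamQuestion], int],
) -> float:
    """Return the fraction of questions for which answer_fn picks the correct choice.

    answer_fn is a placeholder for whatever system is being tested; it takes
    a question and returns the index of the choice it selects.
    """
    if not exam:
        raise ValueError("exam must contain at least one question")
    correct = sum(1 for item in exam if answer_fn(item) == item.correct_index)
    return correct / len(exam)
```

Because the same exam can be passed to different `answer_fn` implementations, the resulting accuracies are directly comparable across systems, which is the appeal of a fixed multiple-choice format.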

They test the models in three scenarios. The first is a "closed book" scenario, where the LLM has no access at all to RAG data, and has to rely on its pre-trained neural "parameters" -- or "weights" -- to come up with the answer. The second is what's called the "Oracle" form of RAG, where the LLM is given access to the exact document used to generate a question, known as the ground truth.
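The difference between the closed-book and Oracle scenarios is only in what accompanies the question in the prompt. A rough sketch, assuming a simple text prompt layout that is not taken from the paper (the `build_prompt` function and its format are hypothetical):

```python
import string
from typing import Optional, Sequence


def build_prompt(
    question: str,
    choices: Sequence[str],
    context: Optional[str] = None,
) -> str:
    """Assemble a multiple-choice prompt, optionally preceded by a context document.

    context=None corresponds to the closed-book scenario; passing the document
    the question was generated from corresponds to the Oracle scenario.
    """
    lines = []
    if context is not None:
        lines += ["Context:", context, ""]
    lines.append(f"Question: {question}")
    # Label the choices A, B, C, ... in order.
    for label, choice in zip(string.ascii_uppercase, choices):
        lines.append(f"{label}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


# Closed book: the model sees only the question and the choices.
closed_book = build_prompt("Which service stores objects?", ["S3", "EC2"])

# Oracle: the model also sees the source document (placeholder text here).
oracle = build_prompt(
    "Which service stores objects?",
    ["S3", "EC2"],
    context="S3 is an object storage service.",
)
```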

The third form is "classical retrieval," where the model has to search across the entire data set looking for a question's context, using a variety of algorithms. Several popular retrieval methods are used, including one introduced in 2019 by scholars at Tel Aviv University and the Allen Institute for Artificial Intelligence, MultiQA; and an older but very popular approach for information retrieval called BM25.
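To give a sense of what a classical retriever such as BM25 does, here is a compact sketch of Okapi BM25 ranking in plain Python. It is an illustration under stated assumptions, not the setup used in the paper: tokenization is a naive lowercase whitespace split, the parameters `k1=1.5` and `b=0.75` are common defaults, the IDF uses the non-negative "plus one" variant, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase and split on whitespace (an assumption)."""
    return text.lower().split()


def bm25_scores(
    query: str,
    documents: Sequence[str],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    if not documents:
        raise ValueError("documents must not be empty")
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq: Counter = Counter()
    for doc in tokenized:
        doc_freq.update(set(doc))

    query_terms = tokenize(query)
    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            denom = tf + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / denom
        scores.append(score)
    return scores


def top_k(query: str, documents: Sequence[str], k: int = 1) -> List[str]:
    """Return the k highest-scoring documents for the query."""
    scores = bm25_scores(query, documents)
    ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [documents[i] for i in ranked[:k]]
```

In a retrieval scenario, the documents returned by something like `top_k` would be supplied to the LLM as context in place of the Oracle document, so the quality of the ranking directly affects what the model gets to read.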


They then run the exams and tally the results, which are sufficiently complex to fill tons of charts and tables on the relative strengths and weaknesses of the LLMs and the various RAG approaches. The authors even perform a meta-analysis of their exam questions -- to gauge their utility -- based on the education field's well-known "Bloom's taxonomy."

What matters even more than data points from the exams are the broad findings that may hold true for RAG -- irrespective of the implementation details.

One broad finding is that better RAG algorithms can improve an LLM more than, for example, making the LLM bigger.

"The right choice of the retrieval method can often lead to performance improvements surpassing those from simply choosing larger LLMs," they write.

That's important given concerns over the spiraling resource intensity of GenAI. If you can do more with less, it's a valuable avenue to explore. It also suggests that the conventional wisdom in AI at the moment, that scaling is always best, is not entirely true when it comes to solving concrete problems.


Just as important, the authors find that if the RAG algorithm doesn't work correctly, it can degrade the performance of the LLM versus the closed-book, plain vanilla version with no RAG.

"Poorly aligned retriever component can lead to a worse accuracy than having no retrieval at all," is how Guinet and team put it.

