Global Players Look To Create Baseline To Evaluate Generative AI Applications

Efforts are underway to provide a common set of benchmarks to assess generative artificial intelligence (AI) products and to create a "body of knowledge" on how these tools should be tested.

The aim is to provide a standard approach to the evaluation of generative AI applications and to galvanize efforts to address the risks. This common approach is a shift away from existing "piecemeal" efforts.

Dubbed Sandbox, the initiative is led by Singapore's Infocomm Media Development Authority (IMDA) and AI Verify Foundation, and has garnered support from global market players, such as Amazon Web Services (AWS), Anthropic, Google, and Microsoft. These organizations are part of a current group of 15 participants, which also comprises Deloitte, EY, and IBM, as well as Singapore-based OCBC Bank and telco Singtel.

Sandbox is guided by a new draft catalog that categorizes current benchmarks and methods used to evaluate large language models (LLMs). The catalog compiles commonly used technical testing tools, organizing these according to what they test for and their methods, and recommends a baseline set of tests to evaluate generative AI products, IMDA said.

The goal is to establish a common language and support "broader, safe and trustworthy adoption of generative AI," it said.

"Systematic and robust evaluation of models is a critical component of LLM governance and helps form the bedrock of trust in the use of these technologies," IMDA said.

"Through rigorous evaluation, the capabilities of a model are revealed, which can assist in determining its intended uses and potential limitations. Evaluation [also] provides a vital roadmap for developers to make improvements."

Achieving this common language requires a standardized taxonomy and baseline set of pre-deployment safety evaluations for LLMs, it noted. The Singapore government agency hopes the draft catalog offers a starting point for global discussions, with the aim of driving consensus on safety standards for LLMs.

Moving toward common standards also means involving other stakeholders in the ecosystem, beyond the model developers, such as application developers that build on top of the models and developers of third-party testing tools.

Through Sandbox, IMDA wants to offer use cases that include a generative AI model developer, application deployer, and third-party tester to demonstrate how the different players can work together. For instance, model developers, such as Anthropic or Google, can work with app developers, such as OCBC or Singtel, and third-party testers, such as Deloitte and EY, on generative AI use cases for the financial services or telecommunications sector.

Regulators, such as Singapore's Personal Data Protection Commission, should also be involved, so Sandbox can provide an environment for experimentation and development where all parties in the ecosystem can be "transparent" about their needs, IMDA said.

IMDA expects Sandbox to uncover gaps in the current state of generative AI evaluations, including domain-specific applications, such as human resources, and culture-specific areas, which are currently under-developed.

"Sandbox will develop benchmarks for evaluating model performance in specific areas that are important for use cases, and for countries like Singapore because of cultural and language specificities," IMDA said.

The Singapore agency said it is collaborating with Anthropic on a Sandbox project that uses the catalog to identify aspects for red teaming, which looks to challenge policies and assumptions used in AI systems by taking an adversarial approach.

IMDA will tap Anthropic's models and research tooling platform to develop red-teaming methodologies customized for Singapore's diverse linguistic and cultural landscape. For instance, AI models will be evaluated for their ability to perform within the country's multilingual context.

In July, the Singapore government launched two sandboxes running on Google Cloud's generative AI toolsets, one of which is used exclusively by government agencies to develop and test generative AI applications. The other sandbox is available to local organizations and provided at no cost for three months, for up to 100 use cases.
