Can the frequency of language, and qualities such as polysemy, affect whether a neural network can suddenly solve tasks for which it was not specifically developed, known as "few-shot learning"? DeepMind says yes.

Tiernan Ray for ZDNet

How is it that a program such as OpenAI's GPT-3 neural network can answer multiple choice questions, or write a poem in a particular style, despite never being programmed for those specific tasks?

It may be because human language has statistical properties that lead a neural network to expect the unexpected, according to new research by DeepMind, the AI unit of Google.

Natural language, when viewed from the point of view of statistics, has qualities that are "non-uniform." Some words can stand for multiple things, a property known as "polysemy": the word "bank," for example, can mean a place where you put money or a rising mound of earth. And words that sound the same can stand for different things, known as homonyms, like "here" and "hear."

Those qualities of language are the focus of a paper posted on arXiv this month, "Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers," by DeepMind scientists Stephanie C.Y. Chan, Adam Santoro, Andrew K. Lampinen, Jane X. Wang, Aaditya Singh, Pierre H. Richemond, Jay McClelland, and Felix Hill.

The authors started by asking how programs such as GPT-3 can solve tasks where they are presented with kinds of queries for which they have not been explicitly trained, what is known as "few-shot learning."

For example, GPT-3 can answer multiple choice questions without ever having been explicitly programmed to answer such a form of a question, simply by being prompted by a human user typing an example of a multiple choice question and answer pair.

"Large transformer-based language models are able to perform few-shot learning (also known as in-context learning), without having been explicitly trained for it," they write, referring to the wildly popular "Transformer" neural net from Google that is the basis of GPT-3 and Google's BERT language program.

As they explain, "We hypothesized that specific distributional properties of natural language might drive this emergent phenomenon."

The authors speculate that such large language model programs are behaving like another kind of machine learning program, known as a meta-learning program. Meta-learning programs, which DeepMind has explored in recent years, work by modeling patterns of data that span different data sets. Such programs are trained to model not a single data distribution but a distribution of data sets, as explained in prior research by team member Adam Santoro.

The key here is the idea of different data sets. The authors conjecture that each of the non-uniformities of language, such as polysemy and the "long tail" (the fact that speech contains many words used with relatively little frequency), is akin to a separate data distribution.

In fact, language, they write, is like something between supervised training data, with regular patterns, and meta-learning with lots of different data:

As in supervised training, items (words) do recur, and item-label mappings (e.g. word meanings) are somewhat fixed. At the same time, the long-tailed distribution ensures that there exist many rare words that recur only infrequently across context windows, but may be bursty (appear multiple times) within context windows. We can also see synonyms, homonyms, and polysemy as weaker versions of the completely unfixed item-label mappings that are used in few-shot meta-training, where the mappings change on every episode.
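The two properties named in that passage, a long-tailed frequency distribution and burstiness within a context window, can be illustrated with a small sampler. This is only a sketch under stated assumptions, not the authors' data pipeline: the function names, the Zipf exponent of 1.0, the window length, and the use of integer class IDs in place of words or images are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

def zipf_weights(num_classes, exponent=1.0):
    """Unnormalized Zipfian weights: the class of rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, num_classes + 1)]

def sample_bursty_window(num_classes, window_len, burst_len, rng, exponent=1.0):
    """Sample one context window in which a single class repeats burst_len times.

    The bursty class and the filler items are drawn from the same long-tailed
    distribution, so a rare class can still appear several times in a window.
    """
    weights = zipf_weights(num_classes, exponent)
    classes = list(range(num_classes))
    bursty_class = rng.choices(classes, weights=weights, k=1)[0]
    fillers = rng.choices(classes, weights=weights, k=window_len - burst_len)
    window = [bursty_class] * burst_len + fillers
    rng.shuffle(window)
    return bursty_class, window

rng = random.Random(0)  # fixed seed so the sketch is repeatable
bursty_class, window = sample_bursty_window(
    num_classes=1000, window_len=8, burst_len=3, rng=rng
)

# Across many independent draws, the most frequent class dominates the rarest one.
counts = Counter(rng.choices(range(1000), weights=zipf_weights(1000), k=10000))
```

Under these assumptions, most classes recur only rarely across windows, yet any one of them can show up several times inside a single window, which is the combination the quoted passage describes.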

To test the hypothesis, Chan and colleagues, surprisingly, do not actually work with language tasks. Instead, they train a Transformer neural net to solve a visual task, called Omniglot, introduced in 2016 by NYU, Carnegie Mellon, and MIT scholars. Omniglot challenges a program to assign the right classification label to 1,623 handwritten character glyphs.

In the case of Chan et al.'s work, they turn the labeled Omniglot challenge into a one-shot task by randomly shuffling the labels of the glyphs, so that the neural net is learning with each "episode":

Unlike in training, where the labels were fixed across all sequences, the labels for these two image classes were randomly re-assigned for each sequence [...] Because the labels were randomly re-assigned for each sequence, the model must use the context in the current sequence in order to make a label prediction for the query image (a 2-way classification problem). Unless stated otherwise, few-shot learning was always evaluated on holdout image classes that were never seen in training.
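The evaluation setup in that passage can be sketched as follows. This is a minimal illustration, not the paper's code: integer class IDs stand in for Omniglot images, and the function name, the number of shots per class, and the label values 0 and 1 are assumptions made for the example.

```python
import random

def make_few_shot_sequence(holdout_classes, shots_per_class, rng):
    """Build one evaluation sequence for 2-way few-shot classification.

    Returns (context, query_item, query_label), where context is a list of
    (class_id, label) pairs. Labels 0 and 1 are re-assigned at random for
    every sequence, so they carry no information from one sequence to the next.
    """
    class_a, class_b = rng.sample(holdout_classes, 2)
    labels = [0, 1]
    rng.shuffle(labels)  # fresh class-to-label mapping for this sequence only
    mapping = {class_a: labels[0], class_b: labels[1]}
    context = [
        (class_id, mapping[class_id])
        for class_id in (class_a, class_b)
        for _ in range(shots_per_class)
    ]
    rng.shuffle(context)
    query_class = rng.choice([class_a, class_b])
    return context, query_class, mapping[query_class]

rng = random.Random(1)
context, query_item, query_label = make_few_shot_sequence(list(range(10)), 4, rng)
```

Because the mapping is redrawn for each sequence, a model that memorized labels during training gains nothing here; the only way to predict `query_label` is to read it off the context pairs.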

In this way, the authors are manipulating visual data, the glyphs, to capture the non-uniform qualities of language. "At training time, we situate the Omniglot images and labels in sequences with various language-inspired distributional properties," they write. For example, they gradually turn up the number of class labels that can be assigned to a given glyph, to approximate the quality of polysemy.

"At evaluation, we then assess whether these properties give rise to few-shot learning abilities."

What they found is that as they multiplied the number of labels for a given glyph, the neural network got better at performing few-shot learning. "We see that increasing this 'polysemy factor' (the number of labels assigned to each word) also increases few-shot learning," as Chan and colleagues put it.

"In other words, making the generalization problem harder actually made few-shot learning emerge more strongly."
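One way to picture the "polysemy factor" is as a table that gives every class several valid labels, with one of them chosen each time the class is presented. The sketch below assumes that labels are not shared between classes and that the choice is uniform; those details, along with the function names, are illustrative assumptions rather than a description of the authors' implementation.

```python
import random

def build_polysemous_labels(num_classes, polysemy_factor):
    """Give every class `polysemy_factor` distinct labels, with no label shared between classes."""
    return {
        class_id: [class_id * polysemy_factor + k for k in range(polysemy_factor)]
        for class_id in range(num_classes)
    }

def sample_label(class_id, label_table, rng):
    """Pick one of the class's valid labels uniformly at random for this presentation."""
    return rng.choice(label_table[class_id])

rng = random.Random(0)
label_table = build_polysemous_labels(num_classes=5, polysemy_factor=3)
observed = sample_label(2, label_table, rng)  # one of the three labels for class 2
```

With a factor of 1 this reduces to ordinary fixed labels; raising the factor makes the item-label mapping less predictable from memory alone, which is the sense in which the generalization problem gets harder.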

At the same time, it is not only the data distribution that is causing the few-shot performance, they conclude. There is something about the specific structure of the Transformer neural network that helps it achieve few-shot learning, Chan and colleagues find. They test "a vanilla recurrent neural network," they write, and find that such a network never achieves a few-shot ability.

"Transformers show a significantly greater bias towards few-shot learning than recurrent models."

The authors conclude that both the qualities of the data, such as language's long tail, and the nature of the neural net, such as Transformer structure, matter. It's not one or the other but both.

The authors enumerate a number of avenues to explore in the future. One is the connection to human cognition, since babies demonstrate what appears to be few-shot learning.

For example, infants rapidly learn the statistical properties of language. Could these distributional features help infants acquire the ability for rapid learning, or serve as useful pre-training for later learning? And could similar non-uniform distributions in other domains of experience, such as vision, also play a role in this development?

It should be apparent that the current work is not a test of language at all. Rather, it aims to emulate the supposed statistical properties of language by recreating non-uniformities in visual data, the Omniglot images.

The authors don't explain whether that translation from one modality to another has any effect on the significance of their work. Instead, they write that they expect to extend their work to more aspects of language.

"The above results suggest exciting lines of future research," they write, including, "How do these data distributional properties interact with reinforcement learning vs. supervised losses? How might results differ in experiments that replicate other aspects of language and language modeling, e.g. using symbolic inputs, training on next-token or masked-token prediction, and having the meaning of words determined by their context?"

DeepMind: Why is AI so good at language? It's something in language itself
