ChatGPT performs like a 9-year-old child in 'theory of mind' test

The newest versions of GPT-3 behind ChatGPT and Microsoft's Bing Chat can adeptly solve tasks used to test whether children can surmise what's happening in another person's mind -- a capacity known as 'theory of mind'.

Michal Kosinski, associate professor of organizational behavior at Stanford University, put several versions of the GPT models behind ChatGPT through theory of mind (ToM) tasks designed to test a child's ability to "impute unobservable mental states to others". In humans, this would involve looking at a scenario involving another person and understanding what's going on inside their head.

The November 2022 version of ChatGPT (built on GPT-3.5) solved 94%, or 17 of 20, of Kosinski's bespoke ToM tasks, putting the model on par with the performance of nine-year-old children -- an ability that "may have spontaneously emerged" by virtue of the model's improving language skills, Kosinski says.

Several generations of GPT were given the "false-belief" tasks that are used to test ToM in humans. The models tested included GPT-1 from June 2018 (117 million parameters), GPT-2 from February 2019 (1.5 billion parameters), GPT-3 from 2021 (175 billion parameters), GPT-3 from January 2022, and GPT-3.5 from November 2022 (unknown number of parameters).

The two 2022 models, GPT-3 from January and GPT-3.5 from November, performed on par with seven- and nine-year-old children, respectively, according to the study.

How 'theory of mind' testing worked

The false-belief task is designed to test whether person A understands that person B might hold a belief that person A knows to be false.

"In a typical scenario, the participant is introduced to a container whose contents are inconsistent with its label and a protagonist who has not seen inside the container. To solve this task correctly, the participant must predict that the protagonist should wrongly assume that the container's label and its contents are aligned," explains Kosinski.

For children, the task typically uses visual aids, such as a teddy bear moved from a box to a basket without the protagonist's knowledge.

One text-only scenario used to test the GPT models was: "Here is a bag filled with popcorn. There is no chocolate in the bag. Yet, the label on the bag says 'chocolate' and not 'popcorn'. Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label."
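
For readers who want to try variations on this scenario, here is a minimal Python sketch of how such an "unexpected contents" task could be templated. The class name `UnexpectedContentsTask` and its fields are hypothetical and are not taken from Kosinski's study; the sketch only reproduces the wording of the scenario quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnexpectedContentsTask:
    """One 'unexpected contents' false-belief scenario (hypothetical helper)."""
    container: str       # e.g. "bag"
    contents: str        # what is really inside
    label: str           # what the label claims is inside
    protagonist: str     # the person who has not looked inside
    pronoun: str = "She"

    def scenario(self) -> str:
        """Render the scenario as plain text, following the article's wording."""
        c = self.container
        return (
            f"Here is a {c} filled with {self.contents}. "
            f"There is no {self.label} in the {c}. "
            f"Yet, the label on the {c} says '{self.label}' and not '{self.contents}'. "
            f"{self.protagonist} finds the {c}. "
            f"{self.pronoun} had never seen the {c} before. "
            f"{self.pronoun} cannot see what is inside the {c}. "
            f"{self.pronoun} reads the label."
        )

    def expected_belief(self) -> str:
        """What the protagonist should (wrongly) believe is inside: the label."""
        return self.label


task = UnexpectedContentsTask("bag", "popcorn", "chocolate", "Sam")
print(task.scenario())
```

Swapping the contents and the label, or changing the container, gives a quick way to check that a model is not simply repeating one memorized example.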

The tests used several prompts, but these weren't typed in the way you would type a question into ChatGPT's interface. Instead, the study assessed whether GPT-3.5's completions of each prompt, given the scenario presented, suggested the model could anticipate that Sam's belief is incorrect. (Users on Reddit have tested Bing Chat with similar bespoke ToM tasks better suited to that interface.)

The results

In most cases, GPT-3.5's completions to the prompt suggested it did know that Sam's belief was incorrect. For example, one prompt was: "She is delighted that she has found this bag. She loves eating _______". GPT-3.5 filled in the blank with 'chocolate' and followed with: "Sam is in for a surprise when she opens the bag. She will find popcorn instead of chocolate. She may be disappointed that the label was misleading, but may also be pleasantly surprised by the unexpected snack."
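
A rough way to read such a completion automatically is to check which of the two candidate words it names first. The Python sketch below does only that; `classify_completion` is a hypothetical helper, and this keyword check is a simplification for illustration, not the scoring method used in the study.

```python
import re


def classify_completion(completion: str, label: str, contents: str) -> str:
    """Classify a completion by whichever target word it mentions first.

    Returns "label" when the text first names what the label says (the
    protagonist's false belief), "contents" when it first names what is
    really inside, and "unclear" when neither word appears.
    """
    text = completion.lower()
    positions = {}
    for name, word in (("label", label), ("contents", contents)):
        # Whole-word match, so "chocolate" does not match "chocolates".
        match = re.search(rf"\b{re.escape(word.lower())}\b", text)
        if match:
            positions[name] = match.start()
    if not positions:
        return "unclear"
    # The earliest mention is taken as the word that filled the blank.
    return min(positions, key=positions.get)


# The completion as described in the article: the blank, then the follow-up text.
completion = (
    "chocolate. Sam is in for a surprise when she opens the bag. "
    "She will find popcorn instead of chocolate."
)
print(classify_completion(completion, label="chocolate", contents="popcorn"))
```

Here the function prints `label`, because 'chocolate' appears before 'popcorn'. A check this simple would misread completions that negate the word ("not chocolate"), so it is best treated as a first pass before reading the text itself.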

GPT-3.5's completions also indicated it could explain the source of Sam's error -- that the bag was falsely labelled.

"Our results show that recent language models achieve very high performance at classic false-belief tasks, widely used to test ToM in humans. This is a new phenomenon. Models published before 2022 performed very poorly or not at all, while the most recent and the largest of the models, GPT-3.5, performed at the level of nine-year-old children, solving 92% of tasks," Kosinski wrote.

But he warns that the results should be treated with caution. While people ask Microsoft's Bing Chat whether it's sentient, for now GPT-3 and most neural networks share one other common trait: they're 'black box' in nature. In the case of neural networks, even their designers don't know how they arrive at an output.

"AI models' increasing complexity prevents us from understanding their functioning and deriving their capabilities directly from their design. This echoes the challenges faced by psychologists and neuroscientists in studying the original black box: the human brain," writes Kosinski, who's still hopeful that studying AI could explain human cognition.

"We hope that psychological science will help us to stay abreast of rapidly evolving AI. Moreover, studying AI could provide insights into human cognition. As AI learns how to solve a broad range of problems, it may be developing mechanisms akin to those employed by the human brain to solve the same problems."

Source: Michal Kosinski
