This new technology could blow away GPT-4 and everything like it

Apr 20, 2023, Hi-network.com

Stanford and MILA's Hyena Hierarchy is a technology for relating items of data, be they words or pixels in a digital image. On benchmark AI tasks, the technology can reach accuracy similar to that of the existing "gold standard" for large language models, the "attention" mechanism, while using as little as one-hundredth of the compute power.

Phil Newton/Getty Images

For all the fervor over the chatbot AI program known as ChatGPT, from OpenAI, and its successor technology, GPT-4, the programs are, at the end of the day, just software applications. And like all applications, they have technical limitations that can make their performance sub-optimal.

In a paper published in March, artificial intelligence (AI) scientists at Stanford University and Canada's MILA institute for AI proposed a technology that could be far more efficient than GPT-4 -- or anything like it -- at gobbling vast amounts of data and transforming it into an answer. 

Known as Hyena, the technology is able to achieve equivalent accuracy on benchmark tests, such as question answering, while using a fraction of the computing power. In some instances, the Hyena code is able to handle amounts of text that make GPT-style technology simply run out of memory and fail. 

"Our promising results at the sub-billion parameter scale suggest that attention may not be all we need," write the authors. That remark refers to the title of a landmark AI report of 2017, 'Attention is all you need'. In that paper, Google scientist Ashish Vaswani and colleagues introduced the world to Google's Transformer AI program. The Transformer became the basis for every one of the recent large language models.

But the Transformer has a big flaw. It uses something called "attention," where the computer program takes the information in one group of symbols, such as words, and moves that information to a new group of symbols, such as the answer you see from ChatGPT, which is the output. 

That attention operation -- the essential tool of all large language programs, including ChatGPT and GPT-4 -- has "quadratic" computational complexity (see the Wikipedia entry on "time complexity"). That complexity means the amount of time it takes for ChatGPT to produce an answer increases as the square of the amount of data it is fed as input.

At some point, if there is too much data -- too many words in the prompt, or too many strings of conversations over hours and hours of chatting with the program -- then either the program gets bogged down providing an answer, or it must be given more and more GPU chips to run faster and faster, leading to a surge in computing requirements.
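The quadratic growth described above can be made concrete with a toy sketch. This is not how ChatGPT or the Transformer is actually implemented; it is a minimal stand-in, under the assumption that every input token is scored against every other token, and it only counts how many comparisons that takes. The names `pairwise_scores` and `tokens` are hypothetical, and the "score" is a plain product where a real model would use learned vectors.

```python
def pairwise_scores(tokens):
    """Score every token against every token and count the comparisons.

    Toy stand-in for an attention-style operation: each "token" is just a
    number, and the score of a pair is their product.
    """
    scores = []
    comparisons = 0
    for query in tokens:
        row = []
        for key in tokens:
            row.append(query * key)  # one comparison per (query, key) pair
            comparisons += 1
        scores.append(row)
    return scores, comparisons


if __name__ == "__main__":
    for n in (100, 200, 400):
        _, count = pairwise_scores([1.0] * n)
        print(f"{n} tokens -> {count} comparisons")
    # 100 tokens -> 10000 comparisons
    # 200 tokens -> 40000 comparisons
    # 400 tokens -> 160000 comparisons
```

Each time the input doubles, the number of comparisons quadruples, which is why long prompts and long conversations become expensive so quickly.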

In the new paper, 'Hyena Hierarchy: Towards Larger Convolutional Language Models', posted on the arXiv pre-print server, lead author Michael Poli of Stanford and his colleagues propose to replace the Transformer's attention function with something sub-quadratic, namely Hyena.
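To get a feel for what "sub-quadratic" can buy, the sketch below compares quadratic growth with n log n growth. The passage above does not state Hyena's exact cost, so n log n is an assumption here, chosen only as a familiar example of a sub-quadratic rate. The function names are hypothetical, and the figures are abstract operation counts, not measured run times.

```python
import math


def quadratic_cost(n):
    """Operation count that grows as the square of the input length."""
    return n * n


def n_log_n_cost(n):
    """Assumed example of a sub-quadratic operation count: n * log2(n)."""
    return n * math.log2(n)


if __name__ == "__main__":
    for n in (1_024, 8_192, 65_536):
        ratio = quadratic_cost(n) / n_log_n_cost(n)
        print(f"n={n}: quadratic is about {ratio:.0f}x the n log n count")
```

Under this assumption, the gap widens as the input grows, so the savings matter most for very long inputs.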

The authors don't explain the name, but one can imagine several reasons for a "Hyena" program. Hyenas are African animals that can hunt over miles and miles. In a sense, a very powerful language model could be like a hyena, ranging for miles and miles to find nourishment.

But the authors are really concerned with "hierarchy", as the title suggests, and families of hyenas have a strict hierarchy by which members of a local hyena clan have varying levels of rank that establish dominance. In some analogous fashion, the Hyena program applies a bunch of very simple operations, as you'll see, over and over again, so that they combine to form a kind of hierarchy of data processing. It's that combinatorial element that gives the program its Hyena name.

The paper's contributing authors include luminaries of the AI world, such as Yoshua Bengio, MILA's scientific director, who is a recipient of the 2018 Turing Award (announced in 2019), computing's equivalent of the Nobel Prize. Bengio is widely credited with developing the attention mechanism long before Vaswani and team adapted it for the Transformer.

Also among the authors is Stanford University computer science associate professor Christopher R
