Since launching ChatGPT, OpenAI has continued to work on new AI projects that build on the success and popularity of its AI chatbot. Now, the appearance of a new mystery large language model (LLM) gives the public a sneak peek at its latest project -- and it's impressive.
Last week, "gpt2-chatbot" appeared on the Chatbot Arena, a benchmarking platform for comparing the performance of LLMs. The LLM caused quite the stir by outperforming many of the most popular LLMs on the market, such as Gemini, Claude, and even GPT-4. To the disappointment of many, however, Chatbot Arena quickly removed "gpt2-chatbot."
As of last night, however, if you visit the Chatbot Arena, you can encounter what seem to be two variants of the original chatbot: "im-a-good-gpt2-chatbot" and "im-also-a-good-gpt2-chatbot."
Although both models have "GPT" in their names, which usually denotes OpenAI's family of Generative Pre-trained Transformer (GPT) LLMs, the company has not officially acknowledged that it's behind them. OpenAI CEO Sam Altman did post cryptically on X, stating only the name of one of the LLMs: "im-a-good-gpt2-chatbot."
Even though the models are available in Chatbot Arena, accessing them is tricky. The two models are not in Chatbot Arena's list of supported LLMs and thus you can't test them in the side-by-side comparison feature.
Instead, if you want to access them, you must keep initiating an Arena (battle) comparison -- which randomly selects two LLMs to compete against each other -- until one of the two new models comes up. It took me five rounds before one of the two finally appeared. If you're determined to test these models for yourself, the extra effort is worth it.
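To get a rough feel for how many battles that might take, here is a minimal back-of-the-envelope sketch. It assumes each battle draws two distinct models uniformly at random from the pool, which may not match how Chatbot Arena actually samples, and the pool size of 40 is a purely hypothetical figure. The function names are my own.

```python
from math import comb


def chance_per_round(pool_size: int, targets: int = 2) -> float:
    """Probability that a random pair contains at least one target model.

    Assumption: two distinct models are drawn uniformly at random.
    Chatbot Arena's real sampling may be weighted differently.
    """
    if pool_size < 2 or not 1 <= targets <= pool_size:
        raise ValueError("need pool_size >= 2 and 1 <= targets <= pool_size")
    # Complement rule: subtract the chance that both slots go to non-target models.
    return 1 - comb(pool_size - targets, 2) / comb(pool_size, 2)


def expected_rounds(pool_size: int, targets: int = 2) -> float:
    """Average number of battles until a target shows up (geometric distribution)."""
    return 1 / chance_per_round(pool_size, targets)


if __name__ == "__main__":
    pool = 40  # hypothetical number of models in the battle pool
    print(f"Chance per battle: {chance_per_round(pool):.1%}")
    print(f"Expected battles:  {expected_rounds(pool):.1f}")
```

Under those assumptions, a larger pool means more battles on average, so a handful of rounds before a hit is not surprising.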
Once you have either "im-a-good-gpt2-chatbot" or "im-also-a-good-gpt2-chatbot" open, you can keep chatting with the model to test its capabilities until you decide to start a new round or hit refresh.
Users have put the new anonymous models through their paces with impressive results, including creating a Flappy Bird clone with one prompt, creating a code interpreter that uses Claude Opus, and even reasoning through basic physics questions.
These results have led people to speculate that the models are versions of OpenAI's GPT-4.5 or GPT-5, released under a pseudonym so that OpenAI can benchmark their performance accurately. When one user asked "im-a-good-gpt2-chatbot" what exact LLM version it was, the model said, "I am based on the GPT-4 architecture, specifically the GPT-4.5 variant."
There's no way of knowing whether that answer is accurate or a hallucination; until OpenAI confirms anything, it is best to err on the side of caution when using these LLMs. If you are even the slightest bit curious, however, I encourage you to give them a try. It's free.