Anthropic's latest AI model, Claude 3.7 Sonnet, is being put to the test by playing Pokemon Red on Twitch. The livestream has drawn attention as viewers watch the AI slowly navigate the game, reasoning its way through each step. Although Pokemon is a game designed for children, it serves as a useful benchmark for the AI's problem-solving skills and shows how far AI models have come in reasoning through complex puzzles.
Despite its progress, Claude 3.7 Sonnet's gameplay is far from perfect. Early on, the AI struggled with basic tasks such as leaving the starting town, though it later went on to win several gym badges. Even so, its play is often slow, with moments of confusion that highlight the challenges AI faces in navigating games designed for humans. At one point, for example, Claude became fixated on a rock wall and could not get past it until it reasoned its way around.
This experiment draws comparisons to the earlier phenomenon "Twitch Plays Pokemon," in which thousands of people worked together to guide a character through the game. Now an AI plays solo, and while it is a fascinating display of technological progress, some viewers lament the shift from collaborative human gameplay to watching a machine take on the same challenges. Still, the experiment highlights the growing sophistication of AI in handling tasks once thought too complex for machines.