
The open environment and diverse tasks of the popular computer game Minecraft make it an ideal testbed for AI models, which can put their skills to the test there.
Minecraft's pixelated look has achieved cult status. The computer game is all about exploring an open world and constructing buildings. With more than 300 million copies sold, it is the best-selling computer game. It now also serves as a testbed for AI models. Researchers led by Timothy Lillicrap from Google DeepMind have presented an AI algorithm in the scientific journal "Nature" that is the first to learn to mine diamonds in the game world on its own, without special training or human data. This is a task that requires long-term strategic thinking.
"Minecraft poses two particular challenges for AI algorithms," says computer scientist Philipp Henning, who was not involved in the current work. Firstly, the randomly generated game world looks different in every game, so models cannot simply memorise a fixed sequence of actions in order to perform well. "Secondly, the game requires comparatively long-term plans." Mining diamonds is one example: it requires many successive steps that are only sparsely rewarded, and the effort pays off only at the very end. This is why diamond mining in Minecraft became a benchmark for the development of AI models that plan ahead. Between 2019 and 2022, competitions were even held for this purpose. "Dreamer is the first algorithm to autonomously mine diamonds in Minecraft, achieving an important milestone in the field of artificial intelligence," write the experts at DeepMind in their publication.
The Dreamer algorithm learns on its own from interactions with its environment through reinforcement learning. For example, if the AI scores points through an action in Minecraft, it learns that this action pays off and will probably repeat it in the future. Dreamer consists of three parts: "The first model predicts the consequences of possible actions, a critic neural network assesses the value of each consequence, and the third neural network then selects the actions to achieve the best results," the paper reads. According to the paper, the algorithm was able to learn more than 150 different tasks across several domains, Minecraft among them.
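The division of labour between the three parts can be made concrete with a deliberately tiny sketch. It is not DeepMind's implementation: in Dreamer all three parts are learned neural networks, whereas here the world model and the critic are hand-written rules and the "actor" is a simple greedy choice. The one-dimensional world, the names `world_model`, `critic`, `actor` and `rollout`, and the constant `GOAL` are assumptions made purely for illustration.

```python
# Toy illustration only: a one-dimensional world with positions 0..GOAL.
# All names and rules below are hypothetical stand-ins, not Dreamer's code.
GOAL = 4
ACTIONS = (-1, +1)  # step left or step right


def world_model(state, action):
    """Predict the consequence of an action: the next position, kept in 0..GOAL."""
    return max(0, min(GOAL, state + action))


def critic(state):
    """Assess how valuable a position is: closer to the goal is better."""
    return -abs(GOAL - state)


def actor(state):
    """Select the action whose predicted consequence the critic rates highest.

    Assumption: a greedy one-step choice replaces Dreamer's learned actor network.
    """
    return max(ACTIONS, key=lambda action: critic(world_model(state, action)))


def rollout(start, max_steps=10):
    """Follow the actor's choices from `start`, stopping at the goal or after max_steps."""
    states = [start]
    while states[-1] != GOAL and len(states) <= max_steps:
        states.append(world_model(states[-1], actor(states[-1])))
    return states


if __name__ == "__main__":
    print(rollout(0))
```

Starting from position 0, the actor repeatedly asks the world model what each action would lead to, lets the critic score the predicted outcomes, and moves accordingly until it reaches the goal. The real algorithm has to learn all of this from experience in a far larger and randomly generated world.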
Although the paper was only published in Nature in April 2025, it dates back to January 2023, when DeepMind submitted it for peer review. "It was perceived as a great success at the time because the open Minecraft game world was considered a demanding benchmark," says Henning. But: "Since then, as we all know, artificial intelligence has made huge leaps forward."
"From today's perspective, the capabilities of the Dreamer architecture are somewhat less impressive," says Henning. For example, in November 2024, a research paper that has not yet been peer-reviewed was published in which several large language models controlled more than 1,000 Minecraft characters and displayed surprisingly human behaviour: the AI players took on different roles, such as defenders, builders or explorers, and some even set off on missionary journeys. Let's see how long it takes for this work to appear in a scientific journal.
We are partners of Spektrum der Wissenschaft and want to make well-founded information more accessible to you. Follow Spektrum der Wissenschaft if you like the articles.
Original article on Spektrum.de
Experts from science and research report on the latest findings in their fields: competent, authentic and comprehensible.