OpenAI Spent $160,000 On Upwork For Minecraft Gamers To Train A Neural Net


From the video of VPT pursuing the making of a diamond pickaxe in Minecraft. The computer program accomplished the feat in ten minutes, about half the time a skilled human player would need.

What is it worth to master the "diamond tools" in Minecraft?

According to OpenAI, an artificial intelligence startup, $160,000 is enough.

This is the amount that OpenAI spent to hire Minecraft gamers through the Upwork freelance job platform.

In a paper unveiled this week, "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos," OpenAI researchers Bowen Baker and team break ground in the use of large datasets to train a neural network to mimic human keystrokes and solve different tasks in the video game. OpenAI has also published a blog post.

A multitude of neural networks have conquered different types of games through reinforcement learning. These include DeepMind's AlphaZero, which mastered chess, Go and shogi, and its successor MuZero, which added the ability to handle Atari games.

Baker and his team sought to create a neural network for Minecraft's more complex open-world game environment. There, a variety of keystrokes allows players greater freedom than in Atari or chess games.

The authors note that the research literature includes a "vast amount" of work on Minecraft. The VPT work is unique, however, because of its scope and scale: "To the best of our knowledge, there is no published work which operates in the full human action space, including drag-and-drop inventory management and item crafting."

The work of building the neural network, called VPT, took place in two stages. The first stage required human contractors, or game players, who recorded 4,500 hours of gameplay. The researchers later figured out that they really needed only about 2,000 hours.

Baker and team describe the process:

We opened the applications for a day and randomly selected 10 applicants for our first round of contractors. Later on, as we needed more data and because some contractors requested to terminate their contracts, we added more applicants from our original pool as well as contractors referred by those currently working. The hourly rate paid to the contractors was $20, minus any applicable taxes and fees on Upwork. All of the results presented in this paper are based on about 4,500 hours of data (including data recorded to gather statistics of human play that was not used for training), which cost us around $90,000. We also collected data we didn't use, due to bugs in our recorders as well as ideas that we never pursued. We spent $160k total on contractor compensation during the course of the project. As we discuss in Sec. [...] the foundation VPT model, BC fine-tuning to the earlygame_keyword dataset, and the RL fine-tuning results. Collecting the contractor_house dataset cost about $8000. Because we used the IDM trained on about 2000 hours of contractor data, the actual cost of contractor data for those results was around $40,000.
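The dollar figures in this passage are consistent with simple hours-times-rate arithmetic. A minimal sketch, assuming the $20 hourly rate applies uniformly to every recorded hour (the function name is illustrative and does not come from the paper):

```python
HOURLY_RATE_USD = 20  # hourly rate quoted in the excerpt above


def contractor_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Return the cost in dollars of `hours` of recorded gameplay."""
    return hours * rate


all_results_cost = contractor_cost(4_500)  # data behind all reported results
idm_only_cost = contractor_cost(2_000)     # data the IDM was trained on

print(f"4,500 hours: ${all_results_cost:,.0f}")  # 4,500 hours: $90,000
print(f"2,000 hours: ${idm_only_cost:,.0f}")     # 2,000 hours: $40,000
```

Both results match the amounts the authors report, which suggests the quoted costs are gross figures at the full hourly rate.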

The frames of video are labeled with the actions taken, such as "inventory," which checks a player's collection of objects using the "E" key, and "sneak," which lets a player move "carefully in the current direction" using the SHIFT key. These actions are recorded as JSON text strings during gameplay and stored along with the video frames.
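To make the recording format concrete, here is a hedged sketch of what a per-frame action record might look like. The field names (`frame`, `actions`), the key mapping, and both helper functions are hypothetical; the article states only that actions are stored as JSON text alongside the video frames.

```python
import json

# Hypothetical mapping from physical keys to Minecraft actions,
# using only the two examples given in the text.
KEY_TO_ACTION = {"E": "inventory", "SHIFT": "sneak"}


def encode_frame_actions(frame_index: int, pressed_keys: list[str]) -> str:
    """Serialize the actions active during one video frame as a JSON string."""
    actions = sorted(KEY_TO_ACTION[k] for k in pressed_keys if k in KEY_TO_ACTION)
    return json.dumps({"frame": frame_index, "actions": actions})


def decode_frame_actions(line: str) -> tuple[int, list[str]]:
    """Parse a JSON string back into (frame index, action names)."""
    record = json.loads(line)
    return record["frame"], record["actions"]


line = encode_frame_actions(42, ["SHIFT", "E"])
print(line)  # {"frame": 42, "actions": ["inventory", "sneak"]}
```

Storing one record per frame keeps each label aligned with the frame it describes, which is what a model needs in order to learn which actions go with which frames.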

The gameplay frames were used to train an inverse dynamics model (IDM), which learns what actions go along with which frames. The IDM is an amalgamation of several kinds of neural networks, including a 3-D convolutional neural net and a ResNet that parse the video frames, followed by several Transformer attention layers that predict the action taken at each frame.

The IDM's expertise is then applied to a much larger collection of video footage: 70,000 hours of unlabeled Minecraft videos gathered from the web. The IDM applies "pseudo-labels" to this much larger collection. In other words, the contractor data and the IDM are a way of bootstrapping a large labeled video training set.
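The bootstrapping idea can be sketched with a toy example. This is not the authors' model: the real IDM is a neural network over video frames, whereas the stand-in below is a lookup table over 1-D positions, and every name and data value is invented for illustration.

```python
from collections import Counter, defaultdict

# Small labeled set: (observation before, observation after, recorded action).
# Observations are toy 1-D positions standing in for video frames.
labeled = [
    (0, 1, "forward"), (3, 4, "forward"), (5, 4, "back"),
    (2, 1, "back"), (7, 7, "noop"), (1, 2, "forward"),
]


def train_idm(transitions):
    """Fit a toy inverse dynamics model: observed change -> most common action."""
    counts = defaultdict(Counter)
    for before, after, action in transitions:
        counts[after - before][action] += 1
    table = {delta: c.most_common(1)[0][0] for delta, c in counts.items()}
    # Unseen changes fall back to "noop" (an arbitrary choice for this sketch).
    return lambda before, after: table.get(after - before, "noop")


def pseudo_label(idm, trajectory):
    """Attach an inferred action to each step of an unlabeled trajectory."""
    return [(obs, idm(obs, nxt)) for obs, nxt in zip(trajectory, trajectory[1:])]


idm = train_idm(labeled)
unlabeled_videos = [[0, 1, 2, 2, 1], [5, 4, 3, 4]]  # stand-in for web footage
pseudo_labeled = [pseudo_label(idm, video) for video in unlabeled_videos]
print(pseudo_labeled[0])
# [(0, 'forward'), (1, 'forward'), (2, 'noop'), (2, 'back')]
```

The small labeled set is used only to fit the labeler; the far larger unlabeled set then supplies the actual training pairs.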

The training regimen for VPT.

The authors note that, despite the cost of paying contractors, this approach can yield significant savings. Collecting contractor data equivalent to the 70,000 hours of Web video would be far more expensive.
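A rough back-of-the-envelope check of that claim follows. It assumes that contractors would record web-scale data at the same $20 hourly rate and that one paid hour yields one hour of labeled video; both are assumptions of this sketch, not figures from the paper.

```python
HOURLY_RATE_USD = 20        # contractor rate quoted earlier in the article
WEB_HOURS = 70_000          # unlabeled web video used for training
ACTUAL_SPEND_USD = 160_000  # total contractor spend reported by the authors

# Assumption: one paid contractor hour yields one hour of labeled video.
hypothetical_cost = WEB_HOURS * HOURLY_RATE_USD
ratio = hypothetical_cost / ACTUAL_SPEND_USD

print(f"Labeling {WEB_HOURS:,} hours directly: ${hypothetical_cost:,}")
print(f"That is {ratio:.2f}x the actual contractor spend")
```

Under these assumptions the direct-labeling cost comes to $1.4 million, nearly nine times what OpenAI actually paid contractors.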

"If we could inexpensively collect a labeled contractors dataset of a similar order to web_clean then this would be irrelevant; however, collecting that magnitude of data would have incurred millions of dollars."

The authors then use the 70,000 hours to train another neural network, also made of Transformer layers. This second neural network learns to replicate the user actions in the videos, a practice commonly known as "behavioral cloning."
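At its core, behavioral cloning is supervised learning on pairs of observations and actions. The sketch below is a deliberately tiny tabular version, assuming discrete observations; the authors' model is a Transformer-based network trained on video, and the observation names here are invented for illustration.

```python
from collections import Counter, defaultdict

# (observation, action) pairs, for example produced by pseudo-labeling.
# The observation strings are hypothetical.
demonstrations = [
    ("facing_tree", "attack"), ("facing_tree", "attack"),
    ("facing_tree", "forward"), ("holding_logs", "inventory"),
    ("holding_logs", "inventory"), ("open_field", "forward"),
]


def clone_behavior(pairs):
    """Fit a tabular policy that imitates the most frequent demonstrated action."""
    counts = defaultdict(Counter)
    for observation, action in pairs:
        counts[observation][action] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in counts.items()}


policy = clone_behavior(demonstrations)
print(policy["facing_tree"])  # attack
```

The policy never receives a reward signal; it simply reproduces what the demonstrators did most often in each situation.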

The purpose of this work is to find a way of training a general-purpose computer agent that can use the vast amount of unlabeled data available on the Internet to solve tasks that involve sequences of decisions.

They write that the results of this paper "help pave the way to utilizing the wealth of data on the internet for sequential decision domains."

The work can conceivably be used for numerous computer tasks that require sequences of mouse clicks and other human operator controls, they suggest.

"While our experiments are limited to Minecraft, we believe VPT provides a general formula for training behavioral priors into hard, yet generic action spaces in any domain that contains a large quantity of unlabeled data like computer usage."

OpenAI is best known for its large language program GPT-3, which also uses a "pre-training" approach based on vast amounts of unlabeled Web data. In a sense, the Minecraft work extends that approach to mimicking behavior in the domain of sequential computer tasks captured on video.

The work's crowning achievement is beating the time a person needs to complete one of the game's most difficult tasks: obtaining a diamond pickaxe.

In Minecraft, diamond-based tools last longer and do more damage. The diamond pickaxe is the one that matters most to gamers: it is required to mine obsidian and a fictional material called netherite, both of which are important for endgame activities such as building enchanting tables and making netherite equipment.

After pre-training VPT on Minecraft gameplay, the authors fine-tuned it with reinforcement learning to produce a neural network that could make a diamond pickaxe faster than a typical human player.

They write, "To show the efficacy of RL fine-tuning, we chose to set the challenging goal of obtaining a diamond pickaxe in 10 minutes starting from a new Minecraft survival world."

This is a difficult task for humans, who usually take twice as long to complete it, if they complete it at all.

The task requires complex skills such as mining, inventory management, crafting with or without a crafting table, tool usage, operating a furnace, and mining at the lowest depths, where there are many hazards such as enemies and lava. Progress is easily lost by dropping items, destroying them, or dying. Obtaining a diamond pickaxe typically takes a proficient human more than 20 minutes (24,000 actions).
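The two human figures, 20 minutes and 24,000 actions, imply an action rate, which in turn gives a rough sense of how many actions fit into the agent's 10-minute limit. The sketch assumes the agent acts at the same rate the human figures imply; that assumption is not stated in the article.

```python
HUMAN_MINUTES = 20
HUMAN_ACTIONS = 24_000
AGENT_MINUTES = 10  # time limit set for the fine-tuned agent

actions_per_second = HUMAN_ACTIONS / (HUMAN_MINUTES * 60)

# Assumption: the agent acts at the same rate as the human figure implies.
agent_action_budget = int(actions_per_second * AGENT_MINUTES * 60)

print(actions_per_second)   # 20.0
print(agent_action_budget)  # 12000
```

Under that assumption, the agent must complete the full crafting sequence in roughly 12,000 consecutive actions.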

The authors were conscious of the possibility that offensive content could end up in the contractor data and the 70,000 hours of unlabeled Web video. "The contractors could theoretically use Minecraft's open-world property to generate personally identifiable data and/or offensive material (e.g. by using Minecraft blocks to write their name or offensive messages, then finding a spot from which the message would be visible)," they write, though they did not see this in the contractor videos they watched.

"Additionally, we train our BC [behavioral cloning] models from videos of people playing Minecraft on the internet. If such behavior is shown in those videos, our model could possibly learn it. However we expect such behavior to not be common enough that our modeling would be able to reproduce it," they write.

Where can such a general agent go next? VPT, or its offspring, could conceivably do anything a person would do with a mouse and keyboard, including surfing social media, navigating maps, and even making bookings.
