AI Playing Games: Crash Course AI #12
Articles,  Blog

AI Playing Games: Crash Course AI #12


Jabril: D7
John-Green-bot: Miss. John-Green-bot: I-9
Jabril: NOOOOOOOOOO Hey, I’m Jabril and welcome to Crash Course
AI! John-Green-bot might struggle with some things
like natural language processing and moving through the real world, but using AI he’s
pretty good at board games. AI researchers spend a LOT of time trying
to teach AI how to beat humans at games, & this isn’t just because games are fun. Games provide constrained scenarios for testing
new AI algorithms and approaches. In a game, it’s easy to know whether the
AI is winning or losing, because there’s usually a score or some objective measure
of “winning.” This is great because AI learns from examples,
trying things out, and slowly improving. Games basically provide their own training
data, which is a big relief because AI systems need lots of training data to get really good. An AI can even play against itself to generate
training data and evolve better game strategies. Or an AI can be programmed to look at previous
games (even games played by expert humans) for strategies that lead to victory. Comparing AIs against expert human gamers
can also help us figure out how an AI is improving over time. This comparison also gives us a sense of the
difficulty of problems an AI can solve. In other words, if I can teach John-Green-Bot
to beat me at battleship, I can probably also teach him to beat me at any game that’s
simpler than battleship. Finally, games are cool (and that’s important
too!). Sometimes AI programming can feel a bit difficult
or dry because of all the math and troubleshooting, and games can provide a fun motivation to
learn all this stuff. This is the reason why some of my first AI
demos were games. For all these reasons, games and computing
have a rich history that’s worth diving into. For that, let’s go to the Thought Bubble. Humans have always been fascinated by the
idea that machines could beat us at our own strategy games. In 1770, the inventor Wolfgang von Kempelen
revealed an invention to the Empress Maria Theresa of Austria. The Mechanical Turk was a clock-like contraption
that appeared to play chess. Chess pieces were attached to rods that were
hooked into a bathtub-sized box. After Empress Maria made a move on the board,
she turned a crank that activated the machine, which would move chess pieces mechanically. To her surprise, the machine was able to beat
most challengers. However, it was an elaborate hoax and The
Mechanical Turk was actually controlled by a person hidden inside! Getting a machine to play chess is actually
really complicated. So when AI researchers first tried to tackle
the problem in the late 1940s and early 1950s, they focused on simpler chess situations. Like, games with only a few pieces remaining
on the board or full games played on a small 6×6 board without any bishops. At the same time, researchers worked on an
AI that could play checkers, because checkers looked easier… although it was really almost
as complicated. The first program to play a legitimate game
of checkers was built by IBM in 1956. And, in a classic cold war move, two programs
that could play a full chess game were developed in parallel by the US and Russia in 1957. But these programs didn’t get /good/ for
another 40 years. Checkers was first, with a program called
Chinook which started dominating masters in 1995. Chess followed when a computer called Deep
Blue beat the chessmaster Garry Kasparov in 1997. Thanks Thought Bubble. Since then, strategy games have been mastered
one-by-one, with the most recent victories over humans at Go in 2017, DOTA 2 in 2018,
and Starcraft II in 2019. Okay, so the best way to understand the difficulty
of teaching AI to play games is through an example. Oh John Green Bot. So let’s start with a really simple goal:
teaching John-Green-bot to play tic-tac-toe. One of the ways that we can think about playing
tic-tac-toe is as a tree with all the possible moves from any given state of what the game
board looks like. For example, if this is the current game state,
it’s John-Green-bot’s turn, and he’s using Xs… there are three places he can
go. We can draw a tree representing possible outcomes
for each of these options, and all of the options his opponent (me, or anyone else)
can take: Because computers think with numbers, each
outcome can be assigned a reward — a number like a 1 for a win, and -1 for a loss or tie. Basically, John-Green-bot will need to search
through the tree of possibilities to find his win. To decide which choice to make, John-Green-bot
will assume that in each tree, both he AND his opponent will make the best possible choices. In other words, his policy (or his framework
for making decisions) will alternate between choosing the branch that will maximize the
outcome of winning on his turn, and minimize the outcome of his opponent winning on their
turn. This is called the minimax algorithm. Then, each game state can be assigned a value
based on how likely it leads to John-Green-bot winning, and he can decide which move to make
based on his policy. Looking at this tree, John-Green-bot will
always pick option 1.0, and win the game! Of course, this was a pretty small tree because
we were looking at a game in progress. To draw the whole tic-tac-toe tree from beginning
to end, we would need to represent about 250,000 boards. That seems like a lot, but it would take like
a half a second for a powerful modern computer to compute this many options. By laying out all the possibilities and taking
the paths that led to a win, John-Green-bot can solve tic-tac-toe. This means that John-Green-bot will always
achieve the best possible outcome, either a win or a tie, no matter how his opponent
plays. Thanks John Green Bot. But we can’t solve all games this way. Checkers, for example, has about 10 to the
20th power board states… or 10 followed by 20 zeros. That’s more board states than there are
grains of sand on Earth! Chess has 10 to the 50th power board states. And Go has 10 to the 250th power board states. To put those huge numbers into perspective,
there are only 10 to the 80th atoms in the entire known universe! Computer scientists have theorized that it
would be impossible for conventional computers to calculate this many states due to the laws
of physics. Like, for example, if you combined all planets
and stars and everything in the whole universe into a single supercomputer, it still wouldn’t
be powerful enough to solve the game of Go. But some people have hope that quantum computers
may be able to get there one day… So, if figuring out all of the board states
could be mathematically impossible, how did computers beat the number one ranked human
masters in Chess and Go? Many modern systems, including Google’s
AlphaGo computer that beat a human master in Go in 2017, use an algorithm called Monte
Carlo Tree Search. Monte Carlo is a famous casino, so whenever
you see the term “monte carlo,” it’s a good bet that the algorithm will be using
randomness and chance to solve a problem. Combining Monte Carlo randomness and regular
tree search like minimax, modern game AIs decide which part of the huge tree to search
by guessing at odds. Basically, they want higher odds that the
part of the game tree they search will lead to a win. But these aren’t just random guesses like
we would make in many casino games, AI systems can simulate millions of “what-if” scenarios
and use math to estimate the likelihood of winning if they choose one path or another. In each “what-if” scenario, the AI considers
making one particular move and then simulates playing a large number of (but not all) possible
games, where the next moves are chosen at random. By averaging these possible outcomes, the
AI estimates how “good” that particular move is. It’s so much faster to estimate a handful
of choices than exhaustively calculate each branch of the game tree. And some computers can even do this estimation
in real time. One example of this is Google’s DeepMind
which defeated human professional players at Starcraft II in 2019 — where time is
very critical. Of course, Starcraft II, Go, and Tic-Tac-Toe
aren’t all the types of games that humans play. Other games require other strategies and have
other computational challenges: IBM’s Watson question-answering system was
able to beat human Jeopardy! champions in two televised matches in 2011. Watson listened for keywords in the clue and
tried to use a knowledge graph to figure out responses. And we’ll talk more about knowledge graphs
in a future episode. Watson wasn’t perfect and struggled a bit
with context. For example, it famously guessed “What is
Toronto?” on something in the category “US Cities.” But Watson was still able to do better than
human contestants overall. Evolutionary neural networks use the environment
as an input, like reinforcement learning. But this approach introduces /multiple/ agents
who try multiple neural network structures, and then build on successful ones for the
next generation. Sort of like animals, the ones that are better
at surviving get to pass on their genes. For example, the AI MarI/O can learn how to
play a Super Mario World level by telling MarI/O what buttons it can push and that getting
farther to the “right” in the level is good. This AI will start by basically mashing buttons
at random, but as some button mashes get it farther to the right, it remembers and learns
from those successful attempts. In the next lab, we’ll build our own AI
to use this approach to /crush/ a video game that we built where John-Green bot destroys
trash. So, are there any games that are safe to play,
where humans will always have an edge and AI won’t be able to beat us? Computers definitely seem to struggle with
parts of language like humor, irony, metaphor, and wordplay. Computers also aren’t great at understanding
and predicting real people, who don’t always act “optimally,” so social games could
be more difficult too. But AI systems are finding some success in
bluffing games like online poker, so it’s important not to underestimate them. John Green Bot: All – in. Computers might also struggle with creativity
or surprise, because there’s not a really clear way to assign values to states. It’s difficult to assign a number to “how
creative” something is, compared to saying “go as far right as you can in the Mario
level” or “achieve a winning state in a chess game.” So, considering all of that, maybe games like
charades would be pretty stacked for a human victory. Or… what about pictionary? Hide-and seek??? We’d love to hear in the comments what games
you think are safe from AI. But in the next episode, which is another
lab, we’ll program an AI system to learn how to play an arcade-like game… and I’ll
beg John Green bot for my poker money back. I’ll see ya then Crash Course is produced in association with
PBS Digital Studios. If you want to help keep Crash Course free
for everyone, forever, you can join our community on Patreon. And if you want to learn more about the history
of games, we have a whole series about that.

78 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *