Google’s AlphaGo Beats Professional Go Player

Two weeks ago SmartGo sponsored the Core Intuition podcast, and I encouraged listeners to learn Go before computers conquer the game. Little did I know that Google’s AlphaGo had already beaten a professional player in a five-game match back in October; they were just waiting for their paper to get published.

I had expected this day to be at least five years away, more likely a decade; this is an amazing accomplishment by Google’s DeepMind team. In my previous thoughts on Facebook and Google entering the race, I was right about tree search being essential, wrong about more Go-specific approaches being necessary.

So how did Google do it? They built on Monte Carlo Tree Search (MCTS), refined and extended it, and combined it with deep neural networks. Their paper is full of interesting details; here’s my initial take on the most important factors contributing to their success:

  • Extending MCTS with position evaluation: At each node where they perform a Monte Carlo simulation (a rollout or playout), they also evaluate the position using a value function. The results from the simulations and the value function are combined; the combination performs better than either alone.
  • Improved rollout policy: They created a deep neural network for the rollout policy using supervised learning, then refined it with reinforcement learning. A neural network implementing this rollout policy takes 3 ms to select an action; they also trained a rollout policy that is 1000 times faster but less precise (based on a linear combination of pattern features instead of a neural net). They use the more accurate policy during learning and the faster one during actual play.
  • Using rollouts to learn a value function: Position evaluation has not worked well for computer Go in the past, but here they just need a reasonably good guess at how likely a position is to lead to a win. They trained it with 30 million distinct positions (each sampled from a separate game to avoid overfitting), and used their best rollout policy to play to the end and determine the winner. The resulting value function is more accurate than a playout using their fast rollout policy, and while it’s less accurate than rollouts guided by their best neural-net policy, it takes 15000 times less computation. (The paper also mentions truncated rollouts, not quite clear on those yet.)
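To make the first point above concrete, here’s a minimal sketch of how a leaf evaluation might blend a value-function estimate with a rollout result. The function name, signature, and mixing weight are illustrative assumptions, not the DeepMind team’s actual code; the paper only says the combination outperforms either source alone.

```python
def evaluate_leaf(value_estimate, rollout_result, mix=0.5):
    """Blend a value-function estimate (win probability in [0, 1])
    with the outcome of a single rollout (1 = win, 0 = loss).

    With mix=0.5 both sources contribute equally; names and the
    default weight are illustrative, not from the paper's code.
    """
    return (1 - mix) * value_estimate + mix * rollout_result

# Example: the value function estimates a 70% win chance,
# but the rollout from this position was lost.
blended = evaluate_leaf(0.7, 0)
```

The point of the blend is that the two estimates fail differently: the value function is a learned static guess, while a single rollout is noisy but grounded in actual play to the end of the game.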

The AlphaGo team has also found good solutions for an asynchronous MCTS algorithm, for combining fast MCTS operations with slow neural networks, and for hashing local moves during the playouts. Massive amounts of hardware are helpful too.

The match against Lee Sedol in March is going to be tremendously exciting. There’s still time to learn Go now so you can follow along and share in the excitement.

Go at Facebook

Before I left Microsoft in 1999 to pursue SmartGo full-time, I tried to convince them to start a computer Go project. No luck. It appears my timing was off by a decade: Microsoft released Path of Go for Xbox in 2010. Now Facebook and Google are getting into the game. Will they have a chance against the top programs?

We don’t know much about Google yet, just a hint from Demis Hassabis about a surprise from their DeepMind project. But researchers at Facebook (Yuandong Tian and Yan Zhu) have written a paper with more details.

Facebook researchers built a Deep Convolutional Neural Network (DCNN) that they trained with games from KGS and GoGoD to predict good moves to play. They’re achieving an impressive 1 dan in ranked games on KGS, which beats any previous pattern-based approach. However, when they combine their approach with Monte Carlo Tree Search (MCTS), the results are a mixed bag. Evaluating a position with a DCNN is computationally expensive (their paper mentions about 3,000 MCTS playouts for every DCNN board evaluation); it’s not clear that using a neural net to generate moves for tree search is the best use of computing power.

Neural nets can be a powerful solution for pattern recognition problems, and many such problems are relevant to Facebook, so I understand why they have chosen to focus on improved move prediction using DCNN. My guess is that Google will take a similar approach. However, pattern recognition works best for shallow problems; Go is a deep problem where lookahead is essential.

As long as Facebook and Google stick with trying to find general solutions to general problems, I don’t think top Go programs like Zen and Crazy Stone have anything to worry about. But once these giants decide to beat the strongest human players and are willing to focus on Go-specific solutions, it will get interesting.

Monte Carlo Tree Search

Go-playing programs have made remarkable progress over the last decade. I want to give a brief overview of what has changed and how today’s top programs work.

Chess programs are based on tree search and position evaluation: look some moves ahead, evaluate the position, and select the move that leads to the position with the best evaluation (assuming best opponent defense). Do this well enough and fast enough, and you can beat anybody.
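The chess recipe above can be sketched as a few lines of minimax search. The `moves`, `apply_pos`, and `score` hooks are hypothetical stand-ins for what a real engine would supply (move generation, making a move, and position evaluation); the toy game at the end just demonstrates the "best opponent defense" logic.

```python
def minimax(position, depth, maximizing, moves, apply_pos, score):
    """Look ahead `depth` moves, evaluate the leaves, and assume the
    opponent always picks the reply that is worst for us."""
    if depth == 0 or not moves(position):
        return score(position)
    results = [minimax(apply_pos(position, m), depth - 1, not maximizing,
                       moves, apply_pos, score)
               for m in moves(position)]
    return max(results) if maximizing else min(results)

# Toy game: positions are nested lists, leaves are evaluations.
tree = [[3, 5], [2, 9]]
moves = lambda p: range(len(p)) if isinstance(p, list) else []
apply_pos = lambda p, m: p[m]
score = lambda p: p
best = minimax(tree, 2, True, moves, apply_pos, score)  # → 3
```

The maximizer cannot count on reaching the 9: the opponent would answer with 2. So the guaranteed outcome is 3, which is exactly the "assuming best opponent defense" caveat in the text. Real chess programs add alpha-beta pruning and many refinements on top of this skeleton.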

Go programs tried variations on that approach, and made very slow progress. The high number of legal moves at each position was one obstacle, but the main issue was the lack of a simple evaluation function. In chess, the material on the board is a good first approximation, and you can add adjustments to get a solid evaluation. In Go, you have to solve many separate tactical problems to find out what’s happening on the board and who is ahead. Those tactical problems include life and death, captures, and connections; once you have computed those, you can figure out territories, make assumptions about boundaries and open areas, and get a halfway reasonable evaluation. Unreliable and hard to scale; not a recipe for success.

Today’s top programs like Zen and Crazy Stone use Monte Carlo Tree Search (MCTS). The two components of that approach:

  • Monte Carlo: Play thousands of semi-random games (playouts) from the current board position. Play all the way to the very end of the game, where evaluation is trivial.
  • Tree Search: Build a tree of moves, and explore more promising moves more deeply.

This sounds crazy at first, but the reason it works is that it completely bypasses the difficult problem of evaluating the current position. Our goal is simply to choose the move that gives us a better chance of winning. With thousands of playouts from the current position, better moves tend to lead to more wins.
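The two components above fit into a short skeleton. This is a bare-bones sketch, not how Zen or Crazy Stone actually implement it: the game here is a toy Nim variant (take 1 or 2 stones, taking the last stone wins) standing in for Go, and the playouts are purely random with none of the pattern heuristics real programs use.

```python
import math, random

def legal_moves(state):            # state = (stones_left, player_to_move)
    return [m for m in (1, 2) if m <= state[0]]

def play(state, move):
    return (state[0] - move, 1 - state[1])

def winner(state):                 # called only when no stones remain:
    return 1 - state[1]            # the player who just moved took the last stone

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.wins, self.visits = [], 0.0, 0
        self.untried = legal_moves(state)

def uct_child(node, c=1.4):
    # Tree search: balance win rate (exploitation) against an exploration bonus.
    return max(node.children, key=lambda ch:
               ch.wins / ch.visits + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend the tree while fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one untried move as a new node.
        if node.untried:
            move = node.untried.pop()
            child = Node(play(node.state, move), node, move)
            node.children.append(child)
            node = child
        # 3. Monte Carlo playout: play randomly to the very end,
        #    where evaluation is trivial.
        state = node.state
        while legal_moves(state):
            state = play(state, random.choice(legal_moves(state)))
        win = winner(state)
        # 4. Backpropagation: credit a win to each node reached by the winner's move.
        while node:
            node.visits += 1
            if node.parent and win != node.state[1]:
                node.wins += 1
            node = node.parent
    # Recommend the most-visited move from the root.
    return max(root.children, key=lambda ch: ch.visits).move

move = mcts((4, 0))  # from 4 stones, taking 1 leaves the opponent a lost position
```

Even with completely random playouts, better moves accumulate more wins, so the tree directs more visits toward them; that is the whole trick, and it needs no evaluation function at all.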

Monte Carlo programs are stronger when they just optimize their chance of winning, not the number of points they win by. This leads to good risk management: play safe moves that assure a win when you’re ahead, take risks when you’re behind.
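A tiny example with made-up numbers shows why optimizing for wins rather than points changes the chosen move: a safe move that wins 90% of playouts by a sliver beats a greedy move that wins big but only 60% of the time.

```python
# Hypothetical playout statistics for two candidate moves.
candidates = {
    "safe":   {"win_rate": 0.90, "avg_margin": 1.5},   # small but reliable win
    "greedy": {"win_rate": 0.60, "avg_margin": 15.0},  # big win, often backfires
}

by_win_rate = max(candidates, key=lambda m: candidates[m]["win_rate"])
by_margin = max(candidates, key=lambda m: candidates[m]["avg_margin"])
print(by_win_rate, by_margin)  # prints: safe greedy
```

Optimizing the win rate picks "safe"; optimizing the margin picks "greedy". When ahead, the win-rate criterion automatically prefers the move that locks the game up.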

The playouts are not completely random: they use heuristics to respond to standard patterns. The heuristics need to be carefully balanced to avoid bias. There are many refinements in MCTS, but the principles are simple, and properly implemented, this approach yields a scalable program where quality of play improves with added computing power.

While current top Go programs are approaching top amateur strength, they still exhibit a weakness when dealing with multiple tactically complicated situations on the board. The global search may be powerful enough to solve each life and death problem separately, but not together. Improvements in handling local fights without losing the benefits of global MCTS are needed.

You can find more details at: