
Four More Games

AlphaGo’s victory over Lee Sedol last night was stunning. I’m still gathering my thoughts and trying to figure out what happened.

The game analysis at Go Game Guru has been very helpful. But I have to wonder whether I can trust the commentary — maybe AlphaGo knew what it was doing?

Move 80 was described by Younggil An 8p as ‘slack’. I wonder whether AlphaGo at that point already calculated that it was winning, and that eliminating the aji (latent possibilities) in that area would be the best way to reduce the risk of losing. I would love to know more about AlphaGo’s evaluation of that move.

AlphaGo demonstrated that it’s good at fighting, and would not back down from a fight. It also showed excellent positional judgement and timing, managing to invade on the right side with 102, get just enough out of that fight, and end with sente to play the huge move of 116 to take the upper left corner. And it doesn’t let up in the endgame once it sees a path to victory. We have not seen any ko fights yet, but there’s no reason to believe AlphaGo couldn’t handle those well.

For the remaining games, I think Lee Sedol must establish a lead by mid game at the latest to have a chance of winning. As the game gets closer to the end, there are fewer moves to consider, and the Monte Carlo playouts need fewer moves to reach the end of the game, so AlphaGo will just get stronger.
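
A toy sketch of why shorter playouts help (the budget and move counts here are made up for illustration): with a fixed computation budget, the number of playouts per decision grows as the game shortens, so the win-rate estimates get tighter.

```python
import random

def playout(moves_remaining, p_win=0.5):
    """Toy playout: play random moves to the end, return 1 for a win.
    Cost grows with the number of moves left in the game."""
    for _ in range(moves_remaining):
        pass  # a real playout would pick and play a random legal move here
    return 1 if random.random() < p_win else 0

def estimate_winrate(moves_remaining, budget=100_000):
    """Spend a fixed move budget; shorter playouts mean more samples."""
    samples = max(1, budget // max(1, moves_remaining))
    wins = sum(playout(moves_remaining) for _ in range(samples))
    return wins / samples, samples

# Early game: ~200 moves left -> 500 playouts from the budget.
# Endgame: ~20 moves left -> 5,000 playouts, a far tighter estimate.
```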

Move 7 was a new move. At least, it’s not in the GoGoD database of 85,000 professional game records. With SmartGo’s side-matching, only two games (both played in 2013) match that right-side position. Lee Sedol probably wanted to make sure AlphaGo couldn’t just rely on known patterns, but that gambit didn’t pay off. I don’t think he will try a similar move tonight.

There are four more games; I would not count Lee Sedol out yet. He now knows what AlphaGo can do, and won’t underestimate it again. We have some very exciting games to look forward to.

Late Nights with AlphaGo

Google has announced times and time limits for Lee Sedol’s match against AlphaGo. Games start at 13:00 (UTC+9), which means they’ll start in the evening the day before in the US:

  • Tuesday March 8: 8 p.m. PST, 11 p.m. EST
  • Wednesday March 9: 8 p.m. PST, 11 p.m. EST
  • Friday March 11: 8 p.m. PST, 11 p.m. EST
  • Saturday March 12: 8 p.m. PST, 11 p.m. EST
  • Monday March 14: 9 p.m. PDT, midnight EDT (DST!)

And Michael Redmond 9p will be commenting in English. Mark your calendars and stock up on popcorn.

Time limits are two hours per player, plus 3×1-minute byo-yomi. Once the main time is used up, a player can take up to a minute for each move; exceeding that minute costs one of the three periods, and losing the last period loses the game on time. The article says each game is thus expected to last 4-5 hours, but if AlphaGo uses its full two hours (instead of playing very fast as in the Fan Hui match), it could easily go longer. Be prepared for some late nights.
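
As a rough sketch of how that clock works (a hypothetical implementation for illustration, not the match’s actual timing software):

```python
class ByoYomiClock:
    """Main time plus Japanese byo-yomi: a move within the period costs
    nothing; each full period overrun permanently consumes one period."""

    def __init__(self, main_seconds=2 * 60 * 60, periods=3, period_seconds=60):
        self.main = main_seconds
        self.periods = periods
        self.period_seconds = period_seconds

    def spend(self, seconds):
        """Charge one move's thinking time; return False on a time loss."""
        if self.main > 0:
            if seconds <= self.main:
                self.main -= seconds
                return True
            seconds -= self.main  # this move straddles into byo-yomi
            self.main = 0
        while seconds > self.period_seconds:
            self.periods -= 1  # overran a full period: it is gone for good
            seconds -= self.period_seconds
        return self.periods > 0
```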

I’m glad they increased the time limits from the Fan Hui match; this should be a very exciting match. Just one new tidbit since my Lee Sedol vs AlphaGo blog post — on the Computer Go mailing list, Aja Huang commented today: “We are still preparing hard for the match. … AlphaGo is getting stronger and stronger.”

Lee Sedol vs AlphaGo

Google’s AlphaGo beat the European Go champion Fan Hui in October, and Google has challenged Lee Sedol to a five-game match in March. How can we assess his chances?

Analysis of October games

As an amateur 3 dan, I can understand much of what’s going on in the five games AlphaGo played against Fan Hui, but professional analysis reveals subtleties and deeper issues.

For example, this PDF by the British Go Association includes some history of computer Go, more background on the match, and an analysis with comments by Hajin Lee 3p.

The conclusions I draw from all of this:

  • Fan Hui made a number of mistakes that Lee Sedol is unlikely to make.
  • While AlphaGo played very well, it did make some mistakes in those five games. Also, Fan Hui did win two unofficial games against AlphaGo (sadly unpublished).
  • AlphaGo’s reading (looking ahead many moves to determine whether a plan will work or not) is very strong.
  • AlphaGo sometimes mimics the play of professional players and follows standard patterns that may not be optimal in that specific situation. Professional players are more creative and will vary their play more based on subtle differences in other parts of the board.
  • AlphaGo may not have a nuanced enough understanding of the value of sente (having the initiative).
  • AlphaGo doesn’t show deep understanding of why a move is played, or the far-reaching effects of a move.

So I’m confident that the October version of AlphaGo was weaker than Lee Sedol. And those five games only give us a limited view of AlphaGo; there are probably more weaknesses to be discovered. Ko was only played once; AlphaGo did well, but we don’t know how it will do in a complex, protracted ko fight. We don’t know how it will do when the fighting gets more complex. We don’t know how it will do when the board is more fluid and multiple local positions are left unresolved.

March version of AlphaGo

Google has five months to improve AlphaGo. So what can they do?

During the Deep Blue match, engineers could make adjustments to Deep Blue’s algorithm between games, such as fixing a bug after game 1. For AlphaGo, it may not be as easy. For example, move 31 in game 2 (shown below) was a mistake by AlphaGo (see Myungwan Kim’s analysis). It’s the right local move under the right circumstances, but AlphaGo doesn’t have a sufficient understanding of that board position. How can they fix that issue? Feeding that position to the neural network won’t help balance out what it has learned in 30 million other positions. There’s no quick fix for that kind of mistake.

(Diagram: AlphaGo game 2, move 31)

Avoiding specific mistakes is easier. AlphaGo was not using an opening library in October, but Google could easily add that by March, making it at least possible to adjust play between games so Lee Sedol won’t be able to exploit a particular joseki mistake in multiple games.
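
A hedged sketch of what such an opening library might look like (the moves, names, and structure here are purely illustrative, not AlphaGo’s actual design): a lookup table keyed on the move sequence so far, consulted before falling back to search.

```python
# Hypothetical opening book: map a move sequence to known good replies.
OPENING_BOOK = {
    (): ("Q16",),                # reply to the empty board
    ("Q16",): ("D4", "D16"),     # replies to an opening move at Q16
    ("Q16", "D4"): ("Q3",),
}

def book_move(moves_so_far):
    """Return a book reply for this position, or None to fall back to search."""
    candidates = OPENING_BOOK.get(tuple(moves_so_far))
    return candidates[0] if candidates else None
```

The appeal is that these entries can be edited between games, patching a joseki line the opponent exploited, without retraining any network.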

There are many other improvements Google can make before March:

  • Google can refine AlphaGo’s neural networks. They used 30 million positions to train the value network for the Fan Hui match — they can use 100 million for the Lee Sedol match.
  • They can add extra training for the rollout policy. And then feed the improved rollout policy into the training of the value network.
  • They can fine-tune the balance between rollouts and the value network.
  • They can throw more computing power at the match itself. And the match will likely have longer time limits, so AlphaGo can calculate more during the opponent’s time.
  • And more. Google has a strong team that wants to win, and they’ll have other ideas up their sleeves.

Google has also expressed confidence, and they chose the opponent and timing for this next match. So I’d expect the March version of AlphaGo to be significantly stronger than the one that played last October. Its reading is going to be even better. Its assessment of the global position will be improved.

The main question is whether those improvements will be enough to remedy or at least balance out the weaknesses seen in the October version. Google made a huge leap forward with AlphaGo, creating a qualitative difference in computer play, not just a quantitative one. It’s hard to tell what another five months of work and neural network training can do.


The AlphaGo from last October was very strong, but probably not strong enough to beat Lee Sedol. With five months of work, I think AlphaGo is going to be a different beast in March, and it will be a very exciting match. If I had to bet? Lee Sedol will lose a game or two but win the match.

Learn Go Now

In March, Google’s AlphaGo is going to play a five-game match against Lee Sedol. For Go, this may become the equivalent of Kasparov’s match against Deep Blue. Tremendously exciting. And if you don’t know the game of Go, you’ll miss out.

So how do you get up to speed so you have a clue what’s going on? Several possibilities:

  • Friend or Go club: If you have a friend who has been trying to tell you about Go for years, now’s the time. Otherwise, check the American Go Association or the European Go Federation for a Go club near you. (In Salt Lake City, we meet every Thursday after 7 pm at the Salt Lake Roasting Co. — everybody is welcome, and we love to teach beginners about the game.)
  • App with tutorial: The SmartGo Player app for iPhone and iPad includes a tutorial that guides you through the (very simple) rules, then provides over 100 interactive problems to apply those rules (much more complex) and practice your tactics. Then as you play against the computer, it automatically adjusts the handicap to keep the game challenging.
  • Books about Go: There are a number of books that will give you a good introduction to the game, e.g. the “Learn to Play Go” series by Janice Kim followed by the “Graded Go Problems for Beginners” series. Here’s a list of beginner books. If you have an iPad or iPhone, use the Go Books app to read and solve problems interactively; if you prefer printed books, click on the Printed Book link in that list.

Enjoy learning about Go — it’s a game worth knowing. There’s a reason it has been played for thousands of years, and it will remain popular even after computers eventually conquer it.

Google’s AlphaGo Beats Professional Go Player

Two weeks ago SmartGo sponsored the Core Intuition podcast, and I encouraged listeners to learn Go before computers conquer the game. Little did I know that Google’s AlphaGo had already beaten a professional player in a five-game match back in October; they were just waiting for their paper to get published.

I had expected this day to be at least five years away, more likely a decade; this is an amazing accomplishment by Google’s DeepMind team. In my previous thoughts on Facebook and Google entering the race, I was right about tree search being essential, wrong about more Go-specific approaches being necessary.

So how did Google do it? They built on Monte Carlo Tree Search (MCTS), refined and extended it, and combined it with neural networks. Their paper is full of interesting details; here’s my initial take on the most important factors contributing to their success:

  • Extending MCTS with position evaluation: At each node where they perform a Monte Carlo simulation (a rollout or playout), they also evaluate the position using a value function. The results from the simulations and the value function are combined; the combination performs better than either alone.
  • Improved rollout policy: They create a deep neural network for the rollout policy using supervised learning, then refine it with reinforcement learning. A neural network implementing this rollout policy takes 3 ms to select an action; they also trained a rollout policy that is 1000 times faster but less precise (based on a linear combination of pattern features instead of a neural net). They use the more accurate one during learning, the faster one during actual play.
  • Using rollouts to learn value function: Position evaluation has not worked well for computer Go in the past, but here they just need a reasonably good guess at how likely a position is to lead to a win. They trained it with 30 million distinct positions (each sampled from a separate game to avoid overfitting), and used their best rollout policy to play to the end and determine the winner. The resulting value function is more accurate than a playout using their fast rollout policy, and while it’s less accurate than rollouts using their best neural-net policy, it takes 15,000 times less computation. (The paper also mentions truncated rollouts; I’m not quite clear on those yet.)
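
The combination in the first bullet can be sketched concretely: the paper mixes the value network’s estimate with the playout outcome using a weight λ (0.5 in the paper). A minimal illustration, with toy stand-ins for the two networks:

```python
def evaluate_leaf(value_net, fast_rollout, position, lam=0.5):
    """Blend the value network's estimate with one playout's outcome,
    as in the paper's leaf evaluation with mixing weight lambda."""
    v = value_net(position)      # estimated win probability in [0, 1]
    z = fast_rollout(position)   # 1 if the playout ends in a win, else 0
    return (1 - lam) * v + lam * z

# Toy stand-ins for the trained networks:
value_net = lambda pos: 0.7
fast_rollout = lambda pos: 1
print(evaluate_leaf(value_net, fast_rollout, None))  # 0.85
```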

The AlphaGo team has also found good solutions for an asynchronous MCTS algorithm, for combining fast MCTS operations with slow neural networks, and for hashing local moves during the playouts. Massive amounts of hardware are helpful too.

The match against Lee Sedol in March is going to be tremendously exciting. There’s still time to learn Go now so you can follow along and share in the excitement.