I saw an interesting here's-how-it-works post about the minimax algorithm in the context of Tic Tac Toe. I wrote a tic-tac-toe player myself a few years ago, for a HackerRank puzzle, and of course back in the Dark Ages when I was doing my MS in Computer Science. But in this particular post, Jason Fox makes an interesting point: a perfect tic-tac-toe strategy is fatalistic.
While testing his player, he noticed that once the bot saw it couldn't win, it didn't seem to care how long it took to lose. He modified his algorithm to prefer longer fights, just to make it seem like it was putting up a fight.
Why would this behavior occur? And why does it bother us? Turns out it's simple: a perfect strategy assumes that its opponent is playing a perfect strategy as well, because that's easier to code. In other words, when scoring the different options, the perfect minimax strategy assigns its values by playing the other side perfectly as well. Mathematically, it doesn't matter whether we lose perfectly earlier or lose perfectly later, does it?
But that glosses over a fundamental truth out here in the world, and it's why it rankles to see this algorithm playing so oddly - when playing actual games, it's not at all uncommon for our opponent to screw up. And the longer we draw out the game, the more likely it is that they will. So intuitively, we want to fight a delaying action, because our own internal algorithms tell us that the longer we're in the game, the better off we are in terms of an eventual surprise win.
To reflect that, the tic-tac-toe minimax algorithm should really be modified to reflect the probability of error on the part of the opponent. Even a small change in that score should serve to weight longer fights a little higher, and that's enough for the algorithm to choose that path.
It would be instructive to write that up as a real article, but man, I've got stuff to do that's higher priority than that.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment