Saturday, August 8, 2020

ReBel is the name, poker is its game




History of AIs in Skill-based Games
May 2015 - four poker players, Bjorn Li, Dong Kim, Doug Polk, and Jason Les, battled the AI Claudico. The 80,000-hand challenge, played over two weeks, ended with three of the four pros winning considerably against the program, though the computer scientists behind it called the result a statistical tie.

2016 - AlphaGo, an AI created by Google DeepMind, beat some of the best Go players in the world. Go is an ancient Chinese board game in which two players compete for territory; like chess, it demands deep strategy.

January 2017 - in the contest "Brains vs. Artificial Intelligence: Upping the Ante," the AI Libratus defeated poker pros Daniel McAulay, Dong Kim, Jason Les, and Jimmy Chou.

2019 - building on earlier bots such as Cepheus and DeepStack, researchers from Carnegie Mellon University (CMU) and Facebook AI developed Pluribus, which defeated elite professionals in six-player (6-Max) No-Limit Hold'em.

ReBel is the name, poker is its game
The bot's name is short for "Recursive Belief-based Learning," an approach to self-play learning under imperfect-information conditions. It was introduced in the paper "Combining Deep Reinforcement Learning and Search for Imperfect-Information Games" by Noam Brown, Anton Bakhtin, Adam Lerer, and Qucheng Gong of the Facebook AI Research team.

ReBel builds on the earlier poker AI DeepStack, the first bot to beat professional players at heads-up No-Limit Hold'em, back in 2017. Like Libratus, ReBel uses self-play to learn how to play heads-up No-Limit Hold'em. The main difference between ReBel and earlier poker AIs is that it operates on so-called public belief states (PBSs).

A PBS is the core of ReBel's self-learning mechanism: instead of analyzing only the current, visible information about the game, the bot also maintains a probability distribution over what each player could privately hold, inferred from the moves the opponents have made so far.

ReBel takes into account not only the visible game state - the bet sizing, the known cards, and the range of hands the opponent may have - but also each player's belief about the state they are in. This is somewhat similar to how a real human might consider whether an opponent thinks they are ahead or behind in a given hand.
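
To make the idea a bit more concrete, here is a minimal Python sketch of what a public belief state might bundle together. The structure and field names are invented for illustration and are not taken from ReBel's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List

# A "hand" is just a pair of hole cards, e.g. ("As", "Kd").
Hand = tuple

@dataclass
class PublicBeliefState:
    """Toy illustration of a public belief state (PBS).

    It pairs the publicly visible game information with, for each player,
    a probability distribution over the private hands that player could hold,
    inferred from the actions observed so far.
    """
    board_cards: List[str]            # community cards everyone can see
    pot_size: float                   # chips currently in the pot
    bet_history: List[str]            # public action sequence, e.g. ["call", "raise 3bb"]
    beliefs: List[Dict[Hand, float]]  # one hand distribution per player

    def normalize(self) -> None:
        """Keep each player's hand distribution summing to 1."""
        for dist in self.beliefs:
            total = sum(dist.values())
            if total > 0:
                for hand in dist:
                    dist[hand] /= total
```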

To make this possible, ReBel trains two AI models via self-play reinforcement learning: a value network and a policy network. Both operate on PBSs.
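
The post does not go into architectural detail, so the following is only a rough Python/PyTorch sketch of the division of labor, with made-up layer sizes, an assumed fixed-size PBS encoding, and a hypothetical action set: the value network estimates how good a PBS is for each player, while the policy network turns it into a distribution over actions.

```python
import torch
import torch.nn as nn

PBS_DIM = 512      # assumed size of the encoded public belief state
NUM_ACTIONS = 5    # assumed action set, e.g. fold / call / three bet sizes
NUM_PLAYERS = 2    # heads-up game

class ValueNet(nn.Module):
    """Estimates, from a PBS encoding, the expected value for each player."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, NUM_PLAYERS),
        )

    def forward(self, pbs: torch.Tensor) -> torch.Tensor:
        return self.net(pbs)

class PolicyNet(nn.Module):
    """Maps a PBS encoding to a probability distribution over legal actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, NUM_ACTIONS),
        )

    def forward(self, pbs: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(pbs), dim=-1)

# During self-play, each encoded PBS would be fed to both networks.
pbs_vector = torch.randn(1, PBS_DIM)
values = ValueNet()(pbs_vector)         # shape (1, 2): one value per player
action_probs = PolicyNet()(pbs_vector)  # shape (1, 5): probabilities summing to 1
```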

Simply put, ReBel analyzes not only the hand itself but also how the opponent evaluates it, much as successful human players do.
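
One simple way to picture how the bot's beliefs about an opponent get refined is a toy Bayesian update (a sketch for intuition, not ReBel's actual algorithm): hands that would rarely take the observed action lose probability mass.

```python
from typing import Dict

Hand = tuple

def update_beliefs(beliefs: Dict[Hand, float],
                   action_prob_given_hand: Dict[Hand, float]) -> Dict[Hand, float]:
    """Bayes rule: P(hand | action) is proportional to P(action | hand) * P(hand).

    `beliefs` is the prior over the opponent's possible hands;
    `action_prob_given_hand` gives, for each hand, how likely the opponent's
    strategy is to take the action we just observed while holding that hand.
    """
    posterior = {h: beliefs[h] * action_prob_given_hand.get(h, 0.0) for h in beliefs}
    total = sum(posterior.values())
    if total == 0:
        return beliefs  # the observed action was "impossible" under the model; keep the prior
    return {h: p / total for h, p in posterior.items()}

# Example: a big raise is far more likely with aces than with a weak hand,
# so after seeing a raise the belief shifts heavily toward aces.
prior = {("As", "Ad"): 0.5, ("7h", "2c"): 0.5}
likelihood_of_raise = {("As", "Ad"): 0.9, ("7h", "2c"): 0.1}
print(update_beliefs(prior, likelihood_of_raise))  # {('As','Ad'): 0.9, ('7h','2c'): 0.1}
```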


ReBel's Results Revealed
The Facebook team ran experiments in which ReBel played a two-player version of Hold'em, turn endgame Hold'em (a simplified variant with no raises on the first two betting rounds), and Liar's Dice.

ReBel is also noticeably faster than its predecessors: it spends at least two seconds less per decision than Libratus and, in general, needs no more than five seconds to decide and make a move.

So far the only poker player to have faced ReBel is Dong Kim, who was also one of the players beaten by Libratus. Over 7,500 hands, the bot outperformed him by 0.165 BB per hand, compared with Libratus's 0.147 BB.

