Machine Learning with Chess Part 01 — Parsing PGN Chess Games

Rahulkant
Jul 11, 2022

Chess games are commonly recorded in plain-text files with the extension .pgn. Portable Game Notation (PGN) is a standard format that stores the moves of a game together with some metadata about it, such as the players, the event and the result. Because the format has a well-defined grammar, a program can parse it and extract data from it.

I want to analyse hundreds of thousands of games, so the first step is to gather the plain-text files and parse them according to the PGN grammar.

I am going to work with the Lichess Elite Database, which contains standard games from lichess played by players rated 2400+ against opponents rated 2200+, excluding bullet games. I am using the May 2022 file, which has 355,535 games.

We will use the chess library in Python to extract data from these PGN files. As an example, here is a 2021 game between Viswanathan Anand and Jorden Van Foreest:

[Event "Grand Chess Tour Croatia Rapid & Blitz 2021"]
[White "Anand, Viswanathan"]
[Black "Van Foreest, Jorden"]
[WhiteFideId "5000017"]
[BlackFideId "1039784"]
[WhiteElo "2751"]
[BlackElo "2543"]
[Result "1-0"]
[Round "01"]
[TimeControl "900+5"]
[Date "2021.07.07"]
[WhiteClock "0:03:19"]
[BlackClock "0:00:24"]

1. e4 c5 2. Nf3 e6 3. d4 cxd4 4. Nxd4 Nc6 5. Nc3 Qc7 6. g4 a6 7. Be3 Nxd4 8. Qxd4 b5 9. O-O-O Bb7 10. Kb1 Nf6 11. f3 Rc8 12. g5 Nh5 13. Qd2 Be7 14. Bh3 b4 15. Ne2 d5 16. Bg4 g6 17. Bd4 O-O 18. e5 Ng7 19. h4 a5 20. h5 Ba6 21. Ng3 a4 22. f4 b3 23. cxb3 axb3 24. axb3 Nf5 25. Bxf5 gxf5 26. g6 fxg6 27. hxg6 hxg6 28. Qh2 Kf7 29. Nxf5 exf5 30. Qh7+ Ke6 31. Qxg6+ Bf6 32. Bc3 Qg7 33. Rh6 Qxg6 34. Rxg6 Be2 35. Rd2 Bf3 36. b4 Rc4 37. Rd4 Rc7 38. Kc1 Rg7 39. Rxf6+ Rxf6 40. exf6 Rb7 41. Kd2 Be4 42. Ke3 Rh7 43. b5 Rh3+ 44. Kd2 Rh2+ 45. Kc1 Rh1+ 46. Rd1 Rh2 47. Rg1 d4 48. Bxd4 Rc2+ 49. Kd1 Rc4 50. Be5 Rc5 51. Rg7 Rd5+ 52. Ke1 Rd8 53. Re7+ Kd5 54. f7 1-0

The metadata appears as tag pairs in square brackets: a tag name followed by a quoted string value. After the tag pairs comes the movetext, where each move number is followed by White's move and then Black's move, and the game result closes the record.
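To make that structure concrete, here is a minimal sketch that splits a game into tag pairs and moves using only the standard library. The function name `split_pgn` and the regular expressions are my own illustration, not part of any chess library, and the sketch assumes a simple game with straight double quotes and no comments, variations or annotation symbols. The sample is a shortened excerpt of the game above.

```python
import re

# One tag pair per line: [Name "Value"]. Assumes straight double quotes.
TAG_PAIR = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')

# Tokens that end a game in the movetext.
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def split_pgn(text):
    """Split one simple PGN game into (tags dict, list of SAN moves)."""
    tags = {}
    movetext_lines = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = TAG_PAIR.match(line)
        if match:
            tags[match.group(1)] = match.group(2)
        elif line:
            movetext_lines.append(line)

    tokens = " ".join(movetext_lines).split()
    # Drop move numbers such as "1." or "12..." and the final result token.
    moves = [
        token
        for token in tokens
        if not re.fullmatch(r"\d+\.+", token) and token not in RESULTS
    ]
    return tags, moves


sample = """[White "Anand, Viswanathan"]
[Black "Van Foreest, Jorden"]
[Result "1-0"]

1. e4 c5 2. Nf3 e6 3. d4 cxd4 1-0"""

tags, moves = split_pgn(sample)
print(tags["White"], "vs", tags["Black"])
print(moves)
```

This only illustrates the grammar. It does not check that the moves are legal, which is one reason to use a real chess library for the actual parsing.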

Now that we have an understanding of the PGN grammar, let's parse a single chess game.
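A minimal sketch of this step, assuming the python-chess package is installed (`pip install chess`). It reads a shortened excerpt of the game above from a string; with a real file you would pass the open file handle to `chess.pgn.read_game` instead.

```python
import io

import chess.pgn

# Shortened excerpt of the Anand vs Van Foreest game, for illustration only.
pgn_text = """[Event "Grand Chess Tour Croatia Rapid & Blitz 2021"]
[White "Anand, Viswanathan"]
[Black "Van Foreest, Jorden"]
[WhiteElo "2751"]
[BlackElo "2543"]
[Result "1-0"]

1. e4 c5 2. Nf3 e6 3. d4 cxd4 4. Nxd4 Nc6 5. Nc3 Qc7 1-0"""

# read_game expects a text handle, so wrap the string in StringIO.
game = chess.pgn.read_game(io.StringIO(pgn_text))

# Tag pairs are exposed as a mapping of header name to string value.
print(game.headers["White"], game.headers["WhiteElo"])
print(game.headers["Black"], game.headers["BlackElo"])

# Replay the mainline on a board to turn each move back into SAN.
board = game.board()
san_moves = []
for move in game.mainline_moves():
    san_moves.append(board.san(move))
    board.push(move)
print(san_moves)
```

Header values are always strings, so ratings such as `WhiteElo` need an explicit `int(...)` conversion before any numeric work.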

To build a machine learning model, we need to extract the useful data and store it in a .csv file. Here, I am only extracting the White and Black players, their Elo ratings, and the mainline moves of each game.
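The extraction could look roughly like the sketch below, again assuming python-chess is installed. The column names and the helpers `game_to_row` and `pgn_to_csv` are hypothetical names chosen for this example, and the second game in the sample is a made-up placeholder. The demo runs on in-memory text so that it is self-contained.

```python
import csv
import io

import chess.pgn

# Hypothetical column names for the output file.
FIELDS = ["white", "black", "white_elo", "black_elo", "moves"]


def game_to_row(game):
    """Turn one parsed game into a dict matching FIELDS."""
    board = game.board()
    san_moves = []
    for move in game.mainline_moves():
        san_moves.append(board.san(move))
        board.push(move)
    headers = game.headers
    return {
        "white": headers.get("White", ""),
        "black": headers.get("Black", ""),
        # Ratings can be missing, so fall back to an empty string.
        "white_elo": headers.get("WhiteElo", ""),
        "black_elo": headers.get("BlackElo", ""),
        "moves": " ".join(san_moves),
    }


def pgn_to_csv(pgn_handle, csv_handle):
    """Read every game from pgn_handle, write one CSV row each, return the count."""
    writer = csv.DictWriter(csv_handle, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    while True:
        game = chess.pgn.read_game(pgn_handle)
        if game is None:  # read_game returns None at end of input
            break
        writer.writerow(game_to_row(game))
        count += 1
    return count


# In-memory demo; the second game is a placeholder without ratings.
sample_pgn = io.StringIO("""[White "Anand, Viswanathan"]
[Black "Van Foreest, Jorden"]
[WhiteElo "2751"]
[BlackElo "2543"]
[Result "1-0"]

1. e4 c5 2. Nf3 e6 1-0

[White "Player A"]
[Black "Player B"]
[Result "*"]

1. d4 d5 *
""")
csv_output = io.StringIO()
games_written = pgn_to_csv(sample_pgn, csv_output)
print(games_written)
print(csv_output.getvalue())
```

For the real database, the same function can be given two file handles, with the PGN file opened for reading and the .csv file opened for writing with `newline=""`, as the csv module recommends. Reading one game at a time keeps memory use low even with hundreds of thousands of games.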

Great! The .csv file is created, and now we can build a machine learning model on top of it. I have a couple of chess projects in mind.

Thank you!