[refactor] Code, build flags and documentation.
+88
-40
@@ -1,69 +1,117 @@
# 🕹️ Connect 4 AI: How the Brain Works

To create a competitive Connect 4 experience on a small microcontroller, the game uses a mix of mathematical strategy and "shortcuts" to play like a human master.

---

# Connect 4 AI: How the Computer Thinks

## 1. The Virtual Board

The computer sees the board as a grid of numbers. It uses a **7-column by 6-row** map where:

The computer doesn't see colored discs on a grid. It sees a table of numbers:

- **0** = Empty space
- **1** = Yellow (Human Player)
- **2** = Red (Computer AI)
- **1** = Yellow disc
- **2** = Red disc

Every time a disc is dropped, a "Scan" function checks every possible direction (horizontal, vertical, and diagonal) to see if anyone has reached four in a row.

The board has 7 columns and 6 rows. After every move, a scan function checks all directions (horizontal, vertical, and both diagonals) to see if anyone has four in a row.
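
The board and the scan can be sketched in a few lines of Python. This is only an illustration: the names `new_board`, `drop_disc`, and `has_four` are hypothetical, row 0 is assumed to be the bottom row, and the firmware's real data layout is not shown in this document.

```python
ROWS, COLS = 6, 7                 # 7 columns by 6 rows, as described above
EMPTY, YELLOW, RED = 0, 1, 2      # the three cell values

def new_board():
    """Return an empty board; row 0 is the bottom row (an assumption)."""
    return [[EMPTY] * COLS for _ in range(ROWS)]

def drop_disc(board, col, player):
    """Place a disc in the lowest free cell of `col`; return its row, or -1 if full."""
    for row in range(ROWS):
        if board[row][col] == EMPTY:
            board[row][col] = player
            return row
    return -1

def has_four(board, player):
    """Scan horizontal, vertical, and both diagonal directions for four in a row."""
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # (row step, column step)
    for row in range(ROWS):
        for col in range(COLS):
            for d_row, d_col in directions:
                end_row, end_col = row + 3 * d_row, col + 3 * d_col
                if not (0 <= end_row < ROWS and 0 <= end_col < COLS):
                    continue  # this line of four would leave the board
                if all(board[row + i * d_row][col + i * d_col] == player
                       for i in range(4)):
                    return True
    return False
```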

---

## 2. What is a "Ply"?

## 2. Thinking Ahead (The "What If?" Engine)

A **ply** is one move by one player. If the AI is set to ply 6, it looks 6 individual moves into the future. Since players alternate turns, ply 6 means the AI considers 3 of its own moves and 3 of the opponent's moves.

The AI doesn't just look at the current board; it plays out thousands of "What if?" scenarios in its head.

More plies = stronger play, but takes longer to calculate. On the ESP32-C3, ply 4 is nearly instant, ply 6 takes about a second, and ply 8-10 can take several seconds. The AI shows a pulsing light while it is thinking.

### The Minimax Strategy

## 3. The Minimax Strategy

The AI uses a strategy called **Minimax**. It assumes that you will play your absolute best move, and it tries to find the move that leaves you with the worst possible outcome. It "maximizes" its own advantage while "minimizing" yours.

### The basic idea

### Alpha-Beta Pruning (The Shortcut)

Imagine you are playing Connect 4 against a friend. Before you drop your disc, you think: "If I put my disc here, what will my friend do? And then what would I do after that?"

Calculating every possible move in Connect 4 would take hours. To make the AI fast, it uses **Pruning**. If the AI starts calculating a move and realizes it’s definitely worse than a move it already found, it "prunes" (deletes) that entire branch of thought and moves on. This allows it to ignore up to 90% of useless moves.

That is exactly what the computer does, except it checks **every** possible move, not just a few.

---

### Two players, two goals

## 3. Scoring System (The AI’s Instincts)

The AI calls the two players **Max** (itself) and **Min** (you):

Since the computer can't always see to the very end of the game, it uses a scoring system to guess which positions are strongest:

- **Max** wants the highest possible score (the AI winning).
- **Min** wants the lowest possible score (you winning).

- **Speed Matters:** The AI is rewarded more for a win that happens soon than a win that takes a long time. This gives it a "killer instinct" to end the game as quickly as possible.
- **The Center is King:** The AI is programmed to prefer the middle column. Mathematically, the center column is involved in the most possible winning combinations, so the AI fights to control it early in the game.

The AI assumes you will always make your best move. It doesn't hope you'll make a mistake.

---

### A simple example

## 4. Being Responsive (Interrupt Handling)

Imagine there are only 3 columns left and the AI can look 2 moves ahead. It builds a tree like this:

Your game runs on an **ESP32-C3**, which is a single-tasking processor. Normally, if the AI spends 2 seconds thinking, the buttons would feel "broken" or "frozen" until it finishes.

```
  AI's turn (Max - pick the highest)
        /          |          \
    col 2        col 3        col 4
    /   \        /   \        /   \
   Your turn (Min - pick the lowest)
  ...   ...    ...   ...    ...   ...
  +5    -3     +2    +8     -1    +4
```

We solve this with two clever tricks:

1. After column 2: you would pick the move scoring -3 (lowest = best for you).
2. After column 3: you would pick the move scoring +2.
3. After column 4: you would pick the move scoring -1.

1. **Mid-Thought Checks:** Every few milliseconds of calculation, the AI "pauses" for a microsecond to see if you have pressed the button.
2. **Instant Exit:** If it detects you pressed the button while it was thinking, it abandons all calculations immediately and jumps back to the main menu.

The AI compares -3, +2, and -1, and picks column 3 because +2 is the best it can guarantee.
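
The same comparison, written out as a small Python sketch. The dictionary `tree` and the function `best_column` are hypothetical names used only for this illustration; the leaf scores are the six values from the example tree.

```python
# For each column the AI could play, the scores of your two possible replies.
tree = {2: [+5, -3], 3: [+2, +8], 4: [-1, +4]}

def best_column(tree):
    """Max picks the column whose worst case (Min's best reply) is highest."""
    guaranteed = {col: min(replies) for col, replies in tree.items()}
    best = max(guaranteed, key=guaranteed.get)
    return best, guaranteed

column, guaranteed = best_column(tree)
print(guaranteed)  # {2: -3, 3: 2, 4: -1}
print(column)      # 3
```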

---

### Scoring

## 5. Summary of an AI Move

The AI assigns scores to board positions:

When it is the computer's turn, it follows these steps in a split second:

- **+1000 or more:** The AI wins. A faster win gets a higher score, so the AI goes for the quickest victory.
- **-1000 or less:** The opponent wins. A faster loss gets a more negative score, so the AI fights hardest against immediate threats.
- **0:** Nobody has won and the search depth ran out. The position is neutral.

1. **Check for Lethal:** Can I win right now? If yes, take it.
2. **Check for Danger:** Can the human win on their next move? If yes, block it.
3. **Search:** Look at the middle columns first, then the edges.
4. **Prune:** Throw away bad moves immediately to save time.
5. **Act:** Choose the move that leads to the quickest victory.

This scoring is why the AI has "killer instinct" - it doesn't just try to win, it tries to win as fast as possible.
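
One way such a score could be computed is sketched below in Python. The base value of 1000 comes from the list above; using the unused search depth as the speed bonus is an assumption, and `score_position` is a hypothetical name.

```python
WIN_SCORE = 1000  # base value from the scoring list

def score_position(winner, depth_left):
    """Score a position at the end of a search branch.

    winner: "ai", "opponent", or None if nobody has won.
    depth_left: plies of search depth still unused (assumed speed bonus).
    """
    if winner == "ai":
        return WIN_SCORE + depth_left       # sooner win -> more depth left -> higher
    if winner == "opponent":
        return -(WIN_SCORE + depth_left)    # sooner loss -> more negative
    return 0                                # depth ran out with no winner: neutral
```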

---

## 4. Alpha-Beta Pruning: The Smart Shortcut

## 📚 References & Further Reading

### The problem

- [The Mathematical Solution to Connect 4](https://en.wikipedia.org/wiki/Connect_Four#Mathematical_solution)
- [How the Minimax Algorithm Works](https://en.wikipedia.org/wiki/Minimax)
- [Understanding Alpha-Beta Pruning](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning)

Looking ahead 8 plies in Connect 4 means exploring millions of board positions. Even a fast microcontroller can't check them all in a reasonable time.
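
The "millions" figure follows from simple arithmetic: with at most 7 legal columns at every ply, a search of depth d reaches at most 7^d positions at the bottom of the tree. A short Python check (the function name is hypothetical, and full columns or early wins make the real count smaller):

```python
COLS = 7

def max_leaf_positions(plies):
    """Upper bound on leaf positions: at most 7 legal columns at every ply."""
    return COLS ** plies

for plies in (4, 6, 8):
    print(plies, max_leaf_positions(plies))
# 4 2401
# 6 117649
# 8 5764801
```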

### The solution

**Alpha-Beta pruning** is a way to skip branches of the tree that can't possibly change the final decision.

Think of it like shopping for a birthday present. You visit Shop A and find a nice toy for 10 euros. Then you go to Shop B. The first item you see costs 15 euros, and you notice everything else in Shop B is even more expensive. You don't need to check every item in Shop B - you already know Shop A is better. You leave Shop B and save time.

The AI does the same thing:

- **Alpha** is the best score the AI (Max) has found so far. Think of it as "I already know I can do at least this well."
- **Beta** is the best score the opponent (Min) has found so far. Think of it as "The opponent already knows they can limit me to at most this."

When the AI is exploring a branch and discovers that the score can never beat what it already has (beta <= alpha), it **prunes** (cuts off) that entire branch. It skips all remaining moves in that branch because they can't change the outcome.
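
A minimal Python sketch of the pruning rule, run on a small hand-made tree rather than a real board. The nested-list tree format, the `visited` list, and the function name are assumptions made for this illustration.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax with alpha-beta pruning over a nested-list tree.

    A node is either a number (a leaf score) or a list of child nodes.
    `visited` collects the leaves that were actually evaluated.
    """
    if visited is None:
        visited = []
    if not isinstance(node, list):
        visited.append(node)
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta, visited)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if beta <= alpha:
            break  # prune: the remaining children cannot change the decision
    return best

tree = [[+5, -3], [+2, +8], [-1, +4]]   # three AI moves, two replies each
visited = []
print(alphabeta(tree, True, visited=visited))  # 2
print(visited)                                 # [5, -3, 2, 8, -1]
```

In this run the last leaf (+4) is never evaluated: once the third branch shows a reply worth -1, it can no longer beat the +2 already guaranteed by the second branch.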

### How much does it help?

In practice, pruning lets the AI skip 50-90% of the positions it would otherwise need to check. This is why the column order matters: the AI checks the center column first (column 3, counting from 0), then works outward. Good moves tend to be near the center, so checking them first leads to better pruning and faster search.
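
A center-first column order can be generated as in the Python sketch below. Trying the left neighbor before the right one at equal distance is an assumption, as is the function name.

```python
COLS = 7

def center_first_order(cols=COLS):
    """Return column indices sorted by distance from the center column."""
    center = cols // 2
    return sorted(range(cols), key=lambda col: (abs(col - center), col))

print(center_first_order())  # [3, 2, 4, 1, 5, 0, 6]
```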

## 5. The Three-Phase Move Strategy

Before running the expensive minimax search, the AI tries two quick shortcuts. Only if neither applies does it fall through to the third phase, the full search:

1. **Can I win right now?** The AI tries placing its disc in each column. If any column completes four in a row, it takes that move immediately. No need to think further.

2. **Can my opponent win next turn?** The AI checks if the opponent could win by playing in any column. If so, it blocks that column. Missing this would be a fatal mistake.

3. **Deep search.** Only if there are no immediate wins or threats does the AI run the full minimax search with alpha-beta pruning.

This three-phase approach makes the AI both fast (instant reactions to obvious moves) and smart (deep strategic thinking when needed).
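
The three phases can be sketched in Python as follows. All names here are hypothetical, row 0 is assumed to be the bottom row, and `deep_search` is a placeholder callable standing in for the minimax search with alpha-beta pruning.

```python
ROWS, COLS = 6, 7
EMPTY, HUMAN, AI = 0, 1, 2

def drop(board, col, player):
    """Return a copy of the board with a disc dropped in `col`, or None if it is full."""
    for row in range(ROWS):
        if board[row][col] == EMPTY:
            copy = [r[:] for r in board]
            copy[row][col] = player
            return copy
    return None

def has_four(board, player):
    """Check every start cell in four directions for a line of four."""
    for row in range(ROWS):
        for col in range(COLS):
            for d_row, d_col in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(row + i * d_row, col + i * d_col) for i in range(4)]
                if all(0 <= r < ROWS and 0 <= c < COLS and board[r][c] == player
                       for r, c in cells):
                    return True
    return False

def winning_column(board, player):
    """Return a column where `player` wins immediately, or None."""
    for col in range(COLS):
        after = drop(board, col, player)
        if after is not None and has_four(after, player):
            return col
    return None

def choose_move(board, deep_search):
    """Phase 1: win now. Phase 2: block. Phase 3: fall back to the deep search."""
    col = winning_column(board, AI)
    if col is not None:
        return col
    col = winning_column(board, HUMAN)
    if col is not None:
        return col
    return deep_search(board)
```

Checking for the AI's own win before checking for a block matters: if both sides have a winning move available, taking the win ends the game before the opponent's threat can be played.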

## 6. Demo Mode: Asymmetric Skill

In demo mode, two AI players play against each other. To make the games interesting (rather than always ending in a draw), each player is randomly assigned a different search depth. One player might look 5 plies ahead while the other looks only 3. The stronger player can find winning setups that the weaker one misses, leading to exciting games with real winners. Who gets the advantage is randomized each game.
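
A possible way to hand out the two depths, sketched in Python. The depths 5 and 3 are only the example values from the paragraph above, and `assign_demo_depths` is a hypothetical name.

```python
import random

def assign_demo_depths(strong=5, weak=3, rng=random):
    """Randomly decide which demo player gets the deeper search.

    Returns (depth_player_1, depth_player_2).
    """
    depths = [strong, weak]
    rng.shuffle(depths)   # either order is equally likely
    return tuple(depths)
```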

## 7. Responsive Controls

The ESP32-C3 is a single-core processor. When the AI is thinking, it could block all input for several seconds. Two techniques keep the game responsive:

1. **Mid-search button checks:** During the minimax search, the AI periodically checks whether the player has pressed the button. If so, it immediately abandons the search.

2. **Abort flag:** A global flag (`abortAi`) propagates through all levels of the recursive search. Once set, every level of the search returns immediately, unwinding the entire calculation in microseconds.
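
The two techniques can be illustrated with a small Python sketch over a nested-list tree. This is not the firmware's code: the `SearchState` class, the `button_pressed` callable, and polling at every node (rather than periodically) are simplifying assumptions, and the flag is stored on an object instead of in a global like `abortAi`.

```python
class SearchState:
    """Holds the abort flag shared by every level of the recursion."""
    def __init__(self, button_pressed):
        self.abort = False
        self.button_pressed = button_pressed  # hypothetical callable returning True/False

def search(node, maximizing, state):
    """Plain minimax that returns None as soon as the abort flag is set."""
    if state.abort:
        return None                   # already aborting: unwind immediately
    if state.button_pressed():
        state.abort = True            # mid-search button check
        return None
    if not isinstance(node, list):
        return node                   # leaf score
    best = None
    for child in node:
        value = search(child, not maximizing, state)
        if state.abort:
            return None               # discard partial results while unwinding
        if best is None or (value > best if maximizing else value < best):
            best = value
    return best
```

A caller would treat a `None` result as "search cancelled" and return to handling the button press instead of playing a move.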

## Further Reading

- [Connect Four - Mathematical Solution (Wikipedia)](https://en.wikipedia.org/wiki/Connect_Four#Mathematical_solution)
- [Minimax Algorithm (Wikipedia)](https://en.wikipedia.org/wiki/Minimax)
- [Alpha-Beta Pruning (Wikipedia)](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning)