Connect 4 AI: How the Computer Thinks
1. The Virtual Board
The computer doesn't see colored discs on a grid. It sees a table of numbers:
- 0 = Empty space
- 1 = Yellow disc
- 2 = Red disc
The board has 7 columns and 6 rows. After every move, a scan function checks all directions (horizontal, vertical, and both diagonals) to see if anyone has four in a row.
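As a rough illustration of the number table and the four-direction scan, here is a minimal Python sketch. The names `new_board`, `drop`, and `has_four` are made up for this example, and storing row 0 as the bottom row is an assumption; the actual firmware may lay the table out differently.

```python
ROWS, COLS = 6, 7
EMPTY, YELLOW, RED = 0, 1, 2

def new_board():
    """Return an empty board. Assumption: row 0 is the bottom row."""
    return [[EMPTY] * COLS for _ in range(ROWS)]

def drop(board, col, player):
    """Place a disc in the lowest empty cell of `col`; return its row, or -1 if full."""
    for row in range(ROWS):
        if board[row][col] == EMPTY:
            board[row][col] = player
            return row
    return -1

def has_four(board, player):
    """Scan right, up, and along both diagonals from every cell."""
    for row in range(ROWS):
        for col in range(COLS):
            for d_row, d_col in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_row, end_col = row + 3 * d_row, col + 3 * d_col
                if not (0 <= end_row < ROWS and 0 <= end_col < COLS):
                    continue  # this group of four would leave the board
                if all(board[row + i * d_row][col + i * d_col] == player
                       for i in range(4)):
                    return True
    return False
```

Calling `has_four` for the player who just moved, after every `drop`, matches the "scan after every move" idea described above.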
2. What is a "Ply"?
A ply is one move by one player. If the AI is set to ply 6, it looks 6 individual moves into the future. Since players alternate turns, ply 6 means the AI considers 3 of its own moves and 3 of the opponent's moves.
More plies = stronger play, but takes longer to calculate. On the ESP32-C3, ply 4 is nearly instant, ply 6 takes about a second, and ply 8-10 can take several seconds. The AI shows a pulsing light while it is thinking.
3. The Minimax Strategy
The basic idea
Imagine you are playing Connect 4 against a friend. Before you drop your disc, you think: "If I put my disc here, what will my friend do? And then what would I do after that?"
That is exactly what the computer does, except it checks every possible move, not just a few.
Two players, two goals
The AI calls the two players Max (itself) and Min (you):
- Max wants the highest possible score (the AI winning).
- Min wants the lowest possible score (you winning).
The AI assumes you will always make your best move. It doesn't hope you'll make a mistake.
A simple example
Imagine there are only 3 columns left and the AI can look 2 moves ahead. It builds a tree like this:
        AI's turn (Max - pick the highest)
              /          |          \
          col 2        col 3        col 4
          /   \        /   \        /   \
        Your turn (Min - pick the lowest)
        ...  ...     ...  ...     ...  ...
         +5   -3      +2   +8      -1   +4
- After column 2: you would pick the move scoring -3 (lowest = best for you).
- After column 3: you would pick the move scoring +2.
- After column 4: you would pick the move scoring -1.
The AI compares -3, +2, and -1, and picks column 3 because +2 is the best it can guarantee.
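The same example can be worked through in a few lines of Python. This is only a sketch of plain minimax on the tiny tree above, not the game's actual search code; the `tree` dictionary and the function name are made up for illustration.

```python
def minimax(node, maximizing):
    """Return the score of `node`; leaves are plain integers."""
    if isinstance(node, int):
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Each AI column leads to the two replies shown in the example tree.
tree = {2: [5, -3], 3: [2, 8], 4: [-1, 4]}

# After the AI moves it is your turn, so each subtree is a Min node.
guaranteed = {col: minimax(replies, maximizing=False) for col, replies in tree.items()}
best_col = max(guaranteed, key=guaranteed.get)

print(guaranteed)  # {2: -3, 3: 2, 4: -1}
print(best_col)    # 3
```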
Scoring: How the AI Rates a Board
After playing out a "what if?" scenario, the AI needs to decide: is this a good result or a bad one? It uses a layered scoring system:
- +1000 or more: "I win!" The AI found a way to get four in a row. The bonus points above 1000 depend on how quickly it can win. Winning in 2 moves scores higher than winning in 6 moves. This is why the AI always goes for the fastest victory: it never wastes time when it can finish the game.
- -1000 or less: "I lose!" The opponent gets four in a row. Losing sooner gets an even worse score. This makes the AI fight hardest against moves that threaten an immediate loss.
- Heuristic score: "I don't know yet, but I can tell how good this looks." When the AI has looked as far ahead as it can (it ran out of plies) and nobody has won, it evaluates the position using a heuristic, a quick estimate of who is in a stronger position.
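One common way to get "faster wins score higher" is to add the unused search depth to the base score. The sketch below assumes that approach; the exact bonus formula used by the game is not stated here, and `terminal_score` and `plies_left` are illustrative names.

```python
WIN_SCORE = 1000

def terminal_score(ai_won, plies_left):
    """Score a finished game.

    Assumption: the bonus equals the search depth that was still unused,
    so a game that ends sooner (more plies left) gets a more extreme score.
    """
    bonus = plies_left
    return WIN_SCORE + bonus if ai_won else -(WIN_SCORE + bonus)

print(terminal_score(True, 4), terminal_score(True, 0))    # 1004 1000
print(terminal_score(False, 4), terminal_score(False, 0))  # -1004 -1000
```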
The Heuristic: Reading the Board
Instead of calling every unsolved position "neutral," the AI examines every possible group of four consecutive cells on the board (horizontal, vertical, and both diagonals — 69 groups in total). For each group, it counts pieces:
- 3 AI pieces + 1 empty (playable): The empty cell can be filled right now (it's on the bottom row or has a piece below it). This is an immediate threat. Score: +100.
- 3 AI pieces + 1 empty (not yet playable): The empty cell is floating in the air — the threat exists but can't be used yet. Score: +40.
- 2 AI pieces + 2 empty: A promising setup that could develop into a threat. Score: +5.
- 3 opponent pieces + 1 empty (playable): An immediate danger. Score: -100.
- 3 opponent pieces + 1 empty (not yet playable): A future danger. Score: -40.
- 2 opponent pieces + 2 empty: The opponent is building something. Score: -5.
- Mixed groups (both players have pieces in the same group): Blocked — nobody can win here. Score: 0.
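The group rules above could be sketched as follows. This is an illustration under assumptions: row 0 is taken to be the bottom row, and `all_windows`, `is_playable`, and `score_window` are hypothetical names, not the game's own functions. Enumerating the groups also confirms the count of 69.

```python
ROWS, COLS = 6, 7
EMPTY = 0

def all_windows():
    """List every group of four consecutive cells that fits on the board."""
    windows = []
    for row in range(ROWS):
        for col in range(COLS):
            for d_row, d_col in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(row + i * d_row, col + i * d_col) for i in range(4)]
                if all(0 <= r < ROWS and 0 <= c < COLS for r, c in cells):
                    windows.append(cells)
    return windows

def is_playable(board, row, col):
    """An empty cell is playable if it rests on the bottom row or on a disc."""
    return row == 0 or board[row - 1][col] != EMPTY

def score_window(board, cells, ai, opponent):
    """Apply the +100 / +40 / +5 rules (negated for the opponent) to one group."""
    values = [board[r][c] for r, c in cells]
    ai_count, opp_count = values.count(ai), values.count(opponent)
    if ai_count and opp_count:
        return 0  # mixed group: blocked for both players
    count, sign = (ai_count, 1) if ai_count else (opp_count, -1)
    if count == 3:
        row, col = cells[values.index(EMPTY)]
        return sign * (100 if is_playable(board, row, col) else 40)
    if count == 2:
        return sign * 5
    return 0  # 0 or 1 pieces; a full group of 4 is a win and is scored elsewhere

print(len(all_windows()))  # 69
```

Summing `score_window` over all 69 groups gives the group part of the heuristic.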
On top of that, the AI uses two more scoring bonuses:
- Center column control: +3 per AI piece in the center column, -3 per opponent piece. The center column is involved in more winning lines than any other column, so controlling it is valuable.
- Fork detection: If a player has two or more three-in-a-row threats at the same time, that's a fork — the opponent can only block one per turn, so the other wins the game. The AI adds a large bonus (+200 or -200) when it detects a fork, making it aggressively pursue fork setups and desperately avoid letting the opponent create one.
All these scores add up. The maximum possible heuristic score is well below 1000, so it never interferes with actual win/loss detection — a guaranteed win always beats the best heuristic position.
This heuristic means the AI can now tell the difference between a strong position (many threats being built, especially playable ones) and a weak one (the opponent has all the threats), even when it can't see a forced win or loss within its search depth.
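The two extra bonuses might look like this in code. It is a sketch only: how the game counts "threats" for fork detection is not spelled out here, so `fork_bonus` simply takes the number of three-in-a-row threats per player as input, and both function names are hypothetical.

```python
def center_bonus(board, ai, opponent, center_col=3):
    """+3 per AI disc and -3 per opponent disc in the center column."""
    column = [row[center_col] for row in board]
    return 3 * (column.count(ai) - column.count(opponent))

def fork_bonus(ai_threats, opponent_threats):
    """+200 if the AI has two or more threats, -200 if the opponent does.

    Assumption: the caller has already counted each player's threats.
    """
    bonus = 0
    if ai_threats >= 2:
        bonus += 200
    if opponent_threats >= 2:
        bonus -= 200
    return bonus
```

Both values are simply added to the sum of the group scores.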
Why the center column matters
The AI always checks the center column first (column 3), then works outward (2, 4, 1, 5, 0, 6). The center column is involved in more possible winning lines than the edges, so checking it first helps the AI find good moves faster and skip bad ones sooner (thanks to alpha-beta pruning). The heuristic also gives a small bonus for center control, reinforcing this natural advantage.
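The center-out order does not have to be hard-coded. A small sketch (the function name is made up) that derives it by sorting columns by their distance from the center:

```python
def center_out_order(columns=7):
    """Order columns by distance from the center column.

    Python's sort is stable, so for equal distances the left column
    comes before the right one.
    """
    center = columns // 2
    return sorted(range(columns), key=lambda col: abs(col - center))

print(center_out_order())  # [3, 2, 4, 1, 5, 0, 6]
```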
4. Alpha-Beta Pruning: The Smart Shortcut
The problem
Looking ahead 8 plies in Connect 4 means exploring millions of board positions. Even a fast microcontroller can't check them all in a reasonable time.
The solution
Alpha-Beta pruning is a way to skip branches of the tree that can't possibly change the final decision.
Think of it like shopping for a birthday present. You visit Shop A and find a nice toy for 10 euros. Then you go to Shop B. The first item you see costs 15 euros, and you notice everything else in Shop B is even more expensive. You don't need to check every item in Shop B - you already know Shop A is better. You leave Shop B and save time.
The AI does the same thing:
- Alpha is the best score the AI (Max) has found so far. Think of it as "I already know I can do at least this well."
- Beta is the best score the opponent (Min) has found so far. Think of it as "The opponent already knows they can limit me to at most this."
When the AI is exploring a branch and discovers that the score can never beat what it already has (beta <= alpha), it prunes (cuts off) that entire branch. It skips all remaining moves in that branch because they can't change the outcome.
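To see the cutoff in action, here is a minimal alpha-beta sketch on the small example tree from section 3, with column 3 searched first. It is not the game's search code: the nested-list tree and the `visited` list (which records the leaves that were actually evaluated) are just for illustration.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf"), visited=None):
    """Minimax with alpha-beta pruning; leaves are plain integers."""
    if isinstance(node, int):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if beta <= alpha:
                break  # Min already has a better option elsewhere
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if beta <= alpha:
            break  # Max already has a better option elsewhere
    return best

# Subtrees in search order: column 3, then column 2, then column 4.
tree = [[2, 8], [5, -3], [-1, 4]]
visited = []
print(alphabeta(tree, True, visited=visited))  # 2
print(visited)                                 # [2, 8, 5, -3, -1]
```

The last leaf (+4) is never evaluated: once column 4 is known to allow -1, it cannot beat the +2 already guaranteed by column 3.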
How much does it help?
In practice, pruning lets the AI skip 50-90% of the positions it would otherwise need to check. This is why the column order matters - the AI checks the center column first (column 3), then works outward. Good moves tend to be near the center, so checking them first leads to better pruning and faster search.
5. The Three-Phase Move Strategy
Before running the expensive minimax search, the AI tries two quick shortcuts and falls back to the deep search only if neither applies:
- Can I win right now? The AI checks all columns for a winning move. If any column completes four in a row, it takes that move immediately. No need to think further. Importantly, the AI scans every column for its own win before checking for threats. This ensures it never blocks an opponent's threat when it could win the game outright.
- Can my opponent win next turn? Only after confirming there is no instant win, the AI checks all columns for opponent threats. If the opponent could win by playing in any column, the AI blocks it. Missing this would be a fatal mistake.
- Deep search. Only if there are no immediate wins or threats does the AI run the full minimax search with alpha-beta pruning and the heuristic evaluation.
This three-phase approach makes the AI both fast (instant reactions to obvious moves) and smart (deep strategic thinking when needed).
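The order of the three phases can be sketched like this. All names are hypothetical, row 0 is assumed to be the bottom row, and `deep_search` is a placeholder for the minimax search, passed in as a function.

```python
ROWS, COLS = 6, 7
EMPTY = 0
DIRECTIONS = ((0, 1), (1, 0), (1, 1), (1, -1))

def landing_row(board, col):
    """Row where a disc dropped in `col` would land, or None if the column is full."""
    for row in range(ROWS):
        if board[row][col] == EMPTY:
            return row
    return None

def wins_at(board, row, col, player):
    """True if `player` placing a disc at (row, col) would complete four in a row."""
    for d_row, d_col in DIRECTIONS:
        run = 1  # the disc being placed
        for sign in (1, -1):  # walk both ways along the line
            r, c = row + sign * d_row, col + sign * d_col
            while 0 <= r < ROWS and 0 <= c < COLS and board[r][c] == player:
                run += 1
                r, c = r + sign * d_row, c + sign * d_col
        if run >= 4:
            return True
    return False

def winning_column(board, player):
    """First column where `player` wins immediately, or None."""
    for col in range(COLS):
        row = landing_row(board, col)
        if row is not None and wins_at(board, row, col, player):
            return col
    return None

def choose_move(board, ai, opponent, deep_search):
    col = winning_column(board, ai)        # phase 1: take an instant win
    if col is not None:
        return col
    col = winning_column(board, opponent)  # phase 2: block an instant loss
    if col is not None:
        return col
    return deep_search(board)              # phase 3: full search
```

Because phase 1 finishes scanning all columns before phase 2 starts, a position where both players have a winning move always resolves in the AI's favor.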
6. Demo Mode: Asymmetric Skill
In demo mode, two AI players play against each other. To make the games interesting (rather than always ending in a draw), each player is randomly assigned a different search depth. One player might search 5 plies deep while the other searches only 3. The stronger player can find winning setups that the weaker one misses, leading to exciting games with real winners. Who gets the advantage is randomized each game.
7. Blunder Mode
Normally, the AI always plays the best move it can find. But that can be frustrating for younger or casual players who never get to win. Blunder mode gives the AI a configurable chance (for example 20%) to make a random move instead of running the deep minimax search. When a blunder happens, the AI simply drops a disc in a random open column. It still plays normally the rest of the time, so the game feels real - but every now and then the AI makes a silly mistake that a sharp player can punish.
Blunders never override an instant win or block. If the AI can win right now, or if the opponent is about to win, the AI always makes the correct move. Blunders only replace the deep search on turns where there is no immediate threat.
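In code, the rule "blunders only replace the deep search" comes down to the order of the checks. A minimal sketch, where `pick_move`, `forced_move`, and `deep_search` are made-up names and `forced_move` stands for the instant win or block found earlier (or `None` if there is none):

```python
import random

def pick_move(open_columns, forced_move, deep_search, blunder_chance=0.2, rng=random):
    """Choose a column, occasionally blundering instead of searching."""
    if forced_move is not None:
        return forced_move               # instant win or block: never blunder
    if rng.random() < blunder_chance:
        return rng.choice(open_columns)  # blunder: any open column
    return deep_search()                 # normal play
```

With `blunder_chance=0.2`, roughly one in five non-forced turns becomes a random move.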
8. Responsive Controls
The ESP32-C3 is a single-core processor. When the AI is thinking, it could block all input for several seconds. Two techniques keep the game responsive:
- Mid-search button checks: During the minimax search, the AI periodically checks whether the player has pressed the button. If so, it immediately abandons the search.
- Abort flag: A global flag (`abortAi`) propagates through all levels of the recursive search. Once set, every level of the search returns immediately, unwinding the entire calculation in microseconds.
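The two techniques fit together as sketched below. This is an illustration, not the firmware code: the tree is a dummy, `search_state` stands in for the global `abortAi` flag, and `button_pressed` and `check_every` are hypothetical names for the button poll and its interval.

```python
# Shared state; "abort" plays the role of the global abortAi flag.
search_state = {"abort": False, "nodes": 0}

def reset_search():
    search_state["abort"] = False
    search_state["nodes"] = 0

def search(depth, branching, button_pressed, check_every=64):
    """Walk a dummy game tree, polling the button every `check_every` nodes."""
    if search_state["abort"]:
        return 0  # flag already set: return at once
    search_state["nodes"] += 1
    if search_state["nodes"] % check_every == 0 and button_pressed():
        search_state["abort"] = True
        return 0
    if depth == 0:
        return 0  # a real search would return a heuristic score here
    best = 0
    for _ in range(branching):
        best = max(best, search(depth - 1, branching, button_pressed, check_every))
        if search_state["abort"]:
            break  # stop exploring siblings; the caller does the same
    return best

reset_search()
search(6, 7, button_pressed=lambda: True)
print(search_state["abort"], search_state["nodes"])
```

Once the flag is set, no new nodes are explored: each level of the recursion sees the flag right after its child returns and stops as well. The caller must discard the returned score of an aborted search.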