connect_four_pico/Background information.md

# Connect 4 AI: How the Computer Thinks

## 1. The Virtual Board

The computer doesn't see colored discs on a grid. It sees a table of numbers:

- **0** = Empty space
- **1** = Yellow disc
- **2** = Red disc

The board has 7 columns and 6 rows. After every move, a scan function checks all directions (horizontal, vertical, and both diagonals) to see if anyone has four in a row.

## 2. What is a "Ply"?

A **ply** is one move by one player. If the AI is set to ply 6, it looks 6 individual moves into the future. Since players alternate turns, ply 6 means the AI considers 3 of its own moves and 3 of the opponent's moves.

More plies = stronger play, but takes longer to calculate. On the ESP32-C3, ply 4 is nearly instant, ply 6 takes about a second, and ply 8-10 can take several seconds. The AI shows a pulsing light while it is thinking.

## 3. The Minimax Strategy

### The basic idea

Imagine you are playing Connect 4 against a friend. Before you drop your disc, you think: "If I put my disc here, what will my friend do? And then what would I do after that?"

That is exactly what the computer does, except it checks **every** possible move, not just a few.

### Two players, two goals

The AI calls the two players **Max** (itself) and **Min** (you):

- **Max** wants the highest possible score (the AI winning).
- **Min** wants the lowest possible score (you winning).

The AI assumes you will always make your best move. It doesn't hope you'll make a mistake.

### A simple example

Imagine there are only 3 columns left and the AI can look 2 moves ahead. It builds a tree like this:

```
                    AI's turn (Max - pick the highest)
                   /            |            \
              col 2          col 3          col 4
              /    \         /    \         /    \
         Your turn (Min - pick the lowest)
         ...     ...     ...     ...     ...     ...
         +5      -3      +2      +8      -1      +4
```

1. After column 2: you would pick the move scoring -3 (lowest = best for you).
2. After column 3: you would pick the move scoring +2.
3. After column 4: you would pick the move scoring -1.

The AI compares -3, +2, and -1, and picks column 3 because +2 is the best it can guarantee.

### Scoring: How the AI Rates a Board

After playing out a "what if?" scenario, the AI needs to decide: is this a good result or a bad one? It uses a very simple scoring system with only three possible outcomes:

- **+1000 or more: "I win!"** The AI found a way to get four in a row. The bonus points above 1000 depend on how quickly it can win. Winning in 2 moves scores higher than winning in 6 moves. This is why the AI always goes for the fastest victory — it never wastes time when it can finish the game.

- **-1000 or less: "I lose!"** The opponent gets four in a row. Losing sooner gets an even worse score. This makes the AI fight hardest against moves that threaten an immediate loss.

- **0: "I don't know yet."** The AI looked as far ahead as it could (it ran out of plies) and nobody won. It simply calls this position "neutral" — not good, not bad.

That's it — the AI does not give extra points for having three in a row, controlling the center, or any other clever trick. It relies entirely on looking many moves ahead to figure out which moves lead to wins and which ones don't. If it can't see a win or loss within its search depth, every position looks the same.

### Why the center column matters

Even though the AI doesn't give bonus points for playing in the center, it always checks the center column first (column 3), then works outward (2, 4, 1, 5, 0, 6).
The center column is involved in more possible winning lines than the edges, so checking it first helps the AI find good moves faster and skip bad ones sooner (thanks to alpha-beta pruning).

## 4. Alpha-Beta Pruning: The Smart Shortcut

### The problem

Looking ahead 8 plies in Connect 4 means exploring millions of board positions.
Even a fast microcontroller can't check them all in a reasonable time.

### The solution

**Alpha-Beta pruning** is a way to skip branches of the tree that can't possibly change the final decision.

Think of it like shopping for a birthday present. You visit Shop A and find a nice toy for 10 euros.
Then you go to Shop B. The first item you see costs 15 euros, and you notice everything else in Shop B is even more expensive.
You don't need to check every item in Shop B - you already know Shop A is better. You leave Shop B and save time.

The AI does the same thing:

- **Alpha** is the best score the AI (Max) has found so far. Think of it as "I already know I can do at least this well."
- **Beta** is the best score the opponent (Min) has found so far. Think of it as "The opponent already knows they can limit me to at most this."

When the AI is exploring a branch and discovers that the score can never beat what it already has (beta <= alpha), it **prunes** (cuts off) that entire branch. It skips all remaining moves in that branch because they can't change the outcome.

### How much does it help?

In practice, pruning lets the AI skip 50-90% of the positions it would otherwise need to check. This is why the column order matters - the AI checks the center column first (column 3), then works outward. Good moves tend to be near the center, so checking them first leads to better pruning and faster search.

## 5. The Three-Phase Move Strategy

Before running the expensive minimax search, the AI takes two quick shortcuts:

1. **Can I win right now?** The AI tries placing its disc in each column. If any column completes four in a row, it takes that move immediately. No need to think further.

2. **Can my opponent win next turn?** The AI checks if the opponent could win by playing in any column. If so, it blocks that column. Missing this would be a fatal mistake.

3. **Deep search.** Only if there are no immediate wins or threats does the AI run the full minimax search with alpha-beta pruning.

This three-phase approach makes the AI both fast (instant reactions to obvious moves) and smart (deep strategic thinking when needed).

## 6. Demo Mode: Asymmetric Skill

In demo mode, two AI players play against each other. To make the games interesting (rather than always ending in a draw), each player is randomly assigned a different search depth. One player might look 5 moves ahead while the other only looks 3 moves ahead. The stronger player can find winning setups that the weaker one misses, leading to exciting games with real winners. Who gets the advantage is randomized each game.

## 7. Blunder Mode

Normally, the AI always plays the best move it can find. But that can be frustrating for younger or casual players who never get to win. **Blunder mode** gives the AI a configurable chance (for example 20%) to make a random move instead of running the deep minimax search. When a blunder happens, the AI simply drops a disc in a random open column. It still plays normally the rest of the time, so the game feels real - but every now and then the AI makes a silly mistake that a sharp player can punish.

Blunders never override an instant win or block. If the AI can win right now, or if the opponent is about to win, the AI always makes the correct move. Blunders only replace the deep search on turns where there is no immediate threat.

## 8. Responsive Controls

The ESP32-C3 is a single-core processor. When the AI is thinking, it could block all input for several seconds. Two techniques keep the game responsive:

1. **Mid-search button checks:** During the minimax search, the AI periodically checks whether the player has pressed the button. If so, it immediately abandons the search.

2. **Abort flag:** A global flag (`abortAi`) propagates through all levels of the recursive search. Once set, every level of the search returns immediately, unwinding the entire calculation in microseconds.

## Further Reading

- [Connect Four - Mathematical Solution (Wikipedia)](https://en.wikipedia.org/wiki/Connect_Four#Mathematical_solution)
- [Minimax Algorithm (Wikipedia)](https://en.wikipedia.org/wiki/Minimax)
- [Alpha-Beta Pruning (Wikipedia)](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning)