[update] Documentation for heuristic, fork detection, and playable threats.

Updated Background information.md, Achtergrondinformatie.md, and README.md to describe the improved AI: playable vs non-playable threat scoring, fork detection bonus, and the split Phase 1 strategy. README now lists all three implementations and the AI strategy section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
[fix] Add heuristic evaluation, fork detection, and Phase 1 win/block split to AI.
2026-03-27 17:00:10 +01:00 · 2026-03-27 16:59:55 +01:00
6 changed files with 113 additions and 29 deletions
@@ -69,17 +69,22 @@ Na het doorspelen van een "wat als?"-scenario, moet de AI beslissen: is dit een
 In plaats van elke onbesliste positie "neutraal" te noemen, bekijkt de AI elke mogelijke groep van vier opeenvolgende cellen op het bord (horizontaal, verticaal en beide diagonalen — 69 groepen in totaal). Voor elke groep telt hij de schijfjes:
- **3 AI-schijfjes + 1 leeg:** Dit is een sterke dreiging — de AI heeft nog maar één zet nodig om hier te winnen. Score: **+50**.
+- **3 AI-schijfjes + 1 leeg (speelbaar):** De lege cel kan nu meteen gevuld worden (hij zit op de onderste rij of er zit een schijfje onder). Dit is een directe dreiging. Score: **+100**.
 - **3 AI-schijfjes + 1 leeg (nog niet speelbaar):** De lege cel zweeft in de lucht — de dreiging bestaat maar kan nog niet benut worden. Score: **+40**.
 - **2 AI-schijfjes + 2 leeg:** Een veelbelovende opbouw die zich tot een dreiging kan ontwikkelen. Score: **+5**.
- **3 tegenstander-schijfjes + 1 leeg:** Een gevaarlijke dreiging van de tegenstander. Score: **-50**.
+- **3 tegenstander-schijfjes + 1 leeg (speelbaar):** Een direct gevaar. Score: **-100**.
 - **3 tegenstander-schijfjes + 1 leeg (nog niet speelbaar):** Een toekomstig gevaar. Score: **-40**.
 - **2 tegenstander-schijfjes + 2 leeg:** De tegenstander bouwt iets op. Score: **-5**.
 - **Gemengde groepen** (beide spelers hebben schijfjes in dezelfde groep): Geblokkeerd — niemand kan hier winnen. Score: **0**.
-Daarbovenop geeft de AI een kleine bonus (**+3** per schijfje) voor het beheersen van de middelste kolom, en een straf (**-3** per tegenstander-schijfje) daar. De middelste kolom is betrokken bij meer winnende lijnen dan elke andere kolom, dus het beheersen ervan is waardevol.
+Daarbovenop gebruikt de AI twee extra scorebonussen:
-Al deze kleine scores tellen bij elkaar op. De maximale heuristiek-score ligt ruim onder 1000, dus het verstoort nooit de echte winst/verlies-detectie — een gegarandeerde winst wint altijd van de beste heuristiek-positie.
+- **Controle over de middelste kolom:** +3 per AI-schijfje in de middelste kolom, -3 per tegenstander-schijfje. De middelste kolom is betrokken bij meer winnende lijnen dan elke andere kolom, dus het beheersen ervan is waardevol.
 - **Vorkdetectie:** Als een speler **twee of meer** drie-op-een-rij dreigingen tegelijk heeft, is dat een vork — de tegenstander kan er maar één per beurt blokkeren, dus de andere wint het spel. De AI geeft een grote bonus (**+200** of **-200**) wanneer hij een vork detecteert, waardoor hij agressief vork-opstellingen najaagt en wanhopig probeert te voorkomen dat de tegenstander er een maakt.
-Deze heuristiek betekent dat de AI nu het verschil kan zien tussen een sterke positie (veel dreigingen in opbouw) en een zwakke (de tegenstander heeft alle dreigingen), zelfs als hij geen gedwongen winst of verlies kan zien binnen zijn zoekdiepte.
+Al deze scores tellen bij elkaar op. De maximale heuristiek-score ligt ruim onder 1000, dus het verstoort nooit de echte winst/verlies-detectie — een gegarandeerde winst wint altijd van de beste heuristiek-positie.
 Deze heuristiek betekent dat de AI nu het verschil kan zien tussen een sterke positie (veel dreigingen in opbouw, vooral speelbare) en een zwakke (de tegenstander heeft alle dreigingen), zelfs als hij geen gedwongen winst of verlies kan zien binnen zijn zoekdiepte.
 ### Waarom de middelste kolom belangrijk is
@@ -67,17 +67,22 @@ After playing out a "what if?" scenario, the AI needs to decide: is this a good
 Instead of calling every unsolved position "neutral," the AI examines every possible group of four consecutive cells on the board (horizontal, vertical, and both diagonals — 69 groups in total). For each group, it counts pieces:
- **3 AI pieces + 1 empty:** This is a strong threat — the AI is one move away from winning here. Score: **+50**.
+- **3 AI pieces + 1 empty (playable):** The empty cell can be filled right now (it's on the bottom row or has a piece below it). This is an immediate threat. Score: **+100**.
 - **3 AI pieces + 1 empty (not yet playable):** The empty cell is floating in the air — the threat exists but can't be used yet. Score: **+40**.
 - **2 AI pieces + 2 empty:** A promising setup that could develop into a threat. Score: **+5**.
- **3 opponent pieces + 1 empty:** A dangerous opponent threat. Score: **-50**.
+- **3 opponent pieces + 1 empty (playable):** An immediate danger. Score: **-100**.
 - **3 opponent pieces + 1 empty (not yet playable):** A future danger. Score: **-40**.
 - **2 opponent pieces + 2 empty:** The opponent is building something. Score: **-5**.
 - **Mixed groups** (both players have pieces in the same group): Blocked — nobody can win here. Score: **0**.
-On top of that, the AI gives a small bonus (**+3** per piece) for controlling the center column, and a penalty (**-3** per opponent piece) there. The center column is involved in more winning lines than any other column, so controlling it is valuable.
+On top of that, the AI uses two more scoring bonuses:
-All these small scores add up. The maximum possible heuristic score is well below 1000, so it never interferes with actual win/loss detection — a guaranteed win always beats the best heuristic position.
+- **Center column control:** +3 per AI piece in the center column, -3 per opponent piece. The center column is involved in more winning lines than any other column, so controlling it is valuable.
 - **Fork detection:** If a player has **two or more** three-in-a-row threats at the same time, that's a fork — the opponent can only block one per turn, so the other wins the game. The AI adds a large bonus (**+200** or **-200**) when it detects a fork, making it aggressively pursue fork setups and desperately avoid letting the opponent create one.
-This heuristic means the AI can now tell the difference between a strong position (many threats being built) and a weak one (the opponent has all the threats), even when it can't see a forced win or loss within its search depth.
+All these scores add up. The maximum possible heuristic score is well below 1000, so it never interferes with actual win/loss detection — a guaranteed win always beats the best heuristic position.
 This heuristic means the AI can now tell the difference between a strong position (many threats being built, especially playable ones) and a weak one (the opponent has all the threats), even when it can't see a forced win or loss within its search depth.
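The window scoring described above can be checked by hand on a small position. The following is a minimal sketch, not the project's actual `evaluate_board`: it assumes a 7x6 board stored as `board[col][row]` with row 0 at the bottom, `0` for an empty cell, `1` for the AI and `2` for the opponent. The helper names `windows` and `score_window` are chosen here for illustration, and the center column and fork bonuses are left out.

```python
ROWS, COLS = 6, 7  # assumed standard board; row 0 is the bottom row


def windows():
    """Yield every group of four consecutive cells as a list of (col, row)."""
    for dc, dr in ((1, 0), (0, 1), (1, 1), (1, -1)):
        for c in range(COLS):
            for r in range(ROWS):
                end_c, end_r = c + 3 * dc, r + 3 * dr
                if 0 <= end_c < COLS and 0 <= end_r < ROWS:
                    yield [(c + i * dc, r + i * dr) for i in range(4)]


def score_window(board, cells, ai_p, hu_p):
    """Score one window from the AI's point of view."""
    values = [board[c][r] for c, r in cells]
    ai, hu = values.count(ai_p), values.count(hu_p)
    if ai and hu:
        return 0  # mixed group: blocked for both players
    sign = 1 if ai else -1
    count = ai or hu
    if count == 3:
        c, r = cells[values.index(0)]  # the single empty cell
        playable = r == 0 or board[c][r - 1] != 0
        return sign * (100 if playable else 40)
    if count == 2:
        return sign * 5
    return 0


# Example 1: three AI pieces on the bottom row (columns 0, 1, 2).
board = [[0] * ROWS for _ in range(COLS)]
for c in (0, 1, 2):
    board[c][0] = 1
window_count = len(list(windows()))
total = sum(score_window(board, w, 1, 2) for w in windows())
print(window_count, total)  # 69 105 (one playable threat + one two-piece setup)

# Example 2: three AI pieces on the second row, gap at (3, 1) with nothing under it.
floating = [[0] * ROWS for _ in range(COLS)]
floating[0][0], floating[1][0], floating[2][0] = 2, 1, 2
for c in (0, 1, 2):
    floating[c][1] = 1
gap_score = score_window(floating, [(0, 1), (1, 1), (2, 1), (3, 1)], 1, 2)
print(gap_score)  # 40 (threat exists but is not yet playable)
```

In the first example the window over columns 0 to 3 scores +100 because the gap sits on the bottom row, and the overlapping window over columns 1 to 4 adds +5. In the second example the same three-piece pattern scores only +40 because the gap has no piece beneath it.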
 ### Why the center column matters
@@ -114,12 +114,36 @@ All configurable parameters are defined as `-D` flags in `platformio.ini`:
 | `WIFI_SSID`            | `Connect4` | SSID for the WiFi access point                     |
 | `WIFI_PASSWORD`        | `youlose4` | Password for the WiFi access point                 |
 ## AI Strategy
 The AI uses **minimax with alpha-beta pruning** and a **heuristic evaluation function**. Moves are selected in three phases:
 1. **Instant win/block** — scan all columns for an immediate win first, then for an opponent threat to block.
 2. **Blunder** (optional) — random move at a configurable chance, skipping the deep search.
 3. **Deep minimax search** — full tree search with alpha-beta pruning up to the configured ply depth.
 The heuristic evaluates leaf nodes by scoring all 69 possible four-cell windows on the board:
 - **Playable threats** (3-in-a-row where the gap can be filled now): ±100
 - **Non-playable threats** (gap is floating in the air): ±40
 - **Two-in-a-row setups**: ±5
 - **Center column control**: ±3 per piece
 - **Fork bonus** (2+ simultaneous three-in-a-row threats): ±200
 See `Background information.md` / `Achtergrondinformatie.md` for a detailed explanation accessible to all ages.
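The phase 1 split (win pass first, block pass second) could look roughly like the sketch below. It is an illustration under assumptions, not the code from `src/main.cpp`: the board is assumed to be `board[col][row]` with row 0 at the bottom and `0` for empty, and the helpers `drop_row`, `wins_at`, and `instant_move` are hypothetical names.

```python
ROWS, COLS = 6, 7  # assumed standard board; row 0 is the bottom row
EMPTY = 0


def drop_row(board, col):
    """Return the row a piece would land in, or -1 if the column is full."""
    for r in range(ROWS):
        if board[col][r] == EMPTY:
            return r
    return -1


def wins_at(board, col, row, player):
    """True if `player` would complete four in a row by playing (col, row)."""
    for dc, dr in ((1, 0), (0, 1), (1, 1), (1, -1)):
        run = 1  # the piece being placed
        for sign in (1, -1):
            c, r = col + sign * dc, row + sign * dr
            while 0 <= c < COLS and 0 <= r < ROWS and board[c][r] == player:
                run += 1
                c += sign * dc
                r += sign * dr
        if run >= 4:
            return True
    return False


def instant_move(board, ai_p, hu_p):
    """Phase 1: a winning column if one exists, else a blocking column, else None."""
    for player in (ai_p, hu_p):  # full win pass first, then full block pass
        for col in range(COLS):
            row = drop_row(board, col)
            if row >= 0 and wins_at(board, col, row, player):
                return col
    return None


# The opponent (2) threatens to win at column 1, but the AI (1) can win at column 6.
board = [[EMPTY] * ROWS for _ in range(COLS)]
for c in (2, 3, 4):
    board[c][0] = 2
for r in (0, 1, 2):
    board[6][r] = 1
choice = instant_move(board, 1, 2)
print(choice)  # 6: the win is taken even though a block appears in an earlier column
```

Running the two passes separately matters in positions like this one: a single loop that checks "win or block" per column would stop at column 1 and block, missing the immediate win in column 6.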
 ## Project Structure
 ```
-src/main.cpp        Single-file application (all game logic, AI, LED, web server)
+src/main.cpp                 ESP32 application (game logic, AI, LED, web server)
-platformio.ini      Build configuration, pin mappings, and tunable parameters
+connect_four.js              JavaScript browser edition (canvas rendering)
-README.md           This file - technical and practical information
+connect_four.html            HTML wrapper for the JavaScript version
-Background information.md   How the AI works (suitable for all ages)
+connect_four.py              Python terminal edition (Rich TUI)
-CLAUDE.md           AI assistant project context
+platformio.ini               Build configuration, pin mappings, and tunable parameters
 README.md                    This file - technical and practical information
 Background information.md    How the AI works (English, suitable for all ages)
 Achtergrondinformatie.md     How the AI works (Dutch, suitable for all ages)
 CLAUDE.md                    AI assistant project context
 ```
 All three implementations (C++, JavaScript, Python) share the same AI algorithm and heuristic.
@@ -1,6 +1,6 @@
 /* ============================================================
 *  Connect Four — Browser Edition
- *  A single-file game: AI (minimax + alpha-beta), demo mode,
+ *  A single-file game: AI (minimax + alpha-beta + heuristic), demo mode,
 *  game log (localStorage), blunder mode, idle timeout.
 *
 *  Include this script in an HTML page that has:
@@ -168,6 +168,7 @@ function scanBoard(b) {
 function evaluateBoard(b, aiP, huP) {
    let score = 0;
    let aiThreats = 0, huThreats = 0;
    // Center column bonus
    for (let r = 0; r < ROWS; r++) {
@@ -177,16 +178,27 @@ function evaluateBoard(b, aiP, huP) {
    // Score a window of 4 cells by piece counts
    function scoreWindow(c, r, dc, dr) {
-        let ai = 0, hu = 0;
+        let ai = 0, hu = 0, emptyC = -1, emptyR = -1;
        for (let i = 0; i < 4; i++) {
-            const v = b[c + i * dc][r + i * dr];
+            const cc = c + i * dc;
            const rr = r + i * dr;
            const v = b[cc][rr];
            if (v === aiP) ai++;
            else if (v === huP) hu++;
            else { emptyC = cc; emptyR = rr; }
        }
        if (ai > 0 && hu > 0) return 0;
-        if (ai === 3) return 50;
+        if (ai === 3) {
            aiThreats++;
            const playable = emptyR === 0 || b[emptyC][emptyR - 1] !== 0;
            return playable ? 100 : 40;
        }
        if (ai === 2) return 5;
-        if (hu === 3) return -50;
+        if (hu === 3) {
            huThreats++;
            const playable = emptyR === 0 || b[emptyC][emptyR - 1] !== 0;
            return playable ? -100 : -40;
        }
        if (hu === 2) return -5;
        return 0;
    }
@@ -208,6 +220,10 @@ function evaluateBoard(b, aiP, huP) {
        for (let c = 0; c <= COLS - 4; c++)
            score += scoreWindow(c, r, 1, -1);
    // Fork bonus: multiple threats are disproportionately dangerous
    if (aiThreats >= 2) score += 200;
    if (huThreats >= 2) score -= 200;
    return score;
 }
@@ -324,6 +340,7 @@ function checkGameEnd() {
    if (gameState !== State.DEMO) {
        games = logGame(games, gameMenuMode, gameLevel, won ? w : 0, currentMoves);
        console.log(`Game: ${currentMoves} → ${won ? playerName(w) + " wins" : "Draw"}`);
    }
    gameState = won ? State.FINISHED_WIN : State.FINISHED_DRAW;
    demoResetTimer = performance.now() / 1000;
@@ -1,4 +1,4 @@
-"""Connect Four terminal game with AI, using Rich for display."""
+"""Connect Four terminal game with AI (minimax + alpha-beta + heuristic), using Rich for display."""
 import os
 import queue
@@ -271,6 +271,8 @@ def log_game(games: list[dict], game_menu_mode: int, level: int, winner: int, mo
 def evaluate_board(board: list[list[int]], ai_p: int, hu_p: int) -> int:
    score = 0
    ai_threats = 0
    hu_threats = 0
    # Center column bonus
    for r in range(ROWS):
@@ -281,21 +283,30 @@ def evaluate_board(board: list[list[int]], ai_p: int, hu_p: int) -> int:
    # Score a window of 4 cells by piece counts
    def score_window(c: int, r: int, dc: int, dr: int) -> int:
-        ai, hu = 0, 0
+        nonlocal ai_threats, hu_threats
        ai, hu, empty_c, empty_r = 0, 0, -1, -1
        for i in range(4):
-            v = board[c + i * dc][r + i * dr]
+            cc = c + i * dc
            rr = r + i * dr
            v = board[cc][rr]
            if v == ai_p:
                ai += 1
            elif v == hu_p:
                hu += 1
            else:
                empty_c, empty_r = cc, rr
        if ai > 0 and hu > 0:
            return 0
        if ai == 3:
-            return 50
+            ai_threats += 1
            playable = empty_r == 0 or board[empty_c][empty_r - 1] != 0
            return 100 if playable else 40
        if ai == 2:
            return 5
        if hu == 3:
-            return -50
+            hu_threats += 1
            playable = empty_r == 0 or board[empty_c][empty_r - 1] != 0
            return -100 if playable else -40
        if hu == 2:
            return -5
        return 0
@@ -317,6 +328,12 @@ def evaluate_board(board: list[list[int]], ai_p: int, hu_p: int) -> int:
        for c in range(COLS - 3):
            score += score_window(c, r, 1, -1)
    # Fork bonus: multiple threats are disproportionately dangerous
    if ai_threats >= 2:
        score += 200
    if hu_threats >= 2:
        score -= 200
    return score
@@ -268,6 +268,7 @@ int8_t scanBoard() {
 int evaluateBoard(int8_t aiP, int8_t huP) {
    int score = 0;
    int aiThreats = 0, huThreats = 0;
    // Center column bonus
    for (int r = 0; r < ROWS; r++) {
@@ -277,16 +278,27 @@ int evaluateBoard(int8_t aiP, int8_t huP) {
    // Score a window of 4 cells by piece counts
    auto scoreWindow = [&](int c, int r, int dc, int dr) -> int {
-        int ai = 0, hu = 0;
+        int ai = 0, hu = 0, emptyC = -1, emptyR = -1;
        for (int i = 0; i < 4; i++) {
-            int8_t v = board[c + i * dc][r + i * dr];
+            int cc = c + i * dc;
            int rr = r + i * dr;
            int8_t v = board[cc][rr];
            if (v == aiP) ai++;
            else if (v == huP) hu++;
            else { emptyC = cc; emptyR = rr; }
        }
        if (ai > 0 && hu > 0) return 0;
-        if (ai == 3) return 50;
+        if (ai == 3) {
            aiThreats++;
            bool playable = emptyR == 0 || board[emptyC][emptyR - 1] != 0;
            return playable ? 100 : 40;
        }
        if (ai == 2) return 5;
-        if (hu == 3) return -50;
+        if (hu == 3) {
            huThreats++;
            bool playable = emptyR == 0 || board[emptyC][emptyR - 1] != 0;
            return playable ? -100 : -40;
        }
        if (hu == 2) return -5;
        return 0;
    };
@@ -296,6 +308,10 @@ int evaluateBoard(int8_t aiP, int8_t huP) {
    for (int r = 0; r < 3; r++) for (int c = 0; c < 4; c++) score += scoreWindow(c, r, 1, 1);
    for (int r = 3; r < 6; r++) for (int c = 0; c < 4; c++) score += scoreWindow(c, r, 1, -1);
    // Fork bonus: multiple threats are disproportionately dangerous
    if (aiThreats >= 2) score += 200;
    if (huThreats >= 2) score -= 200;
    return score;
 }