Play Flappy Bird yourself, then watch a Q-Learning AI start from zero (dying on the first pipe) and slowly master the game through pure trial and error!
What the agent observes: bird height, vertical speed, distance to next pipe gap, gap position.
Just two: flap or don't flap. Simple actions, but the timing makes all the difference!
+1 per frame alive, +10 per pipe passed, -100 for collision. The agent learns to maximise total reward.
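As a rough sketch, the per-frame reward described above could be computed like this; the constants come from the text, while the `crashed` and `passed_pipe` flags are hypothetical names the game loop would supply:

```python
def frame_reward(crashed: bool, passed_pipe: bool) -> float:
    """Reward for a single frame, using the values described above."""
    if crashed:
        return -100.0          # collision with a pipe or the ground
    reward = 1.0               # +1 for surviving this frame
    if passed_pipe:
        reward += 10.0         # bonus for clearing a pipe
    return reward
```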
Maps every (state, action) pair to an estimate of the total future reward. Updated after every frame using the Bellman equation.
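In code, the tabular update looks roughly like this minimal sketch; the learning rate `ALPHA` and discount `GAMma` below are assumed hyperparameter values, not ones taken from the demo:

```python
from collections import defaultdict

# Q-table: maps (state, action) -> estimated total future reward.
Q = defaultdict(float)

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.95  # discount factor (assumed value)

def q_update(state, action, reward, next_state, actions=(0, 1)):
    """One Q-learning (Bellman) update after a single frame."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```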
You trained a Q-Learning agent to play Flappy Bird from scratch!
The discretised observation: bird height bucket, velocity bucket, horizontal distance to gap, gap position bucket.
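A sketch of that bucketing, with illustrative bucket sizes (the demo's actual grid may be coarser or finer):

```python
def discretise(bird_y: float, velocity: float,
               dist_to_gap: float, gap_y: float) -> tuple:
    """Collapse continuous game values into coarse buckets so the Q-table stays small."""
    return (
        int(bird_y // 40),       # bird height bucket (40 px per bucket, assumed)
        int(velocity // 2),      # vertical speed bucket
        int(dist_to_gap // 30),  # horizontal distance to the gap
        int(gap_y // 40),        # gap position bucket
    )
```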
The reward function defines what the agent optimises. A poorly designed reward leads to unexpected, often hilarious, behaviour.
Epsilon-greedy: high ε early = try random actions. Low ε later = use learned knowledge. The schedule matters!
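A typical epsilon-greedy schedule might look like the sketch below; the decay rate and bounds are illustrative, not the demo's actual values:

```python
import random

EPSILON_START, EPSILON_MIN, EPSILON_DECAY = 1.0, 0.01, 0.995  # assumed schedule

def choose_action(Q, state, epsilon, actions=(0, 1)):
    """Explore with probability epsilon, otherwise exploit the best known action."""
    if random.random() < epsilon:
        return random.choice(actions)                        # explore: random flap / no-flap
    return max(actions, key=lambda a: Q.get((state, a), 0))  # exploit: best known Q-value

def decay(epsilon):
    """Shrink epsilon after each episode so the agent explores less over time."""
    return max(EPSILON_MIN, epsilon * EPSILON_DECAY)
```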
When the Q-table stabilises and scores plateau at a high level, the agent has converged to a near-optimal policy.
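One simple way to spot that stabilisation, as a sketch: compare snapshots of the Q-table between checkpoints and call it stable when no entry moves by more than a small tolerance (the function and tolerance here are illustrative):

```python
def q_table_stable(old_q: dict, new_q: dict, tolerance: float = 1e-3) -> bool:
    """True when no Q-value changed by more than `tolerance` since the last snapshot."""
    keys = set(old_q) | set(new_q)
    return all(abs(new_q.get(k, 0.0) - old_q.get(k, 0.0)) < tolerance for k in keys)
```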
DeepMind's AlphaGo used RL + MCTS to beat the world Go champion. OpenAI Five beat world champions at Dota 2.
In games like Go, you only find out if you won at the very end: no reward for 200+ moves. Hard for RL to handle!