Uno AI Agent
I built two autonomous AI agents that win at Uno.
GAMEPLAY
THE PROJECT
I built two AI agents that play and win Uno, a multiplayer card game driven by uncertainty, hidden information, and shifting odds. Both use Monte Carlo Tree Search, which I implemented from scratch in Java. Rather than following hand-written rules for every situation, each agent simulates thousands of possible futures, tracks which moves lead to wins, and acts on that evidence. The result is a pair of agents that reason under uncertainty, adapt to any opponent, and play competitive Uno without ever being told what a good move looks like.
ALGORITHMIC ARCHITECTURE
I engineered two progressively sophisticated MCTS implementations, each exploring the game tree in a fundamentally different way.
01 · EXPECTED OUTCOME AGENT
Flat Monte Carlo evaluation
One child per legal move. Equal rollouts on each. Root Q-values updated after every simulation.
02 · UCT AGENT
Upper Confidence Trees
Dynamic tree built using UCB1 to balance exploration and exploitation at every node level.
03 · BACKPROPAGATION
Q-value propagation up the path
After each rollout, the outcome flows back through every node visited, root included. UCT's key advantage over flat Monte Carlo, which updates only the root.
04 · ROLLOUT POLICY
Uniform random simulation
Every player draws from their legal moves at random. With enough rollouts, the signal rises above the noise.
05 · REWARD SHAPING
Heuristic when depth cap hits
Win = +1, loss = -1 at game end. Mid-game cap: r = opponents' cards minus my cards. No hand-coded strategy needed.
06 · ANYTIME ALGORITHM
Budget-driven search
The outer loop runs until the time budget expires. More compute = better Q-value estimates. Stop whenever.
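Cards 04 and 05 can be sketched together as one rollout function. This is an illustration under stated assumptions, not the project's code: `SimState`, `ToyState`, and `Rollout` are hypothetical names, moves are encoded as plain ints, and the toy game (play a card or draw one) merely stands in for Uno's real rules.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical minimal view of a game state; the real engine API will differ. */
interface SimState {
    boolean isTerminal();
    int winner();                 // index of the winner, meaningful only when terminal
    List<Integer> legalMoves();   // assumed non-empty for every non-terminal state
    SimState apply(int move);     // returns a new state, leaving this one untouched
    int handSize(int player);
    int playerCount();
}

final class Rollout {
    /** Plays uniformly random moves from `start` and returns a reward for player `me`. */
    static double simulate(SimState start, int me, int depthCap, Random rng) {
        SimState s = start;
        for (int depth = 0; depth < depthCap && !s.isTerminal(); depth++) {
            List<Integer> moves = s.legalMoves();
            s = s.apply(moves.get(rng.nextInt(moves.size())));
        }
        if (s.isTerminal()) {
            return s.winner() == me ? 1.0 : -1.0;
        }
        // Depth cap reached: opponents' total cards minus my cards.
        int opponents = 0;
        for (int p = 0; p < s.playerCount(); p++) {
            if (p != me) opponents += s.handSize(p);
        }
        return opponents - s.handSize(me);
    }
}

/** Toy stand-in for Uno: move 0 plays a card, move 1 draws one; an empty hand wins. */
final class ToyState implements SimState {
    private final int[] hands;
    private final int toMove;

    ToyState(int[] hands, int toMove) {
        this.hands = hands.clone();
        this.toMove = toMove;
    }

    public boolean isTerminal() {
        for (int h : hands) if (h == 0) return true;
        return false;
    }

    public int winner() {
        for (int p = 0; p < hands.length; p++) if (hands[p] == 0) return p;
        return -1;
    }

    public List<Integer> legalMoves() { return List.of(0, 1); }

    public SimState apply(int move) {
        int[] next = hands.clone();
        next[toMove] += (move == 0) ? -1 : 1;
        return new ToyState(next, (toMove + 1) % hands.length);
    }

    public int handSize(int player) { return hands[player]; }

    public int playerCount() { return hands.length; }
}
```

The sketch returns the raw card differential at the cap, exactly as the card states it. Whether the original scales that value to match the +1 and -1 terminal rewards is not stated here.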
RESULTS AND PERFORMANCE
Both agents consistently outperform random opponents across 2-player and 4-player configurations, with the UCT agent demonstrating sharper decision-making as the game tree deepens over successive iterations. The Q-value estimates converge quickly enough within the time budget to produce reliably strong move selection.
SKILLS AND CONCEPTS
Core AI algorithms
Monte Carlo Tree Search (MCTS)
Implemented the full MCTS loop (selection, expansion, simulation, and backpropagation) tailored to Uno's branching legal moves and multi-player turns.
UCT / UCB1 selection
Coded UCB1 at each decision node so the agent balances exploiting strong lines with exploring under-sampled branches.
Flat Monte Carlo baseline
Built a root-only Monte Carlo agent that allocates rollouts evenly across legal moves for comparison against UCT.
Q-value estimation from simulations
Maintained running averages and visit counts per (state, action) to rank moves from empirical win rates and shaped rewards.
Reward shaping at non-terminal depth
When rollouts hit the depth cap, used a hand-size differential heuristic so partial games still produce a useful evaluation signal.
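As a rough illustration of the selection and Q-value items above, UCB1 over per-child statistics might look like the sketch below. The class name `Ucb1`, the array layout, and the exploration constant `c` are assumptions made for the example. Parent visits are approximated as the sum of child visits.

```java
final class Ucb1 {
    /**
     * Returns the index of the child to try next.
     * visits[i] = times child i was tried, meanQ[i] = its average reward so far,
     * c = exploration constant (an assumed tuning knob; sqrt(2) is a common default).
     */
    static int select(int[] visits, double[] meanQ, double c) {
        int parentVisits = 0;
        for (int v : visits) parentVisits += v;

        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < visits.length; i++) {
            if (visits[i] == 0) return i; // untried children are sampled first
            double bonus = c * Math.sqrt(Math.log(parentVisits) / visits[i]);
            double score = meanQ[i] + bonus;
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    /** Incremental running average: folds one new reward into the stored mean. */
    static double updateMean(double mean, int visitsBefore, double reward) {
        return mean + (reward - mean) / (visitsBefore + 1);
    }
}
```

With `visits = {100, 1}` and means `{0.5, 0.4}`, the rarely tried second child wins the comparison because its bonus is far larger, which is the exploration behavior described above.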
Search and decision-making concepts
Anytime search under a wall-clock budget
Structured the outer loop so the agent returns the best move known when the per-move time budget expires.
Exploration vs exploitation
UCB1 formalizes the tradeoff between playing what looks best so far and probing moves with uncertain value.
Incremental tree growth
Unlike flat MC, UCT adds one child per iteration so effort concentrates along promising lines of play.
Root move ordering by empirical value
After search, selected the legal move at the root with the strongest aggregated Q-value or visit-weighted score.
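The anytime loop and the final root choice can be sketched for the flat Monte Carlo case as follows. This is a simplified illustration, not the project's implementation: `AnytimeSearch` is a hypothetical name, and the `rollout` callback stands in for a full simulation from the state reached by each root move.

```java
import java.util.function.IntToDoubleFunction;

final class AnytimeSearch {
    /**
     * Cycles through the root moves, one rollout each, until the budget runs out,
     * then returns the index of the move with the highest average reward.
     * Assumes moveCount >= 1. `rollout` maps a move index to one sampled reward.
     */
    static int bestMove(int moveCount, long budgetMillis, IntToDoubleFunction rollout) {
        double[] sum = new double[moveCount];
        int[] visits = new int[moveCount];
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;

        int next = 0;
        // Always complete one full pass so every move has at least one sample.
        while (visits[moveCount - 1] == 0 || deadline - System.nanoTime() > 0) {
            sum[next] += rollout.applyAsDouble(next);
            visits[next]++;
            next = (next + 1) % moveCount;
        }

        int best = 0;
        for (int m = 1; m < moveCount; m++) {
            if (sum[m] / visits[m] > sum[best] / visits[best]) best = m;
        }
        return best;
    }
}
```

Comparing `deadline - System.nanoTime()` against zero, rather than comparing the two timestamps directly, follows the overflow-safe pattern recommended in the `System.nanoTime` documentation. A UCT version would keep the same outer loop and replace the round-robin choice with tree descent.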
Probability and statistics
Monte Carlo averaging
Treated each rollout as a sample; averaged outcomes so estimates stabilize as visit counts grow.
Law of large numbers (intuition)
With enough random rollouts, the empirical mean reward at a node approaches the expectation under the rollout policy.
Log term in UCB for confidence width
The √(log N / N_child) term shrinks as a child's own visit count grows and rises slowly for children left untried while the parent's count N keeps climbing, which drives exploration.
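Written out in standard notation, the quantities in this group look like the following. The symbols are chosen here for illustration and are not taken from the project's code: Q is the running mean reward of a child, r the latest rollout reward, N the parent's visit count, N_child the child's visit count, and c an exploration constant.

```latex
% Running average after the child's N_child-th sample r:
Q \leftarrow Q + \frac{r - Q}{N_{\text{child}}}

% UCB1 score used to choose which child to sample next:
\mathrm{UCB1} = Q + c \sqrt{\frac{\ln N}{N_{\text{child}}}}
```

For a fixed N the bonus falls as N_child rises, and for a fixed N_child it creeps upward as N grows, so a neglected child is eventually revisited.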
Game theory and multi-agent reasoning
Stochastic multi-player games
Extended search and rollouts to 2- and 4-player Uno with rotating turns and different opponent models in simulation.
Hidden information
Opponents' hands are unknown; rollouts use the engine's information model so the agent reasons under uncertainty without cheating.
Opponents as part of the environment
Randomized opponent play in rollouts approximates a wide distribution of behaviors without hand-coded strategies.
Equilibrium concepts (background)
The agents do not solve for Nash equilibria explicitly; MCTS instead finds strong empirical responses via sampling.
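One common way to simulate without peeking at hidden hands is determinization: before each rollout, deal the cards the agent cannot see into opponent hands of the publicly known sizes. The sketch below shows that general technique only. Whether the engine's information model works this way is not stated here, and `Determinizer` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class Determinizer {
    /**
     * Deals the unseen cards at random into opponent hands of the known sizes.
     * unseen    = every card not in my hand and not already played
     * handSizes = cards held by each opponent (public information in Uno)
     * The last list in the result holds the leftovers, i.e. a sampled draw pile.
     * Throws IndexOutOfBoundsException if the sizes exceed the unseen cards.
     */
    static <C> List<List<C>> sample(List<C> unseen, int[] handSizes, Random rng) {
        List<C> pool = new ArrayList<>(unseen);
        Collections.shuffle(pool, rng);

        List<List<C>> result = new ArrayList<>();
        int from = 0;
        for (int size : handSizes) {
            result.add(new ArrayList<>(pool.subList(from, from + size)));
            from += size;
        }
        result.add(new ArrayList<>(pool.subList(from, pool.size())));
        return result;
    }
}
```

Drawing a fresh sample for each rollout averages the search over many plausible opponent hands instead of committing to a single guess.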
Software engineering and Java
Java agent and game engine integration
Implemented agents against the course game API: legal move enumeration, state copy for simulation, and clean separation of search from rules.
Java Swing visualizer
Built a real-time display of hands, piles, and play so experiments and debugging are observable.
Data structures for the search tree
Used maps and node records to track children, visit counts, and Q accumulators per expanded state.
Profiling and time budgets
Bounded search by wall time so the agent respects engine limits and stays competitive in timed matches.
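The tree bookkeeping described in this group could be organized roughly as below. It is a minimal sketch with assumed names: `Node`, int-keyed moves, and a single reward measured from the searching agent's point of view at every level.

```java
import java.util.HashMap;
import java.util.Map;

final class Node {
    final Node parent;                                    // null at the root
    final Map<Integer, Node> children = new HashMap<>();  // keyed by move id (ints in this sketch)
    int visits;
    double rewardSum;

    Node(Node parent) {
        this.parent = parent;
    }

    /** Returns the child for `move`, creating it on first use (expansion). */
    Node childFor(int move) {
        return children.computeIfAbsent(move, m -> new Node(this));
    }

    /** Average reward seen through this node; 0 before any visit. */
    double meanReward() {
        return visits == 0 ? 0.0 : rewardSum / visits;
    }

    /** Adds one rollout result to this node and every ancestor up to the root. */
    void backpropagate(double reward) {
        for (Node n = this; n != null; n = n.parent) {
            n.visits++;
            n.rewardSum += reward;
        }
    }
}
```

Storing a reward sum and a visit count, rather than the mean itself, keeps the update to two additions per node. A multi-player variant might instead store one reward per player at each node.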