Dynamic Programming for Door & Key Navigation

UC San Diego, ECE 276B: Planning & Learning in Robotics ยท Spring Quarter 2026

Code  |  Report


Overview

This project solves the Door & Key grid-world navigation problem using Dynamic Programming. An agent (red triangle) must reach a goal (green square) at minimum cost, potentially picking up a key and unlocking a door along the way. The problem is formulated as a deterministic, fully observable, first-exit MDP and solved via value iteration.

Two scenarios are addressed. Part A computes a separate optimal policy for each of seven known environments. Part B computes a single unified policy covering all 36 random 10x10 environment configurations by augmenting the state space with discrete key and goal location indices.


MDP Formulation

The state encodes the agent's grid position, facing direction (right/down/left/up), whether it is carrying the key, and the open/locked status of each door. For the random-map setting the state is extended with indices for the active key location and goal location, giving 28,800 total states across the 10x10 grid.

The five actions and their costs are: Turn Left/Right (cost 1), Move Forward (cost 3), Pick Up Key (cost 2), Unlock Door (cost 5). Invalid actions are no-op transitions that still incur cost, so the DP naturally avoids them. Value iteration runs until the maximum change in the cost-to-go falls below 10-6.


Part A: Known Maps

A separate policy is computed for each of the seven environments. The direct environments have the lowest costs because the agent can reach the goal without a key. The shortcut environments use the locked door for a shorter route despite the unlock cost. The normal environments require the longest key-retrieval detour.

EnvironmentOptimal CostSteps
doorkey-5x5-normal2813
doorkey-6x6-normal3314
doorkey-6x6-direct135
doorkey-6x6-shortcut156
doorkey-8x8-normal5421
doorkey-8x8-direct177
doorkey-8x8-shortcut198
5x5 normal

5x5 normal (cost 28)

6x6 normal

6x6 normal (cost 33)

6x6 direct

6x6 direct (cost 13)

6x6 shortcut

6x6 shortcut (cost 15)

8x8 normal

8x8 normal (cost 54)

8x8 direct

8x8 direct (cost 17)

8x8 shortcut

8x8 shortcut (cost 19)


Part B: Random Maps

A single unified policy is computed over the augmented 28,800-state space and applied to all 36 random 10x10 environments. Value iteration converged in 25 iterations. All 36 environments are solved: minimum cost 14 (direct path, no key needed), maximum cost 53 (long key-retrieval route with door unlock), mean cost 30.1.

Sample GIFs

env 1

Env 1 (cost 29)

env 4

Env 4 (cost 50)

env 9

Env 9 (cost 14)

env 12

Env 12 (cost 53)

env 21

Env 21 (cost 14)

All 36 Trajectories

Optimal paths for all 36 random environments, shown as static trajectory overlays.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36

← Back to Portfolio