Dynamic Programming for Door & Key Navigation

This project solves the Door & Key grid-world navigation problem using Dynamic Programming. An agent (red triangle) must reach a goal (green square) at minimum cost, potentially picking up a key and unlocking a door along the way. The problem is formulated as a deterministic, fully observable, first-exit MDP and solved via value iteration.

Two scenarios are addressed. Part A computes a separate optimal policy for each of seven known environments. Part B computes a single unified policy covering all 36 random 10x10 environment configurations by augmenting the state space with discrete key and goal location indices.

MDP Formulation

The state encodes the agent's grid position, facing direction (right/down/left/up), whether it is carrying the key, and the open/locked status of each door. For the random-map setting the state is extended with indices for the active key location and goal location, giving 28,800 total states across the 10x10 grid.

The five actions and their costs are: Turn Left/Right (cost 1), Move Forward (cost 3), Pick Up Key (cost 2), Unlock Door (cost 5). Invalid actions are no-op transitions that still incur cost, so the DP naturally avoids them. Value iteration runs until the maximum change in the cost-to-go falls below 10^-6.

Part A: Known Maps

A separate policy is computed for each of the seven environments. The direct environments have the lowest costs because the agent can reach the goal without a key. The shortcut environments use the locked door for a shorter route despite the unlock cost. The normal environments require the longest key-retrieval detour.

Environment	Optimal Cost	Steps
doorkey-5x5-normal	28	13
doorkey-6x6-normal	33	14
doorkey-6x6-direct	13	5
doorkey-6x6-shortcut	15	6
doorkey-8x8-normal	54	21
doorkey-8x8-direct	17	7
doorkey-8x8-shortcut	19	8

5x5 normal (cost 28)

6x6 normal (cost 33)

6x6 direct (cost 13)

6x6 shortcut (cost 15)

8x8 normal (cost 54)

8x8 direct (cost 17)

Part B: Random Maps

A single unified policy is computed over the augmented 28,800-state space and applied to all 36 random 10x10 environments. Value iteration converged in 25 iterations. All 36 environments are solved: minimum cost 14 (direct path, no key needed), maximum cost 53 (long key-retrieval route with door unlock), mean cost 30.1.

Sample GIFs