Orientation Tracking and Panoramic Image Stitching

This project tracks the 3D orientation of a rotating body using IMU data alone, then uses those orientation estimates to stitch a sequence of narrow camera frames into wide panoramic images. The core algorithm is projected gradient descent over a trajectory of unit quaternions, jointly optimizing a motion model (gyroscope kinematics) and an observation model (accelerometer gravity constraint).

The method is evaluated on 9 training sequences and 2 test sequences. Each contains synchronized IMU data, camera images where available (only some training sequences include a camera), and, for training, VICON motion-capture ground truth. The dataset (training and test data) was provided by Prof. Nikolay Atanasov as part of ECE 276A.

Approach

IMU Calibration

Each dataset begins with a static period where the body is at rest. Sensor biases are estimated by averaging gyroscope and accelerometer readings over this window and comparing the averages against the expected static values: zero angular velocity for the gyroscope and 1 g along the upward axis for the accelerometer. After subtracting these biases, gyroscope-only integration already tracks roll and pitch reasonably well, as shown below.

Gyro-only integration vs. VICON ground truth (datasets 1–3). Roll and pitch track well; yaw drifts as expected.
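The calibration and gyro-only integration steps can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names, the choice of plain Python lists, and the assumption that readings are already scaled to rad/s and g are all hypothetical. Quaternions are stored as (w, x, y, z) tuples.

```python
import math

def estimate_biases(gyro, accel, n_static):
    """Average the first n_static samples of each sensor while the body is at rest."""
    def mean(rows):
        return [sum(r[i] for r in rows) / len(rows) for i in range(3)]
    gyro_bias = mean(gyro[:n_static])        # expected static reading: zero
    ax, ay, az = mean(accel[:n_static])
    accel_bias = [ax, ay, az - 1.0]          # expected static reading: (0, 0, 1) g
    return gyro_bias, accel_bias

def quat_mul(p, q):
    """Hamilton product of two quaternions."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)

def rotation_quat(omega, dt):
    """Unit quaternion exp([0, dt * omega / 2]) for a constant angular velocity."""
    rate = math.sqrt(sum(w * w for w in omega))
    if rate * dt < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    half = 0.5 * rate * dt
    s = math.sin(half) / rate
    return (math.cos(half), omega[0] * s, omega[1] * s, omega[2] * s)

def integrate_gyro(gyro, timestamps, gyro_bias):
    """Propagate q_{t+1} = q_t * exp([0, dt * omega_t / 2]) starting from identity."""
    q = (1.0, 0.0, 0.0, 0.0)
    trajectory = [q]
    for t in range(len(gyro) - 1):
        dt = timestamps[t + 1] - timestamps[t]
        omega = [gyro[t][i] - gyro_bias[i] for i in range(3)]
        q = quat_mul(q, rotation_quat(omega, dt))
        trajectory.append(q)
    return trajectory
```

The length of the static window (`n_static`) has to be chosen per dataset; the sketch leaves that choice to the caller.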

Projected Gradient Descent

The full optimization minimizes a cost over the quaternion trajectory q_1:T with two terms: a motion model error penalizing deviation from gyroscope-predicted orientation, and an observation model error penalizing mismatch with the measured gravity direction. After each gradient step the quaternions are projected back to unit norm. The trajectory is initialized via gyroscope integration, which dramatically improves convergence.
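A small sketch of this optimization is given below. The exact form of the cost is an assumption: the motion term is taken as the squared norm of 2·log(q_{t+1}⁻¹ ∘ q_t ∘ exp([0, τ_t ω_t / 2])), the observation term as the squared difference between the measured acceleration (in g) and q_t⁻¹ ∘ [0, 0, 0, 1] ∘ q_t, and the first quaternion is held fixed. The gradient is approximated by central finite differences purely to keep the example self-contained; an automatic-differentiation library would be the more practical choice at full scale. The step size, the 5000-iteration cap, and the 10⁻⁴ cost-change stopping rule are used here as illustrative defaults.

```python
import math

def qmul(p, q):
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)

def qinv(q):
    n2 = sum(c * c for c in q)
    return (q[0] / n2, -q[1] / n2, -q[2] / n2, -q[3] / n2)

def qexp_rot(omega, dt):
    """exp([0, dt * omega / 2])"""
    rate = math.sqrt(sum(w * w for w in omega))
    if rate * dt < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    half = 0.5 * rate * dt
    s = math.sin(half) / rate
    return (math.cos(half), omega[0] * s, omega[1] * s, omega[2] * s)

def qlog(q):
    n = math.sqrt(sum(c * c for c in q))
    vn = math.sqrt(q[1] ** 2 + q[2] ** 2 + q[3] ** 2)
    if vn < 1e-12:
        return (math.log(n), 0.0, 0.0, 0.0)
    theta = math.acos(max(-1.0, min(1.0, q[0] / n)))
    return (math.log(n), q[1] / vn * theta, q[2] / vn * theta, q[3] / vn * theta)

def predicted_gravity(q):
    """Body-frame gravity direction (in g) implied by orientation q."""
    return qmul(qmul(qinv(q), (0.0, 0.0, 0.0, 1.0)), q)[1:]

def cost(traj, gyro, accel, dts):
    c = 0.0
    for t in range(len(traj) - 1):           # motion model term
        pred = qmul(traj[t], qexp_rot(gyro[t], dts[t]))
        err = qlog(qmul(qinv(traj[t + 1]), pred))
        c += 0.5 * sum((2.0 * e) ** 2 for e in err)
    for t in range(1, len(traj)):            # observation model term
        g = predicted_gravity(traj[t])
        c += 0.5 * sum((accel[t][i] - g[i]) ** 2 for i in range(3))
    return c

def project(q):
    """Project a quaternion back onto the unit sphere."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def numeric_grad(traj, gyro, accel, dts, h=1e-6):
    """Central-difference gradient with respect to q_1..q_T (q_0 is held fixed)."""
    grad = []
    for t in range(1, len(traj)):
        g = []
        for i in range(4):
            plus, minus = list(traj), list(traj)
            qp, qm = list(traj[t]), list(traj[t])
            qp[i] += h
            qm[i] -= h
            plus[t], minus[t] = tuple(qp), tuple(qm)
            g.append((cost(plus, gyro, accel, dts) - cost(minus, gyro, accel, dts)) / (2 * h))
        grad.append(g)
    return grad

def pgd(traj, gyro, accel, dts, lr=0.02, max_iters=5000, tol=1e-4):
    traj = [project(q) for q in traj]
    prev = cost(traj, gyro, accel, dts)
    for _ in range(max_iters):
        grad = numeric_grad(traj, gyro, accel, dts)
        stepped = [tuple(c - lr * g for c, g in zip(q, gq))
                   for q, gq in zip(traj[1:], grad)]
        traj = [traj[0]] + [project(q) for q in stepped]   # gradient step, then projection
        cur = cost(traj, gyro, accel, dts)
        done = abs(prev - cur) < tol
        prev = cur
        if done:
            break
    return traj, prev
```

Passing the gyro-integrated trajectory as `traj` corresponds to the initialization described above. The finite-difference gradient costs many cost evaluations per step, so this version is only practical for very short trajectories.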

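After optimization, roll and pitch agree closely with VICON ground truth (typically within ±0.1 rad). Yaw is unobservable from the accelerometer, because rotating about the vertical axis leaves the measured gravity direction unchanged, so yaw relies on gyroscope integration alone and exhibits some drift. This is an expected, fundamental limitation.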

Optimized orientation (gyro + accelerometer) vs. VICON ground truth (datasets 1–3).
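The yaw limitation can be checked numerically. The sketch below assumes a body-to-world quaternion convention, ZYX (yaw-pitch-roll) Euler angles, and a gravity prediction of the form q⁻¹ ∘ [0, 0, 0, 1] ∘ q; all function names are hypothetical. Applying an extra rotation about the world vertical axis changes the yaw angle but leaves the predicted accelerometer reading, and therefore the observation term, untouched.

```python
import math

def quat_mul(p, q):
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)

def axis_quat(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about a unit axis."""
    s = math.sin(angle / 2)
    return (math.cos(angle / 2), axis[0] * s, axis[1] * s, axis[2] * s)

def predicted_gravity(q):
    """Body-frame gravity direction for a unit quaternion q (conjugate = inverse)."""
    qc = (q[0], -q[1], -q[2], -q[3])
    return quat_mul(quat_mul(qc, (0.0, 0.0, 0.0, 1.0)), q)[1:]

def quat_to_euler(q):
    """Roll, pitch, yaw (radians) in the ZYX convention."""
    w, x, y, z = q
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    pitch = math.asin(max(-1.0, min(1.0, 2 * (w * y - z * x))))
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return roll, pitch, yaw

# Some orientation, and the same orientation with 0.7 rad of extra yaw.
q = quat_mul(axis_quat((0, 0, 1), 0.3),
             quat_mul(axis_quat((0, 1, 0), -0.1), axis_quat((1, 0, 0), 0.2)))
q_yawed = quat_mul(axis_quat((0, 0, 1), 0.7), q)

gravity_diff = max(abs(a - b) for a, b in zip(predicted_gravity(q), predicted_gravity(q_yawed)))
yaw_diff = quat_to_euler(q_yawed)[2] - quat_to_euler(q)[2]
```

Here `gravity_diff` is zero up to floating-point rounding while `yaw_diff` is 0.7 rad, so no accelerometer-based cost can distinguish the two orientations.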

Panorama Stitching

Each camera frame is back-projected through the estimated orientation onto a cylindrical panorama in world coordinates. For each pixel, the viewing ray is transformed from camera frame → IMU frame → world frame, then mapped to azimuth and elevation coordinates in the output image. Frames are written in chronological order, so later frames simply overwrite earlier ones where they overlap.
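The per-pixel mapping can be sketched as below. Several details are assumptions made only for illustration: a 60° by 45° field of view, a camera looking along +x with y to the left and z up, an identity camera-to-IMU rotation, a 720 by 1440 output image, and the particular azimuth/elevation-to-pixel layout. Images are plain nested lists so the example runs without extra libraries.

```python
import math

def quat_mul(p, q):
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)

def rotate(q, v):
    """Rotate vector v by unit quaternion q (body to world): q * [0, v] * conj(q)."""
    qc = (q[0], -q[1], -q[2], -q[3])
    return quat_mul(quat_mul(q, (0.0,) + tuple(v)), qc)[1:]

def pixel_to_panorama(row, col, q, img_shape=(240, 320),
                      fov=(math.radians(45), math.radians(60)),
                      pano_shape=(720, 1440)):
    """Map one image pixel to a (row, col) cell of the panorama."""
    img_h, img_w = img_shape
    pano_h, pano_w = pano_shape
    # Pixel -> viewing angles in the camera frame (image center looks along +x).
    lat = (0.5 - (row + 0.5) / img_h) * fov[0]
    lon = (0.5 - (col + 0.5) / img_w) * fov[1]
    ray = (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))
    # Camera -> IMU is assumed to be the identity here; IMU -> world uses q.
    x, y, z = rotate(q, ray)
    azimuth = math.atan2(y, x)
    elevation = math.asin(max(-1.0, min(1.0, z)))
    pano_row = min(int((math.pi / 2 - elevation) / math.pi * pano_h), pano_h - 1)
    pano_col = min(int((math.pi - azimuth) / (2 * math.pi) * pano_w), pano_w - 1)
    return pano_row, pano_col

def stitch(frames, orientations, pano_shape=(720, 1440)):
    """Paint frames in chronological order; later frames overwrite earlier ones."""
    pano = [[0] * pano_shape[1] for _ in range(pano_shape[0])]
    for frame, q in zip(frames, orientations):
        img_shape = (len(frame), len(frame[0]))
        for r, line in enumerate(frame):
            for c, value in enumerate(line):
                pr, pc = pixel_to_panorama(r, c, q, img_shape=img_shape,
                                           pano_shape=pano_shape)
                pano[pr][pc] = value
    return pano
```

Each frame would be paired with the orientation estimate closest to its timestamp; that matching step is omitted here. Cells that no frame reaches keep their initial value of 0, which shows up as black.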

Convergence

All 11 datasets (9 training, 2 test) converged to a final cost below 1.5, representing reductions of 1–3 orders of magnitude from initial cost. Most training sets converged before the 5000-iteration limit.

Dataset	Split	Samples	Converged At	Initial Cost	Final Cost
1	Train	5645	2057	161.4	0.434
2	Train	4698	5000*	281.4	0.561
3	Train	3404	5000*	11.2	1.187
4	Train	3156	5000*	142.4	1.089
5	Train	3210	4811	271.1	1.242
6	Train	3211	2776	84.8	0.836
7	Train	3577	1978	230.7	1.466
8	Train	3501	2717	75.6	0.401
9	Train	2931	4117	301.7	0.328
10	Test	3078	3576	37.8	0.315
11	Test	5441	712	11.8	1.182

* Reached 5000-iteration limit; final cost change was <10⁻⁴, indicating near-convergence.

Training Panoramas

Panoramas from the four camera-equipped training datasets show clear room structure with correct vertical orientation. Black regions correspond to directions not observed during the rotation sequence. Each is shown alongside its VICON-based reference.

Dataset 1 (optimized)

Dataset 1 (VICON ground truth)

Dataset 2 (optimized)

Dataset 2 (VICON ground truth)

Dataset 8 (optimized)

Dataset 8 (VICON ground truth)

Dataset 9 (optimized)

Dataset 9 (VICON ground truth)

Optimized panoramas closely match the VICON-based reference. The main visible difference is a slight horizontal shift in some datasets (e.g. dataset 2), caused by yaw drift, consistent with the known unobservability of yaw from accelerometer data alone.

Test Panoramas

No ground truth is available for the test datasets. Both sequences converged before the iteration limit (dataset 10 at iteration 3576, dataset 11 at iteration 712) and produce recognizable panoramas.