Reproducing ViViDex: Dexterous Manipulation from Human Videos

UC San Diego, ECE 228: Machine Learning for Physical Applications · Spring Quarter 2026



Overview

ViViDex (ICRA 2025) learns vision-based dexterous manipulation from a single human video demonstration. It extracts a reference trajectory from multi-view RGB-D data, trains a state-based policy via trajectory-guided PPO in MuJoCo, then distills that policy into a vision-based one via behavior cloning on rendered rollouts.

This project reproduces the first two stages of the pipeline from scratch: reference trajectory extraction from DexYCB and trajectory-guided PPO training on the MuJoCo Adroit hand. ViViDex does not release its retargeting code, so we derive the full camera-to-world coordinate transform from DexYCB calibration data and implement two retargeting variants to study the effect of retargeting quality on downstream training.


Pipeline

Step 1 -- DexYCB Demonstration

DexYCB captures human hand-object interactions from 8 synchronized RGB-D cameras. Human annotators label 2D joint keypoints per view; DexYCB fits the MANO hand model jointly across all 8 views to recover accurate 3D hand trajectories. Each frame stores 21 hand joint positions in camera space, along with object pose as a rotation-translation matrix.

[Figure: DexYCB demonstration frames. (a) Approach, (b) Pregrasp, (c) Lift.]
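The per-frame record described above can be pictured as a small container. This is a minimal sketch, not the DexYCB toolkit's API: the `Frame` class and its field names are hypothetical, and it assumes the rotation-translation matrix is stored as a 3x4 array `[R | t]`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Hypothetical container for one frame (not the DexYCB toolkit's API)."""
    joints_cam: List[List[float]]  # 21 hand joint positions, camera space
    obj_pose: List[List[float]]    # assumed 3x4 matrix [R | t]

    def __post_init__(self):
        # Validate the shapes stated in the text above.
        if len(self.joints_cam) != 21:
            raise ValueError("expected 21 hand joints")
        if len(self.obj_pose) != 3 or any(len(r) != 4 for r in self.obj_pose):
            raise ValueError("expected a 3x4 pose matrix")

    def obj_rotation(self):
        """Left 3x3 block of the pose matrix."""
        return [row[:3] for row in self.obj_pose]

    def obj_translation(self):
        """Last column of the pose matrix."""
        return [row[3] for row in self.obj_pose]
```

Splitting the pose into rotation and translation up front keeps the later coordinate transform explicit about which part is being rotated and which part is being shifted.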

Step 2 -- MANO Hand Reconstruction

We reconstruct the 3D MANO hand mesh per frame using subject-specific shape parameters β and per-frame pose parameters θ, then transform all poses from camera space to the simulator world frame via the derived coordinate transform. Object translation and orientation match ViViDex's reference NPZ exactly.

[Figure: MANO hand reconstruction. (a) MANO approach, (b) MANO pregrasp, (c) MANO lift.]
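The camera-to-world step is a rigid transform applied to every joint. The sketch below shows only the generic operation, assuming a camera-to-world rotation `R_wc` and translation `t_wc` are already known; it is not the specific transform derived from the DexYCB calibration data, and all function names are illustrative.

```python
def apply_rigid(R, t, p):
    """Return R @ p + t for a 3x3 rotation R, translation t, and 3D point p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def invert_rigid(R, t):
    """Invert x -> R x + t. For a rotation matrix, the inverse is x -> R^T x - R^T t."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return Rt, t_inv


def camera_to_world(joints_cam, R_wc, t_wc):
    """Map a list of camera-space points into the world frame."""
    return [apply_rigid(R_wc, t_wc, p) for p in joints_cam]


# Example: a 90 degree rotation about z followed by a shift along x.
R_wc = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
t_wc = [1, 0, 0]
world = camera_to_world([[1, 0, 0]], R_wc, t_wc)
```

A round trip through `invert_rigid` is a cheap sanity check on a hand-derived transform: mapping to the world frame and back should return the original camera-space point.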

Step 3 -- Motion Retargeting to Adroit

Because MANO and Adroit differ in kinematic structure, we solve a per-frame NLopt optimization matching 6 target body positions (palm + 5 middle phalanges) on the Adroit hand to corresponding human joint positions in world space, with temporal smoothness regularization. We implement two variants, naive (position-only NLopt) and chain (MANO global frame initialization), and compare both against the ViViDex reference trajectory.

[Figure: ViViDex reference trajectory (undisclosed pipeline). (a) Baseline init, (b) Baseline pregrasp, (c) Baseline manip.]

[Figure: Naive retargeting (position-only NLopt). (d) Naive init, (e) Naive pregrasp -- insufficient abduction, (f) Naive manip.]

[Figure: Chain retargeting (MANO global frame initialization). (g) Chain init, (h) Chain pregrasp -- improved spread, (i) Chain manip.]
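The per-frame retargeting objective can be sketched as a position-matching term plus a temporal smoothness penalty. The exact cost used in this project is not spelled out above, so the squared-error form, the `smooth_weight` value, and the planar two-link `toy_fk` (a stand-in for Adroit forward kinematics) are all assumptions made for illustration. A solver such as NLopt would minimize a function of this shape over the joint angles `q`.

```python
import math


def toy_fk(q, link=(1.0, 1.0)):
    """Hypothetical planar 2-link forward kinematics standing in for Adroit.

    Returns the 2D positions of the middle joint and the fingertip.
    """
    x1 = link[0] * math.cos(q[0])
    y1 = link[0] * math.sin(q[0])
    x2 = x1 + link[1] * math.cos(q[0] + q[1])
    y2 = y1 + link[1] * math.sin(q[0] + q[1])
    return [(x1, y1), (x2, y2)]


def retarget_cost(q, q_prev, targets, fk=toy_fk, smooth_weight=0.1):
    """Squared position error to the targets plus a smoothness penalty.

    q_prev is the previous frame's solution; penalizing the distance to it
    discourages frame-to-frame jumps in the retargeted trajectory.
    """
    pos_term = sum(
        (a - b) ** 2
        for body, target in zip(fk(q), targets)
        for a, b in zip(body, target)
    )
    smooth_term = sum((a - b) ** 2 for a, b in zip(q, q_prev))
    return pos_term + smooth_weight * smooth_term
```

In the real problem `fk` would return the 6 target body positions on the Adroit hand and `targets` would be the corresponding human joint positions in world space.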


PPO Training

Each retargeted trajectory is used as a reference for trajectory-guided PPO in MuJoCo (Adroit hand, mustard bottle relocate task). The two-phase reward guides the policy through pregrasp hand matching followed by object trajectory tracking with a lift bonus. All runs used 32 parallel environments, approximately 200k gradient updates, and approximately 1.5 × 10^8 total environment steps on Google Colab (12 CPU cores, NVIDIA L4). MuJoCo is CPU-bound; GPU utilization remained below 1% throughout.
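The two-phase structure of the reward can be illustrated with a short sketch. Only the phase ordering (hand matching before the pregrasp step, then object tracking with a lift bonus) comes from the description above; the exponential shaping, the gains `k_hand` and `k_obj`, the `lift_height` threshold, and the `lift_bonus` value are hypothetical placeholders, not the values used in training.

```python
import math


def two_phase_reward(step, pregrasp_step, hand_err, obj_err, obj_height,
                     k_hand=10.0, k_obj=10.0, lift_height=0.02, lift_bonus=1.0):
    """Illustrative two-phase reward (all constants are assumptions).

    Phase 1 (step < pregrasp_step): reward matching the reference hand pose.
    Phase 2 (afterwards): reward tracking the reference object trajectory,
    plus a bonus once the object is above lift_height.
    """
    if step < pregrasp_step:
        return math.exp(-k_hand * hand_err)
    reward = math.exp(-k_obj * obj_err)
    if obj_height > lift_height:
        reward += lift_bonus
    return reward
```

Under this shaping a poor pregrasp hand pose limits reward early in the episode, which is one plausible reason retargeting quality would matter for downstream training.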

Training Curves (Baseline)

The baseline training curve reveals a sharp phase transition around 80-90M total steps (shown as approximately 40-50M on the per-session x-axis due to Colab session resets), where hand success jumps from 0.68 to 0.85 and object success rises from 0.55 to 0.80. Goal success first appears only after this transition.

[Figure: Baseline training curves. Panels: goal success, object success, hand success.]
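The mismatch between per-session and total step counts can be removed by stitching the session logs onto one global axis. The sketch below assumes each session's step counter restarts at zero, that sessions are listed in order, and that the last logged step of a session is a good estimate of how many steps it contributed; the function name and log format are hypothetical.

```python
def stitch_sessions(sessions):
    """Concatenate per-session (step, value) logs onto one global step axis.

    Each session's steps are shifted by the total steps of earlier sessions,
    estimated here from the last logged step of each session.
    """
    stitched, offset = [], 0
    for log in sessions:
        for step, value in log:
            stitched.append((step + offset, value))
        if log:
            offset += log[-1][0]
    return stitched


# Example in units of millions of steps: a point logged at 45M in the
# second session lands at 85M on the global axis.
curve = stitch_sessions([[(10, 0.1), (40, 0.5)], [(10, 0.6), (45, 0.8)]])
```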

Results

Method | Goal Success | Object Success | Hand Success | Grad. Updates
Baseline (ViViDex ref.) | 0.190 | 0.812 | 0.887 | 197,745
Chain (ours) | 0.000 | 0.547 | 0.841 | 198,695
Naive (ours) | 0.000 | 0.519 | 0.180 | 222,145

Policy Rollouts

[Figure: ViViDex pretrained checkpoint. (a) Pretrained init, (b) Pretrained pregrasp, (c) Pretrained -- successful lift.]

[Figure: Our baseline policy (ViViDex reference trajectory, trained from scratch). (d) Baseline init, (e) Baseline pregrasp, (f) Baseline -- partial lift.]

[Figure: Chain retargeting policy. (g) Chain init, (h) Chain pregrasp, (i) Chain -- contact but no lift.]

[Figure: Naive retargeting policy. (j) Naive init, (k) Naive pregrasp, (l) Naive -- fails to grasp.]

The green marker is the target object position; the dark circle marks the initial position on the table.


Key Findings

