A single five-second manipulation episode — robot hand picking up a red cube — viewed through six modalities a foundation model would ingest in parallel. Hover any panel to move a synchronized cursor across all six.
Vision-Language: 3 frames + instruction ("Pick up the red cube and hand it over.")
Proprioception: joint angle (rad)
Action: motor command
Memory: 12 cognition tokens (compressed past context), ordered from past to now
Tactile: pressure (N)
Torque: joint torque (N·m)
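As a rough illustration of what a synchronized cursor across the six panels involves, the sketch below stores each modality as its own timestamped stream and, for a cursor time t, returns the most recent sample from each one (a zero-order hold). Everything here is an assumption made for illustration: the names `Stream`, `uniform_stream`, and `cursor`, the sample rates, the even spacing of the 3 frames and 12 memory tokens, and the placeholder values. None of it is taken from the figure's actual data or implementation.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

EPISODE_S = 5.0  # episode length stated in the caption


@dataclass
class Stream:
    """One modality: sorted timestamps (seconds) and one value per timestamp."""
    times: List[float]
    values: List[Any]

    def at(self, t: float) -> Any:
        """Most recent sample at or before t; clamps to the first sample."""
        i = bisect_right(self.times, t) - 1
        return self.values[max(i, 0)]


def uniform_stream(rate_hz: float, fn: Callable[[float], Any]) -> Stream:
    """Fixed-rate stream over the episode; fn maps a timestamp to a value."""
    n = int(EPISODE_S * rate_hz) + 1
    times = [k / rate_hz for k in range(n)]
    return Stream(times, [fn(t) for t in times])


def cursor(episode: Dict[str, Stream], t: float) -> Dict[str, Any]:
    """Read every modality at the same cursor time t."""
    return {name: stream.at(t) for name, stream in episode.items()}


# Placeholder episode. Rates and values are hypothetical; each continuous
# signal simply echoes its own timestamp so the hold behaviour is visible.
episode: Dict[str, Stream] = {
    "vision_language": Stream([0.0, 2.5, 5.0], ["frame_0", "frame_1", "frame_2"]),
    "proprioception": uniform_stream(100.0, lambda t: t),
    "action": uniform_stream(50.0, lambda t: t),
    "memory": Stream([EPISODE_S * k / 11 for k in range(12)], list(range(12))),
    "tactile": uniform_stream(100.0, lambda t: t),
    "torque": uniform_stream(100.0, lambda t: t),
}

if __name__ == "__main__":
    for name, value in cursor(episode, 2.6).items():
        print(f"{name:16s} {value}")
```

The zero-order hold is one simple choice for aligning streams with different rates. Interpolating the continuous signals or picking the nearest sample would be equally reasonable alternatives.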