Embodied AI 101

40 Episodes

By: Shaoqing Tan

Stay in the loop on research in AI and physical intelligence.

SERF: 4D Latent Mapping for Long-Horizon Mobile Manipulation
Today at 12:48 PM

Embeds both the robot and environment into a shared 4D latent space augmented with forward-kinematics robot points, enabling a vision-language-action model to handle dynamic scenes and long-horizon memory. Outperforms image-only VLA baselines on the BEHAVIOR-1K benchmark for mobile manipulation.


ViserDex: Visual Sim-to-Real for Robust Dexterous In-Hand Reorientation
Today at 12:13 PM

A single-camera sim-to-real framework that uses physically consistent 3D Gaussian Splatting augmentations to achieve zero-shot transfer of dexterous in-hand reorientation policies to an Allegro hand. The approach trains entirely on consumer hardware while maintaining high fidelity to real-world dynamics.


DexSkin: A High-Coverage, Conformable "Electronic Skin" for Robot Fingers
Today at 10:12 AM

Introduces a high-coverage, conformable robotic skin hardware system designed to improve data collection and policy learning for contact-rich, dexterous manipulation tasks. The system provides rich tactile sensing coverage to enable more capable robot manipulation policies.


EBench: A Diagnostic Benchmark for Generalist Manipulation Policies
Today at 10:11 AM

A CAT-scan-style diagnostic benchmark for robot foundation models that evaluates policies such as π0, π0.5, and Qwen-RobotManip on more than a single success rate. The benchmark is designed to distinguish genuine generalization from overfitting to demonstrations in generalist manipulation policies.


VITRA: A Foundation for Dexterous VLA via Human Video Pretraining
Yesterday at 9:13 PM

A scalable VLA pretraining pipeline that converts unstructured egocentric human videos into robot training data, trains a dexterous hand VLA, and fine-tunes on robot data, achieving strong zero-shot generalization and real-robot dexterous manipulation.


DexWM: A Dexterous Manipulation World Model from Human Videos
Yesterday at 9:08 PM

A dexterous manipulation world model pretrained on 829 hours of EgoDex human data and DROID robot data using conditioned diffusion transformers, enabling open-loop rollouts and sim-to-real transfer with minimal robot fine-tuning.


PoLAR: Factorizing Extent and Mode in Latent Actions for Robot Policy Learning
Yesterday at 12:14 PM

Introduces PoLAR, a method that factorizes latent action representations into extent and mode components to improve robot policy learning efficiency and generalization.


Continual Robot Policy Learning via Variational Neural Dynamics
Yesterday at 12:13 PM

Proposes a variational neural dynamics framework for continual robot policy learning, enabling robots to acquire new skills without forgetting previously learned ones.


PhysisForcing: Physics-Reinforced World Models for Robotic Manipulation
Yesterday at 10:24 AM

Plug-and-play training framework that enforces physical plausibility in robotic video generation models, achieving SOTA on R-Bench, PAI-Bench, and EZS-Bench. Lifts WorldArena success rate from 16% to 24% with zero extra inference cost.


Translation as a Bridging Action
Yesterday at 10:11 AM

Replaces noisy 6DoF hand poses with relative wrist translation as a shared action space between cheap human videos and bimanual robots. Scales data-efficiently and outperforms full-pose baselines on manipulation tasks.


Play2Perfect: Dexterous Play Pretraining for Precise Assembly
Last Sunday at 9:13 PM

Pretrains a dexterous hand via unstructured "play" interactions with objects, then fine-tunes for precise assembly tasks including 0.5 mm clearance insertions and furniture screwing, achieving 33x better sample efficiency than RL from scratch.


Dexora: Open-Source VLA for High-DoF Bimanual Dexterity
Last Sunday at 9:11 PM

First open-source Vision-Language-Action (VLA) model for dual-arm, dual-hand 36-DoF dexterous manipulation, trained on 100K simulated and 10K real trajectories with strong cross-embodiment transfer capabilities.


WorldVLA: Towards Autoregressive Action World Model
Last Sunday at 12:09 PM

Unifies VLA and world-modeling in a single autoregressive transformer that predicts both future images and actions. Outperforms separate VLA or world models on LIBERO simulation benchmarks.


HumDex: Humanoid Dexterous Manipulation Made Easy
Last Sunday at 12:08 PM

HumDex aims to simplify the development of dexterous manipulation capabilities for humanoid robots.


ForceBand: Learning Forceful Manipulation with sEMG
Last Saturday at 9:11 PM

Presents an open-source, low-cost sEMG wristband framework that extracts force signals from human muscle activity in videos, enabling zero-shot human-to-robot transfer of forceful manipulation policies across any robot, camera, or environment.


In-Context World Modeling for Robotic Control
Last Saturday at 9:08 PM

Introduces ICWM, a method that learns world dynamics from just seconds of a robot's self-generated interaction data, enabling zero-shot adaptation to unseen cameras and new robot morphologies without any fine-tuning.


WOLF-VLA: Vision-Language-Action for Humanoid Walking
Last Saturday at 12:15 PM

Introduces a framework integrating vision-language-action models for whole-body humanoid locomotion, addressing optimal control and learning for complex bipedal behaviors. Combines VLA learning with locomotion-specific control for humanoid robots.


Motion-Focused Latent Action for Cross-Embodiment VLA from Human Videos
Last Saturday at 12:14 PM

Proposes a motion-focused latent action representation for cross-embodiment vision-language-action policies learned from human videos. Accepted to IROS 2026.


ManiFlow: Manipulation via Rectified Flow
Last Saturday at 10:11 AM

ManiFlow is a visuomotor imitation learning policy using consistency flow matching with a DiT-X architecture that generates high-quality actions in 1–2 steps. It works across single-arm, bimanual, and humanoid platforms using RGB or point cloud inputs.


RL-100: Toward Highly Reliable Real-World Robot Reinforcement Learning
Last Saturday at 10:10 AM

RL-100 demonstrates highly reliable real-world RL manipulation, succeeding in 900 of 900 trials across 7 tasks, including runs of up to 250 consecutive trials without failure. It also shows strong robustness to disturbances and zero/few-shot adaptation capabilities.


Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
Last Friday at 9:10 PM

A video-diffusion world model trained on over 1 million manipulation episodes (3,000 hours) that includes an action model and neural simulator for closed-loop robotic manipulation control, with all code and models open-sourced.


Bi-HIL: Bilateral Control-Based Multimodal Hierarchical Imitation Learning for Long-Horizon Contact-Rich Manipulation
Last Friday at 12:12 PM

Proposes a hierarchical imitation learning framework using bilateral control, subtask-level progress tracking, and keyframe memory to handle long-horizon, contact-rich manipulation tasks.


From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation
Last Friday at 12:08 PM

Uses reinforcement learning to improve process reasoning capabilities in robotic manipulation policies, shifting the model from passive observation to active critique.


ROVE: Unlocking Human Interventions for Humanoid Manipulation via Reinforcement Learning
Last Friday at 10:12 AM

ROVE leverages reinforcement learning to enable humanoid robots to benefit from human interventions during manipulation tasks.


ConstrainedMimic: Safe Humanoid Robot Motion Tracking
Last Friday at 10:11 AM

A control framework for safe humanoid robot motion tracking using RL policies with real-time constraint enforcement via kinematics, dynamics, and control barrier functions.


REAL: Robust Extreme Agility via Spatio-Temporal Policy Learning and Physics-Guided Filtering
Last Thursday at 9:13 PM

Introduces spatio-temporal policy learning combined with physics-guided filtering to achieve robust and extremely agile robot control.


HiFlow: Tokenization-Free Scale-Wise Autoregressive Policy Learning via Flow Matching
Last Thursday at 9:08 PM

Introduces a tokenization-free autoregressive policy learning framework using flow matching across scales for robotic control.


Reactive Diffusion Policy: Slow-Fast Visual-Tactile Learning for Contact-Rich Manipulation
Last Thursday at 12:15 PM

Introduces a slow-fast imitation learning framework combining diffusion-based planning with reactive tactile/force feedback for contact-rich manipulation tasks. Also includes TactAR, an AR-based teleoperation system with tactile sensing.


SARM2 + SPIRAL: Multi-Task Reward Models and RL Refinement for Long-Horizon Dexterous Manipulation
Last Thursday at 12:12 PM

Combines scalable autonomous reward modeling with RL-based refinement to improve vision-language-action policies on long-horizon dexterous manipulation tasks via autonomous rollouts. Demonstrates significant gains over imitation learning baselines.


Co-VLA: Coordination-Aware Structured Action Modeling for Dual-Arm VLA Systems
Last Thursday at 10:34 AM

Introduces coordination-aware structured action modeling for dual-arm robotic systems within a VLA framework. Addresses the unique challenges of bimanual manipulation through specialized action representations.


ThinkingVLA: Interleaved Vision and Language Reasoning for Robotic Manipulation
Last Thursday at 10:31 AM

Proposes interleaved vision and language reasoning for robotic manipulation within a VLA framework. Aims to improve instruction following and task performance through integrated multimodal reasoning.


Playful Agentic Robot Learning
Last Thursday at 10:22 AM

Combines self-directed play with Code-as-Policy to acquire reusable skills and apply them to downstream manipulation tasks.


Learning Unified Force and Position Control for Legged Loco-Manipulation
Last Wednesday at 9:14 PM

A unified RL policy for quadrupeds and humanoids that jointly handles force and position control without force sensors, enabling compliant behaviors, force-aware imitation learning, and contact-rich tasks.


Robots that Collaborate: Sequential Asymmetric Imitation for Learning Coupled Robot Policies
Last Wednesday at 9:11 PM

Explores imitation learning approaches for multi-robot systems, focusing on policy coupling through sequential asymmetric imitation to enable collaborative robot behaviors.


AstraBrain-WBC 0.5: A Humanoid Robot Cerebellum Foundation Model
Last Wednesday at 12:17 PM

A humanoid robot 'cerebellum' foundation model trained on 20,000 hours of human motion data that demonstrates scaling laws for robot motion control and enables zero-shot execution of unseen motions on real humanoids.


SRL: Combining SLIP Model and Reinforcement Learning for Agile Robotic Jumping
Last Wednesday at 12:14 PM

Combines the Spring-Loaded Inverted Pendulum (SLIP) model with reinforcement learning to achieve agile jumping behaviors in robotic systems.


DataClaw0: Agentic Tailoring for Raw Multimodal Streams
Last Wednesday at 10:20 AM

A 9B model that filters noise from videos, GUI, and embodied data streams, reorganizing them into dense supervision via factual anchors and semantic synthesis; trained with SFT + GRPO across five domains with benchmarks.


ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining
Last Wednesday at 10:16 AM

Converts 6K+ hours of mixed human/robot egocentric video into robot pseudo-actions via camera-space alignment and reliability-aware loss, achieving 72.8% on RoboCasa and 91.1% on RoboTwin.


VERA: Video-to-Action World Model Policy
Last Wednesday at 7:03 AM

A 14B-parameter video world model that converts predicted visual futures into embodiment-agnostic actions via Jacobian inverse-dynamics, enabling zero-shot cross-robot transfer across a Panda arm and 16-DoF hand with open-sourced weights and training code.


GEN-1: Scaled Dexterous Manipulation Foundation Model
Last Wednesday at 7:01 AM

A dexterous manipulation foundation model trained on 500k hours of real-world bimanual data that handles deformable objects such as cardboard folding and screw packing, featuring online retry and adaptation capabilities.