AI Safety
- CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation
 - Persona Vectors: Monitoring and Controlling Character Traits in Language Models
 
Agriculture
Augmented Reality
Computer Vision
- ViPE: Video Pose Engine for 3D Geometric Perception
 - ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
 
Dialog Navigation
Dialog System
- Seamlessly Integrating Factual Information and Social Content with Persuasive Dialogue
 - How to Build User Simulators to Train RL-based Dialog Systems
 - A Network-based End-to-End Trainable Task-oriented Dialogue System
 
Dialog Systems
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models
 - Mixed-Initiative Dialog for Human-Robot Collaboration
 - Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
 - ACE: A LLM-based Negotiation Coaching System
 
Guide Dog Robot
HRI
Human Robot Interaction
- Towards Robotic Companions: Understanding Handler-Guide Dog Interactions for Informed Guide Dog Robot Design
 - Reimagining RViz: Multidimensional Augmented Reality Robot Signal Design
 - DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following
 - Descriptive and Prescriptive Visual Guidance to Improve Shared Situational Awareness in Human-Robot Teaming
 - Seamlessly Integrating Factual Information and Social Content with Persuasive Dialogue
 - Unwinding Rotations Improves User Comfort with Immersive Telepresence Robots
 - Outracing champion Gran Turismo drivers with deep reinforcement learning
 - Flight, Camera, Action! Using Natural Language and Mixed Reality to Control a Drone
 - Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning
 - Virtual Reality for Robots
 - RMM: A Recursive Mental Model for Dialogue Navigation
 - Improving Grounded Natural Language Understanding through Human-Robot Dialog
 - RoomShift: Room-scale Dynamic Haptics for VR with Furniture-moving Swarm Robots
 - That and There: Judging the Intent of Pointing Actions with Robotic Arms
 - Communicating Robot Motion Intent with Augmented Reality
 
Human-Robot Interaction
- TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation
 - Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning
 
Humanoid Robots
Imitation Learning
Knowledge-based Sequential Decision Making
- Visual Semantic Navigation Using Scene Priors
 - Continual Learning of Knowledge Graph Embeddings
 - Ethically Compliant Sequential Decision Making
 - Semantic Linking Maps for Active Visual Object Search
 - Commonsense Reasoning and Knowledge Acquisition to Guide Deep Learning on Robots
 - Learning Pipelines with Limited Data and Domain Knowledge
 
LLM
- Mixed-Initiative Dialog for Human-Robot Collaboration
 - CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation
 - Persona Vectors: Monitoring and Controlling Character Traits in Language Models
 - FEAST: A Flexible Mealtime Assistance System Towards In-the-Wild Personalization
 - BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
 - LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers
 - True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning
 - Universal and Transferable Adversarial Attacks on Aligned Language Models
 - An LLM can Fool Itself: A Prompt-Based Adversarial Attack
 - VIMA: General Robot Manipulation with Multimodal Prompts
 
Learning
- Learned Visual Navigation for Under-Canopy Agricultural Robots
 - Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation
 - Practice Makes Perfect: Planning to Learn Skill Parameter Policies
 - SayCanPay: Heuristic Planning with Large Language Models using Learnable Domain Knowledge
 - VIMA: General Robot Manipulation with Multimodal Prompts
 - NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities
 - Eureka: Human-Level Reward Design via Coding Large Language Models
 - Video Language Planning
 - Learning to Navigate Sidewalks in Outdoor Environments
 - Open X-Embodiment: Robotic Learning Datasets and RT-X Models
 - Robot Parkour Learning
 - LEAGUE: Guided Skill Learning and Abstraction for Long-Horizon Manipulation
 - Language Reward Modulation for Pretraining Reinforcement Learning
 - Transforming a Quadruped into a Guide Robot for the Visually Impaired: Formalizing Wayfinding, Interaction Modeling, and Safety Mechanism
 - Neural Volumetric Memory for Visual Locomotion Control
 - Legs as Manipulator: Pushing Quadrupedal Agility Beyond Locomotion
 - Embodied Amodal Recognition: Learning to Move to Perceive Objects
 - MimicPlay: Long-Horizon Imitation Learning by Watching Human Play
 - Guiding Pretraining in Reinforcement Learning with Large Language Models
 - System Configuration and Navigation of a Guide Dog Robot: Toward Animal Guide Dog-Level Guiding Work
 - DM2: Decentralized Multi-Agent Reinforcement Learning for Distribution Matching
 - Robotic Guide Dog: Leading a Human with Leash-Guided Hybrid Physical Interaction
 - Deep Variational Reinforcement Learning for POMDPs
 - Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
 - Discovering Generalizable Skills via Automated Generation of Diverse Tasks
 - A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
 - Continual Learning of Knowledge Graph Embeddings
 - Learning When to Quit: Meta-Reasoning for Motion Planning
 - Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks
 - Joint Inference of Reward Machines and Policies for Reinforcement Learning
 - Human-like Planning for Reaching in Cluttered Environments
 - Simultaneously Learning Transferable Symbols and Language Groundings from Perceptual Data for Instruction Following
 - SAIL: Simulation-Informed Active In-the-Wild Learning
 - Improving Grounded Natural Language Understanding through Human-Robot Dialog
 - Proximal Policy Optimization Algorithms
 - Imagination-Augmented Agents for Deep Reinforcement Learning
 - Learning from Interventions using Hierarchical Policies for Safety Learning
 - Deep Imitation Learning for Autonomous Driving in Generic Urban Scenarios with Enhanced Safety
 - Learning to Teach in Cooperative Multiagent Reinforcement Learning
 - Using Natural Language for Reward Shaping in Reinforcement Learning
 - Agile Autonomous Driving using End-to-End Deep Imitation Learning
 - Adversarial Actor-Critic Method for Task and Motion Planning Problems Using Planning Experience
 - Learning Pipelines with Limited Data and Domain Knowledge
 - Behavioral Cloning from Observation
 
Learning and Planning
- Using Commonsense Knowledge to Answer Why-Questions
 - Learning Multi-Object Dynamics with Compositional Neural Radiance Fields
 - Learning and Deploying Robust Locomotion Policies with Minimal Dynamics Randomization
 - Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion
 - Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning
 - Detect, Understand, Act: A Neuro-Symbolic Hierarchical Reinforcement Learning Framework (Extended Abstract)
 - Florence: A New Foundation Model for Computer Vision
 - Object Goal Navigation using Goal-Oriented Semantic Exploration
 - Learning Feasibility to Imitate Demonstrators with Different Dynamics
 - Text-based RL Agents with Commonsense Knowledge: New Challenges, Environments and Baselines
 - Reward Machines for Vision-Based Robotic Manipulation
 - Decision Transformer: Reinforcement Learning via Sequence Modeling
 - Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
 - Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World
 - ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations
 - Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
 - Advice-Guided Reinforcement Learning in a non-Markovian Environment
 - Spatial Intention Maps for Multi-Agent Mobile Manipulation
 - What Does BERT with Vision Look At?
 - A formal methods approach to interpretable reinforcement learning for robotic planning
 
Logical Reasoning
Manipulation
- ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation
 - RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools
 
Mobile Robots
NLP
- Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
 - ACE: A LLM-based Negotiation Coaching System
 
Neurosymbolic
Open-World Generalization
Persona Modeling
Planning
- Plug in the Safety Chip: Enforcing Constraints for LLM-driven Robot Agents
 - Learning to Bridge the Gap: Efficient Novelty Recovery with Planning and Reinforcement Learning
 - Practice Makes Perfect: Planning to Learn Skill Parameter Policies
 - SayCanPay: Heuristic Planning with Large Language Models using Learnable Domain Knowledge
 - Video Language Planning
 - Human-like Planning for Reaching in Cluttered Environments
 - Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
 - Reasoning About Physical Interactions with Object-Oriented Prediction and Planning
 - SAIL: Simulation-Informed Active In-the-Wild Learning
 - Adversarial Actor-Critic Method for Task and Motion Planning Problems Using Planning Experience
 - Behavioral Cloning from Observation
 
Quadruped Robot
- Understanding Expectations for a Robotic Guide Dog for Visually Impaired People
 - Towards Robotic Companions: Understanding Handler-Guide Dog Interactions for Informed Guide Dog Robot Design
 - Practice Makes Perfect: Planning to Learn Skill Parameter Policies
 - Learning to See Physical Properties with Active Sensing Motor Policies
 
RL
Reinforcement Learning
- Leveraging Constraint Violation Signals For Action-Constrained Reinforcement Learning
 - FlowPG: Action-constrained Policy Gradient with Normalizing Flows
 - True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning
 - Learning to See Physical Properties with Active Sensing Motor Policies
 
Robotic Manipulation
Robotics
- Learned Visual Navigation for Under-Canopy Agricultural Robots
 - Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation
 
Safety
Security
- POEX: Towards Policy Executable Jailbreak Attacks Against the LLM-based Robots
 - Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems
 - Characterizing Physical Adversarial Attacks on Robot Motion Planners
 - BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
 - Universal and Transferable Adversarial Attacks on Aligned Language Models
 - An LLM can Fool Itself: A Prompt-Based Adversarial Attack
 
Skill Discovery
State Estimation
Task Planning
Task and Motion Planning
- ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation
 - RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools
 - GraphEQA: Using 3D Semantic Scene Graphs for Real-time Embodied Question Answering
 - Open-World Task and Motion Planning via Vision-Language Model Inferred Constraints
 
Task-Motion Planning
- LEAGUE: Guided Skill Learning and Abstraction for Long-Horizon Manipulation
 - Code as Policies: Language Model Programs for Embodied Control
 - Using Deep Learning to Bootstrap Abstractions for Hierarchical Robot Planning
 - Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
 - Pre-Trained Language Models for Interactive Decision-Making
 - Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
 - Online Replanning in Belief Space for Partially Observable Task and Motion Problems
 - Elephants Don't Pack Groceries: Robot Task Planning for Low Entropy Belief States
 - Planning with Learned Object Importance in Large Problem Instances using Graph Neural Networks
 - Learning When to Quit: Meta-Reasoning for Motion Planning
 - Hierarchical Planning for Long-Horizon Manipulation with Geometric and Symbolic Scene Graphs
 - Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
 
VLA
VLM
- SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
 - GraphEQA: Using 3D Semantic Scene Graphs for Real-time Embodied Question Answering
 - Open-World Task and Motion Planning via Vision-Language Model Inferred Constraints