🤖 Embodied Artificial Intelligence Seminar - Readings and Resources
Welcome to the CS6604 Embodied AI Seminar!
Below are the topics we’ll cover during the semester, with recommended readings and links to project pages or source code repositories where applicable.
For each paper, click on 📚 for the PDF version and on 🌍 for additional resources.
Topic 1: Benchmarks: Simulators, Environments, Datasets
- ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes 📚 🌍
- iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes 📚 🌍
- Matterport3D: Interpreting Visually-Grounded Navigation Instructions in Real Environments 📚 🌍
- CVDN: Vision-and-Dialog Navigation 📚
- SoundSpaces: Audio-Visual Navigation in 3D Environments 📚 🌍
- AI2-THOR: An Interactive 3D Environment for Visual AI 📚 🌍
- Rearrangement: A Challenge for Embodied AI 📚
- Visual Room Rearrangement 📚 🌍
- ProcTHOR: Large-Scale Embodied AI Using Procedural Generation 📚 🌍
- ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills 📚 🌍
- Object Goal Navigation using Goal-Oriented Semantic Exploration 📚 🌍
- Embodied Question Answering in Photorealistic Environments with Point Cloud Perception 📚 🌍
- ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks 📚 🌍
- DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following 📚 🌍
- Alexa Arena: A User-Centric Interactive Platform for Embodied AI 📚 🌍
- VirtualHome: Simulating Household Activities via Programs 📚 🌍
- BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation 📚 🌍
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge 📚 🌍
Topic 2: Conceptual Framing, World Models, Behavioral and Performance Metrics
- World Models 📚 🌍
- Machine Theory of Mind 📚
- Collaborative World Models: An Online-Offline Transfer RL Approach 📚
- Transformers are Sample-Efficient World Models 📚
- Learning Temporally Abstract World Models without Online Experimentation 📚
- Reward-Free Curricula for Training Robust World Models 📚
- Recurrent World Models Facilitate Policy Evolution 📚 🌍
- Discovering and Achieving Goals via World Models 📚 🌍
- Planning to Explore via Self-Supervised World Models 📚 🌍
- Learning to Model the World with Language 📚 🌍
- Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling 📚
- Dream to Control: Learning Behaviors by Latent Imagination 📚
- DayDreamer: World Models for Physical Robot Learning 📚 🌍
- Mastering Diverse Domains through World Models 📚 🌍
- Mastering Atari with Discrete World Models 📚 🌍
- Masked World Models for Visual Control 📚 🌍
- Structured World Models from Human Videos 📚 🌍
- Building Machines That Learn and Think Like People 📚
- Action and Perception as Divergence Minimization 📚
- Intrinsically Motivated Reinforcement Learning 📚
- Decision Transformer: Reinforcement Learning via Sequence Modeling 📚
- Curiosity-Driven Exploration of Learned Disentangled Goal Spaces 📚
- Encouraging and Evaluating Embodied Exploration 📚
- Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration 📚
- Learning to Play with Intrinsically-Motivated, Self-Aware Agents 📚
- On Evaluation of Embodied Navigation Agents 📚
- ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects 📚
- On the Evaluation of Vision-and-Language Navigation Instructions 📚
- A New Path: Scaling Vision-and-Language Navigation With Synthetic Instructions and Imitation Learning 📚
- Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation 📚
- Iterative Vision-and-Language Navigation 📚
- GRIDTOPIX: Training Embodied Agents with Minimal Supervision 📚 🌍
- On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets 📚
Topic 3: Learning about visual sensory information through interaction
- Scene Graph Contrastive Learning for Embodied Navigation 📚
- Learning Navigational Visual Representations with Semantic Map Supervision 📚
- Topological Semantic Graph Memory for Image-Goal Navigation 📚
- Object-Goal Visual Navigation via Effective Exploration of Relations among Historical Navigation States 📚
- One-4-All: Neural Potential Fields for Embodied Navigation 📚
- 🏅 Emergence of Maps in the Memories of Blind Navigation Agents (ICLR'23 Outstanding Paper) 📚
- Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks 📚
- Graph Attention Memory for Visual Navigation 📚
- Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances 📚
- Navigating to Objects Specified by Images 📚 🌍
- TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors 📚 🌍
- Egocentric Planning for Scalable Embodied Task Achievement 📚
- ALP: Action-Aware Embodied Learning for Perception 📚
- Simple but Effective: CLIP Embeddings for Embodied AI 📚
- Continuous Scene Representations for Embodied AI 📚
- Graph-based Environment Representation for Vision-and-Language Navigation in Continuous Environments 📚
- Learning Affordance Landscapes for Interaction Exploration in 3D Environments 📚
- PASTA: Pretrained Action-State Transformer Agents 📚
Topic 4: Learning about language and language-guided interaction
- MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception 📚
- Plan4MC: Skill reinforcement learning and planning for open-world Minecraft tasks 📚
- VOYAGER: An Open-Ended Embodied Agent with Large Language Models 📚
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents 📚
- Embodied Task Planning with Large Language Models 📚
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning 📚
- Chasing Ghosts: Instruction Following as Bayesian State Tracking 📚
- Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents 📚
- SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation 📚
- Building Cooperative Embodied Agents Modularly with Large Language Models 📚
- Asking Before Action: Gather Information in Embodied Decision Making with Language Models 📚
- Language Models Meet World Models: Embodied Experiences Enhance Language Models 📚
- DANLI: Deliberative Agent for Following Natural Language Instructions 📚
- 3D-LLM: Injecting the 3D World into Large Language Models 📚
- EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought 📚
- PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World 📚
- Embodied Executable Policy Learning with Language-based Scene Summarization 📚
- JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models 📚
- See and Think: Embodied Agent in Virtual Environment 📚
- Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models 📚
Topic 5: Dive into robotics, Sim2Sim/Sim2Real Transfer, Embodied Communication, Multi-agent Embodied Collaboration
- Eureka: Human-Level Reward Design via Coding Large Language Models 📚
- PaLM-E: An Embodied Multimodal Language Model 📚
- Learning Interactive Real-World Simulators 📚 🌍
- Open X-Embodiment: Robotic Learning Datasets and RT-X Models 📚 🌍
- RT-2: Vision-Language-Action Models 📚 🌍
- Scaling Robot Learning with Semantically Imagined Experience 📚 🌍
- AR2-D2: Training a Robot Without a Robot 📚 🌍
- IndoorSim-to-OutdoorReal: Learning to Navigate Outdoors without any Outdoor Experience 📚 🌍
- VIMA: General Robot Manipulation with Multimodal Prompts 📚 🌍
- AdaptSim: Task-Driven Simulation Adaptation for Sim-to-Real Transfer 📚 🌍
- RoboCat: A self-improving robotic agent 📚 🌍
- Policy Stitching: Learning Transferable Robot Policies 📚 🌍
- EC2: Emergent Communication for Embodied Control 📚
- Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents 📚
- Heterogeneous Embodied Multi-Agent Collaboration 📚 🌍
- Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments 📚 🌍
Topic 6: Diffusion Policies
- Diffusion Policy: Visuomotor Policy Learning via Action Diffusion 📚
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration 📚
- PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play 📚
- Learning Universal Policies via Text-Guided Video Generation 📚
- Compositional Foundation Models for Hierarchical Planning 📚
- XSkill: Cross Embodiment Skill Discovery 📚
Additional Resources
- 🏠 Course Syllabus
- 🗓️ Seminar Schedule
- 🧠 Reinforcement Learning Supplemental Reading
- 📊 Transformers Supplemental Reading
- 🌐 Diffusion for robotics and RL