🤖 CS 6604: Embodied Artificial Intelligence (Fall 2023)

👩🏻‍🏫 Instructor: Ismini Lourentzou

🏫 Meeting time: Tuesdays and Thursdays, 9:30 AM - 10:45 AM ET, McBryde Hall 232 and Zoom

🕦 Office hours: Tuesdays 11:00 AM - 12:00 PM ET, Zoom

📚 Reading List

Course Description

Embodied Artificial Intelligence (E-AI) is a rapidly advancing field that aims to develop intelligent agents that can perceive and act in the physical world through sensors and actuators and learn from their experiences in completing various physical tasks. In an era where AI systems are increasingly expected to interact with the world in a human-like manner, E-AI offers a promising approach. By integrating intelligence with physicality, AI systems gain a deeper understanding of visual perception, language comprehension, and multimodal interactions.

This is an immersive seminar course that offers a unique opportunity to explore the intersection of embodied AI, computer vision, and natural language processing (NLP). The course goes beyond traditional vision and language approaches by focusing on the integration of cognition, physical embodiment, and multimodal sensory input to develop AI systems that perceive, understand, and interact with the physical world. Students will examine the theoretical foundations, cutting-edge research, and practical implications of embodied AI across various vision-language planning tasks, such as Vision-Language Navigation (VLN), Vision-Dialog Navigation (VDN), Embodied Question Answering (EQA), and Embodied Task Completion. By the end of the course, students will possess the knowledge and skills required to design, implement, and advance the next generation of intelligent embodied systems.

Prerequisites

Students should have experience with machine learning, data analytics, and deep learning. Strong programming skills in a high-level language such as Python, along with experience in frameworks for rapid ML prototyping (e.g., PyTorch, TensorFlow, or Keras), are essential for implementing and experimenting with the concepts covered in this course. While not mandatory, familiarity with computer vision, natural language processing, and reinforcement learning would be advantageous. Most importantly, students are expected to be able to extract key concepts and ideas from ML conference papers.

Course Format

The course is a role-playing paper reading seminar that is structured around reading, presenting, and discussing weekly papers. Each class will involve the presentation and discussion of two papers. Each student will have a unique, rotating role per week. This role defines the lens through which each student reads the paper and determines what they prepare for the group in-class discussion. All students, irrespective of their role, are expected to read the assigned papers for each session before class and come ready to discuss them. There will be no exams or traditional assignments. Instead, throughout the course, students will engage in practical hands-on projects and discussions to identify and work on open research questions on a variety of topics in embodied AI.

Course Topics

Key topics covered in the course include:

Presentation Roles

This seminar is organized around the different “roles” students play each week, which define the lens through which students read the paper. Students will be divided into two groups, one group presenting on Tuesdays and the other on Thursdays. In a given class session, each student in the presenting group will be given a rotating role (described below): Presenter (two students), Reviewer, Archaeologist, Researcher, Industry Expert, and Blogger OR Hacker (pick one). Presenting groups should create a formal presentation, i.e., have slides prepared for the group in-class discussion. For each student in a presenting group, their assigned role determines what they should include in the slides. The Hacker and Blogger roles are the only exceptions to this rule: Hackers should provide a Jupyter Notebook instead of slides, and Bloggers go over their written articles.

The roles may change with course enrollment. For example, if enrollment decreases, some roles may be removed or made optional; if enrollment increases, every role may be filled by a group of two students. Improving the format based on student feedback, as we go along with the readings, is crucial.
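The syllabus does not specify how roles rotate from session to session. As a purely illustrative sketch, one simple possibility is a round-robin shift in which every student moves to the next role slot each session. The names `ROLES`, `assign_roles`, and the placeholder student labels are hypothetical, and the actual assignment used in the course may differ.

```python
# Hypothetical round-robin role rotation; the course may assign roles differently.
ROLES = [
    "Presenter",
    "Presenter",  # two students share the Presenter role
    "Reviewer",
    "Archaeologist",
    "Researcher",
    "Industry Expert",
    "Blogger or Hacker",
]


def assign_roles(students, session_index):
    """Map each student to a role, shifting the role list by one slot per session."""
    if len(students) != len(ROLES):
        raise ValueError("this sketch assumes exactly one student per role slot")
    return {
        student: ROLES[(position + session_index) % len(ROLES)]
        for position, student in enumerate(students)
    }


group = ["S1", "S2", "S3", "S4", "S5", "S6", "S7"]  # placeholder names
for session in range(3):
    print(session, assign_roles(group, session))
```

With seven role slots, a student cycles through every slot after seven presenting sessions. If enrollment changes the number of roles, the `ROLES` list in this sketch would need to be adjusted to match the group size.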

Non-presenter assignment:

Everyone, every week (Optional): After each class session, you may post your thoughts on Piazza, for example, which parts you enjoyed reading, what results and insights you found interesting, a missing result the paper could have included, any useful additional links and resources, etc. Whenever you agree with the comments of a student’s post, make sure to endorse their answer. You can also post a reply with your additional thoughts.

Final Project

The main project goal is to engage students in research on Embodied AI. In particular, students should try to extend papers from topics covered in class and present the research outcomes as a research paper, in a standard conference paper format. Students are encouraged to work in groups of no more than four members, taking into consideration that the work produced should be proportional to the number of members in a team. Groups are required to include a “contributions” section in the final project report, listing each member’s contributions in detail. Projects will be hosted on GitHub and should include a written report accompanied by a descriptive Jupyter Notebook, with a format similar to this notebook. In addition, groups will present their final projects during the last two class sessions. A PowerPoint or LaTeX final presentation is required.

Technology

Piazza will be used for announcements, general questions, and discussions. If you are unable to register for Piazza, please email me. Please familiarize yourself with GitHub, Zoom, LaTeX, and paper writing practices. To enhance class participation, and unless restricted by low internet bandwidth, please try to keep your video turned on during class. Please keep your audio muted unless you would like to respond to an ongoing discussion or have a question. You can also use the “raise hand” option, type in the chatbox, or use the Zoom reactions for nonverbal feedback. Please remember that all in-class discussions should adhere to Virginia Tech’s Principles of Community. To keep track of student order during office hours, please type your name in the chat as soon as you enter the Zoom room. For one-on-one interactions with the instructor, please post a private note on Piazza or use Slack.

Schedule

We will update the schedule regularly based on the readings and presentations.

Lecture No. | Date | Topic | Reading
1 | Tuesday, August 22 | Course Introduction |
2 | Thursday, August 24 | Building Blocks in Perception (Instructor) |
3 | Tuesday, August 29 | Building Blocks in Planning (Instructor) |
4 | Thursday, August 31 | How to Read Papers, What to Look for in Simulators |
5 | Tuesday, September 5 | Benchmarks: Simulators, Environments, Datasets | ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
6 | Thursday, September 7 | Benchmarks: Simulators, Environments, Datasets | ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
7 | Tuesday, September 12 | Benchmarks: Simulators, Environments, Datasets | Object Goal Navigation using Goal-Oriented Semantic Exploration
8 | Thursday, September 14 | Benchmarks: Simulators, Environments, Datasets | ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes
9 | Tuesday, September 19 | Conceptual Framing, World Models, Behavioral and Performance Metrics | Learning to Model the World with Language
10 | Thursday, September 21 | Conceptual Framing, World Models, Behavioral and Performance Metrics | World Models
11 | Tuesday, September 26 | Conceptual Framing, World Models, Behavioral and Performance Metrics | Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration
12 | Thursday, September 28 | Conceptual Framing, World Models, Behavioral and Performance Metrics | DayDreamer: World Models for Physical Robot Learning
13 | Tuesday, October 3 | Conceptual Framing, World Models, Behavioral and Performance Metrics | GRIDTOPIX: Training Embodied Agents with Minimal Supervision
14 | Thursday, October 5 | Project Pitch Due |
15 | Tuesday, October 10 | Learning about visual sensory information through interaction | Object-Goal Visual Navigation via Effective Exploration of Relations among Historical Navigation States
16 | Thursday, October 12 | Learning about visual sensory information through interaction | Simple but Effective: CLIP Embeddings for Embodied AI
17 | Tuesday, October 17 | Learning about visual sensory information through interaction | Emergence of Maps in the Memories of Blind Navigation Agents (Project Proposal Due)
18 | Thursday, October 19 | Learning about visual sensory information through interaction | Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks
19 | Tuesday, October 24 | Learning about language and language-guided interaction | SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
20 | Thursday, October 26 | Learning about language and language-guided interaction | Language Models Meet World Models: Embodied Experiences Enhance Language Models
21 | Tuesday, October 31 | Presenting papers related to course projects |
22 | Thursday, November 2 | Class Canceled |
23 | Tuesday, November 7 | Learning about language and language-guided interaction | Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks
24 | Thursday, November 9 | Robot Learning, Sim2Real Transfer, Multi-agent Systems, Emergent Communication | AdaptSim: Task-Driven Simulation Adaptation for Sim-to-Real Transfer
25 | Tuesday, November 14 | Project Checkpoint Due |
26 | Thursday, November 16 | Robot Learning, Sim2Real Transfer, Multi-agent Systems, Emergent Communication | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
No class | Tuesday, November 21 | Thanksgiving Break |
No class | Thursday, November 23 | Thanksgiving Break |
27 | Tuesday, November 28 | Robot Learning, Sim2Real Transfer, Multi-agent Systems, Emergent Communication |
28 | Thursday, November 30 | Project Presentations |
29 | Tuesday, December 5 | Project Presentations |

Grading

  1. Readings: 60 points: Each student will be in the presenting role for 12 sessions and the non-presenting role for the remaining 12. You can earn up to 4 points each time you present (all presenting roles are considered equal). You will receive full credit if you do a thorough job of undertaking your role and present it in a clear and compelling way. When you aren’t presenting, you can earn up to 1 point by completing the non-presenting assignment and by participating in the class. At the end of the semester, extra credit of up to 3 points will be assigned to the most well-made presentation, blog, and notebook.

  2. Final Project: 40 points divided into the following categories:

    • Proposal: 5 points.
    • Clarity: 12 points; your paper should be readable, contain well-defined and clear motivation and contribution statements and appropriately make connections with related work. In general, your project report should follow standard machine learning conference paper formatting and style.
    • Novelty: 3 points; your project should propose something new (a new method, application, or perspective).
    • Code: 5 points; the code accompanying your project should be well-documented and your experimental results should be reproducible. Your repository should include a README file with full instructions on how to run the code. Moreover, your code should be easy to run with one simple command; if there are multiple steps involved, please make a bash script.
    • In-class presentation: 15 points; your final presentation should be clear to the audience and provide a solid review of your work as if you were presenting at a conference. You can find examples in the NeurIPS’20 schedule (Oral Spotlight sessions such as this one).
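As a quick sanity check on the point totals above, the following minimal sketch adds up the maximum points in each category. The point values come from the grading description; the variable and function names are illustrative only, and the end-of-semester extra credit is not included.

```python
# Point values are taken from the grading description; names are illustrative.
PRESENTING_SESSIONS = 12
NON_PRESENTING_SESSIONS = 12
POINTS_PER_PRESENTATION = 4
POINTS_PER_NON_PRESENTING = 1

PROJECT_POINTS = {
    "proposal": 5,
    "clarity": 12,
    "novelty": 3,
    "code": 5,
    "in_class_presentation": 15,
}


def max_readings_points():
    """Maximum points from presenting and non-presenting sessions combined."""
    return (
        PRESENTING_SESSIONS * POINTS_PER_PRESENTATION
        + NON_PRESENTING_SESSIONS * POINTS_PER_NON_PRESENTING
    )


def max_project_points():
    """Maximum points across all final project categories."""
    return sum(PROJECT_POINTS.values())


print("Readings:", max_readings_points())
print("Final project:", max_project_points())
print("Total:", max_readings_points() + max_project_points())
```

The readings component works out to 12 × 4 + 12 × 1 = 60 points and the project categories sum to 40, for a total of 100 points before any extra credit.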

Attendance and late work

If you expect to miss a class session in which you are in a “presenting” role, you should still create the presentation for your assigned session and find someone else to present for you before the day of the presentation. Missing the class session in which you are supposed to present without arranging the aforementioned accommodations will result in a penalty of 12 points from your total grade, as this disrupts the whole class. If you miss a non-presenting assignment, you’ll get a zero for that session. Final project presentations cannot be postponed, as they are scheduled in the course’s last few sessions and students need to present at their assigned timeslot. You are welcome to switch your timeslot with another group, but you are responsible for making such arrangements. Deadlines for other materials, such as the final project submission and report, are negotiable depending on the severity of the circumstances, e.g., medical reasons.

At any time during the course, if you are facing any difficulties in meeting the course deliverables or would like to discuss any concerns, you are welcome to contact me over email, Slack, or Piazza. Students can also submit anonymous feedback through this link. Students seeking special accommodations based on disabilities should contact me and also coordinate accessibility arrangements with the Services for Students with Disabilities office.

Honor Code Statement

All assignments submitted shall be considered “graded work” and all aspects of your coursework are covered by the Honor Code. Students enrolled in this course are responsible for abiding by the Honor Code. For additional information about the Honor Code, please visit https://www.honorsystem.vt.edu/. You must attribute appropriate credit to existing ideas, facts, methods, and external sources of code by citing the source. At all times, you should avoid claiming someone else’s work as your own. This course will have a zero-tolerance philosophy regarding plagiarism or other forms of cheating, and incidents of academic dishonesty will be reported. A student who has doubts about how the Honor Code applies to this course should obtain specific guidance from the course instructor before submitting the respective assignment.