Seminars
Within EI, we currently have two seminar series: (1) the main EI Seminar series, detailed below, and (2) the Computational Sensorimotor Learning (CSL) Seminar Series.
The main EI seminar series consists of talks from PIs doing research in ML, robotics, and related fields. If you're interested in these talks, please join the EI seminar mailing list here or join the #talks channel in the EI Slack. Recorded talks are posted to our YouTube channel. All seminars are also posted to the EI Seminar Calendar, and this semester's schedule can be found here.
Upcoming Seminars
Redefining Context for Powerful Test-Time Adaptation Using Unlabeled Data
21 November 2024: Sharut Gupta (MIT)
Abstract: Foundation models, while powerful, often struggle under distribution shifts in unfamiliar domains, typically requiring costly data collection and retraining to maintain performance. Test-Time Adaptation (TTA) has emerged as a promising approach to address these limitations, enabling models to adapt dynamically to new target domains at test time. In this talk, I will present TTA approaches by rethinking the notion of “context”—an abstract concept drawn from in-context learning—to address two fundamental challenges: improving out-of-distribution generalization and aligning representations with varying task-specific inductive biases, such as fairness constraints. Specifically, we explore two ways of leveraging unsupervised in-context learning, allowing models to use unlabeled data to adapt their behavior flexibly. First, we will demonstrate how using unlabeled domain data as context can align models with diverse distributions, enhancing their robustness in changing environments. Next, we will extend this idea to further improve this alignment by enforcing task-specific inductive priors. Together, these approaches showcase the potential of unsupervised, context-driven TTA to address key challenges of current-generation foundation models. Finally, we will explore the broader implications of this context-driven perspective for building world models, planning, and robust decision-making.
Biography: Sharut Gupta is a third-year PhD candidate in Electrical Engineering and Computer Science (EECS) at MIT, advised by Prof. Stefanie Jegelka. Her research interests focus on multi-modal representation learning, robustness, and out-of-distribution generalization. She received her Bachelor’s and Master’s (dual) degrees from the Indian Institute of Technology Delhi (IIT Delhi), where she completed her thesis research with Prof. Yoshua Bengio on “A Causal Perspective on Efficient Distributed Systems”. Sharut is a recipient of the MIT Presidential Fellowship and has completed research internships at FAIR (Meta AI) and Google DeepMind.
Past Seminars
Recent Progress on Foundation Model Supervision for Robot Learning
14 November 2024: Jason Ma (University of Pennsylvania)
Abstract: Achieving general-purpose robotics requires robots to quickly learn diverse tasks without extensive training data or hand-engineered controllers for each scenario. While recent efforts in crowd-sourcing robot datasets have expanded available training data, these remain orders of magnitude smaller than the datasets used in vision or language foundation models. Rather than solely focusing on scaling robot data, my research develops algorithms that train new foundation models and leverage existing ones from non-robot domains to provide scalable supervision across diverse robot embodiments, tasks, and policy learning approaches -- in short, enabling robot learning from foundation model supervision. This approach enables automated task learning while bypassing labor-intensive controller design and data collection. In this talk, I will present some recent progress in these directions. First, I will discuss Eurekaverse, an LLM-based environment curriculum generation algorithm that enables the acquisition of complex parkour skills in the real world. Second, I will present Generative Value Learning, a new approach to universal value functions enabled by long-context VLM in-context learning.
Biography: Jason Ma is a final-year PhD student at the University of Pennsylvania. His research interests include foundation models for robotics, robot learning, and reinforcement learning. His work has been a Best Paper Finalist at ICRA 2024, was named one of NVIDIA's Top 10 Research Projects of 2023, and has been covered by popular media such as The Economist, Fox, Yahoo, and TechCrunch. Jason is supported by the Apple Scholars in AI/ML PhD Fellowship as well as the OpenAI Superalignment Fellowship.
The Promises and Pitfalls of Open-source Agent Systems
7 November 2024: Tim Dettmers (Carnegie Mellon University / Allen Institute for AI)
Abstract: Agent systems, AI systems that make their own plans and act on them, have shown promising results, particularly for code-editing benchmarks such as SWE-bench. However, most agent systems currently rely on closed-source API models such as GPT-4o and Claude, as it is believed that open-source models do not have the capabilities to power successful agent systems. In this talk, I show that agent systems powered by open-source models can match the performance of systems based on GPT-4o. This implies that, for good task performance, how you use a model is much more important than which model you use. I also discuss problems with agent-system generalization and high variability in evaluation, which show we need to be cautious when making scientific claims about agent systems. I will argue that we will need to focus on these generalization and evaluation challenges to make steady scientific progress.
Biography: Tim Dettmers is a Research Scientist at the Allen Institute for AI and an Assistant Professor at Carnegie Mellon University. His work focuses on making foundation models, such as ChatGPT, accessible to researchers and practitioners by reducing their resource requirements. His main focus is to develop high-quality agent systems that are open-source and can be run on consumer hardware, such as laptops. His research won oral, spotlight, and best paper awards at conferences such as ICLR and NeurIPS and was awarded the Block Award and Madrona Prize. He created the bitsandbytes open-source library for efficient foundation models, which is growing at 2.2 million installations per month, and for which he received Google Open Source and PyTorch Foundation awards.
Games and Filters: A Road to Safe Intelligence
31 October 2024: Jaime Fernández Fisac (Princeton University)
Abstract: Despite their growing sophistication, autonomous systems still struggle to operate safely in uncertain, open-world situations—as highlighted by public skepticism toward early automated driving technologies. Meanwhile, the excitement around generative AI has been tempered by concerns about potential harms from poorly understood human–AI interactions, where existing guardrails often obscure rather than remove underlying pitfalls. Comprehensive safety assurances remain elusive in both domains—but could insights from one catalyze breakthroughs in the other? This talk will demonstrate how bridging AI’s learned representations and control theory’s safety principles lays a strong common foundation for certifiable intelligent systems. First, we will explore how game-theoretic reinforcement learning synthesizes robust safety filters with clear-cut guarantees for robotics problems beyond the reach of model-based methods, from legged locomotion to urban driving. Next, we will discuss the value of closing the safety–learning loop by accounting for players’ evolving beliefs during interactions, reducing conservativeness without compromising safety. Finally, we will review early evidence that generative AI systems can use introspective self-queries to refine situational uncertainty, identify novel hazards, and anticipate the future consequences of their actions on users, with strong implications on AI alignment. The talk will end with a vision for general human–AI safety filters that monitor interactions and proactively steer them towards safe and beneficial outcomes.
Biography: Jaime Fernández Fisac is an Assistant Professor of Electrical and Computer Engineering at Princeton University, where he directs the Safe Robotics Laboratory and co-directs Princeton AI4ALL. His research integrates control systems, game theory, and artificial intelligence to equip robots with transparent safety assurances that users and the public can trust. Before joining Princeton, he was a Research Scientist at Waymo, where he pioneered new approaches to interaction planning that continue to shape how autonomous vehicles share the road today. He is also the co-founder of Vault Robotics, a startup developing agile delivery robots that work alongside human drivers. Prof. Fisac holds an Engineering Degree from Universidad Politécnica de Madrid, a Master’s in Aeronautics from Cranfield University, and a Ph.D. in Electrical Engineering and Computer Sciences from the University of California, Berkeley. His work has been featured in The Wall Street Journal and WIRED, and recognized with the Google Research Scholar Award and the NSF CAREER Award.
Have Large Models Changed Robotics?
24 October 2024: Danny Driess (Google DeepMind)
Abstract: In this talk, I will give perspectives on how large models have changed robotics, and why there is still fundamental research to be done. The main focus of the discussion is how we can achieve generalization in robotics. More traditional methods from Task and Motion Planning (TAMP) are capable of solving complex sequential manipulation problems while generalizing over a wide range of initial scene configurations. However, those methods make assumptions that limit their generalization in real-world scenarios. In the first half of the talk, I will discuss how (small) machine learning methods can be integrated into TAMP to address some of these shortcomings. In the second half of the talk, I will then explain how the rise of large models has transformed these previous findings. In particular, I will present PaLM-E, a large vision-language model for embodied decision making, RT-2, a vision-language-action model that connects large models to low-level robot actions, and Aloha Unleashed, a recipe for pushing the boundary of robot dexterity. Finally, I will place all these developments into a larger picture of what the future of robotics research could look like.
Aligning Language Models with LESS Data and a Simple (SimPO) Objective
24 October 2024: Mengzhou Xia (Princeton University)
Abstract: Aligning pre-trained language models ensures they follow human instructions reliably to produce helpful and harmless outputs. Supervised fine-tuning and preference optimization are two key approaches for achieving this goal. In this talk, I will introduce two novel algorithms designed to enhance these two stages. First, I introduce LESS, a model- and optimizer-aware algorithm for data selection. LESS leverages a few curated examples to identify instruction-tuning data that fosters specific capabilities in the model. It avoids relying on surface-form cues by framing data selection as an optimization problem, aiming to minimize the loss on a target dataset (e.g., validation). Our experiments show that training on just 5% of the data selected by LESS outperforms training on the full dataset, with the selected data often transferable across different model sizes and families. Next, I will introduce a simple yet effective algorithm for model alignment, SimPO, which utilizes a reference-free reward formulation based on the average likelihood of model responses. Extensive experiments demonstrate that SimPO outperforms existing offline preference optimization methods, such as DPO, across various settings. Notably, the Gemma2-9B model, tuned with SimPO, achieved the highest rank among <10B models on Chatbot Arena, AlpacaEval 2, and WildBench.
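For readers unfamiliar with the formulation, the reference-free reward mentioned above can be sketched as follows (details taken from the public SimPO paper rather than from the talk itself): with preferred and dispreferred responses \(y_w\) and \(y_l\), a length-normalization scale \(\beta\), and a target margin \(\gamma\),

$$
r_{\mathrm{SimPO}}(x, y) = \frac{\beta}{|y|} \log \pi_\theta(y \mid x), \qquad
\mathcal{L}_{\mathrm{SimPO}} = -\log \sigma\!\big(r_{\mathrm{SimPO}}(x, y_w) - r_{\mathrm{SimPO}}(x, y_l) - \gamma\big),
$$

so, unlike DPO, no frozen reference model is needed at training time.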
Biography: Mengzhou Xia is a final-year PhD student in Computer Science at Princeton University, advised by Danqi Chen. Her research focuses on developing algorithms to build effective language models via data-centric approaches and objective designs under an academic budget. She received her master's degree from Carnegie Mellon University, where she worked with Graham Neubig, and her bachelor's degree from Fudan University in China. Mengzhou is a recipient of the 2024 Apple Scholars in AI/ML PhD Fellowship and the 2022 Bloomberg Data Science PhD Fellowship, and was named a 2024 MIT EECS Rising Star. Throughout her PhD, she has interned at Meta AI, Microsoft Research, and Bloomberg AI.
Aligning Robot and Human Representations
17 October 2024: Andreea Bobu (MIT)
Abstract: To perform tasks that humans want in the world, robots rely on a representation of salient task features; for example, to hand me a cup of coffee, the robot considers features like efficiency and cup orientation in its behavior. Prior methods try to learn both a representation and a downstream task jointly from data sets of human behavior, but this unfortunately picks up on spurious correlations and results in behaviors that do not generalize. In my view, what’s holding us back from successful human-robot interaction is that human and robot representations are often misaligned: for example, our assistive robot moved a cup inches away from my face -- which is technically collision-free behavior -- because it lacked an understanding of personal space. Instead of treating people as static data sources, my key insight is that robots must engage with humans in an interactive process for finding a shared representation for more efficient, transparent, and seamless downstream learning. In this talk, I focus on a divide and conquer approach: explicitly focus human input on teaching robots good representations before using them for learning downstream tasks. This means that instead of relying on inputs designed to teach the representation implicitly, we have the opportunity to design human input that is explicitly targeted at teaching the representation and can do so efficiently. I introduce a new type of representation-specific input that lets the human teach new features, I enable robots to reason about the uncertainty in their current representation and automatically detect misalignment, and I propose a novel human behavior model to learn robust behaviors on top of human-aligned representations. By explicitly tackling representation alignment, I believe we can ultimately achieve seamless interaction with humans where each agent truly grasps why the other behaves the way they do.
Biography: Andreea Bobu is an Assistant Professor at MIT in AeroAstro and CSAIL. She leads the Collaborative Learning and Autonomy Research Lab (CLEAR Lab), where they develop autonomous agents that learn to do tasks for, with, and around people. Her goal is to ensure that these agents' behavior is consistent with human expectations, whether they interact with expert designers or novice users. She obtained her Ph.D. in Electrical Engineering and Computer Science at UC Berkeley with Anca Dragan in 2023. Prior to her Ph.D., she earned her Bachelor’s degree in Computer Science and Engineering from MIT in 2017. She is a recipient of the Apple AI/ML Ph.D. Fellowship, is a Rising Star in EECS and an R:SS and HRI Pioneer, and has won the best paper award at HRI 2020 and the Emerging Research Award at the International Symposium on the Mathematics of Neuroscience 2023. Before MIT, she was also a Research Scientist at the AI Institute and an intern at NVIDIA in the Robotics Lab.
On Building General, Zero-Shot Robot Policies
10 October 2024: Mahi Shafiullah (NYU Courant Institute)
Abstract: Robot models, particularly those trained with large amounts of data, have recently shown a plethora of real-world manipulation and navigation capabilities. Several independent efforts have shown that, given sufficient training data in an environment, robot policies can generalize to demonstrated variations in that environment. However, needing to finetune robot models for every new environment stands in stark contrast to models in language or vision that can be deployed zero-shot for open-world problems. In this talk, I will present Robot Utility Models (RUMs), a framework for training and deploying zero-shot robot policies that can directly generalize to new environments without any finetuning. To create RUMs efficiently, we developed new tools to quickly collect data for mobile manipulation tasks, integrate such data into a policy with multi-modal imitation learning, and deploy policies on-device on Hello Robot Stretch, a cheap commodity robot, with an external mLLM verifier for retrying. We trained five such utility models for opening cabinet doors, opening drawers, picking up napkins, picking up paper bags, and reorienting fallen objects. Our system, on average, achieves a 90% success rate in unseen, novel environments interacting with unseen objects. Moreover, the utility models can also succeed in different robot and camera set-ups with no further data, training, or fine-tuning. I will talk about our primary lessons from training RUMs: namely, the importance of training data over training algorithm and policy class, guidance about data scaling, the necessity of diverse yet high-quality demonstrations, and a recipe for robot introspection and retrying to improve performance on individual environments. All the code, data, and models I will talk about have been open-sourced on our website, https://robotutilitymodels.com/
Biography: Nur Muhammad “Mahi” Shafiullah is a Ph.D. student at NYU Courant Institute advised by Lerrel Pinto. His research is driven by a vision of robots seamlessly integrated into our messy everyday lives: automating problems and continuously learning alongside us. Mahi's recent work has developed new algorithms for learning robotic behavior, large robot models for robust manipulation, and spatio-semantic memory that can handle dynamic changes in the world. He is passionate about getting these models and algorithms out in the real world, operating autonomously in NYC homes. His work has been featured in Oral and Spotlight presentations and demos at conferences like ICRA, RSS, NeurIPS, ICML, and ICLR. Mahi is supported by the Apple Fellowship and the Jacob T. Schwartz Fellowship, and was a visiting scientist at Meta. In a past life, Mahi was a silver medalist at the IMO and worked on adversarial robustness as an undergrad at MIT (S.B. ‘19).
Foundations of High-Modality Multisensory AI
3 October 2024: Paul Liang (MIT)
Abstract: Building multisensory AI that learns from text, speech, video, real-world sensors, wearable devices, and medical data holds promise for impact in many scientific areas with practical benefits, such as supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents. However, multimodal systems quickly run into data and modeling bottlenecks: it is increasingly difficult to collect paired multimodal data and to scale multimodal transformers as the number of modalities and their dimensionality grows. In this talk, I propose a vision of high-modality learning: building multimodal AI over many diverse input modalities, given only partially observed subsets of data or model representations. We will cover two key ideas that enable high-modality learning: (1) discovering how modalities interact to give rise to new information, and (2) tackling the heterogeneity across many different modalities. Finally, I will discuss our collaborative efforts in scaling AI to many modalities and tasks for real-world impact on affective computing, mental health, and cancer prognosis.
Biography: Paul Liang is an Assistant Professor at MIT Media Lab and MIT EECS. His research advances the foundations of multisensory artificial intelligence to enhance the human experience. He is a recipient of the Siebel Scholars Award, Waibel Presidential Fellowship, Facebook PhD Fellowship, Center for ML and Health Fellowship, Rising Stars in Data Science, and 3 best paper awards. Outside of research, he received the Alan J. Perlis Graduate Student Teaching Award for developing new courses on multimodal machine learning.
Learning Robust, Real-world Visuomotor Skills from Generated Data
26 September 2024: Ge Yang (MIT)
Abstract: The mainstream approach in robot learning today relies heavily on imitation learning from real-world human demonstrations. These methods are sample efficient in controlled environments and easy to scale to a large number of skills. However, I will present algorithmic arguments to explain why merely scaling up imitation learning is insufficient for advancing robotics. Instead, my talk will focus on developing performant visuomotor policies in simulation and the techniques that make them robust enough to transfer directly to real-world color observations.
Biography: Ge Yang is a postdoctoral researcher working with Phillip Isola at MIT CSAIL. His research focuses on developing the algorithmic and system foundations for computational visuomotor learning, with an emphasis on learning from synthetic data and sim-to-real transfer. Ge's work is dedicated to making robots capable, versatile, and intelligent.
Cultural Biases, World Languages, and Privacy Protection in Large Language Models
19 September 2024: Wei Xu (Georgia Institute of Technology)
Abstract: In this talk, I will highlight three key aspects of large language models: (1) cultural bias in LLMs and pre-training data, (2) decoding algorithm for low-resource languages, and (3) human-centered design for real-world applications. The first part focuses on systematically assessing LLMs' favoritism towards Western culture. We take an entity-centric approach to measure the cultural biases among LLMs (e.g., GPT-4, Aya, and mT5) through natural prompts, story generation, sentiment analysis, and named entity tasks. One interesting finding is that a potential cause of cultural biases in LLMs is the extensive use and upsampling of Wikipedia data during the pre-training of almost all LLMs. The second part will introduce a constrained decoding algorithm that can facilitate the generation of high-quality synthetic training data for fine-grained prediction tasks (e.g., named entity recognition, event extraction). This approach outperforms GPT-4 on many non-English languages, particularly low-resource African languages. Lastly, I will showcase an LLM-powered privacy preservation tool designed to safeguard users against the disclosure of personal information. I will share findings from an HCI user study that involves real Reddit users utilizing our tool, which in turn informs our ongoing efforts to improve the design of AI models. Concluding the talk, I will briefly touch upon recent research exploring the temporal robustness of large language models (e.g., handling neologisms) and advances in human-AI interactive evaluation of LLM-generated texts.
Biography: Wei Xu is an Associate Professor in the College of Computing and Machine Learning Center at the Georgia Institute of Technology, where she is the director of the NLP X Lab. Her research interests are in natural language processing and machine learning, with a focus on Generative AI, robustness and fairness of large language models, multilingual LLMs, as well as interdisciplinary research in AI for science, education, accessibility, and privacy. She is a recipient of the NSF CAREER Award, the AI for Everyone Award, and Best Paper Award and Honorable Mention at COLING'18 and ACL'23. She has also received research funding from DARPA and IARPA. She is currently an executive board member of NAACL.
Don’t teach. Incentivize: Scale-first view of Large Language Models
2 May 2024: Hyung Won Chung (OpenAI)
Abstract: The unit cost of compute has been decreasing exponentially for over 100 years. In response, AI researchers should develop learning paradigms that enable more computation to maximally leverage this trend. Methods that were previously considered too expensive are increasingly promising. For example, current Large Language Models (LLMs) are capable enough that we should start incentivizing the models as opposed to directly teaching manually-enumerated lists of skills and tasks. Incentive-based learning requires a lot more compute, but when done correctly, it leads to generalizable skills such as reasoning. I use next-token prediction (i.e., the learning objective for most LLMs) as a running example to illustrate this concept. The next-token prediction task can be seen as a massively multitask learning framework, which then serves as a weak incentive structure for the models. The weak incentive structure coupled with the size of the model is closely related to the phenomenon where some abilities emerge with scale.
Biography: Hyung Won is a research scientist at OpenAI. He has worked on various aspects of Large Language Models: pre-training, instruction fine-tuning, reinforcement learning from human feedback, reasoning, multilinguality, parallelism strategies, etc. Some of his notable work includes the Flan scaling papers (Flan-T5, Flan-PaLM) and T5X, the training framework used to train the PaLM language model. Before OpenAI, he was at Google Brain, and before that he received a PhD from MIT, where he worked on renewable energy and clean water systems.
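For concreteness, the learning objective referenced above is the standard next-token prediction loss (stated here as background rather than taken from the talk): given a token sequence \(x_1, \dots, x_T\), the model minimizes

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right),
$$

where each context and its continuation can be viewed as an implicit prediction task of its own, which is what makes this objective a massively multitask, weak-incentive training signal.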
High-level guidance for generalizable reinforcement learning
18 April 2024: Lawson Wong (Northeastern University)
Abstract: Reinforcement learning (RL) is a compelling framework for robotics and embodied intelligence when the environment/task is not fully known. However, it is difficult to make RL work. My thesis is that RL is difficult because it is too general. We need to, and often can, provide RL a helping hand by providing a modicum of task-relevant high-level information. In this talk, I will discuss various thrusts in my research group on this theme: (1) Using symmetry to quickly learn to plan and navigate; (2) Following a single high-level trajectory such as a path on a coarse map; (3) Integrating a wider range of guidance into the RL loop.
Biography: Lawson L.S. Wong is an assistant professor in the Khoury College of Computer Sciences at Northeastern University. At Northeastern, he leads the Generalizable Robotics and Artificial Intelligence Laboratory (GRAIL). The group's research focuses on learning, representing, estimating, and using knowledge about the world that an autonomous robot may find useful. His research agenda is to identify and learn intermediate state representations that enable effective robot learning and planning, and therefore enable robot generalization. Prior to Northeastern, Lawson was a postdoctoral fellow at Brown University, working with Stefanie Tellex. He completed his PhD at the Massachusetts Institute of Technology, advised by Leslie Pack Kaelbling and Tomás Lozano-Pérez.
Spatio-Temporal Maps of the Human Brain
11 April 2024: Aude Oliva (MIT)
Abstract: Any cognitive function is mediated by a network of many cortical sites whose activity is orchestrated through complex temporal dynamics. To understand perception and cognition, we need to identify brain responses in space and time. In this talk, I will present a series of cognitive neuroscience results based on multivariate response patterns of the human brain recorded with functional magnetic resonance imaging (fMRI) and with magneto-encephalography (MEG), for visual, auditory and memory tasks.
Biography: Aude Oliva, PhD, is the MIT director of the MIT-IBM Watson AI Lab and a senior research scientist at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), where she heads the Computational Perception and Cognition group. Oliva has received an NSF CAREER Award in computational neuroscience, a Guggenheim Fellowship in computer science, and a Vannevar Bush Faculty Fellowship in cognitive neuroscience. Her research is cross-disciplinary, spanning human perception and cognition, computer vision, and cognitive neuroscience, and focuses on research questions at the intersection of all three domains. She earned an MS and PhD in cognitive science from the Institut National Polytechnique de Grenoble, France.
Frugal, Interpretable, Dynamics-Inspired Architectures for Sequence Analysis
4 April 2024: Octavia Camps (Northeastern University)
Abstract: One of the long-term objectives of Machine Learning is to endow machines with the capacity of structuring and interpreting the world as we do. This is particularly challenging in scenes involving time series, such as video sequences, since seemingly different data can correspond to the same underlying dynamics. In this talk, I will discuss how we can leverage concepts from dynamical systems theory to design frugal, and interpretable architectures for sequence analysis, classification, prediction, and manipulation. I will illustrate these ideas with two examples. Firstly, I will show how we can incorporate view invariance while designing computer vision architectures for cross-view action recognition. The central theme of this approach is the use of dynamical models, and their associated invariants, as an information-autoencoding unsupervised learning paradigm. This framework is flexible and can be used with different types of input modalities: RGB, 3D Skeletons, or both. Comparisons against the current state of the art methods, using four widely used benchmark datasets, show that this approach achieves state of the art in all input modalities and that it significantly closes the performance gap between RGB and 3D skeleton-based approaches. In the second example, I will introduce a framework inspired by recent results in non-linear systems identification, capable of decomposing a video into its moving objects, their attributes, and the dynamic modes of their trajectories. The framework captures the dynamic information as the eigenvalues and eigenvectors of a Koopman operator, which provide an interpretable and parsimonious representation. This decomposition can then be used to perform video analytics, predict future frames, and generate and manipulate synthetic videos.
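As general background on the Koopman machinery mentioned above (a textbook-style sketch, not the specific architecture from the talk): for a discrete-time system \(x_{t+1} = F(x_t)\), the Koopman operator acts linearly on observables \(g\), and a finite-dimensional approximation can be fit from snapshot matrices \(X = [x_1, \dots, x_{T-1}]\) and \(X' = [x_2, \dots, x_T]\) by least squares,

$$
(\mathcal{K} g)(x_t) = g(F(x_t)) = g(x_{t+1}), \qquad K \approx X' X^{+}, \qquad K \varphi_i = \lambda_i \varphi_i,
$$

where the eigenvalues \(\lambda_i\) encode decay rates and oscillation frequencies of the corresponding modes \(\varphi_i\), which is the kind of compact, interpretable representation of the dynamics the abstract refers to.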
Biography: Octavia Camps received a B.S. degree in computer science and a B.S. degree in electrical engineering from the Universidad de la Republica (Uruguay), and an M.S. and a Ph.D. degree in electrical engineering from the University of Washington. Since 2006, she has been a Professor in the Electrical and Computer Engineering Department at Northeastern University. From 1991 to 2006 she was on the faculty of Electrical Engineering and of Computer Science and Engineering at The Pennsylvania State University. Prof. Camps was a visiting researcher at the Computer Science Department at Boston University during Spring 2013, and in 2000 she was a visiting faculty member at the California Institute of Technology and at the University of Southern California. She is an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and a General Chair for IEEE/CVF Computer Vision and Pattern Recognition (CVPR) 2024. Her main research interests include dynamics-based computer vision, machine learning, and image processing. In particular, her work seeks data-driven dynamic representations for high-dimensional temporal sequences which are compact, physically meaningful, and capture causality relationships. Combining recent deep learning developments with concepts from dynamic systems identification, she has developed models and algorithms for a range of video analytics applications, including human re-identification, visual tracking, action recognition, video generation, and medical imaging.
Leveraging Symmetries to Make Robotic Learning More Data Efficient
21 March 2024: Rob Platt (Northeastern University)
Abstract: Many robotics problems have transition dynamics that are symmetric in SE(2) and SE(3) with respect to rotation, translation, scaling, reflection, etc. In these situations, any optimal policy will also be symmetric over these transformations. In this talk, I leverage this insight to improve the data efficiency of policy learning by encoding domain symmetries directly into the neural network model using group invariant and equivariant layers. The result is that we can learn non-trivial visuomotor control policies with much less data than is typically the case. For imitation learning, this significantly reduces the number of demonstrations required. For reinforcement learning, it reduces the amount of experience needed to learn a good policy. In fact, we can sometimes learn good policies from scratch training directly on physical robotic hardware in real time.
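As a generic illustration of encoding a symmetry directly into a network (a minimal sketch in plain PyTorch, not the architecture used in this work), the toy layer below applies one learned kernel in all four 90-degree rotations, so rotating the input only rotates the output and permutes its group axis rather than producing unrelated features:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class C4LiftingConv(nn.Module):
    """Toy C4-equivariant 'lifting' convolution: a single learned kernel is
    applied in all four 90-degree rotations, producing an extra group axis.
    (Illustrative only; real systems use full equivariant network libraries.)"""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pad = self.weight.shape[-1] // 2
        outs = [F.conv2d(x, torch.rot90(self.weight, r, dims=(-2, -1)), padding=pad)
                for r in range(4)]  # one response map per rotation of the kernel
        return torch.stack(outs, dim=2)  # shape: (batch, out_ch, |C4| = 4, H, W)

# Equivariance check: rotating the input rotates the output spatially and
# cyclically shifts the group axis, instead of scrambling the features.
layer = C4LiftingConv(1, 8)
x = torch.randn(2, 1, 32, 32)
y, y_rot = layer(x), layer(torch.rot90(x, 1, dims=(-2, -1)))
assert torch.allclose(y_rot, torch.rot90(torch.roll(y, 1, dims=2), 1, dims=(-2, -1)), atol=1e-4)
```

Stacking such layers and pooling over the group axis at the end yields features that are invariant to these rotations; the same weight-sharing idea is what lets equivariant policies reuse experience across symmetric configurations instead of relearning each one from data.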
Biography: Rob Platt is an Associate Professor in the Khoury College of Computer Sciences at Northeastern University and a Faculty Fellow at BDAI. He is interested in developing robots that can perform complex manipulation tasks alongside humans in the uncertain everyday world. Much of his work is at the intersection of robotic policy learning, planning, and perception. Prior to coming to Northeastern, he was a Research Scientist at MIT and a technical lead at NASA Johnson Space Center.
Unleashing the Power of LLMs for Visual Understanding
14 March 2024: Kate Saenko (Boston University)
Abstract: Recent work has adapted Large Language Models (LLMs) to various visual tasks, such as captioning and answering questions about images or short videos. The resulting multimodal LLMs combine visual understanding with powerful reasoning and "common sense" capabilities of LLMs. However, multimodal-LLMs still struggle with some fundamental visual tasks like image classification and understanding long videos. This talk will cover two recent papers addressing these limitations. CLAMP proposes a parameter-efficient fine-tuning approach for LLMs using a contrastive image-caption matching objective, enabling LLMs to achieve good zero-shot image classification performance, outperforming state-of-the-art multimodal-LLMs by 13% and slightly surpassing contrastive learning with a custom text model. VideoMosaic introduces learnable spatiotemporal queries to adapt pretrained video LLMs (vLLMs) for generalizing to much longer videos. The approach incorporates a global-local video Qformer with two new modules that leverage global video context to compute contextual tokens for understanding short and long video segments. Trained on HowTo100M, VideoMosaic outperforms state-of-the-art large models by 3-6% on zero-shot long video understanding benchmarks and improves the vLLM's performance on short-term action recognition. These findings demonstrate the potential of adapting LLMs for new visual understanding tasks and expanding their capabilities.
Biography: Kate Saenko is a computer scientist, AI researcher at Meta and professor at Boston University. She has made notable contributions to the field of artificial intelligence, particularly in the areas of computer vision and machine learning. Her work has helped advance the state-of-the-art in developing more adaptive, generalizable and multimodal AI systems.
Interpreting Training
7 March 2024: Naomi Saphra (Harvard University)
Abstract: For years, both learning theory and empirical science of deep learning used multipass training on small image classification datasets as a primary testbed and source of inspiration. As a result, our understanding of models and training has largely taken the form of smooth, simple, and continuous laws. Recently, the machine learning community has begun considering textual data and other settings that test discrete reasoning. Observations of training in these environments have introduced discontinuous training dynamics, questioned assumptions about the economy of representations, and highlighted the often-neglected role of random seed. This talk will focus on understanding the nuances of training in a wider range of settings. I will begin by discussing substantial phase transitions during masked language model pretraining, and how we can combine them with perspectives from evolutionary biology to repair the epistemology of mechanistic approaches to interpretability. Then, I will present recent results illuminating training through the historically neglected impact of random seeds. The first of these findings is that for text classification, in contrast with previous results in image classification, different fine-tuning seeds can lead to different loss surface basins that provide different generalization heuristics. Finally, I will discuss an unsupervised approach to discovering and visualizing random variation in training and its influence on the rate of convergence and spontaneous generalization. Overall, these results can support a complex and nuanced new science of deep learning.
Biography: Naomi Saphra is a research fellow at the Kempner Institute at Harvard University. She is interested in language model training dynamics: how models learn to encode linguistic patterns or other structure, how generalization develops, and how we can introduce useful inductive biases into the training process. She has a particular interest in applying models from evolutionary biology to understand neural networks. Recently, Dr. Saphra has become interested in fish. Previously, she earned a PhD from the University of Edinburgh on Training Dynamics of Neural Language Models; worked at NYU, Google and Facebook; attended Johns Hopkins and Carnegie Mellon University; and won multiple awards for being the best disabled person. Outside of research, she plays roller derby under the name Gaussian Retribution, performs standup comedy, and shepherds other disabled programmers into the world of code dictation.
Good Old-fashioned LLMs (or, Autoformalizing the World)
22 February 2024: Jacob Andreas (MIT)
Abstract: Classical formal approaches to artificial intelligence, based on manipulation of symbolic structures, have a number of appealing properties - they generalize (and fail) in predictable ways, provide interpretable traces of behavior, and can be formally verified or manually audited for correctness. Why are they so rarely used in the modern era? One of the major challenges in the development of symbolic AI systems is what McCarthy called the "frame problem": the impossibility of enumerating a set of symbolic rules that fully characterize the behavior of every system in every circumstance. Modern deep learning approaches avoid this representational challenge, but at the cost of interpretability, robustness, and sample-efficiency. How do we build learning systems that are as flexible as neural models but as understandable and generalizable as symbolic ones? In this talk, I'll describe a recent line of work aimed at automatically building "just-in-time" formal models tailored to be just expressive enough to solve tasks of interest. In this approach, neural sequence models pre-trained on text and code are used to place priors over symbolic model descriptions, which are then verified and refined interactively - yielding symbolic graphics libraries that can be used to solve image understanding problems, or symbolic planning representations for sequential decision-making. Here natural language turns out to play a central role as an intermediate representation linking neural and symbolic computation, and I'll conclude with some very recent work on using symbolic reasoning to improve the coherence and factual accuracy of language models themselves.
Biography: Jacob Andreas is an associate professor at MIT in the Department of Electrical Engineering and Computer Science as well as the Computer Science and Artificial Intelligence Laboratory. His research aims to build intelligent systems that can communicate effectively using language and learn from human guidance. Jacob earned his Ph.D. from UC Berkeley, his M.Phil. from Cambridge (where he studied as a Churchill scholar) and his B.S. from Columbia. He has been named a Kavli Fellow by the National Academy of Sciences, and has received the NSF CAREER award, MIT's Junior Bose and Kolokotrones teaching awards, and paper awards at ACL, ICML and NAACL.
Progress and challenges of language model powered agents
8 February 2024: Robert Guangyu Yang (MIT)
Abstract: In this talk, I will discuss some ongoing progress from our group building agents powered by large language models. Such agents have shown dramatically different capabilities compared to previous agents, thanks, mostly but not exclusively, to the advances of large language models. I will demonstrate some capabilities of these agents in the domain of social interactions and task solving. Meanwhile, I will showcase and discuss some challenges our group is facing experimenting with these agents.
Biography: Robert Yang joined the McGovern Institute as an associate member in July 2021 and is also an Assistant Professor in the Department of Brain and Cognitive Sciences (BCS) with a joint appointment in the EECS Department in the Schwarzman College of Computing (SCC). He trained in physics at Peking University and then went on to obtain a PhD in computational neuroscience at New York University, followed by an internship in software engineering at Google Brain. Before coming to MIT, Robert conducted postdoctoral research at the Center for Theoretical Neuroscience of Columbia University, where he was a Junior Fellow at the Simons Society of Fellows.
Towards Large Behavior Models: Versatile and Dexterous Robots via Supervised Learning
7 December 2023: Ben Burchfiel and Siyuan Feng (TRI)
Abstract: Recent advances in machine learning have transformed multiple AI-related fields. Notably, robust general-purpose language and vision models are fast becoming reality, and these new capabilities have already begun making their way into consumer-facing technologies where they affect the lives of many millions of people. These same underlying advancements also portend a sea change in robotics. It is now possible to reliably imbue robots with new behaviors, such as beating eggs or folding cloth, using just an hour or two of teaching and a few dozen GPU-hours of compute. In this talk, we will discuss our team's push, at the Toyota Research Institute, to scale ML-powered robot behavior teaching and the road ahead to general-purpose Large Behavior Models for robots. These models will possess the flexibility and generality of existing Large Language Models, but will be capable of dexterously controlling a robot to effect change in the physical world.
Biography: Siyuan Feng is a staff research scientist at Toyota Research Institute. He co-leads the Large Behavior Model project, which aims to create general-purpose robots via learning at scale. Prior to LBM, he led TRI's Dish Loading project. Before joining TRI, Siyuan received his PhD from Carnegie Mellon University working on bipedal locomotion using optimization-based controllers. He participated in the DARPA Robotics Challenge as the control lead of Team WPI-CMU, whose Atlas was the only humanoid to score on most of the tasks with no falls. Siyuan is currently focusing on developing general and scalable approaches to create and refine robot behavior. Ben Burchfiel is the manager and co-tech lead for the Large Behavior Model project at Toyota Research Institute, where he leads a team working to create general-purpose robots via machine learning at scale. Before joining TRI, Ben was a postdoc at Brown University and a PhD student at Duke University in the Intelligent Robot Lab. During his thesis work, Ben focused on 3D perception, learning from demonstration, and multimodal vision-language representations. Prior to that, Ben received his Bachelor of Science from UW-Madison, where he studied Computer Science. These days, Ben's research focuses on making general-purpose robots a reality by imbuing data-driven methods with soft inductive biases that make minimal assumptions about the structure of the world and relax gracefully with scale.
The Last Four Frames Is Not All You Need: Learning to Behave in a Partially Observable World
30 November 2023: Leslie Pack Kaelbling (MIT)
Abstract: All robotics problems are partially observable. In some problems, the partial observability is relatively superficial, enabling solution via planning and/or learning methods that assume full observability. The modern embodied intelligence research community spends most of its time looking under this fully-observable lamppost. But many problems of significant importance are significantly partially observable. We know that obtaining optimal (or even reasonably good) solutions to general partially observable Markov decision problems (POMDPs) is intractable or even undecidable. Is that a reason to ignore or give up on them? I'll argue that it's not. The fact that humans and other animals can learn and behave effectively in extremely partially observable domains argues that there is at least a subclass of general POMDPs that can be solved efficiently. I am interested in finding the structures and regularities in the real physical world that render many partially observable problems tractable. This talk will be a combination of tutorial and speculation, intended to incite discussion, with no actual recent research results.
Biography: Leslie Pack Kaelbling is the Panasonic Professor of Computer Science and Engineering at the Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, where she co-leads the Learning and Intelligent Systems (LIS) group. She is widely known for her research on Partially Observable Markov Decision Processes (POMDPs) in the context of artificial intelligence and robotics, as well as her work on reinforcement learning, belief-space planning, state estimation, and integrated task and motion planning. Leslie has received the NSF Presidential Faculty Fellowship, the IJCAI Computers and Thought Award, as well as the Bose Award for Excellence in Teaching. She is also the founding editor-in-chief of the Journal of Machine Learning Research (JMLR) and a fellow of the Association for the Advancement of Artificial Intelligence (AAAI).
Evaluating and Detecting Long-form LLM-generated Text
16 November 2023: Mohit Iyyer (University of Massachusetts Amherst)
Abstract: Progress in NLP methodology over the last thirty years has been driven by benchmarks, from the Penn Treebank to GLUE. Benchmarks are useful because they provide a standard task, dataset, and means of evaluation that any researcher can use to quickly and easily demonstrate the value of their method. However, in the current age of LLMs, I argue that benchmarking is becoming increasingly obsolete. Beyond challenges such as data contamination, the dubious scientific validity of "prompt engineering", and usage of closed-source APIs, each of which is critical in its own right, there exist fundamental issues with how to formulate real-world tasks into benchmarks that can rank LLMs based on the much-desired "single score". I highlight these issues using some of my lab's recent work on tasks such as long-form question answering, book-length summarization, and literary translation. Next, I'll pivot to a different problem that plagues not only evaluation (e.g., via Mechanical Turkers using ChatGPT to complete tasks) but also society as a whole: the rapid proliferation of LLM-generated text. Detecting such text is not only important for combating malicious use cases such as academic plagiarism, but also to ensure that LLMs of the future are not just pretrained on text generated by their inferior predecessors. I outline several attacks against existing LLM-generated text detectors such as watermarking (e.g., paraphrasing, translation, cropping) and describe a retrieval-based approach that is more robust to these attacks but comes with issues of its own.
Biography: Mohit Iyyer is an associate professor in computer science at the University of Massachusetts Amherst, with a primary research interest in natural language processing. He is the recipient of best paper awards at NAACL (2016, 2018), an outstanding paper award at EACL 2023, and a best demo award at NeurIPS 2015, and he also received the 2022 Samsung AI Researcher of the Year award. He obtained his PhD in computer science from the University of Maryland, College Park in 2017 and spent the following year as a researcher at the Allen Institute for Artificial Intelligence.
Making robots see and manipulate
2 November 2023: Beomjoon Kim (KAIST)
Abstract: Even with the recent advances in robot AI, we still do not see robots in our lives. Why is this? I argue this is because robots are still missing the basic capabilities to see and manipulate a diverse set of objects. In this talk, I will introduce learning-based algorithms for manipulating diverse objects even in cases when grasping is not an option. I will demonstrate that estimation of the entire object shape is sufficient yet unnecessary, and introduce our contact-based object state representation that affords both prehensile and non-prehensile motions, generalizes across diverse object shapes, and enables the exploitation of object geometries when in need. The key enabler for this is large-scale GPU-based simulation that can efficiently generate big data within a short amount of time. Unfortunately, when it comes to non-convex objects, these simulators slow down significantly. I will introduce a neural network-based contact detector that, unlike classical contact detection algorithms, leverages parallel computation available in GPU. This enables us to generate data 10x faster than state-of-the-art GPU simulators in contact-rich situations.
Biography: Beomjoon is an Assistant Professor in the Graduate School of Artificial Intelligence at KAIST (Korea Advanced Institute of Science and Technology). Beomjoon previously obtained his Ph.D. in computer science from MIT CSAIL with Leslie Kaelbling and Tomas Lozano-Perez. Beomjoon has won multiple awards, such as the ICRA Best Cognitive Robotics Paper Award and the McGill GREAT Award. His research focuses on creating intelligent mobile manipulation robots that efficiently perform manipulation tasks in diverse and unstructured environments.
Audio Large Language Models: From Sound Perception to Understanding
19 October 2023: Yuan Gong (MIT)
Abstract: Our cognitive abilities enable us not only to perceive and identify sounds but also to comprehend their implicit meaning. While significant advancements have been achieved in general audio event recognition in recent years, models trained with discrete sound label sets possess limited reasoning and understanding capabilities, e.g., the model may recognize the clock chime 6 times, but not know that it indicates a time of 6 o'clock. Can we build an AI model that has both audio perception and reasoning ability? In this talk, I will share our recent progress in audio large language model (LLM) development. Specifically, I will first introduce a novel GPT-assisted method to generate our large-scale open-ended audio question-answering dataset OpenAQA. I will then discuss the key design choices and the model architecture of our audio large language model. Finally, I will also discuss how to connect an automatic speech recognition model with an audio large language model for joint audio and speech understanding.
Biography: Yuan is a research scientist at the MIT Computer Science and Artificial Intelligence Lab (CSAIL). Before he joined MIT, he received his Ph.D. in computer science from the University of Notre Dame, supervised by Dr. Christian Poellabauer. During the summer of 2019, he was an applied scientist intern working on clinical text mining in the AWS Comprehend Medical team, supervised by Mohammed Khalilia and Parminder Bhatia. Before coming to Notre Dame, he received his B.Sc. degree in Electrical Engineering (Biomedical Engineering major) from Fudan University in 2015, where his research advisors were Dr. Yuanyuan Wang (on ultrasound image denoising) and Dr. Yuedong Xu (on network science). His current research interest is computational speech and audio signal analysis, which includes the following topics: speech-based healthcare applications, audio-visual multi-modality learning, and general audio event recognition.
Large Language Models as Statisticians
17 October 2023: Jacob Steinhardt (UC Berkeley)
Abstract: Given their complex behavior, diverse skills, and wide range of deployment scenarios, understanding large language models---and especially their failure modes---is important. Given that new models are released every few months, often with brand new capabilities, how can we achieve understanding that keeps pace with modern practice? In this talk, I will present an approach that leverages the skills of language models themselves, and so scales up as models get better. Specifically, we leverage the skill of language models *as statisticians*. At inference time, language models can read and process significant amounts of information due to their large context windows, and use this to generate useful statistical hypotheses. We will showcase several systems built on this principle, which allow us to audit other models for failures, identify spurious cues in datasets, label the internal representations of models, and factorize corpora into human-interpretable concepts. This is joint work with many collaborators and students, including Ruiqi Zhong, Erik Jones, and Yossi Gandelsman.
Biography: Jacob is an Assistant Professor of Statistics at UC Berkeley, where he works on trustworthy and human-aligned machine learning. He received his PhD at Stanford University under Percy Liang and has previously worked at OpenAI and Open Philanthropy.
How could we build a generalist robot?
12 October 2023: Tomas Lozano-Perez (MIT)
Abstract: An enduring goal of AI and robotics has been to build a robot capable of robustly performing a wide variety of tasks in a wide variety of environments; not by sequentially being programmed (or taught) to perform one task in one environment at a time, but rather by intelligently choosing appropriate actions for whatever task and environment it is facing. This goal remains a challenge. In this talk I would like to engage in a discussion of how various approaches, both planning and learning, might contribute towards this goal.
Biography: Tomas Lozano-Perez is currently the School of Engineering Professor in Teaching Excellence at the Massachusetts Institute of Technology (MIT), USA, where he is a member of the Computer Science and Artificial Intelligence Laboratory. He has been Associate Director of the Artificial Intelligence Laboratory and Associate Head for Computer Science of MIT's Department of Electrical Engineering and Computer Science. He was a recipient of the 2021 IEEE Robotics and Automation Award, a 2011 IEEE Robotics Pioneer Award, and a 1985 Presidential Young Investigator Award. He is a Fellow of the AAAI, ACM, and IEEE. His research has been in robotics (the configuration-space approach to motion planning), computer vision (the interpretation-tree approach to object recognition), machine learning (multiple-instance learning), medical imaging (computer-assisted surgery), and computational chemistry (drug activity prediction and protein structure determination from NMR and X-ray data). His current research is aimed at integrating task, motion, and decision-theoretic planning for robotic manipulation.
Geometric Robot Learning for Generalizable Skills Acquisition
18 May 2023: Xiaolong Wang (UC San Diego)
Abstract: Robot learning has witnessed significant progress in terms of generalization in the past few years. At the heart of such generalization, the advancement of representation learning, such as image and text foundation models, plays an important role. While these achievements are encouraging, most tasks conducted are relatively simple. In this talk, I will discuss our recent efforts on learning generalizable skills, focusing on tasks with complex physical contacts and geometric reasoning. Specifically, I will discuss our research on: (i) the use of a large number of low-cost, binary force sensors to enable Sim2Real manipulation even without visual input, (ii) the collection of large-scale robot physical interaction demonstrations for imitation learning using a simple and user-friendly visual teleoperation system, and (iii) large-scale 3D representation learning that generalizes Reinforcement Learning policies across diverse objects and scenes. I will also showcase the real-world applications of our research, including dexterous manipulation and legged locomotion control.
Biography: Xiaolong Wang is an Assistant Professor in the ECE department at the University of California, San Diego, affiliated with the TILOS NSF AI Institute. He received his Ph.D. in Robotics at Carnegie Mellon University, and his postdoctoral training was at the University of California, Berkeley. His research focuses on the intersection between computer vision and robotics. His specific interest lies in learning 3D and dynamics representations from videos and physical robotic interaction data. These comprehensive representations are utilized to facilitate the learning of robot skills, with the goal of generalizing the robot to interact effectively with a wide range of objects and environments in the real physical world. He is the recipient of the NSF CAREER Award and Research Awards from Sony, Amazon, and Adobe.
Building Trust in AI for Autonomous Vehicles
27 April 2023: Marco Pavone (Stanford University)
Abstract: AI models are ubiquitous in modern autonomy stacks, enabling tasks such as perception and prediction. However, providing safety assurances for such models represents a major challenge, due in part to their data-driven design and dynamic behavior. I will present recent results on building trust in AI models for autonomous vehicles, along three main directions: (1) data-driven traffic models for closed-loop simulation and safety assessment of autonomy stacks; (2) techniques to provide calibrated uncertainty estimates for AI models leveraging ideas from conformal prediction theory; and (3) tools to monitor AI components at run-time, with an emphasis on detecting semantic anomalies through the use of large language models. The discussion will be grounded in autonomous driving and aerospace robotics applications.
Biography: Dr. Marco Pavone is an Associate Professor of Aeronautics and Astronautics at Stanford University, where he directs the Autonomous Systems Laboratory and the Center for Automotive Research at Stanford. He also serves as Director of Autonomous Vehicle Research at NVIDIA. Before joining Stanford, he was a Research Technologist within the Robotics Section at the NASA Jet Propulsion Laboratory. He received a Ph.D. degree in Aeronautics and Astronautics from the Massachusetts Institute of Technology in 2010. His main research interests are in the development of methodologies for the analysis, design, and control of autonomous systems, with an emphasis on self-driving cars, autonomous aerospace vehicles, and future mobility systems. He is a recipient of a number of awards, including a Presidential Early Career Award for Scientists and Engineers from President Barack Obama, an Office of Naval Research Young Investigator Award, a National Science Foundation Early Career (CAREER) Award, a NASA Early Career Faculty Award, and an Early-Career Spotlight Award from the Robotics Science and Systems Foundation. He was identified by the American Society for Engineering Education (ASEE) as one of America's 20 most highly promising investigators under the age of 40.
Audio-visual learning in 3D environments
20 April 2023: Kristen Grauman (UT Austin)
Abstract: Perception systems that can both see and hear have great potential to unlock problems in video understanding, augmented reality, and embodied AI. I will present our recent work in egocentric audio-visual (AV) perception. First, we explore how audio’s spatial signals can augment visual understanding of 3D environments. This includes ideas for self-supervised feature learning from echoes, AV floorplan reconstruction, and active source separation, where an agent intelligently moves to hear things better in a busy environment. Throughout this line of work, we leverage our open-source SoundSpaces platform, which allows state-of-the-art rendering of highly realistic audio in real-world scanned environments. Next, building on these spatial AV and scene acoustics ideas, we introduce new ways to enhance the audio stream – making it possible to transport a sound to a new physical environment observed in a photo, or to dereverberate speech so it is intelligible for machine and human ears alike.
Biography: Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Director in FAIR at Meta. Her research in computer vision and machine learning focuses on visual recognition, video, and embodied perception. Before joining UT-Austin in 2007, she received her Ph.D. at MIT and B.A. from Boston College. She is an IEEE Fellow, AAAI Fellow, Sloan Fellow, and recipient of the 2013 Computers and Thought Award. She and her collaborators have been recognized with several Best Paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test of time award). She has served as Associate Editor-in-Chief for PAMI and Program Chair of CVPR 2015, NeurIPS 2018, and ICCV 2023.
Collaborative Robotics: From Dexterity to Teammate Prediction
6 April 2023: Monroe Kennedy (Stanford University)
Abstract: There have been amazing advances in the field of robotic autonomy over the past few decades; we’ve seen robots move from factory floors to working examples of autonomous vehicles on public roads. However, there are still notable barriers to having robots in everyday environments performing functional tasks. These barriers range from current robots having limited dexterity when it comes to manipulating objects that humans can handle with minimal difficulty, to the challenges that arise when a robot is supposed to work alongside a human to complete a task or enable a human to perform a task better. In this talk, we will address the catalyst problem of robotic autonomy in human environments by addressing robotic dexterity and the necessity of improving the sense of robotic touch. We will then look at how robots can become effective teammates by modeling the behavior of their collaborators in the context of the task or environment, which enables them to predict future behavior and take collaborative actions.
Biography: Monroe Kennedy is an assistant professor in Mechanical Engineering and by courtesy, Computer Science at Stanford University. Monroe is the recipient of the NSF Faculty Early Career Award. He directs the Assistive Robotics and Manipulation Laboratory (ARMLab), where the focus is on developing collaborative, autonomous robots capable of performing dexterous, complex tasks with human and robotic teammates. Monroe received his Ph.D. in Mechanical Engineering and Applied Mechanics and master’s in Robotics from the University of Pennsylvania.
Polyglot Robots: Versatile Goal-Based Task Specification for Robot Learning
16 March 2023: Dinesh Jayaraman (University of Pennsylvania)
Abstract: An important goal of the field of sensorimotor robot learning is to do away with cumbersome expertise-intensive task specification, so that general-purpose robots of the future might learn large numbers of new skills. In this talk, I will discuss our recent work on algorithms that exploit goals as a versatile and accessible task specification interface. Goals might be specified through images, language, or physical objects, and may either be provided by a layperson or even discovered autonomously by a robot exploring its environment. I will show how unsupervised learning from large human action datasets can train goal-conditioned value functions for robots, how learned verification behaviors can in turn help to evaluate and acquire new skills, and how careful model-based reasoning can help a robot discover interesting goal-based tasks in an environment with no supervision.
Biography: Dinesh Jayaraman is an assistant professor at the University of Pennsylvania's CIS department and GRASP lab. His research group focuses on developing machine learning algorithms that attend to task-relevant components in each stage of the embodied perception-action loop: observation, representation, and action. Dinesh's research has received an NSF CAREER award '23, Best Paper Award at CORL '22, AAAI New Faculty Highlights award '21, an Amazon Research Award '21, a Best Paper Runner-Up Award at ICRA '18, a Best Application Paper Award at ACCV '16, and been featured on the cover page of Science Robotics and in several press outlets.
Scaling Robot Learning for Long-Horizon Manipulation Tasks
2 March 2023: Jeannette Bohg (Stanford University)
Abstract: My long-term research goal is to enable real robots to manipulate any kind of object such that they can perform many different tasks in a wide variety of application scenarios such as in our homes, in hospitals, warehouses, or factories. Many of these tasks will require long-horizon reasoning and sequencing of skills to achieve a goal state. While learning approaches promise generalization beyond what the robot has seen during training, they require large-scale data collection, a challenge when operating on real robots and specifically for long-horizon tasks. In this talk, I will present our work on enabling long-horizon reasoning on real robots for a variety of different long-horizon tasks that can be solved by sequencing a large variety of composable skill primitives. We approach this problem from many different angles such as (i) using large-scale, language-annotated video datasets as a cheap data source for skill learning; (ii) sequencing these learned skill primitives to resolve geometric dependencies prevalent in long-horizon tasks; (iii) learning grounded predicates, thereby enabling closed-loop, symbolic task planning.
Biography: Jeannette Bohg is an Assistant Professor of Computer Science at Stanford University. She was a group leader at the Autonomous Motion Department (AMD) of the MPI for Intelligent Systems until September 2017. Before joining AMD in January 2012, Jeannette Bohg was a PhD student at the Division of Robotics, Perception and Learning (RPL) at KTH in Stockholm. In her thesis, she proposed novel methods towards multi-modal scene understanding for robotic grasping. She also studied at Chalmers in Gothenburg and at the Technical University in Dresden where she received her Master in Art and Technology and her Diploma in Computer Science, respectively. Her research focuses on perception and learning for autonomous robotic manipulation and grasping. She is specifically interested in developing methods that are goal-directed, real-time and multi-modal such that they can provide meaningful feedback for execution and learning. Jeannette Bohg has received several Early Career and Best Paper awards, most notably the 2019 IEEE Robotics and Automation Society Early Career Award and the 2020 Robotics: Science and Systems Early Career Award.
Learning to Explain and Explaining to Learn
16 February 2023: Graham Neubig (Carnegie Mellon University)
Abstract: Being able to explain language model predictions is essential for many different reasons: determining the trustworthiness of a system, identifying and rectifying model failures, or justifying predictions to stakeholders. Because of this, there have been an extremely large number of techniques proposed to explain complicated ML models, especially neural network based ones. However, how can we determine whether one explanation technique is better than the other? Most works do so by small-scale qualitative analysis, which is expensive, not reproducible, and subjective. In this talk I present a technique for evaluating model explanations based on how well the explanations allow another agent to mimic the model of interest's predictions. I further describe a learning algorithm that can be used to learn better explanations that maximize this evaluation metric.
Biography: Graham Neubig is an associate professor at the Language Technologies Institute of Carnegie Mellon University and CEO of Inspired Cognition. His research focuses on natural language processing, in particular multilingual NLP, natural language interfaces to computers, and machine learning methods for NLP system building and evaluation. His final goal is that every person in the world should be able to communicate with each other, and with computers, in their own language. He also contributes to making NLP research more accessible through open publishing of research papers, advanced NLP course materials and video lectures, and open-source software, all of which are available on his web site.
Building Blocks of Generalizable Autonomy: Duality of Discovery and Bias
12 January 2023: Animesh Garg (Georgia Tech)
Abstract: Generalization in embodied intelligence, such as in robotics, requires interactive learning across families of tasks, which is essential for discovering efficient representation and inference mechanisms. Current systems need a lot of hand-holding to even learn a single cognitive concept or a dexterous skill, say "open a door", let alone generalize to new windows and cupboards! This is far from our vision of everyday robots! It would require a broader concept of generalization and continual updating of representations. This study of the science of embodied AI opens three key questions: (a) representational biases and causal inference for interactive decision-making, (b) perceptual representations learned by and for interaction, and (c) systems and abstractions for scalable learning. This talk will focus on the notions of structure in Embodied AI for both perception and decision-making, uncovering the many facets of inductive biases in off-policy reinforcement learning in robotics. We will first talk about the need for and kinds of structure for perception in robotics; thereafter, we will discuss the existence of structure in different aspects of decision-making with RL. Finally, I will propose a framework for generalization through the separation of the 'what' and 'how' of skills in embodied domains.
Biography: Animesh Garg is a Stephen Fleming Early Career Professor at the School of Interactive Computing at Georgia Tech. He leads the People, AI, and Robotics (PAIR) research group. He is on the core faculty in the Robotics and Machine Learning programs. Animesh is also a Senior Researcher at Nvidia Research. Animesh earned a Ph.D. from UC Berkeley and was a postdoc at the Stanford AI Lab. He is on leave from the department of Computer Science at the University of Toronto and the CIFAR Chair position at the Vector Institute. His work aims to build Generalizable Autonomy which involves a confluence of representations and algorithms for reinforcement learning, control, and perception. He currently studies three aspects: learning structured inductive biases in sequential decision-making, using data-driven causal discovery, and transfer to real robots — all in the purview of embodied systems.
The Data Pyramid for Building Generalist Agents
8 December 2022: Yuke Zhu (UT Austin)
Abstract: Recent advances in AI and Machine Learning have made great strides in developing robust and adaptive agents in the real world. Nonetheless, unlike the recent remarkable multi-task consolidations in Natural Language Processing and Computer Vision, today’s Embodied AI research has mainly focused on building siloed systems for narrow tasks. We argue that the crux of building generalist agents is harnessing massive, diverse, and multimodal data altogether. This talk will examine various sources of training data available for training embodied agents, from Internet-scale corpora to task demonstrations. We will discuss the complementary values and limitations of these data in a pyramid structure and introduce our recent efforts in building generalist agents with this data pyramid.
Biography: Yuke Zhu is an Assistant Professor in the Computer Science department of UT-Austin, where he directs the Robot Perception and Learning Lab. He is also a core faculty member at Texas Robotics and a senior research scientist at NVIDIA. His research lies at the intersection of robotics, machine learning, and computer vision. He received his Master's and Ph.D. degrees from Stanford University. His research has won several awards and nominations, including the Best Conference Paper Award at ICRA 2019, Outstanding Learning Paper at ICRA 2022, Outstanding Paper at NeurIPS 2022, and Best Paper Finalist at IROS 2019 and 2021. He is the recipient of the NSF CAREER Award and Amazon Research Awards.
Understanding the Visual World Through Naturally Supervised Code
4 October 2022: Jiajun Wu (Stanford University)
Abstract: Much of our visual world is inherently symbolic: scenes are made of multiple identical objects; different objects may have the same color or material, with a regular layout; each object can be symmetric and have repetitive parts. How can we infer, represent, and use such symbolic structure from raw data, without hampering the expressiveness of neural networks? In this talk, I will demonstrate that symbolic structure, or code, can be learned from natural supervision. Such supervision can be from pixels, where neuro-symbolic methods automatically discover repetitive parts and objects for scene synthesis. It can also be from objects, where humans during fabrication introduce priors that can be leveraged by machines to infer regular intrinsics such as texture and material. When solving these problems, symbolic programs and neural nets play complementary roles: it is more data-efficient to learn with symbolic programs, and they generalize better to new scenarios with robustly captured high-level information; neural nets effectively extract complex, low-level features from cluttered and noisy visual data.
Biography: Jiajun Wu is an Assistant Professor of Computer Science at Stanford University, working on computer vision, machine learning, and computational cognitive science. Before joining Stanford, he was a Visiting Faculty Researcher at Google Research. He received his PhD in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology. Wu's research has been recognized through the ACM Doctoral Dissertation Award Honorable Mention, the AAAI/ACM SIGAI Doctoral Dissertation Award, the MIT George M. Sprowls PhD Thesis Award in Artificial Intelligence and Decision-Making, the 2020 Samsung AI Researcher of the Year, the IROS Best Paper Award on Cognitive Robotics, and faculty research awards and graduate fellowships from JPMC, Samsung, Amazon, Meta, Nvidia, and Adobe.
From Physical Reasoning to Image-based Reactive Manipulation
29 September 2022: Marc Toussaint (TU Berlin)
Abstract: Our work on Task and Motion Planning (TAMP) in recent years has provided some insights into how complex manipulation problems could in principle be tackled. However, it also raises classical questions: 1) How to reactively control and execute robotic manipulation? 2) How to realize physical reasoning and manipulation planning based on perception rather than a known mesh-based scene? 3) How to leverage learning from a model-based solver? I will discuss our team's recent research on these questions.
Biography: Marc Toussaint has been professor for Intelligent Systems at TU Berlin since March 2020 and was Max Planck Fellow at the MPI for Intelligent Systems 2018-21. In 2017/18 he spent a year as a visiting scholar at MIT, before that some months with Amazon Robotics, and had been professor for Machine Learning and Robotics at the University of Stuttgart since 2012. In his view, a key in understanding and creating strongly generalizing intelligence is the integration of learning and reasoning. His research combines AI planning, optimization, and machine learning to tackle fundamental problems in robotics. His work was awarded best paper at RSS'18 and ICMLA'07, and runner-up at RSS'12 and UAI'08.
Multi-Sensor Robot Navigation and Subterranean Exploration
2 June 2022: Maurice Fallon (University of Oxford)
Abstract: In this talk I will overview the work of my research group, the Dynamic Robot Systems Group. I will focus on multi-sensor state estimation and 3D mapping to enable robots to navigate and explore dirty, dark and dusty environments, with an emphasis on underground exploration with quadrupeds. This multitude of sensor signals needs to be fused efficiently and in real-time to enable autonomy. Much of the work will be presented in the context of the DARPA SubT Challenge (Team Cerberus) and the THING EU project. I will also describe our work on trajectory optimization for dynamic motion planning and the use of learning to bootstrap replanning.
Biography: Maurice Fallon is an Associate Professor and Royal Society University Research Fellow at the University of Oxford. His research is focused on probabilistic methods for localization and mapping. He has also made research contributions to state estimation for legged robots and is interested in dynamic motion planning and control. His PhD was from the University of Cambridge. He worked as a postdoc in Prof. John Leonard's Marine Robotics Group at MIT from 2008 before leading the perception part of MIT's entry in the DARPA Robotics Challenge. He has worked in domains as diverse as marine robots detecting mines, humanoid robotics and mapping radiation in nuclear facilities.
Building Embodied Autonomous Agents
5 May 2022: Ruslan Salakhutdinov (CMU)
Abstract: In this talk I will show how we can design modular agents for visual navigation that can perform tasks specified by natural language instructions, perform efficient exploration and long-term planning, and build and utilize 3D semantic maps, while generalizing across domains and tasks. Specifically, I will first introduce a novel framework called Self-supervised Embodied Active Learning (SEAL) that builds and utilizes 3D semantic maps to learn both action and perception in a self-supervised manner. I will show that the SEAL framework can be used to close the action-perception loop: it improves object detection and instance segmentation performance of a pretrained perception model by moving around in training environments, while the improved perception model can be used to improve on object goal navigation tasks. I will next introduce a novel embodied instruction-following method that uses structured representations to build a semantic map of the scene and perform exploration with a semantic search policy to achieve the natural language goal, achieving SOTA performance on the challenging ALFRED benchmark. I will show that an explicit spatial memory and a semantic search policy can provide a stronger and more general representation for state-tracking and guidance, even in the absence of expert trajectories or low-level instructions.
Biography: Russ Salakhutdinov is a UPMC Professor of Computer Science in the Department of Machine Learning at CMU. He received his PhD in computer science from the University of Toronto. After spending two post-doctoral years at MIT, he joined the University of Toronto and later moved to CMU. Russ's primary interests lie in deep learning, machine learning, and large-scale optimization. He is an action editor of the Journal of Machine Learning Research, served on the senior programme committee of several top-tier learning conferences including NIPS and ICML, and was a program co-chair for ICML 2019. He is an Alfred P. Sloan Research Fellow, Microsoft Research Faculty Fellow, Canada Research Chair in Statistical Machine Learning, a recipient of the Early Researcher Award, Google Faculty Award, and Nvidia's Pioneers of AI award.
Large Language Models: Will they keep getting bigger? And, how will we use them if they do?
28 April 2022: Luke Zettlemoyer (UW)
Abstract: The trend of building ever larger language models has dominated much research in NLP over the last few years. In this talk, I will discuss our recent efforts to (at least partially) answer two key questions in this area: Will we be able to keep scaling? And, how will we actually use the models, if we do? I will cover our recent efforts on learning new types of sparse mixture-of-experts (MoE) models. Unlike model-parallel algorithms for learning dense models, which are very difficult to further scale with existing hardware, our sparse approaches have significantly reduced cross-node communication costs and could possibly provide the next big leap in performance, although finding a version that scales well in practice remains an open challenge. I will also present our recent work on prompting language models that better controls for surface form variation, to improve performance of models that are so big we can only afford to do inference, with little to no task-specific fine-tuning. Finally, time permitting, I will discuss work on new forms of supervision for language model training, including learning from the hypertext and multi-modal structure of web pages to provide new signals for both learning and prompting the model. Together, these methods present our best guesses for how to keep the scaling trend alive as we move forward to the next generation of NLP models. This talk describes work done at the University of Washington and Meta, primarily led by Armen Aghajanyan, Ari Holtzmann, Mike Lewis, Sewon Min, and Peter West.
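As context for the sparse mixture-of-experts discussion above, here is a minimal, illustrative sketch of top-1 expert routing in JAX. It is not the specific architecture from the talk; the names (`moe_layer`, the router and expert shapes) are hypothetical, and a real sparse MoE evaluates only each token's selected expert and shards experts across devices, rather than computing all experts as this toy version does.

```python
# Illustrative top-1 sparse mixture-of-experts (MoE) layer; not the talk's models.
# Each token is routed to one expert by a learned router. For clarity this toy
# version evaluates every expert and masks the results.
import jax
import jax.numpy as jnp

def moe_layer(params, x):
    # x: (tokens, d); params["router"]: (d, n_experts); params["experts"]: (n_experts, d, d)
    logits = x @ params["router"]                       # (tokens, n_experts)
    expert_id = jnp.argmax(logits, axis=-1)             # top-1 routing decision
    onehot = jax.nn.one_hot(expert_id, logits.shape[-1])
    gate = jnp.sum(jax.nn.softmax(logits, axis=-1) * onehot, axis=-1, keepdims=True)
    all_out = jnp.einsum("td,edh->teh", x, params["experts"])   # every expert (toy)
    out = jnp.einsum("te,teh->th", onehot, all_out)             # keep the routed output
    return gate * out

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
d, n_experts, tokens = 8, 4, 16
params = {
    "router": jax.random.normal(k1, (d, n_experts)),
    "experts": jax.random.normal(k2, (n_experts, d, d)) / jnp.sqrt(d),
}
x = jax.random.normal(k3, (tokens, d))
print(moe_layer(params, x).shape)  # (16, 8)
```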
Biography: Luke Zettlemoyer is a Professor in the Paul G. Allen School of Computer Science and Engineering at the University of Washington, and a Research Scientist at Meta. His research focuses on empirical methods for natural language semantics, and involves designing machine learning algorithms, introducing new tasks and datasets, and, most recently, studying how to best develop self-supervision signals for pre-training. His honors include being named an ACL Fellow as well as winning a PECASE award, an Allen Distinguished Investigator award, and multiple best paper awards. Luke received his PhD from MIT and was a postdoc at the University of Edinburgh.
Tuning GPT-3 on a Single GPU via Zero-Shot Hyperparameter Transfer
21 April 2022: Greg Yang (Microsoft Research)
Abstract: You can’t train GPT-3 on a single GPU, much less tune its hyperparameters (HPs)…or so it seems. I’m here to tell you this is not true: you *can* tune its HPs on a single GPU even if you can’t train it that way! In the first half of this talk, I’ll describe how, in the so-called maximal update parametrization (abbreviated µP), narrow and wide neural networks share the same set of optimal HPs. This lets us tune any large model by just tuning a small version of it — we call this µTransfer. In particular, this allowed us to tune the 6.7 billion parameter version of GPT-3 using only 7% of its pretraining compute budget, and, with some asterisks, we get a performance comparable to the original GPT-3 model with twice the parameter count. In the second half of this talk, I’ll discuss the theoretical reason µP has this special property and the connection to the study of infinite-width neural networks and, more generally, the theory of Tensor Programs. The first half will target general practitioners or empirical researchers in machine learning, while the second half targets those who are more theoretically curious.
Biography: Greg Yang is a researcher at Microsoft Research in Redmond, Washington. He joined MSR after obtaining his Bachelor's degree in Mathematics and Master's degree in Computer Science from Harvard University, advised respectively by ST Yau and Alexander Rush. He won the Hoopes Prize at Harvard for best undergraduate thesis as well as Honorable Mention for the AMS-MAA-SIAM Morgan Prize, the highest honor in the world for an undergraduate in mathematics. He gave an invited talk at the International Congress of Chinese Mathematicians 2019.
Learning Controllers - from engineering to AGI
14 April 2022: Martin Riedmiller (DeepMind)
Note: (*Thursday at 10am on Zoom*)
Abstract: Being able to autonomously learn control with minimal prior knowledge is a key ability of intelligent systems. Based on our recent work on magnetic confinement control of fusion plasma, I will show how deep RL methods can serve as an alternative to classical controller design for dynamical systems. A particular challenge in real-world control scenarios is finding methods that are at the same time highly data-efficient and general, since data collection on real systems is time-intensive and often expensive. I will discuss the collect-and-infer paradigm for Reinforcement Learning, which takes a fresh look at data collection and exploitation in data-efficient agents. I will give examples of agent designs that can learn increasingly complex tasks from scratch in simulation and reality.
Biography: Martin Riedmiller is a research scientist and team lead at DeepMind, London. Before joining DeepMind full-time in spring 2015, he held several professor positions in machine learning and neuro-informatics from 2002 to 2015 at Dortmund, Osnabrück and Freiburg University. From 1998 to 2009 he led the robot soccer team ‘Brainstormers’ that participated in the internationally renowned RoboCup competitions. As an early proof of the power of neural reinforcement learning techniques, the Brainstormers won the world championship five times in both simulation and real robot leagues. He has contributed for over 20 years to the fields of reinforcement learning, neural networks and learning control systems. He is author and co-author of some of the early, foundational work on efficient and robust supervised learning and reinforcement learning algorithms, including work on one of the first deep reinforcement learning systems.
Recent Advances in Deep Equilibrium Models
7 April 2022: Zico Kolter (CMU)
Abstract: Deep Equilibrium Models (DEQs) are a class of machine learning models that compute the fixed point of a single nonlinear operator in lieu of a traditional multi-layer network. The resulting models are conceptually simpler than traditional models and benefit from smaller memory requirements due to implicit differentiation, yet often outperform similarly sized feedforward networks. This talk will provide a broad overview of our work in DEQ models, with a focus on some of our recent advances on the topic. I will cover the basic foundations of the method and approaches to multi-scale modeling. I will then discuss several recent advances in training methods and solvers, plus applications of the methods to domains such as input-optimization in deep networks, implicit layers within neural fields, and methods for improved optical flow.
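To make the fixed-point idea concrete, here is a minimal, illustrative sketch of a DEQ-style forward pass; the function and parameter names are hypothetical and not taken from the talk or the authors' code. It iterates a single shared nonlinear operator to its fixed point. Practical DEQs use faster root solvers and differentiate through the equilibrium implicitly rather than unrolling the iterations, which is where the memory savings come from.

```python
# Minimal DEQ-style forward pass: iterate one shared operator f(z, x) to a fixed
# point z* = f(z*, x) instead of stacking many distinct layers. Illustrative only.
import jax
import jax.numpy as jnp

def f(z, x, W, U, b):
    # One shared "layer", applied repeatedly.
    return jnp.tanh(W @ z + U @ x + b)

def deq_forward(x, W, U, b, n_iters=50):
    # Plain fixed-point iteration; real DEQs use faster root solvers
    # (e.g. Broyden's method or Anderson acceleration).
    z = jnp.zeros(W.shape[0])
    for _ in range(n_iters):
        z = f(z, x, W, U, b)
    return z

key = jax.random.PRNGKey(0)
kW, kU, kx = jax.random.split(key, 3)
d = 16
W = 0.1 * jax.random.normal(kW, (d, d))   # small weights keep the iteration contractive
U = jax.random.normal(kU, (d, d))
b = jnp.zeros(d)
x = jax.random.normal(kx, (d,))

z_star = deq_forward(x, W, U, b)
print(jnp.linalg.norm(z_star - f(z_star, x, W, U, b)))  # ~0 at the fixed point
```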
Biography: Zico Kolter is an Associate Professor in the Computer Science Department at Carnegie Mellon University, and also serves as chief scientist of AI research for the Bosch Center for Artificial Intelligence. His work spans the intersection of machine learning and optimization, with a large focus on developing more robust and rigorous methods in deep learning. In addition, he has worked in a number of application areas, highlighted by work on sustainability and smart energy systems. He is a recipient of the DARPA Young Faculty Award, a Sloan Fellowship, and best paper awards at NeurIPS, ICML (honorable mention), AISTATS (Test of Time), IJCAI, KDD, and PESGM.
The Deep Learning Toolbox: from AlphaFold to AlphaCode
28 March 2022: Oriol Vinyals (DeepMind)
Note: (*Monday at 4pm in 34-101*)
Abstract: The capabilities of the 'Deep Learning Toolbox' have expanded through research advances in many AI sub-fields such as Computer Vision or Natural Language Processing. In this talk I will share how this fuelled two recent breakthroughs: AlphaFold, solving the long-standing problem of protein folding in biology, and AlphaCode, creating an agent that is human-level in online coding competitions.
Biography: Oriol Vinyals is a Principal Scientist at DeepMind, and a team lead of the Deep Learning group. His work focuses on Deep Learning and Artificial Intelligence. Prior to joining DeepMind, Oriol was part of the Google Brain team. He holds a Ph.D. in EECS from the University of California, Berkeley and is a recipient of the 2016 MIT TR35 innovator award.
The Past, Present and Future of SLAM
10 March 2022: John Leonard (MIT)
Abstract: Simultaneous localization and mapping (SLAM) is the process of constructing a global model from local observations, acquired as a mobile robot moves through an environment. SLAM is a foundational capability for mobile robots, supporting such core functions as planning, navigation, and control, for a wide range of application domains. SLAM is one of the most deeply investigated fields in mobile robotics research, yet many open questions remain to enable the realization of robust, long-term autonomy. This talk will review the historical development of SLAM and will describe several current research projects in our group. Two key themes are increasing the expressive capacity of the environmental models used in SLAM systems (representation) and improving the performance of the algorithms used to estimate these models from data (inference). Our ultimate goal is to provide autonomous robots with a more comprehensive understanding of the world, facilitating life-long learning in complex dynamic environments.
Biography: John J. Leonard is the Samuel C. Collins Professor of Mechanical and Ocean Engineering in the MIT Department of Mechanical Engineering. He is also a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). His research addresses the problems of navigation and mapping for autonomous mobile robots. He holds the degrees of B.S.E.E. in Electrical Engineering and Science from the University of Pennsylvania (1987) and D.Phil. in Engineering Science from the University of Oxford (1994). He is an IEEE Fellow (2014) and an AAAS Fellow (2020). Prof. Leonard is also a Technical Advisor at Toyota Research Institute.
From First-Person Video to Agent Action
2 December 2021: Kristen Grauman (UT Austin / FAIR)
Abstract: First-person or "egocentric" perception requires understanding the video that streams to a person’s or robot’s wearable camera. The egocentric view offers a special window into the camera wearer’s attention, goals, and interactions with people and objects in the environment, making it an exciting avenue for perception in augmented reality and robot learning. I will present our work on first-person video understanding, and show our progress using passive observations of human activity to inform active robot behaviors. First, we explore learning visual affordances to anticipate how objects and spaces can be used. We show how to transform egocentric video into a human-centric topological map of a physical space (such as a kitchen) that captures its primary zones of interaction and the activities they support. Moving down to the object level, we develop video anticipation models that localize interaction “hotspots” indicating how/where an object can be manipulated (e.g., pressable, toggleable, etc.). Towards translating these affordances into robot action, we prime reinforcement learning agents to prefer human-like interactions, thereby accelerating their task learning. Finally, I will overview Ego4D, a massive new egocentric video dataset and benchmark built by a multi-institution collaboration.
Biography: Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Director in Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on visual recognition, video, and embodied perception. Before joining UT-Austin in 2007, she received her Ph.D. at MIT. She is an IEEE Fellow, AAAI Fellow, Sloan Fellow, and recipient of the 2013 Computers and Thought Award. She and her collaborators have been recognized with several Best Paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test of time award). She has served as Associate Editor-in-Chief for PAMI and Program Chair of CVPR 2015 and NeurIPS 2018.
Ask Me Anything
18 November 2021: Leslie Pack Kaelbling (MIT)
Abstract: Ask Me Anything with Leslie Pack Kaelbling, professor at MIT CSAIL working in ML and planning for robotics.
Biography: Leslie is a Professor at MIT. She has an undergraduate degree in Philosophy and a PhD in Computer Science from Stanford, and was previously on the faculty at Brown University. She was the founding editor-in-chief of the Journal of Machine Learning Research. Her goal is to make robots that are as smart as you are.
Data Augmentation for Image-Based Reinforcement Learning
10 November 2021: Rob Fergus (NYU)
Note: (*Wednesday at 4pm in Star*)
Abstract: My talk consists of two parts. The first presents a model-free reinforcement learning (RL) algorithm for visual continuous control that relies on data augmentation to learn directly from pixels. The approach yields state-of-the-art results on the DeepMind Control Suite and is able to solve complex humanoid locomotion, previously unattained by model-free RL methods. The approach is conceptually simple and only requires ~8 hours to train on a single GPU. The second part of the talk describes a self-supervised framework that ties representation learning with exploration through prototypical representations for model-free RL. These prototypes simultaneously serve as a summarization of the exploratory experience of an agent as well as a basis for representing observations. We pre-train these task-agnostic representations and prototypes on environments without downstream task information. This enables state-of-the-art downstream policy learning on a set of difficult continuous control tasks.
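For readers unfamiliar with this family of methods, below is a minimal, illustrative sketch of the kind of random-shift (pad-and-crop) image augmentation commonly used in pixel-based RL; the exact augmentation and hyperparameters in the work described above may differ, and the function names here are hypothetical.

```python
# Random-shift augmentation for image observations: pad the frame, then take a
# random crop back to the original size. Illustrative sketch only.
import jax
import jax.numpy as jnp

def random_shift(key, obs, pad=4):
    # obs: (H, W, C) image observation.
    h, w, c = obs.shape
    padded = jnp.pad(obs, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    ky, kx = jax.random.split(key)
    top = jax.random.randint(ky, (), 0, 2 * pad + 1)
    left = jax.random.randint(kx, (), 0, 2 * pad + 1)
    return jax.lax.dynamic_slice(padded, (top, left, 0), (h, w, c))

key = jax.random.PRNGKey(0)
obs = jax.random.uniform(key, (84, 84, 3))      # stand-in for a rendered frame
aug = random_shift(jax.random.PRNGKey(1), obs)
print(obs.shape, aug.shape)                     # both (84, 84, 3)
```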
Biography: Rob Fergus is a Professor of Computer Science at the Courant Institute of Mathematical Sciences, New York University. He is also a Research Scientist at DeepMind New York. Previously, he co-founded Facebook AI Research. He received a Master's in Electrical Engineering with Prof. Pietro Perona at Caltech, before completing a PhD with Prof. Andrew Zisserman at the University of Oxford. Before joining NYU, he spent two years as a postdoc in the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. William Freeman.
Learning for Future Mobility: Uncovering Requirements and Addressing Scalability
28 October 2021: Cathy Wu (MIT)
Abstract: Technological and global trends are increasing the pace of change in mobility and infrastructure systems. At the same time, infrastructure decisions are long-lasting and often effectively permanent. Machine learning is emerging as a promising methodology to aid in the design of future infrastructure systems, due to its potential to analyze complex multi-agent dynamical systems. This talk first presents recent work on the use of deep reinforcement learning to uncover prospective engineering requirements for future mobility systems, particularly in the context of human-compatible Lagrangian traffic flow optimization. Then, several recent results are presented, which demonstrate the potential for machine learning to scale up the analysis of mobility systems, namely for multi-agent systems in grid networks and vehicle routing problems.
Biography: Cathy Wu is an Assistant Professor at MIT in LIDS, CEE, and IDSS. She holds a PhD from UC Berkeley and B.S. and M.Eng. degrees from MIT, all in EECS, and completed a postdoc at Microsoft Research. She studies the role of machine learning and computation in the design of future mobility systems. Her interests include reinforcement learning, autonomy, multi-agent dynamical systems, and network optimization. Her work has been acknowledged by several awards, including the 2019 IEEE ITSS Best Ph.D. Dissertation Award, 2019 Microsoft Location Summit Hall of Fame, 2018 Milton Pikarsky Memorial Dissertation Award, the 2016 IEEE ITSC Best Paper Award, and numerous fellowships, and her work has appeared in the press, including Wired and Science Magazine.
JAX: accelerated machine learning research via composable function transformations in Python
21 October 2021: Matthew Johnson (Google)
Abstract: This talk is about JAX, a system for high-performance machine learning research and numerical computing. It offers the familiarity of Python+NumPy together with hardware acceleration. JAX combines these features with user-wielded function transformations, including automatic differentiation, automatic vectorized batching, end-to-end compilation (via XLA), parallelizing over multiple accelerators, and more. Composing these transformations is the key to JAX's power and simplicity.
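As a concrete, self-contained illustration of the composability described above (not code from the talk), the sketch below composes `jax.grad`, `jax.vmap`, and `jax.jit` on a toy squared-error loss; the function and variable names are ours.

```python
# Composing JAX transformations: differentiate a loss, vectorize it over a batch
# of weight vectors, and compile the result end-to-end with XLA.
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Squared-error loss for a linear model.
    return jnp.mean((x @ w - y) ** 2)

grad_loss = jax.grad(loss)                                    # d loss / d w
batched_grads = jax.vmap(grad_loss, in_axes=(0, None, None))  # over many w's at once
fast_batched_grads = jax.jit(batched_grads)                   # compile via XLA

key = jax.random.PRNGKey(0)
kw, kx, ky = jax.random.split(key, 3)
ws = jax.random.normal(kw, (8, 3))   # 8 candidate weight vectors
x = jax.random.normal(kx, (32, 3))   # 32 data points, 3 features
y = jax.random.normal(ky, (32,))

print(fast_batched_grads(ws, x, y).shape)  # (8, 3): one gradient per weight vector
```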
Biography: Matt Johnson is a research scientist at Google Brain interested in software systems powering machine learning research. He's the tech lead for JAX. When moonlighting as a machine learning researcher, he works on making neural ODEs faster to solve, automatically exploiting conjugacy in probabilistic programs, and composing graphical models with neural networks. Matt was a postdoc with Ryan Adams at the Harvard Intelligent Probabilistic Systems Group and Bob Datta in the Datta Lab at the Harvard Medical School. His Ph.D. is from MIT in EECS, where he worked with Alan Willsky at LIDS on Bayesian time series models and scalable inference.
Recent papers in Embodied Intelligence III
14 October 2021: EI Submissions (MIT)
Abstract: Presentations on the following recent papers in EI: "A large-scale benchmark for few-shot program induction and synthesis", "Noether networks: meta-learning useful conserved quantities", "PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception", "Learning Evolved Combinatorial Symbols with a Neuro-symbolic Generative Model", "AST: Audio Spectrogram Transformer", "Set-based State Estimation with Probabilistic Consistency Guarantee under Epistemic Uncertainty", "PARP for Self-Supervised Speech Recognition and Speech Synthesis", "AutoOED: Automated Optimal Experiment Design Platform", "An End-to-End Differentiable Framework For Contact-Aware Robot Design", "What Context Features Can Transformer Language Models Use?", "Temporal and Object Quantification Networks", "Plannable Skills from Rational Demonstrations".
Biography: Presentations by: Javier Lopez-Contreras, Dylan Doblar, Aviv Netanyahu, Matthias Hofer, Yuan Gong, Shen Li, Jeff Lai, Yunsheng Tian, Jie Xu, Joe O'Connor, Zhezheng Luo
Enabling Cooperative Deep-Sea Exploration through Hybrid Task and Motion Planning
7 October 2021: Brian Williams (MIT)
Abstract: The state-of-the-art practice in robotic mission planning is to script fixed behavior sequences manually, which are comprised of individual behaviors that are hand coded or generated by solving tailored trajectory optimization problems. This paradigm is insufficient, for example, to autonomously explore a deep-sea volcano on Earth or Europa. Instead, to execute a sequence efficiently and robustly, the trajectories a robot plans should make tradeoffs across behaviors, such as timing, resource usage and pre-positioning, rather than implementing trajectories for each behavior myopically. Further, in order for robots to adapt to truly novel situations, they need the ability to plan new behavior sequences on the fly, and then generate their corresponding trajectories. This process is called hybrid activity planning. Finally, in cases where the conditions and effects of individual behaviors are tightly coupled across temporal, state and control variables, activity planning and trajectory planning should be considered together. To be effective, hybrid planners should carefully balance the tradeoff of modeling robot dynamics accurately, against the need to plan for multiple robots across time horizons on the order of days and weeks. Hybrid activity and trajectory planners that employ mixed integer programming within a discrete time formulation are able to model dynamics with sufficient accuracy; however, they tend to be restricted to short horizons. On the other hand, current hybrid activity planners handle longer horizons by employing a continuous time formulation, but do not support trajectory optimization, and are limited to linear constraints. In this lecture we present Scotty, a hybrid activity and trajectory planner that is able to plan over long horizons accurately, by unifying methods from convex optimization with methods for heuristic forward search based on relaxed plan graphs. We demonstrate the effectiveness of Scotty on scenarios drawn from our November 2019 campaign to explore the Columbo deep sea volcano in the Mediterranean.
Biography: Professor Brian Williams received his S.B., S.M. and Ph.D. from MIT in Computer Science and Electrical Engineering in 1989. He helped pioneer the fields of model-based autonomous systems and diagnosis, at NASA and Xerox PARC through the Sherlock and Livingstone model-based health management systems and the Burton model-based execution system. At the NASA Ames Research Center he formed the Autonomous Systems Branch and co-invented the Remote Agent model-based autonomous control system, receiving a NASA Space Act Award in 1999. He was a member of the NASA Deep Space One probe flight team, which used Remote Agent to create the first fully autonomous, self-repairing explorer, demonstrated in flight in 1999. Recent research focusses on enabling a new paradigm for autonomously monitoring the health of the ocean ecosystem, with recent and planned demonstration campaigns conducted in the East Timor Sea, off Santa Barbara, Hawaii, Costa Rica, in the Mediterranean, off Alaska and at the Great Barrier Reef. He is currently a research affiliate of the Woods Hole Oceanographic Institute, and Chief Scientist of Mobi, which applies similar technologies to transportation fleet management and the hospitality industries. He has received a range of best paper prizes, for his research in human-robotic teamwork, planning with bounded risk, hybrid model learning, constraint reasoning, propositional inference, and qualitative algebras. He was a member of the Tom Young Blue-Ribbon Team in 2000, assessing future Mars missions in light of the Mars Climate Orbiter and Polar Lander incidents, and subsequently the JPL Caltech Advisory Council. He recently served as President for the Council of the International Conference on Automated Planning and Scheduling, as Counselor for AAAI, and on several editorial boards.
Recent papers in Embodied Intelligence II
30 September 2021: EI Submissions (MIT)
Abstract: Presentations on the following recent papers in EI: "Axiomatic Explanations for Visual Search, Retrieval, and Similarity Learning", "HYPER: Learned Hybrid Trajectory Prediction via Factored Inference and Adaptive Sampling", "Closed-form Continuous-depth Models and Sparse Flows: Pruning Continuous-Depth Models", "Leveraging Language to Learn Program Abstractions and Search Heuristics", "NeRFactor: Neural Factorization of Shape and Reflectance Under an Unknown Illumination", "Planning with Latent Language", "Shortest Paths in Graphs of Convex Sets", "Learning to See by Looking at Noise", "Modelling human planning, and attributing intelligence to plans", "A simple method for complex in-hand manipulation".
Biography: Presentations by: Mark Hamilton, Xin Huang, Ramin Hasani, Lucas Liebenwein, Catherine Wong, Xiuming Zhang, Pratyusha Sharma, Tobia Marcucci, Manel Baradad, Marta Kryven, Tao Chen
Recent papers in Embodied Intelligence I
23 September 2021: EI Submissions (MIT)
Abstract: Presentations on the following recent papers in EI: "Interactive Robot Training for non-Markov Tasks", "Long-Horizon Manipulation of Unknown Objects via Task and Motion Planning with Estimated Affordances", "Learning Task Informed Abstractions", "The Low-Dimensional Linear Geometry of Contextualized Word Representations", "Hybrid Memoised Wake-Sleep: Approximate Inference at the Discrete-Continuous Interface", "Advances in Inference and Representation for SLAM", "Causal Navigation by Continuous Time Neural Networks", "Cooperative Learning of Zero-Shot Machine Reading Comprehension", "Consensus-Informed Optimization Over Mixtures for Ambiguity-Aware Object SLAM", "The Neural MMO Platform for Massively Multiagent Research", "Implicit Representations of Meaning in Neural Language Models".
Biography: Presentations by: Ankit Shah, Xiaolin Fang, Aidan Curtis, Xiang Fu, Ge Yang, Evan Hernandez, Tuan Anh Le, Kevin Doherty, Ramin Hasani, Hongyin Luo, Ziqi Lu, Joseph Suarez, Belinda Li
Generalization in Planning and Learning for Robotic Manipulation
17 September 2021: Tomás Lozano-Pérez (MIT)
Note: (*Friday* 3 pm in 34-401 or webcast)
Abstract: An enduring goal of AI and robotics has been to build a robot capable of robustly performing a wide variety of tasks in a wide variety of environments; not by sequentially being programmed (or taught) to perform one task in one environment at a time, but rather by intelligently choosing appropriate actions for whatever task and environment it is facing. This goal remains a challenge. In this talk I'll describe recent work in our lab aimed at the goal of general-purpose robot manipulation by integrating task-and-motion planning with various forms of model learning. In particular, I’ll describe approaches to manipulating objects without prior shape models, to acquiring composable sensorimotor skills, and to exploiting past experience for more efficient planning.
Biography: Tomas Lozano-Perez is a professor in EECS at MIT and a member of CSAIL. He received all his degrees (SB '73, SM '76, PhD '80) from MIT in Computer Science. He was a recipient of the 2011 IEEE Robotics Pioneer Award and a co-recipient of the 2021 IEEE Robotics and Automation Technical Field Award. He is a Fellow of the AAAI, ACM, and IEEE.
Rapid Adaptation in Robot Learning
21 July 2021: Deepak Pathak (CMU)
Abstract: How can we train a robot that can generalize to perform thousands of tasks in thousands of environments? This question underscores the holy grail of robot learning research, which is dominated by learning from demonstrations or reward-based learning. However, it is nearly impossible to supervise an agent, whether via demonstrations or rewards, for all possible situations it is yet to encounter in the future. Hence, we posit that this generalization ability is only possible if the robot can learn continually and adapt rapidly to new situations. The adaptation has to occur online, at a time scale of fractions of a second, which implies that we have no time to carry out multiple experiments in the physical world or to run optimization to estimate various system parameters. In this talk, I will present our early efforts in this direction by decoupling the general goal into two sub-problems: 1) generalization to new environments for the same task and 2) generalization to new tasks in the same environment. I will then discuss how these sub-problems can be combined to build a framework for general-purpose embodied intelligence. The talk will include results from case studies of real-world robots, including robots walking on unseen diverse terrains in the real world, generalizing to a range of unseen diverse manipulation tasks in a zero-shot manner, and performing dynamic manipulation tasks like writing digits on a whiteboard, scooping, etc.
Biography: Deepak Pathak is a faculty member in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. from UC Berkeley and his research spans computer vision, machine learning, and robotics. He is a recipient of faculty awards from Google, Sony, and GoodAI, and graduate fellowship awards from Facebook, NVIDIA, and Snapchat. His research has been featured in popular press outlets, including The Economist, The Wall Street Journal, Quanta Magazine, Washington Post, Wired, and MIT Technology Review. Deepak received his Bachelor's from IIT Kanpur with a Gold Medal in Computer Science. He co-founded VisageMap Inc., later acquired by FaceFirst Inc.
Unfold the Unseen: Deformable Cloth Perception and Manipulation
31 March 2021: Shuran Song (Columbia)
Abstract: Deformable objects are common in our everyday environments, yet they possess a set of unique properties that make them incredibly difficult for machines to perceive and interact with -- they have near-infinite degrees of freedom and are subject to extreme cases of self-occlusion, especially when folded or crumpled. To simplify the problem, prior work on cloth perception often builds on simplifying assumptions such as full visibility, known instance-level meshes, or full state information in the initial observation. In this talk, I will discuss perception and manipulation algorithms that generalize beyond these conditions and are able to (1) infer the full configuration of unseen garments including both observed and unobserved surfaces using category-level shape priors (GarmentsNet), and (2) efficiently unfold an unseen cloth under arbitrary configurations using dynamic manipulations (FlingBot). Finally, I will discuss a few open research directions in the area of deformable cloth perception and manipulation.
Biography: Shuran Song is an assistant professor in the Department of Computer Science at Columbia University. Before that, she received her Ph.D. in Computer Science from Princeton University and her B.Eng. from HKUST in 2013. Her research interests lie at the intersection of computer vision and robotics. She received the RSS Best System Paper Award in 2019, the Best Manipulation System Paper Award from Amazon in 2018, and has been a finalist for best paper awards at ICRA'20, CVPR'19, and IROS'18.
Good tactile sensing lets robots do cool things.
17 March 2021: Ted Adelson (MIT)
Abstract: We are building soft sensitive robot fingers using a camera-based tactile sensing technology called GelSight. These fingers have superhuman resolution and provide accurate measurements of the contact region between the skin and the world. If you know the 3D geometry of this contact region, you can infer many important things, such as object pose, object shape, force, slip, hardness, and roughness. You can do system identification on objects, inferring their inertial and frictional properties from the way they feel in the hand. If the sensing is fast enough, you can use it in reactive control, enabling dexterous skills such as cable manipulation.
Biography: Edward ("Ted") Adelson is the John and Dorothy Professor of Vision Science at MIT, in the Department of Brain and Cognitive Sciences, and the Computer Science and Artificial Intelligence Laboratory (CSAIL). For much of his career he has worked on human vision and computer vision, on topics such as multiscale image representation, motion perception, material perception, and plenoptic imaging. In recent years he has been working on artificial tactile sensing, using an optically based touch system known as GelSight, exploring how high resolution touch can advance robotic manipulation. Professor Adelson is a member of the National Academy of Sciences, a Life Fellow of the IEEE, and the recipient of the Nakayama Medal in Vision Science.
Anticipating and Avoiding Failures Using Introspective Perception and Physics-Informed Program Synthesis
24 February 2021: Joydeep Biswas (UT Austin)
Abstract: Robots deployed over extended periods of time in unstructured human environments will inevitably make mistakes in perception and planning due to violations of algorithmic assumptions, unmodelled phenomena, and environment-specific constraints. In this talk, I will present our recent work on two fronts to anticipate and overcome such errors: introspective perception to learn to predict and overcome perception errors, and physics-informed program synthesis of action selection policies to enable environment- and user-specific customizability. Introspective perception leverages either an occasionally available supervisory sensor or post-hoc analysis to autonomously train competency-prediction models of black-box perception algorithms. We present results of introspective perception applied to vision-based obstacle avoidance, showing accurate predictions of algorithm-specific errors. We further use introspective perception to learn context-specific noise models for simultaneous localization and mapping, resulting in significant improvements to accuracy and robustness in visually challenging real-world settings. To adapt the behavior of robots based on user preferences, or to handle unforeseen circumstances, we synthesize and repair programmatic action selection policies (ASPs) in a human-interpretable domain-specific language (DSL), leveraging three key insights: 1) robot ASPs can be represented in a layered structure where each layer consists of increasingly complex subsets of the DSL grammar; 2) physics-informed constraints can be expressed in a type system to prune the search space of programs; and 3) using partial evaluation and human corrections, parameters of such ASPs can be repaired by expressing the corrections to the ASP as satisfiability modulo theory (SMT) formulas. We demonstrate how physics-informed program synthesis can generate ASPs that rival expert-written reference ASPs, require orders of magnitude fewer demonstrations to synthesize than using deep learning, and are repairable when transferred between domains, such as from simulation to real robots.
Biography: Joydeep Biswas is an assistant professor in the department of computer science at the University of Texas at Austin. He earned his B.Tech in Engineering Physics from the Indian Institute of Technology Bombay in 2008, and M.S. and PhD in Robotics from Carnegie Mellon University in 2010 and 2014 respectively. From 2015 to 2019, he was assistant professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst. His research spans perception and planning for long-term autonomy, with the ultimate goal of having service mobile robots deployed in human environments for years at a time, without the need for expert corrections or supervision. Prof. Biswas' research is supported by funding from NSF, DARPA, Army Futures Command Robotics Center of Excellence, an Amazon Research Award, a JP Morgan Faculty Research Award, and by Northrop Grumman Mission Systems.
Learning for Human-Robot Collaboration
10 February 2021: Claudia Perez D'Arpino (Stanford)
Abstract: Robots are becoming capable of learning from and interacting with humans, at the same time that simulation environments for robot learning are becoming astonishingly realistic. This confluence of factors opens new possibilities for robot learning of collaborative tasks and human-robot interaction, including robotic manipulation and social navigation. In this talk, I’ll cover different stages of the interaction cycle: from learning using human examples, to executing the learned task in collaboration, and self-discovery of interaction strategies in simulation. I’ll describe a method for learning multi-step constrained manipulation tasks from few human examples, and the benefits it brings to human-robot teaming for dexterous remote manipulation. In search of better generalization to unseen environments, I’ll discuss robot learning in simulation from simple to increasingly realistic environments. First, I’ll describe a method to find novel manipulation strategies with partial knowledge of the physical parameters of the objects. Second, I’ll cover advances in social robot navigation with deep reinforcement learning guided by motion planning, with a compositional approach that leverages simplified layouts and interaction patterns. Finally, I’ll briefly describe my ongoing work on learning collaborative policies for human-robot interaction, as well as opportunities for leveraging large-scale simulation of embodied social agents for studying the computational building blocks of intelligence.
Biography: Claudia Perez D’Arpino is a Postdoctoral Scholar in Computer Science at Stanford University, advised by Prof. Silvio Savarese. She is working on robot learning for human-robot collaborative tasks in manipulation and social navigation. Her research focuses on hybrid learning as a framework for developing intelligent robots that can learn from observations, augment this learning with self-exploration, and apply the learned skills in interaction with humans. She received her PhD in Computer Science at MIT advised by Prof. Julie Shah. She received her degrees in Electronics Engineering and Masters in Mechatronics from the Simon Bolivar University in Venezuela.
Semantic Robot Programming... and Maybe Making the World a Better Place
2 December 2020: Chad Jenkins (University of Michigan)
Abstract: The vision of interconnected, heterogeneous autonomous robots in widespread use is a coming reality that will reshape our world. Similar to "app stores" for modern computing, people at varying levels of technical background will contribute to "robot app stores" as designers and developers. However, current paradigms to program robots beyond simple cases remain inaccessible to all but the most sophisticated of developers and researchers. In order for people to fluently program autonomous robots, a robot must be able to interpret user instructions that accord with that user’s model of the world. The challenge is that many aspects of such a model are difficult or impossible for the robot to sense directly. We posit a critical missing component is the grounding of semantic symbols in a manner that addresses both uncertainty in low-level robot perception and intentionality in high-level reasoning. Such a grounding will enable robots to fluidly work with human collaborators to perform tasks that require extended goal-directed autonomy. I will present our efforts towards accessible and general methods of robot programming from the demonstrations of human users. Our recent work has focused on Semantic Robot Programming (SRP), a declarative paradigm for robot programming by demonstration that builds on semantic mapping. In contrast to procedural methods for motion imitation in configuration space, SRP is suited to generalize user demonstrations of goal scenes in workspace, such as for manipulation in cluttered environments. SRP extends our efforts to crowdsource robot learning from demonstration at scale through messaging protocols suited to web/cloud robotics. With such scaling of robotics in mind, prospects for cultivating both equal opportunity and technological excellence will be discussed in the context of broadening and strengthening Title IX and Title VI.
Biography: Odest Chadwicke Jenkins, Ph.D., is a Professor of Computer Science and Engineering and Associate Director of the Robotics Institute at the University of Michigan. Prof. Jenkins earned his B.S. in Computer Science and Mathematics at Alma College (1996), M.S. in Computer Science at Georgia Tech (1998), and Ph.D. in Computer Science at the University of Southern California (2003). He previously served on the faculty of Brown University in Computer Science (2004-15). His research addresses problems in interactive robotics and human-robot interaction, primarily focused on mobile manipulation, robot perception, and robot learning from demonstration. His research often intersects topics in computer vision, machine learning, and computer animation. Prof. Jenkins has been recognized as a Sloan Research Fellow and is a recipient of the Presidential Early Career Award for Scientists and Engineers (PECASE). His work has also been supported by Young Investigator awards from the Office of Naval Research (ONR), the Air Force Office of Scientific Research (AFOSR) and the National Science Foundation (NSF). Prof. Jenkins is currently serving as Editor-in-Chief for the ACM Transactions on Human-Robot Interaction. He is a Fellow of the American Association for the Advancement of Science, and Senior Member of the Association for Computing Machinery and the Institute of Electrical and Electronics Engineers. He is an alumnus of the Defense Science Study Group (2018-19).
Computational principles underlying the learning of sensorimotor repertoires
19 November 2020: Daniel Wolpert (Columbia)
Abstract: Humans spend a lifetime learning, storing and refining a repertoire of motor memories appropriate for the multitude of tasks we perform. However, it is unknown what principle underlies the way our continuous stream of sensorimotor experience is segmented into separate memories and how we adapt and use this growing repertoire. I will review our work on how humans learn to make skilled movements, focusing on the role of context in activating motor memories and how statistical learning can lead to multimodal object representations. I will then present a principled theory of motor learning based on the key insight that memory creation, updating, and expression are all controlled by a single computation – contextual inference. Unlike dominant theories of single-context learning, our repertoire-learning model accounts for key features of motor learning that had no unified explanation and predicts novel phenomena, which we confirm experimentally. These results suggest that contextual inference is the key principle underlying how a diverse set of experiences is reflected in motor behavior.
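As a rough illustration of the idea (a simplified sketch of the general form only, not the specific model presented in the talk; the memories \hat{u}^{(j)}, responsibilities r_t(j), error e_t and learning rate \alpha are illustrative notation), contextual inference ties expression and updating of motor memories to a single posterior over contexts:

```latex
% Responsibility: posterior probability that context j generated the current experience
r_t(j) = p(c_t = j \mid \text{cues}_t, \text{feedback}_{1:t})

% Memory expression: motor output is a responsibility-weighted mixture of memories
\hat{u}_t = \sum_j r_t(j)\, \hat{u}_t^{(j)}

% Memory updating: each memory adapts in proportion to its responsibility
\hat{u}_{t+1}^{(j)} = \hat{u}_t^{(j)} + \alpha\, r_t(j)\, \bigl(e_t - \hat{u}_t^{(j)}\bigr)
```

Creation of a new memory can then be triggered when no existing context explains the current experience well, which is how a growing repertoire can emerge from the same computation.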
Biography: Daniel Wolpert read medicine at Cambridge before completing an Oxford Physiology DPhil and a postdoctoral fellowship at MIT. He joined the faculty at the Institute of Neurology, UCL in 1995 and moved to Cambridge University in 2005 where he was Professor of Engineering and a Royal Society Research Professor. In 2018 he joined the Zuckerman Mind Brain Behavior Institute at Columbia University as Professor of Neuroscience. He was elected a Fellow of the Royal Society (2012) and has been awarded the Royal Society Francis Crick Prize Lecture (2005), the Minerva Foundation Golden Brain Award (2010) and the Royal Society Ferrier medal (2020). His research interests are computational and experimental approaches to human movement.
Intuitive Reasoning as (Un)supervised Neural Generation
4 November 2020: Yejin Choi (UW)
Abstract: Neural language models, as they grow in scale, continue to surprise us with utterly nonsensical and counterintuitive errors despite their otherwise remarkable performance on leaderboards. In this talk, I will argue that it is time to challenge the currently dominant paradigm of task-specific supervision built on top of large-scale self-supervised neural networks. I will first highlight how we can make better lemonade out of neural language models by shifting our focus to unsupervised, inference-time algorithms. I will demonstrate how unsupervised algorithms can match or even outperform supervised approaches on hard reasoning tasks such as nonmonotonic reasoning (e.g., counterfactual and abductive reasoning), or complex language generation tasks that require logical constraints. Next, I will highlight the importance of melding explicit and declarative knowledge encoded in symbolic knowledge graphs with implicit and observed knowledge encoded in neural language models. I will present COMET, Commonsense Transformers that learn neural representations of commonsense reasoning from a symbolic commonsense knowledge graph, and Social Chemistry 101, a new conceptual formalism, a knowledge graph, and neural models to reason about social, moral, and ethical norms.
Biography: Yejin Choi is a Brett Helsel associate professor at the Paul G. Allen School of Computer Science and Engineering at the University of Washington and also a senior research manager at AI2 overseeing the project Mosaic. Her research interests include commonsense knowledge and reasoning, neural language (de-)generation, language grounding, and AI for social good. She is a co-recipient of the AAAI Outstanding Paper Award in 2020, Borg Early Career Award (BECA) in 2018, IEEE’s AI Top 10 to Watch in 2015, the ICCV Marr Prize in 2013, and the inaugural Alexa Prize Challenge in 2017.
The Task Specification Problem
28 October 2020: Pulkit Agrawal (MIT)
Abstract: Machines are adept at finding action policies that optimize a given set of reward signals. However, humans find it unintuitive to communicate tasks in terms of reward functions. This fundamental mismatch between how we want to communicate tasks and how machines understand them leads to many problems that plague today's AI systems: narrow transfer, learning of non-robust features, etc. It is possible that our quest for true transfer may depend more on how we specify tasks than on the problem of learning useful features. In a discussion-style seminar, I will present my thoughts on this topic.
Biography: Dr. Pulkit Agrawal is the Steven and Renee Finn Chair Professor in the Department of Electrical Engineering and Computer Science at MIT. He earned his Ph.D. from UC Berkeley and co-founded SafelyYou Inc. His research interests span robotics, deep learning, computer vision and reinforcement learning. Pulkit completed his undergraduate studies at IIT Kanpur, where he was awarded the Director's Gold Medal. He is a recipient of the Sony Faculty Research Award, the Salesforce Research Award, the Amazon Machine Learning Research Award, the Signatures Fellow Award, and the Fulbright Science and Technology Award, among others. Fun fact: Pulkit holds a Sangeet Prabhakar (equivalent to a bachelor's degree in Indian classical music).
Natural Language Explanations of Deep Networks
21 October 2020: Jacob Andreas (MIT)
Abstract: Despite major efforts in recent years to improve explainability of deep neural networks, the tools we use for communicating explanations have largely remained the same: visualizations of representative inputs, salient input regions, and local model approximations. But when humans describe complex decision rules, we often use a different explanatory tool: natural language. I'll describe recent work on automatically constructing natural language descriptions of representations, features, and decisions in deep models. These descriptions explain not just what a model is paying attention to, but why, and ground prediction in meaningful perceptual and linguistic abstractions. This talk will survey applications of language-based explanations to multi-agent reinforcement learning, language understanding, and image processing, and will conclude with some preliminary evidence that language data can improve model accuracy in addition to interpretability.
Biography: Jacob Andreas is the X Consortium Career Development Assistant Professor at MIT. His research focuses on building intelligent systems that can communicate effectively using language and learn from human guidance. Jacob earned his Ph.D. from UC Berkeley, his M.Phil. from Cambridge (where he studied as a Churchill scholar) and his B.S. from Columbia. He has been the recipient of an NSF graduate fellowship, a Facebook fellowship, and paper recognition at NAACL and ICML.
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
7 October 2020: Michael Carbin (MIT)
Abstract: Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving the computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start and, instead, training must first begin with large, overparameterized networks. In this talk, I’ll present our work on The Lottery Ticket Hypothesis, showing that a standard pruning technique, iterative magnitude pruning, naturally uncovers subnetworks that are capable of training effectively from early in training. These subnetworks hold out the promise of more efficient machine learning methods, including inference, fine-tuning of pre-trained networks, and sparse training.
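For readers unfamiliar with the procedure, the following is a minimal sketch of iterative magnitude pruning under simplifying assumptions (the `train_fn` callback, per-layer pruning fraction, and resetting to the original initialization each round are illustrative choices, not a faithful reproduction of the paper's experiments):

```python
import numpy as np

def iterative_magnitude_pruning(init_weights, train_fn, rounds=5, prune_frac=0.2):
    """Sketch of iterative magnitude pruning (IMP).

    init_weights: dict mapping layer name -> np.ndarray (original initialization)
    train_fn:     callback that trains the masked network and returns trained weights
    """
    masks = {k: np.ones_like(w) for k, w in init_weights.items()}
    for _ in range(rounds):
        # 1. Train the masked network, restarting from the original initialization.
        trained = train_fn({k: w * masks[k] for k, w in init_weights.items()}, masks)
        # 2. Prune the smallest-magnitude surviving weights in each layer.
        for k, w in trained.items():
            alive = np.abs(w[masks[k] == 1])
            threshold = np.quantile(alive, prune_frac)
            masks[k] = np.where((np.abs(w) >= threshold) & (masks[k] == 1), 1.0, 0.0)
    return masks  # the final mask defines the candidate "winning ticket" subnetwork
```

The key empirical claim is that the subnetwork defined by the final mask, when reset to its early-training weights, trains to comparable accuracy in comparable time.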
Biography: Michael Carbin is an assistant professor in MIT’s Department of Electrical Engineering and Computer Science and a principal investigator at the Computer Science and Artificial Intelligence Laboratory, where he leads the Programming Systems Group. His group investigates the semantics, design, and implementation of systems that operate in the presence of uncertainty in their environment (perception), implementation (neural networks or approximate transformations), or execution (unreliable hardware). Carbin has received a Sloan Research Fellowship, a Facebook Research Award, a Google Faculty Research Award and an NSF Career Award. He earned a BS in computer science at Stanford University and an MS and PhD in electrical engineering and computer science from MIT.
AI for physics and physics for AI
8 July 2020: Max Tegmark (MIT)
Abstract: After briefly reviewing how machine learning is becoming ever-more widely used in physics, I explore how ideas and methods from physics can help improve machine learning, focusing on automated discovery of mathematical formulas from data. I present a method for unsupervised learning of equations of motion for objects in raw and optionally distorted unlabeled video. I also describe progress on symbolic regression, i.e., finding a symbolic expression that matches data from an unknown function. Although this problem is likely to be NP-hard in general, functions of practical interest often exhibit symmetries, separability, compositionality and other simplifying properties. In this spirit, we have developed a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques that discover and exploit these simplifying properties, enabling significant improvement of state-of-the-art performance.
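As one concrete example of the physics-inspired simplifications mentioned above, additive separability can be tested numerically: if f(x, y) = g(x) + h(y), then f(x, y) - f(x, y0) - f(x0, y) + f(x0, y0) = 0 for all inputs, so the regression problem splits into two smaller ones. Below is a minimal sketch of such a test (the actual pipeline differs in detail and typically operates on a neural-network fit of the data rather than on f directly):

```python
import numpy as np

def additively_separable(f, x_samples, y_samples, x0, y0, tol=1e-6):
    """Check whether f(x, y) looks additively separable, i.e. f(x, y) = g(x) + h(y).

    If it is, the combination below vanishes identically, and each variable
    can be regressed separately (a much smaller search space of formulas).
    """
    residuals = [f(x, y) - f(x, y0) - f(x0, y) + f(x0, y0)
                 for x in x_samples for y in y_samples]
    return np.max(np.abs(residuals)) < tol
```

Analogous tests exist for multiplicative separability (run the same check on log |f|) and for symmetries such as translation invariance.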
Biography: Max Erik Tegmark is a Swedish-American physicist, cosmologist and machine learning researcher. He is a professor at the Massachusetts Institute of Technology and the scientific director of the Foundational Questions Institute.
Factored Value Functions for Cooperative Multi-Agent Reinforcement Learning
1 July 2020: Shimon Whiteson (Oxford University)
Abstract: Cooperative multi-agent reinforcement learning (MARL) considers how teams of agents can coordinate their behaviour to efficiently achieve common goals. A key challenge therein is how to learn cooperative policies in a centralised fashion that nonetheless can be executed in a decentralised fashion. In this talk, I will discuss QMIX, a simple but powerful cooperative MARL algorithm that relies on factored value functions both to make learning efficient and to ensure decentralisability. Extensive results on the StarCraft Multi-Agent Challenge (SMAC), a benchmark we have developed, confirm that QMIX outperforms alternative approaches, though further analysis shows that this is not always for the reasons we expected.
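A minimal sketch of the factored value function at the heart of QMIX (the hypernetwork callables `w1, b1, w2, b2` and the ReLU nonlinearity are illustrative simplifications of the published architecture): per-agent utilities are mixed by a state-conditioned network whose weights are constrained to be non-negative, so the joint value is monotone in each agent's utility and greedy decentralised action selection stays consistent with the centralised value.

```python
import numpy as np

def qmix_mixing(agent_qs, state, w1, b1, w2, b2):
    """Sketch of a QMIX-style monotonic mixing network.

    agent_qs:       (n_agents,) per-agent utilities Q_i(tau_i, a_i)
    state:          global state, available only during centralised training
    w1, b1, w2, b2: hypernetwork callables mapping state -> mixing parameters
    """
    W1 = np.abs(w1(state))                               # (n_agents, hidden), non-negative
    hidden = np.maximum(agent_qs @ W1 + b1(state), 0.0)  # the paper uses ELU; ReLU for brevity
    W2 = np.abs(w2(state))                               # (hidden, 1), non-negative
    return hidden @ W2 + b2(state)                       # scalar joint value Q_tot

# Because dQ_tot/dQ_i >= 0, the argmax of Q_tot decomposes into per-agent argmaxes,
# which is what allows decentralised execution of the centrally trained policy.
```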
Biography: Shimon Whiteson is a Professor of Computer Science at the University of Oxford and the Head of Research at Waymo UK. His research focuses on deep reinforcement learning and learning from demonstration, with applications in robotics and video games. He completed his doctorate at the University of Texas at Austin in 2007. He spent eight years as an Assistant and then an Associate Professor at the University of Amsterdam before joining Oxford as an Associate Professor in 2015. He was awarded a Starting Grant from the European Research Council in 2014, a Google Faculty Research Award in 2017, and a JPMorgan Faculty Award in 2019.
The Deep Learning Toolbox in 2020
24 June 2020: Oriol Vinyals (Google DeepMind)
Abstract: Hardware, software, datasets, architectures, or loss functions are a few of the components that have received a lot of enthusiastic focus by the machine learning community in the past decades. Combinations of these tools have enabled incredible applications and breakthroughs in a vast array of areas. In this talk, I will give an overview of the latest and greatest of what's happening in our community, whilst highlighting the most notable applications. I will also depict some of the open challenges and advances likely to happen in the next few years.
Biography: Oriol Vinyals, of Google DeepMind, has contributed to progress across many areas of deep learning: seq2seq, network distillation, graph networks, contrastive learning, meta-learning, RL (leading AlphaStar), and many others.
Beyond the Training Distribution: Embodiment, Adaptation, and Symmetry
17 June 2020: Chelsea Finn (Stanford)
Abstract: A fundamental challenge facing embodied learning systems in the real world is generalization. This is due to (1) the challenge of acquiring and learning from large amounts of data in the context of robotics, as well as (2) the standard i.i.d. assumption that prevents generalization beyond the training distribution. In this talk, I'll discuss how we might start to address both of these challenges, with a focus on the latter. I'll first briefly overview our work on broadening the sources of data that robots can learn from. Then, moving on to (2), I'll describe how we can allow our agents to continuously adapt to both smooth and sharp changes in their environments, even when those changes are substantial or not directly observable. These will include reinforcement learning agents that can maneuver around obstacles after training in obstacle-free environments, agents that can continuously learn amidst persistent non-stationary dynamics such as variable wind, and a real robot that can handle drastically different lighting conditions, backgrounds, objects, and a 10 cm offset in its gripper position. One key factor underlying successful extrapolation in each of these cases is the role of efficient adaptation with data, which can be accelerated with meta-learning. Finally, I'll discuss how we can automatically extract invariances and equivariances from data, and transfer them to new tasks, via a new form of meta-learning.
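To give a flavour of how meta-learning accelerates adaptation, here is a toy, first-order MAML-style sketch on linear-regression tasks (the task format, learning rates, and single inner step are illustrative assumptions, not the methods used in the talk):

```python
import numpy as np

def meta_update(theta, tasks, inner_lr=0.01, outer_lr=0.001):
    """One meta-gradient step of first-order, MAML-style meta-learning.

    theta: (d,) shared initialization being meta-learned
    tasks: list of (X_support, y_support, X_query, y_query) tuples
    """
    meta_grad = np.zeros_like(theta)
    for Xs, ys, Xq, yq in tasks:
        # Inner loop: adapt to the task with one gradient step on its support set.
        grad_support = 2 * Xs.T @ (Xs @ theta - ys) / len(ys)
        theta_task = theta - inner_lr * grad_support
        # Outer objective: loss of the adapted parameters on held-out query data.
        meta_grad += 2 * Xq.T @ (Xq @ theta_task - yq) / len(yq)
    # First-order approximation: ignore second derivatives through the inner step.
    return theta - outer_lr * meta_grad / len(tasks)
```

After many such meta-updates, a single inner gradient step on a small amount of data from a new task already yields a reasonable task-specific model, which is the sense in which adaptation becomes data-efficient.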
Biography: Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University. Finn's research interests lie in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction. To this end, her work has included deep learning algorithms for concurrently learning visual perception and control in robotic manipulation skills, inverse reinforcement methods for scalable acquisition of nonlinear reward functions, and meta-learning algorithms that can enable fast, few-shot adaptation in both visual perception and deep reinforcement learning. Finn received her Bachelor's degree in Electrical Engineering and Computer Science at MIT and her PhD in Computer Science at UC Berkeley. Her research has been recognized through the ACM doctoral dissertation award, an NSF graduate fellowship, a Facebook fellowship, the C.V. Ramamoorthy Distinguished Research Award, and the MIT Technology Review 35 under 35 Award, and her work has been covered by various media outlets, including the New York Times, Wired, and Bloomberg. Throughout her career, she has sought to increase the representation of underrepresented minorities within CS and AI by developing an AI outreach camp at Berkeley for underprivileged high school students, a mentoring program for underrepresented undergraduates across four universities, and leading efforts within the WiML and Berkeley WiCSE communities of women researchers.
Diverse Data and Efficient Algorithms for Robot Learning
27 May 2020: Lerrel Pinto (NYU)
Abstract: While robotics has made tremendous progress over the last few decades, most success stories are still limited to carefully engineered and precisely modeled environments. Interestingly, one of the most significant successes in the last decade of AI has been the use of Machine Learning (ML) to generalize and robustly handle diverse situations. So why don't we just apply current learning algorithms to robots? The two biggest reasons are the lack of large-scale data and efficient algorithms. In other fields of AI such as computer vision, we were able to collect diverse real-world, large-scale data along with creating powerful self-supervised learning algorithms. These key ingredients, which fueled the success of deep learning in other fields, are key bottlenecks in robotics. In the first part of this talk, I will discuss our efforts in large-scale and diverse data collection for robotics. Following this, I will discuss some of our recent work in efficient reinforcement learning for deformable object manipulation.
Biography: Lerrel Pinto is starting as an Assistant Professor at NYU in Fall 2020 and is currently a postdoctoral fellow at UC-Berkeley. He received his PhD in Robotics from CMU in 2019. His research interests focus on machine learning and computer vision for robots.
Learning Equivariant and Hybrid Message Passing on Graphs
8 May 2020: Max Welling (University of Amsterdam)
Note: (*Friday* 11 am)
Abstract: In this talk, I will extend graph neural nets in two directions. First, we will ask if we can formulate a GNN on meshes of two-dimensional manifolds. Previous approaches mostly used standard GNNs, which are invariant to permutations of the input nodes. However, we show this is unnecessarily restrictive. Instead, we define mesh-CNNs, which are equivariant and allow more general kernels. Second, we will study how to incorporate information about the data generating process into GNNs. Belief propagation is a form of GNN with no learnable parameters that performs inference in a generative graphical model. We subsequently augment BP with a trainable GNN to correct the mistakes of BP, in order to improve predictive performance. Experiments show the increased power of both methods.
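A rough sketch of the second idea, hybrid message passing, in which a hand-derived belief-propagation-style message is combined with a trainable correction (the function names, signatures, and aggregation scheme are illustrative; the actual work operates on factor graphs with a proper GNN):

```python
import numpy as np

def hybrid_message_passing(h, edges, bp_message, correction, steps=3):
    """Illustrative hybrid update: analytic BP-style messages plus a learned residual.

    h:          (n_nodes, d) node states / beliefs
    edges:      list of directed edges (i, j)
    bp_message: hand-derived message function from the graphical model
    correction: trainable function that outputs a correction to the BP message
    """
    for _ in range(steps):
        new_h = h.copy()
        for i, j in edges:
            # Each incoming message is the model-based BP term plus a learned term.
            new_h[j] = new_h[j] + bp_message(h[i], h[j]) + correction(h[i], h[j])
        h = new_h
    return h
```

The division of labour is the point: the graphical model supplies the inductive bias, and the learned correction absorbs the errors BP makes when its assumptions (e.g., a tree-structured graph) are violated.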
Biography: Prof. Dr. Max Welling is a research chair in Machine Learning at the University of Amsterdam and a VP Technologies at Qualcomm. He has a secondary appointment as a senior fellow at the Canadian Institute for Advanced Research (CIFAR). He is co-founder of “Scyfer BV”, a university spin-off in deep learning that was acquired by Qualcomm in the summer of 2017. In the past he held postdoctoral positions at Caltech (’98-’00), UCL (’00-’01) and the U. Toronto (’01-’03). He received his PhD in ’98 under the supervision of Nobel laureate Prof. G. ‘t Hooft. Max Welling served as associate editor-in-chief of IEEE TPAMI from 2011 to 2015 (impact factor 4.8). He has served on the board of the NIPS Foundation since 2015 (the largest conference in machine learning) and was program chair and general chair of NIPS in 2013 and 2014, respectively. He was also program chair of AISTATS in 2009 and ECCV in 2016 and general chair of MIDL 2018. He has served on the editorial boards of JMLR and JML and was an associate editor for Neurocomputing, JCGS and TPAMI. He has received multiple grants from Google, Facebook, Yahoo, NSF, NIH, NWO and ONR-MURI, among which an NSF CAREER grant in 2005. He is a recipient of the ECCV Koenderink Prize in 2010. Welling is on the board of the Data Science Research Center in Amsterdam, directs the Amsterdam Machine Learning Lab (AMLAB), and co-directs the Qualcomm-UvA deep learning lab (QUVA) and the Bosch-UvA Deep Learning lab (DELTA). Max Welling has over 250 scientific publications in machine learning, computer vision, statistics and physics, and an h-index of 62.
Developmental Machine Learning, Curiosity and Deep RL
22 April 2020: Pierre-Yves Oudeyer (Inria Bordeaux Sud-Ouest)
Abstract: Current approaches to AI and machine learning are still fundamentally limited in comparison with the autonomous learning capabilities of children. What is remarkable is not that some children become world champions in certain games or specialties: it is rather their autonomy, flexibility and efficiency at learning many everyday skills under strongly limited resources of time, computation and energy. And they do not need the intervention of an engineer for each new task (e.g. they do not need someone to provide a new task-specific reward function). I will present a research program (Kaplan and Oudeyer, 2004; Oudeyer et al., 2007; Gottlieb and Oudeyer, 2019) that has focused on computational modeling of child development and learning mechanisms over the last decade. I will discuss several developmental forces that guide exploration in large real-world spaces, starting from the perspective of how algorithmic models can help us understand better how they work in humans, and in return how this opens new approaches to autonomous machine learning. In particular, I will discuss models of curiosity-driven autonomous learning, enabling machines to sample and explore their own goals and their own learning strategies, self-organizing a learning curriculum without any external reward or supervision. I will introduce the Intrinsically Motivated Goal Exploration Processes (IMGEPs) algorithmic framework, and present two families of IMGEPs: population-based IMGEPs (Baranes and Oudeyer, 2013; Forestier et al., 2017) with learned goal spaces (Pere et al., 2018), which have allowed sample-efficient learning of skill repertoires in real robots, and goal-conditioned Deep RL-based IMGEPs, which enable strong generalization properties when they are modular (Colas et al., 2019), in particular when leveraging the compositionality of language to imagine goals in curiosity-driven exploration (Colas et al., 2020).
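As an illustration of the population-based variant mentioned above, here is a minimal sketch of an IMGEP-style loop (the parameter dimension, perturbation scale, and the `rollout` and `sample_goal` callables are illustrative assumptions): the agent samples its own goal, reuses the closest solution in its repertoire, perturbs it, and records the outcome, without any external reward.

```python
import numpy as np

def population_imgep(rollout, sample_goal, n_bootstrap=50, n_iters=500, sigma=0.05):
    """Sketch of a population-based Intrinsically Motivated Goal Exploration Process.

    rollout(params) -> outcome vector in the goal/outcome space
    sample_goal()   -> self-generated target outcome
    """
    # Bootstrap the repertoire with random policy parameters.
    repertoire = []
    for _ in range(n_bootstrap):
        params = np.random.randn(10)              # illustrative parameter dimension
        repertoire.append((params, rollout(params)))
    for _ in range(n_iters):
        goal = sample_goal()
        # Reuse the stored policy whose outcome is closest to the sampled goal...
        params, _ = min(repertoire, key=lambda entry: np.linalg.norm(entry[1] - goal))
        # ...perturb it, execute it, and store the new (parameters, outcome) pair.
        new_params = params + sigma * np.random.randn(*params.shape)
        repertoire.append((new_params, rollout(new_params)))
    return repertoire  # a growing repertoire of skills, learned without external reward
```

Goal-conditioned Deep RL variants replace the nearest-neighbour lookup and parameter perturbation with a goal-conditioned policy trained by reinforcement learning.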
Biography: Dr. Pierre-Yves Oudeyer is Research Director (DR1) at Inria and head of the Inria and Ensta-ParisTech FLOWERS team (France). Before that, he was a permanent researcher at the Sony Computer Science Laboratory for 8 years (1999-2007). He studied theoretical computer science at Ecole Normale Supérieure in Lyon, and received his Ph.D. degree in artificial intelligence from the University Paris VI, France. He has been studying lifelong autonomous learning, and the self-organization of behavioural, cognitive and cultural structures, at the frontiers of artificial intelligence, machine learning, cognitive sciences and educational technologies. He has been developing models of intrinsically motivated learning, pioneering curiosity-driven learning algorithms working in real-world robots, and developed theoretical frameworks to understand better human curiosity and autonomous learning. He also studied mechanisms enabling machines and humans to discover, invent, learn and evolve communication systems. He has published two books and more than 100 papers in international journals and conferences, holds 8 patents, has given several invited keynote lectures at international conferences, and has received several prizes for his work in developmental robotics and on the origins of language. In particular, he is a laureate of the Inria-National Academy of Science young researcher prize in computer sciences, and of an ERC Starting Grant (EXPLORERS). He is also editor of the IEEE CIS Newsletter on Cognitive and Developmental Systems, where he organizes interdisciplinary dialogs in cognitive science, AI and robotics, as well as associate editor of IEEE Transactions on Cognitive and Developmental Systems and Frontiers in Neurorobotics. He was chair of the IEEE CIS Technical Committee on Cognitive and Developmental Systems in 2015-16. He is also working actively for the diffusion of science towards the general public, through the writing of popular science articles and participation in radio and TV programs as well as science exhibitions.
Model-free, Model-based and General Intelligence
15 April 2020: Hector Geffner (Universitat Pompeu Fabra)
Abstract: During the 60s and 70s, AI researchers explored intuitions about intelligence by writing programs that displayed intelligent behavior. Many good ideas came out from this work but programs written by hand were not robust or general. After the 80s, research increasingly shifted to the development of learners capable of inferring behavior and functions from experience and data, and solvers capable of tackling well-defined but intractable models like SAT, classical planning, Bayesian networks, and POMDPs. The learning approach has achieved considerable success but results in black boxes that do not have the flexibility, transparency, and generality of their model-based counterparts. Model-based approaches, on the other hand, require models and scalable algorithms. The two have close parallels with Systems 1 and 2 in current theories of the human mind (Kahneman: Thinking fast and slow): the first, a fast, opaque, and inflexible intuitive mind; the second, a slow, transparent, and flexible analytical mind. In this talk, I review learners and solvers, and the challenge of integrating their System 1 and System 2 capabilities, focusing then on our recent work on symbolic representation learning.
Biography: Hector Geffner is an ICREA Research Professor at the Universitat Pompeu Fabra (UPF) in Barcelona, Spain. He was born and grew up in Buenos Aires and obtained a PhD in Computer Science at UCLA under the supervision of Judea Pearl. He then worked at the IBM T.J. Watson Research Center in NY, USA, and at the Universidad Simon Bolivar in Caracas. Hector is a Fellow of AAAI and EurAI, a board member of the European Association for AI, and a former Associate Editor of AI and JAIR. He teaches Logic and AI at UPF, where he also introduced a course on Social and Technological Change. His most recent book, with Blai Bonet, is "A Concise Introduction to Models and Methods for Automated Planning" (Morgan & Claypool, 2013). He is the recipient of an Advanced ERC grant to do research on symbolic representation learning for planning.
Feedback control from pixels
10 April 2020: Russ Tedrake (MIT)
Note: (*Friday* 2 pm)
Abstract: Control theory has an answer for just about everything, but seems to fall short when it comes to closing a feedback loop using a camera. Recent examples from RL and imitation learning demonstrate the great promise of using cameras in the loop, but don’t leverage the rigorous tools from systems theory. I’d like to discuss why, and show a few recent results where we are trying to take some small steps in this direction.
Biography: Russ is the Toyota Professor of Electrical Engineering and Computer Science, Aeronautics and Astronautics, and Mechanical Engineering at MIT, the Director of the Center for Robotics at the Computer Science and Artificial Intelligence Lab, and the leader of Team MIT's entry in the DARPA Robotics Challenge. Russ is also the Vice President of Robotics Research at the Toyota Research Institute. He is a recipient of the NSF CAREER Award, the MIT Jerome Saltzer Award for undergraduate teaching, the DARPA Young Faculty Award in Mathematics, the 2012 Ruth and Joel Spira Teaching Award, and was named a Microsoft Research New Faculty Fellow. Russ received his B.S.E. in Computer Engineering from the University of Michigan, Ann Arbor, in 1999, and his Ph.D. in Electrical Engineering and Computer Science from MIT in 2004, working with Sebastian Seung. After graduation, he joined the MIT Brain and Cognitive Sciences Department as a Postdoctoral Associate. During his education, he has also spent time at Microsoft, Microsoft Research, and the Santa Fe Institute.
Towards Complex Language in Partially Observed Environments
25 March 2020: Stefanie Tellex (Brown University)
Abstract: Robots can act as a force multiplier for people, whether a robot assisting an astronaut with a repair on the International Space Station, a UAV taking flight over our cities, or an autonomous vehicle driving through our streets. Existing approaches use action-based representations that do not capture the goal-based meaning of a language expression and do not generalize to partially observed environments. The aim of my research program is to create autonomous robots that can understand complex goal-based commands and execute those commands in partially observed, dynamic environments. I will describe demonstrations of object search in a POMDP setting with information about object locations provided by language, and mapping between English and Linear Temporal Logic, enabling a robot to understand complex natural language commands in city-scale environments. These advances represent steps towards robots that interpret complex natural language commands in partially observed environments using a decision-theoretic framework.
Biography: Stefanie Tellex is an Associate Professor of Computer Science at Brown University. Her group, the Humans To Robots Lab, creates robots that seamlessly collaborate with people to meet their needs using language, gesture, and probabilistic inference, aiming to empower every person with a collaborative robot. She completed her Ph.D. at the MIT Media Lab in 2010, where she developed models for the meanings of spatial prepositions and motion verbs. Her postdoctoral work at MIT CSAIL focused on creating robots that understand natural language. She has published at SIGIR, HRI, RSS, AAAI, IROS, ICAPS and ICMI, winning Best Student Paper at SIGIR and ICMI, Best Paper at RSS, and an award from the CCC Blue Sky Ideas Initiative. Her awards include being named one of IEEE Spectrum's AI's 10 to Watch in 2013, the Richard B. Salomon Faculty Research Award at Brown University, a DARPA Young Faculty Award in 2015, a NASA Early Career Award in 2016, a 2016 Sloan Research Fellowship, and an NSF Career Award in 2017. Her work has been featured in the press on National Public Radio, BBC, MIT Technology Review, Wired and Wired UK, as well as the New Yorker. She was named one of Wired UK's Women Who Changed Science In 2015 and listed as one of MIT Technology Review's Ten Breakthrough Technologies in 2016.
Curiouser and curiouser: why we make problems for ourselves
2 March 2020: Laura Schulz (MIT)
Note: (*Monday* 2 pm, Kiva 32-G449)
Abstract: Coming from the Cognitive Science Department, I will talk about curiosity and how children learn.
Biography: Laura Schulz is a Professor of Cognitive Science in the Department of Brain and Cognitive Sciences at MIT and the PI of the Early Childhood Cognition Lab. She graduated from the University of Michigan with a degree in philosophy and from the University of California, Berkeley with a PhD in developmental psychology. She has worked on causal learning, exploration, play, and emotion, and her research focuses broadly on the question of how children learn so much from so little so quickly. Work in her lab is influenced by computational models of cognition and she is especially interested in bridging the gap between formal models of learning and children’s behavior. She has been honored with the American Psychological Association Distinguished Scientific Award for Early Career Contributions to Psychology, the Society for Research in Child Development Award for Early Career Contributions, the Troland Award from the National Academy of Sciences, and the Presidential Early Career Award for Scientists and Engineers.
Emergent Intelligence: getting more out of agents than you bake in
19 February 2020: Phillip Isola (MIT)
Abstract: One of the most satisfying things in AI research is to get more out than you put in. I will talk about three projects where we got something surprising out, beyond what was explicitly baked in. First, a simple inductive bias for unsupervised learning discovers "semantics", even though no labels are used during training. Second, physically plausible "videos" emerge in the latent space of a generative model, even though the model was only fit to static images. Third, capable and robust "creatures" self-assemble from a collection of primitive limbs, despite having no centralized controller. I will suggest that a goal in artificial intelligence is to get the most out while baking the least in.
Biography: Phillip Isola is the Bonnie and Marty Tenenbaum Career Development Chair Assistant Professor in EECS at MIT. He completed his Ph.D. in Brain and Cognitive Sciences at MIT, advised by Edward Adelson, then spent two years as a postdoc at UC Berkeley, under Alexei Efros, and one year as a visiting research scientist at OpenAI. His work focuses on why we represent the world the way we do, and how we can replicate these abilities in machines.
Research Update
12 February 2020: Antonio Torralba (MIT)
Abstract: I will present some of the work done in my group (some old and some new) on image databases, multimodal learning and dissecting neural nets.
Biography: My research is in the areas of computer vision, machine learning and human visual perception. I am interested in building systems that can perceive the world like humans do. Although my work focuses on computer vision I am also interested in other modalities such as audition and touch. A system able to perceive the world through multiple senses might be able to learn without requiring massive curated datasets. Other interests include understanding neural networks, common-sense reasoning, computational photography, building image databases, and the intersections between visual art and computation.