Computational Sensorimotor Learning (CSL) Seminar
The CSL seminar topics span RL, robotics, deep learning, and related fields. The seminar meets once every few weeks over Zoom on Mondays from 11:00-12:00 ET.
Upcoming Seminars
Past Seminars
On Building General-Purpose Home Robots
12 March 2024: Lerrel Pinto (NYU)
Abstract: The concept of a "generalist machine" in homes - a domestic assistant that can adapt and learn from our needs, all while remaining cost-effective - has been a long-standing goal in robotics, pursued steadily for decades. In this talk, I will present our recent efforts towards building such capable home robots. First, I will discuss how large, pretrained vision-language models can induce strong priors for mobile manipulation tasks like pick-and-drop. But pretrained models can only take us so far. To scale beyond basic picking, we will need systems and algorithms to rapidly learn new skills. This requires creating new tools to collect data, improving representations of the visual world, and enabling trial-and-error learning during deployment. While much of the work presented focuses on two-fingered hands, I will briefly introduce learning approaches for multi-fingered hands, which support more dexterous behaviors and rich touch sensing combined with vision. Finally, I will outline unsolved problems that were not obvious at the outset and which, once solved, will bring us closer to general-purpose home robots.
Biography: Lerrel Pinto is an Assistant Professor of Computer Science at NYU. His research focuses on machine learning for robots. He received a Ph.D. from CMU, after which he did a postdoc at UC Berkeley. His research on robot learning received best paper awards at ICRA 2016 and RSS 2023, and was a best paper finalist at IROS 2019 and CoRL 2022. Lerrel has received the Packard Fellowship and was named a TR35 innovator under 35 for 2023. Several of his works have been featured in popular media such as The Wall Street Journal, TechCrunch, MIT Tech Review, Wired, and BuzzFeed, among others.
Toward Generative AI For The (Cyber-)Physical World
17 January 2024: Andrew Spielberg (Harvard University)
Abstract: Cyberphysical systems - machines in which the virtual "brain" (cognitive planning, control, proprioception, sense, etc.) and the physical "body" (shape, actuators, materiality, sensor networks, etc.) are intimately coupled - are increasingly a part of the modern world. These machines are progressively ubiquitous in the personal realm in household robots and (semi-)autonomous vehicles, in society in smart infrastructure and surveillance drones, and in industry in manufacturing robotics and heavy machinery. As computing and advanced manufacturing techniques expand the types of systems we can build and what built systems around us can do, we require design tools that cut through increasingly complex, often intractable possibilities. Those tools should be accurate, optimizing, explorative, and enable physical realization, with the goal of ideating and fabricating machines that approach the diversity and capability of biological life. In this talk, I will discuss solutions for co-designing dynamical cyberphysical systems, especially rigid and soft robots, over both their physical morphology and their embodied artificial intelligence. In particular, I will discuss efficient methods for co-optimizing and co-learning control and morphology, digital fabrication methods that leverage spatially programmable materials for function, and data-driven modeling for overcoming the sim-to-real gap. These methods will be tied together in a vision for computational invention.
Biography: Andrew Spielberg is a Postdoctoral Fellow at Harvard University, where he works with Prof. Jennifer A. Lewis in the Lewis Lab. His mission is to enable anyone to design functional artifacts across scales and domains, with a special emphasis on rigid and soft robots and other cyberphysical machines. He looks to empower novices and accelerate experts' workflows. Andrew researches differentiable simulation, design algorithms, digital manufacturing processes, and methods for overcoming the sim-to-real gap, for inventing in both virtual and physical worlds. He has published over 30 papers in top refereed venues, and his work has been recognized with a CHI best paper award, ICRA and RoboSoft best paper nominations, Advanced Intelligent Systems journal highlights, and a NeurIPS oral spotlight. He is a recipient of the Unity Global Fellowship, the DARPA I2O Fellowship, and a Harvard GRID $100K award. Andrew received his PhD from MIT's Computer Science and Artificial Intelligence Lab, where he was advised by Daniela Rus and Wojciech Matusik, and has spent time at Disney Research Pittsburgh and Zürich, Intel Labs, and the Johns Hopkins University Applied Physics Lab.
Foundation Models: A Representational Viewpoint on Robotic Decision-Making
7 December 2023: Vikash Kumar (CMU)
Abstract: Advances in building large models are propelling the development of increasingly capable generative agents, especially in fields with extensive data corpora, such as language and vision. Despite the comparatively small and narrow nature of robotic datasets, large models trained in conjunction with abundant non-robotics datasets have demonstrated remarkable progress in various components of robotic decision-making paradigms, including visual representation, language grounding, task decomposition, state-value estimation, etc. With their undeniable impact on individual layers of robotic decision-making, a natural question arises: is there a unifying viewpoint explaining these seemingly disjoint impacts? This talk will outline robotic decision-making as an exercise primarily in representation learning, with decision-making entering only at the tail end. The talk will also introduce a framework capable of unifying recent developments that leverage foundation models in robot learning. Beyond their impressive generalization in high-level decision-making, the talk will examine foundation models for low-level motor control, outlining concrete strategies for constructing and leveraging them. It will introduce RoboAgent, an extremely data-efficient universal agent capable of continual evolution. Trained on just 7500 trajectories, it demonstrates a diverse set of 12 non-trivial manipulation skills (beyond picking and pushing, including articulated object manipulation and object re-orientation) across 38 tasks. It can generalize these skills to hundreds of diverse unseen scenarios involving unfamiliar objects, tasks, and entirely unseen kitchen setups.
Biography: Vikash Kumar is an Adjunct Professor at CMU. His research focuses on understanding the fundamentals of embodied (physiological as well as robotic) movements. He finished his Ph.D. at the University of Washington with Prof. Sergey Levine and Prof. Emo Todorov, and his M.S. and B.S. at the Indian Institute of Technology (IIT), Kharagpur. He has also spent time as a Sr. Research Scientist at FAIR-MetaAI, and as a Research Scientist at Google-Brain and OpenAI. His research leverages data-driven techniques to develop efficient and generalizable paradigms for embodied intelligence. Applications of his work have led to human-level dexterity on anthropomorphic robotic hands as well as physiological digital twins, low-cost scalable systems capable of contact-rich behaviors, skilled multi-task multi-skill robotic agents, etc. His recent focus is on building foundation models for physiological as well as robotic embodied intelligence, primarily using off-domain data. He is the lead creator of MyoSuite and RoboHive, and a founding member of the MuJoCo physics engine project, which is now widely used in the fields of Robotics and Machine Learning. His works have been recognized with a best Master's thesis award, the best manipulation paper award at ICRA'16, a best workshop paper award at ICRA'22, and a CIFAR AI Chair '20 (declined), and have been covered by a wide variety of media outlets such as The New York Times, Reuters, ACM, WIRED, MIT Technology Review, IEEE Spectrum, etc.
Toward General Virtual Agents
15 November 2023: Stephen McAleer (CMU)
Abstract: Agents capable of carrying out general tasks on a computer can greatly improve efficiency and productivity. Ideally, such agents should be able to solve new computer tasks presented to them through natural language commands. However, previous approaches to this problem require large amounts of expert demonstrations and task-specific reward functions, both of which are impractical for new tasks. In this talk, I show that pre-trained LLMs are able to achieve state-of-the-art performance on MiniWoB, a popular computer task benchmark, by recursively criticizing and improving outputs. I then argue that RLHF is a promising approach toward improving LLM agents, and introduce new work on countering over-optimization in RLHF via constrained RL.
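As a rough illustration of the recursive critique-and-improve idea mentioned above (the prompts and the `llm` callable here are hypothetical placeholders, not the speaker's implementation), the loop looks roughly like this:

```python
def recursively_criticize_and_improve(llm, task, n_rounds=3):
    """Hypothetical sketch: an LLM drafts a plan of actions for a computer task,
    then repeatedly critiques and revises its own output before acting."""
    plan = llm(f"Propose step-by-step actions to complete this computer task:\n{task}")
    for _ in range(n_rounds):
        critique = llm(f"Task: {task}\nPlan:\n{plan}\nList any mistakes or missing steps.")
        plan = llm(f"Task: {task}\nPlan:\n{plan}\nCritique:\n{critique}\nWrite an improved plan.")
    return plan
```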
Biography: Stephen McAleer is a postdoc at Carnegie Mellon University working with Tuomas Sandholm. His research has led to the first reinforcement learning algorithm to solve the Rubik's cube and the first algorithm to achieve expert-level performance on Stratego. His work has been published in Science, Nature Machine Intelligence, ICML, NeurIPS, and ICLR, and has been featured in news outlets such as the Washington Post, the LA Times, MIT Technology Review, and Forbes. He received a PhD in computer science from UC Irvine working with Pierre Baldi, and a BS in mathematics and economics from Arizona State University.
Inductive Biases for Robot Reinforcement Learning
23 October 2023: Sherry Yang (UC Berkeley)
Abstract: Foundation models pretrained on internet vision and language data have acquired broad knowledge. As these models continue to evolve, they inevitably face the challenge of outperforming their data-driven behaviors, especially in low-data situations and on intricate tasks demanding extended reasoning, search, or optimization. Such tasks have been at the core of sequential decision making, encompassing areas such as planning and reinforcement learning. Sequential decision making has traditionally faced challenges of sample efficiency and generalization, partially due to the inability to incorporate broad knowledge from internet data. In this talk, I will present three foundation-model-inspired approaches, namely representation learning, conditional generative modeling, and repurposing pretrained vision and language models, for leveraging broad knowledge from foundation models to solve more complex tasks such as continuous control, navigation, robotic manipulation, and game play.
Biography: Sherry is a final-year PhD student at UC Berkeley advised by Pieter Abbeel and a senior research scientist at Google DeepMind. Her research interests include imitation learning, deep reinforcement learning (RL), and, recently, foundation models for decision making. Her work spans offline reinforcement learning, representation learning for sequential decision making, and generative modeling for control, planning, and RL. She initiated the Foundation Models for Decision Making workshop at NeurIPS 2022 and 2023, bringing together research communities in vision, language, planning, and RL to solve complex decision making tasks at scale. Before her current role, Sherry received her Bachelor's and Master's degrees from MIT, advised by Patrick Winston and Julian Shun.
Inductive Biases for Robot Reinforcement Learning
14 April 2023: Jan Peters (Technische Universität Darmstadt)
Abstract: Autonomous robots that can assist humans in situations of daily life have been a long-standing vision of robotics, artificial intelligence, and cognitive sciences. A first step towards this goal is to create robots that can learn tasks triggered by environmental context or higher-level instruction. However, learning techniques have yet to live up to this promise, as only few methods manage to scale to high-dimensional manipulator or humanoid robots. In this talk, we investigate a general framework suitable for learning motor skills in robotics which is based on the principles behind many analytical robotics approaches. To accomplish robot reinforcement learning from just a few trials, the learning system can no longer explore all learnable solutions but has to prioritize one solution over others - independent of the observed data. Such prioritization requires explicit or implicit assumptions, often called 'inductive biases' in machine learning. Extrapolation to new robot learning tasks requires inductive biases deeply rooted in general principles and domain knowledge from robotics, physics and control. Empirical evaluations on several robot systems illustrate the effectiveness and applicability to learning control on an anthropomorphic robot arm. These robot motor skills range from toy examples (e.g., paddling a ball, ball-in-a-cup) to playing robot table tennis, juggling and manipulation of various objects.
Biography: Jan Peters has been a full professor (W3) for Intelligent Autonomous Systems at the Computer Science Department of the Technische Universitaet Darmstadt since 2011, and, at the same time, he has headed the research department on Systems AI for Robot Learning (SAIROL) at the German Research Center for Artificial Intelligence (Deutsches Forschungszentrum für Künstliche Intelligenz, DFKI) since 2022. He is also a founding research faculty member of the Hessian Center for Artificial Intelligence. Jan Peters has received the Dick Volz Best 2007 US PhD Thesis Runner-Up Award, the Robotics: Science and Systems Early Career Spotlight, the INNS Young Investigator Award, and the IEEE Robotics and Automation Society's Early Career Award, as well as numerous best paper awards. In 2015, he received an ERC Starting Grant, and in 2019 he was appointed IEEE Fellow, in 2020 ELLIS Fellow, and in 2021 AAIA Fellow. Despite being a faculty member at TU Darmstadt only since 2011, Jan Peters has already nurtured a series of outstanding young researchers into successful careers. These include new faculty members at leading universities in the USA, Japan, Germany, Finland and Holland, postdoctoral scholars at top computer science departments (including MIT, CMU, and Berkeley), and young leaders at top AI companies (including Amazon, Boston Dynamics, Google and Facebook/Meta).
Jan Peters has studied Computer Science, Electrical, Mechanical and Control Engineering at TU Munich and FernUni Hagen in Germany, at the National University of Singapore (NUS) and the University of Southern California (USC). He has received four Master's degrees in these disciplines as well as a Computer Science PhD from USC. Jan Peters has performed research in Germany at DLR, TU Munich and the Max Planck Institute for Biological Cybernetics (in addition to the institutions above), in Japan at the Advanced Telecommunication Research Center (ATR), at USC and at both NUS and Siemens Advanced Engineering in Singapore. He has led research groups on Machine Learning for Robotics at the Max Planck Institutes for Biological Cybernetics (2007-2010) and Intelligent Systems (2010-2021).
Reverse engineering human exploration
24 March 2023: Sam Gershman (Harvard University)
Abstract: This talk explores how humans solve the exploration-exploitation dilemma in reinforcement learning. Mirroring the multiplicity of algorithmic solutions studied in machine learning, the human brain also appears to employ several different algorithms. In particular, evidence indicates that humans use a combination of uncertainty-guided directed and random exploration, as well as more sophisticated algorithms that rely on structured world knowledge. These findings point towards a convergence of natural and artificial intelligence.
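One common way such hybrid exploration is formalized (an illustrative sketch, not necessarily the exact model from the talk) is to add an uncertainty bonus to each option's estimated value (directed exploration) and then choose stochastically through a softmax (random exploration):

```python
import numpy as np

def choice_probabilities(value_mean, value_std, bonus_weight=1.0, temperature=1.0):
    """Illustrative hybrid exploration rule: directed exploration enters as an
    uncertainty bonus on each option's value; random exploration enters through
    the softmax temperature over the resulting scores."""
    scores = value_mean + bonus_weight * value_std   # prefer options we know less about
    scores = scores / temperature                    # higher temperature = more random choice
    exp_scores = np.exp(scores - scores.max())
    return exp_scores / exp_scores.sum()

# Two options with equal estimated value but different uncertainty:
print(choice_probabilities(np.array([1.0, 1.0]), np.array([0.1, 1.0])))
```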
Biography: Sam Gershman received his B.A. in Neuroscience and Behavior from Columbia University in 2007 and his Ph.D. in Psychology and Neuroscience from Princeton University in 2013. From 2013-2015, he was a postdoctoral fellow in the Department of Brain and Cognitive Sciences at MIT. He is currently a Professor in the Department of Psychology and Center for Brain Science at Harvard. His research focuses on computational cognitive neuroscience approaches to learning, memory and decision making.
Distributional reinforcement learning: A richer model of agent-environment interactions
16 Nov 2022: Marc Bellemare (Google Research)
Abstract: Biological and artificial agents alike benefit from treating their environment as a stochastic system. In reinforcement learning, we instantiate this principle by modelling the environment dynamics and total reward (the return) as random quantities. Where the classical treatment focuses almost exclusively on the expected return, a much richer picture emerges when we instead consider the entire distribution of returns. I will give a technical overview of the computational concerns and solutions that arise when we design agents that learn return distributions. Following this, I will review recent experimental results, from robotics to computational neuroscience, illustrating the broad benefits of studying and designing agents under the distributional lens.
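For reference, the distributional view replaces the expected return with the full return random variable, related by a distributional Bellman equation (equality in distribution):

\[
Z(s,a) \;\overset{D}{=}\; R(s,a) + \gamma\, Z(S', A'), \qquad S' \sim P(\cdot \mid s, a),\; A' \sim \pi(\cdot \mid S'),
\]

and the classical value function keeps only its expectation, \(Q(s,a) = \mathbb{E}[Z(s,a)]\).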
Biography: Marc G. Bellemare leads the reinforcement learning (RL) group at Google Research in Montreal, Canada. He is adjunct professor at McGill University and Université de Montréal, a core industry member at the Montreal Institute for Learning Algorithms (Mila), CIFAR Learning in Machines & Brains Fellow, and holds a Canada-CIFAR AI Chair. At its core, his group studies how artificial agents can be designed to operate in complex, time-evolving environments. Marc received his Ph.D. from the University of Alberta, where he developed the highly-successful Arcade Learning Environment benchmark. During his subsequent tenure at DeepMind in London, UK he made a number of pioneering developments in deep reinforcement learning, in particular proposing the distributional perspective as a richer model of agent-environment interactions.
Learning-Based Robot Control from Vision: Formal Guarantees and Fundamental Limits
2 Nov 2022: Anirudha Majumdar (Princeton University)
Abstract: The ability of machine learning techniques to process rich sensory inputs such as vision makes them highly appealing for use in robotic systems (e.g., micro aerial vehicles and robotic manipulators). However, the increasing adoption of learning-based components in the robotics perception and control pipeline poses an important challenge: how can we guarantee the safety and performance of such systems? As an example, consider a micro aerial vehicle that learns to navigate using a thousand different obstacle environments or a robotic manipulator that learns to grasp using a million objects in a dataset. How likely are these systems to remain safe and perform well on a novel (i.e., previously unseen) environment or object? How can we learn control policies for robotic systems that provably generalize to environments that our robot has not previously encountered? Unfortunately, existing approaches either do not provide such guarantees or do so only under very restrictive assumptions.
In this talk, I will present our group's work on developing a framework for learning control policies for robotic systems with formal guarantees on generalization to novel environments. The key technical insight is to leverage and extend powerful techniques from generalization theory in theoretical machine learning. We apply our techniques on problems including vision-based navigation and manipulation in order to demonstrate the ability to provide strong generalization guarantees on robotic systems with complicated (e.g., nonlinear/hybrid) dynamics, rich sensory inputs (e.g., RGB-D), and neural network-based control policies. I will also present recent work aimed at understanding fundamental limits on safety and performance imposed by a robot's (imperfect) sensors.
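As one concrete instance of the kind of generalization-theory tool alluded to here (a standard PAC-Bayes bound, stated generically rather than as the speaker's specific result): for a posterior distribution \(P\) over policies, a prior \(P_0\) fixed before seeing the \(N\) training environments, and a cost \(C \in [0,1]\), with probability at least \(1-\delta\) over the sampled environments,

\[
\mathbb{E}_{E \sim \mathcal{D}}\, \mathbb{E}_{\pi \sim P}\big[C(\pi; E)\big]
\;\le\;
\frac{1}{N}\sum_{i=1}^{N} \mathbb{E}_{\pi \sim P}\big[C(\pi; E_i)\big]
+ \sqrt{\frac{\mathrm{KL}(P \,\|\, P_0) + \ln\frac{2\sqrt{N}}{\delta}}{2N}},
\]

so the expected cost on novel environments is controlled by the empirical cost on the training environments plus a complexity term.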
Biography: Anirudha Majumdar is an Assistant Professor at Princeton University in the Mechanical and Aerospace Engineering (MAE) department, and Associated Faculty in the Computer Science department. He also holds a part-time position as a Visiting Research Scientist at the Google AI Lab in Princeton. He received a Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2016, and a B.S.E. in Mechanical Engineering and Mathematics from the University of Pennsylvania in 2011. Subsequently, he was a postdoctoral scholar at Stanford University from 2016 to 2017 at the Autonomous Systems Lab in the Aeronautics and Astronautics department. He is a recipient of the ONR YIP award, the NSF CAREER award, the Google Faculty Research Award (twice), the Amazon Research Award (twice), the Young Faculty Researcher Award from the Toyota Research Institute, the Best Conference Paper Award at the International Conference on Robotics and Automation (ICRA), the Paper of the Year Award from the International Journal of Robotics Research (IJRR), the Alfred Rheinstein Faculty Award (Princeton), and the Excellence in Teaching Award from Princeton’s School of Engineering and Applied Science.
Machine Learning and Model Predictive Control for Adaptive Robotic Systems
19 Oct 2022: Byron Boots (University of Washington)
Abstract: In this talk I will discuss several different ways in which ideas from machine learning and model predictive control (MPC) can be combined to build intelligent, adaptive robotic systems. I’ll begin by showing how to learn models for MPC that perform well on a given control task. Next, I’ll introduce an online learning perspective on MPC that unifies well-known algorithms and provides a prescriptive way to generate new ones. Finally, I will discuss how MPC can be combined with model-free reinforcement learning to build fast, reactive systems that can improve their performance with experience. Along the way, I’ll show how these approaches can be applied to the development of high-speed ground vehicles and resilient quadrupeds.
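As a generic illustration of the first ingredient, a learned dynamics model inside an MPC loop (a minimal random-shooting sketch under assumed interfaces, not the specific algorithms from the talk):

```python
import numpy as np

def mpc_random_shooting(state, learned_dynamics, cost_fn, horizon=10, n_samples=256, action_dim=2):
    """Sample candidate action sequences, roll each out through the learned model,
    and execute only the first action of the cheapest sequence (then replan)."""
    action_seqs = np.random.uniform(-1.0, 1.0, size=(n_samples, horizon, action_dim))
    total_costs = np.zeros(n_samples)
    for i, actions in enumerate(action_seqs):
        s = state
        for a in actions:
            s = learned_dynamics(s, a)       # model trained from data, ideally for control performance
            total_costs[i] += cost_fn(s, a)
    best = int(np.argmin(total_costs))
    return action_seqs[best, 0]              # receding horizon: apply one action, replan next step
```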
Biography: Byron Boots is the Amazon Professor of Machine Learning in the Paul G. Allen School of Computer Science and Engineering at the University of Washington. Byron's group performs fundamental and applied research in machine learning, artificial intelligence, and robotics with a focus on developing theory and systems that tightly integrate perception, learning, and control. His work has been applied to a range of problems including localization and mapping, motion planning, robotic manipulation, quadrupedal locomotion, and high-speed navigation. Byron has received several awards including "Best Paper" Awards from ICML, AISTATS, RSS, and IJRR. He is also the recipient of the RSS Early Career Award, the DARPA Young Faculty Award, the NSF CAREER Award, and the Outstanding Junior Faculty Research Award from the College of Computing at Georgia Tech. Byron received his PhD from the Machine Learning Department at Carnegie Mellon University.
Physiological Motor Control
21 Sept 2022: Vikash Kumar (Facebook AI Research (FAIR))
Abstract: The more intelligent an organism is, the more complex the motor behavior it can exhibit. But what enables such complex decision-making and the motor control to execute those decisions? To explore this question, we've developed MyoSuite: a contact-rich framework for musculoskeletal motor control. The MyoSuite ecosystem consists of musculoskeletal models that are 4,000x faster, as well as ML algorithms for solving musculoskeletal control problems. In this talk, I'll outline MyoSuite's ambitious aim and progress towards unifying the two facets of intelligence, motor and neural, by providing a common platform for the two closely related communities to come together. To further this ambitious undertaking, we have launched MyoChallenge, a NeurIPS'22 competition track where we invite the community to participate in solving two of the most difficult dexterity challenges: die reorientation and simultaneous rotation of two Baoding balls.
Biography: Vikash Kumar is a research scientist at Facebook AI Research (FAIR). He finished his Ph.D. at the University of Washington with Prof. Emo Todorov and Prof. Sergey Levine, where his research focused on imparting human-level dexterity to anthropomorphic robotic hands. He continued his research as a post-doctoral fellow with Prof. Sergey Levine at the University of California, Berkeley, where he further developed his methods to work on low-cost scalable systems. He also spent time as a Research Scientist at OpenAI and Google-Brain, where he diversified his research on low-cost scalable systems to the domain of multi-agent locomotion. He has also been involved with the development of the MuJoCo physics engine, now widely used in the fields of Robotics and Machine Learning. His works have been recognized with a best Master's thesis award, the best manipulation paper award at ICRA'16, a best workshop paper award at ICRA'22, and a CIFAR AI Chair '20 (declined), and have been covered by a wide variety of media outlets such as The New York Times, Reuters, ACM, WIRED, MIT Technology Review, IEEE Spectrum, etc.
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
10 Aug 2022: Jim Fan (NVIDIA AI)
Abstract: Autonomous agents have made great strides in specialist domains like Atari games and Go. However, they typically learn tabula rasa in isolated environments with limited objectives, thus failing to generalize across a wide spectrum of tasks and capabilities. Inspired by how humans continually learn and adapt in the open world, we advocate a trinity of ingredients for building generalist agents: 1) an environment that supports an infinite variety of tasks and goals, 2) a large-scale database of multimodal knowledge, and 3) a flexible and scalable agent architecture. We introduce MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with 1000s of diverse open-ended tasks and an internet-scale knowledge base with 730K YouTube videos, 7K Wiki pages, and 340K Reddit posts. Using MineDojo's data, we propose a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function. Our agent is able to solve Minecraft tasks specified in free-form language without any manually designed dense shaping reward. MineDojo is open-sourced at https://minedojo.org. We look forward to seeing how MineDojo empowers the community to make progress on the grand challenge of open-ended agent learning.
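A rough sketch of how a pre-trained video-language model can serve as a learned reward (the encoder interfaces below are placeholders, not MineDojo's actual API):

```python
import numpy as np

def language_conditioned_reward(video_encoder, text_encoder, recent_frames, task_prompt):
    """Hypothetical sketch: embed a short clip of recent behavior and the free-form
    task description, and use their similarity as a dense shaping reward."""
    v = video_encoder(recent_frames)    # e.g. the last few seconds of gameplay
    t = text_encoder(task_prompt)       # e.g. "shear a sheep to obtain wool"
    v = v / np.linalg.norm(v)
    t = t / np.linalg.norm(t)
    return float(v @ t)                 # cosine similarity: higher when behavior matches the instruction
```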
Biography: Linxi "Jim" Fan is an NVIDIA AI research scientist in Senior Director Anima Anandkumar's group. His primary focus is to develop generally capable autonomous agents. To tackle this grand challenge, his research efforts span foundation models, policy learning, robotics, multimodal learning, and large-scale systems. He obtained his Ph.D. degree in Computer Science from Stanford University, advised by Prof. Fei-Fei Li. His Ph.D. thesis was titled "Training and Deploying Visual Agents at Scale". Previously, Jim did research internships at NVIDIA, Google Cloud AI, OpenAI, Baidu Silicon Valley AI Lab, and Mila-Quebec AI Institute. He graduated summa cum laude with a Bachelor's degree in Computer Science from Columbia University, where he was the valedictorian of the Class of 2016 and a recipient of the Illig Medal.
Speaker Slides: [PDF]
Machine-Learning-Driven Haptic Sensor Design
2 Aug 2022: Huanbo Sun (Max Planck Institute)
Abstract: Similar to biological systems, robots would significantly benefit from skin-like sensing capabilities in order to perceive interactions in complex, dynamic, and human-involved environments. However, current sensing technologies still lag far behind their biological counterparts in combined resolution, surface coverage, and robustness. During my Ph.D. study, I explore how machine learning can enable the design of new kinds of capable haptic sensors. I propose super-resolution-oriented tactile sensors, which reduce the number of physical sensing elements while achieving high spatial accuracy. I also explore vision-based haptic sensor designs. The talk will present several machine-learning-driven haptic sensors that I designed for coarse and fine robotic applications, varying from large-surface sensing (robot limbs) to small-surface sensing (robot fingers). I will also introduce a super-resolution theory to guide future sensor designs, as this theory can predict the best achievable performance before a physical sensor prototype is built.
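As a toy illustration of the super-resolution idea (an assumed setup, not the thesis' actual sensor or model): a learned readout maps a handful of overlapping raw sensing-element signals to a contact location far finer than the element spacing.

```python
import numpy as np

# Assumed toy setup: 4 sensing elements along a 100 mm strip; a single contact spreads
# signal across neighbouring elements (idealized here as a broad Gaussian response).
def simulate_elements(contact_mm, element_positions_mm, spread_mm=20.0):
    return np.exp(-((contact_mm - element_positions_mm) ** 2) / (2 * spread_mm ** 2))

element_positions = np.array([12.5, 37.5, 62.5, 87.5])
contacts = np.linspace(0.0, 100.0, 2000)
X = np.stack([simulate_elements(c, element_positions) for c in contacts])

# Simplest possible "learned" decoder: linear least squares from the 4 readings to location.
A = np.c_[X, np.ones(len(X))]
w, *_ = np.linalg.lstsq(A, contacts, rcond=None)
predictions = A @ w
print("mean localization error (mm):", np.abs(predictions - contacts).mean())
```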
Biography: Huanbo Sun recently defended his Ph.D. thesis at the Max Planck Institute for Intelligent Systems in Germany. He studied Mechanical Engineering with a focus on robotics during his Master's and Bachelor's studies at the RWTH Aachen University and the Southeast University Chengxian College. He is currently designing machine-learning-driven haptic sensors to give robots a better sense of touch.
Learned optimizers: why they're the future, why they’re hard, and what they can do now
2 May 2022: Jascha Sohl-Dickstein (Google Brain)
Abstract: The success of deep learning has hinged on learned functions dramatically outperforming hand-designed functions for many tasks. However, we still train models using hand-designed optimizers acting on hand-designed loss functions. I will argue that these hand-designed components are typically mismatched to the desired behavior, and that we can expect meta-learned optimizers to perform much better. I will discuss the challenges and pathologies that make meta-training learned optimizers difficult. These include: chaotic and high-variance meta-loss landscapes; extreme computational costs for meta-training; lack of comprehensive meta-training datasets; challenges designing learned optimizers with the right inductive biases; and challenges interpreting the method of action of learned optimizers. I will share solutions to some of these challenges. I will show experimental results where learned optimizers outperform hand-designed optimizers in several contexts. I will discuss novel capabilities that can be achieved by meta-training learned optimizers to target downstream performance rather than training loss. I will end with a demo of an open source JAX library for training, testing, and applying learned optimizers.
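To make "learned optimizer" concrete, here is a minimal hand-rolled sketch (an illustrative toy, not the speaker's architecture): instead of a fixed rule such as SGD, a tiny network maps simple gradient features to per-parameter updates, and it is that network's weights that get meta-trained.

```python
import numpy as np

class TinyLearnedOptimizer:
    """Toy per-parameter learned optimizer: a 2-layer MLP maps (gradient, running
    average of gradient) to an update. In practice the MLP weights would be
    meta-trained across many tasks; here they are simply random placeholders."""
    def __init__(self, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(2, hidden))
        self.W2 = rng.normal(scale=0.1, size=(hidden, 1))
        self.avg_grad = None

    def step(self, params, grads, beta=0.9, scale=0.01):
        if self.avg_grad is None:
            self.avg_grad = np.zeros_like(grads)
        self.avg_grad = beta * self.avg_grad + (1 - beta) * grads
        features = np.stack([grads.ravel(), self.avg_grad.ravel()], axis=1)  # (n_params, 2)
        update = np.tanh(features @ self.W1) @ self.W2                       # (n_params, 1)
        return params - scale * update.reshape(params.shape)

# The hand-designed rule it replaces would simply be: params - learning_rate * grads.
```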
Biography: Jascha is a senior staff research scientist in Google Brain, and leads a research team with interests spanning machine learning, physics, and neuroscience. Recent projects have focused on theory of overparameterized neural networks, meta-training of learned optimizers, and understanding the capabilities of large language models. Jascha was previously a visiting scholar in Surya Ganguli's lab at Stanford, and an academic resident at Khan Academy. He earned his PhD in 2012 in Bruno Olshausen's lab in the Redwood Center for Theoretical Neuroscience at UC Berkeley. Prior to his PhD, he spent several years working for NASA on the Mars Exploration Rover mission.
The Reasonable Effectiveness of Dynamic Manipulation for Deformable Objects
25 April 2022: Shuran Song (Columbia University)
Abstract: From unfurling a blanket to swinging a rope, high-velocity dynamic actions play a crucial role in how people interact with deformable objects. In this talk, I will discuss how we can get robots to learn to dynamically manipulate deformable objects, where we embrace high-velocity dynamics rather than avoid them (e.g., by exclusively using slow pick-and-place actions). With robots that can fling, swing, or blow with air, our experiments show that these interactions are surprisingly effective for many classically hard manipulation problems and enable new robot capabilities.
Biography: Shuran Song is an assistant professor in the Department of Computer Science at Columbia University. Before that, she received her Ph.D. in Computer Science at Princeton University. Her research interests lie at the intersection of computer vision and robotics. She is a recipient of several awards, including the Best Paper Award at T-RO '20, Best Systems Paper Awards at CoRL '21 and RSS '19, and the Best Manipulation Systems Paper Award from Amazon '18, and has been a finalist for Best Paper Awards at ICRA '20, CVPR '19, RSS '19, and IROS '18. She has received research awards from the Sloan Foundation, Toyota Research Institute, Microsoft, Amazon, JP Morgan, and Google.
Reconstructing Generic (hand-held or isolated) Objects
11 April 2022: Shubham Tulsiani (CMU)
Abstract: We observe and interact with a myriad of objects in our everyday lives, from cups and bottles to hammers and tennis rackets. In this talk, I will describe two recent projects aimed at inferring the 3D structure of such generic objects from a single RGB image.
Towards reconstructing hand-held objects, I will describe a method that can leverage the cues provided by hand articulation, e.g., we grasp a pen differently from a bottle. By learning an implicit reconstruction network that infers pointwise SDFs conditioned on articulation-aware coordinates and pixel-aligned features, I will show that we can reconstruct arbitrary hand-held objects, going beyond the common assumption of known templates when understanding hand-object interaction.
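Schematically, the conditioned implicit reconstruction described above amounts to the following (interfaces are hypothetical, shown only to fix ideas):

```python
def predict_sdf(query_points_world, hand_pose, image_features, to_articulated_coords, project, sdf_mlp):
    """Hypothetical sketch: for each 3D query point, express it in articulation-aware
    coordinates (e.g. relative to hand joints), look up the pixel-aligned image feature
    at its 2D projection, and let an MLP predict the signed distance to the object surface."""
    sdf_values = []
    for p in query_points_world:
        p_articulated = to_articulated_coords(p, hand_pose)   # coordinates tied to the hand's articulation
        feature = image_features[project(p)]                  # pixel-aligned feature at p's image projection
        sdf_values.append(sdf_mlp(p_articulated, feature))    # zero level set of the SDF is the surface
    return sdf_values
```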
I will then focus on scaling 3D prediction to a large set of categories, and show how we can learn 3D prediction without 3D supervision. While recent approaches have striven to similarly learn 3D from category-level segmented image collections, they typically learn independent category-specific models from scratch, often relying on adversarial or template-based priors to regularize learning. I will present a simpler and more scalable alternative: learning a unified model across 150 categories while using synthetic 3D data on some categories to help regularize learning for others.
Biography: Shubham Tulsiani is an Assistant Professor in the CMU School of Computer Science. Prior to this, he was a research scientist at Facebook AI Research (FAIR). He received a Ph.D. in Computer Science from UC Berkeley in 2018. He is interested in building perception systems that can infer the spatial and physical structure of the world they observe.
Retrospectives on Scaling Robot Learning
4 April 2022: Andy Zeng (Google AI)
Abstract: Recent incredible results from models like BERT, GPT-3, and DALL-E make you wonder "what will it take to get to something like that for robots?" While we've made lots of progress, robot learning remains hard because scaling data collection is expensive. In this talk, I will discuss two views on how we might be able to work around this: (i) making the most out of our data, and (ii) robot learning from the Internet. I will dive into several projects in the context of learning visuomotor policies from demonstrations, where I will share key takeaways along the way, and conclude with some thoughts on where I think robot learning is headed.
Biography: Andy Zeng is a Research Scientist at Google working on computer vision and machine learning for robotics. He received his Bachelor's in CS and Math at UC Berkeley '15, and his PhD in CS at Princeton '19. He was a part of Team MIT-Princeton at the Amazon Picking Challenge, and is a recipient of several Best Paper Awards. His research has been recognized through the Princeton SEAS Award for Excellence, NVIDIA Fellowship, and Gordon Y.S. Wu Fellowship in Engineering and Wu Prize, and his work has been featured in many popular press outlets, including the New York Times, BBC, and Wired.
Predicting the Future for Perception and Beyond
14 March 2022: Carl Vondrick (Columbia University)
Abstract: The future is hard to anticipate, yet machines need to predict in order to fluidly operate in dynamic worlds. In this talk, we will introduce a new framework for learning predictive models from video, which is able to automatically learn what is predictable in the future, and what is not. The key idea behind the technique is based on the observation that hyperbolic geometry naturally encodes hierarchies. Through a gentle introduction to this non-Euclidean geometry, we will create a neural representation that is able to hedge uncertainty about the future in realistic video. We will furthermore analytically and experimentally show that advances in predictive modeling could greatly improve the accuracy of various perception and robot systems.
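For concreteness, the hyperbolic representation in question can be taken as the Poincaré ball model (a standard choice, stated here generically), whose distance between points \(\mathbf{x}, \mathbf{y}\) with \(\lVert\mathbf{x}\rVert, \lVert\mathbf{y}\rVert < 1\) is

\[
d(\mathbf{x}, \mathbf{y}) = \operatorname{arcosh}\!\left( 1 + \frac{2\,\lVert \mathbf{x} - \mathbf{y} \rVert^{2}}{\bigl(1 - \lVert \mathbf{x} \rVert^{2}\bigr)\bigl(1 - \lVert \mathbf{y} \rVert^{2}\bigr)} \right).
\]

Because volume grows exponentially with radius, tree-like hierarchies embed with low distortion, and an uncertain prediction can retreat toward the origin (a common ancestor of many specific futures) rather than committing to a single leaf.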
Biography: Carl Vondrick is an assistant professor of computer science at Columbia University where he directs a research group that studies computer vision and machine learning. His research is supported by the NSF, DARPA, Amazon, and Toyota, and his work has appeared on the national news, such as CNN, NPR, the Associated Press, Stephen Colbert's television show, as well as children's magazines. He received the 2021 NSF CAREER Award, the 2021 Toyota Young Faculty Award, and the 2018 Amazon Research Award. Previously, he was a Research Scientist at Google and he received his PhD from MIT in 2017.
Speaker Slides: [PDF]
Fast skill acquisition with goal conditioned RL: Utilizing behavioral priors and self-supervision
2 March 2022: Todor Davchev (University of Edinburgh)
Abstract: An essential feature of human sensorimotor skills is our ability to adapt them across environmental contexts by leveraging our understanding of the attributes of these environments. A core component of this skill adaptation process is the presence of a specific goal (or target) we aim to achieve. Similarly, this talk explores how goal-conditioned learning for robot control can help autonomous robots cope with variability, hence achieving skill generalization. Our approach is to develop structured machine learning solutions that judiciously combine different inductive biases for learning. Specifically, we focus on the domain of contact-rich robotic manipulation. We consider ways to restrict the search space by greedily solving for specific goals. We show how goal-conditioned residual policies, combined with learning from demonstration, allow for significant gains in sample efficiency and overall robustness in the context of physical part insertion with real-time constraints. We then focus on the setting of long-horizon dexterous dual-arm manipulation. We introduce an approach that can leverage self-supervised learning to guide exploration along task-specific distributions implied by a few successful demonstrations. Finally, we touch on the crucial problem of learning goal-conditioned rewards in the context of contact-rich manipulation. We employ 'learning to learn' strategies, wherein both the objectives of a task (i.e., goal-conditioned reward functions) and the policy for optimally performing that task are learned simultaneously. We show how the proposed approach allows for efficient and generalisable inverse RL.
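A minimal sketch of a goal-conditioned residual policy of the kind described above (names and interfaces are illustrative): a base controller, for instance one derived from a demonstration, supplies most of the action, and a learned goal-conditioned correction is added on top.

```python
import numpy as np

def residual_policy_action(state, goal, base_controller, residual_net, residual_scale=0.1):
    """The agent only learns the (small) residual around the base controller, which
    typically makes exploration safer and far more sample efficient."""
    base_action = base_controller(state, goal)                 # e.g. replaying a demonstrated insertion motion
    correction = residual_net(np.concatenate([state, goal]))   # learned goal-conditioned adjustment
    return base_action + residual_scale * correction
```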
Biography: Todor is a final-year PhD student in Robot Learning at the University of Edinburgh. He has completed two Google X AI residencies (project Intrinsic) and a research scientist internship with DeepMind, where he worked in Jon Scholz's group. He is advised by S. Ramamoorthy, is part of the Robust Autonomy and Decisions (RAD) group, and also works closely with S. Schaal and F. Meier. Broadly, Todor's interests lie at the intersection of robotics and machine learning. The overarching focus of his work is on improving the sample efficiency and robustness of learnt models by utilising inductive biases in the context of robotics applications. A recent problem Todor is particularly excited about is the role of goal-conditioned RL in contact-rich robotic manipulation.
Speaker Slides: [PDF]
Robot In-hand Manipulation Using Roller Graspers
28 February 2022: Shenli Yuan (Stanford University)
Abstract: In-hand manipulation is an essential skill for robots to realize sophisticated real-world tasks. Most existing robot hands rely on finger-gaiting to perform in-hand manipulation, which is an inefficient and difficult approach for a variety of tasks. This dissertation focuses on the development of a series of roller graspers that are able to manipulate objects in an entirely new way: without contact breaking. These roller graspers are multi-fingered robot graspers where each finger is equipped with a steerable, active roller that continually shifts the contact location along a path on the object. We designed a hierarchical manipulation pipeline that allows the roller grasper to achieve autonomous in-hand manipulation. The manipulation pipeline consists of a sample-based planner and a heuristic low-level policy that allows the grasper to perform full 6D manipulation of objects with a variety of shapes. Lastly, we demonstrated how tactile sensing can be incorporated into the roller graspers and control framework leading to more diverse and robust in-hand manipulation.
Biography: Shenli Yuan is a PhD student at Stanford Artificial Intelligence Laboratory advised by Prof. Kenneth Salisbury. His research focuses on robot in-hand manipulation using Roller Graspers. His work received Best Student Paper Award and Best Paper Award in robot manipulation at ICRA 2020, and was featured in IEEE Spectrum. He was a Stanford Interdisciplinary Graduate Fellow, and received his Master of Art in Music, Science and Technology and Master of Science in Mechanical Engineering, both from Stanford University.
Scaling Visual RL for Robotics
14 February 2022: Dinesh Jayaraman (University of Pennsylvania)
Abstract: Robotics in unconstrained settings requires high-bandwidth sensory observations and policies that can operate on them. Computer vision and deep RL have shown promise, but RL methods operating from pixels are notorious for their sample inefficiency and instability. In this talk, I will present our work to mitigate these difficulties through the use of expert data, offline data, and model learning. A running theme in our work on these topics is the use of spatial and temporal abstractions to help reduce dimensionality and share information across tasks and robots for efficient learning.
Biography: Dinesh Jayaraman is an assistant professor at the University of Pennsylvania, where he leads the perception, action, and learning research group within the GRASP lab. The group's research advances computer vision and machine learning algorithms, with robotics as a core application domain. Dinesh's research has received an AAAI New Faculty Highlights award '21, an Amazon Research Award '21, a Best Paper Runner-Up Award at ICRA '18, a Best Application Paper Award at ACCV '16, and been featured on the cover page of Science Robotics and in several press outlets.
Speaker Slides: [PDF]
Towards Robust, Efficient, and Practical Reinforcement Learning
16 December 2021: Ling Pan (Tsinghua University)
Abstract: Recent years have witnessed great success of deep reinforcement learning (RL) and multi-agent RL in many challenging tasks, including games, robotics, packet switching, etc. In RL, an agent interacts with an unknown environment to learn a policy, with the goal of estimating and/or optimizing the value function. The talk concerns three important challenges in RL. Firstly, how can we ensure robust learning behavior and value estimation for a deep RL agent? Secondly, how can we improve its learning efficiency? Thirdly, how can we successfully apply deep RL algorithms in important practical applications such as computational sustainability problems?
In this talk, we will present our recent research works aiming to improve the robustness, efficiency, and practicality of deep RL algorithms. We propose to estimate the value function using the Boltzmann softmax operator, and show that this improves robustness and achieves state-of-the-art performance in both single-agent and multi-agent scenarios. We will also introduce our work on offline multi-agent RL via actor rectification, which improves efficiency by leveraging pre-collected experiences instead of costly online interaction with the environment. Finally, we will briefly talk about our work applying deep RL to rebalance bike sharing systems, where we propose a divide-and-conquer based method to tackle the complex and high-dimensional environment.
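For reference, the Boltzmann softmax operator referred to above replaces the hard maximum over action values with a temperature-weighted average,

\[
\mathrm{boltz}_{\beta}\bigl(Q(s,\cdot)\bigr) \;=\; \frac{\sum_{a} e^{\beta Q(s,a)}\, Q(s,a)}{\sum_{a} e^{\beta Q(s,a)}},
\]

which recovers the max as \(\beta \to \infty\) and the mean as \(\beta \to 0\), trading off the overestimation induced by the hard max against sensitivity to estimation error.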
Biography: Ling Pan is a final-year Ph.D. candidate in the Institute for Interdisciplinary Information Sciences, Tsinghua University, advised by Prof. Longbo Huang. Her research interests include deep reinforcement learning and multi-agent systems. She currently focuses on developing robust, efficient, and practical deep reinforcement learning algorithms. She is currently visiting Stanford University, working with Prof. Tengyu Ma, and previously visited the University of Oxford, working with Prof. Shimon Whiteson. She was a research intern in the Machine Learning Group at Microsoft Research Asia, working with Dr. Wei Chen, and was a recipient of the Microsoft Research Asia Fellowship (2020).
Learning to walk via rapid adaptation
29 November 2021: Ashish Kumar (UC Berkeley)
Abstract: Legged locomotion is commonly studied and programmed as a discrete set of structured gait patterns, like walk, trot, and gallop. However, studies of children learning to walk (Adolph et al.) show that real-world locomotion is often quite unstructured and more like "bouts of intermittent steps". We have developed a general approach to walking which is built on learning on varied terrains in simulation and then fast online adaptation (fractions of a second) in the real world. This is made possible by our Rapid Motor Adaptation (RMA) algorithm. RMA consists of two components: a base policy and an adaptation module, both of which can be trained in simulation. We thus learn walking policies that are much more flexible and adaptable. In our setup, gaits emerge as a consequence of minimizing energy consumption at different target speeds, consistent with various animal motor studies. We then incrementally add a navigation layer to the robot from onboard cameras and tightly couple it with locomotion via proprioception, without retraining the walking policy. This is enabled by the use of additional safety monitors which are trained in simulation to predict the safe walking speed for the robot under varying conditions and to detect collisions which might get missed by the onboard cameras. The planner then uses these to plan a path for the robot in a locomotion-aware way. You can see our robot walking at https://www.youtube.com/watch?v=nBy1piJrq1A.
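Structurally, the two RMA components interact roughly as follows at deployment time (a sketch with illustrative interfaces, following the published description of the algorithm):

```python
def rma_control_step(obs, recent_history, base_policy, adaptation_module):
    """RMA at deployment: the adaptation module distills a latent 'extrinsics' vector
    from the recent state-action history (how the terrain and payload have been pushing
    back), and the base policy conditions on it to produce the next action. Both
    networks are trained in simulation; only this inference loop runs on the robot."""
    extrinsics_latent = adaptation_module(recent_history)   # fast online estimate of environment effects
    action = base_policy(obs, extrinsics_latent)            # e.g. joint position targets for the legs
    return action
```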
Biography: Ashish Kumar is a graduate student at UC Berkeley advised by Prof. Jitendra Malik. He currently works on legged robots and has previously worked on long-range navigation and efficient machine learning in a few kilobytes of RAM. Before coming to Berkeley, he was a Research Fellow at Microsoft Research India. He completed his undergraduate study at the Indian Institute of Technology Jodhpur, with a major in Computer Science and Engineering. He has published in top Machine Learning and Robotics conferences, and his work has been featured in popular press including Forbes, the Wall Street Journal, the Washington Post, CNET, and TechCrunch, along with several other international venues.
Speaker Slides: [PDF]
Advances in Off-policy Value Estimation in Reinforcement Learning
22 November 2021: Martha White (University of Alberta)
Abstract: Temporal difference learning algorithms underlie most approaches in reinforcement learning, for both prediction and control. A well-known issue is that these approaches can diverge under nonlinear function approximation, such as with neural networks, and in the off-policy setting where data is generated by a different policy than the one being learned. Naturally, there has been a flurry of work towards resolving this issue. In this talk, I will discuss two key advances that largely resolve the problem: sound gradient-based methods and emphatic reweightings. I will discuss our generalized objective that unifies several approaches and facilitates creating easy-to-use algorithms that consistently outperform temporal difference learning approaches in our experiments.
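To fix notation, the semi-gradient off-policy TD(0) update that can diverge, and which the gradient-based and emphatic corrections mentioned above repair, is

\[
\delta_t = R_{t+1} + \gamma\, v_{\mathbf{w}}(S_{t+1}) - v_{\mathbf{w}}(S_t),
\qquad
\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha\, \rho_t\, \delta_t\, \nabla_{\mathbf{w}} v_{\mathbf{w}}(S_t),
\]

where \(\rho_t = \pi(A_t \mid S_t) / b(A_t \mid S_t)\) is the importance-sampling ratio between the target policy \(\pi\) and the behavior policy \(b\). Gradient-TD methods replace this rule with the gradient of a well-defined objective, while emphatic methods reweight the updates with a state-dependent emphasis.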
Biography: Martha White is an Associate Professor of Computing Science at the University of Alberta and a PI of Amii (the Alberta Machine Intelligence Institute), one of the top machine learning centres in the world. She holds a Canada CIFAR AI Chair and received IEEE's "AI's 10 to Watch: The Future of AI" award in 2020. She has authored more than 50 papers in top journals and conferences. Martha is an associate editor for TPAMI, and has served as co-program chair for ICLR and area chair for many conferences in AI and ML, including ICML, NeurIPS, AAAI and IJCAI. Her research focus is on developing algorithms for agents continually learning on streams of data, with an emphasis on representation learning and reinforcement learning.
Speaker Slides: [PDF]
How to Train Your Biped to Stand, Walk, Run, Hop, Skip, and More
1 November 2021: Alan Fern (Oregon State University)
Abstract: Despite years of work on robot locomotion, we still do not have robots that can reliably and flexibly move around in homes, workspaces, and natural terrain. For many of these environments, legged robots, as opposed to wheel-based robots, appear to be the most viable option for achieving the desired level of locomotion autonomy. This talk will present recent advances by the Dynamic Robotics Laboratory at Oregon State University that bring us closer to achieving reliable bipedal robot locomotion in natural environments. These advances are a result of combining carefully engineered "animal-like" robot legs with sim-to-real reinforcement learning to control the legs. This approach has allowed the Cassie robot, developed by Agility Robotics, to reliably stand, walk, run, hop, skip, traverse stairs, and handle novel disturbances in the environment. Most recently, this learning approach allowed Cassie to be the first robot to successfully complete a 5K. The talk will cover the journey to these recent results and highlight important future challenges that lie ahead.
Biography: Alan Fern is a Professor of Computer Science and Associate Head of Research for the School of EECS at Oregon State University. He received his Ph.D. (2004) in Computer Engineering from Purdue University, and his B.S. (1997) in Electrical Engineering from the University of Maine. His research interests span a variety of topics in Artificial Intelligence and Robotics with a particular emphasis on building systems that can learn from experience. He co-directs the Dynamic Robotics Laboratory with Jonathan Hurst at Oregon State and is PI for a number of government funded projects including DARPA programs on Explainable Artificial Intelligence and Machine Common Sense. Most recently he is serving as the AI-lead PI for a new $20M AI Institute on Agricultural AI in collaboration with Washington State University.
Scaling Value-Aligned Learning to Robotics
25 October 2021: Scott Niekum (University of Texas at Austin)
Abstract: Before learning robots can be deployed in the real world, it is critical to be able to assure that their goals and behaviors will be aligned with the values of human users. While great progress has been made in the use of imitation learning to teach increasingly complex behaviors to robots, the value alignment problem has largely been ignored. The practical usefulness of such algorithms will remain limited without methods that can provide strong, finite-sample guarantees that scale to realistic robotics problems. Toward this goal, I will first introduce a series of algorithms that challenge common, but poor, statistical assumptions made in imitation learning. This work culminates in an approach that enables value-aligned imitation learning to scale to high-dimensional control problems for the first time. Second, I will discuss a body of work that leverages historically under-utilized data modalities -- ranging from human gaze and facial expressions, to contact and force sensing -- to further improve the characterization of human values and environmental uncertainty, thereby minimizing risk. Taken together, these algorithms represent a significant step toward enabling value-aligned robot learning in the real world, using only modest amounts of data.
Biography: Scott Niekum is an Associate Professor and the director of the Personal Autonomous Robotics Lab (PeARL) in the Department of Computer Science at UT Austin. He is also a core faculty member in the interdepartmental robotics group at UT. His research interests include robotic manipulation, imitation learning, reinforcement learning, and human-robot interaction. Scott is a recipient of the 2018 NSF CAREER Award, the 2019 AFOSR Young Investigator Award, and the 2019 UT Austin College of Natural Sciences Teaching Excellence Award.
Speaker Slides: [PDF]
Toward Developmentally Reasonable Self-Supervised Learning
18 October 2021: Dan Yamins (Stanford University)
Abstract: Neural networks have proven effective learning machines for a variety of challenging AI tasks, as well as surprisingly good models of brain areas that underlie real human intelligence. However, most successful neural networks are totally unrealistic as developmental models, because they are trained in a supervised fashion on large labeled datasets. Unsupervised approaches to learning in neural networks are thus of substantial interest for furthering artificial intelligence, both because they would enable the training of networks without the need for annotation, and because they would be better models of the kind of general-purpose learning deployed by humans. In this talk, I will describe a spectrum of recent approaches to unsupervised learning, based on ideas from cognitive science and neuroscience. First, I will discuss breakthroughs in neurally-inspired unsupervised learning of deep visual embeddings that achieve performance levels on challenging visual categorization tasks that are competitive with those of direct supervision of modern convnets. Second, I'll discuss our work building perception systems that make accurate long-range predictions of physical futures in realistic environments, and show how these support richer self-supervised visual learning. I'll also talk about the use of intrinsic motivation and curiosity to create interactive agents that self-curricularize, producing novel visual behaviors and learning powerful sensory representations. Finally, I'll suggest ways in which these models are a better starting point for models of actual human visual development.
Leveraging Language and Video Demonstrations for Robot Learning
27 September 2021: Jeannette Bohg (Stanford University)
Abstract: Humans have gradually developed language, mastered complex motor skills, and created and utilized sophisticated tools. The act of conceptualization is fundamental to these abilities because it allows humans to mentally represent, summarize and abstract diverse knowledge and skills. By means of abstraction, concepts that we learn from a limited number of examples can be extended to a potentially infinite set of new and unanticipated situations. Abstract concepts can also be more easily taught to others by demonstration. I will present work that gives robots the ability to acquire a variety of manipulation concepts that act as mental representations of verbs in a natural language instruction. We propose to use learning from human demonstrations of manipulation actions as recorded in large-scale video data sets that are annotated with natural language instructions. In extensive simulation experiments, we show that the policy learned in the proposed way can perform a large percentage of the 78 different manipulation tasks on which it was trained. We show that this multi-task policy generalizes over variations of the environment. We also show examples of successful generalization over novel but similar instructions. I will also present work that enables a robot to sequence these newly acquired manipulation skills for long-horizon task planning. I will especially focus on work that grounds symbolic states in visual data to enable closed-loop task planning.
Biography: Jeannette Bohg is an Assistant Professor of Computer Science at Stanford University. She was a group leader at the Autonomous Motion Department (AMD) of the MPI for Intelligent Systems until September 2017. Before joining AMD in January 2012, Jeannette Bohg was a PhD student at the Division of Robotics, Perception and Learning (RPL) at KTH in Stockholm. In her thesis, she proposed novel methods towards multi-modal scene understanding for robotic grasping. She also studied at Chalmers in Gothenburg and at the Technical University in Dresden where she received her Master in Art and Technology and her Diploma in Computer Science, respectively. Her research focuses on perception and learning for autonomous robotic manipulation and grasping. She is specifically interested in developing methods that are goal-directed, real-time and multi-modal such that they can provide meaningful feedback for execution and learning. Jeannette Bohg has received several Early Career and Best Paper awards, most notably the 2019 IEEE Robotics and Automation Society Early Career Award and the 2020 Robotics: Science and Systems Early Career Award.
Speaker Slides: [PDF]
Mapping timescales of cortical language processing
20 September 2021: Alex Huth (University of Texas at Austin)
Abstract: Natural language contains information that must be integrated over multiple timescales. To understand how the human brain represents this information, one approach is to build encoding models that predict fMRI responses to natural language using representations extracted from neural network language models (LMs). However, these LM-derived representations do not explicitly separate information at different timescales, making it difficult to interpret the encoding models. Here I will discuss how a language model can be engineered to explicitly represent different timescales, and how this model can be used to map representations in human cortex.
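For readers unfamiliar with encoding models, the sketch below shows the standard recipe the abstract builds on: regularized regression from language-model features onto each voxel's response, evaluated by held-out prediction accuracy. The shapes and random arrays stand in for real stimulus features and fMRI data; this is not the speaker's multi-timescale model.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# X: (timepoints, features) LM-derived features for the presented language.
# Y: (timepoints, voxels) measured BOLD responses. Random data as placeholders.
X_train, X_test = np.random.randn(800, 768), np.random.randn(200, 768)
Y_train, Y_test = np.random.randn(800, 1000), np.random.randn(200, 1000)

model = RidgeCV(alphas=np.logspace(0, 4, 10))  # ridge penalty chosen by cross-validation
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# Per-voxel performance: correlation between predicted and measured responses.
r = [np.corrcoef(Y_pred[:, v], Y_test[:, v])[0, 1] for v in range(Y_test.shape[1])]
```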
Biography: Alex Huth is an Assistant Professor at The University of Texas at Austin in the departments of neuroscience and computer science. His lab uses natural language stimuli and fMRI to study language processing in human cortex in work funded by the Burroughs Wellcome Foundation, Sloan Foundation, Whitehall Foundation, NIH, and others. Before joining UT, Alex did his undergraduate and master’s work at Caltech under Christof Koch, and then PhD and postdoc in Jack Gallant’s laboratory at UC Berkeley, where he developed novel methods for mapping semantic representations of visual and linguistic stimuli in human cortex.
Robot dexterity in the real world
30 August 2021: Samir Menon (Dexterity), Jonathan Kuck (Dexterity), Harry Zhe Su (Dexterity)
Abstract: Transitioning robots from well-controlled lab environments to the real world has been an outstanding challenge for decades. The challenges are numerous and span a variety of engineering disciplines in machine intelligence, simulation, modeling, algorithms, control, and robotic hardware & software. Dexterity is a robotics startup that has engineered and deployed robotic systems that can intelligently manipulate tens of thousands of items in production, reason about and operate in dynamic environments, collaborate with each other using the sense of touch, and safely operate in the presence of humans. Dexterity’s robots ship hundreds of thousands of units in packaged food and parcel warehouses each day, and are in production 24/6. The data we collect has taught us what works in the real world and what doesn’t. The consequences of realizing even limited robot autonomy within semi-structured workflows are urgent and compelling - the logistics industry has a 300,000-job shortage and would be transformed if robots could fill the gap. This talk (by Samir, Jonathan & Harry) will discuss our technical advances to date, which are poised to deploy thousands of robots, and present outstanding problems that need to be addressed before we can deploy millions of robots.
Biography: Samir is a founder and CEO of Dexterity, a company working on transforming robots from hard-wired automatons into intelligent collaborative helpers. He has also worked extensively on control theory, mathematical modeling, simulation, human motor control, and neural computation, and has designed and built numerous robots. Samir holds a master's and a Ph.D. in computer science from Stanford University, where he was a Stanford Interdisciplinary Graduate Fellow. Prior to that, Samir completed his Bachelor of Technology at the Indian Institute of Information Technology, Allahabad.
Biography: Jonathan works on optimization, machine learning, and perception at Dexterity. Prior to joining Dexterity he completed his PhD at Stanford, where he worked on machine learning and robotic perception problems bridging theory and application. His research has improved the speed/accuracy tradeoff of approximate probabilistic inference by orders of magnitude, blurred the boundary between classical inference techniques and deep learning, and connected these advances to core perception problems such as multi-object tracking and object detection. He completed his bachelor's in engineering physics at UIUC.
Biography: Harry Zhe Su is a roboticist at Dexterity Inc., where he focuses on developing algorithms and robotic systems that enable robots to achieve dexterous manipulation. He is a senior architect on the robotics team and played a critical role in taking Dexterity’s intelligent dexterous robots into production in the parcel industry. Before joining Dexterity, Harry received his Ph.D. in biomedical engineering from the University of Southern California (USC), where he developed various biomimetic tactile perception and robotic manipulation algorithms using tactile sensing.
Robot Learning by Understanding Egocentric Videos
16 August 2021: Saurabh Gupta (University of Illinois at Urbana-Champaign)
Abstract: True gains of machine learning in AI sub-fields such as computer vision and natural language processing have come about from the use of large-scale diverse datasets for learning. In this talk, I will discuss if and how we can leverage large-scale diverse data in the form of egocentric videos (first-person videos of humans conducting different tasks) to similarly scale up policy learning for robots. I will discuss the challenges this presents, and some of our initial efforts towards tackling them. In particular, I will describe techniques to acquire a) low-level visuomotor subroutines, b) high-level value functions, and c) an interactive understanding of objects from in-the-wild egocentric videos.
Biography: Saurabh Gupta is an Assistant Professor in the ECE Department at UIUC. Before starting at UIUC in 2019, he received his Ph.D. from UC Berkeley in 2018 and spent the following year as a Research Scientist at Facebook AI Research in Pittsburgh. His research interests span computer vision, robotics, and machine learning, with a focus on building agents that can intelligently interact with the physical world around them. He received the President's Gold Medal at IIT Delhi in 2011, the Google Fellowship in Computer Vision in 2015, and an Amazon Research Award in 2020. He has also won many challenges at leading computer vision conferences.
Looking at a Few Images of Rooms and Many Interacting Hands
2 August 2021: David Fouhey (University of Michigan)
Abstract: The long-term goal of my research is to let computers understand the physical world from images, including both 3D properties and how humans or robots could interact with things. This talk will cover two recent directions aimed at enabling this goal. I will begin by talking about 3D reconstruction from two ordinary images where the camera pose is unknown and the views have little overlap -- think hotel listings. Computers struggle in this setting since standard techniques usually depend on many images, high overlap, known camera poses, or RGBD input. Nonetheless, humans seem to build a sense of a space from a few photos. We think the key to this ability is joint reasoning over reconstruction, camera pose, and correspondence. This insight is put into action with a deep learning architecture and optimization that produces a coherent planar reconstruction. Our system outperforms many baselines on Matterport3D, but there is plenty of room for new work in this exciting setting. Then, I will focus on understanding what humans are doing with their hands. Hands are a primary means for humans to manipulate the world, but fairly basic information about what they're doing is often off limits to computers (at least in unconstrained data). I'll describe some of our efforts on understanding hand state, including work on learning to segment hands and hand-held objects in images via a system that learns from large-scale video data.
Biography: David Fouhey is an assistant professor in the University of Michigan EECS department. He received a PhD in robotics in 2016 from CMU where he was an NSF and NDSEG fellow, then was a postdoctoral fellow at UC Berkeley. He has spent time at Oxford's Visual Geometry Group, INRIA Paris, and Microsoft Research. More information about him can be found here: http://web.eecs.umich.edu/~fouhey/.
Structuring Manipulation Tasks for More Efficient Learning
26 July 2021: Oliver Kroemer (Carnegie Mellon University)
Abstract: In the future, we want to create robots with the robustness and versatility to operate in unstructured and everyday environments. To achieve this goal, robots will need to learn manipulation skills that can be applied to a wide range of objects and task scenarios. In this talk, I will present recent work from my lab on structuring manipulation tasks for more efficient learning. I will begin by discussing how modularity can be used to break down challenging manipulation tasks to learn general object-centric solutions. I will then focus on the question of what to learn: how robots can use model-based reasoning to identify relevant context parameters for adapting skills, as well as to determine when to learn a skill. I will conclude by discussing how robots can use interactions and multimodal sensing to learn manipulation-oriented representations of different materials.
Biography: Dr. Oliver Kroemer is an assistant professor at the Carnegie Mellon University (CMU) Robotics Institute, where he leads the Intelligent Autonomous Manipulation Lab. His research focuses on developing algorithms and representations to enable robots to learn versatile and robust manipulation skills. Before joining CMU, Dr. Kroemer was a postdoctoral researcher at the University of Southern California (USC) for two and a half years. He received his Master's and Bachelor's degrees in engineering from the University of Cambridge in 2008. From 2009 to 2011, he was a Ph.D. student at the Max Planck Institute for Intelligent Systems. He defended his Ph.D. thesis on Machine Learning for Robot Grasping and Manipulation in 2014 at the Technische Universitaet Darmstadt.
Multi-Task Robotic Reinforcement Learning at Scale
12 July 2021: Karol Hausman (Google Brain, Stanford University)
Abstract: In this talk, I'll present two new advances for robotic RL at scale, MT-Opt, a new multi-task RL system for automated data collection and multi-task RL training, and Actionable Models, which leverages the acquired data for goal-conditioned RL. MT-Opt introduces a scalable data-collection mechanism that is used to collect over 800,000 episodes of various tasks on real robots and demonstrates a successful application of multi-task RL that yields ~3x average improvement over baselines. Additionally, it enables robots to master new tasks quickly through use of its extensive multi-task dataset (new task fine-tuning in <1 day of data collection). Actionable Models enables learning in the absence of specific tasks and rewards by training an implicit model of the world that is also an actionable robotic policy. This drastically increases the number of tasks the robot can perform (via visual goal specification) and enables more efficient learning of downstream tasks.
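A core ingredient behind learning "in the absence of specific tasks and rewards" is goal relabeling: logged trajectories are reused as successful demonstrations of reaching their own future states. The sketch below is a generic hindsight-relabeling routine written under that assumption, not the MT-Opt or Actionable Models code.

```python
import random

def relabel(trajectory):
    """trajectory: list of (obs, action, next_obs) tuples from logged robot data."""
    relabeled = []
    for t, (obs, action, next_obs) in enumerate(trajectory):
        future = random.randrange(t, len(trajectory))   # pick a state the robot actually reached
        goal = trajectory[future][2]                    # ...and treat it as the goal
        reward = 1.0 if future == t else 0.0            # sparse reward: this step reached the goal
        relabeled.append((obs, goal, action, reward, next_obs))
    return relabeled

# The relabeled tuples can train a goal-conditioned Q-function/policy with any
# off-policy RL algorithm; at deployment, the goal is specified as a desired image.
```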
Biography: Karol Hausman is a Senior Research Scientist at Google Brain and an Adjunct Professor at Stanford working on robotics and machine learning. He is interested in enabling robots to autonomously acquire general-purpose skills with minimal supervision in the real world. He received his PhD in CS from the University of Southern California and his Master's from the Technical University of Munich. When he is not debugging robots at Google, he co-teaches the Deep Multi-Task and Meta-Learning class at Stanford.
Walking the Boundary of Learning and Interaction
3 May 2021: Dorsa Sadigh (Stanford University)
Abstract: There have been significant advances in the field of robot learning in the past decade. However, many challenges still remain when considering how robot learning can advance interactive agents such as robots that collaborate with humans, and how interactions can enable more effective robot learning. This introduces an opportunity for developing new robot learning algorithms that can help advance interactive autonomy. In this talk, I will discuss a formalism for human-robot interaction built upon ideas from representation learning. This formalism provides an orthogonal perspective to theory of mind, and provides a path for scalable partner modeling. Specifically, I will first discuss the notion of latent strategies — low dimensional representations sufficient for capturing non-stationary interactions. I will then talk about some of the challenges of learning such representations when interacting with humans, and how we can develop data-efficient techniques that enable actively learning computational models of human behavior from interaction data: demonstrations, preferences, or physical corrections. Finally, I will wrap up by discussing some of the challenges that arise when considering long-term repeated interactions, and how partner-specific conventions can be leveraged for fast adaptation on new collaborative tasks.
Biography: Dorsa Sadigh is an assistant professor in Computer Science and Electrical Engineering at Stanford University. Her research interests lie at the intersection of robotics, learning, and control theory. Specifically, she is interested in developing algorithms for safe and adaptive human-robot interaction. Dorsa received her doctoral degree in Electrical Engineering and Computer Sciences (EECS) from UC Berkeley in 2017 and her bachelor's degree in EECS from UC Berkeley in 2012. She has received the NSF CAREER award, the AFOSR Young Investigator award, the IEEE TCCPS early career award, the Google Faculty Award, and the Amazon Faculty Research Award.
Rethinking Dexterous Manipulation: from algorithms to hardware
26 April 2021: Vikash Kumar (Facebook AI Research)
Abstract: During the last decade, learning-based techniques have been quite successful in generating motor skills in simulation. However, these techniques in their current form are less effective on real robots, especially in contact-rich dexterous manipulation settings. In this talk, I'll revisit the problem of learning dexterous manipulation from first principles. I'll argue that, in addition to algorithmic developments, real-world learning paradigms and hardware infrastructure need significant attention if we are to impart human-level dexterity to our robots in a scalable way.
On the algorithmic front, I'll discuss a game-theoretic formulation for model-based reinforcement learning (MBRL) that not only unifies and generalizes many previous MBRL algorithms, but also provides guidelines for designing more stable algorithms capable of learning general manipulation behaviors that can be retargeted to new, unseen tasks. I'll also share (less discussed) experiences and lessons learned while maturing our algorithmic paradigms to acquire high-dimensional, contact-rich, dexterous manipulation behaviors in the real world in a scalable way, and describe how we went from half-million-dollar hardware that broke every 30 minutes to a scalable, modular $5,000 setup that can learn behaviors unattended for weeks without any human intervention.
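To make the two-player picture concrete, here is a toy sketch (illustrative only, not the talk's algorithm or its stability analysis) of the alternation it implies: a "policy player" plans against the current learned model, and a "model player" best-responds by refitting the dynamics model to the data the policy generates. The dynamics, reward, and random-shooting planner below are all assumed for illustration.

```python
import numpy as np

def env_step(s, a):                      # true (unknown) dynamics, used only to generate data
    return s + 0.1 * a + 0.01 * np.random.randn()

def reward(s):                           # reward favors reaching s = 1
    return -(s - 1.0) ** 2

theta = np.zeros(2)                      # learned linear model: s' ~ theta @ [s, a]
data, s = [], 0.0
for _ in range(20):
    # Policy player: pick the action that looks best under the current learned model
    candidates = np.random.uniform(-1, 1, size=100)
    a = max(candidates, key=lambda u: reward(theta @ np.array([s, u])))
    # Collect real data with that action
    s_next = env_step(s, a)
    data.append(([s, a], s_next))
    s = s_next
    # Model player: best-respond by refitting the dynamics model (least squares)
    X = np.array([x for x, _ in data]); y = np.array([t for _, t in data])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```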
Biography: Vikash Kumar is a research scientist at Facebook AI Research (FAIR). He finished his Ph.D. at the University of Washington with Prof. Emo Todorov and Prof. Sergey Levine, where his research focused on imparting human-level dexterity to anthropomorphic robotic hands. He continued his research as a post-doctoral fellow with Prof. Sergey Levine at the University of California, Berkeley, where he further developed his methods to work on low-cost scalable systems. He also spent time as a Research Scientist at OpenAI and Google Brain, where he diversified his research on low-cost scalable systems to the domain of multi-agent locomotion. He has also been involved with the development of the MuJoCo physics engine, now widely used in the fields of Robotics and Machine Learning. His work has been recognized with a best Master's thesis award, the best manipulation paper award at ICRA '16, and a CIFAR AI Chair '20 (declined), and has been covered by a wide variety of media outlets such as The New York Times, Reuters, ACM, WIRED, MIT Technology Review, IEEE Spectrum, etc.
Learning to Cooperate, Communicate and Coordinate (with Humans)
12 April 2021: Jakob Foerster (Facebook AI Research / University of Toronto & Vector Institute)
Abstract: In recent years we have seen rapid progress on a number of zero-sum benchmark problems in artificial intelligence, e.g. Go, Poker and Dota. In contrast to these competitive settings, success in the real world typically requires humans, and will require AI agents, to cooperate, communicate and coordinate with others. Crucially, from a learning point of view, these three Cs require fundamentally novel approaches, methods and theory, which has been at the heart of my research agenda. In my talk I will cover recent progress, including how agents can learn to entice others to cooperate in settings of conflicting goals by accounting for their learning behavior, how they can learn to communicate by reasoning over (public) beliefs and how they can learn policies that can coordinate with other agents at test time by carefully restricting the amount of counterfactual reasoning. I will finish the talk by outlining some of the promising directions for future work.
Biography: Jakob Foerster received a CIFAR AI chair in 2019 and is starting as an Assistant Professor at the University of Toronto and the Vector Institute in the academic year 20/21. During his PhD at the University of Oxford, he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. He has since been working as a research scientist at Facebook AI Research in California, where he will continue advancing the field up to his move to Toronto. He was the lead organizer of the first Emergent Communication (EmeCom) workshop at NeurIPS in 2017, which he has helped organize ever since.
Mutual Information in Deep RL: Toward a Single Tractable Reward Function for General Intelligence
5 April 2021: Shixiang Shane Gu (Google Brain)
Abstract: What is intelligence? How do we measure it? Why robotics over games? I will discuss these fundamental questions on a journey toward a (form of) general intelligence. While appealing proposals for measuring intelligence have been made before (Legg & Hutter 2005, Chollet 2019), in this talk I will propose an arguably more limited but tractable measure based on mutual information (MI) maximization, or empowerment, from multiple perspectives. I will present (1) how a model-based approach can enable efficient unsupervised learning on real robots (DADS 2020), (2) how unsupervised MI-based skill discovery can be better studied by viewing it as representation learning for goal-conditioned RL (Variational GCRL 2021), and (3) how the MI between reward and policy parameters can estimate the learning difficulty of RL environments (PIC 2021).
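One widely used member of the MI-maximization family mentioned above is variational skill discovery: a discriminator q(z|s) tries to infer the latent skill from visited states, and the policy is rewarded for making its skill identifiable. The sketch below shows that intrinsic reward (a DIAYN-style lower bound, used here only to illustrate the MI objective, not the talk's DADS or VGCRL formulations); the network and dimensions are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_skills, state_dim = 8, 16
discriminator = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_skills))

def intrinsic_reward(state, skill_id):
    """r(s, z) = log q(z|s) - log p(z), estimated with the learned discriminator."""
    log_q = F.log_softmax(discriminator(state), dim=-1)[skill_id]
    log_p = torch.log(torch.tensor(1.0 / n_skills))   # uniform prior over skills
    return (log_q - log_p).item()

# The discriminator is trained with cross-entropy to recover the skill from visited
# states, while the policy maximizes the intrinsic reward above, yielding diverse skills.
```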
Biography: Shane holds a PhD in Machine Learning from the University of Cambridge in the UK and the Max Planck Institute for Intelligent Systems in Tübingen, Germany, supervised by Richard E. Turner, Zoubin Ghahramani, and Bernhard Schölkopf, and was also mentored by Sergey Levine and Ilya Sutskever at UC Berkeley/Google Brain and by Timothy Lillicrap at DeepMind as a student researcher. Shane holds a B.ASc. in Engineering Science from the University of Toronto, where he completed his thesis with Geoffrey Hinton on distributed training of neural networks using evolutionary algorithms, and worked with Steve Mann to develop real-time HDR capture for wearable cameras. Shane has also volunteered as a Scientist at CDL, a tech-startup incubator in Canada, and as a visiting scholar at Stanford University and the University of Tokyo. Shane's academic work has received the Best Paper Award at CoRL 2019, a Google Focused Research Award, a Cambridge-Tübingen PhD Fellowship, and an NSERC Scholarship, and has been featured in a Google Research blog post and MIT Technology Review.
Shane is a Japan-born Chinese Canadian, and he speaks, reads, and writes in three languages. Having lived in Japan, China, Canada, the US, the UK, and Germany, he goes by multiple names: Shane Gu, Shixiang Gu, 顾世翔, 顧世翔(ぐう せいしょう).
Building Blocks of Generalizable Autonomy
15 March 2021: Animesh Garg (University of Toronto, Nvidia)
Abstract: My approach to Generalizable Autonomy posits that interactive learning across families of tasks is essential for discovering efficient representation and inference mechanisms. Arguably, a cognitive concept or a dexterous skill should be reusable across task instances to avoid constant relearning. It is insufficient to learn to “open a door”, and then have to re-learn it for a new door, or even windows & cupboards. Thus, I focus on three key questions: (1) Representational biases for embodied reasoning, (2) Causal Inference in abstract sequential domains, and (3) Interactive Policy Learning under uncertainty.
In this talk, I will first use examples to lay bare the need for structured biases in modern RL algorithms in the context of robotics. This will span states, actions, learning mechanisms, and network architectures. Second, we will talk about the discovery of latent causal structure in dynamics for planning. Finally, I will demonstrate how large-scale data generation combined with insights from structure learning can enable sample-efficient algorithms for practical systems. I will focus mainly on manipulation, but my work has also been applied to surgical robotics and legged locomotion.
Biography: Animesh Garg is a CIFAR AI Chair Assistant Professor of Computer Science at the University of Toronto and a Faculty Member at the Vector Institute, where he leads the Toronto People, AI, and Robotics (PAIR) research group. Animesh is affiliated with Mechanical and Industrial Engineering (courtesy) and the UofT Robotics Institute. Animesh also spends time as a Senior Researcher at Nvidia Research working on ML for Robotics. Prior to this, Animesh earned a Ph.D. from UC Berkeley and was a postdoc at the Stanford AI Lab. His research focuses on machine learning algorithms for perception and control in robotics. His work aims to build Generalizable Autonomy in robotics, which involves a confluence of representations and algorithms for reinforcement learning, control, and perception. His work has received multiple Best Paper Awards (ICRA, IROS, Hamlyn Symposium, NeurIPS Workshop, ICML Workshop) and has been covered in the press (New York Times, Nature, BBC).
Representation learning and exploration in reinforcement learning
22 February 2021: Akshay Krishnamurthy (Microsoft Research)
Abstract: I will discuss new provably efficient algorithms for reinforcement learning in rich-observation environments with arbitrarily large state spaces. These algorithms operate by learning succinct representations of the environment, which they use in an exploration module to acquire new information. The first algorithm, called Homer, operates in a block MDP model and uses a contrastive learning objective to learn the representation. The second algorithm, called FLAMBE, operates in a much richer class of low-rank MDPs and is model based. Finally, Moffle is a model-free representation learning approach for low-rank MDPs. All algorithms accommodate nonlinear function approximation and enjoy provable sample and computational efficiency guarantees.
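As a rough sketch of the kind of contrastive objective the abstract mentions (illustrative only; not the Homer implementation), a classifier can be trained to distinguish real transitions (x, a, x') from "imposter" transitions in which x' is drawn from elsewhere in the data, with the learned encoder of x' serving as the succinct representation. All networks and dimensions below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, rep_dim = 32, 4, 8
encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, rep_dim))
classifier = nn.Sequential(nn.Linear(obs_dim + act_dim + rep_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def contrastive_loss(x, a, x_next, x_imposter):
    """Binary classification: real transitions labeled 1, imposters labeled 0."""
    real = classifier(torch.cat([x, a, encoder(x_next)], dim=1))
    fake = classifier(torch.cat([x, a, encoder(x_imposter)], dim=1))
    logits = torch.cat([real, fake])
    labels = torch.cat([torch.ones_like(real), torch.zeros_like(fake)])
    return F.binary_cross_entropy_with_logits(logits, labels)
```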
Biography: Akshay Krishnamurthy is a principal researcher at Microsoft Research, New York City. Previously, he spent two years as an assistant professor in the College of Information and Computer Sciences at the University of Massachusetts, Amherst and a year as a postdoctoral researcher at Microsoft Research, NYC. He completed his PhD in the Computer Science Department at Carnegie Mellon University, advised by Aarti Singh. His research interests are broadly in machine learning and statistics. More specifically, he is most excited about interactive learning, or learning settings that involve feedback-driven data collection. Recently his work has focused on decision making problems with limited feedback, including contextual bandits and reinforcement learning.
Learning to Score Behaviors for Guided Policy Optimization
7 December 2020: Aldo Pacchiano (UC Berkeley)
Abstract: We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space. We show that by utilizing the dual formulation of the WD, we can learn score functions over policy behaviors that can in turn be used to lead policy optimization towards (or away from) (un)desired behaviors. Combined with smoothed WDs, the dual formulation allows us to devise efficient algorithms that take stochastic gradient descent steps through WD regularizers. We incorporate these regularizers into two novel on-policy algorithms, Behavior-Guided Policy Gradient and Behavior-Guided Evolution Strategies, which we demonstrate can outperform existing methods in a variety of challenging environments. Additionally, we show how to make use of this formalism for designing diversity-encouraging algorithms in the setting of population-based training, and in the context of meta-learning.
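The "score functions" come from the Kantorovich-Rubinstein dual of the Wasserstein distance: a function f over behavior embeddings is trained so that the gap between its average value under two policies approximates their behavioral distance, and the same f can then regularize policy optimization. The sketch below is a crude illustration under assumed embeddings, using weight clipping for the Lipschitz constraint rather than the smoothed formulation the abstract describes.

```python
import torch
import torch.nn as nn

score = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))  # behavior score function f
opt = torch.optim.Adam(score.parameters(), lr=1e-3)

def dual_wasserstein_step(behav_1, behav_2):
    """behav_i: (batch, 16) behavior embeddings sampled from policy i."""
    loss = -(score(behav_1).mean() - score(behav_2).mean())  # maximize E_1[f] - E_2[f]
    opt.zero_grad(); loss.backward(); opt.step()
    for p in score.parameters():                              # crude 1-Lipschitz enforcement
        p.data.clamp_(-0.01, 0.01)
    return -loss.item()                                       # current distance estimate
```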
Biography: Aldo Pacchiano received bachelor's and master's degrees from MIT and is currently a 5th-year Ph.D. student at UC Berkeley, co-advised by Prof. Peter Bartlett and Prof. Michael Jordan. He has worked on online learning, bandits, and reinforcement learning. His research work and interests span both theoretical topics in bandits and reinforcement learning and practical questions in deep reinforcement learning and fairness. The main strand of his theoretical work is on the topic of model selection for contextual bandits and reinforcement learning. His applied work has had a particular focus on designing theoretically sound reinforcement learning algorithms that can be implemented in practical settings. Outside of research, he enjoys creative writing.
Deconstructed Reinforcement Learning
30 November 2020: Ge Yang (University of Chicago)
Abstract: Physics and machine learning are connected by the common notion that there is a shared system under observation that causally induces different but related measurements. The goal of physics is to explain these measurements in a coherent manner that is equivariant to a specific choice of gauge, whereas the goal of machine learning is to find learning procedures and ways to parameterize the model so that complex relationships can be aggregated automatically with little human involvement. Reinforcement learning offers a third angle, where exploration is needed to collect data for system identification, and the goal is to produce behavior that bears a global sense of optimality. This talk will take a deconstructionist approach to deep reinforcement learning by first tackling learning plannable representations with plan2vec, our recent work connecting deep metric learning with dynamic programming on a topological graph. Then we will discuss ways to solve long-horizon visuomotor control tasks by filling in a missing piece in multi-goal reinforcement learning. Finally, I hope to loop back to open problems in offline reinforcement learning and exploration by discussing recent advances in hybrid energy-based models and intuition-guided exploration strategies.
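As a hedged reading of the plan2vec idea mentioned above (names and the networkx usage are illustrative, not the paper's implementation), one can embed observations with a learned metric, connect pairs that are close under that metric (i.e. locally reachable) into a topological graph, and recover long-horizon plans with a shortest-path search over the graph.

```python
import numpy as np
import networkx as nx

def build_plan_graph(embeddings, radius=1.0):
    """embeddings: (n, d) array of learned observation embeddings."""
    g = nx.Graph()
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(embeddings[i] - embeddings[j])
            if d < radius:                 # treat nearby embeddings as locally reachable
                g.add_edge(i, j, weight=d)
    return g

# Planning: nx.shortest_path(g, source, target, weight="weight") returns a chain of
# intermediate observations that can serve as subgoals for a low-level controller.
```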
Biography: Ge received his Ph.D. in experimental condensed matter physics from the University of Chicago, where he developed a new type of quantum bit using single electrons trapped on the surface of superfluid helium. His work was published in Physical Review X and Nature Communications. During the last few years of his Ph.D., his focus shifted towards unsupervised learning and deep reinforcement learning, and he is currently trying to find better ways in which fundamental physics can help with the design of algorithms.
Learning Systems for Dexterous Robotic Manipulation
23 November 2020: Abhishek Gupta (UC Berkeley)
Abstract: Dexterous manipulators such as robotic hands have the ability to perform a diverse and versatile set of tasks, particularly well suited for human-centric environments. However, controlling these manipulators can often be very challenging for complex tasks. In this talk, I will discuss how we can use reinforcement learning as a tool to acquire dexterous manipulation behaviors in the real world. I will discuss how a scalable system for learning dexterous manipulation behaviors requires a careful interaction between hardware, learning algorithms and system infrastructure for learning. I will discuss our efforts in building general purpose solutions towards each of these components and show how they can be used to build powerful dexterous manipulation systems. Finally, I will discuss how the insights from the construction of a general purpose dexterous manipulation system can be used more generally to build systems for real world reinforcement learning.
Biography: Abhishek Gupta is a PhD student at UC Berkeley working with Pieter Abbeel and Sergey Levine, where he is interested in algorithms that leverage reinforcement learning to solve real-world robotics tasks. He is currently interested in directions that enable performing reinforcement learning directly in the real world: reward supervision in reinforcement learning, large-scale uninterrupted real-world data collection, learning from demonstrations, and multi-task reinforcement learning. He has also spent time at Google Brain. He is a recipient of the NDSEG and NSF graduate research fellowships, and several of his works have been presented as spotlight presentations at top-tier machine learning and robotics conferences. His work has been covered by multiple popular news outlets such as the New York Times and VentureBeat.
Data Driven Models for Efficient Reinforcement Learning
17 November 2020: Aravind Rajeswaran (University of Washington)
Abstract: Reinforcement learning (RL) is typically formulated as an interactive learning process, often involving humans or physical systems in the loop. Slow components in the interaction loop and poor sample efficiency have limited the real-world impact of deep RL when compared with computer vision and NLP. The use of large offline datasets in deep RL can reduce interactive sample complexity, enhance experimental velocity, and greatly expand the range of applications. Furthermore, the next frontiers are likely to involve AI systems that are broadly competent in many tasks, as opposed to narrow specialists. Sharing of knowledge across tasks and datasets is essential in the development of such multi-purpose AI systems. Model-based RL is a promising approach to realize these aspirations. In this talk, I will present recent algorithmic work in model-based RL that enables state-of-the-art experimental results, as well as detailed theoretical analysis.
Biography: Aravind Rajeswaran is a PhD student at the University of Washington advised by Sham Kakade and Emo Todorov. He has also spent time at Google Brain as a student researcher with Sergey Levine and Igor Mordatch. He is interested in the mathematical foundations and applications of deep reinforcement learning. Aravind has contributed towards algorithmic foundations of model-based RL (game theoretic formulations, offline learning), meta-learning (online MAML, implicit MAML), and deep RL applied to dexterous hand manipulation. He has also designed and taught a graduate level deep RL class at UW. Aravind has received a JP Morgan PhD fellowship, a best paper award from IEEE SIMPAR, and the best undergraduate thesis award from IIT Madras.