Learning Novel Strategies for Model Predictive Control by Leveraging Experience
Authors
Sacks, Jacob Isaac
Abstract
A major challenge in robotics is to design robust policies that enable complex and agile behaviors in the real world. On one end of the spectrum, we have model-free reinforcement learning (MFRL), which is incredibly flexible and general but often results in brittle policies. In contrast, model predictive control (MPC) continually re-plans at each time step to remain robust to perturbations and model inaccuracies. However, despite its real-world successes, MPC often underperforms the optimal strategy. This shortfall stems from limited model quality, myopic behavior from short planning horizons, and approximations forced by computational constraints. Even with a perfect model and enough compute, MPC can get stuck in bad local optima, depending heavily on the quality of the optimization algorithm. Prior research on improving the performance of MPC has primarily focused on learning or fine-tuning a good dynamics model. However, these methods often train the models via system identification, which optimizes the likelihood of the data under the model rather than directly optimizing controller performance. Moreover, little work has attempted to improve the machinery of the optimization process itself. In this thesis, we reinterpret MPC as a structured policy class that can be directly optimized to overcome these core problems facing model-based control and enable robust real-world decision making for robotics. First, we present an approach for learning the dynamics model and cost function of a gradient-based MPC controller end-to-end by optimizing for task performance without unrolling the iterative solver. Next, we reinterpret MPC from the perspective of online learning and propose a general family of MPC algorithms rooted in dynamic mirror descent, which includes many established gradient- and sampling-based techniques as special cases.
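The receding-horizon re-planning that gives MPC its robustness can be sketched in a few lines. The following is a minimal illustration, not code from the thesis: a 1-D double integrator with a random-shooting optimizer, where all dynamics, costs, and parameters are hypothetical choices made for this example. The key structural point is that the controller re-solves a short-horizon problem at every step and executes only the first action of the best plan.

```python
import numpy as np

# Hypothetical example: receding-horizon MPC on a 1-D double integrator.
# State = (position, velocity); action = bounded acceleration.

def dynamics(state, action, dt=0.1):
    pos, vel = state
    return np.array([pos + vel * dt, vel + action * dt])

def rollout_cost(state, actions, goal):
    # Sum of position error and a small control penalty over the horizon.
    cost = 0.0
    for a in actions:
        state = dynamics(state, a)
        cost += (state[0] - goal) ** 2 + 0.01 * a ** 2
    return cost

def mpc_step(state, goal, horizon=10, n_samples=256, rng=None):
    # Random shooting: sample candidate action sequences, keep the cheapest,
    # and return only its FIRST action -- the receding-horizon principle.
    rng = rng if rng is not None else np.random.default_rng(0)
    plans = rng.uniform(-1.0, 1.0, size=(n_samples, horizon))
    costs = [rollout_cost(state, plan, goal) for plan in plans]
    return plans[int(np.argmin(costs))][0]

def run(goal=1.0, steps=50):
    # Re-plan at every time step; this is what keeps MPC robust to
    # perturbations and model error, at the cost of repeated optimization.
    state = np.zeros(2)
    rng = np.random.default_rng(42)
    for _ in range(steps):
        state = dynamics(state, mpc_step(state, goal, rng=rng))
    return state
```

Because only the first action of each plan is ever executed, the quality of the whole controller hinges on the inner optimizer, which is exactly the machinery this thesis proposes to learn.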
To overcome the sample inefficiency of popular sampling-based MPC methods, we then propose to learn a more efficient update rule for solving the online optimization problem via imitation learning. Following this work, we relax the Gaussian assumptions that many sampling-based MPC algorithms make and show how to learn more expressive proposal distributions with generative models in order to search the space of plans more effectively. Finally, we show how to learn the update rule and warm-starting procedure of an MPC controller simultaneously via reinforcement learning, and demonstrate its performance benefits over hand-designed MPC controllers and end-to-end policies trained via MFRL on an agile trajectory-tracking task with a real quadrotor.
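The hand-designed Gaussian update rule that these contributions generalize and learn can be sketched as a single MPPI-style step. This is an illustrative sketch, not the thesis's algorithm: the function names and parameters are hypothetical. A Gaussian proposal over action sequences is shifted toward low-cost samples via exponentiated-cost (softmin) weights; viewing this incremental shift of the proposal as an online optimization step is what connects it to dynamic mirror descent.

```python
import numpy as np

# Hypothetical sketch of one Gaussian sampling-based MPC (MPPI-style) update.
# `mean` is the current mean of a Gaussian over action sequences; the update
# shifts it toward samples with low cost rather than replacing it outright.

def mppi_update(mean, cost_fn, n_samples=128, sigma=0.5, temperature=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    # Sample perturbations around the current proposal mean.
    noise = rng.normal(0.0, sigma, size=(n_samples,) + mean.shape)
    samples = mean + noise
    costs = np.array([cost_fn(s) for s in samples])
    # Softmin weights: low-cost samples dominate; subtracting the minimum
    # cost is a standard trick for numerical stability.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # Weighted average of the perturbations gives the mean shift.
    return mean + np.tensordot(weights, noise, axes=1)
```

Iterating this update on a fixed cost drives the proposal mean toward low-cost plans; the thesis's later chapters replace this fixed rule with learned update rules and more expressive, non-Gaussian proposal distributions.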
Description
Thesis (Ph.D.)--University of Washington, 2023
