
Bridging Reinforcement Learning and Iterative Learning Control: Autonomous Motion Learning for Unknown, Nonlinear Dynamics




Michael Meindl et al.

Front Robot AI. 2022. doi: 10.3389/frobt.2022.793512. eCollection 2022.

Abstract

This work addresses the problem of reference tracking in autonomously learning robots with unknown, nonlinear dynamics. Existing solutions require model information or extensive parameter tuning, and have rarely been validated in real-world experiments. We propose a learning control scheme that learns to approximate the unknown dynamics by a Gaussian Process (GP), which is used to optimize and apply a feedforward control input on each trial. Unlike existing approaches, the proposed method requires neither knowledge of the system states and their dynamics nor knowledge of an effective feedback control structure. All algorithm parameters are chosen automatically, i.e., the learning method works plug-and-play. The proposed method is validated in extensive simulations and real-world experiments. In contrast to most existing work, we study learning dynamics for more than one motion task as well as the robustness of performance across a large range of learning parameters. The method's plug-and-play applicability is demonstrated by experiments with a balancing robot, in which the proposed method rapidly learns to track the desired output. Due to its model-agnostic and plug-and-play properties, the proposed method is expected to have high potential for application to a large class of reference tracking problems in systems with unknown, nonlinear dynamics.


Keywords:

Gaussian processes (GP); autonomous systems; iterative learning control; nonlinear systems; reinforcement learning; robot learning.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures



FIGURE 1

(A) A robot with unknown dynamics is meant to track a reference trajectory leading to a desired, highly dynamic motion. (B) On each iteration, the proposed learning method identifies a Gaussian Process model from experimental data; the model is in turn used to design and apply a feedforward control input.


FIGURE 2



Comparison of feedforward (A) and feedback learning control (C) for reference tracking with a system affected by input delay and measurement noise: Feedforward, unlike feedback control, achieves almost perfect tracking (B). Hence, the learning method proposed in this work employs feedforward control.
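To make the caption's comparison concrete, here is a toy simulation, a minimal sketch assuming a first-order linear plant with a three-step input delay (not the paper's system): the feedback loop visibly lags the reference, while a precomputed feedforward input that inverts the model and pre-compensates the delay tracks almost perfectly.

```python
# Toy illustration of Figure 2's comparison (plant, gain, and delay are
# assumptions): with input delay and measurement noise, feedback tracking
# degrades, while a feedforward input computed ahead of time by inverting
# the model and shifting for the delay tracks almost perfectly.
import numpy as np

a, b, delay, T = 0.9, 0.5, 3, 200
rng = np.random.default_rng(0)
r = np.sin(2 * np.pi * np.arange(T) / 50)            # reference trajectory

def simulate(u):
    y = np.zeros(T)
    for k in range(T - 1):
        uk = u[k - delay] if k >= delay else 0.0     # input acts delayed
        y[k + 1] = a * y[k] + b * uk
    return y

# Feedback: u[k] = K * (r[k] - noisy measurement of y[k]), computed online
y_fb, u_fb = np.zeros(T), np.zeros(T)
for k in range(T - 1):
    u_fb[k] = 0.3 * (r[k] - (y_fb[k] + 1e-2 * rng.normal()))
    uk = u_fb[k - delay] if k >= delay else 0.0
    y_fb[k + 1] = a * y_fb[k] + b * uk

# Feedforward: invert y[k+1] = a*y[k] + b*u[k-delay], shifted by the delay
u_ff = np.zeros(T)
u_ff[:T - delay - 1] = (r[delay + 1:] - a * r[delay:-1]) / b

rmse = lambda y: np.sqrt(np.mean((y - r) ** 2))
print("feedback RMSE:   ", rmse(y_fb))               # large: lags reference
print("feedforward RMSE:", rmse(simulate(u_ff)))     # near zero after startup
```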


FIGURE 3



Overview of the proposed learning method: First, a Gaussian Process (GP) model is identified, which is in turn used to determine an input trajectory via optimization. The resulting input trajectory is applied in an experiment, yielding new data to refine the GP model.
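The loop sketched in the caption can be written down compactly. The following is a minimal illustration, not the authors' implementation: it assumes a trajectory-level GP (input trajectory to output trajectory), a quadratic tracking cost with an input-energy penalty weighted by s, and a generic `plant` function standing in for the real experiment.

```python
# Minimal sketch of the Figure 3 loop (illustrative assumptions throughout):
# 1) identify a GP from applied inputs to measured outputs,
# 2) optimize the next input trajectory against the GP,
# 3) apply it on the plant; the new data refines the model.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def gp_learning_control(plant, reference, n_trials=15, s=1e-3,
                        input_std=5.0, rng=None):
    rng = rng or np.random.default_rng(0)
    U = [rng.normal(0.0, input_std, len(reference))]  # initial excitation
    Y = [plant(U[0])]                                 # measured output
    for _ in range(n_trials):
        # identify GP model: input trajectory -> output trajectory
        gp = GaussianProcessRegressor(RBF() + WhiteKernel())
        gp.fit(np.array(U), np.array(Y))
        # determine next input by optimizing predicted tracking cost
        def cost(u):
            y_hat = gp.predict(u[None, :])[0]
            return np.sum((y_hat - reference) ** 2) + s * np.sum(u ** 2)
        u_next = minimize(cost, U[-1], method="L-BFGS-B").x
        # apply in experiment; new data refines the GP on the next pass
        U.append(u_next)
        Y.append(plant(u_next))
    return U[-1], Y[-1]
```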


FIGURE 4



Learning methods typically require some learning parameters. If no procedure for determining the parameters is provided, iterative tuning has to be carried out manually in experiments. If a procedure for determining reliable parameters is available, plug-and-play application without iterative manual tuning is possible.


FIGURE 5



The learning problem: A two-wheeled inverted pendulum robot (TWIPR) (A) is meant to perform three challenging maneuvers (B). The corresponding pitch angle references (C) differ in length, amplitude, and frequency content.


FIGURE 6



Determination of the input variance σ_I²: Five different values, of which only three are presented, are used to draw random input trajectories that are applied to the plant. The input variance σ_I² = 1 hardly excites the system. In contrast, the input variance σ_I² = 225 leads to an output trajectory that significantly exceeds the reference's maximum. The input variance σ_I² = 25 is selected because the corresponding output trajectory has the same order of magnitude as the reference.
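The selection rule described above amounts to a simple search over candidate variances. A sketch, in which the candidate set and the "same order of magnitude" thresholds are assumptions rather than the paper's exact values:

```python
# Sketch of the variance-selection heuristic from the Figure 6 caption:
# draw a random input trajectory for each candidate variance, apply it to
# the plant, and keep the smallest variance whose peak output reaches the
# reference's order of magnitude. Candidates and thresholds are assumed.
import numpy as np

def select_input_variance(plant, reference,
                          candidates=(1, 9, 25, 81, 225), rng=None):
    rng = rng or np.random.default_rng(0)
    target = np.max(np.abs(reference))
    for var in candidates:                       # smallest to largest
        u = rng.normal(0.0, np.sqrt(var), len(reference))
        peak = np.max(np.abs(plant(u)))
        if 0.1 * target <= peak <= 10 * target:  # same order of magnitude
            return var
    return candidates[-1]
```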


FIGURE 7



Determination of the weight s: Based on the maxima of the input and output trajectories in an initial trial, the weight s is chosen according to Eq. (27).
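Eq. (27) itself is not reproduced on this page, so the following stand-in is purely hypothetical: it only illustrates the idea of normalizing the input penalty by the initial trial's input and output maxima so that both terms of a cost like Σe² + sΣu² have comparable scale.

```python
# Hypothetical stand-in for the paper's Eq. (27), which is not reproduced
# here: normalize the input penalty so that s * u² and the squared tracking
# error have the same order of magnitude in the initial trial.
import numpy as np

def choose_weight(u_init, y_init):
    # NOT the paper's formula; a dimensional-analysis placeholder only
    return (np.max(np.abs(y_init)) / np.max(np.abs(u_init))) ** 2
```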


FIGURE 8



The proposed learning method is employed to track the three desired references. Despite the varying lengths, amplitudes, and frequency content of the references, satisfactory tracking performance is achieved within 10–15 trials. The RMSE declines monotonically for two of the references and converges to a value close to zero in all three scenarios. To provide an additional baseline, the dashed lines in the RMSE plot show the performance of a generic reinforcement learning method, which learns orders of magnitude more slowly.


FIGURE 9



The proposed learning method is run for a total of 5,000 different combinations of parameters and initial data. The maximum RMSE over all runs converges to a value significantly lower than the initial one. Hence, robust learning is demonstrated across a large parameter space.
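The study described in the caption can be reproduced in outline as a Monte Carlo sweep. In this sketch the parameter ranges, the sampled quantities, and `run_learning` (a stand-in for the learning loop above) are illustrative assumptions:

```python
# Sketch of a Figure 9-style robustness study: rerun the learner over
# random parameter/initial-data combinations and track the worst-case
# RMSE per trial. Ranges and counts are assumptions, not the paper's.
import numpy as np

def robustness_study(run_learning, n_runs=5000, n_trials=15, seed=0):
    rng = np.random.default_rng(seed)
    worst = np.zeros(n_trials)
    for _ in range(n_runs):
        params = {"s": 10.0 ** rng.uniform(-4, 0),      # sampled weight
                  "input_var": rng.choice([1, 9, 25, 81, 225])}
        rmse = run_learning(params, n_trials, rng)      # per-trial RMSE array
        worst = np.maximum(worst, rmse)                 # max over all runs
    return worst
```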


FIGURE 10



Investigation of the effect of the weight s on the learning characteristics: Large values of s lead to slow learning with small performance variance. Decreasing the value leads to faster learning but also a larger variance in performance. Excessively small values of s may lead to an RMSE that diverges for some initial data.


FIGURE 11



Experimental results of the TWIPR learning to dive beneath an obstacle. Starting from an initial RMSE of roughly 75°, the tracking error rapidly declines over the following trials and sufficiently precise tracking for diving beneath the obstacle is achieved on the seventh trial.

