Controlling and learning constrained motions for manipulation in contact
Pousa de Moura, Joao Miguel
Many practical tasks in robotic systems involving contact interaction with the environment, such as cleaning windows, writing or grasping, are inherently constrained, in that both the task and the environment impose constraints on the robot’s motion. While constraints from manipulation motions in contact represent a challenge when modelling and controlling such robotic systems, they might also be an opportunity, if exploited for decomposing complex controllers into simpler ones that are easier to design, implement, test and even learn from data. Modelling such systems requires incorporating these constraints in the robot’s dynamic model. In this thesis, I define the class of Task-based Constraints (TbCs) and prove that the forward dynamic models of a constrained system obtained through the Projected Dynamics (PD) and the Operational Space Formulation (OSF) are equivalent. Establishing such equivalence required: reformulating the PD constraint inertia matrix, generalizing all its previous distinct algebraic variations; and generalizing the OSF to rank-deficient constraint Jacobian matrices. This generalization allows us to numerically handle redundant constraints and singular configurations, without having to use different controllers in the vicinity of such configurations. Furthermore, I show that we can recover both operational space control with constraints and the hybrid position/force control in the operational space from a multiple Task-based Constraint abstraction. I then propose a control and trajectory tracking approach for wiping train cab front panels, using a velocity-controlled robotic manipulator and a force/torque sensor attached to its end-effector, without using any surface model or vision-based surface detection.
The control strategy consists of a hybrid position/force controller, adapted from the Operational Space Formulation, that aligns the cleaning tool with the surface normal and maintains a setpoint normal force while simultaneously moving along the surface. The trajectory tracking strategy consists of specifying and tracking a two-dimensional path that, when projected onto the train surface, corresponds to the desired pattern of motion. An experiment in which the Baxter robot wipes a highly curved surface with both spiral and raster scan motion patterns validates the approach. I also implemented the same approach in a scaled robot prototype, specifically designed to wipe a 1/8-scale version of a train cab front, using a raster scan pattern. Learning this type of control policy subject to constraints is a challenging problem. This thesis proposes a Constraint-aware Policy Learning (CaPL) method that solves the policy learning problem on redundant robots which execute a policy acting in the null-space of a constraint. This learning approach allows the generalization of learnt control policies across constraints that are unknown during the training phase. The CaPL method splits the combined problem of learning constraints and policies into two steps: first estimating the constraint, and then estimating an unconstrained policy using the remaining degrees of freedom. For a linear parametrization, there is a closed-form solution for the problem of estimating constraints based on Singular Value Decomposition (SVD). In this thesis, I propose another closed-form solution for constraint estimation for the TbC case, which includes estimating the task component without affecting the norm of the constraint matrix, based on Generalized Singular Value Decomposition (GSVD). I also discuss a metric for comparing the similarity of estimated constraints, which is useful to pre-process the trajectories recorded in the demonstrations.
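The idea of estimating a linear constraint with an SVD and then acting in its null-space can be illustrated with a minimal sketch. This is not the CaPL or GSVD formulation from the thesis; it assumes a constant constraint matrix `A`, noise-free observed actions `U` that satisfy `A u = 0`, and a known number of constraints. The function names `nullspace_projector` and `estimate_constraint` are hypothetical.

```python
import numpy as np

def nullspace_projector(A):
    """Return N = I - pinv(A) @ A, the projector onto the null-space of A.

    The pseudoinverse also copes with redundant (rank-deficient) rows in A.
    """
    A = np.atleast_2d(A)
    return np.eye(A.shape[1]) - np.linalg.pinv(A) @ A

def estimate_constraint(U, n_constraints):
    """Estimate constraint rows from observed actions U (n_samples x dim).

    Assumption: every row u of U satisfies A u = 0, so the right singular
    vectors of U with the smallest singular values span the row space of A.
    """
    _, _, Vt = np.linalg.svd(U, full_matrices=True)
    return Vt[-n_constraints:]

# Synthetic demonstration data (illustrative values only).
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 2.0, -1.0]])        # one constraint in 3-D
N_true = nullspace_projector(A_true)
unconstrained = rng.normal(size=(50, 3))     # unconstrained policy outputs
U = unconstrained @ N_true.T                 # observed constrained actions

A_hat = estimate_constraint(U, n_constraints=1)
N_hat = nullspace_projector(A_hat)
```

The estimated rows in `A_hat` are only defined up to sign and scale, which is why the sketch compares projectors (`N_hat` against `N_true`) rather than the constraint matrices themselves.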
An experiment validates the CaPL method: a wiping task is learnt from human demonstration on flat surfaces and then reproduced on an unknown curved surface, using a force/torque-based controller to achieve tool alignment. Despite the differences between the training and validation scenarios, the learnt policy still provides the desired wiping motion.
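The force-regulated wiping motion described above, for a velocity-controlled robot, can be sketched as follows. This is a simplified illustration rather than the controller from the thesis: it assumes friction is negligible so that the measured contact force points along the surface normal, it uses a plain proportional force law with a hypothetical gain `k_f`, and it omits the tool-orientation alignment. The function name `hybrid_velocity_command` is an assumption.

```python
import numpy as np

def hybrid_velocity_command(f_meas, v_des, f_set, k_f=0.01):
    """Combine tangential motion with normal-force regulation.

    f_meas: force exerted by the surface on the tool, shape (3,), in N.
            Assumed to point along the outward surface normal.
    v_des:  desired Cartesian velocity, shape (3,), in m/s.
    f_set:  desired normal force magnitude in N.
    k_f:    proportional force gain in (m/s)/N (hypothetical value).
    """
    f_norm = np.linalg.norm(f_meas)
    if f_norm < 1e-9:
        raise ValueError("no contact force measured; normal is undefined")
    n = f_meas / f_norm                      # estimated outward normal
    P_t = np.eye(3) - np.outer(n, n)         # projector onto tangent plane
    v_motion = P_t @ v_des                   # motion along the surface
    v_force = -k_f * (f_set - f_norm) * n    # press in if force is too low
    return v_motion + v_force

# Example: 5 N measured along +z, 10 N requested.
v_cmd = hybrid_velocity_command(
    f_meas=np.array([0.0, 0.0, 5.0]),
    v_des=np.array([0.1, 0.0, 0.3]),
    f_set=10.0,
)
```

In this example the component of `v_des` along the normal is discarded and replaced by a small velocity into the surface, so position control acts only in the tangent plane and force control acts only along the normal.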