Learning Personalized Optimal Control for Repeatedly Operated Systems

IP.com Disclosure Number: IPCOM000244518D
Publication Date: 2015-Dec-17
Document File: 7 page(s) / 288K

Publishing Venue

The IP.com Prior Art Database

Abstract

In this invention, we consider the problem of online learning of optimal control for repeatedly operated systems in the presence of parametric uncertainty. During each round of operation, the environment selects a parameter from a fixed but unknown probability distribution. This parameter governs the dynamics of a plant. An agent chooses a control input to the plant and then observes the cost of that choice. The dependence of the cost on the choice is not necessarily known. In this setting, our goal is to design an agent that personalizes the control input to this plant, taking into account the stochasticity involved. In particular, we want to devise an agent that minimizes the total expected cost accumulated over all rounds of operation.

We provide multiple solutions for designing agents that achieve this objective. A precursor to these solutions is our characterization of the set of candidate control inputs. These sets are bounded and semi-algebraic (i.e., characterized by polynomial equalities and inequalities). The first solution designs an agent that sub-samples several representative controls uniformly at random from the semi-algebraic set. These representative controls are treated as the arms of a multi-armed bandit, to which algorithms such as the Upper Confidence Bound algorithm, the Thompson sampling algorithm and the Knowledge Gradient algorithm are applied. The second solution creates an ε-mesh, intersects it with the semi-algebraic set to obtain representative controls, and maps these to the arms. Since the number of arms can be exponential in the dimension of the space containing the set, we start with a polynomially sized subset of controls. As we progress through several rounds of operation, we discard controls that the agent picks less frequently, recursively obtain more controls from the semi-algebraic set near the controls that are picked more frequently, and include them as arms.
We show the effectiveness of these methods on multiple optimal control tasks, including traffic control, acceleration profiling for autonomous vehicles and cooling systems.
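
The two ways of obtaining representative controls described in the abstract, and their use as bandit arms, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the set `in_set` (a half disk), the cost `toy_cost`, the bounding box, and all function names are hypothetical, rejection sampling is assumed as the uniform sampler, and UCB1 is written for costs by selecting the arm with the lowest lower confidence bound. The recursive refinement of arms is not shown.

```python
import itertools
import math
import random


def in_set(u):
    """Hypothetical semi-algebraic set: unit disk intersected with u[0] >= 0."""
    return u[0] ** 2 + u[1] ** 2 - 1.0 <= 0.0 and u[0] >= 0.0


def sample_uniform(n, lo=-1.0, hi=1.0, dim=2, rng=random):
    """First approach (assumed sampler): rejection-sample n controls
    uniformly from the set, using a bounding box [lo, hi]^dim."""
    arms = []
    while len(arms) < n:
        u = tuple(rng.uniform(lo, hi) for _ in range(dim))
        if in_set(u):
            arms.append(u)
    return arms


def epsilon_mesh(eps, lo=-1.0, hi=1.0, dim=2):
    """Second approach: grid of spacing eps intersected with the set."""
    steps = int(round((hi - lo) / eps)) + 1
    axis = [lo + i * eps for i in range(steps)]
    return [u for u in itertools.product(axis, repeat=dim) if in_set(u)]


def ucb1_min_cost(arms, cost_fn, rounds, rng=random):
    """UCB1 adapted to costs in [0, 1]: after playing every arm once,
    play the arm with the lowest (mean cost - confidence radius)."""
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    for t in range(1, rounds + 1):
        if t <= len(arms):
            k = t - 1  # initialization: play each arm once
        else:
            k = min(
                range(len(arms)),
                key=lambda i: means[i] - math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        c = cost_fn(arms[k], rng)  # only the cost of the chosen control is observed
        counts[k] += 1
        means[k] += (c - means[k]) / counts[k]  # running average
    return counts, means


def toy_cost(u, rng):
    """Hypothetical cost: the environment draws theta each round from a
    distribution unknown to the agent; the cost is clipped to [0, 1]."""
    theta = rng.uniform(0.4, 0.6)
    return min(1.0, (u[0] - theta) ** 2 + u[1] ** 2)


if __name__ == "__main__":
    rng = random.Random(0)
    arms = epsilon_mesh(0.5)  # or: sample_uniform(20, rng=rng)
    counts, means = ucb1_min_cost(arms, toy_cost, 2000, rng=rng)
    best = arms[max(range(len(arms)), key=counts.__getitem__)]
    print("most frequently picked control:", best)
```

Either candidate generator can feed the same bandit loop; the play counts returned by `ucb1_min_cost` are the kind of statistic one could use to decide which controls to discard and which to refine around.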

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 13% of the total text.

Keywords: optimal control, repeatedly operated systems, parametric uncertainty, personalization, multi-armed bandit


I. INTRODUCTION

 In the design of optimal control systems, one seeks a controller that performs some desired task while minimizing a given cost functional. In the classical setting, a well-defined system or plant model (i.e., a set of differential equations governing the dynamics of the system) is assumed to be known. Using this model, controllers are designed offline either by dynamic programming (i.e., by solving the Hamilton-Jacobi-Bellman (HJB) partial differential equation) or by solving the necessary conditions provided by Pontryagin's maximum principle (PMP) [1].
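
For reference, the two classical offline conditions named above can be written in a standard textbook form. The notation here (state x, control u, dynamics f, running cost L, terminal cost φ, value function V, costate λ) is assumed for illustration and is not taken from this disclosure; a fixed horizon T and a free terminal state are also assumed.

```latex
% Assumed problem: minimize J(u) = \phi(x(T)) + \int_0^T L(x,u,t)\,dt
% subject to \dot{x} = f(x,u,t), \quad x(0) = x_0.
\begin{align*}
  % HJB equation for the value function V(x,t)
  -\frac{\partial V}{\partial t}(x,t)
    &= \min_{u}\Big[ L(x,u,t) + \nabla_x V(x,t)^{\top} f(x,u,t) \Big],
    & V(x,T) &= \phi(x), \\
  % PMP necessary conditions (cost-minimizing sign convention)
  H(x,u,\lambda,t) &= L(x,u,t) + \lambda^{\top} f(x,u,t), \\
  \dot{\lambda}(t) &= -\frac{\partial H}{\partial x}\big(x^{*}(t),u^{*}(t),\lambda(t),t\big),
    & \lambda(T) &= \nabla_x \phi\big(x^{*}(T)\big), \\
  u^{*}(t) &= \arg\min_{u} H\big(x^{*}(t),u,\lambda(t),t\big).
\end{align*}
```

Both conditions require the dynamics f to be known in advance, which is the assumption the classical setting relies on.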

 In this work we consider the novel problem of learning optimal controllers...