Synthesis of Provably Correct Autonomy Protocols

for Shared Control

Murat Cubuktepe, Nils Jansen, Mohammed Alsiekh, Ufuk Topcu

Abstract—We develop algorithms for the synthesis of shared

control protocols subject to probabilistic temporal logic specifi-

cations. More specifically, we introduce a modeling formalism

in which both a human and an autonomy protocol can issue

commands to a robot towards performing a certain task. These

commands are blended into a joint input to the robot. We model

the interaction between the human and the robot as a Markov

decision process that incorporates the potential randomness in

decisions caused by factors such as complexity of the task spec-

ifications and imperfect interfaces. Using inverse reinforcement

learning, we obtain an abstraction of the human’s behavior as a

so-called randomized strategy. We design the autonomy protocol

to ensure that the robot behavior—resulting from the blending

of the commands—satisfies given safety and performance specifi-

cations in probabilistic temporal logic. Additionally, the resulting

strategies generate behavior as similar to the behavior induced by

the human’s commands as possible. We formulate the underlying

problem as a quasiconvex optimization problem, which is solved

by checking feasibility for a number of linear programming

problems. We assess the applicability and the scalability of the

approach through case studies involving autonomous wheelchair

navigation and unmanned aerial vehicle mission planning.

I. INTRODUCTION

We study the problem of shared control, where a robot

is to execute a task according to the goals of a human

operator while adhering to additional safety and performance

requirements. Applications of such human-robot interaction

include remotely operated semi-autonomous wheelchairs 13,

robotic teleoperation 16, and human-in-the-loop unmanned

aerial vehicle mission planning 9. Basically, the human

operator issues a command through an input interface, which

maps the command directly to an action for the robot. The

problem is that a sequence of such actions may fail to

accomplish the task at hand, due to limitations of the interface

or failure of the human in comprehending the complexity of

the problem. Therefore, a so-called autonomy protocol provides

assistance for the human in order to complete the task according

to the given requirements.

At the heart of the shared control problem is the design of

an autonomy protocol. In the literature, there are two main

directions, based on either switching the control authority

between human and autonomy protocol 26, or on blending

their commands towards joined inputs for the robot 7, 15.

M. Cubuktepe and U. Topcu are with the Department of Aerospace Engi-

neering and Engineering Mechanics, University of Texas at Austin, 201 E 24th

St, Austin, TX 78712, USA. Nils Jansen is with the Department of Software

Science, Radboud University Nijmegen, Comeniuslaan 4, 6525 HP Nijmegen,

Netherlands. Mohammed Alsiekh is with the Institute for Computational

Engineering and Sciences, University of Texas at Austin, 201 E 24th St,

Austin, TX 78712, USA. email:({mcubuktepe,malsiekh,utopcu}@utexas.edu,

[email protected]).

One approach to switching the authority first determines

the desired goal of the human operator with high confidence,

and then assists towards exactly this goal 8, 18. In 12,

switching the control authority between the human and au-

tonomy protocol ensures satisfaction of specifications that are

formally expressed in temporal logic. In general, the switching

of authority may cause a decrease in human’s satisfaction, who

usually prefers to retain as much control as possible 17.

The second direction is to provide another command in

addition to the one of the human operator. To introduce a

more flexible trade-off between the human’s control authority

and the level of autonomous assistance, both commands are

then blended to form a joined input for the robot. A blending

function determines the emphasis that is put on the autonomy

protocol in the blending, that is, regulating the amount of

assistance provided to the human. Switching of authority can

be seen as a special case of blending, as the blending function

may assign full control to the autonomy protocol or to the

human. In general, putting more emphasis on the autonomy

protocol in blending leads to greater accuracy in accomplishing

the task 6, 7, 20. However, as before, humans may prefer

to retain control of the robot 16, 17. None of the existing

blending approaches provide formal correctness guarantees that

go beyond statistical confidence bounds. Correctness here refers

to ensuring safety and optimizing performance according to

the given requirements. Our goal is the design of an autonomy

protocol that admits formal correctness while rendering the

robot behavior as close to the human’s inputs as possible.

We model the behavior of the robot as a Markov decision

process (MDP) 23, which captures the robot’s actions inside a

potentially stochastic environment. Problem formulations with

MDPs typically focus on maximizing an expected reward (or,

minimizing the expected cost). However, such formulations may

not be sufficient to ensure safety or performance guarantees in

a task that includes a human operator. Therefore, we design

the autonomy protocol to ensure that the resulting behavior is

formally verified with respect to probabilistic temporal logic

specifications. Such verification problems have been extensively

studied for MDPs 2.

Take as an example a scenario involving a semi-autonomous

wheelchair 13 whose navigation has to account for a randomly

moving autonomous vacuum cleaner, see Fig. 1. The wheelchair

needs to navigate to the exit of a room, and the vacuum

cleaner moves in the room according to a probabilistic transition

function. The goal of the wheelchair is to reach the exit without

crashing into the vacuum cleaner. The human may not fully

perceive the motion of the vacuum cleaner. Note that the

human’s commands, depicted with the solid red line in Fig 1(a),

(a) Autonomy perspective

0.4

0.4

0.2

0.2

(b) Human perspective

Fig. 1. A wheelchair in a shared control setting.

may cause the wheelchair to crash into the vacuum cleaner.

The autonomy protocol provides another set of commands,

which is indicated by the solid red line in Fig 1(b), to carry

out the task safely without crashing. However, the autonomy

protocol’s commands deviate highly from the commands of

the human. The two sets of commands are then blended into

a new set of commands, depicted using the dashed red line

in Fig 1(b). The blended commands perform the task safely

while generating behavior as similar to the behavior induced

by the human’s commands as possible.

In 15, we formulated the problem of designing the

autonomy protocol as a nonlinear programming problem. How-

ever, solving nonlinear programs is generally intractable 3.

Therefore, we proposed a greedy algorithm that iteratively

repairs the human strategy such that the specifications are

satisfied without guaranteeing optimality, based on 22. Here,

we propose an alternative approach for the blending of the two

strategies. We follow the approach of repairing the strategy

of the human to compute an autonomy protocol. We ensure

that the resulting behavior induced by the repaired strategy (1)

deviates minimally from the human strategy and (2) satisfies

given temporal logic specifications after blending. We formally

define the problem as a quasiconvex optimization problem,

which can be solved efficiently by checking feasibility of a

number of convex optimization problems 4.

A human may be uncertain about which command to issue

in order to accomplish a task. Moreover, a typical interface

used to parse human’s commands, such as a brain-computer

interface, is inherently imperfect. To capture such uncertainties

and imperfections in the human’s decisions, we introduce

randomness to the commands issued by humans. It may not

be possible to blend two different deterministic commands. If

the human’s command is “up” and the autonomy protocol’s

command is “right”, we cannot blend these two commands

to obtain another deterministic command. By introducing

randomness to the commands of the human and the autonomy

protocol, we therefore ensure that the blending is always well-

defined. In what follows, we call a formal interpretation of a

sequence of the human’s commands the human strategy, and

the sequence of commands issued by the autonomy protocol

the autonomy strategy.

The question remains how to obtain the human strategy

in the first place. It may be unrealistic that a human can

comprehend an MDP that models a realistic scenario. To this

end, we create a virtual simulation environment that captures

the behavior of the MDP. We ask humans to participate in two

case studies to collect data about typical human behavior. We

use inverse reinforcement learning to get a formal interpretation

as a strategy based on human’s inputs 1, 28. We model

a typical shared control scenario based on an autonomous

wheelchair navigation 13 in our first case study. In our second

case study, we consider an unmanned aerial vehicle mission

planning scenario, where the human operator is to patrol certain

regions while keeping away from enemy aerial vehicles.

In summary, the main contribution this paper is to synthesize

the autonomy protocol such that the resulting blended or

repaired strategy meets all given specifications while only

minimally deviating from the human strategy. We introduce all

formal foundations that we need in Section II. We provide an

overview of the general shared control concept in Section III.

We present the shared control synthesis problem and provide

a solution based on linear programming in Section IV. We

indicate the applicability and scalability of our approach on

experiments in Section V and draw a short conclusion and

critique of our approach in Section VI.

II. PRELIMINARIES

In this section, we introduce the required formal models and

specifications that we use to synthesize the autonomy protocol,

and we give a short example illustrating the main concepts.

1) Markov Decision Processes: A probability distribution

over a finite set X is a function µ : X ? 0, 1 ? R with?

x?X µ(x) = µ(X) = 1. The set X of all distributions is

Distr(X).

Definition 1 (MDP). A Markov decision process (MDP)M =

(S, sI ,A,P) has a finite set S of states, an initial state sI ? S,

a finite set A of actions, and a transition probability function

P : S ×A? Distr(S).

MDPs have nondeterministic choices of actions at the

states; the successors are determined probabilistically via the

associated probability distribution. We assume that all actions

are available at each state and that the MDP contains no

deadlock states. A cost function C : S ×A? R?0 associates

cost to transitions. If there one single action available at each

state, the MDP reduces to a discrete-time Markov chain (MC).

To define a probability measure and expected cost on MDPs,

so-called strategies resolve the choices of actions.

Definition 2 (Strategy). A randomized strategy for an MDPM

is a function ? : S ? Distr(A). If ?(s, ?) = 1 for ? ? A and

?(s, ?) = 0 for all ? ? A {?}, the strategy is deterministic.

The set of all strategies over M is StrM.

Resolving all the nondeterminism for an MDP M with a

strategy ? ? StrM yields an induced Markov chain M? .

s0 s1 s2

s3 s4

a

b

c

d

0.6

0.4

0.4

0.6

0.6

0.4

0.4

0.6

11

1

(a) MDP M

s0 s1 s2

s3 s4

0.5

0.5

0.5

0.5

11

1

(b) Induced MC M?1

Fig. 2. MDP M with target state s2 and induced MC for strategy ?unif

Definition 3 (Induced MC). For an MDP M = (S, sI ,A,P)

and strategy ? ? StrM, the MC induced by M and ? is

M? = (S, sI ,A,P?) where

P?(s, s?) =

?

??A(s)

?(s, ?) · P(s, ?, s?) for all s, s? ? S .

The occupancy measure of a strategy ? gives the expected

number of taking an action at a state in an MDP. In our

formulation, we use the occupancy measure of a strategy to

compute an autonomy protocol.

Definition 4 (Occupancy Measure). The occupancy measure

x? of a strategy ? is defined as

x?(s, ?) = E

??

t=0

P (?t = ?|st = s)

(1)

where st and ?t denote the state and action in an MDP at

time step t. The occupancy measure x?(s, ?) is the expected

number of taking the action ? at state s with the strategy ?.

2) Specifications: A probabilistic reachability specification

P??(?T ) with the threshold ? ? 0, 1 ? Q and the set of

target states T ? S restricts the probability to reach T from

sI in M to be at most ?. Similarly, expected cost properties

E??(?G) restrict the expected cost to reach the set G ? S

of goal states by an upper bound ? ? Q. Intuitively, some

bad states in T should only be reached with probability ?

(safety specification) while the expected cost to reach the goal

states in G should be below ? (performance specification). We

also use until properties of the form Pr??(¬T U G), which

require that the probability of reaching G while not reaching

T beforehand is at least ?.

The synthesis problem is to find one particular strategy ?

for an MDP M such that given specifications ?1, . . . , ?n are

satisfied in the induced MC M? , written ? |= ?1, . . . , ?n.

Example 1. Fig. 2(a) depicts an MDPM with initial state s0.

States s0 and s1 have choices between actions a or b and c or

d. Action a, for example, has a probabilistic choice between

s1 and s3 with probabilities 0.6 and 0.4. For the self loops at

s2, s3 and s4 we omit actions.

For a safety specification ? = P?0.21(?s2), the deterministic

strategy ?1 ? StrM with ?1(s0, a) = 1 and ?1(s1, c) =

1 induces the probability 0.36 of reaching s2. Therefore,

the specification is not satisfied, see the induced MC in

Human

Strategy

Autonomy

Strategy

Blended

Strategy

Robot

execution

command

command

blended command

Blending

function b

Formal

model Mr

Specifications

?1, . . . , ?n

Human

strategy

Fig. 3. Shared control architecture.

Fig. 2(b). Likewise, the randomized strategy ?unif ? StrM

with ?unif(s0, a) = ?unif(s0, b) = 0.5 and ?unif(s1, c) =

?unif(s1, d) = 0.5 violates the specification, as the probability

of reaching s2 is 0.25. However, the deterministic strategy

?safe ? StrM with ?safe(s0, b) = 1 and ?safe(s1, d) = 1 induces

the probability 0.16, thus ?safe |= ?.

III. CONCEPTUAL DESCRIPTION OF SHARED CONTROL

We now detail the general shared control concept adopted in

this paper. Consider the setting in Fig. 3. As inputs, we have a

set of task specifications, a model Mr for the robot behavior,

and a blending function b. The given robot task is described by

certain performance and safety specifications ?1, . . . , ?n. For

example, it may not be safe to take the shortest route because

there may be too many obstacles in that route. In order to

satisfy performance considerations, the robot should prefer to

take the shortest route possible while not violating the safety

specifications. We model the behavior of the robot inside a

stochastic environment as an MDP Mr.

In general, a human issues a set of commands for the robot

to execute. It is unrealistic that a human can comprehend

an MDP that models a realistic scenario in the first place.

Indeed, a human will likely have difficulties interpreting a

large number of possibilities and the associated probability

of paths and payoffs 11, and it may be impractical for the

human to provide the human strategy to the autonomy protocol,

due to possibly large state space of the MDP. Therefore,

we compute a human strategy ?h as an abstraction of a

sequence of human’s commands, which we obtain using inverse

reinforcement learning 1, 28.

We design an autonomy protocol that provides another

strategy ?a, which we call the autonomy strategy. Then, we

blend the two strategies according to the blending function b

into the blended strategy ?ha. The blending function reflects

preference over the human strategy or the autonomy strategy.

We ensure that the blended strategy deviates minimally from

the human strategy.

At runtime, we can then blend commands of the human with

commands of the autonomy strategy. The resulting “blended”

commands will induce same behavior as with the blended

strategy ?ha, and the specifications are satisfied. This procedure

requires to check feasibility of a number of linear programming

problems, which can be expensive, but it is very efficient to

implement at run time.

The shared control synthesis problem is then the synthesis

of the repaired strategy such that, for the repaired strategy ?ha,

it holds ?ha |= ?1, . . . , ?n while minimally deviating from ?h.

The deviation between the human strategy ?h and the repaired

strategy ?ha is measured by the maximal difference between

the two strategies.