Going Beyond Literal Command-Based Instructions: Extending Robotic Natural Language Interaction Capabilities


The ultimate goal of human natural language interaction is to communicate intentions. However, these intentions are often not directly derivable from the semantics of an utterance (e.g., when linguistic modulations are employed to convey politeness, respect, and social standing).

Robotic architectures with simple command-based natural language capabilities are thus not equipped to handle more liberal, yet natural uses of linguistic communicative exchanges.

In this paper, we propose novel mechanisms for inferring intentions from utterances and generating clarification requests that will allow robots to cope with a much wider range of task-based natural language interactions. We demonstrate the potential of these inference algorithms for natural human-robot interactions by running them as part of an integrated cognitive robotic architecture on a mobile robot in a dialogue-based instruction task.

by Tom Williams, Gordon Briggs, Bradley Oosterveld, and Matthias Scheutz Human-Robot Interaction Laboratory Tufts University, Medford, MA, USA {thomas_e.williams, gordon.briggs, bradley.oosterveld, matthias.scheutz}@tufts.edu


When humans interact in natural language (NL) as part of joint activities, their ultimate goal is to understand each other's intentions, regardless of how those intentions are expressed. While it is sometimes possible to determine intentions directly from the semantics of an utterance, often the utterance alone does not convey the speaker's intention. Rather, it is only in conjunction with goal-based, task-based, and other context-based information that listeners are able to infer the intended meaning, such as in indirect speech acts where requests or instructions are not apparent from the syntactic form or literal semantics of the utterance.

Given that an important goal of human-robot interaction is to allow for natural interactions (Scheutz et al. 2007), robotic architectures will ultimately have to face the challenge of coping with more liberal and thus natural human speech. Enabling a broader coverage of human speech acts (beyond imperatives expressing commands), however, is quite involved and requires various additional mechanisms in the robotic architecture.

In this paper, we introduce novel algorithms based on Dempster-Shafer (DS) theory (Shafer 1976) for inferring intentions I from utterances U in contexts C, and, conversely, for generating utterances U from intentions I in contexts C.

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

We select more general DS-based representations over single-valued probabilities because the probability-based Bayesian inference problem to calculate P(I|U, C) in terms of P(U|I, C) is not practically feasible, for at least two reasons: (1) we do not have access to distributions over an agent's intentions (as we cannot look inside its head), and (2) we would need a table containing priors on all combinations of intentions and contexts.

Instead, we employ rules of the form u ∧ c →[α,β] i that capture intentions behind utterances in particular contexts, where [α, β] is a confidence interval contained in [0,1] which can be specified for each rule independently (e.g., based on social conventions, or on corpus statistics when available).
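As a minimal illustration (not the authors' implementation), a rule of the form u ∧ c →[α,β] i can be encoded as a small record holding the utterance pattern, the context, the inferred intention, and the interval bounds; all field names and the example intention syntax below are hypothetical:

```python
# Sketch of a pragmatic rule u ∧ c →[α,β] i with a Dempster-Shafer
# belief/plausibility interval [α, β]. Field names are illustrative,
# not taken from the paper's architecture.
from dataclasses import dataclass

@dataclass(frozen=True)
class PragmaticRule:
    utterance_form: str  # pattern over utterances U (hypothetical notation)
    context: str         # pattern over contexts C (hypothetical notation)
    intention: str       # intention I licensed by the rule
    alpha: float         # lower bound (belief)
    beta: float          # upper bound (plausibility)

    def __post_init__(self):
        # The interval must be contained in [0, 1] with α <= β.
        assert 0.0 <= self.alpha <= self.beta <= 1.0, "need 0 <= α <= β <= 1"

# Example: by social convention, the indirect request "Can you X?" uttered
# in a task context is usually a request to do X (numbers are illustrative).
rule = PragmaticRule(
    utterance_form="AskAbility(speaker, robot, X)",
    context="task_context",
    intention="Want(speaker, Do(robot, X))",
    alpha=0.85,
    beta=0.95,
)
print(rule.intention)  # Want(speaker, Do(robot, X))
```

Because the interval is attached per rule, a convention-driven rule (such as the one above) can carry a tight, high interval, while a weakly attested rule can carry a wide interval expressing ignorance.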

These rules are very versatile in that they can be defined for individual utterances and contexts or whole classes of utterances and contexts. Most importantly, we can employ DS-based modus ponens to make uncertain deductive and abductive inferences which cannot be made in a mere Bayesian framework.
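To sketch how such a deduction propagates uncertainty, the following follows a standard uncertain-logic formulation of DS-theoretic modus ponens (not necessarily the exact formulation used in the authors' architecture): given interval [a_p, b_p] on the premise u ∧ c and interval [a_r, b_r] on the rule, the conclusion i receives [a_p · a_r, 1 − a_p · (1 − b_r)]:

```python
# Sketch of Dempster-Shafer theoretic (uncertain) modus ponens under the
# stated assumption about the combination formula; intervals are
# (belief, plausibility) pairs in [0, 1].

def ds_modus_ponens(premise, rule):
    """Derive the conclusion's uncertainty interval from the premise's
    interval and the rule's interval."""
    a_p, b_p = premise  # b_p does not tighten the result: mass outside
    a_r, b_r = rule     # the premise says nothing about the conclusion
    lower = a_p * a_r                 # premise AND rule both supported
    upper = 1.0 - a_p * (1.0 - b_r)   # conclusion doubted only to the
                                      # extent the premise holds but the
                                      # rule is disbelieved
    return (lower, upper)

# A certain premise passes the rule's interval through unchanged:
print(ds_modus_ponens((1.0, 1.0), (0.85, 0.95)))  # (0.85, 0.95)
# A fully unknown premise yields total ignorance about the conclusion:
print(ds_modus_ponens((0.0, 1.0), (0.85, 0.95)))  # (0.0, 1.0)
```

Note how the second call returns the vacuous interval [0, 1]: unlike a Bayesian posterior, the DS interval can explicitly represent complete ignorance, which is what licenses the clarification requests discussed later.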

For more details justifying this approach, see Williams et al. (2014). We start with background information on instruction-based robotic architectures and basic Dempster-Shafer theoretic concepts, and then introduce the proposed algorithms for pragmatic inference and for generating requests to disambiguate intended meanings.

Then we demonstrate the operation of the algorithms in a detailed example showing how uncertainty is propagated at each stage of processing and can lead to different responses by the robot. We finish with a brief discussion of the proposed approach and possible directions for future work.
