Sbia2012

Published: April 10, 2013
Context-sensitive ASR for controlling the navigation of mobile robots
Gabriel Ferreira Araujo and Hendrik Teixeira Macedo
Computer Science Department, Federal University of Sergipe, São Cristóvão, SE 49100000, Brazil, gabrielfa@dcomp.ufs.br, hendrik@ufs.br

Abstract. Automatic Speech Recognition (ASR) is a complex task that depends on language, vocabulary and context. In the navigation control of mobile robots, the set of possible interpretations of a command utterance can be reduced, and the recognition rate thereby increased, if we consider that the robot's work environment is well defined and contains constant elements. In this paper we propose a contextual model in addition to the acoustic and language models used by mainstream ASR systems. We provide a complete mobile robot navigation system which uses contextual information to improve the recognition rate of speech-based commands. Recognition accuracy has been evaluated with the Word Information Lost (WIL) metric. Results show that the insertion of a contextual model provides an improvement of around 3% in WIL.

1 Introduction

Human-Robot Interaction (HRI) is an important multidisciplinary research field, which aims to understand, design, and evaluate every form of communication between humans and robots [5]. HRI applications in which humans and robots share the same room are usually classified as proximate interaction, which seems to favor gesture-based or speech-based interaction [2]. Even though multimodal interfaces have recently been proposed in order to provide more intuitive proximate human-robot interaction [6], we are interested in speech-based interfaces, due to the advantage well stated by [12]: “according to the diffraction property of audio signal, the sound can bypass obstacles”. A speech-based system may therefore be better suited to mobile robot navigation control, where it has often been applied [13], [1].

In order to be effective, though, speech-based interfaces need high automatic speech recognition (ASR) rates with low response time to human commands. Unfortunately, this is still not the case, partially due to language, vocabulary and context issues. The set of possible pragmatic interpretations of a given speech command can be reduced through the use of contextual information, and so ASR rates could be raised. In this paper we analyze the influence of contextual elements on ASR rates for the navigation control of mobile robots. Some efforts have been made on the use of contextual information in speech-based systems [8], [7], [10], but without a proper evaluation of the speech recognition enhancement obtained. Thus, we propose a contextual model for ASR systems and analyze the influence of this model on a speech-based control system by means of widely adopted ASR evaluation metrics.

A brief introduction to ASR is presented in section 2. The proposed context model is described in section 3. Section 4 shows how the context model is applied to speech-based control of mobile robot navigation. The analysis of the influence of the context model on ASR is presented in section 5. We conclude the work in section 6, along with some suggested extensions.
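As an illustration of the kind of evaluation metric referred to here, the Word Information Lost (WIL) measure named in the abstract can be sketched as follows. This is a minimal sketch, not the evaluation code used in this work: it assumes the common definition WIL = 1 - H^2 / (N1 * N2), where H is the number of correctly recognized words, N1 the number of words in the reference and N2 the number of words in the hypothesis. The function names and the example commands are hypothetical.

```python
def align_counts(ref, hyp):
    """Count hits, substitutions, deletions and insertions along one
    minimum edit distance alignment of two word lists."""
    n, m = len(ref), len(hyp)
    # Each cell holds (distance, hits, subs, dels, ins) for ref[:i] vs hyp[:j].
    table = [[None] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, n + 1):
        table[i][0] = (i, 0, 0, i, 0)   # only deletions
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, 0, j)   # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d, h, s, de, ins = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (d, h + 1, s, de, ins)       # hit
            else:
                diag = (d + 1, h, s + 1, de, ins)   # substitution
            d, h, s, de, ins = table[i - 1][j]
            delete = (d + 1, h, s, de + 1, ins)
            d, h, s, de, ins = table[i][j - 1]
            insert = (d + 1, h, s, de, ins + 1)
            # Keep the candidate with the smallest edit distance.
            table[i][j] = min(diag, delete, insert, key=lambda c: c[0])
    return table[n][m][1:]


def word_information_lost(reference, hypothesis):
    """WIL = 1 - H^2 / (N1 * N2), with N1 = H+S+D and N2 = H+S+I."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref and not hyp:
        return 0.0
    if not ref or not hyp:
        return 1.0
    hits, subs, dels, ins = align_counts(ref, hyp)
    n1 = hits + subs + dels
    n2 = hits + subs + ins
    return 1.0 - (hits * hits) / (n1 * n2)


# Hypothetical navigation command with one word dropped by the recognizer.
print(word_information_lost("go to the kitchen", "go to kitchen"))
```

WIL is 0 for a perfect transcription and approaches 1 as fewer words are recognized correctly, so a lower value means better recognition.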

2 Automatic Speech Recognition (ASR)

The job of a speech recognizer is to map an acoustic signal to a sequence of words that is valid according to the syntactic rules of the underlying language. The first step of the recognition task is to represent the speech waveform appropriately as an acoustic feature vector (or evidence) X = (x_1, x_2, ..., x_t). Mel-frequency cepstral coefficients (MFCCs) have been widely used for this representation. MFCC extraction warps the linear spectrum of the speech onto a nonlinear scale called Mel, which attempts to model the sensitivity of the human ear. An ASR system can then be described mathematically as a mapping from such acoustic evidence X to a sequence of words W = (w_1, w_2, ..., w_n), as defined in equation 1, where the...
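The Mel warping mentioned above can be illustrated with a short sketch. The text does not state which Mel formula is used, so the code assumes one widely used variant, mel = 2595 * log10(1 + f / 700). The function names and the filter bank parameters are hypothetical and chosen only for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale (assumed formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_filters):
    """Edge frequencies in Hz of n_filters bands equally spaced in Mel.

    A bank of n_filters triangular filters needs n_filters + 2 edges.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + k * step) for k in range(n_filters + 2)]


# Hypothetical setup: 10 filters between 0 Hz and 8000 Hz.
edges = mel_band_edges(0.0, 8000.0, 10)
print([round(e) for e in edges])
```

The edges are evenly spaced in Mel but grow further apart in Hz, so low frequencies are resolved more finely than high ones, which is the nonlinearity intended to mimic the sensitivity of the ear.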