Context-sensitive ASR for controlling the navigation of mobile robots
Gabriel Ferreira Araujo and Hendrik Teixeira Macedo
Computer Science Department, Federal University of Sergipe, São Cristóvão, SE 49100-000, Brazil, firstname.lastname@example.org, email@example.com
Abstract. Automatic Speech Recognition (ASR) is a complex task that depends on language, vocabulary and context. In the navigation control of mobile robots, the set of possible interpretations of a command utterance can be reduced, and the recognition rate thereby increased, if we consider that the robot's work environment is well defined and contains constant elements. In this paper we propose a contextual model in addition to the acoustic and language models used by mainstream ASR systems. We provide a complete mobile robot navigation system that uses contextual information to improve the recognition rate of speech-based commands. Recognition accuracy has been evaluated using the Word Information Lost (WIL) metric. Results show that adding a contextual model yields an improvement of around 3% in WIL.
Human-Robot Interaction (HRI) is an important multidisciplinary research field that aims to understand, design, and evaluate every form of communication between humans and robots. HRI applications in which humans and robots share the same room are usually classified as proximate interaction, which seems to favor gesture-based or speech-based interaction. Even though multimodal interfaces have recently been proposed to provide more intuitive proximate human-robot interaction, we are interested in speech-based interfaces because of an advantage that has been well stated elsewhere: "according to the diffraction property of audio signal, the sound can bypass obstacles". A speech-based system may therefore be better suited to mobile robot navigation control, where it has commonly been applied. To be effective, though, speech-based interfaces need high automatic speech recognition (ASR) rates and low response times to human commands. Unfortunately, this is still not the case, partially due to language, vocabulary and context issues. The set of possible pragmatic interpretations of a given speech command can be reduced through the use of contextual information, and ASR rates could thus be raised. In this paper we analyze the influence of contextual elements on ASR rates for navigation control of mobile robots.
Some efforts have addressed the use of contextual information in speech-based systems, but without providing a proper evaluation of the speech recognition improvement obtained. We therefore propose a contextual model for ASR systems and analyze its influence on a speech-based control system by means of widely adopted ASR evaluation metrics. A brief introduction to ASR is presented in section 2. The proposed context model is described in section 3. Section 4 shows how the context model is applied to speech-based control of mobile robot navigation. The analysis of the influence of the context model on ASR is presented in section 5. We conclude the work in section 6, along with some suggested extensions.
Automatic Speech Recognition (ASR)
The job of a speech recognizer is to map an acoustic signal to a sequence of words that is valid according to the syntactic rules of the underlying language. The first step of the recognition task is to represent the speech waveform appropriately as an acoustic feature vector (or evidence) X = (x_1, x_2, ..., x_t). Mel-frequency cepstral coefficients (MFCCs) have been widely used for this representation. MFCC extraction warps the linear spectrum of the speech onto a nonlinear scale called Mel, which attempts to model the sensitivity of the human ear. An ASR system can then be described mathematically as a mapping of such acoustic evidence X to a sequence of words W = (w_1, w_2, ..., w_n), as defined in equation 1, where the...
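As an illustration of the Mel warping mentioned above, the following minimal sketch converts between Hz and Mel and places filter centers evenly on the Mel scale. It assumes the commonly used formula m = 2595 log10(1 + f/700); the paper does not state which Mel variant it uses, and the function names (hz_to_mel, mel_to_hz, mel_filter_centers) are hypothetical, chosen only for this example.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to Mel (assumed formula: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: map a Mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(f_min: float, f_max: float, n_filters: int) -> list[float]:
    """Return n_filters center frequencies (Hz) equally spaced on the Mel scale.

    Equal spacing in Mel gives dense centers at low frequencies and sparse
    centers at high frequencies, which is the nonlinear warping MFCCs rely on.
    """
    mel_min = hz_to_mel(f_min)
    mel_max = hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_filters + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Illustrative values only: 10 filters between 0 Hz and 8 kHz.
    for center in mel_filter_centers(0.0, 8000.0, 10):
        print(f"{center:8.1f} Hz")
```

The gaps between consecutive centers grow with frequency, so low frequencies are resolved more finely than high ones, in line with the ear-sensitivity motivation given in the text.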