Toyota Central R&D Labs., Inc. / Technical Journal R&D Review

Abstracts: Vol. 39 No. 1 (2004.3)
Special Issue: Speech-Based Interfaces in Vehicles

Review

P.1

Toshihiro Wakita

Recently, speech-based interfaces in vehicles have become a popular means of improving the accessibility of in-vehicle information equipment. In this edition, we present the results of our research into the "noise-robustness of speech recognition" and "driver distraction."

Research Report

P.4

Noise-Robust Speech Recognition in a Car Environment Based on the Acoustic Features of Car Interior Noise

Hiroyuki Hoshino

This paper describes an efficient method of improving the noise robustness of speech recognition in a noisy car environment by considering the acoustic features of the car's interior noise. We analyzed the relationship between Articulation Index values and recognition rates in car environments under different driving conditions, and found that the recognition rate worsens significantly when the engine noise (periodic sound) components in the frequency range above 200 Hz are large. We developed a preprocessing method that improves noise robustness even when the engine noise is large: the cutoff frequency of the front-end high-pass filter is changed adaptively between 200 and 400 Hz according to the level of the engine noise components. This method improved the average recognition rate for all eight cars under the second range acceleration condition by 11.9%, and the recognition rate for one of the cars improved by as much as 38.6%.
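
The adaptive cutoff idea can be illustrated with a minimal Python sketch. The abstract does not state how the engine noise level is mapped to a cutoff frequency, so the linear interpolation, the thresholds `quiet_db` and `loud_db`, and the first-order filter below are all assumptions made for illustration, not the paper's implementation.

```python
import math


def select_cutoff_hz(engine_noise_db, quiet_db=50.0, loud_db=70.0,
                     min_cutoff_hz=200.0, max_cutoff_hz=400.0):
    """Map an engine-noise level to a high-pass cutoff in the 200-400 Hz range.

    quiet_db and loud_db are hypothetical thresholds, and the linear
    interpolation between them is an assumption.
    """
    if loud_db <= quiet_db:
        raise ValueError("loud_db must be greater than quiet_db")
    fraction = (engine_noise_db - quiet_db) / (loud_db - quiet_db)
    fraction = min(1.0, max(0.0, fraction))  # clamp to [0, 1]
    return min_cutoff_hz + fraction * (max_cutoff_hz - min_cutoff_hz)


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order RC high-pass filter, used here as a simple stand-in
    for the front-end filter mentioned in the abstract."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    filtered = []
    prev_in = prev_out = 0.0
    for i, x in enumerate(samples):
        # The first output sample passes through; later ones follow the
        # standard discrete RC recurrence.
        y = x if i == 0 else alpha * (prev_out + x - prev_in)
        filtered.append(y)
        prev_in, prev_out = x, y
    return filtered
```

In this sketch a louder engine noise estimate pushes the cutoff toward 400 Hz, which removes more of the low-frequency band before recognition.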

P.10

Estimating Speech-Recognizer Performance Based on Log-Likelihood Difference Distribution of Word-Pairs

Ryuta Terashima

This paper describes an efficient method of estimating word recognition rates without speech data. The method is based on the minimum value of the word-pair recognition rate, which corresponds to the word recognition rate. The word-pair recognition rate is estimated from the measured log-likelihood difference distribution, which is obtained by phoneme recognition and is assumed to be approximately normal. To evaluate the method, we ran actual recognition experiments using 3000 word-pairs. The correlation coefficient between the estimated and measured recognition rates was 0.87 when the phoneme lengths of the word-pairs were equal. We also evaluated a 95% confidence interval for the measured recognition rates; the percentage of estimated words that fell within the confidence interval was 94.8%.
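
One way to read the normal approximation is sketched below in Python. It assumes that the log-likelihood difference is defined as the target word's score minus the competing word's score, and that a pair is recognized correctly when this difference is positive; neither convention is stated in the abstract. The function names and the `pair_stats` structure are hypothetical.

```python
import math


def pair_recognition_rate(mean_diff, std_diff):
    """Probability that a normally distributed log-likelihood difference
    is positive, i.e. Phi(mean / std).

    Assumes difference = score(target word) - score(competing word).
    """
    if std_diff <= 0:
        raise ValueError("std_diff must be positive")
    z = mean_diff / std_diff
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def estimate_word_rate(pair_stats):
    """Estimate a word's recognition rate as the minimum pair rate over
    its competing words.

    pair_stats maps a competing word to (mean_diff, std_diff).
    """
    if not pair_stats:
        raise ValueError("pair_stats must not be empty")
    return min(pair_recognition_rate(mean, std)
               for mean, std in pair_stats.values())
```

Taking the minimum reflects the abstract's statement that the lowest word-pair rate corresponds to the word recognition rate: the most confusable competitor dominates the estimate.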

P.16

Voice Information System that Adapts to Driver's Mental Workload

Yuji Uchiyama, Shinichi Kojima, Takero Hongo,
Ryuta Terashima, Toshihiro Wakita

With in-vehicle information systems, there is a danger that voice messages will distract the user while driving. To reduce this danger, the system should ideally adapt to the driver's mental workload. Such an adaptive system would deliver voice messages only when the driver's mental workload is low and suppress them whenever the workload is high. The system would therefore have to estimate the driver's current workload from the outputs of the car's sensors, such as the speed, steering wheel angle, and accelerator pedal position. To establish a relationship between the driver's mental workload and the sensor data, a dual-task experiment was conducted on a public road. In this experiment, participants performed a memory task while driving a test car, and the data from the car's sensors was recorded at the same time. The correlation coefficients between memory-task performance and the sensor data showed that the driver's release of the accelerator pedal was the most significant indicator of workload. Based on these results, a workload estimation model was developed and applied to a prototype voice information system in a test car. The driving situations in which the system postpones the delivery of voice messages were then confirmed.
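
The postponement behavior can be sketched in Python as a small message gate. The abstract does not describe the workload estimation model itself, so the rule used here (treat the workload as high for a few samples after the accelerator pedal is released) is a deliberately simple stand-in, and `MessageGate`, `release_threshold`, and `hold_samples` are hypothetical names and parameters.

```python
from collections import deque


class MessageGate:
    """Postpone voice messages while the estimated workload is high.

    Workload rule (an assumption, not the paper's model): the workload is
    high for hold_samples sensor samples after the pedal was last released.
    """

    def __init__(self, release_threshold=0.05, hold_samples=3):
        self.release_threshold = release_threshold  # pedal position 0.0-1.0
        self.hold_samples = hold_samples
        self.samples_since_release = hold_samples  # start in the low state
        self.pending = deque()

    def workload_is_high(self):
        return self.samples_since_release < self.hold_samples

    def submit(self, message):
        """Queue a voice message for delivery."""
        self.pending.append(message)

    def step(self, pedal_position):
        """Process one sensor sample and return the messages delivered now."""
        if pedal_position <= self.release_threshold:
            self.samples_since_release = 0
        else:
            self.samples_since_release += 1
        if self.workload_is_high():
            return []  # postpone: keep messages queued
        delivered = list(self.pending)
        self.pending.clear()
        return delivered
```

Messages submitted during a high-workload period stay queued and are delivered together on the first sample at which the gate considers the workload low again.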

P.23

Evaluating the Safety of Verbal Interface Use while Driving

Shinichi Kojima, Yuji Uchiyama,
Hiroyuki Hoshino, Takero Hongo

This paper proposes a method of evaluating the safety of a verbal interface used while driving. Recently there have been concerns about driver distraction when a person uses voice commands to operate in-vehicle multimedia systems while driving, since such distraction has the potential to cause or contribute to a crash. Our evaluation method measures the reaction time from the instant that an in-vehicle LED (positioned in the driver's peripheral vision) is turned on to the time that the subject presses a button. Histograms of the reaction time data showed that the number of delayed reaction time trials increased when the subjects used a verbal interface, compared with the condition in which they were only driving. This suggests that the rate of delayed reaction time trials can serve as an evaluation index. Based on data obtained with an actual vehicle, we found that our method produces more useful results than other methods that use the average reaction time as an index. Additionally, we show that processing the delayed reaction time trials reveals the point during a verbal task at which the subjects' reactions are delayed.
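
The two indices being compared, the average reaction time and the rate of delayed trials, can be computed as in the Python sketch below. The abstract does not say how a "delayed" trial is defined, so the fixed `threshold_s` parameter is an assumption, and the function names are hypothetical.

```python
from statistics import mean


def delayed_rate(reaction_times_s, threshold_s):
    """Fraction of trials whose reaction time exceeds threshold_s.

    threshold_s is a hypothetical cutoff for calling a trial 'delayed'.
    """
    if not reaction_times_s:
        raise ValueError("reaction_times_s must not be empty")
    delayed = sum(1 for t in reaction_times_s if t > threshold_s)
    return delayed / len(reaction_times_s)


def compare_conditions(driving_only_s, verbal_task_s, threshold_s):
    """Return both indices for the driving-only and verbal-task conditions."""
    return {
        "mean_driving_only": mean(driving_only_s),
        "mean_verbal_task": mean(verbal_task_s),
        "delayed_rate_driving_only": delayed_rate(driving_only_s, threshold_s),
        "delayed_rate_verbal_task": delayed_rate(verbal_task_s, threshold_s),
    }
```

A few long reaction times shift the mean only slightly but change the delayed-trial rate directly, which is one plausible reason the rate is more sensitive as an index.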
