Toyota Central R&D Labs., Inc. / R&D Review (electronic journal)

　Recently, speech-based interfaces in vehicles have become a popular means of improving the accessibility of in-vehicle information equipment. In this edition, we present the results of our research into the "noise-robustness of speech recognition" and "driver distraction."

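　In recent years, speech interfaces in automobiles have attracted attention as a means of operating in-vehicle information equipment safely and conveniently. This article describes recent research trends in "speech recognition under in-vehicle noise" and "safety evaluation of speech interfaces used while driving."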

　This paper describes an efficient method of improving the noise robustness of speech recognition in a noisy car environment by taking the acoustic features of the car's interior noise into account. We analyzed the relationship between Articulation Index values and recognition rates in car environments under different driving conditions. The analysis showed that the recognition rate worsens significantly when the engine noise (periodic sound) components in the frequency range above 200 Hz are large. We therefore developed a preprocessing method that improves noise robustness even when the engine noise is large. With this method, the cutoff frequency of the front-end high-pass filter is adaptively varied between 200 and 400 Hz according to the level of the engine noise components. The method improved the average recognition rate for all eight cars under the second-range acceleration condition by 11.9%, and the recognition rate for one of the cars improved considerably, by 38.6%.

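　This paper describes an efficient method, which takes the acoustic characteristics of the noise into account, for improving the noise robustness of speech recognition under car interior noise. We analyzed the relationship between Articulation Index values and speech recognition rates in car interiors under various driving conditions. The analysis showed that the recognition rate degrades considerably when the engine noise (periodic sound) components in the frequency band above approximately 200 Hz are large. We then developed a preprocessing method that improves noise robustness when large engine noise components are present. The method adaptively varies the cutoff frequency of the preprocessing high-pass filter between 200 Hz and 400 Hz according to the magnitude of the engine noise components. With this method, the average recognition rate for eight car models under second-range acceleration improved by 11.9%, and for one of those models in particular the recognition rate improved by 38.6%.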
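The adaptive cutoff described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the engine noise level is measured, how it maps to a cutoff, or what filter design is used, so everything below is an assumption made for illustration: the function names, the linear mapping, the hypothetical level bounds `low_db` and `high_db`, and the first-order filter. Only the 200 to 400 Hz range comes from the text.

```python
import math


def adaptive_cutoff_hz(engine_noise_db, low_db=50.0, high_db=70.0,
                       min_cutoff=200.0, max_cutoff=400.0):
    """Map an engine-noise level to a high-pass cutoff in [200, 400] Hz.

    low_db and high_db are hypothetical bounds chosen for illustration;
    the linear interpolation between them is also an assumption.
    """
    if high_db <= low_db:
        raise ValueError("high_db must be greater than low_db")
    frac = (engine_noise_db - low_db) / (high_db - low_db)
    frac = min(1.0, max(0.0, frac))  # clamp so the cutoff stays in range
    return min_cutoff + frac * (max_cutoff - min_cutoff)


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order (RC-style) high-pass filter, assumed here for simplicity."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Pass changes in the input, let steady components decay.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def preprocess(samples, engine_noise_db, sample_rate_hz=16000):
    """Pick the cutoff from the noise level, then filter the frame."""
    cutoff = adaptive_cutoff_hz(engine_noise_db)
    return high_pass(samples, cutoff, sample_rate_hz)
```

In this sketch a frame recorded under light engine noise is filtered at 200 Hz, while a frame with heavy engine noise is filtered at up to 400 Hz, which removes more of the low-frequency periodic components at the cost of some speech energy.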

　This paper describes an efficient method of estimating word recognition rates without speech data. The method is based on the minimum value of the word-pair recognition rate, which corresponds to the word recognition rate. The estimated word-pair recognition rate is calculated from the measured log-likelihood difference distribution obtained by phoneme recognition, under the assumption that this distribution can be approximated by a normal distribution. To demonstrate the effectiveness of the method, we evaluated it in actual recognition experiments using 3000 word pairs. The correlation coefficient between the estimated and measured recognition rates was 0.87 when the phoneme lengths of the word pairs were equal. We also evaluated a 95% confidence interval for the measured recognition rates. The percentage of estimated words that fell within the confidence interval was 94.8%.

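　We report a method of evaluating word recognition rates without using speech data. The method is based on the minimum value of the word-pair recognition rate. We therefore show, through actual recognition experiments, that the word-pair recognition rate correlates highly with the word recognition rate. The predicted word-pair recognition rate is calculated from the log-likelihood difference distribution obtained by phoneme recognition. To demonstrate the effectiveness of the method, we conducted evaluation experiments using 3000 word pairs. When the phoneme lengths of the word pair were equal, the correlation coefficient between the predicted and actual recognition rates was 0.87. In an evaluation with a 95% confidence interval, the recognition performance could be predicted for 94.8% of all word pairs.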
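One way to read the estimation step is sketched below in Python. It assumes that a word pair is recognized correctly whenever the log-likelihood difference (target word minus competitor) is positive, so that under the normal approximation the pair rate is Φ(μ/σ). That sign convention, the function names, and the use of the sample mean and standard deviation are assumptions for illustration, not details taken from the paper.

```python
import math
from statistics import mean, stdev


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pair_recognition_rate(diffs):
    """Estimate P(difference > 0) for one word pair.

    diffs: log-likelihood differences (target minus competitor), assumed
    to be approximately normally distributed. Needs at least two values.
    """
    mu = mean(diffs)
    sigma = stdev(diffs)
    if sigma == 0.0:
        # Degenerate distribution: the outcome is decided by the sign of mu.
        return 1.0 if mu > 0 else (0.5 if mu == 0 else 0.0)
    return normal_cdf(mu / sigma)


def estimated_word_rate(diffs_by_competitor):
    """Word rate estimate: the minimum pair rate over all competitors."""
    return min(pair_recognition_rate(d) for d in diffs_by_competitor.values())
```

Taking the minimum reflects the idea stated in the abstract that the most confusable competitor bounds how well the word can be recognized.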

内山祐司，小島真一，本郷武朗，寺嶌立太，脇田敏裕

　With in-vehicle information systems, there is a danger that voice messages will distract the user while driving. To reduce this danger, the system should ideally adapt to the driver's mental workload. Such an adaptive system would deliver voice messages only when the driver's mental workload was low and suppress them whenever the workload was high. The system would therefore have to estimate the driver's current workload from the outputs of the car's sensors, such as the speed, steering wheel angle, and accelerator pedal position. To establish a relationship between the driver's mental workload and the data output by the car's sensors, a dual-task experiment was conducted on a public road. In this experiment, participants performed a memory task while driving a test car, and the data from the car's sensors were recorded at the same time. The correlation coefficients between memory-task performance and the sensor data showed that the driver's release of the accelerator pedal was the most significant indicator of workload. Based on these results, a workload estimation model was developed and applied to a prototype voice information system in a test car. The driving situations in which the system postpones the delivery of voice messages were then confirmed.

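　Voice messages presented by in-vehicle information systems risk distracting the driver while driving. To avoid this risk, a system is needed that adapts to the driver's mental workload and presents information at appropriate times. Such an adaptive information presentation system presents information when the driver's mental workload is low and delays presentation when it is high. To realize this system, it is important to estimate the driver's current workload indirectly from in-vehicle sensors. In this study, we conducted a dual-task experiment to find the relationship between in-vehicle sensor data and the driver's mental workload. Participants performed a memory task while driving a test car, and data from the in-vehicle sensors were recorded at the same time. The results showed that the operation of releasing the accelerator pedal is the best indicator for workload estimation. Based on this result, we implemented in the test car a system that adapts to the driver and presents traffic information at appropriate times, and confirmed the driving situations in which the system delays information presentation.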
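The postponement behaviour can be sketched as a small message queue in Python. The abstract does not describe the workload estimation model itself, so the rule used here (treat a released accelerator pedal as a sign of high workload) is only a stand-in, and the class name, the `accelerator` sensor key, and the threshold are all hypothetical.

```python
from collections import deque


def accelerator_released(sensors, threshold=0.05):
    """Hypothetical workload rule: pedal position at or below the threshold
    (0.0 = fully released, 1.0 = fully pressed) counts as high workload."""
    return sensors["accelerator"] <= threshold


class AdaptiveMessageQueue:
    """Holds voice messages while the estimated workload is high."""

    def __init__(self, is_high_workload):
        # The workload estimator is injected so that a different model
        # can replace the stand-in rule above.
        self._is_high_workload = is_high_workload
        self._pending = deque()

    def submit(self, message):
        """Queue a message for later delivery."""
        self._pending.append(message)

    def tick(self, sensors):
        """Return the messages to deliver for the current sensor reading."""
        if self._is_high_workload(sensors):
            return []  # postpone everything while workload is high
        delivered = list(self._pending)
        self._pending.clear()
        return delivered
```

For example, a message submitted while the pedal is released stays queued, and it is returned by the first later `tick` call in which the pedal is pressed again.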

小島真一，内山祐司，星野博之，本郷武朗

　This paper proposes a method of evaluating the safety of a verbal interface that is used while driving. Recently there have been concerns about driver distraction when a person uses voice commands to operate an in-vehicle multimedia system while driving, since such distraction has the potential to cause or contribute to a crash. Our evaluation method measures the reaction time from the instant that an in-vehicle LED (positioned in the driver's peripheral vision) is turned on to the time that the subject presses a button. A histogram of many reaction time measurements showed that the number of trials with a delayed reaction time increased when the subjects used a verbal interface, compared with the condition in which they were only driving. This suggests that the rate of delayed reaction time trials can serve as an evaluation index. Based on data obtained with an actual vehicle, we found that our method produces more useful results than other methods that use the average reaction time as an index. Additionally, we show that processing the delayed reaction time trials reveals the points during a verbal task at which the subjects' reactions are delayed.

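　This study proposes a method of evaluating the safety of speech interfaces used while driving. In recent years there has been concern about driver distraction during speech interface use, because distraction is considered to carry a potential risk of collision. Our evaluation method measures the reaction time to an LED placed in the driver's peripheral vision. After measuring reaction times over many trials and building a histogram, we found that the proportion of trials with greatly delayed reaction times was higher when the speech interface was in use than when it was not, and we concluded that this proportion can serve as an evaluation index. In experiments with an actual vehicle, we show that our evaluation method works in practice and that it is more sensitive than the commonly used evaluation based on mean reaction time. We further show that collecting and analyzing the trials with greatly delayed reaction times reveals the moments at which reactions tend to be delayed during speech interface use.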
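The proposed index, the rate of delayed reaction time trials, can be computed as in the following Python sketch. The abstract does not state how a "delayed" trial is defined, so the fixed threshold passed in as `threshold_s` is an assumption, as are the function names.

```python
from statistics import mean


def delayed_rate(reaction_times_s, threshold_s):
    """Fraction of trials whose reaction time exceeds threshold_s.

    threshold_s is a hypothetical cut-off for what counts as delayed.
    """
    if not reaction_times_s:
        raise ValueError("at least one trial is required")
    delayed = sum(1 for t in reaction_times_s if t > threshold_s)
    return delayed / len(reaction_times_s)


def compare_conditions(driving_only_s, with_interface_s, threshold_s):
    """Return both indices for the two conditions so they can be compared."""
    return {
        "mean_driving_only": mean(driving_only_s),
        "mean_with_interface": mean(with_interface_s),
        "delayed_rate_driving_only": delayed_rate(driving_only_s, threshold_s),
        "delayed_rate_with_interface": delayed_rate(with_interface_s, threshold_s),
    }
```

A few very slow trials shift the mean only slightly when most reactions are fast, but they change the delayed rate directly, which is one plausible reason the rate could be a more sensitive index than the average.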
