Multimodal Driver Referencing: A Comparison of Pointing to Objects Inside and Outside the Vehicle

15 Feb 2022  ·  Abdul Rafey Aftab, Michael von der Beeck ·

Advanced in-cabin sensing technologies, especially vision based approaches, have tremendously progressed user interaction inside the vehicle, paving the way for new applications of natural user interaction. Just as humans use multiple modes to communicate with each other, we follow an approach which is characterized by simultaneously using multiple modalities to achieve natural human-machine interaction for a specific task: pointing to or glancing towards objects inside as well as outside the vehicle for deictic references. By tracking the movements of eye-gaze, head and finger, we design a multimodal fusion architecture using a deep neural network to precisely identify the driver's referencing intent. Additionally, we use a speech command as a trigger to separate each referencing event. We observe differences in driver behavior in the two pointing use cases (i.e. for inside and outside objects), especially when analyzing the preciseness of the three modalities eye, head, and finger. We conclude that there is no single modality that is solely optimal for all cases as each modality reveals certain limitations. Fusion of multiple modalities exploits the relevant characteristics of each modality, hence overcoming the case dependent limitations of each individual modality. Ultimately, we propose a method to identity whether the driver's referenced object lies inside or outside the vehicle, based on the predicted pointing direction.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here