Modeling of Driver Attention in Real World Scenarios Using Probabilistic Salient Maps







Monitoring driver behavior can play a vital role in combating various road hazards. The majority of accidents can be avoided if the driver receives an adequate warning a few seconds before the event. Monitoring driver actions can provide insights into the driver's intent, attention and vigilance. This information can help in designing smart in-vehicle interfaces that provide the necessary warnings to the driver or take control when necessary. Visual attention is one of the most important factors in driver monitoring, since most driving maneuvers rely strongly on vision. An inattentive driver may be unaware of factors in the environment such as pedestrians, other vehicles and traffic changes. The visual attention of a driver can be monitored either by tracking the driver's head pose or by tracking their eye movements. While advances in computer vision have inspired various studies that efficiently track head and eye movements from the face, these models face challenges in a naturalistic driving environment because of changes in illumination, large head rotations and occlusions. This dissertation discusses various methods to predict the driver's visual attention using probabilistic visual maps. We collect a large-scale multimodal dataset in which 59 drivers are recorded while performing various secondary activities during driving, to capture the diversity of data in a naturalistic driving environment. The subjects fixate their gaze on predetermined locations, which helps us establish a correspondence between the driver's face and their gaze target. Using this dataset, we have performed various analyses that guided the design of our proposed models for predicting the driver's visual attention. We establish that, while the head pose of the driver is strongly correlated with the driver's visual attention, the relationship is not one-to-one. Hence, it is not feasible to design models that predict a single gaze value from the head pose.
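Converting each predetermined target into a gaze direction is what lets a fixation on that target act as a ground-truth gaze label. The following is a minimal sketch, not the dataset's actual calibration procedure: it assumes the eye position and the target are known 3D points in one shared frame with x pointing right, y up and z forward, and the function name `gaze_angles` is hypothetical.

```python
import math


def gaze_angles(eye, target):
    """Return (yaw, pitch) in degrees of the ray from `eye` to `target`.

    Hypothetical convention: x points right, y points up, z points forward
    (out through the windshield); both points share the same metric frame.
    """
    dx, dy, dz = (t - e for t, e in zip(target, eye))
    yaw = math.degrees(math.atan2(dx, dz))                    # positive = to the right
    pitch = math.degrees(math.atan2(dy, math.hypot(dx, dz)))  # positive = upward
    return yaw, pitch


# Illustrative numbers only: a marker 1 m ahead and 1 m to the right of the eye
yaw, pitch = gaze_angles((0.0, 0.0, 0.0), (1.0, 0.0, 1.0))  # about 45 degrees right, level
```

Pairing such angles with the head pose measured in the same frames yields (head pose, gaze) samples; the spread of gaze angles observed for similar head poses is the kind of evidence behind the observation that the mapping is not one-to-one.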
Therefore, we take a probabilistic approach in which the driver's visual attention is predicted as a probabilistic visual map whose value at each point depends on the probability that the driver is looking in that direction. First, we design parametric regression models that provide a Gaussian distribution of the driver's gaze given the driver's head pose. The model is heteroscedastic and based on Gaussian Process Regression (GPR), which learns the distribution of gaze as a Gaussian random process that is a function of the head pose in six degrees of freedom. Next, we propose deep networks with convolutional and upsampling layers that perform classification on a 2D grid to obtain the visual map. These networks are non-parametric and learn the distribution from the data. We propose two different models. The first model takes the head pose of the driver as input and passes it through a fully connected layer followed by convolution and upsampling layers to predict the visual attention at different resolutions. The second model takes an image of the eye patch as input and passes it through multiple convolution and max-pooling layers to obtain a low-dimensional representation of the visual attention. Subsequently, this low-dimensional representation is passed through upsampling and convolution layers to obtain a high-dimensional representation of visual attention. In our final approach, we design a fusion model that integrates information from the driver's head pose and eye appearance to predict a visual attention map at multiple resolutions. This model follows an encoder-decoder architecture with two encoders, one for the head pose and one for the gaze, and a decoder that concatenates the information from both to obtain the final visual map. We project the model predictions onto the road and evaluate them on data in which the subject looks at landmarks on the road.
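What a probabilistic visual map contains can be illustrated by discretizing a Gaussian gaze distribution onto a grid. The sketch below assumes the regression model has already produced a mean and a 2x2 covariance over (yaw, pitch) in degrees, for example a GPR predictive distribution for one head pose. The grid extents, the 2.5-degree cell size and the function names `visual_map` and `downsample` are hypothetical choices. Summing 2x2 blocks only illustrates what a coarser-resolution map means; it is not the upsampling network described above.

```python
import math


def visual_map(mean, cov, yaw_range=(-90.0, 90.0), pitch_range=(-45.0, 45.0),
               rows=36, cols=72):
    """Discretize a 2-D Gaussian over (yaw, pitch) into a grid that sums to 1.

    Hypothetical layout: columns span yaw left to right, row 0 is the top
    (highest pitch); each cell is evaluated at its centre.
    """
    (my, mp), ((syy, syp), (_, spp)) = mean, cov
    det = syy * spp - syp * syp
    iyy, ipp, iyp = spp / det, syy / det, -syp / det  # inverse covariance entries
    ylo, yhi = yaw_range
    plo, phi = pitch_range
    grid = []
    for r in range(rows):
        p = phi - (r + 0.5) * (phi - plo) / rows
        row = []
        for c in range(cols):
            y = ylo + (c + 0.5) * (yhi - ylo) / cols
            dy, dp = y - my, p - mp
            m2 = iyy * dy * dy + 2 * iyp * dy * dp + ipp * dp * dp  # squared Mahalanobis distance
            row.append(math.exp(-0.5 * m2))
        grid.append(row)
    total = sum(map(sum, grid))
    return [[v / total for v in row] for row in grid]  # normalize to a probability map


def downsample(grid, f=2):
    """Sum f x f blocks to express the same map at a coarser resolution."""
    rows, cols = len(grid) // f, len(grid[0]) // f
    return [[sum(grid[r * f + i][c * f + j] for i in range(f) for j in range(f))
             for c in range(cols)] for r in range(rows)]


# Example: attention centred 20 degrees right and 5 degrees down
fine = visual_map((20.0, -5.0), ((100.0, 0.0), (0.0, 25.0)))
coarse = downsample(fine)  # 18 x 36 grid whose cells still sum to approximately 1
```

A larger covariance, as a heteroscedastic model might predict for head poses where gaze is more ambiguous, spreads the same probability mass over more cells and lowers the peak of the map.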



Engineering, Electronics and Electrical