Study of Real-Time Facial Expression Recognition on Noisy Images and Videos







Automatic facial expression recognition (FER) and emotion recognition have attracted interest across many research fields because of their important role in human-centered interfaces and the advent of cheap and powerful computers and video cameras in the last decade. In addition, the emergence of the smartphone era has generated considerable interest in mobile application development for facial expression and emotion recognition. However, despite the improved hardware of recent smartphones, mobile applications that process real-time video must always account for the limited resources available on these devices. These limited processing resources still make it difficult to directly adopt existing desktop facial expression and emotion recognition systems. Most FER studies have been carried out and evaluated in restricted experimental environments. For instance, some approaches deal only with static images or work with video sequences that have been manually pre-segmented (temporally) for each expression. However, temporal segmentation of expressions is the most essential element of automatic FER systems in real-world applications that process real-time video. Also, real-world FER data differ from most conventional datasets, which are mainly collected in limited experimental environments, so it is hard to apply models built from lab-collected datasets to real-world applications. An automatic FER system should be able to handle these various types of noisy data. We address several problems for real-time FER on low-power smartphones. First, we present a real-time FER system that runs effectively on smartphones. The system employs a set of Support Vector Machines (SVMs) for the neutral expression and 6 basic emotions, using 13D geometric facial features that include temporal information. 
We evaluated the performance of the proposed system in terms of speed and accuracy on an offline dataset and on commercial off-the-shelf smartphones. Second, we present a real-time temporal video segmentation approach for automatic FER that is applicable on a smartphone. The proposed system uses a Finite State Machine (FSM) to segment real-time video into temporal phases, from the neutral expression to the peak of an expression. After this automatic temporal segmentation, the system performs FER with an SVM on every apex state, without any sampling time delay. Third, we present gender-driven ensemble models for FER on smartphones that work with a context-sensitive multimedia content recommendation system. Based on the fact that men and women express an emotion with distinct differences in horizontal and vertical facial movements, we employ an ensemble model with three weak classifiers trained on gender-specific subsets and a general dataset of facial expressions. In the system, users receive feedback as links to multimedia content, such as videos, photos, and e-books, matched to the user's current emotion. Last, we present an approach that uses a CNN model for FER to accommodate noisy image and video datasets from real-world environments. We train the CNN model on the FER2013 dataset. We show that the CNN model works very well for expression recognition even on real, noisy data that was not used for training.
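
The FSM-based temporal segmentation summarized in the abstract can be illustrated with a minimal sketch. This is not the implementation from the study: it assumes each frame has already been reduced to a single hypothetical `intensity` value (for example, how far the geometric features have moved from a neutral baseline), and the state names, thresholds, and function names are all illustrative assumptions. It shows only the general idea of moving from a neutral phase through onset to an apex, and flagging the apex frame as the moment at which a classifier would be invoked.

```python
from enum import Enum, auto


class Phase(Enum):
    NEUTRAL = auto()
    ONSET = auto()
    APEX = auto()


class ExpressionSegmenter:
    """Toy finite state machine over a per-frame expression intensity.

    The thresholds are hypothetical defaults chosen for illustration only.
    """

    def __init__(self, onset_threshold=0.2, plateau_delta=0.02):
        self.onset_threshold = onset_threshold  # below this, the face is treated as neutral
        self.plateau_delta = plateau_delta      # smaller increases count as "no longer rising"
        self.phase = Phase.NEUTRAL
        self.previous = 0.0

    def step(self, intensity):
        """Consume one frame's intensity; return True on the frame where APEX is entered."""
        entered_apex = False
        rising = (intensity - self.previous) > self.plateau_delta
        if intensity < self.onset_threshold:
            # Intensity dropped back to baseline: reset to neutral.
            self.phase = Phase.NEUTRAL
        elif self.phase is Phase.NEUTRAL:
            # Intensity crossed the threshold: an expression is starting.
            self.phase = Phase.ONSET
        elif self.phase is Phase.ONSET and not rising:
            # Intensity stopped rising: treat this frame as the peak.
            self.phase = Phase.APEX
            entered_apex = True
        self.previous = intensity
        return entered_apex


def apex_frames(intensities):
    """Return the indices of frames at which an apex is detected."""
    segmenter = ExpressionSegmenter()
    return [i for i, value in enumerate(intensities) if segmenter.step(value)]


if __name__ == "__main__":
    stream = [0.0, 0.05, 0.3, 0.6, 0.8, 0.81, 0.8, 0.4, 0.1, 0.0]
    print(apex_frames(stream))
```

In a full pipeline, each index returned by `apex_frames` would mark the frame handed to the expression classifier. The sketch has no explicit offset state, so the machine stays in the apex phase until the intensity falls back below the onset threshold.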



Human face recognition (Computer science), Facial expression, Emotion recognition, Machine learning, Mobile apps, Smartphones, Support vector machines, Sequential machine theory, Convolutions (Mathematics), Neural networks (Computer science)