Unmasking the Devil in the Details: What Works for Deep Facial Action Coding?
Abstract
The performance of automated facial expression coding has improved steadily, as evidenced
by results of the latest Facial Expression Recognition and Analysis (FERA 2017)
Challenge. Advances in deep learning techniques have been key to this success. Yet the
contribution of critical design choices remains largely unknown. Using the FERA 2017
database, we systematically evaluated design choices in pre-training, feature alignment,
model size selection, and optimizer details. Our findings vary from the counter-intuitive
(e.g., generic pre-training outperformed face-specific models) to best practices in tuning
optimizers. Informed by what we found, we developed an architecture that exceeded
state-of-the-art on FERA 2017. We achieved a 3.5% increase in F1 score for occurrence
detection and a 5.8% increase in ICC for intensity estimation.
Systematic evaluation
Normalization
Alignment with Procrustes analysis performs slightly better than simple resizing, but the difference is small (about 1%).
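As a loose illustration of what Procrustes alignment does (a toy sketch with synthetic landmarks, not the paper's code), the snippet below aligns a "detected" landmark set back to a template, removing translation, scale, and rotation before any cropping or resizing would happen:

```python
import numpy as np
from scipy.spatial import procrustes

# Hypothetical 5-point face template (eyes, nose tip, mouth corners).
template = np.array([[30., 30.], [70., 30.], [50., 50.],
                     [35., 75.], [65., 75.]])

# Simulate detected landmarks: the template rotated by 15 deg and shifted.
theta = np.deg2rad(15)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
detected = template @ R.T + np.array([12., -4.])

# Procrustes analysis standardizes both shapes (centers, unit-scales)
# and rotates the second to best fit the first; the disparity is the
# residual sum of squared differences after alignment.
aligned_template, aligned_detected, disparity = procrustes(template, detected)
print(disparity)  # near 0: the shapes match up to a similarity transform
```

Because the detected points differ from the template only by a similarity transform, the disparity is essentially zero; real landmark detections would leave a small nonzero residual reflecting expression and identity.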
Pre-trained architecture
Generic pre-trained models (VGG-ImageNet) show better performance than face-specific ones (VGG-Face).
Training set size
The training set size has a minor influence on performance. We down-sampled the majority class and up-sampled the minority class to build a stratified training set, applying this procedure for each pose and each AU. For example, in the case of AU occurrence detection, a training set size of 5,000 indicates that 5,000 frames with the AU present and 5,000 frames with the AU absent were randomly selected for each pose and each AU, resulting in 90,000 images in total (5,000 images × 2 classes × 9 poses).
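The balancing step above can be sketched as follows (a minimal toy example; the function name and the 2% base rate are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_indices(labels, n_per_class, rng):
    """Down-sample the majority class and up-sample the minority class
    so each class contributes exactly n_per_class frames."""
    picked = []
    for cls in (0, 1):
        pool = np.flatnonzero(labels == cls)
        # replace=True up-samples when the class has too few frames
        picked.append(rng.choice(pool, size=n_per_class,
                                 replace=len(pool) < n_per_class))
    return np.concatenate(picked)

# Toy labels for one (pose, AU) pair: AU present in ~2% of frames.
labels = (rng.random(100_000) < 0.02).astype(int)
idx = stratified_indices(labels, 5_000, rng)
print(len(idx), labels[idx].mean())  # 10000 balanced frames, mean 0.5
```

Repeating this per pose and per AU yields the 90,000-image sets described above; only the minority (AU present) class is typically sampled with replacement.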
Optimizer and learning rate
The optimal learning rate differs substantially between the Adam and SGD optimizers. However, the performance difference between them is negligible when each optimizer is used with its own optimal learning rate.
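A toy sweep illustrates why the two optimizers need separate learning-rate tuning (our own sketch on an ill-conditioned quadratic, not the paper's experiment): SGD's stable learning rates are bounded by the largest curvature, whereas Adam's per-step displacement is roughly the learning rate itself, so the sweet spots land in different ranges.

```python
import numpy as np

def optimize(update, lr, steps=200):
    """Minimize f(w) = 0.5 * w @ A @ w with the given update rule."""
    A = np.diag([1.0, 100.0])        # ill-conditioned toy problem
    w = np.array([1.0, 1.0])
    state = {}
    for _ in range(steps):
        g = A @ w
        w = update(w, g, lr, state)
    return 0.5 * w @ A @ w           # final loss

def sgd(w, g, lr, state):
    return w - lr * g

def adam(w, g, lr, state, b1=0.9, b2=0.999, eps=1e-8):
    m = state.get('m', np.zeros_like(w))
    v = state.get('v', np.zeros_like(w))
    t = state['t'] = state.get('t', 0) + 1
    state['m'] = m = b1 * m + (1 - b1) * g
    state['v'] = v = b2 * v + (1 - b2) * g * g
    mhat, vhat = m / (1 - b1**t), v / (1 - b2**t)
    return w - lr * mhat / (np.sqrt(vhat) + eps)

# Sweep the same learning rates for both: SGD diverges at lr=0.1
# (lr > 2/lambda_max), while Adam at lr=0.001 barely moves, since its
# step magnitude is on the order of lr regardless of gradient scale.
for lr in (0.1, 0.01, 0.001):
    print(f"lr={lr}: SGD loss={optimize(sgd, lr):.2e}, "
          f"Adam loss={optimize(adam, lr):.2e}")
```

The same pattern motivates sweeping the learning rate independently for each optimizer rather than reusing one value, which is the practical takeaway above.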
Comparison with existing methods
Citation
@inproceedings{unmaskingthedevil2019,
title={Unmasking the Devil in the Details: What Works for Deep Facial Action Coding?},
author={Niinuma, Koichiro and Jeni, L{\'a}szl{\'o} A and Onal Ertugrul, Itir and Cohn, Jeffrey F},
booktitle={British Machine Vision Conference},
year={2019}
}