From:  Artificial intelligence strategies for emotion recognition in cancer pain research

 Deep learning applications for facial-expression-based emotion recognition.

| Approach/Model | Description | Strengths (relevance to APA) | Main limitations | Ref. |
| --- | --- | --- | --- | --- |
| CNN | Extraction of spatial features from static facial images for emotion classification or regression | Effective detection of facial muscle activation patterns (AUs); suitable for baseline pain/no-pain discrimination | Limited ability to capture temporal dynamics; reduced performance in real-world conditions (occlusions, variability) | [24] |
| Hybrid CNN | Combination of convolutional feature extraction with attention mechanisms focusing on salient facial regions | Improved discrimination of subtle expressions; better robustness to noise and inter-individual variability | Increased architectural complexity; requires large annotated datasets | [25] |
| Transformer-based models (TFE, Swin) | Attention-based architectures that model global dependencies and dynamically focus on informative regions | Strong robustness to occlusions and pose variations; improved generalization across datasets | High computational cost; data-intensive training | [28–30] |
| CNN + temporal models (TCN, LSTM, 3D CNN) | Integration of spatial feature extraction with temporal modeling of video sequences and facial dynamics | Capture of microexpressions and temporal evolution of pain-related facial patterns; critical for continuous and real-time APA | Requires temporally annotated datasets; higher computational burden | [32–34] |

These approaches differ in their ability to capture static versus dynamic emotional features; temporal models are particularly relevant for continuous pain assessment in clinical settings. CNN: convolutional neural network; LSTM: long short-term memory; TFE: transformer facial encoder; 3D CNN: three-dimensional CNN; TCN: temporal convolutional network; AUs: action units; APA: automatic pain assessment.
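
To make the static-versus-temporal distinction concrete, the following is a minimal sketch of a single causal, dilated 1D convolution, the basic building block of a TCN, applied to per-frame scores. It is an illustration under stated assumptions and not the method of any cited study: the function name `causal_dilated_conv1d`, the per-frame scores, and the kernel weights are all hypothetical, and in a real TCN the weights would be learned rather than fixed by hand.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    frame_scores: Sequence[float],
    kernel: Sequence[float],
    dilation: int = 1,
) -> List[float]:
    """Apply one causal, dilated 1D convolution over per-frame scores.

    The output at frame t depends only on frames t, t - dilation,
    t - 2 * dilation, and so on. Frames before the start of the
    sequence are treated as zeros (implicit left padding).
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    out: List[float] = []
    for t in range(len(frame_scores)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # skip positions before the first frame
                acc += weight * frame_scores[idx]
        out.append(acc)
    return out


# Hypothetical per-frame pain scores, as a static per-image classifier
# might produce them (values invented for illustration).
static_scores = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0]

# Illustrative fixed weights over the current frame and the two previous ones.
smoothed = causal_dilated_conv1d(static_scores, kernel=[0.5, 0.25, 0.25])
print(smoothed)
```

With these made-up inputs, the isolated single-frame activation at index 2 is damped, while the sustained activation over the last three frames builds up to the maximum value. This is the sense in which temporal models use context that a per-frame CNN cannot. Because the convolution is causal, each output uses only past and current frames, which is what makes this family of models usable for real-time assessment.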