From: Artificial intelligence for pain assessment via facial expression recognition (2015–2025): a systematic review

Risk of bias of included studies.

| Author/Year | Country | Intervention/AI approach | Setting/timing | Outcomes measurement | Validation of tool (Y/N) | Quality assessment (RoB 2 overall) |
|---|---|---|---|---|---|---|
| Bargshady et al. [17], 2024 | Australia/USA | Vision transformer | Acute pain datasets | Accuracy, comparison with baselines | Y | Low risk (well-reported external datasets) |
| Bargshady et al. [18], 2020 | Australia/Netherlands | Ensemble CNN + RNN | Lab datasets | Accuracy, ROC | Y | Some concerns (no external clinical validation) |
| Bellal et al. [19], 2024 | France | NEVVA® device (AI facial) | ICU pilot | Device calibration vs. experts | Y | Some concerns (small sample, feasibility only) |
| Benavent-Lledo et al. [20], 2023 | Spain | Transformer-based CV | Lab datasets | Accuracy, F1 | Y | Low risk (robust datasets, transparent methods) |
| Cascella et al. [21], 2024 | Italy | Binary AU-based classifier | Oncology outpatient | Accuracy, AUROC | Y | Some concerns (limited clinical cohort) |
| Cascella et al. [22], 2024 | Italy | Multimodal (speech + facial) | Clinical trial NCT04726228 | Classification accuracy | Y | Low risk (registered trial, multimodal) |
| Cascella et al. [23], 2023 | Italy | YOLOv8 | Lab/clinical feasibility | Detection metrics | Y | Some concerns (pilot, limited validation) |
| Casti et al. [24], 2019 | Italy | DL pain intensity system | Lab | Accuracy, calibration | Y | Low risk (strong methodological rigor) |
| Casti et al. [25], 2021 | Italy | Transfer entropy + ML | Lab | Accuracy, robustness | Y | Low risk |
| Chen et al. [26], 2022 | USA | AU combinations + MIL | Clinical + lab | Accuracy, AUC | Y | Low risk |
| Dutta and M [27], 2018 | India | Hybrid DL | Lab + simulated | Accuracy, computational metrics | Y | Some concerns (older methods, limited clinical data) |
| Ghosh et al. [28], 2025 | India/Switzerland | Multimodal (facial + audio) | Lab datasets | Accuracy (2–5 classes) | Y | Low risk |
| Guo et al. [29], 2021 | China | CNN/LSTM | Cold pressor | F1 score | Y | Some concerns (small sample) |
| Heintz et al. [30], 2025 | USA (multicenter) | CNN-based | Perioperative | AUROC, Brier score | Y | Low risk (robust clinical dataset) |
| Mao et al. [31], 2025 | China | Conv-Transformer multitask | Lab | Regression + classification | Y | Low risk |
| Mieronkoski et al. [32], 2020 | Finland | sEMG + ML | Experimental pain | c-index, features | Y | Some concerns (small sample, modest accuracy) |
| Morsali and Ghaffari [33], 2025 | Iran/UK | ErAS-Net | Lab datasets | Accuracy, cross-dataset | Y | Low risk |
| Park et al. [34], 2024 | Korea | ML (facial, ANI, vitals) | Postoperative | AUROC | Y | Low risk (clinical real-world) |
| Pikulkaew et al. [35], 2021 | Thailand | CNN | Lab | Precision, accuracy | Y | Low risk |
| Rezaei et al. [36], 2021 | Canada | DL | Long-term care | Sensitivity, specificity | Y | Low risk (validated on target population) |
| Rodriguez et al. [37], 2022 | Spain/Denmark | CNN + LSTM | Lab | AUC, accuracy | Y | Low risk |
| Semwal and Londhe [38], 2024 | India | Spatio-temporal network | Lab | Accuracy | Y | Some concerns (no external validation) |
| Tan et al. [39], 2025 | Singapore | STA-LSTM | Clinical | Accuracy, F1 | Y | Low risk |
| Yuan et al. [40], 2024 | China | AU-guided CNN | ICU, ventilated patients | Accuracy, regression | Y | Low risk |
| Zhang et al. [41], 2025 | China | VGG16 pretrained | Postoperative | AUROC, F1 | Y | Low risk |

AI: artificial intelligence; CNN: convolutional neural network; RNN: recurrent neural network; ROC: receiver operating characteristic; NEVVA: Non-Verbal Visual Analog device; ICU: intensive care unit; CV: computer vision; AU: action unit; AUROC: area under the receiver operating characteristic curve; YOLOv8: You Only Look Once version 8; ML: machine learning; MIL: multiple instance learning; AUC: area under the curve; DL: deep learning; LSTM: long short-term memory; sEMG: surface electromyography; c-index: concordance index; ErAS-Net: enhanced residual attention-based subject-specific network; ANI: analgesia nociception index; STA-LSTM: spatio-temporal attention long short-term memory; VGG16: 16-layer Visual Geometry Group network; RoB 2: Cochrane Risk of Bias tool, version 2; Y/N: yes/no.
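The distribution of overall risk-of-bias judgments across the 25 included studies can be checked with a short tally. The following is a minimal sketch, assuming the overall RoB 2 judgments are transcribed by hand from the table into a Python list; the names `JUDGMENTS` and `tally_rob` are hypothetical and introduced only for illustration, not part of the review's methods.

```python
from collections import Counter

# Hypothetical hand transcription of the overall RoB 2 judgment for each
# included study, in table row order (labels shortened for illustration).
JUDGMENTS = [
    ("Bargshady 2024 [17]", "Low risk"),
    ("Bargshady 2020 [18]", "Some concerns"),
    ("Bellal 2024 [19]", "Some concerns"),
    ("Benavent-Lledo 2023 [20]", "Low risk"),
    ("Cascella 2024 [21]", "Some concerns"),
    ("Cascella 2024 [22]", "Low risk"),
    ("Cascella 2023 [23]", "Some concerns"),
    ("Casti 2019 [24]", "Low risk"),
    ("Casti 2021 [25]", "Low risk"),
    ("Chen 2022 [26]", "Low risk"),
    ("Dutta and M 2018 [27]", "Some concerns"),
    ("Ghosh 2025 [28]", "Low risk"),
    ("Guo 2021 [29]", "Some concerns"),
    ("Heintz 2025 [30]", "Low risk"),
    ("Mao 2025 [31]", "Low risk"),
    ("Mieronkoski 2020 [32]", "Some concerns"),
    ("Morsali and Ghaffari 2025 [33]", "Low risk"),
    ("Park 2024 [34]", "Low risk"),
    ("Pikulkaew 2021 [35]", "Low risk"),
    ("Rezaei 2021 [36]", "Low risk"),
    ("Rodriguez 2022 [37]", "Low risk"),
    ("Semwal and Londhe 2024 [38]", "Some concerns"),
    ("Tan 2025 [39]", "Low risk"),
    ("Yuan 2024 [40]", "Low risk"),
    ("Zhang 2025 [41]", "Low risk"),
]


def tally_rob(judgments):
    """Return {judgment: (count, percentage of all studies)} for each category."""
    counts = Counter(judgment for _, judgment in judgments)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


summary = tally_rob(JUDGMENTS)
for label, (n, pct) in sorted(summary.items()):
    print(f"{label}: {n}/{len(JUDGMENTS)} ({pct}%)")
# Prints:
# Low risk: 17/25 (68.0%)
# Some concerns: 8/25 (32.0%)
```

Using `collections.Counter` keeps the tally independent of which judgment labels appear, so a "High risk" category would be counted automatically if it were present.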