Summary of the included studies.
| Source/Year | Dataset/n | Population | Approach/Inclusion criteria | AI method | Outcome (as reported) |
|---|---|---|---|---|---|
| Bargshady et al. [17], 2024 | Lab datasets (AI4PAIN: 51; BioVid: 87) | Adults | Acute pain datasets (video-based) | Video vision transformers (ViViTs) | Accuracy 66.9% (AI4PAIN), 79.9% (BioVid), outperforming ResNet baselines |
| Bargshady et al. [18], 2020 | UNBC-McMaster, MIntPAIN | Adults | Benchmark datasets | Ensemble DL model (CNN + RNN hybrid, EDLM) | Accuracy > 89%, AUC 0.93; robust vs. single-stream CNN |
| Bellal et al. [19], 2024 | ICU, 30 patients | Critically ill, non-communicative adults | NEVVA® pilot device calibration | AI-based computer vision integrated into the device | Feasibility shown; device calibrated against expert assessment |
| Benavent-Lledo et al. [20], 2023 | UNBC, BioVid | Adults | Public pain expression datasets | Transformer-based computer vision | Accuracy > 96% (UNBC), > 94% (BioVid); high precision, recall |
| Cascella et al. [21], 2024 | Oncology + public datasets (Delaware, UNBC) | Cancer patients, adults | Binary classifier using AUs | Neural network (17 AUs, OpenFace; see the AU-classifier sketch after the table) | Accuracy ~94%; AUROC 0.98 |
| Cascella et al. [22], 2024 | Oncology | Adult cancer patients | Video + audio (facial + speech) | Multimodal AI (speech emotion + facial expression) | Feasibility shown; early accuracy promising |
| Cascella et al. [23], 2023 | Clinical feasibility (real-time) | Adults | Real-time pain detection from facial videos | YOLOv8 object detection (see the inference sketch after the table) | Feasible; reported metrics showed good accuracy |
| Casti et al. [24], 2019 | Clinical/Lab setting | Adults | Automatic pain detection calibration | DL-based system (CNN) | Benchmarked; addressed inter-/intra-observer variability |
| Casti et al. [25], 2021 | Public dataset (video pain sequences) | Adults | Landmark time-series analysis | Transfer entropy (TE) + ML classifiers (see the TE sketch after the table) | TE-based approach improved accuracy and was robust to noise |
| Chen et al. [26], 2022 | UNBC + lung cancer dataset | Adults, including patients with lung cancer | Pain-related AUs | Weakly supervised MIL/MCIL | Accuracy 87%, AUC 0.94 (UNBC); validated also on clinical lung cancer data |
| Dutta and M [27], 2018 | UNBC + live video | Adults | Real-time video-based pain recognition | Hybrid DL model | Validated in real time; high accuracy reported |
| Ghosh et al. [28], 2025 | UNBC, BioVid + VIVAE (audio) | Adults | Multimodal (facial + audio) | Ensemble DL with CNN + fusion | Accuracy up to 99.5% (3-class), 87.4% (5-class); audio peak 98% |
| Guo et al. [29], 2021 | Cold pressor experiment; 29 subjects | Adults | Cold pain induction | CNN (Inception V3, VGG-LSTM, ConvLSTM) | F1 score 79.5% (personalized ConvLSTM) |
| Heintz et al. [30], 2025 | Perioperative, multicenter (503 patients) | Perioperative adults | Computer vision nociception detection | CNN-based | High AUROC; externally validated; feasibility demonstrated |
| Mao et al. [31], 2025 | UNBC | Adults | Pain intensity estimation | Conv-Transformer (multi-task joint optimization) | Outperformed SOTA; improved regression + classification |
| Mieronkoski et al. [32], 2020 | 31 healthy volunteers, experimental | Adults | Pain induction + sEMG | ML (supervised on muscle activation) | Modest c-index 0.64; eyebrow/lip muscles most predictive |
| Morsali and Ghaffari [33], 2025 | UNBC, BioVid | Adults | Public pain datasets | ErAS-Net (attention-based DL) | Accuracy 98.8% (binary, UNBC); 94.2% (4-class); cross-dataset BioVid 78% |
| Park et al. [34], 2024 | 155 patients post-gastrectomy | Postoperative adults | Clinical recordings | ML models (facial, ANI, vitals) | AUROC 0.93 (facial); better than ANI/vitals |
| Pikulkaew et al. [35], 2021 | UNBC dataset | Adults | Sequential facial images | CNN (DL motion detection) | Precision: 99.7% (no pain), 92.9% (becoming pain), 95.1% (pain) |
| Rezaei et al. [36], 2021 | Dementia patients, LTC setting | Older adults, dementia | Unobtrusive video dataset | Deep learning + pairwise/contrastive training | Outperformed baselines; validated on dementia cohort |
| Rodriguez et al. [37], 2022 | UNBC + CK | Adults | Raw video frames | CNN + LSTM (see the schematic sketch after the table) | Outperformed SOTA AUC on UNBC; competitive on CK |
| Semwal and Londhe [38], 2024 | Multimodal dataset | Adults | Facial + multimodal integration | Multi-stream spatio-temporal network | Robust multiparametric pain assessment demonstrated |
| Tan et al. [39], 2025 | 200 patients | Perioperative/interventional adults | Video recording (STA-LSTM) | STA-LSTM DL network | Accuracy, sensitivity (recall), and F1 ≈ 0.92; clinical feasibility shown |
| Yuan et al. [40], 2024 | ICU, public + 2 new datasets | Critically ill adults (ventilated) | Facial occlusion management | AU-guided CNN framework | Superior performance on binary, 4-class, and regression tasks |
| Zhang et al. [41], 2025 | 503 postoperative patients + volunteers | Postoperative adults | Clinical Pain Dataset (CPD; 3,411 images) + simulated-pain control dataset (CD) | Pretrained VGG16 | AUROC 0.898 (severe pain, CPD), 0.867 (CD); software prototype developed |
AI: artificial intelligence; ResNet: Residual Network; UNBC: University of Northern British Columbia (UNBC-McMaster Shoulder Pain Expression Archive); MIntPAIN: Multimodal Intensity Pain dataset; DL: deep learning; CNN: convolutional neural network; RNN: recurrent neural network; EDLM: ensemble deep learning model; AUC: area under the curve; ICU: intensive care unit; NEVVA: Non-Verbal Visual Analog device; AUs: action units; AUROC: area under the receiver operating characteristic curve; YOLOv8: You Only Look Once version 8; ML: machine learning; TE: transfer entropy; MIL: multiple instance learning; MCIL: multiple clustered instance learning; VIVAE: Variably Intense Vocalizations of Affect and Emotion dataset; VGG: Visual Geometry Group; LSTM: long short-term memory; ConvLSTM: convolutional long short-term memory; SOTA: state-of-the-art; sEMG: surface electromyography; c-index: concordance index; ErAS-Net: enhanced residual attention-based subject-specific network; ANI: analgesia nociception index; LTC: long-term care; CK: Cohn-Kanade dataset; STA-LSTM: spatio-temporal attention long short-term memory; CPD: clinical pain dataset; CD: control (simulated pain) dataset.
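To make the method names in the table concrete, four brief sketches follow; each is a minimal illustration under stated assumptions, not a reproduction of the cited authors' implementations. The first corresponds to the AU-based binary classifier of Cascella et al. [21], which feeds OpenFace-extracted action-unit intensities into a small neural network. The file name `au_features.csv` and the `pain` label column are hypothetical; the `_r` suffix follows OpenFace's naming convention for AU intensity columns.

```python
# Minimal sketch of an AU-intensity pain classifier (scikit-learn).
# Assumes a CSV exported from OpenFace with AU intensity columns
# (e.g., AU01_r ... AU45_r) plus a hypothetical binary "pain" label.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("au_features.csv")                    # hypothetical export
au_cols = [c for c in df.columns if c.endswith("_r")]  # OpenFace AU intensities
X, y = df[au_cols].to_numpy(), df["pain"].to_numpy()   # "pain" label is assumed

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```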
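The second sketch matches the real-time YOLOv8 row (Cascella et al. [23]). The inference loop below uses the standard ultralytics API; the checkpoint `pain_yolov8.pt` is a hypothetical model fine-tuned on pain expressions, not a published artifact.

```python
# Sketch of real-time YOLOv8 inference on a webcam stream (ultralytics API).
# "pain_yolov8.pt" is a hypothetical checkpoint fine-tuned for pain expressions.
from ultralytics import YOLO

model = YOLO("pain_yolov8.pt")
for result in model.predict(source=0, stream=True):  # source=0 -> default webcam
    for box in result.boxes:
        label = result.names[int(box.cls)]           # e.g., "pain" / "no_pain"
        print(label, float(box.conf))
```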
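The third sketch illustrates the transfer-entropy features of Casti et al. [25], which quantify directed information flow between facial-landmark time series. A plug-in histogram estimator conveys the idea; the bin count and the synthetic landmark traces are illustrative choices, not the authors' settings.

```python
# Plug-in (histogram) estimate of transfer entropy TE(X -> Y) in bits:
# TE = sum p(y_t+1, y_t, x_t) * log2[ p(y_t+1 | y_t, x_t) / p(y_t+1 | y_t) ]
from collections import Counter
import numpy as np

def transfer_entropy(x, y, bins=8):
    xd = np.digitize(x, np.linspace(x.min(), x.max(), bins))
    yd = np.digitize(y, np.linspace(y.min(), y.max(), bins))
    n = len(xd) - 1
    c_xyz = Counter(zip(yd[1:], yd[:-1], xd[:-1]))  # (y_next, y_now, x_now)
    c_yx = Counter(zip(yd[:-1], xd[:-1]))           # (y_now, x_now)
    c_yy = Counter(zip(yd[1:], yd[:-1]))            # (y_next, y_now)
    c_y = Counter(yd[:-1].tolist())                 # (y_now,)
    te = 0.0
    for (y1, y0, x0), k in c_xyz.items():
        te += (k / n) * np.log2((k / c_yx[(y0, x0)]) / (c_yy[(y1, y0)] / c_y[y0]))
    return te

# Illustrative use: one landmark trace driving another with a one-frame lag.
rng = np.random.default_rng(0)
brow = rng.normal(size=500)                            # hypothetical eyebrow trace
mouth = np.roll(brow, 1) + 0.5 * rng.normal(size=500)  # lags the brow trace
print(transfer_entropy(brow, mouth))                   # positive: brow "drives" mouth
```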
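The last sketch shows the CNN + LSTM pattern shared by Rodriguez et al. [37] and the hybrid models of [18] and [27]: a convolutional backbone extracts per-frame features, and a recurrent layer models their temporal evolution. The PyTorch code below uses an off-the-shelf ResNet-18 as a stand-in backbone; the layer sizes are illustrative, not the published configurations.

```python
# Schematic CNN + LSTM video-pain classifier in PyTorch.
# Per-frame ResNet-18 features -> LSTM over time -> clip-level logits.
import torch
import torch.nn as nn
from torchvision import models

class CnnLstmPain(nn.Module):
    def __init__(self, hidden=256, num_classes=2):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):                  # clips: (B, T, 3, H, W)
        b, t, c, h, w = clips.shape
        f = self.features(clips.view(b * t, c, h, w)).flatten(1)  # (B*T, 512)
        out, _ = self.lstm(f.view(b, t, -1))   # (B, T, hidden)
        return self.head(out[:, -1])           # logits at the last time step

logits = CnnLstmPain()(torch.randn(2, 16, 3, 224, 224))  # 2 clips x 16 frames
print(logits.shape)                                      # torch.Size([2, 2])
```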