Photography-based diagnostic models
| Author, year | Task; classes (n) | Feature extractor/extracted features | Classifier | Accuracy | Specificity (TNR) | Sensitivity (recall) | Precision (PPV) | AUC | F1-score or Jaccard index |
|---|---|---|---|---|---|---|---|---|---|
| Camalan et al. [1], 2021 | Classification; suspicious (54) and normal (54) ROIs in photographic images | - | Inception ResNet-v2 | 86.5% | - | - | - | - | - |
| | | - | ResNet-101 | 79.3% | - | - | - | - | - |
| Figueroa et al. [2], 2022 | Classification; suspicious (i.e., OSCC and OPMD) (~2,800) and normal (~2,800) photographic images | - | GAIN network | 84.84% | 89.3% | 76.6% | - | - | - |
| Flügge et al. [3], 2023 | Classification; OSCC (703) and normal (703) photographic images | - | Swin-transformer DL network | 0.98 | 0.98 | 0.98 | - | - | 0.98 |
| Jubair et al. [4], 2022 | Classification; suspicious [i.e., OSCC and OPMD] (236) and benign (480) photographic images | - | EfficientNetB0 | 85% | 84.5% | - | - | 0.92 | - |
| Jurczyszyn et al. [5], 2020 | Classification; OSCC (35) and normal (35) photographic images (one normal and one leukoplakia image from the same patient) | MaZda software/textural features: run-length matrix (two), co-occurrence matrix (two), Haar wavelet transform (two) | Probabilistic neural network | - | 97% | 100% | - | - | - |
| Lim et al. [6], 2021 | Classification; no referral (493), refer for cancer/high-risk (636), refer for low-risk (685), and refer for other reasons (641) | - | ResNet-101 | - | - | 61.70% | 61.96% | - | 61.68% |
| Shamim et al. [7], 2019 | Classification; benign and precancerous (200) photographic images | - | VGG19 | 98% | 97% | 89% | - | - | - |
| | | - | AlexNet | 93% | 94% | 88% | - | - | - |
| | | - | GoogLeNet | 93% | 88% | 80% | - | - | - |
| | | - | ResNet50 | 90% | 96% | 84% | - | - | - |
| | | - | Inceptionv3 | 93% | 88% | 83% | - | - | - |
| | | - | SqueezeNet | 93% | 96% | 85% | - | - | - |
| | Classification; types of tongue lesions (300) photographic images | - | VGG19 | 97% | - | - | - | - | - |
| | | - | AlexNet | 83% | - | - | - | - | - |
| | | - | GoogLeNet | 88% | - | - | - | - | - |
| | | - | ResNet50 | 97% | - | - | - | - | - |
| | | - | Inceptionv3 | 92% | - | - | - | - | - |
| | | - | SqueezeNet | 90% | - | - | - | - | - |
| Sharma et al. [8], 2022 | Classification; OSCC (121), OPMD (102) and normal (106) photographic images | - | VGG19 | 76% | - | OSCC: 0.43 | OSCC: 0.76 | OSCC: 0.92 | OSCC: 0.45 |
| | | | | | - | Normal: 1 | Normal: 0.9 | Normal: 0.99 | Normal: 0.95 |
| | | | | | - | OPMD: 0.78 | OPMD: 0.7 | OPMD: 0.88 | OPMD: 0.74 |
| | | | VGG16 | 72% | - | - | - | OSCC: 0.94 | - |
| | | | | | - | - | - | Normal: 0.96 | - |
| | | | | | - | - | - | OPMD: 0.92 | - |
| | | | MobileNet | 72% | - | - | - | OSCC: 0.88 | - |
| | | | | | - | - | - | Normal: 0.99 | - |
| | | | | | - | - | - | OPMD: 0.80 | - |
| | | | InceptionV3 | 68% | - | - | - | OSCC: 0.88 | - |
| | | | | | - | - | - | Normal: 0.1 | - |
| | | | | | - | - | - | OPMD: 0.88 | - |
| | | | ResNet50 | 36% | - | - | - | OSCC: 0.43 | - |
| | | | | | - | - | - | Normal: 0.33 | - |
| | | | | | - | - | - | OPMD: 0.42 | - |
| Song et al. [9], 2021 | Classification; malignant (911), premalignant (1,100), benign (243) and normal (2,417) polarized white light photographic images | - | VGG19 | 80% | - | 79% | 83% | - | 81% |
| Song et al. [10], 2023 | Classification; suspicious (1,062), normal (978) photographic images | - | SE-ABN | 87.7% | 88.6% | 86.8% | 87.5% | - | - |
| | | - | SE-ABN + manually edited attention maps | 90.3% | 90.8% | 89.8% | 89.9% | - | - |
| Tanriver et al. [11], 2021 | Segmentation, object detection and classification; carcinoma (162), OPMD (248) and benign (274) photographic images | - | EfficientNet-b4 | - | - | 85.5% | 86.9% | - | 85.8% |
| | | - | Inception-v4 | - | - | 85.5% | 87.7% | - | 85.8% |
| | | - | DenseNet-161 | - | - | 84.1% | 87.9% | - | 84.4% |
| | | - | ResNet-152 | - | - | 81.2% | 82.6% | - | 81.1% |
| | | - | Ensemble | - | - | 84.1% | 84.9% | - | 84.3% |
| Thomas et al. [12], 2013 | Classification; 192 sections of photographic images from 16 patients | GLCM, GLRL, and intensity-based first-order features (eleven selected features) | Backpropagation-based ANN | 97.92% | - | - | - | - | - |
| Warin et al. [13], 2021 | Object detection and classification; OPMD (350) and normal (350) photographic images | - | DenseNet-121 | - | 100% | 98.75% | 99% | 0.99 | 99% |
| Warin et al. [14], 2022 | Object detection and classification; OPMD (315) and OSCC (365) photographic images | - | DenseNet-169 | - | OSCC: 99% | OSCC: 99% | OSCC: 98% | OSCC: 1 | OSCC: 98% |
| | | | | - | OPMD: 97% | OPMD: 95% | OPMD: 95% | OPMD: 0.98 | OPMD: 95% |
| | | | ResNet-101 | - | OSCC: 94% | OSCC: 92% | OSCC: 96% | OSCC: 0.99 | OSCC: 94% |
| | | | | - | OPMD: 94% | OPMD: 97% | OPMD: 97% | OPMD: 0.97 | OPMD: 97% |
| Warin et al. [15], 2022 | Object detection and classification; OPMD (300) and normal (300) photographic images | - | DenseNet-121 | - | 90% | 100% | 91% | 0.95 | 95% |
| | | - | ResNet-50 | - | 91.67% | 98.39% | 92% | 0.95 | 95% |
| Welikala et al. [16], 2020 | Object detection and classification; referral (1,054) and non-referral (379) photographic images | - | ResNet-101 | - | - | 93.88% | 67.15% | - | 78.30% |
| Xue et al. [17], 2022 | Classification; ruler (440) and non-ruler (2,377) photographic images; first batch (2,817 images/250 patients), second batch (4,331 images/168 patients) | - | ResNeSt | 99.6% | 99.6% | 100% | 97.9% | 99.6% | 98.9% |
| | | - | ViT | 99.8% | 99.8% | 100% | 0.98 | 99.8% | 99.5% |
ANN: artificial neural network; AUC: area under the curve; DL: deep learning; GAIN: guided attention inference network; GLCM: gray-level co-occurrence matrix; GLRL: gray-level run-length matrix; OPMD: oral potentially malignant disorders; OSCC: oral squamous cell carcinoma; PPV: positive predictive value; ROI: region of interest; TNR: true negative rate
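As a quick reference for the metric columns above, the sketch below computes each measure from binary confusion-matrix counts; the function name and the example counts are illustrative, not drawn from any of the cited studies.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the tabulated metrics from binary confusion-matrix counts
    (tp/fp/tn/fn: true/false positives and negatives)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)      # recall, true positive rate
    specificity = tn / (tn + fp)      # true negative rate (TNR)
    precision = tp / (tp + fp)        # positive predictive value (PPV)
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    jaccard = tp / (tp + fp + fn)     # set-overlap counterpart of F1
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f1": f1, "jaccard": jaccard}

# Hypothetical example: 90 TP, 5 FP, 95 TN, 10 FN
m = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print({k: round(v, 3) for k, v in m.items()})
```

Note that AUC is the only column that cannot be derived from a single confusion matrix: it summarizes sensitivity/specificity trade-offs over all decision thresholds, which is why several studies report a high AUC alongside more modest point metrics.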