From: A deep learning framework for classifying autism spectrum disorder from children’s facial images using a multi-scale ViT architecture and edge computing

Comparative performance of the proposed method against existing facial-analysis ASD detection approaches.

| Ref. | Method | Dataset | Accuracy (%) | Advantages | Limitations |
| --- | --- | --- | --- | --- | --- |
| Pan and Foroughi [22] | AlexNet (edge-oriented) | Facial images (edge-deployment concept) | - | Introduces an edge-computing pipeline for school environments | No reproducible benchmark metrics reported; high-level concept paper. |
| Ahmad et al. [38] | ResNet50 (transfer learning) | Kaggle Autism Image Data | 92.0 | Systematic comparison of multiple CNN backbones; clear training protocol | Single public dataset; on-device/latency not addressed. |
| Attar and Paygude [26] | MobileNetV2 + RGSO | Facial images (not explicitly named) | 98.0 | Lightweight backbone with meta-heuristic optimization; strong reported scores | Conference paper; dataset naming/splits not fully specified for reproducibility. |
| Shahzad et al. [24] | ResNet101 + EfficientNetB3 with hybrid attention | Facial images (dataset details in article body) | 96.50 | Attention-based fusion improves feature saliency | More computationally intensive; edge deployment not discussed. |
| Proposed | MS-ViT + Edge + Augmentation | Autistic Children Facial Dataset | 96.85 | High accuracy with edge-device efficiency; multi-scale features; robust augmentation | Requires additional pre-processing. |

ASD: autism spectrum disorder; CNN: convolutional neural network; MS-ViT: multi-scale vision transformer.