IARC 60th Anniversary - 19-21 May 2026
Session: 19/05/26 - Posters
CerviTrans-I: An Interpretable Patch-Aware Transformer Architecture for Cervical Cancer Classification
SHARMA N. 1, REDDY BOLLU T. 1, ANAND A. 2
1 Indian Institute of Technology, Roorkee, Roorkee, India; 2 International Institute for Population Sciences, Roorkee, India
Background: Cervical cancer persists as a major global public health challenge. It is one of the leading causes of cancer-related mortality among women, particularly in low- and middle-income countries. Early and accurate identification of precancerous and cancerous cervical cell abnormalities is essential for effective screening, timely intervention, and improved survival outcomes. Recent advances in artificial intelligence have enabled automated diagnosis systems for cervical cancer. Transformer architectures, originally developed for natural language processing, have shown enhanced capabilities in modeling global contextual dependencies and complex visual patterns compared with traditional deep learning methods such as convolutional neural networks. However, the limited interpretability of transformer models restricts their clinical adoption.

Method: In this work, we propose CerviTrans-I, a novel interpretable patch-aware transformer architecture for cervical cancer classification, fine-tuned from a pre-trained hindered transformer model. The proposed framework splits each image into patches and processes them through transformer blocks, where the main classification token aggregates information across all patches, while the feed-forward layers update only the individual patch features. By restricting the information path in this way, the final prediction decomposes directly into contributions from each patch, enabling interpretable cervical cancer classification. Basic image processing techniques are applied to improve feature extraction and overall performance. The proposed architecture is validated on two publicly available cervical cell image datasets, Herlev and SIPaKMeD. The Herlev dataset contains 917 Pap smear images distributed across seven cell classes; the SIPaKMeD dataset consists of 4,049 single-cell images in five classes.
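The restricted information path described above can be sketched as a single attention step in which only the classification (CLS) token attends to the patch tokens; this is an illustrative minimal sketch, not the authors' code, and all names and shapes are assumptions. Because patch tokens never exchange information with one another, the CLS update is an exact sum of per-patch terms, which is what makes the final prediction decomposable patch by patch.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a vector of attention scores.
    e = np.exp(x - x.max())
    return e / e.sum()

def cls_attention(cls_tok, patches, Wq, Wk, Wv):
    """One CLS-only attention step: returns the updated CLS token and the
    per-patch contribution terms whose sum it equals (illustrative sketch)."""
    d = cls_tok.shape[0]
    q = cls_tok @ Wq                        # query from the CLS token only
    k = patches @ Wk                        # one key per patch
    v = patches @ Wv                        # one value per patch
    attn = softmax(q @ k.T / np.sqrt(d))    # one attention weight per patch
    contributions = attn[:, None] * v       # per-patch contributions to CLS
    return contributions.sum(axis=0), contributions
```

In a full model this block would be stacked, with per-patch feed-forward layers updating each patch feature independently between attention steps.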
To enhance performance, the two datasets are merged and their labels consolidated into a binary classification task. For model development and evaluation, the combined dataset was split into training (70%), validation (15%), and testing (15%) subsets, and input images were resized to 224 × 224 × 3.

Result: Using five-fold cross-validation, CerviTrans-I achieved an average accuracy of 99.02%, precision of 99%, and F1-score of 98.96%, demonstrating robust performance in cervical cancer classification. The experimental results show that the proposed method outperformed state-of-the-art deep learning models by approximately 3%. The attention maps highlight the regions of the input images, such as abnormal tissue areas and cell structures, that contribute most to the model's predictions for each class. These highlighted regions aligned closely with expert annotations, indicating that the model's predictions are both interpretable and trustworthy.

Conclusion: The proposed framework leverages attention-constrained, patch-aware modeling to achieve both high predictive accuracy and transparent decision-making, providing a clinically viable solution for automated cervical cancer diagnosis.
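The 70/15/15 split of the merged dataset can be sketched as follows; the function name, seed, and the assumption of a simple random (non-stratified) split are illustrative, not taken from the paper.

```python
import numpy as np

def split_70_15_15(n_samples, seed=42):
    """Shuffle sample indices and partition them into disjoint
    train (70%), validation (15%), and test (15%) index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.70 * n_samples)
    n_val = int(0.15 * n_samples)
    return (idx[:n_train],                       # training indices
            idx[n_train:n_train + n_val],        # validation indices
            idx[n_train + n_val:])               # test indices

# The merged Herlev + SIPaKMeD pool would hold 917 + 4049 = 4966 images.
train_idx, val_idx, test_idx = split_70_15_15(4966)
```

Index-based splitting keeps the three subsets disjoint by construction, so no image can leak from training into evaluation.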

The Proposed Framework, CerviTrans-I: input images are processed through patch embedding and an interpretable multi-head patch-aware attention mechanism, and the resulting per-patch attention is visualized as an attention map.
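The attention-map visualization in the figure can be sketched as reshaping the per-patch attention weights onto the patch grid; the 14 × 14 grid assumes 224 × 224 inputs with 16 × 16 patches, which is an assumption for illustration rather than a detail stated in the abstract.

```python
import numpy as np

def attention_heatmap(attn, grid=(14, 14)):
    """Reshape a flat vector of per-patch attention weights into a
    coarse heat map over the assumed patch grid, normalized to [0, 1]."""
    m = np.asarray(attn, dtype=float).reshape(grid)
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)  # min-max normalize
    return m
```

The resulting map would then be upsampled and overlaid on the input image so that high-weight patches (e.g., abnormal cell regions) appear highlighted.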