
Tongue Tumor Staging Using Artificial Intelligence



This study addresses misdiagnosis of the developmental stage of tongue tumors. Such errors can lead to inappropriate treatment and keep patients from receiving the care they need. The research aims to build an automatic recognition system that uses artificial intelligence and pathological tissue-section images to distinguish the stages of tongue tumor development: malignant, benign, and leukoplakia.

The proposed approach improves the Swin Transformer framework, a deep learning model. With a patch-based method, it accurately separates lesion from non-lesion areas in tissue-section images. The patch outputs are then reassembled by a self-assembly method into a whole-image result, with lesion areas marked on a heat map. Building on this, the study presents a user-friendly system for automatically recognizing the stages of tongue tumor development.
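The article does not describe the self-assembly step in detail. The sketch below assumes that each patch carries its (row, col) position in the tiling grid and that the model outputs one lesion probability per patch. Under those assumptions, reassembly means writing each score back into a 2-D grid that can then be rendered as a heat map. The function name `assemble_heatmap` and the `fill` value for skipped background tiles are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

def assemble_heatmap(
    patch_scores: Dict[Tuple[int, int], float],
    n_rows: int,
    n_cols: int,
    fill: float = 0.0,  # hypothetical score for tiles that were never scored
) -> List[List[float]]:
    """Place per-patch lesion scores back at their (row, col) grid positions."""
    # Build each row separately so the rows do not alias one another.
    heatmap = [[fill] * n_cols for _ in range(n_rows)]
    for (r, c), score in patch_scores.items():
        if not (0 <= r < n_rows and 0 <= c < n_cols):
            raise ValueError(f"patch index {(r, c)} outside {n_rows}x{n_cols} grid")
        heatmap[r][c] = score
    return heatmap

# Three scored patches on a 2x2 grid; (1, 0) was skipped (e.g. background).
hm = assemble_heatmap({(0, 0): 0.1, (0, 1): 0.9, (1, 1): 0.7}, 2, 2)
print(hm)  # [[0.1, 0.9], [0.0, 0.7]]
```

Each cell of `hm` can then be mapped to a colour, with higher lesion probability drawn in warmer colours.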

The model achieves a recognition accuracy of 98.45%, and its prediction accuracy for every category exceeds that of specialist doctors with 13 years of experience. The improved Swin Transformer framework therefore enables precise, automated identification of the different stages of tongue tumor development. This could improve diagnostic accuracy in clinical settings and support more effective, tailored treatment.


Oral cancer, particularly tongue cancer, is a prevalent global health concern with a 5-year survival rate below 60%. Tongue cancer is characterized by high malignancy, local recurrence, and neck metastasis, so patients often need radical surgery to survive. After surgery, an accurate assessment of the tumor's developmental stage (benign, malignant, or leukoplakia) is crucial for planning further treatment. Conventionally, histopathologists manually magnify and inspect digital whole slide images (DWSI) of hematoxylin and eosin (H&E) stained tumor tissue. However, the shortage of pathologists, the subjectivity of the analysis, and the risk of misjudgment all point to the need for an intelligent computer-aided diagnosis system.

Convolutional neural networks (CNNs) have shown promise in medical image classification. Their limited ability to capture global information, however, prompted exploration of the Vision Transformer (ViT). ViT excels at image classification but falls short in tasks that require dense prediction. The Swin Transformer (Swin-T) addresses these limitations by building hierarchy, locality, and translation invariance into its network structure.

Because H&E-stained DWSIs have very large pixel dimensions, a patch-based method processes each image in smaller sections, which reduces the demand on computer memory. The Swin-T framework is adjusted to predict the developmental stage of tongue cancer, reaching a Top1 accuracy of 98.45%. The resulting model is packaged into an automatic detection system that proved stable and reliable in clinical and human–machine tests. This approach could make tongue cancer diagnosis more accurate and efficient, helping to bridge the gap between clinical demand and available resources.
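A patch-based method keeps memory bounded because only one tile has to be loaded at a time. The minimal sketch below enumerates the top-left coordinates of non-overlapping square patches over a slide of a given size. It assumes a 224-pixel patch, a common Swin-T input size, because the article does not state the study's actual patch size. The name `patch_grid` is hypothetical, and edge tiles smaller than a full patch are simply skipped.

```python
from typing import Iterator, Tuple

# Assumption: 224 px is a common Swin-T input size, not a value from the study.
DEFAULT_PATCH = 224

def patch_grid(width: int, height: int, patch: int = DEFAULT_PATCH) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row, col, x, y) for every full, non-overlapping patch of a slide.

    (x, y) is the top-left pixel of the patch. Partial tiles at the right and
    bottom edges are skipped; a real pipeline might pad them instead.
    """
    for row, y in enumerate(range(0, height - patch + 1, patch)):
        for col, x in enumerate(range(0, width - patch + 1, patch)):
            yield row, col, x, y

# Only the coordinates are generated; pixels would be read one patch at a time.
tiles = list(patch_grid(1000, 500))
print(len(tiles), tiles[-1])  # 8 (1, 3, 672, 224)
```

Because `patch_grid` is a generator, a caller can read, classify, and discard each tile in turn, so the whole slide is never held in memory at once.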


This study develops an automatic recognition system for the stages of tongue tumor development based on artificial intelligence and pathological tissue-section images. H&E-stained DWSIs of postoperative tongue tumor tissue are obtained and cut into smaller images for analysis. Lesion and non-lesion areas are then identified with a patch-based method, which reduces potential misjudgments. The dataset comprises 389 patients diagnosed with tongue conditions including malignancy, benign lesions, and leukoplakia.

The Swin Transformer framework is used for automatic lesion detection. Its structure has four stages built from blocks that combine multi-head self-attention modules with a two-layer multi-layer perceptron (MLP). An additional stage module is introduced, producing the variant Swin-T_5S, to keep the feature maps from shrinking too rapidly. The dataset is randomly split 8:2 into training and validation sets for training and validating the model.
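The 8:2 random split can be sketched as a seeded shuffle followed by a cut at 80%. In this minimal sketch, the function name `split_train_val` and the fixed seed are assumptions added for reproducibility. The article states only the ratio.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_train_val(items: Sequence[T], train_frac: float = 0.8, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle `items` reproducibly and split them train_frac : (1 - train_frac)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split can be reproduced
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]

train, val = split_train_val(range(389))  # e.g. one entry per patient ID
print(len(train), len(val))  # 311 78
```

If the items passed in are patient IDs rather than individual patches, all patches from one patient end up on the same side of the split. The article does not say at which level its split was made.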

Images are collected by stratified sampling at two magnifications, ×20 and ×40, to ensure data feasibility. Evaluation metrics include accuracy, the confusion matrix, precision, recall, F1-score, and specificity. The model is optimized with a cosine annealing learning-rate schedule.
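Cosine annealing lowers the learning rate from a peak value to a floor along half a cosine wave. The minimal sketch below implements the standard closed form without warm restarts, which is the same formula PyTorch's `CosineAnnealingLR` scheduler follows. The epoch count and learning-rate values in the example are hypothetical, because the article does not report them.

```python
import math

def cosine_annealing_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Learning rate at `step` under cosine annealing without warm restarts.

    lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * step / total_steps)),
    which starts at lr_max, decays smoothly, and reaches lr_min at total_steps.
    """
    if total_steps <= 0 or not 0 <= step <= total_steps:
        raise ValueError("need total_steps > 0 and 0 <= step <= total_steps")
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

# Hypothetical settings: 100 epochs, peak LR 1e-4, floor 1e-6.
schedule = [cosine_annealing_lr(e, 100, 1e-4, 1e-6) for e in range(101)]
```

Compared with step decay, the schedule changes slowly at the start and end of training and fastest in the middle.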

Results demonstrate the model’s high recognition accuracy of 98.45%, surpassing specialist doctors with 13 years of experience. The evaluation metrics provide a comprehensive assessment of the model’s performance in predicting tongue tumor development stages. The study addresses challenges in pathology diagnosis, emphasizing the potential of AI-based systems in improving accuracy and efficiency in clinical settings.


The study compares Swin-T_5S with several classical CNN models. Swin-T_5S performs best, with a Top1 accuracy of 98.45%, ahead of VGG16, ResNet50, DenseNet121, MobileNetV3, InceptionV4, and InceptionResNetV2.

In the validation phase, Swin-T_5S distinguishes lesion from non-lesion areas with a validation accuracy above 99.9%. This shows that the model reliably separates the key features in tongue tumor images.

The lesion area was then further subdivided into the three categories. Accuracy was 98.45% for images captured at ×20 magnification and 93.13% for images at ×40. Precision, recall, and F1-score also consistently favor the ×20 images, which indicates that this magnification better represents the stratification of tongue tumor stages.
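All of these metrics can be derived from one confusion matrix by treating each class in turn as "positive" and the other two as "negative". In the minimal sketch below, the function name `classification_metrics` and the class order (malignant, benign, leukoplakia) are assumptions. The counts are made up for illustration and are not the study's results.

```python
from typing import Dict, List, Tuple

def classification_metrics(cm: List[List[int]]) -> Tuple[float, List[Dict[str, float]]]:
    """Overall accuracy plus one-vs-rest precision, recall, F1 and specificity.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total  # Top1 accuracy
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true k, predicted another class
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly another class
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        per_class.append({"precision": precision, "recall": recall,
                          "f1": f1, "specificity": specificity})
    return accuracy, per_class

# Made-up counts; rows and columns assumed ordered malignant, benign, leukoplakia.
cm = [[8, 1, 1],
      [0, 9, 1],
      [1, 0, 9]]
acc, per_class = classification_metrics(cm)
```

For the malignant row in this toy matrix, precision is 8/9, recall is 0.8, and specificity is 0.95.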

The study examines model interpretation through visualization. Grad-CAM shows which areas of the input images the model focuses on, which reveals the patterns it has learned and its ability to pick out relevant features, particularly single-cell plaques in the tongue tumor microenvironment.
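Grad-CAM weights each feature map by the spatial average of the class score's gradient with respect to that map. It then sums the weighted maps and keeps only the positive part. The pure-Python sketch below computes the map from activations and gradients that are assumed to be precomputed, for example by a deep-learning framework's autograd. For a transformer such as Swin-T, the token features would first have to be reshaped into a spatial grid. The function name `grad_cam` and the toy inputs are illustrative.

```python
from typing import List

Grid = List[List[float]]

def grad_cam(activations: List[Grid], gradients: List[Grid]) -> Grid:
    """Grad-CAM map from K feature maps and their gradients for one class score.

    activations[k][i][j]: value of feature map k at spatial position (i, j).
    gradients[k][i][j]:   d(class score) / d(activations[k][i][j]).
    """
    k_maps = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # 1) Channel weights: global-average-pool each gradient map.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # 2) Weighted sum of feature maps; ReLU keeps only positive evidence.
    cam = [[max(0.0, sum(weights[k] * activations[k][i][j] for k in range(k_maps)))
            for j in range(w)] for i in range(h)]
    # 3) Normalise to [0, 1] so the map can be drawn as a heat map.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam

# Toy example: map 0 supports the class (positive gradient), map 1 opposes it.
acts = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 1.0]]
```

The ReLU step is what makes the heat map highlight only regions that increase the chosen class's score.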

Dimensionality reduction with t-distributed stochastic neighbor embedding (TSNE) further confirms that Swin-T_5S clearly separates the three categories in both 1536-dimensional and 3-dimensional spaces. The visualization helps explain the model's classification decisions and makes misclassified samples easy to identify.

Building on Swin-T_5S, the study introduces an automatic identification system for clinical auxiliary diagnosis. The system applies the patch-based method and generates a heat map for the entire sample. Its output is a 3-dimensional vector, and the category with the highest predicted probability becomes the final diagnosis. The system also includes an “Others” category that flags samples with ambiguous predictions. This alerts clinicians to cases that need manual review and helps prevent misdiagnosis.
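The article does not say how patch predictions are combined or what makes a prediction "ambiguous". The minimal sketch below assumes one plausible rule and is not the study's method. It averages the per-patch 3-class probability vectors, takes the most probable class, and returns "Others" when that class's mean probability falls below a threshold. The class order, the averaging rule, the function name `diagnose`, and the 0.6 threshold are all hypothetical.

```python
from typing import Sequence

# Assumed output order of the 3-dimensional probability vector.
CLASSES = ("malignant", "benign", "leukoplakia")

def diagnose(patch_probs: Sequence[Sequence[float]], min_confidence: float = 0.6) -> str:
    """Turn per-patch class probabilities into one sample-level label.

    Hypothetical rule: average the patch vectors, pick the arg-max class, and
    return "Others" if its mean probability is below min_confidence.
    """
    if not patch_probs:
        return "Others"  # nothing to judge, so defer to a human reader
    n = len(patch_probs)
    mean = [sum(p[c] for p in patch_probs) / n for c in range(len(CLASSES))]
    best = max(range(len(CLASSES)), key=lambda c: mean[c])
    return CLASSES[best] if mean[best] >= min_confidence else "Others"

print(diagnose([[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]))  # malignant
print(diagnose([[0.5, 0.4, 0.1], [0.3, 0.5, 0.2]]))    # Others
```

Raising `min_confidence` sends more samples to manual review. Lowering it lets the system commit to more diagnoses on its own.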

Clinical tests further confirm the system's stability. Confusion matrices and ROC curves show that it is more reliable than specialist doctors and consistently outperforms doctors with varying levels of experience. The advantage is most notable in the challenging leukoplakia category, where clinical expertise plays a crucial role.
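An ROC curve for a single category, such as leukoplakia versus the rest, is often summarized by its area (AUC). The AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The minimal sketch below computes this pairwise form directly. It assumes the score is the model's predicted probability for the category of interest. The function name `roc_auc` and the sample data are illustrative.

```python
from typing import Sequence

def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """One-vs-rest ROC AUC via the pairwise (Mann-Whitney) formulation.

    labels[i] is 1 if sample i belongs to the category of interest, else 0;
    scores[i] is the model's predicted probability for that category.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts half
    return wins / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

The double loop is quadratic in the number of samples, which is fine for illustration. A sorting-based rank computation gives the same value faster on large test sets.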

In summary, the study underscores the Swin-T_5S model’s exceptional accuracy, interpretability, and potential applicability in a real-world clinical setting. The automatic identification system based on this model demonstrates promising results, positioning it as a valuable tool for assisting healthcare professionals in diagnosing and stratifying tongue tumor stages.


This study adapts the Swin-T model to build an AI system that automatically predicts the developmental stage of tongue tumors. The patch-based method overcomes computational memory limits, and the model reaches a Top1 accuracy of 98.45% on images captured at ×20 magnification. Precision, recall, and F1-score all exceed 0.978, indicating robust performance in identifying the three types of tongue tumors: malignant, benign, and leukoplakia.

In testing, the system proved stable and performed very well in a man-machine comparison. The study does acknowledge limitations. No data from other tumor types were available, so the model's applicability to those tumors could not be verified, and further validation is needed. The model is adaptable, however. If the sample data meet suitable conditions, it can be adjusted with the enhancements presented in this work, such as the added stage module and the depth design.

The study underscores the pivotal role of pathological examination, particularly when combined with digital whole slide imaging (DWSI), in modern tumor diagnosis. Compared with clinical physical examination and magnetic resonance imaging, pathological examination provides crucial information for classifying tumors as benign or malignant, especially when enhanced by DWSI. Treatment strategies for tongue tumors differ substantially by developmental stage. Malignant tumors typically require a combination of therapies, including extensive resection, neck lymph node dissection, radiotherapy, and chemotherapy. These treatments carry side effects ranging from bone marrow suppression to severe systemic complications, which is why accurate stage differentiation matters.

Manual observation of pathological slide images depends on professional knowledge and experience and carries a risk of misjudgment, particularly when the differences between malignant and benign features are subtle. An AI-aided diagnostic system for tongue tumor stages is therefore valuable. Accurate differentiation supports treatment planning that improves patient survival and minimizes the side effects of aggressive therapy.

For localizing lesions in tongue tumors, the Class Activation Map (CAM) method shows that the model covers the relevant regions comprehensively. CAM highlights the regions whose features correlate most strongly with the prediction, which offers insight into the tongue tumor microenvironment. Darker red indicates greater importance, particularly for the malignant type. The visualization helps reveal the structure of the lesion area, although experts acknowledge that such assessments remain subjective.

In this investigation, an AI model based on the Swin-T framework was developed to automatically classify the multi-stage development of tongue tumors. Working from H&E-stained digital whole slide images (DWSI), it achieved a Top1 accuracy of 98.45% on images captured at ×20 magnification. Visual analyses with the Grad-CAM and TSNE methods were used to make the model's predictions more interpretable.

The research also produced a user-friendly system that integrates the model for automatic categorization of tongue tumors. The system could serve as an auxiliary tool for clinicians by giving automated assessments of the developmental stage of a tongue tumor. Its high accuracy, especially at ×20 magnification, suggests it can provide reliable and efficient support in diagnosing and classifying tongue tumors.
