World’s First AI Model Achieves Over 90% Accuracy in Thyroid Cancer Diagnosis

A cross-disciplinary team of researchers has developed the world’s first AI model designed to classify both the cancer stage and the risk category of thyroid cancer, achieving accuracy of more than 90 percent.

The model is expected to cut the time frontline clinicians spend preparing for consultations by about 50%. The findings are published in the journal npj Digital Medicine. The team includes researchers from the LKS Faculty of Medicine at the University of Hong Kong (HKUMed), the InnoHK Laboratory of Data Discovery for Health (InnoHK D24H), and the London School of Hygiene & Tropical Medicine (LSHTM).

Thyroid cancer is one of the most common cancers in Hong Kong and worldwide. Managing it precisely often relies on two systems: the American Joint Committee on Cancer (AJCC) Tumor-Node-Metastasis (TNM) staging system, 8th edition, which determines the cancer stage, and the American Thyroid Association (ATA) risk stratification system, which assigns the cancer a risk category.

These systems are vital for predicting patient survival and guiding treatment decisions. However, manually integrating complex clinical information into them is time-consuming and inefficient.

The research team developed an AI assistant built on large language models (LLMs), the kind of model behind ChatGPT and DeepSeek, which are designed to understand and process human language. The assistant analyzes clinical documents to make thyroid cancer staging and risk classification more accurate and efficient.

The model uses four existing open-source LLMs (Mistral from Mistral AI, Llama from Meta, Gemma from Google, and Qwen from Alibaba) to analyze free-text clinical documents. It was trained on a U.S.-based open-access dataset containing pathology reports from 50 thyroid cancer patients in The Cancer Genome Atlas (TCGA) Program. It was then validated against pathology reports from a further 289 TCGA patients and 35 hypothetical cases created by endocrine specialists.

By combining the outputs of all four LLMs, the researchers improved the overall performance of the AI model, which achieved accuracy of 88.5% to 100% in ATA risk classification and 92.9% to 98.1% in AJCC cancer staging. Compared with traditional manual document review, the model is expected to cut the time clinicians spend on pre-consultation preparation by about half.
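The article does not say how the four models' outputs were combined. One simple possibility, sketched here purely as an illustration and not as the authors' method, is majority voting over the per-model labels, followed by a plain accuracy calculation. The function names, the tie-breaking rule, and the example labels below are all hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label.

    Ties go to the label that appears first in the input list
    (an arbitrary rule chosen for this sketch).
    """
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label


def accuracy(predicted, reference):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one case is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical outputs of the four models for a single pathology report.
model_outputs = {
    "mistral": "Stage I",
    "llama": "Stage I",
    "gemma": "Stage II",
    "qwen": "Stage I",
}

consensus = majority_vote(list(model_outputs.values()))
print(consensus)  # Stage I
```

With a rule like this, one model's misreading of a report is outvoted when the other three agree, which is one plausible reason a combined result could outperform any single model.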

Professor Joseph T Wu, Sir Kotewall Professor in Public Health and Managing Director of InnoHK D24H at HKUMed, highlighted the model’s performance. “The model achieves more than 90% accuracy in classifying both AJCC cancer stages and ATA risk categories,” he said. He also pointed to an important benefit: “This model can run offline, allowing local deployment without the need to share or upload sensitive patient information, and so providing maximum patient privacy.”

“Given the recent launch of DeepSeek, we carried out further comparative tests using a ‘zero-shot approach’ against the latest versions of DeepSeek, R1 and V3, as well as GPT-4o. We were pleased to find that our model performed on par with them,” said Professor Wu.

Dr. Fung Man-him, clinical associate professor and head of endocrine surgery in the Department of Surgery, School of Clinical Medicine, HKUMed, said, “Our AI model not only extracts and analyzes complex information from pathology reports, surgical records, and clinical notes with high accuracy, but also cuts doctors’ preparation time by almost half compared with manual interpretation. It can also provide cancer staging and clinical risk stratification at the same time, based on two internationally recognized clinical systems.”

“The AI model is versatile and could be readily integrated into various settings in the public and private sectors, and both local and international health care and research institutes,” said Dr. Fung. “We are optimistic that the real-world implementation of this AI model could enhance the efficiency of frontline clinicians and improve the quality of care. In addition, doctors will have more time to counsel their patients.”

“In line with the government’s strong advocacy of AI adoption in health care, as exemplified by the recent launch of an LLM-based medical report writing system in the Hospital Authority, our next step is to evaluate the performance of this AI assistant with a large amount of real-world patient data.

“Once validated, the AI model can be readily deployed in real clinical settings and hospitals to help clinicians improve operational and treatment efficiency,” said Dr. Carlos Wong, Honorary Associate Professor in the Department of Family Medicine and Primary Care, School of Clinical Medicine, HKUMed.


More information: M. H. Fung et al., Developing a named entity framework for thyroid cancer staging and risk level classification using large language models, npj Digital Medicine (2025). DOI: 10.1038/s41746-025-01528-y

Provided by The University of Hong Kong


This story was originally published on Medical Xpress.
