Deep Learning-Based Gleason Classification of Prostate Cancer using Phikon-v2*
Session Number
1
Advisor(s)
Dr. Alex Rodriguez, Dr. Mitchell Conery, Ravi Madduri, Argonne National Laboratory
Location
A1113
Discipline
Computer Science
Start Date
15-4-2026 10:15 AM
End Date
15-4-2026 11:00 AM
Abstract
Prostate cancer (PrCa) is the second leading cause of cancer-related death in American men. Although methods for assessing PrCa aggressiveness exist, they are subject to significant inter-observer variability. One example is Gleason grading, which scores PrCa severity based on glandular morphology. Scoring variability, which occurs even among experienced pathologists, poses a challenge when developing treatments. Artificial intelligence (AI) offers a promising way to reduce the subjectivity of Gleason scoring, as past research has shown that AI can improve cancer diagnosis. This project therefore develops a multilayer perceptron (MLP) that predicts prostate cancer severity. The MLP, created on Polaris, was trained on 4,712 TCGA histology slides using an 80/20 train-test split with the following distribution: Gleason scores 6 (n=105), 7 (n=513), 8 (n=98), 9 (n=256), 10 (n=6). To leverage domain-specific visual patterns, the foundation model Phikon-v2 was used as a frozen feature extractor. The MLP achieved 93% overall accuracy with a macro-averaged F1 score of 0.94. However, analysis of the confusion matrix indicates that the model's identification of 6/6 Gleason score 10 images rests on only six examples, a consequence of class imbalance. These results demonstrate that, given sufficient data diversity, pathology-specific foundation models can offer a standardized tool to assist in clinical decision-making.
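The abstract reports both overall accuracy and a macro-averaged F1 score. As a minimal sketch of how these two metrics can be derived from a confusion matrix, the Python below computes them for a small three-class example. The function names and the toy matrix are hypothetical illustrations and are not the study's code or data; the matrix is only chosen so that one class is very small, as Gleason score 10 is here.

```python
from typing import List, Tuple


def per_class_f1(confusion: List[List[int]]) -> List[float]:
    """Return the F1 score of each class.

    `confusion` is a square matrix with rows = true class and
    columns = predicted class.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]                              # true positives
        fn = sum(confusion[k]) - tp                       # missed members of class k
        fp = sum(confusion[r][k] for r in range(n)) - tp  # wrongly assigned to class k
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def accuracy_and_macro_f1(confusion: List[List[int]]) -> Tuple[float, float]:
    """Return (overall accuracy, macro-averaged F1)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    f1s = per_class_f1(confusion)
    # Macro averaging weights every class equally, regardless of its size
    return correct / total, sum(f1s) / len(f1s)


# Hypothetical toy matrix (NOT the study's results): two large classes
# and one class with only four samples.
toy = [
    [90, 10, 0],
    [5, 95, 0],
    [0, 2, 2],
]

acc, macro_f1 = accuracy_and_macro_f1(toy)
print(f"accuracy = {acc:.3f}, macro F1 = {macro_f1:.3f}")
```

Because macro averaging gives the smallest class the same weight as the largest, a handful of images in a rare class can move the macro F1 noticeably in either direction, which is why per-class counts matter when reading such a score.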