Automated Classification of Acute Myeloid Leukemia via Random Forest Analysis of Cytomorphological Images
Session Number
1
Advisor(s)
Dr. Mitchell Convery, Dr. Alex Rodriguez, Mr. Ravi Maddur; Argonne National Laboratory, Lemont, Illinois, United States
Location
A113
Discipline
Computer Science
Start Date
15-4-2026 10:15 AM
End Date
15-4-2026 11:00 AM
Abstract
Acute myeloid leukemia (AML) is an aggressive cancer for which early diagnosis is critical to patient outcomes. Manual examination of bone marrow and blood smears remains the standard method of diagnosis. In this study, a random forest model was developed to classify single-cell images as malignant or non-malignant using the AML-Cytomorphology dataset from the Munich Leukemia Laboratory (MLL) at Helmholtz Zentrum München. Structural features were extracted from the single-cell images and refined using Principal Component Analysis (PCA) before classification. To address class imbalance in the dataset, where cancer images outnumbered control images by approximately 3:1, the majority class was under-sampled to produce a balanced training set of 20,305 images. A random forest classifier trained on this set achieved 78% balanced accuracy. These results suggest that feature extraction combined with balanced sampling provides a foundation for automated screening of AML from cytomorphological images. Future work could incorporate patient metadata to further improve classification performance.
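The two imbalance-handling ideas in the abstract, under-sampling the majority class and scoring with balanced accuracy, can be illustrated with a minimal standard-library sketch. This is not the study's code: the function names `undersample` and `balanced_accuracy`, the fixed seed, and the toy 300-versus-100 label list are assumptions chosen only to mirror the stated 3:1 ratio. Feature extraction, PCA, and the random forest itself are omitted.

```python
import random
from collections import Counter


def undersample(labels, seed=0):
    """Return sorted indices keeping an equal number of samples per class.

    Every class is randomly reduced to the size of the smallest class.
    (Hypothetical helper, not taken from the study.)
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_min))
    return sorted(kept)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    correct = Counter()
    total = Counter()
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy labels with a 3:1 imbalance: 1 = malignant, 0 = non-malignant.
labels = [1] * 300 + [0] * 100

# Under-sample so both classes contribute the same number of images.
kept = undersample(labels)
counts = Counter(labels[i] for i in kept)

# A degenerate classifier that always predicts the majority class.
always_malignant = [1] * len(labels)
plain_acc = sum(t == p for t, p in zip(labels, always_malignant)) / len(labels)
bal_acc = balanced_accuracy(labels, always_malignant)
```

On the toy labels, the always-malignant predictor scores well on plain accuracy but no better than chance on balanced accuracy, which is why balanced accuracy is the more informative metric for an imbalanced dataset.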