Doctoral Dissertations

Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)


Duke University


Department of Biomedical Engineering

First Advisor

Joseph Y. Lo, Ph.D.


health and environmental sciences, applied sciences, artificial intelligence, breast cancer, computer-aided diagnosis, machine learning

Subject Categories

Artificial Intelligence and Robotics | Biomedical Engineering and Bioengineering | Computer Sciences | Oncology | Radiology


The purpose of this study was to improve breast cancer diagnosis by reducing the number of benign biopsies performed. To this end, we investigated modular and ensemble systems of machine learning methods for computer-aided diagnosis (CAD) of breast cancer. A modular system partitions the input space into smaller domains, each of which is handled by a local model. An ensemble system uses multiple models for the same cases and combines the models' predictions.

Five supervised machine learning techniques (LDA, SVM, BP-ANN, CBR, CART) were trained to predict the biopsy outcome from mammographic findings (BIRADS™) and patient age based on a database of 2258 cases mixed from multiple institutions. The generalization of the models was tested on second set of 2177 cases. Clusters were identified in the database using a priori knowledge and unsupervised learning methods (agglomerative hierarchical clustering followed by K-Means, SOM, AutoClass). The performance of the global models over the clusters was examined and local models were trained for clusters.

While some local models were superior to some global models, we were unable to build a modular CAD system that was better than the global BP-ANN model. The ensemble systems based on simplistic combination schemes did not result in significant improvements and more complicated combination schemes were found to be unduly optimistic. One of the most striking results of this dissertation was that CAD systems trained on a mixture of lesion types performed much better on masses than on calcifications. Our study of the institutional effects suggests that models built on cases mixed between institutions may overcome some of the weaknesses of models built on cases from a single institution. It was suggestive that each of the unsupervised methods identified a cluster of younger women with well-circumscribed or obscured, oval-shaped masses that accounted for the majority of the BP-ANN’s recommendations for follow up. From the cluster analysis and the CART models, we determined a simple diagnostic rule that performed comparably to the global BP-ANN. Approximately 98% sensitivity could be maintained while providing approximately 26% specificity. This should be compared to the clinical status quo of 100% sensitivity and 0% specificity on this database of indeterminate cases already referred to biopsy.



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.