Distributed Classification by Divide and Conquer Approach
Session Number
CMPS 19
Advisor(s)
Qiang Wu, Middle Tennessee State University
Discipline
Computer Science
Start Date
17-4-2024 11:05 AM
End Date
17-4-2024 11:20 AM
Abstract
In this paper, we investigate the efficacy of the divide and conquer approach for implementing distributed logistic regression and distributed support vector machine (SVM) algorithms for classification of large-scale datasets. This approach is designed to handle datasets that exceed the capacity of a single processor, necessitating the partitioning of data into multiple subsets. Logistic regression or SVM is then applied to each subset, yielding individual local classifiers. Subsequently, a global classifier is derived by aggregating these local classifiers to make the final decision. We propose three strategies for the aggregation stage: voting based on predicted labels, averaging of real-valued predictions, and averaging of posterior probabilities. Our analysis reveals that for distributed logistic regression, probability averaging is the most robust approach and is therefore recommended. Conversely, for distributed SVM, probability averaging requires additional modeling while having minimal impact on performance; functional averaging is therefore recommended instead.
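The divide-and-conquer pipeline described in the abstract can be sketched end to end. The following is a minimal illustration, not the authors' implementation: it uses a small synthetic two-class dataset, a plain gradient-descent logistic regression as the local learner, and all sizes, learning rates, and the number of partitions are arbitrary choices for demonstration. It shows the three aggregation strategies named above: voting on predicted labels, averaging of real-valued predictions (functional averaging), and averaging of posterior probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary data: two Gaussian blobs (a stand-in for a large-scale
# dataset; all sizes here are illustrative).
n, d = 2000, 5
X = np.vstack([rng.normal(-1.0, 1.0, size=(n // 2, d)),
               rng.normal(+1.0, 1.0, size=(n // 2, d))])
y = np.array([0] * (n // 2) + [1] * (n // 2))
perm = rng.permutation(n)
X, y = X[perm], y[perm]
X_train, y_train, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression (bias folded into w)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        g = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * g
    return w

def decision(w, X):
    """Real-valued score; sigmoid of this is the posterior probability."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w

# Divide: partition the training data into disjoint subsets.
# Conquer: fit one local classifier per subset.
n_parts = 5
local_ws = [fit_logreg(X_train[idx], y_train[idx])
            for idx in np.array_split(np.arange(len(X_train)), n_parts)]

# Local real-valued predictions on the test set, one row per local model.
scores = np.array([decision(w, X_test) for w in local_ws])

# Aggregate: the three strategies from the abstract.
y_vote = (np.mean(scores > 0, axis=0) > 0.5).astype(int)       # label voting
y_func = (np.mean(scores, axis=0) > 0).astype(int)             # functional averaging
y_prob = (np.mean(sigmoid(scores), axis=0) > 0.5).astype(int)  # probability averaging

for name, pred in [("voting", y_vote), ("functional", y_func),
                   ("probability", y_prob)]:
    print(f"{name}: accuracy = {np.mean(pred == y_test):.3f}")
```

On well-separated data like this all three aggregations perform similarly; the abstract's distinction matters for logistic regression versus SVM, where an SVM's decision values are not probabilities and probability averaging would require an extra calibration step.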