Detecting Expert Users in Stack Exchange Using Machine Learning

Presenter(s)
Session Number
CMPS(ai) 16
Advisor(s)
Dr. Courtland VanDam, Mr. Rohan Leekha, Dr. Timothy Reid, Mr. Nour Jedidi, MIT Lincoln Laboratory
Discipline
Computer Science
Start Date
17-4-2025 10:30 AM
End Date
17-4-2025 10:45 AM
Abstract
Online question-answering platforms, such as StackExchange, have grown rapidly in recent years, making it necessary to assess the credibility of users and the information they share in order to maintain trust within these communities. This issue can be addressed through accurate expert detection methods that determine whether users are experts in a given field. For our study, we analyzed a dataset of posts and comments written by over 10,000 StackExchange users to identify which classification techniques can most accurately distinguish between the written contributions of experts and non-experts. After comparing 12 different methods, we found that transformer-based embeddings, ensemble learning, and Naive Bayes models achieved the highest F1-scores. We then conducted an ablation study of these three approaches and found that Gemini embeddings helped maintain a high detection rate even as the class imbalance was made more skewed to reflect real-world conditions. Although expert detection remains a challenging task, our study provides promising results for accurately identifying the expertise of StackExchange users. Future work could incorporate metadata (e.g., users' voting behavior or whose posts they comment on) alongside written contributions, or use transfer learning to test our models' performance on other online platforms, such as Reddit or Quora.
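The comparison and imbalance ablation described above can be illustrated with a minimal sketch. The snippet below is not the study's actual code: it assumes a hypothetical labeled dataset of user texts (1 = expert, 0 = non-expert) and shows one of the simpler baselines named in the abstract, a Naive Bayes classifier over TF-IDF features, evaluated by F1-score while the expert class is progressively downsampled to simulate a more realistic class imbalance. An embedding-based variant would swap the TF-IDF step for precomputed transformer embeddings (e.g., from a Gemini embedding model) feeding a downstream classifier.

# Illustrative sketch only, not the authors' pipeline. Assumes a generic
# labeled dataset of user texts (1 = expert, 0 = non-expert).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import numpy as np

def evaluate_f1(texts, labels, expert_fraction, seed=0):
    """Downsample experts so they make up roughly `expert_fraction` of the data,
    train a TF-IDF + Multinomial Naive Bayes classifier, and return its F1-score."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    texts = np.asarray(texts, dtype=object)

    expert_idx = np.flatnonzero(labels == 1)
    other_idx = np.flatnonzero(labels == 0)
    # Keep only enough experts to reach the requested (more realistic) ratio.
    n_keep = int(expert_fraction * len(other_idx) / (1 - expert_fraction))
    keep = rng.choice(expert_idx, size=min(n_keep, len(expert_idx)), replace=False)
    idx = np.concatenate([keep, other_idx])

    X_train, X_test, y_train, y_test = train_test_split(
        texts[idx], labels[idx], test_size=0.2,
        stratify=labels[idx], random_state=seed,
    )
    model = make_pipeline(TfidfVectorizer(min_df=2), MultinomialNB())
    model.fit(X_train, y_train)
    return f1_score(y_test, model.predict(X_test))

# Hypothetical usage: sweep increasingly skewed expert/non-expert ratios.
# for frac in (0.5, 0.25, 0.1):
#     print(frac, evaluate_f1(all_texts, all_labels, expert_fraction=frac))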