Using NLP (Natural Language Processing) and Models Like TF-IDF (Term Frequency – Inverse Document Frequency), GloVe (Global Vectors for Word Representation), OpenAI’s GPT, and Sentence-BERT (Bidirectional Encoder Representations from Transformers) to Sort Through and Organize the Search Queries to Prevent Question Repeats in StackOverflow

Session Number

CMPS 41

Advisor(s)

Dr. Phadmakar Patankar, Illinois Mathematics and Science Academy

Discipline

Computer Science

Start Date

17-4-2024 10:45 AM

End Date

17-4-2024 11:00 AM

Abstract

This research presents an overview of search query management on StackOverflow, a popular programming platform where users ask and answer questions about their code. Using Natural Language Processing (NLP) techniques and models including TF-IDF (Term Frequency – Inverse Document Frequency), GloVe (Global Vectors for Word Representation), OpenAI’s GPT, and Sentence-BERT (Bidirectional Encoder Representations from Transformers), the research aims to sort and organize search queries so that the same question is not asked again in different words. The TF-IDF method builds a document-term matrix that quantifies how important each term is to each question, while GloVe converts words into vector representations that capture relationships between them. OpenAI’s GPT model generates contextually coherent responses, and Sentence-BERT compares the semantic similarity of questions to detect duplicates. By integrating these methods, the research aims to improve search query management, supporting more efficient information retrieval and a better user experience on StackOverflow. Evaluation on real-world datasets highlights the effectiveness of the proposed method in reducing duplicate questions and streamlining query resolution. The work contributes to better search features in online technical forums and offers practical guidance for improving user interaction and knowledge sharing in programming communities.
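
The abstract does not describe the implementation, so the following is only a minimal sketch of the TF-IDF step under stated assumptions: each question is turned into a TF-IDF vector, and a new query is compared against existing questions by cosine similarity. The function names, the smoothed IDF formula, the tokenizer, and the sample questions are all hypothetical and are not taken from the study. A Sentence-BERT variant would presumably keep the same comparison but swap the TF-IDF vectors for sentence embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, questions):
    """Return (index, score) of the existing question closest to the query."""
    if not questions:
        raise ValueError("questions must not be empty")
    vectors = tfidf_vectors(questions + [query])
    query_vec = vectors[-1]
    scores = [cosine(query_vec, v) for v in vectors[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical sample data, for illustration only.
questions = [
    "How do I reverse a list in Python?",
    "What is a null pointer exception in Java?",
    "How to center a div with CSS?",
]
index, score = most_similar("Python reverse a list", questions)
print(index, round(score, 3))
```

In a setup like this, a query whose best score exceeds some chosen threshold could be flagged as a likely duplicate and the matching question shown to the user before posting. The threshold would need to be tuned on labeled data; the abstract does not say how the study made that decision.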

