Title: Topic classification of tweets in the broadcasting domain using machine learning methods
Authors: Sefara, Tshephisho J; Rangata, Mapitsi R
Type: Conference Presentation
Venue: 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 3-4 August 2023
Date accessioned: 2023-09-22
Date available: 2023-09-22
Date issued: 2023-08
ISBN: 979-8-3503-1480-9
DOI: 10.1109/icABCD59051.2023.10220553
URI: http://hdl.handle.net/10204/13087
Language: en
Availability: Fulltext
Keywords: Topic modelling; Machine learning; Natural Language Processing; Twitter; Topic classification

Abstract:
Twitter is one of the microblogging sites with millions of daily users. Broadcast companies use Twitter to share short messages to engage or share opinions about a particular topic or product. With a large number of conversations available on Twitter, it is difficult to identify the category of topics in the broadcasting domain. This paper proposes the use of unsupervised learning to generate topics from unlabelled tweet data sets in the broadcasting domain using the latent Dirichlet allocation (LDA) method. Approximately six groups of topics were generated and each group was assigned a label or category. These labels were used to label the data by finding the dominating label in each tweet as the main category. Supervised learning was conducted to train six machine learning models which are multinomial logistic regression, XGBoost, decision trees, random forest, support vector machines, and multilayer perceptron (MLP). The models were able to learn from the data to predict the category of each tweet from the testing data. The models were evaluated using accuracy and the F1 score. Linear support vector machine and MLP obtained better classification results compared to other trained models.

Citation:
Sefara, T.J. & Rangata, M.R. 2023. Topic classification of tweets in the broadcasting domain using machine learning methods. http://hdl.handle.net/10204/13087

Other citation formats:
Sefara, T. J., & Rangata, M. R. (2023). Topic classification of tweets in the broadcasting domain using machine learning methods. http://hdl.handle.net/10204/13087
Sefara, Tshephisho J, and Mapitsi R Rangata. "Topic classification of tweets in the broadcasting domain using machine learning methods." 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 3-4 August 2023 (2023): http://hdl.handle.net/10204/13087
Sefara TJ, Rangata MR. Topic classification of tweets in the broadcasting domain using machine learning methods; 2023. http://hdl.handle.net/10204/13087

RIS record (abstract as above):
TY - Conference Presentation
AU - Sefara, Tshephisho J
AU - Rangata, Mapitsi R
DA - 2023-08
DB - ResearchSpace
DP - CSIR
J1 - 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 3-4 August 2023
KW - Topic modelling
KW - Machine learning
KW - Natural Language Processing
KW - Twitter
KW - Topic classification
LK - https://researchspace.csir.co.za
PY - 2023
SM - 979-8-3503-1480-9
T1 - Topic classification of tweets in the broadcasting domain using machine learning methods
TI - Topic classification of tweets in the broadcasting domain using machine learning methods
UR - http://hdl.handle.net/10204/13087
ER -
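
Illustrative note: two steps named in the abstract, labelling each tweet by its dominating topic and scoring predictions with accuracy and the F1 score, can be sketched in plain Python. This is a minimal sketch, not the authors' implementation. The label names, the per-tweet topic weights, and the choice of macro-averaged F1 are all assumptions, since this record does not list the six categories or say how F1 was averaged.

```python
# Hypothetical category names; the record does not list the paper's six labels.
LABELS = ["topic_0", "topic_1", "topic_2", "topic_3", "topic_4", "topic_5"]


def dominant_label(topic_weights, labels=LABELS):
    """Return the label of the topic with the largest weight for one tweet.

    topic_weights is assumed to be the per-topic weights a topic model
    (for example LDA) produced for the tweet. Ties go to the first topic.
    """
    if len(topic_weights) != len(labels):
        raise ValueError("one weight per topic is required")
    best = max(range(len(labels)), key=lambda i: topic_weights[i])
    return labels[best]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, `dominant_label([0.1, 0.05, 0.6, 0.1, 0.1, 0.05])` selects the third hypothetical label because its weight is the largest. In a full pipeline these labels would become the training targets for the classifiers listed in the abstract.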