Title: Topic modelling of short texts in the health domain using LDA and bard
Authors: Mbooi, Mahlatse S; Rangata, Mapitsi R; Sefara, Tshephisho J
Dates: 2024-07-30; 2024-07-30; 2024-03
Type: Conference Presentation
Language: en
ISBN: 979-8-3503-1491-5; 979-8-3503-1492-2
DOI: 10.1109/ICTAS59620.2024.10507116; 10.1109/ICTAS59620.2024
URI: http://hdl.handle.net/10204/13735
Keywords: Topic modelling; Latent dirichlet allocation; Natural Language Processing; NLP

Abstract:
This paper proposes a model for the topic modelling of tweets in the health and mental health domain using the Latent Dirichlet Allocation (LDA) method. The data were obtained from the sentiment140 project. The data were prepared for topic modelling by performing Natural Language Processing (NLP) tasks such as stemming and data cleaning. LDA method was trained on the data to create a cluster of topics. We explored 1 to 6 clusters and, after thorough analysis, three topics were chosen to create the LDA model. Each topic was labelled with a label name that is generated using Bard and coding analysis. This method can be used to label unlabelled data without using sophisticated supervised machine learning methods. Labelled data can be used to improve data management, information retrieval, supervised machine learning, and other techniques.

Citations:
Mbooi, M.S., Rangata, M.R. & Sefara, T.J. 2024. Topic modelling of short texts in the health domain using LDA and bard. http://hdl.handle.net/10204/13735
Mbooi, M. S., Rangata, M. R., & Sefara, T. J. (2024). Topic modelling of short texts in the health domain using LDA and bard. http://hdl.handle.net/10204/13735
Mbooi, Mahlatse S, Mapitsi R Rangata, and Tshephisho J Sefara. "Topic modelling of short texts in the health domain using LDA and bard." 2024 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 7-8 March 2024 (2024): http://hdl.handle.net/10204/13735
Mbooi MS, Rangata MR, Sefara TJ, Topic modelling of short texts in the health domain using LDA and bard; 2024. http://hdl.handle.net/10204/13735

RIS record:
TY - Conference Presentation
AU - Mbooi, Mahlatse S
AU - Rangata, Mapitsi R
AU - Sefara, Tshephisho J
AB - This paper proposes a model for the topic modelling of tweets in the health and mental health domain using the Latent Dirichlet Allocation (LDA) method. The data were obtained from the sentiment140 project. The data were prepared for topic modelling by performing Natural Language Processing (NLP) tasks such as stemming and data cleaning. LDA method was trained on the data to create a cluster of topics. We explored 1 to 6 clusters and, after thorough analysis, three topics were chosen to create the LDA model. Each topic was labelled with a label name that is generated using Bard and coding analysis. This method can be used to label unlabelled data without using sophisticated supervised machine learning methods. Labelled data can be used to improve data management, information retrieval, supervised machine learning, and other techniques.
DA - 2024-03
DB - ResearchSpace
DP - CSIR
J1 - 2024 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 7-8 March 2024
KW - Topic modelling
KW - Latent dirichlet allocation
KW - Natural Language Processing
KW - NLP
LK - https://researchspace.csir.co.za
PY - 2024
SM - 979-8-3503-1491-5
SM - 979-8-3503-1492-2
T1 - Topic modelling of short texts in the health domain using LDA and bard
TI - Topic modelling of short texts in the health domain using LDA and bard
UR - http://hdl.handle.net/10204/13735
ER -
27926