Most general-purpose sentiment analysers degrade in quality when tested on tweets in the broadcast domain, which covers both radio and television broadcasting. This paper proposes domain-specific data for the broadcast domain and the use of machine learning methods for sentiment analysis of tweets in this domain. Data were collected from Twitter using the Twitter application programming interfaces (APIs). During preprocessing, most special characters and emoticons were retained, because sentiment analysis relies on opinions and emotions that are often expressed through emoticons and other characters. The data were automatically labelled using a pre-trained sentiment analyser to enable supervised learning. Two supervised machine learning methods, XGBoost and multinomial logistic regression (MLR), were trained and evaluated on the data. The performance of the models was affected by two factors: the limited amount of data, and the use of a general sentiment analyser to label data from a specific domain.
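The preprocessing and automatic labelling steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the cleanup rules (dropping URLs and user mentions), the function names, and the small cue lexicon that stands in for the pre-trained sentiment analyser are all assumptions made for illustration. The only detail taken from the abstract is that emoticons and most special characters are kept.

```python
import re

# Hypothetical cleanup rules: the abstract does not say what was removed,
# only that most special characters and emoticons were kept.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")

# Toy stand-in for a pre-trained general-purpose sentiment analyser.
POSITIVE = {"love", "great", "brilliant", ":)", ":-)"}
NEGATIVE = {"boring", "awful", "hate", ":(", ":-("}


def preprocess(tweet: str) -> str:
    """Drop URLs and user mentions; keep emoticons and punctuation."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip().lower()


def auto_label(text: str) -> str:
    """Assign a weak label by counting cue tokens (illustrative only)."""
    tokens = text.split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def build_dataset(tweets):
    """Return (cleaned_text, weak_label) pairs for supervised training."""
    cleaned = [preprocess(t) for t in tweets]
    return [(text, auto_label(text)) for text in cleaned]


if __name__ == "__main__":
    sample = [
        "Love the new breakfast show on @RadioX :) https://example.com/clip",
        "Tonight's bulletin was boring :(",
        "The match is on channel 4 at 8pm",
    ]
    for text, label in build_dataset(sample):
        print(label, "|", text)
```

In a pipeline of this shape, the resulting pairs would be vectorised and passed to a supervised classifier such as XGBoost or MLR. Because the labels come from a general analyser, any domain-specific errors it makes are carried into the training data, which is one of the two limiting factors the abstract reports.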
Reference:
Sefara, T.J. & Rangata, M.R. 2023. Domain-specific sentiment analysis of tweets using machine learning methods. Communications in Computer and Information Science, 1935. http://hdl.handle.net/10204/13518
Sefara, T. J., & Rangata, M. R. (2023). Domain-specific sentiment analysis of tweets using machine learning methods. Communications in Computer and Information Science, 1935, http://hdl.handle.net/10204/13518
Sefara, Tshephisho Joseph, and Mapitsi R. Rangata. "Domain-specific sentiment analysis of tweets using machine learning methods." Communications in Computer and Information Science, 1935 (2023). http://hdl.handle.net/10204/13518
Sefara TJ, Rangata MR. Domain-specific sentiment analysis of tweets using machine learning methods. Communications in Computer and Information Science, 1935. 2023; http://hdl.handle.net/10204/13518.