ResearchSpace

Classification of exaggerated news headlines


dc.contributor.author Rangata, Mapitsi R
dc.contributor.author Sefara, Tshephisho J
dc.date.accessioned 2024-03-19T07:44:10Z
dc.date.available 2024-03-19T07:44:10Z
dc.date.issued 2024-02
dc.identifier.citation Rangata, M.R. & Sefara, T.J. 2024. Classification of exaggerated news headlines. Communications in Computer and Information Science, 2030. http://hdl.handle.net/10204/13643 en_ZA
dc.identifier.isbn 978-3-031-53730-1
dc.identifier.issn 1865-0929
dc.identifier.issn 1865-0937
dc.identifier.uri https://doi.org/10.1007/978-3-031-53731-8_20
dc.identifier.uri http://hdl.handle.net/10204/13643
dc.description.abstract The amount of data online is increasing as companies generate news articles daily. These news articles contain headlines that have a level of exaggeration aimed at winning readers. In addition, these companies are competing against one another; hence, creating appealing and exaggerated news headlines is one of the options to win readers. Some of the exaggerated headlines contain some level of misleading information. Hence, this paper aims to apply machine learning methods and natural language processing to detect and identify exaggerated news headlines in the South African context. Machine learning models such as logistic regression, decision trees, support vector machines (SVM), and XGBoost are trained on labelled news headlines as a binary classification task. The models produced good results, with XGBoost and SVM obtaining 70% in terms of accuracy. Furthermore, the F measure was used to evaluate the models, and decision trees obtained 56% followed by SVM with 53%. The classification of exaggerated news headlines is a difficult task. Therefore, we oversampled the data to obtain balanced labels. The performance of the models increased: SVM obtained 84%, followed by logistic regression, XGBoost, and decision trees with accuracy of 78%, 72% and 71%, respectively. en_US
dc.format Fulltext en_US
dc.language.iso en en_US
dc.relation.uri https://link.springer.com/chapter/10.1007/978-3-031-53731-8_20 en_US
dc.source Communications in Computer and Information Science, 2030 en_US
dc.subject Online data increase en_US
dc.subject News headlines en_US
dc.subject Machine learning en_US
dc.subject Natural language en_US
dc.subject Exaggerated news en_US
dc.title Classification of exaggerated news headlines en_US
dc.type Article en_US
dc.description.pages 248–260 en_US
dc.description.note © 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG. This is the preprint version of the published item. en_US
dc.description.cluster Next Generation Enterprises & Institutions en_US
dc.description.impactarea Data Science en_US
dc.identifier.apacitation Rangata, M. R., & Sefara, T. J. (2024). Classification of exaggerated news headlines. Communications in Computer and Information Science, 2030, http://hdl.handle.net/10204/13643 en_ZA
dc.identifier.chicagocitation Rangata, Mapitsi R, and Tshephisho J Sefara. "Classification of exaggerated news headlines." Communications in Computer and Information Science, 2030 (2024). http://hdl.handle.net/10204/13643 en_ZA
dc.identifier.vancouvercitation Rangata MR, Sefara TJ. Classification of exaggerated news headlines. Communications in Computer and Information Science, 2030. 2024; http://hdl.handle.net/10204/13643. en_ZA
dc.identifier.ris TY - Article AU - Rangata, Mapitsi R AU - Sefara, Tshephisho J AB - The amount of data online is increasing as companies generate news articles daily. These news articles contain headlines that have a level of exaggeration aimed at winning readers. In addition, these companies are competing against one another; hence, creating appealing and exaggerated news headlines is one of the options to win readers. Some of the exaggerated headlines contain some level of misleading information. Hence, this paper aims to apply machine learning methods and natural language processing to detect and identify exaggerated news headlines in the South African context. Machine learning models such as logistic regression, decision trees, support vector machines (SVM), and XGBoost are trained on labelled news headlines as a binary classification task. The models produced good results, with XGBoost and SVM obtaining 70% in terms of accuracy. Furthermore, the F measure was used to evaluate the models, and decision trees obtained 56% followed by SVM with 53%. The classification of exaggerated news headlines is a difficult task. Therefore, we oversampled the data to obtain balanced labels. The performance of the models increased: SVM obtained 84%, followed by logistic regression, XGBoost, and decision trees with accuracy of 78%, 72% and 71%, respectively. DA - 2024-02 DB - ResearchSpace DP - CSIR J1 - Communications in Computer and Information Science, 2030 KW - Online data increase KW - News headlines KW - Machine learning KW - Natural language KW - Exaggerated news LK - https://researchspace.csir.co.za PY - 2024 SM - 978-3-031-53730-1 SM - 1865-0929 SM - 1865-0937 T1 - Classification of exaggerated news headlines TI - Classification of exaggerated news headlines UR - http://hdl.handle.net/10204/13643 ER - en_ZA
dc.identifier.worklist 27686 en_US
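
The abstract above reports both accuracy and the F measure, and says the data were oversampled to balance the labels. The sketch below illustrates those two ideas using only the Python standard library. It is not the authors' code: the abstract does not say which oversampling method was used, so random oversampling with replacement is an assumption here, and the headlines, the 7-to-3 class split, and the function names are hypothetical placeholders.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class items until every class
    has as many items as the largest class (assumed method, see lead-in)."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical placeholder data: 1 = exaggerated, 0 = not exaggerated.
headlines = [f"placeholder headline {i}" for i in range(10)]
labels = [0] * 7 + [1] * 3

# A baseline that always predicts the majority class looks fine on
# accuracy but finds no exaggerated headlines at all.
baseline = [0] * len(labels)
baseline_accuracy = accuracy(labels, baseline)
baseline_f = f_measure(labels, baseline)

# Balance the classes by duplicating minority-class items.
balanced_headlines, balanced_labels = random_oversample(headlines, labels)
balanced_counts = Counter(balanced_labels)

print(baseline_accuracy, baseline_f, dict(balanced_counts))
```

On imbalanced labels, accuracy alone can hide poor detection of the minority class, which is one reason to report the F measure alongside it. As a general precaution (not a claim about the paper's procedure), oversampling is usually applied to the training split only, so that duplicated items do not also appear in the evaluation data.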

