dc.contributor.author |
Rangata, Mapitsi R
|
|
dc.contributor.author |
Sefara, Tshephisho J
|
|
dc.date.accessioned |
2024-03-19T07:44:10Z |
|
dc.date.available |
2024-03-19T07:44:10Z |
|
dc.date.issued |
2024-02 |
|
dc.identifier.citation |
Rangata, M.R. & Sefara, T.J. 2024. Classification of exaggerated news headlines. <i>Communications in Computer and Information Science, 2030.</i> http://hdl.handle.net/10204/13643 |
en_ZA |
dc.identifier.isbn |
978-3-031-53730-1 |
|
dc.identifier.issn |
1865-0929 |
|
dc.identifier.issn |
1865-0937 |
|
dc.identifier.uri |
https://doi.org/10.1007/978-3-031-53731-8_20
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/13643
|
|
dc.description.abstract |
The amount of data online is increasing as companies generate news articles daily. These news articles contain headlines with a level of exaggeration aimed at winning readers. In addition, these companies compete against one another; hence, creating appealing and exaggerated news headlines is one way to win readers. Some of the exaggerated headlines contain misleading information. Hence, this paper applies machine learning methods and natural language processing to detect and identify exaggerated news headlines in the South African context. Machine learning models such as logistic regression, decision trees, support vector machines (SVM), and XGBoost are trained on labelled news headlines as a binary classification task. The models produced good results, with XGBoost and SVM obtaining 70% accuracy. Furthermore, the F-measure was used to evaluate the models: decision trees obtained 56%, followed by SVM with 53%. The classification of exaggerated news headlines is a difficult task. Therefore, we oversampled the data to obtain balanced labels, which improved the performance of the models. SVM obtained 84%, followed by logistic regression, XGBoost, and decision trees with accuracies of 78%, 72%, and 71%, respectively. |
en_US |
dc.format |
Fulltext |
en_US |
dc.language.iso |
en |
en_US |
dc.relation.uri |
https://link.springer.com/chapter/10.1007/978-3-031-53731-8_20 |
en_US |
dc.source |
Communications in Computer and Information Science, 2030 |
en_US |
dc.subject |
Online data increase |
en_US |
dc.subject |
News headlines |
en_US |
dc.subject |
Machine learning |
en_US |
dc.subject |
Natural language |
en_US |
dc.subject |
Exaggerated news |
en_US |
dc.title |
Classification of exaggerated news headlines |
en_US |
dc.type |
Article |
en_US |
dc.description.pages |
248–260 |
en_US |
dc.description.note |
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG. This is the preprint version of the published item. |
en_US |
dc.description.cluster |
Next Generation Enterprises & Institutions |
en_US |
dc.description.impactarea |
Data Science |
en_US |
dc.identifier.apacitation |
Rangata, M. R., & Sefara, T. J. (2024). Classification of exaggerated news headlines. <i>Communications in Computer and Information Science, 2030</i>, http://hdl.handle.net/10204/13643 |
en_ZA |
dc.identifier.chicagocitation |
Rangata, Mapitsi R, and Tshephisho J Sefara. "Classification of exaggerated news headlines." <i>Communications in Computer and Information Science, 2030</i> (2024). http://hdl.handle.net/10204/13643 |
en_ZA |
dc.identifier.vancouvercitation |
Rangata MR, Sefara TJ. Classification of exaggerated news headlines. Communications in Computer and Information Science, 2030. 2024; http://hdl.handle.net/10204/13643. |
en_ZA |
dc.identifier.ris |
TY - Article
AU - Rangata, Mapitsi R
AU - Sefara, Tshephisho J
AB - The amount of data online is increasing as companies generate news articles daily. These news articles contain headlines with a level of exaggeration aimed at winning readers. In addition, these companies compete against one another; hence, creating appealing and exaggerated news headlines is one way to win readers. Some of the exaggerated headlines contain misleading information. Hence, this paper applies machine learning methods and natural language processing to detect and identify exaggerated news headlines in the South African context. Machine learning models such as logistic regression, decision trees, support vector machines (SVM), and XGBoost are trained on labelled news headlines as a binary classification task. The models produced good results, with XGBoost and SVM obtaining 70% accuracy. Furthermore, the F-measure was used to evaluate the models: decision trees obtained 56%, followed by SVM with 53%. The classification of exaggerated news headlines is a difficult task. Therefore, we oversampled the data to obtain balanced labels, which improved the performance of the models. SVM obtained 84%, followed by logistic regression, XGBoost, and decision trees with accuracies of 78%, 72%, and 71%, respectively.
DA - 2024-02
DB - ResearchSpace
DP - CSIR
J1 - Communications in Computer and Information Science, 2030
KW - Online data increase
KW - News headlines
KW - Machine learning
KW - Natural language
KW - Exaggerated news
LK - https://researchspace.csir.co.za
PY - 2024
SM - 978-3-031-53730-1
SM - 1865-0929
SM - 1865-0937
T1 - Classification of exaggerated news headlines
TI - Classification of exaggerated news headlines
UR - http://hdl.handle.net/10204/13643
ER -
|
en_ZA |
dc.identifier.worklist |
27686 |
en_US |