Building a dataset for misinformation detection in the low-resource language

Authors: Mukwevho, M; Rananga, S; Mbooi, Mahlatse S; Isong, B; Marivate, V
Type: Conference Presentation
Venue: IST-Africa Conference (IST-Africa), 20-24 May 2024
Date issued: 2024-05 (deposited 2024-09-16)
Repository: ResearchSpace, CSIR (https://researchspace.csir.co.za)
Handle: http://hdl.handle.net/10204/13760
Language: en (full text available)

Abstract
In the modern digital age, the widespread dissemination of misinformation has become a serious issue. Most work on identifying misinformation online has targeted English rather than low-resource languages such as Tshivenda. In this paper, we create a new dataset of Tshivenda-language news to support the development of misinformation resources for the language. In our proposed methodology, we leveraged conditional random fields (CRF), gated recurrent units (GRU) and long short-term memory (LSTM) networks to collect and annotate social media content. By applying these machine learning approaches to existing Tshivenda posts, we assess their effectiveness for identifying false news in a low-resource language setting. This paper emphasises the vital need to combat misinformation in languages with limited resources, such as Tshivenda. Through the creation of a specialised dataset and the use of these techniques, it aims to address the spread of misinformation in under-represented language communities.

Keywords: Misinformation; Natural Language Processing; NLP; Social media; Low-resource language; Conditional Random Fields; CRF; Gated Recurrent Unit; GRU; Long short-term memory; LSTM

Cite as: Mukwevho, M., Rananga, S., Mbooi, M.S., Isong, B. & Marivate, V. 2024. Building a dataset for misinformation detection in the low-resource language. IST-Africa Conference (IST-Africa), 20-24 May 2024. http://hdl.handle.net/10204/13760