Classification of Indonesian Tale Categories using Support Vector Machine and FastText Feature Extraction

Helena Nurramdhani Irmanda; Ria Astriratma; Ati Zaidiah; Muhammad Rahman Hadi; Nayandra Agastia Putra

doi:10.31315/telematika.v21i2.10867

Abstract

This work develops a model that classifies Indonesian fairy tales by category and assesses how well a support vector machine (SVM) classifier performs when combined with fastText feature extraction. The study begins with data collection: a dataset of fairy tales in which each tale is annotated with its category. The data is then preprocessed to prepare it for the later stages and split into training and testing sets. Features are extracted with fastText, and classification is performed with the SVM to produce the final model. The last phase evaluates the performance of that model. The classification model for Indonesian fairy tales achieves an accuracy of 85%, with a recall of 85% and an F1-score of 86%, which indicates that it is effective.
Previous research has not addressed the classification of Indonesian fairy tale categories.
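
The evaluation step reports accuracy, recall, and F1-score. As a minimal sketch of how such figures can be computed from true and predicted labels, the example below uses plain Python. The function name `evaluate`, the category labels, and the choice of macro-averaging are assumptions made for illustration; the abstract does not state the averaging scheme or the category names, and the toy labels are not taken from the study's dataset.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption here: each category contributes
    equally, regardless of how many tales it contains.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this category
        else:
            fp[pred] += 1    # predicted category was wrong
            fn[truth] += 1   # true category was missed

    def safe_div(num, den):
        # Avoid division by zero for categories that never occur.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        f1 = safe_div(2 * precision * recall, precision + recall)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for six tales (illustrative only).
y_true = ["fable", "fable", "legend", "legend", "myth", "myth"]
y_pred = ["fable", "legend", "legend", "legend", "myth", "fable"]

report = evaluate(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.2%}")
```

With macro-averaging, a small category weighs as much as a large one, so the scores can differ noticeably from accuracy when the categories are imbalanced.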

Keywords

SVM, fastText, text mining

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Copyright of: TELEMATIKA: Jurnal Informatika dan Teknologi Informasi. ISSN 1829-667X (print); ISSN 2460-9021 (online)

Published by Jurusan Teknik Informatika, UPN Veteran Yogyakarta, Jl. Babarsari 2 Yogyakarta 55281 (Kampus Unit II). Telp: +62 274 485786. Email: jurnaltelematika@upnyk.ac.id
