LATENT SEMANTIC ANALYSIS (LSA) AND AUTOMATIC TEXT SUMMARIZATION (ATS) FOR OPTIMIZING COVID-19 ARTICLE SEARCH

Herlina Jayadianti, Ruth Damayanti, Juwairiah Juwairiah

Abstract


The Covid-19 pandemic has made people around the world more aware of the importance of maintaining their health and adopting a healthier lifestyle. Clear, correct, and precise information is indispensable for understanding this respiratory virus, and the public relies heavily on digital media to find links about Covid-19. In this study, health articles about Covid-19 are collected from several websites by scraping, and the retrieved data are processed into automatic summaries using Latent Semantic Analysis (LSA), a method that helps uncover the hidden meaning in a collection of sentences. Summary formation is assisted by the cross method. The system also provides an article search so that users can find the right information. The results show that LSA assisted by the cross method can be used to produce automatic summaries well: testing yielded an average F-measure of 90.68% and an average recall of 85%, with a training-to-test data ratio of 90:10. Data were collected from February to June 2020, yielding 120 training documents and 12 test documents. Testing was performed with a compression rate of 30%.
Keywords: automatic summary, health article, scraping, latent semantic analysis, singular value decomposition, cross method
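
The pipeline summarized in the abstract (LSA scoring, cross-method sentence selection, a 30% compression rate, and evaluation by recall and F-measure) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It follows one common formulation of the cross method: within each latent concept, sentence weights at or below the concept's average are zeroed before scoring. It also rests on several assumptions that the abstract does not confirm: raw term counts as matrix weights, no Indonesian stemming or stop-word removal, absolute values of the right singular vectors to avoid the sign ambiguity of the SVD, "compression rate" read as the fraction of sentences kept, and evaluation by exact sentence overlap. All function names are hypothetical.

```python
import math
import re

import numpy as np


def split_sentences(text):
    """Naive split after ., ! or ? (assumption; real systems use a tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def term_sentence_matrix(sentences):
    """Rows are terms, columns are sentences, cells are raw term counts."""
    tokens = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokens for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(sentences)))
    for col, toks in enumerate(tokens):
        for t in toks:
            matrix[index[t], col] += 1.0
    return matrix


def cross_method_scores(matrix):
    """Score each sentence (column) from the SVD of the term-sentence matrix."""
    _, sigma, vt = np.linalg.svd(matrix, full_matrices=False)
    vt = np.abs(vt)  # assumption: absolute values sidestep SVD sign ambiguity
    # Pre-processing: per concept (row), zero the cells at or below the row mean.
    row_mean = vt.mean(axis=1, keepdims=True)
    vt = np.where(vt > row_mean, vt, 0.0)
    # Weight each concept by its singular value, then sum per sentence.
    return (sigma[:, None] * vt).sum(axis=0)


def summarize(text, compression_rate=0.3):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    # Assumption: the compression rate is the share of sentences to keep.
    keep = max(1, math.ceil(compression_rate * len(sentences)))
    scores = cross_method_scores(term_sentence_matrix(sentences))
    top = sorted(np.argsort(-scores)[:keep])
    return [sentences[i] for i in top]


def precision_recall_f(system, reference):
    """Sentence-overlap precision, recall and F-measure of a summary."""
    overlap = len(set(system) & set(reference))
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Under these assumptions, `summarize(article_text, 0.3)` would produce an extractive summary of one scraped article, and `precision_recall_f(summary, reference_summary)` would compare it against a manually written reference; averaging the results over the test documents would give figures comparable in kind to those reported above.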






