Natural Language Processing in Meaning Representation for Sentiment Analysis in Serbian Language

10th International Scientific Conference Technics, Informatics and Education – TIE 2024, pp. 108-115

AUTHOR(S): Marko M. Živanović, Olga Ristić, Sandra Milunović Koprivica


DOI: 10.46793/TIE24.108Z

ABSTRACT:

This paper explores machine learning algorithms that contribute to meaning representation and context modeling in sentiment analysis. Language preprocessing techniques are described in detail. The study also discusses string distance calculations and the application of Naive Bayes to classification, emphasizing important model metrics such as accuracy. The final section presents a practical example covering data collection, analysis, preprocessing, classification with machine learning algorithms, and model evaluation. Testing demonstrated the system’s ability to classify sentiment in Serbian-language text.
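The abstract names the building blocks of the pipeline: preprocessing, string distance, Naive Bayes classification, and accuracy as an evaluation metric. The sketch below is a minimal, standard-library Python illustration of those textbook techniques under stated assumptions, not the authors' implementation. The regex tokenizer, the use of Levenshtein distance to compare word forms (for example, a word typed without diacritics), multinomial Naive Bayes with add-one smoothing, and the four invented Serbian (Latin script) training sentences are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of word characters.
    In Python 3, \\w is Unicode-aware, so č, ć, š, ž and đ survive."""
    return re.findall(r"\w+", text.lower())


def levenshtein(a, b):
    """Edit distance: the minimum number of single-character insertions,
    deletions and substitutions that turn string a into string b."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.
    Scores are summed in log space to avoid numeric underflow."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented toy data (Serbian, Latin script); not the paper's dataset.
train_docs = ["odličan film", "sjajna gluma i odličan scenario",
              "loš film", "dosadan i loš scenario"]
train_labels = ["pos", "pos", "neg", "neg"]
test_docs = ["Odličan scenario!", "dosadan film"]
test_labels = ["pos", "neg"]

model = NaiveBayes().fit(train_docs, train_labels)
predictions = [model.predict(d) for d in test_docs]
print(predictions, accuracy(test_labels, predictions))  # ['pos', 'neg'] 1.0
print(levenshtein("odlican", "odličan"))                 # 1 (missing diacritic)
```

Add-one smoothing keeps a word that appears in only one class from driving the other class's probability to zero, and summing logarithms instead of multiplying probabilities keeps long documents from underflowing to 0.0.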

KEYWORDS:

Naive Bayes; Model Metrics; Meaning; Context; Sentiments

ACKNOWLEDGEMENTS:

This study was supported by the Ministry of Science, Technological Development and Innovation of the Republic of Serbia, and these results are part of Grant No. 451-03-66/2024-03/200132 with the University of Kragujevac – Faculty of Technical Sciences Čačak.

REFERENCES:

  1. Караџић Стефановић, В. (1818). Српски рјечник [Serbian dictionary]. Штампарија Јерменског манастира [Printing House of the Armenian Monastery].
  2. Институт за српски језик САНУ [Institute for the Serbian Language of SASA]. (n.d.). Retrieved February 23, 2024, from https://web.archive.org/web/20190323133841/http://www.isj.sanu.ac.rs/projekti/rsanu/
  3. Herdan, G. (1960). Type-token mathematics. Mouton.
  4. Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130-137.
  5. Западни Срби [Western Serbs]. (n.d.). Сарајевска регија и Романија [Sarajevo region and Romanija]. Retrieved March 3, 2024, from https://www.zapadnisrbi.com/zapadni-srbi/republika-srpska3/sarajevsko-romanijska-regija/20-sarajevska-regija-i-romanija?showall=1
  6. Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8), 707-710.
  7. Mosteller, F., & Wallace, D. L. (2012). Applied Bayesian and classical inference: The case of the Federalist papers. Springer Science & Business Media.
  8. Mosteller, F., & Wallace, D. L. (2012). Applied Bayesian and classical inference: The case of the Federalist papers. Springer Science & Business Media.
  9. Jurafsky, D., & Martin, J. H. (2023). Speech and language processing. Retrieved May 30, 2024, from https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
  10. Peng, F., Schuurmans, D., & Wang, S. (2004). Augmenting naive Bayes classifiers with statistical language models. Information Retrieval, 7, 317-345.
  11. Heydarian, M., Doyle, T. E., & Samavi, R. (2022). MLCM: Multi-label confusion matrix. IEEE Access, 10, 19083-19095.
  12. Huang, J., & Ling, C. X. (2005). Using AUC and accuracy in evaluating learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 17(3), 299-310.
  13. Dawson, R. (2011). How significant is a boxplot outlier? Journal of Statistics Education, 19(2).
  14. Jahić, S., & Vičič, J. (2023). Impact of Negation and AnA-Words on Overall Sentiment Value of the Text Written in the Bosnian Language. Applied Sciences, 13, 7760. https://doi.org/10.3390/app13137760
  15. Draskovic, D., Zecevic, D., & Nikolic, B. (2022). Development of a Multilingual Model for Machine Sentiment Analysis in the Serbian Language. Mathematics, 10, 3236. https://doi.org/10.3390/math10183236
  16. Laković, L., Čakić, S., Jovović, I., & Babić, D. (2023). Exploratory analysis of text using available NLP technologies for Serbian language. 22nd International Symposium INFOTEH-JAHORINA (INFOTEH), East Sarajevo, Bosnia and Herzegovina, pp. 1-4. https://doi.org/10.1109/INFOTEH57020.2023.10094202
  17. Bogdanović, M., Kocić, J., & Stoimenov, L. (2024). SRBerta—A Transformer Language Model for Serbian Cyrillic Legal Texts. Information, 15, 74. https://doi.org/10.3390/info15020074