Proceedings of International Scientific Conference „ALFATECH – Smart Cities and modern technologies“ (pp. 171-175)
АУТОР(И) / AUTHOR(S): Mirjana Tomic, Dejan Djukic
DOI: 10.46793/ALFATECHproc25.171T
САЖЕТАК / ABSTRACT:
This paper presents a method for automatically generating grammatically correct sentences in the Serbian language. The task is challenging because Serbian is a highly inflected language with complex word morphology, noun and adjective genders and declensions, verb conjugations, and agreement rules. Word components, such as word roots and their inflectional and morphological particles, are stored in JSON structures and combined by custom software written in Python. The software has two main functions: accessing the JSON database of linguistic data, and executing the algorithms that combine word parts into grammatically correct words and words into syntactically and semantically correct sentences. Its principal features include the morphological formation of verbs, nouns, and adjectives, and the combination of these words with prepositions to form sentences that read as natural Serbian. Examples of generated sentences show that such sentences, albeit somewhat simple, can be generated successfully with this approach. The method has numerous potential applications, from educational use, e.g. in language training, to more general research tools in natural language processing (NLP), not only for Serbian but for the wider family of highly inflected and morphologically complex languages.
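The approach summarized in the abstract, with stems and endings kept in JSON and assembled by Python code, can be illustrated with a minimal sketch. The JSON layout, the key names, and the functions `inflect_noun`, `inflect_adjective`, `conjugate`, `noun_phrase` and `sentence` are assumptions made for illustration only; the abstract does not specify the authors' schema or algorithms. The sketch covers only feminine singular nouns and one verb in the present tense.

```python
import json

# Hypothetical lexicon layout (an assumption, not the authors' schema):
# stems and endings are stored separately and joined at generation time.
LEXICON_JSON = """
{
  "nouns": {
    "žena":   {"stem": "žen",   "gender": "f"},
    "knjiga": {"stem": "knjig", "gender": "f"},
    "škola":  {"stem": "škol",  "gender": "f"}
  },
  "adjectives": {
    "mlad": {"stem": "mlad"},
    "nov":  {"stem": "nov"}
  },
  "verbs": {
    "čitati": {"present_stem": "čita"}
  },
  "endings": {
    "noun":      {"f": {"sg": {"nom": "a", "acc": "u", "loc": "i"}}},
    "adjective": {"f": {"sg": {"nom": "a", "acc": "u", "loc": "oj"}}},
    "verb_present": {"1sg": "m", "2sg": "š", "3sg": "",
                     "1pl": "mo", "2pl": "te", "3pl": "ju"}
  },
  "prepositions": {"u": "loc"}
}
"""

lexicon = json.loads(LEXICON_JSON)


def inflect_noun(lemma, case, number="sg"):
    """Join a noun stem with the ending for its gender, number and case."""
    entry = lexicon["nouns"][lemma]
    ending = lexicon["endings"]["noun"][entry["gender"]][number][case]
    return entry["stem"] + ending


def inflect_adjective(lemma, noun_lemma, case, number="sg"):
    """Make the adjective agree with its noun in gender, number and case."""
    gender = lexicon["nouns"][noun_lemma]["gender"]
    ending = lexicon["endings"]["adjective"][gender][number][case]
    return lexicon["adjectives"][lemma]["stem"] + ending


def conjugate(lemma, person_number):
    """Form the present tense from the present stem and a personal ending."""
    stem = lexicon["verbs"][lemma]["present_stem"]
    return stem + lexicon["endings"]["verb_present"][person_number]


def noun_phrase(noun, case, adjective=None):
    """Build an optional adjective plus noun, both in the requested case."""
    words = []
    if adjective is not None:
        words.append(inflect_adjective(adjective, noun, case))
    words.append(inflect_noun(noun, case))
    return " ".join(words)


def sentence(subject, verb, obj, subj_adj=None, obj_adj=None,
             prep=None, prep_noun=None):
    """Subject (nominative) + verb (3rd sg) + object (accusative) + optional PP."""
    parts = [
        noun_phrase(subject, "nom", subj_adj),
        conjugate(verb, "3sg"),
        noun_phrase(obj, "acc", obj_adj),
    ]
    if prep is not None and prep_noun is not None:
        # The preposition selects the case of its noun from the lexicon.
        case = lexicon["prepositions"][prep]
        parts.append(f"{prep} {noun_phrase(prep_noun, case)}")
    text = " ".join(parts)
    return text[0].upper() + text[1:] + "."


print(sentence("žena", "čitati", "knjiga", "mlad", "nov", "u", "škola"))
# Mlada žena čita novu knjigu u školi.
```

Plain concatenation of stem and ending is a deliberate simplification. Serbian stems can undergo sound changes (for example, the locative of knjiga is knjizi, not the form this table would produce), so a fuller system would need per-stem exceptions or alternation rules in the data.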
КЉУЧНЕ РЕЧИ / KEYWORDS:
Natural language processing, NLP, automated natural language generation, NLG, Serbian language, database design, language and text synthesis, education, programming methodology
ПРОЈЕКАТ / ACKNOWLEDGEMENT: