1st International Scientific Conference Education and Artificial Intelligence (EDAI 2024), [pp. 165-174]
AUTHOR(S) / АУТОР(И): Marko Stanković, Aleksandar Milenković, Marina Svičević, Nemanja Vučićević
DOI: 10.46793/EDAI24.165S
ABSTRACT / САЖЕТАК:
Researchers around the world have for some time been examining how AI can be used in mathematics education to give students additional support. One line of research focuses on helping students who wish to take part in mathematics competitions, which require solving more complex problems. Besides the regular national competitions, through which students can qualify for international mathematical olympiads, there are competitions aimed at popularizing mathematics and developing students' logical thinking. One of these is the international Kangaroo competition. In this paper, we analyze how the AI Math Solver on the Interactive Mathematics platform performs on problems from the 2024 Kangaroo competition for the 3rd and 4th grades of elementary school, the 7th and 8th grades of elementary school, and the 3rd and 4th grades of high school. Because the task statements and/or the offered answers in the Kangaroo competition often contain images, the tasks were uploaded as images (screenshots), in both Serbian and English. Of the 84 tasks, the solver correctly solved 24 in Serbian and 24 in English, a success rate of just under 30% (24/84 ≈ 28.6%) in each language. The two sets of solved tasks did not coincide: some tasks solved in Serbian were not solved in English, and vice versa. The distribution of correct answers also differed across task difficulty levels.
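The success rate reported in the abstract can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic: the function name `success_rate` is a hypothetical helper introduced here, and the only figures used are the 24 correct answers and 84 tasks stated in the abstract.

```python
def success_rate(correct: int, total: int) -> float:
    """Return the share of correctly solved tasks as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Figures taken from the abstract: 24 correct answers out of 84 tasks,
# reported for the Serbian and the English versions alike.
rate = success_rate(24, 84)
print(f"{rate:.1f}%")  # prints "28.6%"
```

The result is consistent with the abstract's "just under 30%".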
KEYWORDS / КЉУЧНЕ РЕЧИ:
AI tools, Kangaroo competition, math education, non-standard task
ACKNOWLEDGEMENT / ПРОЈЕКАТ:
The authors are supported by the Ministry of Science, Technological Development and Innovation, Republic of Serbia. The first author acknowledges support under Contract No. 451-03-66/2024-03/200139, while the second, third and fourth authors are supported under Contract No. 451-03-65/2024-03/200122.