Use of Covariance Matrix in Automatic Speaker Recognition

Proceedings of International Scientific Conference „ALFATECH – Smart Cities and modern technologies“ (pp. 204-207)

AUTOR(I) / AUTHOR(S): Ivan Jokić, Stevan Jokić

DOI: 10.46793/ALFATECHproc25.204J

SAŽETAK / ABSTRACT:

This paper tests a procedure for automatic speaker recognition that uses 21 mel-frequency cepstral coefficients as speaker features and a covariance matrix as the speaker model. The tests are conducted on the Solo part of the CHAINS speech database, which contains 37 recordings for each of 36 speakers. Each speech recording is represented by a matrix of feature vectors, and the recording is modeled by the covariance matrix of that feature matrix. Recognition accuracy is compared for two cases: with and without a sigmoid function applied to the elements of the speaker model. The tests are carried out in five stages. In most tests, applying the sigmoid function to the elements of the covariance matrices significantly increases recognition accuracy. The mean recognition accuracy over all tests is 87.84% without the sigmoid function and 94.64% with it.
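
The modeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state how models are compared or which sigmoid variant is used, so the Frobenius distance, the nearest-model decision rule, and the logistic function below are assumptions. Synthetic Gaussian vectors stand in for real 21-dimensional MFCC features, and all function names are hypothetical.

```python
import math
import random


def covariance_matrix(features):
    """Sample covariance of feature vectors (rows = frames, columns = coefficients)."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def sigmoid(x):
    """Logistic function (assumed variant), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def speaker_model(features, use_sigmoid=False):
    """Covariance-matrix model, optionally with the sigmoid applied to every element."""
    cov = covariance_matrix(features)
    if use_sigmoid:
        cov = [[sigmoid(v) for v in row] for row in cov]
    return cov


def model_distance(a, b):
    """Frobenius distance between two models (assumed; the abstract names no measure)."""
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    )


def identify(test_model, enrolled):
    """Return the label of the enrolled model closest to the test model."""
    return min(enrolled, key=lambda name: model_distance(test_model, enrolled[name]))


def synthetic_recording(rng, scales, frames=500):
    """Stand-in for an MFCC matrix: independent Gaussian values per coefficient."""
    return [[rng.gauss(0.0, s) for s in scales] for _ in range(frames)]


DIM = 21  # number of coefficients, as in the abstract
rng = random.Random(0)
scales_a = [0.5 + 0.05 * j for j in range(DIM)]  # hypothetical speaker A statistics
scales_b = [1.5 - 0.05 * j for j in range(DIM)]  # hypothetical speaker B statistics

enrolled = {
    "speaker_a": speaker_model(synthetic_recording(rng, scales_a), use_sigmoid=True),
    "speaker_b": speaker_model(synthetic_recording(rng, scales_b), use_sigmoid=True),
}
test_model = speaker_model(synthetic_recording(rng, scales_a), use_sigmoid=True)
print(identify(test_model, enrolled))
```

Setting `use_sigmoid=False` in all three calls gives the second case compared in the abstract. The sigmoid maps every covariance element into the interval (0, 1), which limits the influence of large-magnitude elements on the distance. Whether that is the reason for the reported accuracy gain is not stated in the abstract.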

KLJUČNE REČI / KEYWORDS:

Automatic speaker recognition; Mel-Frequency Cepstral Coefficients; Covariance matrix

