Use of Covariance Matrix in Automatic Speaker Recognition

Proceedings of the International Scientific Conference "ALFATECH – Smart Cities and modern technologies" (pp. 204-207)

 

AUTOR(I) / AUTHOR(S): Ivan Jokić, Stevan Jokić

DOI:  10.46793/ALFATECHproc25.204J

SAŽETAK / ABSTRACT:

This paper tests a procedure for automatic speaker recognition that uses 21 mel-frequency cepstral coefficients as speaker features and a covariance matrix as the speaker model. Tests are conducted on the Solo part of the CHAINS speech database, which contains 37 recordings for each of 36 speakers. Each speech recording is represented by a matrix of feature vectors, and the recording is modeled by the covariance matrix of that feature matrix. Recognition accuracy is compared for two cases: with and without a sigmoid function applied to the elements of the speaker model. The tests are carried out in five stages. In most tests, applying the sigmoid function to the elements of the covariance matrices significantly increases recognition accuracy. The mean recognition accuracy over all tests is 87.84% without the sigmoid function and 94.64% with it.
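The modeling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the MFCCs are extracted or how two models are compared, so the sketch uses synthetic random matrices with 21 columns in place of real MFCC features and assumes the Frobenius norm of the difference as the distance between models. The names `covariance_model`, `model_distance`, and `identify` are hypothetical.

```python
import numpy as np


def covariance_model(features, use_sigmoid=False):
    """Build a speaker model from a (n_frames, n_coeffs) feature matrix."""
    # rowvar=False: each column is one coefficient, each row is one frame.
    cov = np.cov(features, rowvar=False)
    if use_sigmoid:
        # Element-wise logistic sigmoid, squashing every entry into (0, 1).
        cov = 1.0 / (1.0 + np.exp(-cov))
    return cov


def model_distance(model_a, model_b):
    # Assumption: Frobenius norm of the difference; the abstract does not
    # specify which distance measure the paper uses.
    return float(np.linalg.norm(model_a - model_b))


def identify(test_model, reference_models):
    """Return the label of the reference model closest to test_model."""
    return min(
        reference_models,
        key=lambda label: model_distance(test_model, reference_models[label]),
    )


# Synthetic stand-ins for MFCC matrices (500 frames x 21 coefficients).
rng = np.random.default_rng(0)
train_a = rng.normal(scale=1.0, size=(500, 21))  # hypothetical speaker "A"
train_b = rng.normal(scale=3.0, size=(500, 21))  # hypothetical speaker "B"
test_a = rng.normal(scale=1.0, size=(500, 21))   # new recording of "A"

results = {}
for use_sigmoid in (False, True):
    references = {
        "A": covariance_model(train_a, use_sigmoid),
        "B": covariance_model(train_b, use_sigmoid),
    }
    test_model = covariance_model(test_a, use_sigmoid)
    results[use_sigmoid] = identify(test_model, references)
```

With 21 coefficients per frame, each model is a symmetric 21 x 21 matrix, so a recording of any length is reduced to a fixed-size representation before comparison.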

KLJUČNE REČI / KEYWORDS:

Automatic speaker recognition; Mel-Frequency Cepstral Coefficients; Covariance matrix
