Confusion about the choice of evaluation metrics for model performance assessment in chemoinformatics, bioinformatics and in general

2nd International Conference on Chemo and Bioinformatics ICCBIKG 2023 (67-72)

АУТОР(И) / AUTHOR(S): Bono Lučić, Viktor Bojović, Antonija Kraljević, Jadranko Batista

Е-АДРЕСА / E-MAIL: lucic@irb.hr

DOI: 10.46793/ICCBI23.067L

САЖЕТАК / ABSTRACT:

In chemo/bioinformatics, the quality of models is evaluated using model performance parameters (metrics). A large group of models in this field are binary classification models, a consequence of the general digitization of information and data in chemistry and the life sciences. When comparing models developed with different methods and by different research groups for the same data sets, we try to rank the models according to their quality. Here the question of selecting appropriate metrics arises, and applying inappropriate metrics leads to an incorrect (non-optimal) assessment of model quality and thus to an incorrect (non-optimal) ranking of the models. The article addresses the limitations and problems of using the Matthews correlation coefficient (MCC) and the F1 parameter to describe the quality of classification models. To overcome these difficulties, it is proposed to use a parameter that estimates the real accuracy of the model, that is, its accuracy above the accuracy level of the corresponding random model. Its use is suggested as an additional baseline test confirming that the developed model is better than the corresponding random model. Finally, the parameter called real accuracy and the well-known Cohen’s kappa (κ) should be preferred to MCC and F1, since the latter two can be derived as special cases of κ.
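
To make the comparison of metrics concrete, the following minimal Python sketch computes accuracy, F1, MCC and Cohen’s κ from a binary confusion matrix. The abstract does not give a formula for the real accuracy, so the sketch assumes it is the observed accuracy minus the chance-level accuracy obtained from the row and column totals (the same expected agreement used in Cohen’s κ). The function name `binary_metrics` and the example counts are hypothetical and serve illustration only.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Return common binary classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n

    # Assumption: the random-model accuracy is the chance agreement implied by
    # the marginal totals (actual vs. predicted class frequencies).
    random_accuracy = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2

    # Assumption: real accuracy = accuracy above the random-model level.
    real_accuracy = accuracy - random_accuracy

    # Cohen's kappa rescales the same difference by its maximum possible value.
    kappa = real_accuracy / (1 - random_accuracy) if random_accuracy < 1 else 0.0

    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0

    mcc_denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column total is zero; 0.0 is used here.
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0

    return {
        "accuracy": accuracy,
        "random_accuracy": random_accuracy,
        "real_accuracy": real_accuracy,
        "kappa": kappa,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical imbalanced data set: 90 positives, 10 negatives,
# and a trivial model that predicts "positive" for every sample.
trivial = binary_metrics(tp=90, fp=10, fn=0, tn=0)

# Hypothetical balanced data set with a genuinely informative model.
informative = binary_metrics(tp=40, fp=10, fn=10, tn=40)

for name, result in (("trivial", trivial), ("informative", informative)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

In the first example the accuracy and F1 are high even though the model carries no information, whereas the real accuracy and κ (under the assumptions above) are zero, which is the kind of baseline check the abstract recommends.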

КЉУЧНЕ РЕЧИ / KEYWORDS:

chemo/bioinformatics, evaluation metrics, real model accuracy, Cohen’s kappa

ЛИТЕРАТУРА / REFERENCES:

  • The DREAM Consortium, 2019, https://dreamchallenges.org, (Sept 31, 2023).
    A. Cichońska, et al., Crowdsourced mapping of unexplored target space of kinase inhibitors, Nature Communications, 12 (2021) no. 3307 (18 pages).
  • M. Necci, et al., Critical assessment of protein intrinsic disorder prediction, Nature Methods, 18 (2021) 472–481.
  • B. Lučić, et al., Estimation of random accuracy and its use in validation of predictive quality of classification models within predictive challenges, Croatica Chemica Acta, 92 (2019) 379–391.
  • D.M. Powers, Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation, Journal of Machine Learning Technologies, 2 (2011) 37–63.
  • B.W. Matthews, Comparison of the predicted and observed secondary structure of T4 phage lysozyme, Biochimica et Biophysica Acta (BBA) – Protein Structure, 405 (1975) 442–451.
  • J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement, 20 (1960) 37–46.