3rd International Conference on Chemo and BioInformatics, Kragujevac, September 25-26, 2025 (pp. 305-310)
AUTHOR(S): Andrej Milisavljević, Urban Bren, Marko Jukič
DOI: 10.46793/ICCBIKG25.305M
ABSTRACT:
Understanding the behavior of protein–ligand binding sites is central to structure-based drug design. Using the PDBBind+ database, we analyzed the amino acid composition of proteins with small-molecule ligand binding sites, considering both the solvent accessibility of each residue and whether it lies within a binding site. Tryptophan, phenylalanine, tyrosine, methionine, and glycine were enriched within solvent-accessible binding regions. The methodology is robust, reproducible, and applicable to other similar datasets. We extended the analysis to individual taxonomic groups, including the human and viral protein subsets, providing insights relevant to the medicinal chemistry community. Composition analysis by protein classification revealed differences in amino acid enrichment across protein types, notably a depletion of cysteine on lyase protein surfaces. We anticipate that these findings will support future predictive models for binding site feature analysis and drug screening across diverse biological targets, advancing binding site prediction and helping to accelerate computationally intensive drug discovery in medicinal chemistry.
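The kind of enrichment comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy records, the 0.2 relative solvent accessibility cutoff (`RSA_CUTOFF`), and the helper names `composition` and `log2_enrichment` are all assumptions chosen for illustration. The sketch compares the amino acid composition of solvent-exposed binding site residues against that of all solvent-exposed residues and reports a log2 ratio, where positive values indicate enrichment in binding sites.

```python
from collections import Counter
from math import log2


def composition(residues):
    """Return the fractional amino acid composition of a list of residue names."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def log2_enrichment(site_residues, background_residues):
    """Return log2(site fraction / background fraction) for each residue type in the site."""
    site = composition(site_residues)
    background = composition(background_residues)
    return {aa: log2(site[aa] / background[aa]) for aa in site if aa in background}


# Hypothetical toy records: (residue name, relative solvent accessibility, in binding site?)
records = [
    ("TRP", 0.45, True),
    ("TRP", 0.30, True),
    ("GLY", 0.60, True),
    ("LYS", 0.80, True),
    ("LYS", 0.90, False),
    ("LYS", 0.70, False),
    ("LYS", 0.65, False),
    ("GLY", 0.50, False),
    ("TRP", 0.10, False),  # buried, excluded by the cutoff
    ("CYS", 0.05, False),  # buried, excluded by the cutoff
]

# Assumed cutoff: residues at or above this relative accessibility count as exposed.
RSA_CUTOFF = 0.2

exposed = [r for r in records if r[1] >= RSA_CUTOFF]
surface = [name for name, _, _ in exposed]
site = [name for name, _, in_site in exposed if in_site]

enrichment = log2_enrichment(site, surface)
for aa, value in sorted(enrichment.items(), key=lambda kv: -kv[1]):
    print(f"{aa}: {value:+.2f}")
```

In this toy dataset tryptophan is twice as frequent in the binding site as on the exposed surface overall, lysine is half as frequent, and glycine is unchanged. In practice, the accessibility values would come from a structure-analysis tool and the site membership from ligand proximity, both of which are outside the scope of this sketch.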
KEYWORDS:
cheminformatics, protein surface analysis, binding site, small-molecule protein interactions, in-silico drug design
ACKNOWLEDGEMENT:
Financial support was provided by the Slovenian Research and Innovation Agency (ARIS) through project and program grants P2-0438, I0-E015, P1-0403, Z1-50021, L2-4430, J1-4414, J3-4497, J7-4638, J3-4498, J1-4398, J4-4633, J7-50043, J1-50034, GC-0001, J1-60001, J2-60044, and L7-60161.
