﻿<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="editorial">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Explor Drug Sci</journal-id>
<journal-id journal-id-type="publisher-id">EDS</journal-id>
<journal-title-group>
<journal-title>Exploration of Drug Science</journal-title>
</journal-title-group>
<issn pub-type="epub">2836-7677</issn>
<publisher>
<publisher-name>Open Exploration Publishing</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.37349/eds.2023.00007</article-id>
<article-id pub-id-type="manuscript">10087</article-id>
<article-categories>
<subj-group>
<subject>Editorial</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Machine learning for drug science</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-8640-357X</contrib-id>
<name>
<surname>de Azevedo</surname>
<given-names>Walter F.</given-names>
<suffix>Jr.</suffix>
</name>
<xref ref-type="aff" rid="I1" />
<xref ref-type="corresp" rid="cor1">
<sup>*</sup>
</xref>
</contrib>
<contrib contrib-type="editor">
<name>
<surname>Albericio</surname>
<given-names>Fernando</given-names>
</name>
<role>Academic Editor</role>
<aff>University of KwaZulu-Natal, South Africa; University of Barcelona, Spain</aff>
</contrib>
</contrib-group>
<aff id="I1">Brazilian National Research Council (CNPq), Brasília DF 71.605-170, Brazil</aff>
<author-notes>
<corresp id="cor1">
<bold>
<sup>*</sup>
</bold>
<bold>Correspondence:</bold> Walter F. de Azevedo Jr., Brazilian National Research Council (CNPq), SHIS QI 01, Conjunto B, Edifício Santos Dumont, Lago Sul, Brasília DF 71.605-170, Brazil. <email>walter@azevedolab.net</email></corresp>
</author-notes>
<pub-date pub-type="ppub">
<year>2023</year>
</pub-date>
<pub-date pub-type="epub">
<day>16</day>
<month>04</month>
<year>2023</year>
</pub-date>
<volume>1</volume>
<issue>2</issue>
<fpage>77</fpage>
<lpage>80</lpage>
<history>
<date date-type="received">
<day>29</day>
<month>01</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>08</day>
<month>02</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2023.</copyright-statement>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This is an Open Access article licensed under a Creative Commons Attribution 4.0 International License (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link>), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.</license-p>
</license>
</permissions>
</article-meta>
</front>
<body>
<p>Artificial intelligence (AI) features in the daily news with increasing impact. The steady growth of computational power and the rapid development of algorithms to harness this capacity set the stage for this avalanche of information about AI. Drug science is not immune to this influence, and many drug discovery projects employ AI. A PubMed search using the strings “artificial intelligence” and “drug discovery” returned 1,149 publications up to 2022 (search performed on January 23, 2023). The histogram is shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>. The plot indicates a rapid increase in publications after 2018.</p>
<fig id="fig1" position="float">
<label>Figure 1</label>
<caption>
<p>Publications related to applications of AI to drug discovery found in PubMed from 1991 to 2022</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="10087-g001.tif" />
</fig>
<p>One recent application of AI to drug science is a study of rapamycin using Chat Generative Pre-trained Transformer (ChatGPT) [<xref ref-type="bibr" rid="B1">1</xref>]. This study used Pascal’s wager argument to speculate on the potential use of rapamycin to prolong life [<xref ref-type="bibr" rid="B1">1</xref>]. ChatGPT (an AI program developed by OpenAI) took the preclinical results and identified the effects of rapamycin on lifespan. This special issue centers on machine learning (ML), a subfield of AI. ML focuses on learning automatically from data without explicit programming. ML techniques benefit from the explosion of biological and drug data to generate models to predict drug efficiency.</p>
<p>Computer-aided drug design (CADD) is the application of computational techniques in drug development [<xref ref-type="bibr" rid="B2">2</xref>]. CADD comprises two main computational approaches to the design of a new drug: ligand-based drug design (LBDD) and structure-based drug design (SBDD) [<xref ref-type="bibr" rid="B3">3</xref>]. LBDD is used when the structure of a target is not available. It employs experimental data available for ligands and seeks to derive a descriptor-based model to predict the efficiency of a molecule. SBDD relies on three-dimensional (3D) information about the targets. This structural information may come from experimental techniques [e.g., X-ray diffraction (XRD) crystallography] or from computational modeling. The most striking example of computationally generated 3D structures is the development of deep learning (DL) techniques to model protein structures [<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B5">5</xref>]. DL methods rely on multiple layers of neural networks to generate models of protein structures based on sequence information. In the taxonomy of AI, DL is a subfield of ML.</p>
<p>LBDD and SBDD benefit from ML approaches. LBDD may use ML to generate polynomial equations to predict affinity based on ligand structures [<xref ref-type="bibr" rid="B6">6</xref>]. For SBDD, it is possible to employ ML to explore the scoring function (SF) space (SFS) concept [<xref ref-type="bibr" rid="B7">7</xref>]. SFS is a mathematical space composed of SFs. These SFs predict binding affinity based on the atomic coordinates of a protein-ligand complex. It is common to use experimental structures or complexes obtained through protein-ligand docking simulations. ML techniques may explore SFS to find an adequate computational model to predict binding affinity.</p>
<p>SFS brings together ML techniques and systems biology thinking. The concept of SFS sets up a systems-level approach to the creation of computational models that calculate affinity based on atomic coordinates. The SFS abstraction is illustrated in <xref ref-type="fig" rid="fig2">Figure 2</xref>. Consider an element of the protein space [e.g., cyclin-dependent kinase 2 (CDK2)] complexed with a ligand of the chemical space (CDK2 inhibitor). ML techniques can generate an SF to estimate the affinity based on the atomic coordinates. It is possible to employ docking to create CDK2-inhibitor complexes for which binding affinity data are available. The dataset of CDK2-inhibitor complexes is then split into two subsets, the training and test sets. ML techniques employ the training set to generate a new model to predict the binding affinity. The predictive performance is determined on the test set using metrics such as root-mean-squared error (RMSE), mean absolute error (MAE), and coefficient of determination (<italic>R</italic><sup>2</sup>). A recent study recommended this set of metrics (RMSE, MAE, and <italic>R</italic><sup>2</sup>) to validate supervised ML (SML) models in biology [<xref ref-type="bibr" rid="B8">8</xref>]. SML techniques encompass several regression methods (e.g., random forest, ensemble methods, and DL) whose predictive performance depends on the characteristics of the dataset.</p>
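<p>The split-and-validate procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a published protocol: the dataset is synthetic (three hypothetical energy terms per complex and simulated affinities), the model is an ordinary least-squares linear SF chosen only for brevity, and the helper names (split_train_test, rmse, mae, r2) are introduced here for illustration.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def split_train_test(X, y, test_fraction=0.3, seed=0):
    """Shuffle the complexes and split them into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def rmse(y_true, y_pred):
    """Root-mean-squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic stand-in for a dataset of complexes (an assumption, not real data):
# 100 complexes, each described by 3 hypothetical energy terms.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
true_weights = np.array([1.5, -0.8, 0.3])
y = 6.0 + X @ true_weights + rng.normal(scale=0.2, size=100)

X_train, X_test, y_train, y_test = split_train_test(X, y)

# Fit a linear SF on the training set only.
# Design matrix with a leading column of ones for the intercept.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Evaluate the fitted SF on the held-out test set.
A_test = np.column_stack([np.ones(len(X_test)), X_test])
y_pred = A_test @ coef

metrics = {
    "RMSE": rmse(y_test, y_pred),
    "MAE": mae(y_test, y_pred),
    "R2": r2(y_test, y_pred),
}
print(metrics)
```
]]></code>
<p>The model never sees the test complexes during fitting, so the three metrics estimate how the SF behaves on new complexes. In practice, the linear model would be replaced by whichever regression method suits the dataset.</p>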
<fig id="fig2" position="float">
<label>Figure 2</label>
<caption>
<p>Schematic view of the SFS. Equations on the SFS are generic polynomials only to indicate the concept of an infinite number of equations. ┄→: the best scoring function for this protein</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="10087-g002.tif" />
</fig>
<p>One key driver of progress in ML is the availability of software libraries for building ML models (e.g., scikit-learn). Scikit-learn was used to develop Statistical Analysis of Docking Results and SF (SAnDReS) [<xref ref-type="bibr" rid="B9">9</xref>, <xref ref-type="bibr" rid="B10">10</xref>]. This program relies on energy terms calculated using docking programs to generate a targeted SF. This state-of-the-art interpretation of CADD adds flexibility to the procedure by abandoning the one-size-fits-all approach to creating SFs [<xref ref-type="bibr" rid="B7">7</xref>]. Under this abstraction, the focus is on discovering an adequate model from the SFS for one target. From this perspective, a targeted SF is employed to rank protein-ligand complexes in virtual screening simulations.</p>
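<p>The idea of searching the SFS for a targeted SF can be sketched with scikit-learn as follows. This is a hedged illustration, not the SAnDReS implementation: the energy-term names, the synthetic affinities, and the helper search_sfs are hypothetical, and the candidate SFs are restricted to linear models over subsets of terms to keep the example short.</p>
<code language="python"><![CDATA[
```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical energy terms, such as a docking program might report.
TERMS = ["vdw", "hbond", "electrostatic", "desolvation"]


def search_sfs(X, y, term_names, seed=0):
    """Fit one linear SF per non-empty subset of terms; rank by test RMSE."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    results = []
    for k in range(1, len(term_names) + 1):
        for cols in combinations(range(len(term_names)), k):
            idx = list(cols)
            model = LinearRegression().fit(X_tr[:, idx], y_tr)
            pred = model.predict(X_te[:, idx])
            rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
            names = [term_names[i] for i in idx]
            results.append((rmse, names, model))
    # Lowest test RMSE first.
    results.sort(key=lambda r: r[0])
    return results


# Synthetic data (an assumption): affinity depends only on the first two terms.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, len(TERMS)))
y = 5.0 + 1.2 * X[:, 0] - 0.9 * X[:, 1] + rng.normal(scale=0.1, size=200)

ranked = search_sfs(X, y, TERMS)
best_rmse, best_terms, best_model = ranked[0]
print(len(ranked), best_terms, round(best_rmse, 3))
```
]]></code>
<p>Each entry of the ranked list is one point in a small, finite slice of the SFS, and the top entry plays the role of the targeted SF for this dataset. Selecting a model on the test set gives an optimistic error estimate, so a separate validation set or cross-validation would be preferable in a real study.</p>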
<p>Due to the growing number of complexes with affinity and structural data, unexplored parts of the protein and chemical spaces are now reachable. For protein structures in particular, additional protein space is reachable thanks to DL techniques used to model proteins. These 3D models are available at the Protein Data Bank (PDB) (<underline><uri xlink:href="https://www.rcsb.org/">https://www.rcsb.org/</uri></underline>) and UniProt (<underline><uri xlink:href="https://www.uniprot.org/">https://www.uniprot.org/</uri></underline>). All this advancement brings new opportunities to create models for one protein target. Also, as software progress continues, the number of ML models to calculate binding affinity will quickly increase, making it possible to create SF databases (SFDBs). These SFDBs will store developed SFs that can be downloaded and employed for docking simulations or to calculate the affinity for 3D structures solved using XRD crystallography, nuclear magnetic resonance (NMR) spectroscopy, or cryogenic electron microscopy (cryo-EM). The SFS concept is a new paradigm for CADD, letting us generate more consistent ML models to predict protein-drug affinity.</p>
<p>This special issue explores the SFS concept and other emerging ML techniques used in the study of drug science. Its contributors are authors with experience in this interdisciplinary field.</p>
</body>
<back>
<glossary>
<title>Abbreviations</title>
<def-list>
<def-item>
<term>3D</term>
<def>
<p>three-dimensional</p>
</def>
</def-item>
<def-item>
<term>AI</term>
<def>
<p>artificial intelligence</p>
</def>
</def-item>
<def-item>
<term>CADD</term>
<def>
<p>computer-aided drug design</p>
</def>
</def-item>
<def-item>
<term>CDK2</term>
<def>
<p>cyclin-dependent kinase 2</p>
</def>
</def-item>
<def-item>
<term>DL</term>
<def>
<p>deep learning</p>
</def>
</def-item>
<def-item>
<term>LBDD</term>
<def>
<p>ligand-based drug design</p>
</def>
</def-item>
<def-item>
<term>ML</term>
<def>
<p>machine learning</p>
</def>
</def-item>
<def-item>
<term>SBDD</term>
<def>
<p>structure-based drug design</p>
</def>
</def-item>
<def-item>
<term>SF</term>
<def>
<p>scoring function</p>
</def>
</def-item>
<def-item>
<term>SFS</term>
<def>
<p>scoring function space</p>
</def>
</def-item>
</def-list>
</glossary>
<sec id="s1">
<title>Declarations</title>
<sec>
<title>Author contributions</title>
<p>WFdAJ: Writing—original draft.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of interest</title>
<p>The author declares that he has no conflicts of interest.</p>
</sec>
<sec>
<title>Ethical approval</title>
<p>Not applicable.</p>
</sec>
<sec>
<title>Consent to participate</title>
<p>Not applicable.</p>
</sec>
<sec>
<title>Consent to publication</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of data and materials</title>
<p>Not applicable.</p>
</sec>
<sec>
<title>Funding</title>
<p>WFdAJ’s research is funded by CNPq (Brazil) [309029/2018-0; 306298/2022-8]. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</p>
</sec>
<sec>
<title>Copyright</title>
<p>© The Author(s) 2023.</p>
</sec>
</sec>
<ref-list>
<ref id="B1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<collab>ChatGPT Generative Pre-trained Transformer</collab>
<name>
<surname>Zhavoronkov</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Rapamycin in the context of Pascal’s wager: generative pre-trained transformer perspective</article-title>
<source>Oncoscience</source>
<year iso-8601-date="2022">2022</year>
<volume>9</volume>
<fpage>82</fpage>
<lpage>4</lpage>
<pub-id pub-id-type="doi">10.18632/oncoscience.571</pub-id><pub-id pub-id-type="pmid">36589923</pub-id><pub-id pub-id-type="pmcid">PMC9796173</pub-id></element-citation>
</ref>
<ref id="B2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vemula</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Jayasurya</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Sushmitha</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>YN</given-names>
</name>
<name>
<surname>Bhandari</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>CADD, AI and ML in drug discovery: a comprehensive review</article-title>
<source>Eur J Pharm Sci</source>
<year iso-8601-date="2023">2023</year>
<volume>181</volume>
<elocation-id>106324</elocation-id>
<pub-id pub-id-type="doi">10.1016/j.ejps.2022.106324</pub-id><pub-id pub-id-type="pmid">36347444</pub-id></element-citation>
</ref>
<ref id="B3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aparoy</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Reddy</surname>
<given-names>KK</given-names>
</name>
<name>
<surname>Reddanna</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Structure and ligand-based drug design strategies in the development of novel 5-LOX inhibitors</article-title>
<source>Curr Med Chem</source>
<year iso-8601-date="2012">2012</year>
<volume>19</volume>
<fpage>3763</fpage>
<lpage>78</lpage>
<pub-id pub-id-type="doi">10.2174/092986712801661112</pub-id><pub-id pub-id-type="pmid">22680930</pub-id><pub-id pub-id-type="pmcid">PMC3480706</pub-id></element-citation>
</ref>
<ref id="B4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Baek</surname>
<given-names>M</given-names>
</name>
<name>
<surname>DiMaio</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Anishchenko</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Dauparas</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ovchinnikov</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>GR</given-names>
</name>
<etal>et al.</etal>
</person-group>
<article-title>Accurate prediction of protein structures and interactions using a three-track neural network</article-title>
<source>Science</source>
<year iso-8601-date="2021">2021</year>
<volume>373</volume>
<fpage>871</fpage>
<lpage>6</lpage>
<pub-id pub-id-type="doi">10.1126/science.abj8754</pub-id><pub-id pub-id-type="pmid">34282049</pub-id><pub-id pub-id-type="pmcid">PMC7612213</pub-id></element-citation>
</ref>
<ref id="B5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>de Azevedo</surname>
<given-names>WF</given-names>
</name>
</person-group>
<article-title>Application of machine learning techniques for drug discovery</article-title>
<source>Curr Med Chem</source>
<year iso-8601-date="2021">2021</year>
<volume>28</volume>
<fpage>7805</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.2174/092986732838211207154549</pub-id><pub-id pub-id-type="pmid">34911417</pub-id></element-citation>
</ref>
<ref id="B6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Weber</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>MacKerell</surname>
<given-names>AD Jr</given-names>
</name>
</person-group>
<article-title>Computer-aided drug design: an update</article-title>
<source>Methods Mol Biol</source>
<year iso-8601-date="2023">2023</year>
<volume>2601</volume>
<fpage>123</fpage>
<lpage>52</lpage>
<pub-id pub-id-type="doi">10.1007/978-1-0716-2855-3_7</pub-id><pub-id pub-id-type="pmid">36445582</pub-id></element-citation>
</ref>
<ref id="B7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ross</surname>
<given-names>GA</given-names>
</name>
<name>
<surname>Morris</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Biggin</surname>
<given-names>PC</given-names>
</name>
</person-group>
<article-title>One size does not fit all: the limits of structure-based models in drug discovery</article-title>
<source>J Chem Theory Comput</source>
<year iso-8601-date="2013">2013</year>
<volume>9</volume>
<fpage>4266</fpage>
<lpage>74</lpage>
<pub-id pub-id-type="doi">10.1021/ct4004228</pub-id><pub-id pub-id-type="pmid">24124403</pub-id><pub-id pub-id-type="pmcid">PMC3793897</pub-id></element-citation>
</ref>
<ref id="B8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Walsh</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Fishman</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Garcia-Gasulla</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Titma</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Pollastri</surname>
<given-names>G</given-names>
</name>
<collab>ELIXIR Machine Learning Focus Group</collab>
<name>
<surname>Harrow</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Psomopoulos</surname>
<given-names>FE</given-names>
</name>
<name>
<surname>Tosatto</surname>
<given-names>SCE</given-names>
</name>
</person-group>
<article-title>DOME: recommendations for supervised machine learning validation in biology</article-title>
<source>Nat Methods</source>
<year iso-8601-date="2021">2021</year>
<volume>18</volume>
<fpage>1122</fpage>
<lpage>7</lpage>
<comment>Erratum in: Nat Methods. 2021;18:1409–10.</comment>
<pub-id pub-id-type="doi">10.1038/s41592-021-01205-4</pub-id><pub-id pub-id-type="pmid">34316068</pub-id></element-citation>
</ref>
<ref id="B9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xavier</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Heck</surname>
<given-names>GS</given-names>
</name>
<name>
<surname>Avila</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Levin</surname>
<given-names>NMB</given-names>
</name>
<name>
<surname>Pintro</surname>
<given-names>VO</given-names>
</name>
<name>
<surname>Carvalho</surname>
<given-names>NL</given-names>
</name>
<etal>et al.</etal>
</person-group>
<article-title>SAnDReS a computational tool for statistical analysis of docking results and development of scoring functions</article-title>
<source>Comb Chem High Throughput Screen</source>
<year iso-8601-date="2016">2016</year>
<volume>19</volume>
<fpage>801</fpage>
<lpage>12</lpage>
<pub-id pub-id-type="doi">10.2174/1386207319666160927111347</pub-id><pub-id pub-id-type="pmid">27686428</pub-id></element-citation>
</ref>
<ref id="B10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bitencourt-Ferreira</surname>
<given-names>G</given-names>
</name>
<name>
<surname>de Azevedo</surname>
<given-names>WF Jr</given-names>
</name>
</person-group>
<article-title>SAnDReS: a computational tool for docking</article-title>
<source>Methods Mol Biol</source>
<year iso-8601-date="2019">2019</year>
<volume>2053</volume>
<fpage>51</fpage>
<lpage>65</lpage>
<pub-id pub-id-type="doi">10.1007/978-1-4939-9752-7_4</pub-id><pub-id pub-id-type="pmid">31452098</pub-id></element-citation>
</ref>
</ref-list>
</back>
</article>