In Silico Discovery of Natural Products Against Dengue RNA-Dependent RNA Polymerase Drug Target

The viral infection caused by the dengue virus (DENV) is one of the most challenging diseases in the tropical regions of the world. The absence of drugs for dengue to this date calls for intense efforts to discover and develop the much coveted therapeutics for this mosquito-borne disease. One of the most attractive antiviral targets is the DENV RNAdependent RNA polymerase (RdRp), which catalyzes the de novo initiation as well as elongation of the flavivirus RNA genome. In this work, almost 5000 natural products were docked to DENV RdRp. The top 197 molecules with greater binding energies than the known ligand of the target were further clustered down to furnish 35 classes of molecular structures. These compounds with satisfactory predicted drug properties and with known natural origin can be further explored to pave the way for the first anti-dengue drug.


Introduction
Mosquito-borne viruses or flaviviruses of the Flaviviridae family are composed of pathogens that cause major public health concerns [1,2]. One of the medically important flaviviruses is Dengue Virus (DENV) for which the only available strategy to date to address this health problem is to control if not eradicate the mosquito (Aedes spp.) vector [2]. Although appropriate vector control is effective, its successful implementation has been difficult due to political, geographical, and logistical considerations. Although vaccination is also an effective strategy against the disease, as exemplified by the success of the vaccine against the flavivirus Yellow Fever Virus (YFV), the just recently marketed vaccine (Dengvaxia) has poor efficacy in dengue naïve individuals and even make future dengue infections more severe [3]. The development of vaccines against dengue is complicated by the emergence of four serotypes (DENV1, DENV2, DENV3, DENV4) and the lack of a suitable animal model for dengue disease [1][2][3][4]. Infection with one of the serotypes can lead to influenza-like symptoms, which may progress to the more severe dengue hemorrhagic fever and dengue shock syndrome, or even death especially when the disease area lacks approved therapies [5,6]. DENV infections have gone to a collective estimate of 50-100 million cases annually and spread over a wide geographic distribution especially in tropical and subtropical regions [3,4,7]. The Association of Southeast Asian Nations (ASEAN) member-countries have the highest number of dengue incidence in the Asia-Pacific region, accounting for 75 percent of the dengue cases worldwide [8,9].
Recent developments in research on dengue mitigation argue in favor of antiviral chemotherapy. It has been shown that high viral load is highly correlated with the development of the more severe, life-threatening form of the disease. Thus, a drug able to reduce viral load at an early stage would potentially prevent the proliferation of the disease. In endemic outbreaks, prophylactic mass treatment around index cases is essential. Decreasing viral infection in humans should result in a decrease in the number of infected vectors and thus interruption of the transmission chain. Therefore, an anti-DENV drug, delivered at the outset of the disease, might not only save lives but also curb potential epidemics. Moreover, a readily available drug would allow a rapid response in the case of a sudden outbreak, and unlike prospective vaccines, on-theshelf drugs do not require cold storage, a distinct benefit for use in impoverished areas in the country. Moreover, with the use of computational techniques, it is now possible in the early stage of drug discovery to focus on molecules that are inexpensive and readily accessible such as those that can be obtained from natural sources.
DENV has a single-stranded, ~11 kb, positive-sense RNA genome [10] that plays an active role in providing RNA signals that act as regulators of the viral replication process [11]. It encodes a single long open reading frame (ORF), lined by three structural and seven non-structural proteins. The structural proteins include the capsid, membrane, and envelope proteins while the nonstructural proteins are NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5 [12]. In flaviviruses, success in viral replication is credited to these nonstructural proteins and among them, nonstructural protein 5 (NS5) is the largest and most conserved [13][14][15]. DENV NS5 contains 900 amino acids and is bifunctional [16]. It is a large oligomer with a methyltransferase (MTase) domain at the N-terminal and an RNA-dependent RNA polymerase (RdRp) domain at its Cterminal that play key enzymatic roles such as catalyzing 5'-RNA methylation and RNA synthesis, respectively [14,17,18]. These enzymes are encoded in a single multifunctional protein and are found as separate domains within a polypeptide [15]. One of the most attractive antiviral targets is the DENV RdRp, which catalyzes de novo initiation as well as elongation of the flavivirus RNA genome [17,19]. Viral polymerases, such as RdRp, are clinically-proven therapeutic targets. Also, RdRp is the most conserved viral protein among all four serotypes of DENV [19], as well as among other flaviviruses such as the West Nile Virus (WNV) and Yellow Fever Virus [16], making DENV RdRp an important target for anti-dengue design. Crystal structures of the RdRp domain and full-length NS5 have been determined [14][15][16]. Its architecture assumes a right hand-like configuration consisting of the "fingers", "palm", and "thumb" subdomains -a formation recognized as characteristic of polymerase [16,[20][21][22] and the palm domain structure is particularly conserved in all RNA and DNA polymerases [23]. In fact, several studies have been performed to search for DENV RdRp inhibitors [24][25][26][27][28][29].
Natural products play a dominant role in the discovery of leads for the development of chemical therapeutics by serving as constant sources of bioactive lead compounds. In the area of cancer, for example, 49% of the 175 new drugs introduced from the years 1940s through 2014 had been either isolated from natural sources or directly derived from them [30]. Meanwhile, molecular docking has become an increasingly important tool for drug discovery. Compared to traditional experimental high-throughput screening, virtual screening, such as with molecular docking, is an inexpensive, fast, direct, and rational drug discovery approach. We have, for example, demonstrated the utility of computer-aided techniques in the identification of potential leads for tuberculosis [31][32][33][34][35][36][37]. In this study, over 4000 natural products from ChemDiv Natural-Product-Based Library and over 800 in-house natural products from the Philippines were docked to dengue RdRp. The top hits were further evaluated in silico for their pharmacokinetics and pharmacodynamics properties.

Materials and Methods
All computations were performed with the use of Discovery Studio (DS) v4.5 [38] running on Windows 7 operating system in a machine with an Intel® CoreTM i7-3770 and 3.40GHz quad-core processor.

Preparation of the crystal structure of Dengue NS5 RdRp
The crystal data of the protein structure was obtained from the Protein Data Bank (www.rcsb.org) with the accession code 5F3T (Figure 1). This particular protein entry is the crystal structure, solved at 2.05 Å-resolution, of the dengue virus RdRp, which co-crystallized with an inhibitor called JF-31-MG (JF31), which was identified through fragment-based screening [19]. The structure was cleaned up in preparation for succeeding processes. The preparation step included insertion of missing atoms in residues and removal of alternate conformations, removal of water and ligand molecules, insertion of missing loop regions based on their SEQRES data, optimization of loops regions with the LOOPER algorithm, minimization of the loop regions, and protonation of the structure. The location of the pocket containing the co-crystal ligand (JF31) was the basis in defining the binding site. The protein is represented by a solid ribbon, colored blue to red from the N terminal to the C terminal, respectively.

Natural compounds libraries
Two sets of libraries were prepared and screened for this study. The first set was composed of 847 natural compounds that originated from Philippine plants and microorganisms. They were gathered from literature and sketched using Accelrys Draw and the files were saved in .mol format. The second set consisted of over 4,000 natural compounds from the ChemDiv Natural-Product-Based Library (http://www.chemdiv.com). The Ligand Preparation protocol in DS prepared the ligands for input into succeeding protocols by performing tasks such as removing duplicates, enumerating isomers and tautomers, and generating 3D conformations. Default parameters were applied except the Lipinski filter, which was turned off. Lipinski filter was not applied in this study based on several accounts, which showed that some compounds have become successful drug candidates, including natural compounds, despite non-compliance to Lipinski's Rule of Five [39,40].

Virtual Screening via Molecular Docking
The pre-prepared protein model was defined as the receptor and the position of JF-31-MG46 as determined in the crystal structure was defined as the binding site in the docking experiments.
To verify the protein model, JF-31-MG46 was docked and the theoretical receptor-ligand complex binding free energy (ΔG) was calculated. The values obtained for the 5F3T/JF-31-MG46 complex was considered as the baseline in this virtual screening work.
The CDOCKER in DS employs the CHARMm force field [41,42] and uses a grid-based molecular docking that allows the docking of multiple ligands against a single protein receptor. It holds the receptor rigid while the ligands are allowed to be flexible during the refinement. The random ligand conformations are generated from the initial ligand structure through high-temperature molecular dynamics, which are followed by random rotations to simulate annealing and full force field minimization.
The natural product compounds were docked into the receptor model and were ranked based on the calculated binding energy of the ligand-target complex. The compounds with greater (more negative) binding energies than that of JF-31-MG46, when in complex with the receptor, were selected as the top hits.

In Silico Prediction of Pharmacokinetics and Toxicity Properties
The ADMET protocol in DS was applied to predict the human intestinal absorption, aqueous solubility, cytochrome P450 2D6 inhibition, hepatotoxicity, and plasma protein binding. The ADMET protocol of DS uses rule-based models to predict the ADMET (absorption, distribution, metabolism, excretion, toxicity) characteristics of a compound based on previously known information. The human intestinal absorption (HIA) model was developed using standard multivariate method involving partition coefficient (AlogP) and polar surface area (PSA) descriptors from 234 training compounds [43]. The aqueous solubility was obtained using multiple linear regression method on a training set of 775 compounds with experimentally measured solubilities in water at 25 o C [44]. The plasma protein binding potential was calculated based on a model developed using AlogP98 and 1D similarity predictors [45]. CYP2D6 predicts the ability of a molecule to bind to the cytochrome P450 2D6 enzyme based on inputted 2D chemical structure. The CYP2D6 model was developed with the use of decision tree technique with a training set consisting of 100 compounds with known CYP2D6 inhibition data [46]. Moreover, the hepatotoxicity model predicts the human hepatotoxicity based on a recursive partitioning approach involving 382 training compounds with known information on liver toxicity in humans [47].
Moreover, the TOPKAT (TOxicity Prediction by Komputer Assisted Technology) protocol was performed to assess toxicity measures such as Ames Mutagenicity, Weight of Evidence Carcinogenicity, and Developmental Toxicity Potential. TOPKAT employs cross-validated Quantitative Structure Toxicity Relationship (QSTR) models in predicting various toxic endpoints (https://www.3ds.com). The TOPKAT models were developed from topology-based structural descriptors by using multiple linear regression method. Furthermore, Optimum Prediction Space (OPS) validation method is utilized to assist in interpreting the results. Low confidence is assigned to a toxicity prediction that lies outside the OPS [48].

Clustering of Top Hits.
The list of top hit compounds was clustered to further reduce the number of compounds. Firstly, the Calculate Molecular Properties protocol was performed to assign molecular fingerprints of the top hits, applying FCFP_20, which uses functional classes (e.g., H-bond acceptors, H-bond donors, positive ionizable groups, negative ionizable groups, halogens, and aromatic groups) as its atom abstraction method. This generates extended-connectivity fingerprints, which characterizes the features of a molecule. Subsequently, the Cluster Ligands protocol was run to group the top hits with similar features. This is done because structurally similar molecules would most likely exhibit similar properties. Molecules, or cluster centers, were obtained by assigning a compound that would best represent a particular group in terms of similarity particularly the Tanimoto distance measure [49].

Molecular Docking
Molecular docking aims to predict the structure of ligand-receptor complexes using computational methods that involves a flexible ligand and a rigid receptor. The docked complexes are evaluated based on their binding energies (BE). The results of the docking of the known ligand JF-31-MG46 into the prepared and optimized RdRp receptor model were consistent with the documented interactions that contribute to the stability of the complex in vitro [19]. Specifically, the model was able to simulate the interactions with the following residues: C709, S710, R729, R737, M761, Y766, H798, Y803 (Figure 2). Table 1 shows the free energy change (ΔG) for the best pose for each of the 35 representative ligands and their ADMET predictions. The ΔG values represent the first filter in the selection of putative inhibitors among the compounds studied. The 35 ligands (Figure 3) represent the 197 top hits, which exhibited more negative binding energies than the known inhibitor, JF-31-MG46. Out of these, almost half (17) were reported to have been isolated from Philippine natural sources, the remaining half came from ChemDiv database of natural products. Eleven of the 17 natural products from the Philippines were isolated from plants, 1 from bacteria, 1 from fungi, and 4 were obtained from marine organisms. The top 4 hits, namely, (E)-3-Acetyl-4-(3,7-dimethylocta-2,6dienyloxy)-2,6-dihydroxybenzoic acid [50], (-)-5-methoxybalnophonin [51], 3′-formyl-2′,4′,6′trihydroxy-5′-methyldihydrochalcone [52], and bruceantin [53] originated from plants species.   The molecular structures shown here are those obtained after ligand preparation, and may be slightly different from the original source as a result of isomerism or change in protonation state.

Clustering of Top Hits.
By clustering the 197 molecules into classes or clusters, each molecule is assigned into a group such that molecules within a cluster share similar properties. It is based on the root-mean-square (RMS) difference of descriptor properties or Tanimoto distance for fingerprints. As a result, the top 197 hits were narrowed down to 35 cluster representatives shown in Figure 3.

ADME-Tox Prediction for Top Hits.
Most drug candidates fail in clinical trials due to poor ADME properties. Therefore, it is important to avoid the compounds with poor druglike properties even at the outset of any drug discovery endeavor. Optimizing these properties during early drug discovery is crucial for reducing ADMET problems later in the development process. Thus, to make the drug discovery process more time-and cost-efficient, an additional filter was applied in silico to predict the Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties of the bioactive compounds based solely on chemical structures. The ADMETox profiles predicted for the 35 top hits are also summarized in Table 1. The results showed that the 35 top hits, having gone through the ADMETox filter were indeed predicted to have no serious toxicity issues and that they possess the desirable drug properties; except for a few with high probability of being mutagen (i.e. 12, 15, 16 and 31), carcinogen (i.e. 11,17, 26, 28, and 33), and toxic to fetus (i.e. 6, 19, 23, 24 and 32). The rest have no predicted toxicity problems, i.e. with optimal solubility, good intestinal absorption (except 4, 7 and 22), are non-hepatotoxic (except 15, 20, 25, and 35), noninhibitor of Cytochrome P450 (CYP 2D6), and majority of them have low plasma protein binding potential. These natural compounds can now be pursued as potential leads in the discovery of the first anti-dengue drug.

Conclusion
Virtual screening by molecular docking of almost 5000 natural products against dengue RNAdependent RNA polymerase was performed in order to find promising leads for dengue. The top 197 molecules were further clustered down to furnish 35 distinct molecular architectures that can be further elaborated in silico, subsequently synthesized, and tested experimentally. These compounds with satisfactory predicted drug properties and with known natural origin are attractive subjects for further exploration to secure the first ever anti-dengue drug.