Prediction of pEC50(M) and Molecular Docking Study for the Selective Inhibition of Arachidonate 5-Lipoxygenase

Arachidonate 5-lipoxygenase (ALOX5) is considered a prime target for drug discovery in liver fibrosis, rheumatoid arthritis, atherosclerosis, cancer and asthma. To date, the lead rate in the discovery of drugs that inhibit ALOX5 for the treatment of these diseases has not been satisfactory, so the development of potent and effective ALOX5-targeted drugs is desirable. In this regard, Quantitative Structure-Activity Relationship (QSAR) modelling and molecular docking can play a major role in screening and designing drugs. In this work, 3D-QSAR models for pEC50(M) were built using Multiple Linear Regression (MLR) and Partial Least Squares (PLS) on a diverse dataset of 112 molecules. The 'Index of Ideality of Correlation (IIC)' technique was also investigated to generate an optimal descriptor derived from the SMILES molecular structure. The effect of the number and nature of the descriptors on the models was analyzed. The models can help to provide better directions for the development of novel drugs targeting 5-lipoxygenase. A significant improvement in the stability of the model was observed on incorporating the optimal descriptor. The molecular docking results showed that the 112 ligands bind well to the ALOX5 receptor, with the lowest binding energy being -10.8 kcal/mol. To validate the binding modes of the ligands docked with AutoDock Vina, the top-scored compounds were re-docked using the DockThor online docking server. The docking results suggest that the ligands with IDs 18, 20, 24, 30 and 44 are potential inhibitors of ALOX5.


5-Lipoxygenase (5-LO or ALOX5) is an important enzyme that helps to produce proinflammatory mediators such as leukotriene B4 and cysteinyl leukotrienes. Inhibition of ALOX5 can therefore be a potential remedial approach for inflammation. At present, the orally active inhibitor of ALOX5 is Zileuton, introduced by Abbott Laboratories (1996) and also associated with the brand names Zyflo and Zyflo CR. However, this drug is reported to cause liver disease or liver toxicity [1]. The discovery of potent drugs that inhibit ALOX5 without harmful side effects such as liver toxicity is a challenging task.
QSAR is an important chemometric method that is widely used in virtual screening to discover new leads [2][3][4][5][6]. A number of high-quality research papers based on QSAR studies are available in the literature [7][8][9][10][11][12]. A comprehensive study of ALOX5 is necessary for the discovery of potent drugs without side or toxic effects. The literature review revealed that this target is only now gaining attention and momentum. Five good QSAR models for developing benzoquinone derivatives as ALOX5 inhibitors using CoMFA, RF, MLR and SVM were described in [13]. Another group of authors used QSAR models to indicate the significance of the chemical characteristics for ALOX5 inhibition by a series of coumarin derivatives [14]. The binding affinity and interactions with the active sites of human 5-LOX were estimated by a molecular docking study. Computational methods were implemented to identify 5-LOX inhibitors and to screen chalcone and flavone derivatives [15]. The potential hits from grid-based ligand docking with energetics were re-docked using Genetic Optimization. 3D-QSAR models were developed using the IC50 values of 51 compounds for analyzing the biological activities of 5-LOX inhibitors, with $R^2$ > 0.75 [16]. A QSAR study on EC50 values for ALOX5 inhibition has not been reported in the literature until now.
doi: https://doi.org/10.15407/ubj93.06.101
In this communication, QSAR models were built using MLR and the simplified molecular input line entry system (SMILES). In our recent publication, the prediction of pEC50 values for the adenosine A2A receptor was reported, where the incorporation of the optimal descriptor gave better performance [17]. In this article, a combination of 2D, 3D and optimal descriptors is evaluated for the prediction of pEC50 values for the ALOX5 receptor. Molecular docking has also been performed to find the active binding sites available in the ALOX5 receptor.

Molecules, Software Codes and Methods
Data set preparation and data reduction. The dataset contained EC50 (nM) values of 112 different inhibitor compounds for the ALOX5 receptor, taken from the Binding Database [18]. The EC50 values were converted to their pEC50 equivalents (negative decimal logarithm of EC50, with EC50 expressed in mol/L). Spartan-10 and OpenBabel were used to generate the SMILES and MDL (.mol) chemical structures for these molecules from the SDF (structure data file).
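The unit conversion described above can be sketched as follows. This is a minimal illustration, assuming the EC50 values are stored in nM as stated; the function name and the example values are hypothetical and are not taken from the dataset.

```python
import math

def pec50_from_ec50_nm(ec50_nm: float) -> float:
    """Convert an EC50 given in nM to pEC50 = -log10(EC50 in mol/L)."""
    if ec50_nm <= 0:
        raise ValueError("EC50 must be positive")
    ec50_molar = ec50_nm * 1e-9  # nM -> M
    return -math.log10(ec50_molar)

# Illustrative values only (hypothetical, not from the 112-compound dataset)
for ec50 in (1.0, 100.0, 2500.0):
    print(f"EC50 = {ec50:8.1f} nM -> pEC50 = {pec50_from_ec50_nm(ec50):.3f}")
```

Equivalently, pEC50 = 9 - log10(EC50 in nM), so a ten-fold drop in EC50 raises pEC50 by one unit.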
Descriptors. PaDEL_2.18 was used to obtain more than 900 molecular 2D and 3D descriptors. CORAL software [19][20][21] was used to generate the optimal descriptors based on SMILES. Preliminary scans were performed using 100 descriptors at a time to identify the descriptors that correlate strongly with the pEC50 values for ALOX5.
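A preliminary scan of this kind can be sketched as a ranking of one batch of descriptors by their absolute Pearson correlation with pEC50. The scans in this work were run in the software named above; the function below, its name, and the synthetic data are assumptions used only to illustrate the idea.

```python
import numpy as np

def rank_descriptors_by_correlation(X, y, names, top_k=5):
    """Return the top_k (name, |r|) pairs by absolute Pearson correlation with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant descriptors have zero variance; assign r = 0 instead of NaN
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    order = np.argsort(-np.abs(r))[:top_k]
    return [(names[i], float(abs(r[i]))) for i in order]

# Synthetic batch: 112 molecules x 100 descriptors (hypothetical data)
rng = np.random.default_rng(0)
X_batch = rng.normal(size=(112, 100))
y_batch = 2.0 * X_batch[:, 3] + rng.normal(scale=0.1, size=112)
names_batch = [f"d{i}" for i in range(100)]
top = rank_descriptors_by_correlation(X_batch, y_batch, names_batch)
print(top[0])
```

Descriptors surviving each batch could then be pooled and passed to the model-building step.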
Optimal descriptor and Index of Ideality of Correlation (IIC). The application of the 'Index of Ideality of Correlation (IIC)' in QSAR/QSPR is elaborated in the literature [27][28][29][30][31]. The IIC is derived from the distribution of points in the plot of experimental versus predicted pEC50 values [32]. The optimal descriptor (DCW) is obtained from the mathematical relationships of Eqs. 1 and 2, where HARD, $S_k$, $SS_k$ and $SSS_k$ are parts of the SMILES code of each molecule [31,33,34]; the calculation of $IIC_{CB}$ is presented elsewhere [28][29][30][31]. The calculation of DCW for the SMILES molecular structure Clc4cc(CCc3c(c(=O)OCC)c1c(c2ccccc2c(O)c1)[nH]3)ccc4 (ID-1) is supplied in Table 6S of the Supplementary file. Each SMILES attribute has an individual correlation weight (CW): for the attributes $SA_k$, for example, Cl... = 0.3084 and c.. = -0.0554; similarly, for the attributes $SA_{kk}$, c...Cl...... = 0.3995, and so on. For the full SMILES code, the DCW calculated using Eq. 1 was 4.54963. The extrapolative QSAR model for the pEC50 of ALOX5 inhibitors can be expressed as a simple linear equation in DCW (Eq. 8). Eq. 8 has been used to develop the QSAR model described in section 3.2, and the optimal parameter DCW has also been used to develop the hybrid QSAR model detailed in section 3.3.
Molecular docking. Molecular docking is one of the most prominent techniques currently used for the rational design of drugs. Docking models the binding of a drug to a protein by determining the active binding sites. Since molecular docking helps to explain the biological activities of drug candidates, it has become a widely used technique for discovering effective drugs for a variety of diseases. In this article, the therapeutic potential of the above-mentioned 112 inhibitors against ALOX5 has been analyzed through the docking process.
Molecular docking was carried out to estimate the binding energies and site interactions, and thereby to evaluate the inhibition potential of the ligands against ALOX5.
The .pdb files for the ligands were prepared using OpenBabel (ver. 2.3.2). The crystallographic structure of ALOX5 was taken from the Protein Data Bank (PDB code 3O8Y). For docking, the protein structure was prepared by removing all water molecules and adding polar hydrogens and Kollman charges. Molecular docking was executed with AutoDock Vina [38] and AutoDock Tools. The docked structures and the interactions of the ligands with the protein residues in the active sites were analyzed with Discovery Studio Visualizer.
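The water-removal step of the receptor preparation can be illustrated with a few lines of plain Python that filter a PDB file by residue name. This is only a sketch of that single step, assuming standard fixed-column PDB records (residue name in columns 18-20); the function name and the two sample records are hypothetical, and the addition of polar hydrogens and Kollman charges is not covered here.

```python
def strip_waters(pdb_text: str) -> str:
    """Drop ATOM/HETATM records whose residue name is HOH or WAT."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()      # record type, columns 1-6
        resname = line[17:20].strip()  # residue name, columns 18-20
        if record in ("ATOM", "HETATM") and resname in ("HOH", "WAT"):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"

# Two hypothetical records: one protein atom and one crystallographic water
sample_pdb = (
    "ATOM      1  N   MET A  22\n"
    "HETATM 5000  O   HOH A 701\n"
)
print(strip_waters(sample_pdb), end="")
```

Only the coordinate records of water residues are removed; header, connectivity and ligand records pass through unchanged.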

QSAR Models for ALOX5 Inhibition
The QSARINS (QSAR-Insubria) software [22], developed at the University of Insubria, was applied to build robust QSAR models. Since more than 900 descriptors were available and using all of them simultaneously to build the model requires heavy computational time, preliminary scans were performed using 100 descriptors at a time to identify potential descriptors for the 5-LOX receptor. Finally, a descriptor set of 33 molecules with their pEC50(M) values was chosen to construct QSAR models using the genetic algorithm (GA). The dataset of 112 ligands for the ALOX5 receptor was randomly distributed into four sets (training, invisible training, calibration and validation), because the robustness of a model depends on the quality of the training set, and the loss of performance caused by overtraining can be significantly reduced by the calibration set [23][24][25]. Once model building was complete, the performance of the QSAR models was further verified with an external validation set. A linear regression model was formed between the response variable pEC50 and the descriptors using the ordinary least squares (OLS) method. The models were ranked according to their $R^2$, $Q^2$, $R^2 - Q^2$ and RMSE values. Internal and external validation methods and principal component analysis were used to verify the robustness and predictive ability of the constructed models. The model with the minimum value of $R^2 - Q^2$ was considered the most stable.
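The ranking statistics named above can be sketched with NumPy. The models in this work were built in QSARINS; the code below is an independent, minimal illustration on synthetic data, assuming $Q^2$ is the leave-one-out value $1 - PRESS/SS_{tot}$. The function name and the data are hypothetical.

```python
import numpy as np

def ols_with_loo(X, y):
    """Fit y = b0 + X.b by ordinary least squares; report R2, Q2 (LOO) and RMSE."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])   # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / ss_tot
    # Leave-one-out residuals without refitting: e_i / (1 - h_ii)
    h = np.diag(A @ np.linalg.pinv(A))
    press = np.sum((resid / (1.0 - h)) ** 2)
    q2_loo = 1.0 - press / ss_tot
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    return {"coef": beta, "r2": float(r2), "q2_loo": float(q2_loo), "rmse": rmse}

# Synthetic example: 80 molecules, 3 descriptors (hypothetical data)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(80, 3))
y_demo = 6.0 + X_demo @ np.array([0.5, -0.3, 0.8]) + rng.normal(scale=0.2, size=80)
stats = ols_with_loo(X_demo, y_demo)
print({k: v for k, v in stats.items() if k != "coef"})
```

The gap `stats["r2"] - stats["q2_loo"]` is the stability criterion used above: the smaller the gap, the less the fit depends on any single molecule.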
2D-QSAR models. The semi-empirical (AM1) quantum chemical method was used for the geometry optimization of the molecules. After optimization, mathematical models were developed from descriptors showing a good correlation with the endpoint.
The 2D descriptors MATS3c, SpMax1_Bhp and ATSC6s showed excellent correlation with the experimental pEC50 values for ALOX5. The model built with these descriptors is described by Eq. 9.
Internal and external validation parameters for these models and their performance on the validation set are presented in Tables 1, 2 and 3. For Eq. 9, $R^2$ is 0.7340 for the training set and 0.9481 for the validation set. For a robust model, the average $R^2_m$ should be greater than 0.5 and $\Delta R^2_m$ should be lower than 0.2 [26], whereas $k$ and $k'$ should lie in the range 0.85 to 1.15. For the above model (Eq. 9), the values of average $R^2_m$, $\Delta R^2_m$, $k$ and $k'$ are within the required ranges. The model therefore meets the required internal and external validation criteria, which justifies describing it as robust.
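The criteria above can be checked with a short function. The text does not spell out the formulas, so the sketch below assumes the commonly used definitions: $k$ and $k'$ as slopes of the regressions through the origin, and $R^2_m = r^2\,(1 - \sqrt{|r^2 - r_0^2|})$ evaluated for both axis orientations. The function name and the example values are hypothetical.

```python
import numpy as np

def external_validation_metrics(y_obs, y_pred):
    """k, k', average Rm2 and delta Rm2 under the assumed standard definitions."""
    y = np.asarray(y_obs, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    r2 = np.corrcoef(y, p)[0, 1] ** 2
    k = np.sum(y * p) / np.sum(p ** 2)         # observed vs predicted, through origin
    k_prime = np.sum(y * p) / np.sum(y ** 2)   # predicted vs observed, through origin
    r0_2 = 1.0 - np.sum((y - k * p) ** 2) / np.sum((y - y.mean()) ** 2)
    r0_2_prime = 1.0 - np.sum((p - k_prime * y) ** 2) / np.sum((p - p.mean()) ** 2)
    rm2 = r2 * (1.0 - np.sqrt(abs(r2 - r0_2)))
    rm2_prime = r2 * (1.0 - np.sqrt(abs(r2 - r0_2_prime)))
    return {
        "k": float(k),
        "k_prime": float(k_prime),
        "avg_rm2": float((rm2 + rm2_prime) / 2.0),
        "delta_rm2": float(abs(rm2 - rm2_prime)),
    }

# Hypothetical observed and predicted pEC50 values
obs = [5.1, 6.0, 6.8, 7.4, 8.2]
pred = [5.3, 5.9, 6.9, 7.2, 8.1]
m = external_validation_metrics(obs, pred)
passes = m["avg_rm2"] > 0.5 and m["delta_rm2"] < 0.2 and 0.85 <= m["k"] <= 1.15 and 0.85 <= m["k_prime"] <= 1.15
print(m, passes)
```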
Single optimal descriptor-based QSAR models. The model based on the optimal descriptor defined in section 2.3 is described by Eq. 10. For this model, the optimal descriptor DCW was first determined from the SMILES attributes (SAs) of the molecules. The calculation of this type of descriptor is described in the literature [23,34,35,36]. This model displayed very good statistical parameters (Tables 1-3). However, statistical parameters such as $R^2$ (test set) and $Q^2$ (test set) alone are not sufficient to describe a model as robust; another validation characteristic, $^cR^2_p$, is also needed [37]. For good models, the value of $^cR^2_p$ should be greater than 0.5. For the model defined by Eq. 10, $^cR^2_p$ was found to be 0.8718 (training), 0.9443 (validation) and 0.7971 (test set).

$$\mathrm{pEC50} = 6.7556863\,(\pm 0.0067206) + 0.0755563\,(\pm 0.0003592) \times \mathrm{DCW}(1,7) \tag{10}$$

The plot of the experimental and predicted pEC50 values obtained with Eq. 10 is given in Fig. 1.
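As a worked example, Eq. 10 can be evaluated directly. The coefficients are those of Eq. 10; using the DCW value 4.54963 reported for molecule ID-1 as the DCW(1,7) input is an assumption made only for illustration, and the function name is hypothetical.

```python
def predict_pec50_eq10(dcw: float) -> float:
    """Point prediction from Eq. 10 (coefficient uncertainties are ignored)."""
    intercept = 6.7556863
    slope = 0.0755563
    return intercept + slope * dcw

# DCW of ID-1 as reported in the text; assumed here to be DCW(1,7)
dcw_id1 = 4.54963
pec50_id1 = predict_pec50_eq10(dcw_id1)
print(f"predicted pEC50 = {pec50_id1:.4f}")
```

Under that assumption the prediction is about 7.10, that is, an EC50 in the range of tens of nM.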
Hybrid QSAR models (2D, 3D and DCW). Some models were built using 2D descriptors, 3D descriptors and the optimal descriptor introduced in the previous section. Fig. 2(a) shows the $R^2$ and $Q^2$ values for the training dataset without the optimal descriptor. The figure shows that, as the number of variables increases, $R^2$ and $Q^2$ rise up to six variables, after which a decrease in $Q^2$ is detected. The maximum $R^2$ (training set) for the three-variable QSAR model (Eq. 9) is 0.7340, with $R^2_{ext}$ = 0.9481. Increasing the number of descriptors beyond three gives a marginal increase in $R^2$ (training set) with a significant reduction in the internal and external validation characteristics. Such behaviour is caused by overtraining on the dataset. Therefore, the optimal descriptor was introduced into the model. The graph of $R^2$ and $Q^2$ with the optimal descriptor is presented in Fig. 2(b), which shows a significant enhancement in the $R^2$ and $Q^2$ values. Principal Component Analysis (PCA) of the descriptors Eaq (kJ/mol), XLogP, $E_{HOMO}$ (kJ/mol), MATS6c and DCW(1,7) was studied by means of score plots and loading plots. Fig. 3 is the score plot for the descriptors Eaq (kJ/mol), DCW(1,7), XLogP and MATS6c, which form the hybrid model (Eq. 11). It clearly shows that the molecule with ID = 77 is an outlier. Similarly, the PCA loading plot for the same four descriptors of Eq. 11 is given in Fig. 4.
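Score and loading plots of the kind described above can be derived from a singular value decomposition of the autoscaled descriptor matrix. The sketch below is a generic PCA on synthetic data, not a reproduction of Figs. 3 and 4; the function name, the scaling choice and the data are assumptions.

```python
import numpy as np

def pca_scores_loadings(X, n_components=2):
    """PCA via SVD on autoscaled data; returns scores, loadings, explained variance ratio."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # mean-centre and unit variance
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]     # one row per molecule
    loadings = Vt[:n_components].T                      # one row per descriptor
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, loadings, explained[:n_components]

# Synthetic matrix: 30 molecules x 4 descriptors (hypothetical data)
rng = np.random.default_rng(2)
X_pca = rng.normal(size=(30, 4))
X_pca[0] += 6.0   # one molecule pushed far from the rest
scores, loadings, explained = pca_scores_loadings(X_pca)
farthest = int(np.argmax(np.linalg.norm(scores, axis=1)))
print("molecule farthest from the origin of the score plot:", farthest)
```

A molecule lying far from the bulk of the points in the score plot is a candidate outlier, which is how the score plot is read in the text.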
One of the robust QSAR models built with DCW(1,7) as one of the descriptors is defined as follows:

$$\mathrm{pEC50} = 5.7771 + 0.0001\,\mathrm{Eaq\,(kJ/mol)} + 0.0727\,\mathrm{DCW}(1,7) - 0.0642\,\mathrm{XLogP} + 0.8800\,\mathrm{MATS6c} \tag{11}$$

For this model (Eq. 11), $R^2$ is 0.8681 for the training set and 0.9762 for the validation set. Tables 1, 2 and 3 show the internal and external validation parameter values for this model. The values of average $R^2_m$, $\Delta R^2_m$, $k$ and $k'$ are within the required ranges, confirming the robustness of this hybrid model. The value of $R^2$ indicates a good fit for modelling ALOX5 inhibition. The LOF is very small, which indicates that there is no overfitting. The low value of Kxx indicates that the correlation between the model descriptors is very low, so the descriptors carry little redundant information. Fig. 5(a) compares the predicted and experimental pEC50 values for the training dataset.

Fig. 3. Score plot for model 8 (Eq. 11)

Model Validation According to OECD Principles
OECD principles were used to ascertain the quality of the QSAR models proposed in this work. According to these principles, the models should have a defined endpoint. The endpoint for the described models is pEC50. The second principle says that models should be represented using a definite algorithm that can derive a proper relationship between the descriptors and the endpoint. The algorithms used to obtain such a relationship here are MLR and OLS. The third principle requires a defined applicability domain: predictions are considered reliable for compounds with leverage values below the critical leverage and standardized residuals within ±3 standard deviations. The Williams plot was used to represent the applicability domain (AD) of the models. According to the fourth principle, the difference between the experimental and predicted values should be minimal. The difference between the experimental values and the values predicted by the models was very low. The goodness of fit of the models was measured with the coefficient of determination ($R^2$) and the adjusted $R^2$ ($R^2_{adj}$). $R^2$ compares the predicted and experimental activities. The difference between the $R^2$ and $R^2_{adj}$ values for the defined models was less than 0.3, which indicates that the number of descriptors involved in the QSAR models is acceptable. The value of $R^2_{adj}$ indicates how readily a new descriptor can be added to the model. The fit of the QSAR models can also be assessed by the root-mean-squared error (RMSE). This method is used to ...
The validity of the models was evaluated by the OECD principles and their regulatory values. The models were validated by the LOO and LMO internal validation methods. The results support the internal predictions, as the value estimated by LOO ($Q^2_{LOO}$) is almost the same as the $R^2$ value, signifying the reliability of the defined models. The error in the predictions is very low. Fig. 5(b) presents the similarity between the experimental values and the values estimated by LOO (leave-one-out).
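The relationship between $R^2$ and $R^2_{adj}$ used above can be made concrete with the standard adjustment formula. The training-set size below is hypothetical, since it is not stated in this section; only the $R^2$ value and the four-descriptor count are taken from Eq. 11.

```python
def adjusted_r2(r2: float, n_samples: int, n_descriptors: int) -> float:
    """Standard adjusted R2: penalises R2 for the number of descriptors used."""
    if n_samples - n_descriptors - 1 <= 0:
        raise ValueError("need more samples than fitted parameters")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_descriptors - 1)

# R2 = 0.8681 with 4 descriptors as in Eq. 11; n_samples = 60 is a hypothetical value
r2_adj = adjusted_r2(0.8681, n_samples=60, n_descriptors=4)
print(f"R2_adj = {r2_adj:.4f}, R2 - R2_adj = {0.8681 - r2_adj:.4f}")
```

Adding a descriptor raises $R^2_{adj}$ only if the gain in $R^2$ outweighs the lost degree of freedom.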
The Leave-Many-Out (LMO) method, which leaves out thirty percent of the dataset to evaluate the performance of the models, is very helpful in the validation process because, unlike in LOO, each deviation of the data is treated as important.
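An LMO procedure with a thirty percent hold-out can be sketched as repeated random splits. The exact resampling scheme used by QSARINS may differ; the number of repeats, the way $Q^2$ is averaged, the function name and the synthetic data below are all assumptions.

```python
import numpy as np

def q2_lmo(X, y, leave_out_fraction=0.3, n_repeats=200, seed=0):
    """Mean Q2 over repeated random splits that hold out a fraction of the data."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    rng = np.random.default_rng(seed)
    n = len(y)
    n_out = max(1, int(round(leave_out_fraction * n)))
    q2_values = []
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        held_out, kept = idx[:n_out], idx[n_out:]
        beta, *_ = np.linalg.lstsq(A[kept], y[kept], rcond=None)
        press = np.sum((y[held_out] - A[held_out] @ beta) ** 2)
        ss = np.sum((y[held_out] - y[kept].mean()) ** 2)
        q2_values.append(1.0 - press / ss)
    return float(np.mean(q2_values))

# Synthetic example (hypothetical data)
rng_lmo = np.random.default_rng(3)
X_lmo = rng_lmo.normal(size=(80, 3))
y_lmo = 6.0 + X_lmo @ np.array([0.5, -0.3, 0.8]) + rng_lmo.normal(scale=0.2, size=80)
q2_lmo_value = q2_lmo(X_lmo, y_lmo)
print(f"Q2_LMO = {q2_lmo_value:.3f}")
```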
The statistical validation parameters for the defined models are presented in Tables 1, 2 and 3. For good predictability, the $R^2 - Q^2$ value should not exceed 0.3. Table 2 shows that the difference between $R^2$ and $Q^2_{LMO}$ is well below 0.3, authenticating the models as robust.
From the statistical measures, it is clear that the QSAR models defined in this work satisfy both the internal and external validation criteria. Moreover, the models with the optimal descriptor show better results; they can therefore be regarded as robust and considered for further applications in drug discovery. Fig. 6(a) shows the correlation between the descriptors and pEC50 ($K_{xy}$). From the figure, it can be observed that the values of $Q^2_{LMO}$ are very similar, authenticating the models as a good fit. The Y-scrambling method was carried out to show that the models are not the result of chance correlations. The low values of $R^2_{Yscr}$ and $Q^2_{Yscr}$ indicate the robustness of the developed models. The $R^2_{Yscr}$ and $Q^2_{Yscr}$ values are plotted against $R^2$ and $Q^2$ in Fig. 6(b). The $R^2$ and $Q^2$ values are far from the $R^2_{Yscr}$ and $Q^2_{Yscr}$ values, confirming the absence of random correlation in the model. The extrapolative capability of the models was evaluated with external prediction measures such as $R^2_{ext}$ [37], $RMSE_{ext}$ (root mean square error in external prediction), $MAE_{ext}$ (mean absolute error in external prediction), $PRESS_{ext}$ (predictive residual sum of squares in external validation), $CCC_{ext}$ (concordance correlation coefficient in external prediction), average $R^2_m$ and $\Delta R^2_m$. These values are similar to those calculated for the training set. Since predictions within the applicability domain (AD) are considered reliable, the approach based on leverage ($h$) and standardized residuals was also applied here to present the AD of the models. The leverage value for the defined hybrid model is calculated as 0.140. Fig. 7(a) presents the Williams plot for Eq. 11, which shows that the majority of the compounds are within the AD of the model. Fig. 7(b) is the Williams plot calculated by LOO for the same model. In both plots, the molecule with ID = 77 is an outlier.
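The Y-scrambling test described above can be sketched by refitting the same descriptors against randomly permuted responses. This minimal version reports only the scrambled $R^2$ values (not $Q^2_{Yscr}$); the function names, the number of permutations and the synthetic data are assumptions.

```python
import numpy as np

def _r2(A, y):
    """Coefficient of determination of an OLS fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def y_scrambling(X, y, n_iter=200, seed=0):
    """Return the true R2 and the R2 values obtained after shuffling y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    rng = np.random.default_rng(seed)
    r2_true = _r2(A, y)
    r2_scrambled = np.array([_r2(A, rng.permutation(y)) for _ in range(n_iter)])
    return float(r2_true), r2_scrambled

# Synthetic example (hypothetical data)
rng_scr = np.random.default_rng(4)
X_scr = rng_scr.normal(size=(60, 3))
y_scr = 6.0 + X_scr @ np.array([0.5, -0.3, 0.8]) + rng_scr.normal(scale=0.2, size=60)
r2_true, r2_scrambled = y_scrambling(X_scr, y_scr)
print(f"R2 = {r2_true:.3f}, mean scrambled R2 = {r2_scrambled.mean():.3f}")
```

A real structure-activity relationship shows up as a large gap between the true $R^2$ and the scrambled values.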
The identification of the outlier by the Williams plot provides a second confirmation of the result obtained from the PCA score plot (Fig. 3). The Insubria graph (Fig. 8) of QSARINS facilitates visualization of the model's AD. It can identify molecules lacking an experimental response. Here, it is quite similar to the Williams plot and also indicates that molecule number 77 is an outlier.
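The quantities plotted in a Williams plot can be computed as below. The sketch assumes the commonly used critical leverage $h^* = 3(p+1)/n$ (with $p$ descriptors and $n$ training compounds) and residuals scaled by the residual standard deviation; the text does not state which conventions QSARINS applied, and the function names and data are hypothetical.

```python
import numpy as np

def williams_plot_data(X, y):
    """Leverages, standardized residuals and critical leverage h* = 3(p+1)/n."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    n, n_params = A.shape                      # n_params = p + 1
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    h = np.diag(A @ np.linalg.pinv(A))         # diagonal of the hat matrix
    s = np.sqrt(np.sum(resid ** 2) / (n - n_params))
    return h, resid / s, 3.0 * n_params / n

def outside_domain(h, std_resid, h_star, limit=3.0):
    """Indices with leverage above h* or |standardized residual| above the limit."""
    return np.where((h > h_star) | (np.abs(std_resid) > limit))[0]

# Synthetic example: compound 0 is given extreme descriptor values (hypothetical data)
rng_ad = np.random.default_rng(5)
X_ad = rng_ad.normal(size=(40, 2))
X_ad[0] = [8.0, -8.0]
y_ad = 1.0 + X_ad @ np.array([0.5, 0.2]) + rng_ad.normal(scale=0.1, size=40)
h_ad, sr_ad, h_star_ad = williams_plot_data(X_ad, y_ad)
print("h* =", h_star_ad, "flagged:", outside_domain(h_ad, sr_ad, h_star_ad))
```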

Docking Results
The results of the docking process described in this section include the docking scores for different compounds with different EC50 values. The most negative binding energy indicates the best docking score. Table 4 shows the interaction between the protein and different ligands in 3D, together with the corresponding docking scores. To validate the docking results of AutoDock Vina, the protein receptor was re-docked with the ligands that ranked highest in the AutoDock Vina run. The re-docking was performed using the DockThor online docking server. The ranking of the ligands by score is the same for both docking protocols, which supports the validity of the docking results obtained with AutoDock Vina. Table 5 shows the docking scores for different conformations of ligand 30. The scores are quite similar, indicating favourable interactions between the binding sites and the ligand.
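The agreement check between the two docking protocols amounts to comparing the ligand order produced by each set of scores. The sketch below assumes that a lower (more negative) score is better in both programs; the ligand labels and score values are hypothetical placeholders, not values from Tables 4 and 5.

```python
def rank_order(scores: dict) -> list:
    """Ligand labels sorted from best (most negative) score to worst."""
    return sorted(scores, key=scores.get)

# Hypothetical scores (kcal/mol); the two programs use different scoring scales,
# so only the ordering is compared, not the absolute values
vina_scores = {"lig_A": -10.1, "lig_B": -9.4, "lig_C": -9.9}
dockthor_scores = {"lig_A": -9.2, "lig_B": -8.1, "lig_C": -8.8}

same_ranking = rank_order(vina_scores) == rank_order(dockthor_scores)
print(rank_order(vina_scores), same_ranking)
```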
Table 4. Docking scores of different compounds (continued)

The 2D diagrams of the ligands' interactions with the active site residues of the protein target are presented in this section (Fig. 9-12). PHE450, GLN549, TYR470, ALA453, SER447, ARG370, ALA456, ARG457, VAL243, ARG246, LEU244, VAL361, LEU288, ASP285 and GLU287 are found to be the active site residues of the receptor. Hydrogen bonds are a primary contributing factor in the binding affinity of drugs for the receptor. Strong hydrogen bonding interaction represents a high binding capability between the ligand and the protein. The ligands 30, 20, 18, 24 and 44 have shown a strong binding affinity towards the receptor by forming hydrogen bonds. Strong hydrogen bonds have bond angles close to 170 or 180 degrees. Some characteristics of the H-bonds, such as the distance, the bond angle between the donor and acceptor atoms, and the names of the donor and acceptor atoms, are presented in Table 6. Almost all the bond angles are above 120 degrees and close to 170 degrees, confirming the strength of the bonds formed between the receptor and the ligands. Among the active residues, the amino acids ASP442, ARG246, THR366, ARG370 and LEU288 form hydrogen bonds with the interacting ligands. Qualitatively, the hydrogen bonds are distributed over the sides and centre of the molecule, which suggests a high efficiency in binding the receptor-binding domain. The hydrogen bonds formed fall into the strong and moderate categories (1.76-2.60 Å), showing the high binding potential of the ligands for the receptor.
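The hydrogen-bond geometry discussed above can be computed from atomic coordinates. The values in Table 6 came from the visualization software; the sketch below assumes the angle is measured at the hydrogen (donor-H...acceptor) and the distance between hydrogen and acceptor, and all coordinates and function names are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points (in Å if coordinates are in Å)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def hbond_angle_deg(donor, hydrogen, acceptor):
    """Angle donor-H...acceptor in degrees, measured at the hydrogen atom."""
    v1 = [d - h for d, h in zip(donor, hydrogen)]
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (
        math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    )
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))

# Hypothetical coordinates (Å) for a nearly linear N-H...O contact
donor, hydrogen, acceptor = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.3, 0.0)
d_ha = distance(hydrogen, acceptor)
angle = hbond_angle_deg(donor, hydrogen, acceptor)
print(f"H...A distance = {d_ha:.2f} Å, D-H...A angle = {angle:.1f} degrees")
```

A contact with a distance inside the 1.76-2.60 Å window and an angle above 120 degrees would fall within the ranges reported in the text.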
Conclusion. Computational techniques for estimating the activities of ALOX5 inhibitors can facilitate the drug design process by reducing cost and time. In the present communication, three sets of successful QSAR models were presented. The first model was built using 2D descriptors (MATS3c, ATSC6s and SpMax1_Bhp); the second model was built using a single optimal descriptor (DCW); and the third model was built using some 3D descriptors [$E_{HOMO}$ (kJ/mol), XLogP, Eaq (kJ/mol)] along with the DCW descriptor and one of the above-discussed 2D descriptors. The models fulfil all regulatory principles established by the OECD; the robustness of the models was tested through internal validation techniques (LOO, LMO and Y-scrambling), and their predictive ability was determined with an external prediction set. The presented MLR-based QSAR models provide an additional means to screen, check and develop better drug candidates. PCA, the Williams plot and the Insubria graph (AD) were helpful in identifying the outliers in the dataset. The incorporation of the