Biological and Pharmaceutical Bulletin
Online ISSN : 1347-5215
Print ISSN : 0918-6158
ISSN-L : 0918-6158
Regular Articles
Trivariate Linear Regression and Machine Learning Prediction of Possible Roles of Efflux Transporters in Estimated Intestinal Permeability Values of 301 Disparate Chemicals
Makiko Shimizu, Riku Hayasaka, Yusuke Kamiya, Hiroshi Yamazaki

2022 Volume 45 Issue 8 Pages 1142-1157

Abstract

A system for predicting apparent bidirectional permeability (Papp) across Caco-2 cells of diverse chemicals has been reported. The present study aimed to investigate the relationship between in silico-generated Papp (from apical to basal side, Papp A to B) for 301 substances with diverse structures and a binary classification of the reported roles of efflux P-glycoprotein or breast cancer resistant protein. The in silico log(Papp A to B/Papp B to A) values of 70 substances with reported active efflux and 231 substances with no reported active efflux were significantly different (p < 0.01). The probabilities of active efflux transport estimated by trivariate analysis with log MW, log D pH 6.0, and log D pH 7.4 for the 70 active-efflux-positive compounds were higher than those of the other 231 substances (p < 0.01); the area under the corresponding receiver operating characteristic (ROC) curve was 0.81. Further probability values estimated using a machine learning algorithm with 30 chemical descriptors as inputs yielded an area under the ROC curve of 0.79. Using a secondary set of 52 efflux-positive and 48 efflux-negative medicines, the final trivariate-generated probabilities resulted in no significant differences between these binary groups (p = 0.09); however, the final machine learning model demonstrated a good area under the ROC curve of 0.79. Consequently, a combination of the previously established system for generating the permeability coefficients across intestinal monolayers (a continuous variable) and the currently proposed system for predicting the roles of additional active efflux (a binary classification) could prove useful; high accuracy was achieved by applying machine learning using in silico-generated chemical descriptors.

INTRODUCTION

Oral substance absorption in the gastrointestinal tract is dependent on many factors, e.g., the local pH, drug-metabolizing enzymes, and influx or efflux transporters.1) Caco-2 cell monolayers with the pH gradient adjusted to that between the gastrointestinal lumen and plasma have been shown to well reflect the human oral absorption of medicines in the gut.2,3) We successfully used trivariate regression analyses (based on simple physicochemical properties estimated in silico) to predict the apparent permeability (Papp) values obtained using bidirectional Caco-2 cell permeation experiments [i.e., influx from the apical side to the basal side (hereinafter referred to as A to B) and efflux from the basal side to the apical side (hereinafter referred to as B to A)] for 230 disparate industrial chemicals and drugs.4) The observed influx and efflux log Papp values of these substances were both significantly correlated in trivariate analyses with molecular weights and octanol–water distribution coefficients (log D) at the apical pH of 6.0 and at the basal pH of 7.4; furthermore, improved accuracy was obtained by applying a light gradient boosting machine learning system (LightGBM) using up to 19 in silico chemical descriptors.4)

In the previous study, some predicted log Papp values of drugs in the primary set of test substances fell outside the threefold-error region, presumably because of the contributions of active efflux/influx pumps in the experimental Caco-2 membrane environment.4) To estimate the in vivo oral absorption of a diverse range of general chemicals, food components, and drugs, it is important to provide additional information on the net differences between the in silico-estimated Papp A to B and Papp B to A values caused by active transport in addition to passive diffusion. The aim of the present study was to investigate the relationship between the in silico-estimated Papp A to B and Papp B to A values for substances and the sparsely reported roles of efflux P-glycoprotein (PgP) or breast cancer resistant protein (BCRP) in their absorbability. Our focus was simply to understand the apparent net differences between the in silico-estimated Papp A to B and Papp B to A values of a diverse range of general chemicals. Separate investigation of PgP and BCRP was not feasible because of the limited active transport information available for general chemicals. We report herein the estimation of possible contributions of efflux transporters in the intestinal permeability of chemicals (in binary classification: 1 = positive efflux transport, 0 = no efflux transport) using two methods: (1) trivariate analyses with molecular weights and log D at the apical pH of 6.0 and at the basal pH of 7.4 and (2) a new and reliable machine learning system using 30 in silico-derived chemical descriptors.

MATERIALS AND METHODS

Substances

The structures of the 301 substances (including industrial chemicals, food-derived substances, and pharmaceuticals) involved in the study were projected onto a two-dimensional plane in the chemical space (with unlabeled axes, Fig. 1) by applying the previously reported generative topographic mapping methods,4–10) thereby allowing visualization of their chemical diversity. The data sets to which linear regression and machine learning were applied in this study covered the 301 substances (Table 1), which included 219 chemicals previously tested,4) 11 compounds with reported permeability values,11,12) and 71 substances for which we previously established physiologically based pharmacokinetic modeling for rats and/or humans.10,13) A literature search was performed in PubMed with the keywords “‘chemical name’ and (PgP substrate transporter or BCRP substrate transporter)” to identify possible roles of efflux transporters and thereby establish the binary classification. A secondary set of 100 medicines (52 efflux positive and 48 efflux negative) was taken from a new Japanese drug database containing information on the roles of efflux transporters (Supplementary Table S1). Histograms and receiver operating characteristic (ROC) curves for the 301 primary chemicals and 100 secondary medicines showing estimates of the probable contributions of efflux pumps to intestinal permeability were prepared using Prism software (GraphPad Software, San Diego, CA, U.S.A.). Statistical analyses were also performed using Prism.

Fig. 1. Coordinate Values of the Primary Set of 301 Chemicals (A) and the Secondary Set of 100 Medicines (B) in a Two-Dimensional Plane with 25 Partitions Illustrating Variety in Their Chemical Structures

Chemicals with information in the literature on possible roles of efflux transporters (i.e., by P-glycoprotein or breast cancer resistant protein) contributing to the membrane permeability are shown in gray, representing binary classification scores of 1. Those shown in white represent a binary classification of 0, meaning no roles of efflux transporters or no relevant information. This proposed evaluation of variety in the chemical space4–10) was also conducted for our studies on intestinal permeability4,7) and for rat and human pharmacokinetic modeling studies.9,10)

Table 1. Possible Contributions of Efflux Transporters in Intestinal Permeability of Primary 301 Chemicals in Binary Classification Estimated by Trivariate Analyses Using Physiological Properties and with a New Machine Learning System Using in Silico 30 Chemical Descriptors
Name | CAS No. | Role of efflux PgP or BCRP a) | log MW | log D pH 6.0 | log D pH 7.4 | In silico log Papp A to B, nm/s | log(Papp A to B/Papp B to A) | Split group No. | Probability of role for active efflux transporters: Trivariate, cross-validation | Trivariate, final | Machine learning, cross-validation | Machine learning, final
Abemaciclib1231929-97-7115)2.711.642.921.80−0.9770.550.550.770.69
Acetaminophen103-90-2016)2.180.430.432.47−0.0680.090.100.020.03
Acrylonitrile107-13-101.730.590.592.03−0.5870.000.000.030.03
Acyclovir59277-89-302.35−1.37−1.371.38−0.05100.300.300.850.29
AG-041R199800-49-202.743.493.491.72−0.8480.420.420.670.49
Allopurinol315-30-0017)2.13−0.73−0.741.79−0.0410.090.100.000.02
Alprazolam28981-97-702.492.772.772.70−0.0450.270.250.080.07
Ambroxol18683-91-502.58−0.310.771.58−1.30100.520.510.040.11
Amlodipine88150-42-902.610.311.291.04−1.5560.510.500.460.63
Aniline62-53-301.971.231.252.73−0.0940.000.000.040.03
Apigenin520-36-502.432.592.522.06−0.3040.200.210.070.10
Apixaban503612-47-3118)2.660.990.992.320.0380.450.450.820.75
Apomorphine58-00-402.431.642.741.79−0.3680.300.320.500.30
Aspirin50-78-202.26−1.05−2.251.740.1930.150.130.010.03
Atenolol29122-68-7119)2.42−2.66−1.770.870.18100.470.460.320.32
Atomoxetine83015-26-302.410.701.531.39−1.3960.320.320.040.05
Atorvastatin134523-00-5120)2.752.761.401.530.0050.370.360.310.61
Azamethiphos35575-96-302.511.731.732.51−0.2020.300.310.170.12
Azithromycin83905-01-5121)2.87−1.590.131.07−0.9990.830.830.810.90
Balsalazide80573-04-202.55−1.81−2.041.13−0.4930.480.460.600.23
Baricitinib1187594-09-7122)2.571.201.202.03−0.2520.370.370.700.67
Benidipine105979-17-702.703.754.821.66−0.7770.470.450.790.67
Bensulide741-58-202.604.094.091.73−0.4410.310.290.050.11
Benzimidazole51-17-202.071.431.572.920.1720.000.000.020.02
Benzoic acid65-85-0023)2.09−0.03−1.373.071.1890.000.000.000.02
Benzydamine642-72-802.491.172.361.13−1.8560.390.390.510.20
Beraprost88430-50-6124)2.602.160.801.48−0.3130.280.270.910.87
Betaxolol63659-18-702.49−0.010.881.54−1.3640.410.410.360.33
Bisoprolol66722-44-9125)2.51−0.700.201.38−1.3310.450.460.180.38
Bisphenol A80-05-7126)2.363.733.732.39−0.0610.120.110.030.08
Bisphenol F620-92-802.302.952.952.55−0.0780.090.100.030.02
Bisphenol S80-09-102.401.791.612.670.4390.200.200.020.02
Bosentan147536-97-802.743.071.791.37−1.0510.400.350.950.47
Caffeine58-08-202.290.420.422.720.02100.190.180.420.23
Canagliflozin842133-18-0127)2.653.003.001.60−0.5480.360.370.610.66
Carbamazepine298-46-402.372.172.172.53−0.1940.170.180.040.06
Ceritinib1032900-25-6028)2.751.012.231.10−1.4610.600.600.940.56
Chloramphenicol56-75-702.510.790.781.89−0.1550.350.340.120.25
Chlorogenic acid327-97-902.55−3.05−4.140.87−0.9520.450.450.260.22
Chlorpromazine50-53-302.502.583.660.94−1.92100.350.340.290.28
Chlorpyrifos2921-88-2029)2.554.794.792.240.14100.220.220.050.04
Chlorzoxazone95-25-002.231.590.872.900.1230.050.050.000.02
Chrysin480-40-0030)2.413.243.182.01−0.3360.150.160.070.05
Cimetidine51481-61-902.400.000.751.21−0.4040.340.340.400.17
Ciprofloxacin85721-33-1131,32)2.52−2.95−2.671.39−0.0770.490.510.790.75
Cotinine486-56-602.250.320.332.45−0.1350.150.150.020.08
Coumarin91-64-502.162.062.062.87−0.0810.020.020.000.03
Curcumin458-37-7033)2.572.392.382.110.0510.340.320.090.28
Dabigatran211914-51-1134)2.67−1.20−0.661.42−0.1040.570.580.670.78
Darolutamide1297538-32-9135)2.601.891.891.70−0.5060.370.370.340.36
Dexamethasone50-02-2136)2.591.921.921.93−0.3640.350.360.620.77
Dextromethorphan125-71-3037)2.430.772.021.42−1.6470.380.360.710.30
Dichloromethane75-09-201.931.361.361.89−0.4340.000.000.040.03
Diclofenac15307-86-5038)2.472.951.602.791.4020.160.140.050.04
Digoxin20830-75-5139)2.891.331.331.16−0.5970.600.620.770.84
Dihydrocodeine125-28-002.48−0.540.751.37−1.3050.470.450.880.72
Diltiazem42399-41-7140)2.621.572.821.51−1.3830.470.480.720.67
Diphenylamine122-39-402.233.193.192.14−0.2210.030.030.000.02
Disopyramide3737-09-502.53−0.160.851.21−1.1480.450.460.530.26
Disulfoton-sulfone2497-06-502.492.182.182.40−0.0830.270.270.010.07
Domitroban112966-96-802.582.921.562.040.0370.200.220.060.09
Doxorubicin23214-92-8141)2.74−1.36−0.080.88−0.6340.670.680.700.90
Duloxetine116539-59-4042)2.471.192.021.11−1.7970.360.350.140.07
Edoxaban480449-70-5143)2.740.641.281.30−0.9220.560.570.700.71
Efonidipine111011-63-302.805.405.571.63−0.7010.440.400.710.46
EGCG989-51-5044)2.661.671.521.62−0.2410.440.420.830.43
Empagliflozin864070-44-002.651.901.901.36−0.9550.430.410.870.42
Epicatechin gallate1257-08-5145)2.652.152.011.64−0.2440.370.390.350.62
Eprosartan133040-01-402.630.980.471.250.2090.390.390.910.48
Erythromycin114-07-8146)2.87−0.081.191.06−0.8310.730.740.970.90
Esaxerenone1632006-28-0147)2.672.552.552.200.1420.400.400.650.69
Estradiol50-28-202.433.373.372.12−0.2080.180.180.390.23
Fenbufen36330-85-502.401.860.502.711.0590.120.130.030.05
Fentanyl437-38-7148)2.532.103.381.28−1.5890.380.390.250.39
Fexofenadine83799-24-0149)2.702.492.491.52−0.2390.410.430.220.64
Flavone525-82-602.353.383.382.26−0.1550.140.120.010.02
Florfenicol73231-34-202.550.700.681.540.1470.370.380.190.10
Flunitrazepam1622-62-402.502.452.452.77−0.06100.270.270.260.16
Fluoro-loxoprofen1241405-00-402.420.96−0.372.751.2810.210.180.160.10
Fluvastatin93957-54-1150)2.611.980.632.240.91100.280.290.880.79
Fluvoxamine54739-18-302.500.631.891.45−1.4050.450.420.190.19
G004865483-06-302.753.632.292.00−0.0730.330.330.370.16
GB-115678996-63-902.642.572.571.70−0.3950.400.370.750.30
Glibenclamide10238-21-8151)2.692.951.621.96−0.07100.310.320.290.62
Haloperidol52-86-8052)2.581.072.311.49−1.0030.460.460.810.20
Hippuric acid495-69-202.25−1.98−3.141.010.0870.140.170.020.02
Ibrolipim133208-93-202.652.272.271.88−0.1350.420.400.080.10
Ibuprofen15687-27-102.311.500.153.041.3040.060.070.070.05
Imipramine50-49-7142)2.451.802.861.58−1.3210.310.320.040.32
Indomethacin53-86-102.552.961.602.801.1240.180.200.390.30
Ipragliflozin761423-87-402.612.542.541.37−0.8250.370.350.760.36
Isophthalonitrile626-17-502.111.061.062.71−0.12100.030.020.010.02
Isopsoralen523-50-202.272.162.162.70−0.0420.090.100.040.03
Itopride122898-67-302.55−0.071.221.06−1.6410.480.490.350.44
Ivabradine155974-00-8153)2.671.072.361.21−1.7250.580.540.950.72
Ketoprofen22071-15-4054)2.401.330.003.041.0110.180.150.060.06
Lamivudine134678-17-4155)2.36−1.04−1.031.41−0.2250.290.290.070.20
Lemildipine94739-29-402.664.334.331.750.0820.330.320.580.47
Lenalidomide191732-72-6156)2.410.03−0.171.04−0.2050.280.280.030.29
Letermovir917389-32-3157)2.761.991.991.780.1110.510.490.760.74
Lisinopril76547-98-302.610.700.521.10−0.08100.410.410.760.41
Loflazepate71735-10-902.52−1.01−1.661.910.29100.370.380.120.10
Lomitapide182431-12-502.844.285.131.68−0.8760.510.520.480.39
Losartan114798-26-4158)2.631.440.531.520.03100.340.350.550.54
Lovastatin75330-75-5159)2.614.164.161.36−0.5940.270.290.660.78
Loxoprofen68767-14-6060)2.391.03−0.322.861.1350.140.150.030.08
L-Thyroxine51-48-9061)2.894.213.571.250.21100.460.470.370.10
Lucifer yellow CH67769-47-512.65−8.73−9.580.79−0.1850.690.760.020.44
Macitentan441798-33-0062)2.733.032.391.67−0.8930.380.380.600.28
Maleic acid108-31-601.99−0.08−0.081.320.1770.000.000.080.04
Mangiferin4773-96-0063)2.630.580.451.41−0.9790.430.430.730.34
m-Cresol108-39-402.032.132.132.82−0.0960.000.000.100.06
Mefenamic acid61-68-702.383.201.853.001.2710.100.060.030.06
Melengestrol acetate2919-66-602.604.034.032.10−0.2090.270.290.810.33
Menthofuran494-90-602.183.973.972.61−0.2370.000.000.040.06
Metformin657-24-9064)2.11−4.08−4.060.99−0.44100.220.220.330.09
Methotrexate59-05-2164)2.66−4.23−5.531.020.4270.500.560.600.87
Methoxsalen298-81-702.331.991.992.49−0.0920.150.160.070.05
Metoprolol51384-51-1065)2.43−1.13−0.241.47−1.4860.420.410.460.33
Midazolam59467-70-802.512.893.222.49−0.4950.310.290.090.11
Mirtazapine85650-52-8042)2.421.952.931.64−1.4120.280.290.560.45
Mofezolac78967-07-402.531.23−0.082.541.2790.250.250.030.24
Molinate2212-67-102.272.292.292.22−0.2020.090.100.040.06
Mono(2-ethylhexyl) phthalate4376-20-902.442.741.512.651.3110.170.140.020.07
Monobutyl phthalate131-70-402.350.56−0.672.431.1580.140.140.060.04
Morphine57-27-2166)2.45−1.37−0.091.29−0.9360.480.460.470.78
m-Toluic acid99-04-702.130.45−0.893.101.2030.000.000.010.03
N,2-Dimethylaniline611-21-202.082.082.132.80−0.1760.000.000.040.04
N,N-Diethyl-3-methylbenzamide134-62-302.282.322.322.740.00100.110.100.080.09
N,N-Dimethylaniline121-69-702.082.362.412.89−0.1020.000.000.020.03
Nadolol42200-33-9167)2.49−1.76−0.921.220.0580.460.480.310.54
N-Ethylaniline103-69-502.082.202.252.78−0.0930.000.000.000.03
Nicotine54-11-502.21−1.38−0.091.73−1.4760.290.270.060.05
Nifedipine21829-25-4168,69)2.543.583.582.60−0.2060.250.260.130.51
Nilvadipine75530-68-602.593.803.801.94−0.4520.290.290.540.36
N-Methylaniline100-61-802.031.931.982.66−0.1340.000.000.030.04
Norfloxacin70458-96-7170)2.50−3.26−2.981.36−0.1410.500.510.800.75
Novaluron116714-46-602.694.964.922.551.0560.300.320.200.10
N-Phenylglycine103-01-502.18−1.67−2.972.521.1240.090.090.020.02
o-Cresol95-48-702.032.132.132.86−0.0340.000.000.040.05
Ofloxacin82419-36-1131)2.56−2.23−1.991.87−0.29100.510.510.930.78
Olanzapine132539-06-1171)2.490.952.201.44−1.4610.390.400.110.61
Olopatadine113806-05-6172)2.531.881.871.35−0.6580.310.310.480.51
Omeprazole73590-58-6173)2.542.062.052.680.4820.310.310.390.41
Opicapone923287-50-7174)2.622.141.412.750.2040.300.330.270.42
Oseltamivir196618-13-0175)2.49−0.900.391.43−0.8780.460.480.370.74
p-Aminobenzoic acid150-13-002.14−0.15−1.502.701.1190.010.000.000.02
Paraacetaldehyde123-63-702.120.530.532.370.2080.040.050.080.04
p-Coumaric acid7400-08-0076)2.210.14−1.222.350.9680.050.050.020.02
p-Cresol106-44-512.032.132.132.760.0380.000.000.010.06
Pemafibrate848259-27-8177)2.691.740.791.650.4490.370.380.440.60
Perfluorodecanoic acid335-76-2078)2.711.951.382.200.3830.420.420.020.05
Perfluorooctanoic acid335-67-1079)2.620.710.142.140.5830.400.390.020.12
PF-049373191245603-92-202.641.921.921.73−1.1710.410.400.950.61
Phenacetin62-44-202.251.741.742.710.0570.110.100.020.04
Phenobarbital50-06-6180)2.371.291.052.650.1860.200.190.090.25
Phthalazine253-52-102.111.231.233.090.0660.010.020.040.02
Phthalazone119-39-102.160.960.963.040.2420.050.060.020.02
Phthalimide85-41-602.170.930.922.84−0.25100.080.070.050.03
Phthalonitrile91-15-602.111.061.062.82−0.1320.000.020.020.02
p-Hydroxybenzaldehyde123-08-002.091.381.242.830.1390.000.000.000.02
p-Hydroxybenzoic acid99-96-702.140.25−1.112.821.2550.000.000.010.02
Pitavastatin147511-69-1150)2.621.780.482.170.5730.320.310.590.62
p-Nitrophenol100-02-702.141.711.333.060.4910.000.000.000.02
Pomalidomide19171-19-8181)2.440.210.032.44−0.1830.300.290.170.27
p-Phenetidine156-43-402.141.111.212.710.1680.040.040.020.03
Pranlukast103177-37-3082)2.680.630.631.96−0.2490.480.480.200.16
Pravastatin81093-37-0183)2.630.88−0.471.000.3090.340.340.930.82
Progesterone57-83-0141)2.503.583.582.08−0.5060.210.230.380.40
Propranolol525-66-6184)2.410.371.261.35−1.5620.320.340.090.18
Psoralen66-97-702.272.152.152.790.0330.100.100.010.04
p-Toluic acid99-94-502.130.45−0.893.071.2510.000.000.010.03
Pyrithiobac123342-93-802.510.28−0.612.260.7850.300.310.070.21
Quetiapine111974-69-7185)2.582.732.841.48−1.2230.330.330.530.71
Quinine130-95-0186)2.510.351.661.32−1.63100.460.440.570.53
Quinotolast101193-40-202.490.31−0.832.010.2860.280.270.120.08
Quizartinib950769-58-1187)2.753.324.301.29−1.11100.500.500.800.66
Rabeprazole117976-89-302.561.791.781.69−0.6970.330.340.860.51
Ranitidine66357-35-5188)2.50−1.92−0.631.33−0.0480.500.520.370.43
Resorcinol108-46-3089)2.040.910.912.53−0.1390.000.000.000.02
Rhodamine12362669-70-9190)2.541.572.081.04−1.0960.370.370.180.44
Risperidone106266-06-2185)2.610.371.661.50−1.3750.550.520.860.67
Rivaroxaban366789-02-8191)2.641.821.822.30−0.0440.390.400.220.35
Rosuvastatin287714-41-4192)2.680.23−1.131.520.2870.360.410.650.67
Roxadustat808118-40-302.55−0.29−1.431.850.7030.350.340.680.25
Sacubitril149709-62-6193)2.612.451.082.170.7570.240.270.250.50
Safinamide133865-89-102.480.501.681.78−0.88100.420.410.030.07
Salicylamide65-45-202.141.091.042.890.2210.030.030.000.02
Salicylic acid69-72-702.14−0.50−1.453.011.1150.020.040.010.02
Sarpogrelate125926-17-202.631.501.361.33−0.3360.410.400.800.36
Scopoletin92-61-502.281.751.612.940.2480.110.120.060.04
Sepimostat103926-64-302.57−2.48−2.461.02−0.6130.540.520.100.15
Sesamin607-80-7094)2.552.762.762.10−0.6580.290.300.340.19
Silmitasertib1009820-21-602.543.032.932.120.6350.300.280.020.14
Sparfloxacin110871-86-8160)2.59−2.54−2.301.79−0.5670.520.550.910.76
Styrene100-42-502.023.253.252.690.0190.000.000.010.03
Sulfasalazine599-79-1195)2.60−0.91−2.221.76−0.5050.370.390.000.13
Sulindac38194-50-2096)2.551.530.182.590.8160.260.260.220.22
Sumatriptan103628-46-2097)2.47−0.990.261.61−1.2340.460.460.320.31
Suvorexant1030377-33-302.653.273.272.02−0.3810.380.360.980.55
TA-510133118-88-402.521.991.992.700.1620.300.300.410.39
Tacrolimus104987-11-3198)2.914.244.241.32−0.5290.500.520.810.87
Tadalafil171596-29-5199,100)2.592.692.692.02−0.4490.320.330.740.72
Tedizolid856866-72-302.571.661.662.620.0610.370.350.830.28
Telithromycin191114-48-41101)2.911.563.071.17−1.0930.720.720.930.87
Tepotinib1100598-32-00102)2.740.241.001.69−0.9750.610.590.420.29
Terephthalonitrile623-26-702.111.061.062.78−0.02100.030.020.010.02
Tetrabromobisphenol A79-94-702.746.396.221.660.03100.290.290.030.08
Thalidomide50-35-10103)2.410.450.382.350.0880.270.270.500.21
Theophylline58-55-902.260.250.232.43−0.1040.160.160.130.07
Timolol26839-75-802.50−1.25−0.551.44−1.1680.440.460.280.33
Tolbutamide64-77-702.431.24−0.103.011.1860.180.180.110.05
Toluene108-88-301.962.582.582.73−0.01100.000.000.010.06
Tranilast53902-12-802.510.20−0.982.621.0960.310.290.180.15
trans-Ferulic acid537-98-40104)2.290.17−1.192.561.1890.110.100.010.03
trans-Resveratrol501-36-002.362.712.712.36−0.1470.160.150.020.02
Trazodone19794-93-502.572.403.121.77−1.1450.410.370.060.42
Triazolam28911-01-502.543.483.482.56−0.22100.260.260.200.11
Trichloroethylene79-01-602.122.712.711.82−0.4780.000.000.020.04
Trimethylamine75-50-301.77−2.85−2.151.29−1.1150.000.000.100.03
Trimethylamine N-oxide1184-78-71105)1.88−1.11−1.080.920.0520.000.000.020.08
Vadadustat1000025-07-902.49−1.13−2.271.35−0.2150.300.320.060.14
Vatanidipine116308-55-502.846.506.851.76−0.7750.460.400.840.44
Verapamil52-53-9141)2.691.472.711.36−1.3760.540.540.560.72
Warfarin81-81-20106)2.541.820.463.001.2820.250.240.120.09
Wyeth-1464350892-23-402.511.740.522.380.79100.220.230.060.06
YJC-105921226894-87-602.750.621.841.42−0.9890.610.620.390.44
Zolpidem82626-48-002.492.813.122.21−0.6030.260.270.350.29
Zonisamide68291-97-41107)2.330.640.642.58−0.1130.210.200.000.08
1,1,2,2-Tetrahydroperfluoro-1-decanol678-39-702.675.475.472.02−0.3610.320.290.040.09
1,2,3-Trimethylbenzene526-73-802.083.903.902.53−0.1170.000.000.020.03
1,2,4-Tribromobenzene615-54-302.504.504.502.780.6420.190.190.020.02
1,2,4-Trimethylbenzene95-63-602.083.903.902.630.0760.000.000.040.03
1,2-Dibromobenzene583-53-902.373.643.642.58−0.1020.120.130.020.02
1,2-Dichloroethane107-06-202.001.811.811.97−0.3340.000.000.050.04
1,2-Phenylenediamine95-54-502.030.460.492.68−0.1120.000.000.020.01
1,3-Dinitrobenzene99-65-002.231.541.542.67−0.0770.100.090.050.03
1,3-Di-o-tolylguanidine97-39-202.380.490.691.59−1.0890.270.260.030.06
1,3-Diphenylguanidine102-06-702.32−0.34−0.141.53−1.2790.260.250.000.02
1,3-Phenylenediamine108-45-202.03−0.07−0.022.770.0480.000.000.010.02
1,4-Dibromobenzene106-37-602.373.633.632.660.2310.140.130.050.02
1,4-Dioxane123-91-101.94−0.30−0.302.22−0.1240.000.000.080.05
1,4-Phenylenediamine106-50-302.03−0.54−0.102.45−0.3190.060.050.010.02
1,7-Dimethylxanthine611-59-602.26−0.19−0.232.42−0.2260.190.180.100.09
1-Methylxanthine6136-37-402.22−0.43−0.601.47−0.46100.160.150.020.03
1-Naphthaleneacetic acid86-87-302.271.32−0.033.171.3130.050.050.000.04
2-(1H-Imidazol-2-yl)pyridine18653-75-302.160.991.102.80−0.0570.080.070.020.02
2,3,5,6-Tetrafluorobenzoic acid652-18-602.29−0.97−1.542.120.7030.210.200.010.02
2,3,5,6-Tetrafluorobenzylalcohol4084-38-202.261.571.572.890.1340.110.110.040.04
2,3-Dimethylaniline87-59-202.081.861.882.74−0.0640.000.000.040.04
2,4,6-Tribromophenol118-79-602.523.792.782.891.18100.160.170.020.02
2,4-Dibromophenol615-58-702.403.253.112.770.5340.140.150.020.02
2,4-Dimethylaniline95-68-102.081.851.882.810.0780.000.000.030.04
2,4-Dinitrophenol51-28-502.26−0.17−1.253.030.2290.120.120.010.02
2,5-Dimethylaniline95-78-302.081.861.882.820.0870.000.000.020.04
2,6-Dimethylaniline87-62-702.081.531.532.75−0.13100.000.000.030.04
2-Aminobiphenyl90-41-502.232.692.692.71−0.0190.050.050.000.02
2-Chloroaniline95-51-202.112.112.112.85−0.0970.000.000.010.02
2-Chlorobenzoic acid118-91-202.20−1.16−2.052.560.8990.120.110.000.02
2-Chlorophenol95-57-802.112.322.292.790.1240.000.000.020.02
2-Hydroxybenzimidazole615-16-702.131.361.362.780.0480.010.020.020.02
2-Hydroxybiphenyl90-43-702.233.123.122.43−0.0210.040.030.000.02
2-Hydroxyphenethyl alcohol7768-28-702.141.131.132.72−0.1340.050.040.020.02
2-Hydroxyphenylacetic acid614-75-502.18−0.91−2.252.060.5030.080.060.000.02
2-Mercaptobenzimidazole583-39-102.181.801.802.940.1170.050.040.020.03
2-Mercaptoimidazole872-35-502.00−0.62−0.621.930.0580.000.000.020.05
2-Methoxy-4-nitroaniline97-52-902.231.391.392.690.0620.080.100.040.03
2-Methyl-1,4-naphthoquinone58-27-51108)2.241.901.902.39−0.1420.070.080.030.09
2-Nitrotoluene88-72-202.142.462.462.75−0.0630.000.000.000.03
2-tert-Butylphenol88-18-602.183.403.402.10−0.4220.000.000.030.04
3,4-Dimethylaniline95-64-702.081.851.882.760.0080.000.000.030.04
3,5-Dimethylaniline108-69-002.081.861.882.69−0.1590.000.000.010.04
3-Aminobenzenesulfonic acid121-47-102.24−4.54−4.651.300.0160.360.320.010.02
3-Aminophenol591-27-502.040.530.542.64−0.0570.000.000.010.02
3-Cyanopyridine100-54-902.020.530.532.730.0150.000.000.010.02
3-Ethylphenol620-17-702.092.622.622.68−0.1370.000.000.020.03
3′-Hydroxyacetanilide621-42-102.180.430.432.550.0230.100.100.010.03
3-Hydroxybiphenyl580-51-802.233.123.122.48−0.0960.020.030.010.01
3-Hydroxycoumarin939-19-502.211.781.782.940.4540.070.070.020.02
3-Hydroxyflavone577-85-502.382.392.391.890.0460.170.180.050.03
3-Nitroaniline99-09-202.141.391.392.70−0.0770.040.030.030.03
3-Nitrophthalic acid603-11-202.32−3.39−3.551.350.0830.370.340.040.03
4,4'-Dihydroxybiphenyl92-88-602.272.222.212.610.0380.090.100.030.02
4-Acetamidobenzenesulfonyl chloride121-60-802.371.021.021.420.0460.230.220.030.05
4-Aminotoluene-3-sulfonic acid88-44-802.27−5.00−5.091.240.0860.410.370.020.02
4-Chloro-o-cresol1570-64-502.162.822.822.820.0580.000.000.010.03
4-Chlorophenol106-48-902.112.512.512.710.0770.000.000.010.02
4-Ethylphenol123-07-902.092.622.622.62−0.0320.000.000.020.03
4-Hydroxy-2,6-dimethylaniline3096-70-602.140.650.752.62−0.1530.060.060.020.04
4-Hydroxybiphenyl92-69-302.233.123.112.650.0990.030.030.000.01
4-Nitrotoluene-2-sulfonic acid121-03-902.34−4.42−4.431.380.11100.410.400.240.04
4-Nonylphenol84852-15-30109)2.346.096.091.95−0.5990.000.010.080.05
4-sec-Butylphenol99-71-802.183.293.292.55−0.1020.000.000.020.04
4-α-Cumylphenol599-64-402.334.394.392.340.0020.060.060.040.08
5-(p-Tolyl)-1H-tetrazole24994-04-502.200.04−1.173.071.0330.060.050.000.03
5-Amino-2-chlorotoluene-4-sulfonic acid88-53-902.35−4.51−4.561.20−0.0570.380.410.020.02
6-Amino-1-naphthol-3-sulfonic acid87-02-502.38−4.10−4.480.99−0.0550.370.400.010.03
6-Amino-2-naphthalenesulfonic acid93-00-502.35−4.45−4.561.150.0310.390.410.030.02
7-Ethoxycoumarin31005-02-402.282.192.192.920.0780.100.110.060.05
7-Hydroxycoumarin93-35-602.211.581.442.930.1540.070.070.030.03
7-Hydroxyflavone6665-86-702.383.283.251.65−0.0940.130.140.030.03

a) 1, roles of efflux PgP or BCRP have been reported; 0, no roles or no info.

Multi Regression and Machine Learning Analyses

Univariate, bivariate, and trivariate linear regression analyses and trivariate logistic regression analysis with simple physicochemical properties were performed to generate the binary classification (Table 2) using Microsoft Excel and Prism software, respectively.14) To evaluate the generalizability of the predictions, tenfold cross-validations were carried out for the trivariate linear regressions9) using the split groups 1–10 indicated in Table 1. To verify the trivariate prediction system, estimated probabilities of active efflux for a secondary set of 100 medicines were generated in silico using the final above-established equations.
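The cross-validation step described above can be sketched as follows. This is a minimal illustration only: the authors used Microsoft Excel and Prism, so the helper names (`fit_least_squares`, `cross_validate`) and the pure-Python least-squares solver are assumptions introduced here, not the published workflow. Each substance is predicted by a model fitted on all substances outside its own split group, mirroring the use of split groups 1–10 from Table 1.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations.

    Returns the factors for each property followed by the intercept.
    """
    design = [list(r) + [1.0] for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Apply fitted factors (last entry is the intercept) to one substance."""
    return sum(c * v for c, v in zip(coefs, row)) + coefs[-1]


def cross_validate(rows, targets, groups):
    """Predict every substance from a model fitted without its split group."""
    out = [None] * len(rows)
    for g in sorted(set(groups)):
        train = [i for i, gi in enumerate(groups) if gi != g]
        coefs = fit_least_squares([rows[i] for i in train],
                                  [targets[i] for i in train])
        for i, gi in enumerate(groups):
            if gi == g:
                out[i] = predict(coefs, rows[i])
    return out
```

In use, `rows` would hold (log MW, log D pH 6.0, log D pH 7.4) for each substance, `targets` the binary efflux scores, and `groups` the split group numbers listed in Table 1.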

Table 2. Correlation Coefficients and Factors Obtained Using Univariate, Bivariate, and Trivariate Analyses of Physiological Properties for Estimating the Probability of Active Efflux for the 301 Chemicals to Achieve Binary Classification
Property | Correlation coefficient | p-Value | Factor | 95% confidence interval
Univariate analysis
log MW | 0.42 | <0.01** | 0.76 | 0.57 to 0.92
log D pH 6.0 | 0.12 | 0.03* | −0.026 | −0.050 to −0.002
log D pH 7.4 | 0.06 | 0.27 | −0.013 | −0.035 to 0.010
log(Papp A to B/Papp B to A) | 0.21 | <0.01** | −0.13 | −0.20 to −0.06
Bivariate analysis
log MW and log D pH 6.0 | 0.46 | <0.01** | |
log MW | - | <0.01** | 0.80 | 0.62 to 0.99
log D pH 6.0 | - | <0.01** | −0.039 | −0.060 to −0.017
Intercept | - | <0.01** | −1.6 | −2.0 to −1.2
log MW and log D pH 7.4 | 0.44 | <0.01** | |
log MW | - | <0.01** | 0.80 | 0.61 to 0.99
log D pH 7.4 | - | <0.01** | −0.028 | −0.048 to −0.007
Intercept | - | <0.01** | −1.6 | −2.1 to −1.2
log MW and log(Papp A to B/Papp B to A) | 0.43 | <0.01** | |
log MW | - | <0.01** | 0.76 | 0.57 to 0.92
log(Papp A to B/Papp B to A) | - | 0.057 | −0.064 | −0.13 to 0.002
Intercept | - | <0.01** | −1.5 | −1.9 to −1.02
Trivariate linear analysis
log MW, log D pH 6.0, and log D pH 7.4 | 0.47 | <0.01** | |
log MW | - | <0.01** | 0.78 | 0.60 to 0.97
log D pH 6.0 | - | <0.01** | −0.10 | −0.17 to −0.04
log D pH 7.4 | - | 0.03* | 0.065 | −0.005 to −0.13
Intercept | - | <0.01** | −1.6 | −2.0 to −1.2
Trivariate logistic regression
log MW, log D pH 6.0, and log D pH 7.4 | 0.50 | <0.01** | |
log MW | - | - | 6.3 | 4.5 to 8.4
log D pH 6.0 | - | - | −0.55 | −0.96 to −0.15
log D pH 7.4 | - | - | 0.21 | −0.69 to −0.098
Intercept | - | - | −17 | −22 to −12

Predicted probability in linear regression = 0.78 × (log MW) − 0.10 × (log D pH 6.0) + 0.065 × (log D pH 7.4) − 1.6. * p < 0.05; ** p < 0.01.
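As a worked illustration, the trivariate linear equation and the trivariate logistic factors from Table 2 can be evaluated directly. The sketch below is not the authors' implementation: the function names are hypothetical, and clipping the linear estimate to the range 0 to 1 is an assumption, suggested by the many 0.00 entries in Table 1 for low-molecular-weight chemicals. The logistic form assumes the standard sigmoid of the linear predictor.

```python
import math

# Factors copied from Table 2 (rounded as published).
LINEAR = {"log_mw": 0.78, "log_d_ph60": -0.10, "log_d_ph74": 0.065,
          "intercept": -1.6}
LOGISTIC = {"log_mw": 6.3, "log_d_ph60": -0.55, "log_d_ph74": 0.21,
            "intercept": -17.0}


def _linear_predictor(coef, log_mw, log_d_ph60, log_d_ph74):
    """Weighted sum of the three properties plus the intercept."""
    return (coef["log_mw"] * log_mw
            + coef["log_d_ph60"] * log_d_ph60
            + coef["log_d_ph74"] * log_d_ph74
            + coef["intercept"])


def efflux_probability_linear(log_mw, log_d_ph60, log_d_ph74):
    """Trivariate linear estimate; clipping to [0, 1] is an assumption."""
    raw = _linear_predictor(LINEAR, log_mw, log_d_ph60, log_d_ph74)
    return min(1.0, max(0.0, raw))


def efflux_probability_logistic(log_mw, log_d_ph60, log_d_ph74):
    """Trivariate logistic estimate: sigmoid of the linear predictor."""
    z = _linear_predictor(LOGISTIC, log_mw, log_d_ph60, log_d_ph74)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Inputs from Table 1: (log MW, log D pH 6.0, log D pH 7.4).
    examples = {"Abemaciclib": (2.71, 1.64, 2.92),
                "Acrylonitrile": (1.73, 0.59, 0.59)}
    for name, props in examples.items():
        print(name,
              round(efflux_probability_linear(*props), 2),
              round(efflux_probability_logistic(*props), 2))
```

For abemaciclib the linear estimate evaluates to about 0.54, close to the final trivariate value of 0.55 listed in Table 1; the small gap is plausibly due to rounding of the published factors.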

In an attempt to improve the accuracy of the predicted probabilities of active efflux, machine learning algorithms using LightGBM were adopted as described previously9) with slight modifications. For each chemical substance, descriptor sets containing 1710 items related to chemical structural and physicochemical properties were obtained using open-source in silico programs (such as RDKit and Mordred), as described previously.9) Estimation models were extensively validated with a modeling approach designed to integrate cross-validation9) using the split groups 1–10 (indicated in Table 1). We thereby determined the optimum set of hyperparameters to obtain the highest correlation coefficients for the models and to perform initial evaluation of model performance. The estimated probabilities of active efflux for a secondary set of 100 medicines were also generated in silico using the final above-established model. No overfitting was detected when the models were evaluated by Y-scrambling.
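The idea behind the Y-scrambling check can be sketched as below. This is an illustration under stated assumptions, not the published procedure: the study applied Y-scrambling to the LightGBM models, whereas this sketch substitutes the absolute Pearson correlation between a single descriptor and the binary labels as a stand-in score, and the function names are hypothetical. A model that has learned a real relationship should score clearly higher on the true labels than on randomly shuffled ones.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def y_scramble(xs, ys, n_rounds=200, seed=0):
    """Return the score on the true labels and the scores on shuffled labels.

    The score here is |r| between one descriptor and the labels, used as a
    simple stand-in for a full model's performance metric.
    """
    rng = random.Random(seed)
    true_score = abs(pearson_r(xs, ys))
    scrambled = []
    for _ in range(n_rounds):
        shuffled = list(ys)
        rng.shuffle(shuffled)  # break any real descriptor-label relationship
        scrambled.append(abs(pearson_r(xs, shuffled)))
    return true_score, scrambled


if __name__ == "__main__":
    # Toy data (not from the study): larger descriptor values carry label 1.
    xs = [2.0 + 0.05 * i for i in range(20)]
    ys = [0] * 10 + [1] * 10
    true_score, scrambled = y_scramble(xs, ys)
    print(true_score, sum(scrambled) / len(scrambled))
```

If the scrambled scores were as high as the true score, the apparent performance would more likely reflect chance fitting than a genuine structure-activity relationship.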

RESULTS AND DISCUSSION

To confirm that the evaluated substances exhibited adequate structural diversity, the chemical structures of the primary 301 and secondary 100 test substances were projected onto a two-dimensional chemical space with 25 subdivisions (Fig. 1). By means of the keyword search mentioned in Materials and Methods, the substances were divided by binary classification into two groups: a score of 1 for possible roles of PgP or BCRP transporters mentioned in the literature or a score of 0 for no apparent roles of transporters or no information (Table 1). The physicochemical properties of the test substances were estimated using in silico methods; the molecular weight (MW), log D pH 6.0, log D pH 7.4, and in silico-estimated Papp A to B and log(Papp A to B/Papp B to A) are given in Table 1.

We examined the relevance of the in silico log(Papp A to B/Papp B to A) values generated by the previously established machine learning system for pH-dependent Caco-2 systems4) to the binary classification values based on the reported roles of PgP or BCRP efflux (Fig. 2). A histogram of the apparent differences in intestinal influx and efflux log Papp values for the primary 301 chemicals is shown in Fig. 2A. The mean and median in silico-generated log(Papp A to B/Papp B to A) values of the 70 substances with active efflux transport (i.e., with binary efflux scores of 1) were −0.42 and −0.25, respectively; these values were lower than the corresponding values of −0.08 and −0.06 for the 231 substances with no roles for active efflux or no information (i.e., with binary scores of 0). The two binary groups were significantly different (p < 0.01, unpaired t-test). An ROC curve is shown in Fig. 2B; the area under the ROC curve was 0.64 [95% confidence interval (CI) 0.57–0.72] for in silico log(Papp A to B/Papp B to A).
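The area under the ROC curve reported here can be understood as the probability that a randomly chosen efflux-positive substance is ranked above a randomly chosen efflux-negative one. The sketch below is a minimal pure-Python version of that calculation (the study itself used Prism; the function name `roc_auc` is introduced here for illustration). Because lower log(Papp A to B/Papp B to A) values point toward efflux, the ratio is negated before scoring. The six rows are a small subset of Table 1 chosen only to demonstrate the mechanics and are not expected to reproduce the published value of 0.64.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as one half (the Mann-Whitney U formulation).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # (name, binary efflux score, log(Papp A to B/Papp B to A)) from Table 1.
    subset = [
        ("Abemaciclib", 1, -0.97),
        ("Apixaban", 1, 0.03),
        ("Azithromycin", 1, -0.99),
        ("Acetaminophen", 0, -0.06),
        ("Aniline", 0, -0.09),
        ("Amlodipine", 0, -1.55),
    ]
    labels = [flag for _, flag, _ in subset]
    # Negate so that a higher score means stronger apparent efflux.
    scores = [-ratio for _, _, ratio in subset]
    print(roc_auc(scores, labels))
```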

Fig. 2. Histogram of Estimated Intestinal Influx and Efflux Log Papp Values for the Primary Set of 301 Chemicals (A) and the Receiver Operating Characteristic (ROC) Curve (B)

Estimates were generated using a light gradient boosting machine learning system (LightGBM) that incorporated 17 and 19 in silico chemical descriptors for influx and efflux, respectively.4) Reported roles of efflux P-glycoprotein or breast cancer resistant protein are shown as 1 in the binary classification (gray bars), whereas no roles or no information are shown as 0 (white bars).

Univariate linear regression analyses for the 70 substances with active efflux transport and the 231 substances without apparent active efflux transport revealed that the observed binary classification of the 301 compounds was correlated significantly (but with relatively low correlation coefficients) with log MW (r = 0.42, p < 0.01), with log(Papp A to B/Papp B to A) (r = 0.21, p < 0.01), and with log DpH 6.0 (r = 0.12, p = 0.03) under the present conditions (Table 2). Bivariate analyses established that the observed binary classification of the 301 substances was correlated with log MW and log DpH 6.0 (r = 0.46, p < 0.01), with log MW and log DpH 7.4 (r = 0.44, p < 0.01), and with log MW and log(Papp A to B/Papp B to A) (r = 0.43, p < 0.01); for the last pair, the calculated factor for log(Papp A to B/Papp B to A) was not significant, whereas all other factors were significant (Table 2). Trivariate analysis revealed that the binary classification of the 301 compounds was significantly correlated with log MW, log DpH 6.0, and log DpH 7.4 (r = 0.47, p < 0.01) (Table 2). Although further multivariate analyses were performed with other combinations of three chemical parameters from among log MW, log DpH 6.0, log DpH 7.4, and log(Papp A to B/Papp B to A), no further improvement in the correlation coefficient was evident (data not shown). Although solubility is another important factor influencing drug absorption, in our previous study4) the in silico solubility (log S) values of substances did not contribute to the estimation of the experimentally observed log Papp A to B or log Papp B to A values. Similarly, the current linear regression system was not improved by including in silico solubility. This procedure led to the following equation: predicted probability of active efflux = 0.78 × (log MW) − 0.10 × (log DpH 6.0) + 0.065 × (log DpH 7.4) − 1.6 (p < 0.01).
To confirm this model, trivariate logistic regression analysis was also performed (Table 2), resulting in the following equation: predicted probability (in logistic regression) = 6.3 × (log MW) − 0.55 × (log DpH 6.0) + 0.29 × (log DpH 7.4) − 17 (p < 0.01). The 301 predicted probability values from the linear and logistic regressions were well correlated, with a correlation coefficient of 0.95 and a slope of 1.00.
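The two trivariate equations above can be applied to a single compound as in the following sketch. Two points are assumptions rather than statements from the text: "log MW" is read as the base-10 logarithm, and the logistic expression is treated as a linear predictor that is passed through the standard logistic function to give a probability. The example compound (MW 500, log D of 1.0 at pH 6.0 and 2.0 at pH 7.4) is hypothetical, and because the published coefficients are rounded, the outputs are approximate.

```python
import math


def linear_probability(mw, log_d_ph60, log_d_ph74):
    """Trivariate linear estimate; note it is not bounded to the 0-1 range."""
    return (0.78 * math.log10(mw)  # base-10 logarithm is an assumption
            - 0.10 * log_d_ph60
            + 0.065 * log_d_ph74
            - 1.6)


def logistic_probability(mw, log_d_ph60, log_d_ph74):
    """Trivariate logistic estimate (sigmoid link is an assumption)."""
    z = (6.3 * math.log10(mw)
         - 0.55 * log_d_ph60
         + 0.29 * log_d_ph74
         - 17)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical compound, not one of the 301 test substances.
p_linear = linear_probability(500.0, 1.0, 2.0)
p_logistic = logistic_probability(500.0, 1.0, 2.0)
print(round(p_linear, 3), round(p_logistic, 3))
```

For this hypothetical compound both expressions give values close to 0.5, in line with the reported agreement between the linear and logistic predictions.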

To optimize the generalization capabilities of the predictions of probability of active efflux, 10-fold cross-validation was carried out for the trivariate regression. A histogram of the estimated probabilities for the primary 301 substances is shown in Fig. 3A. The mean and median estimated probabilities of the 70 substances with active efflux transport (i.e., with binary classification scores of 1) were 0.41 and 0.39, respectively; these values were higher than those of 0.19 and 0.15 for the 231 substances with no role for active efflux or no relevant information (i.e., with binary scores of 0). The difference between the two groups was significant by unpaired t-test (p < 0.01). Under the current conditions, the balanced accuracy was 0.77 with a threshold probability of 0.27. The corresponding ROC curve is shown in Fig. 3B; the area under the ROC curve was 0.81 (95%CI 0.76–0.87). The trivariate predictions were verified by cross-validation and then finalized. Using the secondary set of 100 medicines shown in Supplementary Table S1, trivariate probability predictions were performed (Fig. 3C). The mean probability of 0.45 estimated using the trivariate equation for the 52 substances with active efflux transport was apparently higher than that of 0.38 for the 48 substances without active efflux transport; however, the difference between these binary groups was not significant by unpaired t-test (p = 0.09), and the accuracy was 0.66. The ROC curve is given in Fig. 3D; the area under the ROC curve was 0.66 (95%CI 0.51–0.80). Although the simple trivariate regression system performed well in predicting the internal set of 301 substances after cross-validation, its performance in predicting the binary classification was poor under the present conditions when it was applied to the external set of 100 new medicines. These findings point to an apparent limitation: because the simple trivariate regression system was built on general chemicals, especially substances that are not actively transported, its predictability may be reduced when it is applied to a new group of medicines.
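The balanced accuracy at a fixed probability threshold, as reported above, is the mean of sensitivity and specificity. A minimal sketch follows; the six probabilities and labels are hypothetical, only the threshold of 0.27 comes from the text, and treating a probability equal to the threshold as positive is an assumption.

```python
def balanced_accuracy(probabilities, labels, threshold):
    """Mean of sensitivity and specificity at the given threshold."""
    tp = fn = tn = fp = 0
    for p, y in zip(probabilities, labels):
        predicted = 1 if p >= threshold else 0  # '>=' is an assumption
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)  # fraction of efflux-positive found
    specificity = tn / (tn + fp)  # fraction of efflux-negative found
    return (sensitivity + specificity) / 2


# Hypothetical estimated probabilities and binary efflux scores.
probs = [0.41, 0.39, 0.20, 0.19, 0.15, 0.30]
labels = [1, 1, 1, 0, 0, 0]

ba = balanced_accuracy(probs, labels, threshold=0.27)
print(round(ba, 3))
```

Unlike plain accuracy, this measure is not inflated by the larger group, which matters for the primary set of 70 efflux-positive and 231 efflux-negative substances.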

Fig. 3. Histograms (A, C) and Receiver Operating Characteristic (ROC) Curves (B, D) for the Primary Set of 301 Chemicals (A, B) and the Secondary Set of 100 Medicines (C, D) for Which Possible Contributions of Efflux Transporters to Intestinal Permeability Were Estimated by Trivariate Analyses with the Following Input Parameters: Molecular Weight and Octanol–Water Distribution Coefficients at the Apical pH of 6.0 and at the Basal pH of 7.4 4)

Reported roles of efflux P-glycoprotein or breast cancer resistant protein are shown in binary classification as 1 (grey bars); no roles or no information are shown as 0 (white bars).

As a result of these findings, we aimed to achieve higher accuracy for predictions of the binary classification of the 301 test compounds by using a machine learning algorithm that followed the approach reported previously, with slight modifications.4) To estimate the probability of the role of efflux transport, we applied machine learning algorithms that used the 30 physicochemical descriptors shown in Supplementary Table S2 as input values. A histogram of estimated probabilities using the machine learning algorithm for the 301 test compounds is given in Fig. 4A. The estimated mean and median probabilities for the 70 test substances known to have active efflux transport were 0.50 and 0.47, respectively; these values were higher than those of 0.26 and 0.04 for the 231 test substances with no active efflux transport or no information. The difference between the two groups was significant, with p < 0.01 by unpaired t-test. Under the current conditions, the balanced accuracy was 0.77 with a threshold value of 0.20. The corresponding ROC curve is shown in Fig. 4B; the area under the ROC curve was 0.79 (95%CI 0.73–0.85). The predictions were verified after cross-validation processes and were finalized. Using the secondary set of 100 medicines shown in Supplementary Table S1, finalized machine learning predictions were performed (Fig. 4C). The mean and median probabilities of the 52 medicines with active efflux transport were 0.57 and 0.61, respectively; these values were higher than those of 0.32 and 0.25 for the 48 medicines with no active efflux transport or no information. The difference between the two groups was significant by unpaired t-test (p < 0.01) and the accuracy was 0.67. The corresponding ROC curve is given in Fig. 4D; the area under the ROC curve was 0.76 (95%CI 0.67–0.86).
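The group comparisons reported above rely on an unpaired t-test. The sketch below computes the test statistic and degrees of freedom for two groups of estimated probabilities. The text does not say whether equal variances were assumed, so the pooled-variance (Student) form used here is an assumption, and the eight values are hypothetical. Converting the statistic to a p value requires the t distribution, which the Python standard library does not provide, so that step is omitted.

```python
import math
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance (an assumption)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / standard_error
    return t, na + nb - 2


# Hypothetical estimated probabilities for the two binary groups.
efflux_positive = [0.6, 0.5, 0.7, 0.4]
efflux_negative = [0.3, 0.2, 0.1, 0.4]

t_value, dof = unpaired_t(efflux_positive, efflux_negative)
print(round(t_value, 3), dof)
```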

Fig. 4. Histograms (A, C) and Receiver Operating Characteristic (ROC) Curves (B, D) for the Primary Set of 301 Chemicals (A, B) and the Secondary Set of 100 Medicines (C, D)

The histograms show the estimated probability of contributions of efflux transporters to intestinal permeability predicted using a new machine learning system with 30 in silico-generated input parameters consisting of the physicochemical descriptors shown in Supplementary Table S2. Reported roles of efflux P-glycoprotein or breast cancer resistant protein are shown in binary classification as 1 (grey bars); no roles or no information are shown as 0 (white bars).

The above-described computational methods represent a new alternative approach that could contribute to chemical safety screening by giving context to the results of physiologically based pharmacokinetic modeling after virtual oral administrations. In constructing detailed physiologically based pharmacokinetic models, active efflux clearance and passive permeation clearance should be separated to describe the concentration-dependent cellular membrane permeation of a compound. Nevertheless, this newly developed approach to estimating the probability that efflux transporters contribute to the low apparent intestinal permeabilities of chemicals will contribute to the effectiveness of computational toxicology for assessing the potential risk of industrial chemicals and/or food components. The simplified models described here could be easily applied to predict a variety of drug exposures and have potential for use by a wide range of industry researchers and regulatory authorities. The current system based on influx and efflux log Papp values of substances may also serve in the future as a survey tool for the asymmetrical transport of non-transporter substrates. Overall, a combination of the previously established system4) for estimating the permeability coefficients across intestinal cell monolayers (a continuous variable) and the currently proposed in silico estimation system for the roles of active efflux in addition to passive diffusion in membrane permeabilities (a binary classification) could be useful in drug research and toxicological assessments. In conclusion, high accuracy in predicting the binary classification was achieved by applying machine learning based on in silico-derived physicochemical descriptors, and the generalizability of the results was ensured by choosing a set of test substances with diverse chemical structures.

Acknowledgments

The authors thank Drs. Kimito Funatsu, Fumiaki Shono, Masato Kitajima, Junya Ohori, Kentaro Handa, Hiroshi Yano, Rie Saito, Izumi Sano, Masaya Fujii, Jun Tomizawa, Wataru Kobari, Airi Kato, and Norie Murayama for their assistance and David Smallbones for copyediting a draft of this article. This work was supported in part by the Japan Chemical Industry Association Long-range Research Initiative Program.

Conflict of Interest

The authors declare no conflict of interest.

Supplementary Materials

This article contains supplementary materials.

REFERENCES
 
© 2022 The Pharmaceutical Society of Japan