Machine-learning-driven oral-to-inhalation extrapolation for predicting inhalation toxicity values

Kazushi Matsumura

doi:10.1273/cbij.25.1

Abstract

Relevant exposure routes need to be taken into account when performing chemical risk assessments for humans. Chemical risks are often assessed by performing route-to-route extrapolations based on oral repeated dose toxicity studies if route-specific toxicity data are unavailable. When performing a route-to-route extrapolation, an extrapolation factor derived from differences in absorption after exposure through different routes needs to be estimated. In this study, we used a machine learning (ML) and regression-based approach to estimate extrapolation-factor-like coefficients for oral-to-inhalation extrapolations using chemical structures and physicochemical properties. We used well-reviewed chemicals with human chronic toxicity values for specific administration routes (oral reference dose and inhalation reference concentration) available. ML regression models for predicting inhalation reference concentrations were developed using oral reference doses and molecular features as descriptors. The ML-based regression models gave better predictions than models using only molecular features or even single constant extrapolation factors, suggesting that the ML-based approach offers advantages over other oral-to-inhalation extrapolation methods.

1. Introduction

The relevant exposure routes of the chemicals of interest need to be assessed when a chemical risk assessment for humans is performed. A toxicological reference value (TRV) is a route-specific point estimate of the dose or concentration at which a chemical is expected to have no adverse effects on human health. For example, the reference dose (RfD) for oral exposure and the reference concentration (RfC) for inhalation are widely used and well-reviewed TRVs for non-cancer health risks in humans. The RfD and RfC are estimates of the daily oral or inhalation exposure that is unlikely to pose an appreciable risk of deleterious non-cancer effects during a human lifetime. A TRV is generally derived from repeated dose animal toxicity studies in which the chemical of interest is administered through a relevant exposure route, with uncertainty factors and modifying factors defined and assessed by expert toxicologists to address the limitations of the data used. In many cases, oral toxicological laboratory animal studies are preferred over studies using other routes of administration because the oral route is most relevant to, for example, drugs, food ingredients, and pesticide residues on plants. The lack of exposure-route-specific data for many chemicals and the recent increased interest in animal welfare (with the aim of meeting the principles of “replacement, reduction, and refinement”) mean that route-to-route (RtR) extrapolation has become important for predicting TRVs for specific administration routes.

The UK Interdepartmental Group on Health Risks from Chemicals has issued guidance about extrapolating toxicity data obtained for one exposure route to other exposure routes [1]. The guidance indicates five key considerations: (1) local or systemic effects, (2) the target organ dose, (3) absorption as a determinant of the target organ dose, (4) route-specific metabolic factors, and (5) the first-pass effect. These considerations mean that many factors need to be assessed when considering the appropriateness of performing an RtR extrapolation of toxicity data, including (1) differences in absorption efficiencies, (2) differences in systemic effects (an extrapolation should not be performed for local effects), (3) differences in metabolism (including first-pass effects), and (4) the solubility of the chemical of interest in body fluids [2, 3]. However, it is generally time-consuming and resource-intensive to quantify all of these differences for each target chemical. Therefore, in practice, an initial analysis is performed focusing on the difference between the absorption efficiencies for the well-characterized administration route (typically oral) and the other targeted route to give an “extrapolation factor” (EF). The European Chemicals Agency (ECHA) has proposed that, in the absence of route-specific information, a default factor of 2 should be used, meaning that the percentage of the chemical of interest absorbed after exposure through the oral route is assumed to be half the percentage absorbed after exposure through the inhalation route [4]. Schröder et al. investigated oral-to-inhalation EFs using the Fraunhofer RepDose database of repeated-dose studies and proposed that an EF of 3 should be used to cover uncertainty related to unexpected local effects that could occur in an inhalation study [5]. In contrast, Rennen et al. assessed their database and found that an extrapolation method based only on correction for differences in absorption was not valid [6]. These studies suggest that more research is required to improve our understanding of the reliability of RtR extrapolation.

The quantitative in vitro to in vivo extrapolation (QIVIVE) approach converts an in vitro concentration associated with bioactivity into an in vivo exposure level [7, 8]. This approach involves reverse dosimetry based on physiologically based kinetic (PBK) modeling to translate in vitro responses into in vivo responses and to derive in vivo TRVs [9–11]. In parallel to QIVIVE extrapolation based on actual experimental data, Wignall et al. developed a quantitative structure–activity relationship (QSAR) model using several machine-learning (ML) approaches that could compute TRVs [12]. They developed the “conditional toxicity value predictor” for predicting several TRVs relating to non-cancer risks (e.g., RfD and RfC) and cancer risks (e.g., oral slope factor, inhalation unit risk, and cancer potency value). The conditional toxicity value predictor uses physicochemical properties calculated from the chemical structure and gives more precise and accurate TRV predictions than are given by high-throughput screening assays and the QIVIVE approach. These advances indicate that the ML-based approach using molecular descriptors has the potential to accurately estimate TRVs for chemicals for which no animal experiment data are available.

In this study, we investigated the applicability and validity of each of several molecular descriptors (chemical structure, physicochemical properties, and in vitro bioactivity data) for performing better RtR extrapolations. Oral-to-inhalation RtR extrapolations were performed because of the abundance of data available. The RfCs and RfDs from well-reviewed sources were used because of the reliability of the available toxicity data for humans. RfCs are usually determined by performing (1) repeated dose inhalation toxicity studies using animals or (2) RtR extrapolations based on repeated dose oral toxicity studies (i.e., using the same point-of-departure values used for determining RfDs) and taking uncertainty factors (e.g., interspecies extrapolations, exposure duration, and database deficiencies) and modifying factors (expert assessments of scientific uncertainties) into consideration. Therefore, although the RfCs for some chemicals have already been derived through oral-to-inhalation extrapolation, the RfCs are still of value for our objectives because they are defined with the absorption efficiency (i.e., EF) and other factors relevant to human risk assessments taken into consideration. We developed ML-based regression models for predicting RfCs using RfDs and molecular features as descriptors. The performances of the ML models were compared with those of models using only molecular features or EFs.

2. Materials and Methods

After the data had been processed, chemical structures and properties were computed and predictive models were developed using the KNIME Analytics Platform (version 4.7.2).

2.1 Toxicity reference values for the chemicals

We compiled a database of publicly available peer-reviewed human health toxicity values for listed chemicals from the U.S. Environmental Protection Agency (EPA) (e.g., the Integrated Risk Information System (IRIS) [13], Provisional Peer Reviewed Toxicity Values (PPRTV) [14], Health Effects Assessment Summary Tables (HEAST), and Regional Screening Levels (RSLs) [15]), the California EPA Office of Environmental Health Hazard Assessment (OEHHA) [16], and the Agency for Toxic Substances and Disease Registry (ATSDR) [17]. The HEAST were taken from information given in the EPA Toxicity Values Database (ToxValDB v9.4) [18]. Mixtures were removed from the compiled chemical list. If a specific TRV for a chemical was available from more than one source, the TRV was selected following the U.S. EPA Superfund Program hierarchy shown below [19].

Tier 1: EPA IRIS

Tier 2: EPA PPRTV

Tier 3: Other toxicity values (HEAST, RSLs, OEHHA, ATSDR)

Chronic RfC and RfD values were taken from the EPA list. Chronic inhalation reference exposure levels were taken from the OEHHA list. Chronic minimal risk levels were taken from the ATSDR list, and if toxicity values for a chemical were available for more than one target tissue, the lowest minimal risk level was taken to be representative. The TRVs were converted from units of ppm or ppb into mg/m³ using the equation

$$\mathrm{mg/m^3} = \mathrm{ppm} \times \frac{M}{22.4} \times \frac{273}{273 + T} \times \frac{P}{1013},$$

where M is the molecular weight, T is the temperature in degrees Celsius (37 °C here), and P is the atmospheric pressure (1013 hPa here). In Tier 3, if a TRV was available from more than one source, the lowest TRV was taken to be representative. The RfC units (mg/m³) and RfD units (mg/(kg body weight)) were converted into the same units (mg/d) using default physiological parameters (body weight 70 kg/person and inhalation rate 20 m³/d) given in the ECHA guidance [4]. We identified 182 chemicals with both RfDs and RfCs. The models were more strongly affected by higher values than lower values, so the standardized RfCs and RfDs were logarithm (base 10) transformed and then multiplied by −1.
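The unit handling described above can be sketched in Python as follows. This is only an illustration, not the KNIME workflow used in the study: the function and constant names are hypothetical, and it is assumed that the RfD is expressed per day (mg/kg body weight/d) so that multiplying by body weight gives mg/d.

```python
import math

MOLAR_VOLUME_L = 22.4          # molar volume term in the conversion equation
BODY_WEIGHT_KG = 70.0          # default body weight (kg/person)
INHALATION_M3_PER_DAY = 20.0   # default inhalation rate (m3/d)


def ppm_to_mg_per_m3(ppm, mol_weight, temp_c=37.0, pressure_hpa=1013.0):
    """Convert a gas-phase concentration from ppm to mg/m3."""
    return (ppm
            * (mol_weight / MOLAR_VOLUME_L)
            * (273.0 / (273.0 + temp_c))
            * (pressure_hpa / 1013.0))


def rfc_to_mg_per_day(rfc_mg_per_m3):
    """Express an RfC (mg/m3) as a daily intake (mg/d)."""
    return rfc_mg_per_m3 * INHALATION_M3_PER_DAY


def rfd_to_mg_per_day(rfd_mg_per_kg_day):
    """Express an RfD (assumed mg/kg body weight/d) as a daily intake (mg/d)."""
    return rfd_mg_per_kg_day * BODY_WEIGHT_KG


def neg_log10(value_mg_per_day):
    """Apply the -log10 transform used for the modeling targets."""
    return -math.log10(value_mg_per_day)
```

With this transform, more potent chemicals (lower RfC or RfD) receive larger values, so a ppb value would first be divided by 1000 before being passed to `ppm_to_mg_per_m3`.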

2.2 Molecular descriptors

Simplified molecular-input line-entry system (SMILES) data for each chemical were taken from the EPA CompTox Chemicals Dashboard version 2.4.1 (https://comptox.epa.gov/dashboard/, accessed June 13, 2024) [20]. Physicochemical properties were calculated using the RDKit Descriptor Calculation node. If values could not be calculated for a chemical, the chemical was removed from the list used for developing the predictive model based on physicochemical properties. Values normalized to the range 0–1 were also used for predictive model development to understand the impact of physicochemical similarity on the performance of the constructed model. Molecular Access System (MACCS) keys and extended-connectivity fingerprints (ECFPs) were calculated using the CDK Fingerprints node. The ECFPs were generated using a diameter of 4 (ECFP4) and then hashed to 1024 bits.

Biological activities were taken from the ToxCast database (invitrodb version 4.1; winning model) [21]. Assay endpoints related to background correction, autofluorescence, and artifact detection were removed. A chemical with hitcall ≥ 0.9 was considered to be an active substance, and a chemical with hitcall < 0.9 was considered to be an inactive substance [22]. Chemicals that were inactive for all of the measured endpoints were excluded from the modeling processes involving biological activities. We collected the AC₅ (the concentration giving 5% of the maximum response) for each active chemical for each assay endpoint. For inactive endpoints, the maximum tested concentration (in µM) multiplied by 100 was used. Higher AC₅s had more effect than lower AC₅s on the models, so the AC₅s were logarithm (base 10) transformed and then multiplied by −1. Missing values were replaced by the median values for the various assay endpoints. The chemicals used for each model are summarized in Table 1.
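As an illustration of the AC₅ preprocessing described above, the following Python sketch applies the hitcall threshold, the imputation for inactive endpoints, the −log₁₀ transform, and the median replacement of missing values. The function names and the use of `None` for missing values are assumptions made for this example; they are not taken from the ToxCast tooling or from the KNIME workflow.

```python
import math
from statistics import median


def neg_log_ac5(ac5_um, hitcall, max_conc_um, threshold=0.9):
    """Return -log10(AC5) for an active endpoint (hitcall >= threshold).

    For an inactive endpoint, 100 x the maximum tested concentration (uM)
    is used in place of the AC5 before the transform.
    """
    if hitcall >= threshold:
        return -math.log10(ac5_um)
    return -math.log10(max_conc_um * 100.0)


def impute_missing_with_median(column):
    """Replace missing values (None) in one assay-endpoint column by its median."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]
```

For example, an inactive endpoint tested up to 100 µM is assigned −log₁₀(10 000) = −4.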

Table 1. Summary of the chemicals used for each model

Modeling with	Number of chemicals	Descriptors	−Log₁₀ RfC (mg/d), mean (90% CI) [min – max]	−Log₁₀ RfD (mg/d), mean (90% CI) [min – max]	MW, mean (90% CI) [min – max]	LogP*, mean (90% CI) [min – max]
Fingerprint	182	MACCS, ECFP4	1.16 (0.221) [−2.90 – 6.10]	0.258 (0.198) [−3.53 – 7.31]	127 (9.98) [9.01 – 406]	1.55 (0.294) [−5.99 – 7.60]
Physicochemical property	181	Original, Normalized	1.15 (0.221) [−2.90 – 6.10]	0.258 (0.200) [−3.53 – 7.31]	127 (10.0) [9.01 – 406]	1.57 (0.294) [−5.99 – 7.60]
Biological activity		Activity pattern, AC₅	0.825 (0.344) [−2.78 – 6.10]	0.0219 (0.288) [−3.53 – 7.31]	131 (12.4) [42.0 – 406]	1.81 (0.439) [−5.99 – 6.61]

Note: MW = molecular weight, CI = confidence interval.

*Experimental values (where available from the EPA CompTox Chemicals Dashboard) or simulated values (where available from the OPERA model in the EPA CompTox Chemicals Dashboard, or calculated using RDKit).

2.3 Predictive model development and performance assessment

Models for predicting RfCs using RfDs and molecular descriptors were developed using a gradient-boosted decision tree algorithm (using the XGBoost Tree Ensemble Learner (Regression) node with the default parameters set in the node to develop the model, and using XGBoost Predictor (Regression) node to predict the values). Feature importance was calculated using the same node based on the weight (the number of times a feature was used to split the data across all of the trees), gain (which implies the average gain across all splits the feature is used in), cover (the average coverage across all splits the feature is used in), total gain (which sums up the gain across all splits the feature is used in), and total cover (which sums up the total coverage across all splits the feature is used in). The feature importance based on each metric was calculated in each iteration through the leave-one-out approach, then normalized to the range 0–1 to assess the relative importance of each feature (Supplementary Figures S1, S4, S6, and S7). The performance of a model was assessed by performing leave-one-out cross-validation based on the mean squared error (MSE) using the Numeric Score node and the Pearson correlation coefficients using the Linear Correlation node to evaluate the correlation between predicted values and actual RfCs [23].
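The scoring scheme described above (leave-one-out cross-validation assessed by the MSE and the Pearson correlation coefficient) can be sketched in Python as follows. To keep the example self-contained, a one-variable least-squares line stands in for the XGBoost regressor used in the study; `fit_line` and the other names are hypothetical, and any function that takes training data and returns a predictor could be passed as `fit` instead.

```python
import math


def fit_line(xs, ys):
    """Stand-in model: ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def leave_one_out(xs, ys, fit=fit_line):
    """Predict each sample with a model trained on all other samples."""
    predictions = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(model(xs[i]))
    return predictions


def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

Because every prediction is made for a chemical that was held out of training, the MSE and r computed this way reflect performance on unseen chemicals rather than the fit to the training data.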

2.4 Chemical similarity

The Euclidean distance was defined as the similarity measure for chemicals using the fingerprint or a physicochemical property. The Euclidean distance for the five-nearest neighbors was calculated using the Similarity Search node and then normalized to be in the range 0–1. The normalized distances were then converted into similarities by subtracting the distance values from 1. Chemical analogs were profiled using the Repeated dose (HESS) toxicological profile in the OECD QSAR Toolbox (version 4.6).
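A minimal Python sketch of this similarity measure is given below. The text does not state how the distances to the five nearest neighbors were aggregated or which normalization was applied, so the sketch assumes the mean distance to the k nearest neighbors and min-max normalization across all chemicals; the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_mean_distance(vectors, k=5):
    """Mean Euclidean distance from each chemical to its k nearest neighbors.

    Assumption: the k distances are averaged into one value per chemical.
    """
    result = []
    for i, v in enumerate(vectors):
        distances = sorted(
            euclidean(v, w) for j, w in enumerate(vectors) if j != i
        )
        nearest = distances[:k]
        result.append(sum(nearest) / len(nearest))
    return result


def distances_to_similarities(distances):
    """Min-max normalize distances to 0-1, then return 1 - distance."""
    lowest, highest = min(distances), max(distances)
    span = highest - lowest
    if span == 0:
        return [1.0] * len(distances)
    return [1.0 - (d - lowest) / span for d in distances]
```

Under these assumptions, a chemical surrounded by close neighbors receives a similarity near 1, and an isolated chemical receives a similarity near 0.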

3. Results and discussion

3.1 RfD-to-RfC extrapolation using EFs

We assessed the correlation between the RfDs and RfCs of the 182 selected chemicals. As expected, the RfDs and RfCs were weakly correlated (r = 0.56) (Figure 1A). The EF for each chemical was calculated by dividing the RfD by the RfC (Figure 1B). The median EF was 3.5, which was similar to the empirical and reported EFs [1, 4, 5]; however, the selected chemicals had a wide range of EFs (the 25th and 75th percentile EFs were 0.525 and 129, respectively).

We also performed a grid search to identify a single optimal EF for RfD-to-RfC extrapolation of the selected chemicals (Figure 2). The predicted RfCs were calculated by dividing the RfDs by values between 0.1 and 100 in 0.1 increments. Using this approach, EF = 7.9 gave the best predictions, with MSE = 2.6. This performance was used as a benchmark for the subsequent ML-based models.
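The grid search can be illustrated with the short Python sketch below. It assumes that the RfDs and RfCs are supplied in mg/d and that the MSE is computed on the −log₁₀ scale used for the modeling targets; the function name and defaults are hypothetical and only mirror the range and step given in the text.

```python
import math


def grid_search_ef(rfd_mg_day, rfc_mg_day, start=0.1, stop=100.0, step=0.1):
    """Return (best_ef, best_mse) for predicted RfC = RfD / EF.

    Assumption: errors are measured between -log10(predicted RfC)
    and -log10(actual RfC).
    """
    best_ef, best_mse = None, float("inf")
    n_steps = round((stop - start) / step)
    for i in range(n_steps + 1):
        ef = round(start + i * step, 10)  # avoid floating-point drift
        squared_errors = [
            (-math.log10(rfd / ef) + math.log10(rfc)) ** 2
            for rfd, rfc in zip(rfd_mg_day, rfc_mg_day)
        ]
        current = sum(squared_errors) / len(squared_errors)
        if current < best_mse:
            best_ef, best_mse = ef, current
    return best_ef, best_mse
```

On the log scale, dividing by a constant EF only shifts every prediction by log₁₀(EF), so a single EF can correct the average offset between RfDs and RfCs but cannot reduce the chemical-to-chemical scatter.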

Figure 1. Characteristics of the selected chemicals

(A) Correlations between the inhalation reference concentrations (RfCs) and oral reference doses (RfDs) of the selected chemicals. The dotted red line is an Excel trendline. The r indicates the correlation coefficient. The box and whisker plot to the right shows the RfC and RfD ranges. The middle line is the median, the cross within the box is the mean, and the top and bottom of the box are the 25th and 75th percentiles, respectively. The dots are outliers. (B) Range of calculated extrapolation factors on a logarithmic scale. The middle line is the median, the cross within the box is the mean, and the top and bottom of the box are the 25th and 75th percentiles, respectively.

Figure 2. Oral reference dose (RfD)-to-inhalation reference concentration (RfC) prediction performances achieved using extrapolation factors

Grid search for identifying ideal extrapolation factors for the selected chemicals. The mean squared errors for the relationships between the actual and predicted RfCs derived from each extrapolation factor are shown.

3.2 RfC prediction model based on molecular descriptors

We developed RfC prediction models using two types of fingerprints, a substructural fragment-based fingerprint (MACCS) and a circular topology-based fingerprint (ECFP4). The MACCS-based model performed better than the ECFP4-based model (Figures 3A and 3B). Banerjee et al. used computational methods to investigate in vitro effects using a similar approach to ours and found that the MACCS fingerprint performed best and gave similar results to the machine learning and similarity-based approaches [24]. The MACCS-based model performance (MSE = 2.0) was better than our benchmark performance (MSE = 2.6; Figure 2), and the correlation coefficient was higher for the MACCS-based model (r = 0.64; Figure 3C) than for the RfC-RfD method (r = 0.56; Figure 1A). These results suggested that the ML-based RfC predictions for the selected chemicals using chemical structure information were valid. When we selected only chemicals that were structurally very similar to the chemicals in the training set, the prediction performances of both models improved as the similarity threshold increased up to ~0.8 and then deteriorated as the similarity threshold increased further (Figures 3A and 3B). The results indicated the importance of structurally similar chemicals in the training set to allow precise RfC predictions to be made by an ML model. The correlations between some combinations of feature importances were weak (Supplementary Figure S1A), suggesting that the feature importance calculated by each metric implied different impacts of molecular analogues on the prediction. Therefore, we summarized the top-ranked features derived from all metrics (Figure 3D). Among the top-ranked features, halogens (Br, F, and Cl) were found to be important in the model, and the number of halogenated (particularly chlorinated) chemicals in the list was relatively large (Supplementary Figure S2). This suggested that the relationship between the RfD and RfC was different for halogenated chemicals than for other chemical groups. Further studies focused on these relationships were therefore performed and are discussed in section 3.3.
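The similarity-threshold analysis can be sketched in Python as follows: for each threshold, only chemicals whose similarity is at or above the threshold are kept, and the MSE is recomputed on that subset. The function name and the inclusive comparison are assumptions made for illustration.

```python
def mse_by_similarity_threshold(similarities, actual, predicted, step=0.01):
    """Return a list of (threshold, n_chemicals, mse) tuples.

    Thresholds run from 0 to 1 in `step` increments; the sweep stops
    once no chemical reaches the threshold.
    """
    results = []
    n_steps = round(1.0 / step)
    for i in range(n_steps + 1):
        threshold = round(i * step, 10)  # avoid floating-point drift
        kept = [
            (a, p)
            for s, a, p in zip(similarities, actual, predicted)
            if s >= threshold
        ]
        if not kept:
            break
        error = sum((a - p) ** 2 for a, p in kept) / len(kept)
        results.append((threshold, len(kept), error))
    return results
```

Reporting the number of retained chemicals alongside each MSE helps show whether an apparent improvement at a high threshold rests on only a handful of chemicals.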

We also developed RfC predictive models using physicochemical properties. We developed models using the original and normalized calculated physicochemical properties. The prediction performances of both models (MSE = 2.0 for the model using the original values and MSE = 2.0 for the model using the normalized values, meaning that both models performed similarly, as expected) were better than the benchmark prediction performance (MSE = 2.6; Figure 2), as shown in Figure 4A. The correlation coefficient for the model using the normalized values (r = 0.64; Figure 4C) was higher than the coefficient for the RfC-RfD relationship (r = 0.56; Figure 1A), suggesting that the ML-based RfC predictions for the selected chemicals using the physicochemical properties were valid. The prediction performances of the model using the original physicochemical properties were not affected by the similarity-based chemical selection, but the performances of the model using the normalized values were dependent on the similarity threshold. The prediction performances deteriorated between the similarity thresholds of 0.6 and 0.8, but then improved as the similarity threshold increased from 0.8 (Figures 4A and 4B). As is the case for structural similarity, physicochemical similarity in the training set was important to making accurate RfC predictions. The Euclidean similarity distribution was relatively peaked when calculated using the original values compared with the normalized values (Supplementary Figure S3C). This suggested that it was important to preprocess the physicochemical properties before performing ML-based modeling to make accurate predictions for chemicals based on physicochemical similarity. As with the feature importances for the fingerprint-based model, the correlations between some combinations of feature importances were weak (Supplementary Figure S4A), suggesting that the feature importance calculated by each metric implied different impacts of physicochemical properties on the prediction. Therefore, we summarized the top-ranked features derived from all metrics (Figure 4D). Among the top-ranked features, SlogP (SlogP and slogp_VSA2) and AMW (actual molecular weight) are well-known parameters that make strong contributions to chemical permeability and absorption, as described by Lipinski’s rule [25]. Falcón-Cano et al. found that SlogP, the TPSA (topological polar surface area), SMR (molecular refractivity), the Hall-Kier alpha value, and kappa 3 were the most important variables for permeability prediction models [26]. For our model, SlogP, TPSA, SMR (smr_VSA5, smr_VSA10), and kappa 3 were also among the top-ranked features. Our results suggested that several parameters related to the permeabilities of the chemicals offer advantages when performing ML-based oral-to-inhalation extrapolations.

Figure 3. Inhalation reference concentration (RfC) prediction performance based on fingerprints

Effect of the Euclidean similarity threshold (in 0.01 increments) of the selected chemicals on the prediction performance of the model assessed using the (A) mean squared errors and (B) correlation coefficients between the RfCs and predicted values. Circles and triangles represent the values derived from the models using MACCS fingerprints and ECFP4, respectively. Negative r values were excluded from the plot. The red line is the oral reference dose (RfD)-to-RfC extrapolation benchmark shown in Figures 1A and 2. (C) The correlation coefficient (r) for the relationship between the RfCs and values predicted using the MACCS-based model for the selected chemicals. The dotted red line is an Excel trendline. (D) Feature importance analysis of the MACCS-based model. Normalized importances calculated by each metric were summed up.

Figure 4. Inhalation reference concentration (RfC) prediction performances using physicochemical properties

Effects of the Euclidean similarity thresholds (changing in 0.01 increments) of chemicals on the prediction performances of the models assessed using the (A) mean squared errors and (B) correlation coefficients between the RfCs and predicted values. Circles and triangles represent the values derived from the models using the normalized and original physicochemical properties, respectively. Negative r values are excluded. The red line is the benchmark derived from the oral reference dose (RfD)-to-RfC extrapolation shown in Figure 2. (C) The correlation coefficient (r) for the relationship between the RfCs and values predicted using the model using normalized physicochemical properties for the selected chemicals. The dotted red line is an Excel trendline. (D) Feature importance analysis of the model using normalized physicochemical properties. Normalized importances calculated by each metric were summed up.

Big data produced through the ToxCast/Tox21 program have been widely used for toxicity assessments [27, 28]. These data have also been used to predict chronic points of departure for animals [29, 30]. Wignall et al. found that high-throughput screening-based predictions of human RfDs gave a very low R² (0.087) [12]. Our results were consistent with the results found by Wignall et al. in that the predicted values based on in vitro biological activity patterns and the AC₅s did not correlate well with the actual RfCs (Figures 5A and 5B). The low prediction performance could have been caused by the numbers of active and inactive chemicals tested in the ToxCast/Tox21 program being imbalanced [27, 31] (Supplementary Figure S5). Indeed, the mean in vitro AC₅s for the detected assay endpoints were clustered around −4, which was the imputed value for the inactive endpoints (Figure 5C). Even the most sensitive endpoint did not correlate with the actual RfCs (Figure 5D). However, chemicals with low RfCs (i.e., very toxic chemicals) were active in a relatively high number of bioassays (Figures 5C and 5D). This suggested that increasing the number of active assay endpoints could improve the prediction performance.

Figure 5. Prediction performances using biological activities

The correlation coefficient (r) between the inhalation reference concentrations (RfCs) and predicted values using the (A) biological activity pattern (active or inactive) and (B) concentration giving 5% of the maximum response (AC₅) are shown. The dotted red line is an Excel trendline. The relationships between the RfCs and the (C) mean and (D) highest AC₅s for the assessed endpoints. The circle size indicates the relative number of active endpoints.

We found that the RfC prediction could be improved using several molecular descriptors. Taking into account the number of chemicals available for modeling and the model prediction performance, subsequent analyses were focused on the MACCS fingerprint and the normalized physicochemical properties.

3.3 Characteristics of chemicals based on their structural/physicochemical similarities

According to the Organisation for Economic Co-operation and Development (OECD) (Q)SAR Model Principle 3 [32], a valid model is associated with a defined applicability domain (AD). As suggested in the validation guidance document, we visualized the Euclidean chemical space to quantify the AD for the developed model (Figure 6A). As expected, because the physicochemical properties were calculated based on SMILES, the Euclidean similarity calculated using the MACCS fingerprint correlated with the Euclidean similarity calculated using the physicochemical properties. When we set the applicability domain threshold at the similarity that gave a relatively low MSE (0.8 for both models), 55 chemicals overlapped, and 12 and 38 chemicals were within the ADs of the MACCS- and physicochemical-property-based models, respectively. The halogenated analogs were predominantly within the ADs of both models (Figure 6B), consistent with the results of the feature importance analysis of the MACCS-based model (Figure 3D). This agreed with the results of a study performed by Craig et al., in which most of the developmental toxicants in the IRIS database were classed as halogenated non-metals [33]. The top-ranked features in the physicochemical property-based model also had the features of halogenated chemicals. The top-ranked features were thought to be related to permeability (Figure 4D), and, indeed, halogenation is often used to enhance membrane binding and chemical permeation [34]. We found lower EFs for halogenated chemicals than for non-halogenated chemicals (Figure 6C). This suggested that orally administered halogenated chemicals are more readily absorbed than inhaled halogenated chemicals. This is in line with the results of a study of clinical experiences performed by Teschke, which found that acute intoxication by aliphatic halogenated hydrocarbons occurs most often through ingestion and rarely through inhalation [35], probably because inhaled volatile halogenated chemicals are partly eliminated in exhaled air [36]. Our results indicated that higher prediction performances were found at higher similarity thresholds (Figures 3A and 4A), suggesting that similarity-based chemical selection was effective for ML-based oral-to-inhalation extrapolation, probably because of the characteristics of the filtered chemicals.
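The AD comparison described above amounts to applying the same similarity threshold to two sets of similarities and counting where each chemical falls. The following Python sketch shows one way to do this; the function name, the dictionary keys, and the inclusive threshold are assumptions made for illustration.

```python
def count_applicability_domains(sim_maccs, sim_phys, threshold=0.8):
    """Count chemicals inside both ADs, only one AD, or neither AD.

    sim_maccs and sim_phys are per-chemical similarities (0-1) in the
    same chemical order; a chemical is inside an AD if its similarity
    is at or above the threshold.
    """
    counts = {"both": 0, "maccs_only": 0, "phys_only": 0, "neither": 0}
    for s_maccs, s_phys in zip(sim_maccs, sim_phys):
        in_maccs = s_maccs >= threshold
        in_phys = s_phys >= threshold
        if in_maccs and in_phys:
            counts["both"] += 1
        elif in_maccs:
            counts["maccs_only"] += 1
        elif in_phys:
            counts["phys_only"] += 1
        else:
            counts["neither"] += 1
    return counts
```

The four counts correspond to the four areas separated by the dotted threshold lines in a scatter plot of the two similarities.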

Figure 6. Chemical spaces and characteristics of the developed models and selected chemicals

(A) Selected chemicals are plotted based on the similarities calculated using the MACCS fingerprints and physicochemical properties. The dotted lines indicate the similarity thresholds (0.8) that gave relatively low mean squared errors for both prediction models. The numbers are the numbers of chemicals within each area. (B) Numbers of analogs of the selected chemicals within the applicability domains. (C) Percentages of halogenated chemicals in the calculated extrapolation factor ranges. The red line and dots indicate the total number of chemicals.

3.4 ML-based RfD-to-RfC extrapolation using molecular descriptors

In section 3.2, we demonstrated that fingerprints and physicochemical properties could be used to predict RfCs using the ML algorithm. We therefore used these parameters as descriptors in ML-based RfD-to-RfC extrapolation models to improve the prediction performance compared with prediction models using only molecular descriptors. We developed two ML-based RfD-to-RfC extrapolation models using MACCS fingerprints and normalized physicochemical properties (here called RfD-MACCS and RfD-Phys models, respectively). These models had higher correlation coefficients (r = 0.74 for the RfD-MACCS model (Figure 7A) and 0.71 for the RfD-Phys model (Figure 7D)) than the correlation coefficients for the RfD-to-RfC extrapolation model only using EFs (r = 0.56; Figure 1A) and the ML-based model using molecular descriptors (r = 0.64 for both the MACCS- and physicochemical-based models; Figures 3C and 4C). The hybrid models also gave better prediction performances than the models only using EFs or molecular descriptors and gave better prediction performances than the best performances of the models using only molecular descriptors for chemicals within the AD, as described in section 3.3 (Figures 7B, 7C, 7E, and 7F). The feature importance analyses indicated that the RfD was calculated as the most important descriptor by three out of five metrics (weight, total gain, and total cover) for both hybrid models (Supplementary Figures S6 and S7). The results suggest that (1) the RfD could be an effective descriptor for ML-based RfC prediction models and (2) the RfD could strongly correlate with the RfC for chemicals that are similar in terms of structure and/or physicochemical properties, taking into consideration that similarity was calculated from molecular descriptors. 
Overall, we found that ML-based RfD-to-RfC extrapolation using several molecular descriptors, taking the similarities of these parameters into consideration, has certain advantages over other methods for oral-to-inhalation extrapolation. Table 2 summarizes the predictive performances of the models constructed in this study.

Table 2. Summary of the predictive performance for all constructed models

Modeling with	Molecular descriptors	XGBoost (RfD+MD)	XGBoost (MD)	XGBoost (RfD)	EF
Fingerprint	ECFP4	1.622	2.640	2.410	2.583
Fingerprint	MACCS	1.546	2.011
Physicochemical property	Original	1.646	1.997
Physicochemical property	Normalized	1.645	1.998

Note: MD = molecular descriptors (fingerprint or physicochemical property).

The values indicate mean squared error.

Figure 7. Prediction performances for oral reference dose (RfD)-to-inhalation reference concentration (RfC) extrapolation using molecular descriptors

Prediction performances derived from RfD-to-RfC prediction models using (A–C) MACCS fingerprints and (D–F) physicochemical properties. (A and D) Correlation coefficients for the relationships between the actual and predicted RfCs. The dotted red line is an Excel trendline. (B, C, E, and F) Effects of Euclidean similarity thresholds of the chemicals on the prediction performances assessed using (B and E) mean squared errors and (C and F) correlation coefficients. Negative r values are excluded from the plot. The solid and dotted lines represent the performances derived when only using extrapolation factors (Figures 1A and 2) and only using molecular descriptors (Figures 3A, 3B, 4A, and 4B), respectively.

3.5 Limitations and future prospects

The RtR extrapolation method is a well-established approach for estimating toxicity across exposure routes using an EF derived from absorption differences. We predicted that ML-based oral-to-inhalation extrapolation using molecular descriptors would perform better than extrapolation using a single constant EF for well-reviewed human oral toxicity values (RfDs) and inhalation toxicity values (RfCs). The results suggest that it is difficult to screen for inhalation toxicity from oral toxicity data using EFs alone, in agreement with the finding of Rennen et al. [6] that the no-observed-adverse-effect level for inhalation could not be explained by an EF based only on absorption differences. Our results also indicate that selecting chemicals on the basis of structural and physicochemical similarity strongly improves the prediction performance. The feature importance analyses indicate that functional groups that affect permeability (i.e., halogens) were probably critical to the improvements in prediction performance. The results provide new insights into the effects of chemical characteristics on oral-to-inhalation extrapolation for human health risk screening that were not achieved in previous systematic analyses [5, 6]. However, the limited availability of authorized chronic oral and inhalation toxicity values means that our models might have limited applicability to other chemicals (the applicability domains of the models were probably biased toward halogenated chemicals). Indeed, the chemical space of the chemicals analyzed in this study only partially covers the chemical space of the ToxCast database (Figure S8). Another limitation is that we built the models with only one algorithm. XGBoost is widely used by data scientists and often achieves excellent performance, and our comparison suggests that its performance is comparable to that of other algorithms (Table S2).
Nevertheless, further optimization (e.g., algorithm selection and hyperparameter tuning) will be required to develop a more precise prediction model for inhalation toxicity assessment.
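
For context, a single-constant EF is typically applied as a simple dose-to-concentration conversion. The sketch below assumes commonly used adult default values (70 kg body weight and 20 m^3/day inhalation volume) and treats the EF as the ratio of oral to inhalation absorption. These defaults, the function name, and the example values are illustrative assumptions and are not parameters taken from this study.

```python
def oral_to_inhalation(rfd_mg_per_kg_day, abs_oral=1.0, abs_inhalation=1.0,
                       body_weight_kg=70.0, inhalation_m3_per_day=20.0):
    """Convert an oral dose (mg/kg/day) to an air concentration (mg/m^3).

    ASSUMPTIONS: default body weight and inhalation volume are generic
    adult values; the extrapolation factor is abs_oral / abs_inhalation.
    """
    if abs_inhalation <= 0 or inhalation_m3_per_day <= 0:
        raise ValueError("absorption and inhalation volume must be positive")
    extrapolation_factor = abs_oral / abs_inhalation
    return (rfd_mg_per_kg_day * body_weight_kg / inhalation_m3_per_day
            * extrapolation_factor)


# Hypothetical RfD of 0.01 mg/kg/day.
rfc_equal_absorption = oral_to_inhalation(0.01)
rfc_half_oral_absorption = oral_to_inhalation(0.01, abs_oral=0.5)
print(rfc_equal_absorption, rfc_half_oral_absorption)
```

Because the same factor is applied to every chemical, this conversion shifts all values by a constant amount on a log scale, so it cannot capture the chemical-specific differences that the descriptor-based models attempt to learn.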

Reliance on animal tests for chemical safety assessments is increasingly being questioned, so new approach methods (NAMs) are required to replace animal tests. Ramanarayanan et al. described a conceptual model for performing inhalation risk assessments using NAMs [37]. The conceptual model incorporated (1) an in vitro test designed to identify adverse effects on tissues at the point of entry to the human respiratory system and (2) an in silico dosimetry model to predict inhaled particle deposition at specific sites in the respiratory tract. Three-dimensional reconstructed bronchial epithelial tissues have been widely used to perform long-term in vitro inhalation tests using single chemicals [38] or mixtures [39, 40]. Multiple-path in silico particle dosimetry models [41] and computational fluid dynamics models [42] have been used to predict inhaled chemical deposition. These approaches are well established and will be useful for performing local inhalation risk assessments, but the systemic effects of inhaled chemicals remain poorly understood. Co-culturing systems using multiple cell types have been used to mimic multi-tissue crosstalk, and recent studies have focused on emulating the complexities of target organs [43] or interconnections between one or two types of cells derived from different organs [44, 45]. We believe that available in vivo animal test data remain relevant to higher-tier hazard endpoints such as repeated dose toxicity [46], so it is worthwhile to validate newly developed NAMs against such animal test data to improve the relevance and acceptance of NAM results. Historically well-documented approaches to regulatory human toxicity assessments and innovative NAMs need to be used together in next-generation risk assessments, and ML-driven analyses will be useful for identifying important characteristics and providing insights into target chemicals.

4. Conclusion

We conclude that the ML-based oral-to-inhalation approach is useful for screening non-cancer health risks to humans. The ML-driven approach can complement traditional PBK-based RtR extrapolation methods to fill gaps for chemicals with unknown route-specific toxicities, taking previously unidentified characteristics of the chemicals into account.

Acknowledgments

I am grateful to Mr. Hiroaki Suzuki, Drs. Shigeaki Ito, Tsuneo Hashizume, and Ian Jones for support and advice on this study. I thank Dr. Gareth Thomas from Edanz (https://jp.edanz.com/ac) for editing a draft of this manuscript.

Conflict of interest

The author is an employee of Japan Tobacco Inc. and declares no conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

[1] The Interdepartmental Group on Health Risks from Chemicals. Guidelines on route-to-route extrapolation of toxicity data when assessing health risks of chemicals, 2006. http://www.iehconsulting.co.uk/IEH_Consulting/IEHCPubs/IGHRC/cr12.pdf (accessed 2024-08-20)
[2] Pepelko, W. E.; Withey, J. R. Methods for route-to-route extrapolation of dose. Toxicol. Ind. Health. 1985, 1(4), 153–175. DOI: 10.1177/074823378500100410
[3] Pepelko, W. E. Feasibility of route extrapolation in risk assessment. Br. J. Ind. Med. 1987, 44(10), 649–651. DOI: 10.1136/oem.44.10.649
[4] European Chemicals Agency. Guidance on information requirements and chemical safety assessment. Chapter R.8: Characterisation of dose [concentration]-response for human health, 2012. https://echa.europa.eu/documents/10162/17224/information_requirements_r8_en.pdf/e153243a-03f0-44c5-8808-88af66223258 (accessed 2024-08-20)
[5] Schröder, K.; Escher, S. E.; Hoffmann-Dörr, S.; Kühne, R.; Simetska, N.; et al. Evaluation of route-to-route extrapolation factors based on assessment of repeated dose toxicity studies compiled in the database RepDose®. Toxicol. Lett. 2016, 261, 32–40. DOI: 10.1016/j.toxlet.2016.08.013
[6] Rennen, M. A.; Bouwman, T.; Wilschut, A.; Bessems, J. G.; De Heer, C. Oral-to-inhalation route extrapolation in occupational health risk assessment: a critical assessment. Regul. Toxicol. Pharmacol. 2004, 39(1), 5–11. DOI: 10.1016/j.yrtph.2003.09.003
[7] Yoon, M.; Campbell, J. L.; Andersen, M. E.; Clewell, H. J. Quantitative in vitro to in vivo extrapolation of cell-based toxicity assay results. Crit. Rev. Toxicol. 2012, 42(8), 633–652. DOI: 10.3109/10408444.2012.692115
[8] Chang, X.; Tan, Y. M.; Allen, D. G.; Bell, S.; Brown, P. C.; et al. IVIVE: facilitating the use of in vitro toxicity data in risk assessment and decision making. Toxics. 2022, 10(5), 232. DOI: 10.3390/toxics10050232
[9] Loizou, G.; McNally, K.; Dorne, J. L. C.; Hogg, A. Derivation of a human in vivo benchmark dose for perfluorooctanoic acid from ToxCast in vitro concentration–response data using a computational workflow for probabilistic quantitative in vitro to in vivo extrapolation. Front. Pharmacol. 2021, 12, 630457. DOI: 10.3389/fphar.2021.630457
[10] Louisse, J.; Beekmann, K.; Rietjens, I. M. Use of physiologically based kinetic modeling-based reverse dosimetry to predict in vivo toxicity from in vitro data. Chem. Res. Toxicol. 2017, 30(1), 114–125. DOI: 10.1021/acs.chemrestox.6b00302
[11] Thomas, R. S.; Philbert, M. A.; Auerbach, S. S.; Wetmore, B. A.; Devito, M. J.; et al. Incorporating new technologies into toxicity testing and risk assessment: moving from 21st century vision to a data-driven framework. Toxicol. Sci. 2013, 136(1), 4–18. DOI: 10.1093/toxsci/kft178
[12] Wignall, J. A.; Muratov, E.; Sedykh, A.; Guyton, K. Z.; Tropsha, A.; et al. Conditional toxicity value (CTV) predictor: An in silico approach for generating quantitative risk estimates for chemicals. Environ. Health Perspect. 2018, 126(5), 057008. DOI: 10.1289/EHP2998
[13] Integrated Risk Information System, https://iris.epa.gov/AtoZ/?list_type=alpha (accessed 2024-06-13)
[14] Provisional Peer-Reviewed Toxicity Values Assessments, https://www.epa.gov/pprtv/provisional-peer-reviewed-toxicity-values-pprtvs-assessments (accessed 2024-06-13)
[15] Regional Screening Levels, https://www.epa.gov/risk/regional-screening-levels-rsls (accessed 2024-06-13)
[16] California Office of Environmental Health Hazard Assessment, https://oehha.ca.gov/chemicals (accessed 2024-06-13)
[17] Agency for Toxic Substances and Disease Registry, https://wwwn.cdc.gov/TSP/MRLS/mrlsListing.aspx (accessed 2024-06-13)
[18] Judson, R. Toxvaldb V9.4, 2022. DOI: 10.23645/epacomptox.20394501.v5
[19] U.S. Environmental Protection Agency. Human Health Toxicity Values in Superfund Risk Assessments. December 5, 2003. https://www.epa.gov/sites/default/files/2015-11/documents/hhmemo.pdf (accessed 2024-08-20)
[20] Williams, A. J.; Grulke, C. M.; Edwards, J.; McEachran, A. D.; Mansouri, K.; et al. The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J. Cheminf. 2017, 9, 1–27. DOI: 10.1186/s13321-017-0247-6
[21] US EPA ORD, C. for C. T. Toxcast Database: Invitrodb Version 4.1, 2018. DOI: 10.23645/epacomptox.6062623.v12
[22] Feshuk, M.; Kolaczkowski, L.; Dunham, K.; Davidson-Fritz, S. E.; Carstens, K. E.; et al. The ToxCast pipeline: updates to curve-fitting approaches and database structure. Front. Toxicol. 2023, 5, 1275980. DOI: 10.3389/ftox.2023.1275980
[23] Esaki, T. Appropriate evaluation measurements for regression models. CBIJ. 2021, 21, 59–69. DOI: 10.1273/cbij.21.59
[24] Banerjee, P.; Siramshetty, V. B.; Drwal, M. N.; Preissner, R. Computational methods for prediction of in vitro effects of new chemical structures. J. Cheminf. 2016, 8, 1–11. DOI: 10.1186/s13321-016-0162-2
[25] Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Delivery Rev. 1997, 23(1–3), 3–25. DOI: 10.1016/S0169-409X(96)00423-1
[26] Falcón-Cano, G.; Molina, C.; Cabrera-Pérez, M. Á. Reliable prediction of Caco-2 permeability by supervised recursive machine learning approaches. Pharmaceutics. 2022, 14(10), 1998. DOI: 10.3390/pharmaceutics14101998
[27] Jeong, J.; Kim, D.; Choi, J. Application of ToxCast/Tox21 data for toxicity mechanism-based evaluation and prioritization of environmental chemicals: Perspective and limitations. Toxicol In Vitro. 2022, 84, 105451. DOI: 10.1016/j.tiv.2022.105451
[28] Mezencev, R.; Subramaniam, R. The use of evidence from high-throughput screening and transcriptomic data in human health risk assessments. Toxicol. Appl. Pharmacol. 2019, 380, 114706. DOI: 10.1016/j.taap.2019.114706
[29] Wang, D. Infer the in vivo point of departure with ToxCast in vitro assay data using a robust learning approach. Arch. Toxicol. 2018, 92(9), 2913–2922. DOI: 10.1007/s00204-018-2260-6
[30] Wetmore, B. A.; Wambaugh, J. F.; Ferguson, S. S.; Li, L.; Clewell III, H. J.; et al. Relative impact of incorporating pharmacokinetics on predicting in vivo hazard and mode of action from high-throughput in vitro toxicity assays. Toxicol. Sci. 2013, 132(2), 327–346. DOI: 10.1093/toxsci/kft012
[31] Kurosaki, K.; Wu, R.; Uesawa, Y. A toxicity prediction tool for potential agonist/antagonist activities in molecular initiating events based on chemical structures. Int. J. Mol. Sci. 2020, 21(21), 7853. DOI: 10.3390/ijms21217853
[32] Organisation for Economic Co-operation and Development. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models, 2014. https://doi.org/10.1787/9789264085442-en (accessed 2024-08-20)
[33] Craig, E. A.; Wang, N. C.; Zhao, Q. J. Using quantitative structure–activity relationship modeling to quantitatively predict the developmental toxicity of halogenated azole compounds. J. Appl. Toxicol. 2014, 34(7), 787–794. DOI: 10.1002/jat.2940
[34] Gerebtzoff, G.; Li-Blatter, X.; Fischer, H.; Frentzel, A.; Seelig, A. Halogenation of drugs enhances membrane binding and permeation. ChemBioChem. 2004, 5(5), 676–684. DOI: 10.1002/cbic.200400017
[35] Teschke, R. Aliphatic halogenated hydrocarbons: report and analysis of liver injury in 60 patients. J Clin Transl Hepatol. 2018, 6(4), 350. DOI: 10.14218/JCTH.2018.00040
[36] Morgan, A.; Black, A.; Belcher, D. R. The excretion in breath of some aliphatic halogenated hydrocarbons following administration by inhalation. Ann. Occup. Hyg. 1970, 13(4), 219–233. DOI: 10.1093/annhyg/13.4.219
[37] Ramanarayanan, T.; Szarka, A.; Flack, S.; Hinderliter, P.; Corley, R.; et al. Application of a new approach method (NAM) for inhalation risk assessment. Regul Toxicol Pharmacol. 2022, 133, 105216. DOI: 10.1016/j.yrtph.2022.105216
[38] Cervena, T.; Vrbova, K.; Rossnerova, A.; Topinka, J.; Rossner Jr, P. Short-term and long-term exposure of the MucilAir™ model to polycyclic aromatic hydrocarbons. Altern Lab Anim. 2019, 47(1), 9–18. DOI: 10.1177/0261192919841484
[39] Ito, S.; Ishimori, K.; Ishikawa, S. Effects of repeated cigarette smoke extract exposure over one month on human bronchial epithelial organotypic culture. Toxicol Rep. 2018, 5, 864–870. DOI: 10.1016/j.toxrep.2018.08.015
[40] Ito, S.; Matsumura, K.; Ishimori, K.; Ishikawa, S. In vitro long-term repeated exposure and exposure switching of a novel tobacco vapor product in a human organotypic culture of bronchial epithelial cells. J Appl Toxicol. 2020, 40(9), 1248–1258. DOI: 10.1002/jat.3982
[41] Mori, A.; Ito, S.; Sekine, T. A revision of the multiple-path particle dosimetry model focusing on tobacco product aerosol dynamics. Int J Numer Methods Biomed Eng. 2024, 40(3), e3796. DOI: 10.1002/cnm.3796
[42] Corley, R. A.; Kuprat, A. P.; Suffield, S. R.; Kabilan, S.; Hinderliter, P. M.; et al. New approach methodology for assessing inhalation risks of a contact respiratory cytotoxicant: Computational fluid dynamics-based aerosol dosimetry modeling for cross-species and in vitro comparisons. Toxicol Sci. 2021, 182(2), 243–259. DOI: 10.1093/toxsci/kfab062
[43] Tanabe, I.; Yoshida, K.; Ishikawa, S.; Ishimori, K.; Hashizume, T.; et al. Development of an in vitro sensitisation test using a coculture system of human bronchial epithelium and immune cells. Altern Lab Anim. 2023, 51(6), 387–400. DOI: 10.1177/02611929231204823
[44] Schimek, K.; Frentzel, S.; Luettich, K.; Bovard, D.; Rütschle, I.; et al. Human multi-organ chip co-culture of bronchial lung culture and liver spheroids for substance exposure studies. Sci Rep. 2020, 10(1), 7865. DOI: 10.1038/s41598-020-64219-6
[45] Maschmeyer, I.; Lorenz, A. K.; Schimek, K.; Hasenberg, T.; Ramme, A. P.; et al. A four-organ-chip for interconnected long-term co-culture of human intestine, liver, skin and kidney equivalents. Lab Chip. 2015, 15(12), 2688–2699. DOI: 10.1039/C5LC00392J
[46] Brescia, S.; Alexander-White, C.; Li, H.; Cayley, A. Risk assessment in the 21st century: where are we heading? Toxicol Res. 2023, 12(1), 1–11. DOI: 10.1093/toxres/tfac087

