The Journal of Toxicological Sciences
Online ISSN : 1880-3989
Print ISSN : 0388-1350
ISSN-L : 0388-1350
Original Article
Adjustment of a no expected sensitization induction level derived from Bayesian network integrated testing strategy for skin sensitization risk assessment
Yuki Otsubo, Taku Nishijo, Hideyuki Mizumachi, Kazutoshi Saito, Masaaki Miyazawa, Hitoshi Sakaguchi

2020 Volume 45 Issue 1 Pages 57-67

Abstract

Skin sensitization is a key adverse effect to be addressed during hazard identification and risk assessment of chemicals, because it is the first step in the development of allergic contact dermatitis. Multiple non-animal testing strategies incorporating in vitro tests and in silico tools have achieved good predictivity when compared with the murine local lymph node assay (LLNA). The binary test battery of KeratinoSensTM and h-CLAT could be used to classify non-sensitizers as the first part of a bottom-up approach. However, quantitative risk assessment for sensitizing chemicals requires a No Expected Sensitization Induction Level (NESIL), the dose not expected to induce skin sensitization in humans. We used the Bayesian network integrated testing strategy (BN ITS-3) for chemical potency classification. BN ITS-3 predictions were performed without a pre-processing step (selecting data from their physicochemical applicability domains) or a post-processing step (Michael acceptor chemistry correction), neither of which necessarily improves prediction accuracy. For chemicals within the newly defined applicability domain, all under-predictions fell within one potency class when compared with LLNA results; that is, no chemical was under-predicted by more than one class. Considering the potential under-prediction by one class, a worst-case value for each class from BN ITS-3 was used to derive a NESIL. When in vivo and human data from suitable analogs cannot be used to estimate the uncertainty, adjusting the NESIL derived from BN ITS-3 may help perform skin sensitization risk assessment. The overall workflow for risk assessment was demonstrated by incorporating the binary test battery of KeratinoSensTM and h-CLAT.

INTRODUCTION

Skin sensitizers can cause allergic contact dermatitis (ACD) upon contact with skin (Kimber et al., 2011). Animal tests, such as the guinea pig maximization test, the Buehler test, and the murine local lymph node assay (LLNA) are traditionally used to assess skin sensitization of chemicals. ACD consists of a sensitization phase wherein antigen specific T-cells proliferate and an elicitation phase where subsequent contact with an allergen produces a skin reaction (Goebel et al., 2012). Due to regulatory requirements and ethical concerns, non-animal test methods have been developed that focus on earlier key events (KEs, KE1: protein binding, KE2: keratinocyte response and KE3: dendritic cell activation) of the adverse outcome pathway (AOP) during the induction phase of skin sensitization (OECD, 2012). The direct peptide reactivity assay (DPRA), examining KE1, reflects binding between proteins and the chemicals using model peptides containing cysteine or lysine residues (Gerberick et al., 2004, 2007). The KeratinoSensTM examines KE2 by simulating antioxidant response elements (AREs) in HaCaT-derived human keratinocytes to detect activation of the Keap1–Nrf2–ARE pathway as an oxidative and electrophilic stress response (Emter et al., 2010). The human Cell Line Activation Test (h-CLAT) addresses KE3 by evaluating increased expression of surface markers CD86 and CD54 on THP-1 cells, a human monocytic leukemia cell line that serves as a dendritic cell surrogate (Ashikaga et al., 2006, 2010; Sakaguchi et al., 2006).

No single in vitro test can cover the entire AOP with higher accuracy than the LLNA. Consequently, integrated results from multiple in vitro tests, as well as in silico models, would be needed to assess the skin sensitizing potential of various chemicals. Recently, we reported on a binary test battery that combines KeratinoSensTM and h-CLAT (Otsubo et al., 2017) as a hazard identification model, taking into account some limitations of the 2 out of 3 approach (Urbisch et al., 2015). The binary test battery could determine if a test chemical was non-sensitizing, based on two negative results from KeratinoSensTM and h-CLAT. The battery has 93.4% (compared with the LLNA), or 94.4% sensitivity (compared with human data) for 203 chemicals. Considering its predictive limitations, the test battery achieved high sensitivity, similar to the 3 out of 3 ITS, requiring three negative results of the DPRA, KeratinoSensTM, and h-CLAT. Therefore, the binary test battery consisting of KeratinoSensTM and h-CLAT could function as part of a “bottom-up” approach for skin sensitization hazard prediction.

On the other hand, when at least one positive result was obtained by any of the three in vitro tests, a potency classification was necessary for risk assessment. Several promising potency prediction models were submitted to the OECD as case studies of defined approaches (OECD, 2016). The OECD guideline documents included defined approaches that used validated OECD in vitro test methods and simple, rule-based data interpretation procedures (OECD, 2016). For example, the sequential testing strategy (STS) provides hazard and potency classification based on DPRA and h-CLAT data. In addition, the integrated testing strategy (ITS) provides hazard and potency classification by using the quantitative parameters of DPRA and h-CLAT, and the result from a commercial in silico tool (DEREK Nexus) that identifies structural alerts for sensitization (Nukada et al., 2013; Takenouchi et al., 2015; Kleinstreuer et al., 2018). However, the STS and ITS do not address KE2. Over the past few years, some testing strategies based on linear regression and machine learning have also been developed (Natsch et al., 2018; Jaworska et al., 2015). Natsch et al. (2018) reported a prediction model based on a combination of kinetic peptide reactivity data and KeratinoSensTM. Regression analysis was applied in the model, and its construction is well defined; however, this model does not consider KE3. On the other hand, the BN ITS-3 uses the quantitative results of DPRA, KeratinoSensTM, and h-CLAT to represent KE1-3 of the skin sensitization AOP (Jaworska et al., 2015). The model also uses physicochemical parameters (Log D, water solubility, ionization ratio and plasma protein binding) and structure-based prediction by TIMES. The BN ITS-3 can determine the sensitizing potency of chemicals as a probability distribution over 4 potency classes of LLNA [Non-sensitizer (NS), Weak (W), Moderate (M) or Strong to Extreme (S)]. BN ITS-3 predicted the LLNA potency class in a training set (n = 147) and an external test set (n = 60) with 86.4% and 88.3% accuracy, respectively. However, there were still some mis-predictions, including over- and underestimation of the potencies of various chemicals.

We examined the accuracy of potency classification (4 classes) of BN ITS-3 using a dataset of 175 chemicals (141 sensitizers and 34 NS) constructed in our previous work. Jaworska et al. (2015) performed BN ITS-3 predictions using both a pre-processing step of selecting data from their physicochemical applicability domains and a post-processing step of Michael acceptor (MA) chemistry correction, which could lead to predictions of weaker potency than those obtained without the processing steps. In this study, by contrast, BN ITS-3 predictions were performed without either processing step in order to avoid potential under-predictions. While LLNA potency is generally expressed as percent weight per volume, Jaworska et al. (2015) converted all the values into molar units. However, with this conversion, substances with weak LLNA potency or low molecular weight (MW) could be under-predicted. Thus, in this study, LLNA potency was expressed as percent weight per volume. We then estimated the uncertainty due to under-predictions in BN ITS-3 and assigned a worst-case EC3 value to each class (e.g. EC3 = 10% for weak sensitizers) to derive a No Expected Sensitization Induction Level (NESIL) for use in skin sensitization quantitative risk assessment (QRA).

MATERIALS AND METHODS

Bayesian network integrated testing strategy (BN ITS-3)

We used the BN ITS-3, developed by Procter & Gamble (Jaworska et al., 2015), using both the commercially available BayesiaLab 5.4 platform (Bayesia SAS, Laval Cedex, France) and the code kindly provided by Jaworska. The BN ITS-3 uses the following data: 1) bioavailability-related variables that rely on physicochemical properties (water solubility, log D, fraction ionized, plasma protein binding) calculated at pH = 7 using ACD/Percepta (ACD Labs, Toronto, Canada); 2) structure-based in silico prediction by TIMES-SS (Laboratory of Mathematical Chemistry, Bourgas, Bulgaria), which generates chemical alerts for sensitization using skin metabolism and autoxidation simulators; 3) quantitative data from DPRA, KeratinoSensTM, and h-CLAT that represent skin sensitization KE1-3 of the AOP. In the BN ITS-3, the relevant in-domain evidence for DPRA, KeratinoSensTM, and h-CLAT takes into account both the water solubility cutoffs and the fraction ionized. Chemicals that are fully ionized at pH = 7 are deemed “not suitable” for cell-based assays due to poor bioavailability. In fact, in Jaworska et al. (2015), only squaric acid and tartaric acid, which have ionization rates at pH = 7 of 99.997% and 99.998%, respectively, were considered fully ionized chemicals. Thus, the results of KeratinoSensTM and h-CLAT were excluded from the model test set for these chemicals. On the other hand, the water solubility cutoffs and fraction ionized were considered only in the model test set, not in the model training set. Additionally, the criteria for fraction ionized in Jaworska et al. (2015) remain unknown. Therefore, in this study, we did not consider water solubility cutoffs and fraction ionized for the applicability domain. The BN ITS-3 predicts the potency of skin sensitization as a probability distribution over 4 potency classes in LLNA. While the LLNA EC3 values, which correspond to the concentrations required to produce a positive response, are expressed as percent weight per volume, Jaworska et al. (2015) converted all the values into molar units. This conversion could result in underestimation of substances with EC3 values close to 90%, or of those with low MW (e.g., a W sensitizer or an M sensitizer in weight-based classification can be changed to an NS or a W sensitizer in molar-based classification, respectively). Therefore, in this study, LLNA potency was expressed as percent weight per volume. In Jaworska et al. (2015), BN ITS-3 predictions were post-processed to correct for MA chemistry; this MA correction was applied only to the model test set and shifted the predicted potency to a weaker class. Thus, in this study, to avoid underestimation, MA correction was not considered. The predicted probability distribution was converted to Bayes factors (B), where an NS prediction can be accepted only when B > 3 (substantial strength of evidence), while a sensitizer prediction can be accepted when B > 1. The highest B across potency classes drives the prediction, and a lower B indicates greater uncertainty.
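
The conversion from a predicted probability distribution to Bayes factors, and the acceptance rules described above, can be sketched as follows. This is a minimal illustration, not the BN ITS-3 code: it assumes the common definition of a Bayes factor as posterior odds divided by prior odds, uses a uniform prior purely as a placeholder (the actual prior is not given in the text), and all function names are hypothetical.

```python
import math

CLASSES = ("NS", "W", "M", "S")  # ordered from non-sensitizer to strong


def odds(p):
    """Convert a probability to odds; a probability of 1 maps to infinity."""
    return math.inf if p >= 1.0 else p / (1.0 - p)


def bayes_factors(posterior, prior):
    """Bayes factor per class, assumed here to be posterior odds / prior odds.

    Prior probabilities must lie strictly between 0 and 1.
    """
    return {c: odds(posterior[c]) / odds(prior[c]) for c in CLASSES}


def call_potency(factors):
    """Apply the acceptance rules: a sensitizer class needs B > 1, NS needs B > 3.

    The sensitizer class with the highest B drives a sensitizer call.
    Returns None when no class reaches its acceptance threshold.
    """
    sensitizer = {c: factors[c] for c in ("W", "M", "S")}
    best = max(sensitizer, key=sensitizer.get)
    if sensitizer[best] > 1.0:
        return best
    if factors["NS"] > 3.0:
        return "NS"
    return None


# Hypothetical example: uniform prior (an assumption) and an invented posterior.
prior = {c: 0.25 for c in CLASSES}
posterior = {"NS": 0.05, "W": 0.15, "M": 0.70, "S": 0.10}
print(call_potency(bayes_factors(posterior, prior)))
```

Checking the sensitizer classes before NS mirrors the rule that an NS call requires both B (NS) > 3 and B (W, M, or S) < 1.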

Chemical Dataset

The analysis was performed using the same dataset of 203 chemicals compiled by Otsubo et al. (2017). Of the 203 chemicals, seven metal salts, such as potassium dichromate and sodium lauryl sulfate, and one mixture (methylchloroisothiazoline/methyl isothiazoline) were excluded from the analysis because they were not considered in the model training set and test set of Jaworska et al. (2015). Furthermore, bisphenol A glycerolate dimethacrylate (Bis-GMA) was excluded from the analysis because the data for the lysine peptide are not available. Methyl pyruvate, clotrimazole, 1-cyclohexylethyl 2-butenoate, 1-octen-3-yl acetate, methyl salicylate, sulfanilamide, 1-butanol, isopropanol, propylene glycol, 4-hydroxybenzoic acid, glycerol, lactic acid, octanenitrile, vinylidene dichloride, N,N-diethyl-m-toluamide, sulphanilic acid, dimethyl formamide, and saccharin were not considered because they had negative results in all three in vitro assays and did not need potency prediction. The remaining 175 chemicals (34 non-sensitizers and 55 W, 56 M, and 30 S sensitizers) were analyzed for potency prediction by the BN ITS-3. Supplementary Table 1 lists the data for the DPRA, KeratinoSensTM, h-CLAT, LLNA, TIMES, physicochemical properties, and BN ITS-3.

Direct Peptide Reactivity Assay (DPRA): OECD test guideline 442C

The DPRA evaluates KE1, the molecular initiating event in the skin sensitization AOP. Protein reactivity was reported as percent depletion of two synthetic model hepta-peptides, one containing cysteine (Ac-RFAACAA-COOH) and the other containing lysine (Ac-RFAAKAA-COOH) (Gerberick et al., 2004, 2007). The test chemical was mixed with the cysteine or lysine peptide at a ratio of 10:1 or 50:1, respectively, and the depletion of each peptide after 24 hr (as a result of binding to the test chemical) was measured using HPLC. The percent depletion was calculated for each peptide, and the two values were averaged. If the average depletion value was > 6.38%, the chemical was classified as positive.
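
The DPRA call described above reduces to averaging two depletion values and comparing the mean with the 6.38% cutoff. A minimal sketch, with hypothetical function names and invented input values:

```python
DPRA_CUTOFF_PERCENT = 6.38  # mean depletion above this value is a positive call


def dpra_mean_depletion(cysteine_depletion, lysine_depletion):
    """Average the percent depletion of the cysteine and lysine peptides."""
    return (cysteine_depletion + lysine_depletion) / 2.0


def dpra_positive(cysteine_depletion, lysine_depletion):
    """Return True when the mean peptide depletion exceeds the cutoff."""
    return dpra_mean_depletion(cysteine_depletion, lysine_depletion) > DPRA_CUTOFF_PERCENT


# Invented depletion values, for illustration only.
print(dpra_positive(10.0, 4.0))  # mean depletion 7.0%
print(dpra_positive(5.0, 2.0))   # mean depletion 3.5%
```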

KeratinoSensTM: OECD test guideline 442D

The KeratinoSensTM addresses KE2, keratinocyte activation, by measuring the activation of the Keap1–Nrf2–ARE pathway (Emter et al., 2010). The KeratinoSensTM is a reporter cell-based assay that measures luciferase gene induction as an indicator of ARE activation in HaCaT-derived cells. The cells were incubated with the test chemical for 48 hours, and luciferase induction was then measured with a luminometer. Test chemicals were classified as positive if the luciferase gene induction showed a statistically significant increase >1.5-fold over the vehicle control at a concentration <1000 μM, with cell viability >70%. The estimated concentration giving 1.5-fold luciferase induction (EC1.5) was calculated by linear interpolation.
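
The linear interpolation used for EC1.5 can be illustrated as below. This is a generic sketch of interpolating between the two tested concentrations that bracket the 1.5-fold threshold; it ignores the statistical significance and viability criteria, and the function name and data are hypothetical.

```python
def interpolate_ec(concentrations, fold_inductions, threshold=1.5):
    """Linearly interpolate the concentration at which induction reaches `threshold`.

    `concentrations` must be sorted in ascending order and aligned with
    `fold_inductions`. Returns None when no pair of adjacent points brackets
    the threshold (including when the first point is already at or above it).
    """
    for i in range(1, len(concentrations)):
        c_low, c_high = concentrations[i - 1], concentrations[i]
        i_low, i_high = fold_inductions[i - 1], fold_inductions[i]
        if i_low < threshold <= i_high:
            fraction = (threshold - i_low) / (i_high - i_low)
            return c_low + (c_high - c_low) * fraction
    return None


# Invented dose-response data (concentrations in micromolar).
print(interpolate_ec([10.0, 20.0, 40.0], [1.0, 1.2, 1.8]))
```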

Human cell line activation test (h-CLAT): OECD test guideline 442E

The h-CLAT addresses KE3, dendritic cell activation. Dendritic cells are known to up-regulate the expression of the cell surface proteins CD86 and CD54 following exposure to sensitizers. The h-CLAT measures CD86 and CD54 expression on THP-1 cells, which serve as a dendritic cell surrogate (Ashikaga et al., 2006, 2010; Sakaguchi et al., 2006). The cells were treated with a test chemical for 24 hr at the concentration yielding 75% cell viability (CV75) and stained with fluorescence-labeled antibodies for CD86 or CD54. Changes in CD86 and CD54 expression induced by the test chemical were then measured by flow cytometry as relative fluorescence intensity (RFI) compared with the vehicle control. When the CD86 RFI was equal to or greater than 150% or the CD54 RFI was equal to or greater than 200%, the test chemical was classified as positive. The estimated concentration inducing 150% of CD86 RFI and/or 200% of CD54 RFI (EC150 or EC200) was calculated in a manner similar to the EC3 value determination in LLNA.

KOWWIN ver.4.1 in EPI suiteTM

The log Kow of each chemical was estimated using the program KOWWIN ver.4.1 in the EPI suiteTM (Environmental Protection Agency, Washington, DC, USA) as an indicator of each chemical’s water solubility.

Tissue Metabolism Simulator for predicting skin sensitization (TIMES) ver.2.27.17

The TIMES (Laboratory of Mathematical Chemistry, Bourgas, Bulgaria) is a licensed software package with prediction modules for several endpoints, including skin sensitization. The TIMES-SS generates chemical alerts for sensitization using a skin metabolism and autoxidation simulator and classifies test chemicals into three potency categories: non-sensitizers, weak sensitizers and strong sensitizers (Patlewicz et al., 2007, 2014). The TIMES-SS was used to assign the pre- and pro-haptens that require abiotic (e.g., auto-oxidation) or biotic (e.g., metabolic transformation) activation before becoming sensitizers.

ACD/Percepta

ACD/Percepta (ACD Labs) was used to calculate physicochemical properties (LogKow, LogD, water solubility, and plasma protein binding). Ionization ratios were calculated using the formula below:

$$\text{ionization ratio} = \left| 1 - \frac{10^{\mathrm{LogD}}}{10^{\mathrm{LogKow}}} \right|$$

where || means an absolute value.
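
As a worked illustration of the formula above, the ratio of the two powers of ten can be computed from the difference LogD − LogKow. The function name and the input values below are hypothetical; in the study, LogD and LogKow came from ACD/Percepta.

```python
def ionization_ratio(log_d, log_kow):
    """Ionization ratio = |1 - 10**LogD / 10**LogKow|.

    The ratio of powers is computed as 10**(LogD - LogKow), which is
    algebraically identical and avoids overflow for large exponents.
    """
    return abs(1.0 - 10.0 ** (log_d - log_kow))


# Invented values: LogD two units below LogKow gives a ratio close to 0.99.
print(ionization_ratio(1.0, 3.0))
# Equal LogD and LogKow give a ratio of 0.
print(ionization_ratio(2.0, 2.0))
```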

Statistical analysis

The predictive performances of the test batteries were calculated according to Cooper statistics as sensitivity, specificity, and accuracy (Cooper et al., 1979). Sensitivity was the proportion of sensitizers predicted as positive. Specificity was the proportion of non-sensitizers predicted as negative. Accuracy was the overall proportion of correct predictions.
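
The three Cooper statistics defined above can be computed from the four cells of a two-by-two table. The sketch below uses a hypothetical function name; the example counts are the hazard identification counts reported in the Results (133 of 141 sensitizers and 29 of 34 non-sensitizers correctly predicted).

```python
def cooper_statistics(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    total = true_pos + false_neg + true_neg + false_pos
    accuracy = (true_pos + true_neg) / total
    return sensitivity, specificity, accuracy


# 133/141 sensitizers and 29/34 non-sensitizers correctly predicted.
sens, spec, acc = cooper_statistics(true_pos=133, false_neg=8, true_neg=29, false_pos=5)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")
```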

Case studies of fragrance ingredients to derive NESILs

Potency assessment is needed to apply the QRA to a chemical prior to its use in consumer products. For a sensitizing chemical, the QRA requires a NESIL, expressed in μg/cm², along with safety assessment factors to derive an acceptable exposure level (AEL) of a chemical in a particular product (Api et al., 2008; Felter et al., 2003; Basketter and Safford, 2016). Generally, a NESIL is a dose that is not expected to cause induction of skin sensitization in humans. The NESIL is derived from an EC3 value in the LLNA (e.g., an EC3 of 1% corresponds to a NESIL of 250 μg/cm²) or from no observed effect levels (NOELs) in the Human Repeated Insult Patch Test or Human Maximization Test (McNamee et al., 2008). The Consumer Exposure Level (CEL) (μg/cm²) for the product is calculated according to the SCCS Notes of Guidance 10th revision (European Commission, 2018). If AEL > CEL, the chemical can be safely used in the product type. In this study, we selected a limited number of fragrance ingredients listed in the IFRA standard, 48th amendment of the IFRA Code of Practice (IFRA, 2015), considering acylating chemicals (amine reactive chemicals), pre/pro-haptens, and lipophilic chemicals (log Kow > 3.5), which are deemed outside of the BN ITS-3 applicability domain. Then, we examined how close a NESIL derived by BN ITS-3 is to a NESIL derived from either LLNA or human data.
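
The QRA arithmetic described above can be sketched as follows. The conversion factor of 250 μg/cm² per 1% EC3 is taken from the text; dividing the NESIL by the product of the safety assessment factors is the usual way an AEL is obtained, but the specific factor values and the CEL below are invented for illustration, and the function names are hypothetical.

```python
NESIL_UG_PER_CM2_PER_PERCENT_EC3 = 250.0  # 1% EC3 corresponds to 250 ug/cm2


def nesil_from_ec3(ec3_percent):
    """Convert an LLNA EC3 (percent, w/v) to a NESIL in ug/cm2."""
    return ec3_percent * NESIL_UG_PER_CM2_PER_PERCENT_EC3


def acceptable_exposure_level(nesil, safety_factors):
    """AEL = NESIL divided by the product of the safety assessment factors."""
    combined = 1.0
    for factor in safety_factors:
        combined *= factor
    return nesil / combined


def is_safe(ael, cel):
    """The chemical is acceptable for the product type when AEL > CEL."""
    return ael > cel


# Invented example: EC3 = 1%, two hypothetical safety factors of 10, CEL = 1.0 ug/cm2.
nesil = nesil_from_ec3(1.0)
ael = acceptable_exposure_level(nesil, [10.0, 10.0])
print(nesil, ael, is_safe(ael, cel=1.0))
```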

RESULTS

Predictive performance of BN ITS-3 compared to four class potency in the LLNA

Supplementary Table 1 shows the dataset of 175 chemicals, including physicochemical properties and the data for DPRA, KeratinoSensTM, h-CLAT, LLNA, TIMES, and BN ITS-3, sorted according to their EC3 values. In this study, EC3 values were expressed as percent weight per volume, while Jaworska et al. (2015) converted all the values into molar units. BN ITS-3 generated a probability distribution over four potency classes without the pre-processing step of selecting data from the physicochemical applicability domains or the post-processing step of MA correction. The probability distribution was then converted into a B for each potency category. A prediction of B (NS) > 3 and B (W, M, or S) < 1 indicated a non-sensitizer, whereas a prediction of B (W, M, or S) > 1 indicated a sensitizer. The highest B indicated the predicted potency of the test chemical. As Table 1 shows, for hazard identification against the LLNA, the sensitivity was 94.3% (133/141), the specificity was 85.3% (29/34) and the accuracy was 92.6% (162/175). BN ITS-3 could thus identify the sensitizing potential of test chemicals with high reliability. Table 1 also summarizes the predictive performance of BN ITS-3 for the four potency classes (NS, W, M, or S), compared with LLNA. The four class potency accuracy was 66.3% (116/175). Accuracy for strong, moderate, or weak sensitizers in LLNA was 73.3% (22/30), 48.2% (27/56), or 69.1% (38/55), respectively. Potency was under-predicted for 20.6% (29/141) of sensitizers and over-predicted for 17.1% (30/175) of chemicals. On the other hand, for 163 of the 175 chemicals examined, the prediction was correct or fell within one potency class of the LLNA potency, indicating that few chemicals were incorrectly classified by more than one class. Only four under-predicted chemicals were under-predicted by two potency classes: hexyl salicylate and benzoyl peroxide were classified as strong on the LLNA, but as weak on the BN ITS-3; diethylenetriamine and squaric acid were classified as moderate on the LLNA, but as negative on the BN ITS-3.
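
Because the four potency classes are ordered, under- and over-predictions can be counted by comparing class ranks. The sketch below is illustrative only: the function name is hypothetical, and the example list mixes two pairs reported above (hexyl salicylate and diethylenetriamine) with invented pairs.

```python
RANK = {"NS": 0, "W": 1, "M": 2, "S": 3}  # ordinal potency scale


def summarize_predictions(pairs):
    """Count correct, under-, and over-predictions from (LLNA, predicted) pairs.

    An under-prediction is a predicted class weaker than the LLNA class.
    """
    summary = {"correct": 0, "under": 0, "over": 0, "under_by_two_or_more": 0}
    for llna_class, predicted_class in pairs:
        difference = RANK[predicted_class] - RANK[llna_class]
        if difference == 0:
            summary["correct"] += 1
        elif difference < 0:
            summary["under"] += 1
            if difference <= -2:
                summary["under_by_two_or_more"] += 1
        else:
            summary["over"] += 1
    return summary


example = [
    ("S", "W"),    # hexyl salicylate: strong on LLNA, weak on BN ITS-3
    ("M", "NS"),   # diethylenetriamine: moderate on LLNA, negative on BN ITS-3
    ("M", "M"),    # invented pair
    ("W", "M"),    # invented pair
    ("NS", "NS"),  # invented pair
]
print(summarize_predictions(example))
```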

Table 1. Predictive performance of BN ITS-3 for positive chemicals in any of three in vitro tests.

Under-predicted chemicals by BN ITS-3

Table 2 summarizes the 29 chemicals whose potency was under-predicted by the BN ITS-3 compared with the LLNA. Of these, phthalic anhydride, maleic anhydride, and 1,2-cyclohexane dicarboxylic anhydride, which are acylating agents (Natsch et al., 2013, 2015), were classified as strong on the LLNA, but as moderate on the BN ITS-3. Benzoyl peroxide, which is an acylating agent or an amine reactive chemical (Piroird et al., 2015; Natsch et al., 2013), was classified as strong on the LLNA, but as weak on the BN ITS-3. Squaric acid diethyl ester and squaric acid, which are amine reactive chemicals (Natsch et al., 2013), were classified as strong and moderate (respectively) on the LLNA, but as moderate and non-sensitizing on the BN ITS-3. Undec-10-enal, p-isobutyl-α-methyl hydrocinnamaldehyde, hexyl salicylate, and farnesol, which are lipophilic sensitizers with log Kow > 3.5, were under-predicted as weak by the BN ITS-3. Compared with LLNA, 2-Nitro-4-phenylenediamine, dihydroeugenol, dibenzyl ether, 4-chloroaniline, and diethylenetriamine, which are pre/pro-haptens, were under-predicted. Benzoyl peroxide, squaric acid, hexyl salicylate, and diethylenetriamine, which were under-predicted by two potency classes, are each an acylating agent, an amine reactive chemical, a lipophilic chemical, or a pre/pro-hapten. Therefore, chemicals in these categories are all potentially under-predicted by the BN ITS-3. After excluding these potentially under-predicted chemicals, all under-predictions fell within one potency class when compared with LLNA. Table 3 summarizes the predictive performance of the BN ITS-3 for 94 chemicals, obtained by excluding from Table 1 the 81 chemicals corresponding to acylating agents, amine reactive chemicals, lipophilic chemicals, and pre/pro-haptens. The four class potency accuracy was 71.2% (67/94). Accuracy for strong, moderate, and weak sensitizers and non-sensitizers in LLNA was 92.3% (12/13), 53.3% (16/30), 56.0% (14/25), and 96.2% (25/26), respectively. Potency was under-predicted for 20.6% (14/68) of sensitizers and over-predicted for 13.8% (13/94) of chemicals. One approach to minimizing the uncertainty from under-prediction by one class (an under-prediction rate of 20.6%) is to assign the lowest EC3 of each class from BN ITS-3 as the point of departure (PoD) for risk assessment, as shown in Fig. 1 (Gerberick et al., 2001). Weak potency was assigned an EC3 of 10%, and moderate potency an EC3 of 1%. For strong potency, an EC3 was not determined, since a strong prediction in the BN ITS-3 indicates strong (EC3 ≤ 0.1%) or extreme (EC3 ≤ 0.01%) sensitizers. When a test chemical was predicted as a non-sensitizer by the BN ITS-3, an EC3 of 100% was assigned to derive the NESIL, allowing for under-prediction by one class. This means that a usage limit would still be set. Fig. 2 presents the workflow of sensitizing potential and potency classification, which incorporates our previously published testing strategy (Otsubo et al., 2017). When at least one positive result was obtained in any of the three in vitro tests, the test chemical would be considered a sensitizer. Therefore, it should be noted that even a negative prediction by the BN ITS-3 corresponds to an EC3 of 100% that can be used for the risk assessment.
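
The worst-case assignment described above can be sketched as a small lookup combined with the hazard step. This is an illustration under stated assumptions, not the authors' workflow as drawn in Fig. 2: the function name is hypothetical, the in vitro calls are passed in as booleans, and the NESIL conversion uses the factor of 250 μg/cm² per 1% EC3 given in the Methods.

```python
NESIL_UG_PER_CM2_PER_PERCENT_EC3 = 250.0

# Worst-case (lowest) EC3, in percent, assigned to each BN ITS-3 class.
# No EC3 is assigned for a strong prediction (strong or extreme sensitizer).
WORST_CASE_EC3 = {"NS": 100.0, "W": 10.0, "M": 1.0, "S": None}


def adjusted_nesil(in_vitro_positive, bn_its3_class=None):
    """Return (hazard call, adjusted NESIL in ug/cm2 or None).

    `in_vitro_positive` holds one boolean per in vitro test. If every test is
    negative, the chemical is treated as a non-sensitizer and no NESIL is
    derived. Otherwise the BN ITS-3 class is mapped to its worst-case EC3.
    """
    if not any(in_vitro_positive):
        return ("non-sensitizer", None)
    ec3 = WORST_CASE_EC3[bn_its3_class]
    if ec3 is None:
        return ("sensitizer", None)  # strong class: EC3 not determined
    return ("sensitizer", ec3 * NESIL_UG_PER_CM2_PER_PERCENT_EC3)


print(adjusted_nesil([True, False], "W"))
print(adjusted_nesil([True, True], "NS"))
print(adjusted_nesil([False, False]))
```

Returning the hazard call alongside the NESIL keeps the two cases without a NESIL distinguishable: a chemical negative in every in vitro test, and a chemical predicted as strong.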

Table 2. Under-predicted chemicals by BN ITS-3.
Table 3. Predictive performance of BN ITS-3 after excluding potentially under-predicted chemicals.
Fig. 1. EC3 from the BN ITS-3 predictions.
Fig. 2. Workflow to evaluate hazard and potency classification for skin sensitization.

Case studies on fragrance ingredients

Ten fragrance ingredients with LLNA and human data were used as case studies, considering acylating chemicals (amine reactive chemicals), pre/pro-haptens, and lipophilic chemicals (log Kow > 3.5), which are deemed outside of BN ITS-3 applicability domain. Table 4 indicates that the LLNA- and human-derived NESIL values are quite different (more than an order of magnitude) in some cases and very similar in others. On the other hand, the NESIL from BN ITS-3 is close to or lower than LLNA-derived NESIL in most cases.

Table 4. Case studies on fragrance ingredients.

DISCUSSION

The purpose of this study was to derive the NESIL of a chemical predicted as a sensitizer using a binary test battery of KeratinoSensTM and h-CLAT, or the 3 out of 3 ITS. The BN ITS-3, developed by Procter & Gamble, could provide a quantitative WoE for potency categorization, allowing for risk assessment of chemicals (Jaworska et al., 2015). We analyzed 175 chemicals with the necessary data for the BN ITS-3 to calculate B, corresponding to the probabilistic prediction for four potency classes (NS, W, M, and S). The BN ITS-3 predicted LLNA hazard outcomes with 92.6% (162/175) accuracy and the four potency classes with 66.3% (116/175) accuracy. Potency was under-predicted for 20.6% (29/141) of sensitizers and over-predicted for 17.1% (30/175) of chemicals. Importantly, although these predictions were performed without either the pre-processing step of selecting data from their physicochemical applicability domains or the post-processing step of MA correction, the results were better than, or equal to, those for the smaller dataset obtained using both processing steps (Kleinstreuer et al., 2018), in which the accuracy for LLNA hazard outcomes was 83.2% (99/119) and the potency accuracy was 67.8% (78/115). These results suggest that the processing steps did not necessarily improve prediction accuracy. Moreover, for 163 of the 175 chemicals examined, the prediction was correct or fell within one potency class of the LLNA potency, indicating that few chemicals were incorrectly classified by more than one class. A detailed analysis of the under-predicted chemicals showed that acylating chemicals, amine reactive chemicals, lipophilic chemicals, and pre/pro-haptens had predictive limitations. After excluding these potentially under-predicted chemicals, the four class potency accuracy was 71.2% and all under-predictions fell within one potency class of the LLNA class. Furthermore, to minimize uncertainty from under-predictions (an under-prediction rate of 20.6%) during skin sensitization risk assessment, the lowest EC3 of each class from BN ITS-3 was assigned as the PoD, as shown in Fig. 1.

LLNA EC3 values are generally used to derive the NESIL. However, the use of animals for testing cosmetic ingredients and products has been banned in the EU since 2013 (European Union, 2009). Thus, alternative approaches for deriving a NESIL for chemical risk assessment are urgently required. The complexity of the mechanisms underlying the induction of skin sensitization means that no existing in vitro test method is appropriate for potency prediction on its own; however, testing strategies that combine several test results may be a way to predict sensitizing potency. The BN ITS-3 achieved high performance in predicting the sensitizing potency in LLNA (Jaworska et al., 2015; Kleinstreuer et al., 2018). However, it should be noted that the four potency class prediction by the BN ITS-3 might be insufficient for the purposes of precise risk assessment, given its under-prediction rate of 20.6% (29/141 sensitizers). Chemicals with associated in vivo and human data that support suitable read across are very important references for risk assessment. When no such supporting information is available, the risk assessment would use the potency predictions of BN ITS-3 to set a NESIL. When the under-predictions are not negligible, a conservative risk assessment would need to be done by estimating the degree of under-prediction in the BN ITS-3.

According to Kleinstreuer et al. (2018), 10 out of 35 MA chemicals (dimethyl fumarate, methyl heptane carbonate, toluene diamine sulfate, resorcinol, hexyl cinnamic aldehyde, ethyleneglycol dimethacrylate, carvone, ethyl acrylate, methylmethacrylate, and trans-2-hexenal) were under-predicted by the BN ITS-3 as compared with three class potency predictions by LLNA. The chemicals within the MA domain were generally predicted with high accuracy, of at least 80%, by all of the non-animal test methods. The human and LLNA data are also concordant for most of these chemicals (Urbisch et al., 2015). Under-predictions introduced by MA correction need to be avoided, since MA correction shifts a chemical to a less potent class. For BN ITS-3, the validity of MA correction might need further investigation; in this study, MA correction was not applied as a post-processing step.

The 29 chemicals that were under-predicted relative to their sensitizing potency in LLNA are presented in Table 2. Only four chemicals (benzoyl peroxide, squaric acid, hexyl salicylate, and diethylenetriamine) were under-predicted by two potency classes. First, benzoyl peroxide and squaric acid are known to be amine reactive and show a higher reactivity for lysine than for cysteine (Piroird et al., 2015; Natsch et al., 2013). Of the 29 chemicals, phthalic anhydride, maleic anhydride, 1,2-cyclohexane dicarboxylic anhydride, and squaric acid diethyl ester were also amine reactive. This feature is observed in acylating chemicals, which predominantly transfer acyl groups onto lysine residues (Natsch et al., 2013); benzoyl peroxide, phthalic anhydride, maleic anhydride, and 1,2-cyclohexane dicarboxylic anhydride are also classified as acylating chemicals. On the other hand, the anhydrides and reactive esters quickly hydrolyze in water and might not be applicable to existing in vitro test methods. Since acyl transfer chemicals or amine reactive chemicals are potentially under-predicted by the BN ITS-3, these chemicals would need to undergo risk assessment using associated in vivo and human data of suitable analogs, to support read across. Second, hexyl salicylate is a lipophilic sensitizer with log Kow = 5.06. Of the 29 chemicals, undec-10-enal, p-isobutyl-α-methyl hydrocinnamaldehyde, and farnesol were also lipophilic sensitizers with log Kow > 3.5. Chemicals with log Kow > 3.5 in the h-CLAT, or log Kow > 5 in the KeratinoSensTM, tend to produce false-negative results due to the aqueous nature of the cell culture system and solubility issues. For this reason, the BN ITS-3, as proposed by Jaworska et al. (2015), used water solubility cutoffs for selecting data from the DPRA, KeratinoSensTM, and h-CLAT, but only in the analysis of the model test set and not in the training set. The utility of the cutoffs may need further investigation using an expanded dataset. Since lipophilic chemicals with high log Kow are potentially under-predicted by the BN ITS-3, these chemicals would need to be evaluated for risk assessment using suitable analogs to support read across. Finally, diethylenetriamine was predicted to be a pro-hapten by the TIMES. In addition, of the 29 chemicals, farnesol and dibenzyl ether were predicted to be pre-haptens and 4-chloroaniline a pro-hapten by the TIMES. 2-Nitro-4-phenylenediamine and dihydroeugenol are also pre/pro-haptens according to the TIMES. Pro-haptens need to be metabolically activated to act as sensitizers. Pre-haptens need to be abiotically activated (through autoxidation) to act as sensitizers. Jaworska et al. (2015) noted that BN ITS-3 predictions for pre/pro-haptens require more careful interpretation, but the related processing steps have not been determined. Associated in vivo and human data from suitable analogs might be useful for risk assessment and for minimizing the uncertainty associated with predictions for pre/pro-haptens in the BN ITS-3. Taken together, most chemicals that were under-predicted by the BN ITS-3 were acylating chemicals (amine reactive chemicals), pre/pro-haptens, and lipophilic chemicals (log Kow > 3.5), which can be identified by using the TIMES or EPI suite. The BN ITS-3 prediction model relies heavily on in vitro test methods and therefore inherits their limitations. Consequently, factors associated with the applicability domain of the individual methods would affect the performance of BN ITS-3.

When acylating chemicals (amine reactive chemicals), pre/pro-haptens, and lipophilic chemicals (log Kow > 3.5) were excluded from the applicability domain, the BN ITS-3 prediction slightly improved. However, the uncertainty that the BN ITS-3 results might fall within one class mis-prediction of LLNA results would remain unsolved. Therefore, a worst value of EC3 to each potency class of the BN ITS-3 (e.g. EC3 = 10% for weak sensitizers) would serve as NESIL for the QRA, accounting for being misclassified as one class lower. Unless associated in vivo and human data on suitable analogs are available, adjustment of NESILs derived from BN ITS-3 would be useful for skin sensitization risk assessment.

The LLNA results in our database were collected from the published available information. However, the analysis of inherent variability of in vivo reference data (i.e. LLNA) is an important aspect requiring further considerations when evaluating testing strategies. Dumont et al. (2016) reported that the number of discordant results increases when a chemical is tested in more than one solvent. Therefore, further characterization of variability and uncertainty in the LLNA data is needed.

The case studies on fragrance ingredients within the applicability domain of BN ITS-3 indicate that the BN ITS-3 could predict human NESIL with accuracy similar to that of LLNA, but the predictions of BN ITS-3 and LLNA seem a bit less conservative than human NESIL. Historically, a human study to support actual use levels for a chemical has been conducted at a dose close to or below the EC3 of the LLNA. The confirmatory human studies have been tested at a maximal concentration well below the expected NOEL. In contrast, the LLNA- and BN ITS-3-derived NESIL are not actual NOELs but concentrations at which an effect is observed.

In conclusion, our results indicated that, even when mis-predictions by the BN ITS-3 occurred, almost all under-predictions fell within one potency class when compared with LLNA. For chemicals within the newly defined applicability domain, no chemicals were under-predicted by more than one class. Thus, a worst-case EC3 value for each potency class from the BN ITS-3 was assigned to derive a NESIL, taking into account the uncertainty of being misclassified as one class lower. When in vivo and human data obtained from suitable analogs are not available to estimate the prediction uncertainty, adjustment of the NESIL derived from BN ITS-3 would be useful for skin sensitization risk assessment.

ACKNOWLEDGMENTS

The authors are grateful to Petra S. Kern and Joanna S. Jaworska (Procter & Gamble) for supporting BN ITS-3 accessibility and discussing the data generated by the BN ITS-3.

Conflict of interest

The authors declare that there is no conflict of interest.

REFERENCES
 
© 2020 The Japanese Society of Toxicology