Comparative assessment of 24-hr primary skin irritation test and human patch test data with in vitro skin irritation tests according to OECD Test Guideline 439 (for quasi-drugs in Japan)

Mariko Sugiyama; Masaharu Akita; Nathalie Alépée; Miyuki Fujishiro; Shigenobu Hagino; Yuki Handa; Hidefumi Ikeda; Noriyasu Imai; Setsuko Jitsukawa; Masakazu Katoh; Koji Kurihara; Daiki Kyotani; Shigeyuki Nomura; Yuko Okamoto; Hidenobu Okumura; Takashi Omori; Kenji Sugibayashi; Hiroaki Todo; Akemi Toyoda; Yasuo Ohno

doi:10.2131/jts.43.751

Abstract

The Organisation for Economic Co-operation and Development (OECD) Test Guideline (TG) 439 is an in vitro test method using reconstructed human epidermis (RhE), which was developed for hazard identification of irritating chemicals in accordance with a primary skin irritation test using rabbits with 4-hr exposure. A regulation for quasi-drugs in Japan requires data from primary skin irritation tests using rabbits with 24-hr exposure, and these data are used as evidence for conducting 24-hr closed patch tests in humans. In this study, primary skin irritation test data using rabbits with 24-hr exposure and 24-hr occlusive human patch test data for the same chemicals were compared with the results obtained with the four test methods adopted in OECD TG 439. The in vitro test methods showed a positive predictive value of 72.7-85.7% for the results of the 24-hr primary rabbit skin irritation test, whereas the positive predictive value of the rabbit test against humans was only 57.1%. The predictive performance of the in vitro test methods was higher for the human patch test data, with a sensitivity of 60 to 80%. Three surfactants gave false negatives in some of the RhE methods evaluated against the human patch test, but in each case they were correctly classified as positive when evaluated at double concentration. Therefore, the approach of setting the margin to 2 was effective in eliminating false negatives. This suggests that in vitro test methods are useful for assessing skin irritation potential without animal testing for the application of quasi-drugs in Japan.

INTRODUCTION

Skin irritation is an essential toxicological endpoint in assessing a cosmetic’s safety. The rabbit-based Draize method (Draize et al., 1944) has been developed and used for assessment of skin irritation, but several test conditions have been established for different purposes. Test methods for 4-hr exposure in rabbits, which are defined in OECD TG 404 (OECD, 2015a) are used when evaluating the hazards of chemicals and classifying the toxic effects according to United Nations (UN) Globally Harmonized System of Classification and Labelling of Chemicals (GHS) categories. When evaluating the safety of intentional applications to the skin on a daily basis, such as quasi-drugs, a 24-hr exposure study is required in Japan. In the safety data for quasi-drugs, primary skin irritation after 24-hr exposure in rabbits has been evaluated, and a 24-hr patch test with humans has been established after weak skin irritation was confirmed in animal studies (Japan Cosmetic Industry Association, 2015) (Guidebook for Quasi-drugs Application, 2017).

With regard to cosmetics, animal experiments were banned in the European Union in 2013 due to the growing movement for animal welfare and were replaced by evaluations using alternative methods. As an alternative to the testing of skin irritation on animals, OECD TG 439, which uses reconstructed human epidermis (RhE) test methods, was approved in 2010 and subsequently revised in 2013 and 2015 with additional epidermis test methods (OECD, 2015b). This approach has been developed as a stand-alone alternative to OECD TG 404 for the assessment of chemical hazards and can be used to predict irritation of UN GHS category 2 or above in a 4-hr rabbit test.

Since it has been reported that skin reactions increase with prolonged exposure (Cruzan et al., 1986; Gilman et al., 1978), it was speculated that the test methods adopted in OECD TG 439 might not be sufficiently sensitive to predict the results of a primary skin irritation study with 24-hr exposure using rabbits, as required in the application for quasi-drugs.

In Japan, the Review Committee on the Proposed Application of Data on the Safety of Quasi-drugs for Manufacturing Approval Applications was organized between 2007 and 2009, and the use of alternatives in the application for marketing approval for quasi-drugs was considered. However, it was concluded that the alternatives recommended at that time were not sufficiently predictive to serve as a primary skin irritation assessment in quasi-drug application dossiers (MHLW Grants System, 2011), although the use of alternatives is preferable from the standpoint of animal welfare. Judgment on whether to use animal experiments or alternatives should be left to the applicant. Even if a test chemical has been identified as not classified using alternative methods, human patch testing is required for confirmation. Thus, some ethical considerations should be addressed.

As a result, OECD TG 439 has not been applied as a standalone for the assessment of skin irritation in the application for attaining marketing approval for quasi-drugs in Japan.

On the other hand, there are few data that sufficiently examine how well the test methods adopted in OECD TG 439 predict the results of the primary skin irritation test and the human patch test with 24-hr exposure. Imai et al. evaluated the correlation between an in vitro OECD TG 439 test method, EpiDerm^TM SIT (EPI-200), and primary skin irritation tests of 24-hr exposure in rabbits using the same test chemicals as in this report (Imai et al., 2018). It is important to note that only one adopted RhE test method was evaluated and no correlations were investigated with human patch tests. Therefore, the irritation potential of some ingredients was evaluated according to the four OECD TG 439 test methods and compared with the results of a primary skin irritation test and a 24-hr human occlusive patch test. In this study, using the in vitro methods listed in OECD TG 439, the correlation between in vivo (24-hr exposure, rabbits, humans) and in vitro results was evaluated, and the usefulness of the in vitro test as a prediction method for the 24-hr human patch test was examined.

MATERIALS AND METHODS

Chemicals

The test chemicals used for assessment are ingredients used in cosmetics. Table 1 shows the International Nomenclature of Cosmetic Ingredients (INCI) names of the test chemicals and their characterization, i.e., the lipophilicity category of each chemical, the CAS Registry Number, and the Log Kow value obtained as a predicted value using EPI Suite^TM (ver.4.1), as well as the supplier’s details for each chemical tested using the four RhE test methods. Test chemicals with Log Kow > 3.5 are judged as oil-soluble chemicals. Characteristics of the 40 test chemicals are shown in Table 2, summarizing their physical categories and their corresponding in vivo irritancy potential. The test chemicals included 12 surfactants and 23 oils, representing respectively 30% and 58% of the test set. The remaining 5 test chemicals were acids, polymers, and powders. Irritation categories were also widely distributed, from none to severe.

Table 1. Test Chemicals.

Table 2. Categorization by physical properties and irritancy of test chemicals.

The existing in vivo dataset

Preparation of dataset for primary skin irritation test

The primary skin irritation test dataset is shown in Table 3. In this study, the results of previously existing primary skin irritation tests with 24-hr rabbit exposure were used in the interest of animal welfare. Data that had been produced and stored in companies were collected upon request of the Japan Cosmetic Industry Association (JCIA) and used as the in vivo dataset. No new in vivo data were generated for this study; only historical data were used. Because the data originated from individual companies, information such as manufacturer names and trade names was treated as confidential, and the reports are not available upon request. However, the method is briefly described below (Draize, 1959).

Table 3. In vitro and in vivo results.

In the primary skin irritation test, the test chemical was applied to rabbits (n = 3-6) for 24 hr (0.5 mL or 0.5 g/6 cm²) and then removed, and erythema (score 0-4) and edema (score 0-4) were macroscopically scored at 24 and 72 hr after application. All data were obtained by applying test samples to intact skin, summing the scores for each animal, and dividing the total by the number of animals and the number of readings to obtain a mean score (Primary Irritation Index: P.I.I.). The results were classified according to the P.I.I. values as an irritant (I) if the value exceeded 2 and as a non-irritant (NI) if it was 2 or less (≤ 2). The P.I.I. and irritation potential (class) for each test chemical are presented in Table 3. In the primary skin irritation test, 16 irritant chemicals and 24 non-irritant chemicals were used.

The test results were also classified by irritation category according to the P.I.I. value: none (0), mild (0 < P.I.I. ≤ 2), moderate (2 < P.I.I. ≤ 5), and severe (5 < P.I.I. ≤ 8). Datasets were characterized by substance category (surfactant, acid, polymer, oil) and irritation intensity (Table 2).
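The P.I.I. calculation and the classification rules described above can be sketched in Python as follows. This is an illustrative sketch rather than the procedure used by the data providers: the function names and the example scores are hypothetical, and each animal is assumed to have one erythema score and one edema score per reading (24 and 72 hr).

```python
def primary_irritation_index(animals):
    """Mean Draize score (P.I.I.).

    `animals` is a list with one entry per rabbit; each entry is a list of
    (erythema, edema) score pairs, one pair per reading (e.g. 24 hr and 72 hr).
    """
    n_animals = len(animals)
    n_readings = len(animals[0])
    total = sum(erythema + edema
                for readings in animals
                for erythema, edema in readings)
    return total / (n_animals * n_readings)


def classify_pii(pii):
    """Binary class used in this study: irritant (I) if P.I.I. > 2, else NI."""
    return "I" if pii > 2 else "NI"


def pii_category(pii):
    """Irritation category: none, mild, moderate, or severe."""
    if pii == 0:
        return "none"
    if pii <= 2:
        return "mild"
    if pii <= 5:
        return "moderate"
    return "severe"


# Hypothetical scores for three rabbits, two readings each (not study data).
example_scores = [
    [(1, 1), (1, 0)],
    [(2, 1), (1, 1)],
    [(1, 0), (0, 0)],
]
example_pii = primary_irritation_index(example_scores)
print(example_pii, classify_pii(example_pii), pii_category(example_pii))
```

With the hypothetical scores above, the summed score is 9 over 3 animals and 2 readings, giving a P.I.I. of 1.5, which falls in the non-irritant class and the mild category.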

The existing dataset for human 24-hr closed patch testing

In this study, the results from human patch tests were obtained and compiled by the JCIA for the same INCI chemicals as those from the primary skin irritation test. No new in vivo data were generated for this study; only historical data were used. For 31 test chemicals, 24-hr human occlusive patch test data were collected from companies affiliated with the JCIA. Human studies were conducted in accordance with the Declaration of Helsinki (World Medical Association Declaration of Helsinki-Ethical Principles for Medical Research Involving Human Subjects) after review by each company’s ethics council. The number of datasets was 31, compared with 40 for the primary skin irritation test, because human studies were not conducted when primary skin irritation tests revealed more than a moderate irritation effect. In addition, for similar reasons, test chemicals were occasionally found to have been tested at lower concentrations than in the primary skin irritation test with rabbits. As with the in vivo primary skin irritation test dataset, manufacturer names and trade names were treated as confidential and were not disclosed.

In human patch tests, the test sample was applied under occlusion for 24 hr to 31-55 healthy adult subjects, and reactions were judged by Japanese criteria (6-point evaluation; -, ±, +, ++, +++, ++++) 1-3 hr after adhesive removal (Kawamura et al., 1970). Adhesive plasters used were either Finn Chambers (φ = 8 mm) or Torii-Patch test adhesive plasters (small). Irritation indices were determined by dividing the number of positive subjects (+ or higher score) by the number of test subjects and by the irritation index based on test subjects’ readings in accordance with the method of Sugai (Sugai, 1995) (Table 3). Sugai’s Irritation Indices (S.I.I.) were calculated by giving each of the judgments a rating score (-: 0, ±: 0.5, +: 1, ++: 2, +++: 3, ++++: 4), summing the scores of the subjects, dividing the score by the number of subjects, and multiplying by 100. Sugai classified irritant indices of 5 or less as “safe products”, 5-15 as “acceptable”, 15-30 as “requiring improvement”, and more than 30 as “risk products”. In the human patch test, there is no general criterion that indicates the presence or absence of irritation, so in this study, an S.I.I. of 30 or more is regarded as an irritant (I), and an S.I.I. of less than 30 is regarded as a non-irritant (NI). These criteria were decided by referring to the opinion of a dermatologist on the S.I.I., in order not to apply any irritant to subjects, from an ethical viewpoint.
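The S.I.I. calculation and the cut-off of 30 used in this study can be illustrated with a short Python sketch. It is not the authors' code: the function names and the example panel of 40 subjects are hypothetical, and the rating scores follow the 6-point Japanese criteria (-, ±, +, ++, +++, ++++ scored as 0, 0.5, 1, 2, 3, 4).

```python
# Rating score assigned to each judgment of the 6-point Japanese criteria.
RATING = {"-": 0.0, "±": 0.5, "+": 1.0, "++": 2.0, "+++": 3.0, "++++": 4.0}
POSITIVE = {"+", "++", "+++", "++++"}  # "+ or higher" counts as positive


def sugai_irritation_index(judgments):
    """S.I.I. = (sum of rating scores / number of subjects) x 100."""
    total = sum(RATING[j] for j in judgments)
    return 100.0 * total / len(judgments)


def positive_rate(judgments):
    """Percentage of subjects judged + or higher."""
    n_positive = sum(1 for j in judgments if j in POSITIVE)
    return 100.0 * n_positive / len(judgments)


def classify_sii(sii):
    """Criterion of this study: irritant (I) if S.I.I. >= 30, else NI."""
    return "I" if sii >= 30 else "NI"


# Hypothetical panel of 40 subjects (not study data).
example_panel = ["-"] * 30 + ["±"] * 6 + ["+"] * 4
example_sii = sugai_irritation_index(example_panel)
print(example_sii, positive_rate(example_panel), classify_sii(example_sii))
```

For this hypothetical panel the summed score is 7 over 40 subjects, so the S.I.I. is 17.5 and the sample would be regarded as a non-irritant under the criterion of this study.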

In vitro evaluation by OECD TG 439 test methods

In vitro skin irritation test methods assess skin irritation potential by measuring the viability of the epidermal cells of an RhE model upon application of the test chemical, which diffuses through the stratum corneum and penetrates into the epidermal layer. The test chemical is identified as an irritant (UN GHS category 2) if the viability of epidermal cells measured by the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay is 50% or less, or as a non-irritant (UN GHS No category) if it is greater than 50%. The current study was carried out on commercially available RhE models (EpiSkin^TM SM, SkinEthic^TM RHE, EpiDerm EPI-200, and LabCyte EPI-MODEL24), each with its own defined protocol; the protocols differ, for example, in the volume and the duration of application of test chemicals (EpiSkin^TM, 2018; EpiDerm^TM, 2009; SkinEthic^TM, 2018; LabCyte, 2011).
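The 50% viability rule can be written as a small Python sketch. The function name and the example values are hypothetical, and averaging the viabilities of replicate tissues before applying the cut-off is an assumption made here for illustration; the individual test method protocols define the exact procedure.

```python
from statistics import mean

VIABILITY_CUTOFF = 50.0  # percent viability relative to the negative control


def classify_rhe(replicate_viabilities):
    """Return (mean viability, UN GHS outcome) for one test chemical.

    Assumption: the mean of the replicate tissue viabilities (in %) is
    compared with the cut-off. Viability <= 50% -> Category 2 (irritant);
    viability > 50% -> No Category (non-irritant).
    """
    mean_viability = mean(replicate_viabilities)
    if mean_viability <= VIABILITY_CUTOFF:
        return mean_viability, "Category 2"
    return mean_viability, "No Category"


# Hypothetical replicate viabilities (not study data).
print(classify_rhe([48.0, 50.0, 52.0]))
print(classify_rhe([80.0, 90.0]))
```

Note that a mean viability of exactly 50% falls on the irritant side of the rule, because the criterion is "50% or less".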

Regarding in vitro data, the providers/developers of the test methods performed the testing according to OECD TG 439 protocols. Reagents were purchased and tests were conducted by L’Oréal Research & Innovation for EpiSkin^TM SIT and SkinEthic™ RHE SIT (hereinafter abbreviated as SkinEthic™SIT), Kurabo Industries Ltd. for EpiDerm^TM SIT (EPI-200) (hereinafter abbreviated as EpiDerm^TMSIT), and Japan Tissue Engineering Co., Ltd. for LabCyte EPI-MODEL24 SIT (hereinafter abbreviated as LabCyte SIT). The applied concentration and vehicle for test samples were based on the available data (primary skin irritation test and human patch test).

RESULTS

Correlation between in vivo studies

Data from the primary skin irritation test and human patch test are shown in Table 3. Of the test chemicals for which primary skin irritation test data (40 chemicals) and human patch test data (31 chemicals) were available, 22 chemicals were found to have been tested at the same concentration in both the primary skin irritation test and the patch test, and 9 were tested at different concentrations. For four of the surfactants [1% Cetylpyridinium Chloride (#31), 1% Sodium Lauryl Sulfate (#38), 1% Lauryl Betaine (#39), and 1% Lautrimonium Chloride (#40)], human skin reactions were extrapolated to be positive at the same concentration as in the primary skin irritation test because the results of the human patch tests were positive at lower concentrations than in the primary skin irritation test on the rabbits. Predicted results for these chemicals were added to Table 3 in brackets. This resulted in 26 test samples whose results could be compared between the primary skin irritation test and the human patch test. Tables 4-1 through 4-4 show the concordance of the results of the primary skin irritation test in rabbits, the human patch test, and the in vitro assessment using the RhE models. Assessments were organized in a 2 × 2 contingency table, and then sensitivity, specificity and accuracy were calculated according to the ECVAM SIVS study (Balls et al., 1990; Spielmann et al., 2007). Table 4-1 shows the correlation of the 26 test samples compared between rabbits and humans. The sensitivity was 100%, indicating that all test samples that were positive in humans were also positive in rabbits. However, the positive predictive value (PPV) was 57.1%, indicating that the primary skin irritation test markedly over-predicts irritation relative to the human patch test.
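The performance measures derived from a 2 × 2 contingency table can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code. The function name is hypothetical, and the counts (4 true positives, 3 false positives, 0 false negatives, 19 true negatives) are an assumption inferred from the 26 test samples, the 100% sensitivity, and the 57.1% PPV reported for the rabbit test against the human patch test.

```python
def contingency_metrics(tp, fp, fn, tn):
    """Performance measures (in %) from a 2 x 2 contingency table.

    tp/fn: reference-positive samples predicted positive/negative.
    fp/tn: reference-negative samples predicted positive/negative.
    """
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / (tp + fp + fn + tn),
        "ppv": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }


# Rabbit test (prediction) versus human patch test (reference).
# Counts are inferred from the reported figures, not quoted from Table 4-1.
rabbit_vs_human = contingency_metrics(tp=4, fp=3, fn=0, tn=19)
for name, value in rabbit_vs_human.items():
    print(f"{name}: {value:.1f}%")
```

Under these assumed counts, the sketch reproduces the reported sensitivity of 100% and PPV of 57.1%; the specificity and accuracy it prints are derived from the same counts rather than quoted from the text.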

Table 4-1. Contingency table for in vivo results.

The scatter plot of P.I.I. in the primary skin irritation test against S.I.I. in the human patch test, shown in Fig. 1, demonstrates the correlation of skin response between rabbits and humans. Oil-soluble chemicals and other chemicals were plotted separately. Fig. 1 shows that for oil-soluble chemicals there is no correlation between rabbits and humans; these included chemicals such as Oleyl Alcohol (OA in Fig. 1) that caused no reaction in human skin even though the P.I.I. was 4.7. Of the test chemicals evaluated as irritants in rabbits, three [5% Steareth-4 (#30), 100% Oleic Acid (#33), and 100% Oleyl Alcohol (#34)] were evaluated as non-irritants in human patch tests, and all three were oil-soluble chemicals. For chemicals other than oil-soluble chemicals, four surfactants that were evaluated as irritants in rabbits were correctly predicted for the human patch test (Fig. 1, True Positive).

Fig. 1

Correlation of primary skin irritation in rabbits (P.I.I.) with human patch test (S.I.I.). Primary skin irritation test in rabbits; Positive (Irritant): P.I.I. (Primary Irritation Index) > 2.0, Negative (Non-Irritant): P.I.I. ≤ 2.0. Human patch test; Positive (Irritant): S.I.I. (Sugai’s Irritation Index) ≥ 30, Negative (Non-Irritant): S.I.I. < 30.

Correlation of OECD TG 439 adopted test methods with in vivo tests

The in vitro test results (cell viability and UN GHS categorization) using four RhE models (EpiSkin^TM SM, SkinEthic^TM RHE, EpiDerm EPI-200, and LabCyte EPI-MODEL24) are shown in the Table 3.

Table 4-2 summarizes the existing results of the in vivo (primary skin irritation in rabbits) and the obtained results in vitro as well as the sensitivity, specificity, accuracy, PPV and the negative predictive value (NPV). In the EpiDerm™ SIT, 100% Lauryl Alcohol (#29) assessment was rated as “Classified” and “Not Classified” in two runs, so the result was considered as Not Classified in the Tables and Figures (most conservative approach).

The correlations between in vivo (rabbits) and in vitro results were weak using all RhE test methods, with a sensitivity ranging from 40.0 to 62.5% (EpiSkin™ SIT: 62.5%, SkinEthic™ SIT: 62.5%, EpiDerm™ SIT: 50.0%, and LabCyte SIT: 40.0%), a specificity ranging from 87.0 to 95.5% (EpiSkin™ SIT: 91.3%, SkinEthic™ SIT: 87.0%, EpiDerm™ SIT: 87.5%, and LabCyte SIT: 95.5%), an accuracy ranging from 72.5 to 79.5% (EpiSkin™ SIT: 79.5%, SkinEthic™ SIT: 76.9%, EpiDerm™ SIT: 72.5%, and LabCyte SIT: 73.0%), and a PPV ranging from 72.7 to 85.7% (EpiSkin™ SIT: 83.3%, SkinEthic™ SIT: 76.9%, EpiDerm™ SIT: 72.7%, and LabCyte SIT: 85.7%).

The scatter plot shown in Fig. 2 demonstrates the relationship between primary skin irritation indices (P.I.I.) and cellular viability. This Figure shows that when the P.I.I. in vivo test is greater than or equal to 5.3 (Myristyl Betaine: MB in Fig. 2), the viability in vitro of all chemicals is lower than 50%. On the other hand, it is difficult to adjust the cut-off value (cell viability %) of the in vitro test to identify a Draize score greater than or equal to a moderate irritant.

Fig. 2

Correlation of cell viability % data of the 4 adopted OECD TG 439 test methods with primary skin irritation test in rabbits (P.I.I.). EpiSkin: EpiSkin^TM SIT, SkinEthic: SkinEthic™ RHE SIT, EpiDerm: EpiDerm^TM SIT (EPI-200), LabCyte: LabCyte EPI-MODEL24 SIT. In vitro method using OECD TG439; Positive (Classified): Viability ≤ 50%, Negative (Not Classified): Viability > 50%. Primary skin irritation test in rabbits; Positive (Irritant): P.I.I. (Primary Irritation Index) > 2.0, Negative (Non-Irritant): P.I.I. ≤ 2.0.

Table 4-2. Contingency tables for RhE models.

In at least one of the four RhE test methods, false negatives against in vivo rabbit results were observed with 10 oil-soluble chemicals: 7 oil formulations [100% Rosmarinus Officinalis (Rosemary) Leaf Oil (#26; Rosemary Oil), 100% Isopropyl Palmitate (#28), 100% Lauryl Alcohol (#29), 100% Dicaprylyl Ether (#32), 100% Oleic Acid (#33), 100% Oleyl Alcohol (#34), and 100% Citrus Aurantium Dulcis (Orange) Oil (#37; Orange Oil)], 1 anionic surfactant [10% TEA-Laureth Sulfate (#25)] and 2 nonionic surfactants [5% Oleth-2 (#27), and 5% Steareth-4 (#30)]. False-negative results of in vitro data versus the in vivo primary test were presumed to be chemicals that caused mild irritation at 4 hr and showed increased irritation when applied for 24 hr. Given that the 4-hr primary skin irritation test is already known to over-predict, extending the application to 24 hr would also affect the relevance of these positive results obtained in rabbits to the observations in humans.

The true positive chemicals predicted to be irritating in both in vivo and in vitro assessments are those expected to exceed 2.3 (UN GHS category 2) in the Draize score of rabbits at 4 hr of application. There were 11 chemicals which were determined as “Classified” using any of these RhE test methods: 7 surfactants [10% TEA-Laureth Sulfate (#25), 10% Cetylpyridinium Chloride (#31), 10% Myristyl Betaine (#35), 100% Steartrimonium Chloride (#36), 5% Sodium Lauryl Sulfate (#38), 100% Lauryl Betaine (#39), and 10% Lautrimonium Chloride (#40)], and 4 oils [100% Rosemary Oil (#26), 100% Lauryl Alcohol (#29), 100% Dicaprylyl Ether (#32), and 100% Orange Oil (#37)].

On the other hand, three false positive chemicals were detected in this study, which are 10% Lactic Acid (#7), 20% Lauric Acid (#19), and 10% Laureth-3 (#23). Of these, 10% Lactic Acid was considered to be influenced by low pH because the pH was less than 2. Chemicals with extreme pH are likely to damage the integrity of the cells upon contact with tissues, such as skin, and thus may be classified as skin corrosives (Cat. 1). For chemicals with pH ≤ 2.0 or pH ≥ 11.5, skin corrosion could be expected. However, it may also be important to take into consideration the acid/alkaline reserve (a measure of buffering capacity of a chemical) on skin (Young et al., 1988). The observation of only mild irritancy in rabbits may be the result of reduction by buffering ability on skin.

To assess the relevance of the in vitro test methods adopted in OECD TG 439, the concordance of their results with the results of the 24-hr human patch test was examined. Table 4-3 summarizes the concordance between in vivo (human patch test) and in vitro (OECD TG 439 test methods) results using sensitivity, specificity, accuracy, PPV and NPV. The numbers of test chemicals for which the results of the in vitro assessment and in vivo (human) results could be compared were 30, 30, 31, and 25 for the EpiSkin™ SIT, the SkinEthic™ SIT, the EpiDerm™ SIT, and the LabCyte SIT, respectively. In the LabCyte SIT, the results of 1% Lauryl Betaine (#39) were rated as classified and not classified in two runs; thus the result was evaluated as not classified in the tables and graphs.

The concordance between the in vitro and in vivo (human) results showed a sensitivity of 60.0-80.0% (EpiSkin™ SIT: 80.0%, SkinEthic™ SIT: 80.0%, EpiDerm™ SIT: 80.0%, LabCyte SIT: 60.0%), a specificity of 92.3-100% (EpiSkin™ SIT: 96.0%, SkinEthic™ SIT: 96.0%, EpiDerm™ SIT: 92.3%, LabCyte SIT: 100%), and an accuracy of 90.3-93.3% (EpiSkin™ SIT: 93.3%, SkinEthic™ SIT: 93.3%, EpiDerm™ SIT: 90.3%, LabCyte SIT: 92.0%), indicating that the correlation between the human patch test and in vitro results was higher than that between the primary skin irritation test in rabbits and in vitro results. Moreover, a PPV of 66.7-100% supported the performance of the in vitro methods in predicting the results of the 24-hr human patch test.

Table 4-3. Contingency tables for RhE models.

Table 4-4. Contingency tables for RhE models (without oil soluble chemicals).

The scatter plot in Fig. 3 demonstrates the relationship between the human patch test (S.I.I.) and cellular viability. False negatives were 0.1% Benzalkonium Chloride (#2) (EpiSkin™ SIT, SkinEthic™ SIT), 1.0% Lauryl Betaine (#39) (EpiDerm™ SIT, LabCyte SIT), and 1.0% Sodium Lauryl Sulfate (#38) (LabCyte SIT). In all four epidermal models, the false negative results were observed with surfactants.

Fig. 3

Correlation of cell viability % data of the 4 adopted OECD TG 439 test methods with human patch test (S.I.I.). EpiSkin: EpiSkin^TM SIT, SkinEthic: SkinEthic™ RHE SIT, EpiDerm: EpiDerm^TM SIT (EPI-200), LabCyte: LabCyte EPI-MODEL24 SIT. In vitro method using OECD TG439; Positive (Classified): Viability ≤ 50%, Negative (Not Classified): Viability > 50%. Human patch test; Positive (Irritant): S.I.I. (Sugai’s Irritation Index) ≥ 30, Negative (Non-Irritant): S.I.I. < 30.

Therefore, an examination was conducted to determine the lowest concentration that would produce positive results in the RhE test methods by increasing the concentration of the test chemicals applied onto the RhE models. As a result, 0.2% Benzalkonium Chloride was determined as classified in EpiSkin™ SIT and SkinEthic™ SIT. Similarly, 2.0% Sodium Lauryl Sulfate was classified with LabCyte SIT and 2.0% Lauryl Betaine with EpiDerm™ SIT and LabCyte SIT. Thus, every false-negative test chemical was classified in the corresponding RhE models at a 2-fold higher concentration (see Table 5).

Table 5. Lowest concentration judged as Classified.
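The margin-of-2 approach can be expressed as a simple check in Python. This is an illustrative sketch with hypothetical function and variable names; the concentrations are those given in the text for the three surfactants that were false negatives against the human patch test.

```python
SAFETY_MARGIN = 2.0  # in vitro test concentration = margin x human patch concentration

# Concentrations (%) at which the human patch test was positive.
human_patch_conc = {
    "Benzalkonium Chloride": 0.1,
    "Sodium Lauryl Sulfate": 1.0,
    "Lauryl Betaine": 1.0,
}

# Lowest concentrations (%) judged as Classified in the RhE methods
# that had given false negatives.
lowest_classified_conc = {
    "Benzalkonium Chloride": 0.2,
    "Sodium Lauryl Sulfate": 2.0,
    "Lauryl Betaine": 2.0,
}


def in_vitro_test_conc(human_conc, margin=SAFETY_MARGIN):
    """Concentration to test in vitro when a margin is applied."""
    return human_conc * margin


def detected_within_margin(human_conc, lowest_classified, margin=SAFETY_MARGIN):
    """True if testing at margin x human concentration would give Classified."""
    tolerance = 1e-9  # guards against floating-point rounding
    return lowest_classified <= in_vitro_test_conc(human_conc, margin) + tolerance


margin_check = {
    name: detected_within_margin(conc, lowest_classified_conc[name])
    for name, conc in human_patch_conc.items()
}
print(margin_check)
```

For all three surfactants the lowest classified concentration equals twice the human patch test concentration, so each is detected when the in vitro test is run with a margin of 2.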

DISCUSSION

Preparation of primary skin irritation test dataset

Although the test methods adopted in OECD TG 439 were not developed for evaluating skin irritation in Japanese quasi-drugs approval system, the usefulness of the in vitro test methods was examined for predicting primary irritation in rabbits and patch tests in humans. To begin with, primary skin irritation test results based on 24-hr exposure in rabbits and 24-hr occlusive human patch test results for quasi-drugs in Japan were prepared as an in vivo dataset. Chemicals included in the in vivo dataset were not re-tested with animals or humans, and therefore existing historical data were used. The collected in vivo dataset was considered appropriate as the in vivo dataset for evaluating the usefulness of the alternative methods, as chemicals in the dataset included an approximately equal number of irritants and non-irritants, along with a wide range of chemicals such as surfactants, oils, polymers, acid, and powders.

As human patch tests are usually performed to confirm negative results, it is difficult to obtain data on quasi-drugs that are strongly irritating in animal studies. The lack of positive data from human patch tests is a challenge to developing an alternative method.

Correlations between in vivo data (rabbits and humans)

In the application for quasi-drugs, primary skin irritation is explained by data from the primary skin irritation test using rabbits and the 24-hr patch test using humans. There were no false negative chemicals in the correlation between the results of the rabbit and human studies (Table 4-1, Fig. 1), confirming the usefulness of the animal-based studies for the safe conduct of human patch tests. This result was consistent with previous studies in which skin reactivity was significantly higher in animals such as rabbits than in humans after 24 hr of exposure (Kästner, 1977; Motoyoshi et al., 1979; Tsuchiya et al., 1980). In terms of predictive performance, the sensitivity was 100% whereas the PPV was 57.1%, suggesting that the primary skin irritation test in rabbits over-predicts irritation relative to the human patch test.

In addition, the skin reactivity in rabbits with 4-hr exposure was higher than skin reactivity in humans, and it has been reported that the irritation of chemicals is sometimes overestimated in animals (Basketter et al., 2004; Jírová et al., 2010). Overall, the usefulness of the animal-based studies for identifying a potential effect in humans should be counterbalanced with the low PPV observed with both 4-hr and 24-hr exposure. A chemical that is negative in the primary skin irritation test using rabbits has a high probability of also being negative in humans, whereas the relevance of a positive effect observed in the primary skin irritation test should be further confirmed.

Differences in evaluation results by reconstructed human epidermal models

Epidermal test methods gave different results for 9 chemicals [0.1% Benzalkonium Chloride (#2), 20 and 100% Lauric Acid (#19), 10% TEA-Laureth Sulfate (#25), 100% Rosemary Oil (#26), 100% Lauryl Alcohol (#29), 100% Dicaprylyl Ether (#32), 100% Orange Oil (#37), 1% Sodium Lauryl Sulfate (#38), and 1% Lauryl Betaine (#39)]. Four of these were surfactants (#2, #25, #38, and #39), and five were lipophilic chemicals (#19, #26, #37, #29, and #32) classified as higher fatty acids, essential oils, higher alcohols, or ethers. The difference of the evaluation results by the epidermal models used here could not be explained by the physical properties of the chemicals. In addition, in this study, the test chemicals evaluated in the 4 test methods were purchased from various suppliers and therefore were not necessarily obtained from the same chemical providers for the in vivo studies. Therefore, differences in reagent sources could affect the test results.

Among the reconstructed epidermal models listed in OECD TG 439, Kano et al. (2010) examined the skin permeability coefficients of LabCyte EPI-MODEL and EpiSkinS (EpiSkin) and reported that the permeability of chemicals differed from model to model. There were no differences in cumulative amount of chemicals up to 1 hr after application permeated through RhE due to models (EpiDerm, LabCyte EPI-MODEL, EpiSkin) (Todo, 2016). In the protocol on in vitro methods in TG 439, the application period of the test chemical is as short as 15-60 min. The application times were adjusted model-by-model to detect chemicals which are equal to or more irritant than those classified in UN GHS Category 2. Therefore, it is considered that the differences in chemical permeability by the epidermis model did not affect the results significantly. As a result, the predictive ability (sensitivity, specificity, accuracy, PPV, and NPV) of in vivo (rabbits, humans) results did not differ clearly between reconstructed human models.

Correlations between in vitro test methods (OECD TG 439) with in vivo primary skin test (rabbits) results

Imai et al. (in press) used the same test chemicals as in this study, reported the results of the EpiDerm™ SIT and their concordance with the in vivo (rabbit) results, and speculated that oil-soluble chemicals that do not penetrate the epidermis may be less toxic because of wetting of the surface of the epidermal model. On the other hand, surfactants, which quickly penetrate the epidermis model, were considered to be little influenced by wetting of the surface of the EpiDerm™ model. Of the 9 surfactants classified as irritants in rabbits in this study, 7 were judged to be irritants in vitro; therefore, the irritancy of the surfactants was easily detectable in the test method. We excluded oil-soluble chemicals, identified on the basis of Log Kow above 3.5 and chemical classification, and examined the in vivo-in vitro correlation for the remaining chemicals (Table 4-4). The number of test chemicals remaining after exclusion of oil-soluble chemicals was 11 for EpiDerm™ SIT and LabCyte SIT and 12 for EpiSkin™ SIT and SkinEthic™ SIT. The correspondence between in vivo (rabbits) and in vitro results was 80.0-100% for sensitivity, 83.3% for specificity and 81.8-91.7% for accuracy; thus, sensitivity and accuracy increased. In addition, the correlation between the primary skin irritation index (P.I.I.) and cellular viability was investigated after the exclusion of oil-soluble chemicals (Fig. 4).

Fig. 4

Correlation of cell viability % data of the 4 adopted OECD TG439 test methods with primary irritation test in rabbits (P.I.I.) (test chemicals: Log Kow ≤ 3.5). EpiSkin: EpiSkin^TM SIT, SkinEthic: SkinEthic™ RHE SIT, EpiDerm: EpiDerm^TM SIT (EPI-200), LabCyte: LabCyte EPI-MODEL24 SIT. In vitro method using OECD TG439; Positive (Classified): Viability ≤ 50%, Negative (Not Classified): Viability > 50%. Primary skin irritation test in rabbits; Positive (Irritant): P.I.I. (Primary Irritation Index) > 2.0, Negative (Non-Irritant): P.I.I. ≤ 2.0.

Although there was one chemical that was judged as false negative [TEA-Laureth Sulfate (#25)], when viability was more than 50%, primary skin irritation index was mostly equal to or less than 2.0.

Therefore, it was suggested that, for chemicals that are not oil soluble, a non-irritant (Not Classified) result in the in vitro methods of TG 439 could correctly predict non-irritancy in rabbits with 24-hr exposure.

Correlations between in vivo human and in vitro (TG 439) results

Chemicals classified as oil-soluble showed no irritation in human patch tests, but 100% Lauric acid (#19) and 10% Rosemary Oil (#26) were judged to be irritating by in vitro methods (false positives).

Among the surfactants, five were irritants in human patch tests; of these, 0.1% Benzalkonium Chloride (#2), 1% Sodium Lauryl Sulfate (#38), and 1% Lauryl Betaine (#39) were false negatives in the RhE models. Of these chemicals, 0.1% Benzalkonium Chloride (#2) has been reported to be less irritating when applied to humans for 4 hr (Kanto et al., 2013). As rabbits are more sensitive than humans, it was considered that a 4-hr patch test could be performed safely if the chemical was judged to be Not Classified in a test method adopted in OECD TG 439. Chemicals #2, #38 and #39 were presumed to be chemicals that elicit enhanced skin reactions when the exposure time is extended from 4 to 24 hr.

In conclusion, the correlation between the human patch test and the in vitro test methods was better than that of the primary skin irritation test with rabbits exposed for 24 hr. To avoid potential false negative outcomes in the case of surfactants, it was suggested that using the in vitro test methods listed in OECD TG 439 while setting the margin to 2 (testing at double the concentration) is a useful approach before conducting a 24-hr occlusive patch test safely in Japan for quasi-drugs. This study also revealed that the in vitro tests using the reconstructed human epidermis models adopted in OECD TG 439 were useful for evaluating human skin irritation.
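The margin-of-2 approach can be illustrated with a short sketch. It assumes that "margin" means the in vitro test concentration is the intended patch-test concentration multiplied by 2, as with the surfactants re-evaluated at double concentration; the function names, the error raised for concentrations above 100%, and the decision rule are hypothetical choices made for this illustration, not a procedure specified in OECD TG 439.

```python
VIABILITY_CUTOFF = 50.0  # TG 439: viability > 50% is Negative (Not Classified)
MARGIN = 2.0             # assumed meaning: test in vitro at 2x the patch concentration


def in_vitro_test_concentration(patch_concentration_percent, margin=MARGIN):
    """Concentration (%) to test in vitro for a given patch-test concentration."""
    concentration = patch_concentration_percent * margin
    if concentration > 100.0:
        # Assumption: the margin cannot be applied above the neat substance.
        raise ValueError("margin cannot be applied: concentration exceeds 100%")
    return concentration


def supports_patch_test(viability_at_margin_percent):
    """True when the chemical is Not Classified at the margin concentration."""
    return viability_at_margin_percent > VIABILITY_CUTOFF


# A chemical intended for a patch test at 1% would be tested in vitro at 2%.
test_concentration = in_vitro_test_concentration(1.0)
```

Under this sketch, a chemical that is Not Classified at its patch-test concentration but Classified at the doubled concentration would not be taken forward, which is how the margin removes the false negatives described for the surfactants.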

Conflict of interest

The authors declare that there is no conflict of interest.

REFERENCES

Balls, M., Blaauboer, B., Brusick, D., Frazier, J., Lamb, D., Pemberton, M., Reinhardt, C., Roberfroid, M., Rosenkranz, H., Schmid, B., Spielmann, H., Stammati, A. and Walum, E. (1990): Report and recommendations of the CAAT/ERGATT workshop on the validation of toxicity test procedures. Altern. Lab. Anim., 18, 313-337.
Basketter, D.A., York, M., McFadden, J.P. and Robinson, M.K. (2004): Determination of skin irritation potential in the human 4-h patch test. Contact Dermat., 51, 1-4.
Cosmetic and quasi-drugs manufacturing and selling guidebook study group (2017): Guide to Marketing and Manufacturing of Quasi-drug and cosmetic Regulations in Japan 2017, pp.154-157, Yakuji Nippo, Ltd., Tokyo. (in Japanese)
Cruzan, G., Dalbey, W.E., D’Aleo, C.J. and Singer, E.J. (1986): A composite model for multiple assays of skin irritation. Toxicol. Ind. Health, 2, 309-320.
Draize, J.H., Woodard, G. and Calvery, H.O. (1944): Methods for the study of irritation and toxicity of substances applied topically to the skin and mucous membranes. J. Pharmacol. Exp. Ther., 82, 377-390.
Draize, J.H. (1959): Dermal toxicity. In: Appraisal of the safety of chemicals in foods, drugs and cosmetics, pp.46-48, The Association of Food and Drug Officials of the United States, Austin.
EpiDerm™ (2009): DB-ALM EpiDerm™ Skin Irritation Test Protocol. Available at: http://ecvam-dbalm.jrc.ec.europa.eu (accessed 11. July 2018).
EpiSkin™ (2018): DB-ALM EpiSkin™ Skin Irritation Test Protocol. Available at: http://ecvam-dbalm.jrc.ec.europa.eu (accessed 10. July 2018).
Gilman, M.R., Evans, R.A. and De Salva, S.J. (1978): The influence of concentration, exposure duration, and patch occlusivity upon rabbit primary dermal irritation indices. Drug Chem. Toxicol., 1, 391-400.
Hamada, T., Mizutani, H., Abe, T., Ogawa, T., Nagura, T. and Kuramoto, M. (1984): Rabbit skin closed patch test. Hifu, 26, 1084-1091.
Imai, N., Nomura, S., Goto, Y., Masunaga, T. and Nakade, M. (2018): Predicting in vitro skin irritation: use of the reconstructed human epidermis test method (EpiDerm™ SIT) in a 24-h exposure test for skin irritation. Altern. Animal Test. Exp., 23, 1-8.
Japan Cosmetic Industry Association. (2015): Guidance for the safety evaluation of cosmetics 2015, pp.111-112, 153-154, Yakuji Nippo, Ltd., Tokyo.
Jírová, D., Basketter, D., Liebsch, M., Bendová, H., Kejlová, K., Marriott, M. and Kandárová, H. (2010): Comparison of human skin irritation patch test data with in vitro skin irritation assays and animal data. Contact Dermat., 62, 109-116.
Kano, S., Todo, H., Sugie, K., Fujimoto, H., Nakada, K., Tokudome, Y., Hashimoto, F. and Sugibayashi, K. (2010): Utilization of Reconstructed Cultured Human Skin Models as an Alternative Skin for Permeation Studies of Chemical Compounds. Altern. Animal Test. Exp., 15, 61-70.
Kanto, H., Washizaki, K., Ito, M., Matsunaga, K., Akamatsu, H., Kawai, K., Katoh, N., Natsuaki, M., Yoshimura, I., Kojima, H., Okamoto, Y., Okuda, M., Kuwahara, H., Sugiyama, M., Kinoshita, S. and Mori, F. (2013): Optimal patch application time in the evaluation of skin irritation. J. Dermatol., 40, 363-369.
Kästner, W. (1977): Zur Speziesabhängigkeit der Hautverträglichkeit von Kosmetikgrundstoffen. J. Soc. Cosmet. Chem., 28, 741-754.
Kawamura, T., Sasagawa, S., Masuda, T., Honda, S., Kinoshita, M., Harada, S., Ishizaki, T., Nagai, R., Hirokawa, K., Anzai, T., Anekoji, K., Hidano, A., Kawano, T., Ikegami, I., Sato, S. and Aoyama, T. (1970): Basic Studies on the Standardization of Patch Test. Japanese Journal of Dermatology, 80, 301-314.
LabCyte. (2011): EPI-MODEL24 SIT SOP, Version 8.3, Skin Irritation Test Using the Reconstructed Human Model “LabCyte EPI-MODEL24”. Available at: http://jacvam.jp/ (accessed 8. July 2018).
MHLW Grants System. (2011): Final document of committee for study about how to prepare document for assessment of safety in applications for quasi-drugs approval. -Report of skin irritation- FY 2009 Scientific Research of Ministry of Health, Labor and Welfare, Establishment of a safety evaluation system using an alternative method of animal experiments and research on international cooperation, https://mhlw-grants.niph.go.jp/niph/search/NIDD00.do?resrchNum=200940003A (accessed 27. May 2018).
Motoyoshi, K., Toyoshima, Y., Sato, M. and Yoshimura, M. (1979): Comparative studies on the irritancy of oils and the synthetic perfumes to the skin of rabbit, rat, guinea pig, miniature swine and man. Cosmet. Toiletries, 94, 41-48.
OECD. (2015a): Acute Dermal Irritation/Corrosion. Test Guideline 404, Organisation for Economic Co-operation and Development, adopted 28 July 2015. Available at: https://www.oecd-ilibrary.org/environment/test-no-404-acute-dermal-irritation-corrosion_9789264242678-en (accessed 14. July 2018).
OECD. (2015b): In Vitro Skin Irritation: Reconstructed Human Epidermis Test Method. Test Guideline 439, Organisation for Economic Co-operation and Development, adopted 28 July 2015. Available at: https://www.oecd-ilibrary.org/environment/test-no-439-in-vitro-skin-irritation_9789264090958-en (accessed 14. July 2018).
SkinEthic™ (2018): DB-ALM SkinEthic™ RHE Skin Irritation Test Protocol no. 135. Available at: http://ecvam-dbalm.jrc.ec.europa.eu (accessed 8. July 2018).
Spielmann, H., Hoffmann, S., Liebsch, M., Botham, P., Fentem, J.H., Eskes, C., Roguet, R., Cotovio, J., Cole, T., Worth, A., Heylings, J., Jones, P., Robles, C., Kandárová, H., Gamer, A., Remmele, M., Curren, R., Raabe, H., Cockshott, A., Gerner, I. and Zuang, V. (2007): The ECVAM international validation study on in vitro tests for acute skin irritation: report on the validity of the EPISKIN and EpiDerm assays and on the Skin Integrity Function Test. Altern. Lab. Anim., 35, 559-601.
Sugai, T. (1995): Safety Evaluation of Cosmetic Products. J. Jap. Cosmet. Sci. Soc., 19 (extraordinary supplement), 49-56. (in Japanese)
Todo, H. (2016): Difference in chemical permeation properties through cultured human skin models. Bioindustry, 33, 3-9. (in Japanese)
Tsuchiya, S., Kondo, M., Okamoto, K. and Takase, Y. (1980): Skin irritancy of cosmetic products and their materials. Hifu, 22, 373-377.
Young, J.R., How, M.J., Walker, A.P. and Worth, W.M. (1988): Classification as corrosive or irritant to skin of preparations containing acidic or alkaline substances, without testing on animals. Toxicol. In Vitro, 2, 19-26.

