2025 Volume 48 Issue 1 Pages 1-5
As unexpected adverse events and successful drug repositioning have shown, drug effects are complex and include aspects not recognized by developers. How can we understand these unrecognized drug effects? Drug effects can be numerized by comprehensively capturing biological responses to drugs. For instance, the transcriptome data of cultured cells and toxicopathological images of mice treated with a compound represent the effects of the compound in vitro and in vivo, respectively. As a next step, we focused on pattern recognition, a data science framework whose methods, such as latent variable models, extract essential low-dimensional latent variables from high-dimensional observed data. Because latent variables are low-dimensional, they allow us to visualize drug effects in an easily recognizable form, such as a radar chart. This bird’s-eye view of drug effects enables us to compare them with existing knowledge, potentially articulating the effects of drugs as known knowns and known unknowns. We believe that the three-step strategy of numerization, visualization, and articulation will allow us to understand drug effects comprehensively, and we are currently verifying this approach. In this review, we introduce these studies and hope to share our interest in “pattern recognition of biological responses,” the pillar of our group.
As unexpected adverse events and successful drug repositioning have shown, the effects of a drug are complex and often include aspects not recognized even by the developer.1,2) The biological response to a drug can be interpreted as the drug’s effects. By preparing samples from drug-treated cells and individuals and generating data exhaustively and nonarbitrarily, as in omics analysis, the effects of a compound can be numerized. However, humans cannot easily recognize similarities and differences in such high-dimensional data. Therefore, we use pattern recognition to extract essential information from the data and convert it into an easily recognizable form, such as a radar chart.3) This visualized information, unlike the original high-dimensional data, is easy for humans to understand and compare with existing knowledge.4,5) In our research, we hypothesize that it is possible to articulate the effects of drugs, both known and unknown, using this three-step strategy (numerization, visualization, and articulation). We are working to verify this hypothesis and build a research foundation (Fig. 1).

This figure illustrates our strategy to articulate drug effects.
How can the properties of drugs be described with as little bias as possible? One approach is to exploit comprehensiveness, as in omics analysis. By obtaining all variables belonging to a layer, it is possible to numerize the object of interest for that layer without arbitrariness. In particular, since the effects of a drug can be interpreted as the set of responses of an organism to treatment, omics data obtained from treated organisms are frequently used. Another approach is to use data close to sensory systems, such as image data. By using data that do not involve human cognition, numerical information with low arbitrariness can be obtained. On the other hand, information of life-science significance in such data is often invariant to transformations such as rotation, and the acquired data are redundant. In this respect, the extraction of latent representations by neural networks is useful.
Here, we introduce the recent progress of our work on the above strategy, focusing on the numerization and pattern recognition of drug effects. The manuscript is composed of three parts. First, we confirmed the utility of the “comprehensive articulation strategy of compound action” in vitro and established a data analysis method. Next, we numerized immune cell trafficking as a biological response that is present in vivo but not in vitro. Additionally, we developed a neural network model for the numerical analysis of toxicopathological images in vivo.
A latent variable model, one of the pattern recognition methods, is a statistical method that extracts essential, low-dimensional information representing the data of interest from high-dimensional observed data. We developed a latent variable model suitable for analyzing drug effects using transcriptome data from drug-treated cultured cells.6) This method visualizes the biological responses to drugs reflected in the data and estimates the intensity of each response as a radar chart. Additionally, each biological response can be integrated with existing methods like pathway analysis, facilitating the interpretation of biological significance (Fig. 2).
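To make the idea of extracting low-dimensional latent variables concrete, the following is a minimal sketch that uses principal component analysis, one of the simplest linear latent variable models, on simulated data. It is not the model developed in ref. 6; the function name `extract_latent_scores`, the matrix sizes, and the simulated expression values are all illustrative assumptions.

```python
import numpy as np

def extract_latent_scores(expression, n_components):
    """Return (scores, loadings) from a samples x genes matrix via PCA.

    scores:   samples x n_components, intensity of each latent response
    loadings: n_components x genes, gene pattern defining each response
    """
    centered = expression - expression.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components]
    return scores, loadings

# Simulated data (assumption): 20 treated samples, 500 genes,
# generated from 3 hidden responses plus small noise.
rng = np.random.default_rng(0)
true_scores = rng.normal(size=(20, 3))
true_loadings = rng.normal(size=(3, 500))
expression = true_scores @ true_loadings + 0.01 * rng.normal(size=(20, 500))

scores, loadings = extract_latent_scores(expression, n_components=3)

# Check how well 3 latent variables reproduce the 500-dimensional data.
reconstruction = scores @ loadings + expression.mean(axis=0)
relative_error = (np.linalg.norm(expression - reconstruction)
                  / np.linalg.norm(expression))
```

In this sketch, each row of `scores` is a three-value profile of one treated sample, which is the kind of low-dimensional summary that could be drawn as a radar chart, and each row of `loadings` lists the genes that define the corresponding response and could be passed to a downstream tool such as pathway analysis.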

This figure summarizes the proof-of-concept studies of our strategy in vitro.
Two examples validate this strategy. First, we assessed whether the potential toxicity of a drug could be detected.7) Focusing on endoplasmic reticulum (ER) stress, an important molecular mechanism associated with toxicity, cultured cells were treated with five U.S. Food and Drug Administration (FDA)-approved drugs that were predicted to induce a strong response although no such effect had been reported in the literature. Molecular biological evaluation revealed that ER stress was induced in a concentration-dependent manner, as predicted. Next, attention was turned to natural products, which, unlike molecularly targeted drugs, generally exhibit diverse effects.8) The FDA-approved natural product Rescinnamine and its derivative Syrosingopine, though structurally similar, were predicted by the proposed method to elicit distinct biological responses, namely histone deacetylase (HDAC) inhibition and lipid accumulation. Neither HDAC inhibition (for both compounds) nor lipid accumulation (for Rescinnamine only) had been reported, so these effects were evaluated in cultured cells, confirming that both compounds showed HDAC inhibition and only Rescinnamine induced lipid accumulation.
Immune cell trafficking is an important biological response in vivo, related to various biological events such as drug-induced liver injury (Fig. 3). The first step in our strategy is to appropriately numerize the biological response. However, flow cytometry data, commonly used to measure immune cell trafficking, are difficult to compare and accumulate for subsequent analysis because of substantial individual and institutional variation. Therefore, we attempted to numerize drug-induced immune cell trafficking using the deconvolution method, which estimates cell ratios from tissue transcriptome data with the aid of existing immune cell transcriptome data. Bulk transcriptome data, which date back to microarrays around 2000, have accumulated extensively in public databases and are readily accessible for research. Therefore, combining the deconvolution method with these legacy data is expected to yield comprehensive insights into immune cell trafficking (Fig. 4).
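As a rough illustration of how deconvolution estimates cell ratios, the sketch below treats a bulk tissue profile as a non-negative mixture of reference cell-type profiles and solves for the mixing weights with non-negative least squares (`scipy.optimize.nnls`). This is one common formulation and not necessarily the one used in the studies described here; the function name `estimate_cell_ratios`, the cell-type list, and all expression values are simulated assumptions.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_cell_ratios(bulk_expression, signature):
    """Estimate cell-type proportions for one bulk sample.

    bulk_expression: vector of length n_genes (tissue transcriptome)
    signature:       n_genes x n_cell_types reference expression profiles
    """
    # Solve min ||signature @ w - bulk||_2 subject to w >= 0.
    weights, _residual_norm = nnls(signature, bulk_expression)
    total = weights.sum()
    if total == 0:
        raise ValueError("no reference cell type explains the bulk profile")
    # Normalize so that the weights can be read as proportions.
    return weights / total

# Simulated reference (assumption): 200 genes, 4 cell types.
cell_types = ["hepatocyte", "neutrophil", "monocyte", "T cell"]
rng = np.random.default_rng(1)
signature = rng.gamma(shape=2.0, scale=50.0, size=(200, len(cell_types)))

# Simulated bulk sample: a noise-free mixture with known proportions.
true_ratios = np.array([0.70, 0.15, 0.10, 0.05])
bulk = signature @ true_ratios

estimated = estimate_cell_ratios(bulk, signature)
```

Because the simulated mixture is noise-free and the reference profiles come from the same simulated "experiment," `estimated` recovers `true_ratios`; with real data, differences between the reference profiles and the sample being analyzed are a major source of error.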

Illustration of immune cell trafficking as a biological response.

This figure summarizes our strategy to numerize immune cell trafficking as an in vivo biological response to drugs.
First, we found that the performance of existing methods had been evaluated using blood, from which data are easier to obtain, rather than tissues.9) Therefore, we evaluated the performance of the deconvolution method in tissues by inducing various patterns of immune cell trafficking in mice using seven different liver toxicants, splitting the liver into two portions, and obtaining RNA-seq data from one portion and actual cell ratio measurements from the other portion using flow cytometry. As a result, we found that a model including both immune and parenchymal cells is useful when targeting tissues.10) Next, we found that existing methods for analyzing rats relied on ancillary data from humans and mice as a substitute. Since data on rats have been accumulated mainly in toxicology, we aimed to leverage such legacy data. We isolated various immune cells from rats, performed RNA-seq, and constructed a deconvolution method suitable for rats. The superiority of the proposed model was confirmed by verifying species differences using the RNA-seq data of rat liver.11)
Through these studies, we found that inter-experimental differences between auxiliary immune cell data and the data to be analyzed reduced the accuracy of the existing deconvolution methods. Therefore, we developed a deconvolution method that uses only gene names, not gene expression levels, with the aid of natural language processing. This method not only outperforms existing methods owing to its robustness to inter-experimental differences but also enables comprehensive estimation of cell types with known marker genes.12)
Toxicopathological images used in drug safety assessment are a valuable source of information reflecting the biological response of an individual to a drug. These images are routinely obtained and accumulated by pharmaceutical companies, resulting in a large amount of data. However, numerization of pathological images is generally done by pathologists, which introduces subjectivity and limits the amount of extractable information, making them unsuitable for subsequent data analysis. Therefore, we focused on representation learning, where image features are converted into numerical values without arbitrariness using neural networks. While deep learning of pathological images has been actively studied in oncology, little knowledge exists on representation learning specifically for toxicopathological images. This study investigates whether representation learning of toxicopathological images can be used to numerize the biological response to drugs (Fig. 5).

Illustration of how to numerize biological responses encompassed in toxicopathological images.
In a typical image representation learning method, a convolutional neural network (CNN) pretrained on natural images, such as photographs of dogs, is further trained on supplementary tasks in the domain of interest13) (Fig. 6). In this study, a toxicopathological image numerization method was constructed by training a model to estimate pathological findings on a variety of rat liver pathological images.14) For performance evaluation, the modes of action (MoAs) of administered compounds were first estimated, and the proposed method improved the estimation accuracy by 2.5-fold compared to multivariate data composed of pathological findings annotated by pathologists. Next, aiming to reduce the cost of repeated-dose studies for drug safety evaluation, we worked on predicting late-phase pathological findings from early-phase images of repeated dosing. The proposed method successfully predicted multiple findings (area under the receiver operating characteristic curve (AUROC) >0.8), whereas the pathological findings of the early-phase images could not predict any late findings.
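For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, so 0.5 corresponds to chance and 1.0 to perfect ranking. The following minimal sketch computes it directly from that definition; the function name `auroc` and the labels and scores are made-up illustrations, not data from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for cases with the finding, 0 for cases without it
    scores: predicted scores, higher meaning "finding more likely"
    A tie between a positive and a negative counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute AUROC")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Hypothetical example: 3 animals with a late-phase finding, 4 without.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
value = auroc(labels, scores)  # 11 of 12 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of cases and is written for clarity; a rank-based implementation from an established library would be preferable for large datasets.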

Illustration of the representation learning model we tested.
Although we have highlighted the sensitivity to detect rare biological responses as an example of this strategy’s usefulness, its greatest strength lies in its ability to score, visualize, and recognize both known and unknown biological responses. Unrecognized effects of drugs, which may cause unexpected toxicity, can either be known but undetected or completely unknown and overlooked. This strategy allows both cases to be recognized and analyzed by forming a comprehensive, data-driven view of the entire biological response. This approach is expected to address challenges in drug development and enhance our understanding of drug effects.
The same strategy can be applied in vivo, but it is necessary to quantify responses from various angles, considering individual characteristics. In this study, a method was developed to numerize immune cell trafficking, an important in vivo biological response. The numerization of toxicopathological images similarly reflects individual characteristics and offers high feasibility. Since drug safety assessments involve varying concentrations and times, the cost of omics analysis is substantial. However, pathological images are routinely obtained during safety evaluations, making this strategy practical. In fact, joint research is underway with several pharmaceutical companies. Thus, our research is in high demand and expected to contribute significantly to various fields, particularly pharmaceuticals.
This study was financially supported by AMED under Grant Numbers: JP22mk0101250h and 23ak0101199h0001, by JSPS KAKENHI Grant-in-Aid for Scientific Research (C) [Grant Number: 21K06663] and JSPS KAKENHI [Grant Number: 16H06279 (PAGS)] from the Japan Society for the Promotion of Science, by a Grant-in-Aid from The Mochida Memorial Foundation for Medical and Pharmaceutical Research, and by a Grant-in-Aid from Takeda Science Foundation.
The author declares no conflict of interest.
This review of the author’s work was written by the author upon receiving the 2024 Pharmaceutical Society of Japan Award for Young Scientists.