Interpretable Drug Response Prediction via Optimal Transport-Guided Importance of Drug-Gene Relationships

Mohammed Aburidi

doi:10.1273/cbij.25.36

Abstract

Accurate prediction of drug response in cancer treatment remains a critical challenge due to the complex biological interactions underlying tumor sensitivity and resistance. In this work, we introduce OT-GNN, a novel graph neural network framework that leverages optimal transport theory to integrate prior drug-target interaction knowledge with gene expression profiles for interpretable and robust drug response prediction. By embedding an optimal transport-based alignment mechanism into the GNN architecture, OT-GNN dynamically reweights gene importance tailored to each drug–cell line pair, enhancing both predictive accuracy and biological interpretability. We evaluate OT-GNN on a processed NCI-60 dataset under zero-shot learning settings, demonstrating superior performance compared to traditional machine learning models, recent deep learning methods, and standard GNN variants without our proposed alignment. OT-GNN achieves state-of-the-art ROC-AUC and PR-AUC scores, with improved stability across multiple runs, highlighting its potential as a reliable tool for precision oncology applications. Our approach bridges the gap between data-driven modeling and biological prior knowledge, providing a pathway toward more transparent and effective drug response prediction.

1. Introduction

Precision oncology aspires to tailor cancer treatments to the molecular characteristics of individual patients, relying heavily on the ability to predict how tumors will respond to specific drugs. The proliferation of high-throughput pharmacogenomic resources, including the Cancer Cell Line Encyclopedia (CCLE) [1], the Genomics of Drug Sensitivity in Cancer (GDSC) [2], and the NCI-60 panel [3], has enabled systematic investigations of the relationship between genomic profiles and drug response. These datasets provide a fertile ground for machine learning (ML) approaches to model complex genotype-phenotype interactions [4]. However, drug response prediction remains a fundamentally difficult problem due to the inherent heterogeneity of tumor biology, sparsity of labeled data, high dimensionality of input features, and variability across cell lines and compounds [5].

Early efforts in drug response modeling relied on classical machine learning algorithms, such as support vector machines, random forests, and gradient boosting [6, 7]. While these approaches offer robust performance in controlled scenarios, they often treat genes as independent features, failing to model the complex interactions and regulatory dependencies between them. Moreover, these models lack biological interpretability, a critical limitation in high-stakes biomedical applications. To overcome these challenges, recent advances have embraced deep learning techniques that incorporate biological priors. In particular, graph neural networks (GNNs) have shown promise in integrating multi-omics data with prior knowledge in the form of biological networks, such as protein-protein interaction (PPI) graphs [8, 9]. These models capture the non-Euclidean structure of biological systems and exploit message passing mechanisms to learn expressive representations of gene interactions. Nevertheless, existing GNN-based models often rely on static graph structures and assume uniform importance across all nodes and edges, disregarding sample-specific biological signals. Additionally, their predictions are typically opaque, offering limited insight into which genes or pathways drive a given response.

To address these limitations, we propose OT-GNN, a novel graph-based framework that integrates optimal transport theory into GNN architecture for interpretable and effective drug response prediction. Our key innovation lies in leveraging optimal transport (OT) to align cell line-specific gene activity profiles with drug-target interaction priors [10,11]. This alignment produces an optimal coupling matrix that reflects the biological relevance of each gene in the context of a given drug and cell line. By embedding this transport plan into the GNN's message-passing mechanism, OT-GNN dynamically reweights gene contributions based on their biological importance, enabling both enhanced prediction and interpretability.

Unlike many existing models that apply a fixed graph structure or depend heavily on edge-level features, OT-GNN constructs a sample-specific attention mechanism rooted in biological priors, without introducing additional graph complexity. This leads to a more flexible and biologically grounded model that not only adapts to each input sample but also remains stable and computationally tractable. Importantly, the learned transport plans offer a principled way to interpret the model's predictions, as they highlight the genes most responsible for sensitivity or resistance to a particular drug.

We validate our model using a processed version of the NCI-60 dataset [12], applying zero-shot learning settings to test generalizability across unseen drugs and cell lines. Our experimental results demonstrate that OT-GNN outperforms traditional machine learning models, prior deep learning baselines, and GNN variants without the OT-based alignment mechanism. The improvements are particularly pronounced in terms of ROC-AUC and PR-AUC—two key metrics in imbalanced classification tasks [13]. Moreover, OT-GNN shows strong performance stability across multiple runs and data splits, reinforcing its robustness and reliability.

Our main contributions are as follows:

● We propose OT-GNN, a novel graph neural network architecture that incorporates optimal transport to integrate prior drug-target knowledge with gene expression data in a biologically meaningful manner.
● We develop an interpretable mechanism within the GNN framework that dynamically adjusts node importance based on the optimal transport alignment, without requiring edge-level features or complex graph preprocessing.
● We conduct comprehensive evaluations on a zero-shot drug response prediction task using the NCI-60 dataset, demonstrating that OT-GNN achieves superior predictive performance and model stability compared to state-of-the-art baselines.

By grounding predictions in biological knowledge and maintaining interpretability, OT-GNN moves beyond black-box deep learning approaches and offers a step toward more trustworthy, generalizable, and clinically actionable models in computational oncology.

2. Methods

2.1 Optimal Transport Foundations

Optimal transport (OT) [14,15,16,17,18] is a mathematical framework concerned with determining the most cost-efficient way to move mass—such as probability distributions—from one configuration to another. The goal is to minimize the total transportation cost between two distributions.

Let us define two discrete point sets, $\mathcal{X} = \{x_i\}_{i=1}^{N}$ and $\mathcal{Y} = \{y_j\}_{j=1}^{K}$, representing source and target samples, respectively. Their associated discrete probability distributions are denoted by vectors $p \in \Delta_N$ and $q \in \Delta_K$, where $\Delta_n := \{u \in \mathbb{R}_+^n \mid \sum_{i=1}^{n} u_i = 1\}$ is the probability simplex in $\mathbb{R}^n$. Thus,

$$p = (p_1, \ldots, p_N)^T, \quad q = (q_1, \ldots, q_K)^T, \quad p_i, q_j \ge 0, \quad \sum_{i=1}^{N} p_i = \sum_{j=1}^{K} q_j = 1.$$

The transportation plan is represented by a non-negative matrix $Q \in \mathbb{R}_+^{K \times N}$, where $Q_{ji}$ indicates the amount of mass transported from source point $x_i$ to target point $y_j$. Valid transport plans satisfy the marginal constraints:

$$\sum_{i=1}^{N} Q_{ji} = q_j, \quad \forall j = 1, \ldots, K, \tag{1}$$

$$\sum_{j=1}^{K} Q_{ji} = p_i, \quad \forall i = 1, \ldots, N. \tag{2}$$

Given a cost matrix $C \in \mathbb{R}_+^{K \times N}$ with entries $C_{ji} = \|y_j - x_i\|^2$ measuring the squared Euclidean distance between points, the OT problem is formulated as:

$$\min_{Q \in U(p, q)} \langle C, Q \rangle_F := \sum_{j=1}^{K} \sum_{i=1}^{N} C_{ji} Q_{ji}, \tag{3}$$

where the admissible set of transport plans is

$$U(p, q) := \{ Q \in \mathbb{R}_+^{K \times N} \mid Q \mathbf{1}_N = q, \; Q^T \mathbf{1}_K = p \}.$$

Here, $\mathbf{1}_n$ is the all-ones vector in $\mathbb{R}^n$, and $\langle \cdot, \cdot \rangle_F$ denotes the Frobenius inner product.

When $C$ corresponds to a metric cost, the minimal value in (3) defines the Wasserstein distance $W(p, q)$:

$$W(p, q) := \min_{Q \in U(p, q)} \langle C, Q \rangle_F. \tag{4}$$

We solve this linear program using the simplex method [19], iteratively updating feasible solutions until optimality criteria are met.
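As a minimal sketch of problem (3), the linear program can be assembled explicitly and handed to an off-the-shelf solver. The function name `solve_ot_lp`, the toy points, and the use of SciPy's HiGHS backend are assumptions made for illustration; they are not taken from the authors' implementation, which is described only as using the simplex method.

```python
import numpy as np
from scipy.optimize import linprog


def solve_ot_lp(p, q, x, y):
    """Solve problem (3) as a linear program; returns (Q, cost) with Q of shape (K, N)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    x, y = np.asarray(x, float), np.asarray(y, float)
    N, K = len(p), len(q)
    # Cost C[j, i] = ||y_j - x_i||^2, shape (K, N).
    C = ((y[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    # Q is flattened row-major, so variable j * N + i holds Q[j, i].
    row_sums = np.kron(np.eye(K), np.ones((1, N)))  # constraint (1): sum_i Q[j, i] = q_j
    col_sums = np.kron(np.ones((1, K)), np.eye(N))  # constraint (2): sum_j Q[j, i] = p_i
    res = linprog(
        C.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([q, p]),
        bounds=(0, None),
        method="highs",  # assumption: HiGHS solver rather than a hand-written simplex
    )
    return res.x.reshape(K, N), res.fun


# Toy example (hypothetical data): two source and two target points on a line.
x = np.array([[0.0], [1.0]])
y = np.array([[1.0], [2.0]])
p = np.array([0.5, 0.5])
q = np.array([0.5, 0.5])
Q, cost = solve_ot_lp(p, q, x, y)
```

The dense constraint matrix grows as $(K + N) \times KN$, so this formulation is only practical for small problems; it is meant to make the structure of constraints (1) and (2) concrete.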

2.2 Gene-Gene Interaction Graph Construction

To build a biologically grounded gene-gene interaction network enriched with informative node features and initial importance scores, we integrated data from three sources:

● The PathwayCommons database provides the backbone graph $G = (V, E)$ capturing curated physical and functional gene interactions.
● Multi-omics data (gene expression, mutation status, copy number variation, and methylation) from the NCI-60 cell line panel, aggregated into node feature vectors.
● Drug-target interaction (DTI) data from CTD [20], DrugBank [21], DGIdb [22], STITCH [23], and KIBA [24], used to initialize gene importance weights.

Gene expression values were normalized via Transcripts Per Million (TPM), then $\log_2$-transformed and winsorized to reduce outlier impact. Each gene node is represented by a 4-dimensional feature vector combining expression, mutation, copy number variation, and methylation data.

We define the initial Drug-Target Interaction (DTI)-based importance score between drug $d_i$ and gene $g_j$ as $S_{\text{dti}}(d_i, g_j)$, following [25], computed from aggregated literature evidence via PubMed co-mentions:

$$\text{log\_count} = \log(1 + \text{PubMedID\_count}),$$

$$S_{\text{dti}}(d_i, g_j) = 0.5 + 0.5 \times \frac{\text{log\_count} - \min(\text{log\_count})}{\max(\text{log\_count}) - \min(\text{log\_count})}$$

Here, $S_{\text{dti}}(d_i, g_j) \in \{0\} \cup [0.5, 1]$, with zero indicating no known association.

PubMedID_count is the number of publications in which the drug and gene names co-occur in the title or abstract, retrieved using the PubMed ESearch API [26], and log_count is its logarithmically scaled value. This co-occurrence reflects the extent of documented evidence linking each drug-gene pair in the biomedical literature. While the co-mention count does not differentiate the nature of the relationship (e.g., positive or negative), it is commonly used in the literature as a proxy for relevance or potential association between biomedical entities.
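The scoring rule can be sketched in a few lines of NumPy. The function name `dti_importance` is hypothetical, and two details are assumptions not stated in the text: the minimum and maximum are taken over pairs with at least one co-mention, and a degenerate range (all counts equal) maps to the lower bound 0.5.

```python
import numpy as np


def dti_importance(pubmed_counts):
    """Map PubMed co-mention counts to scores in {0} U [0.5, 1]."""
    counts = np.asarray(pubmed_counts, dtype=float)
    scores = np.zeros_like(counts)          # 0 means no known association
    known = counts > 0
    if not known.any():
        return scores
    log_count = np.log1p(counts[known])     # log(1 + PubMedID_count)
    lo, hi = log_count.min(), log_count.max()
    if hi > lo:
        scaled = (log_count - lo) / (hi - lo)
    else:
        scaled = np.zeros_like(log_count)   # assumption: degenerate range maps to 0.5
    scores[known] = 0.5 + 0.5 * scaled
    return scores


# Hypothetical co-mention counts for four genes and one drug.
example_scores = dti_importance([0, 1, 9, 99])
```

The offset of 0.5 keeps every documented pair clearly separated from the zero score assigned to pairs without evidence.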

2.3 Drug-target and Gene Activity Signals

To complement the drug-target signal, we also constructed a gene activity profile representing the functional state of genes in each cell line. For each gene $g_j$, we combined four types of molecular measurements: gene expression ($x_j^{\text{expr}}$), copy number variation ($x_j^{\text{cnv}}$), methylation ($x_j^{\text{meth}}$), and mutation status ($x_j^{\text{mut}}$). These features were normalized and fused into a unified activity score using a linear combination:

$$A_j = w_{\text{expr}} \cdot x_j^{\text{expr}} + w_{\text{cnv}} \cdot x_j^{\text{cnv}} + w_{\text{meth}} \cdot (1 - x_j^{\text{meth}}) + w_{\text{mut}} \cdot (1 - x_j^{\text{mut}}) \tag{5}$$

Here, the weights $w_{\cdot}$ are hyperparameters reflecting the relative importance of each omics modality, and terms like $(1 - x_j^{\text{meth}})$ invert the interpretation of repressive signals (e.g., high methylation or damaging mutation reduces activity). This yields a vector $A \in \mathbb{R}^{|G|}$ for each cell line, where $|G|$ is the number of genes.
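Equation (5) translates directly into a vectorized computation. In the sketch below, the function name `gene_activity` and the weight values are placeholders chosen for illustration (the weights used in the study are not reported here), and all inputs are assumed to be scaled to $[0, 1]$.

```python
import numpy as np


def gene_activity(expr, cnv, meth, mut, weights=(0.4, 0.2, 0.2, 0.2)):
    """Fuse four normalized omics vectors into one activity score per gene (Eq. 5)."""
    w_expr, w_cnv, w_meth, w_mut = weights  # hypothetical values, not from the paper
    expr, cnv = np.asarray(expr, float), np.asarray(cnv, float)
    meth, mut = np.asarray(meth, float), np.asarray(mut, float)
    # Methylation and mutation are treated as repressive, hence the (1 - x) terms.
    return w_expr * expr + w_cnv * cnv + w_meth * (1.0 - meth) + w_mut * (1.0 - mut)


# Gene 0: highly expressed, amplified, unmethylated, wild type.
# Gene 1: silent, deleted, fully methylated, mutated.
A = gene_activity(expr=[1.0, 0.0], cnv=[1.0, 0.0], meth=[0.0, 1.0], mut=[0.0, 1.0])
```

With weights that sum to one and inputs in $[0, 1]$, the activity score also stays in $[0, 1]$, which keeps the later normalization into a probability distribution well behaved.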

2.4 Profile Matching via Optimal Transport

Given a drug $d$ and a corresponding prior importance vector $S_d \in \mathbb{R}^{|G|}$ constructed from DTI databases (as described earlier), we formulated drug-cell compatibility as an optimal transport (OT) problem. Intuitively, this process seeks the minimal-cost "alignment" between the drug's targeted signal $S_d$ and the cell's gene activity profile $A$, under the assumption that a drug is more effective when its targets are active in the cell.

We normalized both vectors to define discrete probability distributions:

$$\mu = \frac{S_d}{\sum_j S_{d,j}}, \quad \nu = \frac{A}{\sum_j A_j} \tag{6}$$

Then, we computed the entropy-regularized OT distance (e.g., Sinkhorn divergence) between $\mu$ and $\nu$:

$$\mathcal{W}_\epsilon(\mu, \nu) = \min_{T \in \Pi(\mu, \nu)} \sum_{i,j} T_{ij} C_{ij} - \epsilon H(T) \tag{7}$$

where $T \in \mathbb{R}^{|G| \times |G|}$ is the transport plan, $\Pi(\mu, \nu)$ is the set of couplings with marginals $\mu$ and $\nu$, $C_{ij} = \|g_i - g_j\|^2$ is the cost between genes $i$ and $j$ (squared distance in feature space), $H(T) = -\sum_{i,j} T_{ij} \log T_{ij}$ is the entropy of the transport plan, and $\epsilon$ is the entropic regularization parameter.

This OT distance $\mathcal{W}_\epsilon(\mu, \nu)$ serves as a biologically motivated compatibility score: a smaller distance suggests that the drug's target genes are functionally active in the cell line, making the drug more likely to elicit a strong response. The associated optimal transport plan $T^*$ in turn induces a vector over genes that represents how the drug's target profile aligns with the gene activity profile of the cell line.

From the optimal transport plan $T^*$, we derive the contextual alignment vector $A_{OT} \in \mathbb{R}^{|G|}$ by aggregating mass assigned to each gene across all drug targets:

$$A_{OT,j} = \sum_i T^*_{ij}, \quad \text{for each gene } j.$$

Thus, $\mathcal{W}_\epsilon$ is a scalar value used in the loss function to measure the alignment cost, whereas $A_{OT}$ is a gene-level importance vector used within the model to provide contextual biological signals to the graph neural network. These two quantities are related through the optimal transport plan $T^*$, but serve distinct roles in the framework.
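The following is a minimal sketch of how the entropic problem (7) and the vector $A_{OT}$ could be computed with plain Sinkhorn iterations. The function name `sinkhorn`, the values of `eps` and `n_iters`, and the four-gene toy data are assumptions for illustration; a production implementation would typically work in the log domain for numerical stability.

```python
import numpy as np


def sinkhorn(mu, nu, C, eps=0.5, n_iters=1000):
    """Entropy-regularized OT; returns the plan T and the value of objective (7)."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (K.T @ u)                # rescale columns towards marginal nu
        u = mu / (K @ v)                  # rescale rows towards marginal mu
    T = u[:, None] * K * v[None, :]
    nz = T > 0                            # 0 * log 0 is treated as 0
    neg_entropy = np.sum(T[nz] * np.log(T[nz]))   # equals -H(T)
    w_eps = np.sum(T * C) + eps * neg_entropy     # <T, C> - eps * H(T)
    return T, w_eps


# Hypothetical toy data: four genes with one-dimensional features.
g = np.array([[0.0], [0.5], [1.0], [1.5]])
C = (g - g.T) ** 2                        # C[i, j] = (g_i - g_j)^2
S_d = np.array([1.0, 0.5, 0.0, 0.0])      # prior DTI scores (two known targets)
A = np.array([0.2, 0.4, 0.3, 0.1])        # gene activity scores
mu, nu = S_d / S_d.sum(), A / A.sum()     # Eq. (6)

T_star, w_eps = sinkhorn(mu, nu, C)
A_ot = T_star.sum(axis=0)                 # A_OT[j] = sum_i T*[i, j]
```

Rows of the returned plan sum to $\mu$ and columns to $\nu$ up to solver tolerance, so genes without any prior DTI evidence contribute empty rows while still receiving mass through the columns.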

We incorporated this OT-guided score into the GNN training pipeline in two ways: (1) as an additional input feature, appended to the learned graph-level embedding; and/or (2) as a regularization term encouraging node importance propagation to be consistent with OT alignment. This integration enables the model to learn not only from structural relationships in the gene network but also from distributional alignment between drug action and cell context.

2.5 OT-Guided GNN Architecture

Our proposed model predicts drug response and uncovers gene-level importance by integrating prior drug-target knowledge with cell-specific molecular profiles using a graph neural network enhanced by optimal transport. We operate on a gene interaction network $G = (V, E)$ with $|V|$ genes and use node features $X \in \mathbb{R}^{|V| \times d}$ derived from multi-omics data. Importantly, our framework does not rely on edge features, focusing instead on learning meaningful node representations and importance propagation within the graph.

Given a drug $d$, the model receives as input: (i) a prior importance score vector $S_d \in \mathbb{R}^{|V|}$ derived from drug-gene interactions, and (ii) a cell-specific gene activity vector $A \in \mathbb{R}^{|V|}$ constructed from omics data. These are used to compute an optimal transport-based alignment score that informs the propagation of importance across the graph.

2.5.1 Contextual Attention Refinement Layer (CAR Layer)

The Contextual Attention Refinement (CAR) Layer is designed to jointly refine node feature representations and node-level importance scores by integrating local graph topology with contextual drug-cell line alignment signals. Its job is to contextualize the importance signal with respect to the local network topology and gene activity. The output consists of an updated feature matrix $\hat{X} \in \mathbb{R}^{|V| \times d}$ and a refined importance vector $\mathbf{I}^{\text{final}} \in \mathbb{R}^{|V|}$. The layer comprises five steps:

1. Node Context Encoding:

We enhance node features with contextual alignment information by concatenating the OT-derived alignment vector $A_{OT} \in \mathbb{R}^{|V|}$ to the node feature matrix $X \in \mathbb{R}^{|V| \times d}$, resulting in the input matrix $X' \in \mathbb{R}^{|V| \times (d+1)}$:

$$X' = [X \mid A_{OT}]$$

We then apply a graph neural network layer, such as GAT or TransformerConv, to propagate contextualized features over the graph:

$$Z = \mathrm{GNN}(X', \text{edge\_index})$$

where $Z \in \mathbb{R}^{|V| \times d'}$ represents the contextual node embeddings.

2. Dual Attention Mechanism:

To balance structural and contextual relevance, we introduce two types of attention:

● Structural attention ($\alpha_{ij}$): measures the relative importance of neighboring node $j$ to node $i$, based on their contextual embeddings.
● Contextual gating ($\beta_i$): regulates how much of the new aggregated information should influence the updated node representation.

These are computed as:

$$\alpha_{ij} = \frac{\exp(a^T [z_i \mid z_j])}{\sum_{k \in \mathcal{N}(i)} \exp(a^T [z_i \mid z_k])}$$

$$\beta_i = \sigma(w^T z_i + b)$$

where $z_i, z_j \in \mathbb{R}^{d'}$ are contextual node embeddings, $[\cdot \mid \cdot]$ denotes vector concatenation, $\mathcal{N}(i)$ is the set of neighbors of node $i$, $a \in \mathbb{R}^{2d'}$ and $w \in \mathbb{R}^{d'}$ are learnable weights, $b \in \mathbb{R}$ is a learnable bias, and $\sigma(\cdot)$ is the sigmoid function.

3. Gated Feature Aggregation:

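Node features are updated by aggregating messages from their neighbors, weighted by structural attention and modulated through a contextual gating mechanism:

$$h_i = \sum_{j \in \mathcal{N}(i)} \alpha_{ij} \cdot z_j$$

$$\hat{x}_i = \beta_i \cdot h_i + (1 - \beta_i) \cdot x_i$$

This step yields the updated feature matrix $\hat{X} \in \mathbb{R}^{|V| \times d}$, incorporating both neighborhood information and context-aware modulation. As written, the convex combination of $h_i$ and $x_i$ requires the embedding dimension to match the input dimension ($d' = d$).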
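Steps 2 and 3 can be sketched in NumPy as follows. This is an illustrative reading of the equations, not the authors' code: the function name `car_attention_update` is hypothetical, the embeddings `Z` are taken as given rather than produced by a GAT or TransformerConv layer, the edge list follows the common (source, target) convention, and $d' = d$ is assumed so that the gated sum is well defined.

```python
import numpy as np


def car_attention_update(X, Z, edge_index, a, w, b):
    """Dual attention and gated aggregation for one layer (steps 2 and 3)."""
    src, dst = edge_index                      # message flows from src (j) to dst (i)
    n = X.shape[0]
    # Structural attention logits a^T [z_i | z_j], one per edge.
    logits = np.concatenate([Z[dst], Z[src]], axis=1) @ a
    # Softmax over the incoming edges of each target node, shifted for stability.
    max_per_node = np.full(n, -np.inf)
    np.maximum.at(max_per_node, dst, logits)
    exp = np.exp(logits - max_per_node[dst])
    denom = np.zeros(n)
    np.add.at(denom, dst, exp)
    alpha = exp / denom[dst]
    # h_i = sum_j alpha_ij * z_j
    H = np.zeros_like(Z)
    np.add.at(H, dst, alpha[:, None] * Z[src])
    # Contextual gate beta_i = sigmoid(w^T z_i + b).
    beta = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
    X_hat = beta[:, None] * H + (1.0 - beta[:, None]) * X
    return X_hat, alpha, beta


# Hypothetical three-gene path graph 0 - 1 - 2, stored as directed edges.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Z = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 4.0]])
edge_index = np.array([[0, 2, 1, 1], [1, 1, 0, 2]])
# Zero parameters give uniform attention and a gate of 0.5, which is easy to check by hand.
X_hat, alpha, beta = car_attention_update(X, Z, edge_index, np.zeros(4), np.zeros(2), 0.0)
```

In this sketch, nodes without incoming edges receive an all-zero message, so their update reduces to a down-weighted copy of their own features.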

4. Importance Score Refinement:

Refined node-level importance scores are computed from the updated features:

$$I_i = \mathrm{ReLU}(W_I \hat{x}_i + b_I)$$

Collecting the scalar scores $I_i$ over all nodes produces the updated importance vector $I \in \mathbb{R}^{|V|}$, reflecting the contextualized importance of each gene for the drug-cell pair.

5. Soft Thresholding and Normalization:

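To enhance interpretability, a smooth thresholding function transforms the raw importance:

$$I_i^{\text{final}} = \frac{I_i}{1 + \exp(-k (I_i - \theta))}$$

where $k$ controls the sharpness of the threshold and $\theta$ its location. The layer returns both $\hat{X} = [\hat{x}_1, \ldots, \hat{x}_N]^T$, with $N = |V|$, and the normalized importance vector $\mathbf{I}^{\text{final}}$.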
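Steps 4 and 5 amount to a linear scoring head followed by a sigmoid-gated rescaling. The sketch below assumes a single output unit for $W_I$; the function name `refine_importance` and the values of `k` and `theta` are placeholders, since the settings used in the study are not given here.

```python
import numpy as np


def refine_importance(X_hat, W_I, b_I, k=10.0, theta=0.5):
    """Importance refinement (step 4) followed by soft thresholding (step 5)."""
    raw = np.maximum(X_hat @ W_I + b_I, 0.0)          # I_i = ReLU(W_I x_hat_i + b_I)
    gate = 1.0 / (1.0 + np.exp(-k * (raw - theta)))   # smooth step centred at theta
    return raw * gate                                 # I_i^final


# Hypothetical inputs: one clearly important gene and one with zero raw score.
X_hat_demo = np.array([[1.0, 0.0], [0.0, 0.0]])
I_final = refine_importance(X_hat_demo, W_I=np.array([1.0, 1.0]), b_I=0.0)
```

Scores well above `theta` pass almost unchanged, while scores below it are pushed towards zero, which is what makes the resulting ranking easier to read.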

Figure 1 Overview of the OT-GNN framework for drug response prediction

The pipeline begins by constructing an attributed gene–gene interaction graph from the PathwayCommons database, where each node is enriched with multi-omics-derived gene activity features. In parallel, drug–cell line contextual alignment is computed via Optimal Transport (OT) between drug-target profiles and cell-specific gene activity distributions, yielding an importance vector that highlights contextually relevant genes. Both the attributed graph and the OT-derived importance scores are input into the CAR-GNN model, which features a Contextual Attention Refinement (CAR) layer. This layer integrates structural and biological signals using a dual attention mechanism, refines node features and importance scores, and propagates them through stacked GNN layers. A final graph-level embedding is aggregated and passed through a prediction head to estimate drug response. The model is trained using a composite loss that balances prediction accuracy, importance sparsity, and OT-based biological alignment.

2.5.2 Model Architecture and Prediction Head

The model stacks three CAR layers, each followed by GraphNorm, ReLU, and Dropout. The graph-level representation is obtained by global mean pooling of the final node embeddings:

$$z = \frac{1}{|V|} \sum_{i=1}^{|V|} \hat{x}_i^{(L)}$$

To predict drug response, we use a fully connected layer followed by a sigmoid activation:

$$\hat{y} = \sigma(W_f z + b_f)$$
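A minimal sketch of the readout, assuming a single output unit: mean pooling over nodes followed by a logistic layer. The function name `predict_response` and the toy parameters are hypothetical.

```python
import numpy as np


def predict_response(X_final, W_f, b_f):
    """Global mean pooling over nodes, then a sigmoid prediction head."""
    z = X_final.mean(axis=0)                       # graph-level embedding
    return 1.0 / (1.0 + np.exp(-(W_f @ z + b_f)))  # probability of the positive class


# Hypothetical final node embeddings for a two-gene graph.
X_final = np.array([[1.0, 3.0], [3.0, 1.0]])
y_hat = predict_response(X_final, W_f=np.array([0.5, -0.5]), b_f=0.0)
```

Mean pooling makes the embedding independent of node ordering, which is the property a graph-level readout needs.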

2.5.3 Loss Function

The training objective is a composite loss that balances predictive accuracy, interpretability, and biological plausibility:

$$\mathcal{L} = \underbrace{\mathcal{L}_{\text{BCE}}(\hat{y}, y)}_{\text{Prediction loss}} + \underbrace{\lambda_{\text{imp}} \cdot \|\mathbf{I}^{\text{final}}\|_{1}}_{\text{Importance sparsity}} + \underbrace{\lambda_{\text{OT}} \cdot \mathcal{W}_{\epsilon}(\mu, \nu)}_{\text{OT alignment regularization}}$$

where:

● $\mathcal{L}_{\text{BCE}}(\hat{y}, y)$ is the binary cross-entropy loss for drug response prediction,
● $\|\mathbf{I}^{\text{final}}\|_1$ encourages sparsity in the learned importance vector,
● $\mathcal{W}_\epsilon(\mu, \nu)$ is the entropy-regularized optimal transport cost between the drug-target distribution $\mu$ and the gene activity distribution $\nu$,
● $\lambda_{\text{imp}}$ and $\lambda_{\text{OT}}$ are trade-off hyperparameters.

This joint objective encourages accurate prediction, interpretable sparsity in importance scores, and biologically coherent alignment between drug targeting and cell context.
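For a single drug-cell pair, the composite objective could be evaluated as in the sketch below. The function name `composite_loss` and the default values of `lam_imp` and `lam_ot` are assumptions for illustration; the trade-off weights used in the experiments are not reported in this section.

```python
import numpy as np


def composite_loss(y_hat, y, I_final, w_eps, lam_imp=1e-3, lam_ot=1e-2):
    """Binary cross-entropy + L1 importance sparsity + OT alignment penalty."""
    y_hat = np.clip(y_hat, 1e-12, 1.0 - 1e-12)     # guard against log(0)
    bce = -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
    sparsity = np.abs(I_final).sum()               # ||I_final||_1
    return bce + lam_imp * sparsity + lam_ot * w_eps


# Hypothetical values for one sample.
loss = composite_loss(y_hat=0.5, y=1.0, I_final=np.array([1.0, 2.0]), w_eps=3.0,
                      lam_imp=0.1, lam_ot=0.1)
```

Because the three terms live on different scales, the two weights would normally be tuned jointly with the other hyperparameters.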

3. Experiments

3.1 Datasets

We based our experiments on the NCI-60 pharmacogenomic dataset [3], retrieving multi-omics profiles and drug sensitivity data using the rcellminer package [27]. Drug response was quantified using log-transformed GI50 values, which we binarized using a threshold of -4.6 to distinguish sensitive from resistant responses. After filtering for compounds annotated with NSC (National Service Center) identifiers, our final dataset included 52,000 drug-cell line interaction pairs, of which 34,000 (about 65%) were labeled as sensitive and 18,000 (about 35%) as resistant.
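The binarization step is a simple threshold. The direction of the comparison is not stated in the text, so the sketch below exposes it as a flag and assumes by default that values below the threshold (lower GI50, hence a more potent response) are labeled sensitive. The function name `binarize_gi50` is hypothetical.

```python
import numpy as np


def binarize_gi50(log_gi50, threshold=-4.6, sensitive_if_below=True):
    """Return 1 for sensitive and 0 for resistant drug-cell line pairs."""
    values = np.asarray(log_gi50, dtype=float)
    # Assumption: lower log GI50 means higher sensitivity; flip the flag otherwise.
    # Missing values (NaN) compare as False and should be filtered beforehand.
    sensitive = values < threshold if sensitive_if_below else values > threshold
    return sensitive.astype(int)


labels = binarize_gi50([-6.0, -4.0])
```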

To evaluate the model under a zero-shot prediction setting, we designed a split that excluded 30% of cell lines and 40% of drugs from training. Specifically, 560 drugs and 40 cell lines were used for training and validation, while 370 drugs and 20 cell lines were held out for testing. This partitioning produced approximately 17,500 training instances, 4,200 for validation, and 6,300 for testing.
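One way to realize such a split is to sample held-out drugs and cell lines first and then keep only pairs that are entirely seen or entirely unseen. Dropping the mixed pairs is an assumption; it is consistent with the reported counts, since 560 × 40 = 22,400 bounds the training and validation instances and 370 × 20 = 7,400 bounds the test instances. The function name `zero_shot_split` and the toy grid are hypothetical.

```python
import numpy as np


def zero_shot_split(drug_ids, cell_ids, test_drug_frac=0.4, test_cell_frac=0.3, seed=0):
    """Boolean masks for pairs whose drug AND cell line are both seen or both unseen."""
    rng = np.random.default_rng(seed)
    drugs, cells = np.unique(drug_ids), np.unique(cell_ids)
    test_drugs = rng.choice(drugs, size=int(round(test_drug_frac * len(drugs))), replace=False)
    test_cells = rng.choice(cells, size=int(round(test_cell_frac * len(cells))), replace=False)
    drug_held_out = np.isin(drug_ids, test_drugs)
    cell_held_out = np.isin(cell_ids, test_cells)
    train_mask = ~drug_held_out & ~cell_held_out   # seen drug, seen cell line
    test_mask = drug_held_out & cell_held_out      # unseen drug, unseen cell line
    return train_mask, test_mask


# Hypothetical full grid of 10 drugs x 10 cell lines.
drug_ids = np.repeat(np.arange(10), 10)
cell_ids = np.tile(np.arange(10), 10)
train_mask, test_mask = zero_shot_split(drug_ids, cell_ids)
```

The training mask would then be divided further into training and validation subsets.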

3.2 Baseline Methods and Hyperparameter Optimization

To comprehensively assess the performance of our proposed OT-GNN framework, we benchmark against a diverse suite of baseline models, spanning traditional machine learning methods, deep learning architectures, and graph neural networks. All models were evaluated using the same input features (multi-omics and DTI-based gene features) and data splits for a fair comparison. We applied systematic hyperparameter tuning for all models using the Optuna framework [28], optimizing validation performance through randomized or grid search strategies.

● Random Forest (RF): We used the RandomForestClassifier from scikit-learn, with key hyperparameters including the number of trees (100–1000), maximum depth (10–100), minimum samples per split (2–20), and maximum features ("sqrt", "log2", or None) tuned via Optuna.
● LightGBM: We implemented LightGBM [29] with a binary classification objective, tuning parameters such as num_leaves (31–255), learning_rate (1e-3–1.0), feature and bagging fractions (0.1–1.0), and L1/L2 regularization strengths (1e-8–10.0). Boosting rounds ranged from 100 to 2000.
● Multi-Layer Perceptron (MLP): Our MLP model was implemented in PyTorch with 2–5 fully connected layers (64–512 units each), ReLU activation, and dropout (0.1–0.5). Training used the Adam optimizer, and normalization layers (batch or layer norm) were selected via tuning.
● DeepDSC and MOFGCN: For DeepDSC [30], we followed the published architecture using stacked autoencoders for feature encoding followed by a fully connected prediction head. Hyperparameters followed recommended defaults with minor tuning. For MOFGCN [31], we adopted a published implementation incorporating multi-omics fusion via correlation-aware GCNs, using fixed hyperparameters from the original work for reproducibility.
● Message-Passing Neural Network (MPNN): MPNN [32] was trained with 1–3 layers, hidden dimensions of 16 or 32, and dropout (0.1–0.3). We experimented with batch sizes (2–4) and tuned regularization weights for sparsity and loss balance.
● Graph Convolutional Network (GCN) and GINE: GCN [33] and GINE [34] baselines were trained using a similar configuration as MPNN. GINE incorporated edge features for enhanced expressivity. Both models used global mean pooling for graph-level prediction.
● Graph Attention Network (GAT) [35] and Graph Transformer (GT) [36]: These models incorporated attention mechanisms for improved message passing. Hyperparameter search included attention heads (1, 2, or 4) in addition to the settings used for GCNs. We used 2–4 layers and applied ReLU and dropout for regularization.

All models were trained using the Adam optimizer with early stopping on the validation set. Performance was averaged over five independent runs using fixed train/validation/test splits. We report mean and standard deviation for ROC-AUC, PR-AUC, accuracy, precision, and specificity. Input features were consistently normalized across models and included expression, mutation, methylation, CNV, and initial DTI-based signals. This unified evaluation pipeline ensures fair and robust comparison.

In our framework, the base GNN used within the Contextual Attention Refinement (CAR) layer can be flexibly instantiated with different architectures to examine their effect on performance. OT-GNN (GAT) uses the Graph Attention Network (GAT), OT-GNN (GT) employs a Graph Transformer architecture (GT), and OT-GNN (GINE) integrates the Graph Isomorphism Network with Edge features (GINE). In all cases, the CAR layer applies these GNNs to the contextualized feature matrix and edge index to generate updated node embeddings and propagate importance scores. This modularity allows us to compare architectural performance under a unified interpretable optimal transport framework.

3.3 Evaluation Metrics

To comprehensively assess the predictive performance of our model and baselines, we report a set of widely used evaluation metrics for binary classification tasks:

● ROC-AUC (Receiver Operating Characteristic, Area Under the Curve): This metric quantifies the model's ability to distinguish between positive and negative classes across all classification thresholds. A higher ROC-AUC value indicates better overall classification performance.
● PR-AUC (Precision-Recall, Area Under the Curve): PR-AUC focuses on the model's performance on the positive class and is especially informative for imbalanced datasets. It captures the trade-off between precision and recall across different thresholds.
● Accuracy: The proportion of correctly predicted instances (both positive and negative) out of the total number of predictions.
● Precision: The proportion of true positive predictions out of all predicted positives, indicating how reliable the positive predictions are.
● Specificity: Also known as the true negative rate, specificity measures the proportion of correctly predicted negative instances among all actual negatives.

In Table 1, we use upward arrows (↑) to indicate that higher values for these metrics are desirable. For each metric, the best-performing method is highlighted in bold.
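The five metrics can be computed from labels and predicted scores as sketched below. This is a dependency-free illustration rather than the evaluation code used in the study: the function name `binary_metrics` is hypothetical, ROC-AUC is computed by pairwise comparison of positive and negative scores, PR-AUC is approximated by average precision without grouping tied scores, and the 0.5 decision threshold is an assumption.

```python
import numpy as np


def binary_metrics(y_true, scores, threshold=0.5):
    """ROC-AUC, PR-AUC (average precision), accuracy, precision, and specificity."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    # ROC-AUC: probability that a random positive outranks a random negative.
    diff = s[y][:, None] - s[~y][None, :]
    roc_auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size
    # Average precision: mean of precision@k taken at the rank of each positive.
    hits = y[np.argsort(-s, kind="stable")]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(s) + 1)
    pr_auc = precision_at_k[hits].mean()
    # Threshold-dependent metrics.
    pred = s >= threshold
    tp, fp = np.sum(pred & y), np.sum(pred & ~y)
    tn = np.sum(~pred & ~y)
    return {
        "roc_auc": roc_auc,
        "pr_auc": pr_auc,
        "accuracy": (tp + tn) / len(s),
        "precision": tp / (tp + fp) if (tp + fp) > 0 else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) > 0 else 0.0,
    }


# Hypothetical predictions for five drug-cell line pairs.
metrics = binary_metrics([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.2])
```

The pairwise ROC-AUC costs memory proportional to the number of positive-negative pairs, so a rank-based formulation would be preferable on the full test set.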

3.4 Prediction Performance

We assessed the performance of our proposed model, OT-GNN, against a variety of benchmarks, including five classical machine learning models, two state-of-the-art baselines, and three graph neural network (GNN) variants without our proposed optimal transport-guided importance propagation. Table 1 summarizes the results, reporting the mean and standard deviation over five independent runs.

OT-GNN demonstrates consistent and superior performance across nearly all evaluation metrics, especially on ROC-AUC and PR-AUC, which are critical in imbalanced classification scenarios. Among all variants, OT-GNN (GT) achieved the best ROC-AUC (0.801) and PR-AUC (0.901), indicating its strong discriminative power and high precision under class imbalance.

Compared to its counterpart without importance propagation (non-IP), OT-GNN (GT) shows a 3.8% improvement in ROC-AUC and a 4.3% improvement in PR-AUC, highlighting the substantial contribution of optimal transport-guided importance propagation. Likewise, the GINE-based variant saw gains of 5.4% in both PR-AUC and ROC-AUC, further validating the module's adaptability across architectures.

While DeepDSC slightly outperforms in raw accuracy and precision, our model offers a more balanced performance profile, with considerably better PR-AUC and ROC-AUC. This reflects OT-GNN’s ability to generalize effectively rather than overfitting to the dominant class.

Interestingly, MOFGCN, while exhibiting the highest specificity (0.893), suffers from poor overall classification ability (ROC-AUC: 0.487), confirming its bias toward predicting resistance only. In contrast, OT-GNN provides a more even predictive capability.

Finally, all variants of OT-GNN exhibit low standard deviations across the five independent runs, suggesting that the model is robust and stable with respect to initialization, an important characteristic for real-world biomedical applications.

Table 1. Predictive Performance Comparison for Binary Classification

Results are averages over 5 runs with standard deviations. Best values per metric are in bold.

3.5 Interpretability Analysis

Our model enhances interpretability by assigning each gene a context-dependent importance score, which reflects its influence on the predicted drug response. These scores are derived from the Contextual Attention Refinement (CAR) layers, which propagate drug-target information across the gene-gene network while integrating cell line–specific activity via optimal transport (OT) alignment. This design enables the model to highlight both direct and indirect contributors to drug efficacy in a biologically informed manner. To evaluate the interpretability of the proposed OT-GNN framework, we investigate how the model assigns gene-level importance scores during drug response prediction. We present two representative case studies illustrating the biological relevance of the model's outputs.

3.5.1 Case Study I: Bortezomib

Bortezomib is a proteasome inhibitor used in multiple myeloma and certain lymphomas. Table 2 lists the top 10 genes identified for Bortezomib. The top-ranked gene, PSMB5, a catalytic subunit of the 20S proteasome, is the known molecular target of Bortezomib, validating the model's alignment with established pharmacological knowledge. Genes such as RELA and NFKBIA are involved in the NF-κB pathway, which is regulated by proteasomal degradation and is central to Bortezomib's mechanism of inducing apoptosis. Other genes like HSPA1B and RPL10, though not direct targets, are related to protein folding and stress response pathways, possibly influencing cellular susceptibility to proteasome inhibition. The identification of these genes highlights OT-GNN's ability to capture secondary mechanisms that influence treatment response, offering interpretability beyond known direct targets.

Table 2. Top 10 important genes for Bortezomib and associated evidence

3.5.2 Case Study II: Erlotinib

We further evaluated the model on Erlotinib, an EGFR tyrosine kinase inhibitor used to treat non-small cell lung cancer. As shown in Table 3, the top-ranked gene is EGFR, the known direct molecular target of Erlotinib, which demonstrates strong attribution fidelity and the model's ability to recover established drug-target interactions. Additional genes such as PIK3CA, GRB2, and STAT3 are part of the EGFR downstream signaling network and are known to affect Erlotinib sensitivity or resistance. Notably, TP53 and CDKN2A are frequently mutated in cancer and modulate cell cycle response and apoptotic pathways that intersect with EGFR signaling. The ability of the model to recover both direct and contextually relevant genes highlights its interpretability and biological validity.

Table 3. Top 10 important genes for Erlotinib and associated evidence

The rankings in Tables 2 and 3 are based on the normalized importance scores produced by the CAR (Contextual Attention Refinement) layer. Specifically, after the graph message passing and dual attention steps, each gene node is assigned a scalar importance score $I_i^{\text{final}}$ (see Section 2.5), which quantifies the gene's contribution to the final drug response prediction. These scores are not derived from the classification probabilities, but rather from an internal mechanism within the model that reflects both the biological relevance and structural influence of each gene node. The genes are ranked in descending order of their importance scores.

These case studies highlight OT-GNN’s capacity for interpretable prediction by assigning biologically meaningful importance scores to genes, both known targets and secondary effectors. The consistency of top-ranked genes with known drug mechanisms and resistance pathways demonstrates that the model integrates prior knowledge with contextual cellular signals effectively. This interpretability may aid in identifying predictive biomarkers, suggesting alternative pathways for therapeutic targeting, and generating mechanistic hypotheses for experimental validation.

4. Conclusion

In this study, we presented OT-GNN, a novel graph neural network framework that incorporates optimal transport to effectively integrate prior drug-target interaction data with heterogeneous multi-omics profiles for interpretable drug response prediction. By aligning drug-target importance signals with cell line-specific gene activity distributions, our method dynamically adjusts gene importance within the GNN, enhancing both predictive performance and biological interpretability. Extensive experiments on the NCI-60 dataset demonstrate that OT-GNN consistently outperforms traditional machine learning models, existing deep learning approaches, and baseline GNN architectures lacking optimal transport-guided alignment. Furthermore, the stability of our model across multiple runs underscores its robustness in handling complex and imbalanced pharmacogenomic data. Future work may extend this framework to incorporate additional omics modalities and explore its applicability to other precision medicine challenges, further advancing the integration of prior biological knowledge with data-driven modeling for improved clinical decision support.

5. Code Availability

To support transparency and reproducibility, we plan to release the source code for OT-GNN upon publication. The repository will include implementation details, training scripts, and instructions for reproducing the main experiments. The code will be made publicly available at: https://github.com/maburidi/OT-GNN

References

[1] Barretina, J.; Caponigro, G.; Stransky, N.; Venkatesan, K.; Margolin, A.A.; et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012, 483(7391), 603–607. doi: 10.1038/nature11003.
[2] Iorio, F.; Knijnenburg, T.A.; Vis, D.J.; Bignell, G.R.; Menden, M.P.; et al. A landscape of pharmacogenomic interactions in cancer. Cell 2016, 166(3), 740–754. doi: 10.1016/j.cell.2016.06.017.
[3] Shoemaker, R.H. The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer. 2006, 6(10), 813–823. doi: 10.1038/nrc1951.
[4] Costello, J.C.; Heiser, L.M.; Georgii, E.; Gönen, M.; Menden, M.P.; et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat Biotechnol. 2014, 32(12), 1202–1212. doi: 10.1038/nbt.2877.
[5] Ali, M.; Aittokallio, T. Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys Rev. 2019, 11(1), 31–39. doi: 10.1007/s12551-018-0446-z.
[6] Menden, M.P.; Iorio, F.; Garnett, M.J.; McDermott, U.; Benes, C.H.; et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One. 2013, 8(4), e61318. doi: 10.1371/journal.pone.0061318.
[7] Costello, J.C.; Heiser, L.M.; Georgii, E.; Gönen, M.; Menden, M.P.; et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat Biotechnol. 2014, 32(12), 1202–1212. doi: 10.1038/nbt.2877.
[8] Zitnik, M.; Agrawal, M.; Leskovec, J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 2018, 34(13), i457–i466. doi: 10.1093/bioinformatics/bty294.
[9] Zitnik, M.; Agrawal, M.; Leskovec, J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 2018, 34(13), i457–i466. doi: 10.1093/bioinformatics/bty294.
[10] Peyré, G.; Cuturi, M. Computational Optimal Transport: With Applications to Data Science. Found Trends Mach Learn. 2019, 11(5–6), 355–607. doi: 10.1561/2200000073.
[11] Schiebinger, G.; Shu, J.; Tabaka, M.; Cleary, B.; Subramanian, V.; et al. Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming. Cell 2019, 176(4), 928–943.e22. doi: 10.1016/j.cell.2019.01.006.
[12] Shoemaker, R.H. The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer. 2006, 6(10), 813–823. doi: 10.1038/nrc1951.
[13] Saito, T.; Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE 2015, 10(3), e0118432. doi: 10.1371/journal.pone.0118432.
[14] Santambrogio, F. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling. Progress in Nonlinear Differential Equations and Their Applications, Cham: Birkhäuser, 2015; vol. 87. doi: 10.1007/978-3-319-20828-2.
[15] Malone, S.; Aburidi, M.; Marcia, R.F. Wasserstein-Based Similarity Constrained Matrix Factorization for Drug-Drug Interaction Prediction. 2024 IEEE Workshop on Signal Processing Systems (SiPS), Cambridge, MA, USA, 2024, pp. 49–53. doi: 10.1109/SiPS62058.2024.00017.
[16] Aburidi, M.; Marcia, R. Optimal Transport Based Graph Kernels for Drug Property Prediction. IEEE OJEMB. 2025, 6, 152–157. doi: 10.1109/OJEMB.2024.3480708.
[17] Aburidi, M.; Marcia, R. Optimal Transport-Based Network Alignment: Graph Classification of Small Molecule Structure-Activity Relationships in Biology. 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, 2024, pp. 1–5. doi: 10.1109/EMBC53108.2024.10782458.
[18] Aburidi, M.; Marcia, R. Wasserstein Distance-Based Graph Kernel for Enhancing Drug Safety and Efficacy Prediction. 2024 IEEE First International Conference on Artificial Intelligence for Medicine, Health and Care (AIMHC), Laguna Hills, CA, USA, 2024, pp. 113–119. doi: 10.1109/AIMHC59811.2024.00029.
[19] Wood, M.K.; Dantzig, G.B. Programming of interdependent activities: I general discussion. Econometrica 1949, 17(3/4), 193–199. doi: 10.2307/1905522.
[20] Davis, A.P.; Grondin, C.J.; Johnson, R.J.; Sciaky, D.; Wiegers, J.; et al. The Comparative Toxicogenomics Database (CTD): update 2023. Nucleic Acids Res. 2023, 51(D1), D1193–D1199. doi: 10.1093/nar/gkac833.
[21] Wishart, D.S.; Feunang, Y.D.; Guo, A.C.; Lo, E.J.; Marcu, A.; et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018, 46(D1), D1074–D1082. doi: 10.1093/nar/gkx1037.
[22] Freshour, S.L.; Kiwala, S.; Cotto, K.C.; Coffman, A.C.; McMichael, J.F.; et al. Integration of the Drug–Gene Interaction Database (DGIdb 4.0) with open crowdsource efforts. Nucleic Acids Res. 2021, 49(D1), D1144–D1151. doi: 10.1093/nar/gkaa1084.
[23] Szklarczyk, D.; Santos, A.; von Mering, C.; Jensen, L.J.; Bork, P.; et al. STITCH 5: augmenting protein–chemical interaction networks with tissue and affinity data. Nucleic Acids Res. 2016, 44(D1), D380–D384. doi: 10.1093/nar/gkv1277.
[24] Tang, J.; Szwajda, A.; Shakyawar, S.; Xu, T.; Hintsanen, P.; et al. Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J Chem Inf Model. 2014, 54(3), 735–743. doi: 10.1021/ci400709d.
[25] Inoue, Y.; Fu, T.; Luna, A. GraphPINE: Graph Importance Propagation for Interpretable Drug Response Prediction. arXiv preprint arXiv:2504.05454. doi: 10.48550/arXiv.2504.05454.
[26] Sayers, E. The E-utilities in-depth: parameters, syntax and more. Entrez Programming Utilities Help [Internet], 2009.
[27] Luna, A.; Rajapakse, V.N.; Sousa, F.G.; Gao, J.; Schultz, N.; et al. rcellminer: exploring molecular profiles and drug response of the NCI-60 cell lines in R. Bioinformatics 2015, 32(8), 1272–1274. doi: 10.1093/bioinformatics/btv701.
[28] Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019, pp. 2623–2631. doi: 10.1145/3292500.3330701.
[29] Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; et al. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems 2017, 30.
[30] Li, M.; Wang, Y.; Zheng, R.; Shi, X.; Li, Y.; et al. DeepDSC: A deep learning method to predict drug sensitivity of cancer cell lines. IEEE/ACM Trans Comput Biol Bioinform. 2021, 18(2), 575–582. doi: 10.1109/TCBB.2019.2919581.
[31] Lao, C.; Zheng, P.; Chen, H.; Liu, Q.; An, F.; et al. DeepAEG: A model for predicting cancer drug response based on data enhancement and edge-collaborative update strategies. BMC Bioinformatics 2024, 25(1), 105. doi: 10.1186/s12859-024-05723-8.
[32] Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, Australia, Aug 6-11, 2017; Precup, D.; Teh, Y.W., Eds.; JMLR.org: 2017; PMLR 70, pp. 1263–1272.
[33] Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. Preprint of papers, 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017. doi: 10.48550/arXiv.1609.02907.
[34] Hu, W.; Liu, B.; Gomes, J.; Zitnik, M.; Liang, P.; et al. Strategies for pre-training graph neural networks. Preprint of papers, 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. doi: 10.48550/arXiv.1905.12265.
[35] Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; et al. Graph attention networks. Preprint of papers, 6th International Conference on Learning Representations, ICLR 2018, Vancouver, Canada, April 30-May 3, 2018. doi: 10.48550/arXiv.1710.10903.
[36] Yun, S.; Jeong, M.; Kim, R.; Kang, J.; Kim, H.J. Graph transformer networks. Preprint of papers, 33rd Conference on Neural Information Processing Systems, NeurIPS 2019, Vancouver, Canada, Dec 8-14, 2019. doi: 10.48550/arXiv.1911.06455.
