Annals of Clinical Epidemiology
Online ISSN : 2434-4338
SEMINAR
Secondary Analysis of Randomized Controlled Trials: Methodological Considerations and Best Practices
Qingyao Shang, Shuna Yao, Meishuo Ouyang, Xin Wang, Sheng Luo

2025, Volume 7, Issue 4, pp. 137-147

ABSTRACT

Secondary analyses of randomized controlled trials (RCTs) offer an efficient way to maximize the value of existing trial data and explore clinical questions beyond the original trial scope. This review outlines methodological considerations and best practices, emphasizing their role in evidence-based medicine. We describe major RCT data sources and summarize the steps for accessing individual participant data. Key steps include formulating research questions, understanding the trial design, data processing, statistical analysis, and clinical interpretation. Types of secondary analyses are discussed, including subgroup analysis, biomarker studies, health economics, methodological research, adverse event analysis, and predictive modeling. Common challenges, such as data applicability, missingness, generalizability, and multiple testing, are summarized. With advances in data sharing, secondary analyses are expected to drive scientific discovery and provide more evidence for diagnosis and treatment as long as rigorous standards and transparency are maintained.

 BACKGROUND

Randomized controlled trials (RCTs) are the cornerstone of evidence-based medicine1). They provide strong evidence for the efficacy and safety of medical interventions. RCTs are often labor-intensive and time-consuming to conduct, making their data a valuable research resource. Secondary analysis is the additional analysis of RCT data to answer new clinical questions or aspects not addressed by the primary analysis2). This approach maximizes the use of existing information, which may be useful for clinical practice or future research. It includes subgroup analyses, long-term follow-up, and secondary endpoint analyses designed to improve the scientific and clinical value of trial results.

The importance of secondary analyses lies mainly in the utilization of data already available, thus avoiding the need to conduct further trials and saving significant resources3). Secondary analysis also promotes transparency and reproducibility of research, in line with the modern spirit of open science4). However, conducting secondary analyses requires an in-depth understanding of the design, methodology and limitations of the original study to ensure validation of new findings. This article provides an overview of RCT data sources, methodological considerations, challenges, and ethical implications for the understanding of all involved, especially researchers and clinicians.

Fig. 1  Flowchart of the secondary analysis methodology workflow for RCTs

 RCT DATA SOURCES FOR SECONDARY ANALYSIS

In recent years, it has become easier to access data from RCTs for secondary analysis as clinical research data sharing and transparency have increased. Multiple databases now contain individual participant data (IPD) from completed RCTs, each with its own functionality and access process. This article describes the primary sources, including repositories such as VIVLI, the YODA Project, and BioLINCC (Table 1).

Table 1 Key Databases for Accessing RCT Data for Secondary Analysis

Database Data Provided Access Process Website
VIVLI Individual participant-level data from various RCTs. Register on platform - Search for studies - Submit data request proposal - Review and approval by data contributor or panel - Access data in secure research environment https://vivli.org/
YODA Project Clinical trial data from pharmaceutical companies and other partners. Visit website - Identify data of interest - Submit data request form - Review by YODA staff and data partner - Data provided under data use agreement https://yoda.yale.edu/
BioLINCC Data and biospecimens from NHLBI-funded studies. Register on BioLINCC website - Search for studies - Submit data request form - Review by NHLBI staff - Data provided upon approval https://biolincc.nhlbi.nih.gov/home/
ClinicalTrials.gov Comprehensive registry of clinical trials worldwide, including study protocols, eligibility criteria, trial phases, interventions, outcomes, and recruitment status. Some trials may link to summary results or associated data repositories. Does not directly provide raw data. Search for trials using various filters (disease, intervention, sponsor, location, etc.). Contact study sponsors or investigators for potential data access if available. Some trials may provide summary results directly on the website. https://clinicaltrials.gov/
EMA Clinical Data Summary-level clinical data, including clinical study reports, protocols, and related documents for new active substances and COVID-19 treatments; individual participant data (IPD) available through a limited pilot project, subject to anonymization and approval. Access via website - Register for an EMA account - View or download data online under defined terms of use - Subject to EMA oversight for regulatory and scientific purposes https://clinicaldata.ema.europa.eu/web/cdp
ICTRP Registry of clinical trials from WHO-approved national and regional registries; includes study design, objectives, interventions, and trial status; does not provide raw data directly. Search for clinical trials via the ICTRP website using disease, intervention, or location filters. Contact the original investigators or trial sponsors for potential data access. Data availability depends on the respective national or regional registry policies. https://www.who.int/clinical-trials-registry-platform

 Vivli

VIVLI is an independent, nonprofit data-sharing platform launched in 2018 to facilitate global access to IPD from completed clinical trials5),6). VIVLI serves as a neutral intermediary between data contributors and data users, forming an ecosystem for data sharing; the platform includes an independent data repository, an in-depth search engine, and a secure research environment. Researchers can retrieve RCT studies, request datasets, aggregate and process data, or share their own data on the platform. Researchers must register, search for relevant studies, submit a request proposal outlining the research question and specific methodology of the secondary analysis, and undergo a review by the data contributor or the platform’s independent review panel before accessing the data in the secure environment. This process safeguards participant privacy and ensures that the data are used responsibly.

Several high-quality secondary analyses have been conducted using data from VIVLI. For example, the IMpower150 trial, which evaluated atezolizumab plus chemotherapy in patients with metastatic non-small cell lung cancer, has supported multiple secondary studies7). Hopkins et al.8) stratified patients based on the Lung Immune Prognostic Index and demonstrated its prognostic utility for patients receiving first-line atezolizumab combination therapy, using Cox proportional hazards regression and survival analysis. Li et al.9) conducted a secondary analysis using data from the Palbociclib: Ongoing Trials in the Management of Breast Cancer-2 (PALOMA-2) and PALOMA-3 trials9–11). These two RCTs evaluated the efficacy of the cyclin-dependent kinases 4 and 6 (CDK4/6) inhibitor palbociclib plus endocrine therapy in patients with hormone receptor-positive metastatic breast cancer. The secondary analysis stratified patients by human epidermal growth factor receptor 2 (HER2) expression (HER2-low-positive vs. HER2-0) and found that those with HER2-low expression derived greater benefit from palbociclib-based therapy.

 Yale Open Data Access (YODA) Project

This project, initiated by Yale University in 2013 to further the concept of open science and provide a reliable intermediary for data sharing, makes data from clinical trials available12). It has partnered with pharmaceutical companies that transfer full jurisdiction over data access to the project, ensuring independence and accountability. The program aims to improve patient health while providing a transparent process for requesting data. Interested investigators can visit the website, identify data of interest, submit an application form describing the project and team, and undergo a review by YODA staff and data partners. Where appropriate, data are then made available under a use agreement designed to promote transparency and scientific rigor. Over the past decade the project has supported more than 100 publications, demonstrating its value for secondary analysis. For example, Selective Prostate Androgen Receptor Targeting with ARN-509 (SPARTAN) was a phase III RCT designed to evaluate the efficacy and safety of apalutamide in patients with nonmetastatic castration-resistant prostate cancer (nmCRPC)13). Roy et al.14) used SPARTAN data obtained through the YODA Project to examine whether a history of local therapy before first-line apalutamide treatment affected outcomes in patients with nmCRPC. The secondary analysis showed that the effect of first-line apalutamide was not modified by prior local therapy, supporting the broad applicability of apalutamide as a standard treatment option for these patients.

 NHLBI Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC)

Administered by the National Heart, Lung, and Blood Institute (NHLBI), BioLINCC provides access to data and biospecimens from NHLBI-funded studies, including RCTs related to heart, lung, blood, and sleep disorders15),16). Established in 2008, BioLINCC links the NHLBI biobanks with the data repository to enhance their visibility and utility. Interested researchers can register on the website, search for studies using keywords, and fill out a data request form describing their needs and purpose. Requests are reviewed by NHLBI staff to confirm that no data use agreements would be violated. In its fourth year online, BioLINCC completed 381 data requests, underscoring its potential to facilitate collaborations and improve scientific outcomes. Merrill et al.17) used data from the Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist (TOPCAT) trial, originally designed to assess the efficacy of spironolactone in patients with heart failure with preserved ejection fraction, to examine sex-based differences in treatment response. Their secondary analysis revealed that women appeared to benefit more than men from spironolactone treatment in terms of cardiovascular outcomes, suggesting a need to consider sex as a factor in treatment decision-making. Itoga et al.18) conducted a secondary analysis of the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT), a large RCT originally designed to assess the impact of different antihypertensive treatment strategies on cardiovascular events18),19). Their study reanalyzed these data, focusing on the relationship between blood pressure levels and lower extremity peripheral arterial disease (PAD) events such as intermittent claudication, revascularization, or amputation. Using Cox regression models adjusted for age, gender, race, diabetes, smoking, and other cardiovascular risk factors, they found that both low and high systolic blood pressure and pulse pressure were associated with an elevated risk of lower extremity PAD events. This secondary analysis provides further evidence-based support for clinical practice and demonstrates the value of the ALLHAT data for exploring outcomes beyond cardiovascular disease.

 Other Sources

ClinicalTrials.gov: ClinicalTrials.gov, operated by the U.S. National Library of Medicine, is one of the world’s largest registries of publicly and privately funded clinical studies20). It provides data on study design, interventions, outcome measures, and summary results, and some studies also disclose plans for sharing IPD. While the platform itself does not host raw data, researchers can review a trial’s Data Sharing Statement to learn whether access can be requested, how to contact the study lead, or through which third-party platforms the data are available. ClinicalTrials.gov is thus an important information portal for academics seeking clinical trial data for secondary research.

European Medicines Agency (EMA) Clinical Data Platform: Launched in October 2016, the EMA Clinical Data Platform is a pioneering project in the regulatory assessment of medicines in the EU21). It was the first platform worldwide to publish the clinical data that pharmaceutical companies submit as the basis for drug approval, thereby supporting open access. Within this framework, pharmaceutical companies are encouraged to publish clinical study reports, protocols, and related documents to promote independence and accountability. Beyond its commitment to improving patient health, the platform aims to provide a clear and understandable process for sharing data. Researchers interested in data access can visit the site to see which datasets are available, register for an EMA account, and view or download materials online under specific terms of use. Broader data access requests or related inquiries are submitted to the EMA, which makes release decisions based on scientific integrity and public interest. The platform has supported a number of secondary analyses22).

World Health Organization’s International Clinical Trials Registry Platform (ICTRP): ICTRP was initiated by the World Health Organization in 2005 to increase the transparency of clinical trial information and facilitate international collaboration23),24). It works with multiple national and regional registries to create a unified database that ensures fair and reliable tracking of clinical studies. The core goal of the program is to improve public health by providing clear and easy access to trial details. Researchers wishing to utilize this resource can search the platform, identify relevant trials, and contact the principal investigator or sponsor for more information; upon approval, data may be provided in accordance with protocols that uphold scientific accuracy and ethical standards.

 CONDUCTING SECONDARY ANALYSIS: METHODOLOGY AND PROCESS

 Define The Research Question

The key to any secondary analysis is to formulate a clear and novel research question or hypothesis that was not examined in the original trial. This research question sets the direction for the entire analysis and influences subsequent methodological choices and the interpretation of results. To be effective, the research question should follow the SMART criteria: it must be specific (addressing precise aspects of the RCT data), measurable (allowing quantitative assessment), achievable (the required data were collected and are available in the original RCT), relevant (pertinent to the selected RCT, with implications for clinical or scientific advancement), and time-bound (anchored to the timeframe covered by the RCT data). Developing a research question requires careful consideration of the original RCT’s enrolled population and its primary and secondary endpoints, informed by reading the published results, clinical trial protocol, and statistical analysis plan (SAP) to understand which questions have already been addressed and where opportunities remain. Researchers begin by identifying unexplored areas, which may be an understudied subgroup or secondary endpoint, or potential interactions that were not previously prioritized. For example, if the initial trial focused on overall efficacy, secondary analyses might explore heterogeneity in treatment effects across demographic or clinical characteristics (e.g., age, sex, or disease stage).

Once a candidate question has been identified, the researcher must verify its feasibility by ensuring that the necessary data are available. This step includes reviewing the trial’s data dictionary, the documentation detailing the variables collected, their definitions, and their formats. For the age-related example above, analysts would confirm that age was recorded (as a continuous or categorical variable), that progression-free survival (PFS) was defined as a time-to-event endpoint, and that other relevant covariates are present to adjust for confounding. Missing or incomplete covariates may prevent the study from being conducted or reduce the reliability of its conclusions, so this validation is critical. If the data lack granularity (for example, if age is reported only as a group mean without individual values), the question may need to be rephrased or dropped.

In addition, the question should be aligned with clinical or scientific priorities to ensure its relevance. Researchers may draw inspiration from clinical guidelines, prior literature, or emerging hypotheses in the field to place their research in a broader context. This relevance enhances the potential impact of the study and justifies secondary analysis.

 Understand The Original Study

Familiarity with the design of the original study is necessary, including randomization procedures, inclusion/exclusion criteria, intervention details, and outcomes25). Reviewing the data reported in the protocol, SAP, conference presentations, or published articles helps in understanding data collection and potential biases, such as selection or attrition bias, that may affect secondary analyses.

This begins with a comprehensive review of the trial protocol, which outlines the blueprint for the study. This document details how participants are randomized, ensuring that treatment groups are balanced on key characteristics. It also specifies inclusion and exclusion criteria, such as age range, disease severity, or prior treatment, which define the study population and influence generalizability. Understanding these criteria can help assess the appropriateness of the data for a new question; if the secondary analysis targets older patients but the original trial excluded patients over the age of 75, the dataset may be insufficient.

The SAP provides insight into the analytic framework of a trial26). It outlines predefined analysis populations, such as intention-to-treat or per-protocol, as well as the statistical methods. Reviewing the SAP helps clarify how the original results were assessed and whether any secondary endpoints overlap with new research questions. It can also reveal potential biases built into the design, such as selection bias due to restrictive inclusion criteria or loss-to-follow-up bias due to subject dropout, that may affect the results of secondary analyses.

The published literature of a trial, including primary outcomes, follow-up studies, and subgroup analyses, can provide additional context. These papers reveal the original investigators’ focus and may shed light on secondary research questions. If conference or interim reports are available, they may provide further information, such as early trends or methodological adjustments not detailed in the final publication. An RCT typically evolves over time, with multiple papers reflecting different stages: initial efficacy, long-term safety, or exploratory analyses. It is critical to determine the stage at which data are disclosed (e.g., interim cutoff vs. final dataset), because interim data may lack complete follow-up, while final data may contain additional variables or events.

 Data Preparation

Data preparation is a critical phase in secondary analysis that transforms raw datasets into a form suitable for answering new research questions. This step begins with acquiring the dataset and its associated documentation from the selected database, ensuring secure access in accordance with the data use agreement. Datasets typically contain IPD, such as baseline covariates, treatment assignment, and outcomes, along with data dictionaries or metadata that describe variable definitions, formats, and coding schemes27).

The first task is to ensure that the data are correctly imported into the analysis environment, such as R or SAS. This involves verifying that the data were not corrupted when loaded, and checking that the number of rows, variable names, and value ranges are consistent with the documentation. The researcher must double-check that each variable corresponds to its intended meaning, referring to the data dictionary for cross-checking. This is followed by data cleaning and preprocessing to resolve inconsistencies, errors, or formatting issues that may affect quality28), which may include re-coding variables to ensure consistency.
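As a minimal illustration, the Python sketch below runs the kind of integrity checks described above against a data dictionary; file and variable names such as trial_ipd.csv, age, and treatment are hypothetical placeholders, not the format of any particular repository.

```python
import pandas as pd

# Hypothetical exports; actual file names come from the data repository.
data = pd.read_csv("trial_ipd.csv")
dictionary = pd.read_csv("data_dictionary.csv")  # columns: variable, type, range

# Row and column counts should match the documentation.
print(f"Rows: {len(data)}, columns: {data.shape[1]}")

# Every documented variable should be present; flag undocumented extras.
missing = set(dictionary["variable"]) - set(data.columns)
extra = set(data.columns) - set(dictionary["variable"])
print("Missing vs. dictionary:", missing or "none")
print("Undocumented columns:", extra or "none")

# Spot-check value ranges and codings for key variables.
print(data["age"].describe())            # min/max should match eligibility criteria
print(data["treatment"].value_counts())  # arm labels should match the randomization scheme
```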

Dealing with missing values is a central challenge in this step, as RCTs often have incomplete data because subjects drop out, do not respond, or have unrecorded variables. Researchers must assess the extent and pattern of missingness to determine whether it is random or systematic, which can be examined through summary statistics or visualization. Commonly used strategies include imputation (e.g., mean imputation for continuous variables, or multiple imputation for more robust handling of missing values), deleting cases with missing key variables (complete-case analysis), or statistical methods such as inverse probability weighting29). The choice of method depends on the missingness mechanism and the goals of the analysis; multiple imputation may be appropriate for survival analysis, whereas complete-case analysis may be acceptable when missingness is limited and random. Each method should be justified and tested for robustness through sensitivity analysis.
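A minimal sketch of multiple imputation, assuming a pandas DataFrame with hypothetical covariate names (age, baseline_bp, bmi); scikit-learn’s IterativeImputer is used here as one possible implementation, not the only valid one.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

df = pd.read_csv("trial_ipd.csv")            # hypothetical IPD export
covariates = ["age", "baseline_bp", "bmi"]   # assumed variable names

# Inspect the extent of missingness before choosing a strategy.
print(df[covariates].isna().mean())          # proportion missing per variable

# Multiple imputation: several completed datasets rather than a single fill,
# so downstream estimates can reflect imputation uncertainty.
completed_sets = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    completed = df.copy()
    completed[covariates] = imputer.fit_transform(df[covariates])
    completed_sets.append(completed)
# The analysis model is then fit on each completed dataset and the
# estimates pooled (e.g., via Rubin's rules).
```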

Validation of consistency with the original report is essential to maintain data integrity. This requires reproducing key baseline statistics or primary outcomes from the trial publication30). This step verifies the fidelity of the data to the RCT findings and builds confidence for subsequent analysis.
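For instance, a short sketch (again with hypothetical variable names, and assuming sex_female is a 0/1 indicator) that recomputes arm-level baseline summaries for comparison against the published baseline table:

```python
import pandas as pd

df = pd.read_csv("trial_ipd.csv")  # hypothetical IPD export

# Reproduce the publication's baseline table: summaries by randomized arm.
baseline = df.groupby("treatment").agg(
    n=("age", "size"),
    age_mean=("age", "mean"),
    age_sd=("age", "std"),
    female_prop=("sex_female", "mean"),  # proportion, given a 0/1 indicator
)
print(baseline.round(2))  # compare against the published values
```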

 Statistical Analysis

This step requires careful selection of statistical methods to match the nature of the research question, the structure of the data, and the original trial design, so that the results are both valid and interpretable. The choice of analytic method depends on the type of variable and the objectives of the study31). For continuous variables, such as changes in blood pressure or quality of life scores, a t-test or analysis of variance (ANOVA) can compare means between groups. For binary outcomes, such as response rate or incidence of adverse events, logistic regression is often used to estimate odds ratios while adjusting for covariates. Time-to-event outcomes, such as survival or disease progression, often require Cox proportional hazards models to handle censored data and provide hazard ratios for treatment effects over time. More complex data structures, such as longitudinal measurements or repeated events, may require mixed-effects models or generalized estimating equations to account for within-subject correlation.
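A minimal sketch of the two most common cases, assuming hypothetical, numerically coded variables (response, pfs_months, pfs_event, treatment, age, stage), using statsmodels for the logistic regression and lifelines for the Cox model:

```python
import pandas as pd
import statsmodels.formula.api as smf
from lifelines import CoxPHFitter

df = pd.read_csv("trial_ipd.csv")  # hypothetical IPD export

# Binary outcome: logistic regression estimating odds ratios,
# adjusted for baseline covariates.
logit = smf.logit("response ~ treatment + age + stage", data=df).fit()
print(logit.summary())

# Time-to-event outcome: Cox proportional hazards model on censored
# data, yielding hazard ratios for the treatment effect over time.
cph = CoxPHFitter()
cph.fit(df[["pfs_months", "pfs_event", "treatment", "age", "stage"]],
        duration_col="pfs_months", event_col="pfs_event")
cph.print_summary()
```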

Beyond the type of outcome, the design of the original study also shapes the analysis. RCTs are usually randomized at the individual level, but some trials use cluster randomization (e.g., by hospital or community), which introduces correlation within clusters. Ignoring such cluster effects inflates Type I error rates, so analysts need to adjust for them using methods such as mixed-effects models or robust standard errors. Stratified randomization, common in trials that balance key characteristics, may require including the stratification factors in the analysis as covariates or interaction terms to preserve the original design32). Similarly, if a trial uses blocking to ensure balanced allocation, this should also be considered, although its effect is usually smaller with larger sample sizes. These design details can be found in the trial protocol or SAP and must be understood so that the secondary analysis respects the original randomization.
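Two hedged sketches of such design-aware adjustments, with hypothetical variables (site_id as the cluster identifier, region as a randomization stratum): cluster-robust standard errors for a cluster-randomized comparison, and a Cox model stratified on the randomization stratum.

```python
import pandas as pd
import statsmodels.formula.api as smf
from lifelines import CoxPHFitter

df = pd.read_csv("trial_ipd.csv")  # hypothetical IPD export

# Cluster randomization: cluster-robust standard errors guard against
# inflated Type I error from within-cluster correlation.
model = smf.ols("outcome ~ treatment + age", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["site_id"]})
print(model.summary())

# Stratified randomization: stratify the baseline hazard on the
# randomization stratum to preserve the original design.
cph = CoxPHFitter()
cph.fit(df[["time", "event", "treatment", "region"]],
        duration_col="time", event_col="event", strata=["region"])
cph.print_summary()
```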

Statistical power is another key consideration, because secondary analyses often explore subgroups or new hypotheses for which the original trial was not powered. For example, questions about treatment effects in small subgroups may fail to detect true differences because of too few events or participants, yielding underpowered results. Analysts should estimate power post hoc, using the available sample size and a plausible effect size, to assess the reliability of the results33).

If power is low, findings should be treated as exploratory, interpreted with caution, and flagged for validation in larger datasets. Conversely, if power is sufficient, the RCT’s sample size can support robust insights, but analysts still need to ensure that the data fit the question, that the variables are adequately measured, and that the model assumptions hold.
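A minimal post hoc power check for a two-arm subgroup comparison, with assumed inputs (a standardized effect size of 0.3 and 80 participants per arm, both invented for illustration):

```python
from statsmodels.stats.power import TTestIndPower

# With the subgroup's achieved sample size and a plausible effect size,
# how likely was the analysis to detect a true difference?
power = TTestIndPower().solve_power(
    effect_size=0.3,  # assumed standardized effect size
    nobs1=80,         # assumed subgroup size per arm
    ratio=1.0,
    alpha=0.05)
print(f"Estimated power: {power:.2f}")  # well below 0.8 => treat as exploratory
```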

 Interpret The Results

Findings need to be interpreted in the context of the original study, assessing clinical significance, considering implications for practice, and suggesting needs for future research34). Detailed discussion of limitations such as data applicability or generalizability of results and increased transparency in reporting can enhance the credibility of the results35).

 TYPES OF SECONDARY ANALYSES

Subgroup Analyses: Subgroup analyses examine how treatment effects vary across patient subgroups (e.g., by gender, age, ethnicity, or disease type), with the goal of identifying differential benefits or risks that may not be apparent in the overall population, thereby supporting personalized medicine. This requires stratifying the data and testing for interactions between treatment and subgroup variables, often using regression models that include interaction terms (a minimal sketch follows Table 2). Challenges include reduced sample sizes within subgroups, which may lead to insufficient power, and the need to guard against spurious findings due to multiple testing36). Pre-specifying subgroups on a biological or clinical basis strengthens confidence in the results and avoids data dredging (Table 2).

Table 2 Overview of secondary analysis types in clinical RCTs

Analysis Type Description Common Methods
Subgroup Analyses Examines treatment effect variations across patient subgroups to support personalized medicine. Identifies benefits/risks not evident in the overall population. Stratified analysis, regression with interaction terms, multiple testing corrections
Biomarker or Correlative Studies Investigates relationships between biomarkers and outcomes to identify predictive/prognostic markers for precision medicine. Logistic regression, survival models, biomarker-treatment interaction analysis.
Health Economics and Quality of Life Analyses Assesses cost-effectiveness and long-term quality-of-life impacts. Cost-effectiveness modeling, t-tests, mixed-effects models for QoL scores.
Methodological Research Uses RCT data to test/improve statistical methods, designs, or analytical approaches. Advances research methodology. Simulation studies, direct method comparisons, novel model evaluation.
Adverse Event Analysis Focuses on safety by analyzing frequency, severity, or predictors of treatment-related adverse events to refine risk-benefit profiles. Incidence rate calculation, relative risks/odds ratios, regression for risk factor identification.
Predictive Modeling Develops/validates models to predict treatment response or disease progression, creating tools for clinical decision-making. Regression, machine learning, cross-validation, AUC assessment.
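As referenced above, a minimal interaction-test sketch with hypothetical 0/1 variables (treatment, sex_female); the interaction p-value, not the within-subgroup p-values, is the appropriate evidence for a differential treatment effect.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("trial_ipd.csv")  # hypothetical IPD export

# Logistic model with a treatment-by-subgroup interaction term;
# "treatment * sex_female" expands to both main effects plus the interaction.
model = smf.logit("response ~ treatment * sex_female + age", data=df).fit()
print(model.summary())
print("Interaction p-value:", round(model.pvalues["treatment:sex_female"], 3))
```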

Biomarker or Correlative Studies: By examining the relationship between biomarkers (biological indicators such as gene expression, protein levels, or imaging findings) and outcomes, it is possible to identify predictive markers (indicating which patients are likely to benefit) or prognostic markers (indicating disease progression) that can advance precision medicine37). For example, Sinicrope et al.38) conducted a secondary analysis using data from two phase III randomized clinical trials of stage III colon cancer patients to evaluate whether deoxyribonucleic acid (DNA) mismatch repair (MMR) status and BRAF/KRAS mutations were associated with post-recurrence survival. Using Cox regression models adjusted for clinical covariates, the study found that deficient MMR and BRAF mutations were each associated with significantly worse survival after disease recurrence, while KRAS mutations showed no significant association. The biomarker data were prospectively collected in the original trials but applied retrospectively in this analysis, highlighting the potential of secondary analysis to uncover prognostic molecular subgroups. Such studies may guide patient stratification or drug development, but the reliability of their conclusions depends on the quality and relevance of the biomarker data, and external validation in independent cohorts is recommended.

Health Economics and Quality of Life Analyses: These assess the cost-effectiveness of an intervention or its longer-term impact on patients after treatment, to inform healthcare policy and resource allocation. Cost-effectiveness analyses may estimate the cost per quality-adjusted life year gained, combining trial data on resource use (e.g., hospitalizations) with external cost estimates. Quality-of-life analyses assess the impact of treatment on patient-reported outcomes, such as fatigue or emotional well-being, often using scale data collected as secondary endpoints. Analytic methods include cost-effectiveness modeling (e.g., Markov models) and statistical comparisons of quality-of-life scores through t-tests or mixed-effects models. These analyses link clinical efficacy to real-world value, but they require comprehensive cost or quality-of-life data, which may be scarce in some RCTs, as well as assumptions about long-term outcomes or costs outside the trial setting.
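A toy incremental cost-effectiveness ratio (ICER) calculation with entirely made-up inputs, combining assumed mean per-patient costs with assumed mean quality-adjusted life years (QALYs) per arm:

```python
# Assumed mean per-patient costs (e.g., trial resource use priced with
# external unit costs) and mean QALYs for each arm; all values invented.
cost_new, cost_control = 48_000.0, 31_000.0
qaly_new, qaly_control = 2.10, 1.85

icer = (cost_new - cost_control) / (qaly_new - qaly_control)
print(f"ICER: ${icer:,.0f} per QALY gained")  # compare to a willingness-to-pay threshold
```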

Methodological Research: This uses RCT data to test or improve statistical methods, experimental designs, or analytical approaches, thereby advancing research methodology itself, for example, using trial data as a test bed to compare different missing data imputation techniques, or evaluating novel survival models against traditional ones. This may involve overlaying simulation studies on real data, or directly comparing methods on metrics such as bias or power. Such studies can expand the methodological toolbox for future analyses and establish the validity and robustness of new approaches. However, they require a clear understanding of how the scenarios a method was designed for differ from the RCT at hand, and the findings may be context-specific and require broader validation.
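A toy simulation in the spirit of such method comparisons, with all parameters invented: it checks how well complete-case analysis and single mean imputation recover a known standard deviation when 30% of values are missing completely at random.

```python
import numpy as np

rng = np.random.default_rng(0)
true_sd, n, n_sims = 2.0, 200, 1000
sd_cc, sd_mi = [], []

for _ in range(n_sims):
    x = rng.normal(10.0, true_sd, n)
    mask = rng.random(n) < 0.3            # 30% missing completely at random
    observed = x[~mask]
    sd_cc.append(observed.std(ddof=1))    # complete-case estimate
    filled = np.where(mask, observed.mean(), x)
    sd_mi.append(filled.std(ddof=1))      # single mean-imputation estimate

print(f"true SD: {true_sd:.2f}")
print(f"complete-case average estimate: {np.mean(sd_cc):.2f}")   # ~unbiased here
print(f"mean-imputation average estimate: {np.mean(sd_mi):.2f}") # visibly shrunken
```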

Adverse Event Analysis: Adverse event (AE) analysis focuses on safety outcomes, exploring the frequency, severity, or predictors of treatment-related AEs. This may include calculating the incidence of a specific adverse event (e.g., neutropenia), comparing risks between treatment groups using relative risks or odds ratios, or identifying risk factors through regression analysis. The aim is to reveal safety signals that were unreported or subgroup-specific in the primary analysis, thereby refining the risk-benefit profile. Adverse event analyses may reveal that a drug’s toxicity is concentrated in a specific population, thereby guiding clinical monitoring. Low AE incidence, and the resulting lack of statistical power, can be a challenge for such secondary analyses.
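A minimal sketch computing per-arm AE incidence and a relative risk with an approximate 95% confidence interval; the counts are invented for illustration.

```python
import numpy as np

# Hypothetical counts: patients with a specific AE out of each arm's total.
events_trt, n_trt = 24, 300
events_ctl, n_ctl = 10, 295

risk_trt, risk_ctl = events_trt / n_trt, events_ctl / n_ctl
rr = risk_trt / risk_ctl

# Approximate 95% CI for the relative risk on the log scale (Katz method).
se_log_rr = np.sqrt(1/events_trt - 1/n_trt + 1/events_ctl - 1/n_ctl)
ci = np.exp(np.log(rr) + np.array([-1.96, 1.96]) * se_log_rr)
print(f"Incidence: {risk_trt:.1%} vs {risk_ctl:.1%}")
print(f"RR = {rr:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```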

Predictive Modeling: This uses RCT data as a training or test set to develop or validate clinical prediction models for disease progression or treatment response. It may involve traditional regression analysis or machine learning methods (e.g., random forests, support vector machines, XGBoost)39),40). Risk scores can be constructed from baseline variables, biomarkers, clinical measurements, and gene/protein expression, with the goal of creating tools for clinical decision-making. This requires dividing the data into training and validation sets (or using cross-validation), assessing model performance through metrics such as the area under the curve (AUC), and ensuring the model’s ability to generalize41). RCTs provide high-quality, controlled data suitable for this purpose, but models may overfit the trial population and therefore require external validation42). This approach is becoming increasingly important as computational power increases and personalized medicine gains emphasis.
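A minimal internal-validation sketch using cross-validated AUC for a simple logistic model, with hypothetical numeric predictors (age, stage, biomarker_level) and a binary response; as noted above, external validation would still be required before any generalizability claim.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("trial_ipd.csv")                # hypothetical IPD export
features = ["age", "stage", "biomarker_level"]   # assumed numeric predictors

# 5-fold cross-validated AUC: internal validation of discrimination only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
auc = cross_val_score(model, df[features], df["response"],
                      cv=5, scoring="roc_auc")
print(f"Cross-validated AUC: {auc.mean():.2f} (+/- {auc.std():.2f})")
```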

 CHALLENGES AND LIMITATIONS

While secondary analyses have important value, they also face many challenges:

Data applicability: Primary data may not be applicable to new research questions and there is a risk of bias, especially when key variables are missing or measured differently43).

Missing data: Non-randomized missing data may lead to biased results and require selection of appropriate complementary methods and careful handling44).

Generalizability: The original study population may not be representative of the target group, thus limiting the applicability of the findings35).

Multiple testing and data dredging: Conducting multiple statistical tests or exploratory analyses on the same dataset increases the likelihood of false positives, a problem that is often exacerbated by “data dredging” (unplanned or excessive searching of the data without pre-specified hypotheses)45). Without a predefined research question and analysis plan, researchers may inadvertently or intentionally select results with trends, leading to overfitting or false conclusions.

Collaboration issues: Collaboration with the original investigator may delay the process or cause conflict, requiring clear communication46).

Mitigation strategies include pre-specifying the analysis plan, using robust statistical methods (including corrections for multiplicity, as sketched below), and transparently reporting limitations to ensure the reliability of the study47).
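A minimal multiplicity-correction sketch: given p-values from several exploratory tests on the same dataset (values invented), Benjamini-Hochberg adjustment controls the false discovery rate.

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from ten exploratory subgroup tests.
pvals = [0.003, 0.04, 0.012, 0.25, 0.08, 0.51, 0.02, 0.33, 0.007, 0.19]

# Benjamini-Hochberg: which findings survive a 5% false discovery rate?
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
for p, pa, keep in zip(pvals, p_adj, reject):
    print(f"raw p={p:.3f}  adjusted p={pa:.3f}  significant: {keep}")
```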

 THE FUTURE OF DATA SHARING AND SECONDARY ANALYSIS

As data sharing becomes more widespread, driven by regulatory requirements and technological advances, and as more RCT data are shared by pharmaceutical companies and researchers, the role of secondary analysis will continue to expand. Cloud-based platforms and analytic tools will simplify data access and analysis. Best practices will evolve to ensure responsible data sharing, making secondary analysis a key driver of scientific discovery and improved patient care.

 CONCLUSION

Secondary analysis of RCT data is a powerful tool for advancing medical knowledge, utilizing clinical research databases such as VIVLI, YODA, and BioLINCC. By following a rigorous methodology, addressing challenges, and adhering to ethical norms, researchers can uncover new insights that can improve medical decision-making. This approach, supported by evolving data sharing practices, bodes well for future significant contributions to science and patient well-being.

 List Of Abbreviations

AE: Adverse Event

ALLHAT: Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial

AUC: Area Under the Curve

BioLINCC: Biologic Specimen and Data Repository Information Coordinating Center

CDK4/6: Cyclin-Dependent Kinases 4 and 6

DNA: Deoxyribonucleic Acid

EMA: European Medicines Agency

HER2: Human Epidermal Growth Factor Receptor 2

ICTRP: International Clinical Trials Registry Platform

IPD: Individual Participant Data

MMR: Mismatch Repair

NHLBI: National Heart, Lung, and Blood Institute

nmCRPC: Non-Metastatic Castration-Resistant Prostate Cancer

PALOMA: Palbociclib: Ongoing Trials in the Management of Breast Cancer

PAD: Peripheral Arterial Disease

PFS: Progression-Free Survival

RCT: Randomized Controlled Trial

SAP: Statistical Analysis Plan

SPARTAN: Selective Prostate Androgen Receptor Targeting with ARN-509

TOPCAT: Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist

YODA: Yale Open Data Access

 DECLARATIONS

 ETHICS APPROVAL AND CONSENT TO PARTICIPATE

Not applicable.

 CONSENT FOR PUBLICATION

Not applicable.

 AVAILABILITY OF DATA AND MATERIALS

No new datasets were generated or analyzed during the current study. All data referenced are from publicly accessible sources, which have been cited appropriately in the manuscript.

 COMPETING INTERESTS

The authors declare that they have no competing interests.

 FUNDING

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

 AUTHORS’ CONTRIBUTIONS

QS conceptualized the study, designed the methodology, conducted the investigation and visualization, drafted the original manuscript, and contributed to review and editing. SY curated the data, participated in investigation, and contributed to both drafting and editing of the manuscript. MO performed the formal analysis, interpreted the data, validated findings, and participated in manuscript editing. XW was responsible for investigation, resource provision, data interpretation, and review of the manuscript. SL supervised the project, contributed to conceptualization and methodology, performed formal analysis, drafted and reviewed the manuscript, and administered the overall project. All authors read and approved the final manuscript.

 ACKNOWLEDGMENTS

The authors thank Ms. Li Kong, President of Academy of Clinical Research and Study, for coordinating this research work and communication.

 DISCLAIMER

Sheng Luo is one of the Editorial Board members of Annals of Clinical Epidemiology (ACE). This author was not involved in the peer-review or decision-making process for this paper.

References
 
© 2025 Society for Clinical Epidemiology

This article is licensed under a Creative Commons [Attribution-NonCommercial-NoDerivatives 4.0 International] license.
https://creativecommons.org/licenses/by-nc-nd/4.0/