Adjustment of prognostic effects in prevalent case-control studies on genotype.

Since genotypes are unchangeable, adjustment of prognostic effects in prevalent case-control studies may produce an unbiased estimate of the odds ratio (OR) for disease occurrence. In this paper, the prognostic effects on the OR are demonstrated, and then three approaches to examine and/or adjust the OR are presented. The demonstration shows that the prognostic effects are larger in diseases with poor prognosis than in those with better prognosis. The ORs of genotypes that increase both disease risk and fatality rate are underestimated, while those of genotypes that increase the risk but improve prognosis are overestimated. The simplest approach to examine the OR derived from a prevalent case-control study is to conduct a stratified analysis according to the interval between diagnosis and study enrollment. When the stratified analysis finds no substantial difference in the estimates, the OR mainly reflects the relative risk for disease occurrence. The proportion of a genotype among putative cases at diagnosis can be estimated from prevalent cases by a logistic model, producing an OR adjusted for the interval from diagnosis. An incomplete-data case-control design is also applicable to adjust for the prognostic effects. An actual prevalent case-control study on breast cancer is used to demonstrate the three approaches. They are useful to compensate for the disadvantage of prevalent case-control studies.


INTRODUCTION
Case-control studies with prevalent cases for estimating a relative risk have been regarded as a substandard design for two reasons. One is that an estimated odds ratio (OR) of the factors under study indicates an association not only with disease occurrence but also with disease prognosis 1-3). In prevalent case-control studies, longer-term survivors are more likely to be sampled as cases. Accordingly, factors related to longer survival are more frequent among prevalent cases than among incident cases, which produces an elevated OR even if the frequency among incident cases is the same as that among controls. Such prognostic effects bias the OR as an estimate of the relative risk of disease occurrence. The other reason is recall bias. The memory of exposure before disease onset becomes more obscure as the interval between diagnosis and enrollment becomes longer, so that the difference in the accuracy of exposure status between cases and controls is larger in prevalent case-control studies than in incident case-control studies. This uncorrectable recall bias has discouraged the development of methods to adjust for the prognostic effects in prevalent case-control studies. Along with the rapid progress of genotyping techniques using PCR (polymerase chain reaction) 4-7), many epidemiologic studies have been conducted to estimate the relative risk of genetic polymorphisms 8-11) by case-control studies with incident or prevalent cases. Since genotypes do not change and are independent of the time of genotyping, the estimation of a genotype OR involves no information bias. Accordingly, if a model adjusting for the prognostic effects is introduced, the disadvantages of prevalent case-control studies will be overcome, resulting in a design comparable to incident case-control studies.
This paper proposes approaches to examine and/or adjust the prognostic effects in prevalent case-control studies on genotypes. First, the prognostic effects on the estimated ORs are simulated by a mathematical model. Three approaches are then presented: 1) stratified analysis according to the time interval between diagnosis and study enrollment, 2) estimation of the percentage of individuals with a given genotype among putative cases at diagnosis (putative incident cases), and 3) an incomplete-data case-control design using a Poisson regression model to adjust the OR for the interval.
A prevalent case-control study on breast cancer risk and the beta-2 adrenergic receptor gene (BAR2) Gln27Glu polymorphism is used as an example to apply the approaches. Adrenergic receptors are known to play roles in the regulation of thermogenesis and lipid mobilization. The Glu allele of the BAR2 Gln27Glu polymorphism was reported to be associated with obesity 13), though inconsistent results were also reported 14). Since obesity is a risk factor for breast cancer in postmenopausal women, we conducted the case-control study.

Study subjects
The breast cancer case-control study used as an example was conducted in a series of projects 15,16) approved by the Ethical Committee at Aichi Cancer Center in 1999 (Ethical Committee Approval Numbers 12-20 and 12-23). Cases were 239 female breast cancer patients aged 26 to 70 years (mean, 50.4 years) at diagnosis, who had been diagnosed in the past 20 years at Aichi Cancer Center Hospital. Controls were 186 female outpatients aged 24 to 69 years (mean, 53.0 years) without cancer who visited the same hospital, mainly at clinics of gastroenterology, breast surgery, and gynecology. All subjects were enrolled between 1999 and 2000.
Their BAR2 Gln27Glu polymorphism was genotyped by the method described by Large et al. 13). A Glu allele (GlnGlu or GluGlu genotype) was carried by 30 (12.6%) of the 239 cases and 32 (17.2%) of the 186 controls.
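As a quick check on these counts, the crude (unadjusted) OR for carrying a Glu allele can be computed from the 2x2 table. The sketch below is illustrative only: the function name is hypothetical, and the Woolf (logit) confidence interval is an assumption chosen for brevity, so it need not match the interval produced by the STATA commands used in the paper.

```python
import math

def crude_or_woolf(case_pos, case_n, ctrl_pos, ctrl_n, z=1.96):
    """Crude OR with a Woolf (logit) interval; hypothetical helper."""
    a, b = case_pos, case_n - case_pos   # cases with / without the allele
    c, d = ctrl_pos, ctrl_n - ctrl_pos   # controls with / without the allele
    or_hat = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation (assumption).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se)
    upper = math.exp(math.log(or_hat) + z * se)
    return or_hat, lower, upper

if __name__ == "__main__":
    # Counts taken from the text: 30 of 239 cases, 32 of 186 controls.
    or_hat, lower, upper = crude_or_woolf(30, 239, 32, 186)
    print(f"crude OR = {or_hat:.2f} ({lower:.2f}-{upper:.2f})")
```

The crude OR is about 0.69. It is not adjusted for age or for the interval from diagnosis, which is the subject of the approaches that follow.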

Statistical models
1. Prognostic effects on OR
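In order to examine the prognostic effects on the OR, a mathematical model is constructed. The OR for a given genotype is calculated by

$$\mathrm{OR} = \frac{P_{case}\,(1 - P_{control})}{(1 - P_{case})\,P_{control}},$$

where $P_{control}$ is the proportion of the genotype among controls and $P_{case}$ is that among cases. When the OR is given, $P_{case}$ is obtained by

$$P_{case} = \frac{\mathrm{OR}\,P_{control}}{1 + P_{control}\,(\mathrm{OR} - 1)}.$$

Denote the survival curve for those without the genotype as $S(t)$, $a$ as a constant, and $t$ as the time from diagnosis, and assume $S(t)$ to be expressed by

$$S(t) = \exp(-at).$$

When the hazard ratio (HR) for those with the genotype relative to those without it is given, the survival curve for those with the genotype, $S_g(t)$, is

$$S_g(t) = S(t)^{\mathrm{HR}} = \exp(-a\,\mathrm{HR}\,t).$$

Accordingly, the proportion of those with the genotype among the survivors at time $t$, which is denoted by $p_g$, is expressed by

$$p_g = \frac{P_{case}\,S(t)^{\mathrm{HR}}}{P_{case}\,S(t)^{\mathrm{HR}} + (1 - P_{case})\,S(t)}.$$

The OR observed among the survivors at time $t$, denoted by OR', is obtained by substituting $p_g$ for $P_{case}$ in the formula for the OR.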
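As a short worked step, the OR observed among survivors can be written in closed form. This is a sketch rather than a formula stated in the paper. It assumes only proportional hazards, i.e. that survival with the genotype is $S(t)^{\mathrm{HR}}$, and it writes $p_g$ for the genotype proportion among survivors at time $t$ and OR' for the OR observed among them.

```latex
\frac{p_g}{1 - p_g}
  = \frac{P_{case}\,S(t)^{\mathrm{HR}}}{(1 - P_{case})\,S(t)}
  = \frac{P_{case}}{1 - P_{case}}\;S(t)^{\mathrm{HR}-1},
\qquad
\mathrm{OR}'
  = \frac{p_g\,(1 - P_{control})}{(1 - p_g)\,P_{control}}
  = \mathrm{OR}\cdot S(t)^{\mathrm{HR}-1}.
```

Under this assumption $P_{control}$ cancels, HR = 1 gives OR' = OR, and the distortion grows as $S(t)$ falls. For example, OR = 2 with $S(t)$ = 0.5 and HR = 2 gives OR' = 2 × 0.5 = 1.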

Calculations
The above estimates were calculated with the computer program STATA Version 7 (STATA Corporation, College Station, TX). The ORs and 95% confidence intervals (95% CI) were estimated with the STATA commands "logistic" and "logit" for the stratified analysis according to the interval from diagnosis, "cci" for the genotype proportion adjustment approach, and "poisson" for the incomplete-data case-control design approach.

Prognostic effects on ORs
We assume here that all prevalent cases are survivors at time t with survival rate S(t). This is an extreme case, because prevalent case-control studies usually include a proportion of incident cases as well as prevalent cases with different values of t. Table 3 shows the calculated OR' when Pcontrol = 0.1, 0.3, or 0.5, OR = 2 or 5, S(t) = 0.25, 0.5, or 0.75, and HR = 0.5, 1, or 2. When HR = 1, i.e., the genotype does not influence prognosis, the OR' equals the true OR. When the genotype relates to poor prognosis (HR > 1), the OR' becomes smaller than the OR. When it relates to better prognosis (HR < 1), the OR' becomes larger. The lower the survival rate, the larger the effect on the OR. When half of the cases survive and HR is 2, OR = 2 is reduced to OR' = 1 and OR = 5 to OR' = 2.5. With S(t) = 0.75, the effect is relatively small; OR = 2 is reduced to OR' = 1.5 for HR = 2 and increased to OR' = 2.3 for HR = 0.5. The OR' for OR = 5 with S(t) = 0.75 is 3.8 for HR = 2 and 5.8 for HR = 0.5. The extent of the effect is independent of Pcontrol, the proportion of controls having the genotype.
Table 3. Calculated OR' for prevalent case-control study according to S(t) and HR.
Figure 1. Estimation of genotype proportion of putative incident cases at diagnosis for a prevalent case-control study on breast cancer and beta-2 adrenergic receptor Glu allele by a logistic model (axis: interval between diagnosis and study enrollment in years).
Table 5. Incomplete-data case-control design approach to the OR estimation adjusted for the interval after diagnosis by a Poisson regression model.
are close to those obtained by stratified analysis and genotype proportion adjustment approach.
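The OR' values quoted above for the prognostic-effect model can be reproduced with a few lines of code. The sketch below is a minimal illustration, not the authors' program: the function names are hypothetical, and it assumes proportional hazards, so that survival with the genotype is S(t) raised to the power HR.

```python
def case_proportion(or_true, p_control):
    """Genotype proportion among incident cases implied by the true OR."""
    return or_true * p_control / (1 + p_control * (or_true - 1))

def survivor_proportion(p_case, s, hr):
    """Genotype proportion among cases surviving with S(t) = s.

    Assumes survival with the genotype is s ** hr (proportional hazards).
    """
    with_genotype = p_case * s ** hr
    without_genotype = (1 - p_case) * s
    return with_genotype / (with_genotype + without_genotype)

def observed_or(or_true, p_control, s, hr):
    """OR' seen when every case is a survivor at survival rate s."""
    p_g = survivor_proportion(case_proportion(or_true, p_control), s, hr)
    return p_g * (1 - p_control) / ((1 - p_g) * p_control)

if __name__ == "__main__":
    # Same grid as described in the text; Pcontrol fixed at 0.3 here.
    for or_true in (2, 5):
        for s in (0.25, 0.5, 0.75):
            row = [f"{observed_or(or_true, 0.3, s, hr):.2f}" for hr in (0.5, 1, 2)]
            print(f"OR={or_true} S(t)={s}: OR' for HR=0.5, 1, 2 -> {row}")
```

Changing the Pcontrol argument leaves every OR' unchanged, which matches the statement that the extent of the effect does not depend on the genotype proportion among controls.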

DISCUSSION
Although prevalent case-control studies have the two main problems stated in the introduction, they are very attractive because of the shorter period needed to enroll study subjects. A short enrollment period is especially well suited to rapidly changing research fields. In addition, a larger number of subjects sampled from prevalent cases produces a more stable estimate than a limited number of incident cases collected in an enrollment period of the same length. Epidemiologists have considered prevalent case-control studies to be a substandard design that cannot be recommended, because their main tool in the past was the questionnaire. Questionnaire studies are sometimes very cheap and powerful, but in many cases information bias is inevitable. For risk factors free of information bias, adjustment for the prognostic effects enables us to produce evidence at a level similar to that from incident case-control studies.
Modern biostatistical models and computer programs provide us with rough but simple approaches to examine and adjust the ORs from prevalent case-control studies. Even if they are not accurate in a strict statistical sense, these approaches appear useful. Until a more sophisticated method is developed, they could be applied to prevalent case-control studies to produce new findings. Whether they are environmental exposures or genetic traits, factors affecting both disease occurrence and prognosis may work in the same direction, either promoting or disturbing a disease process. Smoking elevates disease risk, and for many diseases it worsens the prognosis 18,19). In such a case, the observed OR (i.e., OR') is an underestimate, indicating that the effect on disease occurrence is larger than the observed one. For prevalent cases with a survival rate of 50% on average, the OR' is unity for OR=2 and HR=2. This means that, when the OR is larger than the HR, the OR' stays on the same side of unity (>1 or <1) as the OR for cases with more than 50% survivorship. Another important feature is that, when the OR is large and the survival rate is high, the prognostic effects do not change the conclusion of the study. The OR' for OR=5 with S(t)=0.75 in Table 3 is 3.8 for HR=2 and 5.8 for HR=0.5. Such a deviation may be smaller than the random error of a small study, and smaller than the differences among studies.
The simplest method to examine the prognostic effects is the stratified analysis. If the ORs obtained in all strata are consistently above unity, or consistently below unity, the factor may be associated with disease occurrence. When no substantial difference is observed among the stratum-specific ORs, the prognostic effects on the observed OR are limited.
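What the stratified analysis is expected to show can be sketched from the survival model. The code below is an illustration under stated assumptions, not an analysis of the study data: the function name, the rate constant a = 0.1 per year, and the stratum midpoints are all hypothetical. It assumes exponential survival S(t) = exp(-at) and proportional hazards, under which the genotype odds among survivors equal the odds at diagnosis multiplied by S(t) to the power (HR - 1).

```python
import math

def stratum_or(or_true, hr, a, t):
    """OR expected among cases enrolled t years after diagnosis.

    Assumes S(t) = exp(-a * t) without the genotype and S(t) ** hr with it,
    so the observed OR is or_true * S(t) ** (hr - 1).
    """
    s = math.exp(-a * t)
    return or_true * s ** (hr - 1)

if __name__ == "__main__":
    midpoints = [0.5, 2.0, 4.0, 8.0]   # hypothetical stratum midpoints (years)
    for hr in (0.5, 1.0, 2.0):
        ors = [round(stratum_or(2.0, hr, 0.1, t), 2) for t in midpoints]
        print(f"HR={hr}: stratum-specific OR' = {ors}")
```

With HR = 1 the stratum-specific values stay at the true OR, whereas HR > 1 produces a downward drift and HR < 1 an upward drift as the interval lengthens. A flat pattern across strata is therefore what one would expect when the prognostic effects are limited.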
If the prognostic effects of the genotype under study cannot be neglected, adjustment is recommended. Although this paper did not compare the genotype proportion adjustment approach with the incomplete-data case-control design approach from a statistical viewpoint, the choice may depend on the distribution of the interval from diagnosis and on how the genotype ratio varies with the interval. The conditions that cause a large difference in the adjusted ORs between the two approaches remain to be elucidated.
The example data in this study are from a prevalent case-control study of breast cancer, which has a good prognosis. The survival rate is probably 0.8 or more on average for the present cases, whose intervals from diagnosis are distributed mainly between 0 and 5 years. Accordingly, no difference among the estimates was observed. Application to cases with poor prognosis could show a clearer adjustment effect, as expected from Table 1. To date, we do not have such an actual dataset.
Genetic epidemiology has opened a new era in the field of epidemiology. The change seems as dramatic as the shift half a century ago from infectious disease epidemiology to modern epidemiology. Host factors have been discussed conceptually among epidemiologists for decades, but now we can investigate them in a concrete form by genotyping techniques. Together with the shift from the evaluation of environmental factors alone to the evaluation of their interactions with genetic traits, new methodology has emerged 17,20-23). An incomplete-data case-control design under the assumption of independent distributions of exposure and genotype seems to be a powerful epidemiologic tool. This paper demonstrates that the method is applicable to the adjustment of the prognostic effects. All three approaches are straightforward and understandable, and could compensate for the disadvantages of prevalent case-control studies.