Family History Evaluation in Epidemiologic Studies; an Evaluation of Bias in the Conventional Methods and a Proposal of a New One

We discuss problems of the conventional methods used to evaluate family histories as risk factors in epidemiologic studies, and propose a new one. In the proposed method, incidence rates of the disease of interest among family members are treated as exposure levels. In addition, we use models to show the quantitative difference between the results obtained by the conventional methods and by the new one. Because the new method does not require much information, we recommend it even though the difference was not large.

Although family histories are important risk factors in epidemiologic studies for evaluating whether a disease is related to genetic factors, how to observe them is still problematic. Some epidemiologic papers have discussed the issue 1,2,3), but have not yet shown an essential solution. In this paper we discuss the problems of the usual methods of evaluating family histories and propose an alternative method. In addition, we show how much the current methods distort the results, using models with Japanese cancer data. Because the purpose of the article is to discuss the theory of evaluating the magnitude of the family history in epidemiologic studies, we ignore issues of both misclassification (information bias) and feasibility of studies.
In this article, we first point out the problems of the conventional method of evaluating family history. Then, we propose a new method. Finally, we assess how much the usual methods distort the results, using Japanese cancer data. Many epidemiologic studies so far have not applied the ideal method, and the data that have already been gathered are limited. In this situation, one has to calculate the risk associated with a family history using the usual method, particularly in cohort studies. Thus, we show the effect of the bias in the last part.

PROBLEMS
In many epidemiologic studies, exposures are evaluated as dichotomous, ordinal, or numerical data. Examples are whether a participant works under a special occupational condition (dichotomous), the responses to a food intake frequency questionnaire (ordinal), and the duration of smoking (numerical). Each of them is measured for the participants of an epidemiologic study during a specific period of time, which may be a single point in time, as with prevalence. This kind of information can therefore be evaluated and is equally informative for each participant.
On the other hand, evaluation of family histories in epidemiologic studies is problematic. Many epidemiologic studies exploring risk factors of a disease treat a family history as an exposure; for example, whether a participant has family members with gastric cancer is treated as a potential risk factor in a cohort or case-control study of gastric cancer. In one respect this factor differs from the others. Information about the participants themselves, such as smoking, behavioral factors, and food intake, depends only on the lifetime of the participants. Family history information, in contrast, depends on the lifetimes of the family members, that is, of other persons. Thus, whether a participant has a family history is related to the length of life of other persons. Consider an epidemiologic study of gastric cancer with the father's history of the disease as a potential risk factor. If the father of a participant died just after the participant was born, the probability that this participant has the family history is quite small, because the mortality rate from gastric cancer in young males is low and the father can never be affected by the cancer after his death. Can we consider this "unexposed" participant equivalent to another participant without the family history whose father lived to 100 years of age without gastric cancer? In other words, those treated as not having the history include some persons who would have been treated as exposed if their fathers had lived longer. This always biases the observed relative risk toward the null if the family history is related to the risk of the disease. That is, observed relative risks are smaller than the true ones if family members with a disease increase the risk of the disease among other members, which is the usual situation.
A general explanation is shown in the Figure. Nevertheless, many epidemiologic studies measuring the family history of parents as a potential risk factor have treated the factor as dichotomous data, that is, whether the participants' parents have been affected by a specific disease. Only a few studies obtain information on the ages of the parents 4).
Family history studies among siblings and offspring are more seriously affected. Whether a participant has a family history depends on the number of siblings and offspring. For instance, a participant without siblings never has a family history among siblings. In other words, even if a disease affects siblings or offspring with the same probability, a participant with many siblings or offspring is more likely to have the family history than one with a limited number of siblings and offspring. The length of life of these relatives also influences whether a participant has a family history among them. Nonetheless, many epidemiologic studies treat the family history among siblings as dichotomous data, that is, whether the participant has one or more siblings with the disease. Only a few studies have shown the total number of siblings and the number of those with the disease 4).
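The dependence on family size can be illustrated with a minimal sketch. It assumes, purely for illustration, that each sibling is affected independently with the same probability; the function name `prob_family_history` and the value 0.05 are hypothetical and do not come from the data used in this paper.

```python
def prob_family_history(p_affected: float, n_siblings: int) -> float:
    """Probability that at least one of n siblings is affected,
    assuming each sibling is affected independently with probability p."""
    return 1.0 - (1.0 - p_affected) ** n_siblings


# Hypothetical per-sibling probability, for illustration only.
P_SIBLING = 0.05

for n in (0, 1, 3, 6):
    prob = prob_family_history(P_SIBLING, n)
    print(f"{n} siblings: P(family history) = {prob:.4f}")
```

Under these assumptions a participant with no siblings can never be classified as exposed, and the probability of being classified as exposed rises with the number of siblings even though the per-sibling risk is unchanged.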
In conclusion, the conventional method of evaluating a family history in an epidemiologic study, which confirms only whether a participant has a family history or not, produces results with the bias mentioned above. This is a logical problem.
1. Cohort studies
a and b: the numbers of cases with the disease of interest. A and B: if the incidence rate is observed, A and B are population time (i.e., person-years); if the cumulative incidence is observed, A and B are the populations at the start point.
B consists of three groups: those with a family member who lived long enough but did not have the disease; those with a family member who did not live long but would not have had the disease even if he/she had lived longer; and those with a family member who did not live long and would have had the disease if he/she had lived longer. If the family history increases the risk of the disease, the last group elevates b. Consequently, the observed relative risk is underestimated.

2. Case-control studies
b consists of three groups: those with a family member who lived long enough but did not have the disease; those with a family member who did not live long but would not have had the disease even if he/she had lived longer; and those with a family member who did not live long and would have had the disease if he/she had lived longer. If the family history increases the risk of the disease, the last group lowers a/b in comparison with c/d. Consequently, the observed relative risk, the odds ratio, is underestimated.

PROPOSAL
Instead of the conventional method of evaluating a family history, we propose an incidence rate method. This method observes the magnitude of a family history using the incidence rate of the disease among the family members of participants.
Some epidemiologists propose that the analytic design of a case-control study be transformed into a cohort study design 5,6).
Our method uses the incidence directly as the exposure measurement. In other words, the incidence rate of a specific disease among family members is itself treated as an exposure dose measured as a numeric value. We explain this method using the example of gastric cancer and its family histories.
In a case-control study of gastric cancer, we have to measure the magnitude of the family history, that is, how much gastric cancer occurred among specific family members.
In a cohort study of gastric cancer, we measure the magnitude and then classify each participant as exposed or non-exposed. To simplify the explanation, we consider only the family history among parents and siblings. In both study designs, the specific period of time to be observed should be determined. For example, each family member is observed from 20 through 69 years of age (lifetime observation is, of course, also possible). Within the determined period, the person-years free of gastric cancer should be summed. If a person died before the end of the observation period (69 years in the above example), he or she is censored at the time of death. If a person is younger than the end of the period and free of gastric cancer when the family history is evaluated, he or she is censored as well. We then calculate the incidence rate of gastric cancer among family members as the number of occurrences of gastric cancer divided by the total person-years. In the case-control study, the incidence rate of gastric cancer among parents and siblings is divided into categories, as dichotomous or ordinal data, and odds ratios can be estimated. With a logistic model, one can also treat the incidence rate as numeric data to estimate odds ratios. In the cohort study, every participant is classified into one of several exposure levels defined by the incidence rate, and rate ratios or cumulative incidence ratios are then observed as relative risks. Using a Cox proportional hazards model or a logistic model, one can also treat the incidence rate as numeric data.
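The person-year calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the class `Relative`, the functions `contribution` and `family_incidence_rate`, and the example family are hypothetical; the window "20 through 69 years" is coded as the half-open interval from 20 to 70; and a relative whose onset precedes the window is assumed to contribute nothing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Relative:
    age_last_observed: float              # age at death, or at evaluation if alive
    age_at_onset: Optional[float] = None  # None if never diagnosed


def contribution(rel: Relative, start: float = 20.0,
                 end: float = 70.0) -> Tuple[float, int]:
    """Return (disease-free person-years, events) inside [start, end)."""
    onset = rel.age_at_onset
    if onset is not None and onset < start:
        # Assumption: already affected before the window, so not at risk.
        return 0.0, 0
    if onset is not None and onset < end:
        # Follow-up stops at onset; assumes onset <= age_last_observed.
        return onset - start, 1
    # No event inside the window: censored at death, evaluation, or window end.
    stop = min(rel.age_last_observed, end)
    return max(0.0, stop - start), 0


def family_incidence_rate(relatives: Sequence[Relative], start: float = 20.0,
                          end: float = 70.0) -> Optional[float]:
    """Events divided by total person-years; None if nothing was observed."""
    person_years = 0.0
    events = 0
    for rel in relatives:
        py, ev = contribution(rel, start, end)
        person_years += py
        events += ev
    if person_years == 0.0:
        return None
    return events / person_years


# Hypothetical family: father diagnosed at 58 (died at 62),
# mother alive at 75 without the disease, sibling aged 45 without the disease.
family = [Relative(62, 58), Relative(75), Relative(45)]
rate = family_incidence_rate(family)
print(f"{rate * 1000:.2f} cases per 1000 person-years")
```

In this hypothetical family the father contributes 38 person-years and one event, the mother 50 person-years, and the sibling 25 person-years, so the exposure value is 1 case per 113 person-years. The resulting rate can then be categorized or entered as a numeric covariate, as described in the text.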
When an epidemiologist uses this method to evaluate the magnitude of a family history, the observed period of time is required for all target family members. Whether each family member had an event of the target disease and, if so, when it occurred, is also necessary. The greatest advantage of this method is that it removes the effects of the varying number of family members and the varying length of the observation period for each family member. On the other hand, two limitations exist. One is that this method cannot evaluate the risk when the researchers' interest is the family history of only one parent, the father or the mother, because an incidence rate calculated for only one person is meaningless. The other is that the method can be applied only to a disease common enough that at least one of the family members has it. If no family member has the target disease, the incidence rate is zero whatever the amount of observed person-years, so the amount of observation cannot be assessed.
Thus, we recommend using this method to evaluate the magnitude of family history in an epidemiologic study if the disease is not too rare. The only information required is each target family member's age at onset of the disease, at death, or at the present time.

Introduction
Many epidemiologic studies have evaluated the family history of a disease by whether or not specific family members had the disease. Epidemiologists sometimes have to use family history information that has already been obtained using incomplete methods. In this part, we show how much the conventional method, in which the family history exposure is treated as dichotomous data, affects the results.

Materials and Methods
The data used were incidence rates and mortality rates of gastric cancer for males and breast cancer for females in Japan in 1988 7,8). The incidence rates were estimated for the whole of Japan based on 10 cancer registration systems. The mortality rates were from the vital statistics.
We established models in which we assumed that the morbidity and mortality in 1988 continued for 50 years. For gastric cancer, it is assumed that participants in an epidemiologic study were born when their fathers were 25 years old, and were recruited into the study at 50 years of age. Therefore, whether fathers had gastric cancer was observed from 25 through 75 years of age, or until death if it occurred before 75 years of age. The data obtained were arranged in 5-year age classes, so we established the model below with a 5-year observation period as one time unit. Using this model, we assume that condition No. 3 would not occur. In other words, the fathers would die only of gastric cancer, at the current mortality rate of the cancer, and would never die of other causes.
Cumulative incidence and cumulative death were calculated using a formula in which $CI_t$ is the cumulative incidence or death during the period of time t, and I is the rate.
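The formula itself is not reproduced in this text. A minimal sketch follows, assuming the standard exponential relation between a rate and a cumulative incidence, $CI_t = 1 - \exp(-\sum_i I_i \Delta t_i)$, applied over 5-year age classes; this may differ from the formula actually used. The function name and the rates are hypothetical placeholders, not the 1988 Japanese data.

```python
import math
from typing import Sequence


def cumulative_incidence(rates_per_person_year: Sequence[float],
                         width_years: float = 5.0) -> float:
    """Cumulative incidence over consecutive age classes of equal width,
    assuming a constant rate within each class and no competing risks."""
    cumulative_hazard = sum(rate * width_years for rate in rates_per_person_year)
    return 1.0 - math.exp(-cumulative_hazard)


# Hypothetical rates (per person-year) for ten 5-year classes, ages 25 to 74.
example_rates = [0.0001, 0.0002, 0.0004, 0.0007, 0.0011,
                 0.0016, 0.0022, 0.0030, 0.0040, 0.0052]

print(f"Cumulative incidence over 50 years: {cumulative_incidence(example_rates):.4f}")
```

Applying the same function to the age-specific rates of an actual registry would give the cumulative number of cases per 100,000 when multiplied by the cohort size.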
The calculation of the effect is explained using an example. We assume a population of 100,000 at the start. In other words, a cohort with 100,000 participants is observed between 25 and 74 years of age. During the 50 years, 7286 incident cases of gastric cancer are observed using the Japanese incidence data. If there were no deaths from causes other than gastric cancer, 8395 incident gastric cancer cases would be observed.
Assuming that the relative risk of gastric cancer with the family history calculated by the conventional methods is 1.2, we can describe the participants in two situations. In the observed situation, 7286 participants have the family history and 92714 do not, so the total cumulative number of cases is 7286 × 1.2 × CIn + 92714 × CIn = 101457CIn (i), where CIn is the cumulative incidence among participants without the family history in this situation. In the un-biased situation, 8395 participants have the family history and 91605 do not, where CIa is the cumulative incidence among participants without the family history in this situation, and Ru is the un-biased or true relative risk. The 8395 participants with the family history are based on the calculation, discussed above, of the number of incident gastric cancer cases that would be observed among fathers if there were no deaths other than from the cancer. Thus, the total cumulative number is 8395 × Ru × CIa + 91605 × CIa (ii).
Because we observe the same cohort in the 2 models, the cumulative numbers of gastric cancer, (i) and (ii), should be identical: 101457CIn = (8395Ru + 91605)CIa (iii).
Participants with the family history in the first situation are a part of those with the history in the second. Therefore, the cumulative incidences among those with the family history in the 2 models, 1.2CIn and Ru × CIa, should be identical: 1.2CIn = Ru × CIa (iv). In the same way, we calculated un-biased relative risks for gastric cancer and breast cancer for observed relative risks of 1.2, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, and 5.0.
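As a worked illustration of the gastric cancer example, the two conditions can be solved for the un-biased relative risk. This sketch reads the conditions as $101457\,CI_n = (8395\,R_u + 91605)\,CI_a$ (iii) and $1.2\,CI_n = R_u\,CI_a$ (iv); the intermediate figures are rounded and are our own arithmetic, not values quoted from the Table.

```latex
\begin{align*}
CI_n &= \frac{R_u\,CI_a}{1.2} && \text{from (iv)} \\
101457 \cdot \frac{R_u\,CI_a}{1.2} &= (8395\,R_u + 91605)\,CI_a && \text{substituting into (iii)} \\
84547.5\,R_u - 8395\,R_u &= 91605 && \text{dividing by } CI_a \\
R_u &= \frac{91605}{76152.5} \approx 1.203
\end{align*}
```

Under this reading, an observed relative risk of 1.2 corresponds to an un-biased relative risk of about 1.203, so the bias is small at this level of risk.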

Result
The Table shows the results. The magnitude of the bias was larger where the relative risk was larger.
In general, the un-biased relative risk is obtained using the following formula:
$$R_u = R_o \times \frac{1 - p_u}{(1 - p_o) - R_o\,(p_u - p_o)}$$
where $R_u$ is the un-biased relative risk, $R_o$ is the observed relative risk, $p_u$ is the risk of cancer for a parent in the model, and $p_o$ is the observed risk of the cancer in reality. This is the general solution of formulas (iii) and (iv). The magnitude of the bias is the latter part of the right side of the formula, that is, the factor by which $R_o$ is multiplied. Therefore, the magnitude depends on the risk of the cancer for a parent and on the relative risk itself.
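The general formula can be sketched in code, reading it as $R_u = R_o(1 - p_u)/\{(1 - p_o) - R_o(p_u - p_o)\}$. The function name `unbiased_relative_risk` is hypothetical. The parent risks are taken from the gastric cancer example in Materials and Methods (7286 and 8395 cases per 100,000), and the printed values are our own arithmetic rather than a reproduction of the Table.

```python
def unbiased_relative_risk(r_o: float, p_o: float, p_u: float) -> float:
    """Un-biased relative risk from the observed relative risk r_o,
    the observed parental risk p_o, and the parental risk p_u in the
    model without deaths from other causes."""
    denominator = (1.0 - p_o) - r_o * (p_u - p_o)
    if denominator <= 0.0:
        raise ValueError("formula is undefined for these inputs")
    return r_o * (1.0 - p_u) / denominator


# Parental risks from the gastric cancer example in the text.
P_OBSERVED = 7286 / 100_000
P_MODEL = 8395 / 100_000

for r_o in (1.2, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0):
    r_u = unbiased_relative_risk(r_o, P_OBSERVED, P_MODEL)
    print(f"observed {r_o:.1f} -> un-biased {r_u:.3f}")
```

Because the correction factor grows with the observed relative risk, the gap between the two values widens toward the upper end of this range.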

Discussion
In this part we evaluated the magnitude of the bias that the conventional methods include. One should take this magnitude into account when discussing the effects of family histories. We do not say that data with a bias are useless, but the new method is recommended if possible. This is because the information required is not much, as shown in the last part of PROPOSAL.

Figure. Problems of the conventional method to evaluate family histories as a risk factor in epidemiologic studies.
At the start of the unit: alive without gastric cancer.
At the end of the unit:
No. 1: have gastric cancer and still alive
No. 2: have gastric cancer and died
No. 3: died of a cause other than gastric cancer
No. 4: without any of the events above
Fathers without gastric cancer at the start point of the 5-year unit are classified into one of the 4 conditions at the end of the unit. Only the fathers with condition No. 4 are observed during the next 5-year unit. With the conventional method, epidemiologists observe only conditions No. 1 and No. 2, and ignore condition No. 3.

Table. Observed and un-biased relative risks of those with family history among parents, male gastric cancer and female breast cancer, in Japan.

CONCLUSION
In this article we have shown the disadvantage of the methods used to evaluate the family history of a disease in usual epidemiologic studies. Instead, we propose an alternative method that can exclude the bias. In addition, we evaluate the magnitude of the bias in order to discuss whether relative risks obtained