Classification and Nomenclature of Nonexperimental Epidemiological Design

Classification and nomenclature of nonexperimental epidemiological design have been a "confounded" issue. This paper discusses the problems in existing classifications of nonexperimental epidemiological design from a logical viewpoint, including concept, definition and classification dimensions. The author selected, conceptualized and defined the dimensions for a study design classification scheme according to the purposes of the classification, logical consistency, and the relative significance of each dimension. This classification includes ten types of basic design distinguished in terms of sampling scheme and order of hypothesized occurrence, which can be grouped into exposure-control, outcome-control and general population designs. More specific concepts may be generated by adding one or more design features to a basic design. By using the scheme, this classification distinguishes the various existing designs and clarifies some ambiguity. An example is provided to illustrate the classification scheme. J Epidemiol, 1994; 4 : 113-119.


CLASSIFICATION RATIONALE AND EXISTING CLASSIFICATIONS OF NONEXPERIMENTAL EPIDEMIOLOGICAL DESIGN
Based on the understanding of the nature of the materials to be classified, a good classification of things depends on reasonable dimensions used to distinguish different things, suitable concepts and relevant definitions. The existing classifications of nonexperimental epidemiological designs were examined in terms of these aspects.

Dimensions for Classification
Classification is a logical process that groups or separates a series of things by their common properties and differences3). To know the similarities and differences between things, the characteristics of things need to be analyzed and those that can be used to distinguish things (dimensions for classification) need to be determined.
Logically, a dimension for classification should be a characteristic that is used consistently at each step of the classification and that leads to mutual exclusiveness of the categories3,20,21). Consistent use of characteristics at each step of classification means that a dimension should not be changed or supplemented at any one level of the classification process20,21). Mutual exclusiveness of categories implies that no individual should be a member of more than one category21-23). In the classification of nonexperimental epidemiological design, these rules have usually not been observed.
The well-accepted classification of nonexperimental epidemiological design into cohort studies, case-control studies and cross-sectional studies24-27) does not conform to the rules of consistent use of characteristics and mutual exclusiveness of categories. While a case-control study is conceptualized only by subject selection (by disease status)25,28,29), an additional component in time (follow-up from exposure to disease) is added to subject selection (by exposure status) in the definition of cohort studies24-27). A cross-sectional study, in turn, is defined by some epidemiologists only according to the timing of measurement (simultaneous measurement of exposure and disease status)24,25,27). The inconsistent use of characteristics (dimensions) in the classification of design leads to overlapping categories. For example, a case-control design can be thought of as one kind of cross-sectional study because no time dimension is specified in its definition. Conversely, a cross-sectional design can be classified as a case-control study under certain circumstances because its definition does not specify how subjects are selected.
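The overlap described above can be made concrete with a minimal sketch in Python. The field names (`selected_by`, `simultaneous_measurement`, `followed_from_exposure`) and the example study are hypothetical illustrations rather than part of any cited classification; the three tests simply encode the traditional definitions as summarized in the preceding paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    # Hypothetical descriptors of a study, chosen for illustration only.
    selected_by: str                # "disease", "exposure", or "none"
    simultaneous_measurement: bool  # exposure and disease measured at the same time
    followed_from_exposure: bool    # subjects followed from exposure to disease


def traditional_labels(study):
    """Return every traditional category whose definition the study satisfies."""
    labels = []
    # Case-control: defined only by subject selection (by disease status).
    if study.selected_by == "disease":
        labels.append("case-control")
    # Cohort: selection by exposure status plus follow-up in time.
    if study.selected_by == "exposure" and study.followed_from_exposure:
        labels.append("cohort")
    # Cross-sectional: defined only by timing of measurement.
    if study.simultaneous_measurement:
        labels.append("cross-sectional")
    return labels


# Subjects sampled by disease status, with exposure measured at the same time.
example = Study(selected_by="disease",
                simultaneous_measurement=True,
                followed_from_exposure=False)
labels = traditional_labels(example)
print(labels)
```

Because the three definitions draw on different dimensions, the example study receives two labels at once, which is the violation of mutual exclusiveness discussed in the text.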

Concept
A concept is a notion that enables the mind to distinguish one thing from another30). It is represented by a term or terms. Applying terms with different meanings (without an overall relation between them) to the same thing may lead to ambiguity in the concept. For example, "case-control" generally implies the approach to subject selection14,25,28,29), in which the time of data collection might vary; "retrospective" means the timing of data collection13,31) or the directionality of pursuit6,26), in which the selection of subjects might differ, such as selection by disease status or by exposure status. "Case-control" and "retrospective" have often been equated in epidemiological books26,32-34). This indistinct use of the two terms may confuse students and epidemiologists.
A differential use of a well-accepted concept might be another problem. The word "case" is epidemiologically defined as a person having a specified disease, health disorder, or medical state or event34,35). In the definition of case-control studies, most epidemiologists have taken the word "case" as a synonym of an outcome32,34,36), whereas others have used it either as a result or as a factor related to the result5,12). This difference has caused confusion between the concepts of "case-control study" and "cohort study". For instance, when the current prevalence of hypercholesterolemia is compared between a group of patients who contracted diabetes mellitus 20 years ago and a group of people who have never had diabetes, the latter group of epidemiologists would treat the patients with diabetes as "cases" even though diabetes is not an outcome, and would therefore regard this type of design as a longitudinal or forward "case-control" study6,12). In contrast, the former group would take the patients with hypercholesterolemia as the cases, and the design would therefore be thought of as a cohort study.

Definition
Definition is the process of clarifying the intension of a concept, in which the defining term and the defined term must be united. Otherwise, too narrow or too wide a definition emerges37). For example, many authors have defined "cohort study" in terms of both subject selection by exposure status and follow-up from exposure to disease24,38). By their definition, the Framingham Study, which is considered a paradigm of the cohort study by most epidemiologists, would not qualify as a cohort study because its subjects were not selected by exposure.
To avoid these problems in selection of dimensions, concept, and definition, a new classification of nonexperimental epidemiological design should be based on careful considerations of these aspects.

Dimension Selection in the Suggested Classification
The dimension(s) for a classification should be not only logically reasonable, but also important for the classification. The process of selecting dimensions according to their importance requires consideration of the purposes of classification, the features that serve the purposes, and the relative significance of each feature.
Dimension selection depends upon the purposes of classification. This means that different purposes call for different dimensions and bring about different classifications20,39). Although reflecting susceptibility to bias is an important purpose in distinguishing epidemiological designs40), study efficiency and the capability to establish an exposure-disease time sequence are also important considerations in the practice of epidemiological studies. The different types of design vary in these aspects. For example, a design that includes follow-up from exposure to disease may be subject to attrition bias and may be inefficient for a study of rare diseases, but it can generally elucidate the temporal relationship between exposure and disease. On the other hand, a design that includes a comparison of cases and controls may be prone to recall bias, have less power in estimating the effect of a rare exposure, and possibly fail to ascertain the temporal relationship between exposure and disease. However, such a design is efficient for a study of a rare disease. Therefore, the dimensions used for a classification of nonexperimental epidemiological design should represent the features that are consistent with the purposes of classification.
The features that may be used to characterize a design have been discussed in the literature14,40). These features can be grouped into three categories: study scheme-related features, study population characteristics, and exposure/disease-related features. The study scheme-related features include sampling scheme (sampling of subjects according to exposure or disease status) and timing of data collection (the temporal relationship between study onset and measurements). Population characteristics refer to whether a study population is fixed or dynamic. Exposure/disease-related features include order of hypothesized occurrence (the temporal relationship between measured exposure and outcome), timing of disease measurement (the relationship between occurrence and measurement of disease), timing of exposure measurement (the relationship between occurrence and measurement of exposure), and order of measurement (the temporal relationship between exposure measurement and disease measurement). Table 1 shows these features in relation to the three classification purposes.
However, not all features are of equal importance for a classification of nonexperimental epidemiological design. The features serving the primary classification purposes and reflecting the basic structure of a design should be given priority. Study efficiency is an important practical consideration in choosing a specific design. When one studies the association of maternal diethylstilbestrol (DES) exposure with the risk of vaginal cancer in young female offspring with a limited budget and time, the investigator has to decide which approach is more suitable: sampling the study population by DES exposure among mothers or by the disease among young female offspring. Providing a clue for causal inference is the goal of analytical studies. Therefore, whether a design can elucidate an exposure-disease time sequence (order of hypothesized occurrence) is an important feature of the design. The remaining features primarily imply susceptibility to bias. These features are important once the basic scheme of a design is specified. However, order of measurement is of minor importance because (1) order of measurement and sampling scheme are usually interdependent, and (2) the possible influence of measurement order on differential misclassification bias18,40) may be suggested by the timing of data collection. For example, the exposure is measured first in follow-up studies of subjects sampled by exposure status, and the outcome is usually measured first in retrospective case-control studies in which subjects are sampled by disease status. When either exposure or disease status is known before study onset, differential misclassification in the subsequent measurement of the disease or exposure is possible. On the other hand, if both the exposure and the disease have been measured before a study begins, differential misclassification is less likely because the measurements were made without knowledge of the study hypothesis.
Based on the above analyses, the features (dimensions) that will be used in our classification can be categorized into two groups according to their importance for understanding a design, one primarily representing the basic structure of a design and the other mainly reflecting the possible susceptibility to bias. The former includes sampling scheme and order of hypothesized occurrence. The latter, which is meaningful after the basic structure of a design is specified, includes the temporal relationship between study onset and measurement, the relationship between occurrence and measurement of an outcome, and the relationship between occurrence and measurement of an exposure(s), providing information on study quality and possible bias related to features of an exposure or an outcome at measurement.

Concept and Definition in the Suggested Classification
Table 2 presents the components of each dimension for our classification, their concepts, and definitions. The term "outcome" is used instead of the previously used "case" because of the aforementioned problems. "Outcome" here refers to any process or state of health that is of interest as an effect. The use of the word "exposure" is somewhat of a flaw, because it is originally defined as "the fact of being exposed in a helpless condition to the elements"41), while people do study protective (helpful) agents under some conditions. Unfortunately, we failed to find a better single word to substitute for "exposure" that implies both the uncertainty of a variable of interest in causal inference and the distinction between this variable and other variables such as confounding variables. Therefore, we will keep using the widely accepted word "exposure". However, "exposure" here designates any variable(s) of interest in a study as a presumed determinant(s) of the outcome studied.

The Suggested Classification
Table 3 lists types of design based on sampling scheme and order of hypothesized occurrence, reflecting the basic structure of a design. Designs are categorized into exposure-control, outcome-control and general population types according to how subjects are sampled. An exposure-control study is a study in which the people are defined according to the extent of exposure to a presumed determinant, and the frequencies of the outcome of interest are compared between the different groups of people. An outcome-control study is a study in which the people with the outcome of interest and a sample of people without the outcome (who represent the person-time experience regarding exposure in the population from which the people with the outcome arise) are selected and compared according to the proportions with the exposure of interest. A general population study is a study in which a whole population or part of the population is sampled without reference to either exposure or outcome status, and the frequencies of the outcome and/or exposure are estimated in the population. Based on the temporal relationship in hypothesized occurrence between an exposure and an outcome, exposure-control designs can be categorized into cross-sectional, repeated cross-sectional and prospective types. Outcome-control designs are divided into cross-sectional, repeated cross-sectional, and retrospective types (in the last of which information about past exposure(s) is obtained). General population designs include cross-sectional, repeated cross-sectional, prospective, and retrospective types. Additionally, the pool of study subjects may be fixed or dynamic. Different statistical measures and methods may be used depending upon whether the population is fixed or dynamic36). A component(s) of the other dimension(s) may be added to a design defined above if one wants a more specific concept of the design.
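As an illustrative sketch (not part of the original scheme's presentation), the basic designs can be enumerated by crossing each sampling scheme with the orders of hypothesized occurrence the text permits for it. The dictionary layout, the function name `basic_designs`, and the naming pattern "order + scheme + study" are assumptions made here for illustration.

```python
# Orders of hypothesized occurrence allowed for each sampling scheme,
# transcribed from the paragraph above.
ALLOWED_ORDERS = {
    "exposure-control": [
        "cross-sectional", "repeated cross-sectional", "prospective",
    ],
    "outcome-control": [
        "cross-sectional", "repeated cross-sectional", "retrospective",
    ],
    "general population": [
        "cross-sectional", "repeated cross-sectional",
        "prospective", "retrospective",
    ],
}


def basic_designs():
    """List every basic design as '<order> <sampling scheme> study'."""
    return [
        f"{order} {scheme} study"
        for scheme, orders in ALLOWED_ORDERS.items()
        for order in orders
    ]


designs = basic_designs()
for name in designs:
    print(name)
print(len(designs), "basic designs")
```

The count of 3 + 3 + 4 = 10 matches the ten basic types of design stated in the abstract. Each design is identified by exactly one value on each of the two dimensions, so the categories are mutually exclusive by construction.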
For example, (1) a prospective exposure-control study can be subdivided, according to the temporal relationship between study onset and measurement, into preplanned (previously termed "prospective cohort study"), preconducted (previously termed "retrospective cohort study") and partially preconducted (previously termed "ambispective study") types. A preconducted study is inferior to a preplanned study because existing data rather than study-oriented data would be used; (2) in the measurement of an outcome, cases in an outcome-control design may be prevalent or incident. Using prevalent cases is subject to prevalence-incidence bias42); and (3) an exposure factor may be concurrent or nonconcurrent in terms of the relationship between its measurement and its occurrence. Recall bias or measurement bias is possible when measuring an exposure that occurred in the past.
To illustrate the various types of design, consider a study of the relationship between occupational exposure to cadmium and prostate cancer among workers aged 45-65 years in battery manufacturing factories in a big city. When workers exposed and not exposed to cadmium in these factories are compared in the frequency of prostate cancer, the study is an exposure-control study. When workers with prostate cancer and a sample of workers without the disease are compared according to the proportion with occupational exposure to cadmium, the design is an outcome-control study. If the whole population in these factories or a sample of it is defined and the proportions of persons with prostate cancer and/or with exposure to cadmium are investigated, the study is a general population study. The most common exposure-control design is prospective, in which workers with and without exposure to cadmium are followed up for a period and the incidence rate of prostate cancer in each group is calculated. If this study starts among workers newly employed in the battery manufacturing factories, the design is a preplanned prospective exposure-control design. Information on occupational exposure and history of the disease is usually also available from occupational and health records. If the prospective exposure-control study is carried out on the basis of the existing records, the study is preconducted. For an outcome-control design, the information on long-term exposure to cadmium can come from past occupational records, recall of subjects, or measurement of urinary cadmium. These three types of outcome-control design are conceptualized as concurrent retrospective, nonconcurrent retrospective and cross-sectional, respectively. In addition, in an outcome-control design the patients with prostate cancer can be either newly diagnosed cases only or all existing cases. As a result, incident and prevalent outcome-control designs can be distinguished.
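The naming of outcome-control designs in this example can be sketched as a simple lookup. This is a hedged illustration only: the dictionary keys, the function name `name_outcome_control_design`, and the word order of the composed label are assumptions introduced here, not terminology fixed by the classification.

```python
# Source of long-term cadmium exposure information -> design type,
# following the three cases described in the paragraph above.
EXPOSURE_SOURCE_TO_TYPE = {
    "occupational records": "concurrent retrospective",
    "subject recall": "nonconcurrent retrospective",
    "urinary cadmium": "cross-sectional",
}

# Which prostate cancer cases are enrolled -> case type.
CASE_SELECTION_TO_TYPE = {
    "newly diagnosed": "incident",
    "all existing": "prevalent",
}


def name_outcome_control_design(exposure_source, case_selection):
    """Compose a descriptive label for an outcome-control design.

    The label order (case type, then design type) is an assumption
    made for this sketch.
    """
    design_type = EXPOSURE_SOURCE_TO_TYPE[exposure_source]
    case_type = CASE_SELECTION_TO_TYPE[case_selection]
    return f"{case_type} {design_type} outcome-control study"


label = name_outcome_control_design("subject recall", "newly diagnosed")
print(label)
```

Adding a component from a further dimension narrows the concept without changing the basic design, which remains an outcome-control study in every case.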
The most common general population design is cross-sectional, in which urinary cadmium and the existence of prostate cancer are simultaneously investigated and the temporal relationship between them cannot be ascertained. However, prostate cancer can also be measured in terms of its occurrence during a period following measurement of exposure to cadmium (a prospective general population study), and information on past exposure to cadmium can come from occupational records or recall of subjects (a retrospective general population study). In addition, a cross-sectional general population study can be repeated. Repeated cross-sectional general population studies can be used to assess the relationship between changes in the proportion exposed to cadmium and changes in the frequency of prostate cancer in the factories.

DISCUSSION
The key elements in establishing a classification scheme are the determination and definition of the features used for classification, and appropriate conceptualization of the components of those features. Based on dimensions, concepts, and definitions that have been carefully determined, ten basic types of design were generated from two primary dimensions, sampling scheme and order of hypothesized occurrence. When an additional dimension is added to describe a design, the design that the concept refers to becomes more specific. This allows epidemiologists to choose the level at which a design concept is presented. The "lumper", or those teaching introductory epidemiology, may prefer using only the concepts of exposure-control, outcome-control and general population study with the addition of order of hypothesized occurrence. These basic concepts concisely convey the most important attributes of a design. The "splitter", or those preferring more information in their concept, may add an additional component to the basic concept.
As a system, this classification includes most existing designs, distinguishes the various designs, and thus clarifies some ambiguity.
This classification scheme includes not only basic designs such as "case-control", "cohort", and "cross-sectional" but also some hybrid designs. For example, a repeated cross-sectional population design is a so-called "panel study"25) when the population is fixed, and was previously termed a "repeated survey" when the population is dynamic36). A prospective exposure-control study with a prevalent outcome is a "follow-up prevalent study"36). Since these hybrid designs are conceptualized in this classification by adding a further dimension(s) to a basic design, their relation to the basic design is reflected.
By granting a unique term to each type of design, one can distinguish one design from another. In this classification, outcome-control studies are divided into cross-sectional and retrospective types, which have seldom been distinguished in previous classifications. This distinction is meaningful because of our interest in the exposure-outcome time sequence. Furthermore, with the addition of the dimension of the relationship between occurrence and measurement of an exposure, retrospective outcome-control studies can be further divided into those with the past exposure concurrently measured and those with the past exposure recalled. The data from studies with the past exposure recalled may be less accurate than those from studies with retrospective collection of existing data on an exposure. This classification also distinguishes outcome-control studies according to the temporal relationship between study execution and data collection. Although previous classifications have divided "cohort studies" (exposure-control studies) into "prospective" (preplanned) and "retrospective" (preconducted) ones according to the temporal relationship between study execution and data collection, those classifications have ignored the distinction between case-control studies (outcome-control studies) with data collected before study execution and those with data to be collected in the future. In this classification, not only exposure-control but also outcome-control designs are divided into preplanned and preconducted types. The distinctions between different designs provide a better picture of the exposure-outcome time sequence, study efficiency and susceptibility to bias.
The classification could clear up some prior vagueness as a result of the distinctions between different designs. For example, the incidence study in the Framingham Study, the follow-up study with selection of subjects by exposure, and the panel study are all classified as "cohort studies" in some of the previous literature32,36,43-45). However, there are differences among these three branches of "cohort studies". A design like that of the original Framingham Study is used in the population study of incidence, in which multiple exposure factors and outcomes can be involved. The follow-up design with selection of subjects by an exposure factor is applied to confirm a cause-outcome association, in which a particular exposure factor and single or multiple outcome(s) are usually of concern. The panel study is actually a repeated cross-sectional survey within a fixed population, which is used to investigate serial changes in the values of several variables"). These three types of design are differentiated in our proposed classification and nomenclature by the terms "prospective general population study", "prospective exposure-control study" and "repeated cross-sectional population study" in a fixed population.
Although this classification and nomenclature scheme includes most nonexperimental epidemiological designs, including some hybrid types, the classification may be incomplete because of the complexity of things, arbitrary logic in classification, and our imperfect knowledge in a continuously developing discipline. For example, exposure-control and outcome-control designs overlap in familial aggregation studies in genetic epidemiology46). Further effort on principles of study design, suitable dimensions and relevant terminology is needed to set up a more complete and concise classification and nomenclature scheme.