Abstract
For complex diseases, including psychiatric disorders, genetic and environmental risk factors have been investigated using genomic and epidemiological methods. However, genomics faces the so-called "missing heritability" problem, and epidemiology is limited by a reproducibility crisis for small-effect risk factors. A research design called the prospective genome cohort is expected to overcome these limitations. Prospective genome cohorts collect genomic and environmental exposure information for a defined population and store biological samples, such as blood and urine, under quality-controlled conditions. In addition, they prospectively obtain disease onset information and various endophenotypes from sources such as laboratory tests, brain imaging, and questionnaire surveys. Prospective genome cohorts are expected to enable the identification of new risk factors, including gene-environment interactions. However, genome cohort studies have unique difficulties. One is the p >> n problem, in which the number of variables (p) far exceeds the number of samples (n); the other is the difficulty of extracting meaningful features from diverse and multi-layered endophenotypic information. Here, we present examples of statistical machine learning and deep learning techniques that are expected to address these problems.