Japanese Geotechnical Society Special Publication
Online ISSN : 2188-8027
ISSN-L : 2188-8027
Seismic hazard assessment
Sampling Bias versus Class Imbalance in Binary Geotechnical Earthquake Engineering Datasets
Jonathan Schmidt, Robb Moss
Journal, free access

2024, Volume 10, Issue 26, pp. 949-953

Abstract

A number of geotechnical earthquake engineering problems require predicting the probability of a binary (Yes or No) outcome, typically using logistic regression or similar models. Two relevant examples are liquefaction triggering and surface fault rupture. The datasets used to develop these models are often imbalanced in the Yes/No class ratio: Yes data points can outnumber No data points by a large margin. This is because finding true No data points is often very hard and requires careful investigation, whereas Yes data points are obvious and attractive to document and measure. Modelers are often concerned that this class imbalance might lead to biased or skewed results. However, they usually do not explicitly distinguish between class imbalance, which is simply the observed Yes/No ratio, and sampling bias, which arises when some mechanism excludes otherwise observable data from the dataset. This paper examines the problem of sampling bias versus class imbalance and makes recommendations for when it needs to be addressed and when it does not influence the predictive capacity of the models.
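
The distinction drawn in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis. It assumes a hypothetical one-variable logistic model (`TRUE_B0`, `TRUE_B1`) and a hypothetical selection mechanism in which every Yes case is recorded but each No case is recorded only with probability `KEEP_NO`. Under that specific assumption, where selection depends on the outcome alone, standard logistic regression theory predicts that the fitted slope is essentially unchanged while the intercept shifts by about log(1/`KEEP_NO`). The full dataset is itself imbalanced (mostly Yes) yet recovers the true coefficients, which suggests that imbalance alone is not the source of the distortion.

```python
import math
import random

# Hypothetical "true" model: P(Yes | x) = sigmoid(TRUE_B0 + TRUE_B1 * x).
# These values are illustrative assumptions, not taken from the paper.
TRUE_B0, TRUE_B1 = 1.5, 1.0
KEEP_NO = 0.25  # assumed probability that a No case ends up in the dataset


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, rng):
    """Draw n (x, y) pairs from the assumed logistic model."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(TRUE_B0 + TRUE_B1 * x) else 0
        data.append((x, y))
    return data


def fit_logistic(data, iters=15):
    """Maximum-likelihood fit of intercept and slope by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # entries of X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


rng = random.Random(0)
population = simulate(20000, rng)

# Case 1: class imbalance only. Most outcomes are Yes, but nothing is excluded.
b0_full, b1_full = fit_logistic(population)

# Case 2: sampling bias. No cases are under-recorded relative to Yes cases.
biased = [(x, y) for x, y in population if y == 1 or rng.random() < KEEP_NO]
b0_biased, b1_biased = fit_logistic(biased)

# If the retention rates are known, the intercept can be offset back.
expected_shift = math.log(1.0 / KEEP_NO)
b0_corrected = b0_biased - expected_shift

yes_fraction = sum(y for _, y in population) / len(population)
print(f"Yes fraction in full data: {yes_fraction:.2f}")
print(f"full data:   b0={b0_full:.2f}, b1={b1_full:.2f}")
print(f"biased data: b0={b0_biased:.2f}, b1={b1_biased:.2f}")
print(f"corrected intercept: {b0_corrected:.2f}")
```

The intercept offset only applies when the retention probabilities are known or can be estimated, and when selection depends on the outcome alone. If exclusion also depends on the predictors, the slope can be distorted as well and this simple correction would not be sufficient.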
