We tried to show how and why sample survey outcomes overestimate (or underestimate) real-world results. We examined voter turnout rates in the 1980 Japanese Lower House election. The sample of 3,000 people was selected by two-stage national random sampling, and the survey overestimated the turnout rate by 12.4%.
First, we verified mathematically that the overestimation in a sample survey is the sum of three components: sampling bias, the ‘misreport effect’, and nonresponse bias.
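One way to write such a three-part decomposition is as a telescoping identity. The notation here is an assumption for illustration, not taken from the paper: P is the true turnout in the population, p_S the true turnout in the drawn sample, p_R the true turnout among those who responded, and p̂_rep the turnout reported by respondents.

```latex
\[
\underbrace{\hat{p}_{\mathrm{rep}} - P}_{\text{overestimation}}
= \underbrace{(p_{S} - P)}_{\text{sampling bias}}
+ \underbrace{(p_{R} - p_{S})}_{\text{nonresponse bias}}
+ \underbrace{(\hat{p}_{\mathrm{rep}} - p_{R})}_{\text{misreport effect}}
\]
```

The intermediate terms cancel, so the three components add up to the total overestimation by construction.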
Second, of the three components, the size of the sampling bias can be estimated through sampling theory. In our case the confidence interval for the sampling bias was only ±2.2%, and its direction could be either over- or underestimation. Thus, we estimated that an overestimation of 10.2% to 14.6% was due to the other two causes.
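The range attributed to the other two causes follows from simple arithmetic on the figures above. A minimal sketch, assuming the 2.2% figure is the half-width of the interval (the function name is hypothetical):

```python
def residual_overestimate_range(total_overestimate, sampling_half_width):
    """Range of overestimation left for nonresponse bias and misreporting.

    Assumes the sampling bias may push the estimate in either direction
    by at most `sampling_half_width` (all values in percentage points).
    """
    low = total_overestimate - sampling_half_width
    high = total_overestimate + sampling_half_width
    # Round to one decimal to match the precision of the reported figures.
    return round(low, 1), round(high, 1)


low, high = residual_overestimate_range(12.4, 2.2)
print(f"{low} to {high} percentage points")  # prints "10.2 to 14.6 percentage points"
```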
Third, we assumed that there was no serious sampling bias and, under this assumption, tried to find a way to decompose the 12.4% overestimation into the other two causes: nonresponse bias and the ‘misreport effect’.
Because this leaves four unknown parameters in only two simultaneous equations, we suggested two methods. The first is to use statistical analysis, such as nonlinear multiple regression, when enough data are available. The second is to introduce assumptions, either a priori or from other data, for two of the four unknowns and then solve mathematically. We tried the latter method and adopted three types of assumptions: no misreport, like the U.S., and like Japan in the past.
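The second method can be sketched as follows. The model below is a hypothetical stand-in, not the paper's own equations: the four unknowns are taken to be the response rates of voters and nonvoters and the misreport rates of voters and nonvoters, and the two equations match the overall response rate and the reported turnout. The response rate and turnout inputs in the usage lines are illustrative placeholders, not figures from the study.

```python
def decompose_overestimate(true_turnout, reported_turnout, response_rate,
                           voter_misreport, nonvoter_misreport):
    """Split (reported - true) turnout into nonresponse bias and misreport effect.

    Hypothetical model, with sampling bias assumed to be zero:
      a = share of the sample who voted and responded
      b = share of the sample who did not vote and responded
      a + b = response_rate
      a * (1 - voter_misreport) + b * nonvoter_misreport
          = reported_turnout * response_rate
    The two misreport rates are fixed by assumption; a and b are solved for.
    """
    denom = 1.0 - voter_misreport - nonvoter_misreport
    if denom <= 0:
        raise ValueError("misreport rates must sum to less than 1")
    a = response_rate * (reported_turnout - nonvoter_misreport) / denom
    b = response_rate - a
    respondents_true_turnout = a / response_rate
    return {
        "voter_response_rate": a / true_turnout,
        "nonvoter_response_rate": b / (1.0 - true_turnout),
        "nonresponse_bias": respondents_true_turnout - true_turnout,
        "misreport_effect": reported_turnout - respondents_true_turnout,
    }


# Illustrative inputs only: a 12.4-point gap with an assumed 70% response rate.
TRUE, REPORTED, RESPONSE = 0.750, 0.874, 0.70
no_misreport = decompose_overestimate(TRUE, REPORTED, RESPONSE, 0.0, 0.0)
# Arbitrary alternative: 30% of responding nonvoters claim to have voted.
with_misreport = decompose_overestimate(TRUE, REPORTED, RESPONSE, 0.0, 0.30)
for label, result in [("no misreport", no_misreport),
                      ("with misreport", with_misreport)]:
    print(label, {k: round(v, 3) for k, v in result.items()})
```

Under the no-misreport assumption the whole gap is assigned to nonresponse bias; any assumed misreporting moves part of the gap to the misreport effect, while the two components still sum to the total overestimation.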
As a result, we concluded that nonresponse bias was the most influential cause, accounting for 7 to 13% of overestimation, while the ‘misreport effect’ accounted for 0 to 6%.
We also showed that the bias can be decomposed into its causes for each respondent characteristic, such as sex. We found that the overestimation was larger for males (15.3%) than for females (9.9%), and that the nonresponse bias was more serious for males.
Finally, we found that the relationship between voting and responding is strongly positive, and that the nonresponse bias is much more serious for nonvoters.
The method we developed here is helpful for understanding the character of survey data and can be applied to other types of items, such as party identification, education, and income.