2022 Volume 41 Pages 63-73
In statistical practice, tabulation is the final step before publication, yet guidelines for choosing which items to tabulate are not necessarily well developed.
In the tabulation of food intake in the National Health and Nutrition Examination Survey, the “median” is selected as the third tabulation item after the “mean” and “standard deviation”. Because the tabulated median is zero in an increasing number of cases as the classification becomes more detailed, it is suggested that the “zero data ratio” be tabulated instead of the “median”. This paper explains the characteristics of the measurement method and the distribution of such data.
Next, the “hurdle model” of Cragg (1971) is proposed as a statistical model for processing data with these characteristics. As the first conclusion of this paper, a discriminant criterion for judging whether data are appropriate for the “hurdle model” is proposed as a guideline for tabulating the zero data ratio and the distribution of the non-zero data.
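As a minimal illustration of the tabulation idea above, the following Python sketch summarizes a zero-heavy sample in two parts: the zero data ratio, and the distribution of the non-zero data. The function name `tabulate_two_part` and the intake values are hypothetical and are not taken from either survey. The sketch is descriptive only and does not estimate the hurdle model itself.

```python
from statistics import mean, median, stdev

def tabulate_two_part(values):
    """Summarize zero-inflated, non-negative data in two parts.

    Hypothetical helper for illustration: reports the usual items
    (mean, standard deviation, median) alongside the zero data ratio
    and a summary of the strictly positive observations.
    """
    n = len(values)
    positives = [v for v in values if v > 0]
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        # Share of observations that are exactly zero.
        "zero_ratio": (n - len(positives)) / n,
        # Summary of the non-zero part only (None if there is none).
        "nonzero_mean": mean(positives) if positives else None,
        "nonzero_median": median(positives) if positives else None,
    }

# Made-up intake values (grams): six of ten respondents report zero.
intake = [0, 0, 0, 0, 0, 0, 12.0, 30.0, 45.0, 80.0]
summary = tabulate_two_part(intake)
print(summary)
```

In this made-up sample more than half of the values are zero, so the median is zero and carries no information about how much the consumers actually ate, whereas the zero data ratio and the non-zero summary together describe both parts of the distribution.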
Finally, the distribution of the general-use micro data generated from the National Survey of Family Income and Expenditure is shown to be completely different from that of the original data, because the “hurdle model” was not considered when the micro data were generated. Improvements based on the “hurdle model” are proposed.