Intelligence, Informatics and Infrastructure
Online ISSN : 2758-5816
Characterization of missingness and data-driven imputation for incomplete pavement condition data
Angela ODERA, Michael HENRY, Azam AMIR

2024 Volume 5 Issue 1 Pages 57-71

Abstract

As in many other fields, pavement management systems suffer from missing data, but machine learning techniques such as random forest make imputation a viable solution. This study applied missForest, a random-forest-based imputation method, to impute missing international roughness index (IRI), structural number (SN), and pavement condition index (PCI) data in the Kenya paved road inventory and condition survey database. The database also contains complete data on region, road class, carriageway surface type, road usage, and visual surface condition rating. Imputation performance depends mainly on data distributions, missing data mechanisms, and correlations between variables, and less on other data features such as missing rates. The study therefore examined the distributions of the IRI, SN, and PCI variables and investigated the missing data mechanism in the dataset to confirm that missForest is applicable. It was found that the three variables follow highly skewed, complex distributions and that the missing data are missing not at random (MNAR). Applying missForest to 19 combinations of impute and predictor variables showed that imputing IRI, SN, and PCI together, with visual surface condition rating as the predictor variable, gave the most accurate imputation in terms of normalized root mean squared error (NRMSE). A reliability check of variable-wise missForest imputation in terms of mean squared error (MSE) revealed that the imputation was accurate for SN and PCI but not for IRI, owing to an extreme missing data rate of almost 90%. The study highlights that a low-cost visual pavement condition survey of the entire road network, combined with measurement of superior condition parameters on a sample of the network and subsequent data-driven imputation, sufficiently supports management decisions.
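The abstract compares imputations by NRMSE and MSE and attributes the poor IRI result to a missing rate near 90%. The following is a minimal sketch of how these quantities can be computed when the true values behind the missing entries are known, for example after deliberately hiding some observed values. It assumes the missForest-style NRMSE definition, sqrt(mean((x_true - x_imp)^2) / var(x_true)), with the mean and sample variance taken over the missing entries only. The function names `nrmse`, `mse`, and `missing_rate` and the sample values are hypothetical illustrations, not the authors' code or data. The sketch also does not reproduce missForest's own out-of-bag error estimate.

```python
import math
import statistics
from typing import Optional, Sequence


def _missing_indices(x_mis: Sequence[Optional[float]]) -> list[int]:
    """Positions that were missing (None) before imputation."""
    return [i for i, v in enumerate(x_mis) if v is None]


def missing_rate(x_mis: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in x_mis) / len(x_mis)


def mse(x_true: Sequence[float], x_imp: Sequence[float],
        x_mis: Sequence[Optional[float]]) -> float:
    """Mean squared error over the originally missing entries only."""
    idx = _missing_indices(x_mis)
    return statistics.fmean((x_true[i] - x_imp[i]) ** 2 for i in idx)


def nrmse(x_true: Sequence[float], x_imp: Sequence[float],
          x_mis: Sequence[Optional[float]]) -> float:
    """Assumed missForest-style NRMSE: sqrt(MSE / sample variance of the true
    values), both computed over the originally missing entries only."""
    idx = _missing_indices(x_mis)
    if len(idx) < 2:
        raise ValueError("need at least two missing entries to estimate variance")
    true_vals = [x_true[i] for i in idx]
    return math.sqrt(mse(x_true, x_imp, x_mis) / statistics.variance(true_vals))


if __name__ == "__main__":
    # Hypothetical IRI values (m/km): two entries hidden, then imputed as 3.5.
    x_true = [2.0, 3.0, 4.0, 5.0]
    x_mis = [2.0, None, None, 5.0]
    x_imp = [2.0, 3.5, 3.5, 5.0]
    print(round(nrmse(x_true, x_imp, x_mis), 4))  # 0.7071
    print(mse(x_true, x_imp, x_mis))              # 0.25
    print(missing_rate(x_mis))                    # 0.5
```

Because NRMSE divides by the variance of the true values, it lets variables on different scales (IRI, SN, PCI) be compared on one footing. MSE stays in each variable's squared units, which suits a per-variable reliability check.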

© 2024 Japan Society of Civil Engineers