• Media type: E-Article
  • Title: Comprehensive Evaluation of a Sparse Dataset, Assessment and Selection of Competing Models
  • Contributor: Rezapour, Mahdi; Ksaibati, Khaled
  • Imprint: MDPI AG, 2020
  • Published in: Signals
  • Language: English
  • DOI: 10.3390/signals1020009
  • ISSN: 2624-6120
  • Keywords: General Medicine
  • Description: Because crashes carry tremendous economic and social costs, researchers have sought not only to identify the factors that affect crashes but also to estimate the associated coefficients as accurately as possible. Estimating model coefficients under an incorrect distributional assumption yields biased and erroneous results. This risk is especially acute when modeling skewed equivalent property damage only (EPDO) crash data with a preponderance of zeroes. EPDO is known to be poorly described by standard count distributions such as the Poisson or negative binomial. The issue is particularly pronounced for a mountainous state like Wyoming, which has very low traffic levels and a very high crash rate. In addition, the model included barriers that had not experienced any crashes but were geometrically under-designed, which added to the number of zero-count observations. Various models with different distributional characteristics were considered and compared in this study. The models were compared not only on goodness of fit but also on their estimated coefficients, to show how a wrong distributional assumption affects the parameter estimates. Because the objectives of this study are to use the results for optimization and to identify hazardous locations where future crashes may occur, the results underscore the importance of accurate model estimation and the consequences of failing to account for the right distribution. Based on several goodness-of-fit measures, a hurdle model was proposed to accommodate observations with zero crashes and to account for the sparse distribution of EPDO crashes in the state of Wyoming. In the hurdle model, binary logistic regression was used to account for observations with zero crashes, while a negative binomial model was used for non-zero observations. The findings of this study have direct implications for policymakers in Wyoming allocating limited funds, as the geometric characteristics of traffic barriers could be optimized in future studies.
  • Access State: Open Access
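
The two-part structure of the hurdle model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The NB2 parameterization (mean `mu`, dispersion `alpha`), the function names, and the idea of passing a single linear predictor `z` to the logistic part are all assumptions made for illustration; in the study both parts would depend on covariates such as barrier geometry and traffic, and the count part is assumed here to be zero-truncated, as is usual for hurdle models.

```python
import math


def logistic(z):
    """Inverse logit: maps a linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def nb_pmf(y, mu, alpha):
    """Negative binomial pmf in the NB2 form (assumed parameterization):
    mean mu > 0, dispersion alpha > 0, variance mu + alpha * mu**2."""
    r = 1.0 / alpha  # NB "size" parameter
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def hurdle_nb_pmf(y, z, mu, alpha):
    """pmf of a logit / negative binomial hurdle model.

    The logistic part gives the probability of clearing the hurdle
    (a non-zero count); positive counts then follow a negative binomial
    truncated at zero, so the two parts never compete for the zeros.
    """
    p_positive = logistic(z)
    if y == 0:
        return 1.0 - p_positive
    # Renormalize the NB pmf over y >= 1 (zero truncation).
    return p_positive * nb_pmf(y, mu, alpha) / (1.0 - nb_pmf(0, mu, alpha))


def hurdle_nb_mean(z, mu, alpha):
    """Expected count: P(positive) times the mean of the zero-truncated NB."""
    return logistic(z) * mu / (1.0 - nb_pmf(0, mu, alpha))
```

With hypothetical values such as `z = -1.0`, `mu = 3.0`, `alpha = 0.5`, `hurdle_nb_pmf(0, z, mu, alpha)` is governed entirely by the logistic part, which is what lets the model absorb a preponderance of zero-crash observations without distorting the fit to the non-zero counts.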