Abstract:
Generalised Linear Models such as the Poisson and
Negative Binomial models are routinely used to model
count data. However, these models' assumptions are violated when the
data exhibit over-dispersion and zero-inflation. Over-dispersion
is often a result of excess zeros in the data. To model data with
such characteristics, several extensions of the Negative Binomial and
Poisson models have been proposed, such as zero-inflated and
hurdle models. Our study focuses on identifying the most
statistically fit model(s) that can be adopted in the presence of
over-dispersion and excess zeros in count data. We simulate
data sets at varying proportions of zeros and varying
levels of dispersion, then fit Poisson, Negative
Binomial, Zero-inflated Poisson, Zero-inflated Negative
Binomial, Hurdle Poisson and Hurdle Negative Binomial models to the data.
Model selection is based on AIC, log-likelihood, Vuong test statistics
and box plots. The results suggest that the Negative
Binomial Hurdle model performed well in most scenarios compared to
the other models and is hence the most statistically fit model for over-
dispersed count data with excess zeros.
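
As a minimal illustration of the simulate-then-compare workflow described in this abstract, the sketch below generates zero-inflated Poisson counts and compares a plain Poisson fit with a zero-inflated Poisson fit by AIC. It is not the study's own code: the sample size, zero proportion, Poisson mean, seed, the EM fitting routine, and all function names are assumptions chosen for illustration, and only two of the six models are covered.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_zip(n, pi_zero, lam, rng):
    """Simulate n counts: a structural zero with prob. pi_zero, else Poisson(lam)."""
    return [0 if rng.random() < pi_zero else rpois(lam, rng) for _ in range(n)]


def pois_logpmf(y, lam):
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def fit_poisson(y):
    """Poisson MLE (the sample mean) and the maximised log-likelihood."""
    lam = sum(y) / len(y)
    return lam, sum(pois_logpmf(v, lam) for v in y)


def zip_loglik(y, pi_zero, lam):
    total = 0.0
    for v in y:
        if v == 0:
            total += math.log(pi_zero + (1 - pi_zero) * math.exp(-lam))
        else:
            total += math.log(1 - pi_zero) + pois_logpmf(v, lam)
    return total


def fit_zip(y, iters=200):
    """Fit an intercept-only zero-inflated Poisson model by EM."""
    n, n0, total = len(y), sum(1 for v in y if v == 0), sum(y)
    pi_zero, lam = 0.5 * n0 / n, total / n  # crude starting values
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero.
        w = pi_zero / (pi_zero + (1 - pi_zero) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean.
        pi_zero = n0 * w / n
        lam = total / (n - n0 * w)
    return pi_zero, lam, zip_loglik(y, pi_zero, lam)


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


# Hypothetical scenario: 40% structural zeros, Poisson mean 3, n = 2000.
rng = random.Random(1)
y = simulate_zip(2000, 0.4, 3.0, rng)

mean_y = sum(y) / len(y)
var_y = sum((v - mean_y) ** 2 for v in y) / (len(y) - 1)
dispersion = var_y / mean_y  # values above 1 indicate over-dispersion

lam_pois, ll_pois = fit_poisson(y)
pi_hat, lam_hat, ll_zip = fit_zip(y)
aic_pois, aic_zip = aic(ll_pois, 1), aic(ll_zip, 2)

print(f"dispersion index: {dispersion:.2f}")
print(f"Poisson AIC: {aic_pois:.1f}  ZIP AIC: {aic_zip:.1f}")
```

The lower AIC identifies the preferred model for a given simulated data set. Repeating the simulation over a grid of zero proportions and dispersion levels, and adding the Negative Binomial and hurdle variants, would follow the same pattern.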