Mathematics Ph.D. Dissertations


Regression Analysis for Zero Inflated Population Under Complex Sampling Designs

Date of Award


Document Type


Degree Name

Doctor of Philosophy (Ph.D.)



First Advisor

Hanfeng Chen (Advisor)

Second Advisor

Christopher Rump (Committee Member)

Third Advisor

Maria Rizzo (Committee Member)

Fourth Advisor

Junfeng Shang (Committee Member)


An underlying population may contain a large proportion of zero values, which causes the population distribution to spike at zero; such a population is referred to as a zero-inflated population. This type of population arises in many applications, such as insurance, meteorology, auditing, ecology, and manufacturing. A zero-inflated population is often analyzed via a two-component mixture model: a probabilistic mixture of a point mass at zero and a regular component with a specified probability distribution. Confidence interval problems for the zero-inflated population mean under regular models have been studied in the literature. Regression models have also been developed for zero-inflated populations. However, many of these models are aimed at count data, although regression models with continuous-type responses are seen more often in applications. Furthermore, these regression models for zero-inflated populations do not address situations in which the data available for analysis are obtained through complex probability sampling designs.
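
As a rough illustration of the two-component mixture idea, one common way to write such a model is sketched below. The notation is assumed here and is not taken from the dissertation: $p$ is the mixing probability of the regular component, $G(\cdot\,;\theta)$ is its distribution function, and $\mu(\theta)$ is its mean.

```latex
\[
  F(y) = (1 - p)\,\mathbf{1}\{y \ge 0\} + p\,G(y;\theta),
  \qquad
  \mu(\theta) = \int y \, dG(y;\theta),
  \qquad
  E(Y) = (1 - p)\cdot 0 + p\,\mu(\theta) = p\,\mu(\theta).
\]
```

Under this notation, the population mean depends on both the mixing probability and the parameters of the regular component, which is why inference for the mean has to account for both parts of the mixture.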

This dissertation investigates the estimation problem (both point and confidence-interval estimation) in generalized linear regression models (with continuous-type or discrete-type responses) associated with complex probability sampling designs. It develops the zero-inflated mixture (ZIM) regression model under complex sampling designs via a two-component mixture model in which the probability distribution of the non-zero component is assumed to be parametric. It proposes a maximum pseudo-likelihood procedure to estimate the expected responses at "future" covariate values or vectors. The limiting distribution of the pseudo-likelihood ratio statistic is derived, which establishes a large-sample theory for constructing confidence intervals for the expected responses. Simulation studies are conducted to assess the performance of the proposed procedure. The simulation results show that, under some complex probability sampling designs, the new confidence intervals based on the pseudo-likelihood function perform significantly better than the standard procedure. The proposed pseudo-likelihood procedure is applied to a real-life data set that has been analyzed by many other authors. Again, the confidence interval from the new procedure appears to be more useful than those from other classical procedures.
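
To make the idea of a maximum pseudo-likelihood point estimate more concrete, here is a minimal Python sketch. It is not the dissertation's procedure. It assumes, purely for illustration, that each sampled unit's log-likelihood is weighted by a design weight `w` (for example, an inverse inclusion probability), that the probability of a non-zero response follows a logistic model, and that the non-zero component is normal with a linear mean. All function names are hypothetical. Under these assumptions the weighted log-likelihood separates into a logistic part and a normal part, so the two parts can be maximized separately.

```python
import numpy as np


def fit_weighted_logistic(X, z, w, n_iter=50, tol=1e-10):
    """Maximize the weighted logistic log-likelihood by Newton-Raphson.

    X: (n, k) design matrix, z: 0/1 indicator of a non-zero response,
    w: (n,) design weights (assumed known from the sampling design).
    """
    gamma = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ gamma)))
        grad = X.T @ (w * (z - p))                       # weighted score
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])  # weighted information
        step = np.linalg.solve(hess, grad)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


def fit_weighted_normal(X, y, w):
    """Weighted least squares, which maximizes the weighted normal
    log-likelihood in the regression coefficients."""
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    resid = y - X @ beta
    sigma2 = np.sum(w * resid ** 2) / np.sum(w)
    return beta, sigma2


def zim_expected_response(X, y, w, x0):
    """Plug-in estimate of E(Y | x0) = P(Y != 0 | x0) * E(Y | Y != 0, x0)."""
    nonzero = y != 0
    gamma = fit_weighted_logistic(X, nonzero.astype(float), w)
    beta, _ = fit_weighted_normal(X[nonzero], y[nonzero], w[nonzero])
    p0 = 1.0 / (1.0 + np.exp(-(x0 @ gamma)))
    return p0 * (x0 @ beta)


if __name__ == "__main__":
    # Toy intercept-only example with made-up responses and weights.
    X = np.ones((4, 1))
    y = np.array([0.0, 0.0, 3.0, 5.0])
    w = np.array([1.0, 2.0, 1.0, 4.0])
    print(zim_expected_response(X, y, w, np.array([1.0])))
```

In the intercept-only toy case, the estimate reduces to the weighted mean of the responses, `sum(w * y) / sum(w)`, which gives a simple sanity check on the sketch. Constructing confidence intervals from a pseudo-likelihood ratio statistic, as the abstract describes, is beyond what this sketch attempts.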