Mathematics Ph.D. Dissertations

Methodologies for Missing Data with Range Regressions

Date of Award

2019

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Statistics

First Advisor

John Chen (Advisor)

Second Advisor

Wei Ning (Committee Member)

Third Advisor

Craig Zirbel (Committee Member)

Fourth Advisor

Helen Michaels (Other)

Abstract

A primary focus of this dissertation is to draw large-sample inferences about a response variable that is subject to missingness. When some responses are missing and the missingness depends on the response variable, the sample mean of the non-missing responses is, in general, a biased estimator of the population mean.

There are, however, historical mean estimators that can circumvent the bias. Examples include the inverse propensity weighted, regression, double-robust, stratification, and empirical likelihood estimators. To obtain an appropriate estimate of the target population mean, these methods place greater weight on non-missing observations that were likely to be missing. We review the historical estimators, and we propose new estimators and methodologies for mean estimation and beyond. The consistency of each estimator depends primarily on the existence of non-missing covariates, the missing at random assumption, and a correctly specified model relating the covariates to the missing behavior or the response, each of which is discussed.
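As a rough illustration of how the inverse propensity weighted, regression, and double-robust estimators adjust for missing responses, the following Python sketch compares them with the naive complete-case mean on simulated data. Everything in it is an assumption made for illustration and is not taken from the dissertation: the linear data-generating model, the logistic propensity (treated as known), and all function names.

```python
import numpy as np

def ipw_mean(y, observed, propensity):
    """Inverse propensity weighted mean: observed responses weighted by 1/propensity."""
    y_filled = np.where(observed, y, 0.0)  # missing responses contribute zero
    return np.mean(y_filled / propensity)

def linear_predictions(x, y, observed):
    """Fit y ~ 1 + x on the observed units and predict for every unit."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)
    return X @ beta

def regression_mean(x, y, observed):
    """Regression estimator: average the fitted values over all units."""
    return np.mean(linear_predictions(x, y, observed))

def double_robust_mean(x, y, observed, propensity):
    """Regression estimate plus an inverse-propensity-weighted residual correction."""
    m = linear_predictions(x, y, observed)
    resid = np.where(observed, y - m, 0.0)
    return np.mean(m + resid / propensity)

# Hypothetical simulation: the true population mean of the response is 2.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y_full = 2.0 + 3.0 * x + rng.normal(size=n)
propensity = 1.0 / (1.0 + np.exp(-(0.5 + x)))  # P(observed | x), assumed known
observed = rng.random(n) < propensity
y = np.where(observed, y_full, np.nan)  # missing responses stored as NaN

complete_case = y[observed].mean()
ipw = ipw_mean(y, observed, propensity)
reg = regression_mean(x, y, observed)
dr = double_robust_mean(x, y, observed, propensity)
print(complete_case, ipw, reg, dr)
```

In this setup, units with larger covariate values are more likely to be observed, so the complete-case mean should drift above the true mean of 2, while the three adjusted estimates should land close to it.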

Among our proposals are new double-robust estimators that attain lower variance than historical methods when the regression or propensity function is known and that perform competitively when the regression and propensity functions are estimated. Additionally, we detail bootstrap approaches that enable researchers to efficiently draw inferences beyond mean estimation.
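To make the idea of bootstrap inference beyond the mean concrete, here is a minimal sketch of a generic percentile bootstrap for an inverse propensity weighted estimate of P(Y <= t). It is not the dissertation's bootstrap procedure; the simulated data, the known logistic propensity, the threshold, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def ipw_cdf(y, observed, propensity, t):
    """Inverse propensity weighted estimate of P(Y <= t)."""
    y_filled = np.where(observed, y, 0.0)   # avoid comparing NaN values
    indicator = observed & (y_filled <= t)  # only observed units can count
    return np.mean(indicator / propensity)

def percentile_bootstrap(y, observed, propensity, t, n_boot=300, alpha=0.05, seed=0):
    """Resample units with replacement and return a percentile interval."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap resample of unit indices
        draws[b] = ipw_cdf(y[idx], observed[idx], propensity[idx], t)
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2])

# Hypothetical data: Y is symmetric about 2, so the true P(Y <= 2) is 0.5.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
y_full = 2.0 + 3.0 * x + rng.normal(size=n)
propensity = 1.0 / (1.0 + np.exp(-(0.5 + x)))  # assumed known
observed = rng.random(n) < propensity
y = np.where(observed, y_full, np.nan)

point = ipw_cdf(y, observed, propensity, t=2.0)
lo, hi = percentile_bootstrap(y, observed, propensity, t=2.0)
print(point, lo, hi)
```

Resampling whole units (response, missingness indicator, and propensity together) keeps the dependence between them intact in each bootstrap sample.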

Furthermore, we not only rework range regression for missing response variables but also develop nonparametric range regression, which models the average rank versus each bin. We argue that the average rank is superior to the median and mean for measuring trends among the bins, particularly when researchers seek distributional superiority or when the sample mean is not guaranteed to converge, e.g., under a Cauchy-distributed response. In doing so, we define ascendancy, a measure of pairwise distributional superiority. We then interpret the relationship between the average rank and the ascendancy of a conditional response over a benchmark random variable whose distribution is a mixture of the cumulative distribution functions of the underlying populations; the relationship is made apparent through an alternative calculation of the average rank that we derive. We also provide bootstrap approaches that enable researchers to fit nonparametric range regression and range regression under missing response variables.
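The following Python sketch illustrates the average rank per bin for a Cauchy-distributed response with no missing values. It also checks a standard identity that writes a bin's average rank as a size-weighted sum of pairwise "probability of exceeding" estimates. The abstract does not define ascendancy, so the pairwise quantity below is only a stand-in, and the identity may or may not be the alternative calculation the author has in mind. The bin edges, the simulated data, and the function names are assumptions.

```python
import numpy as np

def midranks(values):
    """1-based ranks in the pooled sample, with ties sharing their average rank."""
    v = np.asarray(values, dtype=float)
    less = (v[:, None] > v[None, :]).sum(axis=1)    # pooled values strictly below
    equal = (v[:, None] == v[None, :]).sum(axis=1)  # ties, including the value itself
    return 0.5 + less + 0.5 * equal

def pairwise_superiority(a, b):
    """Estimate P(A > B) + 0.5 * P(A = B) from all pairs (a_i, b_j)."""
    gt = (a[:, None] > b[None, :]).mean()
    eq = (a[:, None] == b[None, :]).mean()
    return gt + 0.5 * eq

# Hypothetical data: the location of a Cauchy response increases with the covariate.
rng = np.random.default_rng(1)
n = 1500
x = rng.uniform(0.0, 3.0, size=n)
y = x + rng.standard_cauchy(n)

edges = np.array([1.0, 2.0])  # assumed bins: [0, 1), [1, 2), [2, 3)
bins = np.digitize(x, edges)
labels = np.unique(bins)

ranks = midranks(y)
avg_ranks = np.array([ranks[bins == b].mean() for b in labels])

# Same quantity via pairwise comparisons of each bin against every bin.
groups = [y[bins == b] for b in labels]
via_pairs = np.array([
    0.5 + sum(len(g) * pairwise_superiority(gk, g) for g in groups)
    for gk in groups
])
print(avg_ranks, via_pairs)
```

Because ranks depend only on the ordering of the responses, the per-bin average rank stays well behaved here even though the Cauchy distribution has no mean.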
