Simultaneous Inference for High Dimensional and Correlated Data
Date of Award
Doctor of Philosophy (Ph.D.)
John Chen (Advisor)
Marc Simon (Other)
Wei Ning (Committee Member)
Junfeng Shang (Committee Member)
In high-dimensional data, the number of covariates exceeds the sample size, which makes estimation challenging. We consider high-dimensional longitudinal data in which, at each time point, the number of covariates is much larger than the number of subjects. We consider two settings. In the first, the samples at different time points are generated from different populations. In the second, the samples at different time points are generated from a multivariate distribution. In both cases, the number of covariates is much larger than the sample size, so standard least squares methods are not applicable.
In a longitudinal study, our main focus is on the changes in the mean response over time and on how these changes relate to the explanatory variables. We are therefore interested in testing the effects of the covariates at all time points simultaneously. In the first scenario, we use the lasso at each time point to regress the response on the explanatory variables. Along with estimating the regression coefficients, the lasso also performs dimension reduction. We use the de-biased lasso for inference. To adjust for multiplicity in simultaneous testing, we apply Bonferroni, Holm’s, Hochberg’s, and the coherent stepwise procedures.
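For orientation, the lasso fit at a single time point and one common form of the de-biased lasso correction can be written as below. This is a sketch under assumed notation, not necessarily the exact construction used in the dissertation: the symbols n (subjects), X^{(t)} and y^{(t)} (design matrix and response at time point t), \lambda_t (tuning parameter), and \hat{\Theta}^{(t)} (an approximate inverse of the sample covariance matrix) are all introduced here for illustration.

```latex
% Lasso estimator at time point t (assumed notation)
\hat{\beta}^{(t)} = \arg\min_{\beta \in \mathbb{R}^{p}}
  \left\{ \frac{1}{2n} \bigl\| y^{(t)} - X^{(t)} \beta \bigr\|_2^2
          + \lambda_t \| \beta \|_1 \right\}

% De-biased lasso: add a one-step correction based on the residuals,
% where \hat{\Theta}^{(t)} approximates the inverse of
% \hat{\Sigma}^{(t)} = X^{(t)\top} X^{(t)} / n
\hat{b}^{(t)} = \hat{\beta}^{(t)}
  + \frac{1}{n} \, \hat{\Theta}^{(t)} X^{(t)\top}
    \bigl( y^{(t)} - X^{(t)} \hat{\beta}^{(t)} \bigr)
```

The correction term is what makes the components of \hat{b}^{(t)} approximately normal, so that a p-value can be attached to each coefficient before any multiplicity adjustment is applied.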
In the second scenario, the samples at different time points are generated from a multivariate distribution whose dimension equals the number of time points. We use the lasso and the de-biased lasso for inference. To adjust for multiplicity in simultaneous testing, we use Bonferroni, Holm’s, Hochberg’s, and stepwise procedures. We show theoretically that the Bonferroni, Holm’s step-down, and coherent stepwise procedures control the family-wise error rate in the strong sense for de-biased lasso estimators, whereas Hochberg’s procedure provides strong control of the family-wise error rate only for independent or positively correlated test statistics.
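The three classical adjustments named above can be illustrated with a minimal Python sketch. It assumes that one p-value per hypothesis has already been obtained (for example, from de-biased lasso test statistics). The function names and the example p-values are hypothetical and are not taken from the dissertation, and the coherent stepwise procedure is not sketched here.

```python
def bonferroni(pvals):
    """Bonferroni adjusted p-values: multiply each p-value by m, cap at 1."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values.

    Walk from the smallest p-value upward, using multipliers m, m-1, ..., 1
    and enforcing monotonicity with a running maximum.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted


def hochberg(pvals):
    """Hochberg step-up adjusted p-values.

    Walk from the largest p-value downward, using multipliers 1, 2, ..., m
    and enforcing monotonicity with a running minimum.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, i in enumerate(order):
        value = min(1.0, (rank + 1) * pvals[i])
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted


def reject(adjusted, alpha=0.05):
    """Indices of hypotheses rejected at family-wise level alpha."""
    return [i for i, p in enumerate(adjusted) if p <= alpha]


# Hypothetical p-values for four covariate effects (illustration only).
example_pvals = [0.01, 0.04, 0.03, 0.005]
holm_rejections = reject(holm(example_pvals))
hochberg_rejections = reject(hochberg(example_pvals))
```

On these illustrative p-values, Hochberg’s adjusted values are never larger than Holm’s, so Hochberg rejects at least as many hypotheses. That extra power is the reason its validity depends on independence or positive correlation among the test statistics.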
Polin, Afroza, "Simultaneous Inference for High Dimensional and Correlated Data" (2019). Mathematics Ph.D. Dissertations. 44.