Mathematics Ph.D. Dissertations

Title

Ultra High Dimension Variable Selection with Threshold Partial Correlations

Date of Award

2022

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Statistics

First Advisor

John Chen (Committee Chair)

Second Advisor

Arrigo Michael (Other)

Third Advisor

Hanfeng Chen (Committee Member)

Fourth Advisor

Wei Ning (Committee Member)

Abstract

For variable selection in linear regression, partial correlation for normal models (Bühlmann, Kalisch and Maathuis, 2010) is a powerful alternative to penalized least squares approaches (LASSO, SCAD, etc.). The method was improved by Li, Liu and Lou (2015), who introduced the concept of threshold partial correlation (TPC) and extended it to elliptically contoured distributions. The TPC procedure has clear advantages over simple partial correlation in high or ultrahigh dimensional cases (where the dimension of the predictors grows at an exponential rate in the sample size). However, TPC converges slowly: the procedure usually takes a substantial amount of time to reach the final solution, especially in high or ultrahigh dimensional scenarios. In addition, the model assumptions of TPC are too strong, which suggests that the approach may be inconvenient to use in practice. To address these two issues, this dissertation puts forward a new model selection algorithm. It starts with an alternative definition of elliptically contoured distributions, which restricts the impact of the marginal kurtosis. This imposes a relatively weaker condition for the validity of the model selection algorithm. In simulations, the new approach is competitive with established methods such as LASSO and SCAD and also shows advantages in computing efficiency. The idea of the algorithm is extended to survival data and nonparametric inference by exploring various measures of correlation between the response variable and the predictors.
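
As background for the abstract, the following is a minimal sketch of the standard sample partial correlation that partial-correlation-based selection builds on. It is not the dissertation's algorithm and does not implement the TPC thresholding rule; the function name `partial_correlation`, the simulated data, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def partial_correlation(y, x, Z):
    """Sample partial correlation of y and x given the columns of Z.

    Hypothetical helper for illustration: correlates the residuals of y
    and x after each is regressed (with intercept) on Z.
    """
    n = len(y)
    # Design matrix: intercept plus conditioning variables (if any).
    D = np.column_stack([np.ones(n), Z]) if Z.size else np.ones((n, 1))
    # Least-squares residuals of y and x given D.
    ry = y - D @ np.linalg.lstsq(D, y, rcond=None)[0]
    rx = x - D @ np.linalg.lstsq(D, x, rcond=None)[0]
    return float(ry @ rx / np.sqrt((ry @ ry) * (rx @ rx)))

# Simulated example (assumed setup): y depends on z only, x is a noisy copy of z.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
x = z + 0.1 * rng.normal(size=n)
y = 2.0 * z + rng.normal(size=n)

marginal = float(np.corrcoef(y, x)[0, 1])              # large: x proxies for z
partial = partial_correlation(y, x, z.reshape(-1, 1))  # near zero given z
```

In this setup the marginal correlation between `y` and `x` is large while the partial correlation given `z` is close to zero, which illustrates why conditioning on already selected predictors can screen out redundant ones.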
