Mathematics Ph.D. Dissertations

Title

Advancing Bechhofer's Ranking Procedures to High-dimensional Variable Selection

Date of Award

2021

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Mathematics/Mathematical Statistics

First Advisor

John Chen (Advisor)

Second Advisor

Lubomir Popov (Other)

Third Advisor

Junfeng Shang (Committee Member)

Fourth Advisor

Wei Ning (Committee Member)

Abstract

Model selection is a core topic in regression analysis, referring to a set of exploratory approaches for improving statistical models. It aims to select an optimal subset of predictor variables that still accounts for the variation in the response variable. When the chosen model is too small, it underfits: predictions or classifications on test data are poor, and the model has high bias and low variance. Conversely, an overly complex model is hard to interpret and has low bias and high variance. An optimal model should therefore balance the trade-off between bias and variance, that is, between complexity and simplicity (see, for instance, Bin Yu (2017)).
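For squared-error loss, this trade-off can be made precise with the standard decomposition of expected prediction error at a point $x_0$. The notation here is ours and is not taken from the dissertation. It assumes $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and it treats $\hat f$ as the fitted regression function:

```latex
\mathbb{E}\big[(y_0 - \hat f(x_0))^2\big]
  = \underbrace{\sigma^2}_{\text{irreducible noise}}
  + \underbrace{\big(\mathbb{E}[\hat f(x_0)] - f(x_0)\big)^2}_{\text{squared bias}}
  + \underbrace{\operatorname{Var}\big(\hat f(x_0)\big)}_{\text{variance}}
```

Dropping relevant predictors tends to inflate the squared-bias term. Adding unnecessary ones tends to inflate the variance term. Variable selection tries to keep the sum of the two small.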

In this dissertation, we explore a new approach to variable selection in regression analysis, based on the ideas of Bechhofer's (1954) population ranking procedure. First, we study this procedure in detail and assess whether it can be applied to hypothesis testing; we then propose a new population ranking approach based on the likelihood ratio test. Second, when heteroscedasticity is present, we refine Robert E. Bechhofer's (1954) two-sample ranking procedure to resolve the question of applicable sample sizes by adjusting the correlation structure and the population assumptions within the algorithm. Furthermore, we extend Bechhofer's procedure to rank population means by magnitude, and we apply the results to variable selection in regression models. Our simulations comparing this approach with the LASSO and the Relaxed LASSO suggest that ranking-procedure-based variable selection significantly outperforms the LASSO and performs at least as well as the Relaxed LASSO.
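As background for the procedure being refined, here is a minimal sketch of the classical single-stage Bechhofer (1954) rule. It assumes $k$ normal populations with a known common standard deviation $\sigma$. It is not the dissertation's heteroscedastic refinement or its variable-selection method.

The rule takes $n$ observations from each population and selects the one with the largest sample mean. The sample size $n$ is chosen so that the probability of correct selection is at least $P^*$ whenever the best mean exceeds the second best by at least $\delta^*$. Under the least favorable configuration this probability is $\int \Phi(z+h)^{k-1}\varphi(z)\,dz$ with $h = \delta^*\sqrt{n}/\sigma$.

The function names, the Simpson-rule integration, and the bisection solver below are illustrative choices of ours.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pcs_least_favorable(h, k, lo=-8.0, hi=8.0, steps=2000):
    """P(correct selection) under the least favorable configuration,
    integral of Phi(z + h)^(k-1) * phi(z) dz, by composite Simpson's rule
    (steps must be even)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * width
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * norm_cdf(z + h) ** (k - 1) * norm_pdf(z)
    return total * width / 3.0


def solve_h(k, p_star, tol=1e-8):
    """Bisection for h with pcs_least_favorable(h, k) == p_star.
    Meaningful only for 1/k < p_star < 1 (at h = 0 the probability is 1/k)."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcs_least_favorable(mid, k) < p_star:
            lo = mid
        else:
            hi = mid
    return hi


def bechhofer_sample_size(k, p_star, delta_star, sigma):
    """Common per-population sample size n so that P(CS) >= p_star
    whenever mu_[k] - mu_[k-1] >= delta_star (sigma known and common)."""
    h = solve_h(k, p_star)
    return math.ceil((h * sigma / delta_star) ** 2)


def select_best(samples):
    """Selection step: index of the population with the largest sample mean."""
    means = [sum(s) / len(s) for s in samples]
    return max(range(len(means)), key=means.__getitem__)
```

For $k = 2$ the integral reduces to $\Phi(h/\sqrt{2})$. With $P^* = 0.95$ this gives $h \approx 2.326$, so `bechhofer_sample_size(2, 0.95, 1.0, 1.0)` returns `6`. Unequal variances break the assumption of a common known $\sigma$, which is where the refinement described above comes in.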
