Mathematics Ph.D. Dissertations

Adaptive LASSO For Mixed Model Selection via Profile Log-Likelihood

Date of Award

2016

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Statistics

First Advisor

Junfeng Shang (Advisor)

Second Advisor

Lewis Fulcher (Other)

Third Advisor

Hanfeng Chen (Committee Member)

Fourth Advisor

John Chen (Committee Member)

Abstract

Linear mixed models describe the relationship between a response variable and a set of predictors for data that are grouped according to one or more clustering factors. A linear mixed model consists of both fixed effects and random effects. Fixed effects are the conventional linear regression coefficients, and random effects are associated with units that are drawn randomly from a population. By accommodating these two types of parameters, linear mixed models provide an effective and flexible way of representing both the mean and the covariance structure of the data. They have therefore become a primary tool for modeling correlated data and have received much attention in a variety of disciplines, including agriculture, biology, medicine, and sociology.
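
In its standard matrix form (a generic formulation given here for orientation; the dissertation's own notation may differ), a linear mixed model for the i-th cluster can be written as

\[
y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad
b_i \sim N(0, \Psi), \qquad
\varepsilon_i \sim N(0, \sigma^2 I_{n_i}),
\]

where \beta collects the fixed effects, b_i the random effects for cluster i, and \Psi their covariance matrix, so that \mathrm{Var}(y_i) = Z_i \Psi Z_i^{\top} + \sigma^2 I_{n_i}. Selecting random effects amounts to deciding which rows and columns of \Psi are zero, and selecting fixed effects to deciding which entries of \beta are zero.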

Due to the complex nature of linear mixed models, selecting only the important covariates so as to obtain an interpretable model becomes challenging as the dimension of the fixed or random effects increases. Thus, determining an appropriate structural form for a model to be used in making inferences and predictions is a fundamental problem in the analysis of longitudinal or clustered data with linear mixed models.

This dissertation focuses on selection and estimation for linear mixed models by integrating recent advances in model selection. More specifically, we propose a two-stage penalized procedure for selecting and estimating the important fixed and random effects. Compared with traditional subset selection approaches, penalized methods can enhance the predictive power of a model and can significantly reduce the computational cost when the number of variables is large (Fan and Li, 2001). Our proposed procedure differs from existing ones in the literature mainly in two aspects. First, the proposed method is composed of two stages that separately choose the parameters of interest, and it can therefore respect and accommodate the distinct properties of the random and fixed effects. Second, the use of profile log-likelihoods in the selection process makes the computation more efficient and stable, because fewer dimensions are involved.
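
As background (a generic sketch of the penalty, not the dissertation's exact formulation), the adaptive LASSO augments an objective such as a log-likelihood \ell(\theta) with a weighted \ell_1 penalty whose weights are built from an initial consistent estimate \tilde{\theta}:

\[
\hat{\theta} = \arg\max_{\theta}\; \ell(\theta) - \lambda \sum_{j} \hat{w}_j \lvert \theta_j \rvert,
\qquad \hat{w}_j = \lvert \tilde{\theta}_j \rvert^{-\gamma}, \; \gamma > 0,
\]

where \lambda > 0 is a tuning parameter. Large initial estimates receive small weights and are barely shrunk, while coefficients near zero receive large weights and are shrunk exactly to zero, which is what permits the oracle-type behavior discussed later in this abstract.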

In the first stage, we choose the random effects by maximizing the penalized restricted profile log-likelihood, and the maximization is carried out by the Newton-Raphson algorithm. Observe that if a random effect is a noise variable, then the corresponding variance components are all zero. Thus, we first estimate the covariance matrix of the random effects with the adaptive LASSO penalized method and then identify the important random effects from the estimated covariance matrix. In view of such a selection procedure, the selected random effects are invariant to the selection of the fixed effects. When a proper model for the covariance is adopted, the correct covariance structure is obtained and valid inferences for the fixed effects can then be achieved in the next stage. We further study the theoretical properties of the proposed procedure for random effects selection, proving that, with probability tending to one, the procedure identifies all true random effects.
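
A schematic version of this stage (an illustrative sketch only; the abstract does not spell out the exact penalty, so penalizing the diagonal entries of the covariance matrix is an assumption made here) is

\[
\hat{\Psi} = \arg\max_{\Psi \succeq 0}\;
\ell_R(\Psi) \;-\; \lambda_1 \sum_{k=1}^{q} \hat{w}_k\, \psi_{kk},
\qquad \hat{w}_k = \tilde{\psi}_{kk}^{-\gamma},
\]

where \ell_R is the restricted profile log-likelihood with the fixed effects profiled out, \psi_{kk} is the k-th diagonal entry of \Psi, and \tilde{\psi}_{kk} comes from an initial unpenalized fit. A random effect is declared a noise variable when its estimated variance component \hat{\psi}_{kk} is shrunk to zero, and the corresponding row and column of \hat{\Psi} are removed before the second stage.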

After the random effects selection is complete, in the second stage we select the fixed effects by maximizing the penalized profile log-likelihood, which involves only the regression coefficients. The optimization of the penalized profile log-likelihood is again solved by the Newton-Raphson algorithm. We then investigate the sampling properties of the resulting estimate of the fixed effects and show that it enjoys the model selection oracle properties, indicating that asymptotically the proposed approach recovers the subset of significant predictors. Once the two-stage penalized procedure is complete, the best linear mixed model can be determined and applied to correlated data in a variety of fields.
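
In the same schematic notation (again a sketch under the assumptions above, not the dissertation's exact objective), the second stage solves

\[
\hat{\beta} = \arg\max_{\beta}\;
\ell_P\bigl(\beta \mid \hat{\Psi}\bigr) \;-\; \lambda_2 \sum_{j=1}^{p} \hat{v}_j\, \lvert \beta_j \rvert,
\qquad \hat{v}_j = \lvert \tilde{\beta}_j \rvert^{-\gamma},
\]

where \ell_P is the profile log-likelihood with the variance components held at their stage-one estimates and \tilde{\beta} is an initial estimate of the regression coefficients. Fixed effects whose coefficients are shrunk exactly to zero are excluded, and the oracle property means the nonzero estimates behave asymptotically as if the true submodel had been known in advance.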

To illustrate the performance of the proposed method, extensive simulation studies have been conducted. The simulation results demonstrate that the proposed technique is efficient in selecting the best covariates and random-effects covariance structure in linear mixed models and generally outperforms existing selection methodologies. We finally apply the method to two real data applications to further examine its effectiveness in mixed model selection.
