Mathematics Ph.D. Dissertations

Title

Linear Mixed Model Selection Via Minimum Approximated Information Criterion

Date of Award

2020

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Statistics

First Advisor

Junfeng Shang (Advisor)

Second Advisor

Melissa Kary Miller (Other)

Third Advisor

Hanfeng Chen (Committee Member)

Fourth Advisor

Wei Ning (Committee Member)

Abstract

The analyses of correlated, repeated measures, or multilevel data with a Gaussian response are often based on models known as the linear mixed models (LMMs). LMMs are modeled using both fixed effects and random effects. The random intercepts (RI) and random intercepts and slopes (RIS) models are two exceptional cases from the linear mixed models that are taken into consideration. Our primary focus in this dissertation is to propose an approach for simultaneous selection and estimation of fixed effects only in LMMs.

This dissertation, inspired by recent research of methods and criteria for model selection, aims to extend a variable selection procedure referred to as minimum approximated information criterion (MIC) of Su et al. (2018). Our contribution presents further use of the MIC for variable selection and sparse estimation in LMMs. Thus, we design a penalized log-likelihood procedure referred to as the minimum approximated information criterion for LMMs (lmmMAIC), which is used to find a parsimonious model that better generalizes data with a group structure. Our proposed lmmMAIC method enforces variable selection and sparse estimation simultaneously by adding a penalty term to the negative log-likelihood of the linear mixed model. The method differs from existing regularized methods mainly due to the penalty parameter and the penalty function.

With regards to the penalty function, the lmmMAIC mimics the traditional Bayesian information criterion (BIC)-based best subset selection (BSS) method but requires a continuous or smooth approximation to the L0 norm penalty of BSS. In this context, lmmMAIC performs sparse estimation by optimizing an approximated information criterion, which substantially requires approximating that L0 norm penalty of BSS with a continuous unit dent function. A unit dent function, motivated by bump functions called mollifiers (Friedrichs, 1944), is an even continuous function with a [0, 1] range. Among several unit dent functions, incorporating a hyperbolic tangent function is most preferred. The approximation changes the discrete nature of the L0 norm penalty of BSS to a continuous or smooth one making our method less computationally expensive. Besides, the hyperbolic tangent function has a simple form and it is much easier to compute its derivatives. This shrinkage-based method fits a linear mixed model containing all p predictors instead of comparing and selecting a correct sub-model across 2p candidate models. On this account, the lmmMAIC is feasible for high-dimensional data. The replacement, however, does not enforce sparsity since the hyperbolic tangent function is not singular at its origin. To better handle this issue, a reparameterization trick of the regression coefficients is needed to achieve sparsity.

For a finite number of parameters, numerical investigations demonstrated by Shi and Tsai (2002) prove that traditional information criterion (IC)-based procedure like BIC can consistently identify a model. Following these suggestions of consistent variable selection and computational efficiency, we maintain the BIC fixed penalty parameter. Thus, our newly proposed procedure is free of using the frequently applied practices such as generalized cross validation (GCV) in selecting an optimal penalty parameter for our penalized likelihood framework. The lmmMAIC enjoys less computational time compared to other regularization methods.

We formulate the lmmMAIC procedure as a smooth optimization problem and seek to solve for the fixed effects parameters by minimizing the penalized log-likelihood function. The implementation of the lmmMAIC involves an initial step of using the simulated annealing algorithm to obtain estimates. We proceed using these estimates as starting values by applying the modified Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm until convergence. After this step, we plug estimates obtained from the modified BFGS into the reparameterized hyperbolic tangent function to obtain our fixed effects estimates. Alternatively, the optimization of the penalized log-likelihood can be solved using generalized simulation annealing.

Our research explores the performance and asymptotic properties of the lmmMAIC method by conducting extensive simulation studies using different model settings. The numerical results of our simulations for our proposed variable selection and estimation method are compared to other standard LMMs shrinkage-based methods such as Lasso, ridge, and elastic net. The results provide evidence that lmmMAIC is more consistent and efficient than the existing shrinkage-based methods under study. Furthermore, two applications with real-life examples are illustrated to evaluate the effectiveness of the lmmMAIC method.

COinS