Mathematics Ph.D. Dissertations


Mixed Model Selection Based on the Conceptual Predictive Statistic

Date of Award


Document Type


Degree Name

Doctor of Philosophy (Ph.D.)


Mathematics/Mathematical Statistics

First Advisor

Junfeng Shang (Advisor)

Second Advisor

Hanfeng Chen (Committee Member)

Third Advisor

John Chen (Committee Member)

Fourth Advisor

Alexander Goberman (Committee Member)


Model selection plays an important role in the statistical literature. Its objective is to choose, from a potentially large class of candidate models, the most appropriate model: one that balances the improvement in fit against the increase in model complexity. To facilitate the selection process, a variety of model selection criteria have been developed.

The most popular model selection criteria are the Akaike Information Criterion (AIC, 1973, 1974) and the Bayesian Information Criterion (BIC, 1978). Over the past several decades, a number of additional model selection criteria have been proposed and investigated. An important one among these is Cp from Mallows (1973), which is based on the Gauss discrepancy.
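For orientation, the classical Cp for ordinary linear regression can be sketched as follows. This is the familiar textbook form, SSE_p / sigma2_hat - n + 2p with the error variance estimated from the largest candidate model; it is not one of the mixed-model variants developed in the dissertation. The function name `mallows_cp` and the simulated data are illustrative assumptions.

```python
import numpy as np

def mallows_cp(X_candidate, X_full, y):
    """Classical Mallows' Cp for a candidate design matrix (OLS setting).

    Assumption: the error variance is estimated from the largest model X_full.
    """
    n = len(y)

    def sse(X):
        # Residual sum of squares from an ordinary least-squares fit.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    p_full = X_full.shape[1]
    sigma2_hat = sse(X_full) / (n - p_full)
    p = X_candidate.shape[1]
    return sse(X_candidate) / sigma2_hat - n + 2 * p

# Hypothetical data: intercept plus three covariates, only the first matters.
rng = np.random.default_rng(0)
n = 50
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X_full[:, :2] @ np.array([1.0, 2.0]) + rng.normal(size=n)

cp_by_size = {k: mallows_cp(X_full[:, :k], X_full, y) for k in range(1, 5)}
```

By construction, Cp for the largest model equals its number of parameters, so candidates are usually judged by how close Cp is to p and by how small it is.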

In this dissertation, we focus on developing variants of Cp for linear mixed models. Linear mixed model theory has expanded greatly in recent years, resulting in its widespread application in many areas of research. Improving Cp in the linear mixed model setting should therefore make model selection in these applications more efficient and effective.

We propose model selection criteria for linear mixed models that follow Mallows' Cp (1973) statistic. The first proposed criterion is the marginal Cp, called MCp, which we derive from the expected Gauss discrepancy. For a set of candidate models that includes the true model, we adopt a consistent estimator of the correlation matrix of the response data. We then construct a matrix in the linear mixed model setting and prove that it is idempotent, which leads to MCp, an asymptotically unbiased estimator of the expected Gauss discrepancy between a candidate model and the true model. An improvement of MCp, called IMCp, is then proposed and is also proved to be an asymptotically unbiased estimator of the expected Gauss discrepancy. In the simulation study, a set of increasing correlation coefficients in the correlation matrix of the response variable is employed to demonstrate the performance of MCp and IMCp. The simulated data are generated with different sample sizes to investigate the effect of sample size on the performance of the proposed criteria. The simulation results illustrate that, under suitable conditions, the proposed criteria outperform AIC and BIC in selecting the correct model. IMCp performs best when the maximum likelihood estimator (MLE) is used. Additionally, the proposed criteria perform significantly better for highly correlated response data than for weakly correlated data.
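A simulation with a controlled level of correlation in the response might be set up as in the sketch below. The abstract does not specify the simulation design, so everything here is an assumption made for illustration: a random-intercept model, a compound-symmetric within-cluster correlation, and the hypothetical function names `simulate_random_intercept` and `compound_symmetric`.

```python
import numpy as np

def compound_symmetric(n_i, rho):
    """Correlation matrix with 1 on the diagonal and rho elsewhere."""
    return (1.0 - rho) * np.eye(n_i) + rho * np.ones((n_i, n_i))

def simulate_random_intercept(m, n_i, beta, rho, rng, sigma2=1.0):
    """Simulate m clusters of n_i observations each.

    Assumption: y = X beta + b[cluster] + e, with variances chosen so that
    the within-cluster correlation of y given X equals rho.
    """
    beta = np.asarray(beta, dtype=float)
    sigma2_b = rho * sigma2           # random-intercept variance
    sigma2_e = (1.0 - rho) * sigma2   # residual variance
    N = m * n_i
    X = np.column_stack([np.ones(N), rng.normal(size=(N, len(beta) - 1))])
    groups = np.repeat(np.arange(m), n_i)
    b = rng.normal(scale=np.sqrt(sigma2_b), size=m)
    e = rng.normal(scale=np.sqrt(sigma2_e), size=N)
    y = X @ beta + b[groups] + e
    return X, y, groups

# Hypothetical grid of increasing correlations and two sample sizes.
rng = np.random.default_rng(1)
datasets = {
    (m, rho): simulate_random_intercept(m, 4, [1.0, 0.5, 0.0], rho, rng)
    for m in (10, 50)
    for rho in (0.2, 0.5, 0.8)
}
```

Each simulated data set could then be passed to the candidate models and the criteria under comparison; that step is omitted here because the abstract does not give the criteria in closed form.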

The second proposed criterion is the conditional Cp, called CCp, which we derive based on the conditional mean of the response variable. Corresponding to the cases where the covariance matrix is known and unknown, we derive two versions of the conditional Cp, called TCCp and CCp, respectively, both based on the expected conditional Gauss discrepancy. When the covariance matrix is known, TCCp is an unbiased estimator of the expected conditional Gauss discrepancy; when the covariance matrix is unknown, CCp is an asymptotically unbiased estimator of the expected conditional Gauss discrepancy. In estimation, the best linear unbiased predictor (BLUP) is employed. The simulation results demonstrate that when the true model includes significant fixed-effects variables, both TCCp and CCp are effective in selecting the correct model. When the variance components are unknown, the penalty term in CCp, computed from the estimated effective degrees of freedom, yields a very good approximation to the bias correction between the target discrepancy and the goodness-of-fit part of the proposed criteria.
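One common notion of effective degrees of freedom in a linear mixed model is the trace of the matrix that maps the response to the BLUP-based fitted conditional mean. The sketch below computes it through Henderson's mixed model equations, assuming independent random effects with variance `sigma2_b`, independent errors with variance `sigma2_e`, and known (or plugged-in) variance components. Whether the dissertation uses exactly this definition is not stated in the abstract; the function name `effective_df` is hypothetical.

```python
import numpy as np

def effective_df(X, Z, sigma2_e, sigma2_b):
    """Trace of H, where H y = X beta_hat + Z b_hat (BLUP fit).

    Assumption: Cov(b) = sigma2_b * I and Cov(e) = sigma2_e * I.
    """
    C = np.hstack([X, Z])
    p, q = X.shape[1], Z.shape[1]
    # Penalty applies to the random effects only (mixed model equations).
    penalty = np.zeros((p + q, p + q))
    penalty[p:, p:] = (sigma2_e / sigma2_b) * np.eye(q)
    H = C @ np.linalg.solve(C.T @ C + penalty, C.T)
    return float(np.trace(H))

# Hypothetical random-intercept design: 8 clusters of 5 observations.
rng = np.random.default_rng(2)
m, n_i = 8, 5
groups = np.repeat(np.arange(m), n_i)
X = np.column_stack([np.ones(m * n_i), rng.normal(size=m * n_i)])
Z = (groups[:, None] == np.arange(m)[None, :]).astype(float)

edf_weak = effective_df(X, Z, sigma2_e=1.0, sigma2_b=0.1)
edf_strong = effective_df(X, Z, sigma2_e=1.0, sigma2_b=10.0)
```

Under these assumptions the value lies between the number of fixed effects (random effects shrunk to zero) and the rank of the combined design (no shrinkage), and it grows as the random-effect variance increases relative to the error variance.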