Mathematics Ph.D. Dissertations
Efron’s Method on Large Scale Correlated Data and Its Refinements
Date of Award
2023
Document Type
Dissertation
Degree Name
Doctor of Philosophy (Ph.D.)
Department
Statistics
First Advisor
John Chen (Committee Chair)
Second Advisor
Alexis Ostrowski (Other)
Third Advisor
Riddhi Ghosh (Committee Member)
Fourth Advisor
Umar Islambekov (Committee Member)
Abstract
This dissertation develops new methodology for multiple testing of hypotheses on large-scale, correlated data, where error rate control is critical. Research toward this goal requires a rigorous treatment of a thorny concept: strong control of the familywise error rate (FWER). Published work in this area commonly avoids the issue by adopting weaker criteria, such as weak control of the FWER or control of the false discovery rate. In contrast to these conventional approaches, we tackle the problem directly under strong control of the FWER. The dissertation starts with Efron’s data, an inference problem on 7128 genes of 72 patients (47 with acute lymphoblastic leukemia and 25 with acute myeloid leukemia). It first discusses a method that controls the false discovery rate by empirically estimating the correlation parameter, and then lays out the fundamental terminology needed for research on multiple inference. Following a review of the current literature, one distinct feature of the dissertation is a set of multiple testing procedures for odds ratios when several populations are of interest. When the joint distribution of a cluster of subsequent populations is approximately available, for example through the Cochran-Mantel-Haenszel statistic, a sequential testing method with strong control of the FWER is proposed. The new method outperforms the traditional Holm procedure (which also strongly controls the FWER) in the sense that it substantiates every significant discovery detected by the latter. A second feature of the dissertation is a sequential testing procedure for comparing odds ratios, which gives rise to a general stepwise exact inference procedure that strongly controls the FWER. The new procedure is robust and versatile in both parametric and nonparametric settings. 
When the new procedure was used with the Jonckheere-Terpstra test, it clearly improved power, as shown in a simulation study. The new procedure was also applied to a real-life CDC dataset on the effect of age on binge alcoholism. The analysis shows that the rate of binge alcoholism increases steadily across the 18-34 age group. Finally, the dissertation returns to the large-scale correlated data presented in Efron’s paper, where the new procedure proposed in this dissertation yields more intrinsic inference outcomes. Specifically, the new method was combined with a normality bootstrapping method, and the outcome greatly enhances earlier analytic results on the gene expression data. An implementation that instead uses a nonparametric bootstrapping method on the same data further highlights the robustness of the new procedure.
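For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the traditional Holm step-down procedure, which strongly controls the FWER. It is not the dissertation's new method, which is not specified here. The function name `holm_reject`, the default level `alpha=0.05`, and the p-values in the usage lines are illustrative assumptions, not taken from the dissertation.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure (textbook baseline, not the dissertation's method).

    Returns a list of booleans, in the original order of p_values,
    that is True wherever the corresponding hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        # At 0-based step k, the k-th smallest p-value is compared
        # with alpha / (m - k).
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            # Step-down rule: stop at the first non-rejection.
            break
    return reject


# Hypothetical p-values, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
decisions = holm_reject(example_p, alpha=0.05)
print(decisions)
```

With four hypotheses, the smallest p-value is compared with alpha/4, the next with alpha/3, and so on, and testing stops at the first p-value that exceeds its threshold.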
Recommended Citation
Ghoshal, Asmita, "Efron’s Method on Large Scale Correlated Data and Its Refinements" (2023). Mathematics Ph.D. Dissertations. 94.
https://scholarworks.bgsu.edu/math_diss/94