SyNPar: A Data-Preservation Framework for High-Power False Discovery Rate Control in High-Dimensional Variable Selection
Presented by Jingyi Jessica Li, Ph.D.
Professor of Statistics and Data Science (Primary), Biostatistics, Computational Medicine, and Human Genetics (Secondary)
University of California, Los Angeles
Balancing false discovery rate (FDR) control and statistical power is a fundamental challenge in high-dimensional variable selection. Existing FDR control methods often perturb the original data, either by concatenating knockoff variables or by splitting the data, which can compromise power. In this paper, we introduce SyNPar, a novel framework that controls the FDR in high-dimensional variable selection while preserving the integrity of the original data. SyNPar generates synthetic null data using an inference model under the null hypothesis and identifies false positives through a numerical analog of the likelihood ratio test. We provide theoretical guarantees for FDR control at any desired level and show that SyNPar achieves asymptotically optimal power. The framework is versatile, straightforward to implement, and applicable to a wide range of statistical models, including high-dimensional linear regression, generalized linear models (GLMs), Cox models, and Gaussian graphical models. Through extensive simulations and real-world data applications, we demonstrate that SyNPar consistently outperforms state-of-the-art methods, such as knockoffs and data-splitting techniques, in terms of FDR control, statistical power, and computational efficiency.
A seminar tea will be held at 2:45 p.m. in University Office Plaza, Room 116. All are welcome.