Title: Methods for Supervised Machine Learning and Polygenic Risk Scores
Presented by Michael Anderson
Ph.D. Candidate in Biostatistics
Ph.D. Advisers: Weihua Guan and Saonli Basu
Abstract: Improvements in technology enable the collection and analysis of increasingly large datasets, which require some form of dimension reduction. For large genetic datasets, polygenic risk scores (PRS) aggregate summary statistics from genome-wide association studies (GWAS) into a single value summarizing genetic risk. We use PRS to explore associations between neuroblastoma and various traits. We also explore other dimension reduction techniques, such as gradient-based variable selection for nonlinear kernel supervised principal component analysis (PCA). Another problem commonly encountered in high-dimensional studies is confounding variables, such as batch effects, which can negatively impact dimension reduction. We consider modifications to supervised PCA and partial least squares (PLS) that include sparsity penalties as well as correction for confounding variables. Finally, we adapt supervised PCA and PLS to include group-wise variable selection penalties as well as sparse-group penalties. Together, these methods address several different problems encountered in high-dimensional data analysis.
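
As a minimal illustration of the aggregation idea in the abstract (not the presenter's method), a polygenic risk score is commonly computed as a weighted sum of an individual's risk-allele counts, with the weights taken from GWAS effect-size estimates. The function name, array shapes, and toy values below are assumptions made for illustration only.

```python
import numpy as np


def polygenic_risk_score(genotypes, effect_sizes):
    """Return one score per individual as a weighted sum of allele counts.

    genotypes:    (n_individuals, n_variants) allele counts (0, 1, or 2).
    effect_sizes: (n_variants,) per-variant weights, e.g. GWAS log odds ratios.
    Both inputs are hypothetical; real pipelines also handle allele matching,
    linkage disequilibrium, and p-value thresholds, which are omitted here.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if genotypes.ndim != 2 or effect_sizes.ndim != 1:
        raise ValueError("expected a 2-D genotype matrix and a 1-D weight vector")
    if genotypes.shape[1] != effect_sizes.shape[0]:
        raise ValueError("number of variants must match number of weights")
    # Matrix-vector product: score_i = sum_j genotypes[i, j] * effect_sizes[j]
    return genotypes @ effect_sizes


# Toy data (made up): two individuals, three variants.
toy_genotypes = np.array([[0, 1, 2],
                          [2, 0, 1]])
toy_effects = np.array([0.10, -0.05, 0.20])
scores = polygenic_risk_score(toy_genotypes, toy_effects)
print(scores)
```

Each row of the genotype matrix collapses to a single number, which is the sense in which a polygenic risk score acts as a form of dimension reduction.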