About
Our Mission
The UMN Genomic Data Commons (GDC) aims to improve the way researchers access, analyze, and collaborate on large-scale genetic and genomic datasets. By establishing a centralized infrastructure for data storage, curation, integration, and analysis, GDC seeks to overcome current challenges in reproducibility, standardization, and heterogeneity.
Our Goals
The Genomic Data Commons (GDC) has three major goals:
- Developing a centralized datastore for storing local and publicly available genomic data, pre-processed, harmonized, and integrated according to the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles for scientific data management.
- Developing a web interface/portal for end users to access some basic summary information about these datasets and submit requests for data analysis.
- Developing analytic pipelines to perform different genomic analyses utilizing these integrated datasets. The pipelines will be developed in Python or R and streamlined to provide platform-independent packages.
The GDC will focus on whole-genome, SNP array, and RNA-Seq data in the initial phase. The project aims to open this initial phase to early users by the end of 2023.
Our Vision
Our vision is to empower both established researchers and early-stage investigators with a comprehensive resource, providing streamlined access to public data, robust data processing and analysis, and opportunities for interdisciplinary collaboration. Ultimately, the GDC facility will accelerate scientific discovery, foster innovation, and inspire the next generation of researchers in the rapidly evolving field of genomics.
Our Values
- Collaboration: We want to bring together researchers from different fields who are working with genomic data.
- Standardization: We want to reduce barriers to research by providing standardized pipelines, consistent formats, and reproducible results.
- Empowerment: We want researchers at different stages in their work to contribute to the advancement of genomics research.
Our Services
- Public site to search for available data
- Access to datasets on MSI HPC systems
- Command line interface to data for batch analysis jobs
- API access to data for RStudio and Jupyter notebook interactive sessions
- Facilitated access to restricted data
- A data steward at GDC will help with access and compliance issues
- Grant writing help utilizing GDC datasets
- Assistance with genomic data analysis using GDC datasets