Research Project: Understanding the Genetic Roots of Obesity Across Populations
Abstract
Description
We request access to the dbGaP datasets related to the Obesity-Diabetes Familial Risk, Viva La Familia Study for the purpose of developing and benchmarking a rare variant association pipeline designed for whole-genome sequencing (WGS) and whole-exome sequencing (WES) data. Our goal is to create a robust and scalable computational workflow that can accurately identify rare variants associated with complex diseases in both familial and non-familial contexts. The pipeline will integrate variant quality control, population stratification adjustment, and family-based statistical models to handle related individuals, as well as methods suitable for unrelated cohorts. The Viva La Familia dataset provides an ideal test case for method development because of its family-based structure and its focus on obesity and diabetes, two complex diseases of major public health importance.
The data will be used strictly for method and software development purposes. Only processed summary-level results (e.g., p-values, effect sizes, genomic coordinates of variants, gene-level burden statistics) will be generated, and these will adhere to dbGaP guidelines to prevent participant re-identification. No attempt will be made to identify individual participants. All raw data will remain stored securely on institutional servers behind firewalls, with access restricted to the PI and authorized personnel under controlled conditions. Data transfer and storage will comply with institutional and NIH security requirements. Upon expiration of the data access period, all raw data and backups will be securely destroyed.
We will acknowledge the contributing investigators and dbGaP in all software releases, presentations, and publications that result from the use of these data. Accession numbers and dataset version information will be included in any published work, in accordance with dbGaP policy.
We are developing a computational tool to study rare genetic changes that may play a role in diseases. To test and validate our tool, we need to work with data that include both family-based and non-family-based information. The Viva La Familia Study dataset is particularly useful because it focuses on Hispanic families with obesity and diabetes risk, which allows us to evaluate our methods in a real-world scenario where related individuals are included. Our work does not focus on the health of specific individuals. Instead, we aim to improve the methods researchers use to analyze genetic data. By testing and refining our pipeline, we hope to provide a more accurate way to detect rare genetic risk factors for complex diseases like obesity and diabetes. Ultimately, this may help future studies identify genetic elements that contribute to these conditions more effectively.
Keywords
Computer Sciences, Bioinformatics, Life Sciences, Molecular Biology and Genetics, Genetic Disorders, Population Biology, Population Genetics, Natural Sciences, Engineering and Technology, Engineering Computing & Technology (Eng), Computer Science Theory & Methods, Natural Sciences (Sci), Life Sciences (Life), Computer Science, Molecular Biology & Genetics, Natural Sciences General, Multidisciplinary Sciences, Developmental Biology, Embryology, Theoretical Computer Science, Computer Science Applications, Computer Science (miscellaneous), General Computer Science, Molecular Biology, Multidisciplinary, Health Sciences, Physical Sciences