Developing innovative methods to study the human genome

July 27, 2022

The human genome is the complete set of a person's DNA. It consists of a long sequence of over 3 billion nucleotides arranged in a specific order to form thousands of genes. These genes, in turn, encode proteins, which are the building blocks of each cell. Each person's genome carries a unique genetic code and contributes to various complex traits and common diseases. Genetic variations, or mutations, are differences in this code that arise when the genome sequence is altered by small or large changes.

Structural variations (SVs) are medium and large genomic rearrangements that change more than 50 base pairs (pairs of nucleotides). These rearrangements of the genomic sequence may take the form of insertions, deletions, or inversions of nucleotides. A growing body of evidence shows that SVs in a person's genome are a major contributing factor to diseases (such as cancer) as well as neurodevelopmental conditions (like autism). Much about SVs is still unknown, including their diversity, complexity, distribution in the population, and exact impact on biology. These are some of the reasons that my lab is conducting a major project, funded by a National Science Foundation CAREER Award, to develop new computational methods to study SVs.

Recent advances in sequencing technology have allowed us to investigate the complexity of SVs and their direct contribution to different diseases and traits. To fully investigate these variants, we need accurate and efficient computational approaches to discover and characterize different types of complex SVs. The goal of our project is to develop novel methods that give researchers the tools they need to better capture the diversity of SVs and study their contribution to biology.

As part of this project, novel methods will be developed for efficient and accurate genotyping of any type of SV. These tools will be directly applicable to studying the impact of both common and rare SVs on neurodevelopmental conditions such as autism. To establish the utility of these methods, we will analyze publicly available whole-genome sequence data. This project will also achieve a broader impact by providing training opportunities for both undergraduate and graduate students interested in computational genomics. The results of the projects will be available at 

Infographic alt text: The image describes how DNA is made up of 4 letters, called nucleotides, that combine to make genes. Small or large changes to the genome (like inserting, deleting, or rearranging nucleotides) can have an impact, but the particular impacts are still unknown.