Genomic Analysis of Human Chromosome Structure and Function
My research investigates two specific areas of human chromosome structure and function. The first is the epigenetics of human centromere formation and function, focusing on the DNA sequence requirements, chromatin domain structure and histone modifications found at human centromeres. The study of human centromeres is important to the basic biology of chromosome function and cell division, and has broad-ranging applications in human health and disease, including aneuploidy and birth defects, cell cycle regulation and cancer, and gene therapy. The long-term goal of this research is to understand the requirements for de novo centromere formation in human cells in order to create improved, mitotically stable Mammalian Artificial Chromosome (MAC) constructs for use as autonomous gene delivery vectors.
The second area of research is an experimental and bioinformatic analysis of the repetitive DNA structure of the human genome, concentrating on inverted repeats, tandem repeat arrays, and transposable elements. These abundant DNA elements make up nearly half of the human genome, and have had an enormous impact on our genome structure and evolution. The long-term goal of this research is to identify and classify these DNA elements in the human genome and elucidate their role in chromosome structure and function.
Genomic Analysis of Human Centromere Structure and Function
My lab takes several approaches to investigate human centromeres, including 1) examination of clinically ascertained cases of unusual chromosomal aberrations involving variant centromeres, 2) basic research into the organization of, and interactions between, centromeric DNA and kinetochore proteins and 3) improved methods for the construction and delivery of mammalian artificial chromosomes (MACs). We pride ourselves on developing and applying modern genomic, cytogenetic and molecular approaches to the study of human centromeres, including chromatin immunoprecipitation and genomic microarrays ("ChIP on a CHIP"), and combined immunofluorescence (IF) and fluorescent in situ hybridization (FISH) on metaphase chromosomes and interphase nuclei.
We are currently concentrating on the study of human neocentromeres, a rare class of new centromeres that form on chromosome arm fragments that would otherwise be acentric and rapidly lost, usually resulting in aneuploidy. Neocentromeres form on single-copy DNA and therefore permit investigation of centromeric chromatin structure in relation to the underlying DNA sequence, which is not possible at endogenous centromeres due to the large amounts of highly homologous, tandemly repeated alpha satellite DNA found there. Working with clinical cytogenetics labs worldwide, we have characterized a large collection of patient-derived cell lines that contain neocentromeres. A disproportionate number of these neocentromeres localized to chromosome band 13q32. Therefore, in order to directly identify neocentromere DNA, we constructed a genomic microarray (CHIP) that contained 126 overlapping BACs spanning 14 Mbp across chromosome band 13q32, the largest contiguous human genomic microarray yet described. We screened this CHIP with neocentromere DNA obtained by chromatin immunoprecipitation (ChIP) using antibodies to CENP-A, the centromere-specific histone H3 homologue. This revealed at least three distinct genomic regions, each several hundred kb in size, within 13q32 where neocentromeres have formed. These results suggest that neocentromere formation may be largely sequence independent and may occur by epigenetic mechanisms (Alonso et al., 2003).
We will continue to use our ChIP on a CHIP to further dissect the chromatin domain structure of neocentromeres, using antibodies to both kinetochore and heterochromatin proteins. We have constructed a next-generation CHIP containing 138 unique-sequence PCR fragments from within the neocentromere region, in order to map the organization of the CENP-A kinetochore chromatin at unprecedented resolution.
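To illustrate the kind of analysis that ChIP on a CHIP data calls for, the following minimal Python sketch scores each array probe by its log2(ChIP/input) ratio and reports runs of adjacent enriched probes as candidate CENP-A domains. It is not our actual analysis pipeline: the function names, the threshold of 1.0, the minimum run length, and all of the intensity values are hypothetical and chosen only for illustration.

```python
import math


def log2_ratios(chip, control):
    """Per-probe enrichment: log2(ChIP intensity / input intensity)."""
    return [math.log2(c / i) for c, i in zip(chip, control)]


def enriched_domains(ratios, threshold=1.0, min_probes=2):
    """Return inclusive (first, last) probe-index pairs for each run of
    at least `min_probes` consecutive probes with ratio >= threshold.

    Probes are assumed to be ordered by chromosomal position.
    The threshold and min_probes defaults are illustrative assumptions.
    """
    domains = []
    start = None
    for idx, ratio in enumerate(ratios):
        if ratio >= threshold:
            if start is None:
                start = idx  # open a new candidate domain
        else:
            if start is not None and idx - start >= min_probes:
                domains.append((start, idx - 1))
            start = None
    # Close a domain that runs to the end of the array.
    if start is not None and len(ratios) - start >= min_probes:
        domains.append((start, len(ratios) - 1))
    return domains


# Made-up intensities for ten probes ordered along the chromosome.
chip_signal = [100, 110, 95, 420, 510, 480, 105, 90, 300, 100]
input_signal = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]

ratios = log2_ratios(chip_signal, input_signal)
domains = enriched_domains(ratios)
print(domains)
```

With these made-up values, probes 3 through 5 form a single candidate domain, while the isolated enriched probe at index 8 is discarded because it falls below the minimum run length. Requiring adjacent probes to agree is one simple way to guard against single-probe noise; real array data would also need normalization and replicate handling.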
E. coli based mammalian artificial chromosome (MAC) vectors
My lab is developing improved methods for the construction and delivery of MAC vectors to human cells. As a graduate student, I pioneered studies demonstrating that transfected alpha satellite DNA is capable of forming de novo centromeres, a finding that has been confirmed extensively by other groups. The biggest hurdle to efficient MAC construction is the difficulty of manipulating and delivering large DNA constructs into cells in a controlled fashion. Therefore, we have developed a novel E. coli based vector system to manipulate and deliver large MAC vectors into human cells. An inducible homologous recombination system in the BAC host E. coli DH10B permits convenient and rapid modification of human genomic BACs. Expression of the Yersinia pseudotuberculosis invasin gene in these E. coli DH10B BAC strains enables them to invade mammalian cells and deliver the modified BAC DNA (Narayanan and Warburton, 2003). Thus, a human BAC, containing genes of interest along with their surrounding regulatory DNA sequences, can be converted into a MAC by the addition of alpha satellite DNA and selectable markers, and delivered directly into mammalian cells. This system will be used to investigate the requirements for human centromere formation, and to facilitate development of MACs as gene expression vectors. This project is supported by the NIH NIDDKD, initially through an R21 "pilot study" grant; a full NIDDKD R01 application submitted February 1, 2004 has been favorably reviewed (10.5 percentile) and will be funded starting January 2005.
Bioinformatics Analysis of Human Repetitive DNA
Another area of research being developed in my lab is a genomic analysis of the organization and evolution of human repetitive DNA, which accounts for ~45% of our DNA and has had a huge impact on the structure and function of our genome. My past experience in studying repetitive DNA at human centromeres makes this a niche in genome analysis that is well suited to my research program. Thus, in collaboration with Dr. Gary Benson, Boston University, we are developing novel bioinformatics software and genome analysis tools to examine the structure, organization and functions of three abundant and important classes of human repetitive DNA: inverted repeats, tandem repeat arrays, and transposable elements. We have developed and applied a unique computer algorithm, called Inverted Repeat Finder (IRF), that is capable of finding all inverted repeats (IRs) in the human genome sequence and indexing the results. After repeat masking of known interspersed repetitive elements, IRF identified ~22,000 low-copy IRs throughout the genome. Analysis of the 96 largest and most homologous IRs in the genome (≥8000 bp, ≥95% homology) revealed a remarkable prevalence (24 IRs, 25%) on the human X chromosome, which contains only ~5% of the genome. Of these, 11 contain genes predominantly expressed in the testes. These results on the X chromosome are remarkably similar to the inverted repeat structure found on the human Y chromosome (by David Page's lab). They suggest a possible role for these IRs in the regulation or maintenance of testes genes on sex chromosomes during germ cell development or meiosis (Warburton et al., 2004). The genomic analysis of the complete set of human IRs (their structure, location, organization, position relative to genes and replication origins, and ability to extrude into cruciforms) represents a large and important bioinformatics and molecular biology project that will provide considerable insight into genome structure and function.
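For readers unfamiliar with the term, an inverted repeat consists of a sequence "arm" followed, after an optional spacer, by the reverse complement of that arm on the same strand. The short Python sketch below finds exact inverted repeats of a fixed arm length in a toy sequence. It is only an illustration of the concept and is not the IRF algorithm, which handles imperfect homology and arms many kilobases long; the function names, the arm length of 6, and the spacer limit of 20 are all assumptions made for this example.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm_len=6, max_spacer=20):
    """Return (left_start, right_start, left_arm) for every exact inverted
    repeat with arms of length `arm_len` separated by 0..max_spacer bases.

    Toy exact-match search; arm_len and max_spacer are illustrative
    assumptions, not parameters of IRF.
    """
    seq = seq.upper()
    n_kmers = len(seq) - arm_len + 1

    # Index the start positions of every k-mer in the sequence.
    positions = defaultdict(list)
    for i in range(n_kmers):
        positions[seq[i:i + arm_len]].append(i)

    hits = []
    for i in range(n_kmers):
        arm = seq[i:i + arm_len]
        if set(arm) - set("ACGT"):
            continue  # skip masked or ambiguous bases such as N
        # The right arm must read as the reverse complement of the left arm.
        for j in positions.get(reverse_complement(arm), ()):
            spacer = j - (i + arm_len)
            if 0 <= spacer <= max_spacer:
                hits.append((i, j, arm))
    return hits


# Toy sequence: AGGTCA, a 3-base spacer, then its reverse complement TGACCT.
toy = "ACAGGTCATTTTGACCTAC"
hits = find_inverted_repeats(toy)
print(hits)
```

In the toy sequence the left arm AGGTCA starts at position 2 and its reverse complement TGACCT starts at position 11, so the search reports that single pair. Indexing k-mers in a dictionary keeps the search close to linear for small inputs, but a genome-scale search tolerant of mismatches requires a far more sophisticated approach.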
We thank the National Institute of General Medical Sciences (NIGMS), the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDKD), and the National Human Genome Research Institute (NHGRI), at the National Institutes of Health (NIH), for supporting this research.