Iowa State University Brigham Young University University of Georgia

Comparative BAC Alignments

Comparison of the homoeologous CesA region sequences from G. hirsutum

Published in Genome Research (2004), our comparison of the 105-kb gene-rich region surrounding the cellulose synthase gene (CesA1; pictured below) from the two genomes that make up G. hirsutum (AT and DT) was our first look into genome size evolution in Gossypium. The region is fairly gene-rich, with 14 shared genes predicted along its length. This gives an average gene density slightly lower than that of Arabidopsis but similar to that of rice. We also predicted four retrotransposons and two DNA transposons. Only one retrotransposon and one DNA transposon were shared between the two genomes.

Our first, and perhaps most striking, observation from this region was the extraordinary conservation of intergenic sequence, in terms of both sequence and length. This conservation ran counter to the prevailing view set by earlier microcolinearity studies, all of which found little to no conservation of intergenic space. It was tempting to attribute this lack of divergence to the relative youth of the genus. However, reports from the grasses indicated that 11 million years is sufficient to erase homology outside of genes, and in some cases only ½ to 1 million years is required.

Given the considerable conservation of intergenic space, we suspected that the mechanisms responsible for the two-fold genome size difference between the A and D genomes were not operating differentially in this region. Subsequent analyses confirmed this. Overall, this region did not bring us any closer to uncovering the mechanisms that affect genome size in Gossypium. It did, however, highlight an important property of genome size evolution in the genus. Genome size evolution in Gossypium must result from mechanisms that operate heterogeneously, expanding or contracting certain regions of the genomes while leaving others relatively unscathed.

The blue boxes on the diagram indicate predicted genes, and the green boxes indicate shared intergenic space. The grey boxes indicate intergenic space that is unique to one genome. Retrotransposons are labeled individually (rTE), and triangles denote predicted LTRs. DNA transposable elements are also labeled individually (POGO and Mutator), as is a cpDNA insertion of ycf2 origin. The middle panel plots sequence identity between the two BACs in a continuous window, scaled from 50% to 100%.
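In general terms, an identity panel like the one in the caption is a sliding-window percent-identity plot computed over a pairwise alignment of the two BACs. The page does not say how the windows were computed, so the following is only an illustrative sketch under stated assumptions. The two sequences are assumed to be already aligned, as equal-length strings with '-' marking gaps. Gap columns are treated as mismatches. The `window` and `step` values are hypothetical parameters, not those used for the published figure. Values below 50% are clamped to the plot floor to mimic the 50-100% scale.

```python
from typing import List, Sequence, Tuple


def window_identity(aln_a: str, aln_b: str, window: int = 100,
                    step: int = 10) -> List[Tuple[int, float]]:
    """Percent identity in sliding windows over one gapped pairwise alignment.

    aln_a and aln_b are rows of the same alignment: equal length, '-' marks a gap.
    Assumption: a column counts as identical only if both bases match and neither
    is a gap; gap columns stay in the denominator (they count as mismatches).
    Returns (window_start, percent_identity) pairs.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not aln_a:
        return []
    a, b = aln_a.upper(), aln_b.upper()
    results = []
    # Windows start every `step` columns up to the last full window; an alignment
    # shorter than `window` yields a single window. Trailing columns past the last
    # full window are not reported when `step` does not divide evenly.
    for start in range(0, max(len(a) - window, 0) + 1, step):
        cols = zip(a[start:start + window], b[start:start + window])
        matches = sum(1 for x, y in cols if x == y and x != "-")
        span = min(window, len(a) - start)
        results.append((start, 100.0 * matches / span))
    return results


def clamp_to_plot(values: Sequence[float], floor: float = 50.0,
                  ceiling: float = 100.0) -> List[float]:
    """Clamp identities to the 50-100% display range; lower values sit on the baseline."""
    return [min(max(v, floor), ceiling) for v in values]
```

Consider the aligned rows `"ACGTACGTAC"` and `"ACGTTCGA-C"` with `window=5, step=5`. The first window scores 80% identity and the second 60%. Counting gaps as mismatches is only one convention. Excluding gap columns from the denominator would instead raise the values in indel-rich windows.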

Comparison of the AdhA region from G. hirsutum (AT and DT), G. raimondii (D), and G. arboreum (A)

Because the CesA1 region raised as many questions as it answered, we sequenced a second region, surrounding the gene encoding alcohol dehydrogenase A (AdhA; pictured below and published in The Plant Journal, 2007). As before, we sequenced it from the two genomes of the tetraploid. This time we also sequenced it from the model diploid progenitors, for which genomic resources had since become available.

In this comparison, ~100 kb of shared sequence was obtained from the A and AT genomes and ~50 kb from the D and DT genomes. These sizes reflect the overall difference in genome size. The gene density of the region was about one-half to one-third of that observed in the CesA1 region. The major difference from the CesA1 region was the accumulation of transposable elements in the A and AT genomes, particularly gypsy elements (red).
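The two gene-density statements on this page can be connected with simple arithmetic. The sketch below converts density into average spacing between genes. It uses the CesA1 figures given earlier (14 shared genes in 105 kb) and the stated one-half to one-third ratio. The AdhA spacing it reports is only implied by those ratios, not a separately measured value. The helper name `kb_per_gene` is hypothetical.

```python
def kb_per_gene(region_kb: float, n_genes: int) -> float:
    """Average spacing between predicted genes, in kilobases per gene."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return region_kb / n_genes


# CesA1 region: 105 kb with 14 shared predicted genes.
cesa_spacing = kb_per_gene(105, 14)

# AdhA density is stated as 1/2 to 1/3 that of CesA1, so genes are 2x to 3x farther apart.
adha_spacing = (cesa_spacing * 2, cesa_spacing * 3)

print(f"CesA1: {cesa_spacing:.1f} kb/gene")
print(f"AdhA (implied): {adha_spacing[0]:.1f}-{adha_spacing[1]:.1f} kb/gene")
```

The output is about one gene per 7.5 kb for CesA1 and an implied one gene per 15.0-22.5 kb for AdhA. The AdhA region is roughly twice as long in A and AT as in D and DT, so its gene density also differs between genomes. The single ratio quoted on this page is taken at face value here.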

This region was congruent with what we would expect based on genome size, since transposable element accumulation is generally a primary contributor to genome size evolution. Here we see nearly five times the number and length of TEs in the A genomes as in the D genomes (~25-32 kb in A and AT versus ~5-7 kb in D and DT). There was evidence for one intra-strand homologous recombination event in the gypsy-rich region of the AT genome. This is consistent with the slightly less than additive genome size of the tetraploid. Increased illegitimate recombination may further contribute to the "genomic downsizing" experienced by the polyploid; in this region it was observed in the polyploid genomes relative to the diploid genomes.

All four genomes were evaluated for evidence of a bias in small indels. Only indels that could be polarized were used, i.e., those that occurred after diploid-polyploid divergence. This region did display a biased accumulation: the smaller genomes averaged more frequent and longer deletions than the larger genomes. This bias was further exaggerated by the tendency of the A-genome diploid to acquire longer insertions more frequently.
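Polarizing an indel means deciding which lineage changed: did one sequence gain the segment, or did the other lose it? The page does not describe the procedure used. The following is therefore a minimal sketch of one common parsimony approach. It assumes that an outgroup sequence can be scored for presence or absence of the indel segment, for instance D or DT when comparing A with AT. The function names, the `None` convention for an unscorable outgroup, and the (label, length) input format are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple


def polarize_indel(in_diploid: bool, in_polyploid: bool,
                   in_outgroup: Optional[bool]) -> str:
    """Parsimony call for an indel between a diploid and its polyploid homoeolog.

    Each flag says whether the indel segment is present in that sequence.
    Hypothetical convention: in_outgroup is None when the outgroup cannot be scored.
    """
    if in_diploid == in_polyploid:
        raise ValueError("both ingroup sequences share a state; not an indel")
    if in_outgroup is None:
        return "unpolarized"
    # The lineage whose state differs from the outgroup is the one that changed.
    changed = "diploid" if in_diploid != in_outgroup else "polyploid"
    # Segment present in the outgroup -> it was lost (deletion); absent -> gained (insertion).
    event = "deletion" if in_outgroup else "insertion"
    return f"{event} in {changed}"


def summarize(calls: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[int, float]]:
    """Count and mean length (bp) of polarized indels per label; unpolarized calls are skipped."""
    lengths = defaultdict(list)
    for label, length_bp in calls:
        if label != "unpolarized":
            lengths[label].append(length_bp)
    return {label: (len(v), float(mean(v))) for label, v in lengths.items()}
```

Tallying deletions and insertions per lineage in this way supports comparisons such as "the smaller genomes averaged more frequent and longer deletions". Indels whose outgroup state cannot be scored remain unpolarized and are left out of the summary. A single-outgroup parsimony call can still be misled by independent changes at the same site, so this is a simplification.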

This analysis further supported our supposition that genome size evolution is heterogeneous, even among what could be considered gene islands. The primary contributor to genome size change in the region was, inarguably, expansion of the A and AT genomes through transposable element proliferation, namely of gypsy elements. Realistically, however, this mechanism accounted for only about half of the observed difference in the region (about 25 kb of the 50 kb difference). The remainder was likely due to a variety of contributors, some of which were not yet known. The analysis did indicate that a bias in small indels and increased illegitimate recombination in the polyploid may contribute to genome size differences in this genus. Both warrant further investigation.

Multiple alignment of orthologous AdhA BACs from four different genomes (A, D, AT and DT; the latter two are co-resident in the nucleus of polyploid cottons). Numbered blue boxes are predicted genes; copia elements are in orange, gypsy elements in red and LINE elements in pink. Identifiable long terminal repeats (LTRs) are depicted by triangles. Continuous windows of sequence identity are shown between each pair of BACs. The middle plot shows sequence identity between the two BAC pairs (A and AT versus D and DT); all plots are scaled from 50 to 100%. Grey diamonds on the identity plots mark the locations of large (>400 bp), unpolarized indels between each diploid progenitor and its corresponding polyploid genome. The scale bar at the bottom indicates increments of 10 kb.
