Comparative BAC Alignments
Comparison of the homoeologous CesA region sequences from G. hirsutum
Published in Genome Research (2004), the 105 kb gene-rich region surrounding cellulose synthase (CesA1; pictured below)
from the two genomes that comprise G. hirsutum (AT and DT) was our first look into genome size evolution
in Gossypium. This region was fairly gene rich, with 14 shared genes predicted along this length, an average
gene density that is slightly lower than that of Arabidopsis but similar to that of rice. Also predicted were four
retrotransposons and two DNA transposons, with only one retrotransposon and one DNA transposon shared between the
two genomes.
Our first, and perhaps most striking, observation from this region was the extraordinary conservation of intergenic
sequence, both in terms of sequence and length. The conservation demonstrated by this region contrasted with the dogma
laid down by prior microcolinearity studies, all of which displayed little to no conservation of intergenic space.
While it was tempting to attribute this lack of divergence to the relative youth of the genus, reports from the
grasses indicated that 11 million years is sufficient to remove homology outside of genes, and in some cases only
½ to 1 million years is required.
Given the considerable conservation of intergenic space, we suspected that the mechanisms that generated the
two-fold genome size difference between the A and D genomes were not operating differentially in this region,
which was confirmed by subsequent analyses. Overall, this region did not bring us any closer to uncovering the
mechanisms that affect genome size in Gossypium; it did, however, highlight one property of genome size
evolution in the genus: it must be the result of heterogeneously operating mechanisms that expand or contract
certain regions of the genomes while leaving others relatively unscathed.
The blue boxes on the diagram indicate predicted genes, while green indicates shared intergenic space.
The grey boxes indicate intergenic space that is unique to that genome. The retrotransposons are noted individually
(rTE), and triangles denote predicted LTRs. DNA transposable elements are listed individually (POGO and Mutator),
as is a cpDNA insertion of ycf2 origin. The middle panel indicates a continuous window of sequence identity between
the two BACs, scaled from 50% to 100%.
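The identity panel described above can be thought of as a sliding-window calculation over a pairwise alignment. The following is a minimal Python sketch of that idea, not the method used to draw the figure: the function names, the 100-column window, and the choice to count gapped columns as mismatches are all assumptions made for illustration.

```python
def windowed_identity(aln_a, aln_b, window=100, step=100):
    """Percent identity per window across a pairwise alignment.

    aln_a and aln_b are equal-length aligned strings using '-' for gaps.
    Columns where either sequence has a gap count as mismatches
    (an assumption; other tools may skip such columns instead).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    points = []
    for start in range(0, len(aln_a) - window + 1, step):
        cols = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for x, y in cols if x == y and x != "-")
        points.append((start, 100.0 * matches / window))
    return points


def clip_to_plot_range(identity, low=50.0, high=100.0):
    """Clamp a percent identity to the 50-100% axis used in the figures."""
    return max(low, min(high, identity))
```

Each returned pair is a window start position and that window's percent identity. Values below 50% would sit at the bottom of a 50-100% axis after passing through `clip_to_plot_range`.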
Comparison of the AdhA region from G. hirsutum (AT and DT), G. raimondii (D), and G. arboreum (A)
As the CesA1 region raised just as many questions as it answered, we sequenced a second region surrounding the gene
encoding alcohol dehydrogenase A (AdhA; pictured below and published in
The Plant Journal, 2007) from the two genomes of the tetraploid, as before, but also from the model
diploid progenitors, whose resources had become available.
In this comparison, ~100 kb of shared sequence was obtained from the A and AT genomes and ~50 kb was
obtained from the D and DT genomes, sizes that reflect the overall differences in genome size. The gene
density of the region was about one-half to one-third that observed in the previous region. The major difference in the AdhA
region, as opposed to the previous one, was the accumulation of transposable elements in the A and AT
genomes, particularly gypsy elements (red).
This region was congruent with what we would expect based upon genome size, considering transposable element
accumulation is generally a primary contributor to genome size evolution. Here we see nearly five times the number
and length of TEs in the A genomes as in the D genomes (~25-32 kb in A and AT versus ~5-7 kb in D and
DT). There was evidence for one event of intra-strand homologous recombination in the gypsy-rich region
of the AT genome, which corresponds with the slightly less than additive size of the tetraploid. Increased
illegitimate recombination, which was observed in this region for the polyploid genomes relative to the diploid
genomes, may further contribute to the "genomic down-sizing" experienced by the polyploid. All
four genomes were evaluated for evidence of a bias in small indels for those that could be polarized (i.e. those
occurring after diploid-polyploid divergence). This region did display a biased accumulation, such that the smaller
genomes averaged more frequent and longer deletions than the larger genomes. Further exaggerating this bias was the
tendency for the A genome diploid to acquire longer insertions more frequently as well.
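As a rough check on the figures quoted above, the sketch below bounds how much of the size difference between the A-type and D-type regions the TE difference could explain. It uses only the approximate values given in the text (~100 kb versus ~50 kb of sequence, ~25-32 kb versus ~5-7 kb of TEs). The function name and the way the ranges are paired are assumptions, so treat the result as an illustration rather than the published calculation.

```python
def te_share_of_size_gap(te_large, te_small, region_large, region_small):
    """Bound the fraction of a region-size gap explained by TE content.

    te_large and te_small are (low, high) TE lengths in kb for the larger
    and smaller genomes; region_large and region_small are region sizes in kb.
    Pairing low with high to get the extremes is an assumption.
    """
    size_gap = region_large - region_small
    if size_gap <= 0:
        raise ValueError("region_large must exceed region_small")
    smallest_te_gap = te_large[0] - te_small[1]
    largest_te_gap = te_large[1] - te_small[0]
    return smallest_te_gap / size_gap, largest_te_gap / size_gap


# Approximate values quoted in the text, in kb.
low, high = te_share_of_size_gap((25, 32), (5, 7), 100, 50)
print(f"TE difference explains roughly {low:.0%} to {high:.0%} of the gap")
```

With these inputs the TE difference spans 18 to 27 kb of the ~50 kb gap, so TE accumulation alone leaves a substantial part of the size difference unexplained.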
From this analysis, our supposition that genome size evolution was heterogeneous was further confirmed, even
occurring among what could be considered gene islands. The primary contributor to genome size change in the region
was, inarguably, expansion in the A and AT genomes via transposable element proliferation, namely gypsy
elements. Realistically, however, this mechanism only accounted for about half of the observed difference in the
region, about 25 kb of the 50 kb difference. The rest of this genome size difference was likely due to a variety of
contributors, some of which were as yet unknown. The analysis did indicate that a bias in small indels and increased
illegitimate recombination in the polyploid may contribute to genome size differences in this genus and warrant
further investigation.
Multiple alignment of orthologous AdhA BACs from four different genomes (A, D, AT and DT;
the latter two are co-resident in the nucleus of polyploid cottons). Numbered blue boxes are predicted genes; copia
elements are in orange, gypsy elements in red and LINE elements in pink. Identifiable long terminal repeats (LTRs)
are depicted by triangles. Continuous windows of sequence identity are shown between each pair of BACs, with that
in the middle illustrating sequence identity between the two BAC pairs (A and AT versus D and
DT); all are scaled from 50 to 100%. Grey diamonds on the identity plots denote the location of large
(>400 bp), unpolarized indels between the diploid progenitor and respective polyploid genome. The scale bar at the
bottom indicates increments of 10 kb.