Raw sequences will be trimmed, aligned, and subjected to phenetic and
phylogeographic analysis (using, for example, PAUP* and PHYLIP), as well as
diversity and divergence estimation using DnaSP (http://www.ub.es/dnasp/). This
will shed light on levels of nucleotide variation throughout the genome, as
well as the portion and proportion that has been captured in modern cultivars.
Diversity for this quasi-randomly selected set of genes will be normalized
against genomic context and GC content (among other confounding factors) by
calculating the fractional diversity in cultivars relative to total diversity.
Based on our previous analysis of variation for 48 genes among species, we
anticipate an approximately normal distribution of diversity estimates for
these mostly neutral or near-neutral genes. This analysis will serve as the
context for evaluating the same estimates obtained for stage 2 sequencing,
i.e., for the set of genes putatively subjected to human selection, either
directly or through their linkage to selected genes. A comparison of these two
curves (Figure) will provide important information about the nature of the
cotton genome under selection. We emphasize that our aim is not to prove
selection; indeed, population genetic screens may fail for several reasons.
The data will nonetheless provide an important framework in this regard. We
argue that multiple sources of evidence will bear on the question of
domestication, namely introgressed segments, expression analyses, homoeolog
specificity, and population genetic bottlenecks. This confluence of approaches
is likely to be especially powerful. Moreover, the data will provide estimates of
linkage disequilibrium (LD) in Gossypium, calculated with TASSEL at various
scales (e.g., within the species, within modern cultivars), with an eye toward
the appropriate design of future association mapping experiments.
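
As an illustration of the LD estimates mentioned above, the following is a minimal sketch of the squared allelic correlation (r^2) between two biallelic sites, computed from phased haplotypes. It is not TASSEL's implementation; the function name `r_squared`, the string representation of haplotypes, and the toy data are assumptions made for this example only.

```python
def r_squared(haplotypes, i, j):
    """Return r^2 between biallelic sites i and j, or None if undefined.

    haplotypes: list of equal-length strings, one phased haplotype each
    (a hypothetical input format chosen for this sketch).
    """
    n = len(haplotypes)
    # Use the alleles carried by the first haplotype as the reference alleles.
    a = haplotypes[0][i]
    b = haplotypes[0][j]
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # at least one site is monomorphic in this sample
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / denom


# Toy data (invented): sites 0 and 1 are perfectly associated in the first set
# and independent in the second.
linked = ["AC", "AC", "GT", "GT"]
unlinked = ["AC", "AT", "GC", "GT"]
print(r_squared(linked, 0, 1), r_squared(unlinked, 0, 1))
```

Applying the same calculation separately to all accessions of a species and to modern cultivars only would give the scale-specific comparison described above, assuming phased data are available.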

Figure. Nucleotide diversity among 50 genes is expected to be quasi-normal,
ranging from low (left) to high (right). Randomly sampled genes (black) are
expected to contain more diversity than those subject to selection (red). The
most likely candidates for genes experiencing selection or hitchhiking are in
the blue region.

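
The per-gene quantities plotted in such a figure can be sketched as follows: nucleotide diversity (pi, the mean number of pairwise differences per site) and the fractional diversity in cultivars relative to the total sample. This is a simplified sketch, not the DnaSP procedure; it assumes aligned sequences of equal length with no gaps or missing data, and the function names and toy alignment are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (pi) for aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and non-empty")
    pairs = list(combinations(seqs, 2))
    # Count mismatched positions over all unordered pairs of sequences.
    diffs = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def fractional_diversity(all_seqs, cultivar_seqs):
    """Ratio of pi in cultivars to pi in the full sample (None if total is 0)."""
    total = nucleotide_diversity(all_seqs)
    if total == 0:
        return None
    return nucleotide_diversity(cultivar_seqs) / total


# Toy alignment for one gene (invented data): four accessions, two of which
# are treated as modern cultivars.
all_accessions = ["AAAA", "AAAT", "AATT", "AAAA"]
cultivars = ["AAAA", "AAAT"]
print(fractional_diversity(all_accessions, cultivars))
```

Because the ratio is taken within each gene, gene-specific factors that affect diversity in both samples tend to cancel, which is the rationale for using it as a normalized measure.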
Finally, the data will be partitioned by homoeolog to test the previously
reported but unexplained observation that the D-genome of allopolyploid cotton
accumulates diversity at a higher rate than does the A-genome. This observation
is especially intriguing in that these two co-resident genomes, which differ
twofold in size but are essentially collinear, are housed in a common nucleus
and hence subjected to the same ecological, mating-system, and
population-level processes, features that otherwise are often invoked to
account for variation in diversity among genes and organisms.