An Evaluation of Sequence Capture Technologies with Polyploid Cotton
Sequence capture is a revolutionary method to resequence targeted portions of the genome,
but it has not been tested with a polyploid plant. We constructed two capture platforms:
1) a novel Nimblegen sequence capture microarray and 2) Mycroarray beads containing RNA
probes. Both platforms targeted the same 534 genes. On each platform, we hybridized two
different accessions to assess capture efficiency and the potential of combined
sequencing using DNA multiplex identifiers (MIDs).
Sequence capture of 532 selected genes from G. hirsutum
Cotton (G. hirsutum) is a polyploid species native to Central America. It was domesticated
by the ancient inhabitants of Central America and it was quickly adopted by Europeans with
colonial agricultural technology. Initially, it was grown around the Caribbean Sea. Feral
cultivars from these historical growing regions and native populations have been investigated
for genetic diversity with the aim to improve modern cultivated cotton. Here, we use sequence
capture to investigate nucleotide diversity within and flanking 532 selected genes from the
Evolutionary Genomics of Cotton - cDNA
This study includes the generation of a pan-transcriptome from domesticated and wild cotton
accessions from both Gossypium hirsutum and G. barbadense.
Description: RNA was extracted from whole seedlings, leaves, roots, floral organs, and fiber
of each accession (Acala Maxxa and Tx2094, G. hirsutum; K101 and S6, G. barbadense). RNA was pooled
in equimolar amounts for each sample. cDNA was generated using a poly-T primer and amplified
using the Clontech SMART technology. Samples were normalized using the Evrogen Trimmer Kit (DSN).
Some samples were sequenced after cutting off the poly-A tails using MmeI nuclease. For other
samples, the 5' and interior portions of the transcripts were preferentially amplified using
ligation-mediated PCR suppression. Samples were sequenced on both FLX and Titanium 454 technologies.
Sequences of expressed genes were also generated by Illumina sequencing from 10 and 20 dpa fiber
(days post-anthesis). From each stage, sequence was generated from A2 (G. arboreum), D5
(G. raimondii), Acala Maxxa (G. hirsutum), TX2094 (G. hirsutum), Pima-S6 (G. barbadense), and
K101 (G. barbadense). Sequence was generated using Illumina's recommended protocols, including
cDNA synthesis, cluster generation, and sequencing. These reads were used to validate and identify
SNPs between the A and D genomes of diploid cotton and SNPs between the A- and D-genomes of
Additional information and data
Whole genome DNA samples were extracted using a modified CTAB protocol. Twelve samples were tagged
with multiplex identifiers (MIDs), each with a different MID. They were independently hybridized to
a custom Nimblegen Sequence Capture Array (12-plex). After washing, the samples were all eluted
into a single tube and subjected to ligation-mediated PCR (i.e. PCR using the 454 sequence adapters)
to amplify the captured fragments to a high enough concentration for sequencing. Subsequently, the
libraries were prepared for 454 sequencing (size selection on a Caliper XL, emPCR, bead
preparation, etc.). The twelve captured samples tagged with multiplex identifiers (MIDs) were run
on a single plate (2 large regions) of 454. The image files from the sequence were processed with
'less stringent' settings when compared to the default settings for the 454 software pipeline.
Namely, BadFlowThreshold from 4 (default) to 8; LastFlowToTest from 320 (default) to 240;
TrimBackScaleFactor from 0.7 (default) to 1; errorQscoreWindowTrim from 0.01 (default) to 0.02;
QScoreTrimBackScaleFactor from 0.9 (default) to 1.0. The samples were prepared using 'traditional'
WGS libraries from Roche. Below is a list of the samples and MIDs included in these runs (Capture 1 - 4).
Download links for capture (e.g., fasta and ace) and sample specific (e.g., fasta, fastq and sff) data files are also included in the table.
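
Because the twelve MID-tagged samples were eluted into a single tube and sequenced together, the reads have to be sorted back to their samples by MID before any per-sample analysis. The following is a minimal sketch of that sorting step, not the pipeline actually used for these data. It assumes that any sequencing key and adapter bases have already been trimmed, so that each read begins with its MID, and that only exact matches are accepted. The tag sequences and sample labels (MID_A, MID_B) are hypothetical placeholders, not the MIDs listed in the table.

```python
from collections import defaultdict

# Hypothetical 10-base tags for illustration only; substitute the real
# MID sequences assigned to each sample.
MID_TAGS = {
    "MID_A": "ACGTACGTAC",
    "MID_B": "TGCATGCATG",
}


def demultiplex(reads, mid_tags=MID_TAGS):
    """Group (name, sequence) reads by the MID found at the start of each read.

    The matching tag is trimmed from assigned reads. Reads that do not begin
    with any known tag (exact match only) are placed in the 'unassigned' bin.
    """
    bins = defaultdict(list)
    for name, seq in reads:
        for sample, tag in mid_tags.items():
            if seq.startswith(tag):
                bins[sample].append((name, seq[len(tag):]))
                break
        else:
            # No tag matched this read.
            bins["unassigned"].append((name, seq))
    return dict(bins)


# Toy reads: two carry a known tag, one does not.
reads = [
    ("read1", "ACGTACGTACGGATTCCA"),
    ("read2", "TGCATGCATGTTGACC"),
    ("read3", "GGGGGGGGGGAAAA"),
]
bins = demultiplex(reads)
for sample, members in sorted(bins.items()):
    print(sample, members)
```

Exact matching is the simplest choice. Allowing one mismatch in the tag would recover more reads, at the cost of a small risk of assigning a read to the wrong sample.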
We welcome your comments and suggestions.