Developing an enhanced EST resource for cotton
Here we present a vastly expanded cotton EST assembly, which contains approximately 4.4 million Sanger
and next-generation (454) transcripts. Like previous assemblies, this one incorporates ESTs from
both the A- and D-genome diploid progenitors, along with ESTs from two allopolyploid cotton
species, G. barbadense and G. hirsutum. The 56,373 contigs extracted from
this assembly substantially broaden the available representation of the genic content of cotton. We describe
this collection and document its utility for genome-specific transcriptome analysis in allopolyploid
cotton. We also present a characterization of the functional properties of the cotton transcriptome
and analyses of molecular evolution following the most recent whole genome duplication that accompanied
allopolyploid formation 1-2 million years ago [2, 3].
To add depth to the assembly, we also generated ~152 million 82 bp Illumina reads, representing
the fiber transcriptome of diploid A- and D-genome cotton as well as the allopolyploids G. barbadense
and G. hirsutum. Together these resources allow us to detect 259,192 genome-specific SNPs, which in
turn can be used to distinguish the A- and D-genome homoeologs found in the allopolyploid cotton genome.
At the time of writing, allopolyploid cotton is among the most important crops lacking a whole genome
sequence, but as progress is made in this regard, the EST assembly and genome-specific SNP resources
presented here will be of use in assembling and annotating the cotton genome.
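As a rough illustration of how genome-specific SNPs can separate homoeologs, the sketch below compares an A-genome and a D-genome sequence that are assumed to be already aligned without gaps, records the positions where they differ, and then labels an allopolyploid read by which diploid allele it carries at those positions. The function names, the ungapped-alignment assumption, and the toy sequences are hypothetical and chosen only for illustration; they are not the pipeline used in this study.

```python
def genome_specific_snps(a_seq, d_seq):
    """Return {position: (a_base, d_base)} for positions where two
    pre-aligned, ungapped A- and D-genome sequences differ."""
    if len(a_seq) != len(d_seq):
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for pos, (a, d) in enumerate(zip(a_seq.upper(), d_seq.upper())):
        # Only unambiguous bases are considered informative.
        if a != d and a in "ACGT" and d in "ACGT":
            snps[pos] = (a, d)
    return snps


def assign_homoeolog(read, start, snps):
    """Label a read aligned (without gaps) at offset `start` as 'A', 'D',
    or 'ambiguous' based on the genome-specific SNPs it overlaps."""
    a_hits = d_hits = 0
    for offset, base in enumerate(read.upper()):
        pair = snps.get(start + offset)
        if pair is None:
            continue
        if base == pair[0]:
            a_hits += 1
        elif base == pair[1]:
            d_hits += 1
    if a_hits > 0 and d_hits == 0:
        return "A"
    if d_hits > 0 and a_hits == 0:
        return "D"
    # No informative SNP covered, or conflicting evidence.
    return "ambiguous"


# Toy sequences (hypothetical, not real cotton data).
snps = genome_specific_snps("ACGTACGTAC", "ACGTTCGTAA")
label_a = assign_homoeolog("TACG", 3, snps)
label_d = assign_homoeolog("TTCG", 3, snps)
label_none = assign_homoeolog("ACG", 0, snps)
```

A read that covers no diagnostic position, or that carries a mix of A-like and D-like bases, is left as ambiguous in this sketch rather than forced into either genome.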
Plant material and EST library construction and sequencing
454-FLX and Titanium ESTs were derived from
various Gossypium species and tissue types. RNA was independently extracted from each tissue source
using a modified hot-borate method (Wilkins and Smart, 1996) and checked for integrity on a
Bioanalyzer (Agilent Technologies, Santa Clara, CA). Equimolar amounts of RNA from each extraction were
combined into a single sample for cDNA library construction. cDNA libraries were constructed using the SMART
method (Clontech, Mountain View, CA), and the resulting amplified, double-stranded libraries were
normalized using a double-strand nuclease (Trimmer, Evrogen, Moscow, Russia). To prevent poly-A (or poly-T)
homopolymers in the 454 reads, we employed two strategies. The first strategy was applied to the FLX reads,
where a Type IIS endonuclease was used to cleave 18-20 bp of transcript from a modified 3' SMART
adapter (K. Delehaunty, personal communication). The second strategy, used for the 454 Titanium reads,
employed PCR-suppression oligos to target particular regions in the transcript (5', internal, or 3').
5', internal, and 3' transcript segments were pooled for cDNA sequencing of the G. raimondii sample.
Only 5' and internal segments were pooled for Titanium sequencing of G. hirsutum (Tx2094) and
G. barbadense (K101 and S6). Sequencing was performed on the 454 platform (454 Life Sciences,
Branford, CT) at the Brigham Young University DNA sequencing center (FLX and Titanium) and Washington
University (FLX). The reads have been made publicly available through NCBI's Sequence Read Archive
(Study #SRP001603). All publicly available Sanger reads were downloaded from GenBank (Feb. 2009) and
filtered for duplicate, vector, and low-complexity sequences. Short ESTs (< 30 bp) and low-quality sequence
(quality limit = 0.5) were removed from the 454 reads.
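The read-filtering steps described above can be sketched in a few lines. The example below drops reads shorter than 30 bp (the threshold stated in the text), exact duplicate sequences, and reads dominated by a single base. The low-complexity rule and its 0.8 cutoff are assumptions made for illustration only; the text does not specify how low-complexity sequence was detected, and vector screening and the quality-based filter are not modeled here.

```python
MIN_LENGTH = 30  # from the text: ESTs shorter than 30 bp are removed


def filter_reads(reads, min_length=MIN_LENGTH, max_single_base_fraction=0.8):
    """Filter (name, sequence) pairs by length, exact duplication, and a
    simple low-complexity rule (hypothetical cutoff, see lead-in)."""
    seen = set()
    kept = []
    for name, seq in reads:
        seq = seq.upper()
        if len(seq) < min_length:
            continue  # too short
        if seq in seen:
            continue  # exact duplicate of a read already kept
        most_common = max(seq.count(base) for base in "ACGT")
        if most_common / len(seq) > max_single_base_fraction:
            continue  # dominated by one base, e.g. a poly-A run
        seen.add(seq)
        kept.append((name, seq))
    return kept


# Toy reads (hypothetical, not real data).
reads = [
    ("r1", "ACGT" * 10),   # 40 bp, kept
    ("r2", "ACGT" * 10),   # duplicate of r1
    ("r3", "ACGT" * 5),    # 20 bp, too short
    ("r4", "A" * 40),      # homopolymer
    ("r5", "acgtt" * 8),   # 40 bp, kept after upper-casing
]
kept_names = [name for name, _ in filter_reads(reads)]
```

Only the first copy of a duplicated sequence is retained, so the order of the input determines which read name survives.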
- 1) Udall JA, Swanson JM, Haller K, Rapp RA, Sparks ME, Hatfield J, Yu Y, Wu Y, Dowd C, Arpat AB,
Sickler BA, Wilkins TA, Guo JY, Chen XY, Scheffler J, Taliercio E, Turley R, McFadden H, Payton P,
Klueva N, Allen R, Zhang D, Haigler C, Wilkerson C, Suo J, Schulze SR, Pierce ML, Essenberg M,
Kim H, Llewellyn DJ et al.: A global assembly of cotton ESTs. Genome Res 2006, 16:441-450.
- 2) Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform.
Bioinformatics 2009, 25:1754-1760.
- 3) Wendel JF, Brubaker CL, Alvarez I, Cronn RC, Stewart JM: Evolution and natural history of the
cotton genus. In: Genomics of cotton, plant genetics and genomics; crops and models 3.
Edited by Paterson AH. New York: Springer; 2009: 3-22.