Analysis aids for our updated version of the Cotton Oligonucleotide Microarray (v2)
are available as file downloads. The aids include files containing array information, basic probe information
and probe annotation data.
The Cotton Oligonucleotide Microarray (v2) GAL file is available for download.
Basic probe information:
Probe sequence information is available for download
in Excel spreadsheet format.
Unigene assignment of individual probes:
As part of our annotation effort, oligo probes were aligned with Vmatch
and screened for high-quality alignments to the latest ESTinformatics
assembly for Gossypium (assembly date: 12/31/2006). A FASTA file was generated with the probe names mapped
to their respective gene hits in the assembly. The name of each FASTA entry is the ESTinformatics
contig/singleton ID; the description section of the FASTA header lists the probes that match that contig
and their corresponding Vmatch scores.
This FASTA file, CEGCprobes_ESTinformatics.zip, is available for download.
A spreadsheet containing probe names, assembly names and alignment data, CEGCprobes_ESTinformatics_table.zip, is available for download.
Also, a FASTA file, Cotton12.long_names.zip, containing the Cotton12 contig sequences is available for download.
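The header layout described above (contig/singleton ID as the entry name, matching probes and Vmatch scores in the description) can be read with a few lines of Python. This is a minimal sketch, not the tooling used to build the files: the function name is hypothetical, and because the exact layout of the description field is not specified here, the sketch returns it as an unparsed string.

```python
from typing import Iterable, Iterator, Tuple


def iter_fasta_headers(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (entry_id, description) for every FASTA header line.

    Assumption: the entry name is the first whitespace-delimited token
    after '>' and the rest of the line is the description. How probes
    and Vmatch scores are laid out inside the description is not
    documented here, so the description is returned unparsed.
    """
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].strip().split(None, 1)
        entry_id = parts[0] if parts else ""
        description = parts[1] if len(parts) > 1 else ""
        yield entry_id, description
```

After unzipping the archive, the function can be fed an open file handle, for example `for entry_id, description in iter_fasta_headers(open(path)):`, where `path` is wherever the extracted FASTA file was saved.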
Unigene BLASTX annotation:
The probe-related ESTinformatics sequences were aligned with BLASTX
(--evalue 1e-5, --num_hits 20 (-b and -v options), and soft masking with -F "m S")
against both the Arabidopsis TAIR7 pep and nr databases. The BLAST output files in XML format
are available for both the Arabidopsis TAIR7 pep
and nr databases.
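For readers who want to pull annotations out of the XML output, the following is a minimal sketch using only the Python standard library. It assumes the files follow the standard NCBI BLAST XML layout (`Iteration`, `Hit`, `Hsp` elements); the function name is hypothetical, and the whole file is loaded into memory, which may not suit very large outputs.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def iter_first_hits(xml_source) -> Iterator[Tuple[str, str, float]]:
    """Yield (query, hit description, e-value) for the first hit of each query.

    Assumption: standard NCBI BLAST XML element names. BLAST normally
    lists hits from best to worst, so the first hit is usually the best.
    Queries with no hits are skipped.
    """
    tree = ET.parse(xml_source)  # accepts a path or a file-like object
    for iteration in tree.getroot().iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        hit = iteration.find("Iteration_hits/Hit")
        if hit is None:
            continue  # no alignment passed the e-value cutoff
        evalue = hit.findtext("Hit_hsps/Hsp/Hsp_evalue")
        if evalue is None:
            continue  # hit without HSP details; nothing to report
        yield query, hit.findtext("Hit_def", default=""), float(evalue)
```

Because up to 20 hits were retained per query, iterating over `iteration.iterfind("Iteration_hits/Hit")` instead of taking only the first hit would expose the full list.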
Post-Cotton12 unigene sets:
To account for the growth in available ESTs, additional assemblies have been performed. They are listed below.
The Cotton16 assembly was employed in the design of a NimbleGen microarray platform capable of measuring global gene expression
in Gossypium species. The FASTA file, Cotton16.zip, is available for download.
A post-Cotton16 assembly, incorporating additional EST resources, was generated and its FASTA file,
Gossypium.assembly.2006.Dec.31.filtered.new.assembly.zip, is available for download.
Cotton32, our latest unigene set:
Cotton32 is an assembly of 454 reads and all the Sanger ESTs in GenBank (as of Nov 2008).
It contains 1.8 million Sanger and 454 reads (G. hirsutum, G. raimondii,
and G. arboreum) that are assembled into 71,568 contigs (27,951 contigs > 500 bp).
Of these reads, ~350,000 are long Sanger reads and the remainder are
shorter (~200 bp) 454 FLX reads (722,578 GA, 577,375 GH, 144,236 GR). Each of the 454 FLX
libraries is composed of equimolar amounts of various cotton tissues including gynoecium,
calyx, fiber, roots, whole seedlings (stem & leaf), petals, and leaves. One of the libraries
(G. raimondii) had a large proportion of the cDNA library adapter embedded in >80K 454
sequences. For Cotton32, the adapter sequence was treated as a 'vector' sequence to be removed or ignored during
assembly. A subsequent 454 Titanium run of mixed G. raimondii RNAs is forthcoming
to compensate for the low number of D-genome reads.
The FASTA file, Cotton32_fasta.zip, is available for download.
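Summary figures such as the contig total and the number of contigs longer than 500 bp can be recomputed from any of the FASTA files. The sketch below is one way to do so under stated assumptions: the function name and the default threshold are illustrative, sequences may span several lines, and if a file also contains singletons they are counted along with the contigs.

```python
from typing import Iterable, Tuple


def count_sequences(lines: Iterable[str], min_length: int = 500) -> Tuple[int, int]:
    """Return (total sequences, sequences strictly longer than min_length)."""
    total = 0
    longer = 0
    current_length = None  # None until the first header is seen

    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current_length is not None:
                total += 1
                longer += current_length > min_length
            current_length = 0
        elif current_length is not None:
            current_length += len(line)

    # Close the final record.
    if current_length is not None:
        total += 1
        longer += current_length > min_length
    return total, longer
```

Called on an open handle to the extracted FASTA file, the function returns both counts in a single pass without holding the sequences in memory.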
The JGI is sequencing the D-genome of cotton according to the plan previously described by cotton researchers.
Some of the data is publicly accessible, particularly the 454 reads and
Illumina reads contributed by Monsanto. As a
public service prior to the draft release, we have assembled the 454 reads with Newbler using default parameters. The assembly is provided
as is; use it at your own risk. It has been neither curated nor checked for assembly errors. The contig and scaffold IDs
will change with the pending release of the genome sequence during Fall 2011.
The 454 assembly files are available for download.