Cotton Oligonucleotide Microarray


Analysis Aids

Analysis aids for the updated version (v2) of our Cotton Oligonucleotide Microarray are available as file downloads. They include files containing array information, basic probe information, and probe annotation data.

Array information:

The Cotton Oligonucleotide Microarray (v2) GAL file is available for download.

Basic probe information:

Probe sequence information is available for download in Excel spreadsheet format.

Unigene assignment of individual probes:

As part of our annotation effort, oligo probes were tested with Vmatch for high-quality alignments to the latest ESTinformatics assembly for Gossypium (assembly date: 12/31/2006). A FASTA file was generated in which the probe names are mapped to their respective gene hits in the assembly. The name of each FASTA entry is the ESTinformatics contig/singleton ID; the description section of the FASTA header lists the probes that match that contig and their corresponding Vmatch scores.

This FASTA file is available for download.

A spreadsheet containing probe names, assembly names, and alignment data is available for download.

A FASTA file containing the Cotton12 contig sequences is also available for download.
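
For readers who want to process the probe-to-unigene FASTA file programmatically, the following is a minimal Python sketch that separates each header into the entry name (the contig/singleton ID) and the description section (the matching probes and Vmatch scores), as described above. The exact layout of the description section is not documented here, so the sketch returns it unparsed; the example record and its header contents are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (entry_id, description, sequence) for each FASTA record."""
    entry_id, description, chunks = None, "", []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # Emit the previous record before starting a new one.
            if entry_id is not None:
                yield entry_id, description, "".join(chunks)
            parts = line[1:].split(None, 1)
            entry_id = parts[0] if parts else ""
            description = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif line and entry_id is not None:
            chunks.append(line)
    if entry_id is not None:
        yield entry_id, description, "".join(chunks)


# Hypothetical records; the real description layout may differ.
example = io.StringIO(
    ">Contig1 probeA 60 probeB 58\nACGT\nACGT\n"
    ">Singleton2 probeC 60\nTTGA\n"
)
records = list(read_fasta(example))
```

To use it on a downloaded file, pass an open text file handle instead of the in-memory example.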

Unigene blastx annotation:

The probe-related ESTinformatics sequences were aligned with BLASTX (--evalue 1e-5, --num_hits 20 (-b and -v options), and soft masking -F "m S") against both the Arabidopsis TAIR7 pep and nr databases. The BLAST output files, in XML format, are available for both databases.
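
As an illustration of how the XML output might be summarized, here is a minimal Python sketch that keeps the lowest E-value hit for each query sequence. It assumes the files follow the standard NCBI BLAST XML layout (elements such as Iteration, Hit_def, and Hsp_evalue); the function name, the default threshold argument, and the sample document are illustrative and not part of the downloads.

```python
import io
import xml.etree.ElementTree as ET


def best_hits(xml_source, max_evalue=1e-5):
    """Return {query_def: (hit_def, evalue)} using the lowest E-value HSP per query."""
    best = {}
    # iterparse streams the file, which matters for large BLAST outputs.
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query = elem.findtext("Iteration_query-def")
        for hit in elem.iter("Hit"):
            hit_def = hit.findtext("Hit_def")
            for hsp in hit.iter("Hsp"):
                evalue = float(hsp.findtext("Hsp_evalue"))
                if evalue > max_evalue:
                    continue
                if query not in best or evalue < best[query][1]:
                    best[query] = (hit_def, evalue)
        elem.clear()  # free memory used by the finished iteration
    return best


# Tiny hypothetical document using the assumed element names.
sample = b"""<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-def>Contig1</Iteration_query-def><Iteration_hits>
<Hit><Hit_def>protein A</Hit_def><Hit_hsps><Hsp><Hsp_evalue>1e-30</Hsp_evalue></Hsp></Hit_hsps></Hit>
<Hit><Hit_def>protein B</Hit_def><Hit_hsps><Hsp><Hsp_evalue>1e-3</Hsp_evalue></Hsp></Hit_hsps></Hit>
</Iteration_hits></Iteration>
</BlastOutput_iterations></BlastOutput>"""
hits = best_hits(io.BytesIO(sample))
```

A file path can be passed in place of the in-memory sample. Queries with no hit under the threshold are simply absent from the result.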

Post-Cotton12 unigene sets:

To take account of the growth in available ESTs, additional assemblies have been performed. They are listed below.

The Cotton16 assembly was employed in the design of a NimbleGen microarray platform capable of measuring global gene expression in Gossypium species. The FASTA file is available for download.

A post-Cotton16 assembly, incorporating additional EST resources, was generated, and its FASTA file is available for download.

Cotton32, our latest unigene set:

Cotton32 is an assembly of 454 reads and all the Sanger ESTs in GenBank (as of Nov 2008). It contains 1.8 million Sanger and 454 reads (G. hirsutum, G. raimondii, and G. arboreum) that are assembled into 71,568 contigs (27,951 contigs > 500 bp). Of the reads in this large assembly, ~350,000 are long Sanger reads and the remainder are shorter (~200 bp) 454 FLX reads (722,578 GA, 577,375 GH, 144,236 GR). Each of the 454 FLX libraries is composed of equimolar amounts of RNA from various cotton tissues, including gynoecium, calyx, fiber, roots, whole seedlings (stem & leaf), petals, and leaves. One of the libraries (G. raimondii) had a large proportion of the cDNA library adapter embedded in >80K 454 sequences. For Cotton32, the adapter sequence was treated as a 'vector' sequence to be removed/ignored during assembly. A subsequent 454 run (454 Titanium) of mixed G. raimondii RNAs is forthcoming to compensate for the low number of D-genome reads.
The FASTA file is available for download.
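
The contig counts quoted above (total contigs, and contigs > 500 bp) can be recomputed from a contig FASTA file with a short script. The following Python sketch is one way to do it; the function name and the in-memory example are hypothetical, and it assumes a plain FASTA file in which sequences may span several lines.

```python
import io


def contig_length_summary(handle, min_length=500):
    """Return (total_contigs, contigs_longer_than_min_length) for a FASTA handle."""
    total = 0
    longer = 0
    current = None  # length of the record being read, or None before the first header
    for line in handle:
        if line.startswith(">"):
            if current is not None:
                total += 1
                longer += current > min_length
            current = 0
        elif current is not None:
            current += len(line.strip())
    # Account for the final record in the file.
    if current is not None:
        total += 1
        longer += current > min_length
    return total, longer


# Hypothetical three-contig example with lengths 600, 500, and 10 bp.
demo = io.StringIO(
    ">c1\n" + "A" * 300 + "\n" + "C" * 300 + "\n"
    ">c2\n" + "G" * 500 + "\n"
    ">c3\n" + "T" * 10 + "\n"
)
summary = contig_length_summary(demo)
```

The comparison is strict, so a contig of exactly 500 bp is not counted as longer than 500 bp.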

D-genome assembly:

The JGI is sequencing the D-genome of cotton according to the plan previously described by cotton researchers. Some of the data are publicly accessible, particularly the 454 reads and Illumina reads contributed by Monsanto. As a public service prior to the draft release, we have assembled the 454 reads with Newbler using default parameters. The data are available here, but use them at your own risk: they have not been curated, nor have they been checked for assembly errors. The contig and scaffold IDs will change with the pending release of the genome sequence during Fall 2011.
The 454 assembly files are available for download.

We welcome your comments and suggestions.