The defining feature of
oligonucleotide expression arrays
is the use of several probes to assay each targeted transcript.
This is a bonanza for the statistical geneticist, who can create
probeset summaries with specific characteristics.
There are now several methods available for summarizing probe level
data from the popular Affymetrix GeneChips, and it can be difficult
to identify the method best suited to a given inquiry.
We have developed a graphical tool to evaluate summaries of
Affymetrix probe level data. Plots and summary statistics offer a
picture of how an expression measure performs in several important
areas. This picture facilitates the comparison of competing
expression measures and the selection of methods suitable for a
specific investigation. The key is a benchmark consisting of one
or two spike-in studies and, optionally, a dilution study (details
below). Because the truth is
known for these data, it is possible to identify statistical features
of the data for which the expected outcome is known in advance.
The features highlighted in our suite of graphs were chosen because
they address questions of biological interest and because appropriate
data exist to assess them.
In conjunction with the release of a graphics toolbox as part of
the Bioconductor Project,
we have created this web-based tool.
We invite all interested parties to put their probe summary methods
to the test in a friendly competition.
See the submission form below.
Download the benchmark data and develop one or more probe summaries.
Return the expression-level data, and we'll tell you how you did on this
set of tasks.
The new assessments (and the original assessments) show how everyone is doing.
Summaries need not be serious attempts at a complete expression measure.
The submission form contains a check-box for exclusion from the competition.
If you are interested in normalization, for example, run competing
normalization procedures, take a simple average over the probes in each
set, and see how the different methods do.
The goal is threefold: to vet the toolbox, to compete for bragging
rights, and to examine systematically the strengths and weaknesses of
the various approaches to probeset summary.
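As a rough illustration of the "simple average over probes in a set" idea, the sketch below averages log2 intensities within each probe set using base R only. The intensity values, array names, and probe set labels are made up for the example; real probe-level data would come from the CEL files, and any normalization step would be applied before the averaging.

```r
# Toy probe-level matrix: 6 probes (rows) x 2 arrays (columns).
# All values and names here are hypothetical, for illustration only.
pm <- matrix(c(100, 200, 400, 50, 100, 200,
               120, 180, 420, 60,  90, 210),
             nrow = 6,
             dimnames = list(NULL, c("array1.CEL", "array2.CEL")))

# Probe set membership for each row of pm (hypothetical labels).
probeset <- c("set_A", "set_A", "set_A", "set_B", "set_B", "set_B")

# Work on the log2 scale, as is usual for expression measures.
log_pm <- log2(pm)

# rowsum() adds up the rows that share a group label; dividing each
# row of sums by its group size turns the sums into means.
sums  <- rowsum(log_pm, group = probeset)
sizes <- as.vector(table(probeset)[rownames(sums)])
x     <- sums / sizes

# x now has probe set IDs as row names and file names as column names.
print(x)
```

The result has one row per probe set and one column per array, which is the layout the submission instructions ask for.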
For more details, read
the manuscript [pdf].
Affymetrix's Spike-in hgu95a Experiment CEL files
[gzip-compressed tar-archive]
Affymetrix's Spike-in hgu133a Experiment CEL files
[gzip-compressed tar-archive]
Request Gene Logic's Dilution Experiment CEL files [available only on CD/DVD]
In the event of problems, contact Gene Logic directly by telephone or e-mail.
Submission form
Data and instructions
If x is a matrix with probe set IDs as row names and the file names
as column names, the following call should do the trick:
write.table(data.frame(x, check.names=FALSE), file="filename.csv", sep=",", col.names=NA, quote=FALSE)
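Before submitting, it may be worth checking that the file written by the call above reads back with the same probe set IDs and file names. The sketch below does a small round trip on a made-up matrix; the probe set IDs, file names, and values are placeholders, and the temporary file stands in for "filename.csv".

```r
# Hypothetical 2 x 2 expression matrix: probe set IDs as row names,
# CEL file names as column names. All names and values are made up.
x <- matrix(c(7.5, 8.25, 6.0, 9.125), nrow = 2,
            dimnames = list(c("1000_at", "1001_at"),
                            c("01-chipA.CEL", "02-chipB.CEL")))

# Write the matrix in the format described above.
out <- tempfile(fileext = ".csv")
write.table(data.frame(x, check.names = FALSE), file = out,
            sep = ",", col.names = NA, quote = FALSE)

# Read it back; row.names = 1 takes the IDs from the first column and
# check.names = FALSE keeps the file names exactly as written.
back <- as.matrix(read.csv(out, row.names = 1, check.names = FALSE))

# TRUE if dimnames and values survived the round trip.
print(isTRUE(all.equal(back, x)))
```

The `check.names=FALSE` argument matters here because column names that begin with a digit or contain a hyphen, as these do, would otherwise be rewritten into syntactically valid R names.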
This website designed and maintained by
Rafael A. Irizarry
and
Harris A. Jaffee