Predicting GO-BP categories of mouse genes using microarray data and SVMs. (a) The number of the 992 initial GO-BP categories exceeding the indicated precision value, with recall fixed for each line; for example, at 40% recall (green line), around 100 categories achieve a precision of at least 30%. To estimate the significance of the colored lines, we repeated their calculation after permuting the gene labels in the annotation database. The dotted gray line indicates the maximum number of GO categories that achieve the indicated precision, with recall of 10% or greater. The dotted magenta line indicates the result obtained using 'binary' expression data (expressed/not expressed) in each tissue. (b) The number of genes with predicted GO-BP categories (blue line) or superGO categories (red line) at varying precision values. The individual predictions are given in the Additional data files. (c) Comparison of the overall predictive capacity of three data sets, restricted to the 13 tissues and 1,800 genes shared by all three data sets. Each of the lines corresponds to the 30% recall line in (a). All of the lines are to the lower right of those in (a), since fewer genes and tissues were used. (d) A histogram comparing the precision of predictions derived from lists of tissue-specific genes with the precision of predictions from SVMs. For each category, the tissue-specific list yielding the highest precision value was identified, along with its associated recall value; the SVM precision for the same category at the same recall value was then identified. The difference between the two precision values is plotted for each category, such that instances where the SVM is superior are to the right of center.
Zhang et al. Journal of Biology 2004 3:21 doi:10.1186/jbiol16
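The quantity plotted in (a), the number of categories reaching a given precision at a fixed recall, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each category has one classifier score per gene, that genes are ranked by decreasing score, and that precision is read off at the shallowest rank cutoff at which the target recall is reached. The function names `precision_at_recall` and `count_categories_above` are hypothetical.

```python
import math


def precision_at_recall(scores, labels, target_recall):
    """Precision at the shallowest rank cutoff whose recall reaches target_recall.

    scores: one classifier score per gene (higher = more confident).
    labels: 1 if the gene is annotated to the category, else 0.
    Assumption: genes are ranked by decreasing score; ties keep input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    # Number of true positives needed to reach the target recall.
    needed = max(1, math.ceil(target_recall * n_pos))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for k, (_, is_member) in enumerate(ranked, start=1):
        true_pos += is_member
        if true_pos >= needed:
            return true_pos / k  # precision among the top-k predictions
    return 0.0


def count_categories_above(precisions, threshold):
    """Number of categories whose precision is at least the threshold."""
    return sum(1 for p in precisions if p >= threshold)


# Toy example: 8 genes, 4 of which belong to the category.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 1, 0, 1]
p50 = precision_at_recall(scores, labels, 0.5)
```

Evaluating `precision_at_recall` once per category at a fixed recall, and then calling `count_categories_above` over a range of thresholds, traces out one line of the kind shown in (a). Repeating this with permuted labels would give a comparison line under the same assumptions.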