Validation of interactions within interaction datasets. (a) The fraction of interactions in each dataset supported by multiple validations (that is, different publications or types of experimental evidence). (b) The fraction of interactions in each indicated dataset supported by more than one publication or type of experimental evidence. (c) Better studied proteins or genes, as defined by the number of supporting publications relative to node connectivity (designated bias, see Materials and methods), tend to be more highly connected within the physical or genetic networks. (d) The study bias towards essential genes in each dataset. (e) The distribution of conserved proteins in interaction datasets. Frequency refers to the fraction of the dataset in each bin. Orthologous eukaryotic clusters for seven standard species (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Encephalitozoon cuniculi) were obtained from the COG database. Sc refers to all budding yeast proteins as a reference dataset; non-LC refers to all HTP interactions except those that overlap with the LC datasets; X refers to yeast genes that were not assigned to any of the COG clusters, and includes both yeast-specific genes and genes that have orthologs in only one of the other six species.
Reguly et al. Journal of Biology 2006 5:11 doi:10.1186/jbiol36