<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD 2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
	<front>
		<journal-meta>
			<journal-id journal-id-type="nlm-ta">J Proteomics Bioinform</journal-id>
			<journal-id journal-id-type="publisher-id">opg</journal-id>						
			<journal-title>Journal of Proteomics &amp; Bioinformatics</journal-title>			 
			<issn pub-type="epub">0974-276X</issn>
			<publisher>
				<publisher-name>OMICS Publishing Group</publisher-name>
				<publisher-loc>India, USA</publisher-loc>
			</publisher>
		</journal-meta>
		<article-meta>			
			<article-id pub-id-type="publisher-id">000063</article-id>
			<article-categories>
				<subj-group subj-group-type="heading">
					<subject>Review Article</subject>
				</subj-group>
				<subj-group subj-group-type="Discipline">
					<subject>Biochemistry</subject>
				</subj-group>
				<subj-group subj-group-type="System Taxonomy">
					<subject>Proteomics</subject>
					<subject>Bioinformatics</subject>
					<subject>Genomics</subject>
					<subject>Transcriptomics</subject>
					<subject>Biomarkers</subject>
				</subj-group>
			</article-categories>
			<title-group>
				<article-title>Integration and Prediction of PPI Using Multiple Resources from Public Databases</article-title>
			</title-group>
			<contrib-group>
				<contrib contrib-type="author">
					<name>
						<surname>Aragues</surname>
						<given-names>Ramón</given-names>
					</name>
					<xref ref-type="corresp" rid="cor1">&ast;</xref>					
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>García-García</surname>
						<given-names>Javier</given-names>
					</name>
					<xref ref-type="corresp" rid="cor1">&ast;</xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Oliva</surname>
						<given-names>Baldo</given-names>
					</name>
					<xref ref-type="corresp" rid="cor1">&sect;</xref>
				</contrib>
			</contrib-group>
			<aff id="af1">Structural Bioinformatics Lab. (GRIB). Universitat Pompeu Fabra-IMIM. Barcelona Research Park of Biomedicine (PRBB). 08003-Barcelona, Catalonia, Spain.</aff>
			 <author-notes>
				<corresp id="cor1">&sect; To whom correspondence should be addressed: Baldo Oliva, Structural Bioinformatics Lab. (GRIB). Universitat Pompeu Fabra-IMIM. Barcelona Research Park of Biomedicine (PRBB), 08003-Barcelona, Catalonia, Spain, E-mail: <email>boliva@imim.es</email></corresp>
			</author-notes>
			<pub-date pub-type="collection">
			     <month>07</month>
				 <year>2008</year>
			</pub-date>
			<pub-date pub-type="epub">
				<day>17</day>
				<month>07</month>
				<year>2008</year>
			</pub-date>			
			<volume>1</volume>
			<issue>4</issue>
			<fpage>166</fpage>
			<lpage>187</lpage>
			<history>
			<date date-type="received">
			     <day>24</day>
				 <month>06</month>
				 <year>2008</year>
			</date>
			<date date-type="accepted">
			      <day>16</day>
				  <month>07</month>
				  <year>2008</year>
			</date>
			</history>
			<permissions>						 
			<copyright-statement><bold>Copyright:</bold> &copy; 2008 Aragues R, et al.</copyright-statement>
			<copyright-year>2008</copyright-year> 
			<license license-type="open access">
			<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</p>
			</license>
			</permissions>				
			<abstract>				
				<p><bold>Background:</bold> The analysis and usage of biological data is hindered by the spread of information across multiple repositories and the difficulties posed by different nomenclature systems and storage formats. In particular, the study and use of protein-protein interactions is one area where there is an important need for data integration. Without good integration strategies, it is difficult to assess how much interaction data is available and its properties.</p>
				<p><bold>Results:</bold> We present a data integration approach for protein-protein interactions. This integrative approach has been implemented in PIANA, a protein-protein interaction software framework released under the GNU Public License (<ext-link ext-link-type="uri" xlink:href="www.sbi.imim.es/piana">http://sbi.imim.es/piana</ext-link>). We find that the integrated network of interactions shows properties very similar to those observed in previously reported protein interaction networks. We also find that interaction prediction methods yield interactions for many proteins for which experimental methods have not produced any information.</p>
				<p><bold>Conclusions:</bold> PIANA's approach to protein interaction data integration solves many of the nomenclature issues common to systems dealing with biological data. The concept presented here can be extended to other types of biological data. The integration of all available protein interaction data is fundamental to obtaining a comprehensive picture of the interactions taking place in the cell.</p>
			</abstract>
			<kwd-group>
				<kwd>Protein-protein interaction</kwd>
				<kwd>database integration</kwd>
				<kwd>protein identifiers</kwd>
			</kwd-group>
			<custom-meta-wrap>
				<custom-meta>
					<meta-name>citation</meta-name>
					<meta-value>Aragues R, García-García J, Oliva B. (2008) Integration and Prediction of PPI Using Multiple Resources from Public Databases.</meta-value>
				</custom-meta>
			</custom-meta-wrap>
		</article-meta>
	</front>
	<body>
		<sec id="s1">
			<title>Introduction</title>
				<p>The completion of genome sequencing projects stimulated the development of high-throughput experimental methods aimed at functional characterization of the discovered genes. In particular, the identification of protein-protein interactions has been accelerated by the development of new technologies such as two-hybrid assays (<xref ref-type="bibr" rid="r42">Parrish et al., 2006</xref>; <xref ref-type="bibr" rid="r47">Rual et al., 2005</xref>; <xref ref-type="bibr" rid="r54">Stelzl et al., 2005</xref>) and affinity purifications followed by mass spectrometry (<xref ref-type="bibr" rid="r22">Gavin et al., 2006</xref>; <xref ref-type="bibr" rid="r35">Krogan et al., 2006</xref>; <xref ref-type="bibr" rid="r46">Puig et al., 2001</xref>). Thus, a vast amount of protein-protein interaction data has been collected, including proteome-scale interactome maps for yeast (<xref ref-type="bibr" rid="r30">Ito et al., 2001</xref>; <xref ref-type="bibr" rid="r56">Uetz et al., 2000</xref>), fly (<xref ref-type="bibr" rid="r24">Giot et al., 2003</xref>) and worm (<xref ref-type="bibr" rid="r37">Li et al., 2004</xref>), and a partial map for human (<xref ref-type="bibr" rid="r47">Rual et al., 2005</xref>; <xref ref-type="bibr" rid="r54">Stelzl et al., 2005</xref>). In addition to providing insights about biological systems (<xref ref-type="bibr" rid="r4">Barabasi et al., 2004</xref>; <xref ref-type="bibr" rid="r12">Cusick et al., 2005</xref>), protein interaction maps can be used to infer the function of proteins (<xref ref-type="bibr" rid="r52">Sharan et al., 2007</xref>), detect remote homologs (<xref ref-type="bibr" rid="r18">Espadaler et al., 2005a</xref>) and identify the binding sites of a protein (<xref ref-type="bibr" rid="r34">Kim et al., 2006</xref>).</p>
                <p>However, interaction data is spread across multiple repositories and codified using various nomenclature systems (<xref ref-type="bibr" rid="r39">Mathivanan et al., 2006</xref>). As a consequence, experimental biologists face difficulties when trying to find all known interactions for their proteins of interest, and the computational analysis and usage of protein interaction data is usually constrained to a partial subset of all available knowledge. For example, any comprehensive search of interactions for a particular protein must include at least seven databases of protein-protein interactions: the Database of Interacting Proteins (DIP) (<xref ref-type="bibr" rid="r48">Salwinski et al., 2004</xref>), the MIPS database of interactions (<xref ref-type="bibr" rid="r41">Pagel et al., 2005</xref>), the Molecular INTeraction database (MINT) (<xref ref-type="bibr" rid="r13">Chatr-aryamontri et al., 2007</xref>), IntAct (<xref ref-type="bibr" rid="r32">Kerrien et al., 2007</xref>), the Biomolecular Interaction Network Database (BIND) (<xref ref-type="bibr" rid="r2">Alfarano et al., 2005</xref>), the BioGrid (<xref ref-type="bibr" rid="r53">Stark et al., 2006</xref>) and the Human Protein Reference Database (HPRD) (<xref ref-type="bibr" rid="r44">Peri et al., 2003</xref>).</p>
			<p>In addition, each database uses a different strategy for identifying proteins, and translations between synonymous identifiers (i.e. identifiers linked to the same protein sequence) are required before any manual search or automatic processing. Moreover, there are methods for predicting protein interactions that can be used when no experimental interactions have been detected for a protein, but results from these methods are usually spread across multiple websites, each in its own format.</p>
			<p>There are efforts to standardize and harmonize protein interaction data. HUPO-PSI (<xref ref-type="bibr" rid="r27">Hermjakob, 2006</xref>) has developed a schema that enables the description of interactions between a wide range of molecular types, thus facilitating data access and exchange between different research groups. The IMEx consortium (<xref ref-type="bibr" rid="r40">Orchard et al., 2007</xref>) is a group of major public interaction data providers sharing curation effort and exchanging completed records on molecular data following the HUPO standard exchange format. As a consequence, the rate of data curation and data sharing between different repositories has improved, but integration is still incomplete. For example, the HUPO PSI-MI 2.5 format allows the identification of interactors by unique identifiers from different databases, but the guidelines implemented do not include a strategy for naming proteins, which leaves many of the integration issues unresolved.</p>
			<p>The issue of protein nomenclature has been addressed by internationally recognized scientific organizations like HGNC (<xref ref-type="bibr" rid="r58">Wain et al., 2002</xref>) and SGD (<xref ref-type="bibr" rid="r16">Christie et al., 2004</xref>), but they do not cover all species and do not map all database identifiers. IPI (<xref ref-type="bibr" rid="r33">Kersey et al., 2004</xref>) offers a non-complete redundant data set with cross-references to external identifiers.</p>
			<p>The importance of protein interaction analysis has prompted the development of tools focused on protein interaction networks and their visualization, analysis and data integration (<xref ref-type="bibr" rid="r1">Aittokallio et al., 2006</xref>; <xref ref-type="bibr" rid="r10">Cline et al., 2007</xref>). For example, Cytoscape is focused on centralizing network analysis tools on a single platform with built-in visualization (<xref ref-type="bibr" rid="r51">Shannon et al., 2003</xref>). Other visualization and analysis tools include Osprey (<xref ref-type="bibr" rid="r7">Breitkreutz et al., 2003</xref>), VisANT (<xref ref-type="bibr" rid="r28">Hu et al., 2004</xref>), and ProViz (<xref ref-type="bibr" rid="r29">Iragne et al., 2005</xref>). On the other hand, current packages aimed at data integration include tYNA (<xref ref-type="bibr" rid="r60">Yip et al., 2006</xref>), a web system for managing, comparing and mining multiple networks, and cPath (<xref ref-type="bibr" rid="r9">Cerami et al., 2006</xref>), a platform for collecting and storing biological pathways that can be used from third-party software for visualization and analysis. Some other works provide merged views of most public interaction data, such as MiMI (<xref ref-type="bibr" rid="r31">Jayapandian et al., 2007</xref>), APID (<xref ref-type="bibr" rid="r45">Prieto et al., 2006</xref>), and UniHI (<xref ref-type="bibr" rid="r14">Chaurasia et al., 2007</xref>).</p>
			<p>While these tools have been shown to be useful for creating and analyzing protein-protein interaction networks, there is still the need for an integration engine that truly unifies all available data into a single network and allows automatic analyses on a global scale. Most current integration tools are designed to work with interactions coming from one single type of data format, and others have problems when dealing with interactions codified using different types of protein identifiers.</p>
			<p>Recently, a number of studies have examined the protein interaction data available in the public domain (<xref ref-type="bibr" rid="r21">Futschik et al., 2007;</xref> <xref ref-type="bibr" rid="r26">Hart et al., 2006;</xref> <xref ref-type="bibr" rid="r39">Mathivanan et al., 2006</xref>). Pandey and coworkers (<xref ref-type="bibr" rid="r39">Mathivanan et al., 2006</xref>) analyzed experimentally detected human interactions from multiple databases, concluding that repositories show little overlap among them. Herzel et al. (<xref ref-type="bibr" rid="r21">Futschik et al., 2007</xref>) also compared human interaction maps, but added interaction predictions to the list of analyzed repositories.</p>
			<p>They concluded that the overlap between repositories is small but significant, and showed that the different interaction maps suffer from sampling and detection biases. The integration strategy of both works consisted of mapping all binary interactions to pairs of Entrez Gene identifiers. Marcotte and coworkers (<xref ref-type="bibr" rid="r26">Hart et al., 2006</xref>) analyzed yeast and human interaction data sets, and estimated that their protein interaction networks should contain 37,800-75,500 and 154,000-369,000 interactions, respectively.</p>
			<p>In a recent work, we presented PIANA (Protein Interactions And Network Analysis), a framework for creating, managing and analyzing protein-protein interactions (<xref ref-type="bibr" rid="r3">Aragues et al., 2006</xref>). Here, we describe the PIANA approach to protein nomenclature and its strategy for protein-protein interaction data integration. Furthermore, we describe the properties of the experimental interaction network obtained for all species by integrating interactions from DIP (<xref ref-type="bibr" rid="r48">Salwinski et al., 2004</xref>), MIPS (<xref ref-type="bibr" rid="r41">Pagel et al., 2005</xref>), MINT (<xref ref-type="bibr" rid="r13">Chatr-aryamontri et al., 2007</xref>), IntAct (<xref ref-type="bibr" rid="r32">Kerrien et al., 2007</xref>), BIND (<xref ref-type="bibr" rid="r2">Alfarano et al., 2005</xref>), BioGrid (<xref ref-type="bibr" rid="r53">Stark et al., 2006</xref>) and HPRD (<xref ref-type="bibr" rid="r44">Peri et al., 2003</xref>). We also describe the properties of the interaction networks obtained from different methods of protein-protein interaction prediction. We conclude by discussing potential enhancements to the integration approach described here.</p>
</sec>
		<sec sec-type="methods">
			<title>Materials and Methods</title>
				<sec>
					<title>Interaction Networks Based on ProteinIDs and other External Identifiers</title>
						<p>Interaction networks are built using proteinIDs as nodes (see sections ‘Mapping Protein Identifiers’ and ‘Protein-Protein Interactions Integration’). When translating the nodes of the network to external protein identifiers (a process referred to as ‘unifying the network’), there are two possibilities: 1) one proteinID corresponds to a single external identifier and 2) different proteinIDs correspond to the same identifier, and thus, nodes and interactions are merged. Therefore, the same PIANA proteinID network will correspond to different unified networks, depending on the external identifier. Statistics in this article have been obtained after unifying the networks by NCBI geneID. Although geneIDs only cover 42% of proteinIDs, the cardinality proteinID:externalIdentifier is the highest (<xref ref-type="table" rid="t1">Table 1</xref>), and therefore geneID is the best suited identifier type for obtaining an unbiased view of the integrated protein interaction network. Protein sequences of unknown geneID were unified using UniProt accessions.</p>
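						<p>As an illustration only, the unification step can be sketched in Python as follows. The function name <italic>unify_network</italic>, the dictionary-based mappings, and the identifier values (including the placeholder accession P00000) are hypothetical and are not taken from the PIANA code base. The sketch assumes that each proteinID maps to at most one geneID and falls back to a UniProt accession when no geneID is known, as described above.</p>
						<preformat preformat-type="code"><![CDATA[
```python
def unify_network(edges, gene_of, uniprot_of):
    """Collapse proteinID edges into edges between external identifiers.

    edges: iterable of (proteinID, proteinID) pairs.
    gene_of / uniprot_of: hypothetical dicts mapping proteinID to identifier.
    """
    def label(pid):
        # Prefer the geneID; fall back to a UniProt accession if none is known.
        if pid in gene_of:
            return ("geneID", gene_of[pid])
        if pid in uniprot_of:
            return ("uniprot", uniprot_of[pid])
        return ("proteinID", pid)  # keep the node if nothing is known

    unified = set()
    for a, b in edges:
        # frozenset makes the edge undirected; duplicates merge in the set.
        unified.add(frozenset((label(a), label(b))))
    return unified


# Toy data: proteinIDs 1 and 2 share geneID 217; proteinID 4 has no geneID.
edges = [(1, 3), (2, 3), (3, 4)]
gene_of = {1: 217, 2: 217, 3: 3336}
uniprot_of = {4: "P00000"}
network = unify_network(edges, gene_of, uniprot_of)
```
]]></preformat>
						<p>In this toy case the edges 1-3 and 2-3 merge into a single geneID-level interaction, which is the second possibility listed above.</p>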
				</sec>
				<sec>
					<title>Methods for the Prediction of Protein Interactions</title>
						<p>We used predictions of protein-protein interactions obtained by four different methods: (i) Gene fusion, in which two proteins are predicted to interact if their corresponding genes appear fused in another genome (<xref ref-type="bibr" rid="r17">Enright et al., 1999</xref>); (ii) Phylogenetic profiles, in which similarity of phylogenetic profiles is interpreted as indicating that two proteins need to be simultaneously present to perform a given function together (<xref ref-type="bibr" rid="r43">Pellegrini et al., 1999</xref>); (iii) Distant conservation of sequence patterns and structure relationships, in which structural similarities among domains of known interacting proteins and conservation of pairs of sequence patches involved in protein–protein interfaces are used to predict putative protein interaction pairs (<xref ref-type="bibr" rid="r20">Espadaler et al., 2005b</xref>); and (iv) Structural interologs, in which interactions are transferred between proteins with the same structural domains (<xref ref-type="bibr" rid="r3">Aragues et al., 2006</xref>). Interactions for the first two methods were retrieved from STRING (<xref ref-type="bibr" rid="r57">von Mering et al., 2007</xref>) by querying the database for interactions with a score higher than 0.7 for that particular methodology. Interactions for (iii) were obtained from the work of Espadaler et al. (<xref ref-type="bibr" rid="r20">Espadaler et al., 2005b</xref>). Interactions for (iv) were predicted by transferring experimental interactions in PIANA between proteins with a domain within the same SCOP family.</p></sec>				
		</sec>
		<sec id="s3">
			<title>Results</title>
				<sec>
					<title>Overview</title>
						<p>PIANA (Protein Interactions And Network Analysis) (<xref ref-type="bibr" rid="r3">Aragues et al., 2006</xref>) is a software framework capable of (i) integrating multiple sources of information into a single relational database (see database design in additional file 1); (ii) creating and analyzing protein interaction networks; and (iii) mapping multiple types of biological data onto protein interaction networks. PIANA code and documentation are freely available under an open source license for local installation and modification (<ext-link ext-link-type="uri" xlink:href="www.sbi.imim.es/piana">http://sbi.imim.es/piana</ext-link>). The data warehousing approach and software architecture of PIANA are shown in <xref ref-type="fig" rid="g1">Figure 1</xref> (see additional file X for details). The PIANA database is accessed by the Graph library through a database interface, which is also used by the PIANA library to create, manage and analyze protein-protein interaction networks. The whole process can be controlled from a user interface module.</p>
				<fig id="g1">
					<label>Figure 1</label>
					<caption>
						<title>PIANA architecture</title>
						<p>A set of parsers inserts information from external repositories into the PIANA database.</p>
					</caption>
					<graphic xlink:href="JPB-01-166-g001.tif"/>
				</fig>
				
				</sec>
				<sec>
					<title>Mapping Protein Identifiers</title>
						<p>PIANA handles an extensive set of protein identifier types: UniProt entries and accessions; gene symbols; NCBI gi, geneID, Unigene and accession numbers; ENSEMBL; RefSeq; PDB; and FastA formatted sequences. PIANA internally identifies proteins with proteinIDs (integers). Each proteinID is linked to a pair [amino acid sequence, taxonomy id], so there is a unique identifier for each protein sequence for a given organism. This allows PIANA to use the &lt;sequence, species&gt; of the protein as an interlingua between the external identifiers provided by the main repositories of genes and proteins. Therefore, one external protein identifier (e.g. UniProt entry THRB_HUMAN) can be associated with one or more proteinIDs (e.g. 11483), which are in turn linked to other external identifiers that are also used to represent that protein (e.g., gene symbol ‘f2’ and Unigene ‘Hs.410092’). Consequently, throughout the processes involved in PIANA input and output, external identifiers are ‘translated’ to proteinIDs, the desired operations are performed, and finally, if needed, proteinIDs are translated back into the external identifier expected by the user (<xref ref-type="fig" rid="g2">Figure 2</xref>). This strategy reduces ambiguity and processing problems to a minimum: there is no need for continuously translating between distinct types of protein identifiers, since all information has been previously stored by assigning it to specific proteinIDs. Furthermore, codifying interactions in terms of proteinIDs allows PIANA to capture a larger number of interactions than platforms based on third-party protein identifiers.</p>
				<fig id="g2">
					<label>Figure 2</label>
					<caption>
						<title>PIANA use of proteinIDs as an interlingua between external identifiers.</title>
						<p>PIANA keeps all information in terms of proteinIDs (an integer that uniquely identifies a protein sequence of a given taxonomy). User inputs are immediately translated to proteinIDs. Once this translation has been performed, all operations are performed at the sequence level, reducing ambiguities and synonym conversions to a minimum.</p>
					</caption>
					<graphic xlink:href="JPB-01-166-g002.tif"/>
				</fig>
				<p>Moreover, PIANA uses a number of techniques to ensure the quality and completeness of the identifiers used as input/output: 1) inferring correspondences between identifiers and sequences even when no external database explicitly contains the cross-reference: if one database identifies sequence A with identifier id1 and another database uses identifier id2 for sequence A, PIANA infers that id1 is equivalent to id2; 2) uniqueness of output protein identifiers: if two proteinIDs are linked to the same external identifier, those proteins are considered to be the same, and hence, merged into a single network node; 3) avoiding gene name ambiguities: because the species of the protein is part of the internal identifier, gene names are not confounded even if the same symbol is used for several species; and 4) using representative protein identifiers: (i) PIANA will use the identifier labeled as ‘preferred’ by the source database (e.g. official gene symbol) unless the user specifies otherwise; and (ii) any input identifiers given by the user are prioritized over other identifiers in the PIANA database.</p>
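				<p>The first and third techniques can be illustrated with a short Python sketch. It is not the PIANA implementation: the function name <italic>infer_equivalences</italic>, the record layout and the toy sequences are assumptions made for the example. The only idea taken from the text is that the pair (sequence, taxonomy) acts as the key that links identifiers from different databases.</p>
				<preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def infer_equivalences(records):
    """Group external identifiers that point to the same (sequence, taxonomy).

    records: iterable of (identifier, sequence, taxonomy_id) tuples, one per
    cross-reference found in any source database (hypothetical input shape).
    """
    ids_by_protein = defaultdict(set)
    for identifier, sequence, taxonomy_id in records:
        # The (sequence, taxonomy) pair plays the role of the proteinID key.
        ids_by_protein[(sequence, taxonomy_id)].add(identifier)
    return dict(ids_by_protein)


# Toy data: two databases describe the same human sequence with different ids,
# and the identifier "id1" is reused for a different mouse sequence.
records = [
    ("id1", "MKTAYIAK", 9606),   # from database 1 (human)
    ("id2", "MKTAYIAK", 9606),   # from database 2 (human)
    ("id1", "MKTAYLAK", 10090),  # same symbol, different species (mouse)
]
groups = infer_equivalences(records)
```
]]></preformat>
				<p>Here id1 and id2 end up in the same group for the human sequence, while the mouse entry stays separate even though it reuses the symbol id1.</p>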
				<p>Since PIANA works internally with identifiers linked to the sequence of proteins (i.e. proteinID), the output identifier that is used for proteins depends not only on the type of identifier chosen by the user (e.g. UniProt) but also on the specific results that are being outputted. The reason is that one proteinID can be associated with several external identifiers (e.g. one sequence may be associated with three gene names) and consequently, one of those external identifiers has to be chosen above the others. The algorithm used to choose among external identifiers depends on the input identifiers given by the user (they are prioritized over other identifiers) and the number of external databases that linked that sequence to the identifiers. Therefore, one proteinID will not always be represented in the output by the same external identifier.</p>
				<p>Our internal protein identifiers do not distinguish between identical paralogs. We believe this distinction is not needed, since most repositories of interactions do not reach that level of specificity. Finally, proteinIDs are not intended to be new external protein identifiers; their only purpose is to be used for integration. Therefore, the way the integration is performed remains transparent to the user, whose only concern is to decide on the type of identifiers for input and output.</p>
				</sec>
				<sec>
					<title>Protein Sequences Integration</title>
						<p>Sequence and taxonomy data was obtained from the UniProt (The UniProt Consortium, 2007), NCBI GenBank (<xref ref-type="bibr" rid="r5">Benson et al., 2007</xref>) and NCBI Blast nr (Maglott et al., 2007) databases (see additional file 2 for the complete list of protein sequence repositories used). Unexpectedly, UniProt Swiss-Prot (i.e. curated sequences) and UniProt TrEMBL (i.e. predicted sequences) have a significant overlap (additional file 3). Moreover, the overlap between TrEMBL and GenBank is lower than anticipated. Cross-references between external identifiers and proteinIDs were obtained from multiple third-party repositories (see additional file 2). <xref ref-type="table" rid="t1">Table 1</xref> shows the coverage provided by the main protein external identifiers for all proteinIDs (i.e. pair [protein sequence, taxonomy]) in the PIANA database.</p>
				</sec>
				<sec>
					<title>Protein-Protein Interactions Integration</title>
						<p>Each interaction described in a third-party database is ‘translated’ to one or more interactions between proteinIDs. For example, if the external database contains an interaction between proteins A and B, with A corresponding to two proteinIDs (e.g. 1 and 2) and B to one proteinID (e.g. 3), two interactions (1-3 and 2-3) will be inserted into the PIANA database. Both interactions will be described in the PIANA database as coming from that specific external database and labeled with the method used to detect the interaction between A and B. For example, HPRD describes an interaction between Entrez Gene 217 (mitochondrial ALDH) and Entrez Gene 3336 (heat shock protein). According to the correspondences in the PIANA database, Entrez Gene 217 corresponds to 13 different proteinIDs, and Entrez Gene 3336 corresponds to 12 proteinIDs. Therefore, PIANA will internally store the interaction between those two proteins as 156 different interactions. This methodology allows PIANA to give full control to the user: 1) interactions can be retrieved from any type of identifier; 2) a network can be created for a given external database (e.g. use only interactions from IntAct) and/or a specific method (e.g. do not use interactions detected in two-hybrid assays) and/or a species (e.g. only interested in human interactions); 3) PIANA outputs can be set to use any type of protein identifier and, therefore, interactions between proteinIDs are transformed to non-redundant interactions between protein identifiers (Methods). Consequently, describing interactions in terms of protein sequences instead of external identifiers provides a true integration of all known interactions into a single network, while keeping record of the source databases and detection methods associated with the interaction.</p>
						<p>Currently, PIANA can integrate interactions from DIP (<xref ref-type="bibr" rid="r48">Salwinski et al., 2004</xref>), MIPS (<xref ref-type="bibr" rid="r41">Pagel et al., 2005</xref>), MINT (<xref ref-type="bibr" rid="r13">Chatr-aryamontri et al., 2007</xref>), IntAct (<xref ref-type="bibr" rid="r32">Kerrien et al., 2007</xref>), BIND (<xref ref-type="bibr" rid="r2">Alfarano et al., 2005</xref>), BioGrid (<xref ref-type="bibr" rid="r53">Stark et al., 2006</xref>), HPRD (<xref ref-type="bibr" rid="r44">Peri et al., 2003</xref>), STRING (<xref ref-type="bibr" rid="r57">von Mering et al., 2007</xref>), interactions predicted by distant conservation of sequence patterns and structure relationships (<xref ref-type="bibr" rid="r20">Espadaler et al., 2005b</xref>), interactions transferred between proteins based on orthology (<xref ref-type="bibr" rid="r61">Yu et al., 2004</xref>) and, in general, any interaction data that is in tabulated or PSI-MI (<xref ref-type="bibr" rid="r27">Hermjakob, 2006</xref>) formats. See additional file 2 for the detailed description of interaction repositories that have been used in this work. Furthermore, data does not have to be integrated indiscriminately without differentiating high-throughput versus small-scale experiments and literature annotation. Therefore, PIANA allows users to define subsets of interactions based on the source repository and detection methods employed. For example, a subset of reliable interactions can be extracted by requiring them to be in at least two different repositories.</p></sec>
				</sec>
				<sec>
					<title>Experimental Interactions</title>
						<p>The integrated set of experimental interactions consisted of 4,055,698 interactions between 113,785 different proteinIDs. When grouping proteinIDs by their associated NCBI geneID (Methods), there were 405,808 interactions for 53,143 proteins, an average of 7.63 interactions per protein.</p>
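						<p>The gap between the proteinID-level and geneID-level counts follows from the way interactions are stored: one externally reported interaction becomes every pairing of the proteinIDs linked to its two partners. A minimal Python sketch of that expansion is given below. The function name and the placeholder proteinID ranges are hypothetical; only the counts 13 and 12 come from the HPRD example in the section on interaction integration.</p>
						<preformat preformat-type="code"><![CDATA[
```python
from itertools import product


def expand_interaction(protein_ids_a, protein_ids_b):
    """Return every proteinID pair implied by one external interaction."""
    return list(product(protein_ids_a, protein_ids_b))


# Placeholder proteinIDs: 13 for Entrez Gene 217 and 12 for Entrez Gene 3336.
ids_217 = range(1, 14)
ids_3336 = range(101, 113)
pairs = expand_interaction(ids_217, ids_3336)
print(len(pairs))  # 13 * 12 = 156 proteinID-level interactions
```
]]></preformat>
						<p>Grouping by geneID reverses this expansion, so the 156 stored pairs count as a single interaction in the geneID-level figures.</p>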
					<sec>
						<title>Interactions Distribution</title>
							<p>The experimental interactions in the PIANA integrated database have been obtained from 7 different repositories, belong to 736 different species, and were detected using 106 different experimental methods. As shown in <xref ref-type="table" rid="t2">Table 2</xref>, the species with the largest number of experimental interactions are yeast (111,535 interactions) and human (110,457 interactions). Most interactions were found in just one database and were detected by just one method (<xref ref-type="fig" rid="g3">Figure 3</xref>). The high correlation between the number of methods and databases is explained by the fact that most interactions appear in just one external repository, and these repositories usually label interactions with a single detection method. We calculated the overlap between the 7 repositories with experimental information in terms of interactions (<xref ref-type="table" rid="t3">Table 3A</xref>) and proteins (<xref ref-type="table" rid="t3">Table 3B</xref>). BioGrid (<xref ref-type="bibr" rid="r53">Stark et al., 2006</xref>) is the repository with the highest number of interactions (216,370) and with the highest number of unique interactions (163,700). The two repositories that show the greatest overlap are MINT and IntAct (61% of interactions and 82% of proteins in MINT are also in IntAct) while the lowest overlap was between HPRD and DIP (only 4% of interactions and 9% of proteins in HPRD are also in DIP). Most low overlaps in terms of interactions are explained by the low overlap in terms of proteins. Therefore, data integration is required in order to obtain an interaction network that covers most proteins and interactions.</p>
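							<p>The overlaps quoted here are directional: they give the share of one repository's entries that also appear in another. The following Python sketch shows one way such a figure could be computed. It is an assumption about the calculation, not the procedure used to build the tables, and the function names and toy identifiers are hypothetical.</p>
							<preformat preformat-type="code"><![CDATA[
```python
def directional_overlap(source, target):
    """Fraction of items in `source` that also appear in `target`."""
    source, target = set(source), set(target)
    if not source:
        return 0.0
    return len(source & target) / len(source)


def as_undirected(pairs):
    """Normalize (a, b) pairs so that (a, b) and (b, a) compare equal."""
    return {frozenset(pair) for pair in pairs}


# Toy repositories (hypothetical identifiers, not real data).
repo_a = as_undirected([("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p1", "p4")])
repo_b = as_undirected([("p2", "p1"), ("p5", "p6")])

a_in_b = directional_overlap(repo_a, repo_b)  # share of A also found in B
b_in_a = directional_overlap(repo_b, repo_a)  # share of B also found in A
```
]]></preformat>
							<p>Because the denominator is the size of the source set, the two directions generally differ, which is why each percentage is stated relative to one named repository.</p>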
							<p>We were interested in analyzing the distribution of interactions in terms of the detection method employed. We examined the overlaps between different detection methods in terms of interactions (<xref ref-type="table" rid="t4">Table 4A</xref>) and proteins (<xref ref-type="table" rid="t4">Table 4B</xref>). We observed that high-throughput methods account for most of the known interactions (126,136 for affinity methods and 103,334 for yeast two-hybrid assays). The overlap between the interactions detected by the different methods is low, even in cases where the overlap at the protein level is high. For example, while 51% of proteins with interactions from affinity methods also had interactions detected by yeast two-hybrid methods, only 9% of interactions from yeast two-hybrid were also detected by affinity methods. Therefore, in order to maximize the number of known interactions for a protein, multiple experimental detection methods should be employed.</p>				
				<fig id="g3">
					<label>Figure 3</label>
					<caption>
						<title>Distribution of interactions in PIANA across different source databases and detection methods.</title>
						<p>Most interactions were found in just one database and were detected by just one method. Unspecific detection method names were not taken into account (e.g., experimental, in-vitro, in-vivo).</p>
					</caption>
					<graphic xlink:href="JPB-01-166-g003.tif"/>
				</fig>									
					</sec>
					<sec>
						<title>Properties of the Experimental Integrated Protein Interaction Network</title>
							<p>Well-documented observations about protein interaction networks are confirmed when analyzing the integrated experimental interaction networks of different species. Moreover, the integrated network shows the modular functional organization of the proteome reported in previous work (<xref ref-type="bibr" rid="r22">Gavin et al., 2006</xref>). In particular, proteins tend to interact with proteins of the same Gene Ontology (GO) (<xref ref-type="bibr" rid="r25">Harris et al., 2004</xref>) biological process (<xref ref-type="table" rid="t5">Table 5</xref>). Furthermore, 95% of the interacting proteins in the integrated network have the same cellular component according to GO. In addition, the following properties were observed for the yeast protein interaction network (<xref ref-type="table" rid="t6">Table 6</xref>): (i) yeast hubs (proteins with 5 or more interactions) are more likely to be essential (<xref ref-type="bibr" rid="r23">Giaever et al., 2002</xref>) than non-hubs (22% of hubs are essential versus only 5% of non-hubs), although this might be a reflection of hubs usually having multiple interfaces (<xref ref-type="bibr" rid="r34">Kim et al., 2006</xref>); (ii) approximately 59% of the interactions involve proteins with the same cell localization according to (<xref ref-type="bibr" rid="r36">Lee et al., 2002</xref>); (iii) approximately 60% of the reported interactions involve proteins that are coexpressed during the yeast cell cycle according to (<xref ref-type="bibr" rid="r15">Cho et al., 1998</xref>).</p>
					</sec>
					<sec>
						<title>Protein Function Prediction from the Experimental Integrated Network</title>
							<p>Recently, it has been shown that the number of common interaction partners between two proteins can be used to annotate proteins (<xref ref-type="bibr" rid="r8">Brun et al., 2003</xref>; <xref ref-type="bibr" rid="r49">Samanta et al., 2003</xref>). We have studied the use of this heuristic to predict molecular functions and biological processes as defined by GO (<xref ref-type="bibr" rid="r25">Harris et al., 2004</xref>), by calculating the percentage of shared GO terms between proteins with common interaction partners (<xref ref-type="fig" rid="g4">Figure 4</xref>). As expected, we observe that the interactions of a protein in the integrated network can be used to predict its function and the biological processes in which it participates. For example, proteins with 10-20 interaction partners in common share 90% of their GO biological process terms. Moreover, the accuracy of the predictions based on the integrated network is similar to that obtained when solely using the subset of interactions from DIP (<xref ref-type="bibr" rid="r48">Salwinski et al., 2004</xref>), while the number of annotated proteins is much higher (additional file 4).</p>
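							<p>The common-partner heuristic described above can be sketched in Python as follows. This is a hypothetical illustration, not the procedure used to produce Figure 4: the function names, protein identifiers and term labels are invented, and the percentage of shared terms is assumed here to be the size of the intersection of the two annotation sets divided by the size of their union, a definition that the text does not specify.</p>
							<preformat>
```python
from itertools import combinations


def neighbours(interactions):
    """Map each protein to the set of its direct interaction partners."""
    partners = {}
    for protein_a, protein_b in interactions:
        partners.setdefault(protein_a, set()).add(protein_b)
        partners.setdefault(protein_b, set()).add(protein_a)
    return partners


def shared_term_percentage(terms_a, terms_b):
    """Assumed measure: intersection over union of two term sets, in percent."""
    union = terms_a.union(terms_b)
    if not union:
        return 0.0
    return 100.0 * len(terms_a.intersection(terms_b)) / len(union)


def common_partner_table(interactions, annotations):
    """List (protein_a, protein_b, common partners, shared terms in percent)
    for every pair of proteins with at least one common partner."""
    partners = neighbours(interactions)
    rows = []
    for protein_a, protein_b in combinations(sorted(partners), 2):
        n_common = len(partners[protein_a].intersection(partners[protein_b]))
        if n_common == 0:
            continue
        shared = shared_term_percentage(
            annotations.get(protein_a, set()),
            annotations.get(protein_b, set()),
        )
        rows.append((protein_a, protein_b, n_common, shared))
    return rows


# Toy network and annotations (hypothetical, for illustration only).
toy_interactions = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "Y")]
toy_annotations = {
    "A": {"term1", "term2"},
    "B": {"term1", "term2"},
    "C": {"term2", "term3"},
}
rows = common_partner_table(toy_interactions, toy_annotations)
```
							</preformat>
							<p>Grouping the resulting rows by the number of common partners and averaging the shared-term percentage within each group would give a curve of the kind described in the text, under the assumptions stated above.</p>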
					</sec>
					<sec>
						<title>Predicted Interaction Networks</title>
							<p>We were interested in assessing protein interaction predictions and evaluating the similarities between the predicted interaction network and the experimental interaction network. In particular, we studied 4 different types of predictions (Methods): (i) Gene fusion events (<xref ref-type="bibr" rid="r17">Enright et al., 1999</xref>) as predicted by STRING (<xref ref-type="bibr" rid="r57">von Mering et al., 2007</xref>); (ii) Phylogenetic profiles (<xref ref-type="bibr" rid="r43">Pellegrini et al., 1999</xref>) as predicted by STRING (<xref ref-type="bibr" rid="r57">von Mering et al., 2007</xref>); (iii) Distant conservation of sequence patterns and structure relationships as described by Espadaler et al. (<xref ref-type="bibr" rid="r20">Espadaler et al., 2005b</xref>); and (iv) Structural interologs predicted by PIANA (<xref ref-type="bibr" rid="r3">Aragues et al., 2006</xref>). We calculated the overlap between the different experimental and prediction methods in terms of interactions (<xref ref-type="table" rid="t7">Table 7A</xref>) and proteins (<xref ref-type="table" rid="t7">Table 7B</xref>), observing a high overlap between prediction methods based on genome analyses (i.e. gene fusion events and phylogenetic profiles) and a very low overlap between all other prediction methods. This minimal overlap between interaction predictions is explained by the different types of input data used by each method and the type of proteins for which the methods are capable of predicting interactions. For example, the method based on structural interologs predicts interactions for proteins with known 3D structure, while STRING predictions from gene fusion events were mainly applied to prokaryotes. Most proteins with known 3D structure are from eukaryotes (<xref ref-type="bibr" rid="r6">Berman et al., 2000</xref>), and therefore, the two methods rarely predict similar interactions. 
Moreover, there is low overlap between predicted interactions and those obtained by experimental high-throughput methods, both in terms of interactions and proteins. These results indicate that different methods identify interactions for different proteins. For example, there are many species for which no yeast two-hybrid experiments have been carried out, while many predictions can be ‘transferred’ to those species on the basis of genome analysis, resulting in a low overlap at the interaction and protein level between the two methods.</p>
				<fig id="g4">
					<label>Figure 4</label>
					<caption>
						<title>Function prediction based on common interaction partners in the integrated experimental network.</title>
						<p>The percentage of shared GO terms is plotted as a function of the number of common interaction partners.</p>
					</caption>
					<graphic xlink:href="JPB-01-166-g004.tif"/>
				</fig>					
				<p>We evaluated whether interacting proteins according to different prediction methods tended to share biological process, molecular function and cellular component according to GO (<xref ref-type="table" rid="t8">Table 8</xref>). We observed that the method that best captures functional relationships between proteins is the one based on gene fusion events (Methods): 85% of the predicted interacting pairs belong to the same biological process. Moreover, all prediction methods detected a substantial number of colocalized proteins. For example, 87% of interacting proteins according to the prediction method based on structural interologs had the same cellular location.</p>
					</sec>
					</sec>		
				<sec id="s4">
					<title>Discussion</title>
						<p>We presented the data integration approach of PIANA, a software framework designed for creating, managing and analyzing protein-protein interaction networks. PIANA was created to address nomenclature and integration issues common in protein interaction repositories and network visualization tools. Moreover, the modular approach of PIANA makes it a useful resource for bioinformaticians wishing to avoid the low-level details related to working with protein interaction networks.
</p>
		<p>Many areas of biological research are hampered by the difficulties found in accessing all biological information available. In particular, protein-protein interaction analysis is usually biased by the input sources of data. PIANA is one of the very few protein interaction platforms where all interactions from all external databases can be found for a protein of interest, regardless of the type of identifier used as input or the name given to the protein by the researcher that submitted the interactions. We presented a detailed analysis of the protein-protein interactions in the integrated network, in terms of their distribution across different databases and detection methods. We showed that most interactions appear in just one database and that the overlap in terms of interactions is below 50% between most repositories, reinforcing the need for tools that unify all known interactions into a single network. Moreover, this integrated network has been shown to agree with properties previously reported about protein-protein interaction networks retrieved from just one database/detection method, such as its capability of predicting the function of proteins. In addition, the overlap between different experimental and prediction methods for protein-protein interaction identification was low, both in terms of interactions and proteins for which at least one interaction has been described. Despite this low overlap, interaction prediction approaches such as those based on gene fusion events and structural interologs were successful at identifying pairs of proteins within the same GO biological process. However, more in-depth studies are being undertaken to evaluate the ability to annotate proteins based on interaction predictions (<xref ref-type="bibr" rid="r19">Espadaler et al., 2008</xref>).</p>
				<p>Our analysis of protein interaction data in the public domain is similar to the studies of Herzel and coworkers (<xref ref-type="bibr" rid="r21">Futschik et al., 2007</xref>) and Pandey and coworkers (<xref ref-type="bibr" rid="r39">Mathivanan et al., 2006</xref>). However, our study includes protein interactions for all species, as well as predicted interactions from diverse methods. Moreover, we have analyzed the overlap between diverse experimental and prediction methods. The main conclusions from the studies in (<xref ref-type="bibr" rid="r21">Futschik et al., 2007</xref>) and (<xref ref-type="bibr" rid="r39">Mathivanan et al., 2006</xref>) are confirmed for interactions in organisms other than human. However, we found a higher overlap between the different interaction repositories, probably due to recent efforts in data exchange. Moreover, the total number of interactions in the experimental human integrated network is 110,457, compared to the 154,000-369,000 interactions estimated by Marcotte and coworkers (<xref ref-type="bibr" rid="r26">Hart et al., 2006</xref>).</p>
		<p>PIANA's approach to data integration strikes a good balance between reliability and flexibility, while giving good coverage of the information available. Two potential improvements to the current integration approach are: (i) the implementation of more sophisticated gene name disambiguation (<xref ref-type="bibr" rid="r50">Schijvenaars et al., 2005</xref>; <xref ref-type="bibr" rid="r59">Xu et al., 2007</xref>); and (ii) the capability of detecting highly similar protein sequences (e.g. via sequence alignments) and thus transferring interactions and identifiers between similar proteins. The data integration techniques described here could also be of help for areas other than protein-protein interactions, such as gene expression studies or regulatory networks.</p>
				</sec>
				<sec id="s5">
					<title>Conclusions</title>
						<p>Our approach to data integration is based on using the sequence of proteins as an interlingua between the different identifiers. This strategy allows PIANA, our protein-protein interaction software platform, to integrate data from multiple sources into a single interaction network, while allowing the user to control which interactions are used in the analyses. The low overlap found between the different repositories of interaction data reinforces the need for integration tools. Moreover, we found that the integrated network of interactions shows properties similar to those previously reported for partial interaction networks. Finally, we observed that interaction predictions are not as accurate as experimentally detected interactions in tasks such as protein annotation. However, prediction methods can help experimental methods to cover a larger portion of the interactome space.</p>
				</sec>
				<sec>
					<title>Authors’ Contributions</title>
						<p>RA designed PIANA and wrote the manuscript. JGG and RA implemented the code and performed analyses. BO conceived of the PIANA project and provided scientific guidance. JGG and BO helped draft the manuscript. All authors read and approved the final manuscript.</p>
				</sec>							
	</body>
	<back>
		<ack>
			<p>We thank members of the UPF-IMIM SBI lab and P. Boixeda for their helpful comments. R.A. is supported by a grant from the Spanish Ministerio de Ciencia y Tecnología (MCyT, BIO2002-03609). J.G.G. is supported by a FI grant from the Catalonian Agència de Gestió d’Ajuts Universitaris i de Recerca del Departament d’Innovació, Empresa i Universitats de la Generalitat de Catalunya. The work has been supported by grants from the Spanish Ministerio de Educación y Ciencia (MEC, BIO02005-00533, PROFIT PSE-010000-2007-1 and FIT-350300-2006-40/41/42).</p>
		</ack>
		   <ref-list>
			<title>References</title>
			<ref id="r1">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Aittokallio</surname>
							<given-names>T</given-names>
						</name>
						<name>
							<surname>Schwikowski</surname>
							<given-names>B</given-names>
						</name>										
					</person-group>
					<year>2006</year>
					<article-title>Graph-based methods for analysing networks in cell biology</article-title>
					<source>Brief Bioinform</source>
					<volume>7</volume>
					<fpage>243</fpage>
					<lpage>255</lpage>
				</citation>
			</ref>
			<ref id="r2">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Alfarano</surname>
							<given-names>C</given-names>
						</name>
						<name>
							<surname>Andrade</surname>
							<given-names>CE</given-names>
						</name>
						<name>
							<surname>Anthony</surname>
							<given-names>K</given-names>
						</name>	
						<name>
							<surname>Bahroos</surname>
							<given-names>N</given-names>
						</name>	
						<name>
							<surname>Bajec</surname>
							<given-names>M</given-names>
						</name><etal/>											
					</person-group>
					<year>2005</year>
					<article-title>The Biomolecular Interaction Network Database and related tools 2005 update</article-title>
					<source>Nucleic Acids Res</source>
					<volume>33</volume>
					<fpage>D418</fpage>
					<lpage>424</lpage>
				</citation>
			</ref>
			<ref id="r3">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Aragues</surname>
							<given-names>R</given-names>
						</name>
						<name>
							<surname>Jaeggi</surname>
							<given-names>D</given-names>
						</name>
						<name>
							<surname>Oliva</surname>
							<given-names>B</given-names>
						</name>																
					</person-group>
					<year>2006</year>
					<article-title>PIANA: protein interactions and network analysis</article-title>
					<source>Bioinformatics</source>
					<volume>22</volume>
					<fpage>1015</fpage>
					<lpage>1017</lpage>
				</citation>
			</ref>
			<ref id="r4">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Barabasi</surname>
							<given-names>AL</given-names>
						</name>
						<name>
							<surname>Oltvai</surname>
							<given-names>ZN</given-names>
						</name>																		
					</person-group>
					<year>2004</year>
					<article-title>Network biology: understanding the cell’s functional organization</article-title>
					<source>Nat Rev Genet</source>
					<volume>5</volume>
					<fpage>101</fpage>
					<lpage>113</lpage>
				</citation>
			</ref>
			<ref id="r5">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Benson</surname>
							<given-names>DA</given-names>
						</name>
						<name>
							<surname>Karsch-Mizrachi</surname>
							<given-names>I</given-names>
						</name>
						<name>
							<surname>Lipman</surname>
							<given-names>DJ</given-names>
						</name>	
						<name>
							<surname>Ostell</surname>
							<given-names>J</given-names>
						</name>	
						<name>
							<surname>Wheeler</surname>
							<given-names>DL</given-names>
						</name>																			
					</person-group>
					<year>2007</year>					
					<article-title>GenBank</article-title>
					<source>Nucleic Acids Res</source>
					<volume>35</volume>
					<fpage>D21</fpage>
					<lpage>25</lpage>
				</citation>
			</ref>
			<ref id="r6">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Berman</surname>
							<given-names>HM</given-names>
						</name>
						<name>
							<surname>Westbrook</surname>
							<given-names>J</given-names>
						</name>
						<name>
							<surname>Feng</surname>
							<given-names>Z</given-names>
						</name>	
						<name>
							<surname>Gilliland</surname>
							<given-names>G</given-names>
						</name>	
						<name>
							<surname>Bhat</surname>
							<given-names>TN</given-names>
						</name><etal/>																			
					</person-group>
					<year>2000</year>
					<article-title>The Protein Data Bank</article-title>
					<source>Nucleic Acids Res</source>
					<volume>28</volume>
					<fpage>235</fpage>
					<lpage>242</lpage>
				</citation>
			</ref>
			<ref id="r7">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Breitkreutz</surname>
							<given-names>BJ</given-names>
						</name>
						<name>
							<surname>Stark</surname>
							<given-names>C</given-names>
						</name>
						<name>
							<surname>Tyers</surname>
							<given-names>M</given-names>
						</name>																								
					</person-group>
					<year>2003</year>
					<article-title>Osprey: a network visualization system</article-title>
					<source>Genome Biol</source>
					<volume>4</volume>					
					<fpage>R22</fpage>
				</citation>
			</ref>
			<ref id="r8">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Brun</surname>
							<given-names>C</given-names>
						</name>
						<name>
							<surname>Chevenet</surname>
							<given-names>F</given-names>
						</name>
						<name>
							<surname>Martin</surname>
							<given-names>D</given-names>
						</name>	
						<name>
							<surname>Wojcik</surname>
							<given-names>J</given-names>
						</name>	
						<name>
							<surname>Guenoche</surname>
							<given-names>A</given-names>
						</name>																			
					</person-group>
					<year>2003</year>					
					<article-title>Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network</article-title>
					<source>Genome Biol</source>
					<volume>5</volume>					
					<fpage>R6</fpage>
				</citation>
			</ref>
			<ref id="r9">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Cerami</surname>
							<given-names>EG</given-names>
						</name>
						<name>
							<surname>Bader</surname>
							<given-names>GD</given-names>
						</name>
						<name>
							<surname>Gross</surname>
							<given-names>BE</given-names>
						</name>
						<name>
							<surname>Sander</surname>
							<given-names>C</given-names>
						</name>																												
					</person-group>
					<year>2006</year>
					<article-title>cPath: open source software for collecting, storing, and querying biological pathways</article-title>
					<source>BMC Bioinformatics</source>
					<volume>7</volume>					
					<fpage>497</fpage>
				</citation>
			</ref>
			<ref id="r10">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Cline</surname>
							<given-names>MS</given-names>
						</name>
						<name>
							<surname>Smoot</surname>
							<given-names>M</given-names>
						</name>
						<name>
							<surname>Cerami</surname>
							<given-names>E</given-names>
						</name>
						<name>
							<surname>Kuchinsky</surname>
							<given-names>A</given-names>
						</name>	
						<name>
							<surname>Landys</surname>
							<given-names>N</given-names>
						</name><etal/>																											
					</person-group>
					<year>2007</year>
					<article-title>Integration of biological networks and gene expression data using Cytoscape</article-title>
					<source>Nat Protoc</source>
					<volume>2</volume>
					<fpage>2366</fpage>					
					<lpage>2382</lpage>
				</citation>
			</ref>
			<ref id="r11">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Crosby</surname>
							<given-names>MA</given-names>
						</name>
						<name>
							<surname>Goodman</surname>
							<given-names>JL</given-names>
						</name>
						<name>
							<surname>Strelets</surname>
							<given-names>VB</given-names>
						</name>
						<name>
							<surname>Zhang</surname>
							<given-names>P</given-names>
						</name>	
						<name>
							<surname>Gelbart</surname>
							<given-names>WM</given-names>
						</name>																											
					</person-group>
					<year>2007</year>
					<article-title>FlyBase: genomes by the dozen</article-title>
					<source>Nucleic Acids Res</source>
					<volume>35</volume>
					<fpage>D486</fpage>					
					<lpage>491</lpage>
				</citation>
			</ref>
			<ref id="r12">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Cusick</surname>
							<given-names>ME</given-names>
						</name>
						<name>
							<surname>Klitgord</surname>
							<given-names>N</given-names>
						</name>
						<name>
							<surname>Vidal</surname>
							<given-names>M</given-names>
						</name>
						<name>
							<surname>Hill</surname>
							<given-names>DE</given-names>
						</name>																																
					</person-group>
					<year>2005</year>
					<article-title>Interactome: gateway into systems biology</article-title>
					<source>Hum Mol Genet</source>
					<volume>14</volume>
					<issue>Spec No 2</issue>
					<fpage>R171</fpage>					
					<lpage>181</lpage>
				</citation>
			</ref>
			<ref id="r13">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Chatr-aryamontri</surname>
							<given-names>A</given-names>
						</name>
						<name>
							<surname>Ceol</surname>
							<given-names>A</given-names>
						</name>
						<name>
							<surname>Palazzi</surname>
							<given-names>LM</given-names>
						</name>
						<name>
							<surname>Nardelli</surname>
							<given-names>G</given-names>
						</name>
						<name>
							<surname>Schneider</surname>
							<given-names>MV</given-names>
						</name><etal/>																																	
					</person-group>
					<year>2007</year>
					<article-title>MINT: the Molecular INTeraction database</article-title>
					<source>Nucleic Acids Res</source>
					<volume>35</volume>
					<fpage>D572</fpage>					
					<lpage>574</lpage>
				</citation>
			</ref>
			<ref id="r14">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Chaurasia</surname>
							<given-names>G</given-names>
						</name>
						<name>
							<surname>Iqbal</surname>
							<given-names>Y</given-names>
						</name>
						<name>
							<surname>Hanig</surname>
							<given-names>C</given-names>
						</name>
						<name>
							<surname>Herzel</surname>
							<given-names>H</given-names>
						</name>
						<name>
							<surname>Wanker</surname>
							<given-names>EE</given-names>
						</name><etal/>																																	
					</person-group>
					<year>2007</year>
					<article-title>UniHI: an entry gate to the human protein interactome</article-title>
					<source>Nucleic Acids Res</source>
					<volume>35</volume>
					<fpage>D590</fpage>					
					<lpage>594</lpage>
				</citation>
			</ref>
			<ref id="r15">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Cho</surname>
							<given-names>RJ</given-names>
						</name>
						<name>
							<surname>Campbell</surname>
							<given-names>MJ</given-names>
						</name>
						<name>
							<surname>Winzeler</surname>
							<given-names>EA</given-names>
						</name>
						<name>
							<surname>Steinmetz</surname>
							<given-names>L</given-names>
						</name>
						<name>
							<surname>Conway</surname>
							<given-names>A</given-names>
						</name><etal/>																																	
					</person-group>
					<year>1998</year>
					<article-title>A genome-wide transcriptional analysis of the mitotic cell cycle</article-title>
					<source>Mol Cell</source>
					<volume>2</volume>
					<fpage>65</fpage>					
					<lpage>73</lpage>
				</citation>
			</ref>
			<ref id="r16">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Christie</surname>
							<given-names>KR</given-names>
						</name>
						<name>
							<surname>Weng</surname>
							<given-names>S</given-names>
						</name>
						<name>
							<surname>Balakrishnan</surname>
							<given-names>R</given-names>
						</name>
						<name>
							<surname>Costanzo</surname>
							<given-names>MC</given-names>
						</name>
						<name>
							<surname>Dolinski</surname>
							<given-names>K</given-names>
						</name><etal/>																																	
					</person-group>
					<year>2004</year>
					<article-title>Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms</article-title>
					<source>Nucleic Acids Res</source>
					<volume>32</volume>
					<fpage>D311</fpage>					
					<lpage>314</lpage>
				</citation>
			</ref>
			<ref id="r17">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Enright</surname>
							<given-names>AJ</given-names>
						</name>
						<name>
							<surname>Iliopoulos</surname>
							<given-names>I</given-names>
						</name>
						<name>
							<surname>Kyrpides</surname>
							<given-names>NC</given-names>
						</name>
						<name>
							<surname>Ouzounis</surname>
							<given-names>CA</given-names>
						</name>																																						
					</person-group>
					<year>1999</year>
					<article-title>Protein interaction maps for complete genomes based on gene fusion events</article-title>
					<source>Nature</source>
					<volume>402</volume>
					<fpage>86</fpage>					
					<lpage>90</lpage>
				</citation>
			</ref>
			<ref id="r18">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Espadaler</surname>
							<given-names>J</given-names>
						</name>
						<name>
							<surname>Aragues</surname>
							<given-names>R</given-names>
						</name>
						<name>
							<surname>Eswar</surname>
							<given-names>N</given-names>
						</name>
						<name>
							<surname>Marti-Renom</surname>
							<given-names>MA</given-names>
						</name>
						<name>
							<surname>Querol</surname>
							<given-names>E</given-names>
						</name><etal/>																																								
					</person-group>
					<year>2005a</year>
					<article-title>Detecting remotely related proteins by their interactions and sequence similarity</article-title>
					<source>Proc Natl Acad Sci USA</source>
					<volume>102</volume>
					<fpage>7151</fpage>					
					<lpage>7156</lpage>
				</citation>
			</ref>
			<ref id="r19">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Espadaler</surname>
							<given-names>J</given-names>
						</name>
						<name>
							<surname>Eswar</surname>
							<given-names>N</given-names>
						</name>
						<name>
							<surname>Querol</surname>
							<given-names>E</given-names>
						</name>
						<name>
							<surname>Aviles</surname>
							<given-names>FX</given-names>
						</name>
						<name>
							<surname>Sali</surname>
							<given-names>A</given-names>
						</name><etal/>																																								
					</person-group>
					<year>2008</year>
					<article-title>Prediction of enzyme function by combining sequence similarity and protein interactions</article-title>
					<source>BMC Bioinformatics</source>
					<volume>9</volume>
					<fpage>249</fpage>						
				</citation>
			</ref>
			<ref id="r20">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Espadaler</surname>
							<given-names>J</given-names>
						</name>
						<name>
							<surname>Romero</surname>
							<given-names>IO</given-names>
						</name>
						<name>
							<surname>Jackson</surname>
							<given-names>RM</given-names>
						</name>
						<name>
							<surname>Oliva</surname>
							<given-names>B</given-names>
						</name>																																												
					</person-group>
					<year>2005b</year>
					<article-title>Prediction of protein-protein interactions using distant conservation of sequence patterns and structure relationships</article-title>
					<source>Bioinformatics</source>
					<volume>21</volume>
					<fpage>3360</fpage>
					<lpage>3368</lpage>						
				</citation>
			</ref>
			<ref id="r21">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Futschik</surname>
							<given-names>ME</given-names>
						</name>
						<name>
							<surname>Chaurasia</surname>
							<given-names>G</given-names>
						</name>
						<name>
							<surname>Herzel</surname>
							<given-names>H</given-names>
						</name>																																																
					</person-group>
					<year>2007</year>
					<article-title>Comparison of human protein-protein interaction maps</article-title>
					<source>Bioinformatics</source>
					<volume>23</volume>
					<fpage>605</fpage>
					<lpage>611</lpage>						
				</citation>
			</ref>
			<ref id="r22">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Gavin</surname>
							<given-names>AC</given-names>
						</name>
						<name>
							<surname>Aloy</surname>
							<given-names>P</given-names>
						</name>
						<name>
							<surname>Grandi</surname>
							<given-names>P</given-names>
						</name>
						<name>
							<surname>Krause</surname>
							<given-names>R</given-names>
						</name>	
						<name>
							<surname>Boesche</surname>
							<given-names>M</given-names>
						</name><etal/>																																																
					</person-group>
					<year>2006</year>
					<article-title>Proteome survey reveals modularity of the yeast cell machinery</article-title>
					<source>Nature</source>
					<volume>440</volume>
					<fpage>631</fpage>
					<lpage>636</lpage>						
				</citation>
			</ref>
			<ref id="r23">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Giaever</surname>
							<given-names>G</given-names>
						</name>
						<name>
							<surname>Chu</surname>
							<given-names>AM</given-names>
						</name>
						<name>
							<surname>Ni</surname>
							<given-names>L</given-names>
						</name>
						<name>
							<surname>Connelly</surname>
							<given-names>C</given-names>
						</name>	
						<name>
							<surname>Riles</surname>
							<given-names>L</given-names>
						</name><etal/>																																																
					</person-group>
					<year>2002</year>
					<article-title>Functional profiling of the Saccharomyces cerevisiae genome</article-title>
					<source>Nature</source>
					<volume>418</volume>
					<fpage>387</fpage>
					<lpage>391</lpage>						
				</citation>
			</ref>
			<ref id="r24">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Giot</surname>
							<given-names>L</given-names>
						</name>
						<name>
							<surname>Bader</surname>
							<given-names>JS</given-names>
						</name>
						<name>
							<surname>Brouwer</surname>
							<given-names>C</given-names>
						</name>
						<name>
							<surname>Chaudhuri</surname>
							<given-names>A</given-names>
						</name>	
						<name>
							<surname>Kuang</surname>
							<given-names>B</given-names>
						</name><etal/>																																																
					</person-group>
					<year>2003</year>
					<article-title>A protein interaction map of Drosophila melanogaster</article-title>
					<source>Science</source>
					<volume>302</volume>
					<fpage>1727</fpage>
					<lpage>1736</lpage>						
				</citation>
			</ref>
			<ref id="r25">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Harris</surname>
							<given-names>MA</given-names>
						</name>
						<name>
							<surname>Clark</surname>
							<given-names>J</given-names>
						</name>
						<name>
							<surname>Ireland</surname>
							<given-names>A</given-names>
						</name>
						<name>
							<surname>Lomax</surname>
							<given-names>J</given-names>
						</name>	
						<name>
							<surname>Ashburner</surname>
							<given-names>M</given-names>
						</name><etal/>																																																
					</person-group>
					<year>2004</year>
					<article-title>The Gene Ontology (GO) database and informatics resource</article-title>
					<source>Nucleic Acids Res</source>
					<volume>32</volume>
					<fpage>D258</fpage>
					<lpage>261</lpage>						
				</citation>
			</ref>
			<ref id="r26">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Hart</surname>
							<given-names>GT</given-names>
						</name>
						<name>
							<surname>Ramani</surname>
							<given-names>AK</given-names>
						</name>
						<name>
							<surname>Marcotte</surname>
							<given-names>EM</given-names>
						</name>																																																				
					</person-group>
					<year>2006</year>
					<article-title>How complete are current yeast and human protein-interaction networks?</article-title>
					<source>Genome Biol</source>
					<volume>7</volume>
					<fpage>120</fpage>							
				</citation>
			</ref>
			<ref id="r27">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Hermjakob</surname>
							<given-names>H</given-names>
						</name>																																																						
					</person-group>
					<year>2006</year>
					<article-title>The HUPO Proteomics Standards Initiative - Overcoming the Fragmentation of Proteomics Data</article-title>
					<source>Proteomics</source>
					<volume>6</volume>
					<supplement>Suppl 2</supplement>
					<fpage>34</fpage>
					<lpage>38</lpage>								
				</citation>
			</ref>
			<ref id="r28">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Hu</surname>
							<given-names>Z</given-names>
						</name>
						<name>
							<surname>Mellor</surname>
							<given-names>J</given-names>
						</name>	
						<name>
							<surname>Wu</surname>
							<given-names>J</given-names>
						</name>	
						<name>
							<surname>DeLisi</surname>
							<given-names>C</given-names>
						</name>																																																							
					</person-group>
					<year>2004</year>
					<article-title>VisANT: an online visualization and analysis tool for biological interaction data</article-title>
					<source>BMC Bioinformatics</source>
					<volume>5</volume>
					<fpage>17</fpage>											
				</citation>
			</ref>
			<ref id="r29">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Iragne</surname>
							<given-names>F</given-names>
						</name>
						<name>
							<surname>Nikolski</surname>
							<given-names>M</given-names>
						</name>	
						<name>
							<surname>Mathieu</surname>
							<given-names>B</given-names>
						</name>	
						<name>
							<surname>Auber</surname>
							<given-names>D</given-names>
						</name>
						<name>
							<surname>Sherman</surname>
							<given-names>D</given-names>							
						</name>																																																							
					</person-group>
					<year>2005</year>
					<article-title>ProViz: protein interaction visualization and exploration</article-title>
					<source>Bioinformatics</source>
					<volume>21</volume>
					<fpage>272</fpage>	
					<lpage>274</lpage>											
				</citation>
			</ref>
			<ref id="r30">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Ito</surname>
							<given-names>T</given-names>
						</name>
						<name>
							<surname>Chiba</surname>
							<given-names>T</given-names>
						</name>	
						<name>
							<surname>Ozawa</surname>
							<given-names>R</given-names>
						</name>	
						<name>
							<surname>Yoshida</surname>
							<given-names>M</given-names>
						</name>
						<name>
							<surname>Hattori</surname>
							<given-names>M</given-names>
						</name><etal/>																																																							
					</person-group>
					<year>2001</year>
					<article-title>A comprehensive two-hybrid analysis to explore the yeast protein interactome</article-title>
					<source>Proc Natl Acad Sci USA</source>
					<volume>98</volume>
					<fpage>4569</fpage>	
					<lpage>4574</lpage>											
				</citation>
			</ref>
			<ref id="r31">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Jayapandian</surname>
							<given-names>M</given-names>
						</name>
						<name>
							<surname>Chapman</surname>
							<given-names>A</given-names>
						</name>	
						<name>
							<surname>Tarcea</surname>
							<given-names>VG</given-names>
						</name>	
						<name>
							<surname>Yu</surname>
							<given-names>C</given-names>
						</name>
						<name>
							<surname>Elkiss</surname>
							<given-names>A</given-names>
						</name><etal/>																																																							
					</person-group>
					<year>2007</year>
					<article-title>Michigan Molecular Interactions (MiMI): putting the jigsaw puzzle together</article-title>
					<source>Nucleic Acids Res</source>
					<volume>35</volume>
					<fpage>D566</fpage>	
					<lpage>571</lpage>											
				</citation>
			</ref>
			<ref id="r32">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Kerrien</surname>
							<given-names>S</given-names>
						</name>
						<name>
							<surname>Alam</surname>
							<given-names>FY</given-names>
						</name>	
						<name>
							<surname>Aranda</surname>
							<given-names>B</given-names>
						</name>	
						<name>
							<surname>Bancarz</surname>
							<given-names>I</given-names>
						</name>
						<name>
							<surname>Bridge</surname>
							<given-names>A</given-names>
						</name><etal/>																																																							
					</person-group>
					<year>2007</year>
					<article-title>IntAct—open source resource for molecular interaction data</article-title>
					<source>Nucleic Acids Res</source>
					<volume>35</volume>
					<fpage>D561</fpage>	
					<lpage>565</lpage>											
				</citation>
			</ref>
			<ref id="r33">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Kersey</surname>
							<given-names>PJ</given-names>
						</name>
						<name>
							<surname>Duarte</surname>
							<given-names>J</given-names>
						</name>	
						<name>
							<surname>Williams</surname>
							<given-names>A</given-names>
						</name>	
						<name>
							<surname>Karavidopoulou</surname>
							<given-names>Y</given-names>
						</name>
						<name>
							<surname>Birney</surname>
							<given-names>E</given-names>
						</name><etal/>																																																							
					</person-group>
					<year>2004</year>
					<article-title>The International Protein Index: an integrated database for proteomics experiments</article-title>
					<source>Proteomics</source>
					<volume>4</volume>
					<fpage>1985</fpage>	
					<lpage>1988</lpage>											
				</citation>
			</ref>
			<ref id="r34">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Kim</surname>
							<given-names>PM</given-names>
						</name>
						<name>
							<surname>Lu</surname>
							<given-names>LJ</given-names>
						</name>	
						<name>
							<surname>Xia</surname>
							<given-names>Y</given-names>
						</name>	
						<name>
							<surname>Gerstein</surname>
							<given-names>MB</given-names>
						</name>																																																												
					</person-group>
					<year>2006</year>
					<article-title>Relating three-dimensional structures to protein networks provides evolutionary insights</article-title>
					<source>Science</source>
					<volume>314</volume>
					<fpage>1938</fpage>	
					<lpage>1941</lpage>											
				</citation>
			</ref>
			<ref id="r35">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Krogan</surname>
							<given-names>NJ</given-names>
						</name>
						<name>
							<surname>Cagney</surname>
							<given-names>G</given-names>
						</name>	
						<name>
							<surname>Yu</surname>
							<given-names>H</given-names>
						</name>	
						<name>
							<surname>Zhong</surname>
							<given-names>G</given-names>
						</name>	
						<name>
							<surname>Guo</surname>
							<given-names>X</given-names>
						</name><etal/>																																																													
					</person-group>
					<year>2006</year>
					<article-title>Global landscape of protein complexes in the yeast Saccharomyces cerevisiae</article-title>
					<source>Nature</source>
					<volume>440</volume>
					<fpage>637</fpage>	
					<lpage>643</lpage>											
				</citation>
			</ref>
			<ref id="r36">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Lee</surname>
							<given-names>TI</given-names>
						</name>
						<name>
							<surname>Rinaldi</surname>
							<given-names>NJ</given-names>
						</name>	
						<name>
							<surname>Robert</surname>
							<given-names>F</given-names>
						</name>	
						<name>
							<surname>Odom</surname>
							<given-names>DT</given-names>
						</name>	
						<name>
							<surname>Bar-Joseph</surname>
							<given-names>Z</given-names>
						</name><etal/>																																																													
					</person-group>
					<year>2002</year>
					<article-title>Transcriptional regulatory networks in Saccharomyces cerevisiae</article-title>
					<source>Science</source>
					<volume>298</volume>
					<fpage>799</fpage>	
					<lpage>804</lpage>											
				</citation>
			</ref>
			<ref id="r37">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Li</surname>
							<given-names>S</given-names>
						</name>
						<name>
							<surname>Armstrong</surname>
							<given-names>CM</given-names>
						</name>	
						<name>
							<surname>Bertin</surname>
							<given-names>N</given-names>
						</name>	
						<name>
							<surname>Ge</surname>
							<given-names>H</given-names>
						</name>	
						<name>
							<surname>Milstein</surname>
							<given-names>S</given-names>
						</name><etal/>																																																													
					</person-group>
					<year>2004</year>
					<article-title>A map of the interactome network of the metazoan C. elegans</article-title>
					<source>Science</source>
					<volume>303</volume>
					<fpage>540</fpage>	
					<lpage>543</lpage>											
				</citation>
			</ref>
			<ref id="r38">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Maglott</surname>
							<given-names>D</given-names>
						</name>
						<name>
							<surname>Ostell</surname>
							<given-names>J</given-names>
						</name>	
						<name>
							<surname>Pruitt</surname>
							<given-names>KD</given-names>
						</name>	
						<name>
							<surname>Tatusova</surname>
							<given-names>T</given-names>
						</name>																																																																
					</person-group>
					<year>2007</year>
					<article-title>Entrez Gene: gene-centered information at NCBI</article-title>
					<source>Nucleic Acids Res</source>
					<volume>35</volume>
					<fpage>D26</fpage>	
					<lpage>31</lpage>											
				</citation>
			</ref>
			<ref id="r39">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Mathivanan</surname>
							<given-names>S</given-names>
						</name>
						<name>
							<surname>Periaswamy</surname>
							<given-names>B</given-names>
						</name>	
						<name>
							<surname>Gandhi</surname>
							<given-names>T</given-names>
						</name>	
						<name>
							<surname>Kandasamy</surname>
							<given-names>K</given-names>
						</name>
						<name>
							<surname>Suresh</surname>
							<given-names>S</given-names>
						</name><etal/>																																																																	
					</person-group>
					<year>2006</year>
					<article-title>An evaluation of human protein-protein interaction data in the public domain</article-title>
					<source>BMC Bioinformatics</source>
					<volume>7</volume>
					<supplement>Suppl 5</supplement>
					<fpage>S19</fpage>													
				</citation>
			</ref>
			<ref id="r40">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Orchard</surname>
							<given-names>S</given-names>
						</name>
						<name>
							<surname>Kerrien</surname>
							<given-names>S</given-names>
						</name>	
						<name>
							<surname>Jones</surname>
							<given-names>P</given-names>
						</name>	
						<name>
							<surname>Ceol</surname>
							<given-names>A</given-names>
						</name>
						<name>
							<surname>Chatr-Aryamontri</surname>
							<given-names>A</given-names>
						</name><etal/>																																																																	
					</person-group>
					<year>2007</year>
					<article-title>Submit your interaction data the IMEx way: a step by step guide to trouble-free deposition</article-title>
					<source>Proteomics</source>
					<volume>7</volume>
					<supplement>Suppl 1</supplement>
					<fpage>28</fpage>
					<lpage>34</lpage>																
				</citation>
			</ref>
			<ref id="r41">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Pagel</surname>
							<given-names>P</given-names>
						</name>
						<name>
							<surname>Kovac</surname>
							<given-names>S</given-names>
						</name>	
						<name>
							<surname>Oesterheld</surname>
							<given-names>M</given-names>
						</name>	
						<name>
							<surname>Brauner</surname>
							<given-names>B</given-names>
						</name>
						<name>
							<surname>Dunger-Kaltenbach</surname>
							<given-names>I</given-names>
						</name><etal/>																																																																	
					</person-group>
					<year>2005</year>
					<article-title>The MIPS mammalian protein-protein interaction database</article-title>
					<source>Bioinformatics</source>
					<volume>21</volume>
					<fpage>832</fpage>
					<lpage>834</lpage>																
				</citation>
			</ref>
			<ref id="r42">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Parrish</surname>
							<given-names>JR</given-names>
						</name>
						<name>
							<surname>Gulyas</surname>
							<given-names>KD</given-names>
						</name>	
						<name>
							<surname>Finley</surname>
							<given-names>RL</given-names>
							<suffix>Jr</suffix>
						</name>																																																																			
					</person-group>
					<year>2006</year>
					<article-title>Yeast two-hybrid contributions to interactome mapping</article-title>
					<source>Curr Opin Biotechnol</source>
					<volume>17</volume>
					<fpage>387</fpage>
					<lpage>393</lpage>																
				</citation>
			</ref>
			<ref id="r43">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Pellegrini</surname>
							<given-names>M</given-names>
						</name>
						<name>
							<surname>Marcotte</surname>
							<given-names>EM</given-names>
						</name>	
						<name>
							<surname>Thompson</surname>
							<given-names>MJ</given-names>
						</name>
						<name>
							<surname>Eisenberg</surname>
							<given-names>D</given-names>
						</name>	
						<name>
							<surname>Yeates</surname>
							<given-names>TO</given-names>
						</name>																																																																				
					</person-group>
					<year>1999</year>
					<article-title>Assigning protein functions by comparative genome analysis: protein phylogenetic profiles</article-title>
					<source>Proc Natl Acad Sci USA</source>
					<volume>96</volume>
					<fpage>4285</fpage>
					<lpage>4288</lpage>																
				</citation>
			</ref>
			<ref id="r44">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Peri</surname>
							<given-names>S</given-names>
						</name>
						<name>
							<surname>Navarro</surname>
							<given-names>JD</given-names>
						</name>	
						<name>
							<surname>Amanchy</surname>
							<given-names>R</given-names>
						</name>
						<name>
							<surname>Kristiansen</surname>
							<given-names>TZ</given-names>
						</name>	
						<name>
							<surname>Jonnalagadda</surname>
							<given-names>CK</given-names>
						</name><etal/>																																																																				
					</person-group>
					<year>2003</year>
					<article-title>Development of human protein reference database as an initial platform for approaching systems biology in humans</article-title>
					<source>Genome Res</source>
					<volume>13</volume>
					<fpage>2363</fpage>
					<lpage>2371</lpage>																
				</citation>
			</ref>
			<ref id="r45">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Prieto</surname>
							<given-names>C</given-names>
						</name>
						<name>
							<surname>De Las Rivas</surname>
							<given-names>J</given-names>
						</name>																																																																								
					</person-group>
					<year>2006</year>
					<article-title>APID: Agile Protein Interaction DataAnalyzer</article-title>
					<source>Nucleic Acids Res</source>
					<volume>34</volume>
					<fpage>W298</fpage>
					<lpage>302</lpage>																
				</citation>
			</ref>
			<ref id="r46">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Puig</surname>
							<given-names>O</given-names>
						</name>
						<name>
							<surname>Caspary</surname>
							<given-names>F</given-names>
						</name>
						<name>
							<surname>Rigaut</surname>
							<given-names>G</given-names>
						</name>	
						<name>
							<surname>Rutz</surname>
							<given-names>B</given-names>
						</name>	
						<name>
							<surname>Bouveret</surname>
							<given-names>E</given-names>
						</name><etal/>																																																																									
					</person-group>
					<year>2001</year>
					<article-title>The tandem affinity purification (TAP) method: a general procedure of protein complex purification</article-title>
					<source>Methods</source>
					<volume>24</volume>
					<fpage>218</fpage>
					<lpage>229</lpage>																
				</citation>
			</ref>
			<ref id="r47">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Rual</surname>
							<given-names>JF</given-names>
						</name>
						<name>
							<surname>Venkatesan</surname>
							<given-names>K</given-names>
						</name>
						<name>
							<surname>Hao</surname>
							<given-names>T</given-names>
						</name>	
						<name>
							<surname>Hirozane-Kishikawa</surname>
							<given-names>T</given-names>
						</name>	
						<name>
							<surname>Dricot</surname>
							<given-names>A</given-names>
						</name><etal/>																																																																									
					</person-group>
					<year>2005</year>
					<article-title>Towards a proteome-scale map of the human protein-protein interaction network</article-title>
					<source>Nature</source>
					<volume>437</volume>
					<fpage>1173</fpage>
					<lpage>1178</lpage>																
				</citation>
			</ref>
			<ref id="r48">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Salwinski</surname>
							<given-names>L</given-names>
						</name>
						<name>
							<surname>Miller</surname>
							<given-names>CS</given-names>
						</name>
						<name>
							<surname>Smith</surname>
							<given-names>AJ</given-names>
						</name>	
						<name>
							<surname>Pettit</surname>
							<given-names>FK</given-names>
						</name>	
						<name>
							<surname>Bowie</surname>
							<given-names>JU</given-names>
						</name><etal/>																																																																									
					</person-group>
					<year>2004</year>
					<article-title>The Database of Interacting Proteins: 2004 update</article-title>
					<source>Nucleic Acids Res</source>
					<volume>32</volume>
					<fpage>D449</fpage>
					<lpage>451</lpage>																
				</citation>
			</ref>
			<ref id="r49">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Samanta</surname>
							<given-names>MP</given-names>
						</name>
						<name>
							<surname>Liang</surname>
							<given-names>S</given-names>
						</name>																																																																										
					</person-group>
					<year>2003</year>
					<article-title>Predicting protein functions from redundancies in large-scale protein interaction networks</article-title>
					<source>Proc Natl Acad Sci USA</source>
					<volume>100</volume>
					<fpage>12579</fpage>
					<lpage>12583</lpage>																
				</citation>
			</ref>
			<ref id="r50">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Schijvenaars</surname>
							<given-names>BJ</given-names>
						</name>
						<name>
							<surname>Mons</surname>
							<given-names>B</given-names>
						</name>
						<name>
							<surname>Weeber</surname>
							<given-names>M</given-names>
						</name>
						<name>
							<surname>Schuemie</surname>
							<given-names>MJ</given-names>
						</name>
						<name>
							<surname>van Mulligen</surname>
							<given-names>EM</given-names>
						</name><etal/>																																																																										
					</person-group>
					<year>2005</year>
					<article-title>Thesaurus-based disambiguation of gene symbols</article-title>
					<source>BMC Bioinformatics</source>
					<volume>6</volume>
					<fpage>149</fpage>																				
				</citation>
			</ref>
			<ref id="r51">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Shannon</surname>
							<given-names>P</given-names>
						</name>
						<name>
							<surname>Markiel</surname>
							<given-names>A</given-names>
						</name>
						<name>
							<surname>Ozier</surname>
							<given-names>O</given-names>
						</name>
						<name>
							<surname>Baliga</surname>
							<given-names>NS</given-names>
						</name>
						<name>
							<surname>Wang</surname>
							<given-names>JT</given-names>
						</name><etal/>																																																																										
					</person-group>
					<year>2003</year>
					<article-title>Cytoscape: a software environment for integrated models of biomolecular interaction networks</article-title>
					<source>Genome Res</source>
					<volume>13</volume>
					<fpage>2498</fpage>
					<lpage>2504</lpage>																				
				</citation>
			</ref>
			<ref id="r52">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Sharan</surname>
							<given-names>R</given-names>
						</name>
						<name>
							<surname>Ulitsky</surname>
							<given-names>I</given-names>
						</name>
						<name>
							<surname>Shamir</surname>
							<given-names>R</given-names>
						</name>																																																																												
					</person-group>
					<year>2007</year>
					<article-title>Network-based prediction of protein function</article-title>
					<source>Mol Syst Biol</source>
					<volume>3</volume>
					<fpage>88</fpage>																							
				</citation>
			</ref>
			<ref id="r53">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Stark</surname>
							<given-names>C</given-names>
						</name>
						<name>
							<surname>Breitkreutz</surname>
							<given-names>BJ</given-names>
						</name>
						<name>
							<surname>Reguly</surname>
							<given-names>T</given-names>
						</name>
						<name>
							<surname>Boucher</surname>
							<given-names>L</given-names>
						</name>
						<name>
							<surname>Breitkreutz</surname>
							<given-names>A</given-names>
						</name><etal/>																																																																																
					</person-group>
					<year>2006</year>
					<article-title>BioGRID: a general repository for interaction datasets</article-title>
					<source>Nucleic Acids Res</source>
					<volume>34</volume>
					<fpage>D535</fpage>
					<lpage>539</lpage>																							
				</citation>
			</ref>
			<ref id="r54">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Stelzl</surname>
							<given-names>U</given-names>
						</name>
						<name>
							<surname>Worm</surname>
							<given-names>U</given-names>
						</name>
						<name>
							<surname>Lalowski</surname>
							<given-names>M</given-names>
						</name>
						<name>
							<surname>Haenig</surname>
							<given-names>C</given-names>
						</name>
						<name>
							<surname>Brembeck</surname>
							<given-names>FH</given-names>
						</name><etal/>																																																																																
					</person-group>
					<year>2005</year>
					<article-title>A human protein-protein interaction network: a resource for annotating the proteome</article-title>
					<source>Cell</source>
					<volume>122</volume>
					<fpage>957</fpage>
					<lpage>968</lpage>																							
				</citation>
			</ref>			
			<ref id="r55">
				<citation citation-type="journal">
				<collab collab-type="author">The UniProt Consortium</collab>
				<article-title>The Universal Protein Resource (UniProt)</article-title>
				<source>Nucleic Acids Res</source>
				<year>2007</year>
				<volume>35</volume>
				<fpage>D193</fpage>
				<lpage>197</lpage>
				</citation>
			</ref>
			<ref id="r56">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Uetz</surname>
							<given-names>P</given-names>
						</name>
						<name>
							<surname>Giot</surname>
							<given-names>L</given-names>
						</name>
						<name>
							<surname>Cagney</surname>
							<given-names>G</given-names>
						</name>
						<name>
							<surname>Mansfield</surname>
							<given-names>TA</given-names>
						</name>
						<name>
							<surname>Judson</surname>
							<given-names>RS</given-names>
						</name><etal/>																																																																																
					</person-group>
					<year>2000</year>
					<article-title>A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae</article-title>
					<source>Nature</source>
					<volume>403</volume>
					<fpage>623</fpage>
					<lpage>627</lpage>																							
				</citation>
			</ref>
			<ref id="r57">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>von Mering</surname>
							<given-names>C</given-names>
						</name>
						<name>
							<surname>Jensen</surname>
							<given-names>LJ</given-names>
						</name>
						<name>
							<surname>Kuhn</surname>
							<given-names>M</given-names>
						</name>
						<name>
							<surname>Chaffron</surname>
							<given-names>S</given-names>
						</name>
						<name>
							<surname>Doerks</surname>
							<given-names>T</given-names>
						</name><etal/>																																																																																
					</person-group>
					<year>2007</year>
					<article-title>STRING 7—recent developments in the integration and prediction of protein interactions</article-title>
					<source>Nucleic Acids Res</source>
					<volume>35</volume>
					<fpage>D358</fpage>
					<lpage>362</lpage>																							
				</citation>
			</ref>
			<ref id="r58">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Wain</surname>
							<given-names>HM</given-names>
						</name>
						<name>
							<surname>Bruford</surname>
							<given-names>EA</given-names>
						</name>
						<name>
							<surname>Lovering</surname>
							<given-names>RC</given-names>
						</name>
						<name>
							<surname>Lush</surname>
							<given-names>MJ</given-names>
						</name>
						<name>
							<surname>Wright</surname>
							<given-names>MW</given-names>
						</name><etal/>																																																																																
					</person-group>
					<year>2002</year>
					<article-title>Guidelines for human gene nomenclature</article-title>
					<source>Genomics</source>
					<volume>79</volume>
					<fpage>464</fpage>
					<lpage>470</lpage>																							
				</citation>
			</ref>
			<ref id="r59">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Xu</surname>
							<given-names>H</given-names>
						</name>
						<name>
							<surname>Fan</surname>
							<given-names>JW</given-names>
						</name>
						<name>
							<surname>Hripcsak</surname>
							<given-names>G</given-names>
						</name>
						<name>
							<surname>Mendonca</surname>
							<given-names>EA</given-names>
						</name>
						<name>
							<surname>Markatou</surname>
							<given-names>M</given-names>
						</name><etal/>																																																																																
					</person-group>
					<year>2007</year>
					<article-title>Gene symbol disambiguation using knowledge-based profiles</article-title>
					<source>Bioinformatics</source>
					<volume>23</volume>
					<fpage>1015</fpage>
					<lpage>1022</lpage>																							
				</citation>
			</ref>
			<ref id="r60">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Yip</surname>
							<given-names>KY</given-names>
						</name>
						<name>
							<surname>Yu</surname>
							<given-names>H</given-names>
						</name>
						<name>
							<surname>Kim</surname>
							<given-names>PM</given-names>
						</name>
						<name>
							<surname>Schultz</surname>
							<given-names>M</given-names>
						</name>
						<name>
							<surname>Gerstein</surname>
							<given-names>M</given-names>
						</name>																																																																															
					</person-group>
					<year>2006</year>
					<article-title>The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks</article-title>
					<source>Bioinformatics</source>
					<volume>22</volume>
					<fpage>2968</fpage>
					<lpage>2970</lpage>																							
				</citation>
			</ref>
			<ref id="r61">
					<citation citation-type="journal">
					<person-group>
						<name>
							<surname>Yu</surname>
							<given-names>H</given-names>
						</name>
						<name>
							<surname>Luscombe</surname>
							<given-names>NM</given-names>
						</name>
						<name>
							<surname>Lu</surname>
							<given-names>HX</given-names>
						</name>
						<name>
							<surname>Zhu</surname>
							<given-names>X</given-names>
						</name>
						<name>
							<surname>Xia</surname>
							<given-names>Y</given-names>
						</name><etal/>																																																																															
					</person-group>
					<year>2004</year>
					<article-title>Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs</article-title>
					<source>Genome Res</source>
					<volume>14</volume>
					<fpage>1107</fpage>
					<lpage>1118</lpage>																							
				</citation>
			</ref>
</ref-list>
 		<app-group>
			<app>
				<title>Additional Files</title>
				<fig id="g5">
					<label>Additional file 1</label>
					<caption>
						<title>The relational database design of the PIANA database.</title>
						<p>Information is kept in four types of tables: 1) biological entity tables (protein and interaction); 2) biological entity identifiers (UniProt entry, gene names, …); 3) biological entity information (protein features tables and interaction properties). All information is linked to the internal identifier of PIANA: proteinID (an index to all pairs [sequence, tax_id]).</p>
					</caption>
					<graphic xlink:href="JPB-01-166-g005.tif"/>
				</fig>	
				<fig id="g6">
					<label>Additional file 2</label>
					<caption>
						<title>External repositories used to populate the PIANA database.</title>
						<p>For each external repository used to populate the PIANA database, the version and the file used are shown.</p>
					</caption>
					<graphic xlink:href="JPB-01-166-g006.tif"/>
				</fig>	
				<fig id="g7">
					<label>Additional file 3</label>
					<caption>
						<title>Overlaps between repositories of protein sequences.</title>
						<p>The overlap between UniProt Swiss-Prot, UniProt TrEMBL, and NCBI GenPept is shown. The percentage over the total number of sequences in PIANA is shown in parentheses. Two sequences must be identical to be counted as an overlap. A table with overlaps between the NCBI non-redundant database (nr) and the three other databases is also provided.</p>
					</caption>
					<graphic xlink:href="JPB-01-166-g007.tif"/>
				</fig>
				<fig id="g8">
					<label>Additional file 4</label>
					<caption>
						<title>Function prediction based on common interaction partners in the interaction network from the Database of Interacting Proteins.</title>
						<p>The percentage of shared GO terms is shown for each range of the number of common interaction partners (Figure 4A). We observed that the accuracy obtained with a partial subset of interactions is similar to that obtained with the integrated network of interactions. However, the coverage provided by the partial set of interactions is much lower (Figure 4B).</p>
					</caption>
					<graphic xlink:href="JPB-01-166-g008.tif"/>
				</fig>
			</app>
		</app-group>		
	</back>
<floats-wrap>
	<table-wrap position="float" id="t1">
	<label>Table 1.</label>
  			<caption>
  				<title>Protein identifiers statistics.</title>
				<p>Summary of the most relevant protein identifier types, calculated from a total of 6,476,028 distinct sequenceIDs in the database. Columns are: identifier type, number of distinct identifiers, the proportion of proteinIDs with respect to external identifier correspondences, the proportion of external identifiers with respect to proteinIDs, and the percentage of proteinIDs covered by the external identifier. Primary gene symbols are those gene symbols that have been established as the official gene name by nomenclature authorities such as HUGO (Wain et al., 2002) or FlyBase (Crosby et al., 2007).</p>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left">Identifier Type</th>
            <th align="left">Number of distinct identifiers</th>
            <th align="left" colspan="2">External Identifier : PIANA proteinID</th>
			<th align="left" colspan="2">PIANA proteinID : External Identifier</th>
			<th align="left">% proteinIDs covered</th>			
         </tr>
      </thead>
      <tbody>
         <tr>
            <td rowspan="2">UniProt Accession</td>
            <td rowspan="2">4639397</td>
            <td>1:1</td>
			<td>99.74</td>
			<td>1:1</td>
			<td>96.98</td>
            <td rowspan="2">68%</td>           
         </tr>
         <tr>
            <td>1:&gt;2</td>
            <td>0.07</td>
            <td>1:&gt;2</td>
			<td>0.98</td>            		
         </tr>
         <tr>
            <td rowspan="2">NCBI Accession</td>
            <td rowspan="2">10760685</td>
            <td>1:1</td>
			<td>98.44</td>
            <td>1:1</td>
            <td>24.85</td>	
			<td rowspan="2">96%</td>		
         </tr>
         <tr>
            <td>1:&gt;2</td>
            <td>0.83</td>
            <td>1:&gt;2</td>
			<td>26.78</td>           			
         </tr>
		 <tr>
            <td rowspan="2">NCBI geneID</td>
            <td rowspan="2">2416561</td>
            <td>1:1</td>
			<td>90.77</td>
            <td>1:1</td>
            <td>98.41</td>
			<td rowspan="2">42%</td>			
         </tr>
		  <tr>
            <td>1:&gt;2</td>
            <td>1.87</td>
            <td>1:&gt;2</td>
			<td>0.17</td>           			
         </tr>
		 <tr>
            <td rowspan="2">Gene Symbol</td>
            <td rowspan="2">4143090</td>
            <td>1:1</td>
			<td>81.10</td>
            <td>1:1</td>
            <td>79.58</td>
			<td rowspan="2">80%</td>			
         </tr>
		  <tr>
            <td>1:&gt;2</td>
            <td>3.95</td>
            <td>1:&gt;2</td>
			<td>6.57</td>           			
         </tr>
		 <tr>
            <td rowspan="2">Primary Gene Symbol</td>
            <td rowspan="2">1207358</td>
            <td>1:1</td>
			<td>90.34</td>
            <td>1:1</td>
            <td>97.70</td>
			<td rowspan="2">42%</td>			
         </tr>
		  <tr>
            <td>1:&gt;2</td>
            <td>5.27</td>
            <td>1:&gt;2</td>
			<td>0.13</td>           			
         </tr>
     </tbody>
 	  </table>
 	</table-wrap>
	<table-wrap position="float" id="t2">
	<label>Table 2.</label>
  			<caption>
  				<title>Number of protein interactions per species.</title>
				<p>The number of interactions and the number of proteins with at least one known interaction are shown for species with more than 2000 interactions.</p>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left">Species</th>
            <th align="left">Number of interactions</th>
            <th align="left">Number of proteins</th>						
         </tr>
      </thead>
      <tbody>
         <tr>
            <td>Saccharomyces cerevisiae</td>
            <td>111535</td>
            <td>6493</td>				
         </tr>
         <tr>
            <td>Homo sapiens</td>
            <td>110457</td>
            <td>36900</td>					
         </tr>
         <tr>
            <td>Drosophila melanogaster</td>
            <td>90562</td>
            <td>11605</td>					
         </tr>
         <tr>
            <td>Escherichia coli</td>
            <td>16097</td>
            <td>3467</td>				
         </tr>
		 <tr>
            <td>Caenorhabditis elegans</td>
            <td>9184</td>
            <td>3959</td>					
         </tr>
		 <tr>
            <td>Mus musculus</td>
            <td>4776</td>
            <td>3495</td>					
         </tr>
		 <tr>
            <td>Plasmodium falciparum 3D7</td>
            <td>2723</td>
            <td>1279</td>					
         </tr>
		 <tr>
            <td>Other species</td>
            <td>17021</td>
            <td>9724</td>					
         </tr>
     </tbody>
 	  </table>
 	</table-wrap>
		<table-wrap position="float" id="t3">
	<label>Table 3.</label>
  			<caption>
  				<title>Pairwise overlaps of protein interactions and proteins for seven interaction repositories.</title>
				<p>For each repository, cells show the overlap with other repositories in terms of (A) interactions and (B) proteins. In parentheses, the overlap is shown as a percentage of the repository in the pair with fewer interactions or proteins. Unique interactions and proteins are those appearing only in that repository. This table reflects the overlaps in the interaction network unified by NCBI geneID identifiers.</p>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left" colspan="9">3(A)</th>           		
         </tr>
      </thead>
      <tbody>
         <tr>
            <td>Total</td>
            <td>Unique</td>
            <td></td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>		
         </tr>
         <tr>
            <td>104339</td>
            <td>42284</td>
            <td>IntAct</td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
         <tr>
            <td>97377</td>
            <td>46115</td>
            <td>DIP</td>
			<td>36867 (38%)</td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
         <tr>
            <td>77419</td>
            <td>16392</td>
            <td>MINT</td>
			<td>47119 (61%)</td>
            <td>37210 (48%)</td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>			
         </tr>
		 <tr>
           <td>38372</td>
            <td>10978</td>
            <td>HPRD</td>
			<td>8729 (23%)</td>
            <td>825 (2%)</td>
            <td>8925 (23%)</td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		 <tr>
           <td>833</td>
            <td>389</td>
            <td>MIPS</td>
			<td>87 (10%)</td>
            <td>34 (4%)</td>
            <td>107 (13%)</td>	
			<td>312 (38%)</td>
            <td></td>
            <td></td>				
         </tr>
		 <tr>
           <td>216370</td>
            <td>163700</td>
            <td>BioGrid</td>
			<td>25355 (22%)</td>
            <td>19870 (20%)</td>
            <td>24709 (32%)</td>	
			<td>20550 (54%)</td>
            <td>210 (25%)</td>
            <td></td>				
         </tr>
		  <tr>
           <td>62444</td>
            <td>24925</td>
            <td>BIND</td>
			<td>27269 (44%)</td>
            <td>28143 (45%)</td>
            <td>26406 (42%)</td>	
			<td>2839 (7%)</td>
            <td>121 (15%)</td>
            <td>16187 (26%)</td>				
         </tr>
		  <tr>
           	<td></td>
            <td></td>
            <td>Total</td>
			<td>IntAct 104339</td>
            <td>DIP 97377</td>
            <td>MINT 77419</td>	
			<td>HPRD 38372</td>
            <td>MIPS 833</td>
            <td>BioGrid 216370</td>				
         </tr>
		 <tr>
            <th align="left" colspan="9"></th>           		
         </tr>
		 <tr>
            <th align="left" colspan="9"></th>           		
         </tr>
		  <tr>
            <th align="left" colspan="9">3(B)</th>           		
         </tr>
		 <tr>
           	<td>Total</td>
            <td>Unique</td>
            <td></td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		 <tr>
           	<td>30998</td>
            <td>6273</td>
            <td>IntAct</td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>20794</td>
            <td>2761</td>
            <td>DIP</td>
			<td>15297 (74%)</td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>25993</td>
            <td>2008</td>
            <td>MINT</td>
			<td>21415 (82%)</td>
            <td>15611 (75%)</td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>9461</td>
            <td>783</td>
            <td>HPRD</td>
			<td>5445 (58%)</td>
            <td>834 (9%)</td>
            <td>783 (8%)</td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>848</td>
            <td>115</td>
            <td>MIPS</td>
			<td>532 (63%)</td>
            <td>219 (26%)</td>
            <td>550 (65%)</td>	
			<td>440 (52%)</td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>21036</td>
            <td>5337</td>
            <td>BioGrid</td>
			<td>11893 (57%)</td>
            <td>7862 (38%)</td>
            <td>11240 (53%)</td>	
			<td>7263 (77%)</td>
            <td>439 (52%)</td>
            <td></td>				
         </tr>
		 <tr>
           	<td>21011</td>
            <td>4781</td>
            <td>BIND</td>
			<td>13849 (66%)</td>
            <td>12567 (60%)</td>
            <td>13580 (65%)</td>	
			<td>2507 (27%)</td>
            <td>411 (49%)</td>
            <td>8360 (40%)</td>				
         </tr>
		 <tr>
           	<td></td>
            <td></td>
            <td>Total</td>
			<td>IntAct 30998</td>
            <td>DIP 20794</td>
            <td>MINT 25993</td>	
			<td>HPRD 9461</td>
            <td>MIPS 848</td>
            <td>BioGrid 21036</td>				
         </tr>
     </tbody>
 	  </table>
 	</table-wrap>
	<table-wrap position="float" id="t4">
	<label>Table 4.</label>
  			<caption>
  				<title>Pairwise overlaps of protein interactions and proteins for seven detection methods.</title>
				<p>For each detection method, cells show the overlap with other methods in terms of (A) interactions and (B) proteins. In parentheses, the overlap is shown as a percentage of the method in the pair with fewer interactions or proteins. This table reflects the overlaps in the interaction network unified by NCBI geneID identifiers.</p>
  			</caption>
   <table frame="hsides" rules="groups">
<thead>
         <tr>
            <th align="left" colspan="8">4(A)</th>           		
         </tr>
      </thead>
      <tbody>
         <tr>
            <td>Total</td>
            <td></td>
            <td></td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>            	
         </tr>
         <tr>
            <td>126136</td>
            <td>Affinity methods</td>
            <td></td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>            			
         </tr>
         <tr>
            <td>103334</td>
            <td>Yeast two-hybrid</td>
            <td>9146 (9%)</td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>           	
         </tr>
         <tr>
            <td>72159</td>
            <td>Phenotypic</td>
            <td>1574 (2%)</td>
			<td>1155 (2%)</td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>            
         </tr>
		 <tr>
           <td>6525</td>
            <td>3D structure</td>
            <td>746 (11%)</td>
			<td>446 (7%)</td>
            <td>36 (1%)</td>
            <td></td>	
			<td></td>
            <td></td>            	
         </tr>
		 <tr>
           <td>4627</td>
            <td>Array technologies</td>
            <td>131 (3%)</td>
			<td>130 (3%)</td>
            <td>232 (5%)</td>
            <td>3 (0%)</td>	
			<td></td>
            <td></td>           	
         </tr>
		 <tr>
           <td>3914</td>
            <td>Dosage</td>
            <td>784 (20%)</td>
			<td>544 (13%)</td>
            <td>1036 (26%)</td>
            <td>24 (1%)</td>	
			<td>24 (1%)</td>
            <td></td>           
         </tr>
		  <tr>
           	<td>3104</td>
            <td>Cross Linking</td>
            <td>853 (27%)</td>
			<td>244 (8%)</td>
            <td>16 (1%)</td>
            <td>218 (7%)</td>				
            <td>10 (0%)</td>
            <td>21 (1%)</td>				
         </tr>
		  <tr>
            <td></td>
            <td>Total</td>
			<td>Affinity methods 126136</td>
            <td>Yeast two-hybrid 103334</td>
            <td>Phenotypic 72159</td>	
			<td>3D 6525</td>
            <td>Array techs 4627</td>
            <td>Dosage 3914</td>				
         </tr>
		  <tr>
            <th align="left" colspan="8"></th>           		
         </tr>
		 <tr>
            <th align="left" colspan="8"></th>           		
         </tr>
		  <tr>
            <th align="left" colspan="8">4(B)</th>           		
         </tr>
		 <tr>
           	<td>Total</td>            
            <td></td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		 <tr>
           	<td>20043</td>
            <td>Affinity methods</td>
            <td></td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>            
         </tr>
		  <tr>
           	<td>33246</td>
            <td>Yeast two-hybrid</td>
            <td>10240 (51%)</td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>           		
         </tr>
		  <tr>
           	<td>5619</td>
            <td>Phenotypic</td>
            <td>3148 (56%)</td>
			<td>4704 (84%)</td>            
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>5737</td>
            <td>3D structure</td>
            <td>1352 (24%)</td>
			<td>989 (17%)</td>
            <td>202 (4%)</td>
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>907</td>
            <td>Array technologies</td>
            <td>661 (73%)</td>
			<td>807 (89%)</td>
            <td>440 (49%)</td>
            <td>57 (6%)</td>				
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>2004</td>
            <td>Dosage</td>
            <td>1732 (86%)</td>
			<td>1861 (93%)</td>
            <td>1749 (87%)</td>
            <td>259 (13%)</td>	
			<td>259 (29%)</td>            
            <td></td>				
         </tr>
		 <tr>
           	<td>1083</td>
            <td>Cross Linking</td>
            <td>827 (76%)</td>
			<td>392 (36%)</td>
            <td>114 (11%)</td>
            <td>210 (19%)</td>	
			<td>36 (4%)</td>
            <td>83 (8%)</td>            	
         </tr>
		 <tr>
           	<td></td>
            <td>Total</td>
			<td>Affinity methods 20043</td>
            <td>Yeast two-hybrid 33246</td>
            <td>Phenotypic 5619</td>	
			<td>3D 5737</td>
            <td>Array techs 907</td>
            <td>Dosage 2004</td>				
         </tr>
     </tbody>
 	  </table>
 	</table-wrap>
	<table-wrap position="float" id="t5">
	<label>Table 5.</label>
  			<caption>
  				<title>Commonalities in localization, molecular function and biological process of experimentally detected interacting proteins.</title>
				<p>This table shows the fraction of experimentally detected interacting proteins with the following properties: a) co-localized according to GO cellular component terms; b) same biological process according to GO biological process terms; and c) same molecular function according to GO molecular function terms. An interaction was considered to respect the GO restriction if both interacting proteins shared a GO term when retrieving GO parents up to level 3 (Harris et al., 2004). In parentheses, the percentage of interactions where both interacting proteins share a GO term is shown. Interactions were used for the study only if both proteins had at least one GO term assigned. Interactions where a protein interacts with itself were discarded for this study.</p>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left"></th>
            <th align="left">Number of interactions with GO information</th>
            <th align="left">Number of interactions with common GO term</th>						
         </tr>
      </thead>
      <tbody>
         <tr>
            <td>a) Cellular Component</td>
            <td>221213</td>
            <td>209254 (95%)</td>					
         </tr>
         <tr>
            <td>b) Biological Process</td>
            <td>266245</td>
            <td>227254 (85%)</td>						
         </tr>
         <tr>
            <td>c) Molecular Function</td>
            <td>270008</td>
            <td>141323 (52%)</td>					
         </tr>         
     </tbody>
 	  </table>
 	</table-wrap>
	<table-wrap position="float" id="t6">
	<label>Table 6.</label>
  			<caption>
  				<title>Properties of the yeast protein interaction network obtained by integrating multiple sources with PIANA.</title>
				<p>Yeast co-localization data was obtained from the work of Lee and coworkers (Lee et al., 2002). Yeast co-expression data was obtained from the work of Cho et al. (Cho et al., 1998). Yeast essentiality data was obtained from the work of Giaever et al. (Giaever et al., 2002). A yeast protein was considered a hub if it had 5 or more interaction partners. Interactions and proteins were included in the study only when the corresponding information was available. Interactions where a protein interacts with itself were discarded for this study.</p>
  			</caption>
   <table frame="hsides" rules="groups">      
      <tbody>
         <tr>
            <td>Interaction partners are co-localized</td>
            <td>37684 out of 72661 (52%)</td>            		
         </tr>
         <tr>
            <td>Interaction partners are co-expressed</td>
            <td>1524 out of 2576 (59%)</td>            
         </tr>
         <tr>
            <td>Essential yeast hubs</td>
            <td>930 out of 4229 (22%)</td>
         </tr>
         <tr>
            <td>Essential yeast non-hubs</td>
            <td>44 out of 886 (5%)</td>           	
         </tr>		 
     </tbody>
 	  </table>
	  </table-wrap>
	  <table-wrap position="float" id="t7">
	<label>Table 7.</label>
  			<caption>
  				<title>Pairwise overlaps of protein interactions and proteins for four interaction prediction methods, two types of high-throughput methods (yeast two-hybrid assays and affinity purification methods), and curated data (in vitro and in vivo).</title>
				<p>For each method, cells show the overlap with other methods in terms of (A) interactions and (B) proteins. In parentheses, the overlap is shown as a percentage of the method in the pair with fewer interactions or proteins. This table reflects the overlaps in the interaction network unified by NCBI geneID identifiers.</p>
  			</caption>
  <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left" colspan="9">7(A)</th>           		
         </tr>
      </thead>
      <tbody>
         <tr>
            <td>Total</td>
            <td></td>
            <td></td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>		
         </tr>
         <tr>
            <td>33859</td>
            <td>Fusion</td>
            <td></td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
         <tr>
            <td>610177</td>
            <td>Phylogenetic</td>
            <td>24805 (73%)</td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
         <tr>
            <td>128216</td>
            <td>Distant patterns</td>
            <td>6 (0%)</td>
			<td>66 (0%)</td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>			
         </tr>
		 <tr>
           <td>3633328</td>
            <td>Structural interologs</td>
            <td>14 (0%)</td>
			<td>126 (0%)</td>
            <td>2375 (2%)</td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		 <tr>
           <td>103334</td>
            <td>Yeast two-hybrid</td>
            <td>12 (0%)</td>
			<td>34 (0%)</td>
            <td>65 (0%)</td>
            <td>1469 (1%)</td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		 <tr>
           <td>126136</td>
            <td>Affinity methods</td>
            <td>13 (0%)</td>
			<td>143 (0%)</td>
            <td>121 (0%)</td>
            <td>5982 (5%)</td>	
			<td>9146 (9%)</td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           <td>27214</td>
            <td>In vitro literature</td>
            <td>0 (0%)</td>
			<td>14 (0%)</td>
            <td>244 (1%)</td>
            <td>3617 (13%)</td>	
			<td>5886 (22%)</td>
            <td>3360 (12%)</td>
            <td></td>				
         </tr>
		 <tr>
           <td>34447</td>
            <td>In vivo literature</td>
            <td>0 (0%)</td>
			<td>0 (0%)</td>
            <td>91 (0%)</td>
            <td>2583 (7%)</td>	
			<td>4948 (14%)</td>
            <td>3107 (9%)</td>
            <td>15028 (55%)</td>				
         </tr>
		  <tr>           	
            <td></td>
            <td>Total</td>
			<td>Fusion 33859</td>
            <td>Phylogenetic 610177</td>
            <td>Distant patterns 128216</td>	
			<td>Structural interologs 3633328</td>
            <td>Yeast two-hybrid 103334</td>
            <td>Affinity methods 126136</td>	
			<td>In vitro literature 27214</td>			
         </tr>
		 <tr>
            <th align="left" colspan="9"></th>           		
         </tr>
		 <tr>
            <th align="left" colspan="9"></th>           		
         </tr>
		  <tr>
            <th align="left" colspan="9">7(B)</th>           		
         </tr>
		 <tr>
           	<td>Total</td>
            <td></td>
            <td></td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		 <tr>
           	<td>15075</td>
            <td>Fusion</td>
            <td></td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>84686</td>
            <td>Phylogenetic</td>
            <td>7369 (49%)</td>
			<td></td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>12539</td>
            <td>Distant patterns</td>
            <td>116 (1%)</td>
			<td>584 (5%)</td>
            <td></td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>22277</td>
            <td>Structural interologs</td>
            <td>122 (1%)</td>
			<td>494 (2%)</td>
            <td>1164 (9%)</td>
            <td></td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>33246</td>
            <td>Yeast two-hybrid</td>
            <td>117 (1%)</td>
			<td>251 (1%)</td>
            <td>2864 (23%)</td>
            <td>1663 (7%)</td>	
			<td></td>
            <td></td>
            <td></td>				
         </tr>
		  <tr>
           	<td>20043</td>
            <td>Affinity methods</td>
            <td>196 (1%)</td>
			<td>1546 (8%)</td>
            <td>2923 (23%)</td>
            <td>2466 (12%)</td>	
			<td>10240 (51%)</td>
            <td></td>
            <td></td>				
         </tr>
		 <tr>
           	<td>9026</td>
            <td>In vitro literature</td>
            <td>8 (0%)</td>
			<td>72 (1%)</td>
            <td>1672 (19%)</td>
            <td>1434 (16%)</td>	
			<td>5608 (62%)</td>
            <td>4126 (46%)</td>
            <td></td>				
         </tr>
		  <tr> 
           	<td>7805</td>
            <td>In vivo literature</td>
            <td>3 (0%)</td>
			<td>0 (0%)</td>
            <td>1502 (19%)</td>
            <td>1153 (15%)</td>	
			<td>4929 (63%)</td>
            <td>3560 (46%)</td>
            <td>6606 (85%)</td>				
         </tr>
		 <tr>
           	<td></td>           
            <td>Total</td>
			<td>Fusion 15075</td>
            <td>Phylogenetic 84686</td>
            <td>Distant patterns 12539</td>	
			<td>Structural interologs 22277</td>
            <td>Yeast two-hybrid 33246</td>
            <td>Affinity methods 20043</td>	
			<td>In vitro literature 9026</td>			
         </tr>
     </tbody>
 	  </table>
 	</table-wrap>
	<table-wrap position="float" id="t8">
	<label>Table 8.</label>
  			<caption>
  				<title>Commonalities in localization, molecular function and biological process of predicted interacting proteins.</title>
				<p>This table shows the fraction of predicted interacting proteins with the following properties: co-localized according to GO cellular component terms; same biological process according to GO biological process terms; and same molecular function according to GO molecular function terms. An interaction was considered to respect the GO restriction if both proteins shared a GO term when retrieving GO parents up to level 3. Interactions were used for the study only if both proteins had at least one GO term assigned. Interactions where a protein interacts with itself were discarded from this study.</p>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left"></th>
            <th align="left"></th>
            <th align="left">Number of interactions with GO information</th>
			<th align="left">Number of interactions with common GO term</th>						
         </tr>
      </thead>
      <tbody>
         <tr>
            <td rowspan="3">Gene fusion</td>
            <td>Cellular Component</td>
            <td>5883</td>
			<td>5807 (99%)</td>           	
         </tr>
         <tr>
            <td>Biological Process</td>
            <td>24754</td>
            <td>20998 (85%)</td>					
         </tr>
         <tr>
            <td>Molecular Function</td>
            <td>27725</td>
            <td>10669 (39%)</td>				
         </tr>
         <tr>
            <td rowspan="3">Phylogenetic profiles</td>
            <td>Cellular Component</td>
            <td>92092</td>
			<td>90458 (98%)</td>		
         </tr>
		 <tr>
            <td>Biological Process</td>
            <td>258368</td>
            <td>182418 (71%)</td>				
         </tr>
		 <tr>
            <td>Molecular Function</td>
            <td>302512</td>
            <td>93023 (31%)</td>				
         </tr>
		 <tr>
            <td rowspan="3">Distant conservation</td>
            <td>Cellular Component</td>
            <td>89797</td>
			<td>61607 (69%)</td>			
         </tr>
		 <tr>
            <td>Biological Process</td>
            <td>105904</td>
            <td>70322 (66%)</td>				
         </tr>
		  <tr>
            <td>Molecular Function</td>
            <td>113024</td>
            <td>55371 (49%)</td>				
         </tr>
		 <tr>
            <td rowspan="3">Structural interologs</td>
            <td>Cellular Component</td>
            <td>236094</td>
			<td>205034 (87%)</td>			
         </tr>
		 <tr>
            <td>Biological Process</td>
            <td>420996</td>
			<td>340751 (81%)</td>			
         </tr>
		 <tr>
            <td>Molecular Function</td>
            <td>434449</td>
			<td>248239 (57%)</td>			
         </tr>
     </tbody>
 	  </table>
 	</table-wrap> 	
  </floats-wrap>
</article>
