<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD 2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
	<front>
		<journal-meta>
			<journal-id journal-id-type="nlm-ta">J Proteomics Bioinform</journal-id>
			<journal-id journal-id-type="publisher-id">opg</journal-id>						
			<journal-title>Journal of Proteomics &amp; Bioinformatics</journal-title>			 
			<issn pub-type="epub">0974-276X</issn>
			<publisher>
				<publisher-name>OMICS Publishing Group</publisher-name>
				<publisher-loc>India, USA</publisher-loc>
			</publisher>
		</journal-meta>
		<article-meta>			
			<article-id pub-id-type="publisher-id">000063</article-id>
			<article-categories>
				<subj-group subj-group-type="heading">
					<subject>Research Article</subject>
				</subj-group>
				<subj-group subj-group-type="Discipline">
					<subject>Biochemistry</subject>
				</subj-group>
				<subj-group subj-group-type="System Taxonomy">
					<subject>Proteomics</subject>
					<subject>Bioinformatics</subject>
					<subject>Genomics</subject>
					<subject>Transcriptomics</subject>
					<subject>Biomarkers</subject>
				</subj-group>
			</article-categories>
			<title-group>
				<article-title>Protein Interaction Network. Double Exponential Model</article-title>
			</title-group>
			<contrib-group>
				<contrib contrib-type="author">
					<name>
						<surname>Pawlowski</surname>
						<given-names>Piotr H</given-names>
					</name>
					<xref ref-type="aff" rid="a1">1</xref>										
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Kaczanowski</surname>
						<given-names>Szymon</given-names>
					</name>
					<xref ref-type="aff" rid="a1">1</xref>
					<xref ref-type="aff" rid="a2">2</xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Zielenkiewicz</surname>
						<given-names>Piotr</given-names>
					</name>
					<xref ref-type="aff" rid="a1">1</xref>					
					<xref ref-type="aff" rid="a3">3</xref>
				</contrib>
			</contrib-group>
			<aff id="a1"><label>1</label>Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warszawa, Poland</aff>
			<aff id="a2"><label>2</label>Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, USA</aff>
			<aff id="a3"><label>3</label>Plant Molecular Biology Laboratory, Warsaw University, Warszawa, Poland</aff>
			<author-notes>
				<corresp id="cor1">To whom correspondence should be addressed: Piotr H. Pawlowski, Ph.D., Fax: (48) 39121623; E-mail: <email>piotrp@ibb.waw.pl</email></corresp>
			</author-notes>
			<pub-date pub-type="collection">
			     <month>05</month>
				 <year>2008</year>
			</pub-date>
			<pub-date pub-type="epub">
				<day>20</day>
				<month>05</month>
				<year>2008</year>
			</pub-date>			
			<volume>1</volume>
			<issue>2</issue>
			<fpage>061</fpage>
			<lpage>067</lpage>
			<history>
			<date date-type="received">
			     <day>13</day>
				 <month>04</month>
				 <year>2008</year>
			</date>
			<date date-type="accepted">
			      <day>14</day>
				  <month>05</month>
				  <year>2008</year>
			</date>
			</history>
			<permissions>			 
			<copyright-statement><bold>Copyright:</bold> &copy; 2008 Pawlowski PH, et al.</copyright-statement>
			<copyright-year>2008</copyright-year>
			<license license-type="open access">
			<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</p>
			</license>
			</permissions>			
			<abstract>
				<p>The proper theoretical description of the node degree distribution of the yeast protein-protein interaction network was investigated to address the observed discrepancy between the commonly proposed models and the existing data. The power law and the generalized power law with exponential cut-off were shown to be inaccurate over a wide range of degree values. The proposed linear-combination-of-exponential-decays method, which characterizes the distribution exactly by a spectrum of decay constants, revealed two separate parameter domains. The consequent hypothesis that the node degree distribution follows a universal double exponential law was successfully verified by model comparison using the AIC criterion. BIND and DIP data for H. pylori, E. coli, S. cerevisiae, D. melanogaster, C. elegans and A. thaliana were used for this purpose. A linear change in the magnitude of the distribution components with proteome size was observed, manifesting the evolutionary stability of the process of developing the protein interaction network. The proposed kinetic model of protein evolution considers two hypothetical protein classes, the first with a relatively rapid emergence rate and a short characteristic residence time, and the second with the opposite properties, and analytically describes the nature of the bi-exponential pattern. The model presents a situation in which evolutionarily conserved proteins increase their interactions due to specific kinetic conditions. Thus, we oppose the opinion that the majority of such interactions are biologically significant and that, therefore, the older parts of the interactome are more complex. We believe that our interactome results support the hypothesis of Stuart Kauffman, presented in his book "The Origins of Order", that random mutations and natural selection constitute the origin of order and complexity.</p>
			</abstract>
			<kwd-group>
				<kwd>Protein interaction network</kwd>
				<kwd>exponential model</kwd>
				<kwd>scale free network</kwd>
				<kwd>power law</kwd>
			</kwd-group>
			 <custom-meta-wrap>
				<custom-meta>
					<meta-name>citation</meta-name>
					<meta-value>Pawlowski PH, Kaczanowski S, Zielenkiewicz P. (2008) Protein Interaction Network. Double Exponential Model.</meta-value>
				</custom-meta>
			</custom-meta-wrap>
		</article-meta>
	</front>
	<body>
		<sec id="s1">
			<title>Introduction</title>
				<p>The degree of a node (or connectivity) is the number of edges incident to it. From the theoretical point of view, it is one of the basic measures characterizing the importance of the node in the network. Although the power law (PL) and the generalized power law supplemented with an exponential cut-off (GPL-EC) have been widely popularized (<xref ref-type="bibr" rid="r18">Wagner 2001</xref>; <xref ref-type="bibr" rid="r7">Jeong et al. 2001</xref>) as the rules describing the distribution of node degrees in protein-protein interaction networks, attempts at a more exact mathematical description are still being undertaken (<xref ref-type="bibr" rid="r15">Thomas et al. 2003</xref>; <xref ref-type="bibr" rid="r2">Berg et al. 2004</xref>). The reasons are both practical and methodological. The first pertains to the still-evolving databases, and the second concerns the fact that the usually simple arrangement of experimental points may be fitted in various ways, giving quite similar results under different theoretical assumptions. According to the DIP data (see Materials and Methods), the degree distribution of nodes of the S. cerevisiae protein interaction network approximately follows a PL or a GPL-EC, but only for degree values smaller than 10. For higher values of k we saw a serious discrepancy between the theory and the experiment, already reported by others as an exponential decay (<xref ref-type="bibr" rid="r19">Wilhelm et al. 2003</xref>).</p>
				<p>There are additional indications (<xref ref-type="bibr" rid="r1">Barabasi and Oltvai 2004</xref>; <xref ref-type="bibr" rid="r14">Pereira-Leal et al. 2005</xref>) that the biological network characteristics may contain an exponential component. The main aim of the present paper is to resolve whether a more complex exponential-type model can better describe the distribution of node degree in the protein interaction network. Developing this idea, we proposed to consider the node degree distribution as a linear combination of exponential decays A<sub>i</sub> exp(&minus;&lambda;<sub>i</sub>k), with amplitudes A<sub>i</sub> and decay constants &lambda;<sub>i</sub> being positive values. Our method, applied to S. cerevisiae DIP data, revealed two separate domains of &lambda;<sub>i</sub>, with two characteristic parameter values related to a relatively &quot;fast&quot; and a relatively &quot;slow&quot; tendency of the distribution to decay along the k-axis. This led to the natural concept that a double exponential curve a<sub>1</sub> exp(&minus;d<sub>1</sub>k) + a<sub>2</sub> exp(&minus;d<sub>2</sub>k) could be a better model of the node degree distribution than the standard or modified power law. This supposition was confirmed by using BIND or DIP data for six different organisms and the AIC criterion (see Materials and Methods). The obtained results led to an analysis of the dependence of both exponential contributions to the total protein pool on proteome size, which clearly indicated a linear trend. In consequence, this analysis helps us to better characterise the evolutionary mechanism leading to the observed double exponential distribution and points out its universal elements.</p>
				<p>To explain the bi-exponential character of the node degree distribution, a kinetic model of protein network evolution was proposed. It relates the sought distribution formula to the parameters describing the rates of certain creation and disruption processes, postulated as being important in the formation of the network. In our model, two basic types of proteins, marked "1" and "2", with different dynamics of evolutionary behaviour were assumed. They were shown to be good candidates, from a statistical point of view, for the low-connected nodes and hubs, respectively.</p>
				<p>The discussed results suggest that the process of evolution leads to a "biological" order in the interactome. Therefore, they support the hypothesis of Stuart <xref ref-type="bibr" rid="r8">Kauffman (1993)</xref> that the process of random mutation and selection always leads to complexity.</p>
		</sec>
		<sec sec-type="methods">
			<title>Materials and Methods</title>
				<p>Protein interaction network data for H. pylori (N<sub>p</sub> = 724 nodes, N<sub>e</sub> = 1403 edges) and S. cerevisiae (analogous values 4135 and 7839) were taken from the Coevolution and Self-organisation in Dynamical Networks data sets (COSIN, <ext-link ext-link-type="uri" xlink:href="www.cosin.org">http://www.cosin.org</ext-link>) derived from the Database of Interacting Proteins (DIP, <ext-link ext-link-type="uri" xlink:href="www.dip.doe-mbi.ucla.edu/">http://dip.doe-mbi.ucla.edu/</ext-link>). Data for E. coli (399 and 312), D. melanogaster (7910 and 23128), C. elegans (3227 and 5026) and A. thaliana (487 and 959) were taken from the Biomolecular Interaction Network Database (BIND, <ext-link ext-link-type="uri" xlink:href="www.bind.ca/Action">http://www.bind.ca/Action</ext-link>). Only single protein-protein interaction records (without self-interaction) were analyzed. No non-interacting proteins were reported.</p>
				<p>According to our method of linear combination of exponential decays (LCED), the S. cerevisiae node degree distribution (histogram) was tentatively described by the sum:</p>
				<p><xref ref-type="fig" rid="e1">Equation 1</xref></p>
				<fig id="e1">
					<label>Equation 1</label>					 
					<graphic xlink:href="JPB-01-061-e001.tif"/>
				</fig>
				<p>where n<sub>k</sub> is the number of nodes of degree k and i<sub>max</sub> is the maximal value of the summation index i. <xref ref-type="fig" rid="e1">Equation 1</xref> was fitted to the experimental data with i<sub>max</sub> = 50 and a gridded spectrum of decay constants &lambda;<sub>i</sub> = {0, 0.025, 0.050, 0.075, &hellip;, 1.250}. The fit was repeated 20 times to obtain sets of amplitudes A<sub>i</sub>, and the respective averages &lt;A<sub>i</sub>&gt; were then analysed. The NonlinearRegress procedure (NRP) from Mathematica 4.1 (<ext-link ext-link-type="uri" xlink:href="www.wolfram.com">http://www.wolfram.com</ext-link>) was used as the fitting algorithm, with the substitution A<sub>i</sub> = (A'<sub>i</sub>)<sup>2</sup> to guarantee non-negative amplitudes. Random starting values A'<sub>i0</sub> were selected within the range 0.5 &le; A'<sub>i0</sub> &le; 1.5.</p>
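				<p>The LCED idea can be illustrated with a minimal sketch. It is not the NonlinearRegress procedure used here: as an assumption made only for illustration, the non-negativity of the amplitudes is enforced by cyclic coordinate descent with clipping at zero instead of the squared substitution, and the function name lced_fit and its arguments are hypothetical.</p>
				<preformat preformat-type="code" xml:space="preserve">
```python
import math


def lced_fit(ks, counts, lambdas, sweeps=200):
    """Fit counts[j] ~ sum_i A_i * exp(-lambdas[i] * ks[j]) with all A_i >= 0.

    Illustrative stand-in for the fit of Equation 1: each amplitude is
    updated in turn to its exact least-squares optimum and clipped at zero.
    """
    # basis[i][j] = exp(-lambda_i * k_j)
    basis = [[math.exp(-lam * k) for k in ks] for lam in lambdas]
    norms = [sum(b * b for b in row) for row in basis]
    amps = [0.0] * len(lambdas)
    resid = list(counts)  # residual = data - model; the model starts at zero
    for _ in range(sweeps):
        for i, row in enumerate(basis):
            # optimal change of amplitude i with the other amplitudes fixed
            step = sum(b * r for b, r in zip(row, resid)) / norms[i]
            new_amp = max(0.0, amps[i] + step)
            delta = new_amp - amps[i]
            if delta != 0.0:
                resid = [r - delta * b for r, b in zip(resid, row)]
                amps[i] = new_amp
    sse = sum(r * r for r in resid)
    return amps, sse


# Grid of decay constants as in the text: 0, 0.025, ..., 1.250 (51 values).
grid = [0.025 * i for i in range(51)]
# Synthetic histogram (not the DIP data) built from two grid components.
ks = list(range(1, 26))
counts = [1000.0 * math.exp(-grid[25] * k) + 300.0 * math.exp(-grid[7] * k)
          for k in ks]
amps, sse = lced_fit(ks, counts, grid)
```
				</preformat>
				<p>Because neighbouring exponentials on such a fine grid are strongly correlated, the recovered spectrum of a single run may be spread over adjacent grid points, which is one reason to average the amplitudes over repeated fits.</p>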
				<p>In the final modelling with a double exponential law (DEL),</p>
				<p>n<sub>k</sub> = a<sub>1</sub> exp(&minus;d<sub>1</sub>k) + a<sub>2</sub> exp(&minus;d<sub>2</sub>k)                         (2)</p>
				<p>in the alternative modelling with a PL,</p>
				<p>n<sub>k</sub> = A k<sup>&minus;&gamma;</sup>                                                        (3)</p>
				<p>and with a GPL-EC,</p>
				<p>n<sub>k</sub> = A (k + k<sub>0</sub>)<sup>&minus;&gamma;</sup> exp(&minus;k/k<sub>c</sub>)    (4)</p>
				<p>the fits were performed in the range 1 &le; k &le; 15, using NRP once (without the squared substitution of amplitudes) and with default starting values (1.0).</p>
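				<p>A double exponential fit of the form of eq. 2 can be sketched as follows. This is an illustrative alternative to NRP, not the procedure used here: the decay constants are searched on a grid and, for each pair, the amplitudes are obtained from the 2 x 2 linear least-squares problem. The function name fit_del, the grid and the labelling convention d<sub>1</sub> &gt; d<sub>2</sub> are assumptions of the sketch.</p>
				<preformat preformat-type="code" xml:space="preserve">
```python
import math


def fit_del(ks, counts, d_grid):
    """Fit counts ~ a1*exp(-d1*k) + a2*exp(-d2*k) by separable least squares.

    d_grid must be sorted in ascending order. Returns (sse, a1, d1, a2, d2)
    with d1 > d2, i.e. component 1 is the faster decay (sketch convention).
    """
    best = None
    for i, d2 in enumerate(d_grid):
        e2 = [math.exp(-d2 * k) for k in ks]
        s22 = sum(y * y for y in e2)
        b2 = sum(y * c for y, c in zip(e2, counts))
        for d1 in d_grid[i + 1:]:
            e1 = [math.exp(-d1 * k) for k in ks]
            s11 = sum(x * x for x in e1)
            s12 = sum(x * y for x, y in zip(e1, e2))
            b1 = sum(x * c for x, c in zip(e1, counts))
            det = s11 * s22 - s12 * s12
            if det == 0.0:
                continue  # degenerate pair, cannot solve for the amplitudes
            # solve the normal equations for the two amplitudes
            a1 = (b1 * s22 - b2 * s12) / det
            a2 = (b2 * s11 - b1 * s12) / det
            sse = sum((c - a1 * x - a2 * y) ** 2
                      for c, x, y in zip(counts, e1, e2))
            if best is None or best[0] > sse:
                best = (sse, a1, d1, a2, d2)
    return best


# Synthetic example in the fitted range k = 1..15 (values are made up).
d_grid = [0.05 * j for j in range(1, 25)]
ks = list(range(1, 16))
counts = [1000.0 * math.exp(-d_grid[11] * k) + 300.0 * math.exp(-d_grid[3] * k)
          for k in ks]
sse, a1, d1, a2, d2 = fit_del(ks, counts, d_grid)
```
				</preformat>
				<p>The grid search avoids a dependence on starting values; a gradient-based nonlinear regression started from the best grid point could then refine d<sub>1</sub> and d<sub>2</sub> beyond the grid resolution.</p>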
				<p>To rate the quality of the proposed models, the corrected Akaike Information Criterion (AICc) was adopted, defined as:</p>
				<p>AIC<sub>c</sub> = z ln(&sigma;<sup>2</sup>) + 2m + 2m(m + 1) / (z &minus; m &minus; 1)             (5)</p>
				<p>where &sigma;<sup>2</sup> is the average squared residual for a given model, m is the number of model parameters, and z is the number of observations (<xref ref-type="bibr" rid="r3">Burnham and Anderson 2004</xref>). In the case of PL, m = 2. For GPL-EC and DEL, m = 4. The number of analysed points was z = 15 for each competing model. Models with a smaller AICc value were favoured.</p>
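				<p>Equation 5 translates directly into a few lines of code. The sketch below assumes that the residuals of a fitted model are available as a list; the function name aicc is hypothetical.</p>
				<preformat preformat-type="code" xml:space="preserve">
```python
import math


def aicc(residuals, m):
    """Corrected AIC of eq. 5 for a model with m parameters.

    residuals: observed minus fitted values, one per analysed point.
    """
    z = len(residuals)
    if not z > m + 1:
        raise ValueError("eq. 5 needs more than m + 1 observations")
    sigma2 = sum(r * r for r in residuals) / z  # average squared residual
    return z * math.log(sigma2) + 2 * m + 2 * m * (m + 1) / (z - m - 1)


# With z = 15 points and unit residuals the logarithmic term vanishes,
# so only the parameter penalties remain.
penalty_pl = aicc([1.0] * 15, 2)    # PL, m = 2
penalty_del = aicc([1.0] * 15, 4)   # GPL-EC or DEL, m = 4
```
				</preformat>
				<p>For z = 15 the penalty terms alone amount to 5 for m = 2 and 12 for m = 4, so a four-parameter model is favoured only if it lowers z ln(&sigma;<sup>2</sup>) by more than 7 relative to the PL.</p>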
				<p>In the theoretical considerations, the total proteome size (N<sub>P</sub><sup>*</sup>) of the analysed species was assumed to be equal to the number of open reading frames, i.e., 1788 for H. pylori, 4285 for E. coli, 6307 for S. cerevisiae, 14218 for D. melanogaster and 18944 for C. elegans (<xref ref-type="bibr" rid="r10">Liu and Rost, 2001</xref>) or 28952 for family members of A. thaliana (<xref ref-type="bibr" rid="r6">Horan et al., 2005</xref>). The scaling factor sc,</p>
				<p>sc = (n<sub>0</sub> + n<sub>k&gt;0</sub>) / N<sub>P</sub><sup>*</sup>                                (6)</p>
				<p>describes the ratio of the extrapolated size of the analysed probe to the size of the total proteome. Through division by sc, the DEL model amplitudes for the accessed data, a<sub>1</sub> and a<sub>2</sub>, were transformed into hypothetical values a<sub>1</sub><sup>*</sup> = a<sub>1</sub> / sc and a<sub>2</sub><sup>*</sup> = a<sub>2</sub> / sc for the total species proteome (see Appendix 1). In eq. 6 the unknown value n<sub>0</sub> was replaced by a<sub>1</sub> + a<sub>2</sub>. Then, the expected number of proteins in contributions 1 and 2 to the total proteome was estimated by the sum of an infinite geometric series</p>
				<p><xref ref-type="fig" rid="e7">Equation 7</xref></p>
				<fig id="e7">
					<label>Equation 7</label>					 
					<graphic xlink:href="JPB-01-061-e007.tif"/>
				</fig>
				<p>leading to:</p>
				<p>N<sub>1</sub><sup>*</sup> = a<sub>1</sub><sup>*</sup> / (1 &minus; exp(&minus;d<sub>1</sub>))         (8)</p>
				<p>N<sub>2</sub><sup>*</sup> = a<sub>2</sub><sup>*</sup> / (1 &minus; exp(&minus;d<sub>2</sub>))        (9)</p>
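				<p>The rescaling of eq. 6 and the closed forms of eqs. 8 and 9 can be combined in a short sketch. The function name class_sizes and the numerical inputs are hypothetical and serve only to show the arithmetic; n_observed stands for n<sub>k&gt;0</sub>, the number of nodes in the analysed network.</p>
				<preformat preformat-type="code" xml:space="preserve">
```python
import math


def class_sizes(a1, d1, a2, d2, n_observed, proteome_size):
    """Extrapolate DEL amplitudes to the whole proteome (eqs. 6, 8 and 9).

    Returns (sc, N1_star, N2_star). Decay constants must be positive.
    """
    n0 = a1 + a2                                # eq. 6: n0 replaced by a1 + a2
    sc = (n0 + n_observed) / proteome_size      # scaling factor, eq. 6
    a1_star = a1 / sc
    a2_star = a2 / sc
    # sum over k = 0, 1, 2, ... of a*exp(-d*k) is a / (1 - exp(-d))
    n1_star = a1_star / (1.0 - math.exp(-d1))   # eq. 8
    n2_star = a2_star / (1.0 - math.exp(-d2))   # eq. 9
    return sc, n1_star, n2_star


# Made-up inputs chosen so that the arithmetic is easy to follow by hand.
sc, n1_star, n2_star = class_sizes(
    a1=600.0, d1=math.log(2.0),
    a2=400.0, d2=math.log(4.0),
    n_observed=1000, proteome_size=4000,
)
```
				</preformat>
				<p>With these inputs n<sub>0</sub> = 1000 and sc = 0.5, so the rescaled amplitudes are 1200 and 800; the geometric-series denominators are 0.5 and 0.75.</p>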
				<p>In the estimation of the parameters of the model of protein network evolution (Appendix 2) eqs. A.2.8-11 were applied.</p> 	
		</sec>
		<sec id="s3">
			<title>Results</title>
				<p>It was observed that the node degree histogram of the S. cerevisiae protein-protein interaction network exhibits a well-ordered pattern in the range 1 &le; k &le; 25 (<xref ref-type="fig" rid="g1">Fig. 1</xref>).</p>
				<fig id="g1">
					<label>Figure 1</label>
					<caption>
						<title>The distribution histogram (n<sub>k</sub>) of node degree (k) of the S. cerevisiae protein-protein interaction network. Presented fits are: the upper line - a power law (PL): n<sub>k</sub> = 1.65 &times; 10<sup>3</sup> k<sup>&minus;1.27</sup>; the bottom line - a generalized power law supplemented with an exponential cut-off (GPL-EC): n<sub>k</sub> = 2.4 &times; 10<sup>3</sup> (k + 0.3)<sup>&minus;0.5</sup> exp(&minus;k/3.0). Zero values are not shown.</title>
					</caption>
					<graphic xlink:href="JPB-01-061-g001.tif"/>
				</fig>
				<p>Above that range statistical fluctuations prevailed and quantization perturbed the continuity of the analysed characteristics of the network. Attempts to describe the investigated distribution by a PL: A = (1.65 &plusmn; 0.02) &times; 10<sup>3</sup>, &gamma; = (1.27 &plusmn; 0.02) (upper line), or by a GPL-EC: A = (2.4 &plusmn; 0.7) &times; 10<sup>3</sup>, k<sub>0</sub> = (0.3 &plusmn; 0.7), &gamma; = (0.5 &plusmn; 0.3), k<sub>c</sub> = (3.0 &plusmn; 0.4) (bottom line) gave good results only in the range 1 &le; k &le; 10. Mean standard error (s.e.) is also presented. The PL parameters obtained, &gamma; = 1.27 and A/N<sub>p</sub> = 0.40, are consistent with &gamma; = 1.32 and A/N<sub>p</sub> = 0.42 for the whole yeast interaction network (<xref ref-type="bibr" rid="r17">Yu et al. 2004</xref>). A different picture is seen in the case of the GPL-EC model. One can notice a big discrepancy between our result and those for a small sample of 1870 nodes (<xref ref-type="bibr" rid="r13">Pastor-Satorras et al., 2003</xref>), which may indicate the narrow area of applicability of the cut-off formula.</p>
			<fig id="g2">
					<label>Figure 2</label>
					<caption>
						<title>Linear combination of exponential decays method (LCED) applied to the data for S. cerevisiae (<xref ref-type="fig" rid="g1">Figure 1</xref>). Two regions of the decay constants (&lambda;<sub>i</sub>) spectrum with dominant amplitudes A<sub>i</sub> at &lambda;<sub>7</sub> = 0.175 and &lambda;<sub>25</sub> = 0.625 are clearly seen. Shown values are averages of the respective amplitudes of 20 multi-exponential fits.</title>
					</caption>
					<graphic xlink:href="JPB-01-061-g002.tif"/>
				</fig>	
				<p>The proposed LCED method (<xref ref-type="fig" rid="g2">Figure 2</xref>) revealed two narrow ranges of the decay constants spectrum with dominant amplitudes at &lambda;<sub>7</sub> = 0.175 and &lambda;<sub>25</sub> = 0.625 (characteristic values of node degree: 1/&lambda;<sub>7</sub> = 5.7, 1/&lambda;<sub>25</sub> = 1.6). The half-widths of the observed peaks equal 0.025 and 0.050, respectively. An example of one of the 20 fits performed to obtain the above spectrum is also presented (<xref ref-type="fig" rid="g3">Figure 3</xref>).</p>
				
				<fig id="g3">
					<label>Figure 3</label>
					<caption>
						<title>An example of one of the 20 fits to the experimental data (<xref ref-type="fig" rid="g1">Figure 1</xref>) performed to obtain the decay constants spectrum (<xref ref-type="fig" rid="g2">Figure 2</xref>). Here n<sub>k</sub> is the number of nodes of degree k. For clarity, the open circles denote group averages. Zero values are not shown.</title>
					</caption>
					<graphic xlink:href="JPB-01-061-g003.tif"/>
				</fig>	
				<p>As seen here, and in the case of the other fits (data not shown), the fit quality, especially in the range k &gt; 10, is better than that obtained with the standard or modified power law. It was therefore hypothesized that our combination, even reduced to a double exponential formula, could provide a better description of the node degree distribution than the considered power-law-type models. The examples of yeast and five other species were analysed for k &le; 15. The corresponding fits of the proposed DEL models are presented in <xref ref-type="fig" rid="g4">Figure 4</xref> and <xref ref-type="table" rid="t1">Table 1</xref>. Their qualities are confirmed by AICc values, which favour the bi-exponential approximation in 5/6 of the investigated cases (<xref ref-type="table" rid="t2">Table 2</xref>).</p>
			<fig id="g4">
					<label>Figure 4</label>
					<caption>
						<title>The distribution histogram (n<sub>k</sub>) of node degree (k) for different species. The continuous line is the fit of a double exponential law (DEL).</title>
	<p>A. Helicobacter pylori.</p>
	<p>B. Escherichia coli.</p>
	<p>C. Saccharomyces cerevisiae.</p>
	<p>D. Drosophila melanogaster.</p>
	<p>E. Caenorhabditis elegans.</p>
	<p>F. Arabidopsis thaliana</p>
	<p>Parameters of the DEL models are presented in <xref ref-type="table" rid="t1">Table 1</xref>.</p>
					</caption>
					<graphic xlink:href="JPB-01-061-g004.tif"/>
				</fig>	
					<p>Plots of alternative fits are not shown. Some parameters of the DEL models vary with proteome size. The sizes N<sub>1</sub><sup>*</sup> and N<sub>2</sub><sup>*</sup> of the distinguished protein groups increase with the total number of proteins N<sub>P</sub><sup>*</sup> (<xref ref-type="fig" rid="g5">Figure 5</xref>).</p>
				<fig id="g5">
					<label>Figure 5</label>
					<caption>
						<title>The variation in the estimated number of proteins N<sub>F</sub><sup>*</sup> and N<sub>S</sub><sup>*</sup> of a given protein class with proteome size N<sub>P</sub><sup>*</sup>. The data points represent (from left): H. pylori, E. coli, S. cerevisiae, D. melanogaster, C. elegans and A. thaliana. A. Protein class F. B. Protein class S. Continuous line - linear trend.</title>
					</caption>
					<graphic xlink:href="JPB-01-061-g005.tif"/>
				</fig>						
				<p>No substantial dependence of the decay constants d<sub>1</sub> and d<sub>2</sub> on proteome size was detected.</p>
		</sec>
		<sec id="s4">
			<title>Discussion</title>
				<p>The results presented above confirm recent reports (<xref ref-type="bibr" rid="r5">Goldberg et al., 2005</xref>) suggesting a "break" of the power law in the global description of the protein interaction network. We suggest that this "break" may be caused by the second exponential term in the node degree distribution, which does not strongly affect the formula for node degrees smaller than 10, but may be essential elsewhere.</p>
				<p>Initial inspection of the data shown in <xref ref-type="fig" rid="g1">Figure 1</xref> reveals that GPL-EC, the 4-parameter improvement of PL (bottom line), fits better than PL alone (upper line), but is still far from perfect. Hence we decided to introduce a more general description.</p>
				<p>In accordance with our idea, the protein interaction network consists of subpopulations of vertices described by a similar statistical formula, but with different parameters. As a universal formula we chose exponential decay, which is consistent with the suggested model of network evolution (see Appendix 2).</p>
				<p>The proposed LCED method revealed the spectrum of decay constants and the magnitude of the subpopulation contributions to the global degree distribution (<xref ref-type="fig" rid="g2">Figure 2</xref>). Two classes of nodes, each with closely spaced values of the decay constant, were clearly distinguished. The good quality of the fits (<xref ref-type="fig" rid="g3">Figure 3</xref>) testifies to the utility of the method and supports the acceptance of the formula.</p>
				<p>To reduce the huge number of parameters of the general model, and taking into account the above observation, we proposed to limit the number of decay components to only two, indexed by 1 and 2. This did not weaken the fitting abilities for different species in the range 1 &le; k &le; 15 (<xref ref-type="fig" rid="g4">Figure 4</xref>, <xref ref-type="table" rid="t1">Table 1</xref>), which was confirmed by the AICc criterion. As seen in <xref ref-type="table" rid="t2">Table 2</xref>, the DEL models are the best in 5/6 of the investigated cases and just a little worse (2nd place) than the winner in one case. Generally, they are more effective for networks with big proteomes (the PL model for a small probe of A. thaliana may be an exception). For sets with a small protein number, PL or GPL-EC models may give similar results.</p>
				<p>The documented changes in the sizes of the indexed protein classes with proteome size (<xref ref-type="fig" rid="g5">Figure 5</xref>) indicate a similar tendency towards linear increase for the first (A) and the second (B) component of the proteome. Thus the ratio N<sub>1</sub><sup>*</sup> / N<sub>2</sub><sup>*</sup> &asymp; 2.5 seems to be a universal constant for a wide group of organisms. The contribution of each class of proteins to the summary distribution is shown for the example of the yeast probe (<xref ref-type="fig" rid="g6">Figure 6</xref>).</p>
				<fig id="g6">
					<label>Figure 6</label>
					<caption>
						<title>The contributions of the &quot;F&quot; and &quot;S&quot; protein classes to the overall distribution. The insets n<sub>k1</sub> = a<sub>1</sub> exp(&minus;d<sub>1</sub>k) and n<sub>k2</sub> = a<sub>2</sub> exp(&minus;d<sub>2</sub>k) were plotted (continuous lines 1 and 2) for the parameters of S. cerevisiae (<xref ref-type="table" rid="t1">Table 1</xref>). The broken line represents the fitted summary distribution n<sub>k</sub> = n<sub>k1</sub> + n<sub>k2</sub>.</title>
					</caption>
					<graphic xlink:href="JPB-01-061-g006.tif"/>
				</fig>
				<p>As seen, for small values of k both classes contribute to the global distribution. For k &gt; 10, the first class vanishes and the second class clearly dominates. The latter class may be related to so-called hubs. It is worth stressing that proteins of the second class may also bear only a few links.</p>
				<p>It seems that the proposed double-exponential model is a simplification of a hypothetical multi-component model describing the full spectrum of contributions from different classes of proteins. The analysed data indicate that there probably exists a third, small-amplitude class of yeast proteins (not visible in <xref ref-type="fig" rid="g2">Figure 2</xref>), which may be related to the "super" hubs connecting hundreds of nodes; however, a "false positive" error cannot be excluded. Although the two protein classes clearly dominate, the analysed subpopulations do not form spikes along the decay constant axis, but have some definite width. We believe that a more sophisticated analysis of the discussed contributions, considering their continuous representation, should fully describe protein network statistics and reveal new properties of the proteome system.</p>
				<p>As mentioned above, to specify our hypothesis, we proposed a simple mathematical model of protein network evolution (Appendix 2). The applied assumptions permit duplication events to occur even more often than the appearance of "new" types of protein-encoding genes. Such behaviour is suggested by the observation that gene-copy number within a family often changes during the process of speciation (<xref ref-type="bibr" rid="r4">Cheng et al., 2005</xref>; <xref ref-type="bibr" rid="r11">Ma and Gustafson, 2005</xref>; <xref ref-type="bibr" rid="r16">Ting et al., 2004</xref>). However, to avoid an enormous expansion of the system, we assumed that the speciation processes are no more frequent than deletion episodes effectively leading to the elimination of proteins. On the other hand, one can detect evolutionary conservation of genes present even in different kingdoms. Therefore, the probability of multiplication of "old" proteins is similar to the probability of multiplication of "young" proteins in a given genome. The facts mentioned above were "silently" included in the model. It relates amplitudes and decay constants to the emergence rates, q<sub>1</sub> and q<sub>2</sub>, effective elimination rates, &gamma;<sub>1</sub> and &gamma;<sub>2</sub>, and interaction gaining rates, ν<sub>1</sub> and ν<sub>2</sub>, of the two classes of proteins, with different dynamics of evolutionary performance. This difference in the dynamics of protein evolution manifests itself in the observed difference between the "fast" and "slow" tendencies in the variation of the node degree distribution along the k-axis. In general, the above parameters may differ for different evolutionary pathways.</p>
				<p>According to our model, the linear trend in <xref ref-type="fig" rid="g5">Figure 5</xref> may be related to the stable dynamics of evolution of the investigated classes of proteins during interspecies progress. Indeed, with equations A.2.12-14 it is easy to show that the observed dependence calls for stability of the ratio q<sub>1</sub>&gamma;<sub>2</sub> / q<sub>2</sub>&gamma;<sub>1</sub>.</p>
				<p>This linear trend also suggests that for the total proteomes the corresponding amplitudes of the calculated probability (frequency) of the occurrence of a node with a given degree may remain approximately constant. In a sense, we showed not a scale-free distribution but a scale-free evolution.</p>
				<p>As the analysed decay constants d<sub>1</sub> and d<sub>2</sub> do not exhibit a clear tendency to change, we may simply imagine that during evolution &gamma;<sub>1</sub>, &gamma;<sub>2</sub>, ν<sub>1</sub> and ν<sub>2</sub> remain approximately constant (see eqs. A.2.10-11). According to this picture, q<sub>1</sub> and q<sub>2</sub> slowly evolve in a stable manner (q<sub>1</sub> / q<sub>2</sub> = const), governed, for example, by the varying amount of DNA, which accounts for the change in the global protein pool (see eq. A.2.12).</p>
				<p>To make our considerations more quantitative we estimated the values of q<sub>1</sub>, q<sub>2</sub>, &gamma;<sub>1</sub> and &gamma;<sub>2</sub>, assuming that ν<sub>1</sub> = ν<sub>2</sub> = 0.1 per million years (<xref ref-type="bibr" rid="r2">Berg et al., 2004</xref>). It is seen that the first class of proteins may be characterized by a relatively rapid emergence rate q<sub>1</sub> and also a relatively rapid elimination rate &gamma;<sub>1</sub> (or short characteristic residence time) compared with the second class of proteins.</p>
				<p>The proposed mathematical model of evolution suggests an unexpected explanation of the observation of Barabasi and co-workers that more densely interconnected parts, "motifs" of the interaction network, are more strictly evolutionarily conserved (<xref ref-type="bibr" rid="r20">Wuchty et al., 2003</xref>). Intuitively, one can suppose that proteins belonging to such motifs are evolutionarily conserved because they are required for maintaining the connections in such motifs. But the results of our simulations suggest an exactly opposite explanation: the old proteins (evolutionarily conserved proteins) are more interconnected because they are simply old enough. This explanation, although surprising to us, does in fact make sense. Since the majority of the proteins are not interacting (for example, the protein-protein interaction network of yeast contains only approximately 30000 protein-protein interactions, according to the estimation of <xref ref-type="bibr" rid="r9">Kumar and Snyder, 2002</xref>, among more than 36000000 protein-protein pairs), and the protein interaction network is evolutionarily conserved (see, for example, <xref ref-type="bibr" rid="r12">Matthews et al., 2001</xref>), it is likely that the majority of interactions have biological significance and that interactions appear gradually during the process of evolution. It is also likely that new "proteins" have no interactions or only a small number of interactions. During the process of evolution these proteins slowly gain new "useful" interactions. If they belong to class 2, they may even gain many such interactions. This process leads to a well-ordered protein-protein interaction network in which proteins are not randomly connected and in which one can distinguish "modules" of interacting proteins.</p>
				<p>As already mentioned in the Introduction, our results support the hypothesis of Stuart Kauffman that natural selection, random mutations and the process of evolution are the source of order in biological systems. This paper shows a random process of evolution leading to complex and non-random systems. Although it remains an open question whether the random process is rapid enough to lead to the creation of structures as complex as multi-enzymatic complexes or flagella, we believe that a step in the right direction has been taken.</p>
		</sec>		
	</body>
	<back>
		<ack>
			<p>We would like to thank Dr. Christophe Pakleza for his help in applying Mathematica to numerical calculations. Financial support from the Ministry of Science and Higher Education (grant PBZ-MNil_2/1/2005) and a postdoctoral fellowship from the Foundation for Polish Science to S.K. are gratefully acknowledged.</p>
		</ack>
		<ref-list>
			<title>References</title>
					<ref id="r1">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Barabasi</surname>
								<given-names>AL</given-names>
							</name>
							<name>
								<surname>Oltvai</surname>
								<given-names>ZN</given-names>
							</name>
							</person-group>
							<year>2004</year>
							<article-title>Network biology: Understanding the cell's functional organization</article-title>
							<source>Nat Rev Genet</source>
							<volume>5</volume>
							<fpage>101</fpage>
							<lpage>113</lpage>
					</citation>
					</ref>
					<ref id="r2">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Berg</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Lassig</surname>
								<given-names>M</given-names>
							</name>
							<name>
								<surname>Wagner</surname>
								<given-names>A</given-names>
							</name>
							</person-group>
							<year>2004</year>
							<article-title>Structure and evolution of protein interaction networks: a statistical model for link dynamics and gene duplications</article-title>
							<source>BMC Evolutionary Biology</source>
							<volume>4</volume>
							<fpage>51</fpage>
							<lpage>62</lpage>
					</citation>
					</ref>
					<ref id="r3">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Burnham</surname>
								<given-names>KP</given-names>
							</name>
							<name>
								<surname>Anderson</surname>
								<given-names>DR</given-names>
							</name>
							</person-group>
							<year>2004</year>
							<article-title>Multimodel Inference: understanding AIC and BIC in Model Selection</article-title>
							<source>Sociological Methods &amp; Research</source>
							<volume>33</volume>
							<fpage>261</fpage>
							<lpage>304</lpage>
					</citation>
					</ref>
					<ref id="r4">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Cheng</surname>
								<given-names>Z</given-names>
							</name>
							<name>
								<surname>Ventura</surname>
								<given-names>M</given-names>
							</name>
							<name>
								<surname>She</surname>
								<given-names>X</given-names>
							</name>
							<name>
								<surname>Khaitovich</surname>
								<given-names>P</given-names>
							</name>
							<name>
								<surname>Graves</surname>
								<given-names>T</given-names>
							</name><etal/>
							</person-group>
							<year>2005</year>
							<article-title>A genome-wide comparison of recent chimpanzee and human segmental duplications</article-title>
							<source>Nature</source>
							<volume>437</volume>
							<fpage>88</fpage>
							<lpage>93</lpage>
					</citation>
					</ref>
					<ref id="r5">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Goldberg</surname>
								<given-names>DS</given-names>
							</name>
							<name>
								<surname>Franklin</surname>
								<given-names>G</given-names>
							</name>
							<name>
								<surname>Roth</surname>
								<given-names>FP</given-names>
							</name>
							</person-group>
							<year>2005</year>
							<article-title>Breaking the Power Law: Improved Model Selection Reveals Increased Network Complexity. In: Poster Session of the Ninth Annual International Conference on Research in Computational Molecular Biology</article-title>
							<source>RECOMB (2005) Cambridge MA</source>
					</citation>
					</ref>
					<ref id="r6">
					<citation citation-type="confproc">							
							<year>2005</year>							
							<source>Research in Computational Molecular Biology</source>
							<conf-name>RECOMB</conf-name>
							<conf-loc>Cambridge, MA</conf-loc>							
					</citation>
					</ref>
					<ref id="r7">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Horan</surname>
								<given-names>K</given-names>
							</name>
							<name>
								<surname>Lauricha</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Bailey-Serres</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Raikhel</surname>
								<given-names>N</given-names>
							</name>
							<name>
								<surname>Girke</surname>
								<given-names>T</given-names>
							</name>
							</person-group>
							<year>2005</year>
							<article-title>Genome cluster database. A sequence family analysis platform for Arabidopsis and rice</article-title>
							<source>Plant Physiology</source>
							<volume>138</volume>
							<fpage>47</fpage>
							<lpage>54</lpage>
					</citation>
					</ref>
					<ref id="r8">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Jeong</surname>
								<given-names>H</given-names>
							</name>
							<name>
								<surname>Mason</surname>
								<given-names>S</given-names>
							</name>
							<name>
								<surname>Barabasi</surname>
								<given-names>AL</given-names>
							</name>
							<name>
								<surname>Oltvai</surname>
								<given-names>ZN</given-names>
							</name>
							</person-group>
							<year>2001</year>
							<article-title>Lethality and centrality in protein networks</article-title>
							<source>Nature</source>
							<volume>411</volume>
							<fpage>41</fpage>
							<lpage>42</lpage>
					</citation>
					</ref>
					<ref id="r9">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Kauffman</surname>
								<given-names>SA</given-names>
							</name>
							</person-group>
							<year>1993</year>
							<article-title>The Origins of Order: Self-Organization and Selection in Evolution</article-title>
							<source>Oxford University Press</source>
					</citation>
					</ref>
					<ref id="r10">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Kumar</surname>
								<given-names>A</given-names>
							</name>
							<name>
								<surname>Snyder</surname>
								<given-names>M</given-names>
							</name>
							</person-group>
							<year>2002</year>
							<article-title>Protein complexes take the bait</article-title>
							<source>Nature</source>
							<volume>415</volume>
							<fpage>123</fpage>
							<lpage>124</lpage>
					</citation>
					</ref>
					<ref id="r11">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Liu</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Rost</surname>
								<given-names>B</given-names>
							</name>
							</person-group>
							<year>2001</year>
							<article-title>Comparing function and structure between entire proteomes</article-title>
							<source>Protein Science</source>
							<volume>10</volume>
							<fpage>1970</fpage>
							<lpage>1979</lpage>
					</citation>
					</ref>
					<ref id="r12">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Ma</surname>
								<given-names>XF</given-names>
							</name>
							<name>
								<surname>Gustafson</surname>
								<given-names>JP</given-names>
							</name>
							</person-group>
							<year>2005</year>
							<article-title>Genome evolution of allopolyploids: a process of cytological and genetic diploidization</article-title>
							<source>Cytogenet Genome Res</source>
							<volume>109</volume>
							<fpage>236</fpage>
							<lpage>249</lpage>
					</citation>
					</ref>
					<ref id="r13">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Matthews</surname>
								<given-names>LR</given-names>
							</name>
							<name>
								<surname>Vaglio</surname>
								<given-names>P</given-names>
							</name>
							<name>
								<surname>Reboul</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Ge</surname>
								<given-names>H</given-names>
							</name>
							<name>
								<surname>Davis</surname>
								<given-names>BP</given-names>
							</name><etal/>
							</person-group>
							<year>2001</year>
							<article-title>Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or "interologs"</article-title>
							<source>Genome Res</source>
							<volume>11</volume>
							<fpage>2120</fpage>
							<lpage>2126</lpage>
					</citation>
					</ref>
					<ref id="r14">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Pastor-Satorras</surname>
								<given-names>R</given-names>
							</name>
							<name>
								<surname>Smith</surname>
								<given-names>E</given-names>
							</name>
							<name>
								<surname>Sole</surname>
								<given-names>RV</given-names>
							</name>
							</person-group>
							<year>2003</year>
							<article-title>Evolving protein interaction networks through gene duplication</article-title>
							<source>J Theor Biol</source>
							<volume>222</volume>
							<fpage>199</fpage>
							<lpage>210</lpage>
					</citation>
					</ref>
					<ref id="r15">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Pereira-Leal</surname>
								<given-names>JB</given-names>
							</name>
							<name>
								<surname>Audit</surname>
								<given-names>B</given-names>
							</name>
							<name>
								<surname>Peregrin-Alvarez</surname>
								<given-names>JM</given-names>
							</name>
							<name>
								<surname>Ouzounis</surname>
								<given-names>CA</given-names>
							</name>
							</person-group>
							<year>2005</year>
							<article-title>An exponential core in the heart of the yeast protein interaction network</article-title>
							<source>Mol Biol Evol</source>
							<volume>22</volume>
							<fpage>421</fpage>
							<lpage>425</lpage>
					</citation>
					</ref>
					<ref id="r16">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Thomas</surname>
								<given-names>A</given-names>
							</name>
							<name>
								<surname>Cannings</surname>
								<given-names>R</given-names>
							</name>
							<name>
								<surname>Monk</surname>
								<given-names>NA</given-names>
							</name>
							<name>
								<surname>Cannings</surname>
								<given-names>C</given-names>
							</name>
							</person-group>
							<year>2003</year>
							<article-title>On the structure of protein-protein interaction networks</article-title>
							<source>Biochem Soc Trans</source>
							<volume>31</volume>
							<fpage>1491</fpage>
							<lpage>1496</lpage>
					</citation>
					</ref>
					<ref id="r17">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Ting</surname>
								<given-names>CT</given-names>
							</name>
							<name>
								<surname>Tsaur</surname>
								<given-names>SC</given-names>
							</name>
							<name>
								<surname>Sun</surname>
								<given-names>S</given-names>
							</name>
							<name>
								<surname>Browne</surname>
								<given-names>WE</given-names>
							</name>
							<name>
								<surname>Chen</surname>
								<given-names>YC</given-names>
							</name><etal/>
							</person-group>
							<year>2004</year>
							<article-title>Gene duplication and speciation in Drosophila: evidence from the Odysseus locus</article-title>
							<source>Proc Natl Acad Sci USA</source>
							<volume>101</volume>
							<fpage>12232</fpage>
							<lpage>12235</lpage>
					</citation>
					</ref>
					<ref id="r18">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Yu</surname>
								<given-names>H</given-names>
							</name>
							<name>
								<surname>Zhu</surname>
								<given-names>X</given-names>
							</name>
							<name>
								<surname>Greenbaum</surname>
								<given-names>D</given-names>
							</name>
							<name>
								<surname>Karro</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Gerstein</surname>
								<given-names>M</given-names>
							</name>
							</person-group>
							<year>2004</year>
							<article-title>TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics</article-title>
							<source>Nucleic Acids Res</source>
							<volume>32</volume>
							<fpage>328</fpage>
							<lpage>337</lpage>
					</citation>
					</ref>
					<ref id="r19">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Wagner</surname>
								<given-names>A</given-names>
							</name>
							</person-group>
							<year>2001</year>
							<article-title>The Yeast Protein Interaction Network Evolves Rapidly and Contains Few Redundant Duplicate Genes</article-title>
							<source>Mol Biol Evol</source>
							<volume>18</volume>
							<fpage>1283</fpage>
							<lpage>1292</lpage>
					</citation>
					</ref>
					<ref id="r20">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Wilhelm</surname>
								<given-names>T</given-names>
							</name>
							<name>
								<surname>Nasheuer</surname>
								<given-names>HP</given-names>
							</name>
							<name>
								<surname>Huang</surname>
								<given-names>D</given-names>
							</name>
							</person-group>
							<year>2003</year>
							<article-title>Physical and functional modularity of the protein network in yeast</article-title>
							<source>Mol Cell Prot</source>
							<volume>2</volume>
							<fpage>292</fpage>
							<lpage>298</lpage>
					</citation>
					</ref>
				<ref id="r21">
					<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Wuchty</surname>
								<given-names>S</given-names>
							</name>
							<name>
								<surname>Oltvai</surname>
								<given-names>ZN</given-names>
							</name>
							<name>
								<surname>Barabasi</surname>
								<given-names>AL</given-names>
							</name>
							</person-group>
							<year>2003</year>
							<article-title>Evolutionary conservation of motif constituents in the yeast protein interaction network</article-title>
							<source>Nat Genet</source>
							<volume>35</volume>
							<fpage>176</fpage>
							<lpage>179</lpage>
					</citation>
					</ref>
</ref-list>
	<app-group>
		<app>
			<title>Appendix 1</title>
			<p><xref ref-type="fig" rid="app1">Appendix 1</xref></p>
			<fig id="app1">
					<label>Appendix 1</label>
					 	<graphic xlink:href="JPB-01-061-A001.tif"/>
				</fig>	
		</app>
		<app>
			<title>Appendix 2</title>
			<p><xref ref-type="fig" rid="app2">Appendix 2</xref></p>
			<fig id="app2">
					<label>Appendix 2</label>
					 	<graphic xlink:href="JPB-01-061-A002.tif"/>
			</fig>
		</app>	 
	</app-group>			
	</back>
	 <floats-wrap >
	<table-wrap position="float" id="t1">
	<label>Table 1.</label>
  			<caption>
  				<title>Parameters of the fitted DEL models.</title>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
		 	<th align="left"></th>	
            <th align="left">a<sub>1</sub></th>
            <th align="left">a<sub>2</sub></th>
            <th align="left">d<sub>1</sub></th>
			<th align="left">d<sub>2</sub></th>				
         </tr>
      </thead>
      <tbody>
         <tr>
            <td><italic>H. pylori</italic></td>
            <td>507.409</td>
            <td>44.529</td>
			<td>0.743</td>
            <td>0.157</td>           
         </tr>
         <tr>
            <td><italic>E. coli</italic></td>
            <td>1166.020</td>
            <td>219.041</td>
			<td>1.898</td>
            <td>0.762</td>           		
         </tr>
         <tr>
            <td><italic>S. cerevisiae</italic></td>
            <td>2592.380</td>
            <td>197.464</td>
			<td>0.616</td>
            <td>0.170</td>         			
         </tr>
         <tr>
            <td><italic>D. melanogaster</italic></td>
            <td>5783.780</td>
            <td>837.777</td>
			<td>1.005</td>
            <td>0.187</td>            	
         </tr>
		 <tr>
            <td><italic>C. elegans</italic></td>
            <td>7307.120</td>
            <td>389.915</td>
			<td>1.564</td>
            <td>0.278</td>            		
         </tr>
		 <tr>
            <td><italic>A. thaliana</italic></td>
            <td>486.548</td>
            <td>68.659</td>
			<td>1.234</td>
            <td>0.220</td>            		
         </tr>
     </tbody>
 	  </table>
 	</table-wrap>
	<table-wrap position="float" id="t2">
	<label>Table 2.</label>
  			<caption>
  				<title>AICc ranking of the models<sup>1</sup>.</title>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left"></th>
            <th align="left">DEL</th>
            <th align="left">PL</th>
			<th align="left">GPL-EC</th>						
         </tr>
      </thead>
      <tbody>
         <tr>
            <td><italic>H. pylori</italic></td>
            <td>56.4 [1]</td>
            <td>73.2 [3]</td>
			<td>58.3 [2]</td>            
         </tr>
         <tr>
            <td><italic>E. coli</italic></td>
            <td>6.0 [1]</td>
            <td>43.0 [3]</td>
			<td>6.4 [2]</td>            		
         </tr>
         <tr>
            <td><italic>S. cerevisiae</italic></td>
            <td>94.6 [1]</td>
            <td>135.8 [3]</td>
			<td>112.4 [2]</td>            		
         </tr>
         <tr>
            <td><italic>D. melanogaster</italic></td>
            <td>97.4 [1]</td>
            <td>122.1 [3]</td>
			<td>109.4 [2]</td>            	
         </tr>
		 <tr>
            <td><italic>C. elegans</italic></td>
            <td>52.5 [1]</td>
            <td>66.5 [2]</td>
			<td>90.9 [3]</td>           
         </tr>
		 <tr>
            <td><italic>A. thaliana</italic></td>
            <td>38.9 [2]</td>
            <td>37.2 [1]</td>
			<td>130.5 [3]</td>            		
         </tr>		 
     </tbody>
 	  </table>
 	</table-wrap>
  </floats-wrap>
</article>
