Abstract
Whole genome sequences of four strains of Streptococcus pneumoniae, the encapsulated TIGR4, D39 and G54 and the
nonencapsulated R6, are compared with respect to genome features, whole genome pairwise alignment, gene role
category, and virulence factors using relevant comparative genomics tools. The study of capsular polysaccharide
synthesizing genes reveals that many cps genes are unique to TIGR4, which is consistent with the high virulence
of TIGR4. Further, the other virulence factors pneumococcal surface protein A, autolysin, hyaluronate lyase,
pneumolysin, neuraminidase B, and pneumococcal surface antigen A of TIGR4 are closely related to those of the
other three strains, and hence the virulence due to these factors seems to be similar among the four strains.
In contrast, neuraminidase A, choline binding protein A and immunoglobulin A1 protease of TIGR4 differ from those
of the other strains. In the present study, 4 and 22 hypothetical protein sequences of TIGR4 and R6,
respectively, are also predicted as virulence factors. Among those sequences, 8 hypothetical protein sequences
of R6, with 7 different functional regions, are found to be related to previously known virulence factors of
TIGR4 and R6 of S. pneumoniae.
Keywords
Comparative genomics; Streptococcus pneumoniae; TIGR4; D39; G54; R6; virulence factors; hypothetical
protein sequences
Abbreviations
CMR, comprehensive microbial resource; cps, capsular polysaccharide; PspA, pneumococcal surface
protein A; LytA, autolysin; Hyl, hyaluronate lyase; Ply, pneumolysin; NanA and NanB, neuraminidases A and B; CbpA,
choline binding protein A; PsaA, pneumococcal surface antigen A; IgA1, immunoglobulin A1 protease.
Introduction
The whole genome sequences of closely related bacterial species or strains are providing new avenues of
investigation for the further understanding of microbial diversity, pathogenesis, host-parasite interaction
and evolution through comparative analysis of their genomes. Streptococcus pneumoniae, commonly called
pneumococcus (Dowson, 2004; Gregory and DeSalle, 2005), is a human pathogen that causes life threatening
diseases such as pneumonia, bacteremia, meningitis, sepsis, and otitis media. Genome sequencing of four
S. pneumoniae strains, namely TIGR4, D39, G54 and R6, has been completed, and genome sequencing of 14 other
strains is ongoing. The G54 genome sequence has not yet been added to GenBank but is included in the
Comprehensive Microbial Resource (CMR), whereas the D39 genome sequence is available in GenBank but not in CMR.
TIGR4, a clinical isolate, is encapsulated and highly virulent, and many of its virulence factors have been
studied (Tettelin et al., 2001). D39, an encapsulated and virulent strain (Lanie et al., 2007), was used by
Avery, MacLeod, and McCarty (Avery et al., 1979) in their landmark study on the role of DNA as the genetic
material. G54 is an encapsulated clinical strain of type 19F (Dopazo et al., 2001). R6, a derivative of the
serotype 2 clinical isolate D39, is nonencapsulated and avirulent. The genes encoding many virulence factors
are present in the R6 genome in addition to the genes of capsular biosynthesis (Hoskins et al., 2001).
Many comparative studies (Tettelin et al., 2001; Lanie et al., 2007; Hoskins et al., 2001; AlonsoDeVelasco
et al., 1995; Brückner et al., 2004; Ferretti et al., 2004; Silva et al., 2006) have already been carried out
on various aspects of Streptococcus strains. A preliminary comparative analysis (Jothi et al., 2007) of the
whole genomes of the encapsulated TIGR4 and nonencapsulated R6 strains of S. pneumoniae provided some insights
into the high virulence of TIGR4. The present study examines how the whole genomes of the four strains TIGR4,
D39, G54 and R6 of S. pneumoniae differ from each other in their genome features, genome diversity, gene role
category and virulence factors. Comparison of the virulence factors among these strains can provide further
insight into strain-specific features relevant to virulence and can stimulate new approaches to disease
prevention and treatment.
S. pneumoniae has two surface layers outside the plasma membrane, namely the cell wall and the capsule. The
cell wall has a triple-layered peptidoglycan that holds the capsular and cell wall polysaccharides and also a
few proteins. The capsule completely covers the inner structure of S. pneumoniae. The cell wall polysaccharide
is common to all serotypes of S. pneumoniae, but the chemical structure of the capsular polysaccharide is
serotype-specific (AlonsoDeVelasco et al., 1995). Since Avery’s experiment (Avery et al., 1979), the capsule
has been recognized as the major virulence factor of S. pneumoniae. Experimental proof for this was provided
by the difference in 50% lethal dose between encapsulated and nonencapsulated strains: encapsulated strains
were found (AlonsoDeVelasco et al., 1995) to be at least 10^5 times more virulent than strains lacking the
capsule. Certain proteins in S. pneumoniae, such as pneumococcal surface protein A (PspA), autolysin (LytA),
hyaluronate lyase (Hyl), pneumolysin (Ply), neuraminidases A and B (NanA and NanB), choline binding protein A
(CbpA), pneumococcal surface antigen A (PsaA) and immunoglobulin A1 (IgA1) protease, are important virulence
factors (AlonsoDeVelasco et al., 1995; Jedrzejas, 2001; Rigden et al., 2003) and could be used as potential
vaccine candidates. The preliminary identification of the surface proteins and virulence factors of
S. pneumoniae was done by computational analysis of its genome sequences (Tettelin and Hollingshead, 2004;
Gregory and DeSalle, 2005; Tettelin et al., 2001; Hoskins et al., 2001) and continued in several subsequent
studies (Brückner et al., 2004; Polissi et al., 1998; Wizemann et al., 2001). Strains of S. pneumoniae are now
resistant to commonly prescribed antibiotics such as penicillin, macrolides and fluoroquinolones (Tettelin et
al., 2001). Because of this multidrug resistance, a deeper understanding of the virulence factors is needed,
and the comparative genomics approach may provide more insight.
At present, only 70% of the genes in any given genome can be predicted with reasonable confidence (Bork, 2000).
The remaining genes are either hypothetical (without any known homolog) or conserved hypothetical (homologous
to genes of unknown function), and it is unclear whether they encode actual proteins. The large number of
hypothetical protein sequences in completely sequenced genomes makes their study an enormous task.
Characterization of these genes or proteins of unknown function is generally recognized as an essential step
towards fully understanding the biology of the pathogenic organism and identifying potential targets. A few
studies (Galperin and Koonin, 2004; Brown, 2005; Sivashankari and Shanmughavel, 2006) have already been carried
out on hypothetical sequences. In the present study, hypothetical protein sequences of the strains TIGR4 and R6
of S. pneumoniae are analyzed with VirulentPred to predict whether they are virulence factors. The predicted
sequences are then examined for their relationship to previously known virulence factors of TIGR4 and R6 of
S. pneumoniae.
Materials and Methods
Various analyses of the whole genomes of the four strains TIGR4, D39, G54 and R6 of S. pneumoniae, such as
whole genome alignment, comparison of gene role categories, location of the virulence factors in the genome
and comparison of virulence regions, are carried out using the appropriate bioinformatics software tools.
Sequence Retrieval and Whole Genome Pairwise
Alignment
The complete genome sequences and the lists of annotated gene and protein sequences of TIGR4, D39 and R6
are retrieved from the NCBI FTP server (ftp://ftp.ncbi.nih.gov/genomes). The run-mummer3 program of the
standalone MUMmer 3.20 (http://mummer.sourceforge.net/) and its built-in mummerplot are used to obtain the
whole genome pairwise alignments of S. pneumoniae strains TIGR4, D39 and R6 in different combinations.
MUMmer at the Comprehensive Microbial Resource (CMR) is used for the whole genome pairwise alignments of
the strains TIGR4, G54 and R6 in different combinations.
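The idea behind such an alignment plot can be illustrated with a small sketch. The code below is not MUMmer and does not reproduce its algorithm; it is a toy stand-in, assuming that exact shared k-mers are an acceptable substitute for maximal unique matches on short sequences. The function names, the value of k and the example sequence are hypothetical and chosen only for illustration. Points that fall on the main diagonal correspond to segments found at the same position in both sequences.

```python
from collections import defaultdict


def shared_kmer_points(ref, qry, k=8):
    """Return (ref_pos, qry_pos) pairs where both sequences contain the same k-mer."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    points = []
    for j in range(len(qry) - k + 1):
        for i in index.get(qry[j:j + k], ()):
            points.append((i, j))
    return points


def diagonal_fraction(points, tolerance=0):
    """Fraction of points whose two coordinates differ by at most `tolerance`."""
    if not points:
        return 0.0
    on_diagonal = sum(1 for i, j in points if abs(i - j) <= tolerance)
    return on_diagonal / len(points)


# Hypothetical toy sequence (not taken from any S. pneumoniae genome).
ref = "ATGGCGTACGTTAGCCGATAGGCTTAACGGATCCGTA"
same_order = ref                      # identical order: all points on the diagonal
rearranged = ref[18:] + ref[:18]      # two blocks swapped: points shift off the diagonal

frac_same = diagonal_fraction(shared_kmer_points(ref, same_order))
frac_rearranged = diagonal_fraction(shared_kmer_points(ref, rearranged))
print(frac_same, frac_rearranged)
```

In a real genome comparison the same reading applies at a much larger scale: a pair of strains with conserved gene order gives points concentrated along the diagonal, while rearrangements move blocks of points away from it.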
Comparison of the Role Category of Genes and Sequence
Analysis
The role category pie chart tool in the CMR database (http://cmr.tigr.org/tigr-scripts/CMR/CmrHomePage.cgi)
is used for the comparison of genome features and functional role categories of the whole genomes of TIGR4,
G54 and R6. The Bacterial Annotation System (BASys, http://wishart.biology.ualberta.ca/basys), a web server
for automated bacterial genome annotation, is used to obtain the role categories for the three strains TIGR4,
D39 and R6, whose whole genomes are already available in it. The Genome Atlas from the prediction server of
the Center for Biological Sequence Analysis (CBS, http://www.cbs.dtu.dk/services) is used for the analysis of
repeats of S. pneumoniae. The sequences of the virulence factors taken for our study have been verified using
the virulence factors database (http://www.mgc.ac.cn/VFs). BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html)
is used to compute the sequence composition of the genomes and genes. Further, LALIGN
(http://www.ch.embnet.org/software/LALIGN_form.html) is used for the pairwise global alignment of the gene
sequences of the strains of S. pneumoniae.
Functional Annotation of Hypothetical Sequences
VirulentPred (http://bioinfo.icgeb.res.in/virulent) is an SVM (Support Vector Machine) based method for
predicting bacterial virulent protein sequences, which can be used to screen virulent proteins in proteomes.
In the present study, this tool is used to analyse the hypothetical sequences of the strains TIGR4 and R6 of
S. pneumoniae. From the proteomes of TIGR4 and R6 of S. pneumoniae, all unannotated hypothetical protein
sequences are retrieved using a PERL script, and those sequences are used as the data set for virulence
factor prediction.
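The retrieval step can be sketched as follows. The original PERL script is not given, so this Python sketch is only an assumed equivalent: it reads protein records in FASTA format and keeps those whose header contains the phrase "hypothetical protein". The matching rule, the function names and the example records are assumptions made for illustration; note that this rule also retains "conserved hypothetical protein" entries.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_hypothetical(records):
    """Keep records whose header mentions 'hypothetical protein' (case-insensitive)."""
    return [(h, s) for h, s in records if "hypothetical protein" in h.lower()]


# Hypothetical example records; identifiers and sequences are invented.
example = """>example_1 autolysin
MEINVSKLRT
DLPQ
>example_2 hypothetical protein
MKKLLT
>example_3 conserved hypothetical protein
MSTAV
"""

hypothetical = select_hypothetical(read_fasta(example))
for header, seq in hypothetical:
    print(header, len(seq))
```

The selected sequences would then be submitted to the prediction tool; that submission step is not shown here.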
Results and Discussion
Comparative genomics and in silico studies have begun
to reveal insights into gene and protein functions of many
organisms. Here, we compare the genomes of the strains
TIGR4, D39, G54 and R6 of S. pneumoniae using the appropriate
tools for whole genome comparison and the results
are discussed below.
Comparison of the Genome Features of Four Strains
of S. pneumoniae
Table 1 summarizes the general information about the genomes of these four strains, including gene
statistics, obtained and compiled from the CMR and NCBI web servers. The genome sizes of the four strains
range between 2 Mb and 2.16 Mb (cf. Sl. No. 2 of Table 1). Based on genome size, D39 is the smallest and
TIGR4 is the largest of the four strains. The nucleotide base (A, T, G, C, AT and GC) compositions show that
the strains have low GC (~40%) genomes. The number of protein-coding genes in the four strains ranges between
1914 and 2234 (cf. Sl. No. 3 of Table 1). Of the total base pairs (bps) of the four genomes, approximately
85-87% are involved in coding and the remainder are non-coding or junk DNA. The number of genes involved in
RNA synthesis (structural RNA, tRNA, and rRNA) is more or less similar in all strains. Finally, a comparison
of the global and local repeats of TIGR4 and R6 using the CBS web server shows that both types of repeats are
higher in TIGR4 than in R6 (cf. Sl. No. 4 of Table 1), and this may be related to the duplicated regions of
the chromosome (Gregory and DeSalle, 2005).
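The base composition values discussed above are simple percentages of each nucleotide over the sequence length. A minimal sketch of that calculation is given below; it is an assumed equivalent of what a sequence composition tool reports, not BioEdit's own code, and the function name and example sequence are hypothetical. Characters other than A, T, G and C are ignored.

```python
def base_composition(seq):
    """Return percentages of A, T, G, C, AT and GC over the unambiguous bases."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ATGC"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    pct = {base: 100.0 * n / total for base, n in counts.items()}
    pct["AT"] = pct["A"] + pct["T"]
    pct["GC"] = pct["G"] + pct["C"]
    return pct


# Hypothetical short sequence used only to exercise the function.
composition = base_composition("ATGCATATAT")
print(composition["GC"], composition["AT"])
```

Applied to a whole genome sequence, the "GC" entry gives the kind of value referred to above as a low GC (~40%) genome.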
Comparison of Whole Genome Pairwise Alignments
The whole genome pairwise alignments of the strains TIGR4, D39 and R6 of S. pneumoniae (whose sequence data
are available at NCBI) are obtained using the standalone version of MUMmer, and the results are plotted using
its built-in mummerplot. The whole genome pairwise alignments of the strains TIGR4, G54 and R6 are obtained
using CMR, where these sequences are available. The five possible alignments are shown in Figure 1(a) – (e).
Generally, the genomes of prokaryotes are very dynamic, with insertions, deletions, inversions, and
translocations being commonly observed among related species or even between different strains of the same
species (Gregory and DeSalle, 2005; Hughes, 2000). The net result is that the particular complement of genes
and their order along the chromosome are not typically conserved over evolutionary time. In some cases, genes
that are grouped into operons in one species may be dispersed throughout the genome in others. We find
similar results in our analysis of the genomes of the four strains of S. pneumoniae. In particular, the gene
order is stable in the genome pairs TIGR4 vs. D39 and TIGR4 vs. R6, as shown by the fact that most of the
points lie along the diagonal in Figures 1a and 1b. These results indicate that the stability of gene order
of D39 vs. R6 must also be relatively high, as is shown in Figure 1c. This also confirms the fact that R6 is
the derivative of D39. The whole genome pairwise alignments of TIGR4 vs. G54 and of R6 vs. G54, shown in
Figures 1d and 1e, respectively, do not show such a high degree of stability of gene order as the above
alignments involving D39.
Table 1: Comparison of the genome features of the strains, encapsulated TIGR4, D39 and G54 and nonencapsulated
R6, of S. pneumoniae using CMR, BioEdit and CBS tools. NA: Not Available.
Figure 1: Whole genome alignment of a) TIGR4 vs. D39; b) TIGR4 vs. R6; c) D39 vs. R6 using stand-alone MUMmer;
whole genome alignment of d) TIGR4 vs. G54 and e) R6 vs. G54 using the built-in MUMmer of CMR, which show
plasticity and stability in gene order between two strains.
Many of the gene and protein sequences of these strains are approximately the same, which is not surprising
as all the strains occupy the same niche in the human respiratory system. The small differences might have
arisen after the divergence of these strains from other evolutionary lineages, as adaptations to their host.
Such differences increase greatly in pathogens and appear to be associated with the ability to infect
eukaryotes, perhaps reflecting a mechanism for evading host immune defenses; the unique genes may be located
in a plasticity zone.
Since the G54 genome sequence is not available at the NCBI web server and the D39 genome is not available at
the CMR server, we could not obtain the whole genome alignment for D39 vs. G54. However, we are able to
predict the whole genome pairwise alignment of D39 vs. G54 from the earlier results. As Figures 1d and 1e are
similar, and R6 is the derivative of D39, the alignment of D39 vs. G54 is expected to have a similar structure.
This prediction may be confirmed if the whole genome sequence of G54 is made available in NCBI or the genome
sequence of D39 is included in CMR.
Comparison of Capsular Polysaccharide Synthesizing
Genes
We have compared the capsular polysaccharide (cps) synthesizing genes of the strains TIGR4, D39, G54 and R6
of S. pneumoniae, and the results are shown in Table 2. There are 15 different cps genes in TIGR4, 7 in D39,
9 in G54 and only one in R6. Their gene IDs, G+C percentage, protein length, gene length and gene coordinates
are shown in Table 2. On comparison, 5 cps genes of TIGR4 (gi|15900275-cps4A, gi|15900276-cps4B,
gi|15900278-cps4D, gi|15900046-cps-ptv and gi|15901666-cps-ptv) are found to be related to those of D39
(gi|116516963-cps2A, gi|116516159-cpsB, gi|116517023-cps2D, gi|116517199-cps and gi|116516120-cps-ptv). All
the cps genes of D39 are present in TIGR4 except gi|116516773-cps2E and gi|116516341-cps-ptv.
Between TIGR4 and G54, 6 cps genes are related (gi|15900275-cps4A, gi|15900276-cps4B, gi|15900277-cps4C,
gi|15900278-cps4D, gi|15900046-cps-ptv and gi|15901666-cps-ptv of TIGR4 with NT05SP0190-cps4A,
NT05SP0191-cps4B, NT05SP0192-cps4C, NT05SP0193-cps4D, NT05SP2185-cps9E and NT05SP1650-cps7G of G54).
Likewise, between D39 and G54, 5 cps genes are related (gi|116516963-cps2A, gi|116516159-cpsB,
gi|116517023-cps2D, gi|116517199-cps and gi|116516120-cps-ptv of D39 with NT05SP0190-cps4A, NT05SP0191-cps4B,
NT05SP0192-cps4C, NT05SP2185-cps9E and NT05SP1650-cps7G of G54), but gi|116516773-cps2E and
gi|116516341-cps-ptv of D39 are not present in G54. Similarly, it is interesting to note that the only cps
gene of R6 (gi|15902136-capD) has 99.8% identity with the gene gi|15900046-cps-ptv of TIGR4, 100% identity
with the gene gi|116517199-cps of D39 and 99.5% identity with the gene NT05SP2185 of G54. All the above
results support Avery’s statement (Avery et al., 1979) that the capsule is responsible for pathogenicity.
From a similar analysis, we have also noted that the genes gi|15900279-cps4E, gi|15900280-cps4F,
gi|15900281-cps4G, gi|15900282-cps4H, gi|15900286-cps4I, gi|15900287-cps4J, gi|15900288-cps4K,
gi|15900289-cps4L and gi|15900788-cps-ptv are unique to TIGR4. Similarly, the genes gi|116516773-cps2E and
gi|116516341-cps-ptv are unique to the D39 strain, and the genes NT05SP0198, NT05SP0202 and NT05SP1909 are
unique to the strain G54. In R6, however, the only cps gene, gi|15902136-capD, is common to all the other
strains (Table 2). The larger number of cps genes in TIGR4 than in the other strains is consistent with the
high virulence of TIGR4. The results further suggest that virulence is lower in the D39 and G54 strains, and
much lower in R6, compared to TIGR4.
Although not all the cps genes of TIGR4 are present in the D39, G54 and R6 strains, these strains are also
pathogenic. Therefore, to identify the virulence factors other than the cps genes, we consider the other
genes of the strains from the gene role category aspect.
Comparison of the Role Category of Genes
The role categories of the genes of the different strains are compared using two different tools, based on
the availability of genome sequences: i. the CMR role category pie chart for TIGR4, G54 and R6 (Table 3) and
ii. the Bacterial Annotation System (BASys) for the strains TIGR4, D39 and R6. The genes responsible for the
biosynthesis of various proteins (Sl. Nos. 1-9 of Table 3) in TIGR4 are nearly the same as in G54 and R6,
which suggests that they represent the basic complement of proteins required for certain cellular processes.
But the genes responsible for the biosynthesis of some other proteins (Sl. Nos. 10-23 of Table 3) in TIGR4
are notably different from those of G54 and R6. This suggests that these proteins are important for strain
uniqueness and may be involved in variations in pathogenesis among the strains of S. pneumoniae. The
percentage value given for a particular role category in Table 3 is specific to the genes assigned to that
category only and does not represent the overall gene percentage. For example, autolysin (SP1937) of TIGR4 is
assigned to two role categories, cell envelope and cellular processes (Sl. Nos. 11 and 12 of Table 3), and the
percentage given is specific to the respective categories.
Table 2: Comparison of capsular polysaccharide (cps) synthesizing genes of four strains of S. pneumoniae. Each
cps is compared with all cps sequences of the other three strains using LALIGN; all the cps sequences
considered fall under Role Category 11 (Cell Envelope) of CMR.
Table 3: Distribution of genes in the whole genomes of the TIGR4, G54 and R6 strains of S. pneumoniae based on
their gene role category. These gene role category data are retrieved and compiled from CMR using its Gene
Role Category Pie-chart.
Table 4: Details of the number of hypothetical sequences in whole genomes, unique genes and virulence factors
in unique genes of the strains TIGR4, G54 and R6 of S. pneumoniae. (D39 data are not included because the
genome sequence information of the D39 strain is not available in the CMR tool.)
The numbers of genes responsible for pathogenesis in the strains TIGR4, G54 and R6, counted manually from
the CMR gene role categories (sub role categories pathogenesis, toxin production and resistance), are found
to be 101 (4.52%), 47 (2.30%) and 42 (1.89%), respectively (Sl. No. 19 of Table 3). TIGR4 thus has many
pathogenic factors and is highly virulent, whereas G54 and R6 each have fewer than half as many pathogenic
factors as TIGR4 (47 and 42 versus 101).
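The comparison of pathogenesis-related gene counts can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted above (101, 47 and 42); the variable names are chosen for illustration. It expresses the G54 and R6 counts as a percentage of the TIGR4 count, which gives about 46.5% and 41.6%.

```python
# Pathogenesis-related gene counts quoted in the text (Sl. No. 19 of Table 3).
pathogenesis_genes = {"TIGR4": 101, "G54": 47, "R6": 42}

# Express each count as a percentage of the TIGR4 count.
reference = pathogenesis_genes["TIGR4"]
relative_to_tigr4 = {
    strain: 100.0 * count / reference
    for strain, count in pathogenesis_genes.items()
}

for strain, pct in relative_to_tigr4.items():
    print(f"{strain}: {pct:.1f}% of the TIGR4 count")
```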
Mobile and extrachromosomal elements comprise a significant fraction of the genome, with 134 genes (5.99%)
in TIGR4, 71 genes (3.46%) in G54 and 86 genes (3.87%) in R6 (Sl. No. 18 of Table 3). Transposons generally
carry genes for antibiotic resistance (Gregory and DeSalle, 2005); our results therefore suggest that
antibiotic resistance may be relatively higher in TIGR4 than in the strains G54 and R6.
From the results of the comparative study on TIGR4, D39 and R6 using the BASys server, we find that most of
the values are more or less similar. However, there is a higher percentage for unknown functions in the
strains TIGR4, D39 and G54, which indicates that the reason for the differences may also be hidden in the
unknown genes or proteins (data not shown).
From Table 3, the numbers of hypothetical, conserved hypothetical, unclassified and unknown genes in the
whole genomes of the strains TIGR4, G54 and R6 are noted and shown in Table 4. Nearly 37-42% of the genes are
of unknown type, which shows that these sequences have yet to be annotated and assigned functions; some of
them may be responsible for virulence. Using the multi-genome homology comparison tool available at CMR, the
numbers of unique genes in TIGR4, G54 and R6 are found to be 288, 104 and 78, respectively (Table 4). The
unique genes of the strains TIGR4, G54 and R6 themselves include many hypothetical, conserved hypothetical,
unknown and unclassified sequences, whose percentage ranges from 65 to 74; other possible differences among
the strains may therefore be revealed by studying these gene sequences. As far as the virulence factors are
concerned, the unique genes of the strain TIGR4 include 3 capsular polysaccharide biosynthesis proteins
(Sp_0351 (cps4F), Sp_0352 (cps4G) and Sp_0359 (cps4K)), 4 cell wall surface anchor family proteins (Sp_0462,
Sp_0463, Sp_0464 and Sp_1772), a PspC protein (Sp_1417), a NanA protein (SP_1693) and an IgA1 protease
(SP_2155). R6 has three proteins of the type 2 capsule locus (Spr0315, Spr0317 and Spr0319) among its unique
genes, whereas the strain G54 has no such virulence factors among its unique genes (Table 4). This result is
consistent with the high virulence of TIGR4 and also suggests that those virulence factors are specific to
TIGR4 and R6. These differences might have arisen through species-specific adaptation to the host,
particularly for the sake of defense mechanisms.
Comparison of Virulence Factors Other than Capsular
Polysaccharide Synthesizing Genes
In S. pneumoniae, surface and cytoplasmic proteins such as pneumococcal surface protein A (PspA), autolysin
(LytA), hyaluronate lyase (Hyl), pneumolysin (Ply), two neuraminidases (NanA and NanB), choline binding
protein A (CbpA), pneumococcal surface antigen A (PsaA) and immunoglobulin A1 (IgA1) protease have already
been described as virulence factors (Jedrzejas, 2001; Rigden et al., 2003). The comparative results for these
sequences, obtained from CMR, are given in Table 5, which provides more insight into the virulence factors of
the strains TIGR4, D39, G54 and R6 of S. pneumoniae.
The virulence factors of TIGR4 are taken as reference and compared with the related sequences of the strains
D39, G54 and R6. Likewise, the virulence factors of D39 are taken as reference and compared with the related
sequences of the strains G54 and R6, and the virulence factors of G54 are taken as reference and compared with
the related sequences of the remaining strain R6. All comparisons use the pairwise sequence alignment tool
LALIGN with default parameters (Alignment: Global; Scoring matrix: BLOSUM50; Gap opening penalty: -14;
Extension penalty: -4), and the results are shown together in Table 5.
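How a percent identity is derived from a global pairwise alignment can be sketched as follows. This is not LALIGN and does not use its scoring: the sketch assumes a simple match/mismatch score with a linear gap penalty in place of BLOSUM50 with separate gap opening and extension penalties, and it assumes that identity is reported as identical positions divided by the number of alignment columns. The function name, scoring values and example sequences are hypothetical.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; return percent identity over alignment columns."""
    n, m = len(a), len(b)
    if n == 0 and m == 0:
        return 0.0

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back one optimal path, counting identical positions and columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns


# Hypothetical short peptides used only to exercise the function.
print(global_identity("MKTAYIAK", "MKTAYIAK"))
print(global_identity("MKTAYIAK", "MKTAHIAK"))
```

With a substitution matrix and affine gap penalties the alignment, and therefore the identity value, can differ from what this simplified scoring gives, so the numbers in Table 5 should not be expected to match this sketch exactly.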
PspA is located in the cell wall of pneumococci and is present in all S. pneumoniae strains (Jedrzejas,
2001). PspA of TIGR4 has ~53-63% identity with PspA of D39, G54 and R6 (Table 5). When PspA is compared in
D39 vs. G54 and G54 vs. R6, the identities between those strains are nearly 63%. These results indicate that
nearly 50-60% of the virulence nature of PspA of TIGR4 exists in the other strains D39, G54 and R6. It is
interesting to note that there is 100% identity between the PspA sequences of D39 and R6, so the virulence
nature of PspA in these two strains is exactly the same.
For LytA, Hyl, Ply, NanB and PsaA, all four strains of S. pneumoniae have above 90% identity, so the effect
of these five virulence factors is expected to be similar. This similarity is also reflected in the G+C
percentage, protein length and gene length, although the location of the genes in the genomes varies; the
similarities and differences can be seen in Table 5.
Table 5: Comparison of the common virulence factors namely, pneumococcal surface protein A (PspA), autolysin (LytA), hyaluronate lyase (Hyl), pneumolysin
(Ply), neuraminidase A (NanA), neuraminidase B (NanB), choline binding protein A (CbpA), pneumococcal surface antigen A (PsaA) and immunoglobulin A1 (IgA1)
protease of four strains of S. pneumoniae. LALIGN program is used to find identity between sequences.
NA - Not Available
* Nan-ptv: Neuraminidase, putative
**N.lyase-ptv: N-acetylneuraminate lyase, putative
*** Role category functions
1. Cell envelope; Cellular processes: pathogenesis
2. Mobile and extrachromosomal element functions: transposon functions
3. Cell envelope: biosynthesis and degradation of surface polysaccharides and lipopolysaccharides; Cellular processes: pathogenesis
4. Cell envelope: biosynthesis and degradation of murein sacculus and peptidoglycan
5. Cellular processes: pathogenesis
6. Cellular processes: toxin production and resistance; Cellular processes: pathogenesis
7. Unclassified: role category not yet assigned
8. Viral functions: general
9. Cell envelope; Cellular processes: pathogenesis; Cellular processes: cell adhesion
10. Cellular processes: toxin production and resistance; Fatty acid and phospholipid metabolism: degradation
11. Cell envelope: biosynthesis and degradation of surface polysaccharides and lipopolysaccharides
12. Unclassified: role category not yet assigned
13. Protein fate: degradation of proteins, peptides and glycopeptides
14. Transport and binding proteins: cations and iron carrying compounds; Cellular processes: pathogenesis; Cellular processes: cell adhesion
15. Protein fate: degradation of proteins, peptides and glycopeptides; Cellular processes: pathogenesis
16. Protein fate: degradation of proteins, peptides and glycopeptides
All strains have different neuraminidase sequences except G54 and R6 (~90% identity). CbpA and IgA1 protease
of the strain TIGR4 show high identities (~73% and ~87%, respectively) with those of D39 and R6, which are
exactly identical (100%) to each other, whereas the comparisons involving G54 show much lower identities
(~40% and ~35%). It seems that the virulence nature based on CbpA and IgA1 protease is similar among the
strains TIGR4, D39 and R6 and differs in G54.
From Table 5, it is interesting to note that all the virulence factors of D39 are very similar to those of
R6 (above 99% identity except for NanA), which confirms the fact that the avirulent strain R6 is the
derivative of the strain D39 (Lanie et al., 2007). Based on the role category, all TIGR4 virulence factors
come under pathogenesis related functions, which is also consistent with the high virulence of TIGR4.
Functional Annotation of Hypothetical Sequences
Relevant to the Virulence Factors
Prediction of virulence factors from the hypothetical sequences of S. pneumoniae has implications for the
identification and characterization of the virulence mechanism. In the present study, VirulentPred (Garg and
Gupta, 2008) predicted that 4 hypothetical sequences of TIGR4 and 22 of R6 are virulence factors. All these
sequences are listed in Table 6. The prediction is based on protein features such as amino acid composition,
di-peptide composition, similarity search, higher order di-peptide composition and PSSM, combined in the
cascaded SVM module of VirulentPred. However, similar predictions are not possible at present for D39 and
G54, as the sequence information of the latter is not fully available.
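Two of the features named above, amino acid composition and di-peptide composition, can be sketched as simple percentage vectors. The code below is not VirulentPred's implementation; it assumes the commonly used definitions (a 20-value vector of residue percentages and a 400-value vector of adjacent residue pair percentages). Function names and the example peptide are hypothetical, and residues outside the 20 standard amino acids are dropped.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Percentage of each of the 20 standard amino acids in the sequence."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    total = len(residues)
    return {
        aa: (100.0 * residues.count(aa) / total if total else 0.0)
        for aa in AMINO_ACIDS
    }


def dipeptide_composition(seq):
    """Percentage of each of the 400 possible adjacent residue pairs."""
    residues = "".join(c for c in seq.upper() if c in AMINO_ACIDS)
    counts = {"".join(pair): 0 for pair in product(AMINO_ACIDS, repeat=2)}
    total = len(residues) - 1
    for i in range(total):
        counts[residues[i:i + 2]] += 1
    return {
        pair: (100.0 * n / total if total > 0 else 0.0)
        for pair, n in counts.items()
    }


# Hypothetical short peptide used only to exercise the functions.
aac = amino_acid_composition("MKKL")
dpc = dipeptide_composition("MKKL")
print(aac["K"], dpc["KK"])
```

Feature vectors of this kind are the usual numeric input to an SVM classifier; the similarity search, PSSM and cascading steps mentioned above are not covered by this sketch.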
Among the 4 predicted virulence factors of TIGR4, only one sequence (gi|15901572) is also predicted in R6,
as the hypothetical protein gi|15903627, and its functional region is predicted as Plasmid_Txe (PF06769).
This family contains many hypothetical proteins and has no homolog among the other virulence factors
mentioned. In R6, it is interesting to note that among the 22 hypothetical protein sequences predicted as
virulence factors, 8 sequences (gi|15902372, gi|15903388, gi|15903446, gi|15902652, gi|15902781, gi|15903694,
gi|15903627 and gi|15903771) with 7 different functional regions are related to the already mentioned
virulence factors of the strains R6 and TIGR4. Those virulence factors are hyaluronidase, immunoglobulin A1
protease, capsular polysaccharide synthesis, pneumolysin, neuraminidase and choline binding protein. The
related sequences of TIGR4 and R6, except gi|15903771, are compared in Table 7.
Table 6: List of the 4 and 22 hypothetical protein sequences predicted as virulence factors from TIGR4 and R6,
respectively.
The hypothetical protein sequence gi|15903771 of R6 has 71 amino acids, and its functional region is
predicted as a putative cell wall binding repeat (residues 42-60) using Interproscan (ID - PF01473). The same
functional region is also found repeatedly in the known virulence factors pneumococcal surface protein A,
autolysin and choline binding proteins of the strains TIGR4 and R6. Since many domain regions have been
identified in these known virulence factors of TIGR4 and R6, the regions are not explicitly given here, but
they can easily be obtained using the tool Interproscan.
Conclusion
We have compared the virulence nature of the strains, encapsulated TIGR4, D39 and G54 and nonencapsulated
R6, of Streptococcus pneumoniae using comparative genomics tools. From the whole genome pairwise alignments,
we found that the stability of the gene order in the genome pairs TIGR4 vs. D39, TIGR4 vs. R6 and D39 vs. R6
is relatively higher than in the pairs TIGR4 vs. G54 and R6 vs. G54. We are able to predict the possible
structure of the whole genome pairwise alignment of D39 vs. G54 from the alignments of TIGR4 vs. G54 and
R6 vs. G54.
From the comparison of the capsular polysaccharide (cps) synthesizing genes, we found that the TIGR4 strain
has more cps genes than the other strains. Many cps genes are unique to TIGR4, only a few are unique to D39
and G54, and none is unique to R6, which is consistent with the high virulence of TIGR4. Further, the other
virulence factors pneumococcal surface protein A, autolysin, hyaluronate lyase, pneumolysin, neuraminidase B
and pneumococcal surface antigen A of TIGR4 are closely related to those of the other three strains, which
suggests that the virulence due to these factors is similar among the four strains. In contrast, the
virulence factors neuraminidase A, choline binding protein A and immunoglobulin A1 protease of TIGR4 differ
from those of the other strains of S. pneumoniae, which suggests that these factors are responsible for the
differences in virulence among the four strains.
Table 7: Comparison of the hypothetical protein sequences predicted as virulence factors with already known
virulence factors of TIGR4 and R6 of S. pneumoniae.
From the gene role category comparison, the many genes of TIGR4 that are nearly the same as in G54 and R6
suggest the basic complement of proteins required for certain cellular processes in the strains of
S. pneumoniae. The many genes of TIGR4 that are notably different from those of G54 and R6 suggest that the
corresponding proteins are important for strain uniqueness and may be involved in variations in pathogenesis.
Since many hypothetical, conserved hypothetical, unknown and unclassified proteins exist among the genes in
the dissimilar role categories, many of these genes of S. pneumoniae have yet to be annotated and assigned
functions, and some of them may also be responsible for virulence. Further, we have found that most of the
virulence factors are the same in D39 and R6, which confirms the fact that R6 is the derivative of the
strain D39.
In order to annotate the uncharacterized protein sequences
(hypothetical and conserved hypothetical), the present study
predicted 4 and 22 hypothetical sequences of the strains
TIGR4 and R6 of S. pneumoniae, respectively, to be virulence
factors. Among those predicted virulence factors, 1
and 8 different hypothetical sequences of TIGR4 and R6,
respectively, contain conserved sequences of known virulence
factors such as hyaluronidase, immunoglobulin A1
protease, capsular polysaccharide synthesis, pneumolysin,
neuraminidase and choline binding protein. These sequences
may also be considered desirable targets for therapeutics.
This effort aims to narrow down the search for virulence
factors among all hypothetical sequences, and the predictions
remain to be confirmed experimentally.
References
- AlonsoDeVelasco E, Verheul AF, Verhoef J, Snippe H (1995) Streptococcus pneumoniae: virulence factors, pathogenesis, and vaccines. Microbiol Rev 59: 591-603.
- Avery OT, MacLeod CM, McCarty M (1979) Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Inductions of transformation by a deoxyribonucleic acid fraction isolated from pneumococcus type III. J Exp Med 149: 297-326.
- Bork P (2000) Powers and pitfalls in sequence analysis: the 70% hurdle. Genome Res 10: 398-400.
- Brown TA Jr, Ahn SJ, Frank RN, Chen YY, et al. (2005) A hypothetical protein of Streptococcus mutans is critical for biofilm formation. Infect Immun 73: 3147-3151.
- Brückner R, Nuhn M, Reichmann P, Weber B, Hakenbeck R (2004) Mosaic genes and mosaic chromosomes - genomic variation in Streptococcus pneumoniae. Int J Med Microbiol 294: 157-168.
- Dopazo J, Mendoza A, Herrero J, Caldara F, Humbert Y, et al. (2001) Annotated draft genomic sequence from a Streptococcus pneumoniae type 19F clinical isolate. Microb Drug Resist 7: 99-125.
- Dowson CG (2004) What is a Pneumococcus? In Tuomanen et al. (eds) The Pneumococcus. ASM Press, Washington, pp 3-14.
- Ferretti JJ, Ajdic D, McShan WM (2004) Comparative genomics of streptococcal species. Indian J Med Res 119: 1-6.
- Galperin MY, Koonin EV (2004) Conserved hypothetical proteins: prioritization of targets for experimental study. Nucleic Acids Res 32: 5452-5463.
- Garg A, Gupta D (2008) VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens. BMC Bioinformatics 28: 9-62.
- Gregory TR, DeSalle R (2005) Comparative genomics in prokaryotes. In Gregory (ed.) The evolution of the genome. Elsevier/Academic Press, London, pp 585-660.
- Hoskins J, Alborn WE Jr, Arnold J, Blaszczak LC, et al. (2001) Genome of the bacterium Streptococcus pneumoniae strain R6. J Bacteriol 183: 5709-5717.
- Hughes D (2000) Evaluating genome dynamics: the constraints on rearrangements within bacterial genomes. Genome Biol 1: reviews 0006.1-0006.8.
- Jedrzejas MJ (2001) Pneumococcal virulence factors: structure and function. Microbiol Mol Biol Rev 65: 187-207.
- Jothi R, Manikandakumar K, Ganesan K, Parthasarathy S (2007) On the analysis of the virulence nature of TIGR4 and R6 strains of Streptococcus pneumoniae using genome comparison tools. J Chem Sci 119: 559-563.
- Lanie JA, Wai LNG, Kazmierczak KM, Andrzejewski TM, Davidsen TM, et al. (2007) Genome sequence of Avery’s virulent serotype 2 strain D39 of Streptococcus pneumoniae and comparison with that of unencapsulated laboratory strain R6. J Bacteriol 189: 38-51.
- Polissi A, Pontiggia A, Feger G, Altieri M, Mottl H, et al. (1998) Large-scale identification of virulence genes from Streptococcus pneumoniae. Infect Immun 66: 5620-5629.
- Rigden DJ, Galperin MY, Jedrzejas MJ (2003) Analysis of structure and function of putative surface-exposed proteins encoded in the Streptococcus pneumoniae genome: A Bioinformatics-based approach to vaccine and drug design. Crit Rev Biochem Mol Biol 38: 143-168.
- Silva NA, McCluskey J, Jefferies JM, Hinds J, Smith A, et al. (2006) Genomic diversity between strains of the same serotype and multilocus sequence type among pneumococcal clinical isolates. Infect Immun 74: 3513-3518.
- Sivashankari S, Shanmughavel P (2006) Functional annotation of hypothetical proteins – A review. Bioinformation 1: 335-338.
- Tettelin H, Hollingshead SK (2004) Comparative genomics of Streptococcus pneumoniae: Intrastrain diversity and genome plasticity. In Tuomanen et al. (eds) The Pneumococcus. ASM Press, Washington, pp 15-29.
- Tettelin H, Nelson KE, Paulsen IT, Eisen JA, Read TD, et al. (2001) Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science 293: 498-506.
- Wizemann TM, Heinrichs JH, Adamou JE, Erwin AL, Kunsch C, et al. (2001) Use of a whole genome approach to identify vaccine molecules affording protection against Streptococcus pneumoniae infection. Infect Immun 69: 1593-1598.