 

Research Article

Open Access
Computational Annotation for Hypothetical Proteins of Mycobacterium Tuberculosis
S. Anandakumar and P. Shanmughavel1*
1Computational Biology and Bioinformatics Laboratory, Department of Bioinformatics, Bharathiar University,
Coimbatore – 641046, Tamil Nadu, India
*Corresponding author: Dr. P. Shanmughavel,
Email: shanvel_99@yahoo.com
Received August 28, 2008; Accepted November 10, 2008; Published December 26, 2008
Citation: Anandakumar S, Shanmughavel P (2008) Computational Annotation for Hypothetical Proteins of Mycobacterium Tuberculosis. J Comput Sci Syst Biol 1: 050-062. doi:10.4172/jcsb.1000004
 
Copyright: © 2008 Anandakumar S, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
 
Abstract
Deaths from tuberculosis are rising worldwide. The recent sequencing of the Mycobacterium tuberculosis genome holds promise for the development of new vaccines and the design of new drugs. In this context, predicting the functions of hypothetical proteins from their genomic sequences will improve our knowledge and support the identification of new drugs for tuberculosis. Various function prediction methods are available, each resting on its own assumptions. At present, the annotation of genes in newly sequenced genomes is based mainly on sequence similarity. In this work, the functions of about 250 hypothetical proteins of Mycobacterium tuberculosis were predicted using the bioinformatics web tools BLAST, INTERPROSCAN, PFAM and COGs.

Keywords
Tuberculosis; Hypothetical proteins; Sequence similarity; Bioinformatics web tools

Introduction
Current research on the sequenced Mycobacterium tuberculosis genome holds promise for the development of new vaccines and the design of new drugs (Prachee Chakhaiyar and Hasnain, 2004). The functions of hypothetical proteins are unknown because a hypothetical protein is one whose existence has been predicted but whose function has not been established (Edward Eisenstein et al., 2000). An in-depth study of function prediction for such proteins will open opportunities for novel applications and help researchers identify new drug molecules for tuberculosis. Mycobacterium tuberculosis has a total of 3887 proteins, of which 1985 are hypothetical; 250 of these hypothetical proteins were taken for this work. All 250 were analyzed for function prediction using the bioinformatics web tools BLAST, INTERPROSCAN, PFAM and COGs. The results indicate 100% confidence for 84 proteins and 75% confidence for 92 proteins; for the remaining proteins the function could be predicted only with lower confidence or not at all (unknown function).

Methodology
The complete genome sequence of the pathogenic bacterium Mycobacterium tuberculosis was downloaded from the PIR Database (http://pir.georgetown.edu/) and the NCBI Database (www.ncbi.nlm.nih.gov/). The complete genome contains 1985 hypothetical proteins. Of these, 250 hypothetical proteins were selected for analysis and their sequences were downloaded from http://www.ncbi.nih.gov/genomes/lproks.cgi. Finally, the sequence of each protein was submitted to the function prediction web tools NCBI-BLAST2 (Wendy Baker et al., 2000), INTERPROSCAN (Zdobnov and Rolf Apweiler, 2001), PFAM (Bateman et al., 2002) and COG (Roman et al., 2000). The confidence level of each predicted function was assigned on the basis of the agreement among these four tools.

Table 1: Functional genomics of Mycobacterium tuberculosis.

Table 2: Percentage of similarity.

(Of the 250 proteins, 84 were at the 100% confidence level, 92 at 75%, 56 at 50%, 12 at 25% and 6 at 0%.)

1 If all four tools indicate the same function, the confidence level is 100 percent.
2 If three tools indicate the same function and the fourth indicates a different function, the confidence level is 75 percent.
3 If two tools indicate the same function and the other two indicate different functions, the confidence level is 50 percent.
4 If all four tools indicate different functions, the confidence level is 25 percent.
5 If none of the tools indicates any function, the confidence level is 0 percent.
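
The five rules above can be read as a scoring function over the four tool outputs. The following is a minimal sketch, not the authors' implementation: the function name confidence_level, the dictionary layout, and the label normalisation (lower-casing and trimming) are assumptions made for illustration. It also assumes that a tool returning no function is simply left out of the agreement count, a case the rules do not spell out.

```python
from collections import Counter
from typing import Mapping, Optional

# The four tools named in the Methodology section.
TOOLS = ("BLAST", "INTERPROSCAN", "PFAM", "COG")


def confidence_level(predictions: Mapping[str, Optional[str]]) -> int:
    """Return 0, 25, 50, 75 or 100 from the largest group of agreeing tools.

    `predictions` maps a tool name to its predicted function label
    (hypothetical layout); a missing, None or blank entry means the tool
    indicated no function.
    """
    # Normalise labels so that case and stray spaces do not hide agreement.
    labels = [
        predictions[tool].strip().lower()
        for tool in TOOLS
        if predictions.get(tool) and predictions[tool].strip()
    ]
    if not labels:
        return 0  # rule 5: no tool indicates any function
    # Size of the largest set of tools that report the same function.
    largest_group = Counter(labels).most_common(1)[0][1]
    return 25 * largest_group  # rules 1-4: 4 -> 100, 3 -> 75, 2 -> 50, 1 -> 25


example = {
    "BLAST": "hydrolase",
    "INTERPROSCAN": "hydrolase",
    "PFAM": "Hydrolase",
    "COG": "luciferase",
}
print(confidence_level(example))  # 75
```

In practice the function names reported by different tools rarely match as plain strings, so a real pipeline would need some mapping of tool-specific annotations onto a shared vocabulary before this comparison; that step is outside the sketch.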

Results and Discussion
Deaths from tuberculosis are rising worldwide (Smith et al., 2004). Determining protein function from genomic sequence, and thereby supporting the development of personalized medicine, is recognized as a central goal of bioinformatics. Functional annotation of hypothetical proteins is of major importance in providing insight into their molecular functions and will help in the identification of new drugs for tuberculosis. Table 1 shows the functional genomics of Mycobacterium tuberculosis obtained with the tools BLAST, INTERPROSCAN, PFAM and COG. Mycobacterium tuberculosis has a total of 3887 proteins, of which 1985 are hypothetical; 250 of these hypothetical proteins were retrieved for this study. These proteins were submitted to the above tools, whose agreement determines the confidence level. Of the 250 proteins, a function was obtained for 244; dehydrogenases/reductases, hydrolases, luciferases and methyltransferases were the most numerous.
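
As a cross-check, the per-level counts reported with Table 2 can be tallied against the 250 proteins analyzed and the 244 proteins for which a function was obtained. The short sketch below uses only those reported counts; the variable names are illustrative.

```python
# Number of proteins at each confidence level, as reported with Table 2.
counts = {100: 84, 75: 92, 50: 56, 25: 12, 0: 6}

total = sum(counts.values())    # all proteins analyzed
annotated = total - counts[0]   # proteins for which some function was indicated

# Share of the analyzed proteins at each confidence level, in percent.
shares = {level: round(100 * n / total, 1) for level, n in counts.items()}

for level, n in counts.items():
    print(f"{level:>3}% confidence: {n:>3} proteins ({shares[level]}% of total)")
print(f"total = {total}, annotated = {annotated}")
```

The tally gives 250 proteins in total and 244 with a predicted function, which agrees with the figures in the text.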

References
  1. Bateman A, et al. (2002) The Pfam protein families database. Nucleic Acids Research 30: 276-280.

  2. Eisenstein E, et al. (2000) Biological function made crystal clear: annotation of hypothetical proteins via structural genomics. Current Opinion in Biotechnology 11: 25-30.

  3. Chakhaiyar P, Hasnain SE (2004) Defining the Mandate of Tuberculosis Research in a Postgenomic Era. Medical Principles and Practice 13: 177-184.

  4. Roman L, et al. (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Research 28: 33-36.

  5. Smith CV, et al. (2004) TB drug discovery: addressing issues of persistence and resistance. Tuberculosis 84: 45-55.

  6. Baker W, et al. (2000) The EMBL Nucleotide Sequence Database. Nucleic Acids Research 28: 19-23.

  7. Zdobnov EM, Apweiler R (2001) InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17: 847-848.

  8. Pellegrini M, et al. (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA 96: 4285-4288.