Research Article

Open Access
Matrix Frequency Analysis of Oryza Sativa (japonica cultivar-group) Complete Genomes
K. Manikandakumar1,*, S. Muthu Kumaran2, R. Srikumar3
1Department of Physics, Bharathidasan University College (W), Orathanadu – 614 625, Tanjavore District,   Tamil Nadu, India
2Department of Physics, Nehru Memorial College, Puthanampatti – 621 007, Tiruchirappalli District,
  Tamil Nadu, India
3Department of Microbiology, Bharathidasan University College (W), Orathanadu – 614 625,
  Tanjavore District, Tamil Nadu, India
*Corresponding author: Dr. K. Manikandakumar, Department of Physics,
Bharathidasan University College (W), Orathanadu – 614 625, Tanjavore District,
Tamil Nadu, India,
Phone : 9787383327
E-mail : bioinfokm@gmail.com
Received January 28, 2009; Accepted April 20, 2009; Published April 22, 2009
Citation: Manikandakumar K, Kumaran MS, Srikumar R (2009) Matrix Frequency Analysis of Oryza Sativa (japonica cultivar-group) Complete Genomes. J Comput Sci Syst Biol 2: 159-166. doi:10.4172/jcsb.1000027
 
Copyright: © 2009 Manikandakumar K, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
 
Abstract

Genome sequence information is essential for understanding the function of large arrays of genes. It is important to consolidate all sequence information in a well-defined database so that sequence similarity searches can be carried out efficiently. Complete genome analysis is one of the essential steps in characterizing a genome; here it is based on calculating the matrix frequency of sequence residues and on Chaos Game Representation (CGR) analysis. In this study, we selected rice as the specimen for complete genome analysis. Rice is one of the most essential cereal crops, providing food for more than half of the world’s population, and Oryza sativa (japonica cultivar-group) is an important cereal and model monocot. We generated a matrix frequency for genetic code analysis, which helps in the study of complete genome residues, and we report the doublet and triplet codon frequencies for the O. sativa chromosomes. We also illustrate a new method of Chaos Game Representation, which produces objects possessing self-similar structure. According to our findings, the average matrix frequency of the stop codons is similar to the matrix frequency of the start codon, and this appears to hold for the complete genome sequence of every Oryza sativa (japonica cultivar-group) chromosome.


Keywords
Oryza sativa (japonica cultivar-group); Chaos game representation; chromosome; matrix frequency; fractal structure

Introduction
DNA is an anti-parallel double helix built by concatenating nucleotide blocks. Several physicochemical properties of DNA depend on the interactions between consecutive bases; thus, the classification of patterns of nearest-neighbor bases could help in the description of nucleotide sequences (Mohanty A. K and A.V.S.S. Narayana Rao, 2000). Almeida J. S et al (2001) used the scale independence of the CGR of genetic sequences to investigate local and global homology. Two patterns have been identified from the analysis of whole genomes and the counts of the different dinucleotides: unequal frequencies of occurrence of some asymmetric pairs, and preferences of certain nucleotides for specific nearest neighbors over equivalent dinucleotides (Nussinov R, 1980, 1981).

Small plant chromosomes, such as those in rice, often show irregular condensation at mitotic prometaphase. Thus, the condensation pattern appearing at prometaphase served only as a morphological landmark for dividing the rice chromosomes into sub-regions. The characteristics of each rice chromosome with uneven condensation have been analyzed quantitatively using image analysis methods. Fukui K (1985) and Iijima K and Fukui K (1991) developed a method for identifying rice chromosomes based on a flow chart that consists of 11 discriminants, which classify specific chromosome groups. All rice chromosomes are identified and numbered by comparing the categories given by the discriminants, one after another. A chromosomal spread is worth analyzing if chromosomes 4, 11, and 12 are distinguishable by visual inspection and if chromosomes 1, 2, and 3 are completely recognized. If these six chromosomes are identified using discriminants 1 to 6 in order, then there is a great possibility of identifying all 12 chromosomes within the particular spread. The relevance of accessing the frequency of non-integer genomic sequences may not be apparent at first, given that all sequences are physically made of an integer number of nucleotides (Almeida J. S et al, 2001).

Genome sequence information is indispensable for understanding the function of the wide array of genes that constitute the rice plant. Therefore, it is important to consolidate all sequence information in a specified database to provide an efficient method of sequence similarity search that eliminates artifactual matches (Yoshiaki Nagamura et al, 2003). We have generated a matrix frequency for genetic code analysis from all available rice genome sequences for the Oryza sativa japonica cultivar-group. Li-Zhi Gao et al (2005) described DNA shuffling as a directed evolution process that generates genetic diversity through the recombination of parental sequences, and analysed it in order to evaluate which pair of sequences could potentially produce the best result. Oryza sativa (japonica cultivar-group), a subspecies of rice, is an important cereal and model monocot (35884299 bp). The rice genome sequence provides a foundation for the improvement of cereals, our most important crops (Stephen A et al, 2002). Directed evolution experiments have been used successfully to improve specific biological functions. Jorge Luis Fuentes et al (2005) analysed the genetic diversity of rice varieties (Oryza sativa L.) based on morphological, pedigree and DNA polymorphism data (Plant Genetic Resources), and on phenotypic, genealogical, RAPD and AFLP diversity groups.

Mathematical characterization of DNA sequences could help in understanding the structural relationships among different whole genomes along the chromosomes. In the degenerate genetic code, the trinucleotide codons encode the 20 amino acids, and the remaining three nonsense codons signal the end of translation. Base concentrations, stretches and patches are the main factors explaining the variability observed among sequences (Deschavanne P. J et al, 1999). The genomic signature expressed in terms of short nucleotide usage extends and generalizes the genomic signature concept and, by taking advantage of whole-genome data, reveals genome-wide trends (Karlin S and Burge C, 1995). The measure of similarity using CGR can be the basis of a new set of algorithms to align sequences, with considerable advantages over the conventional scoring methods (Almeida J. S et al, 2001). The CGR is a formalism that bridges sequences of discrete units and numeric coordinates in a continuous space. Consequently, basic statistical measures and techniques have been applied to sequences, and a wide range of new tools has been devised for statistical analysis. Here we report the genetic code analysis of the complete genome of all chromosomes of O. sativa. We have generated a matrix frequency of the rice genome. We describe a new method of Chaos Game Representation applied to O. sativa (japonica cultivar-group) sequences, which produces fractal objects possessing self-similar structure.

Material and Methods
The complete genomes of the eukaryote Oryza sativa (japonica cultivar-group) were downloaded from the GOLD database (http://www.genomesonline.org/). The species has a total of 12 chromosomes. The details of the chromosomes are given below.

[http://www.genomesonline.org/gold.cgi?want= Published+Co plete+Genomes]

Ch. No.    Web link
1          http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NC_008394
2          http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NC_008395
3          http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NC_008396
4          http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NC_008397
5          http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NC_008398
6          http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NC_008399
7          http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NC_008400
8          http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NC_008401
9          http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NC_008402
10         http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NC_008403
11         http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NC_008404
12         http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NC_008405


A simple model that permits the simulation of these features of nucleotide sequences is the discrete-time Markov chain (Goldman N, 1993). In this model, a 4 x 4 matrix P defines the probabilities with which subsequent bases follow the current base in a nucleotide sequence. If the base labels A, T, G and C are equated with the numbers 1, 2, 3 and 4, then Pij, the jth element of the ith row of P, defines the probability that base j follows base i. The row sums of P must equal 1. Using this matrix, a simulated nucleotide sequence may be obtained by selecting a first base randomly according to the frequencies of the bases in the sequence under study. If this base is i, then the next base is selected according to the probabilities Pi1, Pi2, Pi3 and Pi4, and so on until the simulated sequence is the same length as the original sequence.
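The simulation procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used in the study: the function name `simulate_sequence`, the `seed` parameter and the list-of-lists layout of P are assumptions introduced here.

```python
import random

BASES = "ATGC"  # index order follows the text: A=1, T=2, G=3, C=4


def simulate_sequence(P, base_freqs, length, seed=None):
    """Simulate a nucleotide sequence from a first-order Markov chain.

    P[i][j] is the probability that base j follows base i.
    base_freqs holds the base frequencies used to draw the first base.
    (Function name and argument layout are hypothetical.)
    """
    if length <= 0:
        return ""
    rng = random.Random(seed)
    # The row sums of P must equal 1, as required by the model.
    for row in P:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of P must sum to 1")
    # First base: drawn according to the overall base frequencies.
    current = rng.choices(range(4), weights=base_freqs)[0]
    sequence = [BASES[current]]
    # Each later base: drawn from the row of P selected by the current base.
    while len(sequence) < length:
        current = rng.choices(range(4), weights=P[current])[0]
        sequence.append(BASES[current])
    return "".join(sequence)
```

Calling `simulate_sequence(P, base_freqs, n)` with a matrix estimated from a real chromosome would give a random sequence of length n whose base-to-base transitions follow that matrix.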

In this first-order Markov chain model, each base in a sequence depends only on the preceding base. The probabilities in the matrix P may be estimated by direct calculation from the dinucleotide frequencies of the sequence. If the dinucleotide XY is observed nXY times in the sequence, then the probability PXY is estimated by nXY / (nXA + nXT + nXG + nXC). This permits a nucleotide sequence to be simulated with both individual base frequencies and dinucleotide frequencies matching those of the original sequence. Dinucleotide frequencies (nXY) and Markov chain probabilities (PXY) for the Oryza sativa (japonica cultivar-group) genomes are given in Table - 2.
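As a minimal sketch of this estimate, assuming overlapping dinucleotides are counted along the sequence and that characters other than A, T, G and C are skipped (neither detail is stated in the text), the calculation could look like this. The function name `first_order_matrix` is hypothetical.

```python
from collections import Counter

BASES = "ATGC"


def first_order_matrix(seq):
    """Return (counts, P) where counts[XY] = nXY and
    P[XY] = nXY / (nXA + nXT + nXG + nXC)."""
    seq = seq.upper()
    counts = Counter()
    # Count overlapping dinucleotides; pairs containing other symbols are skipped.
    for x, y in zip(seq, seq[1:]):
        if x in BASES and y in BASES:
            counts[x + y] += 1
    P = {}
    for x in BASES:
        row_total = sum(counts[x + y] for y in BASES)
        for y in BASES:
            # A row with no observations is left at 0.0 rather than dividing by zero.
            P[x + y] = counts[x + y] / row_total if row_total else 0.0
    return counts, P
```

For every first base X that occurs in the sequence, the four values P[XA], P[XT], P[XG] and P[XC] sum to 1, which corresponds to one row of the 4 x 4 matrix.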

The first-order Markov chain model successfully recreates other genomes. The lack of banding suggests approximate equality of the frequencies of the bases A, C, G, and T, confirmed by direct calculation from the residues. The first-order Markov chain model will not give the observed patterns, but a more complex second-order Markov chain, in which each base depends on the previous two, does. Second-order Markov chains have been used to describe both the structure and the within-structure of nucleotide sequences. PXYZ, the probability that base Z follows the dinucleotide XY, is estimated directly from the trinucleotide frequencies nXYZ of the sequence using the formula PXYZ = nXYZ / (nXYA + nXYT + nXYG + nXYC). Trinucleotide frequencies (nXYZ) and Markov chain probabilities (PXYZ) for the Oryza sativa (japonica cultivar-group) genomes are given in Table - 3.
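The second-order estimate follows the same pattern with overlapping trinucleotides. The sketch below assumes a sliding window of three bases with a step of one and skips windows containing symbols other than A, T, G and C; the function name `second_order_matrix` is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ATGC"


def second_order_matrix(seq):
    """Return (counts, P) where counts[XYZ] = nXYZ and
    P[XYZ] = nXYZ / (nXYA + nXYT + nXYG + nXYC)."""
    seq = seq.upper()
    counts = Counter()
    # Slide a window of three bases along the sequence, one base at a time.
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if all(base in BASES for base in triplet):
            counts[triplet] += 1
    P = {}
    for x, y in product(BASES, repeat=2):
        prefix = x + y
        total = sum(counts[prefix + z] for z in BASES)
        for z in BASES:
            # Unobserved prefixes are left at 0.0 rather than dividing by zero.
            P[prefix + z] = counts[prefix + z] / total if total else 0.0
    return counts, P
```

The result has 64 entries, one per triplet, and the four entries sharing a prefix XY sum to 1 whenever that prefix occurs in the sequence.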

We apply the CGR method to the Oryza sativa (japonica cultivar-group) species by classifying the nucleotide residues into four groups, namely Adenine, Thymine, Guanine and Cytosine. Using this distinctive form of the CGR technique, the Oryza sativa (japonica cultivar-group) sequences produce an intrinsic fractal structure. The percentage values of the nucleotide residues of the four groups in the species under consideration have also been computed and used for analysis. We find that some of the Oryza sativa (japonica cultivar-group) nucleotide sequences produce a similar kind of self-similar fractal structure. The CGR shows the characteristics of the Oryza sativa (japonica cultivar-group) genome.

To begin with, let us generate a typical fractal object, namely the ‘Square’, possessing self-similar structure using the Chaos Game Representation (CGR). Let us start with four vertices located at (0,0), (1,0), (0,1) and (1,1), labeled A, T, C and G respectively. Random sequences of the numbers 1, 2, 3 and 4 are then obtained using the random number generator available in a typical C compiler. In generating the CGR, the nth point of the attractor is simply the midpoint between the (n-1)th point and the vertex corresponding to the nth value. The successive application of this procedure for 100,000 points produces the ‘Square’ shown in Fig. 1, which is a typical fractal object possessing self-similar structure.
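The midpoint rule can be written compactly. The following Python sketch uses the vertex labels given above; the starting point (0.5, 0.5) and the function name `cgr_points` are assumptions, since the text does not state where the iteration begins.

```python
# Vertex coordinates as labeled in the text.
VERTICES = {"A": (0.0, 0.0), "T": (1.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the list of CGR points for a nucleotide sequence.

    The nth point is the midpoint between the (n-1)th point and the
    vertex of the nth base. `start` is an assumed initial point.
    """
    x, y = start
    points = []
    for base in seq.upper():
        if base not in VERTICES:
            continue  # skip symbols other than A, T, G, C
        vx, vy = VERTICES[base]
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        points.append((x, y))
    return points
```

Writing the returned coordinates to a two-column text file gives input that a plotting tool can draw as a scatter plot; passing a randomly generated string of A, T, G and C reproduces the random-sequence case described above.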

We group the nucleotide contents of the above species into four types, namely A, T, G and C; a computer algorithm assigns each nucleotide to its group. We then calculate the A+G and T+C contents of the species. Finally, the ratio for each chromosome is calculated as (A+G)/(T+C). All the results are given as percentage values. Table - 1 gives the details of all the chromosomes of the Oryza sativa (japonica cultivar-group) species. The CGR plots are drawn using gnuplot.
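The composition figures reported in Table - 1 could be computed along the following lines. This is a sketch under the assumption that only the symbols A, T, G and C are counted; the function name `base_composition` and the returned dictionary keys are hypothetical.

```python
from collections import Counter


def base_composition(seq):
    """Return percentage contents of A, T, G, C, the A+G and T+C
    percentages, and the (A+G)/(T+C) ratio."""
    counts = Counter(base for base in seq.upper() if base in "ATGC")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    percent = {base: 100.0 * counts[base] / total for base in "ATGC"}
    a_plus_g = percent["A"] + percent["G"]
    t_plus_c = percent["T"] + percent["C"]
    # The ratio is undefined when the sequence has no T or C.
    ratio = a_plus_g / t_plus_c if t_plus_c else float("inf")
    return {"percent": percent, "A+G": a_plus_g, "T+C": t_plus_c, "ratio": ratio}
```

A ratio close to 1.0, as reported for every chromosome in this study, means that the A+G and T+C contents are each close to 50%.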

Pictorial Representation

Chaos Game Representation (CGR) for gene (or DNA) sequences was introduced by Jeffrey H. J (1990, 1992), and the essential structures of the genome sequences of a few model organisms were obtained using CGR plots. Each chromosome contains more than 200,000 base pairs. Therefore, we did not represent the whole genomes; we took only the first 100,000 base pairs of each nucleotide sequence for the above representation.

Results and Discussions

The structure of DNA is specific to each species and undergoes only slight variations along the whole genome (Deschavanne P. J et al, 1999). Diversity among species is considerable and is primarily a consequence of base concentration and of stretches of bases with unusual frequencies. The frequencies of occurrence point to the basis of the genome (Deschavanne P. J et al, 1999). Our analysis gives the matrix frequency calculation for the complete genome sequence of every chromosome. Table - 1 shows the percentage content of the individual nucleotides for every chromosome. Table - 2 shows the first-order Markov chain matrix frequencies of all chromosomes, expressed for dinucleotide codons. Table - 3 shows the second-order Markov chain matrix frequencies of all chromosomes, expressed for trinucleotide codons. Table - 4 shows the classification of the triplets according to the frequency regions in which they occur. Table - 5 shows the relations between the start and stop codons of all chromosomes.

We analysed the nucleotide contents of the complete genome of the Oryza sativa (japonica cultivar-group) chromosomes; the results are given in Table - 1. According to Table - 1, chromosome 4 has the largest number of residues (17028043 base pairs) and chromosome 11 the smallest (298736 base pairs). Chromosome 1 has the lowest Adenine content (26.99%) and chromosome 10 the highest (28.84%). Thymine content ranges from 27.81% (chromosome 9) to 28.73% (chromosome 10). Guanine content ranges from 21.36% (chromosome 5) to 22.70% (chromosome 1). Cytosine content ranges from 21.19% (chromosome 10) to 22.16% (chromosome 1). A+G content ranges from 49.55% (chromosome 3) to 50.18% (chromosome 2), and T+C content from 49.82% (chromosome 2) to 50.45% (chromosome 3). Chromosome 8 has the same content of Guanine and Cytosine residues (21.57%).

We generated and analysed the first-order Markov chain matrix frequencies for the 16 (4 x 4) nucleotide doublet codons of the complete genome chromosomes of Oryza sativa (japonica cultivar-group); the matrix frequency for each doublet codon is given in Table-2. According to Table-2, the minimum and maximum frequencies reported for each doublet codon are as follows:

AA: minimum 0.299 (chromosome 1), maximum 0.321 (chromosome 10)
CA: minimum 0.289 (chromosome 1), maximum 0.317 (chromosome 5)
GA: minimum 0.271 (chromosome 3), maximum 0.285 (chromosome 5)
TA: minimum 0.223 (chromosome 1), maximum 0.245 (chromosome 7)
AC: minimum 0.179 (chromosomes 10 and 11), maximum 0.189 (chromosome 3)
CC: minimum 0.232 (chromosome 2), maximum 0.249 (chromosome 4)
GC: minimum 0.229 (chromosome 10), maximum 0.248 (chromosome 3)
TC: minimum 0.223 (chromosome 1), maximum 0.207 (chromosome 11)
AG: minimum 0.204 (chromosome 10), maximum 0.224 (chromosome 1)
CG: minimum 0.160 (chromosome 5), maximum 0.192 (chromosomes 4 and 6)
GG: minimum 0.233 (chromosome 12), maximum 0.250 (chromosome 9)
TG: minimum 0.221 (chromosome 10), maximum 0.248 (chromosome 1)
AT: minimum 0.283 (chromosomes 6 and 9), maximum 0.296 (chromosome 10)
CT: minimum 0.266 (chromosome 4), maximum 0.289 (chromosome 11)
GT: minimum 0.235 (chromosome 9), maximum 0.245 (chromosome 10)
TT: minimum 0.306 (chromosome 1), maximum 0.320 (chromosome 8)

We generated and analysed the second-order Markov chain matrix frequencies for the 64 (4 x 4 x 4) nucleotide triplet codons of the complete genome chromosomes of Oryza sativa (japonica cultivar-group); the matrix frequency for each triplet codon is given in Table-3. According to Table-3, the triplets with the highest frequencies are mostly AAA and TTT, and those with the lowest are mostly AGC and TGC. In the lists that follow, a notation such as AC-A,T denotes the triplets ACA and ACT. Across all the chromosomes, the low-sparseness regions are mostly occupied by the triplets AC-A,T; GCA; TC-A,T; AG-A,C; CG-A,T; GGC, TGA and TGC. The high-sparseness regions are mostly occupied by the triplets AA-A,C,G; CA-A,C; GAA, ATT, CTT, GTT and TT-A,T.

Table-4 shows the triplet codons separated into four regions according to their frequency. The regions correspond to matrix frequency ranges of 0.125-0.199, 0.200-0.249, 0.250-0.299, and above 0.300. This table makes it easy to see how many triplets fall within a particular range of frequency.
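A grouping of this kind can be sketched as follows. The input is assumed to be a dictionary mapping each triplet to its matrix frequency; rounding to three decimals, treating exactly 0.300 as part of the top region, and the extra catch-all region for values below 0.125 are assumptions introduced here, as is the function name `classify_triplets`.

```python
def classify_triplets(freqs):
    """Group triplets into the four matrix-frequency regions named in the text."""
    regions = {
        "0.125-0.199": [],
        "0.200-0.249": [],
        "0.250-0.299": [],
        "0.300 and above": [],
        "below 0.125": [],  # assumption: catch-all, not a region in the paper
    }
    for triplet in sorted(freqs):
        value = round(freqs[triplet], 3)  # frequencies are reported to 3 decimals
        if value >= 0.300:
            regions["0.300 and above"].append(triplet)
        elif value >= 0.250:
            regions["0.250-0.299"].append(triplet)
        elif value >= 0.200:
            regions["0.200-0.249"].append(triplet)
        elif value >= 0.125:
            regions["0.125-0.199"].append(triplet)
        else:
            regions["below 0.125"].append(triplet)
    return regions
```

Taking the length of each list then gives the number of triplets in each range for a chromosome.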

We analysed the relations between the start codon and stop codon frequencies; the results are given in Table-5. The genetic code has three types of stop codon but only one start codon. Therefore, we tried to identify one stop codon for every sequence, but this analysis did not succeed. Nevertheless, the average of two stop codon values is nearly equal to the start codon value. Table-5 lists the start and stop codon frequencies separately for each chromosome of Oryza sativa (japonica cultivar-group). The table shows that every start codon frequency is approximately equal to the average of two stop codon frequencies. This analysis therefore indicates that the start codon frequency matches the averaged stop codon frequency for the complete genome sequence of every Oryza sativa (japonica cultivar-group) chromosome.
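One way to check a relation of this kind is sketched below. The text does not say which two stop codons were averaged, so the sketch compares the start codon ATG with the average of every pair drawn from the three standard stop codons TAA, TAG and TGA. The function name `compare_start_to_stop_pairs` and the input format (a dictionary of triplet matrix frequencies) are assumptions.

```python
from itertools import combinations

START = "ATG"
STOPS = ("TAA", "TAG", "TGA")


def compare_start_to_stop_pairs(freqs):
    """For each pair of stop codons, return (average frequency,
    absolute difference from the start codon frequency)."""
    start_value = freqs[START]
    result = {}
    for first, second in combinations(STOPS, 2):
        average = (freqs[first] + freqs[second]) / 2.0
        result[(first, second)] = (average, abs(average - start_value))
    return result
```

The pair with the smallest difference is the one whose average lies closest to the start codon frequency for that chromosome.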

Figure 1 shows the CGR plot for the first 100,000 base pairs of the 12 chromosomes of Oryza sativa (japonica cultivar-group). We observe that the genomes overlap crosswise along A, G and T, C. Four triangles meet at the midpoint (0.5, 0.5). The A-T region contains the largest number of residues.

Analysis of Individual Chromosome

Chromosome 1
Chromosome 1 has a total of 301936 base pairs. According to Table-1, the most abundant residue is Thymine (28.15%) and the least abundant is Cytosine (22.16%). The Adenine and Guanine contents are 26.99% and 22.70%. The higher combined nucleotide content is T+C (50.31%). The ratio of A+G to T+C is 1.0. The highest-frequency tri-nucleotide of chromosome 1 is TTT (0.348%) and the lowest is AGC (0.163%). According to Table-3, the tri-nucleotides of chromosome 1 with a matrix frequency above 0.300% are AAA, AAC, TTA, and TTT. Tri-nucleotides in the range 0.250% to 0.299% are AAG, AAT, CA-A,C,G; GA-A,C,G; TA-A,C,G; CC-G,T; GC-C,G; TC-C,G; AGT, CGG, GG-A,T; AT-A,C,T; CT-A,C,T; GT-A,C,T; TT-C,G. Tri-nucleotides in the range 0.200% to 0.249% are CAT, GAT, TAT, AC-C,G,T; CC-A,C; GC-G,T; TC-G,T; AG-A,G; CG-A,C,T; GGG, TG-A,G,T; ATG, CTG and GGG. Tri-nucleotides in the range 0.150% to 0.199% are ACA, GCA, TCA, AGC, GGC, and TGC.

Chromosome 2
Chromosome 2 has a total of 662387 base pairs. The most abundant residue is Adenine (28.39%) and the least abundant is Guanine (21.79%). The Thymine and Cytosine contents are 27.98% and 21.84%. The higher combined nucleotide content is A+G (50.18%). The ratio of A+G to T+C is 1.0. The highest-frequency triplet codon of chromosome 2 is AAA (0.351%) and the lowest is TGC (0.157%).

For chromosome 2, the tri-nucleotides with a matrix frequency above 0.300% are AA-A,C,G; CA-A,C; GAC, TAC, TT-A,T. Tri-nucleotides in the range 0.250% to 0.299% are AAT, CAG, GA-A,G; TA-A,G; CC-G,T; GCG, CGG, GG-A,T; AT-A,C,T; CT-A,C,T; GT-A,C,T; TT-C,G. Tri-nucleotides in the range 0.200% to 0.249% are CAT, GAT, TAT, AC-C,G,T; CCC, GC-C,T; TC-C,G,T; AG-A,G,T; CG-A,T; GGG, TG-G,T; ATG, CTG and GTG. Tri-nucleotides in the range 0.150% to 0.199% are ACA, CCA, GCA, TCA, AGC, CGC, GGC, TGA, and TGC.

Chromosome 3
Chromosome 3 has a total of 831805 base pairs. The most abundant residue is Thymine (28.34%) and the least abundant is Guanine (22.05%). The Adenine and Cytosine contents are 27.50% and 22.11%. The higher combined nucleotide content is T+C (50.45%). The ratio of A+G to T+C is 1.0. The highest-frequency triplet codon of chromosome 3 is TTT (0.355%) and the lowest is TGC (0.162%).

For chromosome 3, the tri-nucleotides with a matrix frequency above 0.300% are AA-A,C; GTT, TT-A,C,T. Tri-nucleotides in the range 0.250% to 0.299% are AA-G,T; CA-A,C; GA-A,C,G; TA-A,C,G; CC-G,T; GC-C,G; CGG, GG-A,T; AT-A,C,T; CT-A,C,T; GT-A,C; TTG. Tri-nucleotides in the range 0.200% to 0.249% are CA-G,T; GAT, TAT, AC-C,G,T; CC-A,C; GCT, TC-C,G,T; AG-A,G,T; CG-A,C,T; GGG, TG-G,T; ATG, CTG and GTG. Tri-nucleotides in the range 0.150% to 0.199% are ACA, GCA, TCA, AGC, GGC, TGA and TGC.

Chromosome 4
Chromosome 4 has a total of 17028043 base pairs. The most abundant residue is Thymine (27.95%) and the least abundant is Cytosine (22.08%). The Adenine and Guanine contents are 27.88% and 22.10%. The higher combined nucleotide content is T+C (50.03%). The ratio of A+G to T+C is 1.0. The highest-frequency triplet codon of chromosome 4 is AAA (0.347%) and the lowest is AGC (0.169%). The tri-nucleotides with a matrix frequency above 0.300% are AA-A,C; CAA, CTT, TT-A,T. Tri-nucleotides in the range 0.250% to 0.299% are AA-G,T; CA-C,G; GA-A,C,G; TA-A,C,G; CCT, GC-G,T; TCC, CGG, GG-A,T; TGG, AT-A,C,T; CT-A,C; GT-A,C,T; TT-C,G. Tri-nucleotides in the range 0.200% to 0.249% are CAT, GAT, TAT, AC-C,G,T; CC-A,C,G; GCT, TC-G,T; AG-A,G,T; CG-C,T; GG-C,G; TG-A,T; ATG, CTG and GTG. Tri-nucleotides in the range 0.150% to 0.199% are ACA, GCA, TCA, AGC, CGA, and TGC.

Chromosome 5
Chromosome 5 has a total of 476423 base pairs. The most abundant residue is Adenine (28.61%) and the least abundant is Guanine (21.36%). The Thymine and Cytosine contents are 28.47% and 21.55%. The higher combined nucleotide content is T+C (50.02%). The ratio of A+G to T+C is 1.0. The highest-frequency triplets of chromosome 5 are AAA and TTT (0.354%) and the lowest is AGC (0.143%). The tri-nucleotides with a matrix frequency above 0.300% are AA-C,G; CA-A,C; GA-A,C (same ratio); TAC and TTC (same ratio); CTT, GTT, TTA; and AAA and TTT (same ratio). Tri-nucleotides in the range 0.250% to 0.299% are AAT, CAG; GAG; TA-A,G; CC-G,T; GCG and TCC (equal); CGG, GG-A,T; AT-A,C,T; CT-A,C; GT-A,C; TTG. Tri-nucleotides in the range 0.200% to 0.249% are CAT, GAT, TAT, AC-C,G,T; CCC, GC-C,T; TC-G,T; AG-A,G,T; CG-A,T; GGG, TG-G,T; ATG, CTG and GTG. Tri-nucleotides in the range 0.150% to 0.199% are ACA, CCA, GCA, TCA, AGC, CGC, GGC and TGC.

Chromosome 6
Chromosome 6 has a total of 1949261 base pairs. The most abundant residue is Adenine (28.03%) and the least abundant is Cytosine (21.88%). The Thymine and Guanine contents are 27.96% and 22.12%. The higher combined nucleotide content is A+G (50.15%). The ratio of A+G to T+C is 1.0. The triplet frequencies of chromosome 6 range from TGC (0.172%) to TTT (0.359%). The tri-nucleotides with a matrix frequency above 0.300% are AA-A,C,G; CAA, GAA, GTT, TT-A,T. Tri-nucleotides in the range 0.250% to 0.299% are AAT, CA-C,G; GA-C,G; TA-A,C,G; CC-G,T; GC-C,G; CGG, GG-A,T; AT-A,C,T; CT-A,C,T; GT-A,C; TT-C,G. Tri-nucleotides in the range 0.200% to 0.249% are CAT, GAT, TAT, AC-C,G,T; CC-A,C; GCT, TC-C,G,T; AG-G,T; CG-A,C,T; GG-C,G; TG-A,G,T; ATG, CTG and GTG. Tri-nucleotides in the range 0.125% to 0.199% are ACA, GCA, TCA, AG-A,C and TGC.

Chromosome 7
Chromosome 7 has a total of 993326 base pairs. The most abundant residue is Adenine (28.67%) and the least abundant is Guanine (21.47%). The Thymine and Cytosine contents are 28.24% and 21.63%. The higher combined nucleotide content is A+G (50.14%). The ratio of A+G to T+C is 1.0. The triplet frequencies of chromosome 7 range from AGC (0.161%) to AAA (0.361%). The tri-nucleotides with a matrix frequency above 0.300% are AA-A,C,G; CA-A,C; TT-A,T. Tri-nucleotides in the range 0.250% to 0.299% are AAT, CAG, GA-A,C,G; TA-A,C,G; CC-G,T; GC-C,G; CGG, GG-A,T; AT-A,C,T; CT-A,C,T; GT-A,C,T; TT-C,G. Tri-nucleotides in the range 0.200% to 0.249% are CAT, GAT, TAT, AC-C,G; CC-A,C; GCT, TC-C,G; AG-G,T; CG-C,T; GGG, TG-G,T; ATG, CTG and GTG. Tri-nucleotides in the range 0.125% to 0.199% are AC-A,T; GCA, TC-A,T; AG-A,C; CGA, GGC, TGA and TGC.

Chromosome 8
Chromosome 8 has a total of 8367279 base pairs. The most abundant residue is Adenine (28.49%). The least abundant residues are Guanine and Cytosine (21.57% each). The Thymine content is 28.37%. The higher combined nucleotide content is A+G (50.06%). The ratio of A+G to T+C is 1.0. The triplet frequencies of chromosome 8 range from AGC (0.164%) to TTT (0.360%). The tri-nucleotides with a matrix frequency above 0.300% are AA-A,C,G; CA-A,C; GAA, ATT, CTT, GTT, TT-A,T. Tri-nucleotides in the range 0.250% to 0.299% are AAT, CAG, GA-C,G; TA-A,C,G; CC-G,T; GC-C,G; CGG, GGT, AT-A,C; CT-A,C; GT-A,C; TT-C,G. Tri-nucleotides in the range 0.200% to 0.249% are CAT, GAT, TAT, AC-C,G,T; CC-A,C; GCT, TC-C,G,T; AG-G,T; CG-C,T; GG-A,G; TG-G,T; ATG, CTG and GTG. Tri-nucleotides in the range 0.125% to 0.199% are ACA, GCA, TCA, AG-A,C; CGA, GGC, TGA and TGC.

Chromosome 9
Chromosome 9 has a total of 2439243 base pairs. The most abundant residue is Adenine (28.06%) and the least abundant is Guanine (22.06%). The Thymine and Cytosine contents are 27.81% and 22.07%. The higher combined nucleotide content is A+G (50.12%). The ratio of A+G to T+C is 1.0. The highest-frequency triplets are AAA and TTT (0.346%) and the lowest is AGC (0.162%). The tri-nucleotides with a matrix frequency above 0.300% are AA-A,C; CA-A,C; CTT, GTT, TT-A,T. Tri-nucleotides in the range 0.250% to 0.299% are AA-G,T; CAG, GA-A,C,G; TA-A,C,G; CCT, GC-C,G; TCC, CGG, GG-A,T; AT-A,C,T; CT-A,C; GT-A,C; TT-C,G. Tri-nucleotides in the range 0.200% to 0.249% are CAT, GAT, TAT, AC-C,G,T; CC-A,C,G; GCT, TC-G,T; AG-A,G,T; CG-C,T; GGG, TG-A,G,T; ATG, CTG and GTG. Tri-nucleotides in the range 0.150% to 0.199% are ACA, GCA, TCA, AGC, CGA, GGC and TGC.

Chromosome 10
Chromosome 10 has a total of 306812 base pairs. The most abundant residue is Adenine (28.84%) and the least abundant is Cytosine (21.19%). The Thymine and Guanine contents are 28.73% and 21.24%. The higher combined nucleotide content is A+G (50.08%). The ratio of A+G to T+C is 1.0. The highest-frequency triplet is AAA (0.360%) and the lowest is ACA (0.165%). The tri-nucleotides with a matrix frequency above 0.300% are AA-A,C,G; CA-A,C; GAA, ATT, CTT, TT-A,T. Tri-nucleotides in the range 0.250% to 0.299% are AAT, CAG, GA-C,G; TA-A,C,G; CCT, GCC, CGG, GG-A,T; AT-A,C; CT-A,C; GT-A,C,T; TT-C,G. Tri-nucleotides in the range 0.200% to 0.249% are CAT, GAT, TAT, AC-C,G; CC-A,C,G; GC-G,T; TC-C,G; AG-G,T; CGC, GGG, TG-G,T; ATG, CTG and GTG. Tri-nucleotides in the range 0.150% to 0.199% are AC-A,T; GCA, TC-A,T; AG-A,C; CG-A,T; GGC, TGA and TGC.

Chromosome 11
Chromosome 11 has a total of 298736 base pairs. The most abundant residue is Thymine (28.66%) and the least abundant is Cytosine (21.21%). The Adenine and Guanine contents are 28.50% and 21.63%. The higher combined nucleotide content is A+G (50.13%). The ratio of A+G to T+C is 1.0. The highest-frequency triplet codon is TTT (0.363%) and the lowest is AGC (0.146%). The tri-nucleotides with a matrix frequency above 0.300% are AA-A,C,G; CA-A,C; GA-C,G; TAC, GTT, TT-A,C,T. Tri-nucleotides in the range 0.250% to 0.299% are AAT, CAG, GAA, TA-A,G; CC-G,T; CGG, GG-A,T; AT-A,C,T; CT-A,C,T; GT-A,C; TTG. Tri-nucleotides in the range 0.200% to 0.249% are CAT, GAT, TAT, AC-C,G; CC-A,C; GC-C,G; TC-C,G; AG-A,G,T; CG-A,T; GGG, TG-A,G,T; ATG, CTG and GTG. Tri-nucleotides in the range 0.150% to 0.199% are AC-A,T; GC-A,T; TC-A,T; AGC, CGC, GGC and TGC.

Chromosome 12
Chromosome 12 has a total of 2229048 base pairs. Adenine has the highest percentage (28.54%) and Guanine the lowest (21.60%); Thymine and Cytosine account for 28.21% and 21.65%, respectively. The combined A+G content is 50.14%, and the ratio of A+G to T+C content is 1.0. The highest triplet codon frequency is that of TTT (0.353%) and the lowest is that of AGC (0.154%). In the matrix frequency of chromosome 12, tri-nucleotide sequences above 0.300% are AA-A,C,G; CA-A,C; GA-A,C; TAC, TT-A,T. Tri-nucleotide sequences of 0.250% to 0.299% are AAT, CAG, GAG; TA-A,G; CC-G,T; GCG, CGG, GG-A,T; AT-A,C,T; CT-A,C,T; GT-A,C,T; TT-C,G. Tri-nucleotide sequences of 0.200% to 0.249% are CAT, GAT, TAT, AC-C,G,T; CC-A,C; GC-C,T; TC-C,G,T; AG-A,G,T; CG-A,T; GGG, TG-A,G,T; ATG, CTG, GTG. Tri-nucleotide sequences of 0.150% to 0.199% are ACA, GCA, TCA, AGC, CGC, GGC and TGC.
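
The compact notation used in these lists appears to group tri-nucleotides by their first two bases, so that AA-A,C,G stands for AAA, AAC and AAG. That reading is an assumption, since the notation is not defined in this section. Under it, a short Python helper (the name `expand_trinucleotides` is hypothetical) can expand each list and check whether the four frequency ranges together cover every tri-nucleotide exactly once.

```python
import re

def expand_trinucleotides(text):
    """Expand grouped notation such as 'AA-A,C,G; TAC' into full tri-nucleotides."""
    result = []
    prefix = None
    for token in re.split(r"[;,]", text):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            # 'AA-A' opens a group: two-base prefix plus the first third base.
            prefix, first = token.split("-", 1)
            result.append(prefix + first)
        elif len(token) == 1 and prefix is not None:
            # A lone base continues the current group.
            result.append(prefix + token)
        else:
            # A full tri-nucleotide closes any open group.
            prefix = None
            result.append(token)
    return result

# Chromosome 12 lists as given in the text, one string per frequency range.
chr12_bands = [
    "AA-A,C,G; CA-A,C; GA-A,C; TAC, TT-A,T",
    "AAT, CAG, GAG; TA-A,G; CC-G,T, GCG, CGG, GG-A,T; AT-A,C,T; CT-A,C,T; GT-A,C,T; TT-C,G",
    "CAT, GAT, TAT, AC-C,G,T; CC-A,C; GC-C,T; TC-C,G,T; AG-A,G,T; CG-A,T; GGG, TG-A,G,T; ATG, CTG, GTG",
    "ACA, GCA, TCA, AGC, CGC, GGC, TGC",
]
expanded = [tri for band in chr12_bands for tri in expand_trinucleotides(band)]
print(len(expanded), len(set(expanded)))
```

If both printed numbers equal 64, the four ranges partition the full set of tri-nucleotides with no omission or repetition.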

Conclusion
The techniques just described identify the rice chromosome and specify the region using the first-order Markov chain, second-order Markov chain and CGR methods of DNA sequence analysis. The probabilities defining these models can be calculated directly and easily from the raw DNA sequences, implying that the CGR gives no further insight into the structure of the DNA sequence than is given by the dinucleotide and trinucleotide frequencies. In this paper, we have shown that simple Markov chain models based solely on dinucleotide and trinucleotide frequencies can account for the complex patterns exhibited in the CGR of Oryza sativa (japonica cultivar-group) chromosome sequences. The chromosome sequences of Oryza sativa (japonica cultivar-group) are closely similar to one another, and our analysis shows that their sequence patterns are similar as well. Only a small number of tri-nucleotides have high matrix frequency values (above 0.300%), and these values differ from one chromosome to another. Likewise, only a small number of tri-nucleotides fall in the low frequency range (0.125% to 0.199%), again with differing matrix values. Chromosome 1 has four high-frequency (above 0.300%) tri-nucleotides. Chromosome 8 has the same content of Guanine and Cytosine residues (21.57%). The frequency matrix range of 0.200% to 0.299% contains the largest number of tri-nucleotide codons and is therefore the most characteristic of the Oryza sativa (japonica cultivar-group) species. Finally, we observed that the frequency of the start codon is equal to the average of the two stop codon frequencies. The in silico matrix frequency analysis can be used in in vitro/in vivo studies for reassembling a particular repaired codon region that has been modified by gene tinkering (codon replacing) methods.
Future analysis can integrate these procedures into a single logical procedure for an individual gene.
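
The statement that the model probabilities can be calculated directly from the raw sequence can be illustrated with a short Python sketch. It estimates first-order Markov transition probabilities from overlapping dinucleotide counts and generates CGR points by the usual midpoint rule. This is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and the corner assignment (A lower left, C upper left, G upper right, T lower right) is a common convention that this section does not confirm.

```python
from collections import Counter

# Assumed corner assignment for the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def first_order_transitions(seq):
    """Estimate P(next base | current base) from overlapping dinucleotide counts."""
    seq = seq.upper()
    pair_counts = Counter()
    for a, b in zip(seq, seq[1:]):
        if a in "ACGT" and b in "ACGT":
            pair_counts[a + b] += 1
    probs = {}
    for a in "ACGT":
        row_total = sum(pair_counts[a + b] for b in "ACGT")
        probs[a] = {
            b: (pair_counts[a + b] / row_total if row_total else 0.0)
            for b in "ACGT"
        }
    return probs

def cgr_points(seq, start=(0.5, 0.5)):
    """Return CGR points: each point is halfway between the previous point
    and the corner of the current base. Ambiguous bases are skipped."""
    x, y = start
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points

# Toy sequence for illustration only.
print(first_order_transitions("ACAC")["A"]["C"])  # 1.0
print(cgr_points("AG"))  # [(0.25, 0.25), (0.625, 0.625)]
```

A second-order model follows the same pattern, with tri-nucleotide counts conditioned on the preceding two bases in place of dinucleotide counts.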

References

  1. Almeida JS, Carrico JA, Maretzek A, Noble PA, Fletcher M (2001) Analysis of genomic sequences by Chaos Game Representation. Bioinformatics 17: 429-437.

  2. Deschavanne PJ, Giron A, Villain J, Fagot G, Fertil B (1999) Genomic Signature: Characterization and Classification of Species Assessed by Chaos Game Representation of Sequences. Mol Biol Evol 16: 1391-1399.

  3. Fukui K (1985) Identification of plant chromosome by the image analysis method. The Cell 17: 145-149.

  4. Fuentes JL, Cornide MT, Alvarez A, Suarez E, Borges E (2005) Genetic diversity analysis of rice varieties (Oryza sativa L.) based on morphological, pedigree and DNA polymorphism data. Plant Genetic Resources: Characterization and Utilization 3: 353-359.

  5. Gao LZ, Zhang CH, Chang LP, Jia JZ, Qiu ZE, et al. (2005) Microsatellite diversity within Oryza sativa with emphasis on indica–japonica divergence. Genetics Research 85: 1-14.

  6. Goff SA, Ricke D, Lan TH, Presting G, Wang R, et al. (2002) A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica). Science 296: 92-100.

  7. Goldman N (1993) Nucleotide, dinucleotide and trinucleotide frequencies explain patterns observed in chaos game representation of DNA sequences. Nucleic Acids Research 21: 2487-2491.

  8. Iijima K, Fukui K (1991) Clarification of the conditions for the image analysis of plant chromosomes. Bull Natl Inst Agrobiol Resour 6: 1-58.

  9. Jeffrey HJ (1990) Chaos game representation of gene structure. Nucleic Acids Research 18: 2163-2170.

  10. Jeffrey HJ (1992) Chaos game visualization of sequences. Computer Graphics 16: 25-34.

  11. Karlin S, Burge C (1995) Dinucleotide relative abundance extremes: a genomic signature. Trends Genet 11: 283-290.

  12. Mohanty AK, Narayana Rao AVSS (2000) Factorial Moments Analyses Show a Characteristic Length Scale in DNA Sequences. Phys Rev Lett 84: 1832-1835.

  13. Nussinov R (1980) Some rules in the ordering of nucleotides in the DNA. Nucleic Acids Research 8: 4545-4562.

  14. Nussinov R (1981) Nearest neighbor nucleotide patterns: Structural and biological implications. Journal of Biol Chemistry 256: 8458-8462.

  15. Yoshiaki NA, Baltazar A, Hisataka N, Ikuo H, Manabu A, et al. (2003) A Comprehensive Homology Search for Rice Specific Sequences. Genome Informatics 14: 533-534.