According to Zipf's law, the rank-ordered frequency distribution of words in natural language can be modelled by a power law. In this paper, we examine the frequency distribution of the 64 codons over the coding and non-coding regions of 88 DNA sequences from the EMBL and GenBank databases, using exponential fitting. We also treat the 20 amino acids as a vocabulary, perform the same frequency analysis on the same data, and show that amino acids can serve as biologically meaningful words for Zipf's approach. Our analysis suggests that a natural-language-like structure may exist not only in the coding regions of DNA but also in the non-coding regions.
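The rank-frequency analysis described above can be illustrated with a minimal sketch. It is not the authors' procedure: the function names (`codon_rank_frequencies`, `linear_fit`, `fit_rank_models`), the choice to read codons in a single fixed frame, and the use of ordinary least squares on log-transformed frequencies are all assumptions made for illustration. The sketch estimates both a power-law exponent (log frequency against log rank) and an exponential decay rate (log frequency against rank), so the two model forms mentioned in the abstract can be compared on the same data.

```python
import math
from collections import Counter


def codon_rank_frequencies(seq):
    """Return relative codon frequencies, sorted from most to least common.

    Assumption: codons are read in frame 0 and any trailing bases are dropped.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # Keep only codons made of unambiguous bases.
    codons = [c for c in codons if set(c) <= set("ACGT")]
    counts = Counter(codons)
    total = sum(counts.values())
    return sorted((n / total for n in counts.values()), reverse=True)


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_rank_models(freqs):
    """Fit a power law and an exponential to rank-ordered frequencies.

    Hypothetical helper: both fits are done by linear regression on log f.
    """
    ranks = list(range(1, len(freqs) + 1))
    log_f = [math.log(f) for f in freqs]
    # Power law (Zipf): log f = -alpha * log r + c
    power_slope, _ = linear_fit([math.log(r) for r in ranks], log_f)
    # Exponential: log f = -beta * r + c
    exp_slope, _ = linear_fit(ranks, log_f)
    return {"zipf_exponent": -power_slope, "exp_rate": -exp_slope}


if __name__ == "__main__":
    toy = "ATGATGATGAAAAAACCC"  # toy sequence, not real data
    freqs = codon_rank_frequencies(toy)
    print(freqs)
    print(fit_rank_models(freqs))
```

The same functions apply to amino acids if the codon list is replaced by a translated sequence, which gives a vocabulary of 20 symbols instead of 64.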
ASJC Scopus subject areas
- Statistical and Nonlinear Physics
- Physics and Astronomy(all)
- Applied Mathematics