To Homepage of Tero Harju

Andrzej Ehrenfeucht, Tero Harju, Ion Petre,
David M. Prescott and Grzegorz Rozenberg

Computation in Living Cell.
Gene assembly in ciliates

Springer-Verlag, 2004, xviii + 201 pages.





Natural Computing is a research area concerned with computing taking place in nature and with human-designed computing inspired by nature. It is a fast growing, genuinely interdisciplinary field involving, among others, biology and computer science.

The contribution of Natural Computing to computer science is quite significant, and it comes in the period when computer science is undergoing an important transformation that combines knowledge about human-design computing (as going on in computer science) with knowledge about computing observed in nature. Several areas of natural computing, such as evolutionary algorithms (see Ghosh and Tsutsui [23]), neural networks (see Haykin [27]), quantum computing (see Hirvensalo [29]), and DNA computing (see Paun, Rozenberg, and Salomaa [41], Jonoska and Seeman [30]), are flourishing in computer science. Characteristic for these areas is the use of paradigms underlying natural systems. Thus, e.g., evolutionary algorithms use the concepts of mutation, recombination, and natural selection from the theory of evolution, while neural networks are being inspired by the highly interconnected neural structures in the brain and nervous system.

DNA computing is based on paradigms from molecular biology; researchers in DNA computing study the use of DNA (and other) molecules for the purposes of computing. Research in DNA computing can be roughly divided into two (not disjoint) streams: DNA computing in vitro and DNA computing in vivo. The former is concerned with the theoretical foundations and experimental work on building DNA based computers in test tubes. The latter is concerned with constructing computational components in living cells (such as simple switching circuits, see [57]), and with studying computational processes taking place in living cells. In recent years, some of the life processes going on in ciliates have attracted attention of researchers in DNA computing community.

Ciliates (ciliated protozoa) are single-celled eukaryotic organisms (see [47]). It is an ancient group of organisms which originated around two billion years ago, and it is a very diverse group - some 8,000 different species are currently known. Two characteristics hold ciliates together as a single group: the possession of hair-like cilia used for motility and food capture, and the presence of two kinds of functionally different nuclei in the same cell - a micronucleus and a macronucleus.

The macronucleus is the ''household nucleus'' that provides RNA transcripts for producing proteins, while the micronucleus is a dormant nucleus, where no production of RNA transcripts is attempted at all. The micronucleus is activated only in the process of sexual reproduction, where at some stage (the genome of) the micronucleus gets transformed into (the genome of) the macronucleus in the process called gene assembly - it is the most involved DNA processing known in living organisms. Gene assembly is so involved because the form of the micronuclear genome is drastically different from the form of the macronuclear genome (see [48], [54]).

The computational nature of gene assembly in ciliates was brought to the attention of the DNA computing community in a series of papers by L. Kari and L. F. Landweber (see [37], [38]), where the authors notice that the process of assembling a macronuclear gene from its micronuclear form resembles the structure of the solution of the so called directed hamiltonian path problem proposed by L. Adleman in his seminal paper [1] that invigorated DNA computing research. (See also [55] for an even earlier hint on the computational nature of gene assembly.) Since then research on the computational nature of gene assembly in ciliates has developed rapidly, and it has involved both biologists and computer scientists. One line of this research has followed the original view of Kari and Landweber, and it has focussed on the computational power (in the sense of computability theory) of their intermolecular model. The other line of this research, carried out by the authors of this book and based on an intramolecular model, has focussed on the gene assembly itself, including topics such as the possible forms of the genes generated during gene assembly and possible strategies for the gene assembly. This book centers on the phenomena of gene assembly.

DNA computing represents one side of cooperation/interaction between computer scientists and biologists: molecular biology assisting computer scientists to achieve a really bold goal of providing an alternative or a complement for silicon based computers. On the other side of this cooperation/interaction, in bioinformatics (see Lesk [39]) and in computational molecular biology (see Pevzner [43]), computer scientists and mathematicians assist biologists in understanding the structure and function of biomolecules, such as DNA and proteins, in living cells. The research presented in this book lies at the intersection of all three areas: DNA computing, bioinformatics, and computational biology. But most naturally it belongs to natural computing because it is deeply concerned with the computational nature of complex biological phenomena.

This book is organized as follows.

Part I of the book gives the biological background, and it consists of three chapters. Chap. 1 provides an overview of the structures common to cells and the molecular principles on which cells operate. Chap. 2 describes the features of ciliates that make them uniquely useful for the study of natural computing. In Chap. 3 we postulate three molecular operations, ld, hi, and dlad, that accomplish gene assembly in ciliates.

Part II introduces formal models for studying gene assembly. Chap. 4 describes the process of model forming that leads to the formulation of three models: MDS descriptors, legal strings, and overlap graphs. In particular, it explains how abstracting from more details of gene structure leads to these three models (in this increasing level of abstraction). This chapter is an informal introduction to the formal framework of this book - it lays a foundation for a biologist to acquire intuitive insights and understanding about the more formal chapters of Part II. Chap. 5 introduces basic mathematical notions and notations needed in this book. Formalization of gene assembly on the MDS descriptors level is presented in Chaps. 6 and 7, on the level of legal strings in Chaps. 8 and 9, and on the level of overlap graphs in Chaps. 10 and 11.

Part III gives three examples of research topics concerned with gene assembly. In Chap. 12 we consider properties of gene assembly that are independent of the choice of gene assembly strategy. Since at present we do not know which strategies are actually used by ciliates, studying properties that are common to all strategies is of course important. In Chap. 13 we analyze the influence of molecular operations on the form of the genes that they assemble. In particular, we give formal characterizations of the forms of genes that can be assembled by each subset of the set of the three molecular operations ld, hi, and dlad. In Chap. 14 we use graph theory for formulating yet another point of view on gene assembly. We view it here as a process of dynamically changing decomposition of a graph representing a gene. One can view this chapter as a structural graph theoretic formulation of the novel paradigm ''computing by folding and recombination'' that underlies a big part of research on computational aspects of gene assembly.

Finally, Part IV is an epilogue for this book. Chap. 15 demonstrates how to formulate an intermolecular model of gene assembly using the ''pointer approach'' of this book. In this way we formulate one possible bridge to the original intermolecular model of Kari and Landweber. Chap. 16 provides a perspective on the research presented in this book, and in particular it outlines a number of possible future lines of research.


Adleman, L. M., Molecular computation of solutions to combinatorial problems. Science 226 (Nov. 1994), 1021 - 1024.

Alberts, B., Bray, D., Lewis, J., Raff, M., Johnson, A., Essential Cell Biology: An Introduction to the Molecular Biology of the Cell, 1998.

Anderson, E., Chrobak, M., Noga, J., Sgall, J., and Woeginger, G., Solution of a problem in DNA computing. Theoret. Comput. Sci. 287 (2002), 387 - 391.

Berman, P., and Hannenhalli, S., Fast sorting by reversals. Combinatorial Pattern Matching, Lecture Notes in Comput. Sci. 1072 (1996), 168 - 185.

Bouchet, A., Circle graphs. Combinatorica 7 (1987), 243 - 254.

Bouchet, A., Circle graph obstructions. J. Combin. Theory Ser B 60 (1994), 107 - 144.

Caprara, A., Sorting by reversals is difficult. In Proceedings of the 1st Annual International Conference on Computational Molecular Biology, S. Istrail, P. Pevzner and M. Waterman (eds.), 75 - 83, 1997.

Daley, M., Computational Modeling of Genetic Processes in Stichotrichous Ciliates. PhD Thesis, University of London, Ontario, Canada, 2003.

Daley, M., and Kari, L., Some properties of ciliate bio-operations. Lecture Notes in Comput. Sci. 2450 (2003), 116 - 127.

Daley, M., Ibarra, O. H., and Kari, L., Closure propeties and decision questions of some language classes under ciliate bio-operations. Theoret. Comput. Sci., to appear.

Dassow, J., Mitrana, V., and Salomaa, A., Operations and languages generating devices suggested by the genome evolution. Theoret. Comput. Sci. 270, (2002) 701 - 738.

de Fraysseix, H., Local complementation and interlacement graphs. Discrete Math. 33 (1981), 29 - 35.

de Fraysseix, H., A characterization of circle graphs. European J. Combin. 5 (1984), 223 - 238.

de Fraysseix, H., and Ossona de Mendez, P., A short proof of a Gauss problem. Lecture Notes in Comput. Sci. 1353 (1997), 230 - 235.

Ehrenfeucht, A., Harju, T., Petre, I., Prescott, D. M., and Rozenberg, G., Formal systems for gene assembly in ciliates. Theoret. Comput. Sci. 292 (2003), 199 - 219.

Ehrenfeucht, A., Harju, T., Petre, I., and Rozenberg, G., Patterns of micronuclear genes in cliates. Lecture Notes in Comput. Sci. 2340 (2002), 279 - 289.

Ehrenfeucht, A., Harju, T., Petre, I., and Rozenberg, G., Characterizing the micronuclear gene patterns in ciliates. Theory of Comput. Syst. 35 (2002), 501 - 519.

Ehrenfeucht, A., Harju, T., and Rozenberg, G., Gene assembly through cyclic graph decomposition. Theoret. Comput. Sci. 281 (2002), 325 - 349.

Ehrenfeucht, A., Petre, I., Prescott, D. M., and Rozenberg, G., Universal and simple operations for gene assembly in ciliates. In Words, Sequences, Languages: Where computer science, biology and linguistics come across, V. Mitrana and C. Martin-Vide (eds.), Kluwer Academic Publishers, Dortrecht/Boston, 329 - 342, (2001).

Ehrenfeucht, A., Petre, I., Prescott, D. M., and Rozenberg, G., String and graph reduction systems for gene assembly in ciliates. Math. Structures Comput. Sci. 12 (2001), 113 - 134.

Ehrenfeucht, A., Petre, I., Prescott, D. M., and Rozenberg, G., Circularity and other invariants of gene assembly in cliates. In Words, semigroups, and transductions, M. Ito, Gh. Paun and S. Yu (eds.), World Scientific, Singapore, 81 - 97, 2001.

Ehrenfeucht, A., Prescott, D. M., and Rozenberg, G., Computational aspects of gene (un)scrambling in ciliates. In Evolution as Computation, L. F. Landweber, E. Winfree (eds.), Springer-Verlag, Berlin, Heidelberg, 216 - 256, 2001.

Ghosh, A., Tsutsui, S. (eds.), Advances in Evolutionary Computing: Theory and Applications, Natural Computing Series, Springer-Verlag, Berlin Heidelberg, 2003.

Hannenhalli, S., and Pevzner, P. A., Transforming cabbage into turnip (Polynomial algorithm for sorting signed permutations by reversals). In Proceedings of the 27th Annual ACM Symposium on Theory of Computing, 178 - 189, 1995.

Harju, T., Petre, I., and Rozenberg, G., Formal properties of gene assembly: Equivalence problem for overlap graphs. To appear.

Harju, T., and Rozenberg, G., Computational processes in living cells: gene assembly in ciliates. Lecure Notes in Comput. Sci. 2450 (2003), 1 - 20.

Haykin, S. S., Neural Networks: A Comprehensive Foundation, Prentice Hall, 1998.

Head, T., Formal language theory and DNA: an analysis of the generative capacity of specific recombinant behaviors. Bull. Math. Biology 49(6) (1987), 737 - 190.

Hirvensalo, M., Quantum Computing. Natural Computing Series, Springer-Verlag, Berlin Heidelberg, 2001.

Jonoska, N., and Seeman, N. C. (eds.), DNA Computing, 7th International Workshop on DNA Based Computers, DNA 7, Lecture Notes in Comput. Sci. 2340, Springer-Verlag, Berlin Heidelberg, 2002.

Kaplan, H., Shamir, R., and Tarjan, R. E., A faster and simpler algorithm for sorting signed permutations by reversals. SIAM Journal on Computing 29 (1999), 880 - 892.

Kari, J., and Kari, L. Context free recombinations. In Where Mathematics, Computer Science, Linguistics, and Biology Meet, C. Martin-Vide and V. Mitrana (eds.), Kluwer Academic Publishers, Dordrecht/Boston/London, 361 - 375, 2000.

Kari, L., Kari, J., and Landweber, L. F., Reversible molecular computation in ciliates. In Jewels are Forever, J. Karhumäki, H. Maurer, G. Paun and G. Rozenberg (eds.), Springer-Verlag, Berlin, Heidelberg, 353 - 363, 1999.

Kari, L., and Landweber, L. F., Computational power of gene rearrangement. In Proceedings of DNA Bases Computers, V E. Winfree and D. K. Gifford (eds.), American Mathematical Society, 207 - 216 (1999).

Knuth, D.E., The Art of Computer Programming, Volume 1, Fundamental Algorithms, 3rd edition, Reading, Massachusetts: Addison-Wesley, 1997.

Kotzig, A., Moves without forbidden transitions in graph. Mat. casopis 18 (1968), 76 - 80.

Landweber, L. F., and Kari, L., The evolution of cellular computing: nature's solution to a computational problem. In Proceedings of the 4th DIMACS meeting on DNA based computers, Philadelphia, PA, 3 - 15 (1998).

Landweber, L. F., and Kari, L., Universal molecular computation in ciliates. In Evolution as Computation, L. F. Landweber and E. Winfree (eds.), Springer-Verlag, Berlin, Heidelberg, 2002.

Lesk, A. M., Introduction to Bioinformatics. Oxford University Press, Oxford, 2002.

Nanninga, N. (ed.), Molecular Cytology of Escherichia coli, Academic Press, London, 1985.

Paun, Gh., Rozenberg, G., and Salomaa, A., DNA Computing. Springer-Verlag, Berlin Heidelberg, 1998.

Pevzner, P. A., DNA physical mapping and alternating Eulerian cycles in colored graphs. Algorithmica 13 (1995), 77 - 105.

Pevzner, P. A., Computational Molecular Biology: An algorithmic approach. MIT Press, 2000.

Prescott, D. M., Cells: Principles of Molecular Structure and Function. Jones and Barlett Publishers, Boston, MA, USA, 1988.

Prescott, D. M., Cutting, splicing, reordering, and elimination of DNA sequences in Hypotrichous ciliates. BioEssays 14 (1992), 317 - 324.

Prescott, D. M., The unusual organization and processing of genomic DNA in Hypotrichous ciliates. Trends in Genet. 8 (1992), 439 - 445.

Prescott, D. M., The DNA of ciliated protozoa. Microbiol Rev. 58(2) (1994), 233 - 267.

Prescott, D. M., Genome gymnastics: unique modes of DNA evolution and processing in ciliates. Nat Rev Genet. 1(3) (2000), 191 - 198.

Prescott, D. M., Bostock, C. J., Murti, K. G., Lauth, M. R., Gamow, E., DNA in ciliated protozoa. I. Electron microscopy and sedimentation analysis of macronuclear and micronuclear DNA of Stylonychia mytilus. Chromosoma 34 (1971), 355 - 366.

Prescott, D. M., and DuBois, M., Internal eliminated segments (IESs) of Oxytrichidae. J. Eukariot. Microbiol. 43 (1996), 432 - 441.

Prescott, D. M., Ehrenfeucht, A., and Rozenberg, G., Molecular operations for DNA processing in hypotrichous ciliates. European Journal of Protistology 37 (2001), 241 - 260.

Prescott, D. M., Ehrenfeucht, A., and Rozenberg, G., Template-guided recombination for IES elimination and unscrambling of genes in stichotrichous ciliates. Technical Report 2002-01, LIACS, Leiden University, 2002.

Prescott, D. M., and Rozenberg, G., How ciliates manipulate their own DNA - A splendid example of natural computing. Natural Computing 1 (2002), 165 - 183.

Prescott, D. M., and Rozenberg, G., Encrypted genes and their reassembly in ciliates. In Cellular Computing, M.Amos (ed.), Oxford University Press, Oxford, 2003, to appear.

Smith, W. D., and Schweitzer, A., DNA computers in vitro and vivo. Technical report, NEC Research Institute (1995).

Tucker, A. C., A new applicable proof of the Euler circuit theorem. Amer. Math. Monthly 83 (1976), 638 - 640.

Weiss, R., Basu, S., Hooshangi, S., Kalmbach, A., Karig, D., Mehreja, R., and Netravali, I., Genetic circuit building blocks for cellular computation, communications and signal processing. Natural Computing 2 (2003), 47 - 84.

West, D. B., Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ. 1996.


Part I Biological Background
1  An Overview of the Cell

     1.1  Cells
    1.2  Major components of eukaryotic cells
    1.3  Chromosome structure
    1.4  Chromosomes and genes
2  Ciliates
    2.1  Defining characteristics of ciliates
    2.2  Nuclear dualism
    2.3  Micronuclear versus macronuclear DNA
3  Molecular Operations for Gene Assembly
    3.1  Homologous recombination
    3.2  Three molecular operations

Part II Formal Modelling of Gene Assembly
4  Model Forming
    4.1  Formalizing genes
    4.2  Levels of abstraction
    4.3  Formalizing molecular operations
    4.4  Marriage of models
5  Mathematical Preliminaries
    5.1  Sets and functions
    5.2  Strings>
    5.3  Signed strings
    5.4  Circular strings
    5.5  Graphs
6  MDS Arrangements and MDS Descriptors
    6.1  MDS arrangements
    6.2  MDS descriptors
7  MDS Descriptor Pointer Reduction System
    7.1  Assembly operations on MDS descriptors
    7.2  The assembling power of the operations
8  Legal Strings
    8.1  Representation by legal strings
    8.2  Realizable legal strings
9  String Pointer Reduction System
    9.1  Assembly operations on strings
    9.2  Equivalence to descriptor pointer reduction system
        9.2.1  ld and snr
        9.2.2  hi and spr
        9.2.3  dlad and sdr
10  Overlap Graphs
    10.1  Overlap graphs of legal strings
    10.2  Realizable graphs
    10.3  The overlap equivalence problem
11  Graph Pointer Reduction System
    11.1  Assembly operations on graphs
    11.2  Equivalence to string pointer reduction system
        11.2.1  From sld to gld
        11.2.2  From shi to ghi
        11.2.3  From sdlad to gdlad
        11.2.4  Reverse implications

Part III Properties of Gene Assembly
12  Invariants
    12.1  MDS-IES descriptors
    12.2  Invariant theorem
13  Patterns of Subsets of Rules
    13.1  Small reductions
    13.2  Disjoint cycles
    13.3  Subsets of successful patterns
        13.3.1  sld
        13.3.2  sld and shi
        13.3.3  sld and sdlad
        13.3.4  shi
        13.3.5  sdlad
        13.3.6  shi and sdlad
    13.4  Complexity of reductions
14  Gene Assembly Through Cyclic Graph Decomposition
    14.1  Graphs with labels and colours
    14.2  Folding an MI-graph
    14.3  Unfolding paired MI-graphs
    14.4  Assembled MI-graphs of genomes
    14.5  Intracyclic unfolding

Part IV Epilogue
15  Intermolecular Model
    15.1  String rules
    15.2  The intermolecular model in terms of signed strings
    15.3  Invariants of the intermolecular model
16  Discussion
    16.1  Between biology and computer science
    16.2  Gene assembly strategies
    16.3  Scope of the operations
    16.4  Pointer alignment

To Homepage of Tero Harju