Preface
Natural Computing is a research area concerned with computing
taking place in nature and with human-designed computing inspired
by nature. It is a fast growing, genuinely interdisciplinary field
involving, among others, biology and computer science.
The contribution of Natural Computing to computer science is quite
significant, and it comes in the period when computer science is
undergoing an important transformation that combines knowledge
about human-design computing (as going on in computer science)
with knowledge about computing observed in nature. Several areas
of natural computing, such as evolutionary algorithms (see
Ghosh and Tsutsui [23]), neural networks (see Haykin
[27]), quantum computing (see Hirvensalo
[29]), and DNA computing (see Paun,
Rozenberg, and Salomaa [41], Jonoska and Seeman
[30]), are flourishing in computer science.
Characteristic for these areas is the use of paradigms underlying
natural systems. Thus, e.g., evolutionary algorithms use the
concepts of mutation, recombination, and natural selection from
the theory of evolution, while neural networks are being inspired
by the highly interconnected neural structures in the brain and
nervous system.
DNA computing is based on paradigms from molecular biology;
researchers in DNA computing study the use of DNA (and other)
molecules for the purposes of computing. Research in DNA computing
can be roughly divided into two (not disjoint) streams: DNA
computing in vitro and DNA computing in vivo. The
former is concerned with the theoretical foundations and
experimental work on building DNA based computers in test tubes.
The latter is concerned with constructing computational components
in living cells (such as simple switching circuits, see [57]),
and with studying computational
processes taking place in living cells. In recent years, some of
the life processes going on in ciliates have attracted attention
of researchers in DNA computing community.
Ciliates (ciliated protozoa) are single-celled eukaryotic
organisms (see [47]). It is an ancient group of
organisms which originated around two billion years ago, and it is
a very diverse group - some 8,000 different species are currently
known. Two characteristics hold ciliates together as a single
group: the possession of hair-like cilia used for motility and
food capture, and the presence of two kinds of functionally
different nuclei in the same cell - a micronucleus and a
macronucleus.
The macronucleus is the ''household nucleus'' that provides RNA
transcripts for producing proteins, while the micronucleus is a
dormant nucleus, where no production of RNA transcripts is
attempted at all. The micronucleus is activated only in the
process of sexual reproduction, where at some stage (the genome
of) the micronucleus gets transformed into (the genome of) the
macronucleus in the process called gene assembly - it is the most
involved DNA processing known in living organisms. Gene assembly
is so involved because the form of the micronuclear genome is
drastically different from the form of the macronuclear genome
(see [48], [54]).
The computational nature of gene assembly in ciliates was brought
to the attention of the DNA computing community in a series of
papers by L. Kari and L. F. Landweber (see [37],
[38]), where the authors notice that the process of
assembling a macronuclear gene from its micronuclear form
resembles the structure of the solution of the so called directed
hamiltonian path problem proposed by L. Adleman in his seminal
paper [1] that invigorated DNA computing research.
(See also [55] for an even earlier hint on the
computational nature of gene assembly.) Since then research on the
computational nature of gene assembly in ciliates has developed
rapidly, and it has involved both biologists and computer
scientists. One line of this research has followed the original
view of Kari and Landweber, and it has focussed on the
computational power (in the sense of computability theory) of
their intermolecular model. The other line of this research,
carried out by the authors of this book and based on an
intramolecular model, has focussed on the gene assembly itself,
including topics such as the possible forms of the genes generated
during gene assembly and possible strategies for the gene
assembly. This book centers on the phenomena of gene assembly.
DNA computing represents one side of cooperation/interaction
between computer scientists and biologists: molecular biology
assisting computer scientists to achieve a really bold goal of
providing an alternative or a complement for silicon based
computers. On the other side of this cooperation/interaction, in
bioinformatics (see Lesk [39]) and in computational
molecular biology (see Pevzner [43]), computer
scientists and mathematicians assist biologists in understanding
the structure and function of biomolecules, such as DNA and
proteins, in living cells. The research presented in this book
lies at the intersection of all three areas: DNA computing,
bioinformatics, and computational biology. But most naturally it
belongs to natural computing because it is deeply concerned with
the computational nature of complex biological phenomena.
This book is organized as follows.
Part I
of the book gives the biological background, and it
consists of three chapters. Chap. 1 provides an
overview of the structures common to cells and the molecular
principles on which cells operate. Chap. 2 describes
the features of ciliates that make them uniquely useful for the
study of natural computing. In Chap. 3 we postulate
three molecular operations, ld, hi, and dlad,
that accomplish gene assembly in ciliates.
Part II
introduces formal models for studying gene assembly.
Chap. 4 describes the process of model forming that
leads to the formulation of three models: MDS descriptors, legal
strings, and overlap graphs. In particular, it explains how
abstracting from more details of gene structure leads to these
three models (in this increasing level of abstraction). This
chapter is an informal introduction to the formal framework of
this book - it lays a foundation for a biologist to acquire
intuitive insights and understanding about the more formal
chapters of Part II. Chap. 5 introduces basic
mathematical notions and notations needed in this book.
Formalization of gene assembly on the MDS descriptors level is
presented in Chaps. 6 and 7, on the level
of legal strings in Chaps. 8 and 9,
and on the level of overlap graphs in Chaps. 10 and 11.
Part III
gives three examples of research topics concerned with
gene assembly. In Chap. 12 we consider properties of gene
assembly that are independent of the choice of gene assembly
strategy. Since at present we do not know which strategies are
actually used by ciliates, studying properties that are common to
all strategies is of course important. In Chap. 13
we analyze the influence of molecular operations on the form of
the genes that they assemble. In particular, we give formal
characterizations of the forms of genes that can be assembled by
each subset of the set of the three molecular operations ld,
hi, and dlad. In Chap. 14 we use graph
theory for formulating yet another point of view on gene assembly.
We view it here as a process of dynamically changing decomposition
of a graph representing a gene. One can view this chapter as a
structural graph theoretic formulation of the novel paradigm
''computing by folding and recombination'' that underlies a big
part of research on computational aspects of gene assembly.
Finally, Part IV
is an epilogue for this book. Chap. 15
demonstrates how to formulate an intermolecular model of gene
assembly using the ''pointer approach'' of this book. In this way
we formulate one possible bridge to the original intermolecular
model of Kari and Landweber. Chap. 16 provides a
perspective on the research presented in this book, and in
particular it outlines a number of possible future lines of
research.