Computational biology sits at the interface of computer science and biology. It involves the development and application of data-analytical and theoretical methods, mathematical modelling and computational simulation techniques to the study of biological, behavioural, and social systems [Huerta:2000]. In practice, it applies core technologies of computer science (e.g. algorithms, databases, artificial intelligence) to problems arising in biology. Computational biology builds theoretical models of biological systems, just as mathematical biology does with mathematical models.
Molecular biology is now a high-performance and high-throughput science. Datasets routinely reach the tera- and petabyte scale. Processing this much data takes substantial computing time and typically requires computer clusters.
With the large-scale generation and integration of various omics data such as genomics, transcriptomics, proteomics, metabolomics, and interactomics, complex biological systems (which can be viewed as machines) and processes can be reverse engineered from data.
Computational biology is particularly exciting today because the problems are large enough to motivate efficient algorithms and the demands that biology places on computational science are increasing. The problems are also accessible and are expected to yield medical advances. Overall, biology, and molecular biology especially, is increasingly becoming an information science.
Developments in biology are coming astonishingly quickly, generating remarkable possibilities. Computational biology is increasingly of interest in both life science and computational science departments. Many solutions to difficult problems go from biology to computer science, e.g. fragment assembly, sequence analysis, algorithms for phylogenetic trees, evolutionary algorithms and neural networks. Conversely, many possible solutions to difficult problems go from computer science to biology, e.g. sequencing by hybridization and DNA computing.
The goal of computational and systems biology is to apply large-scale numerical methods to the study of molecular, cellular and structural biology. The release of the human genome sequence has focused attention on the increasing importance of computational and systems biology for the analysis of gene function. However, only a small fraction of the information generated in modern biology laboratories has been subjected to systematic computational analysis. Thus, the future of systems biology lies not only in improved methods to study sequence information but also in the development of entirely new approaches to the numerical analysis of proteins, cells and organisms.
The cost of DNA sequencing is decreasing at an exponential pace. The continual development of new sequencing techniques that powers this trend has been termed Next Generation Sequencing. Reference genomes are usually assembled from millions of short reads, and nowadays even single-molecule sequencing via nanopores (pores of nanometer size) is possible [Maitra+Al:2012]. Genome-wide association study (GWAS) analysis can identify human variants associated with disease. RNA-seq reveals both RNA expression levels and isoforms (transcript variants, i.e. splice variants). Chromatin immunoprecipitation followed by sequencing, or ChIP-seq for short, reveals where key regulators bind to the genome. Quantitative Trait Loci (QTLs) can predict phenotypes. Changes in chromatin accessibility can reveal functional elements of the genome.
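To make the idea of assembling a sequence from short reads concrete, the following is a minimal sketch of greedy overlap merging on a toy example. It is an illustration only: real assemblers use far more elaborate methods and must cope with sequencing errors, repeats and reverse complements, all of which are ignored here. The function names, the `min_len` threshold and the example reads are assumptions made for this sketch.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap.

    Toy model (assumption): error-free reads, one strand, no repeats.
    Returns the remaining contigs once no overlap >= min_len is left.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads drawn from the made-up sequence ATGGCGTACCTGA.
reads = ["TACCTGA", "ATGGCGT", "GCGTACC"]
print(greedy_assemble(reads))
```

The pairwise search is quadratic in the number of reads, which is acceptable for a handful of strings but not for millions of reads; this is one reason the problem motivates efficient algorithms.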
Roughly one in every 1000 base pairs differs from individual to individual. Two fundamental questions follow: which locations in the genome carry variants that are strongly associated with a specific trait or disease, and how do those differences cause particular phenotypes?
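One simple way to ask whether a single variant is associated with a disease is to compare allele counts between cases and controls using a chi-square test on a 2x2 table. The sketch below assumes this basic allelic test with one degree of freedom. The function name and the counts are hypothetical, and a real association study would additionally correct for multiple testing and for population structure.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square test on a 2x2 table of allele counts.

    Rows are cases and controls, columns are alternative and reference
    alleles. Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row = [sum(r) for r in table]
    col = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    n = sum(row)
    if 0 in row or 0 in col:
        raise ValueError("every row and column needs a non-zero total")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper tail of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the alternative allele is more common in cases.
chi2, p = allelic_chi_square(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A small p-value at one position only indicates association. It does not by itself explain how the variant causes the phenotype, which is the second question raised above.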
Computational biology develops large-scale computable models for biological research. It draws on large-scale data from the different omics fields and on bioinformatics to make sense of the raw data.