Biological Data Mining (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
Like a data-guzzling turbo engine, advanced data mining has been powering post-genome biological studies for two decades. Reflecting this growth, Biological Data Mining presents comprehensive data mining concepts, theories, and applications in current biological and medical research. Each chapter is written by a distinguished team of interdisciplinary data mining researchers who cover state-of-the-art biological topics.
The first section of the book discusses challenges and opportunities in analyzing and mining biological sequences and structures to gain insight into molecular functions. The second section addresses emerging computational challenges in interpreting high-throughput Omics data. The book then describes the relationships between data mining and related areas of computing, including knowledge representation, information retrieval, and data integration for structured and unstructured biological data. The last part explores emerging data mining opportunities for biomedical applications.
This volume examines the concepts, problems, progress, and trends in developing and applying new data mining techniques to the rapidly growing field of genome biology. By studying the concepts and case studies presented, readers will gain significant insight and develop practical solutions for similar biological data mining projects in the future.
surface analysis methods, namely, graph-based, geometric hashing, and methods using series expansion of 3D function.
5.5.1 Graph-based methods
Graph theoretical approaches are frequently applied for protein surface comparison since some common protein surface representations, e.g., triangular mesh, can be naturally considered as a graph. In a graph representation of a protein surface, geometrical and often physicochemical features of a local
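The remark that a triangular mesh can be read directly as a graph can be illustrated with a minimal sketch. It assumes the mesh is given as triples of vertex indices; the function name `mesh_to_graph` is hypothetical and is not taken from the book. Every triangle edge becomes an undirected graph edge.

```python
from collections import defaultdict


def mesh_to_graph(triangles):
    """Build an undirected adjacency map from a triangular mesh.

    `triangles` is an iterable of (a, b, c) vertex-index triples
    (an assumed input format, chosen for illustration).
    """
    adjacency = defaultdict(set)
    for a, b, c in triangles:
        # Each triangle contributes its three edges to the graph.
        for u, v in ((a, b), (b, c), (a, c)):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return dict(adjacency)


# Two triangles sharing the edge (1, 2).
graph = mesh_to_graph([(0, 1, 2), (1, 2, 3)])
```

Geometric or physicochemical features of each surface patch could then be attached to the vertices as node labels, which is the kind of labelled graph that comparison methods operate on.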
is a carbonyl group bonded to a hydroxyl group (OH), which can only appear at the end of a carbon chain because the carbon must make three bonds in addition to its connection to the R group. The R side chain distinguishes one amino acid from another and also confers the specific chemical properties of the amino acid. Twenty standard amino acids are incorporated into a protein based on the coded instructions and they are grouped into three classes (hydrophobic, polar, and charged) via the
for a variety of alphabets. They found protein blocks to be the best choice according to their ‘bits saved per position,’ a measure of how much an alphabet improves prediction over simply predicting the most frequent character.
7.3.5 Relative solvent accessibility prediction
Solvent accessibility determines the degree to which a residue in a protein structure can interact with a solvent molecule. This is important, as it can help ascertain the local shape of a protein based on
showing three residues in the center using the finer representation, and two residues flanking the central residues on both sides using a coarser representation as an averaging statistic. The length of this vector equals 5 × 20.
types of feature matrices per sequence. When multiple types of features are considered, the $l$th feature matrix is specified by $F_l$.
7.4.3 Information encoding
In order to encode information for a residue, ProSAT
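The 5 × 20 vector described in the excerpt above (three central residues kept individually, plus one averaged block for the two flanking residues on each side) can be sketched as follows. This is an illustration under stated assumptions, not ProSAT's actual implementation: the feature matrix is a plain one-hot encoding over the 20 standard amino acids, positions beyond the sequence ends contribute zero rows, and the names `one_hot_rows` and `encode_window` are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_rows(sequence):
    """Return an (n, 20) matrix with one one-hot row per residue (assumed feature type)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        rows[pos, index[aa]] = 1.0
    return rows


def encode_window(features, center, fine=1, coarse=2):
    """Encode one position as [left average | fine rows | right average].

    The center and `fine` residues on each side are kept row by row;
    the next `coarse` residues on each side are averaged into one row.
    Positions outside the sequence are treated as zero rows (an assumption).
    """
    n, width = features.shape

    def row(i):
        return features[i] if 0 <= i < n else np.zeros(width)

    left = np.mean([row(center - fine - k) for k in range(1, coarse + 1)], axis=0)
    middle = [row(i) for i in range(center - fine, center + fine + 1)]
    right = np.mean([row(center + fine + k) for k in range(1, coarse + 1)], axis=0)
    return np.concatenate([left, *middle, right])


vector = encode_window(one_hot_rows("ACDEFGH"), center=3)
print(vector.shape)  # (100,), i.e. 5 blocks of 20 values
```

With the default settings the result has three fine blocks and two averaged blocks, which gives the length of 5 × 20 = 100 stated in the text.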