Biological Data Mining (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)

Language: English

Pages: 733

ISBN: 1420086847

Format: PDF / Kindle (mobi) / ePub

Like a data-guzzling turbo engine, advanced data mining has been powering post-genome biological studies for two decades. Reflecting this growth, Biological Data Mining presents comprehensive data mining concepts, theories, and applications in current biological and medical research. Each chapter is written by a distinguished team of interdisciplinary data mining researchers who cover state-of-the-art biological topics.

The first section of the book discusses challenges and opportunities in analyzing and mining biological sequences and structures to gain insight into molecular functions. The second section addresses emerging computational challenges in interpreting high-throughput Omics data. The book then describes the relationships between data mining and related areas of computing, including knowledge representation, information retrieval, and data integration for structured and unstructured biological data. The last part explores emerging data mining opportunities for biomedical applications.

This volume examines the concepts, problems, progress, and trends in developing and applying new data mining techniques to the rapidly growing field of genome biology. By studying the concepts and case studies presented, readers will gain significant insight and develop practical solutions for similar biological data mining projects in the future.

Clustering-Based Support for Software Architecture Restructuring (Software Engineering Research)

Software Engineering 2: Specification of Systems and Languages (Texts in Theoretical Computer Science. An EATCS Series)

Software Engineering: A Methodical Approach

Genetic Programming Theory and Practice X (Genetic and Evolutionary Computation)

3D Rendering In Computer Graphics

Web Services, Service-Oriented Architectures, and Cloud Computing (2nd Edition) (The Savvy Manager's Guide)

surface analysis methods, namely, graph-based, geometric hashing, and methods using series expansion of 3D function.

5.5.1 Graph-based methods

Graph theoretical approaches are frequently applied for protein surface comparison since some common protein surface representations, e.g., triangular mesh, can be naturally considered as a graph. In a graph representation of a protein surface, geometrical and often physicochemical features of a local

is a carbonyl group bonded to a hydroxyl group (OH), which can only appear at the end of a carbon chain because the carbon must make three bonds in addition to its connection to the R group. The R side chain distinguishes one amino acid from another and also confers the specific chemical properties of the amino acid. Twenty standard amino acids are incorporated into a protein based on the coded instructions and they are grouped into three classes (hydrophobic, polar, and charged) via the

for a variety of alphabets [15]. They found protein blocks to be the best choice according to their ‘bits saved per position,’ a measure of how much prediction improvement there is for the alphabet over simply predicting the most frequent character.

7.3.5 Relative solvent accessibility prediction

Solvent accessibility determines the degree to which a residue in a protein structure can interact with a solvent molecule. This is important, as it can ascertain the local shape of protein based on

showing three residues in the center using the finer representation, and two residues flanking the central residues on both sides using a coarser representation as an averaging statistic. Length of this vector equals 5 × 20.

types of feature matrices per sequence. When multiple types of features are considered, the lth feature matrix is specified by F l .

7.4.3 Information encoding

In order to encode information for a residue, ProSAT
