Parallel Computing for Bioinformatics and Computational Biology: Models, Enabling Technologies, and Case Studies
Albert Y. Zomaya
Discover how to streamline complex bioinformatics applications with parallel computing
This publication enables readers to handle more complex bioinformatics applications and larger and richer data sets. As the editor clearly shows, using powerful parallel computing tools can lead to significant breakthroughs in deciphering genomes, understanding genetic disease, designing customized drug therapies, and understanding evolution.
A broad range of bioinformatics applications is covered, with demonstrations of how each one can be parallelized to improve performance and speed up computation. Current parallel computing techniques and technologies are examined, including distributed computing and grid computing. Readers are given a mixture of algorithms, experiments, and simulations that offer not only qualitative but also quantitative insights into the dynamic field of bioinformatics.
Parallel Computing for Bioinformatics and Computational Biology is a contributed work that serves as a repository of case studies, collectively demonstrating how parallel computing streamlines difficult problems in bioinformatics and produces better results. Each of the chapters is authored by an established expert in the field and carefully edited to ensure a consistent approach and high standard throughout the publication.
The work is organized into five parts:
- Algorithms and models
- Sequence analysis and microarrays
- Phylogenetics
- Protein folding
- Platforms and enabling technologies
Researchers, educators, and students in the field of bioinformatics will discover how high-performance computing can enable them to handle more complex data sets, gain deeper insights, and make new discoveries.
Table of Contents
PART I: ALGORITHMS AND MODELS.
1 Parallel and Evolutionary Approaches to Computational Biology (Nouhad J. Rizk).
1.3 Evolutionary Computation Applied to Computational Biology.
2 Parallel Monte Carlo Simulation of HIV Molecular Evolution in Response to Immune Surveillance (Jack da Silva).
2.2 The Problem.
2.3 The Model.
2.4 Parallelization with MPI.
2.5 Parallel Random Number Generation.
2.6 Preliminary Simulation Results.
2.7 Future Directions.
3 Differential Evolutionary Algorithms for In Vivo Dynamic Analysis of Glycolysis and Pentose Phosphate Pathway in Escherichia coli (Christophe Chassagnole).
3.2 Mathematical Model.
3.3 Estimation of the Parameters of the Model.
3.4 Kinetic Parameter Estimation by DE.
3.5 Simulation and Results.
3.6 Stability Analysis.
3.7 Control Characteristic.
4 Compute-Intensive Simulations for Cellular Models (K. Burrage).
4.2 Simulation Methods for Stochastic Chemical Kinetics.
4.3 Aspects of Biology: Genetic Regulation.
4.4 Parallel Computing for Biological Systems.
4.5 Parallel Simulations.
4.6 Spatial Modeling of Cellular Systems.
4.7 Modeling Colonies of Cells.
5 Parallel Computation in Simulating Diffusion and Deformation in Human Brain (Ning Kang).
5.2 Anisotropic Diffusion Simulation in White Matter Tractography.
5.3 Brain Deformation Simulation in Image-Guided Neurosurgery.
PART II: SEQUENCE ANALYSIS AND MICROARRAYS.
6 Computational Molecular Biology (Azzedine Boukerche).
6.2 Basic Concepts in Molecular Biology.
6.3 Global and Local Biological Sequence Alignment.
6.4 Heuristic Approaches for Biological Sequence Comparison.
6.5 Parallel and Distributed Sequence Comparison.
7 Special-Purpose Computing for Biological Sequence Analysis (Bertil Schmidt).
7.2 Hybrid Parallel Computer.
7.3 Dynamic Programming Communication Pattern.
7.4 Performance Evaluation.
7.5 Future Work and Open Problems.
8 Multiple Sequence Alignment in Parallel on a Cluster of Workstations (Amitava Datta).
9 Searching Sequence Databases Using High-Performance BLASTs (Xue Wu).
9.2 Basic BLAST Algorithm.
9.3 BLAST Usage and Performance Factors.
9.4 High-Performance BLASTs.
9.5 Comparing BLAST Performance.
9.7 Future Directions.
10 Parallel Implementations of Local Sequence Alignment: Hardware and Software (Vipin Chaudhary).
10.2 Sequence Alignment Primer.
10.3 Smith–Waterman Algorithm.
10.6 HMMER — Hidden Markov Models.
10.8 Specialized Hardware: FPGA.
11 Parallel Computing in the Analysis of Gene Expression Relationships (Robert L. Martino).
11.1 Significance of Gene Expression Analysis.
11.2 Multivariate Gene Expression Relations.
11.3 Classification Based on Gene Expression.
11.4 Discussion and Future Directions.
12 Assembling DNA Fragments with a Distributed Genetic Algorithm (Gabriel Luque).
12.2 DNA Fragment Assembly Problem.
12.3 DNA Fragment Assembly Using the Sequential GA.
12.4 DNA Fragment Assembly Problem Using the Parallel GA.
12.5 Experimental Results.
13 A Cooperative Genetic Algorithm for Knowledge Discovery in Microarray Experiments (Mohammed Khabzaoui).
13.2 Microarray Experiments.
13.3 Association Rules.
13.4 Multi-Objective Genetic Algorithm.
13.5 Cooperative Multi-Objective Genetic Algorithm (PMGA).
PART III: PHYLOGENETICS.
14 Parallel and Distributed Computation of Large Phylogenetic Trees (Alexandros Stamatakis).
14.2 Maximum Likelihood.
14.3 State-of-the-Art ML Programs.
14.4 Algorithmic Solutions in RAxML-III.
14.5 HPC Solutions in RAxML-III.
14.6 Future Developments.
15 Phylogenetic Parameter Estimation on COWs (Ekkehard Petzold).
15.2 Phylogenetic Tree Reconstruction using Quartet Puzzling.
15.3 Hardware, Data, and Scheduling Algorithms.
15.4 Parallelizing PEst.
15.5 Extending Parallel Coverage in PEst.
16 High-Performance Phylogeny Reconstruction Under Maximum Parsimony (Tiffani L. Williams).
16.2 Maximum Parsimony.
16.3 Exact MP: Parallel Branch and Bound.
16.4 MP Heuristics: Disk-Covering Methods.
16.5 Summary and Open Problems.
PART IV: PROTEIN FOLDING.
17 Protein Folding with the Parallel Replica Exchange Molecular Dynamics Method (Ruhong Zhou).
17.2 REMD Method.
17.3 Protein Folding with REMD.
17.4 Protein Structure Refinement with REMD.
18 High-Performance Alignment Methods for Protein Threading (R. Andonov).
18.2 Formal Definition.
18.3 Mixed Integer Programming Models.
18.4 Divide-and-Conquer Technique.
18.6 Future Research Directions.
19 Parallel Evolutionary Computations in Discerning Protein Structures (Richard O. Day).
19.2 PSP Problem.
19.3 Protein Structure Discerning Methods.
19.4 PSP Energy Minimization EAs.
19.5 PSP Parallel EA Performance Evaluation.
19.6 Results and Discussion.
19.7 Conclusions and Suggested Research.
PART V: PLATFORMS AND ENABLING TECHNOLOGIES.
20 A Brief Overview of Grid Activities for Bioinformatics and Health Applications (Ali Al Mazari).
20.2 Grid Computing.
20.3 Bioinformatics and Health Applications.
20.4 Grid Computing for Bioinformatics and Health Applications.
20.5 Grid Activities in Europe.
20.6 Grid Activities in the United Kingdom.
20.7 Grid Activities in the USA.
20.8 Grid Activities in Asia and Japan.
20.9 International Grid Collaborations.
20.10 International Grid Collaborations.
20.11 Conclusions and Future Trends.
21 Parallel Algorithms for Bioinformatics (Shahid H. Bokhari).
21.2 Parallel Computer Architecture.
21.3 Bioinformatics Algorithms on the Cray MTA System.
22 Cluster and Grid Infrastructure for Computational Chemistry and Biochemistry (Kim K. Baldridge).
22.2 GAMESS Execution on Clusters.
22.3 Portal Technology.
22.4 Running GAMESS with Nimrod Grid-Enabling Infrastructure.
22.5 Computational Chemistry Workflow Environments.
23 Distributed Workflows in Bioinformatics (Arun Krishnan).
23.2 Challenges of Grid Computing.
23.3 Grid Applications.
23.4 Grid Programming.
23.5 Grid Execution Language.
23.6 GUI-Based Workflow Construction and Execution.
23.7 Case Studies.
24 Molecular Structure Determination on a Computational and Data Grid (Russ Miller).
24.2 Molecular Structure Determination.
24.3 Grid Computing in Buffalo.
24.4 Center for Computational Research.
24.5 ACDC-Grid Overview.
24.6 Grid Research Collaborations.
24.7 Grid Research Advancements.
24.8 Grid Research Application Abstractions and Tools.
25 GIPSY: A Problem-Solving Environment for Bioinformatics Applications (Rajendra R. Joshi).
25.3 Currently Deployed Applications.
26 TaskSpaces: A Software Framework for Parallel Bioinformatics on Computational Grids (Hans De Sterck).
26.2 The TaskSpaces Framework.
26.3 Application: Finding Correctly Folded RNA Motifs.
26.4 Case Study: Operating the Framework on a Computational Grid.
26.5 Results for the RNA Motif Problem.
26.7 Summary and Conclusion.
27 The Organic Grid: Self-Organizing Computational Biology on Desktop Grids (Arjav J. Chakravarti).
27.2 Background and Related Work.
27.5 Future Directions.
28 FPGA Computing in Modern Bioinformatics (H. Simmler).
28.1 Parallel Processing Models.
28.2 Image Processing Task.
28.3 FPGA Hardware Accelerators.
28.4 Image Processing Example.
28.5 Case Study: Protein Structure Prediction.
29 Virtual Microscopy: Distributed Image Storage, Retrieval, Analysis, and Visualization (T. Pan).
29.3 Image Analysis.
29.4 Clinical Use.
29.6 Future Directions.