R Bioinformatics Cookbook: Use R and Bioconductor to perform RNAseq, genomics, data visualization, and bioinformatic analysis

Dan MacLean



Key Features

  • Apply modern R packages to handle biological data using real-world examples
  • Represent biological data with advanced visualizations suitable for research and publications
  • Handle real-world problems in bioinformatics such as next-generation sequencing, metagenomics, and automating analyses

Book Description

Handling biological data effectively requires an in-depth knowledge of machine learning techniques and computational skills, along with an understanding of how to use tools such as edgeR and DESeq. With the R Bioinformatics Cookbook, you'll explore all this and more, tackling common and not-so-common challenges in the bioinformatics domain using real-world examples.

This book will use a recipe-based approach to show you how to perform practical research and analysis in computational biology with R. You will learn how to effectively analyze your data with the latest tools in Bioconductor, ggplot, and tidyverse. The book will guide you through the essential tools in Bioconductor to help you understand and carry out protocols in RNAseq, phylogenetics, genomics, and sequence analysis. As you progress, you will get up to speed with how machine learning techniques can be used in the bioinformatics domain. You will gradually develop key computational skills such as creating reusable workflows in R Markdown and packages for code reuse.

By the end of this book, you'll have gained a solid understanding of the most important and widely used techniques in bioinformatic analysis and the tools you need to work with real biological data.

What you will learn

  • Employ Bioconductor to determine differential expression in RNAseq data
  • Run SAMtools and develop pipelines to find single nucleotide polymorphisms (SNPs) and indels
  • Use ggplot to create and annotate a range of visualizations
  • Query external databases with Ensembl to find functional genomics information
  • Execute large-scale multiple sequence alignment with DECIPHER to perform comparative genomics
  • Use d3.js and Plotly to create dynamic and interactive web graphics
  • Use k-nearest neighbors, support vector machines, and random forests to find groups and classify data

Who this book is for

This book is for bioinformaticians, data analysts, researchers, and R developers who want to address intermediate-to-advanced biological and bioinformatics problems by learning through a recipe-based approach. Working knowledge of the R programming language and basic knowledge of bioinformatics are prerequisites.




Dan MacLean has a Ph.D. in molecular biology from the University of Cambridge and gained postdoctoral experience in genomics and bioinformatics at Stanford University in California. Dan is now an Honorary Professor in the School of Computing Sciences at the University of East Anglia. He has worked in bioinformatics and plant pathogenomics, specializing in R and Bioconductor and developing analytical workflows in bioinformatics, genomics, genetics, image analysis, and proteomics at The Sainsbury Laboratory since 2006. Dan has developed and published software packages in R, Ruby, and Python with over 100,000 downloads combined.




  1. Performing Quantitative RNAseq
  2. Finding Genetic Variants With Next-Generation Sequence Data
  3. Analyzing Gene and Protein Sequence For Domains and Motifs
  4. Phylogenetic Analysis and Visualisation
  5. Metagenomics
  6. Proteomics from Spectrum to Annotation
  7. Producing Publication and Web-Ready Visualizations
  8. Working with Databases and Remote Data Sources
  9. Useful Statistical and Machine Learning Methods in Bioinformatics
  10. Programming and Analysis with Tidyverse
  11. Building reusable workflows with packages and objects for code re-use

