Statistics and Data Analysis for Microarrays Using R and Bioconductor, Second Edition

Series: Chapman & Hall/CRC Mathematical & Computational Biology.

Richly illustrated in color, Statistics and Data Analysis for Microarrays Using R and Bioconductor, Second Edition provides a clear and rigorous description of powerful analysis techniques and algorithms for mining and interpreting biological information. Omitting tedious details, heavy formalisms, and cryptic notations, the text takes a hands-on, example-based approach that teaches students the basics of R and microarray technology as well as how to choose and apply the proper data analysis tool to specific problems.

New to the Second Edition

Completely updated and double the size of its predecessor, this timely second edition replaces the commercial software used in the first edition with the open-source R and Bioconductor environments. Fourteen new chapters cover such topics as the basic mechanisms of the cell, reliability and reproducibility issues in DNA microarrays, basic statistics and linear models in R, experiment design, multiple comparisons, quality control, data pre-processing and normalization, Gene Ontology analysis, pathway analysis, and machine learning techniques. Methods are illustrated with toy examples and real data, and the R code for all routines is available on an accompanying CD-ROM.

With all the necessary prerequisites included, this best-selling book guides students from very basic notions to advanced analysis techniques in R and Bioconductor. The first half of the text presents an overview of microarrays and the statistical elements that form the building blocks of any data analysis. The second half introduces the techniques most commonly used in the analysis of microarray data.

Table of Contents

Introduction

Bioinformatics — An Emerging Discipline

The Cell and Its Basic Mechanisms

The Cell

The Building Blocks of Genomic Information

Expression of Genetic Information

The Need for High-Throughput Methods

Microarrays

Microarrays — Tools for Gene Expression Analysis

Fabrication of Microarrays

Applications of Microarrays

Challenges in Using Microarrays in Gene Expression Studies

Sources of Variability

Reliability and Reproducibility Issues in DNA Microarray Measurements

Introduction

What Is Expected from Microarrays?

Basic Considerations of Microarray Measurements

Sensitivity

Accuracy

Reproducibility

Cross Platform Consistency

Sources of Inaccuracy and Inconsistencies in Microarray Measurements

The MicroArray Quality Control (MAQC) Project

Image Processing

Introduction

Basic Elements of Digital Imaging

Microarray Image Processing

Image Processing of cDNA Microarrays

Image Processing of Affymetrix Arrays

Introduction to R

Introduction to R

The Basic Concepts

Data Structures and Functions

Other Capabilities

The R Environment

Installing Bioconductor

Graphics

Control Structures in R

Programming in R vs C/C++/Java

Bioconductor: Principles and Illustrations

Overview

The Portal

Some Explorations and Analyses

Elements of Statistics

Introduction

Some Basic Concepts

Elementary Statistics

Degrees of Freedom

Probabilities

Bayes’ Theorem

Testing for (or Predicting) a Disease

Probability Distributions

Probability Distributions

Central Limit Theorem

Are Replicates Useful?

Basic Statistics in R

Introduction

Descriptive Statistics in R

Probabilities and Distributions in R

Central Limit Theorem

Statistical Hypothesis Testing

Introduction

The Framework

Hypothesis Testing and Significance

"I Do Not Believe God Does Not Exist"

An Algorithm for Hypothesis Testing

Errors in Hypothesis Testing

Classical Approaches to Data Analysis

Introduction

Tests Involving a Single Sample

Tests Involving Two Samples

Analysis of Variance (ANOVA)

Introduction

One-Way ANOVA

Two-Way ANOVA

Quality Control

Linear Models in R

Introduction and Model Formulation

Fitting Linear Models in R

Extracting Information from a Fitted Model: Testing Hypotheses and Making Predictions

Some Limitations of the Linear Models

Dealing with Multiple Predictors and Interactions in the Linear Models, and Interpreting Model Coefficients

Experiment Design

The Concept of Experiment Design

Comparing Varieties

Improving the Production Process

Principles of Experimental Design

Guidelines for Experimental Design

A Short Synthesis of Statistical Experiment Designs

Some Microarray Specific Experiment Designs

Multiple Comparisons

Introduction

The Problem of Multiple Comparisons

A More Precise Argument

Corrections for Multiple Comparisons

Corrections for Multiple Comparisons in R

Analysis and Visualization Tools

Introduction

Box Plots

Gene Pies

Scatter Plots

Volcano Plots

Histograms

Time Series

Time Series Plots in R

Principal Component Analysis (PCA)

Independent Component Analysis (ICA)

Cluster Analysis

Introduction

Distance Metric

Clustering Algorithms

Partitioning around Medoids (PAM)

Biclustering

Clustering in R

Quality Control

Introduction

Quality Control for Affymetrix Data

Quality Control of Illumina Data

Data Pre-Processing and Normalization

Introduction

General Pre-Processing Techniques

Normalization Issues Specific to cDNA Data

Normalization Issues Specific to Affymetrix Data

Other Approaches to the Normalization of Affymetrix Data

Useful Pre-Processing and Normalization Sequences

Normalization Procedures in R

Batch Pre-Processing

Normalization Functions and Procedures for Illumina Data

Methods for Selecting Differentially Regulated Genes

Introduction

Criteria

Fold Change

Unusual Ratio

Hypothesis Testing, Corrections for Multiple Comparisons, and Resampling

ANOVA

Noise Sampling

Model-Based Maximum Likelihood Estimation Methods

Affymetrix Comparison Calls

Significance Analysis of Microarrays (SAM)

A Moderated t-Statistic

Other Methods

Reproducibility

Selecting Differentially Expressed (DE) Genes in R

The Gene Ontology (GO)

Introduction

The Need for an Ontology

What Is the Gene Ontology (GO)?

What Does GO Contain?

Access to GO

Other Related Resources

Functional Analysis and Biological Interpretation of Microarray Data

Over-Representation Analysis (ORA)

Onto-Express

Functional Class Scoring

The Gene Set Enrichment Analysis (GSEA)

Uses, Misuses, and Abuses in GO Profiling

Introduction

"Known Unknowns"

Which Way Is Up?

Negative Annotations

Common Mistakes in Functional Profiling

Using a Custom Level of Abstraction through the GO Hierarchy

Correlation between GO Terms

GO Slims and Subsets

A Comparison of Several Tools for Ontological Analysis

Introduction

Existing Tools for Ontological Analysis

Comparison of Existing Functional Profiling Tools

Drawbacks and Limitations of the Current Approach

Focused Microarrays — Comparison and Selection

Introduction

Criteria for Array Selection

Onto-Compare

Some Comparisons

ID Mapping Issues

Introduction

Name Space Issues in Annotation Databases

A Comparison of Some ID Mapping Tools

Pathway Analysis

Terms and Problem Definition

Over-Representation and Functional Class Scoring Approaches in Pathway Analysis

An Approach for the Analysis of Metabolic Pathways

An Impact Analysis of Signaling Pathways

Variations on the Impact Analysis Theme

Pathway Guide

Kinetic Models vs. Impact Analysis

Conclusions

Data Sets and Software Availability

Machine Learning Techniques

Introduction

Main Concepts and Definitions

Supervised Learning

Practicalities Using R

The Road Ahead

What Next?

References

A Summary appears at the end of each chapter.

Reviews

Praise for the First Edition

The book by Draghici is an excellent choice to be used as a textbook for a graduate-level bioinformatics course. This well-written book with two accompanying CD-ROMs will create much-needed enthusiasm among statisticians.

Journal of Statistical Computation and Simulation, Vol. 74

I really like Draghici's book. As the author explains in the Preface, the book is intended to serve both the statistician who knows very little about DNA microarrays and the biologist who has no expertise in data analysis. The author lays out a study plan for the statistician that excludes 5 of the 17 chapters (4-8). These chapters present the basics of statistical distributions, estimation, hypothesis testing, ANOVA, and experimental design. What that leaves for the statistician is the three-chapter primer on microarrays and image processing, plus all of the data analysis tools specific to the microarray situation. … it includes two CDs with trial versions of several specialised software packages. Anyone who uses microarray data should certainly own a copy.

Technometrics, Vol. 47, No. 1, February 2005

Author/Editor Biography

Sorin Draghici is the Robert J. Sokol MD Endowed Chair in Systems Biology in the Department of Obstetrics and Gynecology, a professor in the Department of Clinical and Translational Science and the Department of Computer Science, and head of the Intelligent Systems and Bioinformatics Laboratory at Wayne State University. He is also chief of the Bioinformatics and Data Analysis Section in the Perinatology Research Branch of the National Institute of Child Health and Human Development. A senior member of the IEEE, Dr. Draghici is an editor of IEEE/ACM Transactions on Computational Biology and Bioinformatics, the Journal of Biomedicine and Biotechnology, and the International Journal of Functional Informatics and Personalized Medicine. He earned a Ph.D. in computer science from the University of St. Andrews.

Customers who bought Statistics and Data Analysis for Microarrays Using R and Bioconductor, Second Edition also bought:

  • Sparse Modeling: Theory, Algorithms, and Applications

  • An Introduction to Statistics and Data Analysis for Bioinformatics using R

  • Multivariate Survival Analysis and Competing Risks

  • Flexible Imputation of Missing Data

  • Statistical Computing in C++ and R