Ballgown bridges the gap between transcriptome assembly and expression analysis

AC Frazee, G Pertea, AE Jaffe, B Langmead… - Nature …, 2015 - nature.com
Nature Biotechnology, 2015
To the Editor: Analysis of raw reads from RNA sequencing (RNA-seq) makes it possible to reconstruct complete gene structures, including multiple splice variants, without relying on previously established annotations¹⁻³. Downstream statistical modeling of the summarized gene- or transcript-level expression data that these pipelines output is facilitated by the Bioconductor project, which provides open-source tools for the analysis of high-throughput genomics data⁴. However, the outputs of upstream processing tools are often aggregated across samples or are not in a format that is readily compatible with downstream Bioconductor packages. This gap has slowed rigorous transcript-level statistical analysis of expression quantitative trait locus (eQTL) studies, time-course experiments, designs with continuous covariates and confounded experimental designs, and has led to considerable controversy in the analysis of population-level RNA-seq data⁵. In this Correspondence, we report the development of two pieces of software, Tablemaker and Ballgown, that bridge the gap between transcriptome assembly and fast, flexible differential expression analysis (Supplementary Fig. 1).
Tablemaker uses a GTF file (the standard output format of transcriptome assemblers) and spliced read alignments to produce files that explicitly specify the structure of assembled transcripts, mappings from exons and splice junctions to transcripts, and several measures of feature expression, including fragments per kilobase of transcript per million reads sequenced (FPKM) and average per-base coverage
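The exon-to-transcript mapping and the two expression measures named above can be illustrated with a small Python sketch. It is not Tablemaker's implementation, which is written against real GTF files and spliced alignments; here exons and aligned read blocks are assumed to be plain 1-based, inclusive (start, end) tuples on a single reference, as in GTF coordinates, and a spliced read is assumed to be supplied as its separate aligned blocks. The function names (`exons_by_transcript`, `transcript_length`, `fpkm`, `average_per_base_coverage`) are hypothetical. FPKM uses the conventional definition, fragments × 10⁹ / (transcript length in bp × total fragments).

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GTF

# Pulls the transcript_id value out of the GTF attribute column (column 9).
_TX_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(gtf_lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Map each transcript_id to its sorted exon intervals (hypothetical helper)."""
    mapping: Dict[str, List[Interval]] = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records define transcript structure here
        match = _TX_ID.search(fields[8])
        if match:
            mapping[match.group(1)].append((int(fields[3]), int(fields[4])))
    return {tx: sorted(exons) for tx, exons in mapping.items()}


def transcript_length(exons: List[Interval]) -> int:
    """Total exonic length in bp (exons of one transcript do not overlap)."""
    return sum(end - start + 1 for start, end in exons)


def fpkm(fragments: int, length_bp: int, total_fragments: int) -> float:
    """Fragments per kilobase of transcript per million fragments."""
    if length_bp <= 0 or total_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (length_bp * total_fragments)


def average_per_base_coverage(exons: List[Interval], blocks: List[Interval]) -> float:
    """Mean read depth over the transcript's exonic positions.

    Each aligned block contributes 1 to the depth of every exonic
    position it covers; positions outside the exons are ignored.
    """
    positions = {p for start, end in exons for p in range(start, end + 1)}
    total_depth = 0
    for start, end in blocks:
        total_depth += sum(1 for p in range(start, end + 1) if p in positions)
    return total_depth / len(positions)
```

For a 200 bp transcript with 50 assigned fragments in a library of 10 million fragments, `fpkm(50, 200, 10_000_000)` gives 25.0. Dividing by both length and library size is what lets FPKM values be compared across transcripts and samples, whereas average per-base coverage normalizes only by length.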