04 Library Preparation and QC

Why should I avoid technical replicates and pseudoreplicates?

When designing RNA-seq or ChIP-seq experiments, it is very important to avoid technical replicates and pseudo-biological replicates, as they lead to spurious results (for RNA-seq, e.g., spurious differential gene expression (DGE) data). Pseudo-biological replicates are created frequently, especially in in vitro studies, and can often lead to hundreds of false-positive differentially expressed genes.
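The inflation of false positives can be illustrated with a small simulation. The following is a minimal sketch, not part of the original protocol notes. It assumes a hypothetical design with two true biological samples per group, each split into three pseudoreplicates, no real expression difference between the groups, and arbitrary variance values. The function names and all parameters are illustrative.

```python
import random
import statistics

# Two-sided 5% critical values of Student's t for the two designs below
# (df = 2 for 2 vs 2 samples, df = 10 for 6 vs 6 values).
T_CRIT = {2: 4.303, 10: 2.228}


def pooled_t(a, b):
    """Equal-variance two-sample t statistic for two equally sized groups."""
    n = len(a)
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    standard_error = (pooled_var * 2 / n) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def simulate(n_genes=2000, bio_sd=1.0, tech_sd=0.2, seed=1):
    """Return (false positive rate with true replicates, with pseudoreplicates).

    Assumed design: 2 biological samples per group, 3 pseudoreplicates each,
    and no true difference between groups, so every call is a false positive.
    """
    n_bio, n_pseudo = 2, 3
    rng = random.Random(seed)
    fp_true = fp_pseudo = 0
    for _ in range(n_genes):
        per_sample, per_pseudoreplicate = [], []
        for _group in range(2):
            sample_means, all_values = [], []
            for _sample in range(n_bio):
                level = rng.gauss(0.0, bio_sd)  # biological variation
                reps = [level + rng.gauss(0.0, tech_sd) for _ in range(n_pseudo)]
                sample_means.append(statistics.mean(reps))
                all_values.extend(reps)
            per_sample.append(sample_means)
            per_pseudoreplicate.append(all_values)
        # Correct analysis: one value per biological sample (df = 2).
        if abs(pooled_t(*per_sample)) > T_CRIT[2]:
            fp_true += 1
        # Pseudoreplicated analysis: all 6 values treated as independent (df = 10).
        if abs(pooled_t(*per_pseudoreplicate)) > T_CRIT[10]:
            fp_pseudo += 1
    return fp_true / n_genes, fp_pseudo / n_genes


if __name__ == "__main__":
    true_rate, pseudo_rate = simulate()
    print(f"true replicates: {true_rate:.3f}, pseudoreplicates: {pseudo_rate:.3f}")
```

Under these assumptions, the analysis based on one value per biological sample stays close to the nominal 5% error rate, while treating pseudoreplicates as independent samples flags a much larger share of genes even though no gene truly differs.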

How do I amplify Illumina sequencing libraries?

If the library preparation did not generate enough library material to load a sequencer, Illumina libraries can be amplified with a universal PCR protocol. While amplification can rescue experiments, it is worth considering on a per-project basis whether the library preparation should be repeated instead. For quantitative experiments, it is generally recommended to treat all libraries the same throughout the pipeline. Insufficient yields in the initial library preparation can be a sign of sample contamination, processing errors, etc.
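To judge how many additional cycles a rescue amplification might involve, a rough estimate can help. The sketch below is an illustration only. It assumes idealized exponential amplification in which each cycle multiplies the product by (1 + efficiency). The function name, the default efficiency, and the example amounts are hypothetical and do not come from the protocol itself.

```python
import math


def extra_pcr_cycles(current_ng, required_ng, efficiency=0.9):
    """Estimate the PCR cycles needed to grow current_ng to required_ng.

    Assumes each cycle multiplies the amount by (1 + efficiency);
    efficiency=1.0 would be perfect doubling. Real reactions plateau,
    so this is a lower bound rather than a prediction.
    """
    if current_ng <= 0 or required_ng <= 0:
        raise ValueError("amounts must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    if current_ng >= required_ng:
        return 0
    cycles = math.log(required_ng / current_ng) / math.log(1 + efficiency)
    # The small epsilon guards against floating-point results such as 3.0000000001.
    return math.ceil(cycles - 1e-9)


if __name__ == "__main__":
    # Hypothetical example: 1 ng in hand, 10 ng wanted for loading.
    print(extra_pcr_cycles(1, 10))
```

Because every extra cycle is a deviation from how the other libraries in a quantitative project were treated, a high cycle estimate is one more argument for repeating the library preparation instead.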

How do I remove long fragments from a library?

AMPure XP/SPRI bead "upper cut" protocol to remove double-stranded DNA fragments over 670 bases: Bead-based size selections are the preferred method as they can be applied to your samples in a high-throughput fashion. Please note:
  • Bead-based size selection cannot carry out precise "cuts". Thus, you will also lose some of the library molecules in the size ranges that you intend to keep.

Suboptimal RNA samples - How much RNA sample to start with?

RNA-seq experiments are best carried out with samples of consistent RNA integrity and input amounts. However, some RNA-seq samples can be so limited and irreplaceable that experiments have to be carried out with less than the recommended input amounts. Similar complications can occur if some of the samples are significantly more degraded than others.

Do you recommend PCR-free sequencing library preparations?

Is PCR-free library preparation still advantageous? In general, the original concerns about library PCR amplification (presented in papers from 2008) are no longer very relevant. This is due to the use of modern polymerases designed for complex samples, such as Kapa HiFi, NEB Q5, or QIAseq HiFi polymerase. The previous "standard", the high-fidelity Phusion enzyme, had tremendous disadvantages for complex samples (Quail et al. 2012 Optimal enzymes for amplifying sequencing libraries.

How do I size select libraries for the HiSeq 4000 with beads?

The HiSeq 4000 is the most demanding Illumina sequencer with regard to library insert sizes. Nevertheless, the majority of existing Illumina sequencing libraries can be sequenced as is on the HiSeq 4000: the libraries should have no visible adapter dimers, and the library fragments should be mostly shorter than 670 bases. Other sequencing libraries can be made compatible by size selection (removing both adapter-dimer traces and fragments of more than 670 bases, if the latter are numerou
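The two criteria named above (no visible adapter dimers, fragments mostly shorter than 670 bases) can be checked against a fragment-size profile. The following is a minimal sketch under stated assumptions. It takes the profile as a list of (size, abundance) pairs, for example exported from an electrophoresis trace. The adapter-dimer size window and the cutoff used for "mostly" are hypothetical values to adjust, and the function names are illustrative.

```python
def library_size_report(profile, max_len=670, dimer_range=(100, 150)):
    """Summarise a fragment-size profile given as (size, abundance) pairs.

    max_len is the 670-base limit from the text; dimer_range is an assumed
    size window for adapter dimers and depends on the adapters used.
    """
    total = sum(abundance for _, abundance in profile)
    if total <= 0:
        raise ValueError("profile contains no signal")
    low, high = dimer_range
    long_signal = sum(a for size, a in profile if size > max_len)
    dimer_signal = sum(a for size, a in profile if low <= size <= high)
    return {
        "fraction_long": long_signal / total,
        "fraction_dimer": dimer_signal / total,
    }


def needs_size_selection(report, max_fraction_long=0.1, max_fraction_dimer=0.0):
    """Flag a library when either fraction exceeds its (assumed) cutoff."""
    return (
        report["fraction_long"] > max_fraction_long
        or report["fraction_dimer"] > max_fraction_dimer
    )


if __name__ == "__main__":
    # Hypothetical profile: most signal at 300-450 bases, a little above 670.
    report = library_size_report([(300, 60), (450, 35), (800, 5)])
    print(report, needs_size_selection(report))
```

The cutoffs are deliberately exposed as parameters, since what counts as "numerous" long fragments is a judgment call for each project.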

When do you recommend 3'-Tag RNA-seq?

3’Tag-Seq is a protocol to generate low-cost and low-noise gene expression profiling data. The protocol is also known as TagSeq, 3’Tag RNA-Seq, Digital RNA-seq, and Quant-Seq (please note that most of these names have previously also been used for a variety of other protocols). In contrast to traditional RNA-Seq, which generates sequencing libraries from whole transcripts, 3’Tag-Seq generates only a single initial library molecule per transcript, complementary to 3′-end sequences.
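One consequence of generating a single library molecule per transcript is that the expected read share of a gene no longer depends on transcript length. The sketch below illustrates this with a deliberately simple model. It assumes that whole-transcript libraries yield fragments in proportion to copies times length, and that 3'-tag libraries yield fragments in proportion to copies only. The transcript names and numbers are hypothetical.

```python
def expected_read_share(transcripts, protocol):
    """Return each transcript's expected share of reads under a simple model.

    transcripts: dict mapping name -> (copies, length_nt)
    protocol: "whole" (share proportional to copies * length) or
              "3tag"  (one library molecule per transcript, so share is
                       proportional to copies only)
    """
    if protocol == "whole":
        weights = {name: copies * length for name, (copies, length) in transcripts.items()}
    elif protocol == "3tag":
        weights = {name: float(copies) for name, (copies, _length) in transcripts.items()}
    else:
        raise ValueError("protocol must be 'whole' or '3tag'")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("no transcripts with positive weight")
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    # Two hypothetical transcripts with equal copy number but different length.
    example = {"short_gene": (100, 1000), "long_gene": (100, 4000)}
    print(expected_read_share(example, "whole"))
    print(expected_read_share(example, "3tag"))
```

In this model the two equally abundant transcripts receive equal shares under the 3'-tag protocol, whereas the longer transcript dominates under the whole-transcript protocol.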

Which strand is sequenced for my strand-specific RNA-seq data?

Conventional RNA-seq after poly-A enrichment or ribodepletion: By default, we generate strand-specific RNA-seq libraries. Strand-specific (also known as stranded or directional) RNA-seq libraries substantially enhance the value of an RNA-seq experiment. They add information on the originating strand and thus can precisely delineate the boundaries of transcripts in regions with genes on opposite strands. There are several ways to accomplish strand-specificity.
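Which strand was sequenced can also be checked empirically by comparing the alignment strand of reads with the annotated strand of the genes they overlap. The following is a minimal sketch of that idea, not a description of any particular tool. It assumes single-end reads (or read 1 only) whose strand and overlapping gene strand have already been extracted from the alignments. The function name, the labels, and the cutoff are hypothetical.

```python
def infer_strandedness(pairs, threshold=0.8):
    """Classify a library from (read_strand, gene_strand) pairs.

    Each strand is "+" or "-". Returns "forward" if most reads align to
    the annotated gene strand, "reverse" if most align to the opposite
    strand, and "unstranded" otherwise. threshold is an assumed cutoff.
    """
    if not pairs:
        raise ValueError("no read/gene strand pairs supplied")
    same_strand = sum(1 for read_strand, gene_strand in pairs if read_strand == gene_strand)
    fraction_same = same_strand / len(pairs)
    if fraction_same >= threshold:
        return "forward"
    if fraction_same <= 1 - threshold:
        return "reverse"
    return "unstranded"


if __name__ == "__main__":
    # Hypothetical sample: 9 of 10 reads align opposite to the gene strand.
    sample = [("-", "+")] * 9 + [("+", "+")]
    print(infer_strandedness(sample))
```

Reads overlapping genes on both strands would need to be excluded before building the pairs, since they cannot be assigned to one orientation.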