05 Sequencing

What are UMIs and why are they used in high-throughput sequencing?

UMI stands for Unique Molecular Identifier. UMIs are complex indices added to sequencing libraries before any PCR amplification step, which allows PCR duplicates to be identified accurately during bioinformatic analysis. UMIs are also known as "Molecular Barcodes" or "Random Barcodes". The idea seems to have been first implemented in an iCLIP protocol (König et al.
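To illustrate how UMIs let duplicates be identified, here is a minimal sketch in Python. It assumes reads have already been aligned and that each read carries its UMI; the dictionary keys and the function name are hypothetical and chosen only for this example. Reads that share mapping position, strand and UMI are treated as copies of the same original molecule.

```python
from collections import defaultdict


def collapse_umi_duplicates(reads):
    """Keep one read per (chromosome, position, strand, UMI) group.

    `reads` is an iterable of dicts with the hypothetical keys
    'name', 'chrom', 'pos', 'strand' and 'umi'.
    """
    groups = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["umi"])
        groups[key].append(read)
    # Keep the first read seen in each group as its representative.
    return [members[0] for members in groups.values()]


example_reads = [
    {"name": "r1", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGTAC"},
    # Same position and same UMI as r1: treated as a PCR duplicate.
    {"name": "r2", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGTAC"},
    # Same position but a different UMI: a distinct original molecule.
    {"name": "r3", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "TTGCAA"},
]
unique_reads = collapse_umi_duplicates(example_reads)
```

This sketch only matches UMIs exactly. Dedicated tools such as UMI-tools additionally account for sequencing errors within the UMI itself, which the example above does not attempt.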

How should I submit the barcode sequence information? In which direction will they be sequenced?

Depending on the sequencer, and in the case of the HiSeq 4000 even on the run type (single-end or paired-end), Illumina uses different approaches to sequence the indices. Please find detailed information here: indexed-sequencing-overview-guide-15057455-04-Illumina-pages1to8 The correct orientation of the barcode sequence further depends on the way the barcodes are added to the library. The gist of it is: For barcodes added to Illumina libraries
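Because an index may be read either as designed or as its reverse complement, it can help to have both orientations at hand when filling in a submission form. The following is a small Python sketch; the function name and the example index sequence are illustrative assumptions, and the code does not decide which orientation a particular sequencer or run type requires.

```python
# Translation table for DNA bases (upper and lower case, N kept as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical 8-nt index sequence, written 5' to 3' as designed.
index_as_designed = "ATTACTCG"
index_reverse_complement = reverse_complement(index_as_designed)
print(index_as_designed, index_reverse_complement)
```

Applying the function twice returns the original sequence, which is a quick sanity check when converting a whole sample sheet.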

What read numbers/yields can I expect from Illumina sequencing?

The Illumina specifications are based on the Illumina PhiX control library. Similar or better yields can be expected for other high-complexity libraries (e.g. genomic or RNA-seq libraries) if they are within the recommended insert size range and do not have an extreme average GC content. Yields can vary depending on library type.

My libraries show peaks larger than expected. Can I still sequence these PCR-bubbles?

PCR-amplified sequencing libraries frequently display library molecules that appear to be about twice the expected size or even bigger. In most cases, this phenomenon is caused by over-amplification of the libraries. These PCR artifacts occur when the PCR reaction runs out of essential reagents; in most cases it is the PCR primers that are exhausted. If primers are no longer available, the PCR products will anneal to each other (the sequencing adapter sequences will be by far the most common sequences available).

How should I prepare and sequence samples for ChIP-seq?

If we prepare the sequencing libraries, we require ChIP-seq DNA samples to be submitted after reversal of the cross-linking. Ideally, the fragment lengths should be between 100 and 300 bp, which will result in the tightest peaks; fragments should preferably be under 500 bp in any case. For ChIP-seq it is common to start with DNA samples with concentrations too low to measure.

In which form will I receive the data?

All sequencing data will be available for secure download via our SLIMS server. Illumina sequencing data will be delivered as compressed FASTQ files. By default the data will be de-multiplexed (i.e. split according to sample). Each SLIMS directory will further contain a file with the de-multiplexing metrics and a file listing md5 checksums for all the FASTQ files.
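After downloading, the md5 checksums can be used to confirm that the FASTQ files arrived intact. The Python sketch below assumes the checksum file uses the common md5sum layout (one line per file: hex digest, whitespace, file name) and sits in the same directory as the FASTQ files; the actual file name and layout on the server may differ, and the function names are chosen for this example only.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file):
    """Return {file name: True/False} for each entry in the checksum file.

    Assumes md5sum-style lines: '<hex digest>  <file name>'.
    """
    checksum_file = Path(checksum_file)
    results = {}
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        # md5sum marks binary mode with a leading '*' before the name.
        name = name.strip().lstrip("*")
        actual = md5_of_file(checksum_file.parent / name)
        results[name] = actual == expected.lower()
    return results


# Example call (hypothetical file name):
# print(verify_checksums("downloaded_run/md5sums.txt"))
```

Any file reported as False should be downloaded again before analysis.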