06 Sequencing Data

Why does FASTQC show unexpectedly high sequence duplication levels (PCR-duplicates)?

FASTQC is primarily designed to QC whole-genome shotgun sequencing data.
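To make the duplication metric concrete, the following minimal sketch estimates the fraction of reads that are exact sequence copies of an earlier read. It is a simplification for illustration and not FastQC's actual algorithm; the function name `duplication_fraction` is hypothetical.

```python
from collections import Counter


def duplication_fraction(sequences):
    """Return the fraction of reads that are exact copies of an earlier read."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each distinct sequence counts once as an "original";
    # every further copy of it is counted as a duplicate.
    return (total - len(counts)) / total


# Toy input: "ACGT" occurs three times, so 2 of the 5 reads are duplicates.
reads = ["ACGT", "ACGT", "ACGT", "TTGA", "GGCC"]
print(duplication_fraction(reads))
```

A sequence-only measure like this cannot tell PCR duplicates apart from independent fragments that happen to be identical, which are common in RNA-seq for highly expressed transcripts.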

When should I trim my Illumina reads and how should I do it?

Should I trim adapters from my Illumina reads? This depends on the objective of your experiments. If you are sequencing for counting applications such as differential gene expression (DGE) RNA-seq, ChIP-seq, or ATAC-seq, read trimming is generally no longer required when using modern aligners. For such studies, local aligners or pseudo-aligners should be used.

Where can I find the UMIs in the Tag-Seq data? When and how should I trim my Tag-Seq data? What is the low complexity stretch in the Tag-Seq data?

By default, we will generate Tag-Seq and Batch-Tag-Seq gene expression profiling data that incorporate Unique Molecular Identifiers (UMIs) in the sequence reads. This FAQ provides information on the usage of UMIs: https://dnatech.genomecenter.ucdavis.edu/faqs/should-i-remove-pcr-duplicates-from-my-rna-seq-data/ . Please note that the UMIs provide optional additional data analysis options; for many applications, frequently the UMI

Should I remove PCR duplicates from my RNA-seq data?

The short, generalized answer is, in most cases, NO. In some scenarios deduplication can be helpful, but only when using UMIs. Please see the details below. The vast majority of RNA-seq data are analyzed without duplicate removal. Duplicate removal is not possible for single-read data (without UMIs).
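As an illustration of why UMIs make duplicate removal meaningful, here is a minimal sketch that keeps one read per combination of mapping position and UMI. The record layout (dicts with the keys `name`, `chrom`, `pos`, `strand`, `umi`) and the function name are assumptions made for this example. Dedicated tools such as UMI-tools go further, for instance by tolerating sequencing errors within the UMI.

```python
def deduplicate_by_umi(alignments):
    """Keep the first read seen for each (chrom, pos, strand, UMI) combination.

    `alignments` is an iterable of dicts with the hypothetical keys
    'name', 'chrom', 'pos', 'strand' and 'umi'.
    """
    seen = set()
    kept = []
    for aln in alignments:
        key = (aln["chrom"], aln["pos"], aln["strand"], aln["umi"])
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


alignments = [
    {"name": "r1", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGTAC"},
    # Same position and same UMI as r1: treated as a PCR duplicate.
    {"name": "r2", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGTAC"},
    # Same position but a different UMI: an independent molecule, so it is kept.
    {"name": "r3", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "TTGCAA"},
]
print([aln["name"] for aln in deduplicate_by_umi(alignments)])
```

Without the UMI in the key, r3 would be discarded together with r2, even though it represents a separate molecule.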

Which data will I receive from the PacBio Sequel II sequencer? Will they have quality scores?

We will deliver the complete data set generated by the PacBio Sequel to you securely via Bioshare. For push-button type secondary analyses (combining data for up to 2 SMRT-cells e.g.

What data will I receive for Illumina sequencing? Demultiplexing, Trimming, Filtering

By default you will receive gzip compressed FASTQ data, as individual files for each sample (demultiplexed).
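The gzip-compressed FASTQ files can be read directly without decompressing them first. The sketch below assumes the common four-lines-per-record FASTQ layout; the function name `read_fastq` and the demo file name are made up for illustration.

```python
import gzip
import os
import tempfile


def read_fastq(path):
    """Yield (header, sequence, quality) tuples from a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            quality = handle.readline().rstrip("\n")
            yield header, sequence, quality


# Demo on a tiny file written on the fly, so the example runs without real data.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.fastq.gz")
    with gzip.open(path, "wt") as out:
        out.write("@read1\nACGT\n+\nIIII\n")
    for header, sequence, quality in read_fastq(path):
        print(header, sequence, quality)
```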

How do I download my sequencing data?

We deliver sequencing data via two portals: SLIMS for Illumina data, and BioShare for PacBio and Nanopore data. Both portals offer secure access to the data and support several download protocols.

Which strand is sequenced for my strand-specific RNA-seq data?

For conventional RNA-seq after poly-A enrichment or ribodepletion, we generate strand-specific RNA-seq libraries by default. Strand-specific (also known as stranded or directional) RNA-seq libraries substantially enhance the value of an RNA-seq experiment. They add information on the originating strand and thus can precisely delineate the boundaries of transcripts in regions with genes on opposite strands. There are several ways to accomplish strand-specificity.

My FASTQ file contains some "N"s. Is there a problem with my data?

Please note that when opening an Illumina FASTQ file, the first few thousand reads are expected to be of comparatively low quality and to frequently contain "N"s. An "N" means that the Illumina software was not able to make a base call for that position. The reads at the beginning and end of the sequence data files originate from the edges of the flowcells, where imaging is more difficult; these reads therefore show below-average quality.
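Because the top of the file is not representative, it can be more informative to measure the share of N-containing reads over the whole data set. A minimal sketch, with a hypothetical function name and toy input:

```python
def fraction_reads_with_n(sequences):
    """Return the fraction of reads that contain at least one 'N' base call."""
    total = 0
    with_n = 0
    for seq in sequences:
        total += 1
        if "N" in seq.upper():
            with_n += 1
    return with_n / total if total else 0.0


# Toy input: 2 of the 4 reads contain at least one N.
reads = ["ACGT", "ANGT", "NNNN", "GGCC"]
print(fraction_reads_with_n(reads))
```

The function accepts any iterable of sequence strings, so it can be fed from a streaming FASTQ reader without loading the file into memory.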

How should the miRNA/small-RNA data be trimmed?

We are using the PerkinElmer NEXTflex™ Small RNA-Seq kit for the generation of microRNA and small RNA-seq libraries because it significantly reduces sequence-specific biases in the library preparation. For this purpose, the adapter oligonucleotides contain 4 randomized bases at the ligation junctions. These randomized bases should be removed by trimming before mapping the sequence reads.
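The trimming step can be sketched as follows. This example assumes that the 3' adapter has already been clipped by a separate tool, so that the randomized bases are the first 4 and last 4 bases of the remaining read. The function name `trim_random_bases` and the example sequences are made up for illustration.

```python
def trim_random_bases(sequence, quality, n=4):
    """Remove n bases from both ends of an adapter-clipped read.

    Assumes the 3' adapter was removed beforehand, so the randomized
    bases sit at the very start and very end of `sequence`.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    if len(sequence) <= 2 * n:
        return "", ""  # nothing would remain of the insert
    return sequence[n:-n], quality[n:-n]


# 4 random bases + 22-base insert + 4 random bases (toy example)
read = "ACGT" + "TGAGGTAGTAGGTTGTATAGTT" + "CCAA"
qual = "I" * len(read)
insert, insert_qual = trim_random_bases(read, qual)
print(insert)
```

Reads that become empty after trimming should be dropped before mapping.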

DNA Technologies Core
