06 Sequencing Data

When should I trim my Illumina reads and how should I do it?

Should I trim adapters from my Illumina reads? This depends on the objective of your experiment. If you are sequencing for counting applications such as differential gene expression (DGE) RNA-seq analysis, ChIP-seq, or ATAC-seq, read trimming is generally no longer required when using modern aligners. For such studies, local aligners or pseudo-aligners should be used. Modern local aligners such as STAR, BWA-MEM, and HISAT2 "soft-clip" non-matching sequences, so adapter sequence at the read ends is excluded from the alignment without prior trimming.
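To make "soft-clipping" concrete, here is a minimal Python sketch that reads the CIGAR string of a SAM/BAM alignment record. `S` operations mark read bases that stay in the record but are not part of the alignment, for example adapter read-through at the 3' end of a short insert. The function name `soft_clipped_bases` is hypothetical and only illustrates what the aligner reports.

```python
import re

# CIGAR operations as defined in the SAM format; 'S' = soft clip, 'H' = hard clip.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar: str) -> tuple[int, int]:
    """Return (leading, trailing) soft-clipped base counts for a SAM CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips (H) may sit outside soft clips, so ignore them at the ends.
    inner = [(n, op) for n, op in ops if op != "H"]
    if not inner:
        return 0, 0
    leading = inner[0][0] if inner[0][1] == "S" else 0
    # Guard against counting a single all-'S' operation twice.
    trailing = inner[-1][0] if len(inner) > 1 and inner[-1][1] == "S" else 0
    return leading, trailing
```

For example, a 100-bp read with the CIGAR `76M24S` had its last 24 bases (possibly adapter) soft-clipped, and only the first 76 bases contribute to the alignment.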

Where can I find the UMIs in the Tag-Seq data? When and how should I trim my Tag-Seq data? What is the low complexity stretch in the Tag-Seq data?

By default, we generate Tag-Seq and Batch-Tag-Seq gene expression profiling data that incorporate Unique Molecular Identifiers (UMIs) in the sequence reads. (This FAQ provides information on the usage of UMIs: https://dnatech.genomecenter.ucdavis.edu/faqs/should-i-remove-pcr-duplicates-from-my-rna-seq-data/ ) Please note that the UMIs provide optional, additional data analysis options; for many applications, the UMI information…
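A common first step when using UMIs is to move them out of the read sequence and into the read name, so that they survive alignment and the UMI bases are not mapped. The Python sketch below assumes, purely for illustration, a 6-nt UMI at the start of read 1 followed by a 4-nt spacer; `UMI_LEN` and `SPACER_LEN` are placeholder parameters, not confirmed properties of our Tag-Seq data. Appending the UMI to the read name with `_` is similar to the convention used by UMI-tools.

```python
from typing import NamedTuple

# ASSUMPTION: placeholder read structure; check the actual Tag-Seq read layout.
UMI_LEN = 6
SPACER_LEN = 4


class FastqRecord(NamedTuple):
    header: str
    seq: str
    qual: str


def extract_umi(rec: FastqRecord, umi_len: int = UMI_LEN,
                spacer_len: int = SPACER_LEN) -> FastqRecord:
    """Move the leading UMI into the read name and drop UMI + spacer from the read."""
    umi = rec.seq[:umi_len]
    cut = umi_len + spacer_len
    name, _, comment = rec.header.partition(" ")
    # Attach the UMI to the read name (before any header comment).
    new_header = f"{name}_{umi}" + (f" {comment}" if comment else "")
    return FastqRecord(new_header, rec.seq[cut:], rec.qual[cut:])
```

For a read `ACGTACTATAGGGCCC`, this sketch stores `ACGTAC` as the UMI, drops the spacer `TATA`, and keeps `GGGCCC` (with the matching quality string) for mapping.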

Should I remove PCR duplicates from my RNA-seq data?

Should I remove PCR duplicates from my RNA-seq data? The short, generalized answer is: in most cases, NO. In some scenarios de-duplication can be helpful, but only when using UMIs. Please see the details below. The vast majority of RNA-seq data are analyzed without duplicate removal. Duplicate removal is not possible for single-read data (without UMIs).
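The reason UMIs matter: with single reads, many independent molecules from a highly expressed gene start at the same position, so collapsing reads by position alone would discard real signal. With UMIs, only reads that share both the position and the UMI are treated as PCR copies. The minimal Python sketch below shows this grouping logic; the dictionary keys (`chrom`, `pos5`, `strand`, `umi`, `mapq`) are hypothetical, and unlike dedicated tools such as UMI-tools, it uses exact UMI matching and does not correct UMI sequencing errors.

```python
def dedup_by_umi(alignments):
    """Keep one alignment per (chrom, strand, 5' position, UMI) group.

    `alignments` is an iterable of dicts with the hypothetical keys
    'name', 'chrom', 'pos5', 'strand', 'umi' and 'mapq'.
    """
    best = {}
    for aln in alignments:
        key = (aln["chrom"], aln["strand"], aln["pos5"], aln["umi"])
        # Keep the highest-MAPQ representative of each duplicate group.
        if key not in best or aln["mapq"] > best[key]["mapq"]:
            best[key] = aln
    return list(best.values())
```

Three reads starting at the same position, two with UMI `AAAAAA` and one with `CCCCCC`, collapse to two molecules rather than one.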

What data will I receive for Illumina sequencing? Demultiplexing, Trimming, Filtering

By default, you will receive gzip-compressed FASTQ data as individual files for each sample (demultiplexed). Demultiplexing is included in the service if you provide the barcode sequences on the submission form. The files will be available for download from our secure SLIMS server. You will receive only the reads from clusters passing the Illumina quality filter, also called the Illumina chastity filter…
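The delivered files can be read directly without decompressing them first. The Python sketch below streams a `.fastq.gz` file, counts the reads, and tallies the filter flag that Illumina software (Casava 1.8 and later) writes into the read header comment, e.g. `1:N:0:ACGT`, where the second field is `Y` if the read was filtered and `N` if it passed. Since only passing-filter reads are delivered, `filtered_Y` would be expected to be 0. The function name is hypothetical, and the sketch assumes plain four-line FASTQ records.

```python
import gzip
from collections import Counter


def summarize_fastq_gz(path: str) -> Counter:
    """Count reads and tally the 'is filtered' header flag (Y/N) in a gzipped FASTQ."""
    counts = Counter()
    with gzip.open(path, "rt") as fh:
        while True:
            header = fh.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = fh.readline().rstrip("\n")
            plus = fh.readline()
            qual = fh.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
                raise ValueError(f"Malformed FASTQ record at: {header!r}")
            counts["reads"] += 1
            # Header comment looks like '1:N:0:ACGT'; field 2 is the filter flag.
            _, _, comment = header.partition(" ")
            fields = comment.split(":")
            if len(fields) >= 2:
                counts[f"filtered_{fields[1]}"] += 1
    return counts
```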

Which strand is sequenced for my strand-specific RNA-seq data?

Strand-Specific RNA-Seq Libraries

RNA-Seq (conventional) after poly(A) enrichment or ribodepletion: By default, we generate strand-specific RNA-seq libraries. Strand-specific (also known as stranded or directional) RNA-seq libraries substantially enhance the value of an RNA-seq experiment. They add information on the originating strand and can thus precisely delineate the boundaries of transcripts in regions with genes on opposite strands. There are several ways to accomplish strand-specificity.
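Whichever strand-specific protocol was used, the orientation of your reads can be checked empirically by comparing the strand that read 1 aligned to with the annotated strand of the gene it overlaps; this is the idea behind tools such as RSeQC's `infer_experiment.py`. The Python sketch below assumes you have already collected such `(read1_strand, gene_strand)` pairs; the function name and the 0.8 cut-off are hypothetical choices for illustration.

```python
from collections import Counter


def infer_strandedness(pairs, threshold: float = 0.8) -> str:
    """Guess library strandedness from (read1_strand, gene_strand) pairs.

    Each pair holds the strand ('+' or '-') that read 1 aligned to and the
    annotated strand of the overlapping gene. The threshold is an assumption.
    """
    tally = Counter("sense" if r == g else "antisense" for r, g in pairs)
    total = tally["sense"] + tally["antisense"]
    if total == 0:
        return "undetermined"
    sense_frac = tally["sense"] / total
    if sense_frac >= threshold:
        return "read 1 sense (forward-stranded)"
    if sense_frac <= 1 - threshold:
        return "read 1 antisense (reverse-stranded)"
    return "unstranded or ambiguous"
```

An unstranded library gives roughly equal sense and antisense counts, while a stranded library is strongly skewed toward one side.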

My FASTQ file contains some "N"s. Is there a problem with my data?

Please note that when you open an Illumina FASTQ file, it is expected that the first few thousand reads are of comparatively low quality and frequently contain "N"s. An "N" means that the Illumina software was not able to make a base call at that position. The reads at the beginning and end of the sequence data files originate from the edges of the flow cell, where imaging is more difficult; thus, these reads show below-average quality.
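To confirm that low-quality reads are confined to the start and end of a file, you can compute per-read statistics and compare them across the file. The Python sketch below returns the number of "N" calls and the mean Phred quality of one read, assuming the Phred+33 encoding used by current Illumina software; the function name is hypothetical.

```python
def read_stats(seq: str, qual: str, offset: int = 33) -> tuple[int, float]:
    """Return (number of 'N' calls, mean Phred score) for one FASTQ read.

    Assumes Phred+33 quality encoding (quality character '!' = Q0, 'I' = Q40).
    """
    if len(seq) != len(qual):
        raise ValueError("SEQ and QUAL lengths differ")
    n_count = seq.upper().count("N")
    mean_q = sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0
    return n_count, mean_q
```

For the read `ACGNN` with qualities `II#!!`, the sketch reports 2 "N"s and a mean quality of 16.4, the kind of read you would expect near the beginning of a file.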

How should the miRNA/small-RNA data be trimmed?

We use the PerkinElmer NEXTflex™ Small RNA-Seq kit to generate microRNA and small RNA-seq libraries because it significantly reduces sequence-specific biases during library preparation. To achieve this, the adapter oligonucleotides contain 4 randomized bases at the ligation junctions. These randomized bases should be removed by trimming before mapping the sequence reads.
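Because small RNA inserts are shorter than the read, each read is expected to contain the 5' randomized bases, the insert, the 3' randomized bases, and then the 3' adapter. The Python sketch below trims in that order: it cuts at the adapter, then removes 4 bases from each end of what remains. It assumes exact matching of the first 8 adapter bases and discards reads without a detectable adapter or with too short an insert; these parameters are hypothetical, and the adapter sequence used in the example (`TGGAATTCTCGGGTGCCAAGG`) is an assumption that should be checked against the kit documentation. In practice a dedicated, error-tolerant trimmer such as cutadapt is preferable.

```python
from typing import Optional


def trim_nextflex(seq: str, adapter: str, n_random: int = 4,
                  min_overlap: int = 8, min_len: int = 15) -> Optional[str]:
    """Remove the 3' adapter and the randomized bases flanking the insert.

    Assumed read layout: [4 random nt][insert][4 random nt][3' adapter ...]
    Returns None if no adapter is found or the trimmed insert is too short.
    """
    probe = adapter[:min_overlap]
    idx = seq.find(probe)  # exact match only; no mismatches or partial overlaps
    if idx == -1:
        # Hypothetical policy: without adapter read-through the insert end is unknown.
        return None
    core = seq[:idx]
    # Strip the randomized bases from both ends of the remaining sequence.
    insert = core[n_random:len(core) - n_random]
    return insert if len(insert) >= min_len else None
```

For a read `ACGT` + `TGAGGTAGTAGGTTGTATAGTT` + `GCTA` + adapter, the sketch returns the 22-nt insert `TGAGGTAGTAGGTTGTATAGTT`.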