Small RNA Analysis Report
We used our small RNA analysis pipeline, which combines components of
established computational tools with our in-house computational
methods.
Briefly, as shown in the flowchart, the following analyses were performed:
1. Raw data QA
2. Trimming and cutting adapters
3. STAR mapping
4. Counting circRNA, miRNA, piRNA, snoRNA, snRNA and tRNA
5. QC report
During library preparation and sequencing, technical biases can be
introduced and can affect the accuracy of downstream analysis.
Therefore, we performed a thorough quality assessment of the FASTQ data
and, where needed, applied quality-improvement processing to ensure
accurate analysis results.
The primary QA criteria and possible interpretations are included (note
that we list only some of the most probable interpretations for each QA
result). A thorough guide to the FastQC report can be found at
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
The FastQC visualization and summary files are available in *.fastqc.zip
and *.fastqc.html. You can check detailed data quality statistics in the
unzipped files or, more conveniently, open the HTML file in a web
browser to inspect data quality using the guide linked above.
Further, we aggregated the FastQC reports from all samples using the
MultiQC tool. This provides a convenient way to assess quality
statistics across all samples in a single HTML report, namely
multiqc_report.html.
The QC report and MultiQC report files are saved in
01.fastqc/*fastqc.html and 01.fastqc/*fastqc.zip.
Following the SMARTer smRNA-seq manual, we removed the first 3 bases at the 5' end and the poly(A) tail at the 3' end. The adapters were then trimmed. The quality of the remaining sequence was evaluated using its PHRED scores. Poor-quality reads (average PHRED ≤ 20) were removed according to the quality control parameter set on the command line (-rr 20). Low-quality bases (PHRED ≤ 20) at the head and tail ends of a read can also be removed with the related parameters (-rh 20 -rt 20). Users can restrict the qualified reads passed to subsequent modules to specific length intervals.
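The per-read logic described above can be illustrated with a short Python sketch. This is not COMPSRA's implementation: the function names, the Phred+33 quality encoding, and the placement of the 17-base minimum length at this stage are assumptions made for illustration, and adapter trimming is omitted.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into PHRED scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def preprocess_read(seq, qual, head_clip=3, min_q=20, min_len=17):
    """Return the trimmed (seq, qual) pair, or None if the read is discarded.

    Hypothetical helper: the thresholds mirror the values quoted in the
    report, but the order of operations is an assumption.
    """
    # Remove the first `head_clip` bases at the 5' end.
    seq, qual = seq[head_clip:], qual[head_clip:]

    # Remove the run of A bases at the 3' end (the poly(A) tail).
    kept = len(seq.rstrip("A"))
    seq, qual = seq[:kept], qual[:kept]

    # Trim bases with PHRED <= min_q from the head and tail of the read.
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    while start < end and scores[start] <= min_q:
        start += 1
    while end > start and scores[end - 1] <= min_q:
        end -= 1
    seq, qual, scores = seq[start:end], qual[start:end], scores[start:end]

    # Discard empty or short reads and reads whose average PHRED is <= min_q.
    if not seq or len(seq) < min_len:
        return None
    if sum(scores) / len(scores) <= min_q:
        return None
    return seq, qual
```

A read that survives is returned with its quality string kept in step with the sequence, so it can be written back out as a FASTQ record.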
The pre-processed FASTQ files are saved in 00.TrimmedFastq.
The QC reports of the pre-processing analysis are saved in
01.fastqc/COMPSRA_QCreport/*_QCReport.txt.
Table 1. Summary of QC Reports of Project 19205-91Q2
Figure 1. Length Distribution in Percentage
The software COMPSRA was used to identify the different types of
small RNA.
COMPSRA uses STAR as its default RNA sequence aligner, with default
parameters that can be customized on the command line. Qualified reads
are first mapped to the human genome (hg38); the aligned reads are then
quantified and annotated in the Annotation Module. Clean reads of 17 or
more bases were mapped and annotated against all of the small RNA
databases shown in the flowchart. The BAM files are saved in 02.bam.
COMPSRA currently uses several small RNA databases to annotate reads
mapped to the human genome and reports all possible annotations:
miRBase for miRNA; piRNABank, piRBase and piRNACluster for piRNA;
GtRNAdb for tRNA; GENCODE release 27 for snRNA and snoRNA; and circBase
for circular RNA. To reconcile the different human reference genome
versions used by these databases, we apply an automatic LiftOver, the
coordinate-conversion tool created by the UCSC Genome Browser Group.
All of the databases are pre-built, which speeds up annotation.
The numbers of circRNA, miRNA, piRNA, snoRNA, snRNA and tRNA reads were
counted for each sample. The raw counts are saved in 04.smallRNAcounts.
The raw counts were then normalized to counts per million (CPM), and the
CPM values are also saved in 04.smallRNAcounts/*CPM.txt.
CPM (counts per million) values are obtained by dividing each count by
the sum of counts in the library and multiplying the result by one
million.
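As a minimal illustration of this normalization, the sketch below computes CPM for one sample in Python. The function name and the example counts are hypothetical, and the library size is assumed to be the sum of the counts supplied for that sample; whether the pipeline sums within each RNA type or across all types is not stated here.

```python
def counts_per_million(raw_counts):
    """Convert a {feature: raw count} mapping for one sample into CPM values."""
    # Library size is assumed to be the sum of all counts passed in.
    library_size = sum(raw_counts.values())
    if library_size == 0:
        # Avoid division by zero for an empty library.
        return {feature: 0.0 for feature in raw_counts}
    return {
        feature: count / library_size * 1_000_000
        for feature, count in raw_counts.items()
    }


# Made-up counts for illustration only; not taken from the project data.
example = {"hsa-miR-21-5p": 1500, "hsa-let-7a-5p": 500, "hsa-miR-16-5p": 0}
cpm = counts_per_million(example)
```

Because every count in a sample is divided by the same library size, the CPM values of that sample always sum to one million, which makes samples of different sequencing depth comparable.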
The small RNA quantification report for each sample is saved in 05.smallRNAreport.
Table 2. Summary of Counts in Different Types of Small RNA
Overall, the small RNA analysis was completed successfully. All supporting documents, including the raw data, have been transferred to you; we expect they will support further validation and help answer your key research questions.
Table 3. List of software used in the analysis pipeline.
Software | Version |
---|---|
FastQC | v0.11.8 |
Cutadapt | 3.2 |
STAR | 2.7.1a |
COMPSRA | V1.0 |
Address: 126 Corporate Boulevard, South Plainfield, New Jersey 07080
Email: custom-services@admerahealth.com
Phone: 908-222-0533