1. Analysis Schema

We used our small RNA analysis pipeline, which combines components from several established computational tools with our in-house computational methods.

Briefly, as shown in the flowchart, the following analyses were performed:
     1. Raw data QA
     2. Read trimming and adapter removal
     3. STAR mapping
     4. circRNA, miRNA, piRNA, snoRNA, snRNA and tRNA counting
     5. QC report

2. Analysis Workflow


3. Raw Data Quality Assessment

Artificial or technical biases introduced during library preparation and sequencing can affect the accuracy of downstream analysis. We therefore performed a thorough quality assessment of the FASTQ data and, where needed, applied quality-improvement processing to ensure accurate analysis results.

The primary QA criteria and their possible interpretations included the following (note that only the most probable interpretations of each QA result are listed).

A detailed guide to interpreting FastQC reports is available at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/. The FastQC visualizations and summary files are provided as *fastqc.zip and *fastqc.html files. Detailed data quality statistics can be found in the unzipped archives; more conveniently, open the HTML file in a web browser to visualize and check data quality using the guidelines at the link above.

Further, we aggregated the FastQC reports from all samples using MultiQC, which provides a convenient way to assess quality statistics across all samples in a single HTML report, multiqc_report.html.
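For a quick programmatic check of the per-module PASS/WARN/FAIL flags across samples, without opening each HTML report, the summary.txt file inside each FastQC zip archive can be read directly. The sketch below is not part of the delivered pipeline and is not a replacement for MultiQC; it assumes the usual FastQC archive layout (a single top-level folder containing summary.txt, one tab-separated STATUS, MODULE, FILENAME line per module), and the function names are hypothetical.

```python
import zipfile
from pathlib import Path


def read_fastqc_summary(zip_source):
    """Return {module: status} from the summary.txt inside one FastQC zip.

    zip_source may be a path or a binary file-like object. Assumes the usual
    FastQC layout: summary.txt inside the archive's top-level folder, one
    tab-separated line per module: STATUS, MODULE, FILENAME.
    """
    with zipfile.ZipFile(zip_source) as zf:
        names = [n for n in zf.namelist() if n.endswith("summary.txt")]
        if not names:
            raise ValueError("no summary.txt found in archive")
        text = zf.read(names[0]).decode("utf-8")
    statuses = {}
    for line in text.splitlines():
        if line.strip():
            status, module, _filename = line.split("\t")
            statuses[module] = status
    return statuses


def summarize_fastqc_dir(fastqc_dir):
    """Return {sample: {module: status}} for every *fastqc.zip in a folder."""
    table = {}
    for zip_path in sorted(Path(fastqc_dir).glob("*fastqc.zip")):
        # "S1_fastqc.zip" -> "S1" (requires Python 3.9+ for removesuffix)
        sample = zip_path.stem.removesuffix("_fastqc")
        table[sample] = read_fastqc_summary(zip_path)
    return table


def flag_non_passing(table):
    """Return {sample: [modules reported as WARN or FAIL]}."""
    return {
        sample: [module for module, status in statuses.items() if status != "PASS"]
        for sample, statuses in table.items()
    }
```

For example, `flag_non_passing(summarize_fastqc_dir("01.fastqc"))` lists, per sample, the modules worth inspecting in the corresponding HTML report. Note that some WARN/FAIL flags (for example, sequence length distribution) are expected for small RNA libraries and should be interpreted with the FastQC guidelines.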

The FastQC reports are saved as 01.fastqc/*fastqc.html and 01.fastqc/*fastqc.zip, and the MultiQC report is also saved in 01.fastqc.

4. Reads Processing and Statistics

Following the SMARTer smRNA-Seq manual, we removed the first three bases at the 5′ end and the 3′ poly(A) tail of each read, and then trimmed the remaining adapter sequences. The quality of the remaining sequence was evaluated using its PHRED scores. Poor-quality reads (average PHRED ≤ 20) were removed according to the quality-control parameter set on the command line (-rr 20). Low-quality bases (PHRED ≤ 20) at the head and tail of a read can also be removed with the related parameters (-rh 20 -rt 20). Qualified reads within a specified length range can then be passed to the subsequent modules.
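The trimming and filtering steps above can be illustrated on a single read. This is a minimal sketch, not the Cutadapt or COMPSRA implementation: it assumes Phred+33 quality encoding, treats a run of at least 10 A's as the start of the poly(A) tail, uses exact adapter matching, applies the head/tail trimming before the average-quality filter, and uses the 17-base minimum length mentioned in Section 5. The function and parameter names are hypothetical stand-ins for the -rh, -rt and -rr settings.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(c) - offset for c in qual]


def trim_read(seq, qual, adapter=None, clip_5p=3, min_polya=10,
              head_q=20, tail_q=20, read_q=20, min_len=17):
    """Return the trimmed (seq, qual) pair, or None if the read is discarded."""
    # 1. Drop the first clip_5p bases at the 5' end.
    seq, qual = seq[clip_5p:], qual[clip_5p:]
    # 2. Cut at the first poly(A) run; everything downstream is non-insert sequence.
    pos = seq.find("A" * min_polya)
    if pos != -1:
        seq, qual = seq[:pos], qual[:pos]
    # 3. Cut at the first exact occurrence of the adapter, if one is given.
    if adapter:
        pos = seq.find(adapter)
        if pos != -1:
            seq, qual = seq[:pos], qual[:pos]
    # 4. Trim bases with score <= threshold from the head (like -rh) and tail (like -rt).
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    while start < end and scores[start] <= head_q:
        start += 1
    while end > start and scores[end - 1] <= tail_q:
        end -= 1
    seq, qual, scores = seq[start:end], qual[start:end], scores[start:end]
    # 5. Discard empty or short reads, and reads whose mean quality is <= read_q (like -rr).
    if not seq or len(seq) < min_len or sum(scores) / len(scores) <= read_q:
        return None
    return seq, qual
```

For example, a read consisting of three leading bases, a 22-nt insert, a poly(A) tail and adapter sequence, all at quality 40, is reduced to the 22-nt insert.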

The pre-processed FASTQ files are saved in 00.TrimmedFastq, and the QC reports of the pre-processing step are saved in 01.fastqc/COMPSRA_QCreport/*_QCReport.txt.

Table 1. Summary of QC Reports of Project 19205-91Q2


Figure 1. Read Length Distribution (Percentage of Reads)
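A read length distribution of this kind can be computed from a trimmed FASTQ file along the following lines. This is a sketch assuming standard four-line FASTQ records (optionally gzip-compressed); it is not the code used to produce Figure 1, and the function names are hypothetical.

```python
import gzip
from collections import Counter


def fastq_sequences(path):
    """Yield the sequence line of each record in a 4-line FASTQ (optionally gzipped)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # line 2 of every 4-line record is the sequence
                yield line.rstrip("\r\n")


def length_distribution_percent(sequences):
    """Return {read_length: percent_of_reads}, ordered by increasing length."""
    counts = Counter(len(seq) for seq in sequences)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}
```

For instance, `length_distribution_percent(fastq_sequences("sample_trimmed.fastq.gz"))` (file name hypothetical) returns the percentage of reads at each length, which can be plotted as a bar chart.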





5. Map, Annotate, and Count for Small RNAs

The software COMPSRA was used to identify the different types of small RNA.

COMPSRA uses STAR as its default aligner, with default parameters that can be customized on the command line. Qualified reads are first mapped to the human genome (hg38), and the aligned reads are then annotated and quantified in the Annotation Module. Only clean reads of 17 or more bases were used, and their alignments were annotated against the small RNA databases shown in the flowchart. The BAM files are saved in 02.bam.

COMPSRA currently uses several small RNA databases to annotate reads mapped to the human genome and reports all possible annotations: miRBase for miRNA; piRNABank, piRBase and piRNACluster for piRNA; gtRNAdb for tRNA; GENCODE release 27 for snRNA and snoRNA; and circBase for circular RNA. Because these databases are based on different versions of the human reference genome, their coordinates are converted to a common assembly with the LiftOver tool from the UCSC Genome Browser Group. All databases are pre-built, which speeds up annotation.
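The annotation idea (assigning each genome-aligned read to the small RNA features it overlaps and reporting all possible annotations) can be sketched with a simple overlap count. This is an illustration under stated assumptions, not COMPSRA's actual matching rules: it assumes 0-based half-open coordinates, requires the read and feature to be on the same strand, counts a read once for every feature it overlaps, and uses a linear scan where real tools use indexed interval lookups. The `Feature` type, `annotate` function and the toy coordinates in the example are hypothetical.

```python
from collections import Counter, namedtuple

Feature = namedtuple("Feature", "chrom start end strand rna_type name")


def annotate(reads, features, min_overlap=1):
    """Count reads per (rna_type, name) for every same-strand feature they overlap.

    reads: iterable of (chrom, start, end, strand), 0-based half-open.
    A read overlapping several features is counted once for each of them.
    """
    by_chrom = {}
    for f in features:
        by_chrom.setdefault(f.chrom, []).append(f)
    counts = Counter()
    for chrom, start, end, strand in reads:
        for f in by_chrom.get(chrom, ()):
            overlap = min(end, f.end) - max(start, f.start)
            if f.strand == strand and overlap >= min_overlap:
                counts[(f.rna_type, f.name)] += 1
    return counts


# Toy example with made-up coordinates.
features = [
    Feature("chr9", 100, 122, "+", "miRNA", "hsa-let-7a-5p"),
    Feature("chr1", 50, 120, "-", "tRNA", "tRNA-example"),
]
reads = [
    ("chr9", 100, 122, "+"),  # exact match to the miRNA
    ("chr9", 110, 130, "+"),  # partial overlap, same strand
    ("chr9", 100, 122, "-"),  # opposite strand, not counted
    ("chr1", 60, 80, "-"),    # inside the tRNA
    ("chr2", 0, 10, "+"),     # no feature on this chromosome
]
print(annotate(reads, features))
```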

Reads assigned to circRNA, miRNA, piRNA, snoRNA, snRNA and tRNA were counted for each sample, and the raw counts were saved in 04.smallRNAcounts. The raw counts were then normalized to counts per million (CPM), and the CPM values were saved in 04.smallRNAcounts/*CPM.txt.

CPM values are obtained by dividing each count by the total count of the library and multiplying the result by one million.
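The CPM normalization described above can be written as a short function. This is a sketch: by default it takes the library size to be the sum of the sample's counts, which is an assumption about what "library counts sum" refers to; an explicit `library_size` can be passed instead. The function name is hypothetical.

```python
def cpm(counts, library_size=None):
    """Counts per million: count * 1,000,000 / library_size, for one sample.

    counts: {feature: raw_count}. library_size defaults to the sum of the
    counts (an assumption; pass it explicitly to use another total).
    """
    total = sum(counts.values()) if library_size is None else library_size
    if total <= 0:
        raise ValueError("library size must be positive")
    # Multiply before dividing to keep integer-friendly arithmetic exact.
    return {feature: n * 1_000_000 / total for feature, n in counts.items()}


sample = {"miR-A": 150, "miR-B": 50, "miR-C": 800}
print(cpm(sample))
```

Here the library size is 1,000, so miR-A has a CPM of 150 × 1,000,000 / 1,000 = 150,000. Because every feature in a sample is divided by the same total, CPM makes counts comparable across samples sequenced to different depths.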

The small RNA quantification report for each sample is saved in 05.smallRNAreport.


Table 2. Summary of Counts in Different Types of Small RNA


6. Conclusions

Overall, the small RNA analysis was completed successfully. All supporting files, including the raw data, have been delivered to you; we hope they will assist with further validation and with addressing your key research questions.


Appendix

Table 3. Software used in the analysis pipeline.

Software    Version
FastQC      v0.11.8
Cutadapt    3.2
STAR        2.7.1a
COMPSRA     V1.0

7. Contact Us

Address: 126 Corporate Boulevard, South Plainfield, New Jersey 07080
Email:
Phone: 908-222-0533