1. Bioinformatics Pipeline Overview

The diagram below gives an overview of the bioinformatics pipeline, which combines established computational tools with our in-house methods. For transparency and reproducibility, a list of the software and R packages used in these analyses, with their versions, is provided at the end of this report (Table 5).

Figure 1. Flowchart of Methylation Pipeline


2. Quality Assessment and Bisulfite Conversion Efficiency

In the initial stage, raw reads are trimmed to remove adapter contamination and low-quality bases. We then run FastQC, a tool commonly used to assess the quality of high-throughput sequencing data, including bisulfite sequencing (BS-seq) data. The FastQC report covers various aspects of the sequencing data and helps identify issues or biases that might affect downstream analyses. We aggregated the FastQC reports and other statistics from the DRAGEN pipeline into a single MultiQC report, available as multiqc_report.html. Besides the usual quality metrics, the Per-Position Sequence Content plot (Figure 2) and the M-bias plot (Figure 3) are particularly important for BS-seq data; please review them carefully to assess the efficiency of bisulfite conversion.

Figure 2. Per Position Sequence Content



Figure 3. M-bias Plot


3. Mapping and Methylation Stats

Next, the high-quality reads are aligned to the corresponding reference genome using the DRAGEN Methylation pipeline, which is tailored for Whole Genome Bisulfite Sequencing (WGBS) data. When designated, duplicate reads are filtered out, and methylation calling is performed within the same pipeline. A genome-wide cytosine report compatible with methylKit is also generated for each sample. Mapping statistics are shown in Figure 4, with the underlying counts in Table 1. Methylation statistics are summarized in Figure 5, with the underlying counts in Table 2.

Figure 4. Pie Chart Depicting Different Alignment Types


Alignment Stats

Sequence pairs analysed in total 1246888
Paired-end alignments with a unique best hit 664864
Pairs without alignments under any condition 549413
Pairs that did not map uniquely 32611
Genomic sequence context not extractable (edges of chromosomes) 1

Table 1. Alignment Stats
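
As a quick consistency check on Table 1, the counts can be converted to percentages of all analysed pairs. The snippet below is a minimal sketch that uses only the numbers printed in the table; the variable and function names are ours and are not part of the pipeline.

```python
# Counts copied from Table 1.
total_pairs = 1246888
unique_best_hit = 664864
no_alignment = 549413
not_unique = 32611

# The three outcome categories should account for every analysed pair.
assert unique_best_hit + no_alignment + not_unique == total_pairs


def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


mapping_efficiency = percent(unique_best_hit, total_pairs)
unaligned_rate = percent(no_alignment, total_pairs)
ambiguous_rate = percent(not_unique, total_pairs)

print(f"Uniquely aligned pairs: {mapping_efficiency}%")
print(f"Pairs without alignment: {unaligned_rate}%")
print(f"Pairs not mapped uniquely: {ambiguous_rate}%")
```

The uniquely aligned percentage is the figure usually quoted as mapping efficiency for bisulfite alignments.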



Figure 5. Bar Chart Displaying Percentage of Methylation in Various Contexts



Cytosine Methylation

Total C’s analysed 41019163
Methylated C’s in CpG context 1448307
Methylated C’s in CHG context 228669
Methylated C’s in CHH context 923127
Methylated C’s in Unknown context 32153
Unmethylated C’s in CpG context 604275
Unmethylated C’s in CHG context 8653986
Unmethylated C’s in CHH context 29160799
Unmethylated C’s in Unknown context 63034
Percentage methylation (CpG context) 70.6%
Percentage methylation (CHG context) 2.6%
Percentage methylation (CHH context) 3.1%
Percentage methylation (Unknown context) 33.8%

Table 2. Methylation Statistics Across Different Contexts
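
The percentages in Table 2 follow directly from the methylated and unmethylated counts in each context: methylated / (methylated + unmethylated). The sketch below recomputes them from the table values; the dictionary layout and names are illustrative only.

```python
# (methylated, unmethylated) cytosine counts per context, copied from Table 2.
counts = {
    "CpG": (1448307, 604275),
    "CHG": (228669, 8653986),
    "CHH": (923127, 29160799),
    "Unknown": (32153, 63034),
}


def percent_methylation(methylated, unmethylated):
    """Percentage of calls in a context that were methylated."""
    return 100.0 * methylated / (methylated + unmethylated)


percentages = {
    context: round(percent_methylation(m, u), 1)
    for context, (m, u) in counts.items()
}

# "Total C's analysed" in Table 2 equals the sum over the three known
# contexts; the Unknown context is not included in that total.
total_known = sum(m + u for context, (m, u) in counts.items() if context != "Unknown")

for context, value in percentages.items():
    print(f"{context}: {value}%")
print(f"Total C's in known contexts: {total_known}")
```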



4. Differential Analysis

4.1. Pre-processing

For each pair of samples in each group, differential methylation analysis begins by extracting the relevant cytosine context (i.e. CpG, CHH, or CHG) from the genome-wide cytosine report. From the resulting files, bases with coverage of less than X reads are first filtered out so that only high-confidence methylation counts are retained. The counts are then normalized. Only bases covered in all samples are retained for further analysis. For the CpG context only, destrand=TRUE is set, which merges reads on both strands of a CpG dinucleotide to achieve better coverage.
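
The logic of these pre-processing steps can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the methylKit implementation used in the pipeline: the data layout, the function names, and the MIN_COVERAGE value are assumptions (the report states the threshold only as X), and the normalization step is omitted.

```python
from collections import defaultdict

MIN_COVERAGE = 10  # hypothetical value; the report gives the threshold only as "X"


def filter_by_coverage(sites, min_coverage):
    """Keep sites whose read coverage is at least min_coverage."""
    return {key: val for key, val in sites.items() if val[0] >= min_coverage}


def destrand_cpg(sites):
    """Merge both strands of each CpG onto the plus-strand coordinate.

    A minus-strand C at position p pairs with the plus-strand C at p - 1,
    so its counts are added to the site at p - 1.
    """
    merged = defaultdict(lambda: [0, 0])
    for (chrom, pos, strand), (coverage, methylated) in sites.items():
        plus_pos = pos - 1 if strand == "-" else pos
        merged[(chrom, plus_pos)][0] += coverage
        merged[(chrom, plus_pos)][1] += methylated
    return {key: tuple(val) for key, val in merged.items()}


def unite(samples):
    """Keep only the sites that are present in every sample."""
    shared = set.intersection(*(set(s) for s in samples))
    return [{key: s[key] for key in sorted(shared)} for s in samples]


# Toy input: (chromosome, position, strand) -> (coverage, methylated reads).
sample_a = {
    ("chr1", 100, "+"): (12, 9),
    ("chr1", 101, "-"): (11, 8),
    ("chr1", 200, "+"): (3, 1),  # below the coverage threshold
    ("chr1", 300, "+"): (15, 2),
}
sample_b = {
    ("chr1", 100, "+"): (20, 15),
    ("chr1", 300, "+"): (10, 1),
    ("chr1", 301, "-"): (10, 0),
}

processed = unite(
    [destrand_cpg(filter_by_coverage(s, MIN_COVERAGE)) for s in (sample_a, sample_b)]
)
for name, sites in zip(("sample_a", "sample_b"), processed):
    print(name, sites)
```

In this toy input, the low-coverage site at position 200 is dropped, and the two strands of each CpG are combined into a single record before the samples are intersected.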

4.2. Pairwise Correlation, Hierarchical Clustering, and Principal Component Analysis

Pairwise correlation is calculated for all samples using the Pearson method, and correlation plots are generated (provided with the raw data). Hierarchical clustering is performed using Ward's method, based on the similarity of the methylation profiles. The resulting dendrograms are provided below. We also perform Principal Component Analysis (PCA) and plot the first two components. Generally, samples that cluster in a single group and/or lie close together in the PCA plot are more similar to one another, as expected for biological or technical replicates. In contrast, samples from different experimental groups (e.g. mutant vs. wild type) tend to cluster separately in the dendrogram and/or lie apart in the PCA plot. Together, these plots are a useful check for errors such as sample swaps or mislabeling.
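
To make the correlation step concrete, the sketch below computes a pairwise Pearson correlation matrix from per-site percent-methylation values. The sample names and values are invented for illustration, and the code is a stand-alone example rather than the routine used in the pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical percent-methylation values at four shared CpG sites.
profiles = {
    "Sample1": [90.0, 10.0, 50.0, 80.0],
    "Sample2": [85.0, 15.0, 55.0, 75.0],
    "Sample4": [20.0, 70.0, 40.0, 30.0],
}

names = sorted(profiles)
matrix = {(a, b): pearson(profiles[a], profiles[b]) for a in names for b in names}

for a in names:
    print(a, [round(matrix[(a, b)], 3) for b in names])
```

In this toy data, Sample1 and Sample2 behave like replicates and correlate strongly, while Sample4 follows a different pattern. A high correlation between samples from different groups, or a low one between replicates, would be a reason to check for a sample swap.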


Figure 6. CpG PCA Plot Contrasting Two Conditions




Figure 7. CpG Clustering Plot


4.3. Detection of Differentially Methylated Sites and Annotation

Finally, differentially methylated sites between groups are detected by applying a methylation difference threshold of 25% and a q-value cut-off of 0.05. The resulting differentially methylated sites are annotated using the R package genomation, and the percentage of differentially methylated bases overlapping exons, introns, and promoters is plotted. If you wish to apply a different cut-off in your own analysis, the annotated R object “diffann_*.rds” is provided.
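
The two cut-offs act as a simple filter on each tested site. The sketch below applies them to a few invented records; the field names are hypothetical, and treating a site exactly at the 25% boundary as passing is our assumption, not a statement about how the pipeline handles it.

```python
METH_DIFF_THRESHOLD = 25.0  # percent, as stated in this report
QVALUE_CUTOFF = 0.05        # as stated in this report


def is_differential(meth_diff, qvalue,
                    min_diff=METH_DIFF_THRESHOLD, max_q=QVALUE_CUTOFF):
    """True if a site passes both the difference and the q-value cut-off.

    Boundary handling (>= for the difference, < for the q-value) is assumed.
    """
    return abs(meth_diff) >= min_diff and qvalue < max_q


# Invented example records; meth_diff is the group difference in percent.
sites = [
    {"chr": "chr1", "pos": 100, "meth_diff": 32.5, "qvalue": 0.001},
    {"chr": "chr1", "pos": 250, "meth_diff": -41.0, "qvalue": 0.020},
    {"chr": "chr2", "pos": 75, "meth_diff": 12.0, "qvalue": 0.0001},
    {"chr": "chr2", "pos": 900, "meth_diff": 30.0, "qvalue": 0.200},
]

significant = [s for s in sites if is_differential(s["meth_diff"], s["qvalue"])]
hyper = [s for s in significant if s["meth_diff"] > 0]
hypo = [s for s in significant if s["meth_diff"] < 0]

print(f"{len(significant)} differential sites: {len(hyper)} hyper, {len(hypo)} hypo")
```

Changing the two constants shows how a stricter or looser cut-off alters the number of sites retained.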

Figure 8. Pie Chart Displaying Annotations of DMRs


5. Additional and Tailored Analysis

We offer additional analyses, including region-based DMR detection and DMR identification in non-CpG contexts. These analyses follow the same methodology as explained in the previous section.

We also provide customized analyses and publication-ready figures tailored to your research objectives. Please discuss any custom analysis requests with our project manager, who will guide you through the conceptualization, planning, and execution of the analyses you require.

This individualized approach lets us address your specific needs and deliver analyses and figures that support your research.

6. Appendix


Table 3. Key Files Included in Your Delivery Folder

File Path Description
00.Fastq FASTQ files
01.FastqQualityCheck/multiqc_report.html Aggregated QC report
01.FastqQualityCheck/Dragen_Stats/*csv Various DRAGEN statistics
02.BamFiles/*.bam Duplicate-marked alignment files
03.CX_Report/*.CX_report.txt.gz Cytosine report
04.DE_analysis/C**/ Differential methylation analysis and associated plots for CpG and other contexts


Table 4. Grouping Information for Differential Methylation Analysis
Sample Group ID
Sample1 A ID_1
Sample2 A ID_2
Sample3 A ID_3
Sample4 B ID_4
Sample5 B ID_5
Sample6 B ID_6


Table 5. Software Used in the Analysis Pipeline
Software Version
DRAGEN Methylation Pipeline 3.6.3
FASTQ Toolkit 2.2.5
FastQC v0.11.9
Bowtie2 2.4.1
Bismark 0.22.3
methylKit 1.12.0
R 3.6
Reference genome hg38
Genome annotation file NCBI RefSeq (as recommended by methylKit)



7. Contact Us

Address: 126 Corporate Boulevard, South Plainfield, New Jersey 07080
Email:
Phone: 908-222-0533