1. Bioinformatics Pipeline Overview

2. Quality Assessment and Bisulfite Conversion Efficiency

3. Mapping and Methylation Stats

4. Differential Analysis

4.1 Pre-processing

4.2 Pairwise correlation, Hierarchical clustering, and Principal Component Analysis

4.3 Detection of Differential Methylation Sites and annotations

5. Additional and Tailored Analysis

6. Appendix

7. Citation

8. Contact Us

1. Bioinformatics Pipeline Overview

The diagram below gives an overview of the bioinformatics pipeline, which combines components of several computational tools with our in-house computational methods. To ensure transparency and reproducibility, a list of the software and R packages used throughout these analyses, with specific versions, is provided at the end of this report.

Figure 1. Flowchart of Methylation Pipeline

2. Quality Assessment and Bisulfite Conversion Efficiency

In the initial stage, raw reads are trimmed to remove adapter contamination and low-quality bases. We then ran FastQC, a tool commonly used to assess the quality of high-throughput sequencing data, including data generated by bisulfite sequencing (BS-seq). The FastQC report covers several aspects of the sequencing data and helps identify issues or biases that might affect downstream analyses. We aggregated the FastQC reports and other statistics from the DRAGEN pipeline into a single MultiQC report, available as multiqc_report.html. Besides the usual quality metrics, the Per-Position Sequence Content plot (Figure 2) and the M-bias plot (Figure 3) hold particular significance for BS-seq; please review them carefully to assess the accuracy of bisulfite conversion.

Figure 2. Per Position Sequence Content

Figure 3. M-bias Plot
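To make the M-bias plot easier to interpret, the sketch below shows one way the underlying quantity can be computed: the percentage of methylated calls at each position along the read. It is a minimal illustration, not the DRAGEN or FastQC implementation; the function name `m_bias` and the input calls are hypothetical and made up for the example.

```python
from collections import defaultdict

def m_bias(calls):
    """Return {read_position: percent methylated} from (position, is_methylated) calls."""
    methylated = defaultdict(int)
    total = defaultdict(int)
    for position, is_methylated in calls:
        total[position] += 1
        if is_methylated:
            methylated[position] += 1
    return {pos: 100.0 * methylated[pos] / total[pos] for pos in sorted(total)}

# Made-up cytosine calls: (1-based position within the read, methylated?)
example_calls = [
    (1, False), (1, False), (1, True),
    (2, True), (2, True), (2, False),
    (3, True), (3, True), (3, False),
]

profile = m_bias(example_calls)
for pos, pct in profile.items():
    print(f"read position {pos}: {pct:.1f}% methylated")
```

A flat profile across read positions is the expected pattern; a sharp deviation near the read ends usually suggests that those positions should be trimmed or ignored during methylation calling.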

3. Mapping and Methylation Stats

Next, the high-quality reads are aligned to the corresponding reference genome with the DRAGEN Methylation pipeline, which is tailored for whole-genome bisulfite sequencing (WGBS) data. When designated, duplicate reads are filtered out, and methylation calling is performed within the same pipeline. A genome-wide cytosine report compatible with MethylKit is also generated for each sample. Mapping statistics are shown in Figure 4, with the underlying numbers in Table 1. Methylation statistics are summarized in Figure 5, with details in Table 2.

Figure 4. Pie Chart Depicting Different Alignment Types

Alignment Stats

Sequence pairs analysed in total	1246888
Paired-end alignments with a unique best hit	664864
Pairs without alignments under any condition	549413
Pairs that did not map uniquely	32611
Genomic sequence context not extractable (edges of chromosomes)	1

Table 1. Alignment Stats
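As a quick sanity check on Table 1, the counts can be converted to percentages of the total number of read pairs. The sketch below uses the counts reported above; the dictionary keys and the helper name `alignment_rates_from` are illustrative choices, not names used by the pipeline.

```python
# Counts copied from Table 1.
alignment_counts = {
    "total_pairs": 1246888,
    "unique_best_hit": 664864,
    "no_alignment": 549413,
    "not_unique": 32611,
}

def alignment_rates_from(counts):
    """Express each alignment category as a percentage of all analysed pairs."""
    total = counts["total_pairs"]
    return {
        name: 100.0 * value / total
        for name, value in counts.items()
        if name != "total_pairs"
    }

alignment_rates = alignment_rates_from(alignment_counts)
for name, rate in alignment_rates.items():
    print(f"{name}: {rate:.1f}%")
```

The three categories add up to the total number of pairs analysed, so the percentages sum to 100.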

Figure 5. Bar Chart Displaying Percentage of Methylation in Various Contexts

Cytosine Methylation

Total C’s analysed	41019163
Methylated C’s in CpG context	1448307
Methylated C’s in CHG context	228669
Methylated C’s in CHH context	923127
Methylated C’s in Unknown context	32153
Unmethylated C’s in CpG context	604275
Unmethylated C’s in CHG context	8653986
Unmethylated C’s in CHH context	29160799
Unmethylated C’s in Unknown context	63034
Percentage methylation (CpG context)	70.6%
Percentage methylation (CHG context)	2.6%
Percentage methylation (CHH context)	3.1%
Percentage methylation (Unknown context)	33.8%

Table 2. Methylation Statistics Across Different Contexts
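The percentages in Table 2 can be reproduced from the raw counts, assuming the usual definition of percentage methylation as methylated calls divided by all calls (methylated plus unmethylated) in that context. The sketch below applies that formula to the counts reported above; the variable and function names are illustrative.

```python
# Counts copied from Table 2.
methylated = {"CpG": 1448307, "CHG": 228669, "CHH": 923127, "Unknown": 32153}
unmethylated = {"CpG": 604275, "CHG": 8653986, "CHH": 29160799, "Unknown": 63034}

def percent_methylation(meth, unmeth):
    """Percentage of calls in a context that were methylated."""
    return 100.0 * meth / (meth + unmeth)

percentages = {
    context: percent_methylation(methylated[context], unmethylated[context])
    for context in methylated
}

for context, pct in percentages.items():
    print(f"{context}: {pct:.1f}%")

# The total in Table 2 covers the CpG, CHG, and CHH contexts (Unknown excluded).
known_total = sum(
    methylated[c] + unmethylated[c] for c in ("CpG", "CHG", "CHH")
)
print("Total C's in known contexts:", known_total)
```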

4. Differential Analysis

4.1. Pre-processing

For each pair of samples in each group, differential methylation analysis begins by extracting the relevant cytosine context (i.e., CpG, CHH, or CHG) from the genome-wide cytosine report. From the resulting files, bases with coverage below X are first filtered out so that only high-confidence methylation counts are retained. The counts are then normalized. Only bases covered in all samples are retained for further analysis. For the CpG context only, destrand=TRUE is set, which merges reads on both strands of a CpG dinucleotide to achieve better coverage.

4.2. Pairwise correlation, Hierarchical clustering, and Principal Component Analysis
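The pre-processing steps above can be sketched in plain Python to show what each one does to the data. This is an illustration only, not the MethylKit implementation: the function names, the `MIN_COVERAGE` value (the report gives the threshold only as X), and the example counts are all hypothetical, and the normalization step is omitted for brevity.

```python
from collections import defaultdict

MIN_COVERAGE = 10  # hypothetical placeholder; the actual threshold X is not stated

def filter_by_coverage(sites, min_coverage):
    """Keep sites whose total read count (methylated + unmethylated) meets the threshold."""
    return {key: (m, u) for key, (m, u) in sites.items() if m + u >= min_coverage}

def destrand_cpg(sites):
    """Pool counts from both strands of each CpG onto the plus-strand cytosine position."""
    pooled = defaultdict(lambda: [0, 0])
    for (chrom, pos, strand), (m, u) in sites.items():
        # The minus-strand C of a CpG sits one base downstream of the plus-strand C.
        plus_pos = pos - 1 if strand == "-" else pos
        pooled[(chrom, plus_pos)][0] += m
        pooled[(chrom, plus_pos)][1] += u
    return {key: tuple(counts) for key, counts in pooled.items()}

def keep_shared_sites(samples):
    """Retain only positions that are present in every sample."""
    shared = set.intersection(*(set(s) for s in samples))
    return [{key: s[key] for key in sorted(shared)} for s in samples]

# Made-up counts: (chrom, position, strand) -> (methylated reads, unmethylated reads)
sample_a = {
    ("chr1", 100, "+"): (8, 4),
    ("chr1", 101, "-"): (9, 3),   # same CpG as chr1:100, opposite strand
    ("chr1", 200, "+"): (2, 1),   # below the coverage threshold
}
sample_b = {
    ("chr1", 100, "+"): (10, 5),
    ("chr1", 300, "+"): (20, 0),  # not covered in sample_a
}

processed = keep_shared_sites(
    [destrand_cpg(filter_by_coverage(s, MIN_COVERAGE)) for s in (sample_a, sample_b)]
)
print(processed)
```

In this toy input, only the CpG at chr1:100 survives: the low-coverage site is dropped, the two strands of the CpG are pooled in the first sample, and the site seen in only one sample is discarded.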

Pairwise correlation is calculated for all samples using the Pearson method, and correlation plots are generated (provided with the raw data). Hierarchical clustering is performed using Ward's method, based on the similarity of the methylation profiles. The resulting dendrograms are provided below. We also perform Principal Component Analysis (PCA) and plot the first two components. In general, samples that cluster together or lie close to each other in the PCA plot are more similar, as expected for biological or technical replicates. Conversely, samples from different experimental groups (e.g., mutant vs. wild type) tend to cluster separately in the dendrogram and lie apart in the PCA plot. Together, these plots help detect errors such as sample swaps or mislabeling.
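For readers who want to see what the pairwise comparison measures, the sketch below computes Pearson correlation between per-site percent-methylation profiles. It is a minimal standalone illustration rather than the code used to produce the plots; the sample names and values are made up, and the function assumes each profile has non-zero variance.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(profiles):
    """Return {(sample_i, sample_j): r} for every pair of samples, in both orders."""
    matrix = {(name, name): 1.0 for name in profiles}
    for a, b in combinations(profiles, 2):
        r = pearson(profiles[a], profiles[b])
        matrix[(a, b)] = matrix[(b, a)] = r
    return matrix

# Made-up percent methylation at the same four sites in three samples.
profiles = {
    "rep1": [80.0, 10.0, 55.0, 90.0],
    "rep2": [78.0, 12.0, 60.0, 88.0],
    "other": [20.0, 85.0, 40.0, 15.0],
}

matrix = correlation_matrix(profiles)
for (a, b), r in sorted(matrix.items()):
    print(f"{a} vs {b}: r = {r:.3f}")
```

In this toy data the two replicates correlate strongly with each other and poorly with the third sample, which is the pattern the correlation and clustering plots are meant to reveal.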

Figure 6. CpG PCA Plot Contrasting Two Conditions

Figure 7. CpG Clustering Plot

4.3. Detection of Differential Methylation Sites and annotations

Finally, differentially methylated sites between groups are detected by applying a methylation difference threshold of 25% and a q-value cut-off of 0.05. We have provided the annotated R object “diffann_*.rds” in case you wish to choose a different cut-off for your analysis. The resulting differentially methylated sites are annotated using the R package genomation, and the percentage of differentially methylated bases overlapping exons, introns, and promoters is plotted.
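The cut-offs described above amount to a simple filter on each tested site. The sketch below applies the 25% difference and 0.05 q-value thresholds to a few made-up records and splits the survivors into hyper- and hypomethylated sites. The record fields (`meth_diff`, `qvalue`), the function name, and the handling of values exactly at the thresholds are assumptions for illustration, not the MethylKit implementation.

```python
DIFF_THRESHOLD = 25.0  # minimum absolute methylation difference, in percent
QVALUE_CUTOFF = 0.05   # maximum q-value

def select_dms(sites, diff_threshold=DIFF_THRESHOLD, q_cutoff=QVALUE_CUTOFF):
    """Split sites passing both cut-offs into hyper- and hypomethylated lists."""
    hyper, hypo = [], []
    for site in sites:
        # Assumption: a site must have q-value below the cut-off and an
        # absolute difference at or above the threshold to be retained.
        if site["qvalue"] >= q_cutoff or abs(site["meth_diff"]) < diff_threshold:
            continue
        (hyper if site["meth_diff"] > 0 else hypo).append(site)
    return hyper, hypo

# Made-up test results for four sites.
tested_sites = [
    {"chrom": "chr1", "pos": 1000, "meth_diff": 40.0, "qvalue": 0.001},
    {"chrom": "chr1", "pos": 2000, "meth_diff": -30.0, "qvalue": 0.01},
    {"chrom": "chr2", "pos": 500, "meth_diff": 10.0, "qvalue": 0.001},   # difference too small
    {"chrom": "chr2", "pos": 900, "meth_diff": -50.0, "qvalue": 0.2},    # not significant
]

hyper, hypo = select_dms(tested_sites)
print("hypermethylated:", [(s["chrom"], s["pos"]) for s in hyper])
print("hypomethylated:", [(s["chrom"], s["pos"]) for s in hypo])
```

Calling `select_dms` with different `diff_threshold` or `q_cutoff` values mirrors what re-filtering the provided R object with a different cut-off would achieve.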

Figure 8. Pie Chart Displaying Annotations of DMRs

5. Additional and Tailored Analysis

We offer additional analyses, including region-based DMR detection and DMR identification in non-CpG contexts. These analyses follow the same methodology as explained in the previous section.

We also provide customized analyses and publication-ready figures tailored to your research objectives. Please discuss any customized analysis requests with our project manager, who will guide you through the conceptualization, planning, and execution of the analyses you require.

6. Appendix

Table 3. Key Files Included in Your Delivery Folder

File Path	Description
00.Fastq	FASTQ files
01.FastqQualityCheck/multiqc_report.html	Aggregated QC report
01.FastqQualityCheck/Dragen_Stats/*csv	Various DRAGEN statistics
02.BamFiles/*.bam	Duplicate-marked alignment files
03.CX_Report/*.CX_report.txt.gz	Cytosine report
04.DE_analysis/C**/	Differential methylation analysis and associated plots for CpG and other contexts

Table 4. Grouping Information for Differential Methylation Analysis

Sample	Group	ID
Sample1	A	ID_1
Sample2	A	ID_2
Sample3	A	ID_3
Sample4	B	ID_4
Sample5	B	ID_5
Sample6	B	ID_6

Table 5. List of Software Used in the Analysis Pipeline

Software	Version
DRAGEN Methylation Pipeline	3.6.3
FASTQ Toolkit	2.2.5
FastQC	v0.11.9
Bowtie2	2.4.1
Bismark	0.22.3
MethylKit	1.12.0
R	3.6
Reference genome	hg38
Genome annotation file	NCBI RefSeq (as recommended by MethylKit)

7. Citation

8. Contact Us

Address: 126 Corporate Boulevard, South Plainfield, New Jersey 07080
Email: custom-services@admerahealth.com
Phone: 908-222-0533

Methylation Analysis Report

Platform: NovaSeq-S4-2x150
Kit: Zymo EZ DNA Methylation Gold
Project ID: XXXXX-XX
Species: Human

2023-07-14
