Figures are shown as examples only.
ChIP-Seq is a method for studying epigenetic interactions between DNA
and proteins by identifying the proteins' target binding sites. For
ChIP-Seq, we use an in-house bioinformatics pipeline to map reads, call
peaks, perform differential analysis, and detect motifs using
established computational tools. We also offer ATAC-Seq and CUT&RUN
analysis. For ATAC-Seq (Yan et al., 2020) and CUT&RUN (Yu et al., 2021),
the ENCODE ATAC-seq pipeline and the CUT&RUNTools 2.0 pipeline,
respectively, are run through the peak detection step and are then
followed by our downstream analysis.
In our in-house bioinformatics pipeline, FastQC was first used to check
the quality of raw and trimmed reads. Trimmomatic was used to remove
adapters and trim low-quality bases with default settings. After reads
were mapped to the reference genome, mapped reads with a MAPQ score < 10
were removed (filters may be set conditionally). Duplicates were also
removed. deepTools was used to normalize the BAM files and generate
BigWig (BW) files for visualization. MACS2 was used to call peaks.
Annotation was performed using ChIPseeker (R package). If there were no
replicates, MAnorm (R package) was used for sample comparison. If there
were replicates, their called peaks were merged and DiffBind (R package)
was used for the comparison. Other downstream procedures included
GO/KEGG enrichment analysis (R package clusterProfiler), combined
density profiles (deepTools), and motif detection (MEME).
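The read-filtering step can be illustrated with a minimal Python sketch.
It is not the pipeline's actual implementation (the report names
SAMtools and Picard for this kind of work); it only shows the logic on
SAM-formatted text lines. It assumes duplicates have already been marked
with the standard SAM duplicate flag (0x400). The function name
filter_sam_lines and the example reads are hypothetical.

```python
from typing import Iterable, Iterator

UNMAPPED_FLAG = 0x4     # SAM flag bit: read is unmapped
DUPLICATE_FLAG = 0x400  # SAM flag bit: PCR or optical duplicate


def filter_sam_lines(lines: Iterable[str], min_mapq: int = 10) -> Iterator[str]:
    """Yield header lines and alignments that pass the filters.

    An alignment is dropped if it is unmapped, flagged as a duplicate,
    or has a MAPQ score below min_mapq.
    """
    for line in lines:
        if line.startswith("@"):  # header lines are passed through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record: bitwise FLAG
        mapq = int(fields[4])  # column 5 of a SAM record: MAPQ
        if flag & UNMAPPED_FLAG or flag & DUPLICATE_FLAG:
            continue
        if mapq < min_mapq:
            continue
        yield line


# Hypothetical records: read2 has MAPQ 5, read3 is flagged as a duplicate.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t42\t50M\t*\t0\t0\t*\t*",
    "read2\t0\tchr1\t200\t5\t50M\t*\t0\t0\t*\t*",
    "read3\t1024\tchr1\t300\t60\t50M\t*\t0\t0\t*\t*",
]
kept = list(filter_sam_lines(example))
```

Here only the header line and read1 are retained. The min_mapq
parameter reflects the note that filters may be set conditionally.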
File availability:
Raw / trimmed FastQ
Mapping statistics
FastQC reports / MultiQC reports
BAM
BigWig (BW)
Peak calling is a computational method used to identify areas of the
genome that are enriched with aligned reads. The MACS algorithm can be
used to identify transcription factor binding sites (narrow peaks)
and histone modification enriched regions (broad peaks). Its key output
files are peak files (which contain the peak locations along with the
peak summit, p-value, and q-value) and summit files (which contain the
peak summit locations for motif analysis).
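As an illustration of what a peak file record carries, the sketch below
parses one line in the ENCODE narrowPeak layout, which is the layout
MACS2 uses for narrow peaks. In that layout the p-value and q-value are
stored as -log10 values and the summit is stored as an offset from the
peak start. The class name, function name, and example record are
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NarrowPeak:
    chrom: str
    start: int               # 0-based peak start
    end: int                 # peak end (exclusive)
    name: str
    neg_log10_pvalue: float  # column 8
    neg_log10_qvalue: float  # column 9
    summit_offset: int       # column 10: summit position relative to start

    @property
    def summit(self) -> int:
        """Absolute coordinate of the peak summit."""
        return self.start + self.summit_offset


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    """Parse one tab-separated narrowPeak record."""
    f = line.rstrip("\n").split("\t")
    return NarrowPeak(
        chrom=f[0],
        start=int(f[1]),
        end=int(f[2]),
        name=f[3],
        neg_log10_pvalue=float(f[7]),
        neg_log10_qvalue=float(f[8]),
        summit_offset=int(f[9]),
    )


# Hypothetical record, not taken from a real sample.
line = "chr1\t1000\t1400\tsample_peak_1\t250\t.\t12.5\t30.2\t27.9\t180"
peak = parse_narrowpeak_line(line)
```

For this record, peak.summit is 1180 (start 1000 plus offset 180), which
is the kind of position a summit file would list for motif analysis.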
Sample Cluster Heatmap. Replicate samples.
Clustering of samples helps to segregate data into similar groups.
Correlation heatmap for all peak-calling samples.
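A correlation heatmap of this kind is built from a matrix of pairwise
correlations between samples. The sketch below computes such a matrix
in plain Python. It assumes Pearson correlation over per-peak signal
values, which the report does not specify; the sample names and values
are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(signal: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise correlations between all samples, keyed by sample name."""
    names = list(signal)
    return {a: {b: pearson(signal[a], signal[b]) for b in names} for a in names}


# Hypothetical signal values over the same four peaks in each sample.
signal = {
    "ctrl_rep1": [1.0, 2.0, 3.0, 4.0],
    "ctrl_rep2": [2.0, 4.0, 6.0, 8.0],
    "treat_rep1": [4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(signal)
```

In this toy input the two control replicates are perfectly correlated
and the treated sample is perfectly anti-correlated with them, so a
clustering step would group the replicates together.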
PCA plot. Replicate samples. PCA is a procedure in which
principal components are obtained by orthogonally transforming a
set of possibly correlated variables (high-dimensional data) into a
smaller number (few dimensions) of linearly uncorrelated variables.
Dimensionality reduction of peak datasets using PCA.
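To make the transformation concrete, the sketch below finds the first
principal component of a small data matrix using power iteration on the
sample covariance matrix. This is a simplified stand-in chosen to avoid
external libraries, not the method used to produce the plot; the
function name and data are hypothetical.

```python
import math
from typing import List, Tuple


def first_principal_component(
    rows: List[List[float]], n_iter: int = 200
) -> Tuple[List[float], List[float]]:
    """Return (unit direction of PC1, PC1 score of each row)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the variables (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to its dominant eigenvector, the first principal component.
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:  # no variance, or start vector orthogonal to PC1
            break
        vec = [v / norm for v in nxt]
    # Project each centered row onto the component to get its score.
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores


# Hypothetical signal values: one row per sample, one column per peak.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc1, scores = first_principal_component(samples)
```

Because the second column is exactly twice the first, all variance lies
along one direction and a single component describes the four samples.
Further components would require deflation or a full eigendecomposition.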
Peak overlay plots show the number of peaks that are common and different between comparable datasets.
Venn diagram of binding site overlaps of replicate samples in the comparison groups.
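The counts behind such an overlap diagram can be sketched as follows.
The example assumes that two peaks are shared when they lie on the same
chromosome and overlap by at least one base, using half-open BED-style
coordinates; the actual overlap criterion used for the figure may
differ. Function names and peak coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), half-open as in BED


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def venn_counts(set_a: List[Peak], set_b: List[Peak]) -> Dict[str, int]:
    """Count peaks unique to each set and peaks with a partner in the other.

    Uses a simple all-against-all comparison, which is fine for a sketch
    but slow for genome-scale peak lists.
    """
    a_shared = sum(1 for a in set_a if any(overlaps(a, b) for b in set_b))
    b_shared = sum(1 for b in set_b if any(overlaps(b, a) for a in set_a))
    return {
        "a_only": len(set_a) - a_shared,
        "a_shared": a_shared,
        "b_shared": b_shared,
        "b_only": len(set_b) - b_shared,
    }


# Hypothetical peaks from two replicates.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
rep2 = [("chr1", 150, 250), ("chr2", 300, 400)]
counts = venn_counts(rep1, rep2)
```

The shared count is reported separately for each set because one peak
in a set can overlap several peaks in the other.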
MA plot for data normalization. Red dots represent sites that are significantly differentially bound (p < 0.05). Blue line: log fold change of 0.
Volcano plot showing significantly differentially bound sites. If log2(Target)-log2(Control) < 0, binding affinity at the site is decreased; if log2(Target)-log2(Control) > 0, binding affinity at the site is increased.
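The sign rule in this caption can be written as a short sketch. It
takes log2 signal values for the target and control conditions and
labels each site by the direction of change. The function names and
site values are hypothetical, and no significance test is applied here.

```python
from collections import Counter
from typing import Iterable, Tuple

# (site name, log2 target signal, log2 control signal)
Site = Tuple[str, float, float]


def binding_change(log2_target: float, log2_control: float) -> str:
    """Label a site by the sign of log2(Target) - log2(Control)."""
    diff = log2_target - log2_control
    if diff > 0:
        return "increased"
    if diff < 0:
        return "decreased"
    return "unchanged"


def summarize(sites: Iterable[Site]) -> Counter:
    """Count how many sites fall into each direction of change."""
    return Counter(binding_change(t, c) for _, t, c in sites)


# Hypothetical sites, not taken from a real comparison.
sites = [("site_1", 6.0, 4.5), ("site_2", 3.0, 5.0), ("site_3", 4.0, 4.0)]
summary = summarize(sites)
```

In practice the labels would be restricted to sites that also pass the
significance threshold used for the plot.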
Box plot of read distributions for differentially bound sites.
Heatmap between replicate samples of comparison groups.
MA plot before normalization using common peaks (left) and after normalization displaying all peaks (right). M: log2 fold changes. A: average expression signal. Green line: robust regression; red line: LOWESS regression; blue line: M = 0.
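The two axes of an MA plot can be computed as in the sketch below. It
assumes the conventional definitions, with M as the difference of the
log2 signals and A as their mean, and takes M as target minus control.
The pseudocount parameter is a hypothetical addition to avoid taking
the logarithm of zero, and the input values are illustrative.

```python
import math
from typing import Tuple


def ma_values(
    target_signal: float, control_signal: float, pseudocount: float = 1.0
) -> Tuple[float, float]:
    """Return (M, A) for one peak.

    M = log2(target) - log2(control)      (log2 fold change)
    A = (log2(target) + log2(control)) / 2 (average log2 signal)
    """
    log_t = math.log2(target_signal + pseudocount)
    log_c = math.log2(control_signal + pseudocount)
    return log_t - log_c, (log_t + log_c) / 2


# Hypothetical read counts for a single peak.
m, a = ma_values(target_signal=7.0, control_signal=1.0)
```

With these inputs the pseudocount makes the signals 8 and 2, so M is 2
and A is 2. A peak with equal signal in both samples lies on the M = 0
line.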
Gene Ontology (GO) enrichment analysis. Dot plot of significant Biological Process GO terms.
KEGG pathway enrichment analysis. Bar graph of significant KEGG terms.
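Over-representation analyses of GO and KEGG terms are commonly based on
a hypergeometric test. The sketch below computes that p-value for one
term under this assumption; it does not reproduce clusterProfiler's
multiple-testing correction or gene-universe handling. The function
name and the counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(
    n_universe: int, n_term: int, n_selected: int, n_overlap: int
) -> float:
    """Upper-tail hypergeometric p-value for one term.

    n_universe: genes in the background set
    n_term:     background genes annotated with the term
    n_selected: genes of interest (for example, genes near peaks)
    n_overlap:  genes of interest annotated with the term
    Returns P(X >= n_overlap) when drawing n_selected genes at random.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_term, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 4 in the term,
# 3 genes of interest, 2 of which are in the term.
p = enrichment_pvalue(n_universe=10, n_term=4, n_selected=3, n_overlap=2)
```

For these counts the p-value is 40/120, or one third. Exact integer
arithmetic with math.comb is slow for very large gene sets, where a
statistics library would normally be used instead.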
Density plots and genomic heatmaps show the respective peak intensities centered at the transcription start site (TSS) with 3 kb flanking regions. The x-axis represents the genomic location relative to the TSS. The y-axis represents signal intensity.
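The coordinate system of such a plot can be sketched as follows. The
example assumes a 3 kb flank on each side of the TSS and that positions
on the minus strand are mirrored so that negative values are always
upstream, which is a common convention and not stated in the report.
The function names and coordinates are hypothetical.

```python
from typing import Tuple


def tss_window(tss: int, flank: int = 3000) -> Tuple[int, int]:
    """Return the (start, end) window around a TSS, clipped at position 0."""
    return max(0, tss - flank), tss + flank


def relative_to_tss(position: int, tss: int, strand: str) -> int:
    """Signed distance from the TSS; negative means upstream of the gene."""
    offset = position - tss
    return offset if strand == "+" else -offset


# Hypothetical TSS at position 10,000.
window = tss_window(10_000)
upstream = relative_to_tss(10_500, 10_000, "-")
```

Here the window spans 7,000 to 13,000, and a position 500 bases to the
right of a minus-strand TSS is reported as 500 bases upstream.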
MEME was used to analyze motif sequences around the peak regions.
MEME-ChIP performs motif discovery, enrichment, search, comparison, and
visualization.
Motif analysis. The top two motifs found within the peak regions (+100 nucleotides on each side) of a sample are shown. The first motif (E-value = 4.4e-182) was related to that of CTCF. The second motif (E-value = 1.9e-138) was related to that of SPIB or SPI1.
A list of software, tools, and libraries used in the analysis
pipeline:
Trimmomatic (v0.38)
FastQC (v0.11.8)
MultiQC (v1.11)
SAMtools (v0.1.19)
Picard (v2.20.4)
bedTools (v2.29.2)
deepTools (v3.4.1)
MACS (v2.2.6)
MEME (v5.1.1) (https://meme-suite.org/meme/)
R libraries:
ChIPseeker (v1.22.1)
clusterProfiler (v3.14.3)
DiffBind (v3.8.4)
For ChIP-Seq:
BWA (v0.7.10)
For ATAC-Seq:
ENCODE ATAC-seq pipeline (https://github.com/ENCODE-DCC/atac-seq-pipeline)
Bowtie2 (v2.3.4.1)
For CUT&RUN:
CUT&RUNTools 2.0 (https://github.com/fl-yu/CUT-RUNTools-2.0)
Bowtie2 (v2.3.4.1)
Language:
Shell
R (v4.3.0)
Python (v3.7.13)
Perl (v5.26.2)
ATAC-Seq  Assay for Transposase-Accessible Chromatin with Sequencing
ChIP-Seq  Chromatin Immunoprecipitation with Sequencing
CUT&RUN   Cleavage Under Targets and Release Using Nuclease
GO        Gene Ontology
KEGG      Kyoto Encyclopedia of Genes and Genomes
MACS      Model-based Analysis of ChIP-Seq
PCA       Principal Component Analysis