MethScope-Tutorial

library(MethScope)

Generate cell/pixel/sample MRMPs’ embedding

The CRAN package includes a small toy dataset for quick installation checks. The GitHub repository also provides a larger inst/extdata/example.cg file for fuller end-to-end testing after cloning the repository. Replace either example query with the path to your own .cg data file. For details on creating .cg files, see YAME (https://zhou-lab.github.io/YAME/). We provide three reference MRMPs definitions (mouse brain and human pan tissues); to create your own reference, see our MRMPs reference tutorial.

# CRAN-compatible toy example
example_file <- system.file("extdata", "toy.cg", package = "MethScope")
reference_pattern <- system.file("extdata", "toy.cm", package = "MethScope")
input_pattern <- GenerateInput(example_file, reference_pattern)
# Fuller GitHub example: run from the root of the cloned repository
# (this overwrites the objects created by the toy example above)
example_file <- "inst/extdata/example.cg"
reference_pattern <- "Liu2021_MouseBrain.cm"
input_pattern <- GenerateInput(example_file, reference_pattern)

If your input data contains many cells, we recommend splitting your .cg files and running the above code in parallel to reduce the runtime. To split your .cg files, refer to the YAME documentation (https://zhou-lab.github.io/YAME/). To understand what each pattern represents, check out our knowYourCG tool: https://www.bioconductor.org/packages/release/bioc/html/knowYourCG.html
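The parallel run described above could be sketched as follows. This is a minimal sketch, not part of MethScope: embed_chunks is a hypothetical helper, the chunk file names are placeholders, and it assumes each call returns a matrix with one row per cell and the same columns, so the pieces can be stacked with rbind.

```r
library(parallel)

# Hypothetical helper (not part of MethScope): apply `fun` to each split file
# and stack the per-file results row-wise.
# Assumes `fun` returns a matrix with one row per cell and identical columns.
embed_chunks <- function(files, fun, cores = 1L) {
  # mclapply forks worker processes; on Windows only cores = 1 is supported.
  pieces <- mclapply(files, fun, mc.cores = cores)
  do.call(rbind, pieces)
}

# Usage sketch (placeholder file names; replace with your own split .cg files):
# chunk_files <- c("chunk_01.cg", "chunk_02.cg")
# input_pattern <- embed_chunks(
#   chunk_files,
#   function(f) GenerateInput(f, reference_pattern),
#   cores = 2L
# )
```

Passing the embedding function as an argument keeps the helper independent of MethScope, so it can be checked with any function that returns a matrix.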

Perform cell type annotation

To use a pre-trained model, call the PredictCellType function as shown below for fast cell type annotation. We provide two built-in pre-trained models: Liu2021_MouseBrain_P1000 for mouse brain and Zhou2025_HumanAtlas_P1000 for the human tissue atlas.

model <- Liu2021_MouseBrain_P1000()
prediction_result <- PredictCellType(model,input_pattern)

Train the classification model

To train your own model, use our Input_training function with a cell_type_label vector whose entries correspond to the rows of your input_pattern matrix. You can provide your own list of xgb model parameters or set cross_validation = TRUE to search for the optimal parameters.

trained_model <- Input_training(input_pattern,cell_type_label)
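Because the labels must line up with the rows of the matrix, a quick sanity check before training may help. The sketch below uses a hypothetical helper, check_labels, and toy stand-in objects; neither is part of MethScope.

```r
# Hypothetical helper (not part of MethScope): verify that there is exactly
# one non-missing label per row of the pattern matrix.
check_labels <- function(input_pattern, cell_type_label) {
  stopifnot(
    length(cell_type_label) == nrow(input_pattern),
    !anyNA(cell_type_label)
  )
  # Per-class counts help spot classes that are too rare to train on.
  table(cell_type_label)
}

# Toy stand-ins for illustration only; replace with your own objects.
toy_pattern <- matrix(runif(12), nrow = 4)
toy_label <- c("A", "B", "A", "B")
check_labels(toy_pattern, toy_label)
```

The helper stops with an error when the lengths differ, which is easier to diagnose than a mismatch discovered during training.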

Visualize the prediction result

We provide built-in functions for visualizing the prediction results:

umap_plot <- PlotUMAP(input_pattern,prediction_result)
# cell_type_label holds the true cell type label of each cell
PlotConfusion(prediction_result,cell_type_label)
PlotF1(prediction_result,cell_type_label)
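For readers who want the numbers that a confusion matrix and per-class F1 plot typically summarize, they can be computed by hand in base R. This is a generic sketch on toy label vectors, not a description of how PlotConfusion or PlotF1 are implemented; the class names and the assumption that predictions are available as a plain label vector are illustrative.

```r
# Toy labels for illustration only (not MethScope output).
truth <- c("Exc", "Exc", "Inh", "Inh", "Glia", "Glia")
pred  <- c("Exc", "Inh", "Inh", "Inh", "Glia", "Exc")

# Use a shared level set so the confusion matrix is square.
lv <- sort(unique(c(truth, pred)))
confusion <- table(truth = factor(truth, levels = lv),
                   pred  = factor(pred,  levels = lv))

# True positives sit on the diagonal (rows = truth, columns = prediction).
tp <- setNames(as.numeric(diag(confusion)), lv)
precision <- tp / colSums(confusion)  # TP / cells predicted as the class
recall    <- tp / rowSums(confusion)  # TP / cells truly in the class
f1 <- 2 * precision * recall / (precision + recall)
f1[is.nan(f1)] <- 0  # classes with no correct predictions get F1 = 0

confusion
round(f1, 3)
```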

Cell type deconvolution

Cell type proportions can be estimated with our nnls_deconv function. To obtain the reference input, see our data repository (https://github.com/zhou-lab/methscope_data), which stores the reference patterns.

reference_pattern <- "Liu2021_MouseBrain.cm"
reference_input <- readRDS("2021Liu_reference_pattern.rds")
cell_proportion <- nnls_deconv(reference_input,input_pattern)
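To illustrate the general idea behind non-negative least squares deconvolution, the sketch below uses the nnls package from CRAN on simulated data. It is not the implementation of nnls_deconv: the matrix orientation (patterns in rows, cell types in columns), the cell type names, and the final normalization to proportions are assumptions made for this example.

```r
library(nnls)

set.seed(1)
# Toy reference: 50 patterns (rows) x 3 cell types (columns), values in [0, 1].
reference <- matrix(runif(50 * 3), nrow = 50,
                    dimnames = list(NULL, c("typeA", "typeB", "typeC")))

# Toy mixed sample: a known 60/30/10 mixture of the reference columns.
true_prop <- c(typeA = 0.6, typeB = 0.3, typeC = 0.1)
mixed <- as.vector(reference %*% true_prop)

# Solve min ||reference %*% x - mixed|| subject to x >= 0.
fit <- nnls(reference, mixed)

# Rescale the non-negative coefficients so they sum to 1.
est <- setNames(fit$x / sum(fit$x), colnames(reference))
round(est, 3)  # should be close to true_prop for this noise-free mixture
```

With real data the mixture is noisy and the reference is incomplete, so the estimates are approximate rather than exact.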

Unsupervised clustering

After obtaining the cell-by-MRMPs matrix input_pattern, you can cluster cells with an existing pipeline. Here is a demonstration using Seurat for clustering and UMAP plotting:

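library(Seurat)
# Seurat expects features in rows and cells in columns, so transpose the cell-by-MRMPs matrix
Pattern.obj <- CreateSeuratObject(counts = t(input_pattern), assay = "DNAm")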
VariableFeatures(Pattern.obj) <- rownames(Pattern.obj[['DNAm']])
DefaultAssay(Pattern.obj) <- "DNAm"
Pattern.obj <- NormalizeData(Pattern.obj, assay = "DNAm", verbose = FALSE)
Pattern.obj <- ScaleData(Pattern.obj, assay = "DNAm", verbose = FALSE)
# Alternatively, use the initial counts matrix directly as the scaled data
# (this slot access assumes a Seurat v5 assay with layers)
Pattern.obj@assays$DNAm@layers$scale.data <- as.matrix(Pattern.obj@assays$DNAm@layers$counts)
Pattern.obj <- RunPCA(Pattern.obj,assay="DNAm",reduction.name = 'mpca', verbose = FALSE)
Pattern.obj <- FindNeighbors(Pattern.obj, reduction = "mpca", dims = 1:30)
Pattern.obj <- FindClusters(Pattern.obj, verbose = FALSE, resolution = 0.7)
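Pattern.obj <- RunUMAP(Pattern.obj, reduction = "mpca",  reduction.name = "meth.umap", dims = 1:30)
# Plot the UMAP embedding, colored by the clusters found above
DimPlot(Pattern.obj, reduction = "meth.umap")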