The CRAN package includes a small toy dataset for quick installation
checks. The GitHub repository also provides a larger
inst/extdata/example.cg file for fuller end-to-end testing
after cloning the repository. Replace either example query with your own
.cg data file path. For details on how to create .cg files,
see YAME (https://zhou-lab.github.io/YAME/). We provide three
reference MRMPs definitions (mouse brain and human pan-tissue). To
create your own reference, see our MRMPs reference tutorial.

# CRAN-compatible toy example
example_file <- system.file("extdata", "toy.cg", package = "MethScope")
reference_pattern <- system.file("extdata", "toy.cm", package = "MethScope")
input_pattern <- GenerateInput(example_file, reference_pattern)

# Fuller GitHub example after cloning the repository
example_file <- "inst/extdata/example.cg"
reference_pattern <- "Liu2021_MouseBrain.cm"
input_pattern <- GenerateInput(example_file, reference_pattern)

If your input data contains many cells, we recommend splitting your .cg files and running the above code in parallel to reduce runtime. To split your .cg files, please refer to the YAME documentation (https://zhou-lab.github.io/YAME/). To interpret individual patterns, see our knowYourCG tool: https://www.bioconductor.org/packages/release/bioc/html/knowYourCG.html
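The parallel workflow suggested above can be sketched as follows. This is a minimal illustration, not part of MethScope: the helper name `generate_input_parallel`, the chunk file names, and the stand-in function `fake_generate` are all hypothetical. It assumes each call returns a matrix with cells in rows, so the per-chunk results can be stacked with `rbind`.

```r
library(parallel)

# Hypothetical helper (not a MethScope function): apply a per-file function to
# each .cg chunk in parallel and stack the per-chunk matrices row-wise.
# `generate_fn` is expected to behave like GenerateInput(file, reference).
generate_input_parallel <- function(cg_files, reference_pattern, generate_fn,
                                    n_cores = 2L) {
  # mclapply forks on Unix-alikes; on Windows only mc.cores = 1 is supported
  if (.Platform$OS.type == "windows") n_cores <- 1L
  chunks <- mclapply(
    cg_files,
    function(f) generate_fn(f, reference_pattern),
    mc.cores = n_cores
  )
  # Each chunk holds cells in rows, so row-binding yields one combined matrix
  do.call(rbind, chunks)
}

# Stand-in for GenerateInput so the sketch runs without MethScope installed:
# returns a 2-cell x 3-pattern matrix whose row names carry the chunk name.
fake_generate <- function(file, reference) {
  matrix(0.5, nrow = 2, ncol = 3,
         dimnames = list(paste0(basename(file), "_cell", 1:2),
                         paste0("P", 1:3)))
}

cg_chunks <- c("chunk1.cg", "chunk2.cg")  # hypothetical split files
combined <- generate_input_parallel(cg_chunks, "toy.cm", fake_generate)
dim(combined)
```

With MethScope loaded, you would pass `GenerateInput` as `generate_fn` and the paths of your own split .cg files as `cg_files`.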
To use a pre-trained model, call the PredictCellType function for ultra-fast cell type annotation. We provide two built-in pre-trained models: Liu2021_MouseBrain_P1000 for mouse brain and Zhou2025_HumanAtlas_P1000 for the human tissue atlas.
To train your own model, use our Input_training function with a cell_type_label vector whose entries correspond, in order, to the rows of your input_pattern matrix. You can provide your own list of xgb model parameters or set cross_validation = TRUE to search for optimal parameters.
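Because the labels must line up with the rows of the matrix, it can help to verify the alignment before training. The sketch below uses toy data and a hypothetical helper, `check_labels`, which is not part of MethScope. It does not call Input_training.

```r
# Hypothetical helper (not a MethScope function): confirm there is exactly one
# non-missing label per row (cell) of the pattern matrix.
check_labels <- function(input_pattern, cell_type_label) {
  if (length(cell_type_label) != nrow(input_pattern)) {
    stop("Expected one label per row: got ", length(cell_type_label),
         " labels for ", nrow(input_pattern), " rows")
  }
  if (anyNA(cell_type_label)) stop("Labels must not contain NA")
  factor(cell_type_label)
}

# Toy stand-ins for a real cell-by-pattern matrix and its labels
set.seed(1)
toy_pattern <- matrix(runif(12), nrow = 4,
                      dimnames = list(paste0("cell", 1:4), paste0("P", 1:3)))
toy_labels <- c("Neuron", "Neuron", "Astro", "Oligo")

labels <- check_labels(toy_pattern, toy_labels)
table(labels)  # class balance per cell type
```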
We provide built-in functions for visualizing the prediction results.
Cell type proportions can be estimated with our nnls_deconv functions. To obtain the reference input, see our data repository (https://github.com/zhou-lab/methscope_data), which stores the reference patterns.
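To illustrate the idea behind non-negative least squares (NNLS) deconvolution, here is a minimal base R sketch. It is not the nnls_deconv implementation and does not use its interface: the function `nnls_sketch`, the toy reference matrix, and the cell type names are assumptions made for illustration. Bounded `optim` is swapped in as the solver so that no extra package is needed; a dedicated NNLS solver would normally be preferable.

```r
# Toy reference (assumed layout): rows = patterns, columns = cell types
reference <- cbind(
  Neuron = c(0.9, 0.1, 0.2, 0.8),
  Astro  = c(0.1, 0.8, 0.3, 0.2),
  Oligo  = c(0.2, 0.3, 0.9, 0.1)
)

# Simulate a mixed sample from known proportions
true_prop <- c(Neuron = 0.6, Astro = 0.3, Oligo = 0.1)
bulk <- as.vector(reference %*% true_prop)

# Hypothetical helper: minimize ||A x - b||^2 subject to x >= 0,
# then rescale the coefficients so they sum to 1.
nnls_sketch <- function(A, b) {
  loss <- function(x) sum((A %*% x - b)^2)
  grad <- function(x) as.vector(2 * t(A) %*% (A %*% x - b))
  start <- rep(1 / ncol(A), ncol(A))
  fit <- optim(start, loss, grad, method = "L-BFGS-B", lower = 0)
  setNames(fit$par / sum(fit$par), colnames(A))
}

estimated <- nnls_sketch(reference, bulk)
round(estimated, 2)
```

The non-negativity constraint keeps every estimated proportion at or above zero, and the final rescaling makes the estimates sum to one.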
After obtaining the cell-by-MRMPs matrix input_pattern, you can cluster cells with an existing pipeline. Here is a demonstration using Seurat for clustering and UMAP embedding:

library(Seurat)

# Seurat expects features in rows and cells in columns, so transpose the matrix
Pattern.obj <- CreateSeuratObject(counts = t(input_pattern), assay = "DNAm")
VariableFeatures(Pattern.obj) <- rownames(Pattern.obj[['DNAm']])
DefaultAssay(Pattern.obj) <- "DNAm"
Pattern.obj <- NormalizeData(Pattern.obj, assay = "DNAm", verbose = FALSE)
Pattern.obj <- ScaleData(Pattern.obj, assay = "DNAm", verbose = FALSE)
# Optional: overwrite the scaled data with the initial counts matrix so that
# PCA runs on it directly (the @layers slot assumes a Seurat v5 assay)
Pattern.obj@assays$DNAm@layers$scale.data <- as.matrix(Pattern.obj@assays$DNAm@layers$counts)
Pattern.obj <- RunPCA(Pattern.obj, assay = "DNAm", reduction.name = "mpca", verbose = FALSE)
Pattern.obj <- FindNeighbors(Pattern.obj, reduction = "mpca", dims = 1:30)
Pattern.obj <- FindClusters(Pattern.obj, verbose = FALSE, resolution = 0.7)
Pattern.obj <- RunUMAP(Pattern.obj, reduction = "mpca", reduction.name = "meth.umap", dims = 1:30)