---
title: "Introduction to omicwas"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to omicwas}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Installation

```{r, eval = FALSE}
install.packages("devtools")
devtools::install_github("fumi-github/omicwas")
```

If you encounter dependency error for `sva` package, install it from Bioconductor:

```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
   install.packages("BiocManager")
BiocManager::install("sva")
```

## Usage

`omicwas` is a package for cell-type-specific disease association testing,
using bulk tissue data as input.
The package accepts DNA methylation data for epigenome-wide association studies (EWAS),
as well as gene expression data for differential gene expression analyses.
The main function is `ctassoc`

```{r setup}
library(omicwas)
```

See description.

```{r}
?ctassoc
```

## Analyzing DNA methylation

Let's load a sample data.

```{r}
data(GSE42861small)
X = GSE42861small$X
W = GSE42861small$W
Y = GSE42861small$Y
Y = Y[seq(1, 20), ] # for brevity
C = GSE42861small$C
```

See description.

```{r}
?GSE42861small
```

The conventional way is to use ordinary linear regression.

```{r}
result = ctassoc(X, W, Y, C = C,
                 test = "full")
result$coefficients
```

We recommend nonlinear regression with ridge regularization.
For DNA methylation, we use the **logit** function for normalization,
and the test option is `nls.logit`

```{r}
result = ctassoc(X, W, Y, C = C,
                 test = "nls.logit",
                 regularize = TRUE)
print(result$coefficients, n = 20)
```

The first 19 lines show the result for CpG site cg10543797.
Line 1 shows that the basal methylation level in CD4.
(actually CD4+ T cells) is
plogis(1.498) = 0.817, so this cell type is 81% methylated.
Line 8 shows that the CD4.-specific effect of the disease
is 7.10e-04	(in logit scale).
Since the p.value is 0.64, this is not significant.
Line 15 shows that the effect of sexF (female compared to male) is
-4.14e-02 with a small p.value 9.15e-05.
Since sexF is a covariate that has uniform effect across cell types,
the celltype column is NA.

## Analyzing gene expression

Let's load a sample data.

```{r}
data(GTExsmall)
X = GTExsmall$X
W = GTExsmall$W
Y = GTExsmall$Y + 1
Y = Y[seq(1, 20), ] # for brevity
C = GTExsmall$C
```

See description.

```{r}
?GTExsmall
```

The conventional way is to use ordinary linear regression.

```{r}
result = ctassoc(X, W, Y, C = C,
                 test = "full")
result$coefficients
```

We recommend nonlinear regression with ridge regularization.
For DNA methylation, we use the **log** function for normalization,
and the test option is `nls.log`

```{r}
result = ctassoc(X, W, Y, C = C,
                 test = "nls.log",
                 regularize = TRUE)
print(result$coefficients, n = 15)
```

The first 13 lines show the result for transcript ENSG00000059804.
Line 1 shows that the basal expression level in Granulocytes is
exp(8.847) = 6955.
Line 7 shows that the Granulocytes-specific effect of age
is 4.62e-04	(in log scale).
Since the p.value is 0.82, this is not significant.
Line 13 shows that the effect of sexF (female compared to male) is
-2.57e-03 with p.value 0.97
Since sexF is a covariate that has uniform effect across cell types,
the celltype column is NA.

## Analyzing mQTL

For QTL analyses, we use `ctcisQTL` function instead of `ctassoc`.
To speed up computation, we perform linear ridge regression,
thus the statistical test is almost identical to `ctassoc(test = "nls.identity", regularize = TRUE)`.
We analyze only in the linear scale.
Association analysis is performed between each row of Y and each row of X.
See description.

```{r}
?ctcisQTL
```

Let's load a sample data.

```{r}
data(GSE79262small)
X    = GSE79262small$X
Xpos = GSE79262small$Xpos
W    = GSE79262small$W
Y    = GSE79262small$Y
Ypos = GSE79262small$Ypos
C    = GSE79262small$C
X    = X[seq(1, 3001, 100), ] # for brevity
Xpos = Xpos[seq(1, 3001, 100)]
Y    = Y[seq(1, 501, 100), ]
Ypos = Ypos[seq(1, 501, 100)]
```

See description.

```{r}
?GSE79262small
```

Analyze mQTL.

```{r}
ctcisQTL(X, Xpos, W, Y, Ypos, C = C)
```

The result is stored in a file.

```{r}
head(
  read.table(file.path(tempdir(), "ctcisQTL.out.txt"),
             header = TRUE,
             sep ="\t"))
```

The first 3 lines show the result for the association of
SNP rs6678176 with CpG site cg19251656.
Line 1 shows that the CD4T-specific effect of the SNP is -0.003.
Since the p.value is 0.75, this is not significant.
