---
title: "Quantifying rhythmicity in one condition"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Quantifying rhythmicity in one condition}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = '#>',
  message = FALSE,
  fig.align = 'center',
  fig.retina = 2)
```

## Introduction

Here we show how to use `limorhyde2` to quantify rhythmicity in data from one condition. The data are based on mouse liver samples from the [circadian gene expression atlas in mammals](https://doi.org/10.1073/pnas.1408886111) ([GSE54650](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54650)).

## Load packages

```{r load_packages}
library('data.table')
library('ggplot2')
library('limorhyde2')

# doParallel::registerDoParallel() # register a parallel backend to minimize runtime
theme_set(theme_bw())
```

## Load the data

The expression data are in a matrix with one row per gene and one column per sample. The metadata are in a table with one row per sample. To save time and space, the expression data include only a subset of genes.

```{r load_data}
y = GSE54650$y
y[1:5, 1:5]

metadata = GSE54650$metadata
metadata
```

## Fit linear models

The first step is to fit a series of linear models based on periodic splines for each genomic feature, in this case each gene, using [limma](https://doi.org/doi:10.18129/B9.bioc.limma). `getModelFit()` takes several arguments besides the expression data and metadata, but here we just use the defaults. For example, the data are from one condition, so we leave `condColname` as `NULL`. Also, the expression data are from microarrays and already log-transformed, so we leave `method` as `'trend'`.

```{r model_fit}
fit = getModelFit(y, metadata)
```

## Get posterior fits

The next step is obtain posterior estimates of the model coefficients using [multivariate adaptive shrinkage](https://doi.org/10.1038/s41588-018-0268-8) ([mashr](https://stephenslab.github.io/mashr/index.html)), which learns patterns in the data and accounts for noise in the original fits.

```{r posterior_fit}
fit = getPosteriorFit(fit)
```

## Get rhythm statistics

We can now use the posterior fits to compute rhythm statistics, i.e. properties of the curve, for each gene. 

```{r rhythm_stats}
rhyStats = getRhythmStats(fit)
```

We can examine the distributions of the statistics in various ways, such as ranking genes by peak-to-trough amplitude (no p-values necessary) or plotting peak-to-trough amplitude vs. peak phase.

```{r plot_stats, fig.width = 5, fig.height = 4}
print(rhyStats[order(-peak_trough_amp)], nrows = 10L)

ggplot(rhyStats) +
  geom_point(aes(x = peak_phase, y = peak_trough_amp), alpha = 0.2) +
  xlab('Peak phase (h)') +
  ylab('Peak-to-trough amplitude (norm.)') +
  scale_x_continuous(breaks = seq(0, 24, 4))
```

## Get observed and fitted time-courses

We can also compute the expected measurements for one or more genes at one or more time-points, which correspond to the fitted curves. Here we plot the posterior fits and observed expression for three of the top rhythmic genes (converting from gene id to gene symbol).

```{r expected_meas}
genes = data.table(
  id = c('13088', '13170', '13869'),
  symbol = c('Cyp2b10', 'Dbp', 'Erbb4'))

measFit = getExpectedMeas(fit, times = seq(0, 24, 0.5), features = genes$id)
measFit[genes, symbol := i.symbol, on = .(feature = id)]
print(measFit, nrows = 10L)
```

Next we combine the observed expression data and metadata. The curves show how `limorhyde2` is able to fit non-sinusoidal rhythms.

```{r plot_timecourse, fig.width = 7, fig.height = 2.25}
measObs = mergeMeasMeta(y, metadata, features = genes$id)
measObs[genes, symbol := i.symbol, on = .(feature = id)]
print(measObs, nrows = 10L)

ggplot() +
  facet_wrap(vars(symbol), scales = 'free_y', nrow = 1) +
  geom_line(aes(x = time, y = value), data = measFit) +
  geom_point(aes(x = time %% 24, y = meas), shape = 21, size = 1.5,
             data = measObs) +
  labs(x = 'Circadian time (h)', y = 'Expression (norm.)') +
  scale_x_continuous(breaks = seq(0, 24, 4))
```
