---
title: "sbl: Sparse Bayesian Learning for QTL Mapping and Genome-Wide Association Studies"
author: |-
  Meiyue Wang and Shizhong Xu
 
  Department of Botany and Plant Sciences, University of California, Riverside
date: "October 17, 2018"
output: rmarkdown::html_document
vignette: >
  %\VignetteIndexEntry{'sbl.tutorial'}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Introduction

Single marker models that detecting one locus at a time are subject to many problems in 
genome-wide association studies (GWAS) and quantitative trait locus (QTL) mapping, which
includes large matrix inversion, over-conservativeness after Bonferroni correction and
difficulty in evaluation of total genetic contribution. Such problems can be solved 
by a multiple locus model which includes all markers in the same model with effects 
being estimated simultaneously. The sparse Bayesian learning method (SBL), implemented
in `sbl` package, is a multiple locus model that can handle extremely large sample 
size (>100,000) and outcompetes other multiple locus GWAS methods in terms of detection 
power and computing time. 


# `sbl` package installation

`sbl` can be downloaded and installed locally. The download link is
[here](https://github.com/MeiyueComputBio/sbl/tree/master/R%20packge).

```{r installation, eval=FALSE, message=FALSE, warning=FALSE}
install.packages('sbl_0.1.0.tar.gz', repos=NULL, type='source')
```


# SBL for QTL mapping and GWAS

The usage of `sbl` to perform QTL mapping and GWAS is very simple
(Note: Please remove the markers without variation before running the program):

1. Load input data
    + A vector of response variables (observations of the trait).
    + A design matrix for fixed effects, which represents non-genetic factors that 
contribute to the phenotypic variance. This can be the intercept only.
    + A matrix storing genotype indicators. The original three genotypes 
  ($a_1a_1$, $a_1a_2$, $a_2a_2$) should be coded to numeric indicator as (-1, 0, 1) 
    or (0, 1, 2).

```{r load data, message=FALSE, warning=FALSE}
library('sbl')

# load example data
data(phe)
data(intercept)
data(gen)
```


2. Call `sblgwas()` function to perform regression and detect significant markers.

```{r minimal invocation, message=FALSE, warning=FALSE,}
# A minimal invocation of "sblgwas()" function looks like: 
fit1<-sblgwas(x = intercept, y = phe, z = gen)

# Restuls of markers surrounding the second simulated QTL with non-zero effect in the example data
fit1$blup[c(17:21),]
```


Users can arbitrarily set the value of `t` between [0, 2] to control the sparseness of model, 
default is -1. 

```{r hyper parameter, message=FALSE, warning=FALSE}
# Setting t = 0 leads to the most sparse model  
fit2<-sblgwas(x = intercept, y = phe, z = gen, t = 0)
fit2$parm

# Setting t = -2 leads to the least sparse model  
fit3<-sblgwas(x = intercept, y = phe, z = gen, t = -2)
fit3$parm
```


`max.iter` and `min.err` are two thresholds to stop the program when either of them
is met. 
`max.iter` defines the maximum number of iterations that the program is 
allowed to run, default is 200; 
`min.err` defines the minimum threshold of mean squared error of random effects 
estimated from the current and the previous iteration, default is 1e-6.

```{r iteration, message=FALSE, warning=FALSE}
# Set max.iter and min.err to control the convergence of the program 
fit4<-sblgwas(x = intercept, y = phe, z = gen, t = -1, max.iter = 300, min.err = 1e-8)
fit4$parm
```