---
title: "Introduction to rTPC"
author: "Daniel Padfield"
output: rmarkdown::html_vignette
date: "`r Sys.Date()`"
description: >

vignette: >
  %\VignetteIndexEntry{Introduction to rTPC}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

#### An introduction to rTPC and how it can be used to fit thermal performance curves using nls.multstart.

------------------------------------------------------------------------

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  #tidy.opts=list(width.cutoff=60),
  #tidy=TRUE,
  fig.align = 'center'
)
```

```{r runtimer, include=FALSE}
runstart <- lubridate::now()
```

```{r setup, message=FALSE}
# load packages
library(rTPC)
library(nls.multstart)
library(broom)
library(tidyverse)
```

**rTPC** provides a suite of functions to help fit thermal performance curves to empirical data. After searching the literature, **rTPC** contains `r length(rTPC::get_model_names())` different model formulations that have been used previously. These functions can be easily applied to methods in R that use non-linear least squares regression to estimate thermal performance curves.

The available model formulations can be accessed using **get_model_names()**.

```{r get_mod_names}
# list model names
get_model_names()
```

They are generally named after the author of the paper (and hence the name of the model within the literature) and the year at which I found the model to be first used, separated by a **"\_"**. Some original model formulations have been altered so that all models take temperature in degrees centigrade and raw rate values as input.

We can demonstrate the fitting procedure by taking a single curve from the example dataset **rTPC** - a dataset of 60 TPCs of respiration and photosynthesis of the aquatic algae, *Chlorella vulgaris*. We can plot the data using **ggplot2**.

```{r first_plot, fig.width=6, fig.height = 4}
# load in data
data("chlorella_tpc")

# keep just a single curve
d <- filter(chlorella_tpc, curve_id == 1)

# show the data
ggplot(d, aes(temp, rate)) +
  geom_point() +
  theme_bw(base_size = 12) +
  labs(
    x = 'Temperature (ºC)',
    y = 'Metabolic rate',
    title = 'Respiration across temperatures'
  )

```

For each model, **rTPC** has helper functions that estimate sensible start values (**get_start_vals()**), lower (**get_lower_lims()**) and upper (**get_upper_lims()**) limits. To demonstrate this, we shall use the sharpe-schoolfield model for high temperature inactivation only.

```{r}
# choose model
mod = 'sharpschoolhigh_1981'

# get start vals
start_vals <- get_start_vals(
  d$temp,
  d$rate,
  model_name = 'sharpeschoolhigh_1981'
)

# get limits
low_lims <- get_lower_lims(d$temp, d$rate, model_name = 'sharpeschoolhigh_1981')
upper_lims <- get_upper_lims(
  d$temp,
  d$rate,
  model_name = 'sharpeschoolhigh_1981'
)

start_vals
low_lims
upper_lims
```

One problem with most methods of fitting models in R using non-linear least squares regression is that they are sensitive to the choice of starting parameters. This problem also occurs in previous specialist R packages that help fit thermal performance curves, such as [devRate](https://github.com/frareb/devRate/) and [temperatureresponse](https://github.com/low-decarie/temperatureresponse). These methods can fail entirely or give different parameter estimates between multiple runs of the same code.

To overcome this, we recommend using the R package [**nls.multstart**](https://github.com/padpadpadpad/nls.multstart), which uses **minpackLM::nlsLM()**, but allows for multiple sets of starting parameters. It iterates through multiple starting values, attempting a fit with each set of start parameters. The best model is then picked using AIC scores.

Using **nls_multstart()**, we will use Latin Hypercube Sampling (LHS), which can only be used when `iter` is set to a single number. Instead of sampling from a uniform distribution across the bounds of each parameter, these methods try to take a set of samples from the range of parameter values that covers the parameter space optimally for any given set of parameters. _This approach can result in less iterations being needed to get the same reliability of model fitting than either the shotgun or grid-start approaches_.

```{r fit_model}
# fit model
fit <- nls_multstart(
  rate ~ sharpeschoolhigh_1981(temp = temp, r_tref, e, eh, th, tref = 15),
  data = d,
  iter = 500,
  start_lower = start_vals - 10,
  start_upper = start_vals + 10,
  lower = low_lims,
  upper = upper_lims,
  supp_errors = 'Y',
  lhstype = 'random'
)

fit
```

To calculate additional parameters of interest, we can use **rTPC::calc_params()**. This function uses high resolution predictions of the fitted model to estimate traits associated with a thermal performance curve. The currently available methods can be viewed by running `?calc_params`. For example, we may be interested in variation in the optimum temperature, $T_{opt}$, given that we adapted algae to different temperatures.

```{r}
# calculate additional traits
calc_params(fit) %>%
  # round for easy viewing
  mutate_all(round, 2)
```

Finally for this introduction, we can get predictions of our model using **broom::augment()**, which is similar to **predict()**. These are then plotted over our original data.

```{r pred_and_plot, fig.width=6, fig.height = 4}
# predict new data
new_data <- data.frame(temp = seq(min(d$temp), max(d$temp), 0.5))
preds <- augment(fit, newdata = new_data)

# plot data and model fit
ggplot(d, aes(temp, rate)) +
  geom_point() +
  geom_line(aes(temp, .fitted), preds, col = 'blue') +
  theme_bw(base_size = 12) +
  labs(
    x = 'Temperature (ºC)',
    y = 'Metabolic rate',
    title = 'Respiration across temperatures'
  )

```

```{r tot_time, include=FALSE}
tot_time <- lubridate::as.duration(lubridate::now() - runstart)
```

[Built in `r tot_time`s]{style="opacity: 0.1;font-size: small;"}