---
title: "Introduction to keyholder"
author: "Evgeni Chasnovski"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to keyholder}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
options(tibble.print_min = 3, tibble.print_max = 3)
```

`keyholder` is a package for storing information (*keys*) about rows of data
frame like objects. The common use cases are to track rows of data without
modifying it and to backup and restore information about rows. This is done with
creating a class __keyed_df__ which has special attribute "keys". Keys are
updated according to changes in rows of reference data frame.

`keyholder` is designed to work tightly with
[dplyr](https://github.com/tidyverse/dplyr) package. All its one- and two-table verbs update keys properly.

```{r setup, include = FALSE}
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
library(keyholder, quietly = TRUE, warn.conflicts = FALSE)
```

```{r Create basic example tibble}
mtcars_tbl <- mtcars %>% as_tibble()
```


## Set keys

The general agreement is that keys are always converted to [tibble](https://tibble.tidyverse.org). In this way one can use multiple
variables as keys by binding them.

There are two general ways of creating keys:

- With assigning. The assigned object will be converted to tibble with
`as_tibble()`. To make sense it should have the same number of rows as reference
data frame. There are two functions for assigning: `keys<-` and `assign_keys()`
which are basically the same. The former use more suitable for interactive use
and the latter - for piping with
[magrittr](https://magrittr.tidyverse.org)'s pipe operator `%>%`.

```{r set keys by assigning}
mtcars_tbl_keyed <- mtcars_tbl
keys(mtcars_tbl_keyed) <- tibble(id = 1:nrow(mtcars_tbl_keyed))

mtcars_tbl %>% assign_keys(tibble(id = 1:nrow(.)))
```

- With `key_by()` and its scoped variants (`*_all()`, `*_if()` and `*_at()`).
This is similar in its design to `dplyr`'s `group_by()`: it takes some columns
from reference data frame and makes keys from them. It has two important
options: `.add` (whether to add specified columns to existing keys) and
`.exclude` (whether to exclude specified columns from reference data frame).
Grouping is ignored.

```{r set keys by key_by}
mtcars_tbl %>% key_by(vs, am)

mtcars_tbl %>% key_by(starts_with("c"))

mtcars_tbl %>% key_by(starts_with("c"), .exclude = TRUE)

  # Scoped variants
mtcars_tbl %>% key_by_all()

# One can also rename variables before keying by supplying .funs
mtcars_tbl %>% key_by_if(rlang::is_integerish, .funs = toupper)

mtcars_tbl %>% key_by_at(c("vs", "am"))
```

To track rows use `use_id()` which creates a special key `.id` with row numbers
as values.

To properly unkey object use `unkey()`.

```{r unkey}
mtcars_tbl_keyed <- mtcars_tbl %>% key_by(vs, am)

# Good
mtcars_tbl_keyed %>% unkey()

# Bad
attr(mtcars_tbl_keyed, "keys") <- NULL
mtcars_tbl_keyed
```

## Get keys

There are three ways of extracting keys:

- With `keys()`. This function always returns a tibble. In case of no keys it
returns a tibble with number of rows as in reference data frame and zero
columns.

```{r get keys with keys}
mtcars_tbl %>% keys()

mtcars_tbl %>% key_by(vs, am) %>% keys()
```

- With `raw_keys()` which is just a wrapper for `attr(.tbl, "keys")`.

```{r get keys with raw_keys}
mtcars_tbl %>% raw_keys()

mtcars_tbl %>% key_by(vs, am) %>% raw_keys()
```

- With `pull_key()` which works like `dplyr`'s `pull` applied to keys:

```{r get keys with pull_key}
mtcars_tbl %>% key_by(vs, am) %>% pull_key(vs)
```

## Manipulate keys

- Remove certain keys with `remove_keys()` and its scoped variants. If all keys
are removed one can automatically unkey object by setting option `.unkey` to
`TRUE`.

```{r remove keys}
mtcars_tbl %>% key_by(vs, mpg) %>% remove_keys(vs)

mtcars_tbl %>% key_by(vs, mpg) %>% remove_keys(everything(), .unkey = TRUE)

  # Scoped variants
# Identical to previous one
mtcars_tbl %>% key_by(vs, mpg) %>% remove_keys_all(.unkey = TRUE)

mtcars_tbl %>% key_by(vs, mpg) %>% remove_keys_if(rlang::is_integerish)
```

- Restore certain keys with `restore_keys()` and its scoped variants. Restoring
means creating or modifying a column in reference data frame with values taken
from keys. After restoring certain key one can remove it from keys by setting
`.remove` to `TRUE`. There is also an option `.unkey` identical to one in
`remove_keys()` (which is meaningful only in case `.remove` is `TRUE`).

```{r restore keys}
mtcars_tbl_keyed <- mtcars_tbl %>%
  key_by(vs, mpg) %>%
  mutate(vs = 1, mpg = 0)
mtcars_tbl_keyed

mtcars_tbl_keyed %>% restore_keys(vs)

mtcars_tbl_keyed %>% restore_keys(vs, .remove = TRUE)

mtcars_tbl_keyed %>% restore_keys(vs, mpg, .unkey = TRUE)

mtcars_tbl_keyed %>% restore_keys(vs, mpg, .remove = TRUE, .unkey = TRUE)

  # Scoped variants
mtcars_tbl_keyed %>% restore_keys_all()

mtcars_tbl_keyed %>% restore_keys_if(rlang::is_integerish, .remove = TRUE)
```

One important feature of `restore_keys()` is that restoring keys beats 
'not-modifying' grouping variables rule. It is made according to the ideology of
keys: they contain information about rows and by restoring you want it to be
available. Groups are recomputed after restoring.

```{r restore keys grouping}
mtcars_tbl_keyed %>% group_by(vs, mpg)

mtcars_tbl_keyed %>% group_by(vs, mpg) %>% restore_keys(vs, mpg)
```

- Rename keys with `rename_keys()` and its scoped variants. Renaming is done
with `dplyr`'s `rename()` or its scoped variant and so renaming format comes
from them.

```{r rename keys}
mtcars_tbl %>% key_by(vs, am) %>% rename_keys(Vs = vs)

  # Scoped variants
mtcars_tbl %>% key_by(vs, am) %>% rename_keys_all(.funs = toupper)
```

## React to subset

A method for subsetting function `[` is implemented for `keyed_df` to react on 
changes in rows: if rows in reference data frame are rearranged or removed the
same operation is done to keys.

```{r reaction to subset}
mtcars_tbl_subset <- mtcars_tbl %>% key_by(vs, am) %>%
  `[`(c(3, 18, 19), c(2, 8, 9))

mtcars_tbl_subset

keys(mtcars_tbl_subset)
```

## Verbs from dplyr

All one- and two-table verbs from `dplyr` (with present scoped variants) support
`keyed_df`. Most functions react to changes in rows as in `[` but some functions
(`summarise()`, `distinct()` and `do()`) unkey object.

```{r dplyr verbs}
mtcars_tbl_keyed <- mtcars_tbl %>% key_by(vs, am)

mtcars_tbl_keyed %>% select(gear, mpg)

mtcars_tbl_keyed %>% summarise(meanMPG = mean(mpg))

mtcars_tbl_keyed %>% filter(vs == 1) %>% keys()

mtcars_tbl_keyed %>% arrange_at("mpg") %>% keys()

band_members %>% key_by(name) %>%
  semi_join(band_instruments, by = "name") %>%
  keys()
```
