---
title: "Using provenance to view the lineage of a data item"
date: "June 11, 2020"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using provenance to view the lineage of a data item}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
A common question when debugging is wondering how a variable got the 
value that it has.  This question can be answered with the 
`debug.lineage` function.  When asking for the lineage of a variable
or a plot or file output, the `debug.lineage` function will show
just the lines of code that contributed either directly or
indirectly to the value being queried.

The `debug.lineage` function can also be used to find every variable,
plot, or file that was derived from a particular variable, showing
the lines of code involved in that derivation.

For each data node queried, `debug.lineage` returns a data frame representing its 
forwards (how the data is used), or backwards (how the data was generated) lineage. 

Each data frame contains the following columns:
* `scriptNum` The script number of the data item 
* `scriptName` The name of the script containing the data item
* `startLine` The line number for that part of the lineage
* `code` The line of code for that part of the lineage


## Usage

The function signature for `debug.lineage` is:
```
debug.lineage(..., start.line = NA, script.num = 1, all = FALSE, forward = FALSE)
```

The parameters of this function are:

* `...` The names of data nodes to be queried.
* `start.line` The line number of the queried data nodes.
* `script.num` The script number of the queried data nodes. 
Defaults to script number 1 (main script).
* `all`	If TRUE, this function returns the lineages of all data items.
* `forward` If TRUE, this function returns the forwards lineage (how the data is used) 
instead of the backwards lineage (how the data was generated).

This function may be called only after initialising the debugger using either 
`prov.debug`, `prov.debug.run`, or `prov.debug.file`. For example:
```
prov.debug.run("myScript.R")
debug.lineage(x)
debug.lineage("x", start.line = 5, script.num = 2)
debug.lineage("a", b, forward = TRUE)
debug.lineage(all = TRUE)
```


## Example

Let `myScript.R` be the following:
```
a <- 1
b <- 2
cc <- 4

v1 <- c(a:10)
v1 <- rep(v1, b)

m1 <- matrix(v1, cc)
print(m1)
```

### Backwards lineage of a variable
By default, the backwards lineage of the queried object is returned when the
function is called, showing how it was derived.

If no start lines are specified, the backwards lineage of the last occurence of that
object will be returned.

For example, the result for `debug.lineage(v1)` is:
```
$v1
  scriptNum scriptName startLine             code
1         1 myScript.R         1           a <- 1
2         1 myScript.R         2           b <- 2
3         1 myScript.R         5    v1 <- c(a:10)
4         1 myScript.R         6 v1 <- rep(v1, b)
```
The backwards lineage for the `v1` variable at line 6 is given as it is the last 
occurence of that variable.

The `start.line` parameter is used to specify which occurence of the queried
variable the lineage should be returned for.

For example, `debug.lineage("v1", start.line = 5)` will return the backwards
lineage of `v1` at line 5, resulting in:
```
$v1
  scriptNum scriptName startLine          code
1         1 myScript.R         1        a <- 1
2         1 myScript.R         5 v1 <- c(a:10)
```

### Forwards lineage of a variable
If the `forward` parameter is set to `TRUE`, the forwards lineage will be returned
instead of the default backwards lineage, showing how that object was used.

If no start lines are specified, the forwards lineage of the first occurence of that
object will be returned.

For example, the result for `debug.lineage("v1", forward = TRUE)` is:
```
$v1
  scriptNum scriptName startLine                 code
1         1 myScript.R         5        v1 <- c(a:10)
2         1 myScript.R         6     v1 <- rep(v1, b)
3         1 myScript.R         8 m1 <- matrix(v1, cc)
4         1 myScript.R         9            print(m1)
```
The forwards lineage for the `v1` variable at line 5 is given as it is the first
occurrence of that variable.

The `start.line` parameter may also be used to specify which occurence of the queried
variable the forwards lineage should be returned for.

Multiple objects may be queried per function call for both backwards and forwards
lineages, but only 1 start line may be specified in this case.
