---
title: "Data Input for pandemonium"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data input for pandemonium}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
```

## Inputs

To launch the app, the user needs to provide a data frame with the observations. 

```{r,eval=FALSE}
pandemonium(df = Bikes$df)
```

This will automatically sort the variables into numeric and other variables and handle any missing data: if VIM is installed, missing values are imputed with `VIM::kNN()`; otherwise, entries with missing data are removed.
Once the app has loaded, numeric variables can be sorted into the *clustering space* (space1) and the *linked space* (space2) in the data screen. Variables can be added to or removed from each space with the drop-down selector that appears when clicking on the box.
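
The preprocessing described above can be approximated in a few lines of base R. This is only a sketch of the idea, not the code that `pandemonium()` runs internally: the built-in `airquality` data stands in for user data, the object names (`numeric_vars`, `other_vars`, `clean`) are made up for illustration, and applying the imputation to all columns at once is an assumption.

```{r,eval=FALSE}
# Illustrative data: airquality has missing values; Month is made categorical.
df <- transform(airquality, Month = factor(month.abb[Month]))

# Split the columns into numeric and other variables.
is_num <- vapply(df, is.numeric, logical(1))
numeric_vars <- names(df)[is_num]
other_vars <- names(df)[!is_num]

# Impute with VIM::kNN() when VIM is available,
# otherwise drop the rows that contain missing values.
if (requireNamespace("VIM", quietly = TRUE)) {
  clean <- VIM::kNN(df, imp_var = FALSE)
} else {
  clean <- stats::na.omit(df)
}
```

Running something like this before launching the app is a quick way to see which variables will be offered for the two spaces and how many rows would be lost if VIM is not installed.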

Alternatively, the data can be passed in as two separate arrays, with the clustering variables in `df` and the linked variables in `linked`. These variables are then preselected in space 1 and space 2 on the data screen when the app loads:

```{r,eval=FALSE}
pandemonium(df = Bikes$space1, linked = Bikes$space2)
```

### Optional inputs

A complete input for `pandemonium()` includes optional data and function inputs. All inputs are shown in the following call.
```{r,eval=FALSE}
pandemonium(df,
  cov = NULL, is.inv = FALSE, exp = NULL, linked = NULL, linked.cov = NULL,
  linked.exp = NULL, group = NULL, label = NULL, user_dist = NULL,
  dimReduction = list(tSNE = tSNE, umap = umap), getCoords = list(normal = normCoords), getScore = NULL
)
```

#### Data inputs

| Input      | Type       | Applies to | Default | Purpose     |
|------------|------------|------------|---------|-------------|
| `label`            | vector, length = n  | points     | row index | Shown in tours/dim. reduction hover text |
| `group`            | vector / data.frame | points     | none    | Define user-specified groups; categorical or numeric |
| `cov`, `linked.cov`* | matrix            | group/space | computed via `stats::cov` | Used in `getScore`, `getCoords`, anomaly tour |
| `exp`, `linked.exp` | data frame with a column `value`, one row per variable in the space | variables | mean vector | Reference point in the space, used in `getCoords` |
| `user_dist`        | matrix              | space1     | ignored | Advanced: overrides `getDists` output |

\* `cov` can also be the inverse covariance matrix, by setting `is.inv = TRUE`.
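
As a rough illustration of how the optional data inputs might be prepared, the sketch below builds a covariance matrix, its inverse, and a reference point from two `iris` columns. The object names are hypothetical, and the layout of `exp` beyond having a `value` column (here, variable names as row names) is an assumption rather than a documented requirement.

```{r,eval=FALSE}
# Illustrative clustering space: two numeric columns of iris.
space1 <- iris[, c("Sepal.Length", "Sepal.Width")]

# Covariance matrix, and its inverse for use with is.inv = TRUE.
space1_cov <- stats::cov(space1)
space1_inv <- solve(space1_cov)

# Reference point: one row per variable, stored in a column named `value`.
space1_exp <- data.frame(value = colMeans(space1))

# Launch only in an interactive session with pandemonium installed.
if (interactive() && requireNamespace("pandemonium", quietly = TRUE)) {
  pandemonium::pandemonium(
    df = space1, cov = space1_inv, is.inv = TRUE, exp = space1_exp
  )
}
```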

#### Function inputs

| Input  | Type |  Use |
|---|--------|---------|
|`getCoords`| named list of coordinate functions | computes coordinates for distance calculations|
|`getScore`| function that returns a named list | computes scores and/or bins for use in plotting|

> See `vignette("get-scores")` and `vignette("get-coords")` for more information on these inputs.

## The data page

Calling `pandemonium()` opens the app on the data page, shown below.

```{r,echo=FALSE}
knitr::include_graphics("Images/data_Input_page.png")
```

On this page, variables can be removed from either space or moved between them. A coordinate function can be selected from the input functions. There are also two additional inputs: one for groupings or flags, and one for a label.

#### Groupings

In this input you can select variables passed to `pandemonium()` in the `group=` input, as well as any variables removed from space 1 or space 2 by deleting them in their inputs. Variables that were automatically removed from the two spaces for being non-numeric can also be selected. This input is designed for categorical data, so that the groupings can be compared to clustering results. The selected variable(s) will be converted into a single factor with `interaction()`.
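
To see what combining several grouping variables with `interaction()` produces, consider the small made-up example below. The data frame and its column names are purely illustrative, and whether the app drops unused level combinations is not stated here, so the sketch keeps the `interaction()` defaults.

```{r,eval=FALSE}
# Two hypothetical categorical variables selected as groupings.
groups <- data.frame(
  season = c("summer", "summer", "winter", "winter"),
  holiday = c("yes", "no", "no", "no")
)

# interaction() accepts a list of factors and returns a single factor
# whose levels are the combinations of the inputs, e.g. "summer.yes".
combined <- interaction(groups)

as.character(combined)
levels(combined)
```

Each point ends up with one combined label, so selecting many grouping variables at once can quickly produce a large number of small groups.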

#### Label

In this input you can select the label passed to `pandemonium()` in the `label=` input, as well as the same removed variables described above. This input is designed to give a unique label to each point, so row numbers or unique IDs are recommended.

#### Moving variables

One of the key features of the data input page is the ability to move variables across spaces after loading the application.
This feature can have some unexpected effects on the covariance matrix and reference point, if they were provided. Removing a variable from a space will slice out the corresponding entries from the covariance matrix and reference point. Adding variables to a space will cause the covariance matrix and reference point to be recalculated: `cov()` is used for the covariance matrix and `colMeans()` for the reference point. If the provided covariance matrix is an inverse covariance matrix, it is first inverted using `solve()` before slicing. In some cases this may behave unexpectedly, and altering variables may be better done by relaunching pandemonium with the correctly filtered data.
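
The slicing behaviour described above can be mimicked outside the app to understand its consequences. The sketch below uses `iris` and hypothetical object names; it mirrors the description in this section rather than the package's internal code.

```{r,eval=FALSE}
# Illustrative space with four numeric variables.
space <- iris[, 1:4]
cov_full <- stats::cov(space)
ref_full <- colMeans(space)

# Removing a variable: slice the matching rows/columns and entries.
keep <- setdiff(colnames(space), "Petal.Width")
cov_sliced <- cov_full[keep, keep]
ref_sliced <- ref_full[keep]

# An inverse covariance matrix is inverted first, then sliced.
inv_full <- solve(cov_full)
cov_from_inv <- solve(inv_full)[keep, keep]
same_after_inverting <- isTRUE(all.equal(cov_from_inv, cov_sliced))

# Slicing the inverse directly would not give the inverse of the
# sliced covariance matrix, which is why the inversion comes first.
naive_matches <- isTRUE(all.equal(inv_full[keep, keep], solve(cov_sliced)))

# Adding a variable: both quantities are recomputed from the data,
# so values supplied by the user are no longer used.
cov_recalc <- stats::cov(space)
ref_recalc <- colMeans(space)
```

Note that a user-supplied covariance matrix or reference point will generally differ from `cov()` and `colMeans()` of the data, so adding a variable can change the results more than expected.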