---
title: "remstats"
output: 
  rmarkdown::html_document:
    theme: spacelab
    highlight: pygments
    code_folding: show
vignette: >
  %\VignetteIndexEntry{remstats}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(remstats)
```

<i>This vignette explains how to compute statistics for relational event history data using `remstats()`, for both tie-oriented and actor-oriented models. When using the package please cite:

Meijerink-Bosman, M., Back, M., Geukes, K., Leenders, R., & Mulder, J. (2023). Discovering trends of social interaction behavior over time: An introduction to relational event modeling. Behavior Research Methods. https://doi.org/10.3758/s13428-022-01821-8

</i>

---

## Introduction 

Relational event modeling approaches enable researchers to investigate both exogenous and endogenous factors influencing the evolution of a time-ordered sequence of relational events — commonly known as a *relational event history*. These models are categorized into *tie-oriented models*, where the probability of a dyad interacting next is modeled in a single step (e.g., see Butts, 2008), and *actor-oriented models*, which first model the probability of a sender initiating an interaction and subsequently the probability of the sender's choice of receiver (e.g., see Stadtfeld & Block, 2017). The `R` package `remstats` is designed to compute a variety of statistics for both types of models.

The `remstats` package is part of the `remverse` suite of `R` packages developed at Tilburg University to support applied researchers in relational event modeling. For preparing the relational event history, `remstats` requires prior processing with `remify::remify()`. Model estimation can subsequently be executed using `remstimate`.

---

## Quick workflow example

```{r}
# Load data
data(history)
data(info)

# Set weights to 1 (no event weighting)
history$weight <- 1

# Define effects
effects <- ~ 1 + send("extraversion", info) + inertia()

# Prepare event history
reh <- remify::remify(edgelist = history, model = "tie")

# Compute statistics
stats <- remstats(reh = reh, tie_effects = effects)

# Estimate model parameters (requires remstimate)
# fit <- remstimate::remstimate(reh = reh, stats = stats, method = "MLE")
```

---

## Data

Relational event history data describes a time-ordered series of interactions between actors in a network. A relational event minimally contains information on the time of the event and the actors involved.

We use the `history` dataset from the `remstats` package (see `?history`): a small simulated relational event history with 115 events among 10 actors. Each event also records a `setting` (either `"work"` or `"social"`) and an event `weight`.

```{r}
head(history)
```

Exogenous information on actors is stored in the `info` object (see `?info`), containing age, sex, extraversion, and agreeableness scores measured at multiple time points for each of the 10 actors:

```{r}
head(info)
```

---

# {.tabset .tabset-fade .tabset-pills}

## Tie-oriented model

### Preparing the event history

We prepare the event history for the tie-oriented model with `remify::remify()`. When a `weight` column is present in the edgelist, it is used to weight events in the computation of statistics. Here we set weights to one to avoid this:

```{r}
history$weight <- 1
reh <- remify::remify(edgelist = history, model = "tie")
```

### Computing statistics

Statistics for the tie-oriented model are supplied to the `tie_effects` argument of `remstats()` as a `formula` object of the form `~ terms`. An overview of available statistics is obtained with `tie_effects()` or `?tie_effects`.

As a first example, we request the standardized inertia statistic:

```{r}
effects <- ~ inertia(scaling = "std")
out <- remstats(tie_effects = effects, reh = reh)
```

The output is a 3-dimensional array. Rows correspond to unique time points, columns to potential events in the risk set, and slices to statistics:

```{r}
dim(out)
```

The number of rows equals `reh$M` — the number of unique time points (115 here). When simultaneous events are present, the number of rows is less than the total number of events, since statistics are computed once per unique time point. There are 90 columns corresponding to the $10 \times 9$ directed dyads in the full risk set. The two slices are: (1) a baseline statistic, automatically included when event timing is exact (`ordinal = FALSE`), and (2) the inertia statistic. The baseline can be suppressed by including `-1` in the effects formula.

```{r}
out
```

The risk set composition is stored as an attribute of the output. Each row describes one dyad, with columns `sender`, `receiver`, `type` (if event types are present), and `id` (the column index in the statistics array):

```{r}
head(attr(out, "riskset"))
```

Exogenous statistics require an `attr_actors` object. The `attr_actors` can be passed directly inside effect functions or globally via the `attr_actors` argument of `remstats()`. Statistics are separated by `+` in the formula, and interactions are specified with `:` or `*`:

```{r}
effects <- ~ inertia(scaling = "std") + 
             send("extraversion", info) + 
             inertia(scaling = "std"):send("extraversion", info)
out <- remstats(tie_effects = effects, reh = reh)
out
```

### Memory

The `memory` argument controls which past events are included in the computation of endogenous statistics at each time point $t$. Four options are available:

- `"full"` (default): the entire event history before $t$ is included.
- `"window"`: only events within a time window of length `memory_value` before $t$ are included. For example, `memory = "window", memory_value = 200` includes only events that occurred at most 200 time units ago.
- `"interval"`: only events within a specific time interval before $t$ are included, defined by `memory_value = c(lower, upper)`. For example, `memory_value = c(100, 200)` includes only events between 100 and 200 time units ago.
- `"decay"`: past events are weighted by an exponential decay function with half-life `memory_value`. More recent events receive higher weight.

```{r}
# Full memory (default)
out_full <- remstats(
    tie_effects  = ~ inertia(),
    reh          = reh,
    memory       = "full"
)

# Window memory: only events within the last 500 time units
out_window <- remstats(
    tie_effects  = ~ inertia(),
    reh          = reh,
    memory       = "window",
    memory_value = 500
)

# Decay memory: exponential decay with half-life 200
out_decay <- remstats(
    tie_effects  = ~ inertia(),
    reh          = reh,
    memory       = "decay",
    memory_value = 200
)
```

### Event types and `consider_type`

When an event type variable is present in the relational event history, the `consider_type` argument in each effect function controls how the statistic accounts for event types. We illustrate this using the `setting` column in `history` (values: `"work"`, `"social"`), passed to `remify()` via the `event_type` argument:

```{r}
reh_typed <- remify::remify(
    edgelist   = history,
    model      = "tie",
    event_type = "setting"
)
```

Three options are available for `consider_type`:

**`"ignore"`**: event type is not taken into account. The statistic is computed over all past events regardless of type. This is the appropriate choice when event types are not a substantive concern, or when no types are present.

```{r}
out_ignore <- remstats(
    tie_effects = ~ inertia(consider_type = "ignore"),
    reh         = reh_typed
)
dim(out_ignore)       # one inertia slice plus baseline
dimnames(out_ignore)[[3]]
```

**`"separate"`**: the statistic is computed separately for each event type, resulting in one slice per type. For example, with two types `"social"` and `"work"`, inertia yields: (1) the number of past `"social"` events for each dyad, and (2) the number of past `"work"` events for each dyad.

```{r}
out_separate <- remstats(
    tie_effects = ~ inertia(consider_type = "separate"),
    reh         = reh_typed
)
dim(out_separate)       # two inertia slices (one per type) plus baseline
dimnames(out_separate)[[3]]
```

**`"interact"`**: the statistic conditions on the type of the most recent event. The statistic at time $t$ is computed only over past events of the same type as the event at $t-1$, effectively allowing the effect of past event history to depend on the type of the triggering event.

```{r}
out_interact <- remstats(
    tie_effects = ~ inertia(consider_type = "interact"),
    reh         = reh_typed
)
dim(out_interact)
dimnames(out_interact)[[3]]
```

---

## Tie-oriented model with case-control sampling

For large networks where computing statistics over the full risk set is computationally expensive, `remstats` supports *case-control sampling* (also known as dyad sampling). Instead of computing statistics for all dyads in the risk set at each time point, a random sample of dyads is drawn and used as the comparison set, substantially reducing computation time.

Case-control sampling is enabled by setting `sampling = TRUE` in `remstats()`. Two additional arguments control the sampling:

- `samp_num`: the number of dyads to sample per event (must be ≤ the size of the active risk set).
- `seed`: an optional integer random seed for reproducibility.

```{r}
reh <- remify::remify(edgelist = history, model = "tie")

out_sampled <- remstats(
    tie_effects = ~ inertia() + send("extraversion", info),
    reh         = reh,
    sampling    = TRUE,
    samp_num    = 20,
    seed        = 42
)
```

The output is an object of class `tomstats_sampled`. Its structure differs from the standard output: rather than a full array over all dyads, it stores only the statistics for the sampled dyads plus the observed event at each time point. This object can be passed directly to `remstimate` for model estimation using the case-control likelihood.

Note that case-control sampling is only available for the tie-oriented model and is not supported for the actor-oriented model.

---

## Actor-oriented model

### Preparing the event history

We prepare the event history for the actor-oriented model. Note that actor-oriented modeling requires directed events:

```{r}
reh <- remify::remify(edgelist = history, model = "actor")
```

### Computing statistics

For the actor-oriented model, statistics must be specified separately for the two modeling steps:

- **Sender activity rate step**: supplied to `sender_effects` — models which actor sends the next event.
- **Receiver choice step**: supplied to `receiver_effects` — models which actor the sender chooses as receiver.

An overview of available statistics for each step is obtained with `actor_effects()` or `?actor_effects`.

We start by requesting the `outdegreeSender` statistic for the sender step:

```{r}
out <- remstats(
    sender_effects = ~ outdegreeSender(),
    reh            = reh
)
```

The output for the actor-oriented model is a list with two elements:

```{r}
names(out)
```

Since no statistics were requested for the receiver step, `receiver_stats` is empty. The `sender_stats` object contains the baseline statistic (automatically included when `ordinal = FALSE`) and the requested `outdegreeSender` statistic:

```{r}
out
```

The dimensions of `out$sender_stats` are 115 × 10 × 2: rows are unique time points, columns are potential senders, and slices are statistics.

We extend the model with a statistic for the receiver choice step:

```{r}
out <- remstats(
    sender_effects   = ~ outdegreeSender(),
    receiver_effects = ~ inertia(),
    reh              = reh
)
```

The statistics for the receiver choice step are accessed with `out$receiver_stats`. The baseline statistic is not defined for the receiver step, so its dimensions are 115 × 10 × 1: rows are time points, columns are potential receivers, and slices are statistics.

The computed values in the receiver choice step give the statistic for each potential receiver, given the current sender. For example:

```{r}
# Label columns with receiver names and rows with senders
dimnames(out$receiver_stats)[[2]] <- reh$meta$dictionary$actors$actorName
dimnames(out$receiver_stats)[[1]] <- reh$edgelist$actor1[-1]
head(out$receiver_stats[,, "inertia"])
```

At the first time point, inertia is zero for all receivers because no prior events have occurred. At the second time point, inertia is 1 for the receiver of the first event (given the same sender). At the third time point, the sender is a different actor with no prior sending history, so inertia is again zero for all receivers.
