--- title: "Risk set" author: "" package: remify date: "" output: rmarkdown::html_document: theme: spacelab highlight: pygments code_folding: show css: "remify-theme.css" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Definition of full, active, active_saturated and manual risk set} %\VignetteEncoding{UTF-8} --- This vignette provides a definition of the _full_, _active_, _active\_saturated_, and _manual_ risk sets, explains how each is specified in `remify::remify()`, and shows how the processed risk set is represented in the returned `remify` object. --- Consider a small example network: ```{r} library(remify) # loading package data(randomREHsmall) # data # processing the edgelist reh <- remify( edgelist = randomREHsmall$edgelist, directed = TRUE, ordinal = FALSE, model = "tie", actors = randomREHsmall$actors, origin = randomREHsmall$origin ) ``` --- # {.tabset .tabset-fade .tabset-pills .tabcontent} ## Definition of risk set A relational event history consists of a time-ordered sequence of (directed or undirected) interactions. For each event, we know: - its __time__ of occurrence, either as a timestamp/date/continuous value or as an order index - the __actors__ involved - the __type__ of the event (if measured) For instance, the first five events of the `randomREHsmall` sequence are: ```{r, include = TRUE} randomREHsmall$edgelist[1:5,] ``` where `time`, `actor1`, `actor2` describe each observed event (in this example, event types are not annotated). When modeling a relational event sequence, we define per each time point a _risk set_: the set of dyadic events that were at risk of occurring at that time point (which always includes the event that actually occurred). The risk set is a central building block of the likelihood for both tie-oriented and actor-oriented models. Four possible definitions are discussed below: _full_, _active_, _active\_saturated_, and _manual_. --- ### The _full_ risk set The _full_ risk set assumes all possible dyads are at risk throughout the entire observation period. For a network with $N$ actors, $C$ event types, and directed events, the risk set has size $D = N(N-1)C$; for undirected events, $D = N(N-1)C/2$. In `randomREHsmall`, dyads are directed, $N = 5$ actors, and $C = 1$ (untyped), so $D = 5 \times 4 \times 1 = 20$. The risk set dyad composition is stored in `reh$index$dyad_map`: ```{r, include = TRUE} head(reh$index$dyad_map) ``` The ID of each dyad is defined by a two-step approach: **Step 1.** Actors' names are sorted alphanumerically and assigned integer IDs from $1$ to $N$: ```{r, include = TRUE} sorted_actors <- sort(randomREHsmall$actors) sorted_actors N <- length(randomREHsmall$actors) # IDs are 1 to N names(sorted_actors) <- 1:N sorted_actors ``` **Step 2.** Dyads are indexed by looping first over `actor1`, then `actor2`, and finally `type` (if applicable), skipping self-loops: ```{r, include = TRUE} sorted_types <- c(" ") # no event type in randomREHsmall C <- length(sorted_types) dyad_mat <- matrix(NA, nrow = N*(N-1)*C, ncol = 3) colnames(dyad_mat) <- c("actor1", "actor2", "type") rownames(dyad_mat) <- 1:(N*(N-1)*C) d <- 1 for(type in sorted_types){ for(actor1 in sorted_actors){ for(actor2 in sorted_actors){ if(actor1 != actor2){ dyad_mat[d,] <- c(actor1, actor2, type) d <- d + 1 } } } } dyad_mat[1:5,] dim(dyad_mat)[1] # full risk set size = 20 ``` The row index in `dyad_mat` corresponds to the `dyadID`. This ID is used throughout 'remverse' to identify dyads. For tie-oriented modeling, the observed dyad IDs per event are stored in `reh$ids$dyad`: ```{r, include = TRUE} head(reh$ids$dyad) # dyadID of the first few observed events ``` --- ### Visualizing the risk set A useful visualization plots the risk set as a grid with senders on the y-axis and receivers on the x-axis. At each time point, the observed dyad is colored green and all other at-risk dyads are colored gray. ```{r, echo = FALSE, dev=c("jpeg"), dev.args = list(bg = "white"), fig.alt = "Visualizing risk set composition at each time point"} risk_set <- expand.grid(sorted_actors, sorted_actors) dyad_occurred <- c(11, 4, 11, 11) op <- par(no.readonly = TRUE) layout_matrix <- matrix(c(1,2,3,4), ncol=2, byrow=TRUE) layout(layout_matrix, widths=c(1/2,1/2), heights=c(1/2,1/2)) par(oma=c(2,0,2,0)) par(mar=c(6,6,1,0)) par(mgp=c(6,1,0)) par(mfrow=c(2,2)) for(m in 1:4){ value <- rep(NA, dim(risk_set)[1]) for(d in 1:length(value)){ if(risk_set[d,1] != risk_set[d,2]){ if(d == dyad_occurred[m]) value[d] <- "#2ffd20" else value[d] <- "#b2b2b2" } else { value[d] <- "#ffffff" } } dat <- data.frame(row=as.numeric(risk_set[,1]), col=as.numeric(risk_set[,2]), value=value) plot.new() plot.window(xlim=c(0.5,N+0.5), ylim=c(0.5,N+0.5), asp=1) with(dat, { rect(col-0.5, row-0.5, col+0.5, row+0.5, col=value, border="#f1f1f1") text(x=c(1:N), y=0, labels=sorted_actors, srt=90, pos=1, xpd=TRUE, adj=c(0.5,0), offset=1.5, cex=0.8) text(x=0, y=c(1:N), labels=sorted_actors, srt=0, pos=2, xpd=TRUE, adj=c(1,0.5), offset=-0.5, cex=0.8) mtext(text="actor2", side=1, line=4, outer=FALSE, adj=0, at=floor(N/2), cex=0.6) mtext(text="actor1", side=2, line=0, outer=FALSE, adj=1, at=floor(N/2)+1, cex=0.6) mtext(text=bquote(t[.(m)]), side=3, line=0, outer=FALSE, adj=1, at=floor(N/2)+1) }) } par(op) ``` Considering the first four time points of `randomREHsmall`, we observe: the dyad _(Colton, Kayla)_ at $t_1$, $t_3$, and $t_4$, and the dyad _(Lexy, Colton)_ at $t_2$. Green cells mark observed events; gray cells mark other at-risk dyads; white cells mark infeasible events (self-loops). For undirected networks, dyads are sorted alphanumerically before processing, so the risk set is restricted to the lower triangular grid. For instance, `c("Lexy", "Colton")` becomes `c("Colton", "Lexy")`. ```{r, echo = FALSE, out.width="50%", dev=c("jpeg"), dev.args = list(bg = "white"), fig.alt = "Visualizing full risk set for undirected network"} op <- par(no.readonly = TRUE) dyad_occurred <- c(NA, NA, 16) for(m in 3){ value <- rep(NA, dim(risk_set)[1]) for(d in 1:dim(risk_set)[1]){ if(risk_set[d,1] != risk_set[d,2]){ if(d == dyad_occurred[m]) value[d] <- "#2ffd20" else if(as.character(risk_set[d,1]) < as.character(risk_set[d,2])) value[d] <- "#b2b2b2" else value[d] <- "#ffffff" } } dat <- data.frame(row=as.numeric(risk_set[,1]), col=as.numeric(risk_set[,2]), value=value) plot.new() plot.window(xlim=c(0.5,N+0.5), ylim=c(0.5,N+0.5), asp=1) with(dat, { rect(col-0.5, row-0.5, col+0.5, row+0.5, col=value, border="#f1f1f1") text(x=c(1:N), y=0, labels=sorted_actors, srt=90, pos=1, xpd=TRUE, adj=c(0.5,0), offset=1.5, cex=0.8) text(x=0, y=c(1:N), labels=sorted_actors, srt=0, pos=2, xpd=TRUE, adj=c(1,0.5), offset=-0.5, cex=0.8) mtext(text="actor2", side=1, line=4, outer=FALSE, adj=0, at=floor(N/2)) mtext(text="actor1", side=2, line=0, outer=FALSE, adj=1, at=floor(N/2)+1) mtext(text=bquote(t[.(m)]), side=3, line=0, outer=FALSE, adj=1, at=floor(N/2)+1) }) } par(op) ``` --- ## The _active_, _active\_saturated_, and _manual_ risk set The _full_ risk set maintains a constant structure throughout the event history: all possible dyads are always at risk. Three alternative definitions accommodate settings where not all dyads are plausibly at risk. --- ### The _active_ risk set In sparse networks, the number of observed dyads can be far smaller than $D$. The _active_ risk set restricts the risk set to only those dyads observed at least once in the event history, and maintains this structure throughout. Specify with `riskset = "active"`. The use of the active risk set can substantially decrease computational time for both statistics and model estimation. However, this reduction excludes dyads that were never observed but might be plausibly at risk. It is good practice to examine the active dyads before proceeding and to consider whether the exclusion of unobserved dyads is defensible for the data at hand. ```{r} reh_active <- remify( edgelist = randomREHsmall$edgelist, directed = TRUE, ordinal = FALSE, model = "tie", actors = randomREHsmall$actors, riskset = "active", origin = randomREHsmall$origin ) # number of dyads in full vs active risk set reh_active$D reh_active$activeD # the active dyads reh_active$riskset_info$included ``` The grid visualization below shows the active risk set at time $t_3$. Only the dyads observed at least once in the history (gray and green tiles) are at risk; all other dyads are excluded (white tiles). ```{r, echo = FALSE, out.width="50%", dev=c("jpeg"), dev.args = list(bg = "white"), fig.alt = "Visualizing active risk set at time t3"} op <- par(no.readonly = TRUE) # active dyads: those observed in randomREHsmall active_dyads <- unique(randomREHsmall$edgelist[, c("actor1","actor2")]) dyad_occurred_m3 <- 11 # Colton -> Kayla at t3 value <- rep(NA, dim(risk_set)[1]) for(d in 1:length(value)){ if(risk_set[d,1] != risk_set[d,2]){ a1 <- as.character(risk_set[d,1]) a2 <- as.character(risk_set[d,2]) is_active <- any(active_dyads$actor1 == a1 & active_dyads$actor2 == a2) if(d == dyad_occurred_m3) value[d] <- "#2ffd20" else if(is_active) value[d] <- "#b2b2b2" else value[d] <- "#ffffff" } else { value[d] <- "#ffffff" } } dat <- data.frame(row=as.numeric(risk_set[,1]), col=as.numeric(risk_set[,2]), value=value) plot.new() plot.window(xlim=c(0.5,N+0.5), ylim=c(0.5,N+0.5), asp=1) with(dat, { rect(col-0.5, row-0.5, col+0.5, row+0.5, col=value, border="#f1f1f1") text(x=c(1:N), y=0, labels=sorted_actors, srt=90, pos=1, xpd=TRUE, adj=c(0.5,0), offset=1.5, cex=0.8) text(x=0, y=c(1:N), labels=sorted_actors, srt=0, pos=2, xpd=TRUE, adj=c(1,0.5), offset=-0.5, cex=0.8) mtext(text="actor2", side=1, line=4, outer=FALSE, adj=0, at=floor(N/2), cex=0.6) mtext(text="actor1", side=2, line=0, outer=FALSE, adj=1, at=floor(N/2)+1, cex=0.6) mtext(text=bquote(t[3]), side=3, line=0, outer=FALSE, adj=1, at=floor(N/2)+1) }) par(op) ``` --- ### The _active\_saturated_ risk set The _active\_saturated_ risk set extends the active risk set by additionally including the reverse direction of each observed dyad and all event types for each observed actor pair. For instance, if A→B is observed, B→A is also included in the risk set. This reflects the assumption that observing an interaction between two actors implies that both directions (and all event types) are structurally possible, even if not yet observed. Specify with `riskset = "active_saturated"`. ```{r} reh_sat <- remify( edgelist = randomREHsmall$edgelist, directed = TRUE, model = "tie", riskset = "active_saturated", actors = randomREHsmall$actors, origin = randomREHsmall$origin ) reh_sat$activeD reh_sat$riskset_info$included ``` --- ### The _manual_ risk set There are settings where only a specific, user-defined set of dyads should be considered at risk throughout the event history. This is the _manual_ risk set, specified via `riskset = "manual"` and the `manual.riskset` argument. `manual.riskset` is a `data.frame` with columns `actor1` and `actor2` (and optionally `type`) listing all dyads that are at risk. This defines a **static** risk set that does not vary over time. Any observed dyads from the edgelist that are absent from `manual.riskset` are added automatically with a warning. Typical use cases include: - **Structural constraints:** only certain actor pairs can interact (e.g., members of specific organizational units, or actors within each other's contact list). - **Sparse networks:** the user has domain knowledge about which dyads are plausibly at risk, beyond just those observed. - **Spatial or group constraints:** only actors present in the same location or group can interact. #### _Specifying a manual risk set_ ```{r} data(randomREH) # Suppose we want to restrict the risk set to dyads among three actors only, # while all other dyads are considered structurally impossible. manual_rs <- data.frame( actor1 = c("Alexander", "Alexander", "Colton", "Colton", "Lexy", "Lexy"), actor2 = c("Colton", "Lexy", "Alexander", "Lexy", "Alexander", "Colton") ) reh_manual <- remify( edgelist = randomREH$edgelist, directed = TRUE, ordinal = FALSE, model = "tie", actors = randomREH$actors, riskset = "manual", manual.riskset = manual_rs, origin = randomREH$origin ) # Active (manual) risk set size reh_manual$activeD # Inspect the decoded risk set reh_manual$riskset_info$included ``` Note that any observed event involving actor pairs outside `manual.riskset` will be automatically added to the risk set, ensuring that the likelihood is well-defined for all observed events. The grid visualization below illustrates the manual risk set at time $t_3$, with dyads involving `"Richard"` and `"Francesca"` excluded (white tiles). ```{r, echo = FALSE, out.width="50%", dev=c("jpeg"), dev.args = list(bg = "white"), fig.alt = "Visualizing manual risk set at time t3"} op <- par(no.readonly = TRUE) dyad_occurred_m3 <- 11 value <- rep(NA, dim(risk_set)[1]) for(d in 1:length(value)){ if(risk_set[d,1] != risk_set[d,2]){ if(d == dyad_occurred_m3) value[d] <- "#2ffd20" else value[d] <- "#b2b2b2" if(risk_set[d,1] %in% c("Richard","Francesca") | risk_set[d,2] %in% c("Richard","Francesca")){ value[d] <- "#ffffff" } } else { value[d] <- "#ffffff" } } dat <- data.frame(row=as.numeric(risk_set[,1]), col=as.numeric(risk_set[,2]), value=value) plot.new() plot.window(xlim=c(0.5,N+0.5), ylim=c(0.5,N+0.5), asp=1) with(dat, { rect(col-0.5, row-0.5, col+0.5, row+0.5, col=value, border="#f1f1f1") text(x=c(1:N), y=0, labels=sorted_actors, srt=90, pos=1, xpd=TRUE, adj=c(0.5,0), offset=1.5, cex=0.8) text(x=0, y=c(1:N), labels=sorted_actors, srt=0, pos=2, xpd=TRUE, adj=c(1,0.5), offset=-0.5, cex=0.8) mtext(text="actor2", side=1, line=4, outer=FALSE, adj=0, at=floor(N/2), cex=0.6) mtext(text="actor1", side=2, line=0, outer=FALSE, adj=1, at=floor(N/2)+1, cex=0.6) mtext(text=bquote(t[3]), side=3, line=0, outer=FALSE, adj=1, at=floor(N/2)+1) }) par(op) ``` Here, tiles for dyads involving `"Richard"` and `"Francesca"` are white (excluded); the risk set contains only dyads among `"Colton"`, `"Kayla"`, and `"Lexy"`. For undirected networks, the same `manual.riskset` logic applies but the grid visualization focuses on the lower triangular, since actor pairs are sorted alphanumerically. ```{r, echo = FALSE, out.width="50%", dev=c("jpeg"), dev.args = list(bg = "white"), fig.alt = "Visualizing manual risk set at time t3 - undirected"} op <- par(no.readonly = TRUE) dyad_occurred <- c(NA, NA, 16) for(m in 3){ value <- rep(NA, dim(risk_set)[1]) for(d in 1:length(value)){ if(risk_set[d,1] != risk_set[d,2]){ if(d == dyad_occurred[m]) value[d] <- "#2ffd20" else if(as.character(risk_set[d,1]) < as.character(risk_set[d,2])) value[d] <- "#b2b2b2" if(risk_set[d,1] %in% c("Richard","Francesca") | risk_set[d,2] %in% c("Richard","Francesca")){ value[d] <- "#ffffff" } } else { value[d] <- "#ffffff" } } dat <- data.frame(row=as.numeric(risk_set[,1]), col=as.numeric(risk_set[,2]), value=value) plot.new() plot.window(xlim=c(0.5,N+0.5), ylim=c(0.5,N+0.5), asp=1) with(dat, { rect(col-0.5, row-0.5, col+0.5, row+0.5, col=value, border="#f1f1f1") text(x=c(1:N), y=0, labels=sorted_actors, srt=90, pos=1, xpd=TRUE, adj=c(0.5,0), offset=1.5, cex=0.8) text(x=0, y=c(1:N), labels=sorted_actors, srt=0, pos=2, xpd=TRUE, adj=c(1,0.5), offset=-0.5, cex=0.8) mtext(text="actor2", side=1, line=4, outer=FALSE, adj=0, at=floor(N/2)) mtext(text="actor1", side=2, line=0, outer=FALSE, adj=1, at=floor(N/2)+1) mtext(text=bquote(t[.(m)]), side=3, line=0, outer=FALSE, adj=1, at=floor(N/2)+1) }) } par(op) ``` --- ## The processed risk set After processing, the `remify` object exposes the risk set through `riskset_info` (when `attach_riskset = TRUE`, which is the default). The key field is `riskset_info$included`, which contains the decoded table of at-risk dyads: ```{r} reh_active <- remify( edgelist = randomREH$edgelist, directed = TRUE, ordinal = FALSE, model = "tie", actors = randomREH$actors, riskset = "active", origin = randomREH$origin, riskset_decode = "labels" ) head(reh_active$riskset_info$included) nrow(reh_active$riskset_info$included) # number of active dyads ``` For the full risk set, the dyad map is accessible directly via `reh$index$dyad_map`. For active or manual risk sets, it is in `reh$index$dyad_map_active`. The per-event dyad IDs (mapping each observed event to its position in the risk set) are stored in `reh$ids$dyad`: ```{r} # dyadID of the first 10 observed events head(reh_active$ids$dyad, 10) ``` For actor-oriented modeling with an active or manual risk set, sender and receiver risk sets are stored in `reh$sender_riskset` and `reh$receiver_riskset` respectively, and `reh$index$sender_map` provides the mapping between sender IDs and actor names.