---
title: "interaction effects between endogenous variables"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{interaction effects between endogenous variables}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
EVAL_DEFAULT <- FALSE
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = EVAL_DEFAULT
)
```

```{r setup}
library(modsem)
```

## The Problem

Interaction effects between two endogenous (i.e., dependent) variables work as you would expect for the product indicator methods (`"dblcent"`, `"rca"`, `"ca"`, `"uca"`). However, for the `lms` and `qml` approaches, it is not as straightforward.

The `lms` and `qml` approaches can (by default) handle interaction effects between endogenous and exogenous (i.e., independent) variables, but they do not natively support interaction effects between two endogenous variables. When an interaction effect exists between two endogenous variables, the equations cannot easily be written in "reduced" form, meaning that normal estimation procedures won't work.

## The Solution

Despite these limitations, there is a workaround for both the `lms` and `qml` approaches. Essentially, the model can be split into two parts: one linear and one non-linear. You can replace the covariance matrix used in the estimation of the non-linear model with the model-implied covariance matrix from a linear model. This allows you to treat an endogenous variable as if it were exogenous—provided that it can be expressed in a linear model.

## Example

Let's consider the theory of planned behavior (TPB), where we wish to estimate the quadratic effect of `INT` on `BEH` (`INT:INT`). We can use the following model:

```{r}
tpb <- '
# Outer Model (Based on Hagger et al., 2007)
  ATT =~ att1 + att2 + att3 + att4 + att5
  SN =~ sn1 + sn2
  PBC =~ pbc1 + pbc2 + pbc3
  INT =~ int1 + int2 + int3
  BEH =~ b1 + b2

# Inner Model (Based on Steinmetz et al., 2011)
  INT ~ ATT + SN + PBC
  BEH ~ INT + PBC
  BEH ~ INT:INT
'
```

Since `INT` is an endogenous variable, its quadratic term (i.e., an interaction effect with itself) would involve two endogenous variables. As a result, we would normally not be able to estimate this model using the `lms` or `qml` approaches. However, we can split the model into two parts: one linear and one non-linear.

While `INT` is an endogenous variable, it can be expressed in a linear model since it is not affected by any interaction terms:

```{r}
tpb_linear <- 'INT ~ PBC + ATT + SN'
```

We can then remove this part from the original model, giving us:

```{r}
tpb_nonlinear <- '
# Outer Model (Based on Hagger et al., 2007)
  ATT =~ att1 + att2 + att3 + att4 + att5
  SN =~ sn1 + sn2
  PBC =~ pbc1 + pbc2 + pbc3
  INT =~ int1 + int2 + int3
  BEH =~ b1 + b2

# Inner Model (Based on Steinmetz et al., 2011)
  BEH ~ INT + PBC
  BEH ~ INT:INT
'
```

Now, we can estimate the non-linear model since `INT` is treated as an exogenous variable. However, this would not incorporate the structural model for `INT`. To address this, we can instruct `modsem` to replace the covariance matrix (`phi`) of (`INT`, `PBC`, `ATT`, `SN`) with the model-implied covariance matrix from the linear model while estimating both models simultaneously. To achieve this, we use the `cov.syntax` argument in `modsem`:

```{r}
est_lms <- modsem(tpb_nonlinear, data = TPB, cov.syntax = tpb_linear,
                  method = "lms")
summary(est_lms)

est_qml <- modsem(tpb_nonlinear, data = TPB, cov.syntax = tpb_linear,
                  method = "qml")
summary(est_qml)
```

It is also possible to make `modsem` attempt to split up the model syntax automatically using the
`auto.split.syntax` argument:
```{r}
est_lms <- modsem(tpb, data = TPB, method = "lms", auto.split.syntax = TRUE)
summary(est_lms)
```