---
title: "fetwfe: A Package for Fused Extended Two-Way Fixed Effects"
author: "Gregory Faletto"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    math_method: "mathml"
    output_file: "FETWFE_vignette.html"
vignette: >
  %\VignetteIndexEntry{fetwfe: A Package for Fused Extended Two-Way Fixed Effects}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{dplyr, bacondecomp}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
```

# Introduction

If you understand the basic idea of what difference-in-differences with staggered adoptions is, all you need to know about fused extended two-way fixed effects (FETWFE) to get started using the `{fetwfe}` package is this: given an appropriately formatted panel data set, `fetwfe()` will give you an estimate of the overall average treatment effect on the treated units, the average treatment effect within each cohort, and standard errors for each of these estimates.

Feel free to skip to the "Package Usage" section if you want to jump right in to using the package. In the next "Background" subsection, you can read a little more background information on the methodology if you'd like.

## Background

This vignette is written under the assumption that you're at least vaguely familiar with developments in [difference-in-differences](https://en.wikipedia.org/wiki/Difference_in_differences) with [staggered adoptions](https://mike-data-analysis.share.connect.posit.cloud/sec-difference-in-differences.html#sec-multiple-periods-and-variation-in-treatment-timing) since about 2018. Just to make sure we're on the same page, the brief recap is:

- Historically, under staggered adoptions researchers used the [standard two-way fixed effects estimator](https://mike-data-analysis.share.connect.posit.cloud/sec-difference-in-differences.html#sec-two-way-fixed-effects) and interpreted the coefficient on the treatment dummy as an average treatment effect on the treated units.
- In the late 2010s, econometricians formally checked what this estimator was doing and found that in fact, this coefficient was not any kind of reasonable average treatment effect estimator.
- Since then, a number of new difference-in-differences estimators that are asymptotically unbiased under staggered adoptions have been developed.

The estimator in this package, fused extended two-way fixed effects (FETWFE), is one of those asymptotically unbiased estimators. Of course, I made this estimator because I think FETWFE brings something to the table that the others don't. Here's a brief summary on that:

One issue with these estimators has been that they've worked so hard to be unbiased that they are **inefficient** (in the language of econometrics), or **high-variance** (in the language of machine learning). These estimators add extra parameters in order to remove bias, but estimating extra parameters means you have less data per parameter and your estimates are noisier.

In machine learning, creating a more flexible estimator with lots of parameters and then finding that it is too high variance (that is, it *overfits*) is a familiar issue. The most common solution has been regularization.

You could just add $\ell_2$ or $\ell_1$ regularization to a difference-in-differences regression estimator and probably see an improvement in your efficiency, but FETWFE does something more sophisticated than that. (Plus, that approach wouldn't allow you to get valid standard errors for your treatment effect estimates, but FETWFE does.) Qualitatively, FETWFE uses machine learning to learn which of these added parameters were actually unnecessary to add, and then takes them back out in order to improve efficiency.

That's all the description I'll give you in this vignette. You can learn all of the details in the paper on arXiv:

> **[Fused Extended Two-Way Fixed Effects for Difference-in-Differences With Staggered Adoptions](https://arxiv.org/abs/2312.05985)**

If you want to learn a little more before you dive into the full paper, here are some other resources with descriptions of the methodology that provide a little more detail than this vignette:

- My [blog post](https://gregoryfaletto.com/2023/12/13/new-paper-fused-extended-two-way-fixed-effects-for-difference-in-differences-with-staggered-adoptions/) announcing the paper.
- [Some slides](https://gregoryfaletto.com/2024/02/11/presentation-on-fused-extended-two-way-fixed-effects/) I made for a presentation on FETWFE.
- Another [blog post](https://gregoryfaletto.com/2025/01/03/new-r-fetwfe-package-implementing-fused-extended-two-way-fixed-effects/) focused on what this package is doing under the hood.

But the headline summary of what fused extended two-way fixed effects brings to the table in a crowded field of estimators is: **fused extended two-way fixed effects is not only unbiased, it also uses machine learning to maximize efficiency (minimize variance)**. Further, unlike many machine learning estimators, **fused extended two-way fixed effects gives you valid standard errors for the treatment effect estimates.**

# Package Usage

The package provides a single exported function, `fetwfe()`, which implements the FETWFE estimator. Its primary arguments include:

- **`pdata`**: A data frame in panel (long) format.
- **`time_var`**: A character string specifying the name of the time period variable.
- **`unit_var`**: A character string specifying the unit (e.g. state, firm) variable.
- **`treatment`**: A character string specifying the treatment indicator variable (which must be an absorbing binary indicator).
- **`response`**: A character string specifying the response (outcome) variable.
- **`covs`**: A character vector of covariate names (typically time-invariant or the pre-treatment values), if applicable.
- **Additional tuning parameters:** such as the tuning parameter for the bridge penalty (controlled via argument `q`) and options for verbosity, standard error calculation, and so on.

The function returns a list containing, for example, the estimated overall average treatment effect, cohort-specific treatment effects, standard errors (when available), and various diagnostic quantities.

You can get the full documentation details by using `?fetwfe` in R when you have the package loaded.

For a detailed discussion of how the package's standard errors are computed, the assumptions they rely on, and an experimental cluster-robust option (`se_type = "cluster"`) suitable as a sensitivity check, see the companion vignette `inference_vignette`.

In the next sections, we'll walk through examples of how `fetwfe()` is used.

## Simulated Data Example

I'll start illustrating how to use `fetwfe()` by using a simulated data set. The example below simulates a balanced panel with 20 time periods and 30 individuals. Each individual is assigned at random to one of five cohort levels; with the randomly drawn treatment timing, three of those levels resolve to cohorts that are actually treated within the panel window and the rest are never treated.

In the simulation, each individual is assigned a random cohort (which determines the timing of treatment) and three time-invariant covariates are generated. The response variable is constructed so that, after treatment, its evolution depends on a treatment effect (which varies by cohort) and a linear trend, plus the covariates and some random noise.

Below is the complete code for simulating the data, converting it into the required pdata format, and running the `fetwfe()` function.

**I borrowed some of the below code from [Asjad Naqvi](https://asjadnaqvi.github.io/)'s helpful [website for DiD estimators](https://asjadnaqvi.github.io/DiD/docs/code_r). Thanks for sharing the code publicly!**

```{r}
# Set seed for reproducibility
set.seed(123456L)

# 20 time periods, 30 individuals, and 5 cohort levels
tmax = 20; imax = 30; nlvls = 5

dat =
  expand.grid(time = 1:tmax, id = 1:imax) |>
  within({
    cohort      = NA
    effect      = NA
    first_treat = NA
    cov1        = NA
    cov2        = NA
    cov3        = NA
    for (chrt in 1:imax) {
      cohort = ifelse(id==chrt, sample.int(nlvls, 1), cohort)
    }
    for (lvls in 1:nlvls) {
      effect      = ifelse(cohort==lvls, sample(2:10, 1), effect)
      first_treat = ifelse(cohort==lvls, sample(1:(tmax+6), 1), first_treat)
    }
    # three time-invariant covariates: one value per individual,
    # drawn AFTER the timing loop so first_treat is unchanged
    for (chrt in 1:imax) {
      cov1 = ifelse(id==chrt, rnorm(1), cov1)
      cov2 = ifelse(id==chrt, rnorm(1), cov2)
      cov3 = ifelse(id==chrt, rnorm(1), cov3)
    }
    first_treat = ifelse(first_treat>tmax, Inf, first_treat)
    treat       = time >= first_treat
    rel_time    = time - first_treat
    y           = id + time + ifelse(treat, effect*rel_time, 0) +
                  0.5*cov1 - 0.7*cov2 + 1.2*cov3 + rnorm(imax*tmax)
    rm(chrt, lvls, cohort, effect)
  })

head(dat)
```

The simulated data (`dat`) now has columns for time, id, a treatment indicator (`treat`), three time-invariant covariates (`cov1`, `cov2`, `cov3`), and a response variable (`y`). Next, we convert this data into the panel data format required by `fetwfe()`.

```{r}
library(dplyr)

# Specify column names for the pdata format
time_var <- "time"       # Column for the time period
unit_var <- "unit"       # Column for the unit identifier
treatment <- "treated"   # Column for the treatment dummy indicator
response <- "response"   # Column for the response variable

# Convert the dataset
pdata <- dat |>
  mutate(
    # Rename id to unit and convert to character
    {{ unit_var }} := as.character(id),
    # Ensure treatment dummy is 0/1
    {{ treatment }} := as.integer(treat),
    # Rename y to response
    {{ response }} := y
  ) |>
  select(
    {{ time_var }}, {{ unit_var }}, {{ treatment }}, {{ response }},
    cov1, cov2, cov3
  )

# Preview the resulting pdata dataframe
head(pdata)
```

Now that `pdata` is properly formatted, we run the FETWFE estimator on the simulated data.

```{r}

library(fetwfe)

# Run the FETWFE estimator on the simulated data
result <- fetwfe(
  pdata = pdata,                        # The panel dataset
  time_var = "time",                    # The time variable
  unit_var = "unit",                    # The unit identifier
  treatment = "treated",                # The treatment dummy indicator
  response = "response",                # The response variable
  covs = c("cov1", "cov2", "cov3")      # The time-invariant covariates
)

# Display the average treatment effect estimates
summary(result)
```

When you run this code, the function internally performs all the necessary data preparation, applies the fusion penalty via a bridge regression (using the `grpreg` package), and returns a list with overall and cohort-specific treatment effect estimates, standard errors (if available), and additional diagnostics.

## A "Real Data" Example

Next I illustrate FETWFE in an empirical context. I'll use the `castle` data set from the `bacondecomp` package, which comes from the study by Cheng and Hoekstra (2013) of castle-doctrine ("stand-your-ground") laws and homicide. It is a balanced state-year panel covering all 50 states over 2000-2010, with the laws adopted in a staggered fashion across states from 2005 to 2009.

```{r}
library(bacondecomp)  # for the example data

# Load the example data
data(castle)

# Response: the log homicide rate, so the ATT reads roughly as a
# percentage change in the homicide rate.
castle$l_homicide <- log(castle$homicide)

# Treatment indicator. `cdl` records the share of the year the
# castle-doctrine law was in effect; a state is treated from the year its
# law took effect, so we binarize with `cdl > 0`. `fetwfe()` requires the
# treatment column to be an integer 0/1 absorbing indicator.
castle$treated <- as.integer(castle$cdl > 0)

# Call the estimator. Here
# - 'year' is the time period variable (an integer),
# - 'state' is the unit identifier,
# - 'treated' is the absorbing treatment indicator.
res <- fetwfe(
    pdata = castle,
    time_var = "year",
    unit_var = "state",
    treatment = "treated",
    response = "l_homicide"
    )

summary(res)

# Average treatment effect on the treated units (in percentage point
# units)
100 * res$att_hat

# Conservative 95% confidence interval for ATT (in percentage point units)

low_att <- 100 * (res$att_hat - qnorm(1 - 0.05 / 2) * res$att_se)
high_att <- 100 * (res$att_hat + qnorm(1 - 0.05 / 2) * res$att_se)

c(low_att, high_att)

# Cohort average treatment effects and confidence intervals (in percentage
# point units)

catt_df_pct <- res$catt_df
catt_df_pct[["Estimated TE"]] <- 100 * catt_df_pct[["Estimated TE"]]
catt_df_pct[["SE"]] <- 100 * catt_df_pct[["SE"]]
catt_df_pct[["ConfIntLow"]] <- 100 * catt_df_pct[["ConfIntLow"]]
catt_df_pct[["ConfIntHigh"]] <- 100 * catt_df_pct[["ConfIntHigh"]]

catt_df_pct
```

FETWFE estimates that adopting a castle-doctrine law is associated with roughly a +5.5% change in the homicide rate. The sign and rough magnitude are consistent with Cheng and Hoekstra (2013), who found that these laws increased homicide. FETWFE fused all five adoption cohorts (those adopting in 2005 through 2009) to a single common effect, so each row of `res$catt_df` reports the same estimate.

This example is covariate-free; `castle`'s smallest cohorts contain a single state, so the design cannot support additional covariates. Covariate handling is illustrated in the simulated example above — the `covs` argument is optional.

## Testing the zero-effect null

Empirical users of difference-in-differences routinely want to ask "is this treatment effect statistically distinguishable from zero?". The confidence intervals in `catt_df` answer that implicitly --- a CI that excludes 0 corresponds to rejecting `H_0: tau = 0` at level `alpha`. The package also surfaces a `P_value` column (two-sided `2 * pnorm(-|estimate / se|)`) and, for `fetwfe()` and `betwfe()`, a `selected` logical flag, both at the overall ATT level and per-cohort.

For `fetwfe()`, the `selected` flag carries something stronger than the usual CI test. Under the restriction selection consistency theorem (Theorem 6.2 in the paper), when the true `tau_ATT(r, t) = 0`, the estimator returns *exactly* 0 with probability tending to 1. The package's interpretation: when `selected = FALSE` for a cohort, the asymptotic conclusion is that the truth is zero --- a strictly stronger statement than what a standard CI provides. The package surfaces this as an *asymptotic 100% confidence interval of `{0}`* for selected-out cohorts. For selected-out cohorts, `P_value` is reported as `NA` --- the inferential content lives in `selected`.

```{r zero-effect-demo, message = FALSE}
set.seed(2026)
sim <- genCoefs(R = 3, T = 6, d = 2, density = 0.5, eff_size = 2)
dat <- simulateData(sim, N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5)
res_demo <- fetwfeWithSimulatedData(dat, verbose = FALSE)
res_demo$catt_df
```

`betwfe()` uses bridge regression directly on the coefficients (rather than on the fused restrictions used by `fetwfe()`). Under the bridge oracle property of Kock (2013), `selected = FALSE` for `betwfe()` is an analogous asymptotic statement that the truth is zero, under a sparsity assumption different from the one Theorem 6.2 establishes for `fetwfe()`. For `etwfe()` and `twfeCovs()`, neither has a selection step, so the `selected` column is omitted; the `P_value` column is a standard post-OLS t-test.

It is worth keeping in mind that the asymptotic 100% CI of `{0}` interpretation is an `N -> infinity` statement. In small samples, it may be wise to read `selected` and `P_value` together rather than relying on either alone. In particular, even when `selected = TRUE`, the standard CI may cover zero --- the selection step says "the truth is nonzero" but the magnitude CI says "could be zero".

## Tidy outputs with broom

If you use the tidyverse, the fitted object also plays nicely with the [broom](https://broom.tidymodels.org/) package. `tidy()` reshapes the overall ATT and per-cohort CATTs into a long data frame with the standard `term` / `estimate` / `std.error` / `statistic` / `p.value` / `conf.low` / `conf.high` columns, plus a `selected` flag (see "Testing the zero-effect null" below):

```{r tidy-demo, eval = requireNamespace("broom", quietly = TRUE)}
library(broom)

tidy_res <- tidy(result)
tidy_res
```

`glance()` returns a one-row model-level summary — panel-shape counts, the bridge-regression tuning, the variance components, and the inference settings:

```{r glance-demo, eval = requireNamespace("broom", quietly = TRUE)}
glance(result)
```

`augment()` appends `.fitted` and `.resid` columns to your panel (auto-aligned to the design the estimator actually fit on, so the same raw `pdata` you handed to `fetwfe()` works):

```{r augment-demo, eval = requireNamespace("broom", quietly = TRUE)}
head(augment(result, data = pdata))
```

From there, the tidied output goes straight into `ggplot2` or `modelsummary` without needing to read the package's class documentation:

```{r tidy-plot, eval = requireNamespace("broom", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("dplyr", quietly = TRUE)}
library(ggplot2)
tidy_res |> 
  dplyr::filter(term != "ATT") |>
  # Remove non-digits to extract the number, then reorder the factor levels
  dplyr::mutate(term = reorder(term, as.numeric(gsub("\\D", "", term)))) |>
  ggplot(aes(x = term, y = estimate)) +
  geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = NULL, y = "Cohort treatment effect", title = "Cohort ATTs")
```

See the simulation vignette for an example of how you can use functions in the FETWFE package to simulate panel data.

# Conclusion

This should be enough to get you started using `fetwfe()` on your own data. Please feel free to [reach out](https://gregoryfaletto.com/about/) if you have any questions or feedback or run into any issues using the package. You can also [create an issue](https://github.com/gregfaletto/fetwfePackage/issues) if you think there’s a bug in the package or you’d like to request a feature. **Thanks so much for checking out the package!**

# References

- Cheng, C., & Hoekstra, M. (2013). Does Strengthening Self-Defense Law Deter Crime or Escalate Violence? Evidence from Expansions to Castle Doctrine. *Journal of Human Resources*, 48(3), 821-854.
- Faletto, G. (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. [arXiv preprint arXiv:2312.05985](https://arxiv.org/abs/2312.05985).
- Flack, E., & Jee, E. (2020). bacondecomp: Goodman-Bacon Decomposition. R package version 0.1.1. [https://CRAN.R-project.org/package=bacondecomp](https://CRAN.R-project.org/package=bacondecomp).
- Kock, A. B. (2013). Oracle inequalities for high-dimensional nonparametric models. *Econometric Theory*.

