Getting started with GLBFP

This vignette shows the basic workflow for pointwise and grid-based density estimation with GLBFP. It is intentionally practical. For a conceptual map of the full package, see the “Package overview and workflow map” vignette.

Installation

install.packages("remotes")
remotes::install_github("AurelienNicosiaULaval/GLBFP")

Load the package

library(GLBFP)

Estimate density at one point

The functions ASH(), LBFP(), and GLBFP() estimate the density at a single point. The data are supplied as a numeric matrix or data frame with observations in rows.

x <- matrix(rnorm(300), ncol = 1)
b <- compute_bi_optim(x, m = 1)

fit <- GLBFP(x = 0, data = x, b = b, m = 1)
fit
#> GLBFP Density Estimation:
#> Point: (0) 
#> Estimated density: 0.386735 
#> Estimated standard error: 0.0838592 
#> 95% confidence interval: 0.362643, 0.410827 
#> Bandwidths (b): 0.155145 
#> Shifts (m): 1 
#> Relative grid coordinate (u): 0.916139

Lowercase aliases are also available for interactive workflows. They call the same estimators and keep the uppercase API unchanged.

fit_alias <- glbfp(x = 0, data = x, b = b, m = 1)
identical(fit$estimation, fit_alias$estimation)
#> [1] TRUE

The returned object contains the evaluation point, the estimated density, the bandwidth, and the shift parameter. For LBFP() and GLBFP(), uncertainty components are also returned.

names(fit)
#>  [1] "x"            "estimation"   "sd"           "IC"           "b"           
#>  [6] "m"            "method"       "dimension"    "u"            "cell_index"  
#> [11] "visited"      "prefix_nodes" "prefix_order"
summary(fit)
#> Method: GLBFP 
#> Dimension: 1 
#> Point: 0 
#> Estimation: 0.3867351 
#> Standard error: 0.0838592 
#> 95% CI: 0.362643259199835, 0.410826868862035 
#> Bandwidths (b): 0.155144970240404 
#> Shifts (m): 1 
#> Relative grid coordinate (u): 0.91613931912711 
#> Visited cells: 2 
#> Prefix nodes: 2
predict(fit)
#> [1] 0.3867351

Estimate density on a grid

The *_estimate() functions evaluate the same estimator over a regular grid or a user-supplied set of points.

grid_fit <- GLBFP_estimate(data = x, b = b, m = 1, grid_size = 80)
head(cbind(grid_fit$grid, density = grid_fit$densities))
#>             V1     density
#> [1,] -2.546881 0.085941125
#> [2,] -2.481241 0.067760688
#> [3,] -2.415601 0.049580251
#> [4,] -2.349960 0.031399814
#> [5,] -2.284320 0.017352329
#> [6,] -2.218680 0.008262111
head(as.data.frame(grid_fit))
#>          V1     density          sd    IC_lower   IC_upper visited prefix_nodes
#> 1 -2.546881 0.085941125 0.067942425 0.066422031 0.10546022       2            2
#> 2 -2.481241 0.067760688 0.041194890 0.055925860 0.07959552       2            2
#> 3 -2.415601 0.049580251 0.024146032 0.042643368 0.05651713       2            2
#> 4 -2.349960 0.031399814 0.020860213 0.025406910 0.03739272       2            2
#> 5 -2.284320 0.017352329 0.016030533 0.012746937 0.02195772       1            2
#> 6 -2.218680 0.008262111 0.009668979 0.005484322 0.01103990       1            2

For one-dimensional grids, the plot method returns a ggplot2 object.

plot(grid_fit)

Using the included data

The ashua data contain daily flow and level observations for the Ashuapmushuan river. The example below uses a small grid so the vignette remains fast.

data("ashua")

river_data <- ashua[, c("flow", "level")]
b2 <- c(8, 0.4)
x0 <- c(mean(river_data$flow), mean(river_data$level))

fit2 <- GLBFP(x = x0, data = river_data, b = b2, m = c(1, 1))
fit2
#> GLBFP Density Estimation:
#> Point: (249.0230, 30.4197) 
#> Estimated density: 0.00377442 
#> Estimated standard error: 0.000316735 
#> 95% confidence interval: 0.00376918, 0.00377966 
#> Bandwidths (b): 8.0, 0.4 
#> Shifts (m): 1, 1 
#> Relative grid coordinate (u): 0.690325, 0.224216

grid_fit2 <- GLBFP_estimate(
  data = river_data,
  b = b2,
  m = c(1, 1),
  grid_size = 15
)

plot(grid_fit2, contour = TRUE)

Input expectations

The package expects finite numeric data. Missing values should be removed or imputed before estimation. Constant data require explicit non-degenerate min_vals and max_vals bounds because a density estimator for continuous data needs a positive evaluation range.

Where to go next

After this vignette: