---
title: "lightsf: A Curated Collection of Georeferenced and Spatial Datasets"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{lightsf: A Curated Collection of Georeferenced and Spatial Datasets}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(lightsf)
library(ggplot2)
library(dplyr)
```

# Introduction

The `lightsf` package offers a **curated collection of georeferenced and spatial datasets** drawn from several domains, so that researchers, educators, and analysts can explore spatial patterns and perform geostatistical analysis in R.

This package consolidates datasets from **multiple open and trusted sources**, including **Kaggle, spData, adespatial, chopin, and bivariateLeaflet**, to provide a unified resource for spatial data exploration and visualization.  

The datasets included in `lightsf` cover a broad spectrum of topics such as **urban studies, housing markets, environmental monitoring, transportation networks, and socio-economic indicators**. Each dataset is carefully formatted and documented to support both **educational purposes** and **applied spatial analysis**.  

`lightsf` provides data in multiple spatial formats, including **point patterns**, **polygons**, **socio-economic data frames**, and **network-like structures**, allowing users to perform tasks ranging from **basic exploratory mapping** to **advanced spatial modeling**.

By centralizing geospatial datasets in a single package, `lightsf` simplifies the workflow for those who wish to learn, teach, or apply spatial data science techniques without the need to gather and preprocess data from multiple sources.

## Dataset Suffixes

Each dataset in the `lightsf` package uses a **suffix** to indicate the type of spatial data it contains:

- `_pts`: **Point-based datasets** of georeferenced locations, usually stored as latitude and longitude coordinates.

- `_poly`: **Polygon-based datasets**, typically representing areas, administrative boundaries, or spatial zones.

- `_points`: **Point datasets** similar to `_pts`, often derived from other spatial sources or carrying additional spatial or attribute information.

These suffixes help users quickly identify the **geometric structure** and **spatial representation** of each dataset included in the `lightsf` package.
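
As a small illustration of how the suffix convention can be used programmatically, the sketch below groups dataset names by the text after the final underscore. The helper `group_by_suffix()` is a hypothetical name introduced here, not a function provided by `lightsf`, and the three names are the example datasets described in this vignette. The commented line shows one way the full list of names might be obtained with base R's `data()`.

```{r suffix-grouping}
# Hypothetical helper (not part of lightsf): group dataset names by the
# text that follows the final underscore.
group_by_suffix <- function(dataset_names) {
  suffixes <- sub("^.*_", "", dataset_names)
  split(dataset_names, suffixes)
}

# Names taken from the example datasets described in this vignette
example_names <- c("nc_points", "dc_poly", "afcon_poly")
group_by_suffix(example_names)

# To apply this to every dataset shipped with the package, one option is:
# group_by_suffix(data(package = "lightsf")$results[, "Item"])
```

Grouping on the suffix alone keeps the helper independent of how each dataset is stored, so it should behave the same way for data frames and for other spatial classes.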

## Example Datasets

Below are selected example datasets included in the `lightsf` package:

- `nc_points`: Mildly clustered **georeferenced points** representing locations in **North Carolina, United States**.  

- `dc_poly`: **Polygon-based spatial dataset** containing **Washington D.C. census tract data**, suitable for creating **choropleth maps** and exploring demographic or spatial patterns.  

- `afcon_poly`: **Polygon dataset** representing **spatial patterns of conflict in Africa (1966–1978)**, useful for studying regional clustering and spatial heterogeneity.  

## Data Visualization with lightsf Data

### Spatial Patterns of Conflict in Africa (1966–1978)

```{r afcon-poly-plot, fig.width=6, fig.height=4.5, out.width="90%", message=FALSE, warning=FALSE}
# Basic exploration of the dataset: names, class, length, and structure
names(afcon_poly)
class(afcon_poly)
length(afcon_poly)
str(afcon_poly)

# Coerce to a plain data frame so ggplot2 can map the x, y, and totcon columns
afcon_df <- as.data.frame(afcon_poly)

# Scatter plot of the x/y coordinates, with color and point size both mapped to totcon
ggplot(afcon_df, aes(x = x, y = y)) +
  geom_point(aes(color = totcon, size = totcon), alpha = 0.8) +
  scale_color_gradient(low = "lightyellow", high = "darkred") +
  labs(
    title = "Spatial Patterns of Conflict in Africa (1966–1978)",
    x = "Longitude",
    y = "Latitude",
    color = "Total Conflicts",
    size = "Conflict Intensity"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "right"
  )
```
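
The plot can be complemented with a quick numeric check. The sketch below lists the locations with the largest `totcon` values, using only the `x`, `y`, and `totcon` columns that the plot above already relies on. The helper `top_conflict()` is a hypothetical name introduced for illustration, not a `lightsf` function.

```{r afcon-top-conflict}
# Hypothetical helper (not part of lightsf): return the n rows with the
# largest totcon values, keeping only the columns used in the plot above.
top_conflict <- function(df, n = 5) {
  ranked <- df[order(df$totcon, decreasing = TRUE), c("x", "y", "totcon")]
  head(ranked, n)
}

# Assumes afcon_poly coerces to a data frame with x, y, and totcon columns,
# as in the plotting chunk above
top_conflict(as.data.frame(afcon_poly))
```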

## Conclusion

The `lightsf` package provides a **curated and diverse collection of georeferenced and spatial datasets** designed to support spatial data analysis, visualization, and education in R. It brings together datasets from multiple open sources, offering ready-to-use spatial data covering topics such as **urban studies, housing markets, environmental monitoring, transportation, and socio-economic indicators**.

By providing well-structured and documented datasets in various spatial formats, `lightsf` facilitates **exploratory mapping**, **geostatistical modeling**, and **teaching of spatial analysis concepts**.  

