---
title: "Writing a custom diagnostic"
author: "Michael Chirico"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Writing a custom diagnostic}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Motivation

`potools` provides several "diagnostic" functions used to check the "health" of the messaging corpus available in a given package. These are `check_cracked_messages`, which looks for messages split into chunks which are hard to translate; `check_untranslated_cat`, which looks for messages displayed via `cat()` which are not marked for translation; and `check_untranslated_src`, which looks for messages in the `src` directory which are not marked for translation.

These just scratch the surface of the diagnostics that are possible for improving the quality of messaging to users -- not only for translation, but also for a better experience in English!

In this vignette we'll demonstrate just such a use case by writing a custom diagnostic function that checks for typos in your messages by applying the function `utils::aspell()`.
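For readers who haven't used `utils::aspell()` before, here is a minimal sketch of calling it directly on a small text file. The file contents are made up for illustration, and the call is guarded because it assumes the `aspell` program is installed and on the `PATH`; the exact suggestions returned will depend on the dictionaries available on your system.

```r
# Write two hypothetical messages to a temporary file, one per line
typo_file <- tempfile(fileext = ".txt")
writeLines(c("This mesage has a typo", "This one does not"), typo_file)

# Only call aspell() if the underlying program can be found
if (nzchar(Sys.which("aspell"))) {
  results <- utils::aspell(typo_file)
  # Each row is one flagged word; Suggestions is a list of candidate fixes
  print(results$Original)
  print(results$Suggestions)
}

unlink(typo_file)
```

Note that `aspell()` reads from files rather than from character vectors, which is why the diagnostic in the next section writes messages to disk before checking them.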

## Writing the diagnostic

We'll call our function `check_spelling`; it will take as input a `data.table` like that produced by `get_message_data()`, and give as output a `data.table` indexing any issues found. Specifically, it should have three or four columns: `call`, `file`, `line_number`, and `replacement`. The first three come directly from the input; the last one is optional and suggests to the user a way to repair any "unhealthy" messages.
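To make the target schema concrete, here is a small sketch of what a result might look like. The row shown is hypothetical (it is not output from a real package), and it assumes `data.table` is installed.

```r
library(data.table)

# Hypothetical finding, for illustration only
expected_shape <- data.table(
  call = 'stop("Unable to opne file")',
  file = "reader.R",
  line_number = 12L,
  replacement = 'stop("Unable to open file")'
)

# When nothing is found, a zero-row table with the same columns keeps the schema intact
empty_result <- expected_shape[0L]
names(empty_result)
```

Returning a zero-row table rather than `NULL` means code that consumes the result can rely on the columns being present in every case.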

```{r check_spelling}
check_spelling = function(message_data) {
  # if aspell isn't installed, this won't work; be sure to return an object with the right schema anyway
  if (!nzchar(Sys.which("aspell"))) {
    warning("'aspell' is not installed; returning nothing")
    return(message_data[0, .(call, file, line_number)])
  }

  # aspell() works on files, so we'll write the msgid to files
  aspell_dir <- file.path(tempdir(), 'aspell')
  dir.create(aspell_dir)
  original_dir <- setwd(aspell_dir)
  on.exit({
    # restore the working directory first, so we never try to delete the directory we're in
    setwd(original_dir)
    unlink(aspell_dir, recursive = TRUE)
  })

  # (!is_repeat) makes sure we only check duplicate messages once
  # plural messages are in a list, so handle them separately
  message_data[(!is_repeat), by = .(file, type), {
    if (.BY$type == "singular") {
      cat(msgid, file = .BY$file, sep = "\n")
      # aspell() results have 5 columns: Original, File, Line, Column, Suggestions; we only need 1 & 5
      results = utils::aspell(.BY$file)
      unlink(.BY$file)

      typo_idx <- sapply(results$Original, grep, msgid)
      # take the first suggestion
      replacement = sapply(
        seq_along(results$Suggestions),
        function(typo_i) {
          # take the identified typo & replace it with aspell's 1st suggestion in the original `call`
          gsub(
            results$Original[typo_i], results$Suggestions[[typo_i]][1L],
            call[typo_idx[typo_i]], fixed = TRUE
          )
        }
      )

      .(
        call = call[typo_idx],
        file = file[typo_idx],
        line_number = line_number[typo_idx],
        replacement = replacement
      )
    } else {
      # unlist() to write both the n=1 and n!=1 messages to the file side-by-side
      all_msgid <- unlist(msgid_plural)
      cat(all_msgid, file = .BY$file, sep = "\n")
      results = utils::aspell(.BY$file)
      unlink(.BY$file)

      # odd numbers in grep output --> first entry for each msgid_plural; even numbers --> second entry.
      # use integer division to re-map that to the original entry number in msgid_plural
      typo_idx <- ((sapply(results$Original, grep, all_msgid) - 1L) %/% 2L) + 1L
      # potentially overwrite each call >1 time if both messages have a typo
      replacement = call
      for (typo_i in seq_along(results$Suggestions)) {
        replacement[typo_idx[typo_i]] <- gsub(
          results$Original[typo_i], results$Suggestions[[typo_i]][1L],
          replacement[typo_idx[typo_i]], fixed = TRUE
        )
      }
      typo_idx <- unique(typo_idx)

      .(
        call = call[typo_idx],
        file = file[typo_idx],
        line_number = line_number[typo_idx],
        replacement = replacement[typo_idx]
      )
    }
  }]
}
```

In a package, we would probably use a few helper functions to clean up & simplify the body of this diagnostic; here we've written everything out in sequence so the whole illustration is in one place.
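The index arithmetic in the plural branch is easier to follow with a worked example. The sketch below uses made-up plural messages and assumes, as the diagnostic does, that every entry of `msgid_plural` holds exactly two strings (the n = 1 and n != 1 forms).

```r
# Hypothetical plural messages: each entry holds the n == 1 and n != 1 forms
msgid_plural <- list(
  c("%d file was read", "%d files were read"),
  c("%d row was droped", "%d rows were droped"),
  c("%d column", "%d colums")
)
all_msgid <- unlist(msgid_plural)

# Positions 1..6 in the flattened vector map back to entries 1, 1, 2, 2, 3, 3
flat_idx <- seq_along(all_msgid)
entry_idx <- ((flat_idx - 1L) %/% 2L) + 1L
entry_idx

# A typo found in both forms of the second message maps to entry 2 twice
droped_entries <- ((grep("droped", all_msgid, fixed = TRUE) - 1L) %/% 2L) + 1L
droped_entries
```

Because a typo can show up in both forms of the same message, the mapped indices can contain duplicates, which is why the diagnostic calls `unique()` on them before building its output.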

## Running the diagnostic

We can check how the diagnostic works on a simple test package `GreatSpelling` created for this vignette.

```{r GreatSpelling}
library(potools)
great_spelling_messages = get_message_data("GreatSpelling")

# showing the structure of the message data for this package
great_spelling_messages

# running our diagnostic
check_spelling(great_spelling_messages)
```

That should cover the basics -- I look forward to seeing all the great uses you more creative developers can devise. Thanks for reading!