---
title: "Neat Data for Presentation"
output: rmarkdown::html_vignette
author: Shiva
vignette: >
  %\VignetteIndexEntry{Neat Data for Presentation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(neatR)
```

We use R extensively, not just for intensive computation but also for presentation.
JavaScript visualization libraries in R and elegant ways to present data using
R Markdown make R a one-stop shop for analytics and data science.

We spend most of our time preparing, cleaning, analyzing and modeling the data.
However, the last leg of analytics, the presentation of results,
often does not get enough attention.

The neatR package helps format results by providing simple utility
functions covering common use cases.

### Formatting dates

We often encounter dates in either `mm/dd/yyyy` or `dd/mm/yyyy` format
and are left wondering which part is the month and which is the day, especially
when no day value exceeds 12. An unambiguous approach is to show
the date in `mmm dd, yyyy` format with the day of the week, which is easier to grasp.


```{r echo = TRUE}
ndate(Sys.Date() - 3)
ndate(Sys.Date() - 1)
ndate(Sys.Date())
ndate(Sys.Date() + 1)
ndate(Sys.Date() + 4)
```
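
The ambiguity described above can be reproduced in base R. This is a minimal sketch that does not use neatR; the sample string `"03/04/2023"` is made up for illustration.

```{r echo = TRUE}
# Illustrative value, not taken from any dataset
raw <- "03/04/2023"

# The same string yields two different dates depending on the assumed format
as_mdy <- as.Date(raw, format = "%m/%d/%Y") # read as month/day/year
as_dmy <- as.Date(raw, format = "%d/%m/%Y") # read as day/month/year
as_mdy == as_dmy

# Spelling out the month and the weekday removes the ambiguity
# (month and weekday names follow the session locale)
format(as_mdy, "%b %d, %Y (%a)")
format(as_dmy, "%b %d, %Y (%a)")
```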

To get the date without the day of the week, set `show_weekday` to `FALSE`.

```{r echo = TRUE}
ndate(Sys.Date(), show_weekday = FALSE)
```

When looking at monthly data, abbreviating the date to `mmm'yy` is an
elegant way to show the date and is often helpful for charts.

```{r echo = TRUE}
ndate(Sys.Date(), show_weekday = FALSE, show_month_year = TRUE)
```
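
For comparison, a similar month-year label can be built with base R's `format()`. This is only a sketch of the idea; the format string is an assumption and may not match neatR's output exactly.

```{r echo = TRUE}
d <- as.Date("2023-03-04") # illustrative date

# %b = abbreviated month name (locale dependent), %y = two-digit year
month_label <- format(d, "%b'%y")
month_label
```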

To see a date in the context of the current date
(for dates within 1 week before or after the current date),
use the `nday` function.

`nday` shows the day of the week and can be used directly on dates or timestamps. Setting `show_relative_day` to `TRUE` adds context based on the current date.

```{r echo = TRUE}
nday(Sys.Date(), show_relative_day = FALSE)
nday(Sys.Date(), show_relative_day = TRUE)
```

Below is another example with context based on current date.

```{r echo = TRUE}
x <- seq(Sys.Date() - 10, Sys.Date() + 10, by = "1 day")
nday(x, show_relative_day = TRUE)
```
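
The idea of relating a date to the current date can be sketched in base R by computing the offset in days. The helper `relative_label()` and its labels are hypothetical and written only for this illustration; they are not part of neatR and may differ from what `nday` prints.

```{r echo = TRUE}
# Hypothetical helper: label dates adjacent to a reference date
relative_label <- function(x, today = Sys.Date()) {
  # Difference in whole days between each date and the reference date
  offset <- as.integer(as.Date(x) - today)
  label <- rep(NA_character_, length(offset))
  label[offset == -1] <- "Yesterday"
  label[offset == 0] <- "Today"
  label[offset == 1] <- "Tomorrow"
  label
}

ref <- as.Date("2024-01-03") # fixed reference date so the result is reproducible
labels <- relative_label(ref + (-2:2), today = ref)
labels
```

Dates further than one day away are left as `NA` here; a fuller version would also handle the rest of the surrounding week.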

### Formatting timestamp

Timestamps are a feature-rich representation of date and time.

```{r echo = TRUE}
ntimestamp(Sys.time())
```

To format only the date from a timestamp, we can use the `ndate` function.

```{r echo = TRUE}
ndate(Sys.time())
```

To extract and format only the time from a timestamp, we can do the following:

```{r echo = TRUE}
ntimestamp(Sys.time(),
  show_weekday = FALSE,
  show_date = FALSE, show_timezone = FALSE
)
```

Note: Hours are shown in 12-hour clock format with an AM / PM suffix.
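
The difference between the 24-hour and 12-hour clocks can be seen with base R's `format()`. This is a sketch using a fixed, made-up timestamp; it does not reproduce neatR's own formatting.

```{r echo = TRUE}
stamp <- as.POSIXct("2024-01-01 15:30:00", tz = "UTC") # illustrative timestamp

clock_24h <- format(stamp, "%H:%M", tz = "UTC") # hours 00-23
clock_12h <- format(stamp, "%I:%M", tz = "UTC") # hours 01-12
clock_24h
clock_12h

# %p appends the AM / PM indicator (its text depends on the session locale)
format(stamp, "%I:%M %p", tz = "UTC")
```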

Components of time can be toggled on or off based on preference.

```{r echo = TRUE}
ntimestamp(Sys.time(),
  show_date = FALSE, show_weekday = FALSE,
  show_hours = TRUE, show_minutes = TRUE,
  show_seconds = FALSE, show_timezone = FALSE
)
```

The timezone can be toggled on or off using the `show_timezone` parameter.

```{r echo = TRUE}
ntimestamp(Sys.time(), show_timezone = FALSE)
```
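
Base R offers a comparable switch through the `usetz` argument of `format()`. The sketch below uses a fixed timestamp in UTC, chosen only for illustration.

```{r echo = TRUE}
stamp <- as.POSIXct("2024-01-01 15:30:00", tz = "UTC") # illustrative timestamp

# usetz = TRUE appends the timezone abbreviation to the formatted string
with_tz <- format(stamp, "%Y-%m-%d %H:%M:%S", tz = "UTC", usetz = TRUE)
without_tz <- format(stamp, "%Y-%m-%d %H:%M:%S", tz = "UTC", usetz = FALSE)
with_tz
without_tz
```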

### Formatting number

We often deal with large numbers that are shown in scientific
format, whether in the output of a statistical model or in the raw data itself.
`nnumber` formats numeric data and shows it in an easily readable way.

By default, each number is formatted in the unit that best
represents that individual value. See the example below:

```{r echo = TRUE}
x <- c(10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000)
nnumber(x)
nnumber(x, digits = 0)
```
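
One way to pick a unit per value is to bucket each number by magnitude. The sketch below does this in base R with `findInterval()`; the helper `scale_number()` is hypothetical and is not how neatR is necessarily implemented.

```{r echo = TRUE}
# Hypothetical helper: scale each value by the largest unit it reaches
scale_number <- function(x, digits = 1) {
  breaks <- c(1e3, 1e6, 1e9, 1e12)
  labels <- c("", "K", "Mn", "Bn", "Tn")
  # 1 = no unit, 2 = thousand, 3 = million, 4 = billion, 5 = trillion
  idx <- findInterval(abs(x), breaks) + 1
  divisor <- c(1, breaks)[idx]
  paste0(round(x / divisor, digits), labels[idx])
}

scaled <- scale_number(c(10, 1500, 2500000))
scaled
```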

`nnumber` can automatically determine the best single unit to display all the numbers
when `unit = 'auto'` is set. In the example below, the unit of thousands seems to
best fit most of the numbers. Any number lower than 0.1K is displayed as '<0.1K'
for easier reference.

```{r echo = TRUE}
x <- c(1e6, 99e3, 76e3, 42e3, 12e3, 789, 53)
nnumber(x, unit = "auto")
```
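
A single shared unit can be chosen, for example, by taking the unit that occurs most often across the values. The base R sketch below assumes this "most frequent unit" rule purely for illustration; neatR's actual rule may differ, and the '<0.1K' handling is not reproduced.

```{r echo = TRUE}
x <- c(1e6, 99e3, 76e3, 42e3, 12e3, 789, 53)
breaks <- c(1e3, 1e6, 1e9, 1e12)
labels <- c("", "K", "Mn", "Bn", "Tn")

# 0 = below thousand, 1 = thousand, 2 = million, ...
unit_idx <- findInterval(abs(x), breaks)

# Most frequent unit index across all values
common <- as.integer(names(which.max(table(unit_idx))))
unit_label <- labels[common + 1]
unit_label

# Express every value in that one unit
paste0(round(x / c(1, breaks)[common + 1], 1), unit_label)
```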

We can specify the unit in which the number is to be formatted:

```{r echo = TRUE}
nnumber(123456789.123456, unit = "Mn")
```

The default units are 'K' for thousand, 'Mn' for million, 'Bn' for billion and
'Tn' for trillion. The unit labels can be customized using `unit_labels`,
which is a list of values and labels.

```{r echo = TRUE}
nnumber(123456789.123456, unit = "M", unit_labels = list(million = "M"))
```

The example below customizes all the units.

```{r echo = TRUE}
x <- c(10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000)
nnumber(x,
  unit_labels =
    list(thousand = "K", million = "M", billion = "B", trillion = "T")
)
```

Along with the formatted number, we can (optionally) add a prefix or suffix.

```{r echo = TRUE}
nnumber(123456789.123456,
  unit = "M", unit_labels = list(million = "M"),
  prefix = "$ "
)

nnumber(123456789.123456,
  unit = "M", unit_labels = list(million = "M"),
  suffix = " CAD"
)

nnumber(123456789.123456,
  unit = "M", unit_labels = list(million = "M"),
  prefix = "$ ", suffix = " CAD"
)
```

Sometimes we want to show the number as it is, which can be done
by setting `unit = ''`. The `thousand_separator` parameter separates the
thousands, which makes the numbers easier to read.

```{r echo = TRUE}
nnumber(123456789.123456,
  digits = 2, unit = "",
  thousand_separator = ","
)
```
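
For reference, base R can produce a comparable result with `formatC()`. This is a sketch for comparison only and is independent of neatR.

```{r echo = TRUE}
# format = "f" forces fixed notation; big.mark inserts the thousands separator
plain <- formatC(123456789.123456, format = "f", digits = 2, big.mark = ",")
plain
```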

`thousand_separator` can take the following values: `",", ".", "'", " ", "_", ""`.

The parameter `unit` can take any of the following values:

`custom`: The unit is chosen for each individual value. This is the default value
of the `unit` parameter.

`auto`: A single unit that best represents the overall data is automatically
detected and applied based on the majority of the values.


`K`: The numbers are displayed in thousands.

`Mn`: The numbers are displayed in millions.

`Bn`: The numbers are displayed in billions.

`Tn`: The numbers are displayed in trillions.

If the unit labels are customized and provided via a list, for example `unit_labels = list(thousand = 'k')`, then the custom string `k` must be provided as the `unit`.

### Formatting percentages

Percentage data can come in two forms: already multiplied by 100 or not.
For example, 22.8% can be stored as 22.8 or 0.228.

```{r echo = TRUE}
npercent(22.8, is_ratio = FALSE)
npercent(0.228, is_ratio = TRUE)
```

By default, `is_ratio` is set to `TRUE` and the number of decimal digits is set to 1.

It is also useful to show that a percentage is positive by prefixing
a plus sign. This is the default behavior of the `npercent` function and
can be turned off by setting `show_plus_sign` to `FALSE`.

```{r echo = TRUE}
npercent(0.228, show_plus_sign = TRUE)
npercent(0.228, show_plus_sign = FALSE)
```
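
The two storage conventions and the plus sign can be sketched in base R with `sprintf()`. The helper `to_percent()` and its `plus` argument are hypothetical names used only for this illustration; they are not part of neatR.

```{r echo = TRUE}
# Hypothetical helper: format a value as a percentage with one decimal digit
to_percent <- function(x, is_ratio = TRUE, plus = TRUE) {
  # Ratios such as 0.228 are multiplied by 100; values such as 22.8 are used as is
  value <- if (is_ratio) x * 100 else x
  # The "+" flag asks sprintf to always print the sign
  fmt <- if (plus) "%+.1f%%" else "%.1f%%"
  sprintf(fmt, value)
}

signed <- to_percent(0.228)
unsigned <- to_percent(22.8, is_ratio = FALSE, plus = FALSE)
signed
unsigned
```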


When percentages are high (especially when calculating growth from
time A to time B), they are easier to read as 'nX'.

```{r echo = TRUE}
tesla_2017 <- 20
tesla_2023 <- 200
g <- (tesla_2023 - tesla_2017) / tesla_2017
npercent(g, show_plus_sign = TRUE)
npercent(g, show_plus_sign = TRUE, show_growth_factor = TRUE)
```
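
The arithmetic behind the example can be checked by hand in base R. The variable names below are illustrative, and no assumption is made here about how `npercent` derives its 'nX' label.

```{r echo = TRUE}
start_value <- 20
end_value <- 200

# Growth rate as a ratio: change relative to the starting value
growth_rate <- (end_value - start_value) / start_value
growth_rate

# The end value expressed as a multiple of the start value (1 + growth rate)
multiple <- end_value / start_value
multiple
```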


### Formatting string

Character vectors or strings can be formatted by case type, with options to
remove special characters and to keep only English characters and numbers
from the string.

Below are the available `case` conversions:

`lower`: converts string to lower case.

`upper`: converts string to upper case.

`title`: converts string to title case (first letter of each word is capitalized
except stop words. Based on `tools::toTitleCase`).

`start`: converts string to start case (first letter of each word is capitalized
and rest of the letters are in lower case).

`initcap`: converts string to initcap case (first letter of first word is 
capitalized and rest of the letters are in lower case).

```{r echo = TRUE}
nstring("   All MOdels are wrong.   some ARE useful!!!  â",
  case = "title", remove_specials = TRUE
)
```
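
The case types listed above can be approximated in base R. This is a rough sketch for comparison, using a shortened sample string; it is not neatR's implementation and may differ from it in edge cases.

```{r echo = TRUE}
s <- "all MOdels are wrong" # illustrative input

tolower(s) # lower case
toupper(s) # upper case

# Title case: tools::toTitleCase keeps common short words in lower case
tools::toTitleCase(tolower(s))

# Start case: capitalize the first letter of every word
start_case <- gsub("\\b([a-z])", "\\U\\1", tolower(s), perl = TRUE)
start_case

# Initcap: capitalize only the first letter of the string
init_cap <- paste0(toupper(substring(s, 1, 1)), tolower(substring(s, 2)))
init_cap
```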

To exclude any special characters and retain only numbers and English letters,
we can set the `ascii_only` parameter to `TRUE`.

```{r echo = TRUE}
nstring("   All MOdels are wrong.   some ARE useful!!!  â  ",
  case = "title", remove_specials = TRUE, ascii_only = TRUE
)
```

By default, leading and trailing white spaces are removed and
extra white spaces are reduced to a single white space.
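
The same clean-up steps can be sketched with base R regular expressions. This is an illustration of the idea under the assumption that "special" means anything other than English letters, digits and spaces; neatR's own rules may differ.

```{r echo = TRUE}
# \u00e2 is the non-ASCII letter "a with circumflex"
s <- "   All MOdels are wrong.   some ARE useful!!!  \u00e2  "

# Keep only English letters, digits and spaces
ascii <- gsub("[^A-Za-z0-9 ]", "", s, perl = TRUE)

# Trim both ends, then collapse runs of white space into a single space
clean <- gsub("\\s+", " ", trimws(ascii))
clean
```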



