---
title: "obAnalytics Guide"
author: "[Philip Stubbings](http://parasec.net)"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
  pdf_document:
    toc: true
  html_document:
    theme: journal
    toc: true
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{obAnalytics Guide}
  \usepackage[utf8]{inputenc}
---

```{r, include=FALSE}
library(obAnalytics)
knitr::opts_chunk$set(dpi=100, fig.width=10, fig.height=6, results="hide")
```

<!-- rmarkdown v1 -->

# Overview 

_obAnalytics_ is an R package intended for visualisation and analysis of limit
order data. 

This guide is structured as an end-to-end walk-through and is intended to 
demonstrate the main features and functionality of the package.

## Recommended environment settings

Due to the large number of columns in the example data, it is recommended to
set the display width to make the most of the available screen space. It is also
recommended to set _digits.secs=3_ and _scipen=999_ so that timestamps are shown
with millisecond precision and large numbers are printed without scientific
notation. This can be achieved as follows:

```{r}
# use the terminal width if the COLUMNS environment variable is set, else 80.
max.cols <- Sys.getenv("COLUMNS")
options(width=if(max.cols != "") as.integer(max.cols) else 80, scipen=999, 
    digits.secs=3)
```

# Loading data

The main focus of this package is reconstruction of a limit order book. The 
_processData_ function will perform data processing based on a supplied CSV 
file, the format of which is defined in the _Expected csv schema_ section.

The data processing consists of a number of stages:


1. Cleaning of duplicate and erroneous data.

2.  Identification of sequential event relationships.

3.  Inference of trade events via order-matching.

4.  Inference of order types (limit vs market).

5.  Construction of volume by price level series.

6.  Construction of order book summary statistics.


Limit order events are related to one another by volume deltas (the change in 
volume for a limit order). To simulate a matching engine, and thus determine 
directional trade data, volume deltas from both sides of the limit order book 
are ordered by time, yielding a sequence alignment problem, to which the 
Needleman-Wunsch algorithm has been applied.

```{r, eval=F}
# load and process example csv data from the package inst/extdata directory.
csv.file <- system.file("extdata", "orders.csv.xz", package="obAnalytics")
lob.data <- processData(csv.file)
```
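
To make the alignment idea concrete, the following is a minimal, self-contained
sketch of Needleman-Wunsch global alignment applied to two short sequences of
fill volumes. It is not the implementation used by _processData_: the function
name _nw.align_, the scoring values and the exact-equality similarity test are
assumptions chosen purely for illustration.

```{r}
# Illustrative only: a textbook Needleman-Wunsch global alignment, not the
# obAnalytics implementation. nw.align and its scores are assumptions.
nw.align <- function(a, b, match=1, mismatch=-1, gap=-1) {
  n <- length(a)
  m <- length(b)
  # score[i+1, j+1] holds the best score for aligning a[1:i] with b[1:j].
  score <- matrix(0, n+1, m+1)
  score[, 1] <- gap*(0:n)
  score[1, ] <- gap*(0:m)
  for(i in seq_len(n)) {
    for(j in seq_len(m)) {
      s <- if(a[i] == b[j]) match else mismatch
      score[i+1, j+1] <- max(score[i, j]+s, score[i, j+1]+gap, 
          score[i+1, j]+gap)
    }
  }
  # trace back from the bottom-right cell, collecting matched index pairs.
  i <- n
  j <- m
  pairs <- matrix(integer(0), ncol=2)
  while(i > 0 && j > 0) {
    s <- if(a[i] == b[j]) match else mismatch
    if(score[i+1, j+1] == score[i, j]+s) {
      if(a[i] == b[j]) pairs <- rbind(c(i, j), pairs)
      i <- i-1
      j <- j-1
    } else if(score[i+1, j+1] == score[i, j+1]+gap) {
      i <- i-1  # a[i] is aligned with a gap.
    } else {
      j <- j-1  # b[j] is aligned with a gap.
    }
  }
  list(score=score[n+1, m+1], pairs=pairs)
}

# toy fill sequences (volume deltas ordered by time) for each side of the book.
toy.bid.fills <- c(5, 3, 8, 2)
toy.ask.fills <- c(5, 8, 2)
toy.alignment <- nw.align(toy.bid.fills, toy.ask.fills)
```

In this toy example the unmatched bid fill of 3 is aligned with a gap, while the
remaining three fills pair up one-to-one (see the _pairs_ element of the 
result).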

## Expected csv schema 

The CSV file is expected to contain 7 columns:

--------------------------------------------------------------------------------
           Column name Description
---------------------- ---------------------------------------------------------
                __id__ Numeric limit order unique identifier.

         __timestamp__ Time in milliseconds when event received locally.

__exchange.timestamp__ Time in milliseconds when order first created on the 
                       exchange.

             __price__ Price level of order event.

            __volume__ Remaining order volume.

            __action__ Event action describes the limit order lifecycle. One 
                       of: __created__, __modified__, __deleted__.

         __direction__ Side of order book. One of: __bid__ or __ask__.
--------------------------------------------------------------------------------
Table: Expected CSV schema.

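As a rough illustration of the schema, the sketch below builds a three-row 
data.frame with the 7 expected columns and writes it to a temporary CSV file 
using base R. The values, the column order and the object names (_toy.orders_, 
_toy.csv_) are assumptions for the example; whether _processData_ accepts this 
exact file has not been verified here.

```{r}
# hypothetical events: order 1 is created then deleted, order 2 is created.
toy.orders <- data.frame(
  id=c(1, 1, 2),
  timestamp=c(1430438400000, 1430438401250, 1430438401300),
  exchange.timestamp=c(1430438399000, 1430438399000, 1430438401000),
  price=c(236.50, 236.50, 237.00),
  volume=c(150000000, 0, 50000000),
  action=c("created", "deleted", "created"),
  direction=c("bid", "bid", "ask"))

# write the events to a temporary file, one row per event, no row names.
toy.csv <- tempfile(fileext=".csv")
write.csv(toy.orders, toy.csv, row.names=FALSE)
```
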
## Preprocessed example data

For illustrative purposes, the package contains a sample of preprocessed data.
The data, taken from the [Bitstamp](https://www.bitstamp.net/websocket) 
(bitcoin) exchange on 2015-05-01, consists of 50,393 limit order events and 482 
trades occurring from midnight up until ~5am.

The sample data, which has been previously processed by the _processData_ 
function, may be attached to the environment with the _data()_ function:

```{r}
data(lob.data)
```

The _lob.data_ object is a list containing four data.frames.

--------------------------------------------------------------------------------
       data.frame Summary
----------------- --------------------------------------------------------------
       __events__ Limit order events.

       __trades__ Inferred trades (executions).

        __depth__ Order book price level depth through time.

__depth.summary__ Limit order book summary statistics.
--------------------------------------------------------------------------------
Table: lob.data summary.

The contents of each data.frame are briefly discussed in the following sections.

### Events

The _events_ data.frame contains the lifecycle of limit orders and makes up the
core data of the obAnalytics package. Each row corresponds to a single limit
order action, of which three types are possible.

--------------------------------------------------------------------------------
 Event action Meaning
------------- ------------------------------------------------------------------
  __created__ The order is created with a specified amount of volume and a limit 
              price.

  __changed__ The order has been partially filled. On each modification, the 
              remaining volume will decrease. 

  __deleted__ The order may be deleted at the request of the trader or, in the 
              event that the order has been completely filled, deleted by the 
              exchange. An order deleted by the exchange as a result of being 
              filled will have 0 remaining volume at time of deletion.
--------------------------------------------------------------------------------
Table: Possible limit order actions.

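The lifecycle above can be sketched on toy data: for each order, the volume 
removed between consecutive events is the drop in remaining volume. The 
_toy.events_ data and the use of _ave()_ are assumptions for illustration and do
not reflect how the package computes its own attributes.

```{r}
# hypothetical lifecycle: order 7 is created, partially filled, then fully
# filled and deleted; order 8 is created and remains in the book.
toy.events <- data.frame(
  id=c(7, 7, 7, 8),
  action=c("created", "changed", "deleted", "created"),
  volume=c(300, 100, 0, 50))

# volume removed since the previous event of the same order (0 on creation).
# rows are assumed to be in time order within each order id.
toy.events$fill <- ave(toy.events$volume, toy.events$id, 
    FUN=function(v) c(0, -diff(v)))
```
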
In addition to the event _action_ type, a row consists of a number of 
attributes relating to the lifecycle of a limit order.

--------------------------------------------------------------------------------
             Attribute Meaning 
---------------------- ---------------------------------------------------------
          __event.id__ Event Id.

                __id__ Limit Order Id.

         __timestamp__ Local event timestamp (local time the event was 
                       observed).

__exchange.timestamp__ Exchange order creation time.

             __price__ Limit order price level.

            __volume__ Remaining limit order volume.

            __action__ Event action: _created_, _changed_, _deleted_ (as 
                       described above).

         __direction__ Order book side: _bid_, _ask_.

              __fill__ For _changed_ or _deleted_ events, indicates the change 
                       in volume between this event and the last.

    __matching.event__ Matching _event.id_ if this event is part of a trade. NA
                       otherwise.

              __type__ Limit order type (see Order types below).

__aggressiveness.bps__ The distance of the order from the edge of the book in 
                       Basis Points (BPS). If an order is placed exactly at the 
                       best bid/ask queue, this value will be 0. If placed 
                       behind the best bid/ask, the value will be negative. A 
                       positive value is indicative of an _innovative_ order: 
                       the order was placed inside the bid/ask spread, which 
                       would result in a change to the market midprice.
--------------------------------------------------------------------------------
Table: Limit order event attributes.

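The sign convention of _aggressiveness.bps_ can be sketched as follows. The 
formula and the helper _toy.aggressiveness_ are assumptions derived only from 
the description in the table (0 at the best price, negative behind it, positive
inside the spread); the exact calculation used by the package may differ.

```{r}
# distance from the best price on the order's own side, in basis points.
# positive = inside the spread, 0 = at the best price, negative = behind it.
toy.aggressiveness <- function(price, best, direction) {
  side <- ifelse(direction == "bid", 1, -1)
  10000 * side * (price - best) / best
}

# hypothetical orders with the best bid/ask at the time of placement.
toy.placed <- data.frame(
  direction=c("bid", "bid", "bid", "ask"),
  price=c(236.00, 235.00, 236.50, 236.90),
  best=c(236.00, 236.00, 236.00, 237.00))
toy.placed$agg <- with(toy.placed, toy.aggressiveness(price, best, direction))
```
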
An individual limit order (referenced by the _id_ attribute) may be of six 
different types, all of which have been classified by obAnalytics.

--------------------------------------------------------------------------------
 Limit order type Meaning
----------------- --------------------------------------------------------------
      __unknown__ It was not possible to infer the order type given the 
                  available data. 

__flashed-limit__ Order was created then subsequently deleted (96% of the 
                  example data). These types of orders are also referred to as 
                  _fleeting orders_ in the literature.

__resting-limit__ Order was created and left in order book indefinitely until 
                  filled.

 __market-limit__ Order was partially filled before landing in the order book at 
                  its limit price. This may happen when the limit order crosses
                  the book because, in the case of a bid order, its price is >=
                  the current best ask. However, there is not enough volume 
                  between the current best ask and the order limit price to fill
                  the order's volume completely.

       __market__ Order was completely filled and did not come to rest in the 
                  order book. Similarly to a _market-limit_, the _market_ order
                  crosses the order book. However, its volume is filled before
                  reaching its limit price. Both market-limit and market orders
                  are referred to as _marketable_ limit orders in the 
                  literature.

       __pacman__ A limit-price modified _in situ_ (exchange algorithmic order).
                  The example data contains a number of these order types. They
                  occur when a limit order's _price_ attribute is _updated_. In 
                  the example data, this occurs from a special order type 
                  offered by the exchange which, in the case of a bid, will 
                  _peg_ the limit price to the best ask once per second until 
                  the order has been filled.
--------------------------------------------------------------------------------
Table: Order types.

The following table demonstrates a small snapshot (1 second) of event data. 
Some of the attributes have been omitted or renamed for readability.

```{r}
one.sec <- with(lob.data, {
  events[events$timestamp >= as.POSIXct("2015-05-01 04:55:10", tz="UTC") & 
         events$timestamp <= as.POSIXct("2015-05-01 04:55:11", tz="UTC"),  ]
})
one.sec$volume <- one.sec$volume*10^-8
one.sec$fill <- one.sec$fill*10^-8
one.sec$aggressiveness.bps <- round(one.sec$aggressiveness.bps, 2)
one.sec <- one.sec[, c("event.id", "id", "price", "volume", "action", 
    "direction", "fill", "matching.event", "type", "aggressiveness.bps")]
colnames(one.sec) <- c("event.id", "id", "price", "vol", "action", "dir", 
    "fill", "match", "type", "agg")
print(one.sec, row.names=F)
```

```{r, echo=F, results="markup"}
knitr::kable(one.sec, row.names=F)
```

### Trades

The package automatically infers execution/trade events from the provided limit
order data.

The trades data.frame contains a log of all executions ordered by local 
timestamp. 

In addition to the usual timestamp, price and volume information, each row also
contains the trade direction (buyer or seller initiated) and maker/taker limit 
order ids. 

The maker/taker event and limit order ids can be used to group trades into 
market impacts, an example of which is demonstrated later in this guide.

```{r}
trades.ex <- tail(lob.data$trades, 10)
trades.ex$volume <- round(trades.ex$volume*10^-8, 2)
print(trades.ex, row.names=F)
```

```{r example trades, echo=F, results="markup"}
knitr::kable(trades.ex, digits=2, row.names=F)
```

Each row, representing a single trade, consists of the following attributes:

--------------------------------------------------------------------------------
         Attribute Meaning
------------------ -------------------------------------------------------------
     __timestamp__ Local event timestamp.

         __price__ Price at which the trade occurred.

        __volume__ Amount of traded volume.

     __direction__ The trade direction: _buy_ or _sell_.

__maker.event.id__ Corresponding market making event id in _events_ data.frame.

__taker.event.id__ Corresponding market taking event id in _events_ data.frame.

         __maker__ Id of the market making limit order in _events_ data.frame.

         __taker__ Id of the market taking limit order in _events_ data.frame.
--------------------------------------------------------------------------------
Table: Trade data attributes.

### Depth

The depth data.frame describes the amount of available volume for all price 
levels in the limit order book through time. Each row corresponds to a limit 
order event, in which volume has been added or removed.

The data.frame represents a run-length-encoding of the cumulative sum of depth
for all price levels and consists of the following attributes:

--------------------------------------------------------------------------------
    Attribute Meaning
------------- ------------------------------------------------------------------
__timestamp__ Time at which volume was added or removed.

    __price__ Order book price level.

   __volume__ Amount of remaining volume at this price level.

     __side__ The side of the price level: bid or ask.
--------------------------------------------------------------------------------
Table: Depth attributes.

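The idea of storing depth as a cumulative sum of volume changes can be sketched 
with base R. The _toy.deltas_ data and its _delta_ column are hypothetical; the 
sketch only illustrates how the remaining volume at each price level follows 
from a running total of the volume added and removed at that level.

```{r}
# hypothetical volume changes: +5 and +3 at 236, +2 at 237, then -8 at 236.
toy.deltas <- data.frame(
  timestamp=as.POSIXct("2015-05-01 00:00:00", tz="UTC") + c(0, 1, 2, 3),
  price=c(236, 237, 236, 236),
  delta=c(5, 2, 3, -8))

# running total per price level: one row per change, as in the depth table.
toy.deltas$volume <- ave(toy.deltas$delta, toy.deltas$price, FUN=cumsum)
toy.depth <- toy.deltas[, c("timestamp", "price", "volume")]
```
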
### Depth summary

The depth.summary data.frame contains various summary statistics describing the 
state of the order book after every limit order event. The metrics are intended 
to quantify the shape of the order book through time.

--------------------------------------------------------------------------------
            Attribute Meaning
--------------------- ----------------------------------------------------------
        __timestamp__ Local timestamp corresponding to events.

   __best.bid.price__ Best bid price.

     __best.bid.vol__ Amount of volume available at the best bid.

 __bid.vol25:500bps__ The amount of volume available for 20 25bps percentiles 
                      below the best bid.

   __best.ask.price__ The best ask price.

     __best.ask.vol__ Amount of volume available at the best ask.

 __ask.vol25:500bps__ The amount of volume available for 20 25bps percentiles 
                      above the best ask.
--------------------------------------------------------------------------------
Table: Order book summary metrics.

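The 25bps volume columns can be approximated on toy data as shown below. The 
bin boundaries (closed on the right, with the best ask itself counted in the 
first bin) and the _toy.*_ names are assumptions; the package may assign 
boundary prices differently.

```{r}
# hypothetical ask side of the book, with the best ask at 236.
toy.best.ask <- 236
toy.ask.levels <- data.frame(price=c(236.00, 236.30, 236.80, 240.00),
    volume=c(1, 2, 3, 4))

# distance of each price level above the best ask, in basis points.
toy.bps <- 10000*(toy.ask.levels$price - toy.best.ask)/toy.best.ask

# assign each level to one of 20 bins: 0-25bps, 25-50bps, ..., 475-500bps.
toy.bins <- cut(toy.bps, breaks=seq(0, 500, by=25), include.lowest=TRUE, 
    labels=paste0("ask.vol", seq(25, 500, by=25), "bps"))

# total volume per bin; empty bins sum to 0.
toy.ask.vol <- sapply(split(toy.ask.levels$volume, toy.bins), sum)
```
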
# Visualisation

The package provides a number of functions for the visualisation of limit order 
events and order book liquidity. The visualisations all make use of the ggplot2 
plotting system.

## Order book shape

The purpose of the cumulative volume graph is to quickly identify the shape of 
the limit order book for the given point in time. The "shape" is defined as the 
cumulative volume available at each price level, starting at the best bid/ask. 

Using this shape, it is possible to visually summarise order book imbalance and
market depth. 

```{r} 
# get a limit order book for a specific point in time, limited to +- 150bps
# above/below best bid/ask price.
lob <- orderBook(lob.data$events, 
    tp=as.POSIXct("2015-05-01 04:38:17.429", tz="UTC"), bps.range=150)

# visualise the order book liquidity.
plotCurrentDepth(lob, volume.scale=10^-8)
```

In the figure above, an order book has been reconstructed with the _orderBook_
function for a specific point in time. The visualisation produced with the
_plotCurrentDepth_ function depicts a number of order book features. Firstly,
the embedded bar chart at the bottom of the plot shows the amount of volume 
available at specific price levels ranging from the _bid_ side on the left 
(blue) through to the _ask_ side (red) on the right. Secondly, the blue and red
lines show the _cumulative_ volume of the bar chart for the bid and ask sides of
the order book respectively. Finally, the two subtle vertical lines at price 
points \$234 and \$238 show the position of the top 1% largest limit orders.

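The "shape" described above is simply a cumulative sum of volume moving away 
from the best bid/ask. The following sketch uses hypothetical price levels; the
imbalance ratio at the end is one common convention and is an assumption, not a
quantity reported by _plotCurrentDepth_.

```{r}
# hypothetical price levels on each side of the book.
toy.bids <- data.frame(price=c(235.0, 236.0, 235.5), volume=c(3.0, 1.5, 0.5))
toy.asks <- data.frame(price=c(237.0, 236.5, 238.0), volume=c(0.3, 0.2, 1.0))

# order each side starting at the best price, then accumulate volume.
toy.bids <- toy.bids[order(-toy.bids$price), ]
toy.asks <- toy.asks[order(toy.asks$price), ]
toy.bids$liquidity <- cumsum(toy.bids$volume)
toy.asks$liquidity <- cumsum(toy.asks$volume)

# assumed imbalance measure in [-1, 1]: positive when bids outweigh asks.
toy.imbalance <- (sum(toy.bids$volume) - sum(toy.asks$volume)) /
    (sum(toy.bids$volume) + sum(toy.asks$volume))
```

Here the bid side holds more volume than the ask side, so the assumed imbalance
measure is positive.
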

## Price level volume

The available volume at each price level is colour coded according to the range 
of volume at all price levels. The colour coding follows the visible spectrum, 
such that larger amounts of volume appear "hotter" than smaller amounts, where 
cold = blue, hot = red.

Since the distribution of limit order size decays exponentially, it can be 
difficult to differentiate volume visually: most values will appear to be blue.
The _plotPriceLevels_ function accepts price and volume ranges and a colour 
bias (_col.bias_) to overcome this.

Setting _col.bias_ to 0 will colour code volume on the logarithmic scale, while
setting _col.bias_ < 1 will "squash" the spectrum. For example, a uniform 
_col.bias_ of 1 will result in 1/3 blue, 1/3 green, and 1/3 red applied across
all volume - most values will be blue. Setting the _col.bias_ to 0.5 will 
result in 1/7 blue, 2/7 green, 4/7 red being applied such that there is greater
differentiation amongst volume at smaller scales.

```{r}
# plot all lob.data price level volume between $233 and $245 and overlay the 
# market midprice.
spread <- getSpread(lob.data$depth.summary)
plotPriceLevels(lob.data$depth, spread, price.from=233, price.to=245, 
    volume.scale=10^-8, col.bias=0.25, show.mp=T)
```

The above plot shows all price levels between \$233 and \$245 for ~5 hours of 
price level data with the market midprice indicated in white. The volume has 
been scaled down from Satoshi to Bitcoin for legibility (1 Bitcoin = 10^8 
Satoshi). Note the large sell (ask) orders at \$238 and \$239.

Zooming into the same price level data between 1am and 2am, and providing
_trades_ data to the plot, will show trade executions. The plot below, which
has been centred around the bid/ask spread, shows the points at which market
_sell_ (circular red) and _buy_ (circular green) orders have been executed with
respect to the order book price levels.

```{r}
# plot 1 hour of trades centred around the bid/ask spread. 
plotPriceLevels(lob.data$depth, trades=lob.data$trades, 
    price.from=236, price.to=237.75, volume.scale=10^-8, col.bias=0.2,
    start.time=as.POSIXct("2015-05-01 01:00:00.000", tz="UTC"),
    end.time=as.POSIXct("2015-05-01 02:00:00.000", tz="UTC"))
```

Zooming in further to a 30 minute window, it is possible to display the bid/ask
spread clearly. In the plot below, _show.mp_ has been set to FALSE. This has 
the effect of displaying the actual spread (bid = green, ask = red) instead of 
the (bid+ask)/2 midprice.

```{r}
# zoom in to 30 minutes of bid/ask quotes.
plotPriceLevels(lob.data$depth, spread, price.from=235.25, price.to=237,
    start.time=as.POSIXct("2015-05-01 00:45:00.000", tz="UTC"), 
    end.time=as.POSIXct("2015-05-01 01:15:00.000", tz="UTC"), 
    volume.scale=10^-8, col.bias=0.5, show.mp=F)
```

Zooming in, still further, to ~4 minutes of data focussed around the spread, 
shows (in this example) the bid rising and then dropping, while, by comparison,
the ask price remains static.

This is a common pattern: 2 or more algorithms are competing to be ahead of 
each other. They both wish to be at the best bid+1, resulting in a cyclic game
of leapfrog until one of the competing algorithms withdraws, in which case the
remaining algorithm "snaps" back to the next best bid (+1). This behaviour 
results in a _sawtooth_ pattern which has been characterised by some as market 
manipulation. It is simply an emergent result of event-driven limit order 
markets.

```{r}
# zoom in to 4 minutes of bid/ask quotes.
plotPriceLevels(lob.data$depth, spread, price.from=235.90, price.to=236.25,
    start.time=as.POSIXct("2015-05-01 00:55:00.000", tz="UTC"), 
    end.time=as.POSIXct("2015-05-01 00:59:00.000", tz="UTC"), 
    volume.scale=10^-8, col.bias=0.5, show.mp=F)
```
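
The leapfrog mechanism can be mimicked with a deliberately simplified toy 
model. Everything here is an assumption made for illustration (the tick size, 
the price at which one algorithm gives up, and the _toy.*_ names); it is not 
derived from the example data.

```{r}
# each step, one of two competing algorithms outbids the other by one tick.
# once the bid is toy.max.ticks above the next best bid, one algorithm 
# withdraws and the other snaps back to the next best bid + 1 tick.
toy.tick <- 0.01
toy.next.best <- 236.00
toy.max.ticks <- 5L
toy.ticks <- 0L
toy.path <- integer(0)
for(step in 1:12) {
  toy.ticks <- if(toy.ticks < toy.max.ticks) toy.ticks + 1L else 1L
  toy.path <- c(toy.path, toy.ticks)
}

# best bid through time: rises tick by tick, then drops (a sawtooth).
toy.best.bid <- toy.next.best + toy.path*toy.tick
```

Plotting _toy.best.bid_ against the step number gives a rising ramp followed by
a sudden drop, repeated for as long as the two algorithms keep competing.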

By filtering the price or volume range it is possible to observe the behavior 
of individual market participants. This is perhaps one of the most useful and
interesting features of this tool. 

In the plot below, the displayed price level volume has been restricted to 
between 8.59 and 8.72 bitcoin, which clearly reveals the activity of an 
individual algorithm. Here, the algorithm is most likely seeking _value_ by 
placing limit orders below the bid price and waiting for a market impact, in 
the hope of reversion by means of market resilience (the rate at which market 
makers fill a void after a market impact).

```{r}
plotPriceLevels(lob.data$depth, spread, price.from=232.5, price.to=237.5,
    volume.scale=10^-8, col.bias=1, show.mp=T,
    end.time=as.POSIXct("2015-05-01 01:30:00.000", tz="UTC"),
    volume.from=8.59, volume.to=8.72)
```
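
The effect of the volume filter can be sketched on a toy depth table. It is 
assumed here that the _volume.from_ and _volume.to_ bounds are inclusive and 
apply to the scaled volume (bitcoin rather than Satoshi); the _toy.*_ objects 
are hypothetical and the internals of _plotPriceLevels_ may differ.

```{r}
# hypothetical price levels with volume in Satoshi.
toy.levels <- data.frame(
  price=c(235.10, 235.40, 236.00, 236.20),
  volume=c(860000000, 120000000, 870000000, 4000000000))

# scale to bitcoin, then keep only levels within the requested volume range.
toy.scale <- 10^-8
toy.scaled <- toy.levels$volume*toy.scale
toy.filtered <- toy.levels[toy.scaled >= 8.59 & toy.scaled <= 8.72, ]
```

Only the two levels holding roughly 8.6 and 8.7 bitcoin survive the filter, 
which is what isolates a participant who repeatedly uses similar order sizes.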

Using the same volume filtering approach, the next plot shows the behaviour of
an individual _market maker_ quoting on both sides of the book, with the 
displayed volume limited to between 3.63 and 3.83 bitcoin. The rising bid 
prices at times are due to the event driven "leapfrog" phenomenon discussed 
previously.

```{r}
plotPriceLevels(lob.data$depth, price.from=235.65, price.to=237.65,
    volume.scale=10^-8, col.bias=1,
    start.time=as.POSIXct("2015-05-01 01:00:00.000", tz="UTC"),
    end.time=as.POSIXct("2015-05-01 03:00:00.000", tz="UTC"),
    volume.from=3.63, volume.to=3.83)
```

## Liquidity

The _plotVolumePercentiles_ function plots the available volume in 25bps 
increments on each side of the order book in the form of a stacked area graph. 

The resulting graph is intended to display the market "quality" on either side 
of the limit order book: the amount of volume available at increasing depths in 
the order book, which would affect the VWAP (Volume Weighted Average Price) of 
a market order. 

The example below favours buyers, since there is more volume available within 25 
BPS of the best ask price in comparison to the thinner market within 25 BPS of 
the best bid.

The top of the graph depicts the ask side of the book, whilst the bottom depicts 
the bid side. Percentiles and order book sides can be separated by an optional 
subtle line (_perc.line_) for improved legibility.

```{r}
plotVolumePercentiles(lob.data$depth.summary, volume.scale=10^-8, perc.line=F, 
    start.time=as.POSIXct("2015-05-01 01:00:00.000", tz="UTC"),
    end.time=as.POSIXct("2015-05-01 04:00:00.000", tz="UTC"))
``` 

Zooming in to a 5 minute window with the percentile borders enabled 
(_perc.line_ is no longer set to FALSE), the amount of volume available at each 
25 bps price level becomes easier to distinguish. 

```{r}
# visualise 5 minutes of order book liquidity.
# data will be aggregated to second-by-second resolution.
plotVolumePercentiles(lob.data$depth.summary,
    start.time=as.POSIXct("2015-05-01 04:30:00.000", tz="UTC"),
    end.time=as.POSIXct("2015-05-01 04:35:00.000", tz="UTC"),
    volume.scale=10^-8)
```
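
To see why available depth matters for the VWAP of a market order, the sketch
below walks a hypothetical ask side of the book with a buy order of a given 
size. The helper _toy.walk.book_ and the price levels are assumptions for 
illustration and are not part of the package.

```{r}
# fill qty against price levels ordered from the best price outwards and
# return the amount taken at each level and the resulting VWAP.
toy.walk.book <- function(prices, volumes, qty) {
  # volume already consumed before reaching each level.
  before <- c(0, head(cumsum(volumes), -1))
  filled <- pmin(volumes, pmax(0, qty - before))
  list(filled=filled, vwap=sum(prices*filled)/sum(filled))
}

# a buy order for 1 unit against 0.2 @ 236.5, 0.3 @ 237 and 1.0 @ 238.
toy.fill <- toy.walk.book(prices=c(236.5, 237.0, 238.0), 
    volumes=c(0.2, 0.3, 1.0), qty=1)
```

The order exhausts the first two levels and takes half of its volume from the
third, so its VWAP (237.4) is well above the best ask of 236.5. A thicker book 
near the best ask would have kept the VWAP closer to it.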

## Order cancellations

Visualising limit order cancellations can provide insights into order placement
processes. The _plotVolumeMap()_ function generates a visualisation of limit
order cancellation events (excluding market and market limit orders).

```{r}
plotVolumeMap(lob.data$events, volume.scale=10^-8, log.scale = T)
```

Interestingly, the order cancellation visualisation shows the systematic
activity of individual market participants. By filtering the display of
cancellation events within a volume range, it is possible to isolate what are
most likely individual order placement strategies. The following graph shows an
individual strategy cancelling orders within the [3.5, 4] volume range.

```{r}
plotVolumeMap(lob.data$events, volume.scale=10^-8, volume.from=3.5, volume.to=4)
```

Restricting the volume between [8.59, 8.72] shows a strategy placing orders at a
fixed price at a fixed distance below the market price.

```{r}
plotVolumeMap(lob.data$events, volume.scale=10^-8, volume.from=8.59, 
    volume.to=8.72)
```

# Analysis

In addition to the visualisation functionality of the package, _obAnalytics_ can
also be used to study market event, trade and order book data. 

## Order book reconstruction

After loading and processing data, it is possible to reconstruct the limit order
book for any given point in time. The next example shows the order book at a
specific time (millisecond precision) limited to 10 price levels. The
_liquidity_ column shows the cumulative sum of volume from the best bid/ask up
until each price level row.

```{r}
tp <- as.POSIXct("2015-05-01 04:25:15.342", tz="UTC")
ob <- orderBook(lob.data$events, tp=tp, max.levels=10)
print(ob)
```

```{r example order book, echo=F, results="markup"}
with(ob, {
  asks$liquidity <- asks$liquidity*10^-8
  bids$liquidity <- bids$liquidity*10^-8
  cols <- c("id", "timestamp", "liquidity", "price")
  knitr::kable(cbind(bids[, cols], asks[order(asks$liquidity), rev(cols)]), 
      row.names=F, align=c("r","r","r","r","l","l","l","l"), digits=2)
})
```
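
Conceptually, reconstructing a book at a point in time means replaying events 
up to that time and keeping the latest state of each order that is still 
active. The following is a minimal sketch of that idea on hypothetical events; 
_toy.book.at_ is not the package's _orderBook_ function and ignores details 
such as level limits and price ranges.

```{r}
toy.book.events <- data.frame(
  id=c(1, 2, 1, 3, 2),
  timestamp=as.POSIXct("2015-05-01 04:25:15", tz="UTC") + 
      c(0, 0.1, 0.2, 0.3, 0.4),
  action=c("created", "created", "changed", "created", "deleted"),
  direction=c("bid", "ask", "bid", "bid", "ask"),
  price=c(236.0, 236.5, 236.0, 235.5, 236.5),
  volume=c(3, 1, 2, 4, 0))

toy.book.at <- function(events, tp) {
  # keep events up to tp, then the most recent event for each order id.
  e <- events[events$timestamp <= tp, ]
  e <- e[order(e$timestamp), ]
  e <- e[!duplicated(e$id, fromLast=TRUE), ]
  # orders that were deleted or have no remaining volume are not in the book.
  e <- e[e$action != "deleted" & e$volume > 0, ]
  bids <- e[e$direction == "bid", c("id", "price", "volume")]
  asks <- e[e$direction == "ask", c("id", "price", "volume")]
  # best prices first: highest bid, lowest ask.
  list(bids=bids[order(-bids$price), ], asks=asks[order(asks$price), ])
}

toy.book <- toy.book.at(toy.book.events, 
    as.POSIXct("2015-05-01 04:25:15.35", tz="UTC"))
```

At the chosen time, order 1 appears with its reduced volume of 2, order 3 rests
behind it, and the ask (order 2) is still present because its deletion occurs 
later.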

## Market impacts

Using the _trades_ and _events_ data, it is possible to study market impact
events.

### All market impacts

The _tradeImpacts()_ function groups individual trade events into market impact
events. A market impact occurs when an order consumes 1 or more resting orders
from the limit order book.

The following example shows the top 10 most aggressive sell impacts, ranked by
the _depth_ removed from the order book in BPS. The VWAP column indicates the 
volume weighted average price that the market _taker_ received for their market 
order. In addition, the _hits_ column shows the number of resting limit orders 
_hit_ in order to fulfil this market order.

```{r}
impacts <- tradeImpacts(lob.data$trades)
impacts <- impacts[impacts$dir == "sell", ]
bps <- 10000 * (impacts$max.price - impacts$min.price) / impacts$max.price
types <- with(lob.data, events[match(impacts$id, events$id), ]$type)
impacts <- cbind(impacts, type=types, bps)
head(impacts[order(-impacts$bps), ], 10)
```

```{r impacts example, echo=F, results="markup"}
impacts <- tradeImpacts(lob.data$trades)
impacts <- impacts[impacts$dir == "sell", ]
impacts <- impacts[, c("id", "max.price", "min.price", "vwap", "hits", "vol", 
    "end.time")]
impacts$vol <- impacts$vol*10^-8
bps <- 10000 * (impacts$max.price - impacts$min.price) / impacts$max.price
types <- with(lob.data, events[match(impacts$id, events$id), ]$type)
impacts <- cbind(impacts, type=types, bps)
knitr::kable(head(impacts[order(-impacts$bps), ], 10), row.names=F, digits=2)
```
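
The quantities in the table can be reproduced by hand on toy data by grouping 
trades by their taker order id. The _toy.trades_ data is hypothetical and the 
column names merely mirror those used above; this is a sketch of the grouping 
idea, not the internals of _tradeImpacts()_.

```{r}
# hypothetical trades: taker 101 hits three resting orders, taker 102 hits one.
toy.trades <- data.frame(
  taker=c(101, 101, 101, 102),
  price=c(236.00, 235.90, 235.50, 236.10),
  volume=c(2, 1, 1, 5))

# summarise each taker's trades into a single impact row.
toy.impacts <- do.call(rbind, lapply(split(toy.trades, toy.trades$taker), 
    function(x) data.frame(
      id=x$taker[1],
      max.price=max(x$price),
      min.price=min(x$price),
      vwap=sum(x$price*x$volume)/sum(x$volume),
      hits=nrow(x),
      vol=sum(x$volume))))

# depth removed from the book by each impact, in basis points.
toy.impacts$bps <- with(toy.impacts, 10000*(max.price - min.price)/max.price)
```

An impact that hits a single resting order, such as taker 102 here, removes no
depth by this measure since its maximum and minimum prices coincide.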

### Individual impact

The main purpose of the package is to load limit order, quote and trade data for
arbitrary analysis. It is possible, for example, to examine an individual market
impact event. The following code shows the sequence of events as a single market
(sell) order is filled.

The _maker.agg_ column shows how far each limit order was above or below the
best bid when it was placed. The _age_ column shows how long the order had been
resting in the order book before it was hit by this market order.


```{r}
impact <- with(lob.data, trades[trades$taker == 65596324, 
    c("timestamp", "price", "volume", "maker")])
makers <- with(lob.data, events[match(impact$maker, events$id), ])
makers <- makers[makers$action == "created", 
    c("id", "timestamp", "aggressiveness.bps")]
impact <- cbind(impact, maker=makers[match(impact$maker, makers$id), 
    c("timestamp", "aggressiveness.bps")])
age <- impact$timestamp - impact$maker.timestamp
impact <-  cbind(impact[!is.na(age), c("timestamp", "price", "volume", 
    "maker.aggressiveness.bps")], age[!is.na(age)])
colnames(impact) <- c("timestamp", "price", "volume", "maker.agg", "age")
impact$volume <- impact$volume*10^-8
print(impact)
```

```{r impact example, echo=F, results="markup"}
knitr::kable(impact, row.names=F, digits=2)
```

