---
title: "OpenCPU"
author: "Andreas Blaette (andreas.blaette@uni-due.de)"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
editor_options: 
  chunk_output_type: console
vignette: >
  %\VignetteIndexEntry{OpenCPU}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

## Objective

Sometimes, it is practically or legally not possible to move corpus data to a local machine. This vignette explains the usage of CWB corpora that are hosted on an [OpenCPU](https://www.opencpu.org/) server.

```{r, eval = TRUE}
library(polmineR)
```


## Remote Corpora

### Publicly Available Corpora

The GermaParl corpus is hosted on an OpenCPU server with the IP 132.252.238.66 (subject to change). To use the corpus, use the `corpus()`-method. The only difference is that you will need to supply the IP address using the argument `server`. 

```{r, eval = FALSE}
gparl <- corpus("GERMAPARL", server = "http://opencpu.politik.uni-due.de")
```

The `gparl` object is an object of class `remote_corpus`.

```{r, eval = FALSE}
is(gparl)
```


### Using polmineR core functionality

The polmineR at this stage exposes a limited set of its functionality for remote corpora. Simple investigations in the remote corpus are possible.


#### Get corpus size

```{r, eval = FALSE}
size(gparl)
```


#### Get structural annotation (metadata)

```{r, eval = FALSE}
s_attributes(gparl)
```


#### Subsetting

```{r, eval = FALSE}
gparl2006 <- subset(gparl, year == "2006")
```

The returned object has the class `remote_subcorpus`.

```{r, eval = FALSE}
is(gparl2006)
```


#### Simple count

```{r, eval = FALSE}
count(gparl, query = "Integration")
```

The `count()`-method works for `remote_subcorpus` objects, too.

```{r, eval = FALSE}
count(gparl2006, query = "Integration")
```


#### KWIC

```{r, render = knit_print, message = FALSE, eval = FALSE}
kwic(gparl, query = "Islam", left = 15, right = 15, meta = c("speaker", "party", "date"))
```

Works for the `remote_subcorpus`, too.

```{r, render = knit_print, message = FALSE, eval = FALSE}
kwic(gparl2006, query = "Islam", left = 15, right = 15, meta = c("speaker", "party", "date"))
```


### Restricted Corpora

1. Create directory for registry file-style files with credentials

2. Create file with credentials for your corpus in this directory

Note: Filename is corpus id in lowercase

```{r, eval = FALSE}
##
## registry entry for corpus GERMAPARLSAMPLE
##

# long descriptive name for the corpus
NAME "GermaParlSample"
# corpus ID (must be lowercase in registry!)
ID   germaparlsample
# path to binary data files
HOME http://localhost:8005
# optional info file (displayed by ",info;" command in CQP)
INFO https://zenodo.org/record/3823245#.XsrU-8ZCT_Q 

# corpus properties provide additional information about the corpus:
##:: user = "YOUR_USER_NAME"
##:: password = "YOUR_PASSWORD"
```

3. Set environment variable "OPENCPU_REGISTRY" in .Renviron to dir just mentioned

4. Get server whereabouts

```{r, eval = FALSE}
x <- corpus("MIGPRESS_FAZ", server = "YOURSERVER", restricted = TRUE)
```

## Next steps

Upcoming versions of polmineR will expose further functionality. This is a simple proof-of-concept!