---
title: "AI Assistant Module Guide"
author: "Jaewoong Heo"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{AI Assistant Module Guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = TRUE,
  eval = FALSE
)
```

# AI Assistant Module

AI-powered statistical analysis code generation module for jsmodule package.

## Overview

The AI Assistant module provides an interactive chat interface that generates R code for statistical analysis. It integrates seamlessly with jsmodule's gadgets and supports multiple AI providers (Anthropic Claude, OpenAI GPT, Google Gemini).

## Quick Start

### 1. API Key Setup

Add your API key to `.Renviron` file:

```{r api-setup}
# Open .Renviron file
usethis::edit_r_environ()

# Add one of the following lines:
# ANTHROPIC_API_KEY=your_key_here
# OPENAI_API_KEY=your_key_here
# GOOGLE_API_KEY=your_key_here

# Save and restart R session
```

### 2. Basic Usage

#### Option A: Use with jsBasicGadget

```{r basic-gadget}
library(jsmodule)

# Launch gadget with AI Assistant included
jsBasicGadget()

# Navigate to "AI Assistant" tab
```

#### Option B: Standalone Shiny App

```{r standalone-app}
library(shiny)
library(jsmodule)
library(survival)

ui <- fluidPage(
  titlePanel("AI Statistical Assistant"),
  aiAssistantUI("ai")
)

server <- function(input, output, session) {
  data <- reactive(colon)
  data.label <- reactive(jstable::mk.lev(colon))

  callModule(aiAssistant, "ai",
    data = data,
    data_label = data.label
  )
}

shinyApp(ui, server)
```

## Features

### Code Generation
- Statistical analysis code (regression, survival analysis, descriptive statistics)
- Visualization code (ggplot2, jskm, forestplot)
- Table generation (jstable, DT)
- Follows jsmodule conventions and best practices

### Multiple AI Providers
- **Anthropic Claude** (default): claude-3-7-sonnet, claude-3-5-sonnet, claude-3-opus
- **OpenAI GPT**: gpt-4o, gpt-4-turbo, gpt-3.5-turbo
- **Google Gemini**: gemini-2.0-flash-exp, gemini-1.5-pro, gemini-1.5-flash

### Export Options
- **Word (.docx)**: Tables with formatted layout
- **PowerPoint (.pptx)**: Plots as editable vector graphics
- **Excel (.xlsx)**: Tables with data preservation
- **R Script (.R)**: Complete reproducible code

### Safety Features
- Sandboxed code execution (only allowed packages)
- Pre-execution code review and editing
- Error handling with AI-assisted fixes
- No file system or network access

## Important Notes

### Data Access
- The AI can only access data provided through the `data` parameter
- Data is referred to as `out` in generated code
- File upload data is automatically reactive

### Allowed Packages
Generated code can only use these packages:
```
jstable, jskm, jsmodule, survival, ggplot2, ggpubr,
pROC, data.table, DT, gridExtra, GGally, forestploter,
MatchIt, timeROC
```

### Variable Structure
The module automatically generates variable structure information:
- Factor variables
- Numeric variables
- Custom structures (if provided via `data_varStruct` parameter)

### API Key Resolution Order
1. Explicit `api_key` argument in `callModule()`
2. UI input (if `show_api_config = TRUE`)
3. Environment variables (`.Renviron` file)

### API Configuration Modes

The `show_api_config` parameter controls how API keys are managed:

#### `show_api_config = TRUE` (Default)
- **Use Case**: Development, personal use, or when users provide their own API keys
- **Behavior**:
  - Shows Settings panel in the UI
  - Users can select AI provider and model
  - Users can enter API key directly in the interface
  - API key input takes precedence over `.Renviron` file
- **Security Note**: API keys entered in UI are only stored in browser memory and never saved to disk
- **Recommendation**: Suitable for local development and single-user applications

```{r dev-mode}
# Development mode - users can configure in UI
aiAssistantUI("ai", show_api_config = TRUE)  # Default

callModule(aiAssistant, "ai",
  data = data,
  data_label = data.label,
  show_api_config = TRUE
)
```

#### `show_api_config = FALSE`
- **Use Case**: Production deployment, shared applications, or pre-configured environments
- **Behavior**:
  - Hides Settings panel completely
  - Only uses `.Renviron` file or explicit `api_key` argument
  - No UI elements for API configuration
- **Security Note**: Prevents users from seeing or modifying API keys
- **Recommendation**: Mandatory for production deployments with shared API keys

```{r prod-mode}
# Production mode - API key from .Renviron only
aiAssistantUI("ai", show_api_config = FALSE)

callModule(aiAssistant, "ai",
  data = data,
  data_label = data.label,
  show_api_config = FALSE
)
```

## Advanced Usage

### Custom Variable Structure

```{r custom-varStruct}
server <- function(input, output, session) {
  data <- reactive(lung)
  data.label <- reactive(jstable::mk.lev(lung))

  # Define custom variable roles
  var_struct <- reactive({
    list(
      variable = names(lung),
      Base = c("age", "sex", "ph.ecog"),
      Event = "status",
      Time = "time"
    )
  })

  callModule(aiAssistant, "ai",
    data = data,
    data_label = data.label,
    data_varStruct = var_struct
  )
}
```

### Analysis Context

Provide background information to improve AI responses:

```{r analysis-context}
callModule(aiAssistant, "ai",
  data = data,
  data_label = data.label,
  analysis_context = reactive({
    "NCCTG lung cancer trial data.
     Primary outcome: time to death (status/time).
     Focus on performance status (ph.ecog) as predictor."
  })
)
```

### Production Deployment

Hide API configuration UI for production:

```{r production-deploy}
ui <- fluidPage(
  aiAssistantUI("ai", show_api_config = FALSE)
)

server <- function(input, output, session) {
  callModule(aiAssistant, "ai",
    data = data,
    data_label = data.label,
    show_api_config = FALSE  # Use only .Renviron
  )
}
```

## Troubleshooting

### API Key Not Found
**Problem**: "API key not configured" error

**Solution**:

1. Check `.Renviron` file has correct variable name
2. Restart R session after editing `.Renviron`
3. Verify key is valid (test in terminal: `Sys.getenv("ANTHROPIC_API_KEY")`)

### Code Execution Errors
**Problem**: Generated code fails to execute

**Solution**:

1. Click "Ask AI to Fix" button for automatic correction
2. Review code in editor before execution
3. Check data has required variables
4. Verify packages are installed

### Summary Results Too Fragmented
**Problem**: `summary()` results split into many pieces

**Solution**: This is now fixed in the latest version. Update jsmodule package.

### Text Output Shows Escape Sequences
**Problem**: `\n` visible instead of line breaks

**Solution**: This is now fixed in the latest version. Update jsmodule package.

## Best Practices

### 1. Be Specific in Questions
❌ Bad: "analyze this data"

✅ Good: "perform linear regression with wt.loss as outcome and age, sex, ph.ecog as predictors"

### 2. Review Generated Code
Always review code in the editor before clicking "Run Code"

### 3. Provide Context
Use `analysis_context` parameter to give AI background about your data

### 4. Use Appropriate Model
- Use faster models (Sonnet, GPT-4o) for simple tasks
- Use advanced models (Opus, GPT-4) for complex analyses

### 5. Iterative Refinement
Ask follow-up questions to refine code rather than starting over

## Limitations

1. **No External Data Access**: Cannot read files or connect to databases
2. **Limited Package Scope**: Only allowed packages can be used
3. **Context Window**: Very long conversations may need to be cleared
4. **Visualization Preview**: Some complex plots may not render immediately
5. **Statistical Expertise**: AI provides code, not statistical consulting

## Examples

### Example 1: Descriptive Statistics
```
Q: "Create a Table 1 comparing baseline characteristics by treatment group (rx)"
```

### Example 2: Survival Analysis
```
Q: "Perform Cox regression with time and status as survival outcome,
    adjusting for age, sex, and ph.ecog"
```

### Example 3: Visualization
```
Q: "Create a Kaplan-Meier plot stratified by treatment group with risk table"
```

### Example 4: Model Diagnostics
```
Q: "Check VIF for multicollinearity in the linear model with wt.loss ~ age + sex + ph.ecog"
```

## Security Considerations

### Code Execution Security

#### Environment-Aware Execution (Development vs Production)

The AI Assistant module implements **environment-aware code execution** to balance security and usability:

**Development Mode** (Default):
- Uses standard `eval()` for code execution
- Easier debugging and development
- All console output visible
- Suitable for local, trusted environments

**Production Mode**:
- Uses `RAppArmor::eval.secure()` for sandboxed execution (Linux only)
- Enhanced security with resource limits:
  - 1GB RAM limit
  - 1MB file size limit
  - 10 second timeout
  - No new process creation
- Prevents system command execution
- Required for public deployments

**Environment Detection**:
The module automatically detects production environments using:

1. `DEPLOYMENT_ENV` environment variable (`production` or `development`)
2. shinyapps.io deployment detection
3. RStudio Connect detection
4. `.production` marker file

**Setting Deployment Mode**:

For local development (default):
```{r dev-env}
# No setup needed - defaults to development mode
# Or explicitly set in .Renviron:
# DEPLOYMENT_ENV=development
```

For production deployment:
```{r prod-env}
# Add to .Renviron file:
# DEPLOYMENT_ENV=production
```

Or create a marker file:
```bash
# In your app directory
touch .production
```

**Linux Server Setup** (for RAppArmor):
```bash
# Install AppArmor
sudo apt-get install apparmor apparmor-utils libapparmor-dev

# Install R package
R -e "install.packages('RAppArmor')"
```

**Platform Support**:

- ✅ **Linux**: Full RAppArmor sandboxing available
- ⚠️ **macOS/Windows**: Falls back to standard eval with warning in production mode
- Recommendation: Deploy on Linux servers for maximum security

#### Basic Security Features
- **Package Whitelist**: Only approved packages allowed
- **Pre-execution Review**: Code can be edited before execution
- **Error Handling**: Safe error messages without system information

### API Key Security

**⚠️ IMPORTANT: API Key Handling**

**How API Keys are Used**:

- API keys are read from environment variables (`.Renviron`) or UI input
- When entered in UI, keys exist only in the current R session memory
- API calls are made using the `httr` package to AI provider APIs

**This is Open Source**:

- All code is publicly available and auditable at https://github.com/jinseob2kim/jsmodule
- No hidden API key storage or transmission
- You can review the code yourself

✅ **What this module does NOT do with API keys**:

- Never saves them anywhere
- Never logs them
- Never transmits them except to the AI provider

✅ **What IS sent to AI providers**:

- Your prompts and questions
- Data structure information (variable names, types, sample statistics)
- Previous conversation history
- Generated code (for error fixing)

**NOT sent**:

- Raw data values (unless explicitly included in your question)
- File system information

#### Best Practices by Deployment Type

**For Personal/Desktop Use** (Recommended):
```{r personal-use}
# Store API key in .Renviron (user's home directory)
# This keeps the key private to your user account
# ANTHROPIC_API_KEY=your_key_here
```

**For Team/Shared Use**:

- Each team member should use their own API key in `.Renviron`
- Set `show_api_config = TRUE` to allow individual configuration
- Do NOT share API keys between users

**For Public Web Applications**:

- ⚠️ **NOT RECOMMENDED**: Do not deploy with `show_api_config = TRUE` publicly
- If you must deploy publicly, consider these alternatives:
  1. Implement server-side API proxy (requires custom backend)
  2. Use authentication to limit access
  3. Set strict usage quotas and monitoring

#### API Key Storage Locations

1. **`.Renviron` file** (Recommended for personal use):
   - Location: `~/.Renviron` (user home directory)
   - Security: Only accessible by your user account
   - Persistence: Survives R session restarts

2. **UI Input** (Development only):
   - Location: Browser memory (temporary)
   - Security: Lost when browser tab closes
   - Persistence: No - must re-enter each session

3. **`api_key` argument** (Advanced use):
   - Location: R script or code
   - Security: ⚠️ Avoid - keys visible in code
   - Persistence: Depends on where code is stored

#### Compliance Considerations

If you're working with sensitive data:

1. ✅ Data structure and variable names are sent to AI provider
2. ✅ Statistical summaries may be sent
3. ⚠️ Avoid including actual data values in questions
4. ⚠️ Review your organization's AI usage policy
5. ⚠️ Consider data anonymization before analysis

### Recommended Security Setup

**For Maximum Security**:
```{r max-security}
# 1. Store API key in .Renviron (never in code)
usethis::edit_r_environ()
# Add: ANTHROPIC_API_KEY=your_key

# 2. Use show_api_config = FALSE in production
aiAssistantUI("ai", show_api_config = FALSE)

# 3. Never commit .Renviron to version control
# Add to .gitignore:
# .Renviron
# .Renviron.local

# 4. Rotate API keys regularly (every 90 days recommended)

# 5. Monitor API usage through provider's dashboard
```

## Support

For issues or feature requests, please file an issue at:
https://github.com/jinseob2kim/jsmodule/issues

## License

Same as jsmodule package license.
