---
title: "Frequently Asked Questions"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Frequently Asked Questions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Installation Issues

### "Backend library is not loaded" error

**Problem**: You see the error "Backend library is not loaded. Please run install_localLLM() first."

**Solution**: Run the installation function after loading the package:

```{r}
library(localLLM)
install_localLLM()
```

This downloads the platform-specific backend library. You only need to do this once.

### Installation fails on my platform

**Problem**: `install_localLLM()` fails to download or install.

**Solution**: Check that your platform is supported:

- Windows (x86-64)
- macOS (Apple Silicon / ARM64)
- macOS (Intel / x86-64)
- Linux (x86-64)

If you're on an unsupported platform, you may need to compile llama.cpp manually.
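
To see which entry in the list above applies to your machine, a minimal base-R sketch follows. `describe_platform()` is a hypothetical helper, not part of localLLM, and the architecture strings it recognises are assumptions about what `Sys.info()` reports on common systems.

```r
# Hypothetical helper (not part of localLLM): classify the current machine
# against the supported-platform list.
describe_platform <- function(sysname = Sys.info()[["sysname"]],
                              machine = Sys.info()[["machine"]]) {
  # Map the OS name reported by R to the labels used in this FAQ
  os <- switch(sysname,
               Windows = "Windows", Darwin = "macOS", Linux = "Linux",
               sysname)
  # Assumed spellings of the architecture; other systems may report others
  arch <- if (machine %in% c("x86_64", "x86-64", "amd64", "AMD64")) {
    "x86-64"
  } else if (machine %in% c("arm64", "aarch64")) {
    "ARM64"
  } else {
    machine
  }
  supported <- paste(os, arch) %in% c("Windows x86-64", "macOS ARM64",
                                      "macOS x86-64", "Linux x86-64")
  list(os = os, arch = arch, supported = supported)
}

str(describe_platform())
```

If `supported` is `FALSE`, the prebuilt backend is unlikely to be available for your machine.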

### "Library already installed" but functions don't work

**Problem**: `install_localLLM()` says the library is installed, but generation fails.

**Solution**: Try reinstalling:

```{r}
# Force reinstall
install_localLLM(force_reinstall = TRUE)

# Verify installation
lib_is_installed()
```

---

## Model Download Issues

### "Download lock" or "Another download in progress" error

**Problem**: A previous download was interrupted and left a lock file.

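**Solution**: Clear the model cache directory. This removes the stale lock, but it also deletes every cached model, so they will need to be downloaded again: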

```{r}
cache_root <- tools::R_user_dir("localLLM", which = "cache")
models_dir <- file.path(cache_root, "models")
unlink(models_dir, recursive = TRUE, force = TRUE)
```

Then try downloading again.
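
Because the `unlink()` call above removes everything under the models directory, it can be worth looking at what is stored there first. The sketch below uses only base R; `summarise_dir()` is a hypothetical helper and makes no assumption about how localLLM names its lock files.

```r
# Hypothetical helper: list files under a directory with their sizes in MB.
summarise_dir <- function(dir) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  if (length(files) == 0) {
    # Directory is missing or empty
    return(data.frame(file = character(0), size_mb = numeric(0)))
  }
  data.frame(file = basename(files),
             size_mb = round(file.size(files) / 1e6, 1))
}

models_dir <- file.path(tools::R_user_dir("localLLM", which = "cache"), "models")
summarise_dir(models_dir)
```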

### Download times out or fails

**Problem**: Large model downloads fail partway through.

**Solution**:

1. Check your internet connection
2. Try a smaller model first
3. Download manually and load from a local path:

```{r}
# Download with browser or wget, then:
model <- model_load("/path/to/downloaded/model.gguf")
```
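
If you prefer to stay in R for the manual download, one option is base R's `download.file()` with a longer timeout, since its default `timeout` option is 60 seconds. This is a sketch, not how localLLM downloads models; `fetch_gguf()` is a hypothetical helper and the URL in the usage comment is a placeholder.

```r
# Hypothetical helper: download a file with a generous timeout using base R.
fetch_gguf <- function(url, dest, timeout_s = 3600) {
  old <- options(timeout = timeout_s)   # raise the timeout for this call only
  on.exit(options(old), add = TRUE)     # restore the previous setting
  # mode = "wb" keeps binary files intact on Windows
  status <- utils::download.file(url, destfile = dest, mode = "wb")
  if (status != 0) stop("download failed with status ", status)
  dest
}

# Usage (placeholder URL and path, replace with your own):
# path  <- fetch_gguf("https://example.com/model.gguf", "model.gguf")
# model <- model_load(path)
```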

### "Model not found" when using cached model

**Problem**: You're trying to load a model by name but it's not found.

**Solution**: Check what's actually cached:

```{r}
cached <- list_cached_models()
print(cached)
```

Use the exact filename or a unique substring that matches only one model.
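
The "unique substring" rule can be illustrated with plain string matching. This is a sketch of the idea, not localLLM's actual lookup code; `match_model()` and the filenames are made up for illustration.

```r
# Hypothetical helper: resolve a query to exactly one filename.
match_model <- function(query, filenames) {
  hits <- filenames[grepl(query, filenames, fixed = TRUE)]
  if (length(hits) == 0) {
    stop("no cached model matches '", query, "'")
  }
  if (length(hits) > 1) {
    stop("'", query, "' is ambiguous: ", paste(hits, collapse = ", "))
  }
  hits
}

cached_files <- c("example-3b-Q4_K_M.gguf", "example-3b-Q8_0.gguf")  # made-up names
match_model("Q4_K_M", cached_files)    # matches one file
# match_model("example-3b", cached_files) would stop: two files match
```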

### Private Hugging Face model fails

**Problem**: Downloading a gated/private model fails with an authentication error.

**Solution**: Set your Hugging Face token:

```{r}
# Get token from https://huggingface.co/settings/tokens
set_hf_token("hf_your_token_here")

# Now download should work
model <- model_load("https://huggingface.co/private/model.gguf")
```
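
To avoid hard-coding the token in scripts, you could read it from an environment variable and pass it on. The sketch below assumes you have stored the token in a variable named `HF_TOKEN` (for example in `~/.Renviron`); `read_hf_token()` is a hypothetical helper, and whether localLLM reads this variable on its own is not covered here.

```r
# Hypothetical helper: read a token from an environment variable, failing loudly
# if it is missing.
read_hf_token <- function(var = "HF_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("environment variable ", var, " is not set")
  }
  token
}

# Usage:
# set_hf_token(read_hf_token())
```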

---

## Memory Issues

### R crashes when loading a model

**Problem**: R crashes or freezes when calling `model_load()`.

**Solution**: The model is most likely too large for your available RAM. Try:

1. Use a smaller quantized model (Q4 instead of Q8)
2. Free up memory by closing other applications
3. Check how much RAM your machine has and compare it with the model's file size:

```{r}
hw <- hardware_profile()
cat("Total RAM:", round(hw$ram_total / 1e9, 1), "GB\n")
```
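
As a rough rule of thumb (an assumption, not a localLLM guarantee), a model needs at least its file size in RAM plus some headroom for the context and the runtime. The sketch below encodes that check; `fits_in_ram()` and the 20% headroom are illustrative choices.

```r
# Hypothetical helper: does a model of this size plausibly fit in RAM?
fits_in_ram <- function(model_bytes, ram_bytes, headroom = 1.2) {
  model_bytes * headroom <= ram_bytes
}

fits_in_ram(4.9e9, 16e9)   # ~5 GB model, 16 GB RAM
fits_in_ram(14e9, 16e9)    # ~14 GB model, 16 GB RAM

# With real values (assumes hw$ram_total is in bytes):
# fits_in_ram(file.size("/path/to/model.gguf"), hardware_profile()$ram_total)
```

Total RAM is an upper bound; other running applications reduce what is actually available.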

### "Memory check failed" warning

**Problem**: localLLM warns about insufficient memory.

**Solution**: The safety check detected potential issues. Options:

1. Use a smaller model
2. Reduce context size:
   ```{r}
   ctx <- context_create(model, n_ctx = 512)  # Smaller context
   ```
3. If you're sure you have enough memory, proceed when prompted

### Context creation fails with large n_ctx

**Problem**: Creating a context with large `n_ctx` fails.

**Solution**: Reduce the context size or use a smaller model:

```{r}
# Instead of n_ctx = 32768, try:
ctx <- context_create(model, n_ctx = 4096)
```
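
Context memory grows linearly with `n_ctx` because the key/value cache stores one key and one value vector per layer, per KV head, per token. The estimate below is a back-of-the-envelope sketch: the layer and head counts are illustrative values for an 8B-class model with grouped-query attention, and 2 bytes per value assumes a 16-bit cache, which may not match what the backend actually allocates.

```r
# Approximate KV-cache size in bytes for a transformer model.
kv_cache_bytes <- function(n_ctx, n_layers, n_kv_heads, head_dim,
                           bytes_per_value = 2) {
  # Factor 2: one key tensor and one value tensor per layer
  2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_ctx
}

# Illustrative model shape (assumed, check your model's metadata)
sizes <- sapply(c(512, 4096, 32768), kv_cache_bytes,
                n_layers = 32, n_kv_heads = 8, head_dim = 128)
round(sizes / 2^30, 2)  # GiB: 0.06, 0.50, 4.00
```

Under these assumptions, going from `n_ctx = 4096` to `n_ctx = 32768` adds roughly 3.5 GiB on top of the model weights.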

---

## GPU Issues

### GPU not being used

**Problem**: Generation is slow even with `n_gpu_layers = 999`.

**Solution**: Check if GPU is detected:

```{r}
hw <- hardware_profile()
print(hw$gpu)
```

If no GPU is listed, the backend may not support your GPU. Currently supported:

| Platform | GPU backend | Supported hardware |
|----------|-------------|-------------------|
| macOS (Apple Silicon) | Metal | All Apple Silicon (M1 and later) |
| macOS (Intel) | Metal | Intel Macs running macOS 12+ |
| Windows (x86-64) | Vulkan | NVIDIA GeForce 10xx+, AMD RX 400+, Intel Arc |
| Linux (x86-64) | Vulkan | NVIDIA GeForce 10xx+, AMD RX 400+, Intel Arc |

If your GPU is not listed, install with `force_cpu = TRUE` to use the CPU build:

```{r}
install_localLLM(force_cpu = TRUE)
```

### GPU runs out of memory

**Problem**: GPU runs out of memory during generation.

**Solution**: Reduce GPU layer count to split the model between GPU and CPU:

```{r}
# Offload fewer layers to GPU
model <- model_load("model.gguf", n_gpu_layers = 20)
```
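
To pick a starting value for `n_gpu_layers`, one crude heuristic is to assume every layer takes an equal share of the model file and to reserve some GPU memory for the context and scratch buffers. This is only a sketch under those assumptions; `layers_that_fit()` is hypothetical and real layer sizes vary.

```r
# Hypothetical helper: estimate how many layers fit in a GPU memory budget.
layers_that_fit <- function(model_bytes, n_layers, vram_bytes,
                            reserve_bytes = 1e9) {
  per_layer <- model_bytes / n_layers            # assumes equally sized layers
  usable <- max(vram_bytes - reserve_bytes, 0)   # keep room for KV cache etc.
  min(n_layers, floor(usable / per_layer))
}

# ~4.9 GB model with 32 layers on a 4 GB GPU
layers_that_fit(4.9e9, n_layers = 32, vram_bytes = 4e9)
```

Treat the result as a first guess and lower it if generation still runs out of memory.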

---

## Generation Issues

### Backend prints too many log messages

**Problem**: `model_load()` or `context_create()` print hardware information, model metadata, or other log lines that clutter the console or appear in knitted documents and `R CMD check` output.

**Solution**: Reduce the verbosity level:

```{r}
# Default (verbosity = 1): warnings only, e.g. hardware limits and context size notes
model <- model_load("model.gguf")

# Fully silent loading
model <- model_load("model.gguf",  verbosity = 0)
ctx   <- context_create(model,     verbosity = 0)
```

Verbosity levels: `0` = silent, `1` = warnings only (default for `model_load` and `context_create`), `2` = informational messages, `3` = full debug output. `generate()` and `generate_parallel()` already default to `verbosity = 0`.

Note: `backend_init()` always prints one line (`localLLM backend library loaded successfully.`) regardless of verbosity; this cannot be suppressed.


### Output is garbled or nonsensical

**Problem**: The model produces meaningless text.

**Solution**:

1. Ensure you're using a chat template:
   ```{r}
   messages <- list(
     list(role = "user", content = "Your question")
   )
   prompt <- apply_chat_template(model, messages)
   result <- generate(ctx, prompt)
   ```

2. The model file may be corrupted; download it again.
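
To see why the template matters, it helps to look at what a templated prompt contains. The sketch below builds a prompt in the ChatML layout by hand. It is for illustration only: `chatml_prompt()` is a hypothetical helper, many models use a different layout, and in practice `apply_chat_template()` should produce the format your model expects.

```r
# Hypothetical helper: render messages in the ChatML layout.
chatml_prompt <- function(messages) {
  turns <- vapply(messages, function(m) {
    paste0("<|im_start|>", m$role, "\n", m$content, "<|im_end|>\n")
  }, character(1))
  # End with an open assistant turn so the model knows it should answer
  paste0(paste(turns, collapse = ""), "<|im_start|>assistant\n")
}

messages <- list(list(role = "user", content = "Your question"))
cat(chatml_prompt(messages))
```

A chat-tuned model that receives only the bare question, without these role markers, often continues the text instead of answering it.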

### Output contains strange tokens like `<|eot_id|>`

**Problem**: Output includes control tokens.

**Solution**: Use the `clean = TRUE` parameter:

```{r}
result <- generate(ctx, prompt, clean = TRUE)
# or
result <- quick_llama("prompt", clean = TRUE)
```
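
If you already have output containing such tokens, they can also be removed after the fact. The sketch below is a simple pattern-based cleanup and is not necessarily what `clean = TRUE` does internally; `strip_control_tokens()` is a hypothetical helper that assumes tokens of the form `<|name|>`.

```r
# Hypothetical helper: drop anything that looks like <|token_name|>.
strip_control_tokens <- function(x) {
  trimws(gsub("<\\|[^|>]+\\|>", "", x))
}

strip_control_tokens("Paris is the capital of France.<|eot_id|>")
```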

### Generation stops too early

**Problem**: Output is cut off before completion.

**Solution**: Increase `max_tokens`:

```{r}
result <- quick_llama("prompt", max_tokens = 500)
```

### Same prompt gives different results

**Problem**: Running the same prompt twice gives different outputs.

**Solution**: Set a seed for reproducibility:

```{r}
result <- quick_llama("prompt", seed = 42)
```

With `temperature = 0` (the default), outputs should already be deterministic. The seed only affects results when sampling is enabled (`temperature > 0`).
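
The interaction between temperature and seed can be shown with a toy example in base R. This is a conceptual sketch of temperature sampling over made-up scores, not localLLM's sampler: dividing the scores by a small temperature pushes almost all probability onto the top candidate, and fixing the seed makes random draws repeatable.

```r
# Softmax with temperature over a vector of scores (logits).
softmax_t <- function(logits, temperature) {
  z <- logits / temperature
  z <- z - max(z)              # subtract the max for numerical stability
  exp(z) / sum(exp(z))
}

logits <- c(a = 2.0, b = 1.0, c = 0.1)   # made-up scores for three tokens
round(softmax_t(logits, 1.0), 3)         # probability spread over all tokens
round(softmax_t(logits, 0.05), 3)        # nearly all probability on "a"

# Same seed, same draws
set.seed(42)
first <- sample(names(logits), 5, replace = TRUE, prob = softmax_t(logits, 1.0))
set.seed(42)
second <- sample(names(logits), 5, replace = TRUE, prob = softmax_t(logits, 1.0))
identical(first, second)
```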

---

## Performance Issues

### Generation is very slow

**Problem**: Text generation takes much longer than expected.

**Solutions**:

1. **Use GPU acceleration**:
   ```{r}
   model <- model_load("model.gguf", n_gpu_layers = 999)
   ```

2. **Use a smaller or more heavily quantized model**: Q4 quantization is smaller and typically faster than Q8

3. **Reduce context size**:
   ```{r}
   ctx <- context_create(model, n_ctx = 512)
   ```

4. **Use parallel processing** for multiple prompts:
   ```{r}
   results <- quick_llama(c("prompt1", "prompt2", "prompt3"))
   ```

### Parallel processing isn't faster

**Problem**: `generate_parallel()` is no faster than sequential generation.

**Solution**: Ensure `n_seq_max` is set appropriately:

```{r}
ctx <- context_create(
  model,
  n_ctx = 2048,
  n_seq_max = 10  # Allow 10 parallel sequences
)
```
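
If you have more prompts than `n_seq_max`, one option is to feed them in groups no larger than that limit. The sketch below only shows the grouping step in base R; `batch_prompts()` is a hypothetical helper, and the `generate_parallel(ctx, prompts)` call in the usage comment is an assumed signature, so check `?generate_parallel` before relying on it.

```r
# Hypothetical helper: split prompts into groups of at most n_seq_max.
batch_prompts <- function(prompts, n_seq_max) {
  unname(split(prompts, ceiling(seq_along(prompts) / n_seq_max)))
}

batches <- batch_prompts(paste("prompt", 1:25), n_seq_max = 10)
lengths(batches)  # 10 10 5

# Usage (assumed signature):
# results <- lapply(batches, function(b) generate_parallel(ctx, b))
```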

---

## Compatibility Issues

### "GGUF format required" error

**Problem**: Trying to load a non-GGUF model.

**Solution**: localLLM only supports GGUF format. Convert your model or find a GGUF version on Hugging Face (search for "model-name gguf").
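
A quick way to tell whether a file is GGUF at all is to look at its first four bytes, which spell `GGUF` in ASCII. The sketch below does that in base R; `is_gguf()` is a hypothetical helper and checks only the magic bytes, not whether the rest of the file is intact.

```r
# Hypothetical helper: check the 4-byte magic at the start of a file.
is_gguf <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con), add = TRUE)
  magic <- readBin(con, what = "raw", n = 4)
  identical(magic, charToRaw("GGUF"))
}

# Usage:
# is_gguf("/path/to/downloaded/model.gguf")
```

A `FALSE` result for a file with a `.gguf` extension usually points to an interrupted download or a saved error page.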

### Model works in Ollama but not localLLM

**Problem**: An Ollama model doesn't work when loaded directly.

**Solution**: Use the Ollama integration:

```{r}
# List available Ollama models
list_ollama_models()

# Load via Ollama reference
model <- model_load("ollama:model-name")
```

---

## Common Error Messages

| Error | Cause | Solution |
|-------|-------|----------|
| "Backend library is not loaded" | Backend not installed | Run `install_localLLM()` |
| "Invalid model handle" | Model was freed/invalid | Reload the model |
| "Invalid context handle" | Context was freed/invalid | Recreate the context |
| "Failed to open library" | Backend installation issue | Reinstall with `install_localLLM(force_reinstall = TRUE)` |
| "Download timeout" | Network issue or lock file | Clear cache and retry |

---

## Getting Help

If you encounter issues not covered here:

1. **Check the documentation**: `?function_name`
2. **Report bugs**: Email **xu2009@purdue.edu** with:
   - Your code
   - The error message
   - Output of `sessionInfo()`
   - Output of `hardware_profile()`
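
To collect the two outputs in one file you can attach to the email, something like the following sketch may help. `write_report()` is a hypothetical helper; it records `sessionInfo()` and, if localLLM is installed and the call succeeds, `hardware_profile()`.

```r
# Hypothetical helper: write diagnostic output to a text file.
write_report <- function(path = "localLLM-report.txt") {
  lines <- c("== sessionInfo() ==",
             utils::capture.output(print(utils::sessionInfo())))
  if (requireNamespace("localLLM", quietly = TRUE)) {
    # Record the error instead of failing if the backend is not ready
    hw <- tryCatch(
      utils::capture.output(print(localLLM::hardware_profile())),
      error = function(e) paste("hardware_profile() failed:", conditionMessage(e))
    )
    lines <- c(lines, "", "== hardware_profile() ==", hw)
  }
  writeLines(lines, path)
  invisible(path)
}

write_report(file.path(tempdir(), "localLLM-report.txt"))
```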

---

## Quick Reference

```{r}
# Check installation status
lib_is_installed()

# Check hardware
hardware_profile()

# List cached models
list_cached_models()

# List Ollama models
list_ollama_models()

# Clear model cache
cache_dir <- file.path(tools::R_user_dir("localLLM", "cache"), "models")
unlink(cache_dir, recursive = TRUE)

# Force reinstall backend (re-runs GPU detection)
install_localLLM(force_reinstall = TRUE)
```