%\VignettePackage{miscFuncs} 
%\VignetteKeyword{LaTeX tables}
%\VignetteKeyword{Kalman filter}
%\VignetteKeyword{web scraping}
%\VignetteKeyword{development tools}
%\VignetteKeyword{2 by 2 tables}
%\VignetteIndexEntry{miscFuncs} 

\documentclass[nojss]{jss}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%
% Definitions
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%

\usepackage{epsfig}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{url}
\usepackage[authoryear]{natbib}

\newcommand{\real}{\mathbb{R}}
\newcommand{\I}{\mathbb{I}}
\newcommand{\rmd}{\mathrm{d}}
\newcommand{\N}{\mathrm{N}}
\newcommand{\cov}{\mathrm{cov}}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% almost as usual
\author{Benjamin M. Taylor\\Lancaster University, UK}
\title{Kalman Filtering with \pkg{miscFuncs}}

%% for pretty printing and a nice hypersummary also set:
\Plainauthor{Benjamin M. Taylor} %% comma-separated
\Plaintitle{miscFuncs:} %% without formatting
\Shorttitle{miscFuncs R package} %% a short title (if necessary)

%% an abstract and keywords
\Abstract{
This vignette provides a program template for use with the \code{KFadvance} function.
}
\Keywords{\LaTeX\ tables, Kalman filter, web scraping, development tools}
\Plainkeywords{LaTeX tables, Kalman filter, web scraping, development tools} %% without formatting
%% at least one keyword must be supplied

%% publication information
%% NOTE: Typically, this can be left commented and will be filled out by the technical editor
%% \Volume{13}
%% \Issue{9}
%% \Month{September}
%% \Year{2004}
%% \Submitdate{2004-09-29}
%% \Acceptdate{2004-09-29}

%% The address of (at least) one author should be given
%% in the following format:
\Address{
   \textbf{Benjamin M. Taylor}\\
   Division of Medicine,\\
   Lancaster University,\\
   Lancaster,\\
   LA1 4YF,\\
   UK\\
   E-mail: \email{b.taylor1@lancaster.ac.uk}\\
   URL: \url{http://www.lancs.ac.uk/staff/taylorb1/}

}
%% It is also possible to add a telephone and fax number
%% before the e-mail in the following format:
%% Telephone: +43/1/31336-5053
%% Fax: +43/1/31336-734

%% for those who use Sweave please include the following line (with % symbols):
%% need no \usepackage{Sweave.sty}

%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\begin{document}

\section{Introduction}

The purpose of this vignette is to provide a template R functions for implementing the Kalman Filter and for parameter estimation for Gaussian dynamic linear models. The functions should provide a FLEXIBLE basis on which to build R code for optimal linear filtering.

After loading \pkg{miscFuncs}, the templates below can be printed to the R console (and hence copied and pasted into an editor) using \code{KFtemplates}:

\begin{CodeChunk}
\begin{CodeInput}
library(miscFuncs)
KFtemplates()
\end{CodeInput}
\end{CodeChunk}



\section{The Statistical Model}

Let $\Theta_t$ be the state vector and $Y_t$ be the observation vector at time $t$.

The function \code{KFadvance} works with COLUMN VECTORS.

The statistical model is:

\begin{eqnarray}
   \Theta_t &=& A_t\Theta_{t-1} + B_t + C_t\epsilon_t,\qquad \epsilon_t\sim\N(0,W_t)\label{eqn:system}\\
   Y_t &=& D_t\Theta_{t-1} + E_t + F_t\nu_t,\qquad \nu_t\sim\N(0,V_t)\label{eqn:observation}
\end{eqnarray}

Suppose $\Theta_t$ has dimension $n\times1$ and $Y_t$ has dimension $m\times1$, then the matrices in the above have dimensions:

\begin{center}
\begin{tabular}{ll}
matrix & dimension\\
$A_t$ & $n \times n$ \\
$B_t$ & $n \times 1$ \\
$C_t$ & $n \times n$ \\
$W_t$ & $n \times n$ \\
$D_t$ & $m \times n$ \\
$E_t$ & $m \times 1$ \\
$F_t$ & $m \times m$ \\
$V_t$ & $m \times m$ 
\end{tabular}
\end{center}

The matrix $D_t$ acts like a design matrix in ordinary least squares regression.

The user must also specify priors (aka intial values) for $\Theta$ in the form of a prior mean $n\times1$ matrix (called \code{Xpost} in the code below\footnote{this might seem a strange name, given that it is the prior, but in the template code below, this object is overwritten and eventually becomes the posterior mean.}) and a prior variance $n\times n$ matrix (called \code{Vpost} in the code below).

Typically some, or all of $A_t$, $B_t$, $C_t$, $W_t$, $D_t$, $E_t$, $F_t$, $V_t$ will be parametrised. Part of the goal of filtering will typically involve estimating these parameters by maximum likelihood. To allow optimisation to work well, model parameters should be free to roam between real numbers. For example if a parameter represents a variance (ie should onyl take positive values) then the inside the \code{KFfit} function, use for example \code{sigma <- exp(param[1])} or if the parameter is on the $[0,1]$ then use the inverse logistic function, \code{exp(param[2])/(1+exp(param[2]))}. The template for parameter estimation comes later.

\section{Fitting the Model to Some Data}

This section provides template code for Kalman filtering under the above model.

\begin{CodeChunk}
\tiny
\begin{CodeOutput}
# see vignette for notation

KFfit <- function(  param,              # model parameters
        		    data,               # a matrix containing the data ie Y           
        		    ,...,               # delete this line and include other arguments to be passed
                                        #     to the function here
        		    optim=FALSE,        # set this to FALSE if not estimating parameters, otherwise,
        		                        #     if parameters have already been estimated, then set to TRUE
        		    history.means=FALSE,# whether to save a matrix of filtered E(Theta_t)
        		    history.vars=FALSE, # whether to save a list of filtered V(Theta_t)
        		    prior.mean=NULL,    # optional prior mean. column vector.    # Normally set 
        		    prior.var=NULL,     # optional prior mean. matrix.           # these inside the code
        		    fit=FALSE,          # whether to return a matrix of fitted values
        		    se.fit=FALSE,       # whether to return the standard error of the fitted values
        		    se.predict=FALSE,   # whether to return the prediction standard error =
        		                        #     se(fitted values) + observation variance V_t
        		    noisy=TRUE          # whether to print a progress bar, useful.
        		    na.rm=FALSE){       # whether to use NA handling. set to TRUE if any Y is NA 
		  
    start <- Sys.time()  
    
    if(se.predict & !se.fit){
        stop("Must have se.fit=TRUE in order to compute se.predict") # leads to a computational saving 
    }    
    
    #
    # Here, I would suggest creating dummy names for your paramters eg
    # sigma.obs <- exp(model.param[1])
    #
    #

    T <- dim(data)[1]   # ASSUMES OBSERVATIONS ARE IN A (T x m) matrix, ie row t contains the data for time t. 
                        # Note this is important for when KFadvance is called later     
    
    if(is.null(prior.mean)){
        Xpost <- DEFINE PRIOR MEAN HERE
    }
    else{
        Xpost <- prior.mean
    }
    
    if(is.null(prior.var)){   
        Vpost <- DEFINE PRIOR VARIANCE HERE
    }
    else{
        Vpost <- prior.var
    }    
    
    if (history.means){
        Xrec <- Xpost
    }
    if(history.vars){
        Vrec <- list()
        Vrec[[1]] <- Vpost
    }
    
    # delete or complete the following rows as necessary, also appears in the loop that follows
    A <- IF A IS FIXED OVER TIME THEN DEFINE IT HERE
    B <- IF B IS FIXED OVER TIME THEN DEFINE IT HERE
    C <- IF C IS FIXED OVER TIME THEN DEFINE IT HERE
    W <- IF W IS FIXED OVER TIME THEN DEFINE IT HERE   
    D <- IF D IS FIXED OVER TIME THEN DEFINE IT HERE
    E <- IF E IS FIXED OVER TIME THEN DEFINE IT HERE
    F <- IF F IS FIXED OVER TIME THEN DEFINE IT HERE
    V <- IF V IS FIXED OVER TIME THEN DEFINE IT HERE
    
    loglik <- 0 
    fitmat <- c() 
    sefitmat <- c()
    sepredictmat <- c()
    
    if(noisy){
        pb <- txtProgressBar(min=1,max=T,style=3)
    }
      
    for(t in 1:T){ 
    
        # delete or complete the following rows as necessary
        A <- IF A IS TIME-VARYING THEN DEFINE IT HERE
        B <- IF B IS TIME-VARYING THEN DEFINE IT HERE
        C <- IF C IS TIME-VARYING THEN DEFINE IT HERE
        W <- IF W IS TIME-VARYING THEN DEFINE IT HERE   
        D <- IF D IS TIME-VARYING THEN DEFINE IT HERE
        E <- IF E IS TIME-VARYING THEN DEFINE IT HERE
        F <- IF F IS TIME-VARYING THEN DEFINE IT HERE
        V <- IF V IS TIME-VARYING THEN DEFINE IT HERE
        
        # this bit calls KF advance      
        new <- KFadvance(obs=data[t,],oldmean=Xpost,oldvar=Vpost,A=A,B=B,C=C,D=Dt,E=E,F=F,W=W,V=V,marglik=TRUE,log=TRUE,na.rm=na.rm)
        
        Xpost <- new$mean
		Vpost <- new$var
		
		if(t==1){ # used when this function is called iteratively one step at a time
		    running.mean <- Xpost
		    running.var <- Vpost
		}
		
		loglik <- loglik + new$mlik
			
		
		if (history.means){
            Xrec <- cbind(Xrec,Xpost)
        }
        if(history.vars){
            Vrec[[t+1]] <- Vpost # since first entry is the prior
        }
        
        if(fit){
            fitmat <- cbind(fitmat,D%*%Xpost + E)
        }
        if(se.fit){
            sefitmat <- cbind(sefitmat,sqrt(diag(D%*%Vpost%*%t(D))))
        }
        if(se.predict){
            sepredictmat <- cbind(sepredictmat,sqrt((sefitmat[,ncol(sefitmat)])^2+diag(F%*%V%*%t(F))))
        }
        
        if(noisy){
            setTxtProgressBar(pb,t) 
        }
    }
    
    if(noisy){        
        close(pb)
    }
    end <- Sys.time()
    
    if(noisy){        
        cat("Time taken:",difftime(end,start,units="secs")," seconds.\n")
    }
    
    if(optim){
        return(-loglik) # just return the -log likelihood if in parameter estimation mode
    }
    else{
        retlist <- list(mean=Xpost,var=Vpost,mlik=loglik,data=data,running.mean=running.mean,running.var=running.var)
        if(history.means){
            retlist$history.means <- Xrec
        }
        if(history.vars){
            retlist$history.vars <- Vrec
        } 
        if(fit){
            retlist$fit <- fitmat
        }
        if(se.fit){
            retlist$se.fit <- sefitmat
        }
        if(se.predict){
            retlist$se.predict <- sepredictmat        
        }
        return(retlist)
    }
}
\end{CodeOutput}
\end{CodeChunk}

\section{Parameter Estimation}

This section provides template code for parameter estimation in Kalman filtering.

\begin{CodeChunk}
\tiny
\begin{CodeOutput}
KFparest <- function(   data, # data ie Y 
                        ...){ # delete and paste in OTHER ARGUMENTS TO BE PASSED TO KFfit
    start <- Sys.time()
    
    inits <- PUT INITIAL VALUES FOR PARAMETER VECTOR HERE
    
    # use optim to find optimal parameters
    oppars <- optim(inits,
                    KFfit,
                    data=data,
                    OTHER ARGUMENTS TO BE PASSED TO KFfit,  # delete and paste in OTHER ARGUMENTS 
                                                            # TO BE PASSED TO KFfit
                    optim=TRUE,
                    control=list(trace=100))    
    
    end <- Sys.time()
    cat("\n")
    cat("Time Taken",difftime(end,start,units="mins"),"\n")
    cat("\n")
    
    return(oppars)
}
\end{CodeOutput}
\end{CodeChunk}

\end{document}
