Bayesian Sparse Regression
What is Bayesian Sparse Regression?

Bayesian sparse regression for reconstruction of individual muscle activity from fMRI – a method based on the MAP-EM algorithm with an ARD prior

In humans, electrophysiology or other invasive measurements of brain activity can generally not be used to identify a mapping from brain to muscle activity. Further, it is generally believed that the spatial resolution of non-invasive brain imaging modalities such as functional magnetic resonance imaging (fMRI) is not sufficient to isolate neural activity related to individual muscles. In Ganesh et al. (2008) we showed that it is nevertheless possible to identify a quantitative mapping from fMRI to muscle activity. We simultaneously recorded surface electromyography (EMG) from two antagonist muscles and motor cortex activity using fMRI, during an isometric task requiring both reciprocal activation and co-activation of the wrist muscles. Bayesian sparse regression was used to identify the parameters of a linear function from the fMRI activity in areas 4 (M1) and 6 (pre-motor, SMA) to the EMG, which we then used to predict muscle activity in an independent test data set.

In this report we present the basic theory behind the algorithm and explain the steps needed to implement it, aimed at a novice.

Regression Analysis

The linear regression function is of the form

$$ y = X\beta + \epsilon \qquad (1) $$

where $y \in \mathbb{R}^{n}$ is the muscle activity (regressand) to be reconstructed and

$$ X = [x_{1}, \ldots, x_{p}] \in \mathbb{R}^{n \times p} $$

is the design matrix containing the time series of a set of $p$ voxels from the selected ROI in the 'regression set'. The parameters $\beta \in \mathbb{R}^{p}$ are computed to realize the best mapping from the voxels to the muscle activity while keeping the function sparse, and $\epsilon \in \mathbb{R}^{n}$ is the noise for the $n$ brain scans recorded in an experiment.
To favour a sparse representation, a Laplacian prior is chosen for β such that

$$ p(\beta \mid \alpha) = \left(\frac{\alpha}{2}\right)^{p} \exp\left(-\alpha \|\beta\|_{1}\right), $$

where $\|\beta\|_{1} = \sum_{i} |\beta_{i}|$ and α is the parameter of this density. The maximum a posteriori estimate of β is then given by

$$ \hat{\beta} = \arg\min_{\beta} \left\{ \|y - X\beta\|_{2}^{2} + 2\sigma^{2}\alpha\,\|\beta\|_{1} \right\} \qquad (2) $$

The term with the Euclidean norm $\|\cdot\|_{2}$ expresses the least-squares approximation of the regressand, while the $\|\cdot\|_{1}$ term produces the sparse representation. The $\|\cdot\|_{1}$ norm, corresponding to the absolute value, has a larger gradient than the $\|\cdot\|_{2}$ norm in the vicinity of the minimum and thus drives small coefficients to zero, i.e. to a sparse representation. This criterion was named the least absolute shrinkage and selection operator, or LASSO (Tibshirani 1996).
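To make this argument explicit, compare the penalty gradients near zero:

$$ \frac{\partial}{\partial \beta_{i}} |\beta_{i}| = \operatorname{sign}(\beta_{i}), \qquad \frac{\partial}{\partial \beta_{i}} \beta_{i}^{2} = 2\beta_{i} \to 0 \quad \text{as } \beta_{i} \to 0, $$

so the $\|\cdot\|_{1}$ penalty exerts a constant pull towards zero however small $\beta_{i}$ becomes, whereas the pull of the $\|\cdot\|_{2}$ penalty vanishes near zero; coefficients can therefore become exactly zero at the minimum.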

The Laplacian prior is equivalent to a two-level hierarchical Bayes model with zero-mean Gaussian priors whose variances are independently exponentially distributed (Figueiredo 2003; Lange and Sinsheimer 1993). This allows us to use the expectation-maximization (EM) algorithm to implement the LASSO criterion (Osborne et al. 2000; Tibshirani 1996) in two steps:
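Concretely, the hierarchy can be written (following Figueiredo 2003, up to the parameterization of the exponential) as

$$ p(\beta_{i} \mid \tau_{i}) = \mathcal{N}(\beta_{i} \mid 0, \tau_{i}), \qquad p(\tau_{i} \mid \gamma) = \frac{\gamma}{2}\, e^{-\gamma \tau_{i}/2}, $$

where marginalizing over the variances $\tau_{i}$ recovers the Laplacian prior with $\alpha = \sqrt{\gamma}$; the $\tau_{i}$ play the role of the hidden variables in the EM algorithm.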

E-step

$$ \omega_{i}^{(t)} = \mathbb{E}\left[\tau_{i}^{-1} \mid \beta_{i}^{(t)}\right] = \frac{\sqrt{\gamma}}{|\beta_{i}^{(t)}|}, \qquad \Lambda^{(t)} = \operatorname{diag}\left(\omega_{1}^{(t)}, \ldots, \omega_{p}^{(t)}\right) $$

and

M-step

$$ \beta^{(t+1)} = \left(\sigma^{2}\Lambda^{(t)} + X^{\top}X\right)^{-1} X^{\top} y $$
The parameter γ controlling the degree of sparseness of β may be eliminated using Jeffreys' non-informative hyper-prior

$$ p(\tau_{i}) \propto \frac{1}{\tau_{i}} \qquad (3) $$

under which the marginal prior of each parameter becomes $p(\beta_{i}) \propto 1/|\beta_{i}|$. This is equivalent to an automatic relevance determination (ARD) prior.
The EM steps may now be adjusted to obtain the following modified EM algorithm (Figueiredo 2003), which is implemented directly in the MATLAB code of step 4 below.

E-step

$$ V^{(t)} = \operatorname{diag}\left(|\beta_{1}^{(t)}|, \ldots, |\beta_{p}^{(t)}|\right) \qquad (4) $$

M-step

$$ \beta^{(t+1)} = V^{(t)}\left(\sigma^{2} I + V^{(t)} X^{\top} X\, V^{(t)}\right)^{-1} V^{(t)} X^{\top} y \qquad (5) $$

In our regression, (4) and (5) are used to make β sparse, with the degree of sparseness determined by the value of the parameter σ. The optimal value of σ was determined by an adaptive process, as explained in the next section.


Fig. 1: Evolution of error with the number of voxels selected during training. The plot shows the evolution of the training and selection-set testing errors and COD values with change in σ for the FCR muscle. As σ decreases (and the number of selected voxels increases), the training error (blue) decreases. However, the testing error (red) typically shows a U-shaped behaviour with respect to σ. A binary search is used to find the value of σ (σopt) at which the testing error is minimal and the COD is maximal. The voxel number corresponding to σopt is taken as the optimum (nopt) for the reconstruction of the EMG.
Avoiding over-fitting

To achieve good generalization of a model or mapping, it is necessary to control the complexity of the learnt function. If the function is too complex, it will reproduce unimportant details of the training set, thus over-fitting the training data: such a function fits the training data well but cannot predict unseen test data well. Conversely, an overly simple function cannot capture the true mapping between the regressor and the regressand, thus under-fitting the training data. In our case, the number of voxels chosen during training determines the ability of the model to generalize. The typical behaviour of our data set is shown in Fig. 1. The regression algorithm has one free parameter (σ) that adjusts the number of voxels used during training. We use the over-fitting detected on a held-out, representative portion of the data, the selection set, as a criterion to recursively select an optimal value of σ.

The EM algorithm loop starts with σ = 1. The resulting mapping vector β is tested on the selection set and the coefficient of determination (R²) is evaluated. A binary search is then used to determine the σ with maximal R², i.e. minimal selection-set error.

The performance is quantified using the coefficient of determination

$$ R^{2} = 1 - \frac{\sum (y_{r} - y_{c})^{2}}{\sum (y_{r} - \bar{y}_{r})^{2}}, $$

where $y_{r}$ represents the muscle regressand value, $\bar{y}_{r}$ its average and $y_{c}$ the reconstructed value.

Steps of Implementation
1) Preprocessing of regressand
2) Preprocessing of imaging data
3) The regressand and regressor data are divided into three parts: about 50% of the data as the training set (Xt, yt), about 20% as the selection set (Xs, ys), and the rest as the test set (Xr, yr); a minimal sketch of this split is given below. The data used for training, selection and testing may be shuffled to check the generalization of the reconstruction process. In the case of multiple scanning sessions, the results are better if the selection and test sets are from the same session.
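The following split, assuming the full regressor matrix X (n scans × p voxels) and the regressand vector y (n × 1) are already loaded, is only one possible implementation of these proportions:

%-----------------------------------------------------------------------
n  = size(X,1);                        % total number of scans
n1 = round(0.5*n); n2 = round(0.7*n);  % 50%/20%/30% boundaries
Xt = X(1:n1,:);      yt = y(1:n1);     % training set (~50%)
Xs = X(n1+1:n2,:);   ys = y(n1+1:n2);  % selection set (~20%)
Xr = X(n2+1:end,:);  yr = y(n2+1:end); % test set (~30%)
%-----------------------------------------------------------------------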
4) Calculation of the regression parameter β (beta) may then be carried out with the following MATLAB code:

%-----------------------------------------------------------------------
sigma = 1;             % parameter to be tuned by the over-fitting criterion
last  = 100;           % fixed number of EM iterations
p = size(Xt,2);        % number of voxels (regressors)
beta = zeros(p,last);  % regression parameters over the iterations
beta(:,1) = Xt'*yt;    % non-zero initialization (an assumption; any non-zero start works)
i = 1;
% thresh = .05;        % convergence threshold of 5%
% while (i<3 || abs(e(i-1)-e(i-2)) > e(i-1)*thresh)
while (i<last)
    V = diag(abs(beta(:,i)));                                        % E-step, eq. (4)
    beta(:,i+1) = V*((sigma^2*eye(p) + V*(Xt'*Xt)*V)\(V*(Xt'*yt)));  % M-step, eq. (5)
    % e(i) = sum(abs(yt-Xt*beta(:,i+1)));                            % regression error
    i = i+1;
end;
%-----------------------------------------------------------------------

In the above code, beta is the p × t matrix of regression parameters over the t iteration steps for the p voxels (t = last), yt is the n × 1 regressand vector of n data points (scans), and Xt is the n × p matrix of voxel-activity regressors.

The code starts with sigma = 1. The while loop may be run for a fixed large number of iterations, e.g. 100. Alternatively, the stopping may be automated by checking the regression error: comment out the current while statement and un-comment the commented lines above and inside the loop. In that case the convergence threshold (thresh) may have to be tuned according to the noise in the data.

5) Checking for over-fitting: the regression parameter beta from step 4, beta(:,last), is tested on the selection set using the coefficient of determination (R²):
%-----------------------------------------------------------------------
yc = Xs*beta(:,last);                          % reconstruction on the selection set
R2 = 1-sum((ys-yc).^2)/sum((ys-mean(ys)).^2);  % coefficient of determination
%-------------------------------------------------------------------------
Depending on the value of R², the value of sigma is changed and step 4 is re-run. This is done by binary search, either manually or with standard optimization code; since the selection-set error is typically U-shaped in sigma (Fig. 1), a simple bracketing search suffices. A minimal sketch of such a search is given below.
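The following golden-section search is a minimal illustrative sketch, not the authors' exact procedure: the bracket [1e-3, 10], the 30 refinement steps, and the helper functions fit_beta (assumed to run the EM loop of step 4 for a given sigma and return beta(:,last)) and sel_err (assumed to return 1 − R² on the selection set, as in step 5) are all assumptions for illustration.

%-----------------------------------------------------------------------
phi = (sqrt(5)-1)/2;   % golden ratio, for a bracketing minimum search
lo = 1e-3; hi = 10;    % assumed search bracket for sigma
for it = 1:30          % fixed number of refinement steps
    a = hi - phi*(hi-lo);
    b = lo + phi*(hi-lo);
    if sel_err(fit_beta(a)) < sel_err(fit_beta(b))
        hi = b;        % minimum lies in [lo, b]
    else
        lo = a;        % minimum lies in [a, hi]
    end
end
sigma_opt = (lo+hi)/2;
beta_opt  = fit_beta(sigma_opt);  % beta_opt is used on the test set below
%-----------------------------------------------------------------------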
6) Reconstruction of the regressand may now be achieved on the test set as:
%-----------------------------------------------------------------------
yc = Xr*beta_opt;                              % reconstruction on the test set
R2 = 1-sum((yr-yc).^2)/sum((yr-mean(yr)).^2);  % coefficient of determination on the test set
%-------------------------------------------------------------------------

MATLAB code for the above optimization procedure will be uploaded to this website in the near future. At present, a population-search algorithm is used for the purpose; however, this procedure may take a long time, and the recommended method is to optimize according to the steps given above.

The code is implemented for direct use in the program 'reconstruct.m', which takes data from 'data_sample.mat' and uses the function 'sp_regress.m'. 'data_sample.mat' contains muscle-activity data from a wrist muscle as regressands (yt, ys, yr), recorded during fMRI. The brain activity of voxels from the motor cortex (area4, area6) makes up the regressors (Xt, Xs, Xr). The function 'sp_regress.m' uses a population search to optimize the parameter 'sigma'.

References

Ganesh G, Burdet E, Haruno M, Kawato M. Sparse linear regression for reconstructing muscle activity from human cortical fMRI. Neuroimage 42(4): 1463-1472, 2008.

Figueiredo MA. Adaptive sparseness for supervised learning. IEEE Trans. Pattern Analysis and Machine Intelligence 25(9): 1150-1159, 2003.

Lange K, Sinsheimer J. Normal/Independent distributions and their applications in robust regression. J. Computational and Graphical Statistics 2: 175-198, 1993.

Osborne M, Presnell B, Turlach B. A new approach to variable selection in least squares problems. IMA J. Numerical Analysis 20: 389-404, 2000.

Tibshirani R. Regression shrinkage and selection via the LASSO. J. Royal Statistical Soc. (B) 58: 267-288, 1996.

Download
The download includes the following files: 'reconstruct.m', 'sp_regress.m' and 'data_sample.mat'.