Bayesian Sparse Regression
What is Bayesian Sparse Regression?

Bayesian sparse regression for reconstruction of individual muscle activity from fMRI – a method based on the MAP-EM algorithm with an ARD prior

In humans, electrophysiology or other invasive measurements of brain activity can generally not be used to identify a mapping from brain to muscle activity. Further, it is generally believed that the spatial resolution of non-invasive brain imaging modalities such as functional magnetic resonance imaging (fMRI) is not sufficient to isolate neural activity related to individual muscles. In Ganesh et al. (2008) we showed that it is nevertheless possible to identify a quantitative mapping from fMRI to muscle activity. We simultaneously recorded surface electromyography (EMG) from two antagonist muscles and motor cortex activity using fMRI, during an isometric task requiring both reciprocal activation and co-activation of the wrist muscles. Bayesian sparse regression was used to identify the parameters of a linear function from the fMRI activity in areas 4 (M1) and 6 (pre-motor, SMA) to the EMG, which we then used to predict muscle activity in an independent test data set.

In this report we present the basic theory behind the algorithm and explain the steps needed to implement it, aimed at a novice.

Regression Analysis

The linear regression function is of the form

$$ y = X\beta + \epsilon \qquad (1) $$

where $y \in \mathbb{R}^{n}$ is the muscle activity (regressand) to be reconstructed and

$$ X = [x_{1}, \ldots, x_{p}] \in \mathbb{R}^{n \times p} $$

is the design matrix containing the time series of a set of $p$ voxels from the selected ROI in the 'regression set'. The parameters $\beta \in \mathbb{R}^{p}$ are computed to realize the best mapping from the voxels to the muscle activity while keeping the function sparse, and $\epsilon \in \mathbb{R}^{n}$ is the noise for the $n$ brain scans recorded in an experiment.
To favour a sparse representation, a Laplacian prior is chosen for β such that

$$ p(\beta \mid \alpha) = \left(\frac{\alpha}{2}\right)^{p} \exp\left(-\alpha \|\beta\|_{1}\right), $$

where $\|\beta\|_{1} = \sum_{i} |\beta_{i}|$ and α is the parameter of this density. The maximum a posteriori estimate of β is then given by

$$ \hat{\beta} = \arg\min_{\beta} \left\{ \|y - X\beta\|_{2}^{2} + 2\sigma^{2}\alpha\,\|\beta\|_{1} \right\} \qquad (2) $$

The term with the Euclidean norm $\|\cdot\|_{2}$ expresses the least-squares approximation of the regressand, while the $\|\cdot\|_{1}$ term produces the sparse representation. The $\|\cdot\|_{1}$ norm, corresponding to the absolute value, has a larger gradient than the $\|\cdot\|_{2}$ norm in the vicinity of the minimum and thus drives small coefficients to zero, i.e. to a sparse representation. This criterion was named the least absolute shrinkage and selection operator, or LASSO (Tibshirani 1996).
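To make this argument explicit, compare the penalty gradients near zero:

$$ \frac{\partial}{\partial \beta_{i}} |\beta_{i}| = \operatorname{sign}(\beta_{i}), \qquad \frac{\partial}{\partial \beta_{i}} \beta_{i}^{2} = 2\beta_{i} \to 0 \quad \text{as } \beta_{i} \to 0, $$

so the $\|\cdot\|_{1}$ penalty exerts a constant pull towards zero however small $\beta_{i}$ becomes, whereas the pull of the $\|\cdot\|_{2}$ penalty vanishes near zero; coefficients can therefore become exactly zero at the minimum.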

The Laplacian prior is equivalent to a two-level hierarchical Bayes model with zero-mean Gaussian priors whose variances are independently exponentially distributed (Figueiredo 2003; Lange and Sinsheimer 1993). This allows us to use the expectation-maximization (EM) algorithm to implement the LASSO criterion (Osborne et al. 2000; Tibshirani 1996) in two steps:
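Concretely, the hierarchy can be written (following Figueiredo 2003, up to the parameterization of the exponential) as

$$ p(\beta_{i} \mid \tau_{i}) = \mathcal{N}(\beta_{i} \mid 0, \tau_{i}), \qquad p(\tau_{i} \mid \gamma) = \frac{\gamma}{2}\, e^{-\gamma \tau_{i}/2}, $$

where marginalizing over the variances $\tau_{i}$ recovers the Laplacian prior with $\alpha = \sqrt{\gamma}$; the $\tau_{i}$ play the role of the hidden variables in the EM algorithm.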

E-step

$$ \omega_{i}^{(t)} = \mathbb{E}\left[\tau_{i}^{-1} \mid \beta_{i}^{(t)}\right] = \frac{\sqrt{\gamma}}{|\beta_{i}^{(t)}|}, \qquad \Lambda^{(t)} = \operatorname{diag}\left(\omega_{1}^{(t)}, \ldots, \omega_{p}^{(t)}\right) $$

and

M-step

$$ \beta^{(t+1)} = \left(\sigma^{2}\Lambda^{(t)} + X^{\top}X\right)^{-1} X^{\top} y $$
The parameter γ controlling the degree of sparseness of β may be eliminated using Jeffreys' non-informative hyper-prior

$$ p(\tau_{i}) \propto \frac{1}{\tau_{i}} \qquad (3) $$

under which the marginal prior of each parameter becomes $p(\beta_{i}) \propto 1/|\beta_{i}|$. This is equivalent to an automatic relevance determination (ARD) prior.
The EM steps may now be adjusted to obtain the following modified EM algorithm (Figueiredo 2003), which is implemented directly in the MATLAB code of step 4 below.

E-step

$$ V^{(t)} = \operatorname{diag}\left(|\beta_{1}^{(t)}|, \ldots, |\beta_{p}^{(t)}|\right) \qquad (4) $$

M-step

$$ \beta^{(t+1)} = V^{(t)}\left(\sigma^{2} I + V^{(t)} X^{\top} X\, V^{(t)}\right)^{-1} V^{(t)} X^{\top} y \qquad (5) $$

In our regression, (4) and (5) are used to make β sparse, with the degree of sparseness determined by the value of the parameter σ. The optimal value of σ was determined by an adaptive process, as explained in the next section.


Fig. 1: Evolution of error with the number of voxels selected during training. The plot shows the evolution of the training and selection-set testing errors and COD values with change in σ for the FCR muscle. As σ decreases (and the number of selected voxels increases), the training error (blue) decreases. However, the testing error (red) typically shows a U-shaped behaviour with respect to σ. A binary search is used to find the value of σ (σopt) at which the testing error is minimal and the COD is maximal. The voxel number corresponding to σopt is taken as the optimum (nopt) for the reconstruction of the EMG.
Avoiding over-fitting

To achieve good generalization of a model or mapping, it is necessary to control the complexity of the learnt function. If the function is too complex, it will reproduce unimportant details of the training set, thus over-fitting the training data: such a function fits the training data well but cannot predict unseen test data well. Conversely, an overly simple function cannot capture the true mapping between the regressor and the regressand, thus under-fitting the training data. In our case, the number of voxels chosen during training determines the ability of the model to generalize. The typical behaviour of our data set is shown in Fig. 1. The regression algorithm has one free parameter (σ) that adjusts the number of voxels used during training. We use the over-fitting detected on a held-out, representative portion of the data, the selection set, as a criterion to recursively select an optimal value of σ.

The EM algorithm loop starts with σ = 1. The resulting mapping vector β is tested on the selection set and the coefficient of determination (R²) is evaluated. A binary search is then used to determine the σ with maximal R², i.e. minimal selection-set error.

The performance is quantified using the coefficient of determination

$$ R^{2} = 1 - \frac{\sum (y_{r} - y_{c})^{2}}{\sum (y_{r} - \bar{y}_{r})^{2}}, $$

where $y_{r}$ represents the muscle regressand value, $\bar{y}_{r}$ its average and $y_{c}$ the reconstructed value.

Steps of Implementation
1) Preprocessing of regressand
2) Preprocessing of imaging data
3) The regressand and regressor data are divided into three parts: about 50% of the data as the training set (Xt, yt), about 20% as the selection set (Xs, ys), and the rest as the test set (Xr, yr); a minimal sketch of this split is given below. The data used for training, selection and testing may be shuffled to check the generalization of the reconstruction process. In the case of multiple scanning sessions, the results are better if the selection and test sets are from the same session.
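The following split, assuming the full regressor matrix X (n scans × p voxels) and the regressand vector y (n × 1) are already loaded, is only one possible implementation of these proportions:

%-----------------------------------------------------------------------
n  = size(X,1);                        % total number of scans
n1 = round(0.5*n); n2 = round(0.7*n);  % 50%/20%/30% boundaries
Xt = X(1:n1,:);      yt = y(1:n1);     % training set (~50%)
Xs = X(n1+1:n2,:);   ys = y(n1+1:n2);  % selection set (~20%)
Xr = X(n2+1:end,:);  yr = y(n2+1:end); % test set (~30%)
%-----------------------------------------------------------------------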
4) Calculation of the regression parameter β (beta) may then be carried out with the following MATLAB code:

%-----------------------------------------------------------------------
sigma = 1;             % parameter to be tuned by the over-fitting criterion
last  = 100;           % fixed number of EM iterations
p = size(Xt,2);        % number of voxels (regressors)
beta = zeros(p,last);  % regression parameters over the iterations
beta(:,1) = Xt'*yt;    % non-zero initialization (an assumption; any non-zero start works)
i = 1;
% thresh = .05;        % convergence threshold of 5%
% while (i<3 || abs(e(i-1)-e(i-2)) > e(i-1)*thresh)
while (i<last)
    V = diag(abs(beta(:,i)));                                        % E-step, eq. (4)
    beta(:,i+1) = V*((sigma^2*eye(p) + V*(Xt'*Xt)*V)\(V*(Xt'*yt)));  % M-step, eq. (5)
    % e(i) = sum(abs(yt-Xt*beta(:,i+1)));                            % regression error
    i = i+1;
end;
%-----------------------------------------------------------------------

In the above code, beta is the p × t matrix of regression parameters over the t iteration steps for the p voxels (t = last), yt is the n × 1 regressand vector of n data points (scans), and Xt is the n × p matrix of voxel-activity regressors.

The code starts with sigma = 1. The while loop may be run for a fixed large number of iterations, e.g. 100. Alternatively, the stopping may be automated by checking the regression error: comment out the current while statement and un-comment the commented lines above and inside the loop. In that case the convergence threshold (thresh) may have to be tuned according to the noise in the data.

5) Checking for over-fitting: the regression parameter beta from step 4, beta(:,last), is tested on the selection set using the coefficient of determination (R²):
%-----------------------------------------------------------------------
yc = Xs*beta(:,last);                          % reconstruction on the selection set
R2 = 1-sum((ys-yc).^2)/sum((ys-mean(ys)).^2);  % coefficient of determination
%-------------------------------------------------------------------------
Depending on the value of R², the value of sigma is changed and step 4 is re-run. This is done by binary search, either manually or with standard optimization code; since the selection-set error is typically U-shaped in sigma (Fig. 1), a simple bracketing search suffices. A minimal sketch of such a search is given below.
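The following golden-section search is a minimal illustrative sketch, not the authors' exact procedure: the bracket [1e-3, 10], the 30 refinement steps, and the helper functions fit_beta (assumed to run the EM loop of step 4 for a given sigma and return beta(:,last)) and sel_err (assumed to return 1 − R² on the selection set, as in step 5) are all assumptions for illustration.

%-----------------------------------------------------------------------
phi = (sqrt(5)-1)/2;   % golden ratio, for a bracketing minimum search
lo = 1e-3; hi = 10;    % assumed search bracket for sigma
for it = 1:30          % fixed number of refinement steps
    a = hi - phi*(hi-lo);
    b = lo + phi*(hi-lo);
    if sel_err(fit_beta(a)) < sel_err(fit_beta(b))
        hi = b;        % minimum lies in [lo, b]
    else
        lo = a;        % minimum lies in [a, hi]
    end
end
sigma_opt = (lo+hi)/2;
beta_opt  = fit_beta(sigma_opt);  % beta_opt is used on the test set below
%-----------------------------------------------------------------------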
6) Reconstruction of the regressand may now be achieved on the test set as:
%-----------------------------------------------------------------------
yc = Xr*beta_opt;                              % reconstruction on the test set
R2 = 1-sum((yr-yc).^2)/sum((yr-mean(yr)).^2);  % coefficient of determination on the test set
%-------------------------------------------------------------------------

MATLAB code for the above optimization procedure will be uploaded to this website in the near future. At present, a population-search algorithm is used for the purpose; however, this procedure may take a long time, and the recommended method is to optimize according to the steps given above.

The code is implemented for direct use in the program 'reconstruct.m', which takes data from 'data_sample.mat' and uses the function 'sp_regress.m'. 'data_sample.mat' contains muscle-activity data from a wrist muscle as regressands (yt, ys, yr), recorded during fMRI. The brain activity of voxels from the motor cortex (area4, area6) makes up the regressors (Xt, Xs, Xr). The function 'sp_regress.m' uses a population search to optimize the parameter 'sigma'.

References

Ganesh G, Burdet E, Haruno M, Kawato M. Sparse linear regression for reconstructing muscle activity from human cortical fMRI. Neuroimage 42(4): 1463-1472, 2008.

Figueiredo MA. Adaptive sparseness for supervised learning. IEEE Trans. Pattern Analysis and Machine Intelligence 25(9): 1150-1159, 2003.

Lange K, Sinsheimer J. Normal/Independent distributions and their applications in robust regression. J. Computational and Graphical Statistics 2: 175-198, 1993.

Osborne M, Presnell B, Turlach B. A new approach to variable selection in least squares problems. IMA J. Numerical Analysis 20: 389-404, 2000.

Tibshirani R. Regression shrinkage and selection via the LASSO. J. Royal Statistical Soc. (B) 58: 267-288, 1996.

Download
The download includes the following files: 'reconstruct.m', 'sp_regress.m' and 'data_sample.mat'.