Bayesian sparse regression for reconstruction of individual muscle activity from fMRI – A method based on MAP-EM algorithm with ARD prior
In humans, electrophysiology and other invasive measurements of brain activity generally cannot be used to identify a mapping from brain to muscle activity. Further, it is generally believed that the spatial resolution of non-invasive brain imaging modalities such as functional magnetic resonance imaging (fMRI) is not sufficient to isolate neural activity related to individual muscles. In Ganesh et al. (2008) we showed that it is nevertheless possible to identify a quantitative mapping from fMRI to muscle activity. We simultaneously recorded surface electromyography (EMG) from two antagonist muscles and motor cortex activity using fMRI, during an isometric task requiring both reciprocal activation and co-activation of the wrist muscles. Bayesian sparse regression was used to identify the parameters of a linear function from the fMRI activity in areas 4 (M1) and 6 (pre-motor, SMA) to EMG, which we then used to predict muscle activity in an independent test data set.
In this report we present the basic theory behind the algorithm and explain the steps needed for a novice to implement it.
The linear regression function is of the form
$$ \mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon} \qquad (1) $$
where $\mathbf{y} \in \mathbb{R}^n$ is the muscle activity (regressand) to be reconstructed, $X \in \mathbb{R}^{n \times p}$ is the design matrix containing the time series of a set of $p$ voxels from the selected ROI in the 'regression set', and $\boldsymbol{\epsilon} \in \mathbb{R}^n$ is the noise for the $n$ brain scans recorded in an experiment. The parameters $\boldsymbol{\beta} \in \mathbb{R}^p$ are computed to realize the best mapping of the voxels to the muscle activity while keeping the function sparse.
To favour a sparse representation, a Laplacian prior is chosen for $\boldsymbol{\beta}$,
$$ p(\boldsymbol{\beta} \mid \alpha) = \left(\frac{\alpha}{2}\right)^p \exp\left(-\alpha \|\boldsymbol{\beta}\|_1\right), $$
where $\|\boldsymbol{\beta}\|_1 = \sum_i |\beta_i|$ and $\alpha$ is the parameter of this density. The maximum a posteriori estimate of $\boldsymbol{\beta}$ is then given by
$$ \hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \left\{ \|\mathbf{y} - X\boldsymbol{\beta}\|_2^2 + 2\sigma^2 \alpha \|\boldsymbol{\beta}\|_1 \right\} \qquad (2) $$
where $\sigma^2$ is the variance of the Gaussian observation noise.
The term with the Euclidean norm $\|\cdot\|_2$ expresses the least-squares approximation of the regressand, while the $\|\cdot\|_1$ term produces a sparse representation. The $\|\cdot\|_1$ norm, corresponding to the absolute value, has a larger gradient than the $\|\cdot\|_2$ norm in the vicinity of the minimum, so small coefficients are driven quickly to zero, i.e. to a sparse representation. This criterion was named the least absolute shrinkage and selection operator, or LASSO (Tibshirani 1996).
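As a concrete illustration, the criterion (2) can be evaluated directly for a candidate parameter vector; the following MATLAB fragment is a minimal sketch (the variable names y, X, beta, sigma and alpha are ours, chosen to match the symbols above):

% Sketch: evaluate the LASSO/MAP criterion of Eq. (2) for a given beta.
% y (n x 1), X (n x p), beta (p x 1); sigma and alpha as in the text.
residual = y - X * beta;                                 % least-squares term
J = residual' * residual + 2 * sigma^2 * alpha * norm(beta, 1);   % Eq. (2)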
The Laplacian prior is equivalent to a two-level hierarchical Bayes model with zero-mean Gaussian priors whose variances are independent and exponentially distributed (Figueiredo 2003; Lange and Sinsheimer 1993):
$$ p(\beta_i \mid \tau_i) = \mathcal{N}(\beta_i \mid 0, \tau_i), \qquad p(\tau_i \mid \gamma) = \frac{\gamma}{2} \exp\left(-\frac{\gamma \tau_i}{2}\right) \qquad (3) $$
which yields the Laplacian prior with $\alpha = \sqrt{\gamma}$ after integrating out the $\tau_i$. This allows us to use the expectation-maximization (EM) algorithm to implement the LASSO criterion (Osborne et al. 2000; Tibshirani 1996) in two steps, treating the variances $\tau_i$ as hidden variables:

E-step: compute the diagonal matrix
$$ U^{(t)} = \mathrm{diag}\big(|\beta_1^{(t)}|^{1/2}, \ldots, |\beta_p^{(t)}|^{1/2}\big) \qquad (4) $$

M-step: update the parameter estimate
$$ \boldsymbol{\beta}^{(t+1)} = U^{(t)} \big(\sigma^2 I + U^{(t)} X^\top X\, U^{(t)}\big)^{-1} U^{(t)} X^\top \mathbf{y} \qquad (5) $$

In our regression, (4) and (5) are used to make $\boldsymbol{\beta}$ sparse, with the degree of sparsity determined by the value of the parameter $\sigma$ (the prior parameter $\gamma$ is absorbed into $\sigma^2$). The optimal value of $\sigma$ was determined by an adaptive process, as explained in the next section.
To achieve good generalization of a model or mapping, it is necessary to control the complexity of the learnt function. If the function is too complex it will reproduce unimportant details of the training set, thus over-fitting the training data. Such a function will fit the training data well but will be unable to predict independent test data well. Conversely, an overly simple function will not be able to capture the true mapping between the regressor and the regressand, thus under-fitting the training data. In our case the number of voxels chosen during training determines the ability of the model to generalize. The typical behaviour of our data set is shown in Fig. 1.
The regression algorithm has one free parameter ($\sigma$) that adjusts the number of voxels used during training. We use over-fitting, detected from the prediction performance on a representative held-out portion of the data, the selection set, as a criterion to recursively select an optimal value of $\sigma$.
The EM algorithm loop starts with $\sigma = 1$. The estimated mapping vector $\boldsymbol{\beta}$ is tested on the selection set and the coefficient of determination ($R^2$) is evaluated. A binary search is then used to determine the $\sigma$ with maximal $R^2$ on the selection set.
The performance is quantified using the coefficient of determination
$$ R^2 = 1 - \frac{\sum_i (y_{r,i} - y_{c,i})^2}{\sum_i (y_{r,i} - \bar{y}_r)^2}, $$
where $y_r$ represents the muscle regressand value, $\bar{y}_r$ its average, and $y_c$ the reconstructed value.
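The EM loop of Eqs. (4) and (5) can be sketched in MATLAB as follows; the distributed 'sp_regress.m' is the reference implementation, and the initialisation and fixed iteration count below are our assumptions, not the distributed code:

% Sketch of the MAP-EM loop of Eqs. (4)-(5).
% Xt : n x m matrix of voxel-activity regressors (training set)
% yt : n x 1 vector of the muscle regressand
sigma = 1;                       % starting value of the sparsity parameter
m     = size(Xt, 2);
t_max = 100;                     % run the loop for a fixed long time
beta  = zeros(m, t_max);         % regression parameters at each iteration
beta(:,1) = Xt' * yt;            % correlation-based initialisation (assumption)
t = 1;
while t < t_max
    U = diag(sqrt(abs(beta(:,t))));                      % E-step, Eq. (4)
    beta(:,t+1) = U * ((sigma^2 * eye(m) + ...
        U * (Xt' * Xt) * U) \ (U * (Xt' * yt)));         % M-step, Eq. (5)
    t = t + 1;
end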
In the above code, beta represents the m x t matrix of regression parameters over t iteration steps and m voxels, yt represents the n x 1 vector of the regressand over n data points (scans), and Xt is the n x m matrix of voxel-activity regressors.
In the first step the code starts with sigma = 1. The 'while' loop may be run for a fixed large number of iterations, e.g. 100. Alternatively, termination may be automated by checking the regression error; this is done by commenting out the current 'while' command and uncommenting the green lines. However, the convergence threshold ('thresh') may have to be tuned according to the noise in the data. A MATLAB implementation of the above optimization procedure will be uploaded to this website in the near future; at present a population-search algorithm is used for this purpose. The population search may, however, take a long time, and the recommended method is to optimize sigma according to the steps given above.
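Those steps might be sketched as follows, assuming $R^2$ is unimodal in sigma over a bracket of candidate values; sp_em is the EM loop above wrapped as a function beta = sp_em(Xt, yt, sigma), and both the helper name and the bracket [1e-3, 10] are our assumptions:

% Sketch: adaptive selection of sigma by a bracketing search on the
% selection-set R^2. (Xs, ys) is the selection set, as described below.
r2 = @(y, yh) 1 - sum((y - yh).^2) / sum((y - mean(y)).^2);   % coefficient of determination
lo = 1e-3;  hi = 10;                  % assumed search bracket for sigma
for k = 1:20                          % repeatedly narrow the bracket
    s1 = lo + (hi - lo)/3;  s2 = hi - (hi - lo)/3;
    b1 = sp_em(Xt, yt, s1);  b2 = sp_em(Xt, yt, s2);
    if r2(ys, Xs * b1(:,end)) > r2(ys, Xs * b2(:,end))
        hi = s2;                      % optimum lies towards smaller sigma
    else
        lo = s1;
    end
end
sigma_opt = (lo + hi)/2;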
The code is implemented for direct use in the program 'reconstruct.m', which takes data from 'data_sample.mat' and uses the function 'sp_regress.m'. 'data_sample.mat' contains muscle activity data from a wrist muscle as regressands (yt, ys, yr) recorded during fMRI. The brain activity of voxels from the motor cortex (area4, area6) makes up the regressors (Xt, Xs, Xr). The function 'sp_regress.m' uses population search to optimize the parameter 'sigma'.
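Assuming the distributed files are on the MATLAB path, the pipeline might then be run simply as follows (a sketch; the script is described above as loading 'data_sample.mat' itself):

% Hypothetical session; 'reconstruct.m' loads 'data_sample.mat' and calls
% 'sp_regress.m' internally to optimize sigma by population search.
reconstruct                          % runs the full train / select / test pipeline
% Afterwards one might inspect the result, e.g. (variable names assumed):
% plot(yr); hold on; plot(Xr * beta_opt); legend('measured', 'reconstructed');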
References
Ganesh G, Burdet E, Haruno M, Kawato M. Sparse linear regression for reconstructing muscle activity from human cortical fMRI. NeuroImage 42(4): 1463-1472, 2008.
Figueiredo MAT. Adaptive sparseness for supervised learning. IEEE Trans. Pattern Analysis and Machine Intelligence 25(9): 1150-1159, 2003.
Lange K, Sinsheimer JS. Normal/independent distributions and their applications in robust regression. J. Computational and Graphical Statistics 2: 175-198, 1993.
Osborne MR, Presnell B, Turlach BA. A new approach to variable selection in least squares problems. IMA J. Numerical Analysis 20: 389-404, 2000.
Tibshirani R. Regression shrinkage and selection via the LASSO. J. Royal Statistical Soc. (B) 58: 267-288, 1996.