Abstract:
The presence of insignificant predictors in a model causes estimation bias and reduces prediction precision.
Collinearity among predictors is a common problem that renders the design matrix ill-conditioned, leading to unreliable
ordinary least squares (OLS) coefficient estimates. Non-regularized multiple linear regression is unsatisfactory for prediction,
because retaining all predictors reduces bias but inflates variance, and for interpretation, because it becomes necessary to
identify the important predictors that strongly influence the response variable. This study implements the Bayesian Stochastic
Search Variable Selection (B-SSVS) algorithm for multiple linear regression, incorporating a correlation factor prior
specification to address the predictor correlation problem, which otherwise degrades the performance of the Markov chain
Monte Carlo (MCMC) Gibbs sampling process. Further, variable selection performance is compared with the classical
penalized methods Elastic Net and the Least Absolute Shrinkage and Selection Operator (Lasso) using simulated data. We
found that B-SSVS with the correlation factor prior showed good performance, mixing, and convergence properties based on
diagnostic tests, and that it performed better in variable selection than the Elastic Net and Lasso shrinkage methods. We also
found that Elastic Net outperforms Lasso in detecting the true predictors and achieves a lower cross-validation mean squared error.
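
For orientation, the following is a minimal sketch of a stochastic search variable selection Gibbs sampler for linear regression. It uses the standard George-and-McCulloch spike-and-slab prior rather than the correlation factor prior developed in this study, and all hyperparameters (tau, c, p_incl, a0, b0) and the simulated correlated design are illustrative assumptions, not the paper's settings.

# Minimal SSVS Gibbs sampler sketch (assumed spike-and-slab prior:
# beta_j | gamma_j ~ (1 - gamma_j) N(0, tau^2) + gamma_j N(0, (c*tau)^2));
# not the paper's correlation factor prior.
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with correlated predictors (illustrative assumption).
n, p_dim = 200, 10
Sigma = 0.6 ** np.abs(np.subtract.outer(np.arange(p_dim), np.arange(p_dim)))
X = rng.multivariate_normal(np.zeros(p_dim), Sigma, size=n)
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Prior settings (assumptions).
tau, c, p_incl = 0.1, 10.0, 0.5   # spike sd, slab scale, prior inclusion prob.
a0, b0 = 2.0, 2.0                 # inverse-gamma prior on sigma^2

def ssvs_gibbs(X, y, n_iter=5000, burn=1000):
    n, p_dim = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    gamma, sigma2 = np.ones(p_dim, dtype=int), 1.0
    draws = []
    for it in range(n_iter):
        # 1) beta | gamma, sigma2, y ~ N(A^{-1} X'y / sigma2, A^{-1}),
        #    with A = X'X / sigma2 + D_gamma^{-1}.
        prior_var = np.where(gamma == 1, (c * tau) ** 2, tau ** 2)
        A = XtX / sigma2 + np.diag(1.0 / prior_var)
        cov = np.linalg.inv(A)
        beta = rng.multivariate_normal(cov @ Xty / sigma2, cov)
        # 2) gamma_j | beta_j ~ Bernoulli(slab density / (slab + spike density)).
        slab = p_incl * np.exp(-0.5 * beta ** 2 / (c * tau) ** 2) / (c * tau)
        spike = (1 - p_incl) * np.exp(-0.5 * beta ** 2 / tau ** 2) / tau
        gamma = rng.binomial(1, slab / (slab + spike))
        # 3) sigma2 | beta, y ~ Inverse-Gamma(a0 + n/2, b0 + RSS/2).
        rss = np.sum((y - X @ beta) ** 2)
        sigma2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / (b0 + rss / 2.0))
        if it >= burn:
            draws.append(gamma)
    return np.mean(draws, axis=0)   # posterior inclusion probabilities

print("posterior inclusion probabilities:", np.round(ssvs_gibbs(X, y), 2))

Predictors with high posterior inclusion probabilities (e.g., above 0.5) would be selected; the paper's sampler and its correlation factor prior may differ from this simplified scheme.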