Generalized ridge estimator shrinkage estimation based on particle swarm optimization algorithm.

Abdul Kareem, Qamar; Algamal, Zakariya

doi:10.33899/iqjoss.2020.167387



IRAQI JOURNAL OF STATISTICAL SCIENCES

Article 4, Volume 17, Issue 2, December 2020, Pages 26-35

Document Type: Research Paper


Authors

Qamar Abdul Kareem*; Zakariya Algamal

Department of Statistics and Informatics, College of Computer Science and Mathematics, University of Mosul, Mosul, Iraq

Abstract

It is well known that, in the presence of multicollinearity, the ridge estimator is an alternative to the ordinary least squares (OLS) estimator. The generalized ridge estimator (GRE) is a generalization of the ridge estimator. However, the efficiency of the GRE depends on appropriately choosing the shrinkage parameter matrix that it involves. In this paper, particle swarm optimization, a metaheuristic algorithm for continuous optimization, is proposed to estimate the shrinkage parameter matrix. The simulation study and real application results show the superior performance of the proposed method in terms of prediction error.

Highlights

In this paper, a new method for selecting the shrinkage parameters of the generalized ridge estimator, based on the particle swarm optimization algorithm, is proposed. The proposed method handles multicollinearity while decreasing the variability of shrinkage parameter selection. Simulation and real-data results demonstrate that the proposed method outperforms several classical methods. Furthermore, the results indicate that the proposed method is more efficient than the method of Hoerl and Kennard (1970).

Keywords

Multicollinearity; shrinkage parameter; generalized ridge estimator; particle swarm

Full Text

1. Introduction

Regression modeling is widely applied to real data problems. In the linear regression model, the response variable is continuous and is reasonably assumed to follow the normal distribution. The model also assumes that the correlations among the explanatory variables are not high (Alheety and Kibria, 2014; Alkhamisi and Shukur, 2007; Dorugade, 2014). However, this assumption does not always hold in practice. The ordinary least squares (OLS) estimator is the best estimator among all linear unbiased estimators, but under multicollinearity it becomes unreliable because of its large variance. The ridge estimator (RE) (Hoerl and Kennard, 1970) has been consistently demonstrated to be an alternative to the OLS estimator when multicollinearity exists. The RE shrinks all the regression coefficients toward zero to reduce the large variance (Asar and Genç, 2015; Algamal, 2018). The generalized ridge estimator (GRE) has also been considered as a generalization of the RE. The performance of the GRE depends entirely on the values of the shrinkage parameter matrix. Accordingly, appropriately choosing the shrinkage parameter matrix is an important part of applying the GRE. Numerous approaches are available in the literature for estimating the shrinkage parameter (Khalaf and Shukur, 2005; Allen, 1974; Muniz and Kibria, 2009; Kibria, 2003; Hamed et al., 2013; Alkhamisi et al., 2006; Hefnawy and Farag, 2014). In recent years, numerous nature-inspired algorithms have been introduced and successfully applied as random search strategies for solving optimization problems. The particle swarm optimization algorithm is a comparatively recent population-based algorithm inspired by swarm intelligence. In this paper, the particle swarm optimization algorithm is proposed to estimate the values of the shrinkage parameter matrix in the GRE. The proposed approach helps to find shrinkage values that give high prediction accuracy. Its superiority is demonstrated in different simulated examples and in a real data application.

2. Generalized ridge estimator

Suppose that we have a data set $\{(y_i, \mathbf{x}_i)\}_{i=1}^{n}$, where $y_i$ is a response variable and $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})^T$ represents a p-dimensional explanatory variable vector. Without loss of generality, it is assumed that the response variable is centered and the explanatory variables are standardized.

Consider the following linear regression model,

$$y = X\beta + \varepsilon \tag{1}$$

where $y$ is an $n \times 1$ vector of observations of the response variable, $X$ is an $n \times p$ known design matrix of explanatory variables, $\beta$ is a $p \times 1$ vector of unknown regression coefficients, and $\varepsilon$ is an $n \times 1$ vector of random errors with mean 0 and variance $\sigma^2$. Using the OLS method, the parameter estimation of Eq. (1) is given by

$$\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y \tag{2}$$

The OLS estimator is unbiased and has minimum variance among all linear unbiased estimators. However, in the presence of multicollinearity, the matrix $X^T X$ is nearly singular, which makes the OLS estimator unstable due to its large variance. To reduce the effects of multicollinearity, the RE (Hoerl and Kennard, 1970), which is the most commonly used method, adds a positive shrinkage parameter, $k$, to the main diagonal of the matrix $X^T X$. The RE is defined as

$$\hat{\beta}_{RE} = (X^T X + kI)^{-1} X^T y \tag{3}$$

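where $I$ is the identity matrix with dimension $p \times p$. The estimator $\hat{\beta}_{RE}$ is biased but more stable and, for a suitable $k$, has a smaller mean square error than the OLS estimator. The shrinkage parameter, $k$, controls the shrinkage of $\hat{\beta}_{RE}$ toward zero. The OLS estimator is the special case of the RE with $k = 0$. For a large value of $k$, the RE yields greater shrinkage approaching zero (Yang and Emura, 2017). Rewriting Eq. (1) in the canonical form introduced by Hoerl and Kennard (1970), we obtain

$$y = Z\alpha + \varepsilon \tag{4}$$

where $Z = XT$, $T$ is a $p \times p$ orthogonal matrix such that $Z^T Z = T^T X^T X T = \Lambda$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ is a diagonal matrix with the eigenvalues of $X^T X$, and $\alpha = T^T \beta$. Then, the OLS estimator of $\alpha$ is given by

$$\hat{\alpha}_{OLS} = \Lambda^{-1} Z^T y \tag{5}$$

Therefore, the OLS estimator of $\beta$ is

$$\hat{\beta}_{OLS} = T \hat{\alpha}_{OLS} \tag{6}$$

The RE given in Eq. (3) is rewritten as (Hoerl and Kennard, 1970)

$$\hat{\alpha}_{RE} = (\Lambda + K)^{-1} Z^T y \tag{7}$$

where $K = \mathrm{diag}(k, \ldots, k) = kI$.

For the estimators in Eq. (3) and Eq. (5), the mean square errors (MSE) are

$$\mathrm{MSE}(\hat{\alpha}_{RE}) = \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + k)^2} + k^2 \sum_{i=1}^{p} \frac{\alpha_i^2}{(\lambda_i + k)^2}, \qquad \mathrm{MSE}(\hat{\alpha}_{OLS}) = \sigma^2 \sum_{i=1}^{p} \frac{1}{\lambda_i} \tag{8}$$

The GRE was suggested by Hoerl and Kennard (1970) to generalize the ridge estimator. Several researchers have studied the GRE (Loesgen, 1990; Trenkler and Toutenburg, 1990; Ohtani, 1995). The difference between the RE and the GRE is that there are $p$ values of $k$ for the GRE, such that (Hoerl and Kennard, 1970)

$$\hat{\alpha}_{GRE} = (\Lambda + K)^{-1} Z^T y \tag{9}$$

where $K = \mathrm{diag}(k_1, \ldots, k_p)$ and $k_i \ge 0$, $i = 1, \ldots, p$. The MSE of the GRE, which a suitable choice of $K$ makes smaller than the MSE of the RE and of the OLS estimator, is

$$\mathrm{MSE}(\hat{\alpha}_{GRE}) = \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + k_i)^2} + \sum_{i=1}^{p} \frac{k_i^2 \alpha_i^2}{(\lambda_i + k_i)^2} \tag{10}$$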

where (Hoerl and Kennard, 1970).
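
As a rough numerical illustration of how the canonical-form MSE behaves, the sketch below evaluates the standard expression $\sum_i (\sigma^2 \lambda_i + k_i^2 \alpha_i^2)/(\lambda_i + k_i)^2$ for three choices of the shrinkage values: all zeros (OLS), one common $k$ (RE), and the component-wise values $k_i = \sigma^2/\alpha_i^2$, which is the well-known minimizer of each term. The eigenvalues, coefficients, and the function name `gre_mse` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def gre_mse(eigenvalues, alpha, sigma2, k):
    """Canonical-form MSE of a (generalized) ridge estimator.

    Component i contributes a variance term sigma2 * lam_i / (lam_i + k_i)^2
    and a squared-bias term k_i^2 * alpha_i^2 / (lam_i + k_i)^2.
    """
    total = 0.0
    for lam, a, ki in zip(eigenvalues, alpha, k):
        total += (sigma2 * lam + ki**2 * a**2) / (lam + ki) ** 2
    return total


# Hypothetical canonical quantities for an ill-conditioned design (assumed values).
eigenvalues = [2.5, 1.0, 0.4, 0.01]
alpha = [0.8, -0.5, 0.3, 0.2]
sigma2 = 1.0
p = len(eigenvalues)

mse_ols = gre_mse(eigenvalues, alpha, sigma2, [0.0] * p)  # k = 0 recovers OLS
mse_re = gre_mse(eigenvalues, alpha, sigma2, [0.5] * p)   # one common k for all components
k_opt = [sigma2 / a**2 for a in alpha]                    # component-wise k_i = sigma^2 / alpha_i^2
mse_gre = gre_mse(eigenvalues, alpha, sigma2, k_opt)

print(f"OLS: {mse_ols:.3f}  RE (k=0.5): {mse_re:.3f}  GRE: {mse_gre:.3f}")
```

The smallest eigenvalue dominates the OLS value, which is why shrinking the corresponding component reduces the MSE so sharply. In practice $\sigma^2$ and $\alpha_i$ are unknown and have to be replaced by estimates, so these values are only a reference point.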

Since the ridge parameter is the key to reducing the effects of multicollinearity, there are several ways to determine its value. Researchers have suggested several ways to choose the optimal $k$, such as (Hocking et al., 1976) (HK), (Nomura, 1988) (N), (Troskie and Chalton, 1996) (TC), (Firinguetti, 1999) (F), (Alkhamisi and Shukur, 2007) (HSL), (Batah et al., 2008) (SB), (Al-Hassan, 2010) (AH), (Dorugade and Kashid, 2010) (DK), (Månsson et al., 2010) (M), (Dorugade, 2014) (D), (Asar et al., 2014) (AS), and (Bhat and Raju, 2017) (SV1, SV2). These methods can be defined as:

(11)

(12)

(13)

(14)

(15)

(16)

(17)

(18)

(19)

(20)

(21)

(22)

(23)

3. The proposed methods

The efficiency of the ridge regression model strongly depends on appropriately choosing the shrinkage parameter. A shrinkage parameter that is too small leaves the GRE prone to overfitting, while one that is too large shrinks the coefficients too much; the choice is therefore a bias-variance tradeoff. Particle swarm optimization (PSO) is a nature-inspired metaheuristic algorithm that was originally proposed by Kennedy and Eberhart (1995) for solving continuous optimization problems.

PSO is inspired by the social or collective behavior of animals, such as bird flocking and fish schooling. Compared with other computational-intelligence algorithms, PSO has several advantages, such as simple implementation, higher computational efficiency, fewer parameters to tune, scalability, flexibility, and robustness. For instance, unlike the genetic algorithm, it has no crossover or mutation operations (Chen et al., 2014; Kiran, 2017; Lin et al., 2008; Lu et al., 2009; Zhou and Dickerson, 2014).

PSO performs the search using a population of particles, which is called a swarm. Each particle has three features: (1) position, (2) velocity, and (3) fitness value. In PSO, the position of each particle represents a candidate solution in the search space. The particles fly through the search space by their own efforts and in cooperation with other particles: each particle follows the best solution it has achieved itself (its local best solution) while also tracking the best solution found by the whole swarm (the global best solution) (Cervantes et al., 2017; Lai et al., 2016; Mirjalili and Lewis, 2013; Wen et al., 2011).

Mathematically, the search space is assumed to be D-dimensional and there are $N$ particles in the swarm, indexed by $i = 1, \ldots, N$. During the movement, each particle has a position vector $x_i = (x_{i1}, \ldots, x_{iD})$ with a velocity vector $v_i = (v_{i1}, \ldots, v_{iD})$. In the PSO algorithm, the best position, which gives the best fitness value for particle $i$, is called the best previous position and is denoted as $pbest_i$. The best position found by all particles is called the global best, which is denoted as $gbest$. In each iteration, the PSO algorithm searches for the optimal solution by updating the velocity and the position of each particle according to the following two equations:

$$v_i^{t+1} = w\, v_i^{t} + c_1 r_1 \left(pbest_i - x_i^{t}\right) + c_2 r_2 \left(gbest - x_i^{t}\right) \tag{24}$$

$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \tag{25}$$

where $t$ denotes the iteration in the algorithm, and $w$ is the inertia weight, which is used to balance the global search and the local search. In addition, $c_1$ (the cognition learning factor) and $c_2$ (the social learning factor) are the acceleration coefficients, and $r_1$ and $r_2$ are random values drawn from the uniform distribution on (0, 1). In this paper, a PSO algorithm is proposed to determine the shrinkage parameter matrix. The proposed method helps to reduce the MSE. The parameter configurations for our proposed method are presented as follows.

1- The number of particles, , is set to 50, and the number of iterations is . The acceleration coefficients and are set within the range [2, 4]. The and are updated during the iteration according to the following equations:

(26)

(27)

Further, the minimum and maximum values for the inertial weight are and . The inertial weight is updated according to the following equation:

(28)

2- The positions of each particle are randomly determined. The position of a particle represents the shrinkage parameters, $k_1, \ldots, k_p$. Here, the dimension of each particle is the number of explanatory variables. The initial positions of the particles are generated from a uniform distribution within the range [0, 100].

3- The initial velocities of each particle are generated from a uniform distribution within the range [0, 4].

4- The fitness function, which is the MSE given in Eq. (10), is calculated for each particle.

5- The velocities and positions are updated using Eq. (24) and Eq. (25), respectively.

6- Steps 4 and 5 are repeated until the maximum number of iterations is reached.
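
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The fitness is the standard canonical-form MSE of the GRE. The acceleration coefficients are held constant at 2 instead of being updated over the iterations. The inertia weight decreases linearly from 0.9 to 0.4. The swarm runs for 100 iterations, and positions are clipped at zero so that every $k_i$ stays non-negative. All of these settings, the function names, and the test eigenvalues and coefficients are assumptions made for this sketch.

```python
import random


def gre_mse(k, eigenvalues, alpha, sigma2):
    """Fitness: canonical-form MSE of the generalized ridge estimator."""
    return sum((sigma2 * lam + ki**2 * a**2) / (lam + ki) ** 2
               for lam, a, ki in zip(eigenvalues, alpha, k))


def pso_shrinkage(eigenvalues, alpha, sigma2, n_particles=50, n_iter=100,
                  c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, seed=0):
    rng = random.Random(seed)
    dim = len(eigenvalues)

    def fitness(k):
        return gre_mse(k, eigenvalues, alpha, sigma2)

    # Steps 2-3: uniform initial positions in [0, 100] and velocities in [0, 4].
    pos = [[rng.uniform(0.0, 100.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(0.0, 4.0) for _ in range(dim)] for _ in range(n_particles)]

    # Step 4: initial fitness, personal bests, and the global best.
    pbest = [particle[:] for particle in pos]
    pbest_val = [fitness(particle) for particle in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(n_iter):
        # Assumed schedule: inertia weight decreases linearly from w_max to w_min.
        w = w_max - (w_max - w_min) * t / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Step 5: velocity update, then position update.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clipping at zero keeps each k_i non-negative (an assumption).
                pos[i][d] = max(0.0, pos[i][d] + vel[i][d])
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical canonical quantities (assumed values, not from the paper).
eigenvalues = [2.5, 1.0, 0.4, 0.01]
alpha = [0.8, -0.5, 0.3, 0.2]
k_best, mse_best = pso_shrinkage(eigenvalues, alpha, sigma2=1.0)
mse_ols = gre_mse([0.0] * 4, eigenvalues, alpha, 1.0)
print("k:", [round(v, 2) for v in k_best])
print("MSE (PSO):", round(mse_best, 4), " MSE (OLS):", round(mse_ols, 4))
```

With real data the true $\alpha$ and $\sigma^2$ are unknown, so the fitness would have to be evaluated with estimates of them; that step is left out of the sketch.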

4. Monte Carlo simulation results

In this section, a comprehensive simulation study was conducted to evaluate the performance of the proposed methods (Alheety and Kibria, 2014; Alkhamisi and Shukur, 2007; Hoerl and Kennard, 1970; Yang and Emura, 2017; Hocking, Speed and Lynn, 1976; Troskie and Chalton, 1996; Firinguetti, 1999; Batah, Ramanathan and Gore, 2008; Al-Hassan, 2010; Dorugade and Kashid, 2010; Månsson, Shukur and Golam Kibria, 2010; Asar, Karaibrahimoğlu and Genç, 2014; Bhat and Raju, 2017; Bhat et al., 2016; Bhat, 2016; Hocking, 1976). Following McDonald and Galarneau (1975), the explanatory variables with different degrees of multicollinearity are generated by

$$x_{ij} = (1 - \rho^2)^{1/2} w_{ij} + \rho\, w_{i,p+1}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p \tag{29}$$

where $\rho$ controls the correlation between the explanatory variables (any two of them have correlation $\rho^2$), and $w_{ij}$ are independent standard normal pseudo-random numbers. The response variable is generated by

$$y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, \ldots, n \tag{30}$$

where are independent and identically normal distributed pseudo-random numbers with zero mean and variance . Because the sample size has direct impact on the prediction accuracy, three representative values of the sample size are considered: 30, 50 and 150. In addition, the number of explanatory variables is considered as . Further, because we are interested in the effect of multicollinearity, in which the degrees of correlation are considered more important, three values of the pairwise correlation are considered with . Besides, the value of is 1.
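
The data-generating scheme of McDonald and Galarneau (1975) can be sketched as follows: each explanatory variable is built as $x_{ij} = (1 - \rho^2)^{1/2} w_{ij} + \rho\, w_{i,p+1}$ from independent standard normal draws, and the response is a linear combination of the explanatory variables plus normal noise. The specific values of `n`, `p`, `rho`, the coefficient vector, and the helper names are assumptions made for illustration. A large `n` is used here only so that the sample correlation settles near its theoretical value $\rho^2$.

```python
import math
import random


def generate_design(n, p, rho, rng):
    """Return an n x p list of rows with x_ij = sqrt(1 - rho^2) * w_ij + rho * w_i,p+1."""
    scale = math.sqrt(1.0 - rho**2)
    rows = []
    for _ in range(n):
        w = [rng.gauss(0.0, 1.0) for _ in range(p + 1)]  # w_i1, ..., w_i,p+1
        rows.append([scale * w[j] + rho * w[p] for j in range(p)])
    return rows


def generate_response(x, beta, sigma, rng):
    """Return y_i = sum_j beta_j * x_ij + eps_i with eps_i ~ N(0, sigma^2)."""
    return [sum(b * xij for b, xij in zip(beta, row)) + rng.gauss(0.0, sigma)
            for row in x]


def sample_corr(u, v):
    """Pearson correlation of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u)
    sv = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(su * sv)


rng = random.Random(1)
n, p, rho = 5000, 4, 0.95            # hypothetical settings for the illustration
x = generate_design(n, p, rho, rng)
beta = [1.0 / math.sqrt(p)] * p      # assumed unit-length coefficient vector
y = generate_response(x, beta, 1.0, rng)

corr01 = sample_corr([row[0] for row in x], [row[1] for row in x])
print("sample correlation:", round(corr01, 3), " theoretical rho^2:", round(rho**2, 4))
```

Because every column shares the common component $w_{i,p+1}$, pushing $\rho$ toward 1 makes all explanatory variables nearly identical, which is what produces the near-singular $X^T X$ studied in the simulation.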

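For each combination of the different values of $n$, $p$, and $\rho$, the data generation is repeated 5000 times and the averaged mean squared error (MSE) is calculated as

$$\mathrm{MSE}(\hat{\beta}) = \frac{1}{5000} \sum_{r=1}^{5000} (\hat{\beta}_r - \beta)^T (\hat{\beta}_r - \beta) \tag{31}$$

where $\hat{\beta}_r$ is the ridge estimator obtained in the $r$-th replication by the given selection method. In addition, the bias is calculated as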

(32)

The MSE and bias values from the Monte Carlo simulation study are reported in Tables 1-3 and Tables 4-6, respectively. Several results can be obtained as follows:

1- The simulation results indicate that the PSO method of selecting $k$ is superior to the other selection methods for all combinations of $n$, $p$, and $\rho$ in terms of MSE, yielding the smallest MSE in every case.

2- It is seen from Tables 1-3 that the estimator using the PSO method is more efficient than the OLS estimator for all values of $n$ and $p$ when multicollinearity is high or severe.

3- In terms of $\rho$ values, the MSE values increase when the degree of correlation increases, regardless of the values of $n$ and $p$.

4- Regarding the number of explanatory variables, there is a negative impact on the MSE: its values increase when $p$ increases from four to twelve explanatory variables.

5- With respect to the value of $n$, the MSE values decrease when $n$ increases, regardless of the values of $\rho$ and $p$.

6- All the selection methods of $k$ are superior to the OLS estimator in terms of MSE.

7- According to the bias results, the proposed method yields the smallest bias among the estimation methods considered.

Table 1: Average MSE when n=30

Table 2: Average MSE when n=50

Table 3: Average MSE when n=150

Table 4: Average bias when n=30

Table 5: Average bias when n=50

Table 6: Average bias when n=150

5. Real application results

To evaluate the predictive performance of the proposed method and to compare it with the other methods in a real data application, the Portland cement dataset is employed. The Portland cement dataset has become a standard dataset for examining and remedying multicollinearity (Woods et al., 1932; Chen and Emura, 2017), and it has been widely used by numerous researchers. This dataset comes from an experimental investigation of the heat evolved during the setting and hardening of Portland cements of varied composition and the dependence of this heat on the percentages of four compounds in the clinkers from which the cement was produced. There are 13 observations of heat evolved in calories per gram of cement ($y$), tricalcium aluminate ($x_1$), tricalcium silicate ($x_2$), tetracalcium alumino ferrite ($x_3$), and dicalcium silicate ($x_4$). Before fitting the linear regression model, the explanatory variables and the response variable are standardized. Then, the eigenvalues of the matrix $X^T X$ are calculated, and the resulting condition number indicates that multicollinearity exists. As a result, using the RE and the GRE is more suitable than using OLS. The predictive performance of each method is computed using the predicted MSE (PMSE), and the results are given in Table 7. It is apparent from Table 7 that PSO improves the predictive capability compared with the other methods, reducing the PMSE in every comparison. The reduction in PMSE using PSO was 11.350%, 10.892%, 10.625%, 10.552%, 10.007%, 10.888%, 9.842%, 11.101%, 9.843%, 10.969%, 9.945%, 10.056%, 10.431%, and 10.213% compared with OLS, HK, KN, TC, f, HSL, AH, D, SB, DK, SV1, SV2, AS, and M, respectively.

Table 7: Real application results for the used methods

Method	PMSE
OLS	9303.049
PSO	8247.127
HK	9255.305
KN	9227.561
TC	9220.055
f	9164.227
HSL	9254.874
AH	9147.469
D	9277.002
SB	9147.553
DK	9263.284
SV1	9157.882
SV2	9169.254
AS	9207.598
M	9185.278
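
The percentage reductions quoted in the text can be recomputed directly from the PMSE values in Table 7 as $100 \times (\mathrm{PMSE}_{method} - \mathrm{PMSE}_{PSO}) / \mathrm{PMSE}_{method}$. The short script below is only an arithmetic check of the table; the variable names are illustrative.

```python
# PMSE values copied from Table 7.
pmse = {
    "OLS": 9303.049, "HK": 9255.305, "KN": 9227.561, "TC": 9220.055,
    "f": 9164.227, "HSL": 9254.874, "AH": 9147.469, "D": 9277.002,
    "SB": 9147.553, "DK": 9263.284, "SV1": 9157.882, "SV2": 9169.254,
    "AS": 9207.598, "M": 9185.278,
}
pmse_pso = 8247.127

# Relative reduction (in percent) achieved by PSO against each competing method.
reduction = {name: 100.0 * (value - pmse_pso) / value for name, value in pmse.items()}

for name, pct in reduction.items():
    print(f"{name:>4}: {pct:.3f}%")
```

The recomputed values agree with the percentages reported in the text to within rounding in the third decimal place.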

6. Conclusion

References

Alheety M and Kibria BG. A generalized stochastic restricted ridge regression estimator. Communications in Statistics-Theory and Methods. 2014; 43: 4415-4427.
Alkhamisi MA and Shukur G. A Monte Carlo study of recent ridge parameters. Communications in Statistics - Simulation and Computation. 2007; 36: 535-547.
Dorugade A. New ridge parameters for ridge regression. Journal of the Association of Arab Universities for Basic and Applied Sciences. 2014; 15: 94-99.
Hoerl AE and Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970; 12: 55-67.
Asar Y and Genç A. New shrinkage parameters for the Liu-type logistic estimators. Communications in Statistics - Simulation and Computation. 2015; 45: 1094-1103.
Algamal ZY. Shrinkage parameter selection via modified cross-validation approach for ridge regression model. Communications in Statistics - Simulation and Computation. 2018: 1-9.
Khalaf G and Shukur G. Choosing ridge parameter for regression problems. Communications in Statistics - Theory and Methods. 2005; 34: 1177-1182.
Allen DM. The relationship between variable selection and data augmentation and a method for prediction. Technometrics. 1974; 16: 125-127.
Muniz G and Kibria BMG. On some ridge regression estimators: An empirical comparisons. Communications in Statistics - Simulation and Computation. 2009; 38: 621-630.
Kibria BMG. Performance of some new ridge regression estimators. Communications in Statistics - Simulation and Computation. 2003; 32: 419-435.
Hamed R, Hefnawy AEL and Farag A. Selection of the ridge parameter using mathematical programming. Communications in Statistics - Simulation and Computation. 2013; 42: 1409-1432.
Alkhamisi M, Khalaf G and Shukur G. Some modifications for choosing ridge parameters. Communications in Statistics - Theory and Methods. 2006; 35: 2005-2020.
Hefnawy AE and Farag A. A combined nonlinear programming model and Kibria method for choosing ridge parameter regression. Communications in Statistics - Simulation and Computation. 2014; 43: 1442-1470.
Yang S-P and Emura T. A Bayesian approach with generalized ridge estimation for high-dimensional regression and testing. Communications in Statistics-Simulation and Computation. 2017; 46: 6083-6105.
Loesgen K. A generalization and Bayesian interpretation of ridge-type estimators with good prior means. Statistical Papers. 1990; 31: 147.
Trenkler G and Toutenburg H. Mean squared error matrix comparisons between biased estimators—An overview of recent results. Statistical Papers. 1990; 31: 165.
Ohtani K. Generalized ridge regression estimators under the LINEX loss function. Statistical Papers. 1995; 36: 99-110.
Hocking RR, Speed F and Lynn M. A class of biased estimators in linear regression. Technometrics. 1976; 18: 425-437.
Nomura M. On the almost unbiased ridge regression estimator. Communications in Statistics-Simulation and Computation. 1988; 17: 729-743.
Troskie C and Chalton D. Multidimensional statistical analysis and theory of random matrices, Proceedings of the Sixth Lukacs Symposium, eds. Gupta, AK and VL Girko, 1996, pp 273-292.
Firinguetti L. A generalized ridge regression estimator and its finite sample properties. Communications in Statistics-Theory and Methods. 1999; 28: 1217-1229.
Batah FSM, Ramanathan TV and Gore SD. The efficiency of modified jackknife and ridge type regression estimators: a comparison. Surveys in Mathematics & its Applications. 2008; 3.
Al-Hassan YM. Performance of a new ridge regression estimator. Journal of the Association of Arab Universities for Basic and Applied Sciences. 2010; 9: 23-26.
Dorugade A and Kashid D. Alternative method for choosing ridge parameter for regression. Applied Mathematical Sciences. 2010; 4: 447-456.
Månsson K, Shukur G and Golam Kibria B. A simulation study of some ridge regression estimators under different distributional assumptions. Communications in Statistics-Simulation and Computation. 2010; 39: 1639-1670.
Asar Y, Karaibrahimoğlu A and Genç A. Modified ridge regression parameters: A comparative Monte Carlo study. Hacettepe Journal of Mathematics and Statistics. 2014; 43: 827-841.
Bhat S and Raju V. A class of generalized ridge estimators. Communications in Statistics-Simulation and Computation. 2017; 46: 5105-5112.
Kennedy J and Eberhart RC. Particle swarm optimization. Proceedings of IEEE Conference on Neural Network. 1995; 4: 1942–1948.
Chen K-H, Wang K-J, Wang K-M and Angelia M-A. Applying particle swarm optimization-based decision tree classifier for cancer classification on gene expression data. Applied Soft Computing. 2014; 24: 773-780.
Kiran MS. Particle swarm optimization with a new update mechanism. Applied Soft Computing. 2017; 60: 670-678.
Lin S-W, Ying K-C, Chen S-C and Lee Z-J. Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Systems with Applications. 2008; 35: 1817-1824.
Lu Y, Wang S, Li S and Zhou C. Particle swarm optimizer for variable weighting in clustering high-dimensional data. Machine Learning. 2009; 82: 43-70.
Zhou W and Dickerson JA. A novel class dependent feature selection method for cancer biomarker discovery. Computers in Biology and Medicine. 2014; 47: 66-75.
Cervantes J, Garcia-Lamont F, Rodriguez L, López A, Castilla JR and Trueba A. PSO-based method for SVM classification on skewed data sets. Neurocomputing. 2017; 228: 187-197.
Lai C-M, Yeh W-C and Chang C-Y. Gene selection using information gain and improved simplified swarm optimization. Neurocomputing. 2016; 218: 331-338.
Mirjalili S and Lewis A. S-shaped versus V-shaped transfer functions for binary Particle Swarm Optimization. Swarm and Evolutionary Computation. 2013; 9: 1-14.
Wen JH, Zhong KJ, Tang LJ, Jiang JH, Wu HL, Shen GL and Yu RQ. Adaptive variable-weighted support vector machine as optimized by particle swarm optimization algorithm with application of QSAR studies. Talanta. 2011; 84: 13-18.
Bhat S, Vidya R and Parameshwar VP. Maximum Likelihood Estimation of Parameters in a Mixture Model. Communications in Statistics-Simulation and Computation. 2016; 45: 1776-1784.
Bhat SS. A comparative study on the performance of new ridge estimators. Pakistan Journal of Statistics and Operation Research. 2016; 12: 317-325.
Hocking RR. A Biometrics invited paper. The analysis and selection of variables in linear regression. Biometrics. 1976; 32: 1-49.
McDonald GC and Galarneau DI. A Monte Carlo evaluation of some ridge-type estimators. Journal of the American Statistical Association. 1975; 70: 407-416.
Woods H, Steinour HH and Starke HR. Effect of composition of Portland cement on heat evolved during hardening. Industrial & Engineering Chemistry. 1932; 24: 1207-1214.
Chen A-C and Emura T. A modified Liu-type estimator with an intercept term under mixture experiments. Communications in Statistics-Theory and Methods. 2017; 46: 6645-6667.
