Generalized ridge estimator shrinkage estimation based on particle swarm optimization algorithm.
IRAQI JOURNAL OF STATISTICAL SCIENCES
Article 4, Volume 17, Issue 2, December 2020, Pages 26-35
Document Type: Research Paper
DOI: 10.33899/iqjoss.2020.167387
Authors
Qamar Abdul kareem* ; Zakariya Algamal
Department of Statistics and Informatics, College of Computer Science and Mathematics, University of Mosul, Mosul, Iraq
Abstract
It is well known that, in the presence of multicollinearity, the ridge estimator is an alternative to the ordinary least squares (OLS) estimator. The generalized ridge estimator (GRE) is a generalization of the ridge estimator. However, the efficiency of the GRE depends on an appropriate choice of the shrinkage parameter matrix involved in the GRE. In this paper, a particle swarm optimization method, which is a metaheuristic algorithm for continuous optimization, is proposed to estimate the shrinkage parameter matrix. The simulation study and real application results show the superior performance of the proposed method in terms of prediction error.
Highlights
In this paper, a new method for selecting the shrinkage parameters of the generalized ridge estimator, based on the particle swarm optimization algorithm, was proposed. The proposed method handles multicollinearity while decreasing the variability of shrinkage parameter selection. The simulation and real application results demonstrate that the proposed method outperformed several classical methods. Furthermore, the results indicate that the proposed method is more efficient than the method of Hoerl and Kennard (1970).
Keywords
Multicollinearity; shrinkage parameter; generalized ridge estimator; particle swarm
Full Text
Regression modeling is a widely applied strategy for studying real data problems. In the linear regression model, the response variable is continuous and is reasonably assumed to follow the normal distribution. It is also assumed that the correlations among the explanatory variables are not high (Alheety and Kibria, 2014; Alkhamisi and Shukur, 2007; Dorugade, 2014). However, this assumption does not always hold in practice.

In the linear regression model, the ordinary least squares (OLS) estimator is the best linear unbiased estimator. Under multicollinearity, however, the OLS estimator becomes unreliable because of its large variance. The ridge estimator (RE) (Hoerl and Kennard, 1970) has been consistently demonstrated to be an alternative to the OLS estimator when multicollinearity exists. The RE shrinks all the regression coefficients toward zero to reduce the large variance (Asar and Genç, 2015; Algamal, 2018). The generalized ridge estimator (GRE) has also been considered as a generalization of the RE. The performance of the GRE depends fully on the values of the shrinkage parameter matrix. Accordingly, choosing the shrinkage parameter matrix appropriately is an important part of applying the GRE. Numerous approaches are available in the literature for estimating the shrinkage parameter (Khalaf and Shukur, 2005; Allen, 1974; Muniz and Kibria, 2009; Kibria, 2003; Hamed et al., 2013; Alkhamisi et al., 2006; Hefnawy and Farag, 2014).

In recent years, numerous nature-inspired algorithms have been introduced and applied successfully as random search strategies for solving optimization problems. The particle swarm optimization algorithm is a comparatively recent population-based algorithm inspired by swarm intelligence. In this paper, the particle swarm optimization algorithm is proposed to estimate the values of the shrinkage parameter matrix in the GRE. The proposed approach helps to find values that give high prediction accuracy. Its superiority is demonstrated in different simulated examples and a real data application.
2. Generalized ridge estimator
Suppose that we have a data set $\{(y_i, \mathbf{x}_i)\}_{i=1}^{n}$, where $y_i$ is a response variable and $\mathbf{x}_i$ represents a $p$-dimensional explanatory variable vector. Without loss of generality, it is assumed that the response variable is centered and the explanatory variables are standardized. Consider the following linear regression model,

$$y = X\beta + \varepsilon, \tag{1}$$

where $y$ is an $n \times 1$ vector of observations of the response variable, $X$ is an $n \times p$ known design matrix of explanatory variables, $\beta$ is a $p \times 1$ vector of unknown regression coefficients, and $\varepsilon$ is an $n \times 1$ vector of random errors with mean 0 and variance $\sigma^2$. Using the OLS method, the parameter estimate of Eq. (1) is given by

$$\hat{\beta}_{OLS} = (X^\top X)^{-1} X^\top y. \tag{2}$$

The OLS estimator is unbiased and has minimum variance among all linear unbiased estimators. However, in the presence of multicollinearity, the matrix $X^\top X$ is nearly singular, which makes the OLS estimator unstable because of its large variance. To reduce the effects of multicollinearity, the RE (Hoerl and Kennard, 1970), which is the most commonly used method, adds a positive shrinkage parameter, $k$, to the main diagonal of the $X^\top X$ matrix. The RE is defined as

$$\hat{\beta}_{RE} = (X^\top X + kI)^{-1} X^\top y, \tag{3}$$

where $I$ is the identity matrix with dimension $p \times p$. The estimator $\hat{\beta}_{RE}$ is biased but more stable and has a smaller mean square error. The shrinkage parameter, $k$, controls the shrinkage of $\beta$ toward zero. The OLS estimator can be considered a special case of the RE with $k = 0$. For a large value of $k$, the RE yields greater shrinkage, approaching zero (Yang and Emura, 2017).

Rewriting Eq. (1) in the canonical form introduced by [4], we obtain

$$y = Z\alpha + \varepsilon, \tag{4}$$

where $Z = XQ$ and $\alpha = Q^\top \beta$. Here $Q$ is an orthogonal matrix such that $Z^\top Z = Q^\top X^\top X Q = \Lambda$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$ is a diagonal matrix of the eigenvalues of $X^\top X$. Then, the OLS estimator of $\alpha$ is given by

$$\hat{\alpha}_{OLS} = \Lambda^{-1} Z^\top y. \tag{5}$$

Therefore, the OLS estimator of $\beta$ is

$$\hat{\beta}_{OLS} = Q \hat{\alpha}_{OLS}. \tag{6}$$

The RE given in Eq. (3) is rewritten in canonical form as (Hoerl and Kennard, 1970)

$$\hat{\alpha}_{RE} = (\Lambda + kI)^{-1} Z^\top y, \tag{7}$$
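The relation between the ridge estimator in its original form and in the canonical form can be checked numerically. The following is only a minimal sketch under stated assumptions: the function names `ridge_direct` and `ridge_canonical`, the toy collinear design, and the value $k = 1$ are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def ridge_direct(X, y, k):
    """Ridge estimator in the original coordinates: (X'X + kI)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def ridge_canonical(X, y, k):
    """Ridge estimator computed in canonical form, then mapped back via beta = Q alpha."""
    lam, Q = np.linalg.eigh(X.T @ X)    # eigenvalues and orthonormal eigenvectors of X'X
    Z = X @ Q                           # canonical regressors, so Z'Z = diag(lam)
    alpha_hat = (Z.T @ y) / (lam + k)   # (Lambda + kI)^{-1} Z'y, computed elementwise
    return Q @ alpha_hat

# Toy data (assumed): four highly correlated, standardized explanatory variables.
rng = np.random.default_rng(0)
n, p = 30, 4
common = rng.standard_normal((n, 1))
X = 0.1 * rng.standard_normal((n, p)) + common
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.ones(p) + rng.standard_normal(n)
y = y - y.mean()

beta_ols = ridge_direct(X, y, 0.0)   # k = 0 reproduces the OLS estimator
beta_re = ridge_direct(X, y, 1.0)    # k = 1 is an arbitrary illustrative value
print(np.linalg.norm(beta_ols), np.linalg.norm(beta_re))
```

Both routes give the same coefficients, and the ridge coefficient vector has a smaller norm than the OLS one, which illustrates the shrinkage toward zero described in the text.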
where $k \geq 0$. Related to Eq. (3) and Eq. (5), the mean square error (MSE) of the RE is

$$\mathrm{MSE}(\hat{\alpha}_{RE}) = \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + k)^2} + k^2 \sum_{i=1}^{p} \frac{\alpha_i^2}{(\lambda_i + k)^2}, \tag{8}$$

where $\lambda_i$ is the $i$-th eigenvalue of $X^\top X$ and $\alpha_i$ is the $i$-th element of $\alpha$.

The GRE is suggested by [4] to generalize the ridge estimator. Several researchers have dealt with the GRE (Loesgen, 1990; Trenkler and Toutenburg, 1990; Ohtani, 1995). The difference between the RE and the GRE is that there are $p$ values of $k$ for the GRE, such that (Hoerl and Kennard, 1970)

$$\hat{\alpha}_{GRE} = (\Lambda + K)^{-1} Z^\top y, \tag{9}$$

where $K = \mathrm{diag}(k_1, k_2, \ldots, k_p)$ and $k_i \geq 0$, $i = 1, 2, \ldots, p$. The MSE of the GRE, which is less than that of the RE and OLS, is

$$\mathrm{MSE}(\hat{\alpha}_{GRE}) = \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + k_i)^2} + \sum_{i=1}^{p} \frac{k_i^2 \alpha_i^2}{(\lambda_i + k_i)^2}, \tag{10}$$

where the minimum is attained at $k_i = \sigma^2 / \alpha_i^2$ (Hoerl and Kennard, 1970).

Since the shrinkage parameter is the key to reducing the effect of multicollinearity, researchers have suggested several ways to choose the optimal $k$, such as (Hocking et al., 1976) (HK), (Nomura, 1988) (N), (Troskie and Chalton, 1996) (TC), (Firinguetti, 1999) (F), (Alkhamisi and Shukur, 2007) (HSL), (Batah et al., 2008) (SB), (Al-Hassan, 2010) (AH), (Dorugade and Kashid, 2010) (DK), (Månsson et al., 2010) (M), (Dorugade, 2014) (D), (Asar et al., 2014) (AS) and (Bhat and Raju, 2017) (SV1, SV2). These methods are defined by Eqs. (11) to (23), respectively.

3. The proposed methods
The efficiency of the ridge regression model depends strongly on choosing the shrinkage parameter appropriately. A shrinkage parameter that is too small leads to overfitting, while a shrinkage parameter that is too large shrinks the coefficients too much; the choice therefore involves a bias-variance tradeoff. Particle swarm optimization (PSO) is a nature-inspired metaheuristic algorithm that was originally proposed by Kennedy and Eberhart (1995) for solving continuous optimization problems. PSO is inspired by the social or collective behavior of animals, such as bird flocking and fish schooling. Compared with other computational intelligence-based algorithms, PSO has several advantages, such as simple implementation, higher computational efficiency, fewer parameters to tune, scalability, flexibility, and robustness.
For instance, unlike the genetic algorithm, PSO has no crossover or mutation operations (Chen et al., 2014; Kiran, 2017; Lin et al., 2008; Lu et al., 2009; Zhou and Dickerson, 2014). PSO performs the search using a population of particles, which is called a swarm. Each particle has three features: (1) position, (2) velocity, and (3) fitness value. In PSO, each particle represents a candidate solution (position) in the search space. The particles fly through the search space by their own efforts and in cooperation with other particles: each follows the best solution it has achieved itself (its local best solution) while also tracking the best solution found by the whole swarm (the global best solution) (Cervantes et al., 2017; Lai et al., 2016; Mirjalili and Lewis, 2013; Wen et al., 2011).

Mathematically, the search space is assumed to be $D$-dimensional and there are $m$ particles in the swarm, indexed by $i = 1, 2, \ldots, m$. During the movement, each particle has a position vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ with a velocity vector $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$. In the PSO algorithm, the position that gives the best fitness value for particle $i$ is called its best previous position and is denoted as $p_i$. The best position found by all particles is called the global best, which is denoted as $g$. In each iteration, the PSO algorithm searches for the optimal solution by updating the velocity and the position of each particle according to the following two equations:

$$v_i^{t+1} = w\, v_i^{t} + c_1 r_1 \left(p_i^{t} - x_i^{t}\right) + c_2 r_2 \left(g^{t} - x_i^{t}\right), \tag{24}$$

$$x_i^{t+1} = x_i^{t} + v_i^{t+1}, \tag{25}$$

where $t$ denotes the iteration in the algorithm, and $w$ is the inertia weight, which is used to balance the global search and the local search. In addition, $c_1$ (the cognition learning factor) and $c_2$ (the social learning factor) are the acceleration coefficients, and $r_1$ and $r_2$ are random values drawn from a uniform distribution on (0,1).

In this paper, a PSO algorithm is proposed to determine the shrinkage parameter matrix. The proposed method helps to reduce the MSE. The parameter configurations for the proposed method are presented as follows.
1- The number of particles, $m$, is set to 50, and the maximum number of iterations is $t_{\max}$. The acceleration coefficients $c_1$ and $c_2$ are set within the range [2, 4] and are updated during the iterations according to Eqs. (26) and (27). Further, the inertia weight is bounded by its minimum and maximum values, $w_{\min}$ and $w_{\max}$, and is updated according to Eq. (28).
2- The positions of each particle are randomly determined. The position of a particle represents the shrinkage parameters, $k_1, k_2, \ldots, k_p$. Here, the dimension of each particle is the number of explanatory variables. The initial positions of the particles are generated from a uniform distribution within the range [0, 100].
3- The initial velocities of each particle are generated from a uniform distribution within the range [0, 4].
4- The fitness function, which is the MSE given in Eq. (10), is calculated.
5- The velocities and positions are updated using Eq. (24) and Eq. (25), respectively.
6- Steps 4 and 5 are repeated until $t_{\max}$ is reached.
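The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `gre_mse` and `pso_shrinkage` are hypothetical, the fitness takes the eigenvalues, the canonical coefficients and the error variance as given inputs (in practice these would have to be replaced by estimates), the inertia weight is assumed to decrease linearly from 0.9 to 0.4, the acceleration coefficients are held constant at 2 rather than updated over the iterations, and a velocity clamp and a clip of positions to [0, 100] are added for numerical stability.

```python
import numpy as np

def gre_mse(k, lam, alpha, sigma2):
    """MSE of the generalized ridge estimator (Eq. (10)) for one or many vectors k."""
    k = np.atleast_2d(k)
    return np.sum((sigma2 * lam + k**2 * alpha**2) / (lam + k) ** 2, axis=1)

def pso_shrinkage(lam, alpha, sigma2, n_particles=50, n_iter=200, k_max=100.0,
                  v_init=4.0, v_clamp=10.0, w_max=0.9, w_min=0.4,
                  c1=2.0, c2=2.0, seed=0):
    """Search for k_1..k_p minimizing gre_mse with a basic PSO (assumed settings)."""
    rng = np.random.default_rng(seed)
    dim = lam.size
    pos = rng.uniform(0.0, k_max, size=(n_particles, dim))    # step 2: positions in [0, 100]
    vel = rng.uniform(0.0, v_init, size=(n_particles, dim))   # step 3: velocities in [0, 4]
    pbest = pos.copy()
    pbest_val = gre_mse(pos, lam, alpha, sigma2)               # step 4: fitness
    best = np.argmin(pbest_val)
    gbest, gbest_val = pbest[best].copy(), pbest_val[best]
    for t in range(n_iter):
        # Assumed linear decrease of the inertia weight from w_max to w_min.
        w = w_max - (w_max - w_min) * t / max(n_iter - 1, 1)
        r1 = rng.uniform(size=(n_particles, dim))
        r2 = rng.uniform(size=(n_particles, dim))
        # Step 5: velocity update, then position update.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -v_clamp, v_clamp)    # assumed clamp, not stated in the text
        pos = np.clip(pos + vel, 0.0, k_max)     # keep the shrinkage parameters in [0, k_max]
        val = gre_mse(pos, lam, alpha, sigma2)
        improved = val < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = val[improved]
        best = np.argmin(pbest_val)
        if pbest_val[best] < gbest_val:
            gbest, gbest_val = pbest[best].copy(), pbest_val[best]
    return gbest, float(gbest_val)

# Illustrative inputs (assumed): small eigenvalues mimic a multicollinear design.
lam = np.array([3.0, 0.9, 0.09, 0.01])
alpha = np.array([1.0, 0.5, 0.8, 0.4])
k_best, mse_best = pso_shrinkage(lam, alpha, sigma2=1.0)
mse_ols = float(gre_mse(np.zeros(4), lam, alpha, 1.0)[0])   # k = 0 gives the OLS MSE
print(k_best, mse_best, mse_ols)
```

Because each term of the fitness depends on a single $k_i$, the result can be compared against the closed-form minimizer $k_i = \sigma^2/\alpha_i^2$ as a sanity check on the search.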
In this section, a comprehensive simulation study is conducted to evaluate the performance of the proposed methods (Alheety and Kibria, 2014; Alkhamisi and Shukur, 2007; Hoerl and Kennard, 1970; Yang and Emura, 2017; Hocking, Speed and Lynn, 1976; Troskie and Chalton, 1996; Firinguetti, 1999; Batah, Ramanathan and Gore, 2008; Al-Hassan, 2010; Dorugade and Kashid, 2010; Månsson, Shukur and Golam Kibria, 2010; Asar, Karaibrahimoğlu and Genç, 2014; Bhat and Raju, 2017; Bhat et al., 2016; Bhat, 2016; Hocking, 1976). Following McDonald and Galarneau (1975), the explanatory variables with different degrees of multicollinearity are generated by

$$x_{ij} = (1 - \rho^2)^{1/2} w_{ij} + \rho\, w_{i,p+1}, \quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, p, \tag{29}$$

where $\rho$ represents the correlation between the explanatory variables, and $w_{ij}$ are independent standard normal pseudo-random numbers. The response variable is generated by

$$y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{30}$$

where $\varepsilon_i$ are independent and identically normally distributed pseudo-random numbers with zero mean and variance $\sigma^2$.

Because the sample size has a direct impact on the prediction accuracy, three representative values of the sample size are considered: 30, 50 and 150. In addition, the number of explanatory variables, $p$, is varied between four and twelve. Further, because we are interested in the effect of multicollinearity, for which the degree of correlation matters most, three values of the pairwise correlation $\rho$ are considered. The value of $\sigma^2$ is 1. For each combination of these values of $n$, $p$, and $\rho$, the data generation is repeated 5000 times and the averaged mean squared error (MSE) is calculated as

$$\mathrm{MSE}(\hat{\beta}) = \frac{1}{5000} \sum_{r=1}^{5000} (\hat{\beta}_r - \beta)^\top (\hat{\beta}_r - \beta), \tag{31}$$

where $\hat{\beta}_r$ is the estimate obtained by a given method in the $r$-th replication. In addition, the bias is calculated according to Eq. (32). The MSE values from the Monte Carlo simulation study are reported in Tables 1 to 3 and the bias values in Tables 4 to 6. Several results can be obtained as follows:
1- The simulation results indicate that the PSO method of selecting $k$ is superior to the other selection methods for all combinations of $n$, $p$, and $\rho$ in terms of MSE. The PSO method has a substantially lower MSE than the others.
2- It is seen from Tables 1 to 3 that the estimator using the PSO method is usually more efficient than the OLS estimator for all values of $n$ and $p$ when multicollinearity is high or severe.
3- In terms of $\rho$, the MSE values increase when the degree of correlation increases, regardless of the values of $n$ and $p$.
4- The number of explanatory variables has a negative impact on the MSE: the MSE values increase when $p$ increases from four to twelve explanatory variables.
5- With respect to $n$, the MSE values decrease when $n$ increases, regardless of the values of $\rho$ and $p$.
6- All the selection methods of $k$ are superior to the OLS estimator in terms of MSE.
7- According to the bias results, the proposed methods yield the smallest bias among the estimation methods considered.
Table 1: Average MSE when n=30
Table 2: Average MSE when n=50
Table 3: Average MSE when n=150
Table 4: Average bias when n=30
Table 5: Average bias when n=50
Table 6: Average bias when n=150
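The Monte Carlo design described in this section can be sketched as follows. This is only an illustration under stated assumptions: the function names, the true coefficient vector (normalized so that $\beta^\top \beta = 1$), the fixed ridge parameter $k = 1$, and the reduced number of replications are hypothetical choices, and the standardization step is omitted for brevity. It compares OLS with a plain ridge estimator rather than with the PSO-based selection.

```python
import numpy as np

def generate_design(n, p, rho, rng):
    """McDonald-Galarneau scheme: x_ij = sqrt(1 - rho^2) * w_ij + rho * w_i,p+1."""
    w = rng.standard_normal((n, p + 1))
    # Under this scheme the implied correlation between two columns is rho**2.
    return np.sqrt(1.0 - rho**2) * w[:, :p] + rho * w[:, [p]]

def simulate_mse(estimator, n, p, rho, beta, sigma=1.0, reps=200, seed=0):
    """Average of (beta_hat - beta)'(beta_hat - beta) over Monte Carlo replications."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(reps):
        X = generate_design(n, p, rho, rng)
        y = X @ beta + sigma * rng.standard_normal(n)
        diff = estimator(X, y) - beta
        total += diff @ diff
    return total / reps

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge_k1(X, y):
    # Fixed k = 1 is an arbitrary illustrative value, not a selection rule from the paper.
    return np.linalg.solve(X.T @ X + np.eye(X.shape[1]), X.T @ y)

p = 4
beta = np.ones(p) / np.sqrt(p)   # assumed true coefficients with unit norm
mse_ols = simulate_mse(ols, n=30, p=p, rho=0.99, beta=beta)
mse_ridge = simulate_mse(ridge_k1, n=30, p=p, rho=0.99, beta=beta)
print(mse_ols, mse_ridge)
```

Using the same seed for both estimators means they are evaluated on identical simulated data sets, so the comparison is paired.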
Real application results
To evaluate the predictive performance of the proposed method and to compare it with the other methods in a real data application, the Portland cement dataset is employed. The Portland cement dataset has become a standard dataset for examining and remedying multicollinearity (Woods et al., 1932; Chen and Emura, 2017), and it has been widely used by numerous researchers. This dataset comes from an experimental investigation of the heat evolved during the setting and hardening of Portland cements of varied composition, and of the dependence of this heat on the percentages of four compounds in the clinkers from which the cement was produced. There are 13 observations of heat evolved in calories per gram of cement ($y$), tricalcium aluminate ($x_1$), tricalcium silicate ($x_2$), tetracalcium alumino ferrite ($x_3$), and dicalcium silicate ($x_4$).

Before fitting the linear regression model, the explanatory variables and the response variable are standardized. Then, the eigenvalues of the $X^\top X$ matrix are calculated, and the resulting condition number indicates that multicollinearity exists. As a result, using the RE and the GRE is more suitable than using the OLS. The predictive performance of each method is computed using the predicted MSE (PMSE), and the results are given in Table 7. It is apparent from Table 7 that PSO improves the predictive capability compared with the other methods, significantly reducing the PMSE. The reduction of PMSE using PSO was 11.350%, 10.892%, 10.625%, 10.552%, 10.007%, 10.888%, 9.842%, 11.101%, 9.842%, 11.101%, 9.843%, 10.969%, 9.945%, 10.056%, 10.431%, and 10.213% compared with OLS, HK, KN, TC, F, HSL, AH, D, SB, DK, SV1, SV2, AS, and M, respectively.
Table 7: Real application results for the used methods
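The multicollinearity check described in this section, based on the eigenvalues of the standardized cross-product matrix, can be sketched as follows. This is a hedged illustration: the function name `condition_number` is hypothetical, the condition number is assumed here to be the ratio of the largest to the smallest eigenvalue (some authors use its square root), and synthetic data are used instead of the Portland cement observations.

```python
import numpy as np

def condition_number(X):
    """Ratio of the largest to the smallest eigenvalue of X'X after standardizing X."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    lam = np.linalg.eigvalsh(Xs.T @ Xs)   # eigenvalues in ascending order
    return lam[-1] / lam[0]

# Synthetic example (assumed): one design with independent columns and one
# in which the last column nearly duplicates the first.
rng = np.random.default_rng(1)
X_ind = rng.standard_normal((200, 4))
X_col = X_ind.copy()
X_col[:, 3] = X_col[:, 0] + 0.01 * rng.standard_normal(200)

cn_ind = condition_number(X_ind)
cn_col = condition_number(X_col)
print(cn_ind, cn_col)
```

A condition number close to 1 indicates nearly orthogonal explanatory variables, whereas a very large value signals that $X^\top X$ is close to singular.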
In this paper, a new method for selecting the shrinkage parameters of the generalized ridge estimator, based on the particle swarm optimization algorithm, was proposed. The proposed method handles multicollinearity while decreasing the variability of shrinkage parameter selection. The simulation and real application results demonstrate that the proposed method outperformed several classical methods. Furthermore, the results indicate that the proposed method is more efficient than the method of Hoerl and Kennard (1970).