Shrinkage estimators in inverse Gaussian regression model: Subject review
IRAQI JOURNAL OF STATISTICAL SCIENCES
Article 5, Volume 19, Issue 1, June 2022, Pages 46-53
Document Type: Review Paper
DOI: 10.33899/iqjoss.2022.174331
Authors
Farah Abdulghani*; Rafal Al-Hamdani
Department of Informatics & Statistics, College of Computer & Mathematical Sciences, University of Mosul, Mosul, Iraq
Abstract
The presence of high correlation among predictors in regression modeling has undesirable effects on regression estimation. Several biased estimation methods are available to overcome this issue. The inverse Gaussian regression model (IGRM) is a special model within the class of generalized linear models. The IGRM is widely used in applied research when the response variable under study is positively skewed. Numerous biased estimators for overcoming multicollinearity in the IGRM have been proposed in the literature, based on different theoretical approaches. This paper provides an overview of recent biased estimation methods for the IGRM. A comparison among these biased estimators allows us to gain insight into their performance.
Highlights
In this paper, we present a thorough review of the literature on biased estimators for the inverse Gaussian regression model when multicollinearity exists. In a real-data application, the two-parameter estimator performs better than the ML, IGRR, IGLE, and IGLT estimators in terms of MSE. In conclusion, the use of the two-parameter estimator is recommended when multicollinearity is present in the inverse Gaussian regression model.
Keywords
Multicollinearity; biased estimator; inverse Gaussian regression model; Monte Carlo simulation
Full Text
1. Introduction

The inverse Gaussian regression model (IGRM) has been widely used in industrial engineering, life testing, reliability, marketing, and the social sciences [1-7]. Specifically, the IGRM is used when the response variable under study is positively skewed [8-10]. When the response variable is extremely skewed, the IGRM is preferable to the gamma regression model [11]. In dealing with the IGRM, it is assumed that there is no correlation among the explanatory variables [12-32]. In practice, however, this assumption often does not hold, which leads to the problem of multicollinearity. In the presence of multicollinearity, when the regression coefficients of the IGRM are estimated by the maximum likelihood (ML) method, the estimated coefficients usually become unstable, with high variance and therefore low statistical significance [33]. Numerous remedial methods have been proposed to overcome the problem of multicollinearity [34-38]. The ridge regression method [39] has consistently been demonstrated to be an attractive alternative to the ML estimation method. Ridge regression is a biased method that shrinks all regression coefficients toward zero in order to reduce their large variance [40]. This is done by adding a positive amount to the diagonal of X^T X. As a result, the ridge estimator is biased, but it can guarantee a smaller mean squared error than the ML estimator. In linear regression, the ridge estimator is defined as

\hat{\beta}_{ridge} = (X^T X + k I_p)^{-1} X^T y,   (1)
where y is an n x 1 vector of observations of the response variable, X is an n x p known design matrix of explanatory variables, \beta is a p x 1 vector of unknown regression coefficients, I_p is the identity matrix of dimension p x p, and k >= 0 represents the ridge parameter (shrinkage parameter). The ridge parameter k controls the shrinkage of \beta toward zero. The OLS estimator can be considered a special case of Eq. (1) with k = 0. For larger values of k, the estimator yields greater shrinkage, approaching zero [39, 41].
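As a small illustration (not from the paper), the ridge estimator of Eq. (1) can be sketched in NumPy; the data below are hypothetical, with two nearly collinear predictors:

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator of Eq. (1): (X'X + k I)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

# Hypothetical data: the second predictor is almost a copy of the first.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=100)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=100)

b_ols = ridge(X, y, 0.0)    # k = 0 recovers OLS
b_ridge = ridge(X, y, 1.0)  # larger k -> more shrinkage toward zero
```

Since the coefficient norm of the ridge solution is nonincreasing in k, `b_ridge` is strictly shorter than `b_ols` here, at the price of some bias.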
The inverse Gaussian distribution is a continuous distribution with two positive parameters: a location (mean) parameter \mu and a scale parameter \lambda, denoted IG(\mu, \lambda). Its probability density function is defined as

f(y; \mu, \lambda) = \sqrt{\lambda / (2\pi y^3)} \exp\{-\lambda (y - \mu)^2 / (2\mu^2 y)\},  y > 0.   (2)
The mean and variance of this distribution are \mu and \mu^3/\lambda, respectively. The inverse Gaussian regression model is a member of the generalized linear model (GLM) family, extending the ideas of linear regression to the situation where the response variable follows the inverse Gaussian distribution. Following the GLM methodology, Eq. (2) can be rewritten in exponential family form as

f(y; \theta, \phi) = \exp\{[y\theta - b(\theta)]/\phi + c(y, \phi)\},
where \theta = -1/(2\mu^2) and b(\theta) = -\sqrt{-2\theta} = -1/\mu. Here, \phi = 1/\lambda represents the dispersion parameter and \theta represents the canonical parameter. In a GLM, a monotonic and differentiable link function g(\cdot) connects the mean of the response variable \mu_i = E(y_i) with the linear predictor \eta_i = x_i^T \beta, where x_i is the ith row of X and \beta is a p x 1 vector of unknown regression coefficients. Because \eta_i depends on \beta and the mean of the response variable is a function of \theta_i, then g(\mu_i) = \eta_i. For the IGRM, the canonical link function is g(\mu_i) = 1/\mu_i^2. Another possible link function for the IGRM is the log link, g(\mu_i) = \log(\mu_i). Model estimation in the IGRM is based on the maximum likelihood (ML) method. The log-likelihood function of the IGRM under the canonical link is defined as

\ell(\beta, \phi) = \sum_{i=1}^{n} \{[-y_i/(2\mu_i^2) + 1/\mu_i]/\phi - 1/(2\phi y_i) - (1/2)\log(2\pi \phi y_i^3)\},   (3)

where \mu_i = (x_i^T \beta)^{-1/2}.
The ML estimator is then obtained by computing the first derivative of Eq. (3) with respect to \beta and setting it equal to zero:

\partial\ell/\partial\beta = \frac{1}{2\phi} \sum_{i=1}^{n} (\mu_i - y_i) x_i = 0.   (4)
Unfortunately, Eq. (4) cannot be solved analytically because it is nonlinear in \beta. The iteratively weighted least squares (IWLS) algorithm, or Fisher scoring, can be used to obtain the ML estimates of the IGRM parameters. In each iteration, the parameters are updated by

\beta^{(r+1)} = \beta^{(r)} + I^{-1}(\beta^{(r)}) S(\beta^{(r)}),   (5)
where S(\beta^{(r)}) and I(\beta^{(r)}) are the score function \partial\ell/\partial\beta and the Fisher information matrix evaluated at \beta^{(r)}, respectively. At convergence, the estimated coefficients are defined as

\hat{\beta}_{ML} = (X^T \hat{W} X)^{-1} X^T \hat{W} \hat{z},   (6)
where \hat{W} = diag(\hat{w}_i) with \hat{w}_i = 1/[V(\hat{\mu}_i)(g'(\hat{\mu}_i))^2] and variance function V(\mu) = \mu^3, \hat{z} is the adjusted response vector whose ith element equals

\hat{z}_i = \hat{\eta}_i + (y_i - \hat{\mu}_i) g'(\hat{\mu}_i),   (7)

and \hat{\mu}_i = g^{-1}(\hat{\eta}_i). The covariance matrix of \hat{\beta}_{ML} equals

Cov(\hat{\beta}_{ML}) = \phi (X^T \hat{W} X)^{-1},   (8)
and the MSE equals

MSE(\hat{\beta}_{ML}) = E[(\hat{\beta}_{ML} - \beta)^T (\hat{\beta}_{ML} - \beta)] = \phi \sum_{j=1}^{p} \frac{1}{\lambda_j},   (9)
where \lambda_j is the jth eigenvalue of the X^T \hat{W} X matrix and the dispersion parameter \phi is estimated by [42]

\hat{\phi} = \frac{1}{n - p} \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i^3}.   (10)
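The IWLS scheme and the dispersion estimator described above can be sketched in NumPy. This is an illustrative sketch, not the paper's code: it assumes the log link, uses the standard GLM weights w_i = 1/[V(mu_i) g'(mu_i)^2] with V(mu) = mu^3, and simulates hypothetical data with NumPy's Wald (inverse Gaussian) sampler:

```python
import numpy as np

def fit_igrm(X, y, n_iter=50):
    """IWLS / Fisher scoring for the inverse Gaussian GLM with log link.

    With V(mu) = mu^3 and g(mu) = log(mu), the working weights simplify
    to w_i = 1/mu_i, and the adjusted response is
    z_i = eta_i + (y_i - mu_i) * g'(mu_i).  Assumes column 0 of X is the
    intercept.
    """
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())          # start from an intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)                # inverse of the log link
        w = 1.0 / mu                    # 1 / (mu^3 * (1/mu)^2)
        z = eta + (y - mu) / mu         # g'(mu) = 1/mu
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    mu = np.exp(X @ beta)
    phi = np.sum((y - mu) ** 2 / mu**3) / (n - p)   # residual-based dispersion
    return beta, phi

# Hypothetical simulated data: IG responses with a log-linear mean.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu_true = np.exp(X @ np.array([0.5, 0.3]))
y = rng.wald(mu_true, 50.0)             # inverse Gaussian, lambda = 50
beta_hat, phi_hat = fit_igrm(X, y)      # beta_hat near (0.5, 0.3), phi_hat near 1/50
```

The recovered coefficients approach the true values and the dispersion estimate approaches 1/\lambda = 0.02, consistent with \phi = 1/\lambda.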
In the presence of multicollinearity, the X^T \hat{W} X matrix becomes ill-conditioned, leading to high variance and instability of the ML estimator of the IGRM parameters. As a remedy, Månsson and Shukur [43] proposed the inverse Gaussian ridge estimator (IGRR) as

\hat{\beta}_{IGRR} = (X^T \hat{W} X + k I_p)^{-1} X^T \hat{W} X \hat{\beta}_{ML},   (11)
where k > 0. The ML estimator can be considered a special case of Eq. (11) with k = 0. For a well-chosen value of k, the MSE of \hat{\beta}_{IGRR} is smaller than that of \hat{\beta}_{ML}, because the MSE of \hat{\beta}_{IGRR} is equal to [33]

MSE(\hat{\beta}_{IGRR}) = \phi \sum_{j=1}^{p} \frac{\lambda_j}{(\lambda_j + k)^2} + k^2 \sum_{j=1}^{p} \frac{\alpha_j^2}{(\lambda_j + k)^2},   (12)
where \alpha_j is defined as the jth element of \gamma^T \beta and \gamma is the matrix of eigenvectors of X^T \hat{W} X. Compared with the MSE of Eq. (9), the variance term of Eq. (12) is always smaller for k > 0.
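The bias-variance trade-off behind the ridge estimator of Eq. (11) can be made concrete with a small numerical sketch. The matrix S standing in for X^T \hat{W} X, the coefficient vector, and the dispersion value below are all hypothetical:

```python
import numpy as np

def igrr(S, beta_ml, k):
    """IG ridge estimator of Eq. (11): (S + k I)^(-1) S beta_ML, S = X' W X."""
    return np.linalg.solve(S + k * np.eye(S.shape[0]), S @ beta_ml)

def scalar_mse(S, beta, k, phi):
    """Variance term plus squared-bias term of the ridge MSE; k = 0 gives Eq. (9)."""
    lam, gamma = np.linalg.eigh(S)   # eigenvalues / eigenvectors of S
    alpha = gamma.T @ beta           # coordinates of beta in the eigenbasis
    return phi * np.sum(lam / (lam + k) ** 2) + k**2 * np.sum(alpha**2 / (lam + k) ** 2)

# Hypothetical ill-conditioned cross-product matrix (eigenvalues 0.01 and 1.99).
S = np.array([[1.0, 0.99], [0.99, 1.0]])
beta = np.array([1.0, -1.0])
phi = 0.5

mse_ml = scalar_mse(S, beta, 0.0, phi)   # phi * sum(1/lam_j): large, driven by lam = 0.01
mse_rr = scalar_mse(S, beta, 0.1, phi)
assert mse_rr < mse_ml                   # small bias buys a large variance reduction
assert np.linalg.norm(igrr(S, beta, 0.1)) < np.linalg.norm(beta)  # shrinkage
```

With the near-singular S above, the ML-type MSE is dominated by \phi/\lambda_min, so even a modest k cuts the total MSE dramatically.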
Another popular biased estimator, known as the Liu estimator, has also been adopted in the Poisson regression model. The inverse Gaussian Liu estimator (IGLE) is defined as

\hat{\beta}_{IGLE} = (X^T \hat{W} X + I_p)^{-1} (X^T \hat{W} X + d I_p) \hat{\beta}_{ML},   (13)
where 0 < d < 1. For a well-chosen value of d, the MSE of \hat{\beta}_{IGLE} is smaller than that of \hat{\beta}_{ML}, because the MSE of \hat{\beta}_{IGLE} is equal to [33]

MSE(\hat{\beta}_{IGLE}) = \phi \sum_{j=1}^{p} \frac{(\lambda_j + d)^2}{\lambda_j (\lambda_j + 1)^2} + (d - 1)^2 \sum_{j=1}^{p} \frac{\alpha_j^2}{(\lambda_j + 1)^2}.   (14)
As an alternative to the Liu estimator, the Liu-type estimator was proposed by Liu [44] to overcome the problem of severe multicollinearity. The inverse Gaussian Liu-type estimator (IGLT) is defined as

\hat{\beta}_{IGLT} = (X^T \hat{W} X + k I_p)^{-1} (X^T \hat{W} X - d I_p) \hat{\beta}_{ML},   (15)
where k > 0 and -\infty < d < \infty. In Eq. (15), the parameter k can be used to fully control the conditioning of X^T \hat{W} X + k I_p. Once the condition number has been reduced to a desirable level, the bias that this introduces can be corrected with the so-called bias-correction parameter d [45-49]. Liu [44] proved that, in terms of MSE, the Liu-type estimator has superior properties over the ridge estimator. The MSE of \hat{\beta}_{IGLT} is defined as

MSE(\hat{\beta}_{IGLT}) = \phi \sum_{j=1}^{p} \frac{(\lambda_j - d)^2}{\lambda_j (\lambda_j + k)^2} + (d + k)^2 \sum_{j=1}^{p} \frac{\alpha_j^2}{(\lambda_j + k)^2}.   (16)
Following Asar and Genç [50] and Huang and Yang [51], the two-parameter estimator in the linear regression model is defined as

\hat{\beta}_{TP} = (X^T X + k I_p)^{-1} (X^T X + k d I_p) \hat{\beta}_{OLS},   (17)
where k > 0 and 0 < d < 1. For the IGRM, the two-parameter estimator (IGTP) is defined as

\hat{\beta}_{IGTP} = (X^T \hat{W} X + k I_p)^{-1} (X^T \hat{W} X + k d I_p) \hat{\beta}_{ML}.   (18)
It is readily seen that \hat{\beta}_{IGTP} is a combination of the two estimators IGRR and IGLE. If d = 0, Eq. (18) reduces to \hat{\beta}_{IGRR}, while if k = 1, Eq. (18) reduces to \hat{\beta}_{IGLE}. Besides, when d = 1 (or k = 0), Eq. (18) equals \hat{\beta}_{ML}. In terms of MSE, the two-parameter estimator has superior properties over the ML estimator. The MSE of \hat{\beta}_{IGTP} is defined as

MSE(\hat{\beta}_{IGTP}) = \phi \sum_{j=1}^{p} \frac{(\lambda_j + k d)^2}{\lambda_j (\lambda_j + k)^2} + k^2 (d - 1)^2 \sum_{j=1}^{p} \frac{\alpha_j^2}{(\lambda_j + k)^2}.   (19)
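The special cases of the two-parameter estimator of Eq. (18) can be verified numerically. In this sketch (not from the paper), S stands in for X^T \hat{W} X and the ML coefficient vector is hypothetical:

```python
import numpy as np

def igtp(S, beta_ml, k, d):
    """Two-parameter estimator of Eq. (18): (S + k I)^(-1) (S + k d I) beta_ML."""
    p = S.shape[0]
    return np.linalg.solve(S + k * np.eye(p), (S + k * d * np.eye(p)) @ beta_ml)

S = np.array([[1.0, 0.99], [0.99, 1.0]])   # hypothetical X' W X
beta_ml = np.array([1.0, -1.0])
I = np.eye(2)

# d = 0 reduces Eq. (18) to the ridge form (S + k I)^(-1) S beta_ML.
assert np.allclose(igtp(S, beta_ml, 0.5, 0.0), np.linalg.solve(S + 0.5 * I, S @ beta_ml))
# k = 1 reduces it to the Liu form (S + I)^(-1) (S + d I) beta_ML.
assert np.allclose(igtp(S, beta_ml, 1.0, 0.4), np.linalg.solve(S + I, (S + 0.4 * I) @ beta_ml))
# d = 1 (or k = 0) returns the ML estimator unchanged.
assert np.allclose(igtp(S, beta_ml, 0.5, 1.0), beta_ml)
```

The three assertions confirm that IGTP nests IGRR (d = 0), IGLE (k = 1), and the ML estimator (d = 1) as special cases.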
To demonstrate the usefulness of the shrinkage estimators in a real application, we present a chemistry dataset in which n represents the number of imidazo[4,5-b]pyridine derivatives, which are used as anticancer compounds, and p = 15 denotes the number of molecular descriptors, which are treated as explanatory variables [52]. The response of interest is the biological activity (IC50). Quantitative structure-activity relationship (QSAR) study has become of great importance in chemometrics. The principle of QSAR is to model several biological activities over a collection of chemical compounds in terms of their structural properties [53]. Consequently, regression modeling is one of the most important tools for constructing a QSAR model. First, to check whether the response variable follows the inverse Gaussian distribution, a Chi-square goodness-of-fit test is used. The test statistic equals 5.2762 with a p-value of 0.2601, indicating that the inverse Gaussian distribution fits this response variable very well. That is, the following model, with the log link used below, is set:

IC50_i ~ IG(\mu_i, \lambda),  \log(\mu_i) = x_i^T \beta,  i = 1, ..., n.   (20)
Second, to check whether there are relationships among the explanatory variables, Figure 1 displays the correlation matrix among the 15 explanatory variables. It is clearly seen that there are correlations greater than 0.90 among MW, SpMaxA_D, and ATS8v, between SpMax3_Bh(s) and ATS8v, and between Mor21v and Mor21e. Third, the existence of multicollinearity was tested after fitting the inverse Gaussian regression model using the log link; the estimated dispersion parameter is 0.00103, and the eigenvalues of the X^T \hat{W} X matrix were computed. The resulting condition number of the data is 40383.035, indicating that a severe multicollinearity problem exists. The estimated inverse Gaussian regression coefficients and the estimated theoretical MSE values for the ML estimator and the other estimators considered are listed in Table 1. According to Table 1, it is clearly seen that the IGTP generally has MSE values less than the MSE of the ML estimator. Moreover, the MSE of the IGTP estimator is the lowest among all estimators. Specifically, the MSE of the IGTP estimator is about 44.24%, 39.17%, 32.62%, and 12.11% lower than that of the ML, IGRR, IGLE, and IGLT estimators, respectively.
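For reference, a condition number like the one reported above can be computed from the eigenvalues of the weighted cross-product matrix. This minimal sketch uses a hypothetical 2x2 matrix; note that definitions vary, with some authors taking the square root of the eigenvalue ratio instead:

```python
import numpy as np

# Condition number as the ratio of the largest to the smallest eigenvalue
# of X' W X; large values signal severe multicollinearity.
S = np.array([[1.0, 0.99], [0.99, 1.0]])   # hypothetical cross-product matrix
lam = np.linalg.eigvalsh(S)                # eigenvalues in ascending order
cn = lam.max() / lam.min()                 # here: 1.99 / 0.01 = 199
```

The two near-collinear columns behind S push the smallest eigenvalue toward zero and hence the condition number upward, which is exactly the mechanism that inflates the variance term \phi \sum_j 1/\lambda_j of the ML estimator.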
Figure 1. Correlation matrix among the 15 explanatory variables of the real data.
Table 1: The estimated coefficients and MSE values of the used estimators
In this paper, we presented a thorough review of the literature on biased estimators for the inverse Gaussian regression model when multicollinearity exists. In the real-data application, the two-parameter estimator performed better than the ML, IGRR, IGLE, and IGLT estimators in terms of MSE. In conclusion, the use of the two-parameter estimator is recommended when multicollinearity is present in the inverse Gaussian regression model.
References
1. Bhattacharyya, G.K. and A. Fries, Inverse Gaussian regression and accelerated life tests. Lecture Notes-Monograph Series, 1982. 2: p. 101-117.
2. Ducharme, G.R., Goodness-of-fit tests for the inverse Gaussian and related distributions. Test, 2001. 10(2): p. 271-290.
3. Folks, J.L. and A.S. Davis, Regression models for the inverse Gaussian distribution. Statistical Distributions in Scientific Work, 1981. 4(1): p. 91-97.
4. Fries, A. and G.K. Bhattacharyya, Optimal design for an inverse Gaussian regression model. Statistics & Probability Letters, 1986. 4(6): p. 291-294.
5. Heinzl, H. and M. Mittlböck, Adjusted R2 measures for the inverse Gaussian regression model. Computational Statistics, 2002. 17(4): p. 525-544.
6. Lemeshko, B.Y., et al., Inverse Gaussian model and its applications in reliability and survival analysis, in Mathematical and Statistical Models and Methods in Reliability. 2010, Springer. p. 433-453.
7. Malehi, A.S., F. Pourmotahari, and K.A. Angali, Statistical models for the analysis of skewed healthcare cost data: a simulation study. Health Economics Review, 2015. 5: p. 1-11.
8. Babu, G.J. and Y.P. Chaubey, Asymptotics and bootstrap for inverse Gaussian regression. Annals of the Institute of Statistical Mathematics, 1996. 48(1): p. 75-88.
9. Chaubey, Y.P., Estimation in inverse Gaussian regression: comparison of asymptotic and bootstrap distributions. Journal of Statistical Planning and Inference, 2002. 100(2): p. 135-143.
10. Wu, L. and H. Li, Variable selection for joint mean and dispersion models of the inverse Gaussian distribution. Metrika, 2011. 75(6): p. 795-808.
11. De Jong, P. and G.Z. Heller, Generalized Linear Models for Insurance Data. Vol. 10. 2008: Cambridge University Press.
12. Wan, A.T.K., A note on almost unbiased generalized ridge regression estimator under asymmetric loss. Journal of Statistical Computation and Simulation, 2007. 62(4): p. 411-421.
13. Akdeniz, F. and S. Kaçıranlar, On the almost unbiased generalized Liu estimator and unbiased estimation of the bias and MSE. Communications in Statistics - Theory and Methods, 1995. 24(7): p. 1789-1797.
14. El Karoui, N., On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators. Probability Theory and Related Fields, 2017. 170(1-2): p. 95-175.
15. Fallah, R., M. Arashi, and S.M.M. Tabatabaey, On the ridge regression estimator with sub-space restriction. Communications in Statistics - Theory and Methods, 2017. 46(23): p. 11854-11865.
16. Huang, H., J. Wu, and W. Yi, On the restricted almost unbiased two-parameter estimator in linear regression model. Communications in Statistics - Theory and Methods, 2016. 46(4): p. 1668-1678.
17. Kaçıranlar, S. and I. Dawoud, On the performance of the Poisson and the negative binomial ridge predictors. Communications in Statistics - Simulation and Computation, 2017.
18. Li, Y. and H. Yang, On the performance of the jackknifed modified ridge estimator in the linear regression model with correlated or heteroscedastic errors. Communications in Statistics - Theory and Methods, 2011. 40(15): p. 2695-2708.
19. López Martín, M.D.M., et al., On the selection of the ridge and raise factors. Indian Journal of Science and Technology, 2017. 10(13): p. 1-8.
20. Månsson, K., On ridge estimators for the negative binomial regression model. Economic Modelling, 2012. 29(2): p. 178-184.
21. Månsson, K., B.M.G. Kibria, and G. Shukur, On Liu estimators for the logit regression model. Economic Modelling, 2012. 29(4): p. 1483-1488.
22. Månsson, K. and G. Shukur, On ridge parameters in logistic regression. Communications in Statistics - Theory and Methods, 2011. 40(18): p. 3366-3381.
23. Muniz, G. and B.M.G. Kibria, On some ridge regression estimators: an empirical comparison. Communications in Statistics - Simulation and Computation, 2009. 38(3): p. 621-630.
24. Nomura, M., On exact small sample properties of the minimax generalized ridge regression estimators. Communications in Statistics - Simulation and Computation, 1988. 17(4): p. 1213-1229.
25. Ohtani, K., On small sample properties of the almost unbiased generalized ridge estimator. Communications in Statistics - Theory and Methods, 1986. 15(5): p. 1571-1578.
26. Segerstedt, B., On ordinary ridge regression in generalized linear models. Communications in Statistics - Theory and Methods, 1992. 21(8): p. 2227-2246.
27. Şiray, G.Ü., S. Toker, and S. Kaçıranlar, On the restricted Liu estimator in the logistic regression model. Communications in Statistics - Simulation and Computation, 2014. 44(1): p. 217-232.
28. Toker, S. and S. Kaçıranlar, On the performance of two parameter ridge estimator under the mean square error criterion. Applied Mathematics and Computation, 2013. 219(9): p. 4718-4728.
29. van Wieringen, W.N., On the mean squared error of the ridge estimator of the covariance and precision matrix. Statistics & Probability Letters, 2017. 123: p. 88-92.
30. Varathan, N. and P. Wijekoon, On the restricted almost unbiased ridge estimator in logistic regression. Open Journal of Statistics, 2016. 6(6): p. 1076-1084.
31. Wu, J., On the predictive performance of the almost unbiased Liu estimator. Communications in Statistics - Theory and Methods, 2015. 45(17): p. 5193-5203.
32. Wu, J. and H. Yang, On the principal component Liu-type estimator in linear regression. Communications in Statistics - Simulation and Computation, 2014. 44(8): p. 2061-2072.
33. Kibria, B.M.G., K. Månsson, and G. Shukur, A simulation study of some biasing parameters for the ridge type estimation of Poisson regression. Communications in Statistics - Simulation and Computation, 2015. 44(4): p. 943-957.
34. Baksalary, J.K., P.R. Pordzik, and G. Trenkler, A note on generalized ridge estimators. Communications in Statistics - Theory and Methods, 1990. 19(8): p. 2871-2877.
35. Firinguetti, L., H. Rubio, and Y.P. Chaubey, A non stochastic ridge regression estimator and comparison with the James-Stein estimator. Communications in Statistics - Theory and Methods, 2016. 45(8): p. 2298-2310.
36. Lazaridis, A., A note regarding the condition number: the case of spurious and latent multicollinearity. Quality & Quantity, 2007. 41(1): p. 123-135.
37. Özkale, M.R. and S. Kaçıranlar, A prediction-oriented criterion for choosing the biasing parameter in Liu estimation. Communications in Statistics - Theory and Methods, 2007. 36(10): p. 1889-1903.
38. Shen, X., et al., A novel generalized ridge regression method for quantitative genetics. Genetics, 2013. 193(4): p. 1255-1268.
39. Hoerl, A.E. and R.W. Kennard, Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 1970. 12(1): p. 55-67.
40. Asar, Y. and A. Genç, New shrinkage parameters for the Liu-type logistic estimators. Communications in Statistics - Simulation and Computation, 2015. 45(3): p. 1094-1103.
41. Algamal, Z.Y. and M.H. Lee, Penalized Poisson regression model using adaptive modified elastic net penalty. Electronic Journal of Applied Statistical Analysis, 2015. 8(2): p. 236-245.
42. Uusipaikka, E., Confidence Intervals in Generalized Regression Models. 2009, Chapman & Hall/CRC Press.
43. Månsson, K. and G. Shukur, A Poisson ridge regression estimator. Economic Modelling, 2011. 28(4): p. 1475-1481.
44. Liu, K., Using Liu-type estimator to combat collinearity. Communications in Statistics - Theory and Methods, 2003. 32(5): p. 1009-1020.
45. Alheety, M.I. and B.M. Golam Kibria, Modified Liu-type estimator based on (r − k) class estimator. Communications in Statistics - Theory and Methods, 2013. 42(2): p. 304-319.
46. Kibria, B.M.G. and A.K.M.E. Saleh, Preliminary test ridge regression estimators with Student's t errors and conflicting test-statistics. Metrika, 2004. 59(2): p. 105-124.
47. Norouzirad, M. and M. Arashi, Preliminary test and Stein-type shrinkage ridge estimators in robust regression. Statistical Papers, 2017.
48. Wu, J., Preliminary test Liu-type estimators based on W, LR, and LM test statistics in a regression model. Communications in Statistics - Simulation and Computation, 2016. 46(9): p. 6760-6771.
49. Wu, J., Modified restricted almost unbiased Liu estimator in linear regression model. Communications in Statistics - Simulation and Computation, 2014. 45(2): p. 689-700.
50. Asar, Y. and A. Genç, A new two-parameter estimator for the Poisson regression model. Iranian Journal of Science and Technology, Transactions A: Science, 2017.
51. Huang, J. and H. Yang, A two-parameter estimator in the negative binomial regression model. Journal of Statistical Computation and Simulation, 2014. 84(1): p. 124-134.
52. Algamal, Z.Y., et al., High-dimensional QSAR prediction of anticancer potency of imidazo[4,5-b]pyridine derivatives using adjusted adaptive LASSO. Journal of Chemometrics, 2015. 29(10): p. 547-556.
53. Algamal, Z.Y. and M.H. Lee, A novel molecular descriptor selection method in QSAR classification model based on weighted penalized logistic regression. Journal of Chemometrics, 2017. 31(10): p. e2915.