Estimating the Parameters of Mixture Gamma Distributions Using Maximum Likelihood and Bayesian Method
IRAQI JOURNAL OF STATISTICAL SCIENCES
Volume 21, Issue 1, June 2024, Pages 138-150
Document Type: Research Paper
DOI: 10.33899/iqjoss.2024.183254
Authors
Nagham Ibrahim Abdulla Najm^{*} ^{1}; Raya Salim Al_Rassam^{2}
^{1}Department of Statistics and Informatics, College of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq
^{2}Department of Statistics and Informatics, College of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq
Abstract
This paper focuses on the mixture Gamma distribution and uses maximum likelihood and Bayesian techniques to estimate its parameters. The Expectation-Maximization (EM) algorithm is used to find the maximum likelihood estimators, and the random-walk Metropolis-Hastings algorithm is used to simulate the Bayesian estimates of the parameters of the mixture Gamma distribution. These estimates are then compared using the sum of the modulus of the bias (MBias) and the root mean square error (RMSE). The results show that the Bayesian estimator is better than the maximum likelihood estimator.
Highlights
1- Acknowledgment
The authors are sincerely grateful to the University of Mosul and the College of Computer Sciences and Mathematics for the facilities they provided, which greatly helped to improve the quality of this work.
The authors have no conflict of interest.
Keywords
Gamma distribution; Mixture distribution; Bayesian estimation; Likelihood function; Expectation Maximization Algorithm; Metropolis-Hastings
Full Text
A random variable can always be regarded as a sample from a distribution, which may or may not be a well-known one. Some random variables are drawn from one single distribution, such as the normal distribution, but matters are not always so simple: in real life, the random variables might have been generated from a mixture of several distributions. In studying mixture distributions, the formula of the distribution is difficult to work with directly, so algorithms are used to facilitate finding the estimators: the EM algorithm is used to find the maximum likelihood estimators, and the Metropolis-Hastings algorithm to find the Bayesian estimators. If the distribution belongs to an exponential family, then a conjugate prior distribution exists, and this prior is conjugate to the likelihood of the exponential family; see (Bernardo, 2009). Many authors have considered estimating the parameters of mixture distributions. For example, (Newcomb, 1886) suggested an iterative reweighting scheme that can be viewed as an application of the EM algorithm of (Dempster et al., 1977) to compute the common mean of a mixture, in known proportions, of a finite number of univariate normal distributions with known variances. (Jewell, 1982) provided maximum likelihood estimates for mixtures of exponential distributions using the EM algorithm. (Li, 1983) discussed several features of mixture models and defined two types of mixture models: if the component distributions of a mixture belong to the same family, the mixture is known as a type-I mixture model, whereas a type-II mixture model is one whose component distributions belong to different families. (Upadhyay et al., 2002) proposed Bayesian inference in life testing and reliability using Markov Chain Monte Carlo (MCMC). (Pang et al., 2004) used MCMC techniques to carry out a Bayesian estimation procedure on Hirose's simulated data.
(Chojogh et al., 2019) presented work clarifying mixture distributions; it includes models of the normal mixture distribution and the Poisson mixture distribution, for two components and for k components, and estimates the parameters of these models using the EM algorithm. "A mixture model for determining SARS-COV-2 variant composition in pooled samples" presented a mixture model applied to a set of SARS-COV-2 variants; the model is built from a pre-defined data set, and the results showed that the model fits these data well.

Gamma Distribution

The Gamma distribution is a continuous probability distribution used in many fields such as Statistics, Economics, Physics, Computer Science and others. It is determined by two parameters, the shape parameter (α) and the rate parameter (β), and its probability density function (pdf) is as follows:

f(x | α, β) = (β^α / Γ(α)) x^{α − 1} e^{−β x}   (1)

where α > 0, β > 0 and x > 0.
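As an illustration, the density in (1) can be evaluated directly. The following is a minimal Python sketch (the paper's own simulations were written in R); the helper name `gamma_pdf` is ours, and it assumes the rate parameterization f(x) = β^α x^{α−1} e^{−βx} / Γ(α):

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate beta, i.e.
    f(x) = beta^alpha * x^(alpha-1) * exp(-beta*x) / Gamma(alpha), for x > 0."""
    if x <= 0:
        return 0.0
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)
```

For example, with α = 1 and β = 1 the density reduces to the standard exponential e^{−x}.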
1- Mixture Distribution Models

Mixture modeling is the process of analyzing data to determine the best mixture model that can be used to describe the observed data. Mixture models consist of several different probability distributions and are characterized by their ability to represent the distribution of data more accurately than single models. Every random variable can be considered as a sample from a distribution. Some random variables are drawn from one single distribution, such as a normal distribution, but most real-life random variables might have been generated from a mixture of several distributions rather than a single one. The component distributions may come from the same family, for example all of them normal but with different parameters, or from different families, for example a gamma distribution and a normal distribution together. Let X_1, X_2, ..., X_n be independent random variables and x_1, x_2, ..., x_n their observations; the probability density function (pdf) of a mixture distribution containing k components can be expressed as follows:

f(x | Θ) = Σ_{j=1}^{k} λ_j f_j(x | θ_j)   (2)

where the λ_j are the mixture weights with 0 < λ_j < 1 and Σ_{j=1}^{k} λ_j = 1, f_j(x | θ_j) is the probability density function of the j-th component, and Θ = (θ_1, θ_2, ..., θ_k) is the parameter vector of the mixture distribution. It is worth noting that in the Bayesian setting the parameter θ is treated as a random variable rather than a constant (Tahir et al., 2016). The mixture Gamma distribution with k components is written as follows:

f(x | Θ) = Σ_{j=1}^{k} λ_j (β_j^{α_j} / Γ(α_j)) x^{α_j − 1} e^{−β_j x}   (3)

where α_j > 0, β_j > 0, x > 0 and Σ_{j=1}^{k} λ_j = 1.
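Equations (2)-(3) can be sketched in Python as follows (an illustrative sketch, not the paper's R code; the function names are ours, and β is taken as a rate):

```python
import math
import random

def gamma_pdf(x, alpha, beta):
    # component gamma density with shape alpha and rate beta
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

def mixture_gamma_pdf(x, lambdas, alphas, betas):
    # eq. (3): weighted sum of the k component gamma densities
    return sum(l * gamma_pdf(x, a, b) for l, a, b in zip(lambdas, alphas, betas))

def sample_mixture(n, lambdas, alphas, betas):
    # draw a component index j with probability lambda_j, then draw from it;
    # random.gammavariate takes a *scale* argument, hence 1/beta for a rate beta
    out = []
    for _ in range(n):
        j = random.choices(range(len(lambdas)), weights=lambdas)[0]
        out.append(random.gammavariate(alphas[j], 1.0 / betas[j]))
    return out
```

With a single component (k = 1, λ_1 = 1) the mixture density reduces to the ordinary gamma density of (1).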
2- SOME METHODS OF ESTIMATING THE PARAMETERS OF MIXTURE DISTRIBUTIONS

Mixture distributions are common statistical distributions used in many fields such as data analysis, machine learning, and others. They rely on the idea of combining several simple distributions to produce a more complex one, and they require estimating the set of parameters that determines the distribution of the mixture data. When we have a sample of size n, (x_1, x_2, ..., x_n), drawn randomly from a known distribution whose parameters are unknown (for example, a sample drawn from a normal distribution with unknown mean and variance), the main objective is to estimate the parameters of this distribution. In this study, we discuss two methods for estimating the parameters of a mixture distribution.

A- Maximum Likelihood Estimation (MLE)

This method is one of the most important methods of point estimation and was proposed by the famous statistician Fisher in 1920. It assumes that the parameters to be estimated for a particular population are unknown fixed quantities, to be estimated from the sample data. Assume we have a sample of size n, (x_1, x_2, ..., x_n), and that we know the distribution from which this sample has been randomly drawn but we do not know the parameters of that distribution. The principle of this method is to find an estimate θ̂ of the parameter θ that makes the likelihood function attain its maximum value. If x_1, x_2, ..., x_n are independent and identically distributed (iid) random variables drawn from a population with probability density function f(x | θ), the maximum likelihood estimator can be obtained by differentiating the likelihood function and equating the derivative to zero.
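As a worked example of the principle (a sketch of ours, not from the paper): for a single gamma component with known shape α and unknown rate β, the log-likelihood is nα log β − β Σx_i plus terms free of β, and setting its derivative to zero gives the closed-form estimator β̂ = nα / Σx_i. The mixture case below admits no such closed form.

```python
def mle_rate_known_shape(data, alpha):
    # d/dbeta [n*alpha*log(beta) - beta*sum(x)] = 0  =>  beta_hat = n*alpha / sum(x)
    return len(data) * alpha / sum(data)
```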
The likelihood function will be as follows:

L(θ | x) = Π_{i=1}^{n} f(x_i | θ)   (4)

By using (2),

L(Θ | x) = Π_{i=1}^{n} Σ_{j=1}^{k} λ_j f_j(x_i | θ_j)   (5)

Taking the logarithm,

log L(Θ | x) = Σ_{i=1}^{n} log Σ_{j=1}^{k} λ_j f_j(x_i | θ_j)   (6)

We then take the partial derivatives with respect to θ_j and with respect to λ_j to get an equation for each parameter, but the resulting equations are difficult to solve directly because of the summation inside the logarithm, so it is necessary to rely on numerical methods and algorithms that use iterative operations in order to reach the maximum likelihood estimator (Friedman et al., 2009).

Expectation Maximization Algorithm (EM)

The expectation-maximization (EM) algorithm was proposed by (Dempster, Laird & Rubin, 1977) and remains to this day one of the most important methods for finding maximum likelihood estimators in the presence of latent variables or missing values. The algorithm is used in statistics and machine learning to solve problems related to the statistical analysis of data, such as classification, clustering and factor analysis (Filho, 2008). For example, suppose data are collected about a particular disease, where the severity of the disease was not recorded, only its presence or absence: absence is coded as zero, and presence as some x > 0. In this case we do not know the actual values of x (is it 100 or 5?), so the maximum likelihood method cannot be applied directly because of the missing values. The expectation-maximization algorithm consists of two steps (Chris & Raftery, 2017):

a- Step One: E-Step
This step takes the expectation of the logarithm of the complete-data likelihood function in order to find an appropriate estimate of the parameters; here the missing values are treated as constants and not variables (Chojogh et al., 2019).

b- Step Two: M-Step
This step determines the optimal values of the parameters by maximizing the expectation function computed in the first step.
To estimate the mixture Gamma distribution, we take the pdf of the mixture Gamma distribution from (3). Taking the logarithm of the likelihood built from (3), we get

log L(Θ | x) = Σ_{i=1}^{n} log Σ_{j=1}^{k} λ_j f(x_i | α_j, β_j)   (7)

Optimizing this log-likelihood is difficult because of the summation within the logarithm. However, we can use an indicator parameter for each observation as follows (Corduneanu and Bishop, 2001): let z_ij = 1 if observation x_i was generated by component j and z_ij = 0 otherwise, and the probability is P(z_ij = 1) = λ_j. For fixed i, the probability density function of z_i = (z_i1, ..., z_ik) has the following form (Saeed, 2005):

f(z_i) = Π_{j=1}^{k} λ_j^{z_ij}

Since the z_i are independent, we write the joint indicator density in the following form:

f(z) = Π_{i=1}^{n} Π_{j=1}^{k} λ_j^{z_ij}   (8)

where z denotes the complete-data indicators. Therefore we can write the joint pdf of the observation and the indicator in the following form:

f(x_i, z_i) = Π_{j=1}^{k} [λ_j f(x_i | α_j, β_j)]^{z_ij}   (9)

and the complete-data likelihood is given by:

L_c(Θ | x, z) = Π_{i=1}^{n} Π_{j=1}^{k} [λ_j f(x_i | α_j, β_j)]^{z_ij}   (10)

= Π_{i=1}^{n} Π_{j=1}^{k} [λ_j (β_j^{α_j} / Γ(α_j)) x_i^{α_j − 1} e^{−β_j x_i}]^{z_ij}   (11)

The log of the complete-data likelihood function is

log L_c = Σ_{i=1}^{n} Σ_{j=1}^{k} z_ij [log λ_j + α_j log β_j − log Γ(α_j) + (α_j − 1) log x_i − β_j x_i]   (12)

The z_ij are latent or missing values, because we do not know whether each z_ij is 0 or 1; therefore we use the Expectation-Maximization (EM) algorithm to estimate the parameters (Sattayatham and Talangtam, 2012).

Case 1: E-Step

The conditional expectation of z_ij given x_i and the current parameter values is

w_ij = E[z_ij | x_i, Θ] = λ_j f(x_i | α_j, β_j) / Σ_{l=1}^{k} λ_l f(x_i | α_l, β_l)   (13)

The expected complete log-likelihood is

Q(Θ) = E[log L_c | x] = Σ_{i=1}^{n} Σ_{j=1}^{k} w_ij [log λ_j + log f(x_i | α_j, β_j)]   (14)

= Σ_{i=1}^{n} Σ_{j=1}^{k} w_ij [log λ_j + α_j log β_j − log Γ(α_j) + (α_j − 1) log x_i − β_j x_i]   (15)

= Σ_{j=1}^{k} n_j log λ_j + Σ_{j=1}^{k} Σ_{i=1}^{n} w_ij [α_j log β_j − log Γ(α_j) + (α_j − 1) log x_i − β_j x_i]   (16)

where n_j = Σ_{i=1}^{n} w_ij.

Case 2: M-Step

Maximizing Q(Θ) subject to Σ_{j=1}^{k} λ_j = 1 gives the weight update

λ_j = n_j / n   (17)

Setting ∂Q/∂β_j = Σ_{i=1}^{n} w_ij (α_j/β_j − x_i) = 0 gives the rate update

β_j = α_j n_j / Σ_{i=1}^{n} w_ij x_i   (18)

Setting ∂Q/∂α_j = 0 gives

Σ_{i=1}^{n} w_ij [log β_j + log x_i − ψ(α_j)] = 0   (19)

where ψ is the digamma function. This equation has no closed-form solution in α_j, so we solve it by the Newton-Raphson method:

α_j^{(t+1)} = α_j^{(t)} + [Σ_{i=1}^{n} w_ij (log β_j + log x_i − ψ(α_j^{(t)}))] / [ψ′(α_j^{(t)}) n_j]   (20)

where ψ′ is the trigamma function.
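The E- and M-steps above can be sketched as follows. This is an illustrative Python version (the paper's code was written in R) that holds the shapes α_j fixed and updates only the weights and rates via (13), (17) and (18); the Newton-Raphson shape update (20) is omitted for brevity:

```python
import math
import random

def gamma_pdf(x, alpha, beta):
    # gamma density with shape alpha and rate beta
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

def em_gamma_mixture(data, lambdas, alphas, betas, n_iter=200):
    k, n = len(lambdas), len(data)
    for _ in range(n_iter):
        # E-step, eq. (13): responsibilities w_ij
        W = []
        for x in data:
            num = [lambdas[j] * gamma_pdf(x, alphas[j], betas[j]) for j in range(k)]
            s = sum(num)
            W.append([v / s for v in num])
        for j in range(k):
            nj = sum(w[j] for w in W)
            # M-step, eq. (17): weight update
            lambdas[j] = nj / n
            # M-step, eq. (18): rate update, with the shape alpha_j held fixed
            betas[j] = alphas[j] * nj / sum(w[j] * x for w, x in zip(W, data))
    return lambdas, betas
```

With well-separated components the weights and rates typically settle within a few dozen iterations; a full implementation would add the shape update (20) and a convergence check on the log-likelihood.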
2- Bayesian Estimation Approach

In many cases it is easy to find a suitable formula for the posterior distribution, but sometimes we may face difficulties, since finding posterior distributions may require the integration of high-dimensional functions. It was therefore necessary to develop methods that facilitate the process of finding posterior distributions, the most important of which is Markov Chain Monte Carlo (MCMC). This method came into use by researchers in the early 1990s and was widely applied to solve Bayesian problems, as it relies on the idea of obtaining a random sample from the conditional distributions of the parameters. The most commonly used MCMC methods are the Gibbs sampling algorithm and the Metropolis-Hastings algorithm, which we use in this paper.

Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm is one of the main MCMC methods for estimating the parameters of mixture distributions and is used in many scientific and engineering applications, especially in the fields of Statistics and Physics. Let x_1, x_2, ..., x_n be independent and identically distributed (iid) random variables with probability density function f(x | θ), suppose the posterior distribution of the parameters of this function is not available in closed form, and let q(θ′ | θ) be a candidate (proposal) distribution. The steps of this algorithm are (al-Masri, 2020):

Metropolis-Hastings Algorithm Steps:
1- Choose an initial value θ^{(0)} for the parameter, close to the parameter values of the real data.
2- Choose the sample sizes for the random variable observations x.
3- For i = 1, 2, ..., N:
a- Generate a candidate θ′ from the proposal distribution q(θ′ | θ^{(i−1)}).
b- Compute the acceptance ratio, where the numerator represents the conditional (posterior) distribution evaluated at the proposed parameter and the denominator represents the conditional distribution evaluated at the current parameter:

r = [p(θ′ | x) q(θ^{(i−1)} | θ′)] / [p(θ^{(i−1)} | x) q(θ′ | θ^{(i−1)})]

c- Generate a random number u_i from the uniform(0, 1) distribution.
d- If u_i ≤ min(1, r), accept the candidate and set θ^{(i)} = θ′; otherwise set θ^{(i)} = θ^{(i−1)}.
4- We repeat the previous steps, setting i = i + 1 each time, until N samples are obtained.

1- z Posterior

When the indicator parameter z_i is unknown for all observations x_i, i = 1, 2, ..., n, and the shape parameters α, the rate parameters β and the weight parameter λ are known, the conjugate prior p(z_i) of z_i is multinomial with hyperparameters (1, λ_1, λ_2, ..., λ_k). By using Bayes' theorem, the posterior distribution is

p(z_i | x_i) ∝ Π_{j=1}^{k} [λ_j f(x_i | α_j, β_j)]^{z_ij}   (21)

Since each z_ij takes only the two values 1 or 0, then

P(z_ij = 1 | x_i) = λ_j f(x_i | α_j, β_j) / Σ_{l=1}^{k} λ_l f(x_i | α_l, β_l) = w_ij   (22)

Therefore, the posterior distribution of z_i is a multinomial distribution (1, w_i1, w_i2, ..., w_ik), where i = 1, 2, ..., n and j = 1, 2, ..., k.
2- λ Posterior

When the weight parameter λ is unknown and the shape parameters α and the rate parameters β are known, then keeping only the terms of (11) that contain λ, the complete-data likelihood function is given by

L_c(λ) ∝ Π_{j=1}^{k} λ_j^{n_j}   (23)

where n_j = Σ_{i=1}^{n} z_ij is the number of observations assigned to component j. By using (13), n_j is estimated by

n_j = Σ_{i=1}^{n} w_ij   (24)

The conjugate prior p(λ) is a Dirichlet distribution with hyperparameters µ = (µ_1, µ_2, ..., µ_k), p(λ) ∝ Π_{j=1}^{k} λ_j^{µ_j − 1}. Keeping only the terms that contain λ, the posterior distribution is a Dirichlet with hyperparameters (µ_1 + n_1, µ_2 + n_2, ..., µ_k + n_k), given by

p(λ | x, z) ∝ Π_{j=1}^{k} λ_j^{µ_j + n_j − 1}   (25)

3- α_j Posterior

When the shape parameter α_j is unknown for some j = 1, 2, ..., k and both the weight parameter λ and the rate parameters β are known, then ignoring the terms of (11) that contain α_1, ..., α_{j−1}, α_{j+1}, ..., α_k, the complete-data likelihood function is given by

L_c(α_j) ∝ (β_j^{n_j} Π_{i: z_ij = 1} x_i)^{α_j − 1} / Γ(α_j)^{n_j}   (26)

where n_j = Σ_{i=1}^{n} z_ij. The conjugate prior p(α_j) belongs to an exponential family with hyperparameters (ρ, ν) and is given by

p(α_j) ∝ ρ^{α_j − 1} / Γ(α_j)^{ν}   (27)

The posterior distribution, with hyperparameters (ρ β_j^{n_j} Π_{i: z_ij = 1} x_i, ν + n_j), is given by

p(α_j | x, z) ∝ (ρ β_j^{n_j} Π_{i: z_ij = 1} x_i)^{α_j − 1} / Γ(α_j)^{ν + n_j}   (28)

4- β_j Posterior

When the rate parameter β_j is unknown for some j = 1, 2, ..., k and the shape parameters α and the weight parameter λ are known, then ignoring the terms of (11) that do not contain β_j, the complete-data likelihood function is given by

L_c(β_j) ∝ β_j^{n_j α_j} e^{−β_j Σ_{i: z_ij = 1} x_i}

The conjugate prior p(β_j) is the gamma distribution with hyperparameters (m, t):

p(β_j) ∝ β_j^{m − 1} e^{−t β_j}   (29)

The posterior distribution is the gamma distribution with hyperparameters (m + n_j α_j, t + Σ_{i: z_ij = 1} x_i):

p(β_j | x, z) ∝ β_j^{m + n_j α_j − 1} e^{−(t + Σ_{i: z_ij = 1} x_i) β_j}   (30)

where n_j = Σ_{i=1}^{n} z_ij.
5- Joint Posterior of (α_j, β_j)

When the weight parameter λ is known and the shape parameter α_j and the rate parameter β_j are unknown, then ignoring the terms of (11) that contain λ, the complete-data likelihood function for component j is given by

L_c(α_j, β_j) ∝ (β_j^{n_j} Π_{i: z_ij = 1} x_i)^{α_j − 1} e^{−β_j Σ_{i: z_ij = 1} x_i} / Γ(α_j)^{n_j}

The conjugate prior p(α_j, β_j) with hyperparameters (s, m, t) is given by

p(α_j, β_j) ∝ s^{α_j − 1} β_j^{m α_j} e^{−t β_j} / Γ(α_j)^{m}   (31)

Ignoring terms that contain λ, the joint posterior distribution is

p(α_j, β_j | x, z) ∝ (s Π_{i: z_ij = 1} x_i)^{α_j − 1} β_j^{(m + n_j) α_j} e^{−(t + Σ_{i: z_ij = 1} x_i) β_j} / Γ(α_j)^{m + n_j}   (32)

with the hyperparameters (s Π_{i: z_ij = 1} x_i, m + n_j, t + Σ_{i: z_ij = 1} x_i), where n_j = Σ_{i=1}^{n} z_ij.
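The Metropolis-Hastings steps described above can be sketched for the simplest case: a single unknown rate parameter β with known shape α and a Gamma(s, t) prior. This is an illustrative Python reduction of the algorithm with a symmetric random-walk proposal (so the proposal terms q cancel in the ratio); the paper itself applies the sampler to the full mixture:

```python
import math
import random

def log_posterior(beta, data, alpha, s=1.0, t=1.0):
    # log posterior of the rate beta, up to an additive constant:
    # gamma likelihood with known shape alpha, Gamma(s, t) prior on beta
    if beta <= 0:
        return -math.inf
    n = len(data)
    return (n * alpha + s - 1) * math.log(beta) - beta * (sum(data) + t)

def metropolis_rate(data, alpha, n_iter=5000, step=0.1, beta0=1.0):
    beta, samples = beta0, []
    for _ in range(n_iter):
        # step a: propose a candidate by a symmetric random walk
        cand = beta + random.gauss(0.0, step)
        # step b: log acceptance ratio (symmetric proposal, q cancels)
        log_r = log_posterior(cand, data, alpha) - log_posterior(beta, data, alpha)
        # steps c-d: accept with probability min(1, r)
        if random.random() < math.exp(min(0.0, log_r)):
            beta = cand
        samples.append(beta)
    return samples
```

Discarding an initial burn-in portion of the chain, the mean of the remaining samples approximates the posterior mean of β.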
In this section, a simulation study is carried out using Monte Carlo methods for the Bayesian method of estimation and the EM algorithm for maximum likelihood estimation, and the efficiency of the MLE method is compared with that of the Bayesian method by computing the mean of the sum of the modulus of the bias (MBias) and the root mean square error (RMSE).
The general form of the two-component mixture gamma distribution is given by

f(x) = λ (β_1^{α_1} / Γ(α_1)) x^{α_1 − 1} e^{−β_1 x} + (1 − λ) (β_2^{α_2} / Γ(α_2)) x^{α_2 − 1} e^{−β_2 x}
The simulation study was written in the R language and included the following basic stages:
First stage: choosing the initial values, as follows:
1- Choosing the initial values for the parameters, with the weights λ and (1 − λ) selected randomly for the first and the second component densities.
2- Choosing different sample sizes (50, 100, 150) to generate data sets from the two-component mixture gamma distribution with these parameters.
3- Repeating each experiment 1000 times.
4- Choosing values for the random variable.
Second stage: data generation. A random variable is generated depending on the type of distribution.
Third stage: estimating the parameters of the mixture distribution using the estimation methods.
Fourth stage: comparing the results. The efficiency of the MLE method is compared with the Bayesian method of estimation by computing the mean of the sum of the modulus of the bias (MBias) and the root mean square error (RMSE), where smaller RMSE and MBias indicate a better overall quality of the estimates.
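The comparison criteria of the fourth stage can be sketched as follows. These are illustrative Python helpers under one plausible reading of the MBias and RMSE criteria (not the paper's R code); `est_reps` holds one estimated parameter vector per simulation repetition:

```python
import math

def mbias(est_reps, true_params):
    # sum over parameters of the absolute bias |mean(estimate) - true value|
    n = len(est_reps)
    return sum(abs(sum(rep[j] for rep in est_reps) / n - tp)
               for j, tp in enumerate(true_params))

def rmse(est_reps, true_params):
    # root mean square error pooled over parameters and repetitions
    n, k = len(est_reps), len(true_params)
    sq = sum((rep[j] - tp) ** 2 for rep in est_reps for j, tp in enumerate(true_params))
    return math.sqrt(sq / (n * k))
```

Note that biases of opposite sign can cancel in MBias while still inflating RMSE, which is why both criteria are reported together.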
To find the MLE estimators, the Newton-Raphson method was adopted. The parameters are estimated with the Metropolis method (MT) of estimation using the joint prior in (31) with hyperparameters (s = 1, m = 1, t = 1), and the simulation study was carried out 1000 times. Table 1 presents the estimates (Est.) and the RMSE and MBias values for the MLE and MT methods; the smaller RMSE and MBias for each sample size is highlighted in bold. Looking at this table, we observe that the Metropolis method is uniformly better than MLE in all cases.
Table 1: MBias and RMSE of the MLE and MT estimators for the two-component mixture Gamma distribution
4- Discussion
The parameters were estimated with the Metropolis method and the Expectation-Maximization (EM) algorithm. From the simulation results, it is observed that the Bayes estimator is better than the maximum likelihood estimator in all cases.
References