Modeling of Two-Meter Extreme Air Temperature in Iraq Using Extreme-Value Copulas | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
IRAQI JOURNAL OF STATISTICAL SCIENCES | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Volume 20, Issue 2, December 2023, Pages 90-103 PDF (1.56 M) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Document Type: Research Paper | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
DOI: 10.33899/iqjoss.2023.0181192 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Authors | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Manaf Hazim AHMED* ; Zaid Tarik Al-Khaledi | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Department of Statistics and Informatics, College of Computer Science and Mathematics, University of Mosul, Mosul, Iraq | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Abstract | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The UN's International Organization for Migration (IOM) reported that Iraq is the fifth-most affected country by soaring temperatures. This requires assessing the risks associated by accurately understanding the behaviour of these extreme events. A deep investigation of the behaviour of 2m air temperature has been done, by spatially modeling this event using extreme value copula with Pickands dependence. The investigation of the 2m extreme air temperature in Iraq concluded that the symmetric extreme-value copula models are suitable to consider in the modeling. Nine extreme value copula models were constructed from one parameter family copulas (Hüsler-Reiss, Gumbel, and Galambos), and two parameters model t-EV, after adopting the spatial context. Fifty locations were randomly sampled from 1517 locations divided into two parts, 40 for modeling, and 10 for validation. The Composite Maximum Pseudo-Likelihood estimation method has been used in the modeling. According to the AIC information criterion, we selected 4 models as candidates (Hüsler-Reiss A and B; Gumbel, and Galambos). Due to the slight difference in the criterion values among the four candidate models, the Kullback-Leibler (KL) divergence method between the non-parametric and parametric pairwise extreme-value copulas has been evaluated by the validation dataset, to choose the best-fitted model. The Hüsler-Reiss A was the best-fitted model, due to the high KL density around zero of all the pairwise in the validation dataset. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Highlights | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Iraq’s 2m extreme air temperature dataset has been investigated to analyze its behaviour. The monthly block maxima have been adopted in this study, so that resulted from this block maxima procedure marginals having GEV distribution with location , , and parameters. The parameters of GEV for each marginal (location) , have been estimated by the maximum likelihood method. The estimated parameters , , and appeared slightly varying in their amounts for most of the locations, except the northeast region. So, we can consider the events as stationary after excluding the northeast of Iraq region, due to the mountains which caused the spatial process has non-stationary behaviour in this region. These estimated marginals are also used to transform to a pseudo format, , that will be used in the modeling. Examining the pseudo dataset has asymptotic dependence /independence behaviour has been done by the upper tail , and lower tail measures using empirical copula. The dataset showed that has asymptotic dependence property. After examining the pseudo spatial dataset that has heavy tail dependence property, the next test is to see whether the dataset belongs to an extreme-value copula and it is symmetric (exchangeable). For the first one, -value with test was used between the empirical copula and extreme-value copula with non-parametric estimated Pickands dependence function. This test was under hypotheses , where , is a Pickands dependence function of . The statistics -value falls to reject . The symmetry (symmetry radial of the underlying multivariate copula) test is also done by -value under the hypothesis , where is the survival of . The statistics -value also falls to reject . The investigation of the 2m extreme air temperature in Iraq above, concluded that symmetric extreme-value copula models are suitable to consider in modeling. Nine Copula models were constructed from one parameter family copulas (Hüsler-Reiss, Gumbel, and Galambos), and two parameters model, which is the t-EV copula after adopting the spatial context. Such that, and regarding the Hüsler-Reis model, four models were considered. The first model A, when trend with , such that ; the second B, third C, and forth D, we used the spirit of its parameter , where is a spatial and isotropic auto-correlation function, and is a distance between the pairwise locations . The auto-correlation functions have been chosen to be exponential, , ; power exponential , , and Cauchy , . These models of used also in t-EV copula to construct three additional models, named t-EV A, B, and C. Finally, for Gumbel and Galambos models, the same consideration in Hüsler -Reiss A model has been used, such that their parameters respectively expressed by , and . A sample of locations was randomly selected from locations divided into two parts, for Modeling, and for validation, after excluding the northeast region in Iraq from the sampling. The composite pseudo-likelihood estimation method is used to model the extreme event. According to the AIC information criterion, we selected models as candidates. Which are: Husler-Reiss A and B; Gumbel, and Galambos. Due to the slight difference in the criterion values among the four candidate models, we implemented the KL divergence method between the four estimated models with the model having a non-parametric estimated Pikands dependence function. This divergence test was implemented on the validation dataset to choose the best-fitted model. The Hüsler -Reiss A wined in this competition, due to the density around zero of all the pairwise in the validation dataset being high. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Keywords | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Spatial Extremes; Extreme Value Copulas; 2 Meter Air temperature; Pickands dependence function | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Full Text | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Introduction Extreme temperature events, such as heat waves, have significant impacts on various domains, such as agriculture, energy, and public health. Modeling these extreme events accurately is crucial for better understanding their behaviour and for effective planning and mitigation of the risks associated with them. Extreme value models have shown promising performance in modeling such types of data. In a spatial context, a max-stable spatial process is considered, and the models of this process will be in multivariate case [1]. This situation itself poses a challenge because these events follow multivariate extreme value (MGEV) distributions, and no existing models can capture the dependence structure of these events. Additionally, ignoring the dependencies among the locations and treating them as independent locations using the Generalized Extreme Value (GEV) distribution for each location will provide an unreal representation of the events. Despite the models of the max-stable in the bivariate case existing, the limited number of these models, and the relatively high number of parameters in most of the models also state as restrictions in the modeling.
The report of the International Organization for Migration (IOM) in the UN concerning climate change published on 11 August 2022 puts Iraq as the fifth-most vulnerable country to climate breakdown, affected by soaring temperatures, and this requires preparing for assessing the risks associated with this climate change. Choosing a 2m air temperature to investigate its behaviour in this study was motivated by the outputs of this report. To address this breakdown, one should first understand the behaviour of the extreme 2m air temperature. This will be done by Modeling this event via the extreme-value copula. The 2m air temperature was collected from the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric, land and oceanic climate global dataset ERA5 [9]. This study is devoted to investigating the behaviour of the two-meter air temperature in Iraq through in-depth analysis by extreme-value copula with Pickands dependence functions. The modeling of this event has been done by following the statistical inference on extreme-value copulas introduced in [10] and adaptation of extreme-value copula to spatial context by considering the parameters are functions of distance among the locations. The Composite Maximum Pseudo-Likelihood estimation method introduced in [11] has been used in the modeling. The paper is organized as follows: the theoretical concepts of extreme-value copula models, and corresponding Pickands dependence functions. Furthermore, adapting these concepts to the spatial context has been presented first. Then, the Composite pseudo-likelihood method is used in the parameters estimation of the copula models presented in Extreme-Value copula section. Preparing the 2m air temperature in Iraq dataset by pre-processing it (examining the stationary, isotropy, tail dependencies, and symmetry properties), modeling, and choosing the best-fitted model have been done in modeling the 2m air temperature in Iraq dataset. Finally, the discussions and conclusions of the main results obtained were presented. Extreme-Value CopulaIn this section, extreme-value copula models and their extension to spatial context used in modeling the 2m air temperature have been presented. The extreme-value copula will be defined via Pickands dependence function. Pickands function is a major and important key in extreme-value copula, so choosing different functions of Pickands leads to different copula models. Since the dataset that will be discussed in the pre-processing section is symmetric, a symmetric Pickands function, i.e., symmetric extreme-value, will be the focus of this section. The main fundamental concepts concerning Copula can be found in [12]. Let be a random vector with multivariate probability distribution function , and marginals . A function is said to be a multivariate copula , with dimension if and only if
where . The copula is unique if is continuous [13]. In extreme value context, let be a random vector with i.i.d replications and multivariate copula , and let non-overlapping block maxima, such that , and has max-stable copula
The extreme-value copula exists if and only if
such that
where is a max-stable tail dependence measure [10, 14]. Bivariate extreme-value copula via Pickands dependence functionSince the multivariate extreme events have tail dependence, a Pickands dependence function is the most reasonable measure able to quantify the dependence strength among the variables [15]. It has the capability on analyzing rare events, such as extreme weather events. More specifically, Pickands dependence function is considered an essential tool in bivariate extreme-value copula, because it can reduce the mapping to one dimension, and Copula is fully characterized in this dimension [13]. Without loss of the generality, we will define when . In extreme context, and under max-stability tail dependence assumption, we can define Pickands dependence function, so that for all ,
Therefore, in the bivariate case, the extreme-value copula in Equation (4) can be expressed by Pickands dependence function as follows
By Theorem 2.22 in [10] formula of in (6) will be
or equivalently
Referring that is the inverse of marginals corresponding to the . Since this study concerns to block maxima case, then will follow GEV distribution with location , scale , and shape parameters. is a convex function with inequality ,respectively correspond to complete dependence and independence. Deferent models of are determined by models of . We will present the most common parametric and symmetric copulas, that will be used in the modeling in this paper.
The attraction of this model is the domain of the Logistic distribution function [17].
where
and is a univariate student’s distribution function with a degree of freedom . If , then t-EV weakly converges to Hüsler -Reiss copula with parameter [18]. Adaptation of extreme-value copula to spatial contextIn the previous section, the extreme-value copula and corresponding Pickands dependence functions are presented in general concepts. In this part adaptation of these concepts to spatial extremes will be done. Let , , , and be an i.i.d replications of spatial process. Let and , be two continuous sequences, if there exist
with non-degenerate marginals, then is a max-stable process, such that , where refer to GEV distribution with location , scale , and shape parameters [19]. The max-stable spatial process is said to be strictly stationary, if , , , and . And is isotropic if the covariance for each depends only on the distance, such that , . In what follow in this paper, the max-stable spatial process will be under assumptions of stationarity and isotropy properties. When the focus is on extreme values, it is necessary to use more suitable tools for analyzing the spatial dependence of extremes. Since our aim is modeling using the Extreme-Value Copula concept via Pickands dependence function, we present the concepts in a spatial context. Let be a pairwise of spatial process with unit uniform distribution separated by the distance , such that for all , . The bivariate extreme-value copula corresponding to the pairwise is
The Pickands dependence function is a function that evaluates the dependence strength between separated by distance . Concerning Husler-Reiss Pickands function, the spatial aspect will be included, so that, the parameter in Equation (10), will be , where is the spatial isotropic correlation function. Many models of correlation function exist, such as exponential, power exponential, and many others. As well as for in t-EV copula model [20]. Concerning Gumbel and Galambos copula models, respectively with parameters and , the same consideration will be made. The fact that, the dependence strength of each pairwise in are varying, and since is isotropic, which means this varying will be according to the distance. And most of the time this dependence strength decreases as increases. Therefore, using this fact, we will consider the parameters and to be the trend across distance. Such that, , as well as for , where is a coefficient of trending. Composite Maximum Pseudo-LikelihoodA parametric estimation such as the Maximum Pseudo-Likelihood MPL method showed as a useful tool for estimating copula parameters, especially when the marginals are unknown [21]. Since just the bivariate extreme copula models exist, the composite likelihood is a reasonable method for estimating spatial extreme models [22, 23]. The combination of the two likelihood methods composites and pseudo was defined in [11], named Composite Pseudo-Likelihood CPL. This method is very suitable when using the copula concept in modeling spatial extremes. For that, this method was used in the Modeling of the extreme 2m air temperature event. Given a max-stable dataset with i.i.d replicates, and let is pseudo max-stable spatial process. The Composite pseudo-likelihood function is given by
where denoted to the likelihood contribution function of the pairwise at the replication . In this study used as bivariate density of the corresponding extreme-value copula defined in Equation (16). Let the compact set of the parameters of is denoted by . The estimation of can be achieved by maximizing , so that
Since the i.i.d achieved on copula when the marginals are known, such as in this case, the pairwise pseudo-likelihood estimator has asymptotic normality as , with mean and covariance matrix of sandwich form , where
respectively are the variance of the score function, and the expected information matrix are computed from Equation (17). For more details about asymptotic behaviour, see [11], [24], and [25]. The estimation of can be readily obtained from the Hessian provided by the optimization algorithm, and of by the empirical variance of the score contribution of each observation [26]. The selected model will be according to the corresponding minimum of , where is the number of locations in the dataset, and
is the Composite Likelihood Information Criterion. is very closely to Akaike Information Criterion AIC, so for simplicity in computations, we shall use AIC instead of . See [23] and [26]. Modeling the 2m air temperature in Iraq datasetData description and pre-processingThe goal of this section is to model the extreme 2m air temperature in Iraq. The hourly 2m air temperature was collected from the fifth generation of atmospheric land and oceanic climate global dataset ERA5, produced by the European Centre for Medium-Range Weather Forecasts ECMWF. This dataset was collected for the region with a longitude range of 37.5 to 49 degrees, a latitude range of 27.5 to 38 degrees, and a grid spacing of 11 km during the summer season (June, July, and August) for the years from 1981 to 2022, at times from 11:00H to 17:00H. This collection of data resulted grids and hourly observations for each grid. Mathematically, let , , , and be a spatial process represent the 2m air temperature. To ensure the block maxima be i.i.d, a monthly block maxima was proposed. So that for a non-overlapping replication
where is a spatial extreme process, for each marginal of follows GEV distribution, and the amount represents respectively the number of hours per day times the number of days per month. To examine the stationarity of the dataset, the GEV’s parameters , and are estimated for each location of using the maximum likelihood estimation method. The grids in the three panels in Figure 1 represent the estimated location , scale , shape parameters for each . Noting that, all the computational process in this study has been done by R program with main package ‘copula’ version 1.1-2, and others.
Figure 1: The panels respectively represent the estimated parameters , scale , shape for each grid of the dataset It is clear that each of the three estimated parameters for each grid is approximately equal, especially in the red region for , green in ; and red for excluding the northeast of Iraq, due to the mountains. That means and from the definition of strictly spatial stationarity, we can consider the dataset has spatial stationary property. In the following step the spatial extreme dataset will transform into , such that for each
where is the GEV distribution. In what follows, we shall deal with instead of . It is known the dependence structure pattern of the events is essential in modeling extreme events. This structure distinguishes between the models corresponding to asymptotic dependence/ independence structures. So, in the next step, examine the dataset for which the dependence structure belongs to asymptotic dependence or independence. This examination will by empirical upper and lower tails dependence measures [27],
And
where is the empirical Copula, so that , and . For more details, see [10]. The threshold was chosen to be to ensure there are data for computation. The pairwise evaluation of and of the dataset represented respectively by the first and second panels in Figure 2. From these panels, the dataset seems to have an upper tail dependence structure, due to , as well as for . So, we can consider the block maxima has an asymptotic dependence structure, and this leads to consider extreme copula models. To verify if the dependence structure of the dataset is present in extreme, furthermore, the exchangeability (have symmetric distribution) between the pairwise locations , a test hypothesis corresponding to these two assumptions has been made, (see Figure 3). Once again, the northeastern region of Iraq does not appear a tail dependence structure, and this is clear on the top and right sides of the two panels. From this result, this region will be excluded definitely from the modeling.
Figure 2: Empirical pairwise upper and lower tail dependence measures, respectively represented by the left and right panels. Each grid in the panels represents the corresponding tail dependence strength between the pairwise , . Regarding the pairwise extreme-value dependency, locations sampled form for this purpose. The -value test with will be done between the empirical pairwise copula and extreme-value copula with the non-parametric estimate of Pickands dependence function under the hypothesis
where , is a Pickands dependence function of [28]. The -value statistics is illustrated in the lift penal in Figure 3. The blue fill represents that the p-value test cannot reject . In other words, the pairwise , has extreme-value copula. As illustrated in the left panel, the test in most of the pairwise failed to reject , so we can consider extreme-value models in modeling the dataset. To ensure this property exists in modeling, the pairwise locations rejected will be excluded from the selection of locations for modeling. To test the exchangeability between the pairwise (symmetry radial of the underlying multivariate copula), -value statistics will be used based on empirical copula [29]. The assumption of the symmetry copula will be under the hypothesis
where is the survival of . The right panel in Figure 3 appears that the majority cannot reject . Then we can consider the extreme-value and symmetric models.
Figure 3: The panels represent the -value test with for extreme value dependence and the exchangeability (symmetry) of the copula dataset. The left panel, represent the pairwise , has extreme-value dependence after collecting randomly 100 locations among location; while the right panel represents the which pairwise have symmetric copula model. Copula models proposed for modeling 2m air temperature datasetBy adopting the results obtained in the pre-processing section, the locations implemented in the modeling will be randomly selected from locations to ensure consistency with the outcomes of the previous section: The northeast region of Iraq will be excluded from the Modeling, due to the spatial non-stationary with the remaining region. In other words, the GEV marginals of this region have different behaviour from others; the process at location which does not have extreme-value dependences; and also, does not have symmetric copula will be excluded from the modeling also. The 2m air temperature spatial process in locations have been randomly selected, 40 for modeling and 10 for validation. The latter locations will not be used in modeling. Most of these locations are in Iraq and small numbers are sited around the border of the west and south of Iraq, which have the same behavior as the majority of locations. The coordinates of the selected locations are pointed out in Figure 4.
Figure 4: The coordinates of the selected locations of modeling and validation. The red dots represent the locations that will use in the modeling, while the green one will be for validation According to the results of the preprocessing, a symmetric extreme-value Copula model is proposed with different Pickands dependence functions. One parameter extreme copula family (Hüsler-Reiss, Gumbel, and Galambos), and two parameters family, such as t-EV defined respectively in Equations (10), (12), (13) and (14) are chosen for modeling. Traditionally, in one-parameter models, the tail dependence strength between two extreme random variables is controlled by this parameter. To extend this concept to spatial context, should this parameter be varying across the distance between the pairwise locations , so that these parameters measure the tail dependence strength between or equivalently separated by the distance , under isotropy assumption. Due to the dependence strength varying with each pairwise separated by a distance , the parameters will consider as a function of distance. So, models A will consider the parameter varying across the distance with coefficient ; model B, the parameter varying across according to exponential dependence strength; models C will be according to power exponential; and model D with Cauchy. So that the proposed models will be
Due to no extreme-value models with exist in spatial context, all models considered with , and therefore the estimation of the parameter’s models will be by Composite Pseudo Likelihood CPL method previously introduced. Usually, Composite likelihood method is used for estimating parameters for the spatial extreme process. Table 1 shows the estimated parameters, log-likelihood amount, and model selecting criterion (Akaike information criterion AIC) corresponding to each model proposed. Table 1: Show the estimated parameters, log-likelihood, and Akaike information criterion AIC for the models proposed
The AIC information criterion in Table 1 indicates that the best four fitted models are Hüsler-Reiss A, B, Gumbel, and Galambos. The slight difference in the criterion values of AIC among the four candidate models made us consider the model which will represent the 2m air temperature has minimum divergence between the non-parametric and parametric of the estimated bivariate extreme-value copulas. This divergence test will apply to the validation dataset using the Kuiback-Lebler KL method. For more information see, [10]. Each pairwise of the validation dataset, the KL divergence between the four estimated models and the model with non-parametric Pickands function have been done. Figure 5 shows that the Hüsler-Reiss A model has a divergence density of the pairwise of the validation dataset with more skew to lift (to zero) than the three candidate models. In other words, the divergence between the non-parametric and estimated Hüsler-Reiss A copula is the minimum. For that, now we can consider the Hüsler-Reiss A copula as the best extreme-value copula model to represent the 2m air temperature event in Iraq, and they can adapt it in further studies. Figure 5: The densities of the Kuiback-Lebler divergence of the four candidate models (Hüsler-Reiss A, Hüsler-Reiss B, Gumbel, and Galambos) with the non-parametric extreme-value copula, which implemented on the validation dataset | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
References | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
References
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Statistics Article View: 157 PDF Download: 139 |