Smoothing parameter selection in Nadaraya-Watson kernel nonparametric regression using nature-inspired algorithm optimization.

Basheer, Zinah; Algamal, Zakariya

doi:10.33899/iqjoss.2020.167389

IRAQI JOURNAL OF STATISTICAL SCIENCES

Article 6, Volume 17, Issue 2, December 2020, Pages 43-50

Document Type: Research Paper

DOI: 10.33899/iqjoss.2020.167389

Authors

Zinah Basheer^* ; Zakariya Algamal

Department of Statistics and Informatics, Computer science and Mathematics, University of Mosul, Mosul, Iraq

Abstract

In Nadaraya-Watson kernel nonparametric regression, the estimated curve depends entirely on the smoothing parameter. Nature-inspired algorithms can be used as an alternative tool for selecting its optimal value. In this paper, a firefly optimization algorithm is proposed to choose the smoothing parameter in Nadaraya-Watson kernel nonparametric regression. The proposed method helps to find, efficiently, a smoothing parameter that yields high prediction accuracy. It is compared with four well-known methods. The experimental results demonstrate the superiority of the proposed method in terms of prediction capability.

Highlights

In this paper, the problem of selecting the smoothing parameter in Nadaraya-Watson kernel nonparametric regression is considered. A firefly optimization algorithm (FFA) is proposed to choose the smoothing parameter. The results obtained from simulation and a real data application demonstrate the superiority of the proposed method in terms of IMSE compared with the competing methods.

Keywords

Nadaraya-Watson estimator; firefly optimization algorithm; smoothing parameter selection

Full Text

Introduction

The nonparametric regression model (NRM) and its estimation methods have been developed mainly in recent years (Ali, 2019; Hazelton & Cox, 2016). Kernel regression estimators are among the most popular nonparametric estimators. In the univariate case, these estimators depend on a bandwidth, which is a smoothing parameter controlling the smoothness of the estimated curve, and on a kernel, which acts as a weight function (Schucany, 1995; Slaoui, 2016; Steland, 2012).

The choice of the smoothing parameter is a crucial problem in kernel regression. The literature on bandwidth selection is quite extensive; see, for example, (Ali, 2019; Chen, 2015; C.-K. Chu & Marron, 1991; C. Chu, 1995; Dobrovidov & Ruds’ko, 2010; Feng & Heiler, 2009; Francisco-Fernández & Vilar-Fernández, 2005; Gao & Gijbels, 2012; Kauermann & Opsomer, 2004; Koláček & Horová, 2017; Lee & Solo, 1999; Leungi, Marriott, & Wu, 1993; Nychka, 1991; Opsomer & Miller, 2007; Rice, 1984; Schucany, 1995; Zhang, Chan, Ho, & Ho, 2008; Zhou & Huang, 2018; Żychaluk, 2014).

In this paper, a firefly optimization algorithm, which is a nature-inspired continuous algorithm, is proposed to choose the smoothing parameter in Nadaraya-Watson kernel nonparametric regression. The proposed method helps to find, efficiently, a smoothing parameter that yields high prediction accuracy. Its superiority is demonstrated in different simulated examples and in a real data application.

This paper is organized as follows. Section 2 describes Nadaraya-Watson kernel nonparametric regression and smoothing parameter selection. Section 3 presents the firefly optimization algorithm. Section 4 details the proposed method. Sections 5 and 6 illustrate the proposed method through simulation studies and a real data application. Section 7 concludes.

Smoothing parameter selection

The nonparametric regression model, often estimated by estimators of the Nadaraya–Watson type, forms an attractive framework for diverse areas such as engineering, econometrics, environmetrics, social sciences, and biometrics (Steland, 2012). In the NRM, we have a set of univariate observations $(x_i, y_i)$, $i = 1, \dots, n$, and the model can be defined as

$$y_i = m(x_i) + \varepsilon_i, \quad i = 1, \dots, n, \tag{1}$$

where $m$ is the unknown regression function and $\varepsilon_i$ are random errors with mean zero and variance $\sigma^2$. Nonparametric regression relies on a weighted mean of the dependent variable, where the weights depend on the distances between the observations of the independent variable, scaled by a smoothing parameter.

One technique for estimating a nonparametric regression is the Nadaraya-Watson (NW) kernel estimator, which is more flexible than other nonparametric techniques and provides accurate predictions of the observations (Ali, 2019; Kyung Lee, Park, & Su Park, 2007; Li & Palta, 2009). The kernel estimator of a density $f$ at the point $x$ is, in general, defined as

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right), \tag{2}$$

where $K$ is a kernel probability density function centered at each point $x_i$, and the smoothing parameter $h$ is known as the fixed bandwidth. The NW kernel estimator with a fixed $h$ is defined as follows:

$$\hat{m}_h(x) = \frac{\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) y_i}{\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)}. \tag{3}$$

The NW kernel estimator depends on the smoothing parameter (bandwidth) $h$, which controls the amount of smoothing of the curve: a large value of $h$ leads to a smoother estimate (Rice, 1984; Schucany, 1995; Utami, Haris, Prahutama, & Purnomo, 2020; Yoon Kim & Park, 2002).
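
The effect of the bandwidth can be seen in a minimal Python sketch of the NW estimator. The function names, the toy data, and the choice of the Epanechnikov kernel are assumptions made for illustration and are not taken from the paper's implementation.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nw_estimate(x0, xs, ys, h):
    """Nadaraya-Watson estimate of m(x0) with a fixed bandwidth h > 0."""
    weights = [epanechnikov((x0 - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # No observation lies within distance h of x0: the estimate is undefined.
        return float("nan")
    return sum(w * yi for w, yi in zip(weights, ys)) / total


# Toy data (illustrative only): y = x^2 on a small grid.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]

print(nw_estimate(2.0, xs, ys, h=0.5))  # 4.0: only the point x = 2 gets weight
print(nw_estimate(2.0, xs, ys, h=2.0))  # 4.6: neighbours at x = 1 and x = 3 are averaged in
```

With the small bandwidth the estimate reproduces the single nearby observation, while the larger bandwidth averages over neighbouring points and gives a smoother, more biased value.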

The optimal bandwidth of the NW kernel estimator is the value of $h$ that minimizes the mean integrated squared error, which is obtained by integrating the mean squared error (IMSE). There are many different methods for selecting the value of $h$. Among them, in 1981, Friedman and Stuetzle used strong separability to identify the components of the nonparametric regression model when $m$ is unknown, and proposed a kernel-based, consistent, and asymptotically normal estimator (Friedman & Stuetzle, 1981). In 1982, Abramson suggested the inverse square-root law to estimate $h$ in variable kernel density estimation, which reduces the bias more than a fixed-$h$ estimator (Abramson, 1982). In 1986, Silverman suggested an adaptation of the kernel estimator in which $h$ varies, with the nonparametric estimate depending on the geometric mean (Silverman, 1986). In 1987, Scott and Terrell discussed the relationship between biased and unbiased cross-validation, and the use of a variable $h$ instead of a fixed $h$ for long-tailed distributions (Scott & Terrell, 1987). Several other authors have addressed the problem of selecting the smoothing parameter, such as (Ali, 2019; Chen, 2015; C.-K. Chu & Marron, 1991; C. Chu, 1995; Dobrovidov & Ruds’ko, 2010; Feng & Heiler, 2009; Francisco-Fernández & Vilar-Fernández, 2005; Gao & Gijbels, 2012; Kauermann & Opsomer, 2004; Koláček & Horová, 2017; Lee & Solo, 1999; Leungi et al., 1993; Nychka, 1991; Opsomer & Miller, 2007; Rice, 1984; Schucany, 1995; Zhang et al., 2008; Zhou & Huang, 2018; Żychaluk, 2014).
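
For reference, the criterion described above can be written out explicitly. The notation is an assumption of this sketch: $\hat{m}_h$ denotes the NW estimate with bandwidth $h$, $m$ the true regression function, and $h_{\mathrm{opt}}$ the optimal bandwidth.

```latex
\mathrm{IMSE}(h)
  = \int \mathbb{E}\left[\{\hat{m}_h(x) - m(x)\}^2\right] dx
  = \int \left[\mathrm{Bias}^2\{\hat{m}_h(x)\} + \mathrm{Var}\{\hat{m}_h(x)\}\right] dx,
\qquad
h_{\mathrm{opt}} = \arg\min_{h > 0} \mathrm{IMSE}(h).
```

The decomposition makes the trade-off visible: a small $h$ lowers the bias term but inflates the variance term, and a large $h$ does the opposite.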

Firefly optimization algorithm

Nature has inspired many meta-heuristic algorithms. Swarm intelligence is an important tool for solving complex problems in scientific research, and swarm intelligence algorithms have been widely studied and successfully applied to a variety of complex optimization problems. The firefly algorithm (FFA), developed by Yang (2013), is a recent swarm intelligence method and one of the most powerful optimization algorithms.

The firefly algorithm has been shown to perform well and to be effective in solving various optimization problems (Fister, Fister, Yang, & Brest, 2013). It is inspired by the social behavior of fireflies, in particular the attractiveness of their flashing lights. The algorithm exploits the fact that a firefly's flash is a signaling system used to attract other fireflies (Yelghi & Köse, 2018).

To simplify the description and modeling of the FFA, Yang (2013) formulated three idealized rules that govern the search process. These rules standardize certain firefly characteristics and describe the behavior of artificial fireflies as follows:

(1) All fireflies are unisex, implying that all the fireflies of a population can attract each other regardless of their sex.

(2) The attractiveness of a firefly is proportional to its brightness. Therefore, for any two flashing fireflies, the less bright one moves towards the brighter one. Both attractiveness and brightness decrease as the distance between the two fireflies increases. If no firefly is brighter than a particular firefly, that firefly moves randomly.

(3) The brightness of each firefly is determined by the value of the objective function to be optimized. In a maximization problem, the brightness can be taken as directly proportional to the value of the fitness function. In a minimization problem, the brightness is inversely proportional to the value of the objective function.

In the implementation of the FFA, each member of the swarm is called a firefly. Each firefly represents a candidate solution in the $D$-dimensional search space. Brighter locations are assumed to represent better solutions, and the algorithm helps the fireflies to find these locations in the search space. A firefly's attractiveness is determined by its brightness, which in turn is associated with the objective function of the given optimization problem. The brightness decreases as the distance between a firefly and the target location increases. The attraction between fireflies is based on differences in brightness, so a less bright firefly moves towards a brighter one. If no firefly is brighter than a particular firefly, that firefly moves randomly. During the search process, the attraction among fireflies moves them towards new positions, which yield new candidate solutions.

Mathematically, assume that $N$ fireflies (the population size) are randomly distributed in the $D$-dimensional search space. During the evolutionary process, each firefly has a position vector denoted $\mathbf{x}_i = (x_{i1}, x_{i2}, \dots, x_{iD})$, where $i = 1, 2, \dots, N$ and $D$ is the dimensionality of the solutions.

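The distance between any two fireflies $i$ and $j$, at positions $\mathbf{x}_i$ and $\mathbf{x}_j$ in the search space, respectively, is the Cartesian distance, which can be calculated using the following equation:

$$r_{ij} = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert = \sqrt{\sum_{d=1}^{D} (x_{id} - x_{jd})^2}. \tag{4}$$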

Each firefly has its own light intensity, or brightness. The brightness value is used to evaluate the quality of a firefly and is shaped by the landscape of the optimization problem. The brightness of firefly $i$ at its current position $\mathbf{x}_i$ can be expressed through the objective function value as follows:

$$I_i = f(\mathbf{x}_i). \tag{5}$$

The light intensity of a firefly is directly proportional to its brightness and is related to the objective value. When two fireflies are compared, the one with the lower light intensity is attracted toward the one with the higher light intensity. The light intensity seen by another firefly depends on the intensity emitted and on the distance between the two fireflies. It can be described by a monotonically decreasing function of the distance $r$, formulated as follows:

$$I(r) = I_0 e^{-\gamma r^2}, \tag{6}$$

where $I_0$ is the light intensity at the source and $\gamma$ is the light absorption coefficient, which controls the decrease of the light intensity (brightness) and can be taken as a constant.

Each firefly has its own attractiveness, which indicates how strongly it attracts other members of the swarm. Attractiveness, $\beta$, is relative: it must be judged by the other fireflies and therefore varies with the distance $r$. As mentioned earlier, the brightness decreases with the distance from the source, and the light is also absorbed by the air, so the attractiveness must be allowed to vary with the degree of absorption (Karthikeyan, Asokan, & Nickolas, 2014). Thus, the main form of the attractiveness of a firefly is defined by the following equation:

$$\beta(r) = \beta_0 e^{-\gamma r^2}, \tag{7}$$

where $\beta(r)$ represents the attractiveness of a firefly at distance $r$, and $\beta_0$ denotes the initial attractiveness at distance $r = 0$, which can be taken as a constant. In implementations, $\beta_0$ is usually set to 1 for most problems.
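
A short Python sketch may help to show how the Cartesian distance and the exponentially decaying attractiveness fit together. The function names and the value $\gamma = 1$ are illustrative assumptions; only the value 1 for the initial attractiveness follows the text.

```python
import math


def distance(xi, xj):
    """Cartesian (Euclidean) distance between two firefly positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Attractiveness at distance r: beta0 * exp(-gamma * r^2).

    gamma = 1.0 is an assumed, illustrative absorption coefficient.
    """
    return beta0 * math.exp(-gamma * r * r)


r = distance([0.0, 0.0], [3.0, 4.0])
print(r)                    # 5.0
print(attractiveness(0.0))  # 1.0, the initial attractiveness at zero distance
print(attractiveness(r))    # close to zero: distant fireflies barely attract
```

The decay is steep: with these settings a firefly five units away exerts almost no pull, so the absorption coefficient effectively sets the neighbourhood within which fireflies influence each other.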

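The fireflies try to move to the best position, which means that a firefly with lower light intensity is attracted by a brighter one. Locations are updated for each pair of fireflies $i$ and $j$: each firefly $i$ is compared with all other fireflies $j$. If firefly $j$ at position $\mathbf{x}_j$ is brighter than firefly $i$, then $i$ moves towards $j$ by attraction. The movement is defined as:

$$\mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + \beta_0 e^{-\gamma r_{ij}^2} \left(\mathbf{x}_j^{t} - \mathbf{x}_i^{t}\right) + \alpha \boldsymbol{\epsilon}_i, \tag{8}$$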

where $\alpha$ is the randomization parameter, $\gamma$ is the absorption coefficient that controls the decrease of the light intensity, and $\boldsymbol{\epsilon}_i = \mathrm{rand} - \tfrac{1}{2}$, where $\mathrm{rand}$ is a random number drawn from the uniform distribution on [0, 1].
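
The movement rule can be sketched in Python for a single pair of fireflies. This is a minimal illustration under assumed settings: the function name `move_towards` and the default values of `gamma` and `alpha` are hypothetical and not taken from the paper.

```python
import math
import random


def move_towards(xi, xj, beta0=1.0, gamma=1.0, alpha=0.2, rng=random):
    """Move firefly i (position xi) towards a brighter firefly j (position xj).

    Each coordinate receives an attraction term beta0 * exp(-gamma * r^2) * (xj - xi)
    plus a random term alpha * (rand - 0.5), with rand uniform on [0, 1).
    """
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))  # squared distance r_ij^2
    beta = beta0 * math.exp(-gamma * r2)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]


# With no absorption and no noise, firefly i jumps exactly onto firefly j.
print(move_towards([0.0], [1.0], gamma=0.0, alpha=0.0))  # [1.0]

# With absorption and noise, the step is partial and randomly perturbed.
print(move_towards([0.0, 0.0], [1.0, 1.0], rng=random.Random(1)))
```

The two calls show the roles of the parameters: the absorption coefficient shortens the step towards the brighter firefly, and the randomization parameter adds exploration around it.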

The proposed method

The efficiency of the NW kernel estimator depends largely on an appropriate choice of the smoothing parameter $h$, so selecting a suitable value of $h$ is of crucial importance. In the literature, the most widely used method for selecting $h$ is cross-validation (CV), which is a data-driven approach (Chen, 2015; Scott & Terrell, 1987). In this paper, the FFA is proposed to determine the smoothing parameter in the NW kernel estimator. The proposed method helps to find, efficiently, the value of $h$ with the best prediction performance. The parameter configuration for the proposed method is as follows.

(1) The number of fireflies is set to 25, and the maximum number of iterations is $t_{\max}$.

(2) The position of each firefly is determined randomly and represents the smoothing parameter $h$. The initial positions of the fireflies are generated from a uniform distribution on the range [0, 10].

(3) The fitness function is defined as

(4) The positions are updated using Eq. (8).

(5) Steps 3 and 4 are repeated until the maximum number of iterations, $t_{\max}$, is reached.
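
The steps above can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the fitness used here, a leave-one-out cross-validation error of the NW estimator, is an assumption standing in for the paper's fitness function, and the iteration count, `gamma`, `alpha`, the lower bound `lo` (used to avoid a zero bandwidth), and all function names are hypothetical. Only the 25 fireflies and the upper bound of 10 follow the text. For brevity, the brightest firefly is left in place rather than moved randomly.

```python
import math
import random


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def loo_cv_error(h, xs, ys):
    """Leave-one-out mean squared prediction error of the NW estimator (assumed fitness)."""
    total = 0.0
    for i, (x0, y0) in enumerate(zip(xs, ys)):
        num = den = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            if j != i:
                w = epanechnikov((x0 - xj) / h)
                num += w * yj
                den += w
        if den == 0.0:
            return math.inf  # h too small: some point has no neighbours
        total += (y0 - num / den) ** 2
    return total / len(xs)


def firefly_bandwidth(xs, ys, n_fireflies=25, n_iter=50, lo=1e-3, hi=10.0,
                      beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Search for a bandwidth h in [lo, hi] with a basic firefly algorithm."""
    rng = random.Random(seed)
    hs = [rng.uniform(lo, hi) for _ in range(n_fireflies)]   # initial positions
    cost = [loo_cv_error(h, xs, ys) for h in hs]             # lower cost = brighter
    best_h, best_cost = min(zip(hs, cost), key=lambda p: p[1])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter than firefly i
                    beta = beta0 * math.exp(-gamma * (hs[i] - hs[j]) ** 2)
                    step = beta * (hs[j] - hs[i]) + alpha * (rng.random() - 0.5)
                    hs[i] = min(hi, max(lo, hs[i] + step))   # keep h inside the range
                    cost[i] = loo_cv_error(hs[i], xs, ys)
                    if cost[i] < best_cost:
                        best_h, best_cost = hs[i], cost[i]
    return best_h, best_cost


# Toy data (illustrative only): a noisy sine curve on [0, 1].
data_rng = random.Random(1)
xs = [i / 29 for i in range(30)]
ys = [math.sin(2 * math.pi * x) + data_rng.gauss(0.0, 0.2) for x in xs]

# A small swarm and few iterations keep this pure-Python example fast.
best_h, best_cost = firefly_bandwidth(xs, ys, n_fireflies=10, n_iter=10)
print(best_h, best_cost)
```

Because the position is a single number here, the distance between two fireflies reduces to the absolute difference of their bandwidths. Any other data-driven criterion could be substituted for `loo_cv_error` without changing the search loop.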

Simulation results

To test how well the proposed method performs for different possible mean functions, the following study design was used. The proposed method is compared with four existing methods: CV, GCV, AIC, and the plug-in method (PM). Three sample sizes are considered. In addition, the Epanechnikov kernel is used.

Case 1: In this case, we use the regression function