A global optimization paradigm based on change of measures
A global optimization framework, COMBEO (Change Of Measure Based Evolutionary Optimization), is proposed. An important aspect in the development is a set of derivative-free additive directional terms, obtainable through a change of measures en route to the imposition of any stipulated conditions aimed at driving the realized design variables (particles) to the global optimum. The generalized setting offered by the new approach also enables several basic ideas, used with other global search methods such as the particle swarm or the differential evolution, to be rationally incorporated in the proposed set-up via a change of measures. The global search may be further aided by imparting to the directional update terms additional layers of random perturbations such as ‘scrambling’ and ‘selection’. Depending on the precise choice of the optimality conditions and the extent of random perturbation, the search can be readily rendered either greedy or more exploratory. As numerically demonstrated, the new proposal appears to provide for a more rational, more accurate and, in some cases, a faster alternative to many available evolutionary optimization schemes.
1. Introduction
Potential applications of global optimization are ubiquitous across a large spectrum of problems in science and engineering, from very-large-scale integration circuit design to the optimization of transportation routes and the determination of protein structures. An essential aspect of the solution to any such problem is the extremization of an objective functional, possibly subject to a prescribed set of constraints. The solution is deemed to have been achieved if the global extremum of the objective functional is reached. For many practical applications, the objective functional could be non-convex, non-separable and even non-smooth. This precludes the use of gradient-based local optimization approaches for such problems, an aspect that has driven extensive research in devising global optimization strategies, many of which employ a stochastic (i.e. random evolutionary) search of heuristic or meta-heuristic origin [1]. Indeed, finding the global optimum within a search space of the parameters or design variables is much more challenging than arriving at a local extremum. In the absence of generic directional information (equivalent to the Gâteaux or directional derivative in the search for a local extremum of a sufficiently smooth cost functional), a direct search seems to be the only option left. Some of the notable schemes in this genre include variants of the genetic algorithm (GA) [2], differential evolution (DE) [3], particle swarm optimization (PSO) [4], ant colony optimization [5] and the covariance matrix adaptation evolution strategy (CMA-ES) [6]. In a search for the global extremum, stochastic schemes typically score over their gradient-based deterministic counterparts [7,8], even for cases involving sufficiently smooth cost functionals with well-defined directional derivatives.
Nevertheless, thanks to the proper directional information guiding the update, gradient-based methods possess the benefit of faster convergence to the nearest extremum, as long as the objective functional is sufficiently smooth. To the authors' knowledge, the existing evolutionary global optimization schemes do not offer the benefit of well-founded directional information. Most evolutionary schemes depend on a random (diffusion-type) scatter applied to the available candidate solutions or particles, together with some criteria for selecting the new particles. Unfortunately, a ‘greedy’ approach to the global optimum (e.g. selecting particles of higher ‘fitness’ with higher probability), though tempting from the perspective of faster convergence, faces the hazard of getting trapped in local extrema. To guard against such premature and possibly false convergence, many evolutionary global search schemes adopt safeguards. Despite the wide acceptability of a few evolutionary methods of heuristic/meta-heuristic origin [9], the underlying justification is often based on sociological or biological metaphors [1–6,10] that are hardly founded on a sound probabilistic basis, even though a random search forms a key ingredient of the algorithm. In this context, some interesting comments on the facile usage of natural metaphors in the development of many global optimization algorithms, without a serious engagement with scientific ratiocination, may be found in [10]. Be that as it may, the popular adoption of these schemes is owing not only to their algorithmic simplicity, but mainly to their effectiveness in treating many non-deterministic polynomial-time hard optimization problems, which may be contrasted with the relatively poorer performance of some of the well-grounded stochastic schemes, e.g. simulated annealing [11], stochastic tunnelling [12], etc.
Whether the slow convergence to the global optimum, not infrequently encountered with some of the popular evolutionary schemes, could be fixed by reorienting them within an alternative stochastic framework, however, remains a moot point, which is addressed in this work to an extent. Another related question concerns the number of parameters in an algorithm that the end-user must tune for an appropriate exploration–exploitation trade-off [13,14], a phrase used to indicate the relative balance of the random scatter of the particles vis-à-vis their selection based on ‘fitness’. Admittedly, the notion of random evolution, built upon Monte Carlo (MC) sampling of the particles, efficiently explores the search space, though at the cost of possibly slower convergence [15] in contrast to a gradient-based method. For better exploration, many such schemes, e.g. the GA, adopt steps like ‘crossover’, ‘mutation’, ‘selection’, etc. While ‘crossover’ and ‘mutation’ impart variations to the particles, the ‘selection’ step assigns each particle a ‘weight’ or fitness value (a measure of importance), determined as the ratio of the realized value of the objective functional to the available maximum of the same. In the selection step, the fitness values, used to update the particles via selection of the best-fit individuals for subsequent crossover and mutation, may be considered functionally analogous to the derivatives used in gradient-based approaches. An analogy may also be drawn between the fitness values and the likelihood ratios commonly employed in nonlinear stochastic filtering, e.g. the family of sequential importance sampling filters [16]. Even though such weight-based approaches reduce a measure of misfit between the available best and the rest within a finite ensemble of particles, they bring with them the curse of degeneracy, wherein all particles but one tend to receive zero weights as the iterations progress [17].
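The degeneracy noted above is easy to reproduce numerically. In the following sketch (the per-iteration log-fitness increments are hypothetical and not tied to any particular scheme), repeated multiplicative weighting concentrates almost all of the normalized weight on one particle, which the effective sample size makes visible:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                             # ensemble size (illustrative)
log_w = np.zeros(n)                # log-weights, uniform at the start

for _ in range(30):                # 30 iterations of multiplicative weighting
    log_w += rng.normal(size=n)    # hypothetical log-fitness increments

w = np.exp(log_w - log_w.max())    # normalize in log-space for stability
w /= w.sum()
ess = 1.0 / np.sum(w ** 2)         # effective sample size, always in [1, n]

print(f"max weight = {w.max():.3f}, effective sample size = {ess:.1f} of {n}")
```

The effective sample size, 1/Σw², is a standard diagnostic of weight degeneracy in sequential importance sampling; values far below n signal imminent particle collapse.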
This problem, also referred to as ‘particle collapse’ in the stochastic filtering parlance, can be arrested, in principle, by exponentially increasing the ensemble size (number of particles), a requirement that can hardly be met in practice [18]. Evolutionary schemes that replace the weight-based multiplicative approach with additive updates for the particles often succeed in eliminating particle degeneracy. Schemes like DE and PSO, which are known to perform better than the GA, use such additive particle update strategies without adversely affecting the ‘craziness’ or randomness in the evolution. On reaching the optimum, the additive correction shrinks to zero. Unfortunately, none of these methods obtains the additive term, which may be looked upon as a stochastic equivalent of derivative-based directional information, in a rigorous or optimal manner, a fact that is perhaps responsible for a prohibitively large number of functional evaluations in some cases.
In framing a rigorous stochastic basis leading to directional updates of the additive type, several possible schemes are suggested in this work. First, it is shown that the problem of optimization may be generically posed as a martingale problem [19,20], which must however be randomly perturbed to facilitate the global search. For instance, one may perform a greedy search for an extremum of an objective functional by solving a martingale problem, which takes the form of an integro-differential equation herein referred to as the extremal equation. In line with the Freidlin–Wentzell theory [21], the martingale problem may further be perturbed randomly so as to enable the greedy scheme to converge to the global extremum as the strength of the perturbation vanishes asymptotically. The solution through this approach, viewed as a stochastic process parametrized by the iterations, depends crucially on an error or ‘innovation’ function and is deemed to have been realized when the innovation, interpreted as a stochastic process, is driven to a zero-mean noise or martingale [19]. The martingale structure of the innovation essentially implies that the mean of the computed cost functional in future iterations remains constant and invariant to zero-mean random perturbations of the argument vector (the design variables). The argument vector should then correspond to an extremum. In the evolutionary random set-up, both the cost functional and its argument vector are treated as stochastic diffusion processes, even though the original extremization problem is deterministically posed. Realizing a zero-mean martingale structure for the innovation requires a change of measures, which in turn leads to additive gain-type updates on the particles. The gain coefficient matrix, which is a replacement for and a generalization over the Fréchet derivative of a smooth cost functional, provides an efficacious directional search without requiring the functional to be differentiable.
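The flavour of such a gain-type additive update can be conveyed by a minimal sketch. Purely as an illustrative stand-in for the measure-theoretic construction (the function names and the specific gain below are assumptions, not the paper's scheme), the gain is built from ensemble statistics, namely the cross-covariance of the particles with the realized cost scaled by the cost variance, and a noise-perturbed innovation drives each particle's cost towards the current ensemble best; the update shrinks to zero as the ensemble settles at an extremum:

```python
import numpy as np

def gain_update(particles, cost, rng, noise_std=1e-2):
    """One derivative-free, additive update step (illustrative sketch only)."""
    f = cost(particles)                          # (n,) realized costs
    dx = particles - particles.mean(axis=0)
    df = f - f.mean()
    cov_xf = (dx * df[:, None]).mean(axis=0)     # particle-cost cross-covariance, (d,)
    var_f = (df ** 2).mean() + 1e-12             # cost variance (regularized)
    gain = cov_xf / var_f                        # ensemble 'gain' vector
    # innovation: mismatch with the current best cost, plus zero-mean noise
    innov = (f.min() - f)[:, None] + rng.normal(scale=noise_std, size=(len(f), 1))
    return particles + innov * gain[None, :]     # additive, gain-weighted update

rng = np.random.default_rng(1)
sphere = lambda p: (p ** 2).sum(axis=1)          # test cost, minimum at the origin
x0 = rng.normal(size=(40, 2)) + 3.0              # ensemble started off-optimum
x = x0.copy()
for _ in range(200):
    x = gain_update(x, sphere, rng)
print(sphere(x).min())                           # best realized cost after updates
```

Note how the sketch mirrors the text: the gain stands in for the Fréchet derivative, no differentiability of the cost is required, and the innovation vanishes (up to the injected noise) once all particles realize the same cost.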
Additionally, in order to facilitate the global search, a general random perturbation strategy (using steps such as ‘scrambling’, ‘relaxation’ and ‘selection’) is adopted to guard against possible trapping of the particles in local extrema.
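A minimal sketch of such perturbation layers follows; the specific implementations of ‘scrambling’ (random exchange of coordinates between particles) and ‘selection’ (Metropolis-like acceptance of the perturbed particles) are illustrative assumptions rather than the paper's exact operators:

```python
import numpy as np

def scramble(particles, rng, p=0.3):
    """Randomly exchange a fraction p of coordinates between particle pairs."""
    n, d = particles.shape
    partners = rng.permutation(n)            # a random partner for each particle
    mask = rng.random((n, d)) < p            # which coordinates to swap
    out = particles.copy()
    out[mask] = particles[partners][mask]
    return out

def select(old, new, cost, rng, temp=1.0):
    """Accept each perturbed particle with a probability that decays with any
    increase in cost; improvements are always accepted."""
    df = cost(new) - cost(old)
    accept = rng.random(len(df)) < np.exp(-np.maximum(df, 0.0) / temp)
    return np.where(accept[:, None], new, old)

rng = np.random.default_rng(2)
sphere = lambda p: (p ** 2).sum(axis=1)
x = rng.normal(size=(30, 4))
y = select(x, scramble(x, rng), sphere, rng)  # one scramble-then-select sweep
```

Raising temp (or p) renders the search more exploratory, while lowering them makes it greedier, the exploration–exploitation trade-off discussed earlier.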
An important feature of the present approach is the flexibility with which the innovation process may be designed in order to meet a set of conflicting demands en route to the detection of the global extremum. The efficiency of the global search basically relies upon the ability of the algorithm to explore the search space while preserving some directionality that helps to quickly resolve the nearest extremum. The development of the proposed set-up, referred to as COMBEO (Change Of Measure Based Evolutionary Optimization), recognizes the near impossibility of a specific optimization scheme performing uniformly well across a large class of problems. In this context, one may refer to variants of the famous ‘no free lunch’ theorem, where it is proved that the average performance of any two algorithms across all possible problems is identical [22]. Accommodation of this fact had earlier led to the proposal of an evolutionary scheme that simultaneously ran different optimization methods for a given problem, with some communication built in among the updates by the different methods [23]. A major related contribution of this work is to bring the basic ideas for global search used in a few well-known optimization schemes under a single unified framework supported by a sound probabilistic basis. In a way explained later, the ideas (or their possible generalizations) behind some of the existing optimization methods may sometimes be readily incorporated in the present setting by just tweaking the innovation process.
The rest of the paper is organized as follows. In §2, the problem of finding an extremum of an objective functional is posed as a martingale problem represented through an integro-differential equation, whose solution only realizes a local optimum. The integro-differential equation is discretized and weakly solved within an MC setting so as to circumvent its inherent circularity. Section 3 discusses several random perturbation schemes to arrive at the global optimum efficiently without getting stuck in local traps. By combining these tools in different ways, a few pseudo-codes are presented in §4. In §5, the performance of COMBEO is compared with DE, PSO and CMA-ES in the context of several benchmark problems. In the same section, the new optimization approach is also applied for the quantitative recovery of boundary layer density variation in a high-speed flow given the light travel time data from an experiment. Finally, the conclusions of this study are drawn in §6.
2. Search for an extremum through a martingale problem
Here, it is demonstrated that functional extremization, subject to a generic set of equality constraints, can be posed as a martingale problem upon a proper characterization of the design variables within a probabilistic setting. However, before elaborating on this, a few fundamental features of the new evolutionary approach are listed below:
(i) at a given iterate, the solution is a random variable (taking values in the search space);

(ii) hence the solution process along the iteration axis may be considered a stochastic process, and the iterations must be so designed that the optimal solution is asymptotically approached by the mean (i.e. the first moment) of the solution process; and

(iii) since upon convergence the mean should be iteration-invariant, random fluctuations about the converged mean must assume the structure of a zero-mean stochastic process (along the iteration axis), thereby allowing us to identify the fluctuations as a noise process, e.g. a Brownian motion or, more generally, a zero-mean martingale.
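Property (iii) can be checked empirically on any converged run: treating the iterates as a stochastic process along the iteration axis, the increments about the converged mean should average to (nearly) zero. A toy sketch, with a hypothetical converged mean and fluctuation level chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
x_star = np.array([1.0, -2.0])    # hypothetical converged mean (illustrative)
iters = 5000
# post-convergence iterates: the converged mean plus zero-mean fluctuations
xs = x_star + 0.05 * rng.normal(size=(iters, 2))
incr = np.diff(xs, axis=0)        # increments along the iteration axis
bias = np.abs(incr.mean(axis=0)).max()
print(bias)                       # near zero: increments behave as zero-mean noise
```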
In posing the global optimization problem within a stochastic framework, a complete probability space (Ω, F, P) is considered.
Note that a ready extension of this approach is possible with multi-objective functions, which would merely require building up the innovation as a vector-valued process. Similarly, the approach may also be adapted for treating equality constraints, wherein zero-valued constraint functions, perturbed by zero-mean noise processes, may be used to construct the innovation processes. The easy adaptability of the current set-up for multiple objective functions or constraints may be viewed as an advantage over most existing evolutionary optimization schemes, where a single cost functional needs to be constructed based on all the constraints, a feature that may possibly engender instability for a class of large dimensional problems.
Driving xτ to an extremum
Within a τ-discrete framework with τ∈(τi−1,τi], xτ is evolved following equation (2.1). An innovation constraint that may be imposed for an accelerated (greedy) search in a neighbourhood of an extremum is given by
Note that the replacement of Δτ by Δτi merely modifies the intensity of the noise process Δητ in equation (2.3). The form of the SDE (2.6) has the desirable feature that the fictitious diffusion coefficient Δτi is an order ‘smaller’ relative to the drift coefficient f(xτ). However, since ητ is not a standard Brownian motion, it is more convenient to rewrite equation (2.6) as
As shown in appendix A (a theorem and its corollary) using Itô's expansions of xτΛτ and (Λτ)−1, the incrementally additive updates required to arrive at an extremum must follow from the differential equation:
Note that the directionality of the search process provided by the second (integral) term on the right-hand side (r.h.s.) of the equation above may also be gauged from the fact that the integrand in this term can be interpreted as a Malliavin-type derivative via the Clark–Ocone theorem [25]. However, the appearance of the unknown term πτ(f) on the r.h.s. prevents equations (2.11) and (2.12) from qualifying as an SDE in πτ(x). Indeed, the necessarily nonlinear dependence of f on xτ and the consequent non-Gaussianity of πτ(f) prevent writing the latter in terms of πτ(x). This results in the so-called closure problem in solving for πτ(x).