TY  - JOUR
A1  - Doerr, Benjamin
A1  - Krejca, Martin S.
T1  - Significance-based estimation-of-distribution algorithms
JF  - IEEE transactions on evolutionary computation
N2  - Estimation-of-distribution algorithms (EDAs) are randomized search heuristics that create a probabilistic model of the solution space, which is updated iteratively, based on the quality of the solutions sampled according to the model. As previous works show, this iteration-based perspective can lead to erratic updates of the model, in particular, to bit-frequencies approaching a random boundary value. In order to overcome this problem, we propose a new EDA based on the classic compact genetic algorithm (cGA) that takes into account a longer history of samples and updates its model only with respect to information which it classifies as statistically significant. We prove that this significance-based cGA (sig-cGA) optimizes the commonly regarded benchmark functions OneMax (OM), LeadingOnes, and BinVal all in quasilinear time, a result shown for no other EDA or evolutionary algorithm so far. For the recently proposed stable compact genetic algorithm-an EDA that tries to prevent erratic model updates by imposing a bias to the uniformly distributed model-we prove that it optimizes OM only in a time exponential in its hypothetical population size. Similarly, we show that the convex search algorithm cannot optimize OM in polynomial time.
KW  - heuristic algorithms
KW  - sociology
KW  - statistics
KW  - history
KW  - probabilistic
KW  - logic
KW  - benchmark testing
KW  - genetic algorithms
KW  - estimation-of-distribution
KW  - algorithm (EDA)
KW  - run time analysis
KW  - theory
Y1  - 2020
U6  - https://doi.org/10.1109/TEVC.2019.2956633
SN  - 1089-778X
SN  - 1941-0026
VL  - 24
IS  - 6
SP  - 1025
EP  - 1034
PB  - Institute of Electrical and Electronics Engineers
CY  - New York, NY
ER  - 
TY  - JOUR
A1  - Doerr, Benjamin
A1  - Krejca, Martin Stefan
T1  - A simplified run time analysis of the univariate marginal distribution algorithm on LeadingOnes
JF  - Theoretical computer science
N2  - With elementary means, we prove a stronger run time guarantee for the univariate marginal distribution algorithm (UMDA) optimizing the LEADINGONES benchmark function in the desirable regime with low genetic drift. If the population size is at least quasilinear, then, with high probability, the UMDA samples the optimum in a number of iterations that is linear in the problem size divided by the logarithm of the UMDA's selection rate. This improves over the previous guarantee, obtained by Dang and Lehre (2015) via the deep level-based population method, both in terms of the run time and by demonstrating further run time gains from small selection rates. Under similar assumptions, we prove a lower bound that matches our upper bound up to constant factors.
KW  - Theory
KW  - Estimation-of-distribution algorithm
KW  - Run time analysis
Y1  - 2021
U6  - https://doi.org/10.1016/j.tcs.2020.11.028
SN  - 0304-3975
SN  - 1879-2294
VL  - 851
SP  - 121
EP  - 128
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Doerr, Benjamin
A1  - Kötzing, Timo
T1  - Lower bounds from fitness levels made easy
JF  - Algorithmica
N2  - One of the first and easy to use techniques for proving run time bounds for evolutionary algorithms is the so-called method of fitness levels by Wegener. It uses a partition of the search space into a sequence of levels which are traversed by the algorithm in increasing order, possibly skipping levels. An easy, but often strong upper bound for the run time can then be derived by adding the reciprocals of the probabilities to leave the levels (or upper bounds for these). Unfortunately, a similarly effective method for proving lower bounds has not yet been established. The strongest such method, proposed by Sudholt (2013), requires a careful choice of the viscosity parameters gamma(i), j, 0 <= i < j <= n. In this paper we present two new variants of the method, one for upper and one for lower bounds. Besides the level leaving probabilities, they only rely on the probabilities that levels are visited at all. We show that these can be computed or estimated without greater difficulties and apply our method to reprove the following known results in an easy and natural way. (i) The precise run time of the (1+1) EA on LEADINGONES. (ii) A lower bound for the run time of the (1+1) EA on ONEMAX, tight apart from an O(n) term. (iii) A lower bound for the run time of the (1+1) EA on long k-paths (which differs slightly from the previous result due to a small error in the latter). We also prove a tighter lower bound for the run time of the (1+1) EA on jump functions by showing that, regardless of the jump size, only with probability O(2(-n)) the algorithm can avoid to jump over the valley of low fitness.
KW  - First hitting time
KW  - Fitness level method
KW  - Evolutionary computation
Y1  - 2022
U6  - https://doi.org/10.1007/s00453-022-00952-w
SN  - 0178-4617
SN  - 1432-0541
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Doerr, Benjamin
A1  - Kötzing, Timo
T1  - Multiplicative Up-Drift
JF  - Algorithmica
N2  - Drift analysis aims at translating the expected progress of an evolutionary algorithm (or more generally, a random process) into a probabilistic guarantee on its run time (hitting time). So far, drift arguments have been successfully employed in the rigorous analysis of evolutionary algorithms, however, only for the situation that the progress is constant or becomes weaker when approaching the target. Motivated by questions like how fast fit individuals take over a population, we analyze random processes exhibiting a (1+delta)-multiplicative growth in expectation. We prove a drift theorem translating this expected progress into a hitting time. This drift theorem gives a simple and insightful proof of the level-based theorem first proposed by Lehre (2011). Our version of this theorem has, for the first time, the best-possible near-linear dependence on 1/delta} (the previous results had an at least near-quadratic dependence), and it only requires a population size near-linear in delta (this was super-quadratic in previous results). These improvements immediately lead to stronger run time guarantees for a number of applications. We also discuss the case of large delta and show stronger results for this setting.
KW  - drift theory
KW  - evolutionary computation
KW  - stochastic process
Y1  - 2020
U6  - https://doi.org/10.1007/s00453-020-00775-7
SN  - 0178-4617
SN  - 1432-0541
VL  - 83
IS  - 10
SP  - 3017
EP  - 3058
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Doerr, Benjamin
A1  - Kötzing, Timo
A1  - Lagodzinski, Gregor J. A.
A1  - Lengler, Johannes
T1  - The impact of lexicographic parsimony pressure for ORDER/MAJORITY on the run time
JF  - Theoretical computer science : the journal of the EATCS
N2  - While many optimization problems work with a fixed number of decision variables and thus a fixed-length representation of possible solutions, genetic programming (GP) works on variable-length representations. A naturally occurring problem is that of bloat, that is, the unnecessary growth of solution lengths, which may slow down the optimization process. So far, the mathematical runtime analysis could not deal well with bloat and required explicit assumptions limiting bloat. 
In this paper, we provide the first mathematical runtime analysis of a GP algorithm that does not require any assumptions on the bloat. Previous performance guarantees were only proven conditionally for runs in which no strong bloat occurs. Together with improved analyses for the case with bloat restrictions our results show that such assumptions on the bloat are not necessary and that the algorithm is efficient without explicit bloat control mechanism. 
More specifically, we analyzed the performance of the (1 + 1) GP on the two benchmark functions ORDER and MAJORITY. When using lexicographic parsimony pressure as bloat control, we show a tight runtime estimate of O(T-init + nlogn) iterations both for ORDER and MAJORITY. For the case without bloat control, the bounds O(T-init logT(i)(nit) + n(logn)(3)) and Omega(T-init + nlogn) (and Omega(T-init log T-init) for n = 1) hold for MAJORITY(1).
KW  - genetic programming
KW  - bloat control
KW  - theory
KW  - runtime analysis
Y1  - 2020
U6  - https://doi.org/10.1016/j.tcs.2020.01.011
SN  - 0304-3975
SN  - 1879-2294
VL  - 816
SP  - 144
EP  - 168
PB  - Elsevier
CY  - Amsterdam [u.a.]
ER  - 
TY  - JOUR
A1  - Doerr, Benjamin
A1  - Neumann, Frank
A1  - Sutton, Andrew M.
T1  - Time Complexity Analysis of Evolutionary Algorithms on Random Satisfiable k-CNF Formulas
JF  - Algorithmica : an international journal in computer science
N2  - We contribute to the theoretical understanding of randomized search heuristics by investigating their optimization behavior on satisfiable random k-satisfiability instances both in the planted solution model and the uniform model conditional on satisfiability. Denoting the number of variables by n, our main technical result is that the simple () evolutionary algorithm with high probability finds a satisfying assignment in time when the clause-variable density is at least logarithmic. For low density instances, evolutionary algorithms seem to be less effective, and all we can show is a subexponential upper bound on the runtime for densities below . We complement these mathematical results with numerical experiments on a broader density spectrum. They indicate that, indeed, the () EA is less efficient on lower densities. Our experiments also suggest that the implicit constants hidden in our main runtime guarantee are low. Our main result extends and considerably improves the result obtained by Sutton and Neumann (Lect Notes Comput Sci 8672:942-951, 2014) in terms of runtime, minimum density, and clause length. These improvements are made possible by establishing a close fitness-distance correlation in certain parts of the search space. This approach might be of independent interest and could be useful for other average-case analyses of randomized search heuristics. While the notion of a fitness-distance correlation has been around for a long time, to the best of our knowledge, this is the first time that fitness-distance correlation is explicitly used to rigorously prove a performance statement for an evolutionary algorithm.
KW  - Runtime analysis
KW  - Satisfiability
KW  - Fitness-distance correlation
Y1  - 2016
U6  - https://doi.org/10.1007/s00453-016-0190-3
SN  - 0178-4617
SN  - 1432-0541
VL  - 78
SP  - 561
EP  - 586
PB  - Springer
CY  - New York
ER  -