### Refine

#### Keywords

We propose a novel cluster-based reduced-order modelling (CROM) strategy for unsteady flows. CROM combines the cluster analysis pioneered by Gunzburger's group (Burkardt, Gunzburger & Lee, Comput. Meth. Appl. Mech. Engng, vol. 196, 2006a, pp. 337-355) with the transition matrix models introduced to fluid dynamics by Eckhardt's group (Schneider, Eckhardt & Vollmer, Phys. Rev. E, vol. 75, 2007, art. 066313). CROM constitutes a potential alternative to proper orthogonal decomposition (POD) models and generalises the Ulam-Galerkin method, classically used in dynamical systems to determine a finite-rank approximation of the Perron-Frobenius operator. The proposed strategy processes a time-resolved sequence of flow snapshots in two steps. First, the snapshot data are clustered into a small number of representative states, called centroids, in the state space. These centroids partition the state space into complementary, non-overlapping regions (centroidal Voronoi cells). Departing from the standard algorithm, the probabilities of the clusters are determined, and the states are sorted by analysis of the transition matrix. Second, the transitions between the states are dynamically modelled using a Markov process. Physical mechanisms are then distilled by a refined analysis of the Markov process, e.g. using finite-time Lyapunov exponent (FTLE) and entropic methods. This CROM framework is applied to the Lorenz attractor (as an illustrative example) and to the velocity fields of the spatially evolving incompressible mixing layer and of the three-dimensional turbulent wake of a bluff body. For these examples, CROM is shown to identify non-trivial quasi-attractors and transition processes in an unsupervised manner. CROM has numerous potential applications: the systematic identification of physical mechanisms of complex dynamics, the comparison of flow evolution models, the identification of precursors to desirable and undesirable events, and flow control exploiting nonlinear actuation dynamics.
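
As a rough illustration of the two CROM steps on the Lorenz attractor, the sketch below clusters snapshots with plain Lloyd k-means and then estimates a one-step Markov transition matrix by counting cluster-to-cluster transitions. It is a minimal sketch under stated assumptions, not the authors' implementation: the number of clusters K = 10, the RK4 integrator with time step 0.01, the random initialisation, and all function names (`simulate_lorenz`, `kmeans`, `transition_matrix`) are hypothetical choices made for illustration, and the cluster sorting and the FTLE/entropy analyses are omitted. The convention used is that `P[j, k]` is the probability of moving from cluster k to cluster j in one time step, so columns sum to one and a probability vector evolves as p_{n+1} = P p_n.

```python
import numpy as np


def lorenz_rhs(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz vector field with the classical chaotic parameters."""
    return np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])


def simulate_lorenz(n_snapshots, dt=0.01, x0=(1.0, 1.0, 1.0), n_transient=1000):
    """Fourth-order Runge-Kutta; drop a transient, keep n_snapshots states."""
    x = np.asarray(x0, dtype=float)
    snapshots = np.empty((n_snapshots, 3))
    for i in range(n_transient + n_snapshots):
        k1 = lorenz_rhs(x)
        k2 = lorenz_rhs(x + 0.5 * dt * k1)
        k3 = lorenz_rhs(x + 0.5 * dt * k2)
        k4 = lorenz_rhs(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        if i >= n_transient:
            snapshots[i - n_transient] = x
    return snapshots


def nearest_centroid(data, centroids):
    """Index of the closest centroid (Euclidean distance) for every snapshot."""
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(data, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd iteration started from randomly chosen snapshots (assumed init)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centroid(data, centroids)
        # Move each centroid to the mean of its Voronoi cell (keep it if the cell is empty).
        updated = np.array([
            data[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, nearest_centroid(data, centroids)


def transition_matrix(labels, n_clusters):
    """Column-stochastic P with P[j, k] = Prob(cluster k -> cluster j in one step)."""
    counts = np.zeros((n_clusters, n_clusters))
    for k, j in zip(labels[:-1], labels[1:]):
        counts[j, k] += 1.0
    leaving = counts.sum(axis=0)
    leaving[leaving == 0] = 1.0  # guard against clusters that are never left
    return counts / leaving


K = 10  # hypothetical number of clusters
snapshots = simulate_lorenz(10_000)
centroids, labels = kmeans(snapshots, K)
q = np.bincount(labels, minlength=K) / len(labels)  # cluster probabilities
P = transition_matrix(labels, K)

# Markov model: propagate a probability vector that starts in the first snapshot's cluster.
p = np.zeros(K)
p[labels[0]] = 1.0
for _ in range(500):
    p = P @ p
print("cluster probabilities q:", np.round(q, 3))
print("p after 500 Markov steps:", np.round(p, 3))
```

Because all snapshots come from one long trajectory, the empirical cluster probabilities `q` should be a fixed point of `P` up to an end effect of order 1/M (M being the number of snapshots), which makes a cheap consistency check on the counting. If the estimated chain is irreducible and aperiodic, the propagated vector `p` should also drift towards `q`.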

We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such approaches are highly desirable in situations where the underlying dynamics are hard to model from physical principles or where simplified models need to be found. We focus on symbolic regression methods, a branch of machine learning. These algorithms can learn an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered generalized regression methods. We investigate two particular algorithms: fast function extraction (FFX), a generalized linear regression algorithm, and genetic programming (GP), a very general method. Both combine elementary functions in such a way that a good model for predicting the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by predicting the evolution of a harmonic oscillator from measurements, by detecting an arriving front in an excitable system, and, as a real-world application, by predicting solar power production from energy production observations at a given site together with the weather forecast.
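
The idea behind a generalized linear regression of this kind, a linear combination of nonlinear basis functions fitted to data, can be sketched in a few lines. The example below is an assumption-laden stand-in rather than FFX itself: it uses a hypothetical library of monomials x^i v^j up to degree 3 and replaces the regularized path learning used by FFX with sequentially thresholded least squares. It fits the one-step map of a harmonic oscillator (assumed ω = 1, Δt = 0.1, noise-free samples at three amplitudes so that the library columns are linearly independent) and then forecasts an amplitude not seen during training. The names `library`, `stlsq` and `predict` are illustrative only.

```python
import numpy as np

# Hypothetical basis library: monomials x^i v^j with i + j <= 3.
EXPONENTS = [(i, d - i) for d in range(4) for i in range(d, -1, -1)]


def term_name(i, j):
    """Readable label such as 'x', 'v^2' or 'x*v' for a monomial."""
    parts = []
    if i:
        parts.append("x" if i == 1 else f"x^{i}")
    if j:
        parts.append("v" if j == 1 else f"v^{j}")
    return "*".join(parts) or "1"


def library(states):
    """Evaluate every basis function on an (n, 2) array of (x, v) states."""
    x, v = states[:, 0], states[:, 1]
    return np.column_stack([x**i * v**j for i, j in EXPONENTS])


def stlsq(theta, y, threshold=1e-3, n_iter=10):
    """Sequentially thresholded least squares (stand-in for FFX's regularized path)."""
    xi = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for j in range(y.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                # Refit each output using only the basis functions that survived pruning.
                xi[keep, j] = np.linalg.lstsq(theta[:, keep], y[:, j], rcond=None)[0]
    return xi


def predict(xi, state0, n_steps):
    """Iterate the learned one-step map from an initial state."""
    states = [np.asarray(state0, dtype=float)]
    for _ in range(n_steps):
        states.append(library(states[-1][None, :])[0] @ xi)
    return np.array(states)


# Noise-free "measurements" of x'' = -omega^2 x at three amplitudes (assumed setup).
omega, dt = 1.0, 0.1
t = np.arange(200) * dt
trajs = [np.column_stack([a * np.cos(omega * t), -a * omega * np.sin(omega * t)])
         for a in (0.5, 1.0, 1.5)]
current = np.vstack([tr[:-1] for tr in trajs])
following = np.vstack([tr[1:] for tr in trajs])

xi = stlsq(library(current), following)
for k, (i, j) in enumerate(EXPONENTS):
    if np.any(xi[k] != 0.0):
        print(f"{term_name(i, j):>5}: x_next {xi[k, 0]:+.6f}  v_next {xi[k, 1]:+.6f}")

# Forecast an amplitude that was not in the training data and compare with the exact solution.
t_test = np.arange(301) * dt
forecast = predict(xi, (1.2, 0.0), 300)
exact = np.column_stack([1.2 * np.cos(omega * t_test), -1.2 * omega * np.sin(omega * t_test)])
max_err = np.abs(forecast - exact).max()
print(f"max forecast error over {t_test[-1]:.0f} time units: {max_err:.2e}")
```

With noise-free data the fit should keep only the `x` and `v` terms, with coefficients close to cos(ωΔt) and ±sin(ωΔt), which is the exact rotation map of the oscillator. With noisy measurements the threshold would have to be tuned, and a genetic-programming search would be needed if the right basis functions were not in the library at all.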