TY - JOUR A1 - Hempel, Sabrina A1 - Adolphs, Julian A1 - Landwehr, Niels A1 - Willink, Dilya A1 - Janke, David A1 - Amon, Thomas T1 - Supervised machine learning to assess methane emissions of a dairy building with natural ventilation JF - Applied Sciences N2 - A reliable quantification of greenhouse gas emissions is a basis for the development of adequate mitigation measures. Protocols for emission measurements and data analysis approaches to extrapolate to accurate annual emission values are a substantial prerequisite in this context. We systematically analyzed the benefit of supervised machine learning methods to project methane emissions from a naturally ventilated cattle building with a concrete solid floor and manure scraper located in Northern Germany. We took into account approximately 40 weeks of hourly emission measurements and compared model predictions using eight regression approaches, 27 different sampling scenarios and four measures of model accuracy. Data normalization was applied based on median and quartile range. A correlation analysis was performed to evaluate the influence of individual features. This indicated only a very weak linear relation between the methane emission and features that are typically used to predict methane emission values of naturally ventilated barns. It further highlighted the added value of including day-time and squared ambient temperature as features. The error of the predicted emission values was in general below 10%. The results from Gaussian processes, ordinary multilinear regression and neural networks were least robust. More robust results were obtained with multilinear regression with regularization, support vector machines and particularly the ensemble methods gradient boosting and random forest. The latter had the added value to be rather insensitive against the normalization procedure. In the case of multilinear regression, also the removal of not significantly linearly related variables (i.e., keeping only the day-time component) led to robust modeling results. We concluded that measurement protocols with 7 days and six measurement periods can be considered sufficient to model methane emissions from the dairy barn with solid floor with manure scraper, particularly when periods are distributed over the year with a preference for transition periods. Features should be normalized according to median and quartile range and must be carefully selected depending on the modeling approach. KW - greenhouse gas KW - on-farm evaluation KW - emission factor KW - regression KW - ensemble methods KW - gradient boosting KW - random forest KW - neural networks KW - support vector machines Y1 - 2020 U6 - https://doi.org/10.3390/app10196938 SN - 2076-3417 VL - 10 IS - 19 PB - MDPI CY - Basel ER - TY - JOUR A1 - Christian, Jan-Ole A1 - Braginets, Rostyslav A1 - Schulze, Waltraud X. A1 - Walther, Dirk T1 - Characterization and prediction of protein phosphorylation hotspots in Arabidopsis thaliana JF - Frontiers in plant science N2 - The regulation of protein function by modulating the surface charge status via sequence-locally enriched phosphorylation sites (P-sites) in so called phosphorylation "hotspots" has gained increased attention in recent years. We set out to identify P-hotspots in the model plant Arabidopsis thaliana. We analyzed the spacing of experimentally detected P-sites within peptide-covered regions along Arabidopsis protein sequences as available from the PhosPhAt database. Confirming earlier reports (Schweiger and Lanial, 2010), we found that, indeed, P-sites tend to cluster and that distributions between serine and threonine P-sites to their respected closest next P-site differ significantly from those for tyrosine P-sites. The ability to predict P-hotspots by applying available computational P-site prediction programs that focus on identifying single P-sites was observed to be severely compromised by the inevitable interference of nearby P-sites. We devised a new approach, named HotSPotter, for the prediction of phosphorylation hotspots. HotSPotter is based primarily on local amino acid compositional preferences rather than sequence position-specific motifs and uses support vector machines as the underlying classification engine. HotSPotter correctly identified experimentally determined phosphorylation hotspots in A. thaliana with high accuracy. Applied to the Arabidopsis proteome, HotSPotter-predicted 13,677 candidate P-hotspots in 9,599 proteins corresponding to 7,847 unique genes. Hotspot containing proteins are involved predominantly in signaling processes confirming the surmised modulating role of hotspots in signaling and interaction events. Our study provides new bioinformatics means to identify phosphorylation hotspots and lays the basis for further investigating novel candidate P-hotspots. All phosphorylation hotspot annotations and predictions have been made available as part of the PhosPhAt database at http://phosphat.mpimp-golm.mpg.de. KW - protein phosphorylation KW - hotspots KW - Arabidopsis thaliana KW - support vector machines KW - regulation Y1 - 2012 U6 - https://doi.org/10.3389/fpls.2012.00207 SN - 1664-462X VL - 3 PB - Frontiers Research Foundation CY - Lausanne ER -