TY  - GEN
A1  - Hollstein, André
A1  - Segl, Karl
A1  - Guanter, Luis
A1  - Brell, Maximilian
A1  - Enesco, Marta
T1  - Ready-to-Use methods for the detection of clouds, cirrus, snow, shadow, water and clear sky pixels in Sentinel-2 MSI images
T2  - Remote Sensing
N2  - Classification of clouds, cirrus, snow, shadows and clear sky areas is a crucial step in the pre-processing of optical remote sensing images and is a valuable input for their atmospheric correction. The Multi-Spectral Imager on board the Sentinel-2 satellites of the Copernicus program offers optimized bands for this task and delivers unprecedented amounts of data regarding spatial sampling, global coverage, spectral coverage, and repetition rate. Efficient algorithms are needed to process, or possibly reprocess, these large amounts of data. Techniques based on top-of-atmosphere reflectance spectra for single pixels, without exploitation of external data or spatial context, offer the largest potential for parallel data processing and highly optimized processing throughput. Such algorithms can be seen as a baseline for possible trade-offs in processing performance when the application of more sophisticated methods is discussed. We present several ready-to-use classification algorithms which are all based on a publicly available database of manually classified Sentinel-2A images. These algorithms are based on commonly used and newly developed machine learning techniques which drastically reduce the amount of time needed to update the algorithms when new images are added to the database. Several ready-to-use decision trees are presented that correctly label about 91% of the spectra within a validation dataset. While decision trees are simple to implement and easy to understand, they offer only limited classification skill. This skill improves to 98% when the presented algorithm based on the classical Bayesian method is applied. This method has only recently been used for this task and shows excellent performance concerning classification skill and processing performance. A comparison of the presented algorithms with other commonly used techniques such as random forests, stochastic gradient descent, or support vector machines is also given. Random forests and support vector machines in particular show classification skill similar to that of the classical Bayesian method.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 455
KW  - Sentinel-2 MSI
KW  - cloud detection
KW  - snow detection
KW  - cirrus detection
KW  - shadow detection
KW  - Bayesian classification
KW  - machine learning
KW  - decision trees
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-407938
ER  -

TY  - GEN
A1  - Kibrik, Andrej A.
A1  - Khudyakova, Mariya V.
A1  - Dobrov, Grigory B.
A1  - Linnik, Anastasia
A1  - Zalmanov, Dmitrij A.
T1  - Referential Choice
BT  - Predictability and Its Limits
N2  - We report a study of referential choice in discourse production, understood as the choice between various types of referential devices, such as pronouns and full noun phrases. Our goal is to predict referential choice, and to explore to what extent such prediction is possible. Our approach to referential choice includes a cognitively informed theoretical component, corpus analysis, machine learning methods and experimentation with human participants. Machine learning algorithms make use of 25 factors, including referent’s properties (such as animacy and protagonism), the distance between a referential expression and its antecedent, the antecedent’s syntactic role, and so on. Having found the predictions of our algorithm to coincide with the original almost 90% of the time, we hypothesized that fully accurate prediction is not possible because, in many situations, more than one referential option is available. This hypothesis was supported by an experimental study, in which participants answered questions about either the original text in the corpus, or about a text modified in accordance with the algorithm’s prediction. Proportions of correct answers to these questions, as well as participants’ rating of the questions’ difficulty, suggested that divergences between the algorithm’s prediction and the original referential device in the corpus occur overwhelmingly in situations where the referential choice is not categorical.
T3  - Zweitveröffentlichungen der Universität Potsdam : Humanwissenschaftliche Reihe - 306
KW  - cross-methodological approach
KW  - discourse production
KW  - machine learning
KW  - non-categoricity
KW  - referential choice
Y1  - 2016
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-100313
ER  -