HPI Future SOC Lab
(2015)
The Future SOC Lab at HPI is a cooperation between the Hasso Plattner Institute and various industry partners. Its mission is to enable and promote exchange between the research community and industry.
The Lab provides interested researchers with an infrastructure of state-of-the-art hardware and software, free of charge, for research purposes. This includes technologies, some of them not yet available on the market, that could not normally be financed in a regular university setting, e.g., servers with up to 64 cores and 2 TB of main memory. These offerings are aimed in particular at researchers in computer science and business informatics. Focus areas include cloud computing, parallelization, and in-memory technologies.
This technical report presents the results of the research projects of 2015. Selected projects presented their results on April 15, 2015 and November 4, 2015 at the Future SOC Lab Day events.
Although temporal heterogeneity is a well-accepted driver of biodiversity, effects of interannual variation in land-use intensity (LUI) have not been addressed yet. Additionally, responses to land use can differ greatly among different organisms; therefore, overall effects of land use on total local biodiversity are hardly known. To test for effects of LUI (quantified as the combined intensity of fertilization, grazing, and mowing) and interannual variation in LUI (SD in LUI across time), we introduce a unique measure of whole-ecosystem biodiversity, multidiversity. This synthesizes individual diversity measures across up to 49 taxonomic groups of plants, animals, fungi, and bacteria from 150 grasslands. Multidiversity declined with increasing LUI among grasslands, particularly for rarer species and aboveground organisms, whereas common species and belowground groups were less sensitive. However, a high level of interannual variation in LUI increased overall multidiversity at low LUI and was even more beneficial for rarer species because it slowed the rate at which the multidiversity of rare species declined with increasing LUI. In more intensively managed grasslands, the diversity of rarer species was, on average, 18% of the maximum diversity across all grasslands when LUI was static over time but increased to 31% of the maximum when LUI changed maximally over time. In addition to decreasing overall LUI, we suggest varying LUI across years as a complementary strategy to promote biodiversity conservation.
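The two quantities named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the sample standard deviation, and the choice to scale each group's diversity by its maximum across grasslands before averaging are all assumptions made for illustration.

```python
from statistics import mean, stdev


def interannual_lui(lui_by_year):
    """Return (mean LUI, SD of LUI across years) for one grassland.

    Assumption: the sample standard deviation is used as the
    "SD in LUI across time" mentioned in the abstract.
    """
    return mean(lui_by_year), stdev(lui_by_year)


def multidiversity(diversity_by_group, max_by_group):
    """Average the per-group diversity after scaling each group to [0, 1].

    Assumption: each group's diversity is divided by the maximum observed
    for that group across all grasslands; groups with no positive maximum
    are skipped.
    """
    scaled = [
        diversity_by_group[group] / max_by_group[group]
        for group in diversity_by_group
        if max_by_group.get(group, 0) > 0
    ]
    return mean(scaled)


# Hypothetical values for one grassland plot
lui_mean, lui_sd = interannual_lui([1.0, 2.0, 3.0])
plot_md = multidiversity(
    {"plants": 10, "fungi": 5},
    {"plants": 20, "fungi": 10},
)
```

Scaling by the per-group maximum keeps species-rich groups from dominating the average, which matches the abstract's reporting of diversity as a percentage of the maximum across all grasslands.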
This thesis presents novel ideas and research findings for the Web of Data – a global data space spanning many so-called Linked Open Data sources. Linked Open Data adheres to a set of simple principles to allow easy access and reuse for data published on the Web. Linked Open Data is by now an established concept, and many (mostly academic) publishers have adopted the principles, building a powerful web of structured knowledge available to everybody. However, so far, Linked Open Data does not yet play a significant role among common web technologies that currently facilitate a high-standard Web experience. In this work, we thoroughly discuss the state of the art for Linked Open Data and highlight several shortcomings, some of which we tackle in the main part of this work. First, we propose a novel type of data source meta-information, namely the topics of a dataset. This information could be published with dataset descriptions and support a variety of use cases, such as data source exploration and selection. For the topic retrieval, we present an approach coined Annotated Pattern Percolation (APP), which we evaluate with respect to topics extracted from Wikipedia portals. Second, we contribute to entity linking research by presenting an optimization model for joint entity linking, showing its hardness, and proposing three heuristics implemented in the LINked Data Alignment (LINDA) system. Our first solution can exploit multi-core machines, whereas the second and third approaches are designed to run in a distributed shared-nothing environment. We discuss and evaluate the properties of our approaches, leading to recommendations on which algorithm to use in a specific scenario. The distributed algorithms are among the first of their kind, i.e., approaches for joint entity linking in a distributed fashion.
Also, we illustrate that we can tackle the entity linking problem at very large scale, with data comprising more than 100 million entity representations from a large number of sources. Finally, we approach a sub-problem of entity linking, namely the alignment of concepts. We again target a method that looks at the data in its entirety and does not neglect existing relations. This concept alignment method must also execute very fast so that it can serve as a preprocessing step for further computations. Our approach, called Holistic Concept Matching (HCM), achieves the required speed by grouping the input through comparison of so-called knowledge representations. Within the groups, we perform complex similarity computations, draw conclusions about relations, and detect semantic contradictions. The quality of our result is again evaluated on a large and heterogeneous dataset from the real Web. In summary, this work contributes a set of techniques for enhancing the current state of the Web of Data. All approaches have been tested on large and heterogeneous real-world input.
Roughly every third Wikipedia article contains an infobox - a table that displays important facts about the subject in attribute-value form. The schema of an infobox, i.e., the attributes that can be expressed for a concept, is defined by an infobox template. Often, authors do not specify all template attributes, resulting in incomplete infoboxes. With iPopulator, we introduce a system that automatically populates infoboxes of Wikipedia articles by extracting attribute values from the article's text. In contrast to prior work, iPopulator detects and exploits the structure of attribute values for independently extracting value parts. We have tested iPopulator on the entire set of infobox templates and provide a detailed analysis of its effectiveness. For instance, we achieve an average extraction precision of 91% for 1,727 distinct infobox template attributes.
SOEP-LEE2
(2023)
This article presents the new linked employee-employer study of the Socio-Economic Panel (SOEP-LEE2), which offers new research opportunities for various academic fields. In particular, the study contains two waves of an employer survey for persons in dependent work that is also linkable to the SOEP, a large representative German annual household panel (SOEP-LEE2-Core). Moreover, SOEP-LEE2 includes two waves of surveys of the self-employed, based on self-employed persons in the SOEP-Core (SOEP-LEE2-Self-employed), and three additional representative employer surveys whose employer samples are drawn independently of the SOEP (SOEP-LEE2-Compare). Survey topics include digitalisation and cybersecurity, human capital formation, COVID-19, and human resource management. Here, we describe the content, survey design, and comparability of the different datasets in the SOEP-LEE2 for potential users in different research disciplines.
Improved measurements of the photospheric and chromospheric three-dimensional magnetic and flow fields are crucial for a precise determination of the origin and evolution of active regions. We present an illustrative sample of multi-instrument data acquired during a two-week coordinated observing campaign in August 2015 involving, among others, the GREGOR solar telescope (imaging and near-infrared spectroscopy) and the space missions Solar Dynamics Observatory (SDO) and Interface Region Imaging Spectrograph (IRIS). The observations focused on the trailing part of active region NOAA 12396 with complex polarity inversion lines and strong intrusions of opposite polarity flux. The GREGOR Infrared Spectrograph (GRIS) provided Stokes IQUV spectral profiles in the photospheric Si I λ1082.7 nm line, the chromospheric He I λ1083.0 nm triplet, and the photospheric Ca I λ1083.9 nm line. Carefully calibrated GRIS scans of the active region provided maps of Doppler velocity and magnetic field at different atmospheric heights. We compare quick-look maps with those obtained with the "Stokes Inversions based on Response functions" (SIR) code, which furnishes deeper insight into the magnetic properties of the region. We find supporting evidence that newly emerging flux and intruding opposite polarity flux are hampering the formation of penumbrae, i.e., a penumbra fully surrounding a sunspot is only expected after cessation of flux emergence in proximity to the sunspots. (C) 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim