Refine
Year of publication
Document Type
- Article (44)
- Monograph/Edited Volume (10)
- Other (3)
- Postprint (1)
- Preprint (1)
Language
- English (59) (remove)
Is part of the Bibliography
- yes (59)
Keywords
- radiation mechanisms: non-thermal (8)
- gamma rays: galaxies (6)
- galaxies: active (5)
- gamma rays: general (5)
- ISM: supernova remnants (4)
- data profiling (4)
- Datenintegration (3)
- duplicate detection (3)
- similarity measures (3)
- Data Integration (2)
Data analytics are moving beyond the limits of a single data processing platform. A cross-platform query optimizer is necessary to enable applications to run their tasks over multiple platforms efficiently and in a platform-agnostic manner. For the optimizer to be effective, it must consider data movement costs across different data processing platforms. In this paper, we present the graph-based data movement strategy used by RHEEM, our open-source cross-platform system. In particular, we (i) model the data movement problem as a new graph problem, which we prove to be NP-hard, and (ii) propose a novel graph exploration algorithm, which allows RHEEM to discover multiple hidden opportunities for cross-platform data processing.
Any system at play in a data-driven project has a fundamental requirement: the ability to load data. The de-facto standard format to distribute and consume raw data is CSV. Yet, the plain text and flexible nature of this format make such files often difficult to parse and correctly load their content, requiring cumbersome data preparation steps. We propose a benchmark to assess the robustness of systems in loading data from non-standard CSV formats and with structural inconsistencies. First, we formalize a model to describe the issues that affect real-world files and use it to derive a systematic lpollutionz process to generate dialects for any given grammar. Our benchmark leverages the pollution framework for the csv format. To guide pollution, we have surveyed thousands of real-world, publicly available csv files, recording the problems we encountered. We demonstrate the applicability of our benchmark by testing and scoring 16 different systems: popular csv parsing frameworks, relational database tools, spreadsheet systems, and a data visualization tool.
Aims. Previous observations with the High Energy Stereoscopic System (H.E.S.S.) have revealed an extended very-high-energy (VHE; E > 100 GeV) gamma-ray source, HESS J1834-087, coincident with the supernova remnant (SNR) W41. The origin of the gamma-ray emission was investigated in more detail with the H.E.S.S. array and the Large Area Telescope (LAT) onboard the Fermi Gamma-ray Space Telescope.
Methods. The gamma-ray data provided by 61 h of observations with H.E.S.S., and four years with the Fermi LAT were analyzed, covering over five decades in energy from 1.8 GeV up to 30 TeV. The morphology and spectrum of the TeV and GeV sources were studied and multiwavelength data were used to investigate the origin of the gamma-ray emission toward W41.
Results. The TeV source can be modeled with a sum of two components: one point-like and one significantly extended (sigma(TeV) = 0.17 degrees +/- 0.01 degrees), both centered on SNR W41 and exhibiting spectra described by a power law with index Gamma(TeV) similar or equal to 2.6. The GeV source detected with Fermi LAT is extended (sigma(GeV) = 0.15 degrees +/- 0.03 degrees) and morphologically matches the VHE emission. Its spectrum can be described by a power-law model with an index Gamma(GeV) = 2.15 +/- 0.12 and smoothly joins the spectrum of the whole TeV source. A break appears in the gamma-ray spectra around 100 GeV. No pulsations were found in the GeV range.
Conclusions. Two main scenarios are proposed to explain the observed emission: a pulsar wind nebula (PWN) or the interaction of SNR W41 with an associated molecular cloud. X-ray observations suggest the presence of a point-like source (a pulsar candidate) near the center of the remnant and nonthermal X-ray diffuse emission that could arise from the possibly associated PWN. The PWN scenario is supported by the compatible positions of the TeV and GeV sources with the putative pulsar. However, the spectral energy distribution from radio to gamma-rays is reproduced by a one-zone leptonic model only if an excess of low-energy electrons is injected following a Maxwellian distribution by a pulsar with a high spin-down power (> 10(37) erg s(-1)). This additional low-energy component is not needed if we consider that the point-like TeV source is unrelated to the extended GeV and TeV sources. The interacting SNR scenario is supported by the spatial coincidence between the gamma-ray sources, the detection of OH (1720 MHz) maser lines, and the hadronic modeling.
Design and Implementation of service-oriented architectures imposes a huge number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for design and realization of complex web-based system. Both approaches allow for dynamic application adaptation as well as integration of enterprise application. Commonly used technologies, such as J2EE and .NET, form de facto standards for the realization of complex distributed systems. Evolution of component systems has lead to web services and service-based architectures. This has been manifested in a multitude of industry standards and initiatives such as XML, WSDL UDDI, SOAP, etc. All these achievements lead to a new and promising paradigm in IT systems engineering which proposes to design complex software solutions as collaboration of contractually defined software services. Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns. The annual Ph.D. Retreat of the Research School provides each member the opportunity to present his/her current state of their research and to give an outline of a prospective Ph.D. thesis. Due to the interdisciplinary structure of the Research Scholl, this technical report covers a wide range of research topics. These include but are not limited to: Self-Adaptive Service-Oriented Systems, Operating System Support for Service-Oriented Systems, Architecture and Modeling of Service-Oriented Systems, Adaptive Process Management, Services Composition and Workflow Planning, Security Engineering of Service-Based IT Systems, Quantitative Analysis and Optimization of Service-Oriented Systems, Service-Oriented Systems in 3D Computer Graphics sowie Service-Oriented Geoinformatics.
Proceedings of the HPI Research School on Service-oriented Systems Engineering 2020 Fall Retreat
(2021)
Design and Implementation of service-oriented architectures imposes a huge number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for design and realization of complex web-based system. Both approaches allow for dynamic application adaptation as well as integration of enterprise application.
Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns.
The annual Ph.D. Retreat of the Research School provides each member the opportunity to present his/her current state of their research and to give an outline of a prospective Ph.D. thesis. Due to the interdisciplinary structure of the research school, this technical report covers a wide range of topics. These include but are not limited to: Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment.
RHEEMix in the data jungle
(2020)
Data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to determine efficient data movement plans among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer under diverse real tasks. We show that our optimizer can perform tasks more than one order of magnitude faster when using multiple platforms than when using a single platform.
RHEEMix in the data jungle
(2020)
Data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to determine efficient data movement plans among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer under diverse real tasks. We show that our optimizer can perform tasks more than one order of magnitude faster when using multiple platforms than when using a single platform.
Context. Very-high-energy (VHE; E > 100 GeV) gamma-ray emission from blazars inevitably gives rise to electron-positron pair production through the interaction of these gamma-rays with the extragalactic background light (EBL). Depending on the magnetic fields in the proximity of the source, the cascade initiated from pair production can result in either an isotropic halo around an initially- beamed source or a magnetically- broadened cascade :aux.
Aims. Both extended pair-halo (PH) and magnetically broadened cascade (MBC) emission from regions surrounding the blazars 1ES 1101-232, IRS 0229+200, and PKS 2155-304 were searched for using VHE y-ray data taken with the High Energy Stereoscopic System (HESS.) and high-energy (HE; 100 MeV < E < 100 GeV) gamma-ray data with the Fermi Large Area Telescope (LAT).
Methods. By comparing the angular distributions of the reconstructed gamma-ray events to the angular profiles calculated from detailed theoretical models, the presence of PH and MBC was investigated.
Results. Upper limits on the extended emission around lES 1101-232, lES 0229+200, and PKS 2155-304 are found to be at a level of a few per cent of the Crab nebula flux above 1 TeV, depending on the assumed photon index of the cascade emission. Assuming strong extra-Galactic magnetic field (EGME) values, >10(-12) G, this limits the production of pair haloes developing from electromagnetic cascades. For weaker magnetic fields, in which electromagnetic cascades would result in MBCs. EGMF strengths in the range (0.3-3) x 10(-15) G were excluded for PKS 2155-304 at the 99% confidence level, under the assumption of a 1 Mpc coherence length.
Gamma-ray line signatures can be expected in the very-high-energy (E-gamma > 100 GeV) domain due to self-annihilation or decay of dark matter (DM) particles in space. Such a signal would be readily distinguishable from astrophysical gamma-ray sources that in most cases produce continuous spectra that span over several orders of magnitude in energy. Using data collected with the H. E. S. S. gamma-ray instrument, upper limits on linelike emission are obtained in the energy range between similar to 500 GeV and similar to 25 TeV for the central part of the Milky Way halo and for extragalactic observations, complementing recent limits obtained with the Fermi-LAT instrument at lower energies. No statistically significant signal could be found. For monochromatic gamma-ray line emission, flux limits of (2 x 10(-7)-2 x 10(-5)) m(-2)s(-1)sr(-1) and (1 x 10(-8)- 2 x 10(-6)) m(-2)s(-1)sr(-1) are obtained for the central part of the Milky Way halo and extragalactic observations, respectively. For a DM particle mass of 1 TeV, limits on the velocity- averaged DM annihilation cross section <sigma upsilon >(chi chi ->gamma gamma) reach similar to 10(-27)cm(3)s(-1), based on the Einasto parametrization of the Galactic DM halo density profile. DOI: 10.1103/PhysRevLett.110.041301
Search for TeV Gamma-ray emission from GRB 100621A, an extremely bright GRB in X-rays, with HESS
(2014)
The long gamma-ray burst (GRB) 100621A, at the time the brightest X-ray transient ever detected by Swift-XRT in the 0.3-10 keV range, has been observed with the H.E.S.S. imaging air Cherenkov telescope array, sensitive to gamma radiation in the very-high-energy (VHE, >100 GeV) regime. Due to its relatively small redshift of z similar to 0.5, the favourable position in the southern sky and the relatively short follow-up time (<700 s after the satellite trigger) of the H.E.S.S. observations, this GRB could be within the sensitivity reach of the HESS. instrument. The analysis of the HESS. data shows no indication of emission and yields an integral flux upper limit above similar to 380 GeV of 4.2 x 10(-12) cm(-2) s(-1) s (95% confidence level), assuming a simple Band function extension model. A comparison to a spectral-temporal model, normalised to the prompt flux at sub-MeV energies, constraints the existence of a temporally extended and strong additional hard power law, as has been observed in the other bright X-ray GRB 130427A. A comparison between the HESS. upper limit and the contemporaneous energy output in X-rays constrains the ratio between the X-ray and VHE gamma-ray fluxes to be greater than 0.4. This value is an important quantity for modelling the afterglow and can constrain leptonic emission scenarios, where leptons are responsible for the X-ray emission and might produce VHE gamma rays.
Context. Globular clusters (GCs) are established emitters of high-energy (HE, 100 MeV < E < 100 GeV) gamma-ray radiation which could originate from the cumulative emission of the numerous millisecond pulsars (msPSRs) in the clusters’ cores or from inverse Compton (IC) scattering of relativistic leptons accelerated in the GC environment. These stellar clusters could also constitute a new class of sources in the very-high-energy (VHE, E > 100 GeV) gamma-ray regime, judging from the recent detection of a signal from the direction of Terzan 5 with the H.E.S.S. telescope array. Aims. To search for VHE gamma-ray sources associated with other GCs, and to put constraints on leptonic emission models, we systematically analyzed the observations towards 15 GCs taken with the H. E. S. S. array of imaging atmospheric Cherenkov telescopes. Methods. We searched for point-like and extended VHE gamma-ray emission from each GC in our sample and also performed a stacking analysis combining the data from all GCs to investigate the hypothesis of a population of faint emitters. Assuming IC emission as the origin of the VHE gamma-ray signal from the direction of Terzan 5, we calculated the expected gamma-ray flux from each of the 15 GCs, based on their number of millisecond pulsars, their optical brightness and the energy density of background photon fields. Results. We did not detect significant VHE gamma-ray emission from any of the 15 GCs in either of the two analyses. Given the uncertainties related to the parameter determinations, the obtained flux upper limits allow to rule out the simple IC/msPSR scaling model for NGC6388 and NGC7078. The upper limits derived from the stacking analyses are factors between 2 and 50 below the flux predicted by the simple leptonic scaling model, depending on the assumed source extent and the dominant target photon fields. Therefore, Terzan 5 still remains exceptional among all GCs, as the VHE gamma-ray emission either arises from extra-ordinarily efficient leptonic processes, or from a recent catastrophic event, or is even unrelated to the GC itself.
Duplicate detection consists in determining different representations of real-world objects in a database. Recent research has considered the use of relationships among object representations to improve duplicate detection. In the general case where relationships form a graph, research has mainly focused on duplicate detection quality/effectiveness. Scalability has been neglected so far, even though it is crucial for large real-world duplicate detection tasks. In this paper we scale up duplicate detection in graph data (DDG) to large amounts of data and pairwise comparisons, using the support of a relational database system. To this end, we first generalize the process of DDG. We then present how to scale algorithms for DDG in space (amount of data processed with limited main memory) and in time. Finally, we explore how complex similarity computation can be performed efficiently. Experiments on data an order of magnitude larger than data considered so far in DDG clearly show that our methods scale to large amounts of data not residing in main memory.
TeV gamma-ray observations of the young synchrotron-dominated SNRs G1.9+0.3 and G330.2+1.0 with HESS
(2014)
The non-thermal nature of the X-ray emission from the shell-type supernova remnants (SNRs) G1.9+0.3 and G330.2+1.0 is an indication of intense particle acceleration in the shock fronts of both objects. This suggests that the SNRs are prime candidates for very-high-energy (VHE; E > 0.1 TeV) gamma-ray observations. G1.9+0.3, recently established as the youngest known SNR in the Galaxy, also offers a unique opportunity to study the earliest stages of SNR evolution in the VHE domain. The purpose of this work is to probe the level of VHE gamma-ray emission from both SNRs and use this to constrain their physical properties. Observations were conducted with the H. E. S. S. (High Energy Stereoscopic System) Cherenkov Telescope Array over a more than six-year period spanning 2004-2010. The obtained data have effective livetimes of 67 h for G1.9+0.3 and 16 h for G330.2+1.0. The data are analysed in the context of the multiwavelength observations currently available and in the framework of both leptonic and hadronic particle acceleration scenarios. No significant gamma-ray signal from G1.9+0.3 or G330.2+1.0 was detected. Upper limits (99 per cent confidence level) to the TeV flux from G1.9+0.3 and G330.2+1.0 for the assumed spectral index Gamma = 2.5 were set at 5.6 x 10(-1)3 cm(-2) s(-1) above 0.26 TeV and 3.2 x 10(-12) cm(-2) s(-1) above 0.38 TeV, respectively. In a one-zone leptonic scenario, these upper limits imply lower limits on the interior magnetic field to B-G1.9 greater than or similar to 12 mu G for G1.9+0.3 and to B-G330 greater than or similar to 8 mu G for G330.2+1.0. In a hadronic scenario, the low ambient densities and the large distances to the SNRs result in very low predicted fluxes, for which the H.E.S.S. upper limits are not constraining.
The 2010 very high energy gamma-ray flare and 10 years ofmulti-wavelength oservations of M 87
(2012)
The giant radio galaxy M 87 with its proximity (16 Mpc), famous jet, and very massive black hole ((3-6) x 10(9) M-circle dot) provides a unique opportunity to investigate the origin of very high energy (VHE; E > 100 GeV) gamma-ray emission generated in relativistic outflows and the surroundings of supermassive black holes. M 87 has been established as a VHE gamma-ray emitter since 2006. The VHE gamma-ray emission displays strong variability on timescales as short as a day. In this paper, results from a joint VHE monitoring campaign on M 87 by the MAGIC and VERITAS instruments in 2010 are reported. During the campaign, a flare at VHE was detected triggering further observations at VHE (H.E.S.S.), X-rays (Chandra), and radio (43 GHz Very Long Baseline Array, VLBA). The excellent sampling of the VHE gamma-ray light curve enables one to derive a precise temporal characterization of the flare: the single, isolated flare is well described by a two-sided exponential function with significantly different flux rise and decay times of tau(rise)(d) = (1.69 +/- 0.30) days and tau(decay)(d) = (0.611 +/- 0.080) days, respectively. While the overall variability pattern of the 2010 flare appears somewhat different from that of previous VHE flares in 2005 and 2008, they share very similar timescales (similar to day), peak fluxes (Phi(>0.35 TeV) similar or equal to (1-3) x 10(-11) photons cm(-2) s(-1)), and VHE spectra. VLBA radio observations of 43 GHz of the inner jet regions indicate no enhanced flux in 2010 in contrast to observations in 2008, where an increase of the radio flux of the innermost core regions coincided with a VHE flare. On the other hand, Chandra X-ray observations taken similar to 3 days after the peak of the VHE gamma-ray emission reveal an enhanced flux from the core (flux increased by factor similar to 2; variability timescale <2 days). The long-term (2001-2010) multi-wavelength (MWL) light curve of M 87, spanning from radio to VHE and including data from Hubble Space Telescope, Liverpool Telescope, Very Large Array, and European VLBI Network, is used to further investigate the origin of the VHE gamma-ray emission. No unique, common MWL signature of the three VHE flares has been identified. In the outer kiloparsec jet region, in particular in HST-1, no enhanced MWL activity was detected in 2008 and 2010, disfavoring it as the origin of the VHE flares during these years. Shortly after two of the three flares (2008 and 2010), the X-ray core was observed to be at a higher flux level than its characteristic range (determined from more than 60 monitoring observations: 2002-2009). In 2005, the strong flux dominance of HST-1 could have suppressed the detection of such a feature. Published models for VHE gamma-ray emission from M 87 are reviewed in the light of the new data.
The gamma-ray spectrum of the low-frequency-peaked BL Lac (LBL) object AP Librae is studied, following the discovery of very-high-energy (VHE; E > 100 GeV) gamma-ray emission up to the TeV range by the H.E.S.S. experiment. Thismakes AP Librae one of the few VHE emitters of the LBL type. The measured spectrum yields a flux of (8.8 +/- 1.5(stat) +/- 1.8(sys)) x 10(-12) cm(-2) s(-1) above 130 GeV and a spectral index of Gamma = 2.65 +/- 0.19(stat) +/- 0.20(sys). This study also makes use of Fermi-LAT observations in the high energy (HE, E > 100 MeV) range, providing the longest continuous light curve (5 years) ever published on this source. The source underwent a flaring event between MJD 56 306-56 376 in the HE range, with a flux increase of a factor of 3.5 in the 14 day bin light curve and no significant variation in spectral shape with respect to the low-flux state. While the H.E.S.S. and (low state) Fermi-LAT fluxes are in good agreement where they overlap, a spectral curvature between the steep VHE spectrum and the Fermi-LAT spectrum is observed. The maximum of the gamma-ray emission in the spectral energy distribution is located below the GeV energy range.
The task of expert finding is to rank the experts in the search space given a field of expertise as an input query. In this paper, we propose a topic modeling approach for this task. The proposed model uses latent Dirichlet allocation (LDA) to induce probabilistic topics. In the first step of our algorithm, the main topics of a document collection are extracted using LDA. The extracted topics present the connection between expert candidates and user queries. In the second step, the topics are used as a bridge to find the probability of selecting each candidate for a given query. The candidates are then ranked based on these probabilities. The experimental results on the Text REtrieval Conference (TREC) Enterprise track for 2005 and 2006 show that the proposed topic-based approach outperforms the state-of-the-art profile- and document-based models, which use information retrieval methods to rank experts. Moreover, we present the superiority of the proposed topic-based approach to the improved document-based expert finding systems, which consider additional information such as local context, candidate prior, and query expansion.
Duplicate detection algorithms produce clusters of database records, each cluster representing a single real-world entity. As most of these algorithms use pairwise comparisons, the resulting (transitive) clusters can be inconsistent: Not all records within a cluster are sufficiently similar to be classified as duplicate. Thus, one of many subsequent clustering algorithms can further improve the result. <br /> We explain in detail, compare, and evaluate many of these algorithms and introduce three new clustering algorithms in the specific context of duplicate detection. Two of our three new algorithms use the structure of the input graph to create consistent clusters. Our third algorithm, and many other clustering algorithms, focus on the edge weights, instead. For evaluation, in contrast to related work, we experiment on true real-world datasets, and in addition examine in great detail various pair-selection strategies used in practice. While no overall winner emerges, we are able to identify best approaches for different situations. In scenarios with larger clusters, our proposed algorithm, Extended Maximum Clique Clustering (EMCC), and Markov Clustering show the best results. EMCC especially outperforms Markov Clustering regarding the precision of the results and additionally has the advantage that it can also be used in scenarios where edge weights are not available.
Extract-Transform-Load (ETL) tools are used for the creation, maintenance, and evolution of data warehouses, data marts, and operational data stores. ETL workflows populate those systems with data from various data sources by specifying and executing a DAG of transformations. Over time, hundreds of individual workflows evolve as new sources and new requirements are integrated into the system. The maintenance and evolution of large-scale ETL systems requires much time and manual effort. A key problem is to understand the meaning of unfamiliar attribute labels in source and target databases and ETL transformations. Hard-to-understand attribute labels lead to frustration and time spent to develop and understand ETL workflows. We present a schema decryption technique to support ETL developers in understanding cryptic schemata of sources, targets, and ETL transformations. For a given ETL system, our recommender-like approach leverages the large number of mapped attribute labels in existing ETL workflows to produce good and meaningful decryptions. In this way we are able to decrypt attribute labels consisting of a number of unfamiliar few-letter abbreviations, such as UNP_PEN_INT, which we can decrypt to UNPAID_PENALTY_INTEREST. We evaluate our schema decryption approach on three real-world repositories of ETL workflows and show that our approach is able to suggest high-quality decryptions for cryptic attribute labels in a given schema.
VLDB 2021
(2021)
The 47th International Conference on Very Large Databases (VLDB'21) was held on August 16-20, 2021 as a hybrid conference. It attracted 180 in-person attendees in Copenhagen and 840 remote attendees. In this paper, we describe our key decisions as general chairs and program committee chairs and share the lessons we learned.