Unique column combinations (UCCs) are a fundamental concept in relational databases. They identify entities in the data and support various data management activities. Still, UCCs are usually not explicitly defined and need to be discovered. State-of-the-art data profiling algorithms are able to efficiently discover UCCs in moderately sized datasets, but they tend to fail on large and, in particular, on wide datasets due to run time and memory limitations.

In this paper, we introduce HPIValid, a novel UCC discovery algorithm that implements a faster and more resource-saving search strategy. HPIValid models metadata discovery as a hitting set enumeration problem in hypergraphs. In this way, it combines efficient discovery techniques from data profiling research with the most recent theoretical insights into enumeration algorithms. Our evaluation shows that HPIValid is not only orders of magnitude faster than related work; it also has a much smaller memory footprint.
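To make the hitting-set framing concrete, here is a minimal Python sketch (not HPIValid itself, whose data structures and enumeration strategy are far more sophisticated): a column combination is unique exactly when it intersects ("hits") every difference set, i.e., every set of columns in which some pair of rows disagrees.

```python
# Minimal sketch of the hitting-set view of UCC discovery (illustrative only).
from itertools import combinations

def difference_sets(rows):
    """Collect, for every pair of rows, the set of columns where they differ."""
    diffs = set()
    for r, s in combinations(rows, 2):
        diffs.add(frozenset(i for i, (a, b) in enumerate(zip(r, s)) if a != b))
    return diffs

def is_ucc(columns, diffs):
    """A candidate column set is a UCC iff it hits every difference set."""
    return all(columns & d for d in diffs)

rows = [("anna", 1, "x"), ("anna", 2, "x"), ("bob", 2, "y")]
diffs = difference_sets(rows)
print(is_ucc(frozenset({0, 1}), diffs))  # True: (col 0, col 1) is unique
print(is_ucc(frozenset({0}), diffs))     # False: "anna" repeats in col 0
```

Minimal UCCs are then exactly the minimal hitting sets of the hypergraph whose edges are these difference sets, which is the enumeration problem the paper tackles.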
Primary keys (PKs) and foreign keys (FKs) are important elements of relational schemata in various applications, such as query optimization and data integration. However, in many cases, these constraints are unknown or not documented. Detecting them manually is time-consuming and even infeasible in large-scale datasets. We study the problem of discovering primary keys and foreign keys automatically and propose an algorithm that detects both: Holistic Primary Key and Foreign Key Detection (HoPF). PKs and FKs are subsets of the sets of unique column combinations (UCCs) and inclusion dependencies (INDs), respectively, for which efficient discovery algorithms are known. Using score functions, our approach is able to effectively extract the true PKs and FKs from the vast sets of valid UCCs and INDs. Several pruning rules are employed to speed up the procedure. We evaluate precision and recall on three benchmarks and two real-world datasets. The results show that our method is able to retrieve on average 88% of all primary keys and 91% of all foreign keys. We compare the performance of HoPF with two baseline approaches that both assume the existence of primary keys.
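For illustration, a toy version of score-based primary key selection might look as follows; the features and weights below are invented for the example and are not HoPF's actual score functions.

```python
# Toy sketch: rank candidate UCCs as primary keys by a weighted score.
def pk_score(table, ucc):
    """Prefer short keys, left-most columns, and 'id'-like names.
    Weights (0.5/0.3/0.2) are invented for illustration."""
    length_score = 1.0 / len(ucc)
    position_score = 1.0 - min(table["columns"].index(c) for c in ucc) / len(table["columns"])
    name_score = 1.0 if any("id" in c.lower() for c in ucc) else 0.0
    return 0.5 * length_score + 0.3 * position_score + 0.2 * name_score

table = {"columns": ["customer_id", "name", "email", "city"]}
uccs = [["customer_id"], ["email"], ["name", "city"]]  # discovered UCCs
best = max(uccs, key=lambda u: pk_score(table, u))
print(best)  # ['customer_id']
```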
How inclusive are we?
(2022)
ACM SIGMOD, VLDB and other database organizations have committed to fostering an inclusive and diverse community, as have many other scientific organizations. Recently, different measures have been taken to advance these goals, especially for underrepresented groups. One possible measure is double-blind reviewing, which aims to hide the gender, ethnicity, and other properties of the authors.

We report the preliminary results of a gender diversity analysis of publications of the database community across several peer-reviewed venues, and compare women's authorship percentages in single-blind and double-blind venues over the years. We also cross-compare the results obtained in data management with those of other relevant areas in Computer Science.
Introducing the CTA concept
(2013)
The Cherenkov Telescope Array (CTA) is a new observatory for very high-energy (VHE) gamma rays. CTA has ambitious science goals, for which it is necessary to achieve full-sky coverage, to improve the sensitivity by about an order of magnitude, and to span about four decades of energy, from a few tens of GeV to above 100 TeV, with enhanced angular and energy resolutions over existing VHE gamma-ray observatories. An international collaboration has formed with more than 1000 members from 27 countries in Europe, Asia, Africa and North and South America. In 2010 the CTA Consortium completed a Design Study and started a three-year Preparatory Phase leading to production readiness of CTA in 2014. In this paper we introduce the science goals and the concept of CTA, and provide an overview of the project.
The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is both time-consuming and requires extensive domain expertise.

We propose a deep Siamese neural network capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. Owing to the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluated our method on multiple datasets and compare our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.
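A minimal PyTorch sketch of the general idea follows, assuming pre-computed numeric feature vectors per record; the layer sizes and the absolute-difference head are illustrative choices, not the authors' exact architecture.

```python
# Sketch of a Siamese similarity model for record pairs (illustrative only).
import torch
import torch.nn as nn

class SiameseSimilarity(nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        # Shared encoder: both records pass through the same weights.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, 1)  # similarity from |e1 - e2|

    def forward(self, x1, x2):
        e1, e2 = self.encoder(x1), self.encoder(x2)
        return torch.sigmoid(self.head(torch.abs(e1 - e2))).squeeze(-1)

model = SiameseSimilarity(in_dim=16)
x1, x2 = torch.rand(8, 16), torch.rand(8, 16)
loss = nn.BCELoss()(model(x1, x2), torch.ones(8))  # label 1 = duplicate pair
loss.backward()
```

Because the encoder is shared and its shape is dataset-agnostic, a trained encoder can in principle be fine-tuned on a second dataset, which is the kind of knowledge transfer the paper evaluates.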
MDedup
(2020)
Duplicate detection is an integral part of data cleaning and serves to identify multiple representations of the same real-world entities in (relational) datasets. Existing duplicate detection approaches are effective, but they are also hard to parameterize or require a lot of pre-labeled training data. Both parameterization and pre-labeling are at least domain-specific, if not dataset-specific, which is a problem when a new dataset needs to be cleaned.
For this reason, we propose a novel, rule-based and fully automatic duplicate detection approach that is based on matching dependencies (MDs). Our system uses automatically discovered MDs, various dataset features, and known gold standards to train a model that selects MDs as duplicate detection rules. Once trained, the model can select useful MDs for duplicate detection on any new dataset. To increase the generally low recall of MD-based data cleaning approaches, we propose an additional boosting step. Our experiments show that this approach reaches up to 94% F-measure and 100% precision on our evaluation datasets, which are good numbers considering that the system does not require domain or target data-specific configuration.
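To illustrate how a matching dependency acts as a duplicate detection rule, here is a toy sketch; the attributes, similarity measure, and thresholds are invented for the example, and MDedup's actual contribution is the automatic discovery and model-based selection of such MDs.

```python
# Toy illustration of applying a matching dependency (MD) as a duplicate rule:
# if two records are sufficiently similar on the MD's left-hand-side
# attributes, flag them as a candidate duplicate.
from difflib import SequenceMatcher

def sim(a, b):
    return SequenceMatcher(None, a, b).ratio()

def md_match(r1, r2, lhs):
    """lhs: list of (attribute, threshold) pairs forming the MD's premise."""
    return all(sim(r1[attr], r2[attr]) >= t for attr, t in lhs)

md = [("name", 0.9), ("city", 0.8)]  # a hypothetical discovered MD
r1 = {"name": "Jon Smith", "city": "Berlin"}
r2 = {"name": "John Smith", "city": "Berlin"}
print(md_match(r1, r2, md))  # True -> candidate duplicate
```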
Data analytics are moving beyond the limits of a single data processing platform. A cross-platform query optimizer is necessary to enable applications to run their tasks over multiple platforms efficiently and in a platform-agnostic manner. For the optimizer to be effective, it must consider data movement costs across different data processing platforms. In this paper, we present the graph-based data movement strategy used by RHEEM, our open-source cross-platform system. In particular, we (i) model the data movement problem as a new graph problem, which we prove to be NP-hard, and (ii) propose a novel graph exploration algorithm, which allows RHEEM to discover multiple hidden opportunities for cross-platform data processing.
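The flavor of the graph view can be sketched as follows: data formats/channels become nodes, conversions become weighted edges, and cheap data movement becomes a graph search. The sketch below runs plain Dijkstra between two channels; the paper's problem is a harder multi-target variant (shown NP-hard there), so RHEEM's exploration algorithm is considerably more involved. All names and costs are invented.

```python
# Sketch: cheapest conversion path in a channel conversion graph.
import heapq

edges = {  # channel -> [(next_channel, conversion_cost)]
    "spark_rdd":   [("hdfs_file", 5.0)],
    "hdfs_file":   [("java_stream", 3.0), ("postgres_table", 8.0)],
    "java_stream": [("postgres_table", 6.0)],
    "postgres_table": [],
}

def cheapest_conversion(src, dst):
    """Dijkstra over the channel conversion graph."""
    pq, seen = [(0.0, src)], set()
    while pq:
        cost, node = heapq.heappop(pq)
        if node == dst:
            return cost
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in edges.get(node, []):
            heapq.heappush(pq, (cost + w, nxt))
    return float("inf")

print(cheapest_conversion("spark_rdd", "postgres_table"))  # 13.0
```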
Aims. Previous observations with the High Energy Stereoscopic System (H.E.S.S.) have revealed an extended very-high-energy (VHE; E > 100 GeV) gamma-ray source, HESS J1834-087, coincident with the supernova remnant (SNR) W41. The origin of the gamma-ray emission was investigated in more detail with the H.E.S.S. array and the Large Area Telescope (LAT) onboard the Fermi Gamma-ray Space Telescope.
Methods. The gamma-ray data from 61 h of observations with H.E.S.S. and four years of observations with the Fermi LAT were analyzed, covering over five decades in energy, from 1.8 GeV up to 30 TeV. The morphology and spectrum of the TeV and GeV sources were studied, and multiwavelength data were used to investigate the origin of the gamma-ray emission toward W41.
Results. The TeV source can be modeled as the sum of two components: one point-like and one significantly extended (σ_TeV = 0.17° ± 0.01°), both centered on SNR W41 and exhibiting spectra described by a power law with index Γ_TeV ≈ 2.6. The GeV source detected with the Fermi LAT is extended (σ_GeV = 0.15° ± 0.03°) and morphologically matches the VHE emission. Its spectrum can be described by a power-law model with an index Γ_GeV = 2.15 ± 0.12 and smoothly joins the spectrum of the whole TeV source. A break appears in the gamma-ray spectra around 100 GeV. No pulsations were found in the GeV range.
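For orientation, the power-law spectral model quoted in these results has the standard form (with the fitted indices from the text):

```latex
\frac{\mathrm{d}N}{\mathrm{d}E} = \Phi_0 \left(\frac{E}{E_0}\right)^{-\Gamma},
\qquad \Gamma_{\mathrm{TeV}} \simeq 2.6, \quad \Gamma_{\mathrm{GeV}} = 2.15 \pm 0.12 .
```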
Conclusions. Two main scenarios are proposed to explain the observed emission: a pulsar wind nebula (PWN) or the interaction of SNR W41 with an associated molecular cloud. X-ray observations suggest the presence of a point-like source (a pulsar candidate) near the center of the remnant and nonthermal X-ray diffuse emission that could arise from the possibly associated PWN. The PWN scenario is supported by the compatible positions of the TeV and GeV sources with the putative pulsar. However, the spectral energy distribution from radio to gamma-rays is reproduced by a one-zone leptonic model only if an excess of low-energy electrons is injected following a Maxwellian distribution by a pulsar with a high spin-down power (> 10³⁷ erg s⁻¹). This additional low-energy component is not needed if we consider that the point-like TeV source is unrelated to the extended GeV and TeV sources. The interacting SNR scenario is supported by the spatial coincidence between the gamma-ray sources, the detection of OH (1720 MHz) maser lines, and the hadronic modeling.
The design and implementation of service-oriented architectures raises a large number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for the design and realization of complex web-based systems. Both approaches allow for dynamic application adaptation as well as the integration of enterprise applications. Commonly used technologies, such as J2EE and .NET, form de facto standards for the realization of complex distributed systems. The evolution of component systems has led to web services and service-based architectures. This has been manifested in a multitude of industry standards and initiatives such as XML, WSDL, UDDI, SOAP, etc. All these achievements lead to a new and promising paradigm in IT systems engineering which proposes to design complex software solutions as a collaboration of contractually defined software services. Service-Oriented Systems Engineering represents a symbiosis of best practices in object orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns.

The annual Ph.D. Retreat of the Research School provides each member the opportunity to present the current state of his or her research and to give an outline of a prospective Ph.D. thesis. Due to the interdisciplinary structure of the Research School, this technical report covers a wide range of research topics. These include but are not limited to: Self-Adaptive Service-Oriented Systems, Operating System Support for Service-Oriented Systems, Architecture and Modeling of Service-Oriented Systems, Adaptive Process Management, Services Composition and Workflow Planning, Security Engineering of Service-Based IT Systems, Quantitative Analysis and Optimization of Service-Oriented Systems, Service-Oriented Systems in 3D Computer Graphics, and Service-Oriented Geoinformatics.
Proceedings of the HPI Research School on Service-oriented Systems Engineering 2020 Fall Retreat
(2021)
The design and implementation of service-oriented architectures raises a large number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for the design and realization of complex web-based systems. Both approaches allow for dynamic application adaptation as well as the integration of enterprise applications.
Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns.
The annual Ph.D. Retreat of the Research School provides each member the opportunity to present the current state of his or her research and to give an outline of a prospective Ph.D. thesis. Due to the interdisciplinary structure of the research school, this technical report covers a wide range of topics. These include but are not limited to: Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment.
RHEEMix in the data jungle
(2020)
Data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to determine efficient data movement plans among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer under diverse real tasks. We show that our optimizer can perform tasks more than one order of magnitude faster when using multiple platforms than when using a single platform.
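As a toy illustration of cost-based cross-platform planning, consider a linear pipeline where each operator is assigned to a platform and platform switches incur a data movement cost; a simple dynamic program then picks the cheapest assignment. This is only a sketch of the idea: Rheem's enumeration algebra handles general (non-linear) plans with pruning, and all names and costs below are invented.

```python
# Toy cross-platform plan selection over a linear operator pipeline.
exec_cost = {  # operator -> {platform: execution cost}
    "scan":  {"spark": 4.0, "java": 6.0},
    "join":  {"spark": 3.0, "java": 9.0},
    "top_k": {"spark": 5.0, "java": 1.0},
}

def move_cost(p, q):
    return 0.0 if p == q else 2.0  # flat platform-switch penalty

def best_plan(operators):
    # state: platform -> cheapest cost of the pipeline prefix ending there
    state = dict(exec_cost[operators[0]])
    for op in operators[1:]:
        state = {
            q: min(c + move_cost(p, q) for p, c in state.items()) + cost
            for q, cost in exec_cost[op].items()
        }
    return min(state.values())

print(best_plan(["scan", "join", "top_k"]))  # 10.0: spark, spark, then java
```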
Context. Very-high-energy (VHE; E > 100 GeV) gamma-ray emission from blazars inevitably gives rise to electron-positron pair production through the interaction of these gamma-rays with the extragalactic background light (EBL). Depending on the magnetic fields in the proximity of the source, the cascade initiated by pair production can result in either an isotropic halo around an initially beamed source or a magnetically broadened cascade.
Aims. Both extended pair-halo (PH) and magnetically broadened cascade (MBC) emission from the regions surrounding the blazars 1ES 1101-232, 1ES 0229+200, and PKS 2155-304 were searched for using VHE gamma-ray data taken with the High Energy Stereoscopic System (H.E.S.S.) and high-energy (HE; 100 MeV < E < 100 GeV) gamma-ray data taken with the Fermi Large Area Telescope (LAT).
Methods. By comparing the angular distributions of the reconstructed gamma-ray events to the angular profiles calculated from detailed theoretical models, the presence of PH and MBC was investigated.
Results. Upper limits on the extended emission around 1ES 1101-232, 1ES 0229+200, and PKS 2155-304 are found to be at a level of a few per cent of the Crab nebula flux above 1 TeV, depending on the assumed photon index of the cascade emission. Assuming strong extragalactic magnetic field (EGMF) values, >10⁻¹² G, this limits the production of pair haloes developing from electromagnetic cascades. For weaker magnetic fields, in which electromagnetic cascades would result in MBCs, EGMF strengths in the range (0.3-3) × 10⁻¹⁵ G were excluded for PKS 2155-304 at the 99% confidence level, under the assumption of a 1 Mpc coherence length.
Gamma-ray line signatures can be expected in the very-high-energy (E_γ > 100 GeV) domain due to self-annihilation or decay of dark matter (DM) particles in space. Such a signal would be readily distinguishable from astrophysical gamma-ray sources that in most cases produce continuous spectra that span over several orders of magnitude in energy. Using data collected with the H.E.S.S. gamma-ray instrument, upper limits on line-like emission are obtained in the energy range between ~500 GeV and ~25 TeV for the central part of the Milky Way halo and for extragalactic observations, complementing recent limits obtained with the Fermi-LAT instrument at lower energies. No statistically significant signal could be found. For monochromatic gamma-ray line emission, flux limits of (2 × 10⁻⁷ to 2 × 10⁻⁵) m⁻² s⁻¹ sr⁻¹ and (1 × 10⁻⁸ to 2 × 10⁻⁶) m⁻² s⁻¹ sr⁻¹ are obtained for the central part of the Milky Way halo and extragalactic observations, respectively. For a DM particle mass of 1 TeV, limits on the velocity-averaged DM annihilation cross section ⟨σv⟩(χχ→γγ) reach ~10⁻²⁷ cm³ s⁻¹, based on the Einasto parametrization of the Galactic DM halo density profile. DOI: 10.1103/PhysRevLett.110.041301
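For orientation, limits of this kind conventionally relate the observed line flux to the annihilation cross section through the standard expression below (a sketch of the textbook relation, not the paper's full analysis), where the J-factor integrates the squared DM density over the line of sight and the observed solid angle:

```latex
\Phi_{\gamma\gamma}
  = \frac{\langle\sigma v\rangle_{\chi\chi\to\gamma\gamma}}{8\pi\, m_\chi^{2}}
    \, N_\gamma \, J,
\qquad
J = \int_{\Delta\Omega}\int_{\mathrm{l.o.s.}} \rho_\chi^{2}(\mathbf{r})\,\mathrm{d}l\,\mathrm{d}\Omega ,
\qquad N_\gamma = 2 \ \text{for } \chi\chi \to \gamma\gamma .
```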
Search for TeV Gamma-ray emission from GRB 100621A, an extremely bright GRB in X-rays, with H.E.S.S.
(2014)
The long gamma-ray burst (GRB) 100621A, at the time the brightest X-ray transient ever detected by Swift-XRT in the 0.3-10 keV range, has been observed with the H.E.S.S. imaging air Cherenkov telescope array, sensitive to gamma radiation in the very-high-energy (VHE, >100 GeV) regime. Due to its relatively small redshift of z ≈ 0.5, the favourable position in the southern sky, and the relatively short follow-up time (<700 s after the satellite trigger) of the H.E.S.S. observations, this GRB could be within the sensitivity reach of the H.E.S.S. instrument. The analysis of the H.E.S.S. data shows no indication of emission and yields an integral flux upper limit above ~380 GeV of 4.2 × 10⁻¹² cm⁻² s⁻¹ (95% confidence level), assuming a simple Band function extension model. A comparison to a spectral-temporal model, normalised to the prompt flux at sub-MeV energies, constrains the existence of a temporally extended and strong additional hard power law, as has been observed in the other bright X-ray GRB 130427A. A comparison between the H.E.S.S. upper limit and the contemporaneous energy output in X-rays constrains the ratio between the X-ray and VHE gamma-ray fluxes to be greater than 0.4. This value is an important quantity for modelling the afterglow and can constrain leptonic emission scenarios, where leptons are responsible for the X-ray emission and might produce VHE gamma rays.
Context. Globular clusters (GCs) are established emitters of high-energy (HE, 100 MeV < E < 100 GeV) gamma-ray radiation, which could originate from the cumulative emission of the numerous millisecond pulsars (msPSRs) in the clusters' cores or from inverse Compton (IC) scattering of relativistic leptons accelerated in the GC environment. These stellar clusters could also constitute a new class of sources in the very-high-energy (VHE, E > 100 GeV) gamma-ray regime, judging from the recent detection of a signal from the direction of Terzan 5 with the H.E.S.S. telescope array.

Aims. To search for VHE gamma-ray sources associated with other GCs, and to put constraints on leptonic emission models, we systematically analyzed the observations towards 15 GCs taken with the H.E.S.S. array of imaging atmospheric Cherenkov telescopes.

Methods. We searched for point-like and extended VHE gamma-ray emission from each GC in our sample and also performed a stacking analysis combining the data from all GCs to investigate the hypothesis of a population of faint emitters. Assuming IC emission as the origin of the VHE gamma-ray signal from the direction of Terzan 5, we calculated the expected gamma-ray flux from each of the 15 GCs, based on their number of millisecond pulsars, their optical brightness, and the energy density of background photon fields.

Results. We did not detect significant VHE gamma-ray emission from any of the 15 GCs in either of the two analyses. Given the uncertainties related to the parameter determinations, the obtained flux upper limits allow us to rule out the simple IC/msPSR scaling model for NGC 6388 and NGC 7078. The upper limits derived from the stacking analyses are factors between 2 and 50 below the flux predicted by the simple leptonic scaling model, depending on the assumed source extent and the dominant target photon fields. Therefore, Terzan 5 remains exceptional among all GCs, as its VHE gamma-ray emission either arises from extraordinarily efficient leptonic processes, or from a recent catastrophic event, or is even unrelated to the GC itself.
Duplicate detection is the task of identifying different representations of the same real-world objects in a database. Recent research has considered the use of relationships among object representations to improve duplicate detection. In the general case where relationships form a graph, research has mainly focused on duplicate detection quality/effectiveness. Scalability has been neglected so far, even though it is crucial for large real-world duplicate detection tasks. In this paper we scale up duplicate detection in graph data (DDG) to large amounts of data and pairwise comparisons, using the support of a relational database system. To this end, we first generalize the process of DDG. We then present how to scale algorithms for DDG in space (the amount of data processed with limited main memory) and in time. Finally, we explore how complex similarity computations can be performed efficiently. Experiments on data an order of magnitude larger than data considered so far in DDG clearly show that our methods scale to large amounts of data not residing in main memory.
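One way to picture the database support is to push candidate-pair generation into the RDBMS, so that pairwise comparisons stream from a self-join instead of from main-memory structures. The sketch below is illustrative only (invented schema and blocking key), not the paper's actual generalized DDG process.

```python
# Sketch: generate candidate pairs via a blocked self-join in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path in the true out-of-core case
conn.execute("CREATE TABLE objects (id INTEGER, name TEXT, block TEXT)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [(1, "J. Smith", "smi"), (2, "John Smith", "smi"), (3, "A. Jones", "jon")],
)
# Only pairs sharing a block are candidates; the database streams them out,
# so the full dataset never has to reside in main memory.
pairs = conn.execute(
    """SELECT a.id, b.id FROM objects a
       JOIN objects b ON a.block = b.block AND a.id < b.id"""
).fetchall()
print(pairs)  # [(1, 2)] -- compare only within blocks
```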
TeV gamma-ray observations of the young synchrotron-dominated SNRs G1.9+0.3 and G330.2+1.0 with H.E.S.S.
(2014)
The non-thermal nature of the X-ray emission from the shell-type supernova remnants (SNRs) G1.9+0.3 and G330.2+1.0 is an indication of intense particle acceleration in the shock fronts of both objects. This suggests that the SNRs are prime candidates for very-high-energy (VHE; E > 0.1 TeV) gamma-ray observations. G1.9+0.3, recently established as the youngest known SNR in the Galaxy, also offers a unique opportunity to study the earliest stages of SNR evolution in the VHE domain. The purpose of this work is to probe the level of VHE gamma-ray emission from both SNRs and use this to constrain their physical properties. Observations were conducted with the H.E.S.S. (High Energy Stereoscopic System) Cherenkov telescope array over a more than six-year period spanning 2004-2010. The obtained data have effective livetimes of 67 h for G1.9+0.3 and 16 h for G330.2+1.0. The data are analysed in the context of the multiwavelength observations currently available and in the framework of both leptonic and hadronic particle acceleration scenarios. No significant gamma-ray signal from G1.9+0.3 or G330.2+1.0 was detected. Upper limits (99 per cent confidence level) on the TeV flux from G1.9+0.3 and G330.2+1.0, for an assumed spectral index Γ = 2.5, were set at 5.6 × 10⁻¹³ cm⁻² s⁻¹ above 0.26 TeV and 3.2 × 10⁻¹² cm⁻² s⁻¹ above 0.38 TeV, respectively. In a one-zone leptonic scenario, these upper limits imply lower limits on the interior magnetic field of B_G1.9 ≳ 12 μG for G1.9+0.3 and B_G330 ≳ 8 μG for G330.2+1.0. In a hadronic scenario, the low ambient densities and the large distances to the SNRs result in very low predicted fluxes, for which the H.E.S.S. upper limits are not constraining.
The 2010 very high energy gamma-ray flare and 10 years of multi-wavelength observations of M 87
(2012)
The giant radio galaxy M 87, with its proximity (16 Mpc), famous jet, and very massive black hole ((3-6) × 10⁹ M_⊙), provides a unique opportunity to investigate the origin of very high energy (VHE; E > 100 GeV) gamma-ray emission generated in relativistic outflows and the surroundings of supermassive black holes. M 87 has been established as a VHE gamma-ray emitter since 2006. The VHE gamma-ray emission displays strong variability on timescales as short as a day. In this paper, results from a joint VHE monitoring campaign on M 87 by the MAGIC and VERITAS instruments in 2010 are reported. During the campaign, a flare at VHE was detected, triggering further observations at VHE (H.E.S.S.), X-rays (Chandra), and radio (43 GHz Very Long Baseline Array, VLBA). The excellent sampling of the VHE gamma-ray light curve enables one to derive a precise temporal characterization of the flare: the single, isolated flare is well described by a two-sided exponential function with significantly different flux rise and decay times of τ_rise = (1.69 ± 0.30) days and τ_decay = (0.611 ± 0.080) days, respectively. While the overall variability pattern of the 2010 flare appears somewhat different from that of previous VHE flares in 2005 and 2008, they share very similar timescales (~1 day), peak fluxes (Φ(>0.35 TeV) ≈ (1-3) × 10⁻¹¹ photons cm⁻² s⁻¹), and VHE spectra. VLBA radio observations at 43 GHz of the inner jet regions indicate no enhanced flux in 2010, in contrast to observations in 2008, where an increase of the radio flux of the innermost core regions coincided with a VHE flare. On the other hand, Chandra X-ray observations taken ~3 days after the peak of the VHE gamma-ray emission reveal an enhanced flux from the core (flux increased by a factor of ~2; variability timescale <2 days). The long-term (2001-2010) multi-wavelength (MWL) light curve of M 87, spanning from radio to VHE and including data from the Hubble Space Telescope, Liverpool Telescope, Very Large Array, and European VLBI Network, is used to further investigate the origin of the VHE gamma-ray emission. No unique, common MWL signature of the three VHE flares has been identified. In the outer kiloparsec jet region, in particular in HST-1, no enhanced MWL activity was detected in 2008 and 2010, disfavoring it as the origin of the VHE flares during these years. Shortly after two of the three flares (2008 and 2010), the X-ray core was observed to be at a higher flux level than its characteristic range (determined from more than 60 monitoring observations: 2002-2009). In 2005, the strong flux dominance of HST-1 could have suppressed the detection of such a feature. Published models for VHE gamma-ray emission from M 87 are reviewed in the light of the new data.
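For reference, the two-sided exponential flare profile mentioned above has the form (with the fitted timescales quoted in the text; t₀ denotes the time of the peak flux F₀):

```latex
F(t) =
\begin{cases}
  F_0\, \exp\!\big( (t - t_0)/\tau_{\mathrm{rise}} \big),   & t < t_0, \\
  F_0\, \exp\!\big( -(t - t_0)/\tau_{\mathrm{decay}} \big), & t \ge t_0,
\end{cases}
\qquad
\tau_{\mathrm{rise}} = (1.69 \pm 0.30)\,\mathrm{d}, \quad
\tau_{\mathrm{decay}} = (0.611 \pm 0.080)\,\mathrm{d}.
```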