Exploring Change
(2018)
Data and metadata in datasets experience many different kinds of change. Values are inserted, deleted or updated; rows appear and disappear; columns are added or repurposed, etc. In such a dynamic situation, users might have many questions related to changes in the dataset, for instance which parts of the data are trustworthy and which are not? Users will wonder: How many changes have there been in the recent minutes, days or years? What kind of changes were made at which points of time? How dirty is the data? Is data cleansing required? The fact that data changed can hint at different hidden processes or agendas: a frequently crowd-updated city name may be controversial; a person whose name has been recently changed may be the target of vandalism; and so on. We show various use cases that benefit from recognizing and exploring such change. We envision a system and methods to interactively explore such change, addressing the variability dimension of big data challenges. To this end, we propose a model to capture change and the process of exploring dynamic data to identify salient changes. We provide exploration primitives along with motivational examples and measures for the volatility of data. We identify technical challenges that need to be addressed to make our vision a reality, and propose directions of future work for the data management community.
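As a rough illustration of what a volatility measure over change records could look like, the sketch below counts how often each cell was modified inside a time window. The record layout, the sample data, and the function name `change_counts` are assumptions made for this example; the abstract does not specify the paper's actual change model or measures.

```python
from collections import Counter

# Hypothetical change log: (timestamp, row_id, column, old_value, new_value).
changes = [
    (1, "r1", "city", "Danzig", "Gdansk"),
    (2, "r1", "city", "Gdansk", "Danzig"),
    (3, "r1", "city", "Danzig", "Gdansk"),
    (4, "r2", "name", "Bob", "Robert"),
]

def change_counts(changes, start, end):
    """Count changes per (row_id, column) cell with start <= timestamp <= end."""
    return Counter(
        (row, col) for ts, row, col, _, _ in changes if start <= ts <= end
    )

counts = change_counts(changes, 1, 4)
# The cell with the most changes in the window is a candidate for closer inspection.
most_volatile = counts.most_common(1)[0]
print(most_volatile)
```

A cell that flips back and forth between two values, as the city name does here, is the kind of pattern the abstract associates with controversial or vandalized entries.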
Roughly every third Wikipedia article contains an infobox - a table that displays important facts about the subject in attribute-value form. The schema of an infobox, i.e., the attributes that can be expressed for a concept, is defined by an infobox template. Often, authors do not specify all template attributes, resulting in incomplete infoboxes. With iPopulator, we introduce a system that automatically populates infoboxes of Wikipedia articles by extracting attribute values from the article's text. In contrast to prior work, iPopulator detects and exploits the structure of attribute values for independently extracting value parts. We have tested iPopulator on the entire set of infobox templates and provide a detailed analysis of its effectiveness. For instance, we achieve an average extraction precision of 91% for 1,727 distinct infobox template attributes.
Context. About 40% of the observation time of the High Energy Stereoscopic System (H.E.S.S.) is dedicated to studying active galactic nuclei (AGN), with the aim of increasing the sample of known extragalactic very-high-energy (VHE, E > 100 GeV) sources and constraining the physical processes at play in potential emitters.
Aims. H.E.S.S. observations of AGN, spanning a period from April 2004 to December 2011, are investigated to constrain their gamma-ray fluxes. Only the 47 sources without significant excess detected at the position of the targets are presented.
Methods. Upper limits on VHE fluxes of the targets were computed and a search for variability was performed on the nightly time scale.
Results. For 41 objects, the flux upper limits we derived are the most constraining reported to date. These constraints at VHE are compared with the flux level expected from extrapolations of Fermi-LAT measurements in the two-year catalog of AGN. The H.E.S.S. upper limits are at least a factor of two lower than the extrapolated Fermi-LAT fluxes for 11 objects. Taking into account the attenuation by the extragalactic background light reduces the tension for all but two of them, suggesting intrinsic curvature in the high-energy spectra of these two AGN.
Conclusions. Compilation efforts led by current VHE instruments are of critical importance for target-selection strategies before the advent of the Cherenkov Telescope Array (CTA).
A deep observation campaign carried out by the High Energy Stereoscopic System (HESS) on Centaurus A enabled the discovery of gamma-rays from the blazar 1ES 1312-423, 2 degrees away from the radio galaxy. With a differential flux at 1 TeV of $\phi(1\,\mathrm{TeV}) = (1.9 \pm 0.6_{\mathrm{stat}} \pm 0.4_{\mathrm{sys}}) \times 10^{-13}\,\mathrm{cm^{-2}\,s^{-1}\,TeV^{-1}}$ corresponding to 0.5 per cent of the Crab nebula differential flux and a spectral index $\Gamma = 2.9 \pm 0.5_{\mathrm{stat}} \pm 0.2_{\mathrm{sys}}$, 1ES 1312-423 is one of the faintest sources ever detected in the very high energy (E > 100 GeV) extragalactic sky. A careful analysis using three and a half years of Fermi Large Area Telescope (Fermi-LAT) data allows the discovery at high energies (E > 100 MeV) of a hard spectrum ($\Gamma = 1.4 \pm 0.4_{\mathrm{stat}} \pm 0.2_{\mathrm{sys}}$) source coincident with 1ES 1312-423. Radio, optical, UV and X-ray observations complete the spectral energy distribution of this blazar, now covering 16 decades in energy. The emission is successfully fitted with a synchrotron self-Compton model for the non-thermal component, combined with a blackbody spectrum for the optical emission from the host galaxy.
The quasar PKS 1510-089 (z = 0.361) was observed with the H.E.S.S. array of imaging atmospheric Cherenkov telescopes during high states in the optical and GeV bands, to search for very high energy (VHE, defined as $E \geq 0.1$ TeV) emission. VHE gamma-rays were detected with a statistical significance of 9.2 standard deviations in 15.8 h of H.E.S.S. data taken during March and April 2009. A VHE integral flux of $I(0.15\,\mathrm{TeV} < E < 1.0\,\mathrm{TeV}) = (1.0 \pm 0.2_{\mathrm{stat}} \pm 0.2_{\mathrm{sys}}) \times 10^{-11}\,\mathrm{cm^{-2}\,s^{-1}}$ is measured. The best-fit power law to the VHE data has a photon index of $\Gamma = 5.4 \pm 0.7_{\mathrm{stat}} \pm 0.3_{\mathrm{sys}}$. The GeV and optical light curves show pronounced variability during the period of H.E.S.S. observations. However, there is insufficient evidence to claim statistically significant variability in the VHE data. Because of its relatively high redshift, the VHE flux from PKS 1510-089 should suffer considerable attenuation in the intergalactic space due to the extragalactic background light (EBL). Hence, the measured gamma-ray spectrum is used to derive upper limits on the opacity due to EBL, which are found to be comparable with the previously derived limits from relatively nearby BL Lac objects. Unlike typical VHE-detected blazars where the broadband spectrum is dominated by nonthermal radiation at all wavelengths, the quasar PKS 1510-089 has a bright thermal component in the optical to UV frequency band. Among all VHE-detected blazars, PKS 1510-089 has the most luminous broad line region. The detection of VHE emission from this quasar indicates a low level of gamma-gamma absorption on the internal optical to UV photon field.
HESS J1640-465 - an exceptionally luminous TeV gamma-ray supernova remnant (vol 439, pg 2828, 2014)
(2014)
The results of follow-up observations of the TeV gamma-ray source HESS J1640-465 from 2004 to 2011 with the High Energy Stereoscopic System (HESS) are reported in this work. The spectrum is well described by an exponential cut-off power law with photon index $\Gamma = 2.11 \pm 0.09_{\mathrm{stat}} \pm 0.10_{\mathrm{sys}}$, and a cut-off energy of $E_c = 6.0^{+2.0}_{-1.2}$ TeV. The TeV emission is significantly extended and overlaps with the northwestern part of the shell of the SNR G338.3-0.0. The new HESS results, a re-analysis of archival XMM-Newton data and multiwavelength observations suggest that a significant part of the gamma-ray emission from HESS J1640-465 originates in the supernova remnant shell. In a hadronic scenario, as suggested by the smooth connection of the GeV and TeV spectra, the product of total proton energy and mean target density could be as high as $W_p n_H \sim 4 \times 10^{52}\,(d/10\,\mathrm{kpc})^2\,\mathrm{erg\,cm^{-3}}$.
Composite supernova remnants (SNRs) constitute a small subclass of the remnants of massive stellar explosions where non-thermal radiation is observed from both the expanding shell-like shock front and from a pulsar wind nebula (PWN) located inside of the SNR. These systems represent a unique evolutionary phase of SNRs where observations in the radio, X-ray, and gamma-ray regimes allow the study of the co-evolution of both these energetic phenomena. In this article, we report results from observations of the shell-type SNR G15.4+0.1 performed with the High Energy Stereoscopic System (H.E.S.S.) and XMM-Newton. A compact TeV gamma-ray source, HESS J1818-154, located in the center and contained within the shell of G15.4+0.1 is detected by H.E.S.S. and features a spectrum best represented by a power-law model with a spectral index of $-2.3 \pm 0.3_{\mathrm{stat}} \pm 0.2_{\mathrm{sys}}$ and an integral flux of $F(>0.42\,\mathrm{TeV}) = (0.9 \pm 0.3_{\mathrm{stat}} \pm 0.2_{\mathrm{sys}}) \times 10^{-12}\,\mathrm{cm^{-2}\,s^{-1}}$. Furthermore, a recent observation with XMM-Newton reveals extended X-ray emission strongly peaked in the center of G15.4+0.1. The X-ray source shows indications of an energy-dependent morphology featuring a compact core at energies above 4 keV and more extended emission that fills the entire region within the SNR at lower energies. Together, the X-ray and VHE gamma-ray emission provide strong evidence of a PWN located inside the shell of G15.4+0.1 and this SNR can therefore be classified as a composite based on these observations. The radio, X-ray, and gamma-ray emission from the PWN is compatible with a one-zone leptonic model that requires a low average magnetic field inside the emission region. An unambiguous counterpart to the putative pulsar, which is thought to power the PWN, has been detected neither in radio nor in X-ray observations of G15.4+0.1.
HESS observations of the binary system PSR B1259-63/LS 2883 around the 2010/2011 periastron passage
(2013)
Aims. We present very high energy (VHE; E > 100 GeV) data from the gamma-ray binary system PSR B1259-63/LS 2883 taken around its periastron passage on 15th of December 2010 with the High Energy Stereoscopic System (H.E.S.S.) of Cherenkov Telescopes. We aim to search for a possible TeV counterpart of the GeV flare detected by the Fermi LAT. In addition, we aim to study the current periastron passage in the context of previous observations taken at similar orbital phases, testing the repetitive behaviour of the source.
Methods. Observations at VHEs were conducted with H.E.S.S. from 9th to 16th of January 2011. The total dataset amounts to approximately 6 h of observing time. The data taken around the 2004 periastron passage were also re-analysed with the current analysis techniques in order to extend the energy spectrum above 3 TeV to fully compare observation results from 2004 and 2011.
Results. The source is detected in the 2011 data at a significance level of $11.5\sigma$ revealing an averaged integral flux above 1 TeV of $(1.01 \pm 0.18_{\mathrm{stat}} \pm 0.20_{\mathrm{sys}}) \times 10^{-12}\,\mathrm{cm^{-2}\,s^{-1}}$. The differential energy spectrum follows a power-law shape with a spectral index $\Gamma = 2.92 \pm 0.30_{\mathrm{stat}} \pm 0.20_{\mathrm{sys}}$ and a flux normalisation at 1 TeV of $N_0 = (1.95 \pm 0.32_{\mathrm{stat}} \pm 0.39_{\mathrm{sys}}) \times 10^{-12}\,\mathrm{TeV^{-1}\,cm^{-2}\,s^{-1}}$. The measured light curve does not show any evidence for variability of the source on the daily scale. The re-analysis of the 2004 data yields results compatible with the published ones. The differential energy spectrum measured up to approximately 10 TeV is consistent with a power law with a spectral index $\Gamma = 2.81 \pm 0.10_{\mathrm{stat}} \pm 0.20_{\mathrm{sys}}$ and a flux normalisation at 1 TeV of $N_0 = (1.29 \pm 0.08_{\mathrm{stat}} \pm 0.26_{\mathrm{sys}}) \times 10^{-12}\,\mathrm{TeV^{-1}\,cm^{-2}\,s^{-1}}$.
Conclusions. The measured integral flux and the spectral shape of the 2011 data are compatible with the results obtained around previous periastron passages. The absence of variability in the H.E.S.S. data indicates that the GeV flare observed by Fermi LAT in the time period also covered by H.E.S.S. observations originates in a different physical scenario than the TeV emission. Moreover, the comparison of the new results to the results from the 2004 observations made at a similar orbital phase provides stronger evidence of the repetitive behaviour of the source.
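The integral flux and the power-law parameters quoted in the Results can be cross-checked against each other. The sketch below integrates a pure power law normalised at 1 TeV from a threshold energy to infinity. It uses only the central values from the abstract, ignores the statistical and systematic uncertainties, and assumes the power law extends unbroken to arbitrarily high energies, which the abstract does not claim. The function name `integral_flux_above` is hypothetical.

```python
def integral_flux_above(e0_tev, norm_at_1tev, index):
    """Integrate norm_at_1tev * (E / 1 TeV)**(-index) from e0_tev to infinity.

    Energies are in TeV, the normalisation in TeV^-1 cm^-2 s^-1,
    so the result is in cm^-2 s^-1. The integral converges only for index > 1.
    """
    if index <= 1:
        raise ValueError("integral diverges for index <= 1")
    return norm_at_1tev * e0_tev ** (1.0 - index) / (index - 1.0)

# Central values from the abstract (uncertainties deliberately ignored).
flux_2011 = integral_flux_above(1.0, 1.95e-12, 2.92)
flux_2004 = integral_flux_above(1.0, 1.29e-12, 2.81)
print(f"2011: {flux_2011:.3e} cm^-2 s^-1")
print(f"2004: {flux_2004:.3e} cm^-2 s^-1")
```

For the 2011 parameters this gives about 1.02 x 10^-12 cm^-2 s^-1, in line with the quoted integral flux above 1 TeV. The 2004 value is computed the same way, but the abstract quotes no integral flux for 2004 to compare it with.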
Context. On March 4, 2013 the Fermi-LAT and AGILE reported a flare from the direction of the Crab nebula in which the high-energy (HE; E > 100 MeV) flux was six times above its quiescent level. Simultaneous observations in other energy bands give us hints about the emission processes during the flare episode and the physics of pulsar wind nebulae in general.
Aims. We search for variability in the emission of the Crab nebula at very-high energies (VHE; E > 100 GeV), using contemporaneous data taken with the H.E.S.S. array of Cherenkov telescopes.
Methods. Observational data taken with the H.E.S.S. instrument on five consecutive days during the flare were analysed for the flux and spectral shape of the emission from the Crab nebula. Night-wise light curves are presented with energy thresholds of 1 TeV and 5 TeV.
Results. The observations conducted with H.E.S.S. on March 6 to March 10, 2013 show no significant changes in the flux. They limit the variation in the integral flux above 1 TeV to less than 63% and the integral flux above 5 TeV to less than 78% at a 95% confidence level.
Unique column combinations (UCCs) are a fundamental concept in relational databases. They identify entities in the data and support various data management activities. Still, UCCs are usually not explicitly defined and need to be discovered. State-of-the-art data profiling algorithms are able to efficiently discover UCCs in moderately sized datasets, but they tend to fail on large and, in particular, on wide datasets due to run time and memory limitations.
In this paper, we introduce HPIValid, a novel UCC discovery algorithm that implements a faster and more resource-saving search strategy. HPIValid models the metadata discovery as a hitting set enumeration problem in hypergraphs. In this way, it combines efficient discovery techniques from data profiling research with the most recent theoretical insights into enumeration algorithms. Our evaluation shows that HPIValid is not only orders of magnitude faster than related work, it also has a much smaller memory footprint.
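To make the notion of a unique column combination concrete, the following sketch enumerates minimal UCCs by brute force, testing column subsets in order of increasing size. This is explicitly not HPIValid and does not use hitting-set enumeration; it is a naive baseline whose cost grows exponentially with the number of columns, which is the kind of blow-up on wide datasets that the abstract describes. The function names and the sample rows are assumptions for illustration.

```python
from itertools import combinations

def is_unique(rows, cols):
    """Return True if projecting rows onto cols yields no duplicate tuples."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True

def minimal_uccs(rows, num_cols):
    """Enumerate minimal unique column combinations, smallest first."""
    found = []
    for size in range(1, num_cols + 1):
        for cols in combinations(range(num_cols), size):
            # A superset of a known UCC is unique but not minimal, so skip it.
            if any(set(u) <= set(cols) for u in found):
                continue
            if is_unique(rows, cols):
                found.append(cols)
    return found

rows = [
    ("Ada", "Berlin", 1),
    ("Ada", "Potsdam", 2),
    ("Bob", "Berlin", 2),
]
uccs = minimal_uccs(rows, 3)
print(uccs)  # column indices of each minimal UCC
```

In this toy table no single column is unique, but every pair of columns is, so the full three-column combination is pruned as non-minimal.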
Primary keys (PKs) and foreign keys (FKs) are important elements of relational schemata in various applications, such as query optimization and data integration. However, in many cases, these constraints are unknown or not documented. Detecting them manually is time-consuming and even infeasible in large-scale datasets. We study the problem of discovering primary keys and foreign keys automatically and propose an algorithm to detect both, namely Holistic Primary Key and Foreign Key Detection (HoPF). PKs and FKs are subsets of the sets of unique column combinations (UCCs) and inclusion dependencies (INDs), respectively, for which efficient discovery algorithms are known. Using score functions, our approach is able to effectively extract the true PKs and FKs from the vast sets of valid UCCs and INDs. Several pruning rules are employed to speed up the procedure. We evaluate precision and recall on three benchmarks and two real-world datasets. The results show that our method is able to retrieve on average 88% of all primary keys, and 91% of all foreign keys. We compare the performance of HoPF with two baseline approaches that both assume the existence of primary keys.
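Foreign key candidates are drawn from inclusion dependencies: a column can only reference another if all of its values also occur there. The sketch below checks unary INDs with plain set containment. It illustrates the candidate space only; it does not reproduce HoPF's score functions or pruning rules, which the abstract does not detail. The table layout and the name `unary_inds` are assumptions.

```python
def unary_inds(tables):
    """Return (dependent, referenced) column pairs whose value sets are contained.

    tables maps a table name to a dict of column name -> list of values.
    """
    columns = {
        (tname, cname): set(values)
        for tname, table in tables.items()
        for cname, values in table.items()
    }
    return sorted(
        (dep, ref)
        for dep, dep_vals in columns.items()
        for ref, ref_vals in columns.items()
        if dep != ref and dep_vals <= ref_vals
    )

tables = {
    "orders": {"order_id": [10, 11, 12], "customer_id": [1, 1, 2]},
    "customers": {"id": [1, 2, 3]},
}
inds = unary_inds(tables)
print(inds)
```

On real data many valid INDs are accidental (for example, small integer columns contained in any ID column), which is why a scoring step is needed to separate true foreign keys from spurious ones.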
How inclusive are we?
(2022)
ACM SIGMOD, VLDB and other database organizations have committed to fostering an inclusive and diverse community, as do many other scientific organizations. Recently, different measures have been taken to advance these goals, especially for underrepresented groups. One possible measure is double-blind reviewing, which aims to hide gender, ethnicity, and other properties of the authors.
We report the preliminary results of a gender diversity analysis of publications of the database community across several peer-reviewed venues, and also compare women's authorship percentages in both single-blind and double-blind venues over the years. We also cross-compare the results obtained in data management with those of other relevant areas in computer science.
Introducing the CTA concept
(2013)
The Cherenkov Telescope Array (CTA) is a new observatory for very high-energy (VHE) gamma rays. CTA has ambitious science goals, for which it is necessary to achieve full-sky coverage, to improve the sensitivity by about an order of magnitude, and to span about four decades of energy, from a few tens of GeV to above 100 TeV, with enhanced angular and energy resolutions over existing VHE gamma-ray observatories. An international collaboration has formed with more than 1000 members from 27 countries in Europe, Asia, Africa and North and South America. In 2010 the CTA Consortium completed a Design Study and started a three-year Preparatory Phase which leads to production readiness of CTA in 2014. In this paper we introduce the science goals and the concept of CTA, and provide an overview of the project.
The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is both time-consuming and requires extensive domain expertise.
We propose a deep Siamese neural network, capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. With the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluate our method on multiple datasets and compare our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.
MDedup
(2020)
Duplicate detection is an integral part of data cleaning and serves to identify multiple representations of the same real-world entities in (relational) datasets. Existing duplicate detection approaches are effective, but they are also hard to parameterize or require a lot of pre-labeled training data. Both parameterization and pre-labeling are at least domain-specific if not dataset-specific, which is a problem if a new dataset needs to be cleaned.
For this reason, we propose a novel, rule-based and fully automatic duplicate detection approach that is based on matching dependencies (MDs). Our system uses automatically discovered MDs, various dataset features, and known gold standards to train a model that selects MDs as duplicate detection rules. Once trained, the model can select useful MDs for duplicate detection on any new dataset. To increase the generally low recall of MD-based data cleaning approaches, we propose an additional boosting step. Our experiments show that this approach reaches up to 94% F-measure and 100% precision on our evaluation datasets, which are good numbers considering that the system does not require domain or target data-specific configuration.
Data analytics are moving beyond the limits of a single data processing platform. A cross-platform query optimizer is necessary to enable applications to run their tasks over multiple platforms efficiently and in a platform-agnostic manner. For the optimizer to be effective, it must consider data movement costs across different data processing platforms. In this paper, we present the graph-based data movement strategy used by RHEEM, our open-source cross-platform system. In particular, we (i) model the data movement problem as a new graph problem, which we prove to be NP-hard, and (ii) propose a novel graph exploration algorithm, which allows RHEEM to discover multiple hidden opportunities for cross-platform data processing.
Any system at play in a data-driven project has a fundamental requirement: the ability to load data. The de-facto standard format to distribute and consume raw data is CSV. Yet, the plain-text and flexible nature of this format often makes such files difficult to parse and their content difficult to load correctly, requiring cumbersome data preparation steps. We propose a benchmark to assess the robustness of systems in loading data from non-standard CSV formats and with structural inconsistencies. First, we formalize a model to describe the issues that affect real-world files and use it to derive a systematic "pollution" process to generate dialects for any given grammar. Our benchmark leverages the pollution framework for the CSV format. To guide pollution, we have surveyed thousands of real-world, publicly available CSV files, recording the problems we encountered. We demonstrate the applicability of our benchmark by testing and scoring 16 different systems: popular CSV parsing frameworks, relational database tools, spreadsheet systems, and a data visualization tool.
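The general idea of testing a loader against polluted dialects can be sketched with Python's standard `csv` module: serialize a known table under a non-standard dialect, load it with default settings, and check whether the original content comes back. The pollution names, the helper functions `render` and `load_default`, and the pass/fail comparison are assumptions for illustration; they are not the benchmark's actual pollution model or scoring scheme.

```python
import csv
import io

def render(rows, delimiter=",", quotechar='"'):
    """Serialize rows to CSV text using the given dialect parameters."""
    buf = io.StringIO()
    writer = csv.writer(
        buf, delimiter=delimiter, quotechar=quotechar, lineterminator="\n"
    )
    writer.writerows(rows)
    return buf.getvalue()

def load_default(text):
    """Parse text with the csv module's default comma/double-quote dialect."""
    return [row for row in csv.reader(io.StringIO(text))]

clean = [["id", "city"], ["1", "Potsdam"], ["2", "Berlin"]]

# Hypothetical pollutions: each one changes a single dialect parameter.
pollutions = {
    "standard": {},
    "semicolon_delimiter": {"delimiter": ";"},
    "tab_delimiter": {"delimiter": "\t"},
}

# A loader "passes" a pollution if it recovers the clean table exactly.
results = {
    name: load_default(render(clean, **params)) == clean
    for name, params in pollutions.items()
}
print(results)
```

A default-configured reader recovers only the standard file here; the other two are read as single-column rows without raising any error, which is the kind of silent misloading a robustness benchmark is meant to expose.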
Aims. Previous observations with the High Energy Stereoscopic System (H.E.S.S.) have revealed an extended very-high-energy (VHE; E > 100 GeV) gamma-ray source, HESS J1834-087, coincident with the supernova remnant (SNR) W41. The origin of the gamma-ray emission was investigated in more detail with the H.E.S.S. array and the Large Area Telescope (LAT) onboard the Fermi Gamma-ray Space Telescope.
Methods. The gamma-ray data provided by 61 h of observations with H.E.S.S., and four years with the Fermi LAT were analyzed, covering over five decades in energy from 1.8 GeV up to 30 TeV. The morphology and spectrum of the TeV and GeV sources were studied and multiwavelength data were used to investigate the origin of the gamma-ray emission toward W41.
Results. The TeV source can be modeled with a sum of two components: one point-like and one significantly extended ($\sigma_{\mathrm{TeV}} = 0.17^\circ \pm 0.01^\circ$), both centered on SNR W41 and exhibiting spectra described by a power law with index $\Gamma_{\mathrm{TeV}} \simeq 2.6$. The GeV source detected with Fermi LAT is extended ($\sigma_{\mathrm{GeV}} = 0.15^\circ \pm 0.03^\circ$) and morphologically matches the VHE emission. Its spectrum can be described by a power-law model with an index $\Gamma_{\mathrm{GeV}} = 2.15 \pm 0.12$ and smoothly joins the spectrum of the whole TeV source. A break appears in the gamma-ray spectra around 100 GeV. No pulsations were found in the GeV range.
Conclusions. Two main scenarios are proposed to explain the observed emission: a pulsar wind nebula (PWN) or the interaction of SNR W41 with an associated molecular cloud. X-ray observations suggest the presence of a point-like source (a pulsar candidate) near the center of the remnant and nonthermal X-ray diffuse emission that could arise from the possibly associated PWN. The PWN scenario is supported by the compatible positions of the TeV and GeV sources with the putative pulsar. However, the spectral energy distribution from radio to gamma-rays is reproduced by a one-zone leptonic model only if an excess of low-energy electrons is injected following a Maxwellian distribution by a pulsar with a high spin-down power ($> 10^{37}\,\mathrm{erg\,s^{-1}}$). This additional low-energy component is not needed if we consider that the point-like TeV source is unrelated to the extended GeV and TeV sources. The interacting SNR scenario is supported by the spatial coincidence between the gamma-ray sources, the detection of OH (1720 MHz) maser lines, and the hadronic modeling.