Axionlike particles (ALPs) are hypothetical light (sub-eV) bosons predicted in some extensions of the Standard Model of particle physics. In astrophysical environments comprising high-energy gamma rays and turbulent magnetic fields, the existence of ALPs can modify the energy spectrum of the gamma rays for a sufficiently large coupling between ALPs and photons. This modification would take the form of an irregular behavior of the energy spectrum in a limited energy range. Data from the H.E.S.S. observations of the distant BL Lac object PKS 2155-304 (z = 0.116) are used to derive upper limits at the 95% C.L. on the strength of the ALP coupling to photons, g_γa < 2.1 × 10^-11 GeV^-1, for an ALP mass between 15 and 60 neV. The results depend on assumptions on the magnetic field around the source, which are chosen conservatively. The derived constraints apply to both light pseudoscalar and scalar bosons that couple to the electromagnetic field.
Effective query optimization is a core feature of any database management system. While most query optimization techniques make use of simple metadata, such as cardinalities and other basic statistics, other optimization techniques are based on more advanced metadata, including data dependencies such as functional, uniqueness, order, or inclusion dependencies. This survey provides an overview, intuitive descriptions, and classifications of query optimization and execution strategies that are enabled by data dependencies. We consider the most popular types of data dependencies and focus on strategies that target the optimization of relational database queries. The survey helps database vendors identify optimization opportunities and helps DBMS researchers find related work and open research questions.
Data Preparation
(2020)
Raw data are often messy: they follow different encodings, records are not well structured, values do not adhere to patterns, etc. Such data are in general not fit to be ingested by downstream applications, such as data analytics tools, or even by data management systems. The act of obtaining information from raw data relies on some data preparation process. Data preparation is integral to advanced data analysis and data management, not only for data science but for any data-driven application. Existing data preparation tools are operational and useful, but there is still room for improvement and optimization. With increasing data volume and its messy nature, the demand for prepared data grows day by day. To cater to this demand, companies and researchers are developing techniques and tools for data preparation. To better understand the available data preparation systems, we have conducted a survey to investigate (1) prominent data preparation tools, (2) distinctive tool features, (3) the need for preliminary data processing even for these tools, and (4) features and abilities that are still lacking. We conclude with an argument in support of automatic and intelligent data preparation beyond traditional and simplistic techniques.
Data errors represent a major issue in most application workflows. Before any important task can take place, a certain data quality has to be guaranteed by eliminating a number of different errors that may appear in data. Typically, most of these errors are fixed with data preparation methods, such as whitespace removal. However, the particular error of duplicate records, where multiple records refer to the same entity, is usually eliminated independently with specialized techniques. Our work is the first to bring these two areas together by applying data preparation operations under a systematic approach prior to performing duplicate detection. Our process workflow can be summarized as follows: it begins with the user providing as input a sample of the gold standard, the actual dataset, and optionally some constraints for domain-specific data preparations, such as address normalization. The preparation selection operates in two consecutive phases. First, to vastly reduce the search space of ineffective data preparations, decisions are made based on the improvement or worsening of pair similarities. Second, using the remaining data preparations, an iterative leave-one-out classification process removes preparations one by one and determines the redundant preparations based on the achieved area under the precision-recall curve (AUC-PR). Using this workflow, we manage to improve the results of duplicate detection by up to 19% in AUC-PR.
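The leave-one-out selection phase described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `score` callback stands in for running duplicate detection and measuring AUC-PR, and the toy scoring function and preparation names are assumptions for the example.

```python
def leave_one_out_selection(preparations, score):
    """Iteratively drop preparations whose removal does not lower the
    score (here standing in for AUC-PR); what remains is treated as the
    non-redundant preparation set."""
    selected = list(preparations)
    baseline = score(selected)
    changed = True
    while changed:
        changed = False
        for prep in list(selected):
            trial = [p for p in selected if p != prep]
            if score(trial) >= baseline:  # removing prep does not hurt: redundant
                selected = trial
                baseline = score(trial)
                changed = True
    return selected

def toy_score(preps):
    # Hypothetical scorer: only 'trim' and 'normalize' actually help.
    return 0.5 + 0.2 * ("trim" in preps) + 0.3 * ("normalize" in preps)

kept = leave_one_out_selection(["trim", "lowercase", "normalize", "whitespace"],
                               toy_score)
```

In the sketch, `lowercase` and `whitespace` are discarded because dropping them leaves the score unchanged, mirroring how the workflow prunes preparations that do not contribute to the final AUC-PR.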
Ground-based gamma-ray astronomy has had a major breakthrough with the impressive results obtained using systems of imaging atmospheric Cherenkov telescopes. Ground-based gamma-ray astronomy has a huge potential in astrophysics, particle physics, and cosmology. CTA is an international initiative to build the next-generation instrument, with a factor of 5-10 improvement in sensitivity in the 100 GeV-10 TeV range and an extension to energies well below 100 GeV and above 100 TeV. CTA will consist of two arrays (one in the north, one in the south) for full sky coverage and will be operated as an open observatory. The design of CTA is based on currently available technology. This document reports on the status and presents the major design concepts of CTA.
Spreadsheets are among the most commonly used file formats for data management, distribution, and analysis. Their widespread employment makes it easy to gather large collections of data, but their flexible canvas-based structure makes automated analysis difficult without heavy preparation. One of the common problems that practitioners face is the presence of multiple, independent regions in a single spreadsheet, possibly separated by repeated empty cells. We define such files as "multiregion" files. In collections of various spreadsheets, we can observe that some share the same layout. We present the Mondrian approach to automatically identify layout templates across multiple files and systematically extract the corresponding regions. Our approach is composed of three phases: first, each file is rendered as an image and inspected for elements that could form regions; then, using a clustering algorithm, the identified elements are grouped to form regions; finally, every file layout is represented as a graph and compared with others to find layout templates. We compare our method to state-of-the-art table recognition algorithms on two corpora of real-world enterprise spreadsheets. Our approach shows the best performance in detecting reliable region boundaries within each file and can correctly identify recurring layouts across files.
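The region-identification idea can be illustrated with a simplified stand-in for the first two Mondrian phases: treating the spreadsheet as a grid and grouping adjacent non-empty cells with a flood fill. The grid representation and bounding-box output below are assumptions for this sketch, not the paper's image-based clustering.

```python
def find_regions(grid):
    """Group non-empty cells (anything not None) of a rectangular grid
    into regions of 8-connected cells; return each region's bounding
    box as (top_row, left_col, bottom_row, right_col)."""
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is None or (r, c) in seen:
                continue
            # Flood fill from this unvisited non-empty cell.
            stack, cells = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                cells.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and (ny, nx) not in seen
                                and grid[ny][nx] is not None):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
            ys = [y for y, _ in cells]
            xs = [x for _, x in cells]
            regions.append((min(ys), min(xs), max(ys), max(xs)))
    return regions

# Two regions separated by an empty row: a "multiregion" file in miniature.
sheet = [
    ["a", "b", None],
    ["c", "d", None],
    [None, None, None],
    [None, "x", "y"],
]
regions = find_regions(sheet)
```

In the full approach, the detected regions then feed the third phase, where each file's layout is encoded as a graph and matched against other files to find shared templates.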
With the advent of big data and data lakes, data are often integrated from multiple sources. Such integrated data are often of poor quality, due to inconsistencies, errors, and so forth. One way to check the quality of data is to infer functional dependencies (FDs). However, in many modern applications it might be necessary to extract properties and relationships that are not captured through FDs, due to the necessity to admit exceptions, or to consider similarity rather than equality of data values. Relaxed FDs (RFDs) have been introduced to meet these needs, but their discovery from data adds further complexity to an already complex problem, in part due to the need to specify similarity and validity thresholds. We propose Domino, a new discovery algorithm for RFDs that exploits the concept of dominance in order to derive similarity thresholds of attribute values while inferring RFDs. An experimental evaluation on real datasets demonstrates the discovery performance and the effectiveness of the proposed algorithm.
Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. When using existing FD discovery algorithms, some genuine FDs may not be detected due to missing values, while some non-genuine FDs can be discovered even though they hold only because of missing values under a certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This can be used to identify the genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method over various real-world and semi-synthetic datasets with extensive experiments. The results show that our method performs well for relatively large FD sets and is able to accurately capture genuine FDs.
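A hedged sketch of the underlying intuition: an FD whose validity flips depending on the chosen NULL semantics is a candidate non-genuine dependency. The helper below is illustrative only; the paper's actual genuineness score is more elaborate than this binary check, and the relation encoding (lists of dicts, with `None` as NULL) is an assumption for the example.

```python
def fd_holds(rows, lhs, rhs, null_eq=True):
    """Check whether the FD lhs -> rhs holds on a relation given as a
    list of dicts, with None marking a missing value.

    null_eq=True  : NULL = NULL (missing LHS values collide with each other).
    null_eq=False : NULL != NULL (each missing LHS value is treated as
                    distinct, so such tuples can never violate the FD)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if not null_eq and None in key:
            continue  # distinct NULLs never form a violating pair
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False  # same LHS, different RHS: FD violated
        seen[key] = val
    return True

# a -> b is violated only when NULLs are considered equal: its validity
# under NULL != NULL depends entirely on the missing values.
relation = [
    {"a": 1, "b": 2},
    {"a": None, "b": 3},
    {"a": None, "b": 4},
]
```

Comparing the outcome under both semantics flags `a -> b` as a dependency whose validity hinges on missing values, which is the kind of FD a genuineness score is meant to expose.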
Discovery of high and very high-energy emission from the BL Lacertae object SHBL J001355.9-185406
(2013)
The detection of the high-frequency peaked BL Lac object (HBL) SHBL J001355.9-185406 (z = 0.095) at high (HE; 100 MeV < E < 300 GeV) and very high energies (VHE; E > 100 GeV) with the Fermi Large Area Telescope (LAT) and the High Energy Stereoscopic System (H.E.S.S.) is reported. Dedicated observations were performed with the H.E.S.S. telescopes, leading to a detection at the 5.5 sigma significance level. The measured flux above 310 GeV is (8.3 +/- 1.7_stat +/- 1.7_sys) × 10^-13 photons cm^-2 s^-1 (about 0.6% of that of the Crab Nebula), and the power-law spectrum has a photon index of Gamma = 3.4 +/- 0.5_stat +/- 0.2_sys. Using 3.5 years of publicly available Fermi-LAT data, a faint counterpart has been detected in the LAT data at the 5.5 sigma significance level, with an integrated flux above 300 MeV of (9.3 +/- 3.4_stat +/- 0.8_sys) × 10^-10 photons cm^-2 s^-1 and a photon index of Gamma = 1.96 +/- 0.20_stat +/- 0.08_sys. X-ray observations with Swift-XRT allow the synchrotron peak energy in the nu F_nu representation to be located at ~1.0 keV. The broadband spectral energy distribution is modelled with a one-zone synchrotron self-Compton (SSC) model, with the optical data described by black-body emission representing the thermal emission of the host galaxy. The derived parameters are typical of HBLs detected at VHE, with a particle-dominated jet.