004 Datenverarbeitung; Informatik
Document Type
- Article (48)
- Monograph/Edited Volume (42)
- Doctoral Thesis (38)
- Postprint (5)
- Conference Proceeding (2)
- Part of a Book (1)
- Other (1)
Keywords
- machine learning (11)
- maschinelles Lernen (8)
- Cloud Computing (5)
- Forschungsprojekte (5)
- Future SOC Lab (5)
- In-Memory Technologie (5)
- Multicore Architekturen (5)
- Smalltalk (5)
- artificial intelligence (5)
- cloud computing (5)
- multicore architectures (5)
- research projects (5)
- cyber-physical systems (4)
- digital education (4)
- künstliche Intelligenz (4)
- openHPI (4)
- probabilistic timed systems (4)
- qualitative Analyse (4)
- qualitative analysis (4)
- quantitative Analyse (4)
- quantitative analysis (4)
- Datenaufbereitung (3)
- Digitalisierung (3)
- Hasso Plattner Institute (3)
- Hasso-Plattner-Institut (3)
- MOOC (3)
- blockchain (3)
- business process management (3)
- cloud (3)
- data preparation (3)
- digitale Bildung (3)
- digitalization (3)
- in-memory technology (3)
- smart contracts (3)
- 3D visualization (2)
- 3D-Visualisierung (2)
- Anomalieerkennung (2)
- Bounded Model Checking (2)
- Datenbanksysteme (2)
- Datenintegration (2)
- Datenqualität (2)
- Design Thinking (2)
- European Union (2)
- Europäische Union (2)
- Forschungskolleg (2)
- Graphentransformationssysteme (2)
- HPI Schul-Cloud (2)
- Identitätsmanagement (2)
- In-Memory (2)
- In-Memory technology (2)
- Innovation (2)
- Klausurtagung (2)
- Künstliche Intelligenz (2)
- MERLOT (2)
- Modellprüfung (2)
- Ph.D. retreat (2)
- Programmieren (2)
- Service-oriented Systems Engineering (2)
- Sicherheit (2)
- Versionsverwaltung (2)
- Werkzeuge (2)
- anomaly detection (2)
- bounded model checking (2)
- business processes (2)
- causal discovery (2)
- causal structure learning (2)
- classification (2)
- clustering (2)
- computer vision (2)
- cyber-physische Systeme (2)
- data integration (2)
- data quality (2)
- database systems (2)
- deep learning (2)
- deferred choice (2)
- digital enlightenment (2)
- digital learning platform (2)
- digital sovereignty (2)
- digitale Aufklärung (2)
- digitale Lernplattform (2)
- digitale Souveränität (2)
- evaluation (2)
- formal semantics (2)
- functional dependencies (2)
- funktionale Abhängigkeiten (2)
- geospatial data (2)
- graph transformation systems (2)
- identity management (2)
- image stylization (2)
- inclusion dependencies (2)
- index selection (2)
- innovation (2)
- intrusion detection (2)
- kausale Entdeckung (2)
- kausales Strukturlernen (2)
- lebenslanges Lernen (2)
- lifelong learning (2)
- liveness (2)
- maschinelles Sehen (2)
- memory (2)
- mobile mapping (2)
- model checking (2)
- nested graph conditions (2)
- oracles (2)
- outlier detection (2)
- probabilistische gezeitete Systeme (2)
- probabilistische zeitgesteuerte Systeme (2)
- programming (2)
- research school (2)
- security (2)
- self-sovereign identity (2)
- service-oriented systems engineering (2)
- tiefes Lernen (2)
- typed attributed graphs (2)
- user interaction (2)
- user-generated content (2)
- version control (2)
- workflow patterns (2)
- 0-day (1)
- 3D city model (1)
- 3D geovisualization (1)
- 3D point cloud (1)
- 3D point clouds (1)
- 3D portrayal (1)
- 3D-Geovisualisierung (1)
- 3D-Punktwolke (1)
- 3D-Punktwolken (1)
- 3D-Rendering (1)
- 3D-Stadtmodell (1)
- ACINQ (1)
- APT (1)
- ASIC (1)
- Activity-oriented Optimization (1)
- Advanced Persistent Threats (1)
- Agile (1)
- Agilität (1)
- Aktivitäten (1)
- Alcohol Use Disorders Identification Test (1)
- Alcohol use assessment (1)
- Algebraic methods (1)
- Ambiguity (1)
- Ambiguität (1)
- Analog-zu-Digital-Konvertierung (1)
- Anfrageoptimierung (1)
- Angriffserkennung (1)
- Architekturadaptation (1)
- Archivanalyse (1)
- Artificial Intelligence (1)
- Arzt-Patient-Beziehung (1)
- Attributsicherung (1)
- Ausreißererkennung (1)
- Australian securities exchange (1)
- Auswirkungen (1)
- BCCC (1)
- BPMN (1)
- BTC (1)
- Bahnwesen (1)
- Bedrohungserkennung (1)
- Behavior change (1)
- Benutzerinteraktion (1)
- Betriebssysteme (1)
- Big Data (1)
- Big Five model (1)
- Bildverarbeitung (1)
- BitShares (1)
- Bitcoin Core (1)
- Blockchain (1)
- Blockchain Auth (1)
- Blockchain-Konsortium R3 (1)
- Blockkette (1)
- Blockstack (1)
- Blockstack ID (1)
- Blumix-Plattform (1)
- Blöcke (1)
- Bounded Backward Model Checking (1)
- Business Process Management (1)
- Business process modeling (1)
- Byzantine Agreement (1)
- C++ tool (1)
- CityGML (1)
- Clinical predictive modeling (1)
- Cloud (1)
- Clusteranalyse (1)
- Clustering (1)
- Colored Coins (1)
- Commonsense reasoning (1)
- Community analysis (1)
- Complexity (1)
- Compound Values (1)
- Computational Photography (1)
- Computing (1)
- Conceptual modeling (1)
- Creative (1)
- Cyber-Sicherheit (1)
- Cyber-physikalische Systeme (1)
- DAO (1)
- DBMS (1)
- DPoS (1)
- Data Profiling (1)
- Data Structure Optimization (1)
- Data modeling (1)
- Data profiling (1)
- Data warehouse (1)
- Data-Mining (1)
- Data-Science (1)
- Dateistruktur (1)
- Datenanalyse (1)
- Datenbank (1)
- Datenbankoptimierung (1)
- Datenmodelle (1)
- Datensatz (1)
- Datenschutz-sicherer Einsatz in der Schule (1)
- Datenstrukturoptimierung (1)
- Datensynthese (1)
- Datentransformation (1)
- Datenverwaltung (1)
- Datenverwaltung für Daten mit räumlich-zeitlichem Bezug (1)
- Datenvisualisierung (1)
- Debugging (1)
- Decision support (1)
- Dekubitus (1)
- Delegated Proof-of-Stake (1)
- Denkweise (1)
- Design-Forschung (1)
- Digital Engineering (1)
- Digital World (1)
- Direkte Manipulation (1)
- Distributed Proof-of-Research (1)
- Distributed-Ledger-Technologie (DLT) (1)
- Domänenspezifische Modellierung (1)
- Dubletten (1)
- Duplikaterkennung (1)
- E-Learning (1)
- E-Wallet (1)
- ECDSA (1)
- Echtzeit (1)
- Echtzeit-Rendering (1)
- Einbruchserkennung (1)
- Endpunktsicherheit (1)
- Entitätsverknüpfung (1)
- Entscheidungsfindung (1)
- Entscheidungsmanagement (1)
- Entscheidungsmodelle (1)
- Enumeration algorithm (1)
- Eris (1)
- Erkennung von Metadaten (1)
- Ether (1)
- Ethereum (1)
- Evaluation (1)
- Exploration (1)
- Feature selection (1)
- Federated Byzantine Agreement (1)
- Fehlertoleranz (1)
- Fernerkundung (1)
- Fertigung (1)
- FollowMyVote (1)
- Fork (1)
- Formal modelling (1)
- GPU (1)
- GPU acceleration (1)
- GPU-Beschleunigung (1)
- Gaussian process state-space models (1)
- Gaussian processes (1)
- Gauß-Prozess Zustandsraummodelle (1)
- Gauß-Prozesse (1)
- Gene expression (1)
- General Earth and Planetary Sciences (1)
- Generalized Discrimination Networks (1)
- Geodaten (1)
- Geography, Planning and Development (1)
- German schools (1)
- Geschäftsprozessarchitekturen (1)
- Geschäftsprozessmanagement (1)
- Gewinnung benannter Entitäten (1)
- GitHub (1)
- GraalVM (1)
- Graph algorithm (1)
- Graph logic (1)
- Graph-Mining (1)
- Graphableitung (1)
- Graphreparatur (1)
- Gridcoin (1)
- Hard Fork (1)
- Hashed Timelock Contracts (1)
- Hasserkennung (1)
- Hauptspeicher Datenmanagement (1)
- Heuristic triangle estimation (1)
- Heuristiken (1)
- Hyrise (1)
- Häkeln (1)
- IDS (1)
- IT-Infrastruktur (1)
- IT-infrastructure (1)
- Ideation (1)
- Ideenfindung (1)
- Identität (1)
- Impact (1)
- Implementation in Organizations (1)
- Implementierung in Organisationen (1)
- Indexauswahl (1)
- Informationsextraktion (1)
- Inklusionsabhängigkeiten (1)
- Interdisciplinary Teams (1)
- Internet der Dinge (1)
- Internet of Things (1)
- Interpretability (1)
- Interpreter (1)
- Interval Timed Automata (1)
- IoT (1)
- Japanese Blockchain Consortium (1)
- Japanisches Blockchain-Konsortium (1)
- Java (1)
- Karten (1)
- Kausalität (1)
- Kette (1)
- Klassifikation (1)
- Klassifizierung (1)
- Konsensalgorithmus (1)
- Konsensprotokoll (1)
- Konsensprotokolle (1)
- Konsistenzrestauration (1)
- Kreativität (1)
- Kunstanalyse (1)
- Large networks (1)
- Laserscanning (1)
- Lastverteilung (1)
- Laufzeitmodelle (1)
- Learning Analytics (1)
- Lebendigkeit (1)
- Leistungsmodelle von virtuellen Maschinen (1)
- LiDAR (1)
- Lightning Network (1)
- Live-Migration (1)
- Live-Programmierung (1)
- Lively Kernel (1)
- Liveness (1)
- Lock-Time-Parameter (1)
- Lösungsraum (1)
- MOOCs (1)
- Machine-Learning (1)
- Machinelles Lernen (1)
- Maschinelles Lernen (1)
- Maschinen (1)
- Measurement (1)
- Messung (1)
- Metacrate (1)
- Metadaten (1)
- Micropayment-Kanäle (1)
- Microsoft Azur (1)
- Mindset (1)
- Minimal hitting set (1)
- Mobile Mapping (1)
- Mobile-Mapping (1)
- Modelle mit mehreren Versionen (1)
- Modellreparatur (1)
- Multidisciplinary Teams (1)
- NASDAQ (1)
- NameID (1)
- Namecoin (1)
- Nephrology (1)
- Netzwerkprotokolle (1)
- Non-photorealistic Rendering (1)
- Nutzerinteraktion (1)
- Objects (1)
- Objekte (1)
- Off-Chain-Transaktionen (1)
- Onename (1)
- Online Learning Environments (1)
- Open source (1)
- OpenBazaar (1)
- Opinion mining (1)
- OptoGait (1)
- Oracles (1)
- Ordinances (1)
- Orphan Block (1)
- Overlapping community detection (1)
- P2P (1)
- Parallelized algorithm (1)
- Patientenermündigung (1)
- Peer-to-Peer Netz (1)
- Peercoin (1)
- PoB (1)
- PoS (1)
- PoW (1)
- Popular matching (1)
- Posenabschätzung (1)
- Prior knowledge (1)
- Privatsphäre (1)
- Problem Solving (1)
- Problemlösung (1)
- Process (1)
- Process Execution (1)
- Programmiererlebnis (1)
- Programmierwerkzeuge (1)
- Proof-of-Burn (1)
- Proof-of-Stake (1)
- Proof-of-Work (1)
- Prototyping (1)
- Prozess (1)
- Prozessmodelle (1)
- Psychotherapie (1)
- Python (1)
- Quanten-Computing (1)
- Query-Optimierung (1)
- RL (1)
- Regressionstests (1)
- Representationlernen (1)
- Reproducible benchmarking (1)
- Resource Allocation (1)
- Resource Management (1)
- Reverse Engineering (1)
- Ripple (1)
- Ruby (1)
- Runtime improvement (1)
- Runtime-monitoring (1)
- Russia (1)
- SCP (1)
- SHA (1)
- SIEM (1)
- SPV (1)
- SWIRL (1)
- Savanne (1)
- Schriftartgestaltung (1)
- Schriftrendering (1)
- Schule (1)
- Schwierigkeitsgrad (1)
- Scrollytelling (1)
- Selbst-Adaptive Software (1)
- Self-Regulated Learning (1)
- Sequenzeigenschaften (1)
- Serialisierung (1)
- Service-Oriented Architecture (1)
- Sicherheitsanalyse (1)
- Simplified Payment Verification (1)
- Situationsbewusstsein (1)
- Skalierbarkeit der Blockchain (1)
- Skriptsprachen (1)
- Slock.it (1)
- Smart cities (1)
- Social (1)
- Soft Fork (1)
- Software-Evolution (1)
- Software/Hardware Co-Design (1)
- Solution Space (1)
- Soziale Medien (1)
- Specification (1)
- Spezifikation von gezeiteten Graph Transformationen (1)
- Sprachlernen im Limes (1)
- Squeak (1)
- Squeak/Smalltalk (1)
- Stable marriage (1)
- Stable matching (1)
- Standardisierung (1)
- Steemit (1)
- Stellar Consensus Protocol (1)
- Storj (1)
- Suchtberatung und -therapie (1)
- Telemedizin (1)
- Temporallogik (1)
- Testergebnisse (1)
- Testpriorisierungs (1)
- Text mining (1)
- Texterkennung (1)
- Textklassifikation (1)
- The Bitfury Group (1)
- The DAO (1)
- Timed Automata (1)
- Tools (1)
- Trajektorien (1)
- Trajektoriendaten (1)
- Transaktion (1)
- Transversal hypergraph (1)
- Tripel-Graph-Grammatiken (1)
- Two-Way-Peg (1)
- Unique column combination (1)
- Unspent Transaction Output (1)
- Unterricht mit digitalen Medien (1)
- User Experience (1)
- VUCA-World (1)
- Validation (1)
- Verbundwerte (1)
- Verhaltensänderung (1)
- Verlässlichkeit (1)
- Vertrauen (1)
- Verträge (1)
- Veränderungsanalyse (1)
- Virtual Machines (1)
- Virtuelle Maschinen (1)
- Visualisierungskonzept-Exploration (1)
- Vorhersage (1)
- W[3]-Completeness (1)
- Water Science and Technology (1)
- Watson IoT (1)
- Wearable (1)
- Weighted clustering coefficient (1)
- Werkzeugbau (1)
- Wicked Problems (1)
- Wolke (1)
- Wüstenbildung (1)
- Zebris (1)
- Zielvorgabe (1)
- Zookos Dreieck (1)
- Zookos triangle (1)
- acceptability (1)
- acyclic preferences (1)
- addiction care (1)
- advanced persistent threat (1)
- advanced threats (1)
- agil (1)
- altchain (1)
- alternative chain (1)
- analog-to-digital conversion (1)
- apt (1)
- architectural adaptation (1)
- archive analysis (1)
- art analysis (1)
- asset management (1)
- atomic swap (1)
- attribute assurance (1)
- autonomous (1)
- behaviourally correct learning (1)
- benchmark (1)
- benutzergenerierte Inhalte (1)
- bidirectional payment channels (1)
- bildbasiertes Rendering (1)
- bitcoins (1)
- blockchain consortium (1)
- blockchain-übergreifend (1)
- blocks (1)
- blumix platform (1)
- bounded backward model checking (1)
- brand personality (1)
- business process architectures (1)
- categories (1)
- causal AI (1)
- causal reasoning (1)
- causality (1)
- chain (1)
- change detection (1)
- code generation (1)
- columnar databases (1)
- compositional analysis (1)
- computational photography (1)
- computer-aided design (1)
- computer-mediated therapy (1)
- computervermittelte Therapie (1)
- computing (1)
- confirmation period (1)
- confluence (1)
- consensus algorithm (1)
- consensus protocol (1)
- consensus protocols (1)
- consistency restoration (1)
- consistent learning (1)
- contest period (1)
- continuous integration (1)
- contracts (1)
- convolutional neural networks (1)
- creativity (1)
- crochet (1)
- cross-chain (1)
- cultural heritage (1)
- cumulative culture (1)
- cyber-physikalische Systeme (1)
- cybersecurity (1)
- data analytics (1)
- data dependencies (1)
- data management (1)
- data mining (1)
- data models (1)
- data pipeline (1)
- data profiling (1)
- data science (1)
- data set (1)
- data synthesis (1)
- data visualization (1)
- data wrangling (1)
- data-driven (1)
- database (1)
- database optimization (1)
- database tuning (1)
- datengetrieben (1)
- debugging (1)
- decentral identities (1)
- decentralized autonomous organization (1)
- decision management (1)
- decision mining (1)
- decision models (1)
- decubitus (1)
- deduplication (1)
- deep Gaussian processes (1)
- demografische Informationen (1)
- demographic information (1)
- dependability (1)
- desertification (1)
- design research (1)
- dezentrale Identitäten (1)
- dezentrale autonome Organisation (1)
- difficulty (1)
- difficulty target (1)
- digital health (1)
- digital picture archive (1)
- digital unterstützter Unterricht (1)
- digital whiteboard (1)
- digital world (1)
- digitale Infrastruktur für den Schulunterricht (1)
- digitales Bildarchiv (1)
- digitales Whiteboard (1)
- direct manipulation (1)
- discrete-event model (1)
- diskretes Ereignismodell (1)
- distributed computation (1)
- distributed performance monitoring (1)
- distributed systems (1)
- doctor-patient relationship (1)
- domain-specific modeling (1)
- doppelter Hashwert (1)
- double hashing (1)
- drift theory (1)
- duplicate detection (1)
- dynamic systems (1)
- dynamische Systeme (1)
- e-learning (1)
- electrical muscle stimulation (1)
- elektrische Muskelstimulation (1)
- endpoint security (1)
- entity linking (1)
- entity resolution (1)
- erzeugende gegnerische Netzwerke (1)
- evolutionary computation (1)
- experience (1)
- exploration (1)
- exploratives Programmieren (1)
- exploratory programming (1)
- extend (1)
- fault tolerance (1)
- federated voting (1)
- file structure (1)
- font engineering (1)
- font rendering (1)
- fortschrittliche Angriffe (1)
- gait analysis algorithm (1)
- ganzzahlige lineare Optimierung (1)
- gefaltete neuronale Netze (1)
- gemischte Daten (1)
- generalized discrimination networks (1)
- generative adversarial networks (1)
- geschichtsbewusste Laufzeit-Modelle (1)
- getypte Attributierte Graphen (1)
- global model management (1)
- globales Modellmanagement (1)
- grammars (1)
- graph inference (1)
- graph mining (1)
- graph repair (1)
- graph-transformations (1)
- hashrate (1)
- hate speech detection (1)
- heuristics (1)
- higher education (1)
- history-aware runtime models (1)
- human-centered (1)
- hybrid systems (1)
- identity (1)
- image processing (1)
- image-based rendering (1)
- immediacy (1)
- in-memory (1)
- in-memory data management (1)
- incremental graph query evaluation (1)
- inertial measurement unit (1)
- information extraction (1)
- inkrementelle Ausführung von Graphanfragen (1)
- integer linear programming (1)
- integrated development environments (1)
- integrierte Entwicklungsumgebungen (1)
- intelligente Verträge (1)
- inter-chain (1)
- interactive media (1)
- interaktive Medien (1)
- interdisziplinäre Teams (1)
- interpreters (1)
- interval probabilistic timed systems (1)
- interval probabilistische zeitgesteuerte Systeme (1)
- interval timed automata (1)
- intransitivity (1)
- intuitive Benutzeroberflächen (1)
- intuitive interfaces (1)
- invention (1)
- invention mechanism (1)
- juridical recording (1)
- k-inductive invariant checking (1)
- k-induktive Invariantenprüfung (1)
- kausale KI (1)
- kausale Schlussfolgerung (1)
- kompositionale Analyse (1)
- konsistentes Lernen (1)
- kontinuierliche Integration (1)
- kulturelles Erbe (1)
- language learning in the limit (1)
- laserscanning (1)
- learning (1)
- lebenszentriert (1)
- ledger assets (1)
- left recursion (1)
- level-replacement systems (1)
- life-centered (1)
- live migration (1)
- live programming (1)
- load balancing (1)
- machine (1)
- machines (1)
- manufacturing (1)
- maps (1)
- maschinelle Verarbeitung natürlicher Sprache (1)
- media (1)
- medical documentation (1)
- medizinische Dokumentation (1)
- mehrsprachige Ausführungsumgebungen (1)
- menschenzentriert (1)
- merged mining (1)
- merkle root (1)
- metacrate (1)
- metadata (1)
- metadata detection (1)
- methods (1)
- metric temporal logic (1)
- metric termporal graph logic (1)
- metrisch temporale Graph Logic (1)
- metrische Temporallogik (1)
- microcredential (1)
- micropayment (1)
- micropayment channels (1)
- miner (1)
- mining (1)
- mining hardware (1)
- minting (1)
- mixed data (1)
- mobile applications (1)
- model (1)
- model repair (1)
- model-driven engineering (1)
- model-driven software engineering (1)
- modellgetriebene Entwicklung (1)
- modellgetriebene Softwaretechnik (1)
- multi-version models (1)
- multidisziplinäre Teams (1)
- named entity mining (1)
- natural language processing (1)
- network protocols (1)
- nicht-parametrische bedingte Unabhängigkeitstests (1)
- non-parametric conditional independence testing (1)
- non-photorealistic rendering (1)
- non-photorealistic rendering (NPR) (1)
- nonce (1)
- novelty detection (1)
- nutzergenerierte Inhalte (1)
- object-oriented programming (1)
- objektorientiertes Programmieren (1)
- off-chain transaction (1)
- online course creation (1)
- online course design (1)
- open innovation (1)
- operating systems (1)
- optical character recognition (1)
- order dependencies (1)
- packrat parsing (1)
- parallel and sequential independence (1)
- parallel processing (1)
- parallele Verarbeitung (1)
- parallele und Sequentielle Unabhängigkeit (1)
- parsing expression grammars (1)
- partial replication (1)
- partielle Replikation (1)
- patent (1)
- patient empowerment (1)
- peer-to-peer network (1)
- pegged sidechains (1)
- performance models of virtual machines (1)
- personality prediction (1)
- polyglot execution environments (1)
- polyglot programming (1)
- polyglottes Programmieren (1)
- portrait (1)
- pose estimation (1)
- poset (1)
- prediction (1)
- primary healthcare (1)
- privacy (1)
- probabilistic machine learning (1)
- probabilistisches maschinelles Lernen (1)
- process mining (1)
- process models (1)
- programming experience (1)
- programming tools (1)
- programs (1)
- prototyping (1)
- psychotherapy (1)
- public dataset (1)
- qualitative model (1)
- qualitatives Modell (1)
- quantum computing (1)
- query optimization (1)
- quorum slices (1)
- railways (1)
- real-time (1)
- real-time rendering (1)
- rechnerunterstütztes Konstruieren (1)
- reconfigurable systems (1)
- regression testing (1)
- reinforcement learning (1)
- remote sensing (1)
- representation learning (1)
- reverse engineering (1)
- rootstock (1)
- runtime models (1)
- runtime monitoring (1)
- räumliche Geodaten (1)
- savanna (1)
- scalability of blockchain (1)
- scarce tokens (1)
- schwach überwachtes maschinelles Lernen (1)
- screening tools (1)
- scripting languages (1)
- scrollytelling (1)
- security analytics (1)
- selbst-souveräne Identitäten (1)
- selbstbestimmte Identitäten (1)
- selbstüberwachtes Lernen (1)
- self-adaptive software (1)
- self-driving (1)
- semantic classification (1)
- semantische Klassifizierung (1)
- sequence properties (1)
- serialization (1)
- serverseitiges 3D-Rendering (1)
- serverside 3D rendering (1)
- service-oriented architectures (1)
- serviceorientierte Architekturen (1)
- sidechain (1)
- simulation (1)
- situational awareness (1)
- small talk (1)
- smalltalk (1)
- social media (1)
- social media analysis (1)
- software evolution (1)
- software/hardware co-design (1)
- spaltenorientierte Datenbanken (1)
- spatio-temporal data management (1)
- specification of timed graph transformations (1)
- squeak (1)
- stable matching (1)
- standardization (1)
- stark verhaltenskorrekt sperrend (1)
- static source-code analysis (1)
- statische Quellcodeanalyse (1)
- stochastic process (1)
- strongly behaviourally correct locking (1)
- strongly stable matching (1)
- style transfer (1)
- super stable matching (1)
- symbolic analysis (1)
- symbolic graphs (1)
- symbolische Analyse (1)
- symbolische Graphen (1)
- synchronization (1)
- tabellarische Dateien (1)
- tabular data (1)
- technology (1)
- telemedicine (1)
- temporal graph queries (1)
- temporal logic (1)
- temporale Graphanfragen (1)
- test case prioritization (1)
- test results (1)
- text classification (1)
- text mining (1)
- threat detection (1)
- tiefe Gauß-Prozesse (1)
- timed automata (1)
- tool building (1)
- tools (1)
- trajectories (1)
- trajectory data (1)
- transaction (1)
- triple graph grammars (1)
- trust (1)
- typisierte attributierte Graphen (1)
- unique column combinations (1)
- unsupervised (1)
- unsupervised learning (1)
- usability (1)
- user experience (1)
- variational inference (1)
- variationelle Inferenz (1)
- verhaltenskorrektes Lernen (1)
- verifiable credentials (1)
- verschachtelte Anwendungsbedingungen (1)
- verschachtelte Graphbedingungen (1)
- verteilte Berechnung (1)
- verteilte Leistungsüberwachung (1)
- verzwickte Probleme (1)
- virtual (1)
- virtual machines (1)
- virtual reality (1)
- virtuell (1)
- virtuelle Maschinen (1)
- virtuelle Realität (1)
- visual language (1)
- visual languages (1)
- visualization concept exploration (1)
- visuelle Sprache (1)
- visuelle Sprachen (1)
- weak supervision (1)
- weakly (1)
- wearables (1)
- web-based development (1)
- web-based development environment (1)
- web-basierte Entwicklungsumgebung (1)
- webbasierte Entwicklung (1)
- zero-day (1)
- überprüfbare Nachweise (1)
Institute
- Hasso-Plattner-Institut für Digital Engineering GmbH (137)
Data preparation is a cornerstone of data science workflows, consuming a significant share, approximately 80%, of a data scientist's time. This effort stems primarily from the challenge of devising solutions tailored to downstream tasks. The complexity is compounded by the scarcity of metadata, the often ad hoc nature of preparation tasks, and the diverse range of sophisticated tools data scientists must master, each with its own intricacies and demands for proficiency.
Previous research in data management has traditionally concentrated on preparing the content within the columns and rows of a relational table, addressing tasks such as string disambiguation, date standardization, or numeric value normalization, commonly referred to as data cleaning. This focus assumes a perfectly structured input table: the mentioned data cleaning tasks can be applied effectively only after the table has been loaded into the respective data cleaning environment, typically in the later stages of the data processing pipeline.
While current data cleaning tools are well suited for relational tables, extensive data repositories frequently store data in plain text files, such as CSV files, because of the format's loosely enforced standard. These files often contain tables with a flexible layout of rows and columns that lacks a relational structure, with data distributed across cells in arbitrary positions, typically guided by user-specified formatting conventions.
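For illustration, consider a small hypothetical file of this kind (the content is invented): a metadata preamble, a blank spacer line, and an aggregate footer surround the actual data table, and the preamble even uses a different delimiter than the table body. A parser that expects a header in the first row and a uniform delimiter fails to load it.

```csv
Report generated,2021-03-01
Source,Sensor Station 7

date;temperature;humidity
2021-02-27;4.1;81
2021-02-28;3.8;84
TOTAL;;2 rows
```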
Effectively extracting and leveraging these tables in subsequent processing stages necessitates accurate parsing. This thesis emphasizes what we define as the “structure” of a data file: the fundamental characters within a file that are essential for parsing and comprehending its content. Concentrating on the initial stages of the data preprocessing pipeline, this thesis addresses two crucial aspects: comprehending the structural layout of a table within a raw data file, and automatically identifying and rectifying any structural issues that might hinder its parsing. Although these issues may not directly affect the table's content, they pose significant challenges to parsing the table within the file.
Our initial contribution is an extensive survey of commercially available data preparation tools. This survey examines their distinct features, the features they lack, and the preliminary data processing that remains necessary despite these tools. The primary goal is to elucidate the current state of the art in data preparation systems while identifying areas for enhancement. Furthermore, the survey explores the challenges encountered in data preprocessing, emphasizing opportunities for future research and improvement.
Next, we propose a novel data preparation pipeline designed for detecting and correcting structural errors. The aim of this pipeline is to assist users at the initial preprocessing stage by ensuring the correct loading of their data into their preferred systems. Our approach begins by introducing SURAGH, an unsupervised system that utilizes a pattern-based method to identify dominant patterns within a file, independent of external information, such as data types, row structures, or schemata. By identifying deviations from the dominant pattern, it detects ill-formed rows. Subsequently, our structure correction system, TASHEEH, gathers the identified ill-formed rows along with dominant patterns and employs a novel pattern transformation algebra to automatically rectify errors. Our pipeline serves as an end-to-end solution, transforming a structurally broken CSV file into a well-formatted one, usually suitable for seamless loading.
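The following is a minimal Python sketch of the pattern-based idea behind this detection step, not the actual SURAGH implementation: every row is abstracted into a syntactic pattern (digit runs, letter runs, other characters kept verbatim), the most frequent pattern is taken as dominant, and rows deviating from it are flagged as ill-formed.

```python
import re
from collections import Counter

def row_pattern(line: str) -> str:
    """Abstract a raw line into a syntactic pattern:
    digit runs -> 'D', letter runs -> 'A', everything else kept verbatim."""
    pattern = re.sub(r"[0-9]+", "D", line.rstrip("\n"))
    pattern = re.sub(r"[A-Za-z]+", "A", pattern)
    return pattern

def find_ill_formed_rows(lines: list[str]) -> tuple[str, list[int]]:
    """Return the dominant pattern and the indices of rows deviating from it."""
    patterns = [row_pattern(line) for line in lines]
    dominant, _ = Counter(patterns).most_common(1)[0]
    ill_formed = [i for i, p in enumerate(patterns) if p != dominant]
    return dominant, ill_formed

if __name__ == "__main__":
    raw = [
        "Report generated,2021-03-01",   # preamble row
        "2021-02-27;4.1;81",
        "2021-02-28;3.8;84",
        "2021-03-01;4.0;79",
        "TOTAL;;2 rows",                 # footer row
    ]
    dominant, bad = find_ill_formed_rows(raw)
    print(dominant)  # 'D-D-D;D.D;D'
    print(bad)       # [0, 4]
```

A correction step in the spirit of TASHEEH would then transform the flagged rows toward the dominant pattern, for example by dropping preamble and footer rows or rewriting malformed fields.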
Finally, we introduce MORPHER, a user-friendly GUI integrating the functionalities of both SURAGH and TASHEEH. This interface lets users access the pipeline's features through visual elements. Our extensive experiments demonstrate the effectiveness of our data preparation systems, which require no user involvement. Both SURAGH and TASHEEH significantly outperform existing state-of-the-art methods in both precision and recall.
HPI Future SOC Lab
(2024)
The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components that might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly, but not exclusively, from the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and in-memory technologies.
This technical report presents the results of research projects executed in 2020. Selected projects presented their results on April 21 and November 10, 2020, at the Future SOC Lab Day events.
To foster an understanding of computational processes early on in school, the new computer science subject Digitale Welt (Digital World) was designed for grade 5, combining computer science with application-oriented and socially relevant connections to ecology and economics, a combination unique in Germany. This technical report provides guidance for introducing the new subject.
Deep learning has seen widespread application in many domains, mainly for its ability to learn data representations from raw input data. Nevertheless, its success has so far been coupled with the availability of large annotated (labelled) datasets. This is a requirement that is difficult to fulfil in several domains, such as medical imaging. Annotation costs form a barrier to extending deep learning to clinically relevant use cases. The labels associated with medical images are scarce, since generating expert annotations of multimodal patient data at scale is non-trivial, expensive, and time-consuming. This substantiates the need for algorithms that learn from the increasing amounts of unlabeled data. Self-supervised representation learning algorithms offer a pertinent solution, as they allow real-world (downstream) deep learning tasks to be solved with fewer annotations. Self-supervised approaches leverage unlabeled samples to acquire generic features about different concepts, subsequently enabling annotation-efficient solving of downstream tasks.
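As a generic illustration of this paradigm (deliberately simplified, and not one of the methods proposed in the thesis), the following PyTorch sketch pretrains a toy encoder on unlabeled images with a rotation-prediction pretext task; the pretrained encoder can afterwards be fine-tuned on a small labeled set for the downstream task.

```python
import torch
import torch.nn as nn

# Toy encoder; a real setting would use a CNN or Transformer backbone.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU())
rotation_head = nn.Linear(128, 4)  # predicts rotation in {0, 90, 180, 270} degrees

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(rotation_head.parameters()), lr=1e-3
)
criterion = nn.CrossEntropyLoss()

def rotate_batch(images: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Rotate each image by a random multiple of 90 degrees; the multiple is the label."""
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack(
        [torch.rot90(img, k=int(k), dims=(-2, -1)) for img, k in zip(images, labels)]
    )
    return rotated, labels

# Pretraining loop on unlabeled data (random tensors stand in for real scans here).
for step in range(100):
    unlabeled = torch.rand(16, 32, 32)        # e.g. a batch of 2D image slices
    inputs, labels = rotate_batch(unlabeled)
    logits = rotation_head(encoder(inputs))
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# Afterwards, `encoder` is fine-tuned on the few labeled samples of the downstream task.
```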
Nevertheless, medical images present multiple unique and inherent challenges for existing self-supervised learning approaches, which we seek to address in this thesis: (i) medical images are multimodal, and their multiple modalities, e.g. MRI and CT, are heterogeneous in nature and imbalanced in quantities; (ii) medical scans are multi-dimensional, often in 3D instead of 2D; (iii) disease patterns in medical scans are numerous and their incidence exhibits a long-tail distribution, so it is often essential to fuse knowledge from different data modalities, e.g. genomics or clinical data, to capture disease traits more comprehensively; (iv) medical scans, e.g. dental X-rays, usually exhibit more uniform color density distributions than natural images. Our proposed self-supervised methods meet these challenges, besides significantly reducing the amounts of required annotations.
We evaluate our self-supervised methods on a wide array of medical imaging applications and tasks. Our experimental results demonstrate gains in both annotation-efficiency and performance; our proposed methods outperform many approaches from the related literature. Additionally, in the case of fusion with genetic modalities, our methods also allow for cross-modal interpretability. In this thesis, we not only show that self-supervised learning is capable of mitigating manual annotation costs, but also demonstrate how to better utilize it in the medical imaging domain. Progress in self-supervised learning has the potential to extend the application of deep learning algorithms to clinical scenarios.
The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components that might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly, but not exclusively, from the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and in-memory technologies.
This technical report presents the results of research projects executed in 2019. Selected projects presented their results on April 9 and November 12, 2019, at the Future SOC Lab Day events.
The wide distribution of location-acquisition technologies means that large volumes of spatio-temporal data are continuously being accumulated. Positioning systems such as GPS enable the tracking of various moving objects' trajectories, which are usually represented by chronologically ordered sequences of observed locations. The analysis of movement patterns based on detailed positional information creates opportunities for applications that can improve business decisions and processes in a broad spectrum of industries (e.g., transportation, traffic control, or medicine). Due to the large data volumes generated in these applications, the cost-efficient storage of spatio-temporal data is desirable, especially when in-memory database systems are used to meet interactive performance requirements.
To efficiently utilize the available DRAM capacities, modern database systems support various tuning options to reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional index structures). With horizontal data partitioning, different tuning options can be applied independently at a fine-grained level. However, selecting cost- and performance-balancing configurations is challenging due to the vast number of possible setups, which consist of mutually dependent individual decisions.
In this thesis, we introduce multiple approaches to improve spatio-temporal data management by automatically optimizing diverse tuning options for the application-specific access patterns and data characteristics. Our contributions are as follows:
(1) We introduce a novel approach to determine fine-grained table configurations for spatio-temporal workloads. Our linear programming (LP) approach jointly optimizes (i) data compression, (ii) ordering, (iii) indexing, and (iv) tiering; a simplified sketch of such a configuration-selection problem follows after this list. We propose different models that address cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload, memory budget, and data characteristics. To yield maintainable and robust configurations, we further extend our LP-based approach to incorporate reconfiguration costs as well as optimizations for multiple potential workload scenarios.
(2) To optimize the storage layout of timestamps in columnar databases, we present a heuristic approach for the workload-driven combined selection of a data layout and compression scheme. By considering attribute decomposition strategies, we are able to apply application-specific optimizations that reduce the memory footprint and improve performance.
(3) We introduce an approach that leverages past trajectory data to improve the dispatch processes of transportation network companies. Based on location probabilities, we developed risk-averse dispatch strategies that reduce critical delays.
(4) Finally, we use the use case of a transportation network company to evaluate our database optimizations on a real-world dataset. We demonstrate that workload-driven fine-grained optimizations allow us to reduce the memory footprint (by up to 71% at equal performance) or increase performance (by up to 90% at equal memory size) compared to established rule-based heuristics.
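To make contribution (1) concrete, here is a deliberately simplified sketch of such a configuration-selection problem using the open-source PuLP solver, with made-up per-chunk cost and footprint numbers; the thesis's actual LP models capture far more dependencies. Each table chunk must receive exactly one tuning configuration, the summed memory footprint must stay within a budget, and the summed workload cost is minimized.

```python
import pulp

# Hypothetical per-chunk candidate configurations: (workload cost, memory in MB).
configs = {
    "uncompressed":       (1.0, 100),
    "dictionary+index":   (0.6, 120),
    "run-length-encoded": (1.4, 40),
}
chunks = ["chunk_0", "chunk_1", "chunk_2", "chunk_3"]
memory_budget_mb = 300

problem = pulp.LpProblem("chunk_configuration_selection", pulp.LpMinimize)
# Binary decision variable: chunk c uses configuration k.
x = pulp.LpVariable.dicts("x", (chunks, configs), cat="Binary")

# Objective: total workload cost over all chunks.
problem += pulp.lpSum(x[c][k] * configs[k][0] for c in chunks for k in configs)
# Each chunk gets exactly one configuration.
for c in chunks:
    problem += pulp.lpSum(x[c][k] for k in configs) == 1
# The total memory footprint must respect the budget.
problem += (
    pulp.lpSum(x[c][k] * configs[k][1] for c in chunks for k in configs)
    <= memory_budget_mb
)

problem.solve(pulp.PULP_CBC_CMD(msg=False))
for c in chunks:
    chosen = next(k for k in configs if pulp.value(x[c][k]) > 0.5)
    print(c, "->", chosen)
```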
Individually, our contributions provide novel approaches to the current challenges in spatio-temporal data mining and database research. Combining them allows in-memory databases to store and process spatio-temporal data more cost-efficiently.
Recently, there has been an upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer (NST). However, the state of performance evaluation in this field is poor, especially compared to the norms in the computer vision and machine learning communities. Unfortunately, the task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual, and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three-level, benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We perform evaluation for a wide variety of image stylisation methods (both portrait-specific and general purpose, and also both traditional NPR approaches and NST) using the new benchmark dataset.
Any system at play in a data-driven project has a fundamental requirement: the ability to load data. The de-facto standard format to distribute and consume raw data is CSV. Yet, the plain-text and flexible nature of this format often makes such files difficult to parse and their content hard to load correctly, requiring cumbersome data preparation steps. We propose a benchmark to assess the robustness of systems in loading data from non-standard CSV formats and with structural inconsistencies. First, we formalize a model to describe the issues that affect real-world files and use it to derive a systematic “pollution” process to generate dialects for any given grammar. Our benchmark leverages the pollution framework for the CSV format. To guide pollution, we have surveyed thousands of real-world, publicly available CSV files, recording the problems we encountered. We demonstrate the applicability of our benchmark by testing and scoring 16 different systems: popular CSV parsing frameworks, relational database tools, spreadsheet systems, and a data visualization tool.
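A much-simplified sketch of what such a pollution step can look like follows (the benchmark's actual framework is grammar-driven; these polluters and the sample file are invented): starting from a well-formed CSV string, each polluter injects one structural inconsistency of the kind observed in real-world files.

```python
import random

def add_preamble(csv_text: str) -> str:
    """Prepend metadata rows that do not match the table structure."""
    return "Report generated,2021-03-01\n\n" + csv_text

def change_delimiter(csv_text: str, new_delimiter: str = ";") -> str:
    """Switch the field delimiter, emulating a non-standard dialect.
    Note: this also hits commas inside quoted fields, itself a realistic defect."""
    return csv_text.replace(",", new_delimiter)

def break_quoting(csv_text: str) -> str:
    """Drop one closing quote, producing an unbalanced quoted field."""
    return csv_text.replace('",', ",", 1)

POLLUTERS = [add_preamble, change_delimiter, break_quoting]

def pollute(csv_text: str, seed: int = 0) -> str:
    """Apply a random subset of polluters to a well-formed CSV string."""
    rng = random.Random(seed)
    for polluter in rng.sample(POLLUTERS, k=rng.randint(1, len(POLLUTERS))):
        csv_text = polluter(csv_text)
    return csv_text

clean = 'id,name,city\n1,"Doe, Jane",Berlin\n2,"Roe, Richard",Potsdam\n'
print(pollute(clean, seed=42))
```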
Background and aims: Accurate and user-friendly assessment tools quantifying alcohol consumption are a prerequisite to effective prevention and treatment programmes, including Screening and Brief Intervention. Digital tools offer new potential in this field. We developed the ‘Animated Alcohol Assessment Tool’ (AAA-Tool), a mobile app providing an interactive version of the World Health Organization's Alcohol Use Disorders Identification Test (AUDIT) that facilitates the description of individual alcohol consumption via culturally informed animation features. This pilot study evaluated the Russia-specific version of the Animated Alcohol Assessment Tool with regard to (1) its usability and acceptability in a primary healthcare setting, (2) the plausibility of its alcohol consumption assessment results and (3) the adequacy of its Russia-specific vessel and beverage selection. Methods: Convenience samples of 55 patients (47% female) and 15 healthcare practitioners (80% female) in 2 Russian primary healthcare facilities self-administered the Animated Alcohol Assessment Tool and rated their experience on the Mobile Application Rating Scale – User Version. Usage data was automatically collected during app usage, and additional feedback on regional content was elicited in semi-structured interviews. Results: On average, patients completed the Animated Alcohol Assessment Tool in 6:38 min (SD = 2.49, range = 3.00–17.16). User satisfaction was good, with all subscale Mobile Application Rating Scale – User Version scores averaging >3 out of 5 points. A majority of patients (53%) and practitioners (93%) would recommend the tool to ‘many people’ or ‘everyone’. Assessed alcohol consumption was plausible, with a low number (14%) of logically impossible entries. Most patients reported the Animated Alcohol Assessment Tool to reflect all vessels (78%) and all beverages (71%) they typically used. Conclusion: High acceptability ratings by patients and healthcare practitioners, acceptable completion time, plausible alcohol usage assessment results and perceived adequacy of region-specific content underline the Animated Alcohol Assessment Tool's potential to provide a novel approach to alcohol assessment in primary healthcare. After its validation, the Animated Alcohol Assessment Tool might contribute to reducing alcohol-related harm by facilitating Screening and Brief Intervention implementation in Russia and beyond.
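As a hedged sketch of the kind of computation underlying such an animated assessment (the AAA-Tool's internal logic is not spelled out in the abstract, and the vessel and beverage values below are invented): a vessel-and-beverage selection can be converted into grams of pure ethanol via volume, alcohol by volume, and ethanol's density of roughly 0.789 g/ml, and then into standard drinks, assuming the WHO convention of about 10 g of pure ethanol per standard drink.

```python
ETHANOL_DENSITY_G_PER_ML = 0.789   # physical constant for ethanol
GRAMS_PER_STANDARD_DRINK = 10.0    # WHO convention commonly used with the AUDIT

# Hypothetical vessel volumes (ml) and beverage strengths (alcohol by volume).
VESSELS = {"shot_glass": 50, "wine_glass": 150, "beer_mug": 500}
BEVERAGES = {"vodka": 0.40, "wine": 0.12, "beer": 0.05}

def pure_alcohol_grams(vessel: str, beverage: str, count: int) -> float:
    """Grams of pure ethanol in `count` servings of a vessel/beverage pairing."""
    volume_ml = VESSELS[vessel] * count
    return volume_ml * BEVERAGES[beverage] * ETHANOL_DENSITY_G_PER_ML

def standard_drinks(grams: float) -> float:
    """Convert grams of pure ethanol into standard drinks."""
    return grams / GRAMS_PER_STANDARD_DRINK

# Example: two 50 ml shots of vodka plus one 500 ml beer on a typical drinking day.
grams = pure_alcohol_grams("shot_glass", "vodka", 2) + pure_alcohol_grams(
    "beer_mug", "beer", 1
)
print(f"{grams:.1f} g of pure alcohol ~ {standard_drinks(grams):.1f} standard drinks")
```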
The detection of communities in graph datasets provides insight into a graph's underlying structure and is an important tool for various domains such as social sciences, marketing, traffic forecasting, and drug discovery. While most existing algorithms provide fast approaches for community detection, their results usually contain strictly separated communities. However, most datasets would semantically allow for, or even require, overlapping communities, which can only be determined at much higher computational cost. We build on an efficient algorithm, FOX, that detects such overlapping communities. FOX measures the closeness of a node to a community by approximating the count of triangles that the node forms with that community. We propose LAZYFOX, a multi-threaded adaptation of the FOX algorithm, which provides even faster detection without an impact on community quality. This allows for the analysis of significantly larger and more complex datasets. LAZYFOX enables overlapping community detection on complex graph datasets with millions of nodes and billions of edges in days instead of weeks. As part of this work, LAZYFOX's implementation was published and is available as a tool under an MIT licence at https://github.com/TimGarrels/LazyFox.
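The closeness measure can be illustrated with a small sketch (an exact, non-scalable computation; FOX and LAZYFOX approximate such counts efficiently): one way to count the triangles a node forms with a community is to count the edges between the node's neighbors that lie inside the community, since each such edge closes one triangle with the node.

```python
import itertools
import networkx as nx

def triangles_with_community(graph: nx.Graph, node, community: set) -> int:
    """Count triangles that `node` forms with members of `community`:
    each edge between two of the node's neighbors inside the community
    closes one triangle with the node."""
    neighbors_in_community = set(graph.neighbors(node)) & (community - {node})
    return sum(
        1
        for u, v in itertools.combinations(neighbors_in_community, 2)
        if graph.has_edge(u, v)
    )

# Toy usage on a built-in example graph with a hypothetical community.
graph = nx.karate_club_graph()
community = {0, 1, 2, 3, 7, 13}
for node in [4, 8, 33]:
    print(node, "->", triangles_with_community(graph, node, community))
```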