004 Datenverarbeitung; Informatik
Document Type
- Monograph/Edited Volume (88)
- Article (41)
- Doctoral Thesis (40)
- Conference Proceeding (10)
- Other (1)
- Preprint (1)
Language
- English (181)
Keywords
- Hasso-Plattner-Institut (7)
- Cloud Computing (6)
- Forschungskolleg (6)
- Hasso Plattner Institute (6)
- Klausurtagung (6)
- Modellierung (6)
- Service-oriented Systems Engineering (6)
- cloud computing (6)
- Datenintegration (5)
- Forschungsprojekte (5)
- Future SOC Lab (5)
- In-Memory Technologie (5)
- Multicore Architekturen (5)
- Geschäftsprozessmanagement (4)
- Research School (4)
- Verifikation (4)
- data integration (4)
- graph transformation (4)
- middleware (4)
- virtual machines (4)
- Betriebssysteme (3)
- Graphtransformationen (3)
- Model Synchronisation (3)
- Model Transformation (3)
- Model-Driven Engineering (3)
- Modeling (3)
- Ph.D. Retreat (3)
- Ph.D. retreat (3)
- Privacy (3)
- Process Mining (3)
- Prozessmodellierung (3)
- Tripel-Graph-Grammatik (3)
- Virtualisierung (3)
- Virtuelle Maschinen (3)
- business process management (3)
- multicore architectures (3)
- operating systems (3)
- process mining (3)
- research projects (3)
- security (3)
- service-oriented systems engineering (3)
- verification (3)
- AUTOSAR (2)
- Abstraktion (2)
- Aspektorientierte Softwareentwicklung (2)
- Assoziationsregeln (2)
- Asynchrone Schaltung (2)
- BPMN (2)
- Bayesian networks (2)
- Bitcoin (2)
- CSC (2)
- CSCW (2)
- Cloud-Sicherheit (2)
- Cloud-Speicher (2)
- Data Integration (2)
- Data profiling (2)
- Design Thinking (2)
- Evolution (2)
- Graphtransformationssysteme (2)
- HCI (2)
- In-Memory technology (2)
- Kollaborationen (2)
- Laufzeitmodelle (2)
- Link-Entdeckung (2)
- Megamodell (2)
- Middleware (2)
- Model Synchronization (2)
- Modell (2)
- Process Modeling (2)
- RDF (2)
- Ressourcenoptimierung (2)
- Runtime analysis (2)
- SQL (2)
- STG decomposition (2)
- STG-Dekomposition (2)
- Service-Orientierte Architekturen (2)
- Sicherheit (2)
- SysML (2)
- Systemsoftware (2)
- Visualisierung (2)
- abstraction (2)
- adaptive Systeme (2)
- adaptive systems (2)
- big data services (2)
- cloud security (2)
- collaboration (2)
- data (2)
- data profiling (2)
- debugging (2)
- design thinking (2)
- digital identity (2)
- distributed systems (2)
- duplicate detection (2)
- graph constraints (2)
- in-memory technology (2)
- incremental graph pattern matching (2)
- law (2)
- missing data (2)
- model (2)
- model transformation (2)
- model-driven engineering (2)
- modeling (2)
- modellgetriebene Entwicklung (2)
- nested graph conditions (2)
- non-photorealistic rendering (2)
- privacy (2)
- programming (2)
- research school (2)
- schema discovery (2)
- self-sovereign identity (2)
- service-oriented systems (2)
- simulation (2)
- software engineering (2)
- stochastic Petri nets (2)
- stochastische Petri Netze (2)
- systems of systems (2)
- systems software (2)
- testing (2)
- virtualization (2)
- virtuelle Maschinen (2)
- visualization (2)
- "Big Data"-Dienste (1)
- 3D Computer Grafik (1)
- 3D Computer Graphics (1)
- 3D Drucken (1)
- 3D Semiotik (1)
- 3D Visualisierung (1)
- 3D point clouds (1)
- 3D printing (1)
- 3D semiotics (1)
- 3D visualization (1)
- APX-hardness (1)
- Abhängigkeiten (1)
- Abstraktion von Geschäftsprozessmodellen (1)
- Actor (1)
- Actor model (1)
- Analyse (1)
- Anfragepaare (1)
- Anisotroper Kuwahara Filter (1)
- Anomalien (1)
- Application (1)
- Apriori (1)
- Architektur (1)
- Aspect-oriented Programming (1)
- Association Rule Mining (1)
- Asynchronous circuit (1)
- Attribut-Merge-Prozess (1)
- Attribute Merge Process (1)
- Attribute aggregation (1)
- Ausführung von Modellen (1)
- Ausführungsgeschichte (1)
- Authentication (1)
- BPM (1)
- Batchprozesse (1)
- Bayes'sche Netze (1)
- Bayessche Netze (1)
- Bedingte Inklusionsabhängigkeiten (1)
- Behavior (1)
- Behaviour Analysis (1)
- Berührungseingaben (1)
- Beschränkungen und Abhängigkeiten (1)
- Bidirectional order dependencies (1)
- Bildverarbeitung (1)
- Bisimulation (1)
- Blockchain (1)
- Blockchains (1)
- Business Process Models (1)
- CCS Concepts (1)
- CEP (1)
- Carrera Digital D132 (1)
- Case management (1)
- Change Management (1)
- Cloud (1)
- Cloud Datenzentren (1)
- Cloud computing (1)
- Coccinelle (1)
- Compliance (1)
- Compliance checking (1)
- Composition (1)
- Computational photography (1)
- Computer crime (1)
- Conditional Inclusion Dependency (1)
- Conformance Überprüfung (1)
- Consistency (1)
- Constraints (1)
- Context-oriented Programming (1)
- Contracts (1)
- Controller-Resynthese (1)
- Critical pairs (1)
- Cryptography (1)
- Currencies (1)
- Cyber-Physical Systems (1)
- Cyber-Physical-Systeme (1)
- Cyber-physical-systems (1)
- Data Dependency (1)
- Data Modeling (1)
- Data Profiling (1)
- Data Quality (1)
- Data Warehouse (1)
- Data dependencies (1)
- Data integration (1)
- Data mining (1)
- Data-centric (1)
- Database Cost Model (1)
- Databases (1)
- Daten (1)
- Datenabhängigkeiten (1)
- Datenabhängigkeiten-Entdeckung (1)
- Datenanalyse (1)
- Datenbank-Kostenmodell (1)
- Datenbanken (1)
- Datenextraktion (1)
- Datenflusskorrektheit (1)
- Datenkorrektheit (1)
- Datenmodellierung (1)
- Datenobjekte (1)
- Datenqualität (1)
- Datenreinigung (1)
- Datenvertraulichkeit (1)
- Datenzustände (1)
- Deadline-Verbreitung (1)
- Deep learning (1)
- Delta preservation (1)
- Dependency discovery (1)
- Deurema Modellierungssprache (1)
- Differential Privacy (1)
- Differenz von Gauss Filtern (1)
- Digitale Whiteboards (1)
- Disambiguierung (1)
- Discrimination Networks (1)
- Distributed computing (1)
- Distributed programming (1)
- Duplicate Detection (1)
- Duplikaterkennung (1)
- Dynamic Type System (1)
- Dynamische Typ Systeme (1)
- E-Learning (1)
- EHR (1)
- EPA (1)
- Echtzeit (1)
- Echtzeitsysteme (1)
- Ecosystems (1)
- Eingabegenauigkeit (1)
- Elektronische Patientenakte (1)
- Energieeffizienz (1)
- Entity resolution (1)
- Entwicklungswerkzeuge (1)
- Entwurfsmuster für SOA-Sicherheit (1)
- Ereignisabstraktion (1)
- Ereignisse (1)
- Erfahrungsbericht (1)
- Erfüllbarkeitsanalyse (1)
- Erkennen von Meta-Daten (1)
- Estimation-of-distribution algorithm (1)
- Evolution in MDE (1)
- Evolutionary algorithms (1)
- Extract-Transform-Load (ETL) (1)
- FMC-QE (1)
- FRP (1)
- Fallstudie (1)
- Federated learning (1)
- Feedback Loop Modellierung (1)
- Feedback Loops (1)
- Fehlende Daten (1)
- Fehlerbeseitigung (1)
- Fehlersuche (1)
- Fitness-distance correlation (1)
- Flussgesteuerter Bilateraler Filter (1)
- Focus+Context Visualization (1)
- Fokus-&-Kontext Visualisierung (1)
- Formale Verifikation (1)
- Functional Lenses (1)
- Functional dependencies (1)
- GPU (1)
- Gebäudemodelle (1)
- Geländemodelle (1)
- General Earth and Planetary Sciences (1)
- Generalisierung (1)
- Geodaten (1)
- Geography, Planning and Development (1)
- Geschäftsprozesse (1)
- Geschäftsprozessmodelle (1)
- Gesetze (1)
- Graph databases (1)
- Graph homomorphisms (1)
- Graph repair (1)
- Graph transformation (1)
- Graph-Constraints (1)
- Graph-basierte Suche (1)
- Graphbedingungen (1)
- Graphdatenbanken (1)
- Graphtransformation (1)
- HENSHIN (1)
- Hasso-Plattner-Institute (1)
- Hauptspeicher Technologie (1)
- Hauptspeicherdatenbank (1)
- Herodotos (1)
- HiGHmed (1)
- History of pattern occurrences (1)
- Homomorphe Verschlüsselung (1)
- IOPS (1)
- IT-Security (1)
- IT-Sicherheit (1)
- Identity management systems (1)
- Image (1)
- Image-based rendering (1)
- In-Memory Database (1)
- In-Memory Datenbank (1)
- In-Memory-Datenbank (1)
- Inclusion dependencies (1)
- Index (1)
- Index Structures (1)
- Indexstrukturen (1)
- Individuen (1)
- Industries (1)
- Industry 4.0 (1)
- Infinite State (1)
- Information Extraction (1)
- Information Systems (1)
- Informationsextraktion (1)
- Informationssysteme (1)
- Informationsvorhaltung (1)
- Initial conflicts (1)
- Inklusionsabhängigkeit (1)
- Inkrementelle Graphmustersuche (1)
- Innovation (1)
- Innovationsmanagement (1)
- Innovationsmethode (1)
- Interactive Rendering (1)
- Interaktionsmodel (1)
- Interaktives Rendering (1)
- Internet applications (1)
- Internet of Things (1)
- Internetanwendungen (1)
- Invariant-Checking (1)
- Invarianten (1)
- Invariants (1)
- JCop (1)
- Java (1)
- Kartografisches Design (1)
- Komplexität (1)
- Komplexitätsbewältigung (1)
- Komposition (1)
- Kontext (1)
- LEGO Mindstorms EV3 (1)
- LIDAR (1)
- LOD (1)
- LSTM (1)
- Landmarken (1)
- Laser Cutten (1)
- Laufzeitanalyse (1)
- Laufzeitverhalten (1)
- Leadership (1)
- Leistungsfähigkeit (1)
- Leistungsvorhersage (1)
- Licenses (1)
- Link Discovery (1)
- Linked Data (1)
- Linked Open Data (1)
- Live-Programmierung (1)
- Lively Kernel (1)
- Logiksynthese (1)
- MDE Ansatz (1)
- MDE settings (1)
- MOOCs (1)
- Management (1)
- Matroids (1)
- Megamodel (1)
- Megamodels (1)
- Mehrkernsysteme (1)
- Metadata Discovery (1)
- Metadatenentdeckung (1)
- Metadatenqualität (1)
- Mobile Application Development (1)
- Mobilgeräte (1)
- Model Consistency (1)
- Model Execution (1)
- Model Management (1)
- Model repair (1)
- Model verification (1)
- Model-driven (1)
- Modeling Languages (1)
- Modell Management (1)
- Modell-driven Security (1)
- Modell-getriebene Sicherheit (1)
- Modell-getriebene Softwareentwicklung (1)
- Modellerzeugung (1)
- Modellgetrieben (1)
- Modellgetriebene Entwicklung (1)
- Modellgetriebene Softwareentwicklung (1)
- Modellierungssprachen (1)
- Modellkonsistenz (1)
- Modelltransformation (1)
- Modelltransformationen (1)
- Models at Runtime (1)
- Morphic (1)
- Multi-Instanzen (1)
- Multi-objective optimization (1)
- Multicore architectures (1)
- Muster (1)
- Musterabgleich (1)
- Mustererkennung (1)
- Mutation operators (1)
- Nested Graph Conditions (1)
- Nested graph conditions (1)
- Newspeak (1)
- Nicht-photorealistisches Rendering (1)
- Nichtfotorealistische Bildsynthese (1)
- Nutzungsinteresse (1)
- O (1)
- Object Constraint Programming (1)
- Object-Oriented Programming (1)
- Objekt-Constraint Programmierung (1)
- Objekt-Orientiertes Programmieren (1)
- Objekt-orientiertes Programmieren mit Constraints (1)
- Objektlebenszyklus-Synchronisation (1)
- Online Course (1)
- Online-Learning (1)
- Online-Lernen (1)
- Onlinekurs (1)
- Order dependencies (1)
- Organisationsveränderung (1)
- PRISM Modell-Checker (1)
- PRISM model checker (1)
- PTCTL (1)
- Parallelization (1)
- Pattern Matching (1)
- Patterns (1)
- Performance (1)
- Performance Prediction (1)
- Petri net Mapping (1)
- Petri net mapping (1)
- Petrinetz (1)
- Point-based rendering (1)
- Probabilistische Modelle (1)
- Process (1)
- Process Enactment (1)
- Process Modelling (1)
- Programmierung (1)
- Programming Languages (1)
- Propagation von Aktivitätsinstanzzuständen (1)
- Protocols (1)
- Prozess (1)
- Prozess- und Datenintegration (1)
- Prozessarchitektur (1)
- Prozessausführung (1)
- Prozessautomatisierung (1)
- Prozesse (1)
- Prozesserhebung (1)
- Prozessinstanz (1)
- Prozessmodellsuche (1)
- Prozessoren (1)
- Prozessverfeinerung (1)
- Präsentation (1)
- Quantitative Analysen (1)
- Quantitative Modeling (1)
- Quantitative Modellierung (1)
- Query (1)
- Query execution (1)
- Query optimization (1)
- Queuing Theory (1)
- RT_PREEMT patch (1)
- RT_PREEMT-Patch (1)
- Relational data (1)
- Research Projects (1)
- Ressourcenmanagement (1)
- Run time analysis (1)
- Runtime Binding (1)
- SOA Security Pattern (1)
- SPARQL (1)
- Sammlungsdatentypen (1)
- Satisfiability (1)
- Scalability (1)
- Schema-Entdeckung (1)
- Schemaentdeckung (1)
- Schlüsselentdeckung (1)
- Search Algorithms (1)
- Security (1)
- Security Modelling (1)
- Self-Adaptive Software (1)
- Semantische Analyse (1)
- Sequential anomaly (1)
- Sequenzen von s/t-Pattern (1)
- Service-Oriented Architecture (1)
- Service-oriented Architectures (1)
- Service-orientierte Systeme (1)
- Service-orientierte Systme (1)
- Sicherheitsmodellierung (1)
- Signalflankengraph (SFG oder STG) (1)
- Similarity Measures (1)
- Similarity Search (1)
- Simulation (1)
- Skalierbarkeit (1)
- Smalltalk (1)
- SoaML (1)
- Softwareanalyse (1)
- Softwarearchitektur (1)
- Softwareentwicklung (1)
- Softwareentwicklungsprozesse (1)
- Softwareproduktlinien (1)
- Softwaretechnik (1)
- Softwaretest (1)
- Softwaretests (1)
- Softwarevisualisierung (1)
- Softwarewartung (1)
- Sozialen Medien (1)
- Speicheroptimierungen (1)
- Sprachspezifikation (1)
- Standards (1)
- Stilisierung (1)
- Structuring (1)
- Strukturierung (1)
- Studie (1)
- Submodular function (1)
- Submodular functions (1)
- Subset selection (1)
- Suchverfahren (1)
- Synchronisation (1)
- Synonyme (1)
- System of Systems (1)
- Systeme von Systemen (1)
- Systems of Systems (1)
- Tableaumethode (1)
- Tele-Lab (1)
- Tele-Teaching (1)
- Temporal Logic (1)
- Temporallogik (1)
- Test-getriebene Fehlernavigation (1)
- Testen (1)
- Theory (1)
- Threshold Cryptography (1)
- Time Augmented Petri Nets (1)
- Time series (1)
- Traceability (1)
- Tracking (1)
- Transformation (1)
- Transformationsebene (1)
- Transformationssequenzen (1)
- Travis CI (1)
- Triple Graph Grammar (1)
- Triple Graph Grammars (1)
- Triple-Graph-Grammatiken (1)
- Unbegrenzter Zustandsraum (1)
- Unique column combinations (1)
- Unveränderlichkeit (1)
- Usage Interest (1)
- VIL (1)
- Verbindungsnetzwerke (1)
- Verhalten (1)
- Verhaltensabstraktion (1)
- Verhaltensanalyse (1)
- Verhaltensbewahrung (1)
- Verhaltensverfeinerung (1)
- Verhaltensäquivalenz (1)
- Verification (1)
- Verletzung Auflösung (1)
- Verletzung Erklärung (1)
- Vernetzte Daten (1)
- Versionierung (1)
- Verteiltes Arbeiten (1)
- Verteilungsalgorithmen (1)
- Verteilungsalgorithmus (1)
- Verwaltung von Rechenzentren (1)
- Verzögerungs-Verbreitung (1)
- Videoanalyse (1)
- Videometadaten (1)
- Violation Explanation (1)
- Violation Resolution (1)
- Virtual machines (1)
- Visualization (1)
- Vocabulary (1)
- Vorhersage (1)
- Warteschlangentheorie (1)
- Wartung von Graphdatenbanksichten (1)
- Water Science and Technology (1)
- Web Sites (1)
- Web applications (1)
- Web of Data (1)
- Web-Anwendungen (1)
- Webseite (1)
- Well-structuredness (1)
- Wikipedia (1)
- Wohlstrukturiertheit (1)
- Zeitbehaftete Petri Netze (1)
- Zugriffskontrolle (1)
- access control (1)
- activity instance state propagation (1)
- adoption (1)
- analysis (1)
- anisotropic Kuwahara filter (1)
- annotation (1)
- anomalies (1)
- approximation (1)
- apriori (1)
- architecture (1)
- architecture recovery (1)
- argumentation research (1)
- aspect adapter (1)
- aspect oriented programming (1)
- aspect-oriented (1)
- aspects (1)
- aspectualization (1)
- association rule mining (1)
- asynchronous circuit (1)
- attacks (1)
- attribute assurance (1)
- ausführbare Semantiken (1)
- back-in-time (1)
- batch processing (1)
- behavior preservation (1)
- behavioral abstraction (1)
- behavioral equivalence (1)
- behavioral refinement (1)
- behavioral specification (1)
- beschreibende Feldstudie (1)
- big data (1)
- biomarker detection (1)
- bisimulation (1)
- bitcoin (1)
- bpm (1)
- bug tracking (1)
- building models (1)
- business process architecture (1)
- business process model abstraction (1)
- business processes (1)
- cancer therapy (1)
- cartographic design (1)
- case study (1)
- center dot Computing (1)
- change management (1)
- changeability (1)
- cleansing (1)
- cloud (1)
- cloud datacenter (1)
- cloud storage (1)
- cognition (1)
- coherence-enhancing filtering (1)
- collaborations (1)
- collection types (1)
- complexity (1)
- complexity dichotomy (1)
- comprehension (1)
- computer science (1)
- concurrency (1)
- concurrent graph rewriting (1)
- conditions (1)
- confidentiality (1)
- conflicts and dependencies in (1)
- conformance analysis (1)
- conformance checking (1)
- consistency (1)
- context awareness (1)
- continuous integration (1)
- continuous testing (1)
- contract (1)
- control resynthesis (1)
- controlled experiment (1)
- corporate takeovers (1)
- crosscutting wrappers (1)
- cryptocurrency exchanges (1)
- cscw (1)
- cyber (1)
- cyber humanistic (1)
- cyber threat intelligence (1)
- cyber-physical systems (1)
- data center management (1)
- data correctness checking (1)
- data driven approaches (1)
- data extraction (1)
- data flow correctness (1)
- data migration (1)
- data objects (1)
- data preparation (1)
- data quality (1)
- data states (1)
- data transformation (1)
- data wrangling (1)
- database systems (1)
- database technology (1)
- deadline propagation (1)
- delay propagation (1)
- dental caries classification (1)
- dependable computing (1)
- dependencies (1)
- dependency discovery (1)
- deterministic properties (1)
- deurema modeling language (1)
- development tools (1)
- difference of Gaussians (1)
- differential privacy (1)
- diffusion (1)
- digital whiteboard (1)
- direct manipulation (1)
- direkte Manipulation (1)
- discrimination networks (1)
- distributed ledger technology (1)
- distribution algorithm (1)
- dynamic typing (1)
- dynamic consolidation (1)
- dynamic programming languages (1)
- dynamic reconfiguration (1)
- dynamische Programmiersprachen (1)
- dynamische Sprachen (1)
- dynamische Umsortierung (1)
- efficient deep learning (1)
- eindeutig (1)
- eingebettete Systeme (1)
- electronic health record (1)
- embedded systems (1)
- empirical studies (1)
- empirische Studien (1)
- energy efficiency (1)
- engine (1)
- engineering (1)
- entity alignment (1)
- erfahrbare Medien (1)
- evaluation (1)
- event abstraction (1)
- events (1)
- evolution (1)
- evolution in MDE (1)
- executable semantics (1)
- experience report (1)
- explainability (1)
- explainability-accuracy trade-off (1)
- explainable AI (1)
- exploratory programming (1)
- expression (1)
- external knowledge bases (1)
- failure model (1)
- feedback loop modeling (1)
- feedback loops (1)
- fehlende Daten (1)
- flow-based bilateral filter (1)
- forensics (1)
- formal framework (1)
- formal verification (1)
- formal verification methods (1)
- formale Verifikation (1)
- formales Framework (1)
- formalism (1)
- functional dependency (1)
- functional lenses (1)
- functional programming (1)
- funktionale Abhängigkeit (1)
- funktionale Programmierung (1)
- future SOC lab (1)
- ganzheitlich (1)
- gene (1)
- gene selection (1)
- generalization (1)
- geospatial data (1)
- gesture (1)
- graph clustering (1)
- graph databases (1)
- graph languages (1)
- graph pattern matching (1)
- graph queries (1)
- graph transformation systems (1)
- graph transformations (1)
- holistic (1)
- homomorphic encryption (1)
- human computer interaction (1)
- hybrid graph-transformation-systems (1)
- hybride Graph-Transformations-Systeme (1)
- identity broker (1)
- identity management (1)
- image captioning (1)
- image processing (1)
- immutable values (1)
- in-memory database (1)
- inclusion dependency (1)
- index (1)
- individuals (1)
- inductive invariant checking (1)
- induktives Invariant Checking (1)
- inkrementelles Graph Pattern Matching (1)
- innovation (1)
- innovation capabilities (1)
- innovation management (1)
- input accuracy (1)
- interaction (1)
- interactive simulation (1)
- interconnect (1)
- interface (1)
- interpretable machine learning (1)
- invariant checking (1)
- invasive aspects (1)
- k-Induktion (1)
- k-induction (1)
- k-inductive invariant checking (1)
- k-inductive invariants (1)
- k-induktive Invarianten (1)
- k-induktives Invariant-Checking (1)
- key discovery (1)
- knowledge building (1)
- knowledge discovery (1)
- knowledge management (1)
- kontinuierliche Integration (1)
- kontinuierliches Testen (1)
- kontrolliertes Experiment (1)
- künstliche Intelligenz (1)
- landmarks (1)
- language specification (1)
- leadership (1)
- link discovery (1)
- linked data (1)
- live programming (1)
- location-based (1)
- logic synthesis (1)
- main memory computing (1)
- management (1)
- many-core (1)
- map reduce (1)
- map/reduce (1)
- maschinelles Lernen (1)
- medical malpractice (1)
- mehrdimensionale Belangtrennung (1)
- memory optimization (1)
- metadata discovery (1)
- metadata quality (1)
- methodologie (1)
- metric learning (1)
- mobile (1)
- mobile devices (1)
- model generation (1)
- model-based prototyping (1)
- model-driven (1)
- modelgetriebene Entwicklung (1)
- modelling (1)
- modular counting (1)
- modularity (1)
- molecular tumor board (1)
- monitoring (1)
- morphic (1)
- multi-core (1)
- multi-dimensional separation of concerns (1)
- multi-instances (1)
- multimodal representations (1)
- multi-task learning (1)
- nested application conditions (1)
- networks (1)
- neural (1)
- object life cycle synchronization (1)
- object-constraint programming (1)
- openHPI (1)
- organizational change (1)
- orts-basiert (1)
- parallel (1)
- parallel computing (1)
- paralleles Rechnen (1)
- partial application conditions (1)
- partielle Anwendungsbedingungen (1)
- performance (1)
- periodic tasks (1)
- periodische Aufgaben (1)
- personalized medicine (1)
- petri net (1)
- power-law (1)
- prediction (1)
- prefetching (1)
- presentation (1)
- prior knowledge (1)
- probabilistic models (1)
- probabilistic timed automata (1)
- probabilistische zeitbehaftete Automaten (1)
- process (1)
- process and data integration (1)
- process automation (1)
- process elicitation (1)
- process instance (1)
- process model search (1)
- process refinement (1)
- process scheduling (1)
- processes (1)
- processing (1)
- processor hardware (1)
- profiling (1)
- program (1)
- program analysis (1)
- programming language (1)
- quantitative analysis (1)
- query matching (1)
- querying (1)
- random I (1)
- random graphs (1)
- rapid prototyping (1)
- reactive (1)
- reaktive Programmierung (1)
- real-time (1)
- real-time systems (1)
- record linkage (1)
- recursive tuning (1)
- reflection (1)
- relational model transformation (1)
- relationale Modelltransformationen (1)
- remodularization (1)
- remote collaboration (1)
- representation learning (1)
- requirements engineering (1)
- resilient architectures (1)
- resource management (1)
- resource optimization (1)
- restoration (1)
- reusable aspects (1)
- robustness (1)
- runtime adaptations (1)
- runtime behavior (1)
- runtime models (1)
- s/t-pattern sequences (1)
- satisfiability solving (1)
- search plan generation (1)
- security chaos engineering (1)
- security policies (1)
- security risk assessment (1)
- self-driving (1)
- self-healing (1)
- self-supervised learning (1)
- semantic analysis (1)
- semantics preservation (1)
- service-oriented (1)
- signal transition graph (1)
- similarity (1)
- similarity learning (1)
- similarity measures (1)
- small files (1)
- smalltalk (1)
- software analysis (1)
- software architecture (1)
- software development (1)
- software development processes (1)
- software maintenance (1)
- software product lines (1)
- software tests (1)
- software visualization (1)
- speed independence (1)
- speed independent (1)
- standards (1)
- static analysis (1)
- statische Analyse (1)
- study (1)
- stylization (1)
- synchronization (1)
- synonym discovery (1)
- system of systems (1)
- t.BPM (1)
- tableau method (1)
- tangible media (1)
- teamwork (1)
- tele-TASK (1)
- terrain models (1)
- test-driven fault navigation (1)
- threshold cryptography (1)
- topics (1)
- tort law (1)
- touch input (1)
- transfer learning (1)
- transformation (1)
- transformation level (1)
- transformation sequences (1)
- triple graph grammars (1)
- trust (1)
- trust model (1)
- tuple spaces (1)
- typed graph transformation systems (1)
- unique (1)
- unsupervised methods (1)
- verschachtelte Anwendungsbedingungen (1)
- verschachtelte Graphbedingungen (1)
- versioning (1)
- verteilte Datenbanken (1)
- video analysis (1)
- video metadata (1)
- view maintenance (1)
- views (1)
- virtual 3D city models (1)
- virtual groups (1)
- virtualisierte IT-Infrastruktur (1)
- virtuelle 3D-Stadtmodelle (1)
- vulnerabilities (1)
- web-applications (1)
- weight (1)
- word sense disambiguation (1)
- workload prediction (1)
- zuverlässige Datenverarbeitung (1)
- zuverlässigen Datenverarbeitung (1)
- Ähnlichkeit (1)
- Ähnlichkeitsmaße (1)
- Ähnlichkeitssuche (1)
- Änderbarkeit (1)
- Übereinstimmungsanalyse (1)
- Überwachung (1)
Institute
- Hasso-Plattner-Institut für Digital Engineering gGmbH (181)
High annotation costs are a substantial bottleneck in applying deep learning architectures to clinically relevant use cases, substantiating the need for algorithms to learn from unlabeled data.
In this work, we propose employing self-supervised methods. To that end, we trained with three self-supervised algorithms on a large corpus of unlabeled dental images, which contained 38K bitewing radiographs (BWRs). We then applied the learned neural network representations on tooth-level dental caries classification, for which we utilized labels extracted from electronic health records (EHRs). Finally, a holdout test-set was established, which consisted of 343 BWRs and was annotated by three dental professionals and approved by a senior dentist.
This test-set was used to evaluate the fine-tuned caries classification models. Our experimental results demonstrate the gains obtained by pretraining models with self-supervised algorithms. These include improved caries classification performance (a 6 p.p. increase in sensitivity) and, most importantly, improved label-efficiency.
In other words, the resulting models can be fine-tuned using few labels (annotations).
Our results show that as few as 18 annotations suffice to reach a sensitivity of at least 45%, which is comparable to human-level diagnostic performance.
This study shows that self-supervision can provide gains in medical image analysis, particularly when obtaining labels is costly.
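The label-efficient fine-tuning step described above can be sketched as follows. This is a minimal illustration in PyTorch, not the study's actual code: the checkpoint path, the two-class head, and the hyperparameters are assumptions, and the backbone architecture merely stands in for whichever encoder was pretrained with self-supervision.

```python
# Sketch: fine-tuning an encoder pretrained with self-supervision on a few
# labeled tooth crops. The checkpoint path and hyperparameters are hypothetical.
import os
import torch
import torch.nn as nn
from torchvision import models

encoder = models.resnet18(weights=None)              # backbone architecture
ckpt = "ssl_pretrained_bwr.pt"                        # hypothetical SSL weights
if os.path.exists(ckpt):
    encoder.load_state_dict(torch.load(ckpt), strict=False)
encoder.fc = nn.Linear(encoder.fc.in_features, 2)     # caries / no-caries head

optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def finetune(loader, epochs=20):
    """Train on a small labeled sample (e.g., 18 annotated crops)."""
    encoder.train()
    for _ in range(epochs):
        for images, labels in loader:                 # (B, 3, H, W), (B,)
            optimizer.zero_grad()
            loss = loss_fn(encoder(images), labels)
            loss.backward()
            optimizer.step()
```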
Modern data analysis tasks often involve control flow statements, such as the iterations in PageRank and K-means. To achieve scalability, developers usually implement these tasks in distributed dataflow systems, such as Spark and Flink. Designers of such systems have to choose between providing imperative or functional control flow constructs to users. Imperative constructs are easier to use, but functional constructs are easier to compile to an efficient dataflow job. We propose Mitos, a system where control flow is both easy to use and efficient. Mitos relies on an intermediate representation based on the static single assignment form. This allows us to abstract away from specific control flow constructs and treat any imperative control flow uniformly both when building the dataflow job and when coordinating the distributed execution.
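The kind of imperative, data-dependent control flow the abstract refers to is illustrated by the plain-Python PageRank below. This is only an illustration of the programming style a system like Mitos aims to compile into a distributed dataflow job; it is not Mitos' API, and the graph and parameters are made up.

```python
# Illustration only: an imperative, iterative computation (a tiny PageRank)
# of the kind that systems such as Mitos aim to run as distributed dataflows.
def pagerank(edges, num_nodes, damping=0.85, tol=1e-6, max_iter=100):
    ranks = [1.0 / num_nodes] * num_nodes
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1
    for _ in range(max_iter):                        # imperative control flow
        new_ranks = [(1.0 - damping) / num_nodes] * num_nodes
        for src, dst in edges:
            if out_degree[src]:
                new_ranks[dst] += damping * ranks[src] / out_degree[src]
        if sum(abs(a - b) for a, b in zip(ranks, new_ranks)) < tol:
            ranks = new_ranks
            break                                     # data-dependent termination
        ranks = new_ranks
    return ranks

print(pagerank([(0, 1), (1, 2), (2, 0), (2, 1)], num_nodes=3))
```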
Data encoding has been applied to database systems for decades as it mitigates bandwidth bottlenecks and reduces storage requirements. But even in the presence of these advantages, most in-memory database systems use data encoding only conservatively as the negative impact on runtime performance can be severe. Real-world systems with large parts being infrequently accessed and cost-efficiency constraints in cloud environments require solutions that automatically and efficiently select encoding techniques, including heavy-weight compression. In this paper, we introduce workload-driven approaches to automatically determine memory budget-constrained encoding configurations using greedy heuristics and linear programming. We show for TPC-H, TPC-DS, and the Join Order Benchmark that optimized encoding configurations can reduce the main memory footprint significantly without a loss in runtime performance over state-of-the-art dictionary encoding. To yield robust selections, we extend the linear programming-based approach to incorporate query runtime constraints and mitigate unexpected performance regressions.
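A minimal sketch of the greedy flavor of budget-constrained encoding selection described above: each column has candidate encodings with an estimated size and scan cost, and the heuristic upgrades to faster encodings by benefit per byte while the memory budget allows. Column names, sizes, and costs are placeholders, not the paper's actual cost model.

```python
# Greedy sketch: choose one encoding per column so that total size stays within
# a memory budget while keeping estimated scan cost low. All figures are made up.
def select_encodings(columns, budget_bytes):
    # columns: {name: [(encoding, size_bytes, scan_cost), ...]}
    # Start from the smallest option per column, then greedily upgrade to
    # faster encodings while the budget allows it.
    choice = {c: min(opts, key=lambda o: o[1]) for c, opts in columns.items()}
    used = sum(o[1] for o in choice.values())
    upgrades = []
    for col, opts in columns.items():
        cur = choice[col]
        for enc in opts:
            if enc[2] < cur[2]:                       # faster than current pick
                gain, extra = cur[2] - enc[2], enc[1] - cur[1]
                upgrades.append((gain / max(extra, 1), col, enc))
    for _, col, enc in sorted(upgrades, reverse=True):
        extra = enc[1] - choice[col][1]
        if enc[2] < choice[col][2] and used + extra <= budget_bytes:
            used += extra
            choice[col] = enc
    return choice, used

cols = {"l_shipdate": [("dictionary", 80, 1.0), ("run_length", 30, 1.4)],
        "l_comment":  [("dictionary", 900, 1.0), ("lz4", 250, 2.5)]}
print(select_encodings(cols, budget_bytes=400))
```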
How inclusive are we?
(2022)
ACM SIGMOD, VLDB and other database organizations have committed to fostering an inclusive and diverse community, as have many other scientific organizations. Recently, different measures have been taken to advance these goals, especially for underrepresented groups. One possible measure is double-blind reviewing, which aims to hide gender, ethnicity, and other properties of the authors. We report the preliminary results of a gender diversity analysis of publications of the database community across several peer-reviewed venues, and also compare women's authorship percentages in single-blind and double-blind venues over the years. We also cross-compare the results for data management with those of other relevant areas in Computer Science.
Correction to: Knowledge bases and software support for variant interpretation in precision oncology
(2021)
Precision oncology is a rapidly evolving interdisciplinary medical specialty. Comprehensive cancer panels are becoming increasingly available at pathology departments worldwide, creating the urgent need for scalable cancer variant annotation and molecularly informed treatment recommendations. A wealth of mainly academia-driven knowledge bases calls for software tools supporting the multi-step diagnostic process. We derive a comprehensive list of knowledge bases relevant for variant interpretation by a review of existing literature followed by a survey among medical experts from university hospitals in Germany. In addition, we review cancer variant interpretation tools, which integrate multiple knowledge bases. We categorize the knowledge bases along the diagnostic process in precision oncology and analyze programmatic access options as well as the integration of knowledge bases into software tools. The most commonly used knowledge bases provide good programmatic access options and have been integrated into a range of software tools. For the wider set of knowledge bases, access options vary across different parts of the diagnostic process. Programmatic access is limited for information regarding clinical classifications of variants and for therapy recommendations. The main issue for databases used for biological classification of pathogenic variants and pathway context information is the lack of standardized interfaces. There is no single cancer variant interpretation tool that integrates all identified knowledge bases. Specialized tools are available and need to be further developed for different steps in the diagnostic process.
Phe2vec
(2021)
Robust phenotyping of patients from electronic health records (EHRs) at scale is a challenge in clinical informatics. Here, we introduce Phe2vec, an automated framework for disease phenotyping from EHRs based on unsupervised learning and assess its effectiveness against standard rule-based algorithms from Phenotype KnowledgeBase (PheKB). Phe2vec is based on pre-computing embeddings of medical concepts and patients' clinical history. Disease phenotypes are then derived from a seed concept and its neighbors in the embedding space. Patients are linked to a disease if their embedded representation is close to the disease phenotype. Comparing Phe2vec and PheKB cohorts head-to-head using chart review, Phe2vec performed on par or better in nine out of ten diseases. Differently from other approaches, it can scale to any condition and was validated against widely adopted expert-based standards. Phe2vec aims to optimize clinical informatics research by augmenting current frameworks to characterize patients by condition and derive reliable disease cohorts.
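The core idea described above, namely that a phenotype is a seed concept plus its nearest neighbors in an embedding space and that a patient is linked to a disease if their aggregated history embedding is close to that phenotype, can be sketched with a few cosine similarities. The embeddings, concepts, and threshold below are toy placeholders, not Phe2vec's trained vectors.

```python
# Sketch of the Phe2vec idea with toy numbers: concepts and patients live in the
# same embedding space; a phenotype is a seed concept plus its nearest concepts,
# and a patient matches if their history embedding is close to that phenotype.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

concept_emb = {"diabetes": np.array([0.9, 0.1, 0.0]),
               "insulin":  np.array([0.8, 0.2, 0.1]),
               "fracture": np.array([0.0, 0.1, 0.9])}

def phenotype(seed, k=1):
    """Seed concept plus its k nearest neighbor concepts."""
    others = [(cosine(concept_emb[seed], v), c)
              for c, v in concept_emb.items() if c != seed]
    return [seed] + [c for _, c in sorted(others, reverse=True)[:k]]

def patient_embedding(history):
    """Aggregate a patient's clinical history as the mean concept embedding."""
    return np.mean([concept_emb[c] for c in history], axis=0)

pheno = phenotype("diabetes")
centroid = np.mean([concept_emb[c] for c in pheno], axis=0)
patient = patient_embedding(["insulin", "diabetes"])
print(pheno, cosine(patient, centroid) > 0.8)   # link patient if close enough
```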
Business processes are often specified in descriptive or normative models. Both types of models should adhere to internal and external regulations, such as company guidelines or laws. Employing compliance checking techniques, it is possible to verify process models against rules. While traditionally compliance checking focuses on well-structured processes, we address case management scenarios. In case management, knowledge workers drive multi-variant and adaptive processes. Our contribution is based on the fragment-based case management approach, which splits a process into a set of fragments. The fragments are synchronized through shared data but can, otherwise, be dynamically instantiated and executed. We formalize case models using Petri nets. We demonstrate the formalization for design-time and run-time compliance checking and present a proof-of-concept implementation. The application of the implemented compliance checking approach to a use case exemplifies its effectiveness while designing a case model. The empirical evaluation on a set of case models for measuring the performance of the approach shows that rules can often be checked in less than a second.
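A minimal place/transition net sketch (not the paper's formalization of case models) illustrates the kind of enabledness and firing check that underlies verifying a fragment against a rule such as "no approval before review". The fragment, places, and rule are invented for illustration.

```python
# Minimal place/transition net sketch: markings are multisets of places, and a
# compliance rule could be checked over the reachable markings of a case model.
from collections import Counter

class PetriNet:
    def __init__(self, transitions):
        # transitions: {name: (consumed_places, produced_places)}
        self.transitions = transitions

    def enabled(self, marking, t):
        need = Counter(self.transitions[t][0])
        return all(marking[p] >= n for p, n in need.items())

    def fire(self, marking, t):
        pre, post = self.transitions[t]
        m = Counter(marking)
        m.subtract(Counter(pre))
        m.update(Counter(post))
        return +m                                   # drop zero/negative counts

# Toy case fragment: a claim must be reviewed before it can be approved.
net = PetriNet({"review":  (["claim_received"], ["claim_reviewed"]),
                "approve": (["claim_reviewed"], ["claim_approved"])})
m0 = Counter({"claim_received": 1})
assert not net.enabled(m0, "approve")               # rule: no approval before review
m1 = net.fire(m0, "review")
print(net.enabled(m1, "approve"))                   # True
```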
A simplified run time analysis of the univariate marginal distribution algorithm on LeadingOnes
(2021)
With elementary means, we prove a stronger run time guarantee for the univariate marginal distribution algorithm (UMDA) optimizing the LEADINGONES benchmark function in the desirable regime with low genetic drift. If the population size is at least quasilinear, then, with high probability, the UMDA samples the optimum in a number of iterations that is linear in the problem size divided by the logarithm of the UMDA's selection rate. This improves over the previous guarantee, obtained by Dang and Lehre (2015) via the deep level-based population method, both in terms of the run time and by demonstrating further run time gains from small selection rates. Under similar assumptions, we prove a lower bound that matches our upper bound up to constant factors.
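For readers unfamiliar with the algorithm, the UMDA on LeadingOnes can be sketched in a few lines: maintain a frequency vector, sample a population from it, select the best mu individuals, and update the frequencies from the selected individuals, with borders that limit genetic drift. The parameters below are illustrative toy values, not those used in the analysis.

```python
# Toy UMDA on LeadingOnes: keep a frequency vector, sample a population, select
# the best mu individuals, and update frequencies from them (with borders to
# limit genetic drift). Parameters are illustrative, not those of the analysis.
import random

def leading_ones(x):
    count = 0
    for bit in x:
        if bit == 0:
            break
        count += 1
    return count

def umda(n=30, lam=200, mu=50, iterations=200):
    p = [0.5] * n                                   # frequency vector
    lower, upper = 1.0 / n, 1.0 - 1.0 / n           # drift-limiting borders
    for _ in range(iterations):
        pop = [[int(random.random() < p[i]) for i in range(n)]
               for _ in range(lam)]
        pop.sort(key=leading_ones, reverse=True)
        selected = pop[:mu]
        for i in range(n):
            freq = sum(x[i] for x in selected) / mu
            p[i] = min(max(freq, lower), upper)
        if leading_ones(pop[0]) == n:
            return pop[0]
    return max(pop, key=leading_ones)

print(leading_ones(umda()))
```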
Effective query optimization is a core feature of any database management system. While most query optimization techniques make use of simple metadata, such as cardinalities and other basic statistics, other optimization techniques are based on more advanced metadata including data dependencies, such as functional, uniqueness, order, or inclusion dependencies. This survey provides an overview, intuitive descriptions, and classifications of query optimization and execution strategies that are enabled by data dependencies. We consider the most popular types of data dependencies and focus on optimization strategies that target the optimization of relational database queries. The survey helps database vendors to identify optimization opportunities and DBMS researchers to find related work and open research questions.
We consider the subset selection problem for a function f with constraint bound B that changes over time. Within the area of submodular optimization, various greedy approaches are commonly used. For dynamic environments we observe that the adaptive variants of these greedy approaches are not able to maintain their approximation quality. Investigating the recently introduced POMC Pareto optimization approach, we show that this algorithm efficiently computes a φ = (α_f/2)(1 − 1/e^(α_f))-approximation, where α_f is the submodularity ratio of f, for each possible constraint bound b ≤ B. Furthermore, we show that POMC is able to adapt its set of solutions quickly in the case that B increases. Our experimental investigations for influence maximization in social networks show the advantage of POMC over generalized greedy algorithms. We also consider EAMC, a new evolutionary algorithm with a polynomial expected-time guarantee to maintain a φ-approximation ratio, and NSGA-II with two different population sizes as an advanced multi-objective optimization algorithm, to demonstrate their challenges in optimizing the maximum coverage problem. Our empirical analysis shows that, within the same number of evaluations, POMC is able to perform as well as NSGA-II under a linear constraint, while EAMC performs significantly worse than all considered algorithms in most cases.
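The greedy baseline that POMC is compared against can be sketched as below for a monotone objective under a size constraint B: repeatedly add the element with the best marginal gain. The coverage objective is a toy stand-in for influence maximization, and the sketch does not reproduce POMC itself.

```python
# Greedy sketch for subset selection under a size constraint B:
# repeatedly add the element with the best marginal gain. The coverage
# objective below is a toy stand-in for influence maximization.
def greedy_subset(elements, f, B):
    selected = []
    while len(selected) < B:
        best, best_gain = None, 0.0
        for e in elements:
            if e in selected:
                continue
            gain = f(selected + [e]) - f(selected)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:                 # no further improvement possible
            break
        selected.append(best)
    return selected

# Toy coverage objective: how many items are covered by the chosen sets.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}
coverage = lambda S: len(set().union(*(sets[s] for s in S))) if S else 0
print(greedy_subset(list(sets), coverage, B=2))   # e.g., ['c', 'a']
```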
Recently, initial conflicts were introduced in the framework of M-adhesive categories as an important optimization of critical pairs. In particular, they represent a proper subset such that each conflict is represented in a minimal context by a unique initial one. The theory of critical pairs has been extended in the framework of M-adhesive categories to rules with nested application conditions (ACs), restricting the applicability of a rule and generalizing the well-known negative application conditions. A notion of initial conflicts for rules with ACs does not exist yet.
In this paper, on the one hand, we extend the theory of initial conflicts in the framework of M-adhesive categories to transformation rules with ACs. They again represent a proper subset of critical pairs for rules with ACs, and they represent each conflict uniquely in a minimal context. They are moreover symbolic, because we can show that in general no finite and complete set of conflicts for rules with ACs exists. On the other hand, we show that critical pairs are minimally M-complete, whereas initial conflicts are minimally complete. Finally, we introduce important special cases of rules with ACs for which we can obtain finite, minimally (M-)complete sets of conflicts.
Bitcoin is gaining traction as an alternative store of value. Its market capitalization transcends all other cryptocurrencies in the market. But its high monetary value also makes it an attractive target for cyber criminal actors. Hacking campaigns usually target an ecosystem's weakest points. In Bitcoin, the exchange platforms are one of them. Each exchange breach is a threat not only to direct victims, but to the credibility of Bitcoin's entire ecosystem. Based on an extensive analysis of 36 breaches of Bitcoin exchanges, we show the attack patterns used to exploit Bitcoin exchange platforms, using an industry standard for reporting intelligence on cyber security breaches. Based on this we are able to provide an overview of the most common attack vectors, showing that all except three hacks were possible due to relatively lax security. We show that while the security regimen of Bitcoin exchanges is subpar compared to other financial service providers, the use of stolen credentials, which does not require any hacking, is decreasing. We also show that the amount of BTC taken during a breach is decreasing, as is the number of exchanges that terminate after being breached. Furthermore, we show that the overall security posture has improved, but still has major flaws. To discover adversarial methods post-breach, we have analyzed two cases of BTC laundering. Through this analysis we provide insight into how exchange platforms with lax cyber security further increase the intermediary risk they introduce into the Bitcoin ecosystem.
A core operator of evolutionary algorithms (EAs) is the mutation. Recently, much attention has been devoted to the study of mutation operators with dynamic and non-uniform mutation rates. Following up on this area of work, we propose a new mutation operator and analyze its performance on the (1 + 1) Evolutionary Algorithm (EA). Our analyses show that this mutation operator competes with pre-existing ones, when used by the (1 + 1) EA on classes of problems for which results on the other mutation operators are available. We show that the (1 + 1) EA using our mutation operator finds a (1/3)-approximation ratio on any non-negative submodular function in polynomial time. We also consider the problem of maximizing a symmetric submodular function under a single matroid constraint and show that the (1 + 1) EA using our operator finds a (1/3)-approximation within polynomial time. This performance matches that of combinatorial local search algorithms specifically designed to solve these problems and outperforms them with constant probability. Finally, we evaluate the performance of the (1 + 1) EA using our operator experimentally by considering two applications: (a) the maximum directed cut problem on real-world graphs of different origins, with up to 6.6 million vertices and 56 million edges and (b) the symmetric mutual information problem using a four month period air pollution data set. In comparison with uniform mutation and a recently proposed dynamic scheme, our operator comes out on top on these instances.
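The algorithmic frame discussed above is the (1 + 1) EA; a minimal sketch on a tiny maximum directed cut instance follows. Standard bit mutation is used purely for illustration; the mutation operator proposed in the paper is not reproduced here, and the graph is invented.

```python
# (1+1) EA skeleton on a tiny maximum directed cut instance. Standard bit
# mutation is used for illustration; the paper's proposed operator differs.
import random

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]    # toy directed graph
n = 4

def cut_value(x):
    # Count edges going from the selected set (bit 1) to its complement.
    return sum(1 for u, v in edges if x[u] == 1 and x[v] == 0)

def one_plus_one_ea(iterations=2000):
    x = [random.randint(0, 1) for _ in range(n)]
    for _ in range(iterations):
        y = [bit ^ 1 if random.random() < 1.0 / n else bit for bit in x]
        if cut_value(y) >= cut_value(x):             # accept if not worse
            x = y
    return x, cut_value(x)

print(one_plus_one_ea())
```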
Intrinsic decomposition refers to the problem of estimating scene characteristics, such as albedo and shading, when one view or multiple views of a scene are provided. The inverse problem setting, where multiple unknowns are solved given a single known pixel-value, is highly under-constrained. When provided with correlating image and depth data, intrinsic scene decomposition can be facilitated using depth-based priors, and such depth data is nowadays easy to acquire with high-end smartphones by utilizing their depth sensors. In this work, we present a system for intrinsic decomposition of RGB-D images on smartphones and the algorithmic as well as design choices therein. Unlike state-of-the-art methods that assume only diffuse reflectance, we consider both diffuse and specular pixels. For this purpose, we present a novel specularity extraction algorithm based on a multi-scale intensity decomposition and chroma inpainting. Subsequently, the diffuse component is further decomposed into albedo and shading components. We use an inertial proximal algorithm for non-convex optimization (iPiano) to ensure albedo sparsity. Our GPU-based visual processing is implemented on iOS via the Metal API and enables interactive performance on an iPhone 11 Pro. Further, a qualitative evaluation shows that we are able to obtain high-quality outputs. Furthermore, our proposed approach for specularity removal outperforms state-of-the-art approaches for real-world images, while our albedo and shading layer decomposition is faster than the prior work at a comparable output quality. A variety of applications such as recoloring, retexturing, relighting, appearance editing, and stylization are shown, each using the intrinsic layers obtained with our method and/or the corresponding depth data.
We introduce a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least-changing graph repairs from which a user may select a graph repair based on non-formalized further requirements. This incremental approach features delta preservation as it allows to restrict the generation of graph repairs to delta-preserving graph repairs, which do not revert the additions and deletions of the most recent consistency-violating graph update. We specify consistency of graphs using the logic of nested graph conditions, which is equivalent to first-order logic on graphs. Technically, the incremental approach encodes if and how the graph under repair satisfies a graph condition using the novel data structure of satisfaction trees, which are adapted incrementally according to the graph updates applied. In addition to the incremental approach, we also present two state-based graph repair algorithms, which restore consistency of a graph independent of the most recent graph update and which generate additional graph repairs using a global perspective on the graph under repair. We evaluate the developed algorithms using our prototypical implementation in the tool AutoGraph and illustrate our incremental approach using a case study from the graph database domain.
This special issue contains extended versions of four selected papers from the 11th International Conference on Graph Transformation (ICGT 2018). The articles cover a tool for computing core graphs via SAT/SMT solvers (graph language definition), graph transformation through graph surfing in reaction systems (a new graph transformation formalism), the essence and initiality of conflicts in M-adhesive transformation systems, and a calculus of concurrent graph-rewriting processes (theory on conflicts and parallel independence).
This paper shows that the law, in subtle ways, may set hitherto unrecognized incentives for the adoption of explainable machine learning applications. In doing so, we make two novel contributions. First, on the legal side, we show that to avoid liability, professional actors, such as doctors and managers, may soon be legally compelled to use explainable ML models. We argue that the importance of explainability reaches far beyond data protection law, and crucially influences questions of contractual and tort liability for the use of ML models. To this effect, we conduct two legal case studies, in medical and corporate merger applications of ML. As a second contribution, we discuss the (legally required) trade-off between accuracy and explainability and demonstrate the effect in a technical case study in the context of spam classification.
There is an increasing interest in fusing data from heterogeneous sources. Combining data sources increases the utility of existing datasets, generating new information and creating services of higher quality. A central issue in working with heterogeneous sources is data migration: In order to share and process data in different engines, resource-intensive and complex movements and transformations between computing engines, services, and stores are necessary.
Muses is a distributed, high-performance data migration engine that is able to interconnect distributed data stores by forwarding, transforming, repartitioning, or broadcasting data among distributed engines' instances in a resource-, cost-, and performance-adaptive manner. As such, it performs seamless information sharing across all participating resources in a standard, modular manner. We show an overall improvement of 30 % for pipelining jobs across multiple engines, even when we count the overhead of Muses in the execution time. This performance gain implies that Muses can be used to optimise large pipelines that leverage multiple engines.
Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients' anonymity. However, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets.
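The central idea reviewed above, namely that only model parameters are shared while raw records stay at each site, is illustrated by the minimal federated averaging sketch below. The sites, data, and number of rounds are synthetic placeholders; real deployments add secure aggregation, heterogeneity handling, and privacy mechanisms.

```python
# Federated averaging sketch: each site fits a local update on its own data and
# only the model weights are shared and averaged; the raw records never leave
# the site. Linear regression via a few gradient steps keeps the example small.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
sites = []
for _ in range(3):                            # three hospitals with local data
    X = rng.normal(size=(50, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    sites.append((X, y))

def local_update(w, X, y, lr=0.05, steps=10):
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(3)
for _ in range(20):                           # communication rounds
    local_ws = [local_update(w_global.copy(), X, y) for X, y in sites]
    w_global = np.mean(local_ws, axis=0)      # server-side averaging
print(np.round(w_global, 2))                  # approaches [1.0, -2.0, 0.5]
```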
First-class concepts
(2022)
Ideally, programs are partitioned into independently maintainable and understandable modules. As a system grows, its architecture gradually loses the capability to accommodate new concepts in a modular way. While refactoring is expensive and not always possible, and the programming language might lack dedicated primary language constructs to express certain cross-cutting concerns, programmers are still able to explain and delineate convoluted concepts through secondary means: code comments, use of whitespace and arrangement of code, documentation, or communicating tacit knowledge. Secondary constructs are easy to change and provide high flexibility in communicating cross-cutting concerns and other concepts among programmers. However, such secondary constructs usually have no reified representation that can be explored and manipulated as first-class entities through the programming environment. In this exploratory work, we discuss novel ways to express a wide range of concepts, including cross-cutting concerns, patterns, and lifecycle artifacts independently of the dominant decomposition imposed by an existing architecture. We propose the representation of concepts as first-class objects inside the programming environment that retain the capability to change as easily as code comments. We explore new tools that allow programmers to view, navigate, and change programs based on conceptual perspectives. In a small case study, we demonstrate how such views can be created and how the programming experience changes from draining programmers' attention by stretching it across multiple modules toward focusing it on cohesively presented concepts. Our designs are geared toward facilitating multiple secondary perspectives on a system to co-exist in symbiosis with the original architecture, hence making it easier to explore, understand, and explain complex contexts and narratives that are hard or impossible to express using primary modularity constructs.
ATIB
(2021)
Identity management is a principal component of securing online services. Throughout the advancement of traditional identity management patterns, the identity provider has remained a Trusted Third Party (TTP). Amongst other demands, the service provider and the user need to trust a particular identity provider to deliver correct attributes. This paradigm changed with the invention of blockchain-based Self-Sovereign Identity (SSI) solutions that primarily focus on the users. SSI reduces the functional scope of the identity provider to an attribute provider while enabling attribute aggregation. Besides that, the development of new protocols that disregard established ones, together with a significantly fragmented landscape of SSI solutions, poses considerable challenges for adoption by service providers. We propose an Attribute Trust-enhancing Identity Broker (ATIB) to leverage the potential of SSI for trust-enhancing attribute aggregation. Furthermore, ATIB abstracts from a dedicated SSI solution and offers standard protocols. Therefore, it facilitates the adoption by service providers. Despite the brokered integration approach, we show that ATIB provides a high security posture. Additionally, ATIB does not compromise the ten foundational SSI principles for the users.
Data errors represent a major issue in most application workflows. Before any important task can take place, a certain data quality has to be guaranteed by eliminating a number of different errors that may appear in data. Typically, most of these errors are fixed with data preparation methods, such as whitespace removal. However, the particular error of duplicate records, where multiple records refer to the same entity, is usually eliminated independently with specialized techniques. Our work is the first to bring these two areas together by applying data preparation operations under a systematic approach prior to performing duplicate detection. Our process workflow can be summarized as follows: It begins with the user providing as input a sample of the gold standard, the actual dataset, and optionally some constraints on domain-specific data preparations, such as address normalization. The preparation selection operates in two consecutive phases. First, to vastly reduce the search space of ineffective data preparations, decisions are made based on the improvement or worsening of pair similarities. Second, using the remaining data preparations, an iterative leave-one-out classification process removes preparations one by one and determines the redundant preparations based on the achieved area under the precision-recall curve (AUC-PR). Using this workflow, we manage to improve the results of duplicate detection by up to 19% in AUC-PR.
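The leave-one-out phase described above can be sketched as a pruning loop: drop a preparation whenever removing it does not hurt (or even improves) the evaluation score. The scoring function below is a toy stand-in for the AUC-PR of duplicate detection on the gold-standard sample, and the preparation names are hypothetical.

```python
# Leave-one-out sketch: iteratively drop data preparations that are redundant,
# i.e. whose removal does not lower the evaluation score (a stand-in for the
# AUC-PR of duplicate detection on the gold-standard sample).
def prune_preparations(preparations, evaluate):
    active = list(preparations)
    improved = True
    while improved and len(active) > 1:
        improved = False
        baseline = evaluate(active)
        for prep in list(active):
            candidate = [p for p in active if p != prep]
            if evaluate(candidate) >= baseline:      # prep is redundant
                active = candidate
                improved = True
                break
    return active

# Toy scoring function: pretend only normalization and lowercasing help.
useful = {"normalize_address", "lowercase"}
score = lambda preps: len(useful & set(preps)) - 0.01 * len(preps)
print(prune_preparations(
    ["normalize_address", "strip_whitespace", "lowercase", "remove_punct"],
    score))
```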
TPC-H continues to be the most widely used benchmark for relational OLAP systems. It poses a number of challenges, also known as "choke points", which database systems have to solve in order to achieve good benchmark results. Examples include joins across multiple tables, correlated subqueries, and correlations within the TPC-H data set. Knowing the impact of such optimizations helps in developing optimizers as well as in interpreting TPC-H results across database systems.
This paper provides a systematic analysis of choke points and their optimizations. It complements previous work on TPC-H choke points by providing a quantitative discussion of their relevance. It focuses on eleven choke points where the optimizations are beneficial independently of the database system. Of these, the flattening of subqueries and the placement of predicates have the biggest impact. Three queries (Q2, Q17, and Q21) are strongly influenced by the choice of an efficient query plan; three others (Q1, Q13, and Q18) are less influenced by plan optimizations and more dependent on an efficient execution engine.
Indexes are essential for the efficient processing of database workloads. Proposed solutions for the relevant and challenging index selection problem range from simple metadata-based heuristics over sophisticated multi-step algorithms to approaches that yield optimal results. The main challenges are (i) to accurately determine the effect of an index on the workload cost while considering the interaction of indexes and (ii) a large number of possible combinations resulting from workloads containing many queries and massive schemata with possibly thousands of attributes. In this work, we describe and analyze eight index selection algorithms that are based on different concepts and compare them along different dimensions, such as solution quality, runtime, multi-column support, solution granularity, and complexity. In particular, we analyze the solutions of the algorithms for the challenging analytical Join Order, TPC-H, and TPC-DS benchmarks. Afterward, we assess strengths and weaknesses and infer insights for index selection in general and for each approach individually, before giving recommendations on when to use which approach.
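A simple greedy index selection in the spirit of the heuristic approaches compared above can be sketched as follows: repeatedly add the candidate index with the best estimated cost reduction per byte until the storage budget is exhausted. The cost model and index candidates are placeholders, and index interaction is deliberately ignored in this sketch.

```python
# Greedy index selection sketch: pick index candidates by estimated workload
# benefit per byte of storage until the budget is used up. The cost model and
# candidates below are placeholders, not a real optimizer's estimates.
def select_indexes(candidates, workload_cost, budget):
    # candidates: {index_name: size_bytes}
    chosen, used = set(), 0
    while True:
        base = workload_cost(chosen)
        best, best_ratio = None, 0.0
        for idx, size in candidates.items():
            if idx in chosen or used + size > budget:
                continue
            benefit = base - workload_cost(chosen | {idx})
            if size and benefit / size > best_ratio:
                best, best_ratio = idx, benefit / size
        if best is None:
            return chosen, used
        chosen.add(best)
        used += candidates[best]

# Placeholder cost model: each index removes a fixed share of scan cost and
# indexes do not interact (a real what-if optimizer would capture interaction).
savings = {"idx_orders_date": 40, "idx_lineitem_part": 55, "idx_cust_region": 10}
cost = lambda s: 200 - sum(savings[i] for i in s)
sizes = {"idx_orders_date": 120, "idx_lineitem_part": 300, "idx_cust_region": 60}
print(select_indexes(sizes, cost, budget=400))
```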
CloudStrike
(2020)
Most cyber-attacks and data breaches in cloud infrastructure are due to human errors and misconfiguration vulnerabilities. Cloud customer-centric tools are imperative for mitigating these issues; however, existing cloud security models are largely unable to tackle these security challenges. Therefore, novel security mechanisms are needed, and we propose Risk-driven Fault Injection (RDFI) techniques to address these challenges. RDFI applies the principles of chaos engineering to cloud security and leverages feedback loops to execute, monitor, analyze and plan security fault injection campaigns, based on a knowledge base. The knowledge base consists of fault models designed from secure baselines, cloud security best practices and observations derived during iterative fault injection campaigns. These observations are helpful for identifying vulnerabilities while verifying the correctness of security attributes (integrity, confidentiality and availability). Furthermore, RDFI proactively supports risk analysis and security hardening efforts by sharing security information with security mechanisms. We have designed and implemented the RDFI strategies, including various chaos engineering algorithms, as a software tool: CloudStrike. Several evaluations have been conducted with CloudStrike against infrastructure deployed on two major public cloud platforms: Amazon Web Services and Google Cloud Platform. Time performance increases linearly, proportional to the attack rate. Also, the analysis of vulnerabilities detected via security fault injection has been used to harden the security of cloud resources, demonstrating the effectiveness of the security information provided by CloudStrike. Therefore, we opine that our approaches are suitable for overcoming contemporary cloud security issues.
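A highly simplified sketch of the execute-monitor-analyze-plan loop described above is given below. The fault models, checks, and injection calls are stubs invented for illustration; they are not CloudStrike's actual interfaces or the cloud providers' APIs.

```python
# Highly simplified chaos-style security fault injection loop: inject a fault
# from the knowledge base, observe whether a security attribute still holds,
# and record the result for the next planning step. Everything here is a stub.
import random

knowledge_base = [
    {"fault": "make_bucket_public", "attribute": "confidentiality"},
    {"fault": "delete_backup_copy", "attribute": "availability"},
    {"fault": "tamper_object_acl",  "attribute": "integrity"},
]

def inject(fault):                 # stand-in for a call against a cloud API
    print(f"injecting fault: {fault}")

def attribute_holds(attribute):    # stand-in for monitoring/verification checks
    return random.random() > 0.3   # pretend some checks fail

def campaign(rounds=3):
    findings = []
    for _ in range(rounds):                           # plan: pick the next fault
        experiment = random.choice(knowledge_base)
        inject(experiment["fault"])                   # execute
        ok = attribute_holds(experiment["attribute"]) # monitor
        findings.append((experiment["fault"], experiment["attribute"], ok))
    return [f for f in findings if not f[2]]          # analyze: surface violations

print(campaign())
```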
Creation, collection and retention of knowledge in digital communities is an activity that currently needs to be explicitly targeted as a secure method of keeping intellectual capital growing in the digital era. In particular, we consider it relevant to analyze and evaluate the empathetic cognitive personalities and behaviors that individuals now exhibit with the change from face-to-face communication (F2F) to computer-mediated communication (CMC) online. This document proposes a cyber-humanistic approach to enhance the traditional SECI knowledge management model. A cognitive perception is added to its cyclical process, following design thinking interaction, to improve the way in which knowledge is continuously created, converted and shared. In building a cognitive-centered model, we specifically focus on the effective identification of and response to the cognitive stimulation of individuals, as they are the intellectual generators and multiplicators of knowledge in the online environment. Our target is to identify how geographically distributed digital organizations should align the individual's cognitive abilities to promote iteration and improve interaction as a reliable stimulant of collective intelligence. The new model focuses on analyzing the four different stages of knowledge processing, where individuals with sympathetic cognitive personalities can significantly boost knowledge creation in a virtual social system. For organizations, this means that multidisciplinary individuals can maximize their extensive potential by externalizing their knowledge in the correct stage of the knowledge creation process, and by collaborating with their appropriate sympathetically cognitive remote peers.
Gene expression data provide the expression levels of tens of thousands of genes from several hundred samples. These data are analyzed to detect biomarkers that can be of prognostic or diagnostic use. Traditionally, biomarker detection for gene expression data is the task of gene selection. The vast number of genes is reduced to a few relevant ones that achieve the best performance for the respective use case. Traditional approaches select genes based on their statistical significance in the data set. This raises issues regarding the robustness, redundancy, and true biological relevance of the selected genes. Integrative analyses typically address these shortcomings by integrating multiple data artifacts from the same objects, e.g. gene expression and methylation data. When only gene expression data are available, integrative analyses instead use curated information on biological processes from public knowledge bases. With knowledge bases providing an ever-increasing amount of curated biological knowledge, such prior knowledge approaches become more powerful. This paper provides a thorough overview of the status quo of biomarker detection on gene expression data with prior biological knowledge. We discuss current shortcomings of traditional approaches, review recent external knowledge bases, provide a classification and qualitative comparison of existing prior knowledge approaches and discuss open challenges for this kind of gene selection.
Many important graph-theoretic notions can be encoded as counting graph homomorphism problems, such as partition functions in statistical physics, in particular independent sets and colourings. In this article, we study the complexity of #_p HOMSTOH, the problem of counting graph homomorphisms from an input graph to a graph H modulo a prime number p. Dyer and Greenhill proved a dichotomy stating that the tractability of non-modular counting graph homomorphisms depends on the structure of the target graph. Many intractable cases in non-modular counting become tractable in modular counting due to the common phenomenon of cancellation. In subsequent studies on counting modulo 2, however, the influence of the structure of H on the tractability was shown to persist, which yields similar dichotomies. Our main result states that for every tree H and every prime p the problem #_p HOMSTOH is either polynomial-time computable or #_p P-complete. This relates to the conjecture of Faben and Jerrum stating that this dichotomy holds for every graph H when counting modulo 2. In contrast to previous results on modular counting, the tractable cases of #_p HOMSTOH are essentially the same for all values of the modulus when H is a tree. To prove this result, we study the structural properties of a homomorphism. As an important interim result, our study yields a dichotomy for the problem of counting weighted independent sets in a bipartite graph modulo some prime p. These results are the first suggesting that such dichotomies hold not only for the modulo 2 case but also for the modular counting functions of all primes p.
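To make the problem statement concrete, the following brute-force sketch counts homomorphisms modulo a prime p. It is exponential in the size of the input graph and serves only to illustrate the definition, not the dichotomy or any efficient algorithm.

    # Brute-force illustration of #_p HOMSTOH: count homomorphisms from G to H mod p.
    from itertools import product

    def count_homs_mod_p(G_vertices, G_edges, H_vertices, H_edges, p):
        H_adj = {(u, v) for (u, v) in H_edges} | {(v, u) for (u, v) in H_edges}
        count = 0
        for image in product(H_vertices, repeat=len(G_vertices)):
            phi = dict(zip(G_vertices, image))
            if all((phi[u], phi[v]) in H_adj for (u, v) in G_edges):
                count += 1
        return count % p

    # Example: homomorphisms from a triangle into K_3, counted modulo 5.
    print(count_homs_mod_p([0, 1, 2], [(0, 1), (1, 2), (0, 2)],
                           [0, 1, 2], [(0, 1), (1, 2), (0, 2)], p=5))  # 6 mod 5 = 1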
Spreadsheets are among the most commonly used file formats for data management, distribution, and analysis. Their widespread use makes it easy to gather large collections of data, but their flexible canvas-based structure makes automated analysis difficult without heavy preparation. One of the common problems that practitioners face is the presence of multiple, independent regions in a single spreadsheet, possibly separated by repeated empty cells. We define such files as "multiregion" files. In collections of various spreadsheets, we can observe that some share the same layout. We present the Mondrian approach to automatically identify layout templates across multiple files and systematically extract the corresponding regions. Our approach is composed of three phases: first, each file is rendered as an image and inspected for elements that could form regions; then, using a clustering algorithm, the identified elements are grouped to form regions; finally, every file layout is represented as a graph and compared with others to find layout templates. We compare our method to state-of-the-art table recognition algorithms on two corpora of real-world enterprise spreadsheets. Our approach shows the best performance in detecting reliable region boundaries within each file and can correctly identify recurring layouts across files.
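The following simplified sketch illustrates the first two phases (detecting elements and grouping them into regions) by running connected components over the non-empty cells of a sheet. This is a stand-in for Mondrian's image-rendering and clustering phases, whose details are not reproduced here.

    # Simplified multiregion detection: group non-empty cells into bounding boxes
    # via connected components over an 8-neighborhood (illustrative stand-in).
    def find_regions(grid):
        """grid: 2D list of cell values; empty cells are None or ''."""
        filled = {(r, c) for r, row in enumerate(grid)
                  for c, v in enumerate(row) if v not in (None, "")}
        seen, regions = set(), []
        for cell in filled:
            if cell in seen:
                continue
            stack, component = [cell], []
            while stack:                      # flood fill over the 8-neighborhood
                r, c = stack.pop()
                if (r, c) in seen or (r, c) not in filled:
                    continue
                seen.add((r, c))
                component.append((r, c))
                stack += [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            rows, cols = [r for r, _ in component], [c for _, c in component]
            regions.append((min(rows), min(cols), max(rows), max(cols)))
        return regions

    sheet = [["Year", "Sales", "", "", "Region", "Head"],
             [2020,   42,     "", "", "EMEA",   "Ada"],
             [2021,   57,     "", "", "APAC",   "Bob"]]
    print(find_regions(sheet))   # two bounding boxes, one per independent region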
Large real-world networks typically follow a power-law degree distribution. To study such networks, numerous random graph models have been proposed. However, real-world networks are not drawn at random. Therefore, Brach et al. (27th symposium on discrete algorithms (SODA), pp 1306-1325, 2016) introduced two natural deterministic conditions: (1) a power-law upper bound on the degree distribution (PLB-U) and (2) power-law neighborhoods, that is, the degree distribution of neighbors of each vertex is also upper bounded by a power law (PLB-N). They showed that many real-world networks satisfy both properties and exploit them to design faster algorithms for a number of classical graph problems. We complement their work by showing that some well-studied random graph models exhibit both of the mentioned PLB properties. PLB-U and PLB-N hold with high probability for Chung-Lu Random Graphs and Geometric Inhomogeneous Random Graphs and almost surely for Hyperbolic Random Graphs. As a consequence, all results of Brach et al. also hold with high probability or almost surely for those random graph classes. In the second part we study three classical NP-hard optimization problems on PLB networks. It is known that on general graphs with maximum degree Δ, a greedy algorithm, which chooses nodes in the order of their degree, only achieves an Ω(ln Δ)-approximation for Minimum Vertex Cover and Minimum Dominating Set, and an Ω(Δ)-approximation for Maximum Independent Set. We prove that the PLB-U property with β > 2 suffices for the greedy approach to achieve a constant-factor approximation for all three problems. We also show that these problems are APX-hard even if PLB-U, PLB-N, and an additional power-law lower bound on the degree distribution hold. Hence, a PTAS cannot be expected unless P = NP. Furthermore, we prove that all three problems are in MAX SNP if the PLB-U property holds.
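The degree-ordered greedy heuristic discussed above can be sketched for Minimum Vertex Cover: consider vertices from highest to lowest degree and take a vertex whenever one of its edges is still uncovered. This is illustrative only; the constant-factor guarantee on PLB-U networks comes from the analysis in the paper, not from this code.

    # Degree-ordered greedy heuristic for Minimum Vertex Cover (illustrative).
    def greedy_vertex_cover(adj):
        """adj: dict mapping each vertex to a set of neighbors."""
        cover = set()
        for v in sorted(adj, key=lambda u: len(adj[u]), reverse=True):
            if any(w not in cover for w in adj[v]):
                cover.add(v)   # v still has an uncovered incident edge, so take it
        return cover

    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print(greedy_vertex_cover(star))   # {0}: the high-degree center covers all edges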
In recent years, the increased interest in application areas such as social networks has resulted in a rising popularity of graph-based approaches for storing and processing large amounts of interconnected data. To extract useful information from the growing network structures, efficient querying techniques are required.
In this paper, we propose an approach for graph pattern matching that allows a uniform handling of arbitrary constraints over the query vertices. Our technique builds on a previously introduced matching algorithm, which takes concrete host graph information into account to dynamically adapt the employed search plan during query execution. The dynamic algorithm is combined with an existing static approach for search plan generation, resulting in a hybrid technique which we further extend by a more sophisticated handling of filtering effects caused by constraint checks. We evaluate the presented concepts empirically based on an implementation for our graph pattern matching tool, the Story Diagram Interpreter, with queries and data provided by the LDBC Social Network Benchmark. Our results suggest that the hybrid technique improves search efficiency in several cases and rarely reduces it.
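The following toy sketch illustrates the underlying idea of constraint-aware, dynamically adapted search plans: at each step, the query vertex with the fewest remaining host candidates (after constraint filtering) is bound first. The host graph, query, and predicates are hypothetical; this is not the Story Diagram Interpreter's actual algorithm.

    # Toy constraint-aware graph pattern matching with a dynamic search plan.
    def match(query_vertices, constraints, host_vertices, binding=None):
        """constraints: dict query_vertex -> predicate(host_vertex, partial_binding)."""
        binding = binding or {}
        unbound = [q for q in query_vertices if q not in binding]
        if not unbound:
            yield dict(binding)
            return
        # dynamic plan: estimate remaining candidates and bind the most constrained vertex
        candidates = {q: [h for h in host_vertices if constraints[q](h, binding)]
                      for q in unbound}
        q = min(candidates, key=lambda k: len(candidates[k]))
        for h in candidates[q]:
            binding[q] = h
            yield from match(query_vertices, constraints, host_vertices, binding)
            del binding[q]

    host = {"alice": {"city": "Potsdam", "knows": {"bob", "carol"}},
            "bob":   {"city": "Berlin",  "knows": set()},
            "carol": {"city": "Berlin",  "knows": {"bob"}}}
    # query: a --knows--> b, with the constraint that b lives in Berlin
    constraints = {
        "a": lambda h, bnd: "b" not in bnd or bnd["b"] in host[h]["knows"],
        "b": lambda h, bnd: host[h]["city"] == "Berlin"
                            and ("a" not in bnd or h in host[bnd["a"]]["knows"]),
    }
    print(list(match(["a", "b"], constraints, list(host))))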
The automated detection of sequential anomalies in time series is an essential task for many applications, such as the monitoring of technical systems, fraud detection in high-frequency trading, or the early detection of disease symptoms. All these applications require the detection to find all sequential anomalies as fast as possible on potentially very large time series. In other words, the detection needs to be effective, efficient, and scalable w.r.t. the input size. Series2Graph is an effective solution based on graph embeddings that is robust against re-occurring anomalies, can discover sequential anomalies of arbitrary length, and works without training data. Yet, Series2Graph is not scalable due to its single-threaded approach; in particular, it cannot process arbitrarily large sequences due to the memory constraints of a single machine. In this paper, we propose our distributed anomaly detection system, DADS for short, which is an efficient and scalable adaptation of Series2Graph. Based on the actor programming model, DADS distributes the input time sequence, intermediate state, and the computation to all processors of a cluster in a way that minimizes communication costs and synchronization barriers. Our evaluation shows that DADS is orders of magnitude faster than Series2Graph, scales almost linearly with the number of processors in the cluster, and can process much larger input sequences due to its scale-out property.
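The distribution idea can be sketched as follows: a long time series is split into overlapping chunks so that no sliding window is lost, and each chunk is scored by a separate worker. A process pool stands in for the actor model, and the placeholder scoring function is hypothetical; the real system's partitioning, graph construction, and merging are considerably more involved.

    # Sketch of distributed subsequence scoring over overlapping chunks.
    from multiprocessing import Pool

    WINDOW = 50

    def subsequence_scores(chunk_with_offset):
        offset, chunk = chunk_with_offset
        # placeholder scoring: deviation of each window's mean from the chunk mean
        chunk_mean = sum(chunk) / len(chunk)
        return [(offset + i, abs(sum(chunk[i:i + WINDOW]) / WINDOW - chunk_mean))
                for i in range(len(chunk) - WINDOW + 1)]

    def distributed_scores(series, n_workers=4):
        chunk_len = len(series) // n_workers
        chunks = [(i * chunk_len, series[i * chunk_len:(i + 1) * chunk_len + WINDOW - 1])
                  for i in range(n_workers)]          # overlap so no window is lost
        with Pool(n_workers) as pool:
            parts = pool.map(subsequence_scores, chunks)
        return [score for part in parts for score in part]

    if __name__ == "__main__":
        import random
        series = [random.gauss(0, 1) for _ in range(10_000)]
        series[5_000:5_050] = [8.0] * 50              # injected anomaly
        top = max(distributed_scores(series), key=lambda t: t[1])
        print("highest-scoring window starts at index", top[0])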
The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is both time-consuming and requires extensive domain expertise. We propose a deep Siamese neural network capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. Using the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluate our method on multiple datasets and compare our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.
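A minimal Siamese sketch is shown below: a shared encoder embeds both records of a pair, and their cosine similarity serves as the learned similarity measure. The feature encoding, architecture, and training setup are assumptions for illustration and differ from the paper's model.

    # Minimal Siamese-network sketch for learned record similarity (illustrative).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SiameseMatcher(nn.Module):
        def __init__(self, n_features: int, hidden: int = 64):
            super().__init__()
            # the same encoder (shared weights) embeds both records of a pair
            self.encoder = nn.Sequential(
                nn.Linear(n_features, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )

        def forward(self, record_a, record_b):
            emb_a, emb_b = self.encoder(record_a), self.encoder(record_b)
            return F.cosine_similarity(emb_a, emb_b)   # similarity score per pair

    model = SiameseMatcher(n_features=20)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    a, b = torch.randn(32, 20), torch.randn(32, 20)    # a batch of record pairs
    labels = torch.randint(0, 2, (32,)).float()        # 1 = duplicate, 0 = distinct
    for _ in range(5):                                 # tiny training loop
        similarity = model(a, b)
        loss = F.binary_cross_entropy_with_logits(similarity, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

For the knowledge transfer described in the abstract, one would reuse the weights of an encoder trained on one dataset as the starting point for fine-tuning on another.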
Challenges for self-driving database systems, which tune their physical design and configuration autonomously, are manifold: Such systems have to anticipate future workloads, find robust configurations efficiently, and incorporate knowledge gained by previous actions into later decisions. We present a component-based framework for self-driving database systems that enables database integration and development of self-managing functionality with low overhead by relying on separation of concerns. By keeping the components of the framework reusable and exchangeable, experiments are simplified, which promotes further research in that area. Moreover, to optimize multiple mutually dependent features, e.g., index selection and compression configurations, we propose a linear programming (LP) based algorithm to derive an efficient tuning order automatically. Afterwards, we demonstrate the applicability and scalability of our approach with reproducible examples.
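The paper derives the tuning order with a linear-programming formulation whose details are not reproduced here. As a rough stand-in, the sketch below orders mutually dependent tuning components by a declared dependency graph using a topological sort; the component names and dependencies are hypothetical.

    # Rough stand-in for deriving a tuning order over dependent features
    # (topological sort over declared dependencies; not the paper's LP algorithm).
    from graphlib import TopologicalSorter

    dependencies = {
        "compression": {"index_selection"},   # compression depends on chosen indexes
        "buffer_size": {"compression"},
        "index_selection": set(),
    }
    print(list(TopologicalSorter(dependencies).static_order()))
    # ['index_selection', 'compression', 'buffer_size']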
Evaluating the performance of self-adaptive systems is challenging due to their interactions with often highly dynamic environments. In the specific case of self-healing systems, the performance evaluation of self-healing approaches and their parameter tuning relies on the considered characteristics of failure occurrences and the resulting interactions with the self-healing actions. In this paper, we first study the state of the art for evaluating the performance of self-healing systems by means of a systematic literature review. We provide a classification of different input types for such systems and analyse the limitations of each input type. A main finding is that the employed inputs are often not sophisticated regarding the considered characteristics of failure occurrences. To further study the impact of the identified limitations, we present experiments demonstrating that wrong assumptions regarding the characteristics of the failure occurrences can result in large performance prediction errors, disadvantageous design-time decisions concerning the selection of alternative self-healing approaches, and disadvantageous deployment-time decisions concerning parameter tuning. Furthermore, the experiments indicate that employing multiple alternative input characteristics can help reduce the risk of premature disadvantageous design-time decisions.
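To illustrate how assumed failure-occurrence characteristics shape the evaluation input, the sketch below generates two synthetic failure traces: one with independent (Poisson-like) arrivals and one with bursts. The parameters are hypothetical and not those used in the paper's experiments.

    # Two synthetic failure traces with different occurrence characteristics.
    import random

    def single_failure_trace(rate_per_hour, hours):
        t, events = 0.0, []
        while t < hours:
            t += random.expovariate(rate_per_hour)   # independent inter-arrival times
            events.append(t)
        return events[:-1]                            # drop the event beyond the horizon

    def bursty_failure_trace(bursts, burst_size, hours):
        events = []
        for _ in range(bursts):
            start = random.uniform(0, hours)
            events += [start + i * 0.01 for i in range(burst_size)]   # tight burst
        return sorted(events)

    random.seed(1)
    print(len(single_failure_trace(rate_per_hour=2, hours=24)), "isolated failures")
    print(len(bursty_failure_trace(bursts=6, burst_size=8, hours=24)), "failures in bursts")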
3D point clouds are a digital representation of our world and used in a variety of applications. They are captured with LiDAR or derived by image-matching approaches to get surface information of objects, e.g., indoor scenes, buildings, infrastructures, cities, and landscapes. We present novel interaction and visualization techniques for heterogeneous, time-variant, and semantically rich 3D point clouds. Interactive and view-dependent see-through lenses are introduced as exploration tools to enhance recognition of objects, semantics, and temporal changes within 3D point cloud depictions. We also develop filtering and highlighting techniques that are used to dissolve occlusion and give context-specific insights. All techniques can be combined with an out-of-core real-time rendering system for massive 3D point clouds. We have evaluated the presented approach with 3D point clouds from different application domains. The results demonstrate the usability of the techniques and show how different visualization and exploration tasks can be improved for a variety of domain-specific applications.
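A simplified sketch of a see-through lens on a semantically attributed point cloud is shown below: points of an occluding class inside the lens volume are faded out, while points of the class of interest are highlighted. The data and class labels are synthetic; the actual system operates on massive out-of-core point clouds with real-time rendering.

    # Simplified see-through lens: fade occluders and highlight a focus class.
    import numpy as np

    points = np.random.rand(100_000, 3) * 100           # x, y, z in metres
    classes = np.random.randint(0, 3, 100_000)          # 0=ground, 1=vegetation, 2=building

    def apply_lens(points, classes, lens_center, lens_radius, hide_class, focus_class):
        inside = np.linalg.norm(points - lens_center, axis=1) < lens_radius
        opacity = np.ones(len(points))
        opacity[inside & (classes == hide_class)] = 0.1   # dissolve occluders in the lens
        highlight = inside & (classes == focus_class)     # emphasise the class of interest
        return opacity, highlight

    opacity, highlight = apply_lens(points, classes,
                                    lens_center=np.array([50.0, 50.0, 10.0]),
                                    lens_radius=15.0, hide_class=1, focus_class=2)
    print(int(highlight.sum()), "building points highlighted inside the lens")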
Bidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes, though, which is why existing bOD discovery algorithms cannot process datasets of practically relevant size in a reasonable time. In this paper, we propose the distributed bOD discovery algorithm DISTOD, whose execution time scales with the available hardware. DISTOD is a scalable, robust, and elastic bOD discovery approach that combines efficient pruning techniques for bOD candidates in set-based canonical form with a novel, reactive, and distributed search strategy. Our evaluation on various datasets shows that DISTOD outperforms both single-threaded and distributed state-of-the-art bOD discovery algorithms by up to orders of magnitude; it can, in particular, process much larger datasets.
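The following toy check makes the notion of a bOD concrete using the book example from the abstract: does ordering by publication year in ascending order also order by age in descending order? DISTOD's actual discovery works on set-based canonical forms with candidate pruning and distribution, none of which is shown here.

    # Toy check of a single bidirectional order dependency candidate.
    books = [("1998", 27), ("2005", 20), ("2013", 12), ("2020", 5)]  # (year, age)

    def holds_bod(rows, asc_index, desc_index):
        ordered = sorted(rows, key=lambda r: r[asc_index])          # sort ascending on LHS
        values = [r[desc_index] for r in ordered]
        return all(a >= b for a, b in zip(values, values[1:]))      # RHS non-increasing?

    print(holds_bod(books, asc_index=0, desc_index=1))   # True: year asc <=> age desc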
We contribute to the theoretical understanding of randomized search heuristics by investigating their optimization behavior on satisfiable random k-satisfiability instances, both in the planted solution model and in the uniform model conditional on satisfiability. Denoting the number of variables by n, our main technical result is that the simple (1+1) evolutionary algorithm with high probability finds a satisfying assignment in polynomial time when the clause-variable density is at least logarithmic. For low-density instances, evolutionary algorithms seem to be less effective, and all we can show is a subexponential upper bound on the runtime for densities below this threshold. We complement these mathematical results with numerical experiments on a broader density spectrum. They indicate that, indeed, the (1+1) EA is less efficient on lower densities. Our experiments also suggest that the implicit constants hidden in our main runtime guarantee are low. Our main result extends and considerably improves the result obtained by Sutton and Neumann (Lect Notes Comput Sci 8672:942-951, 2014) in terms of runtime, minimum density, and clause length. These improvements are made possible by establishing a close fitness-distance correlation in certain parts of the search space. This approach might be of independent interest and could be useful for other average-case analyses of randomized search heuristics. While the notion of a fitness-distance correlation has been around for a long time, to the best of our knowledge, this is the first time that fitness-distance correlation is explicitly used to rigorously prove a performance statement for an evolutionary algorithm.
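The algorithm under study can be sketched as follows: the (1+1) EA flips each bit independently with probability 1/n and keeps the offspring if it satisfies at least as many clauses. The instance generator and parameters below are arbitrary illustrations of the planted solution model; the theoretical guarantees above concern asymptotic behavior, not this toy run.

    # (1+1) EA on a small planted random 3-CNF instance (illustrative only).
    import random

    def random_planted_3sat(n, m):
        planted = [random.random() < 0.5 for _ in range(n)]
        clauses = []
        while len(clauses) < m:
            vars_ = random.sample(range(n), 3)
            clause = [(v, random.random() < 0.5) for v in vars_]   # (variable, negated?)
            if any(planted[v] != neg for v, neg in clause):        # keep it satisfiable
                clauses.append(clause)
        return clauses

    def satisfied(clauses, assignment):
        return sum(any(assignment[v] != neg for v, neg in c) for c in clauses)

    def one_plus_one_ea(clauses, n, max_iters=100_000):
        x = [random.random() < 0.5 for _ in range(n)]
        fx = satisfied(clauses, x)
        for step in range(max_iters):
            y = [not b if random.random() < 1 / n else b for b in x]   # standard bit mutation
            fy = satisfied(clauses, y)
            if fy >= fx:
                x, fx = y, fy
            if fx == len(clauses):
                return step
        return None

    random.seed(0)
    n = 30
    clauses = random_planted_3sat(n, m=10 * n)   # density well above logarithmic for this n
    print("satisfying assignment found after", one_plus_one_ea(clauses, n), "steps")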
Generating a novel and descriptive caption of an image is drawing increasing interest in the computer vision, natural language processing, and multimedia communities. In this work, we propose an end-to-end trainable deep bidirectional LSTM (Bi-LSTM (Long Short-Term Memory)) model to address the problem. By combining a deep convolutional neural network (CNN) and two separate LSTM networks, our model is capable of learning long-term visual-language interactions by making use of history and future context information in a high-level semantic space. We also explore deep multimodal bidirectional models, in which we increase the depth of the nonlinearity transition in different ways to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale, and vertical mirror are proposed to prevent over-fitting when training deep models. To understand how our models "translate" an image into a sentence, we visualize and qualitatively analyze the evolution of the Bi-LSTM internal states over time. The effectiveness and generality of the proposed models are evaluated on four benchmark datasets: Flickr8K, Flickr30K, MSCOCO, and Pascal1K. We demonstrate that Bi-LSTM models achieve highly competitive performance on both caption generation and image-sentence retrieval even without integrating an additional mechanism (e.g., object detection, attention model). Our experiments also show that multi-task learning is beneficial for increasing model generality and gaining performance. We also demonstrate that transfer learning of the Bi-LSTM model significantly outperforms previous methods on the Pascal1K dataset.
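A minimal sketch of the CNN + bidirectional LSTM idea is shown below: a tiny CNN produces an image embedding that is prepended to the word embeddings of the caption, and a bidirectional LSTM predicts word scores per position. The architecture, dimensions, and vocabulary are stand-ins; the paper's Bi-LSTM models and training procedure differ.

    # Minimal CNN + bidirectional LSTM captioning sketch (illustrative only).
    import torch
    import torch.nn as nn

    class BiLSTMCaptioner(nn.Module):
        def __init__(self, vocab_size, embed_dim=256, hidden=256):
            super().__init__()
            self.cnn = nn.Sequential(                       # tiny stand-in for a deep CNN
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, embed_dim),
            )
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
            self.out = nn.Linear(2 * hidden, vocab_size)    # both directions feed the output

        def forward(self, images, captions):
            visual = self.cnn(images).unsqueeze(1)          # (batch, 1, embed_dim)
            words = self.embed(captions)                    # (batch, seq_len, embed_dim)
            sequence = torch.cat([visual, words], dim=1)    # prepend the image "token"
            hidden_states, _ = self.lstm(sequence)
            return self.out(hidden_states)                  # per-position word scores

    model = BiLSTMCaptioner(vocab_size=1000)
    images = torch.randn(4, 3, 64, 64)
    captions = torch.randint(0, 1000, (4, 12))
    print(model(images, captions).shape)                    # torch.Size([4, 13, 1000])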
Applications with different characteristics in the cloud may have different resource preferences. However, traditional resource allocation and scheduling strategies rarely take the characteristics of applications into account. Considering that I/O-intensive applications are a typical type of application and that frequent I/O accesses, especially small random accesses to the disk, may lead to an inefficient use of resources and reduce the quality of service (QoS) of applications, a weight allocation strategy is proposed based on the available resources that a physical server can provide as well as the characteristics of the applications. Using the obtained weights, a resource allocation and scheduling strategy is presented based on the specific application characteristics in the data center. Extensive experiments show that the strategy is correct and can guarantee a high rate of I/O operations per second (IOPS) in a cloud data center with high QoS. Additionally, the strategy can efficiently improve the utilization of the disk and the resources of the data center without affecting the service quality of applications.
Industry 4.0 and the Internet of Things are recent developments that have led to the creation of new kinds of manufacturing data. Linking this new kind of sensor data to traditional business information is crucial for enterprises to take advantage of the data's full potential. In this paper, we present a demo which allows experiencing this data integration, both vertically between technical and business contexts and horizontally along the value chain. The tool simulates a manufacturing company, continuously producing both business and sensor data, and supports issuing ad-hoc queries that answer specific questions related to the business. In order to adapt to different environments, users can configure sensor characteristics to their needs.
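An ad-hoc query of the kind described might link machine sensor readings to business records, for example to find which customers' orders ran on an overheating machine. The schema and data below are hypothetical illustrations, not the demo's actual data model.

    # Hypothetical ad-hoc query joining sensor data to business data.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE orders(order_id INTEGER, customer TEXT, machine_id INTEGER);
        CREATE TABLE sensor_readings(machine_id INTEGER, ts TEXT, temperature REAL);
        INSERT INTO orders VALUES (1, 'ACME', 7), (2, 'Globex', 8);
        INSERT INTO sensor_readings VALUES (7, '2024-05-01T10:00', 81.2),
                                           (7, '2024-05-01T10:05', 96.7),
                                           (8, '2024-05-01T10:00', 72.4);
    """)
    # Which customers' orders ran on a machine that overheated?
    query = """
        SELECT DISTINCT o.customer, o.order_id
        FROM orders o JOIN sensor_readings s ON o.machine_id = s.machine_id
        WHERE s.temperature > 90
    """
    print(db.execute(query).fetchall())   # [('ACME', 1)]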
Design and implementation of service-oriented architectures raise a large number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for the design and realization of complex web-based systems. Both approaches allow for dynamic application adaptation as well as the integration of enterprise applications.
Commonly used technologies, such as J2EE and .NET, form de facto standards for the realization of complex distributed systems. The evolution of component systems has led to web services and service-based architectures. This has been manifested in a multitude of industry standards and initiatives such as XML, WSDL, UDDI, and SOAP. All these achievements lead to a new and promising paradigm in IT systems engineering, which proposes to design complex software solutions as collaborations of contractually defined software services.
Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns.
The annual Ph.D. retreat of the Research School provides each member the opportunity to present the current state of their research and to give an outline of a prospective Ph.D. thesis. Due to the interdisciplinary structure of the Research School, this technical report covers a wide range of topics. These include but are not limited to: Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment.
Today, software has become an intrinsic part of complex distributed embedded real-time systems. The next generation of embedded real-time systems will interconnect today's unconnected systems via complex software parts and the service-oriented paradigm. Therefore, besides timed behavior and probabilistic behavior, structure dynamics is also required, where the architecture can be subject to changes at run-time, e.g., when dynamic binding of service end-points is employed or complex collaborations are established dynamically. However, a modeling and analysis approach that combines all these necessary aspects does not exist so far.
To fill the identified gap, we propose Probabilistic Timed Graph Transformation Systems (PTGTSs) as a high-level description language that supports all the necessary aspects of structure dynamics, timed behavior, and probabilistic behavior. We introduce the formal model of PTGTSs in this paper and present a mapping of models with finite state spaces to probabilistic timed automata (PTA) that makes it possible to use the PRISM model checker to analyze PTGTS models with respect to PTCTL properties.
Every year, the Hasso Plattner Institute (HPI) invites guests from industry and academia to a collaborative scientific workshop on the topic "Operating the Cloud". Our goal is to provide a forum for the exchange of knowledge and experience between industry and academia. Co-located with the event is the HPI's Future SOC Lab day, which offers an additional attractive and conducive environment for scientific and industry related discussions. "Operating the Cloud" aims to be a platform for productive interactions of innovative ideas, visions, and upcoming technologies in the field of cloud operation and administration.
On the occasion of this symposium we called for submissions of research papers and practitioner's reports. A compilation of the research papers presented at the fourth HPI cloud symposium "Operating the Cloud" 2016 is published in these proceedings. We thank the authors for exciting presentations and insights into their current work and research.
Moreover, we look forward to more interesting submissions for the upcoming symposium later in the year.
While offering significant expressive power, graph transformation systems often come with rather limited capabilities for automated analysis, particularly if systems with many possible initial graphs and large or infinite state spaces are concerned. One approach that tries to overcome these limitations is inductive invariant checking. However, the verification of inductive invariants often requires extensive knowledge about the system in question and faces the approach-inherent challenges of locality and lack of context.
To address that, this report discusses k-inductive invariant checking for graph transformation systems as a generalization of inductive invariant checking. The additional context acquired by taking multiple (k) steps into account is the key difference from inductive invariant checking and is often enough to establish the desired invariants without requiring the iterative development of additional properties.
To analyze possibly infinite systems in a finite fashion, we introduce a symbolic encoding for transformation traces using a restricted form of nested application conditions. As its central contribution, this report then presents a formal approach and algorithm to verify graph constraints as k-inductive invariants. We prove the approach's correctness and demonstrate its applicability by means of several examples evaluated with a prototypical implementation of our algorithm.
Graphs are ubiquitous in computer science. For this reason, in many areas, it is very important to have the means to express and reason about graph properties. In particular, we want to be able to check automatically whether a given graph property is satisfiable. Moreover, in most application scenarios it is desirable to be able to explore graphs satisfying the graph property, if they exist, or even to get a complete and compact overview of the graphs satisfying the graph property.
We show that the tableau-based reasoning method for graph properties as introduced by Lambers and Orejas paves the way for a symbolic model generation algorithm for graph properties. Graph properties are formulated in a dedicated logic making use of graphs and graph morphisms, which is equivalent to first-order logic on graphs as introduced by Courcelle. Our parallelizable algorithm gradually generates a finite set of so-called symbolic models, where each symbolic model describes a set of finite graphs (i.e., finite models) satisfying the graph property. The set of symbolic models jointly describes all finite models for the graph property (complete) and does not describe any finite graph violating the graph property (sound). Moreover, no symbolic model is already covered by another one (compact). Finally, the algorithm is able to generate from each symbolic model a minimal finite model immediately and allows for an exploration of further finite models. The algorithm is implemented in the new tool AutoGraph.
The correctness of model transformations is a crucial element for model-driven engineering of high-quality software. In particular, behavior preservation is the most important correctness property, avoiding the introduction of semantic errors during the model-driven engineering process. Behavior preservation verification techniques either show that specific properties are preserved or, more generally and with greater complexity, they show some kind of behavioral equivalence or refinement between the source and target model of the transformation. Both kinds of behavior preservation verification goals have been presented with automatic tool support for the instance level, i.e., for a given source and target model specified by the model transformation. However, until now there has been no automatic verification approach available at the transformation level, i.e., for all source and target models specified by the model transformation.
In this report, we extend our results presented in [27] and outline a new sophisticated approach for the automatic verification of behavior preservation captured by bisimulation or, respectively, simulation for model transformations specified by triple graph grammars and semantic definitions given by graph transformation rules. In particular, we show that the behavior preservation problem can be reduced to invariant checking for graph transformation, and that the resulting checking problem can be addressed by our own invariant checker even for a complex example in which a sequence chart is transformed into communicating automata. We further discuss today's limitations of invariant checking for graph transformation and motivate further lines of future work in this direction.