TY - GEN A1 - Sahlmann, Kristina A1 - Scheffler, Thomas A1 - Schnor, Bettina T1 - Ontology-driven Device Descriptions for IoT Network Management T2 - 2018 Global Internet of Things Summit (GIoTS) N2 - One particular challenge in the Internet of Things is the management of many heterogeneous things. The things are typically constrained devices with limited memory, power, network and processing capacity. Configuring every device manually is a tedious task. We propose an interoperable way to configure an IoT network automatically using existing standards. The proposed NETCONF-MQTT bridge intermediates between the constrained devices (speaking MQTT) and the network management standard NETCONF. The NETCONF-MQTT bridge dynamically generates YANG data models from the semantic description of the device capabilities based on the oneM2M ontology. We evaluate the approach for two use cases, i.e. describing an actuator and a sensor scenario. KW - Internet of Things KW - Interoperability KW - oneM2M KW - Ontology KW - Semantic Web KW - NETCONF KW - YANG KW - MQTT Y1 - 2018 SN - 978-1-5386-6451-3 U6 - https://doi.org/10.1109/GIOTS.2018.8534569 SP - 295 EP - 300 PB - IEEE CY - New York ER - TY - GEN A1 - Elsaid, Mohamed Esam A1 - Shawish, Ahmed A1 - Meinel, Christoph T1 - Enhanced cost analysis of multiple virtual machines live migration in VMware environments T2 - 2018 IEEE 8th International Symposium on Cloud and Service Computing (SC2) N2 - Live migration is an important feature in modern software-defined datacenters and cloud computing environments. Dynamic resource management, load balance, power saving and fault tolerance are all dependent on the live migration feature. Despite the importance of live migration, the cost of live migration cannot be ignored and may result in service availability degradation. Live migration cost includes the migration time, downtime, CPU overhead, network and power consumption. 
There are many research articles that discuss the problem of live migration cost with different scopes, such as analyzing the cost and relating it to the parameters that control it, proposing new migration algorithms that minimize the cost, and predicting the migration cost. To the best of our knowledge, most of the papers that discuss the migration cost problem focus on open source hypervisors. Among the research articles that focus on VMware environments, none has proposed migration time, network overhead and power consumption modeling for single and multiple VMs live migration. In this paper, we propose empirical models for the live migration time, network overhead and power consumption for single and multiple VMs migration. The proposed models are obtained using a VMware based testbed. Y1 - 2018 SN - 978-1-7281-0236-8 U6 - https://doi.org/10.1109/SC2.2018.00010 SP - 16 EP - 23 PB - IEEE CY - New York ER - TY - GEN A1 - Kötzing, Timo A1 - Krejca, Martin Stefan T1 - First-Hitting times under additive drift T2 - Parallel Problem Solving from Nature – PPSN XV, PT II N2 - For the last ten years, almost every theoretical result concerning the expected run time of a randomized search heuristic used drift theory, making it the arguably most important tool in this domain. Its success is due to its ease of use and its powerful result: drift theory allows the user to derive bounds on the expected first-hitting time of a random process by bounding expected local changes of the process - the drift. This is usually far easier than bounding the expected first-hitting time directly. Due to the widespread use of drift theory, it is of utmost importance to have the best drift theorems possible. We improve the fundamental additive, multiplicative, and variable drift theorems by stating them in a form as general as possible and providing examples of why the restrictions we keep are still necessary. 
Our additive drift theorem for upper bounds only requires the process to be nonnegative, that is, we remove unnecessary restrictions like a finite, discrete, or bounded search space. As corollaries, the same is true for our upper bounds in the case of variable and multiplicative drift. Y1 - 2018 SN - 978-3-319-99259-4 SN - 978-3-319-99258-7 U6 - https://doi.org/10.1007/978-3-319-99259-4_8 SN - 0302-9743 SN - 1611-3349 VL - 11102 SP - 92 EP - 104 PB - Springer CY - Cham ER - TY - GEN A1 - Kötzing, Timo A1 - Krejca, Martin Stefan T1 - First-Hitting times for finite state spaces T2 - Parallel Problem Solving from Nature – PPSN XV, PT II N2 - One of the most important aspects of a randomized algorithm is bounding its expected run time on various problems. Formally speaking, this means bounding the expected first-hitting time of a random process. The two arguably most popular tools to do so are the fitness level method and drift theory. The fitness level method considers arbitrary transition probabilities but only allows the process to move toward the goal. On the other hand, drift theory allows the process to move in any direction as long as it moves closer to the goal in expectation; however, this tendency has to be monotone and, thus, the transition probabilities cannot be arbitrary. We provide a result that combines the benefit of these two approaches: our result gives a lower and an upper bound for the expected first-hitting time of a random process over {0,..., n} that is allowed to move forward and backward by 1 and can use arbitrary transition probabilities. In case that the transition probabilities are known, our bounds coincide and yield the exact value of the expected first-hitting time. Further, we also state the stationary distribution as well as the mixing time of a special case of our scenario. 
Y1 - 2018 SN - 978-3-319-99259-4 SN - 978-3-319-99258-7 U6 - https://doi.org/10.1007/978-3-319-99259-4_7 SN - 0302-9743 SN - 1611-3349 VL - 11102 SP - 79 EP - 91 PB - Springer CY - Cham ER - TY - GEN A1 - Kötzing, Timo A1 - Lagodzinski, Gregor J. A. A1 - Lengler, Johannes A1 - Melnichenko, Anna T1 - Destructiveness of Lexicographic Parsimony Pressure and Alleviation by a Concatenation Crossover in Genetic Programming T2 - Parallel Problem Solving from Nature – PPSN XV N2 - For theoretical analyses there are two specifics distinguishing GP from many other areas of evolutionary computation. First, the variable size representations, in particular yielding a possible bloat (i.e. the growth of individuals with redundant parts). Second, the role and realization of crossover, which is particularly central in GP due to the tree-based representation. Whereas some theoretical work on GP has studied the effects of bloat, crossover has had a surprisingly small share in this work. We analyze a simple crossover operator in combination with local search, where a preference for small solutions minimizes bloat (lexicographic parsimony pressure); the resulting algorithm is denoted Concatenation Crossover GP. For this purpose three variants of the well-studied Majority test function with large plateaus are considered. We show that the Concatenation Crossover GP can efficiently optimize these test functions, while local search cannot be efficient for all three variants independent of employing bloat control. 
Y1 - 2018 SN - 978-3-319-99259-4 SN - 978-3-319-99258-7 U6 - https://doi.org/10.1007/978-3-319-99259-4_4 SN - 0302-9743 SN - 1611-3349 VL - 11102 SP - 42 EP - 54 PB - Springer CY - Cham ER - TY - GEN A1 - Perscheid, Cindy A1 - Faber, Lukas A1 - Kraus, Milena A1 - Arndt, Paul A1 - Janke, Michael A1 - Rehfeldt, Sebastian A1 - Schubotz, Antje A1 - Slosarek, Tamara A1 - Uflacker, Matthias T1 - A tissue-aware gene selection approach for analyzing multi-tissue gene expression data T2 - 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) N2 - High-throughput RNA sequencing (RNAseq) produces large data sets containing expression levels of thousands of genes. The analysis of RNAseq data leads to a better understanding of gene functions and interactions, which eventually helps to study diseases like cancer and develop effective treatments. Large-scale RNAseq expression studies on cancer comprise samples from multiple cancer types and aim to identify their distinct molecular characteristics. Analyzing samples from different cancer types implies analyzing samples from different tissue origins. Such multi-tissue RNAseq data sets require a meaningful analysis that accounts for the inherent tissue-related bias: The identified characteristics must not originate from the differences in tissue types, but from the actual differences in cancer types. However, current analysis procedures do not incorporate that aspect. As a result, we propose to integrate tissue-awareness into the analysis of multi-tissue RNAseq data. We introduce an extension for gene selection that provides a tissue-wise context for every gene and can be flexibly combined with any existing gene selection approach. We suggest expanding conventional evaluation by additional metrics that are sensitive to the tissue-related bias. Evaluations show that especially low complexity gene selection approaches profit from introducing tissue-awareness. 
KW - RNAseq KW - gene selection KW - tissue-awareness KW - TCGA KW - GTEx Y1 - 2018 SN - 978-1-5386-5488-0 U6 - https://doi.org/10.1109/BIBM.2018.8621189 SN - 2156-1125 SN - 2156-1133 SP - 2159 EP - 2166 PB - IEEE CY - New York ER - TY - GEN A1 - Bin Tareaf, Raad A1 - Berger, Philipp A1 - Hennig, Patrick A1 - Meinel, Christoph T1 - Personality exploration system for online social networks BT - Facebook brands as a use case T2 - 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI) N2 - User-generated content on social media platforms is a rich source of latent information about individual variables. Crawling and analyzing this content provides a new approach for enterprises to personalize services and put forward product recommendations. In the past few years, brands made a gradual appearance on social media platforms for advertisement, customer support and public relations purposes, and by now it has become a necessity throughout all branches. This online identity can be represented as a brand personality that reflects how a brand is perceived by its customers. We exploited recent research in text analysis and personality detection to build an automatic brand personality prediction model on top of the (Five-Factor Model) and (Linguistic Inquiry and Word Count) features extracted from publicly available benchmarks. The proposed model reported significant accuracy in predicting specific personality traits from brands. For evaluating our prediction results on actual brands, we crawled the Facebook API for 100k posts from the most valuable brands' pages in the USA and we visualize exemplars of comparison results and present suggestions for future directions. 
KW - Big Five Model KW - Brand Personality KW - Personality Prediction KW - Machine Learning KW - Social Media Analysis Y1 - 2019 SN - 978-1-5386-7325-6 U6 - https://doi.org/10.1109/WI.2018.00-76 SP - 301 EP - 309 PB - IEEE CY - New York ER - TY - GEN A1 - Andjelkovic, Marko A1 - Babic, Milan A1 - Li, Yuanqing A1 - Schrape, Oliver A1 - Krstić, Miloš A1 - Kraemer, Rolf T1 - Use of decoupling cells for mitigation of SET effects in CMOS combinational gates T2 - 2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS) N2 - This paper investigates the applicability of CMOS decoupling cells for mitigating the Single Event Transient (SET) effects in standard combinational gates. The concept is based on the insertion of two decoupling cells between the gate's output and the power/ground terminals. To verify the proposed hardening approach, extensive SPICE simulations have been performed with standard combinational cells designed in IHP's 130 nm bulk CMOS technology. Obtained simulation results have shown that the insertion of decoupling cells results in the increase of the gate's critical charge, thus reducing the gate's soft error rate (SER). Moreover, the decoupling cells facilitate the suppression of SET pulses propagating through the gate. It has been shown that the decoupling cells may be a competitive alternative to gate upsizing and gate duplication for hardening the gates with lower critical charge and multiple (3 or 4) inputs, as well as for filtering the short SET pulses induced by low-LET particles. KW - decoupling cells KW - radiation hardening KW - SET effects KW - CMOS technology KW - combinational logic Y1 - 2019 SN - 978-1-5386-9562-3 U6 - https://doi.org/10.1109/ICECS.2018.8617996 SP - 361 EP - 364 PB - IEEE CY - New York ER - TY - GEN A1 - Kayem, Anne Voluntas dei Massah A1 - Meinel, Christoph A1 - Wolthusen, Stephen D. 
T1 - Smart micro-grid systems security and privacy preface T2 - Smart micro-grid systems security and privacy N2 - Studies indicate that reliable access to power is an important enabler for economic growth. To this end, modern energy management systems have seen a shift from reliance on time-consuming manual procedures, to highly automated management, with current energy provisioning systems being run as cyber-physical systems. Operating energy grids as a cyber-physical system offers the advantage of increased reliability and dependability, but also raises issues of security and privacy. In this chapter, we provide an overview of the contents of this book showing the interrelation between the topics of the chapters in terms of smart energy provisioning. We begin by discussing the concept of smart-grids in general, proceeding to narrow our focus to smart micro-grids in particular. Lossy networks also provide an interesting framework for enabling the implementation of smart micro-grids in remote/rural areas, where deploying standard smart grids is economically and structurally infeasible. To this end, we consider an architectural design for a smart micro-grid suited to low-processing capable devices. We model malicious behaviour, and propose mitigation measures based on properties to distinguish normal from malicious behaviour. Y1 - 2018 SN - 978-3-319-91427-5 SN - 978-3-319-91426-8 U6 - https://doi.org/10.1007/978-3-319-91427-5_1 VL - 71 SP - VII EP - VIII PB - Springer CY - Dordrecht ER - TY - GEN A1 - Brand, Thomas A1 - Giese, Holger Burkhard T1 - Towards Generic Adaptive Monitoring T2 - 2018 IEEE 12th International Conference on Self-Adaptive and Self-Organizing Systems (SASO) N2 - Monitoring is a key prerequisite for self-adaptive software and many other forms of operating software. 
Monitoring relevant lower level phenomena like the occurrences of exceptions and diagnosis data requires carefully examining which detailed information is really necessary and feasible to monitor. Adaptive monitoring permits observing a greater variety of details with less overhead, if most of the time the MAPE-K loop can operate using only a small subset of all those details. However, engineering such adaptive monitoring is a major engineering effort on its own that further complicates the development of self-adaptive software. The proposed approach overcomes the outlined problems by providing generic adaptive monitoring via runtime models. It reduces the effort to introduce and apply adaptive monitoring by avoiding additional development effort for controlling the monitoring adaptation. Although the generic approach is independent from the monitoring purpose, it still allows for substantial savings regarding the monitoring resource consumption as demonstrated by an example. Y1 - 2019 SN - 978-1-5386-5172-8 U6 - https://doi.org/10.1109/SASO.2018.00027 SN - 1949-3673 SP - 156 EP - 161 PB - IEEE CY - New York ER - TY - GEN A1 - Blaesius, Thomas A1 - Eube, Jan A1 - Feldtkeller, Thomas A1 - Friedrich, Tobias A1 - Krejca, Martin Stefan A1 - Lagodzinski, Gregor J. A. A1 - Rothenberger, Ralf A1 - Severin, Julius A1 - Sommer, Fabian A1 - Trautmann, Justin T1 - Memory-restricted Routing With Tiled Map Data T2 - 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC) N2 - Modern routing algorithms reduce query time by depending heavily on preprocessed data. The recently developed Navigation Data Standard (NDS) enforces a separation between algorithms and map data, rendering preprocessing inapplicable. Furthermore, map data is partitioned into tiles with respect to their geographic coordinates. With the limited memory found in portable devices, the number of tiles loaded becomes the major factor for run time. 
We study routing under these restrictions and present new algorithms as well as empirical evaluations. Our results show that, on average, the most efficient algorithm presented uses more than 20 times fewer tile loads than a normal A*. Y1 - 2018 SN - 978-1-5386-6650-0 U6 - https://doi.org/10.1109/SMC.2018.00567 SN - 1062-922X SP - 3347 EP - 3354 PB - IEEE CY - New York ER - TY - GEN A1 - Podlesny, Nikolai Jannik A1 - Kayem, Anne V. D. M. A1 - von Schorlemer, Stephan A1 - Uflacker, Matthias T1 - Minimising Information Loss on Anonymised High Dimensional Data with Greedy In-Memory Processing T2 - Database and Expert Systems Applications, DEXA 2018, PT I N2 - Minimising information loss on anonymised high dimensional data is important for data utility. Syntactic data anonymisation algorithms address this issue by generating datasets that are neither use-case specific nor dependent on runtime specifications. This results in anonymised datasets that can be re-used in different scenarios which is performance efficient. However, syntactic data anonymisation algorithms incur high information loss on high dimensional data, making the data unusable for analytics. In this paper, we propose an optimised exact quasi-identifier identification scheme, based on the notion of k-anonymity, to generate anonymised high dimensional datasets efficiently, and with low information loss. The optimised exact quasi-identifier identification scheme works by identifying and eliminating maximal partial unique column combination (mpUCC) attributes that endanger anonymity. By using in-memory processing to handle the attribute selection procedure, we significantly reduce the processing time required. We evaluated the effectiveness of our proposed approach with an enriched dataset drawn from multiple real-world data sources, and augmented with synthetic values generated in close alignment with the real-world data distributions. 
Our results indicate that in-memory processing drops attribute selection time for the mpUCC candidates from 400s to 100s, while significantly reducing information loss. In addition, we achieve a time complexity speed-up of O(3^(n/3)) ≈ O(1.4422^n). Y1 - 2018 SN - 978-3-319-98809-2 SN - 978-3-319-98808-5 U6 - https://doi.org/10.1007/978-3-319-98809-2_6 SN - 0302-9743 SN - 1611-3349 VL - 11029 SP - 85 EP - 100 PB - Springer CY - Cham ER - TY - GEN A1 - Galke, Lukas A1 - Gerstenkorn, Gunnar A1 - Scherp, Ansgar T1 - A case study of closed-domain response suggestion with limited training data T2 - Database and Expert Systems Applications : DEXA 2018 International workshops N2 - We analyze the problem of response suggestion in a closed domain along a real-world scenario of a digital library. We present a text-processing pipeline to generate question-answer pairs from chat transcripts. On this limited amount of training data, we compare retrieval-based, conditioned-generation, and dedicated representation learning approaches for response suggestion. Our results show that retrieval-based methods that strive to find similar, known contexts are preferable over parametric approaches from the conditioned-generation family, when the training data is limited. We, however, identify a specific representation learning approach that is competitive to the retrieval-based approaches despite the training data limitation. Y1 - 2018 SN - 978-3-319-99133-7 SN - 978-3-319-99132-0 U6 - https://doi.org/10.1007/978-3-319-99133-7_18 SN - 1865-0929 SN - 1865-0937 VL - 903 SP - 218 EP - 229 PB - Springer CY - Berlin ER - TY - GEN A1 - Gross, Sascha A1 - Tiwari, Abhishek A1 - Hammer, Christian T1 - PIAnalyzer BT - a precise approach for pendingIntent vulnerability analysis T2 - Computer Security (ESORICS 2018), PT II N2 - In this work we propose PIAnalyzer, a novel approach to analyze PendingIntent related vulnerabilities. 
We empirically evaluate PIAnalyzer on a set of 1000 randomly selected applications from the Google Play Store and find 1358 insecure usages of PendingIntents, including 70 severe vulnerabilities. We manually inspected ten reported vulnerabilities, of which nine were correctly reported vulnerabilities, indicating a high precision. The evaluation shows that PIAnalyzer is efficient with an average execution time of 13 seconds per application. KW - Android KW - Intent analysis KW - Information flow control KW - Static analysis Y1 - 2018 SN - 978-3-319-98989-1 SN - 978-3-319-98988-4 U6 - https://doi.org/10.1007/978-3-319-98989-1_3 SN - 0302-9743 SN - 1611-3349 VL - 11099 SP - 41 EP - 59 PB - Springer CY - Cham ER - TY - GEN A1 - Fricke, Andreas A1 - Döllner, Jürgen Roland Friedrich A1 - Asche, Hartmut T1 - Servicification - Trend or Paradigm Shift in Geospatial Data Processing? T2 - Computational Science and Its Applications – ICCSA 2018, PT III N2 - Currently we are witnessing profound changes in the geospatial domain. Driven by recent ICT developments, such as web services, service-oriented computing or open-source software, an explosion of geodata and geospatial applications or rapidly growing communities of non-specialist users, the crucial issue is the provision and integration of geospatial intelligence in these rapidly changing, heterogeneous developments. This paper introduces the concept of Servicification into geospatial data processing. Its core idea is the provision of expertise through a flexible number of web-based software service modules. Selection and linkage of these services to user profiles, application tasks, data resources, or additional software allow for the compilation of flexible, time-sensitive geospatial data handling processes. 
Encapsulated in a string of discrete services, the approach presented here aims to provide non-specialist users with geospatial expertise required for the effective, professional solution of a defined application problem. Providing users with geospatial intelligence in the form of web-based, modular services, is a completely different approach to geospatial data processing. This novel concept puts geospatial intelligence, made available through services encapsulating rule bases and algorithms, in the centre and at the disposal of the users, regardless of their expertise. KW - Servicification KW - Geospatial intelligence KW - Spatial data handling systems Y1 - 2018 SN - 978-3-319-95168-3 SN - 978-3-319-95167-6 U6 - https://doi.org/10.1007/978-3-319-95168-3_23 SN - 0302-9743 SN - 1611-3349 VL - 10962 SP - 339 EP - 350 PB - Springer CY - Cham ER - TY - GEN A1 - Haarmann, Stephan A1 - Batoulis, Kimon A1 - Nikaj, Adriatik A1 - Weske, Mathias T1 - DMN Decision Execution on the Ethereum Blockchain T2 - Advanced Information Systems Engineering, CAISE 2018 N2 - Recently blockchain technology has been introduced to execute interacting business processes in a secure and transparent way. While the foundations for process enactment on blockchain have been researched, the execution of decisions on blockchain has not been addressed yet. In this paper we argue that decisions are an essential aspect of interacting business processes, and, therefore, also need to be executed on blockchain. The immutable representation of decision logic can be used by the interacting processes, so that decision taking will be more secure, more transparent, and better auditable. The approach is based on a mapping of the DMN language S-FEEL to Solidity code to be run on the Ethereum blockchain. The work is evaluated by a proof-of-concept prototype and an empirical cost evaluation. 
KW - Blockchain KW - Interacting processes KW - DMN Y1 - 2018 SN - 978-3-319-91563-0 SN - 978-3-319-91562-3 U6 - https://doi.org/10.1007/978-3-319-91563-0_20 SN - 0302-9743 SN - 1611-3349 VL - 10816 SP - 327 EP - 341 PB - Springer CY - Cham ER - TY - GEN A1 - Limberger, Daniel A1 - Gropler, Anne A1 - Buschmann, Stefan A1 - Döllner, Jürgen Roland Friedrich A1 - Wasty, Benjamin T1 - OpenLL BT - an API for Dynamic 2D and 3D Labeling T2 - 22nd International Conference Information Visualisation (IV) N2 - Today's rendering APIs lack robust functionality and capabilities for dynamic, real-time text rendering and labeling, which represent key requirements for 3D application design in many fields. As a consequence, most rendering systems are barely or not at all equipped with respective capabilities. This paper drafts the unified text rendering and labeling API OpenLL intended to complement common rendering APIs, frameworks, and transmission formats. For it, various uses of static and dynamic placement of labels are showcased and a text interaction technique is presented. Furthermore, API design constraints with respect to state-of-the-art text rendering techniques are discussed. This contribution is intended to initiate a community-driven specification of a free and open label library. KW - visualization KW - labeling KW - real-time rendering Y1 - 2018 SN - 978-1-5386-7202-0 U6 - https://doi.org/10.1109/iV.2018.00039 SP - 175 EP - 181 PB - IEEE CY - New York ER - TY - GEN A1 - Sianipar, Johannes Harungguan A1 - Sukmana, Muhammad Ihsan Haikal A1 - Meinel, Christoph T1 - Moving sensitive data against live memory dumping, spectre and meltdown attacks T2 - 26th International Conference on Systems Engineering (ICSEng) N2 - The emergence of cloud computing allows users to easily host their Virtual Machines with no up-front investment and the guarantee of being always available, anytime and anywhere. 
But when the Virtual Machine (VM) is hosted outside of the user's premises, the user loses the physical control of the VM as it could be running on untrusted host machines in the cloud. A malicious host administrator could launch live memory dumping, Spectre, or Meltdown attacks in order to extract sensitive information from the VM's memory, e.g. passwords or cryptographic keys of applications running in the VM. In this paper, inspired by the moving target defense (MTD) scheme, we propose a novel approach to increase the security of an application's sensitive data in the VM by continuously moving the sensitive data among several memory allocations (blocks) in Random Access Memory (RAM). A movement function is added into the application source code in order for the function to be running concurrently with the application's main function. Our approach could reduce the possibility of the VM's sensitive data in the memory being leaked into a memory dump file by 25% and secure the sensitive data from Spectre and Meltdown attacks. Our approach's overhead depends on the number and the size of the sensitive data. KW - Virtual Machine KW - Memory Dumping KW - Security KW - Cloud Computing KW - Spectre KW - Meltdown Y1 - 2019 SN - 978-1-5386-7834-3 PB - IEEE CY - New York ER - TY - GEN A1 - Risch, Julian A1 - Krestel, Ralf T1 - My Approach = Your Apparatus? BT - Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections T2 - Libraries N2 - Comparative text mining extends from genre analysis and political bias detection to the revelation of cultural and geographic differences, through to the search for prior art across patents and scientific papers. These applications use cross-collection topic modeling for the exploration, clustering, and comparison of large sets of documents, such as digital libraries. However, topic modeling on documents from different collections is challenging because of domain-specific vocabulary. 
We present a cross-collection topic model combined with automatic domain term extraction and phrase segmentation. This model distinguishes collection-specific and collection-independent words based on information entropy and reveals commonalities and differences of multiple text collections. We evaluate our model on patents, scientific papers, newspaper articles, forum posts, and Wikipedia articles. In comparison to state-of-the-art cross-collection topic modeling, our model achieves up to 13% higher topic coherence, up to 4% lower perplexity, and up to 31% higher document classification accuracy. More importantly, our approach is the first topic model that ensures disjunct general and specific word distributions, resulting in clear-cut topic representations. KW - Topic modeling KW - Automatic domain term extraction KW - Entropy Y1 - 2018 SN - 978-1-4503-5178-2 U6 - https://doi.org/10.1145/3197026.3197038 SN - 2575-7865 SN - 2575-8152 SP - 283 EP - 292 PB - Association for Computing Machinery CY - New York ER - TY - GEN A1 - Patalas-Maliszewska, Justyna A1 - Krebs, Irene T1 - An Information System Supporting the Eliciting of Expert Knowledge for Successful IT Projects T2 - Information and Software Technologies, ICIST 2018 N2 - In order to guarantee the success of an IT project, it is necessary for a company to possess expert knowledge. The difficulty arises when experts no longer work for the company and it then becomes necessary to use their knowledge, in order to realise an IT project. 
In this paper, the ExKnowIT information system which supports the eliciting of expert knowledge for successful IT projects, is presented and consists of the following modules: (1) the identification of experts for successful IT projects, (2) the eliciting of expert knowledge on completed IT projects, (3) the expert knowledge base on completed IT projects, (4) the Group Method for Data Handling (GMDH) algorithm, (5) new knowledge in support of decisions regarding the selection of a manager for a new IT project. The added value of our system is that these three approaches, namely, the elicitation of expert knowledge, the success of an IT project and the discovery of new knowledge, gleaned from the expert knowledge base, otherwise known as the decision model, complement each other. KW - Expert knowledge KW - IT project KW - Information system KW - GMDH Y1 - 2018 SN - 978-3-319-99972-2 SN - 978-3-319-99971-5 U6 - https://doi.org/10.1007/978-3-319-99972-2_1 SN - 1865-0929 SN - 1865-0937 VL - 920 SP - 3 EP - 13 PB - Springer CY - Berlin ER -