Refine
Year of publication
Document Type
- Article (538)
- Doctoral Thesis (126)
- Monograph/Edited Volume (82)
- Other (26)
- Postprint (10)
- Conference Proceeding (5)
- Part of a Book (4)
- Preprint (4)
- Bachelor Thesis (1)
- Habilitation Thesis (1)
Language
- English (798) (remove)
Is part of the Bibliography
- yes (798) (remove)
Keywords
- answer set programming (12)
- Answer Set Programming (10)
- Answer set programming (10)
- Machine Learning (7)
- Maschinelles Lernen (7)
- Antwortmengenprogrammierung (5)
- Internet of Things (4)
- MQTT (4)
- machine learning (4)
- security (4)
Institute
- Institut für Informatik und Computational Science (798) (remove)
Large-scale literature mining to assess the relation between anti-cancer drugs and cancer types
(2021)
Background:
There is a huge body of scientific literature describing the relation between tumor types and anti-cancer drugs. The vast amount of scientific literature makes it impossible for researchers and physicians to extract all relevant information manually.
Methods:
In order to cope with the large amount of literature we applied an automated text mining approach to assess the relations between 30 most frequent cancer types and 270 anti-cancer drugs. We applied two different approaches, a classical text mining based on named entity recognition and an AI-based approach employing word embeddings. The consistency of literature mining results was validated with 3 independent methods: first, using data from FDA approvals, second, using experimentally measured IC-50 cell line data and third, using clinical patient survival data.
Results:
We demonstrated that the automated text mining was able to successfully assess the relation between cancer types and anti-cancer drugs. All validation methods showed a good correspondence between the results from literature mining and independent confirmatory approaches. The relation between most frequent cancer types and drugs employed for their treatment were visualized in a large heatmap. All results are accessible in an interactive web-based knowledge base using the following link: .
Conclusions:
Our approach is able to assess the relations between compounds and cancer types in an automated manner. Both, cancer types and compounds could be grouped into different clusters. Researchers can use the interactive knowledge base to inspect the presented results and follow their own research questions, for example the identification of novel indication areas for known drugs.
Computational methods for the design of effective therapies against drug resistant HIV strains
(2005)
The development of drug resistance is a major obstacle to successful treatment of HIV infection. The extraordinary replication dynamics of HIV facilitates its escape from selective pressure exerted by the human immune system and by combination drug therapy. We have developed several computational methods whose combined use can support the design of optimal antiretroviral therapies based on viral genomic data
We introduce and investigate input-revolving finite automata, which are (nondeterministic) finite state automata with the additional ability to shift the remaining part of the input. Three different modes of shifting are considered, namely revolving to the left, revolving to the right, and circular-interchanging. We investigate the computational capacities of these three types of automata and their deterministic variants, comparing any of the six classes of automata with each other and with further classes of well-known automata. In particular, it is shown that nondeterminism is better than determinism, that is, for all three modes of shifting there is a language accepted by the nondeterministic model but not accepted by any deterministic automaton of the same type. Concerning the closure properties most of the deterministic language families studied are not closed under standard operations. For example, we show that the family of languages accepted by deterministic right-revolving finite automata is an anti-AFL which is not closed under reversal and intersection.
Graded paraconsistency
(2000)
Significant inferences
(2000)
Circumscribing inconsistency
(1997)
Compressions and extensions
(1998)
One of the main problems in machine learning is to train a predictive model from training data and to make predictions on test data. Most predictive models are constructed under the assumption that the training data is governed by the exact same distribution which the model will later be exposed to. In practice, control over the data collection process is often imperfect. A typical scenario is when labels are collected by questionnaires and one does not have access to the test population. For example, parts of the test population are underrepresented in the survey, out of reach, or do not return the questionnaire. In many applications training data from the test distribution are scarce because they are difficult to obtain or very expensive. Data from auxiliary sources drawn from similar distributions are often cheaply available. This thesis centers around learning under differing training and test distributions and covers several problem settings with different assumptions on the relationship between training and test distributions-including multi-task learning and learning under covariate shift and sample selection bias. Several new models are derived that directly characterize the divergence between training and test distributions, without the intermediate step of estimating training and test distributions separately. The integral part of these models are rescaling weights that match the rescaled or resampled training distribution to the test distribution. Integrated models are studied where only one optimization problem needs to be solved for learning under differing distributions. With a two-step approximation to the integrated models almost any supervised learning algorithm can be adopted to biased training data. In case studies on spam filtering, HIV therapy screening, targeted advertising, and other applications the performance of the new models is compared to state-of-the-art reference methods.
We address classification problems for which the training instances are governed by an input distribution that is allowed to differ arbitrarily from the test distribution-problems also referred to as classification under covariate shift. We derive a solution that is purely discriminative: neither training nor test distribution are modeled explicitly. The problem of learning under covariate shift can be written as an integrated optimization problem. Instantiating the general optimization problem leads to a kernel logistic regression and an exponential model classifier for covariate shift. The optimization problem is convex under certain conditions; our findings also clarify the relationship to the known kernel mean matching procedure. We report on experiments on problems of spam filtering, text classification, and landmine detection.
We address classification problems for which the training instances are governed by an input distribution that is allowed to differ arbitrarily from the test distribution-problems also referred to as classification under covariate shift. We derive a solution that is purely discriminative: neither training nor test distribution are modeled explicitly. The problem of learning under covariate shift can be written as an integrated optimization problem. Instantiating the general optimization problem leads to a kernel logistic regression and an exponential model classifier for covariate shift. The optimization problem is convex under certain conditions; our findings also clarify the relationship to the known kernel mean matching procedure. We report on experiments on problems of spam filtering, text classification, and landmine detection.
Through the use of next generation sequencing (NGS) technology, a lot of newly sequenced organisms are now available. Annotating those genes is one of the most challenging tasks in sequence biology. Here, we present an automated workflow to find homologue proteins, annotate sequences according to function and create a three-dimensional model.
The Berlin Brain-Computer Interface (BBCI) project develops a noninvasive BCI system whose key features are 1) the use of well-established motor competences as control paradigms, 2) high-dimensional features from 128-channel electroencephalogram (EEG), and 3) advanced machine learning techniques. As reported earlier, our experiments demonstrate that very high information transfer rates can be achieved using the readiness potential (RP) when predicting the laterality of upcoming left-versus right-hand movements in healthy subjects. A more recent study showed that the RP similarily accompanies phantom movements in arm amputees, but the signal strength decreases with longer loss of the limb. In a complementary approach, oscillatory features are used to discriminate imagined movements (left hand versus right hand versus foot). In a recent feedback study with six healthy subjects with no or very little experience with BCI control, three subjects achieved an information transfer rate above 35 bits per minute (bpm), and further two subjects above 24 and 15 bpm, while one subject could not achieve any BCI control. These results are encouraging for an EEG-based BCI system in untrained subjects that is independent of peripheral nervous system activity and does not rely on evoked potentials even when compared to results with very well-trained subjects operating other BCI systems
Interest in developing a new method of man-to-machine communication-a brain-computer interface (BCI)-has grown steadily over the past few decades. BCIs create a new communication channel between the brain and an output device by bypassing conventional motor output pathways of nerves and muscles. These systems use signals recorded from the scalp, the surface of the cortex, or from inside the brain to enable users to control a variety of applications including simple word-processing software and orthotics. BCI technology could therefore provide a new communication and control option for individuals who cannot otherwise express their wishes to the outside world. Signal processing and classification methods are essential tools in the development of improved BCI technology. We organized the BCI Competition 2003 to evaluate the current state of the art of these tools. Four laboratories well versed in EEG-based BCI research provided six data sets in a documented format. We made these data sets (i.e., labeled training sets and unlabeled test sets) and their descriptions available on the Internet. The goal in the competition was to maximize the performance measure for the test labels. Researchers worldwide tested their algorithms and competed for the best classification results. This paper describes the six data sets and the results and function of the most successful algorithms
A brain-computer interface (BCI) is a system that allows its users to control external devices with brain activity. Although the proof-of-concept was given decades ago, the reliable translation of user intent into device control commands is still a major challenge. Success requires the effective interaction of two adaptive controllers: the user's brain, which produces brain activity that encodes intent, and the BCI system, which translates that activity into device control commands. In order to facilitate this interaction, many laboratories are exploring a variety of signal analysis techniques to improve the adaptation of the BCI system to the user. In the literature, many machine learning and pattern classification algorithms have been reported to give impressive results when applied to BCI data in offline analyses. However, it is more difficult to evaluate their relative value for actual online use. BCI data competitions have been organized to provide objective formal evaluations of alternative methods. Prompted by the great interest in the first two BCI Competitions, we organized the third BCI Competition to address several of the most difficult and important analysis problems in BCI research. The paper describes the data sets that were provided to the competitors and gives an overview of the results.
The programmable network envisioned in the 1990s within standardization and research for the Intelligent Network is currently coming into reality using IPbased Next Generation Networks (NGN) and applying Service-Oriented Architecture (SOA) principles for service creation, execution, and hosting. SOA is the foundation for both next-generation telecommunications and middleware architectures, which are rapidly converging on top of commodity transport services. Services such as triple/quadruple play, multimedia messaging, and presence are enabled by the emerging service-oriented IPMultimedia Subsystem (IMS), and allow telecommunications service providers to maintain, if not improve, their position in the marketplace. SOA becomes the de facto standard in next-generation middleware systems as the system model of choice to interconnect service consumers and providers within and between enterprises. We leverage previous research activities in overlay networking technologies along with recent advances in network abstraction, service exposure, and service creation to develop a paradigm for a service environment providing converged Internet and Telecommunications services that we call Service Broker. Such a Service Broker provides mechanisms to combine and mediate between different service paradigms from the two domains Internet/WWW and telecommunications. Furthermore, it enables the composition of services across these domains and is capable of defining and applying temporal constraints during creation and execution time. By adding network-awareness into the service fabric, such a Service Broker may also act as a next generation network-to-service element allowing the composition of crossdomain and cross-layer network and service resources. The contribution of this research is threefold: first, we analyze and classify principles and technologies from Information Technologies (IT) and telecommunications to identify and discuss issues allowing cross-domain composition in a converging service layer. Second, we discuss service composition methods allowing the creation of converged services on an abstract level; in particular, we present a formalized method for model-checking of such compositions. Finally, we propose a Service Broker architecture converging Internet and Telecom services. This environment enables cross-domain feature interaction in services through formalized obligation policies acting as constraints during service discovery, creation, and execution time.
Existing telecommunication networks and classical roles of operators are subject to fundamental change. Many network operators are currently seeking for new sources to generate revenue by exposing network capabilities to 3rd party service providers. At the same time we can observe that services on the World Wide Web (WWW) are becoming mature in terms of the definition of APIs that are offered towards other services. The combinations of those services are commonly referred to as Web 2.0 mash-ups. Rapid service design and creation becomes therefore important to meet the requirements in a changing technology and competitive market environment. This report describes our approach to include Next Generation Networks (NGN)-based telecommunications application enabler into complex services by defining a service broker that mediates between 3rd party applications and NGN service enablers. It provides policy-driven orchestration mechanisms for service enablers, a service authorization functionality, and a service discovery interface for Service Creation Environments. The work has been implemented as part of the Open SOA Telco Playground testbed at Fraunhofer FOKUS.
We study the complexity of two-person constraint satisfaction games. An instance of such a game is given by a collection of constraints on overlapping sets of variables, and the two players alternately make moves assigning values from a finite domain to the variables, in a specified order. The first player tries to satisfy all constraints, while the other tries to break at least one constraint: the goal is to decide whether the first player has a winning strategy. We show that such games can be conveniently represented by a logical form of quantified constraint satisfaction, where an instance is given by a first-order sentence in which quantifiers alternate and the quantifier-free part is a conjunction of (positive) atomic formulas; the goal is to decide whether the sentence is true. While the problem of deciding such a game is PSPACE-complete in general, by restricting the set of allowed constraint predicates, one can obtain infinite classes of constraint satisfaction games of lower complexity. We use the quantified constraint satisfaction framework to study how the complexity of deciding such a game depends on the parameter set of allowed predicates. With every predicate. one can associate certain predicate-preserving operations, called polymorphisms. We show that the complexity of our games is determined by the surjective polymorphisms of the constraint predicates. We illustrate how this result can be used by identifying the complexity of a wide variety of constraint satisfaction games.
It is proved that the number of components in context-free cooperating distributed (CD) grammar systems can be reduced to 3 when they are working in the so-called sf-mode of derivation, which is the cooperation protocol which has been considered first for CD grammar systems. In this derivation mode, a component continues the derivation until and unless there is a nonterminal in the sentential form which cannot be rewritten according to that component. Moreover, it is shown that CD grammar systems in sf-mode with only one component can generate only the context-free languages but they can generate non-context-free languages if two components are used. The sf-mode of derivation is compared with other well-known cooperation protocols with respect to the hierarchies induced by the number of components. (C) 2004 Elsevier B.V. All rights reserved
The power of a language L is the set of all powers of the words in L. In this paper, the following decision problem is investigated. Given a context-free language L, is the power of L context-free? We show that this problem is decidable for languages over unary alphabets, but it is undecidable whenever languages over alphabets with at least two letters are considered. (C) 2003 Elsevier B.V. All rights reserved
Parsability approaches of several grammar formalisms generating also non-context-free languages are explored. Chomsky grammars, Lindenmayer systems, grammars with controlled derivations, and grammar systems are treated. Formal properties of these mechanisms are investigated, when they are used as language acceptors. Furthermore, cooperating distributed grammar systems are restricted so that efficient deterministic parsing without backtracking becomes possible. For this class of grammar systems, the parsing algorithm is presented and the feature of leftmost derivations is investigated in detail.