TY  - JOUR
A1  - Bauer, Chris
A1  - Herwig, Ralf
A1  - Lienhard, Matthias
A1  - Prasse, Paul
A1  - Scheffer, Tobias
A1  - Schuchhardt, Johannes
T1  - Large-scale literature mining to assess the relation between anti-cancer drugs and cancer types
JF  - Journal of translational medicine
N2  - Background: 
There is a huge body of scientific literature describing the relation between tumor types and anti-cancer drugs. The vast amount of scientific literature makes it impossible for researchers and physicians to extract all relevant information manually. 

Methods:
In order to cope with the large amount of literature, we applied an automated text mining approach to assess the relations between the 30 most frequent cancer types and 270 anti-cancer drugs. We applied two different approaches: a classical text mining approach based on named entity recognition and an AI-based approach employing word embeddings. The consistency of the literature mining results was validated with three independent methods: first, using data from FDA approvals; second, using experimentally measured IC-50 cell line data; and third, using clinical patient survival data.

Results: 
We demonstrated that the automated text mining was able to successfully assess the relation between cancer types and anti-cancer drugs. All validation methods showed a good correspondence between the results from literature mining and independent confirmatory approaches. The relations between the most frequent cancer types and the drugs employed for their treatment were visualized in a large heatmap. All results are accessible in an interactive web-based knowledge base using the following link: .

Conclusions:
Our approach is able to assess the relations between compounds and cancer types in an automated manner. Both cancer types and compounds could be grouped into different clusters. Researchers can use the interactive knowledge base to inspect the presented results and follow their own research questions, for example the identification of novel indication areas for known drugs.
KW  - Literature mining
KW  - Anti-cancer drugs
KW  - Tumor types
KW  - Word embeddings
KW  - Database
Y1  - 2021
U6  - https://doi.org/10.1186/s12967-021-02941-z
SN  - 1479-5876
VL  - 19
IS  - 1
PB  - BioMed Central
CY  - London
ER  - 
TY  - JOUR
A1  - Bickel, Steffen
A1  - Brückner, Michael
A1  - Scheffer, Tobias
T1  - Discriminative learning under covariate shift
N2  - We address classification problems for which the training instances are governed by an input distribution that is allowed to differ arbitrarily from the test distribution, problems also referred to as classification under covariate shift. We derive a solution that is purely discriminative: neither training nor test distribution are modeled explicitly. The problem of learning under covariate shift can be written as an integrated optimization problem. Instantiating the general optimization problem leads to a kernel logistic regression and an exponential model classifier for covariate shift. The optimization problem is convex under certain conditions; our findings also clarify the relationship to the known kernel mean matching procedure. We report on experiments on problems of spam filtering, text classification, and landmine detection.
Y1  - 2009
UR  - http://jmlr.csail.mit.edu/
SN  - 1532-4435
ER  - 
TY  - JOUR
A1  - Brückner, Michael
A1  - Kanzow, Christian
A1  - Scheffer, Tobias
T1  - Static prediction games for adversarial learning problems
JF  - Journal of machine learning research
N2  - The standard assumption of identically distributed training and test data is violated when the test data are generated in response to the presence of a predictive model. This becomes apparent, for example, in the context of email spam filtering. Here, email service providers employ spam filters, and spam senders engineer campaign templates to achieve a high rate of successful deliveries despite the filters. We model the interaction between the learner and the data generator as a static game in which the cost functions of the learner and the data generator are not necessarily antagonistic. We identify conditions under which this prediction game has a unique Nash equilibrium and derive algorithms that find the equilibrial prediction model. We derive two instances, the Nash logistic regression and the Nash support vector machine, and empirically explore their properties in a case study on email spam filtering.
KW  - static prediction games
KW  - adversarial classification
KW  - Nash equilibrium
Y1  - 2012
SN  - 1532-4435
VL  - 13
SP  - 2617
EP  - 2654
PB  - Microtome Publishing
CY  - Cambridge, Mass.
ER  - 
TY  - CONF
A1  - Haider, Peter
A1  - Scheffer, Tobias
T1  - Bayesian clustering for email campaign detection
Y1  - 2009
SN  - 978-1-605-58516-1
ER  - 
TY  - GEN
A1  - Patil, Kaustubh R.
A1  - Haider, Peter
A1  - Pope, Phillip B.
A1  - Turnbaugh, Peter J.
A1  - Morrison, Mark
A1  - Scheffer, Tobias
A1  - McHardy, Alice C.
T1  - Taxonomic metagenome sequence assignment with structured output models
T2  - Nature methods : techniques for life scientists and chemists
Y1  - 2011
U6  - https://doi.org/10.1038/nmeth0311-191
SN  - 1548-7091
VL  - 8
IS  - 3
SP  - 191
EP  - 192
PB  - Nature Publ. Group
CY  - London
ER  - 
TY  - JOUR
A1  - Prasse, Paul
A1  - Iversen, Pascal
A1  - Lienhard, Matthias
A1  - Thedinga, Kristina
A1  - Bauer, Christopher
A1  - Herwig, Ralf
A1  - Scheffer, Tobias
T1  - Matching anticancer compounds and tumor cell lines by neural networks with ranking loss
JF  - NAR: genomics and bioinformatics
N2  - Computational drug sensitivity models have the potential to improve therapeutic outcomes by identifying targeted drug components that are likely to achieve the highest efficacy for a cancer cell line at hand at a therapeutic dose. State of the art drug sensitivity models use regression techniques to predict the inhibitory concentration of a drug for a tumor cell line. This regression objective is not directly aligned with either of these principal goals of drug sensitivity models: We argue that drug sensitivity modeling should be seen as a ranking problem with an optimization criterion that quantifies a drug's inhibitory capacity for the cancer cell line at hand relative to its toxicity for healthy cells. We derive an extension to the well-established drug sensitivity regression model PaccMann that employs a ranking loss and focuses on the ratio of inhibitory concentration and therapeutic dosage range. We find that the ranking extension significantly enhances the model's capability to identify the most effective anticancer drugs for unseen tumor cell profiles based on in-vitro data.
Y1  - 2022
U6  - https://doi.org/10.1093/nargab/lqab128
SN  - 2631-9268
VL  - 4
IS  - 1
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Prasse, Paul
A1  - Iversen, Pascal
A1  - Lienhard, Matthias
A1  - Thedinga, Kristina
A1  - Herwig, Ralf
A1  - Scheffer, Tobias
T1  - Pre-Training on In Vitro and Fine-Tuning on Patient-Derived Data Improves Deep Neural Networks for Anti-Cancer Drug-Sensitivity Prediction
JF  - Cancers
N2  - Large-scale databases that report the inhibitory capacities of many combinations of candidate drug compounds and cultivated cancer cell lines have driven the development of preclinical drug-sensitivity models based on machine learning. However, cultivated cell lines have devolved from human cancer cells over years or even decades under selective pressure in culture conditions. Moreover, models that have been trained on in vitro data cannot account for interactions with other types of cells. Drug-response data that are based on patient-derived cell cultures, xenografts, and organoids, on the other hand, are not available in the quantities that are needed to train high-capacity machine-learning models. We found that pre-training deep neural network models of drug sensitivity on in vitro drug-sensitivity databases before fine-tuning the model parameters on patient-derived data improves the models’ accuracy and improves the biological plausibility of the features, compared to training only on patient-derived data. From our experiments, we can conclude that pre-trained models outperform models that have been trained on the target domains in the vast majority of cases.
KW  - deep neural networks
KW  - drug-sensitivity prediction
KW  - anti-cancer drugs
Y1  - 2022
U6  - https://doi.org/10.3390/cancers14163950
SN  - 2072-6694
VL  - 14
SP  - 1
EP  - 14
PB  - MDPI
CY  - Basel, Schweiz
IS  - 16
ER  - 
TY  - GEN
A1  - Prasse, Paul
A1  - Iversen, Pascal
A1  - Lienhard, Matthias
A1  - Thedinga, Kristina
A1  - Herwig, Ralf
A1  - Scheffer, Tobias
T1  - Pre-Training on In Vitro and Fine-Tuning on Patient-Derived Data Improves Deep Neural Networks for Anti-Cancer Drug-Sensitivity Prediction
T2  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Large-scale databases that report the inhibitory capacities of many combinations of candidate drug compounds and cultivated cancer cell lines have driven the development of preclinical drug-sensitivity models based on machine learning. However, cultivated cell lines have devolved from human cancer cells over years or even decades under selective pressure in culture conditions. Moreover, models that have been trained on in vitro data cannot account for interactions with other types of cells. Drug-response data that are based on patient-derived cell cultures, xenografts, and organoids, on the other hand, are not available in the quantities that are needed to train high-capacity machine-learning models. We found that pre-training deep neural network models of drug sensitivity on in vitro drug-sensitivity databases before fine-tuning the model parameters on patient-derived data improves the models’ accuracy and improves the biological plausibility of the features, compared to training only on patient-derived data. From our experiments, we can conclude that pre-trained models outperform models that have been trained on the target domains in the vast majority of cases.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1300 
KW  - deep neural networks
KW  - drug-sensitivity prediction
KW  - anti-cancer drugs
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-577341
SN  - 1866-8372
SP  - 1
EP  - 14
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Prasse, Paul
A1  - Knaebel, Rene
A1  - Machlica, Lukas
A1  - Pevny, Tomas
A1  - Scheffer, Tobias
T1  - Joint detection of malicious domains and infected clients
JF  - Machine learning
N2  - Detection of malware-infected computers and detection of malicious web domains based on their encrypted HTTPS traffic are challenging problems, because only addresses, timestamps, and data volumes are observable. The detection problems are coupled, because infected clients tend to interact with malicious domains. Traffic data can be collected at a large scale, and antivirus tools can be used to identify infected clients in retrospect. Domains, by contrast, have to be labeled individually after forensic analysis. We explore transfer learning based on sluice networks; this allows the detection models to bootstrap each other. In a large-scale experimental study, we find that the model outperforms known reference models and detects previously unknown malware, previously unknown malware families, and previously unknown malicious domains.
KW  - Machine learning
KW  - Neural networks
KW  - Computer security
KW  - Traffic data
KW  - Https traffic
Y1  - 2019
U6  - https://doi.org/10.1007/s10994-019-05789-z
SN  - 0885-6125
SN  - 1573-0565
VL  - 108
IS  - 8-9
SP  - 1353
EP  - 1368
PB  - Springer
CY  - Dordrecht
ER  -