Refine
Document Type
- Doctoral Thesis (3) (remove)
Language
- English (3) (remove)
Is part of the Bibliography
- yes (3)
Keywords
- artificial intelligence (3) (remove)
Organizations continue to assemble and rely upon teams of remote workers as an essential element of their business strategy; however, knowledge processing is particular difficult in such isolated, largely digitally mediated settings. The great challenge for a knowledge-based organization lies not in how individuals should interact using technology but in how to achieve effective cooperation and knowledge exchange. Currently more attention has been paid to technology and the difficulties machines have processing natural language and less to studies of the human aspect—the influence of our own individual cognitive abilities and preferences on the processing of information when interacting online. This thesis draws on four scientific domains involved in the process of interpreting and processing massive, unstructured data—knowledge management, linguistics, cognitive science, and artificial intelligence—to build a model that offers a reliable way to address the ambiguous nature of language and improve workers’ digitally mediated interactions. Human communication can be discouragingly imprecise and is characterized by a strong linguistic ambiguity; this represents an enormous challenge for the computer analysis of natural language. In this thesis, I propose and develop a new data interpretation layer for the processing of natural language based on the human cognitive preferences of the conversants themselves. Such a semantic analysis merges information derived both from the content and from the associated social and individual contexts, as well as the social dynamics that emerge online. At the same time, assessment taxonomies are used to analyze online comportment at the individual and community level in order to successfully identify characteristics leading to greater effectiveness of communication. Measurement patterns for identifying effective methods of individual interaction with regard to individual cognitive and learning preferences are also evaluated; a novel Cyber-Cognitive Identity (CCI)—a perceptual profile of an individual’s cognitive and learning styles—is proposed. Accommodation of such cognitive preferences can greatly facilitate knowledge management in the geographically dispersed and collaborative digital environment. Use of the CCI is proposed for cognitively labeled Latent Dirichlet Allocation (CLLDA), a novel method for automatically labeling and clustering knowledge that does not rely solely on probabilistic methods, but rather on a fusion of machine learning algorithms and the cognitive identities of the associated individuals interacting in a digitally mediated environment. Advantages include: a greater perspicuity of dynamic and meaningful cognitive rules leading to greater tagging accuracy and a higher content portability at the sentence, document, and corpus level with respect to digital communication.
Answer Set Programming (ASP) emerged in the late 1990s as a new logic programming paradigm, having its roots in nonmonotonic reasoning, deductive databases, and logic programming with negation as failure. The basic idea of ASP is to represent a computational problem as a logic program whose answer sets correspond to solutions, and then to use an answer set solver for finding answer sets of the program. ASP is particularly suited for solving NP-complete search problems. Among these, we find applications to product configuration, diagnosis, and graph-theoretical problems, e.g. finding Hamiltonian cycles. On different lines of ASP research, many extensions of the basic formalism have been proposed. The most intensively studied one is the modelling of preferences in ASP. They constitute a natural and effective way of selecting preferred solutions among a plethora of solutions for a problem. For example, preferences have been successfully used for timetabling, auctioning, and product configuration. In this thesis, we concentrate on preferences within answer set programming. Among several formalisms and semantics for preference handling in ASP, we concentrate on ordered logic programs with the underlying D-, W-, and B-semantics. In this setting, preferences are defined among rules of a logic program. They select preferred answer sets among (standard) answer sets of the underlying logic program. Up to now, those preferred answer sets have been computed either via a compilation method or by meta-interpretation. Hence, the question comes up, whether and how preferences can be integrated into an existing ASP solver. To solve this question, we develop an operational graph-based framework for the computation of answer sets of logic programs. Then, we integrate preferences into this operational approach. We empirically observe that our integrative approach performs in most cases better than the compilation method or meta-interpretation. Another research issue in ASP are optimization methods that remove redundancies, as also found in database query optimizers. For these purposes, the rather recently suggested notion of strong equivalence for ASP can be used. If a program is strongly equivalent to a subprogram of itself, then one can always use the subprogram instead of the original program, a technique which serves as an effective optimization method. Up to now, strong equivalence has not been considered for logic programs with preferences. In this thesis, we tackle this issue and generalize the notion of strong equivalence to ordered logic programs. We give necessary and sufficient conditions for the strong equivalence of two ordered logic programs. Furthermore, we provide program transformations for ordered logic programs and show in how far preferences can be simplified. Finally, we present two new applications for preferences within answer set programming. First, we define new procedures for group decision making, which we apply to the problem of scheduling a group meeting. As a second new application, we reconstruct a linguistic problem appearing in German dialects within ASP. Regarding linguistic studies, there is an ongoing debate about how unique the rule systems of language are in human cognition. The reconstruction of grammatical regularities with tools from computer science has consequences for this debate: if grammars can be modelled this way, then they share core properties with other non-linguistic rule systems.
Modern datasets often exhibit diverse, feature-rich, unstructured data, and they are massive in size. This is the case of social networks, human genome, and e-commerce databases. As Artificial Intelligence (AI) systems are increasingly used to detect pattern in data and predict future outcome, there are growing concerns on their ability to process large amounts of data. Motivated by these concerns, we study the problem of designing AI systems that are scalable to very large and heterogeneous data-sets.
Many AI systems require to solve combinatorial optimization problems in their course of action. These optimization problems are typically NP-hard, and they may exhibit additional side constraints. However, the underlying objective functions often exhibit additional properties. These properties can be exploited to design suitable optimization algorithms. One of these properties is the well-studied notion of submodularity, which captures diminishing returns. Submodularity is often found in real-world applications. Furthermore, many relevant applications exhibit generalizations of this property.
In this thesis, we propose new scalable optimization algorithms for combinatorial problems with diminishing returns. Specifically, we focus on three problems, the Maximum Entropy Sampling problem, Video Summarization, and Feature Selection. For each problem, we propose new algorithms that work at scale. These algorithms are based on a variety of techniques, such as forward step-wise selection and adaptive sampling. Our proposed algorithms yield strong approximation guarantees, and the perform well experimentally.
We first study the Maximum Entropy Sampling problem. This problem consists of selecting a subset of random variables from a larger set, that maximize the entropy. By using diminishing return properties, we develop a simple forward step-wise selection optimization algorithm for this problem. Then, we study the problem of selecting a subset of frames, that represent a given video. Again, this problem corresponds to a submodular maximization problem. We provide a new adaptive sampling algorithm for this problem, suitable to handle the complex side constraints imposed by the application. We conclude by studying Feature Selection. In this case, the underlying objective functions generalize the notion of submodularity. We provide a new adaptive sequencing algorithm for this problem, based on the Orthogonal Matching Pursuit paradigm.
Overall, we study practically relevant combinatorial problems, and we propose new algorithms to solve them. We demonstrate that these algorithms are suitable to handle massive datasets. However, our analysis is not problem-specific, and our results can be applied to other domains, if diminishing return properties hold. We hope that the flexibility of our framework inspires further research into scalability in AI.