TY  - JOUR
A1  - Casel, Katrin
A1  - Fernau, Henning
A1  - Gaspers, Serge
A1  - Gras, Benjamin
A1  - Schmid, Markus L.
T1  - On the complexity of the smallest grammar problem over fixed alphabets
JF  - Theory of computing systems
N2  - In the smallest grammar problem, we are given a word w and we want to compute a preferably small context-free grammar G for the singleton language {w} (where the size of a grammar is the sum of the sizes of its rules, and the size of a rule is measured by the length of its right side). It is known that, for unbounded alphabets, the decision variant of this problem is NP-hard and the optimisation variant does not allow a polynomial-time approximation scheme, unless P = NP. We settle the long-standing open problem whether these hardness results also hold for the more realistic case of a constant-size alphabet. More precisely, it is shown that the smallest grammar problem remains NP-complete (and its optimisation version is APX-hard), even if the alphabet is fixed and has size at least 17. The corresponding reduction is robust in the sense that it also works for an alternative size measure of grammars that is commonly used in the literature (i.e., a size measure also taking the number of rules into account), and it also allows us to conclude that even computing the number of rules required by a smallest grammar is a hard problem. On the other hand, if the number of nonterminals (or, equivalently, the number of rules) is bounded by a constant, then the smallest grammar problem can be solved in polynomial time, which is shown by encoding it as a problem on graphs with interval structure. However, treating the number of rules as a parameter (in terms of parameterised complexity) yields W[1]-hardness. Furthermore, we present an O(3^|w|) exact exponential-time algorithm, based on dynamic programming. These three main questions are also investigated for 1-level grammars, i.e., grammars for which only the start rule contains nonterminals on the right side, thus investigating the impact of the "hierarchical depth" of grammars on the complexity of the smallest grammar problem. In this regard, we obtain for 1-level grammars similar, but slightly stronger results.
KW  - grammar-based compression
KW  - smallest grammar problem
KW  - straight-line programs
KW  - NP-completeness
KW  - exact exponential-time algorithms
Y1  - 2020
U6  - https://doi.org/10.1007/s00224-020-10013-w
SN  - 1432-4350
SN  - 1433-0490
VL  - 65
IS  - 2
SP  - 344
EP  - 409
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Klie, Sebastian
A1  - Nikoloski, Zoran
A1  - Selbig, Joachim
T1  - Biological cluster evaluation for gene function prediction
JF  - Journal of computational biology
N2  - Recent advances in high-throughput omics techniques make it possible to decode the function of genes by applying the "guilt-by-association" principle to biologically meaningful clusters of gene expression data. However, the existing frameworks for biological evaluation of gene clusters are hindered by two bottleneck issues: (1) the choice of the number of clusters, and (2) the external measures, which do not take into consideration the structure of the analyzed data and the ontology of the existing biological knowledge. Here, we address the identified bottlenecks by developing a novel framework that allows not only for biological evaluation of gene expression clusters based on existing structured knowledge, but also for prediction of putative gene functions. The proposed framework facilitates propagation of statistical significance at each of the following steps: (1) estimating the number of clusters, (2) evaluating the clusters in terms of novel external structural measures, (3) selecting an optimal clustering algorithm, and (4) predicting gene functions. The framework also includes a method for evaluation of gene clusters based on the structure of the employed ontology. Moreover, our method for obtaining a probabilistic range for the number of clusters is shown to be valid on synthetic data and on available gene expression profiles from Saccharomyces cerevisiae. Finally, we propose a network-based approach for gene function prediction which relies on the clustering with the optimal score and on the employed ontology. Our approach effectively predicts gene function on the Saccharomyces cerevisiae data set and is also employed to obtain putative gene functions for an Arabidopsis thaliana data set.
KW  - algorithms
KW  - biochemical networks
KW  - combinatorics
KW  - computational molecular biology
KW  - databases
KW  - functional genomics
KW  - gene expression
KW  - NP-completeness
Y1  - 2014
U6  - https://doi.org/10.1089/cmb.2009.0129
SN  - 1066-5277
SN  - 1557-8666
VL  - 21
IS  - 6
SP  - 428
EP  - 445
PB  - Liebert
CY  - New Rochelle
ER  -