On the complexity of the smallest grammar problem over fixed alphabets
- In the smallest grammar problem, we are given a word w and we want to compute a preferably small context-free grammar G for the singleton language {w} (where the size of a grammar is the sum of the sizes of its rules, and the size of a rule is measured by the length of its right side). It is known that, for unbounded alphabets, the decision variant of this problem is NP-hard and the optimisation variant does not allow a polynomial-time approximation scheme, unless P = NP. We settle the long-standing open problem whether these hardness results also hold for the more realistic case of a constant-size alphabet. More precisely, it is shown that the smallest grammar problem remains NP-complete (and its optimisation version is APX-hard), even if the alphabet is fixed and has size of at least 17. The corresponding reduction is robust in the sense that it also works for an alternative size-measure of grammars that is commonly used in the literature (i. e., a size measure also taking the number of rules into account), and it also allows toIn the smallest grammar problem, we are given a word w and we want to compute a preferably small context-free grammar G for the singleton language {w} (where the size of a grammar is the sum of the sizes of its rules, and the size of a rule is measured by the length of its right side). It is known that, for unbounded alphabets, the decision variant of this problem is NP-hard and the optimisation variant does not allow a polynomial-time approximation scheme, unless P = NP. We settle the long-standing open problem whether these hardness results also hold for the more realistic case of a constant-size alphabet. More precisely, it is shown that the smallest grammar problem remains NP-complete (and its optimisation version is APX-hard), even if the alphabet is fixed and has size of at least 17. The corresponding reduction is robust in the sense that it also works for an alternative size-measure of grammars that is commonly used in the literature (i. e., a size measure also taking the number of rules into account), and it also allows to conclude that even computing the number of rules required by a smallest grammar is a hard problem. On the other hand, if the number of nonterminals (or, equivalently, the number of rules) is bounded by a constant, then the smallest grammar problem can be solved in polynomial time, which is shown by encoding it as a problem on graphs with interval structure. However, treating the number of rules as a parameter (in terms of parameterised complexity) yields W[1]-hardness. Furthermore, we present an O(3(vertical bar w vertical bar)) exact exponential-time algorithm, based on dynamic programming. These three main questions are also investigated for 1-level grammars, i. e., grammars for which only the start rule contains nonterminals on the right side; thus, investigating the impact of the "hierarchical depth" of grammars on the complexity of the smallest grammar problem. In this regard, we obtain for 1-level grammars similar, but slightly stronger results.…
Author details: | Katrin CaselORCiDGND, Henning FernauORCiDGND, Serge Gaspers, Benjamin Gras, Markus L. SchmidORCiD |
---|---|
DOI: | https://doi.org/10.1007/s00224-020-10013-w |
ISSN: | 1432-4350 |
ISSN: | 1433-0490 |
Title of parent work (English): | Theory of computing systems |
Publisher: | Springer |
Place of publishing: | New York |
Publication type: | Article |
Language: | English |
Date of first publication: | 2020/11/13 |
Publication year: | 2020 |
Release date: | 2023/01/16 |
Tag: | NP-completeness; exact exponential-time algorithms; grammar-based compression; programs; smallest grammar problem; straight-line |
Volume: | 65 |
Issue: | 2 |
Number of pages: | 66 |
First page: | 344 |
Last Page: | 409 |
Funding institution: | Projekt DEAL |
Organizational units: | An-Institute / Hasso-Plattner-Institut für Digital Engineering gGmbH |
DDC classification: | 0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 000 Informatik, Informationswissenschaft, allgemeine Werke |
Peer review: | Referiert |
Publishing method: | Open Access / Hybrid Open-Access |
License (German): | CC-BY - Namensnennung 4.0 International |