A comparison of current trends within computer science teaching in school in Germany and the UK
(2013)
In the last two years, CS as a school subject has gained a lot of attention worldwide, although different countries have differing approaches to and experiences of introducing CS in schools. This paper reports on a study comparing current trends in CS at school, with a major focus on two countries, Germany and the UK. A survey was carried out of a number of teaching professionals and experts from the UK and Germany with regard to the content and delivery of CS in school. An analysis of the quantitative data reveals a difference in foci in the two countries; putting this into the context of curricular developments we are able to offer interpretations of these trends and suggest ways in which curricula in CS at school should be moving forward.
This article is a summary of the work carried out by the Ministry of Education in Turkey, in terms of the development of a new ICT Curriculum, together with the e-Training of teachers who will play an important role in the forthcoming pilot study. Based on recent literature on the topic, the article starts by introducing the “F@tih Project”, a national project that aims to effectively integrate technology into schools. After assessing teachers’ and students’ ICT competencies, as defined internationally, the review continues with the proposed model for the e-training of teachers. Summarizing the process of development of the new ICT curriculum, researchers underline key points of the curriculum such as dimensions, levels and competencies. Then teachers’ e-training approaches, together with selected tools, are explained in line with the importance and stages of action research that will be used throughout the pilot implementation of the curriculum and e-training process.
Japan launched the new Course of Study in April 2012, which has been carried out in elementary schools and junior high schools. It will also be implemented in senior high schools from April 2013. This article presents an overview of the information studies education in the new Course of Study for K-12. Besides, the authors point out what role experts of informatics and information studies education should play in the general education centered around information studies that is meant to help people of the nation to lead an active, powerful, and flexible life until the satisfying end.
The traditional purpose of algorithms in education is to prepare students for programming. In our effort to introduce the practically missing computing science into Czech general secondary education, we have revisited this purpose. We propose an approach which is in better accordance with the goals of general secondary education in Czechia. The importance of programming is diminishing, while recognition of algorithmic procedures and precise (yet concise) communication of algorithms is gaining importance. This includes expressing algorithms in natural language, which is more useful for most of the students than programming. We propose criteria to evaluate such descriptions. Finally, an idea about the limitations is required (inefficient algorithms, unsolvable problems, Turing’s test). We describe these adjusted educational goals and an outline of the resulting course. Our experience with carrying out the proposed intentions is satisfactory, although we did not accomplish all the defined goals.
We launched an original large-scale experiment concerning informatics learning in French high schools. We are using the France-IOI platform to federate resources and share observation for research. The first step is the implementation of an adaptive hypermedia based on very fine grain epistemic modules for Python programming learning. We define the necessary traces to be built in order to study the trajectories of navigation the pupils will draw across this hypermedia. It may be browsed by pupils either as a course support, or an extra help to solve the list of exercises (mainly for algorithmics discovery). By leaving the locus of control to the learner, we want to observe the different trajectories they finally draw through our system. These trajectories may be abstracted and interpreted as strategies and then compared for their relative efficiency. Our hypothesis is that learners have different profiles and may use the appropriate strategy accordingly. This paper presents the research questions, the method and the expected results.
We shall examine the Pedagogical Content Knowledge (PCK) of Computer Science (CS) teachers concerning students’ Computational Thinking (CT) problem solving skills within the context of a CS course in Dutch secondary education and thus obtain an operational definition of CT and ascertain appropriate teaching methodology. Next we shall develop an instrument to assess students’ CT and design a curriculum intervention geared toward teaching and improving students’ CT problem solving skills and competences. As a result, this research will yield an operational definition of CT, knowledge about CT PCK, a CT assessment instrument and teaching materials and accompanying teacher instructions. It shall contribute to CS teacher education, development of CT education and to education in other (STEM) subjects where CT plays a supporting role, both nationally and internationally.
Informatics as a school subject has been virtually absent from bilingual education programs in German secondary schools. Most bilingual programs in German secondary education started out by focusing on subjects from the field of social sciences. Teachers and bilingual curriculum experts alike have been regarding those as the most suitable subjects for bilingual instruction – largely due to the intercultural perspective that a bilingual approach provides. And though one cannot deny the gain that ensues from an intercultural perspective on subjects such as history or geography, this benefit is certainly not limited to social science subjects. In consequence, bilingual curriculum designers have already begun to include other subjects such as physics or chemistry in bilingual school programs. It only seems a small step to extend this to informatics. This paper will start out by addressing potential benefits of adding informatics to the range of subjects taught as part of English-language bilingual programs in German secondary education. In a second step it will sketch out a methodological (= didactical) model for teaching informatics to German learners through English. It will then provide two items of hands-on and tested teaching material in accordance with this model. The discussion will conclude with a brief outlook on the chances and prerequisites of firmly establishing informatics as part of bilingual school curricula in Germany.
In this paper we report on our experiments in teaching computer science concepts with a mix of tangible and abstract object manipulations. The goal we set ourselves was to let pupils discover the challenges one has to meet to automatically manipulate formatted text. We worked with a group of 25 secondary school pupils (9-10th grade), and they were actually able to “invent” the concept of mark-up language. From this experiment we distilled a set of activities which will be replicated in other classes (6th grade) under the guidance of maths teachers.
We present a concept for better integration of practical teaching in student teacher education in Computer Science. As an introduction to the workshop, different possible scenarios are discussed on the basis of examples. Afterwards, workshop participants will have the opportunity to discuss the application of the concepts in other settings.
Development of competence-oriented curricula is still an important theme in informatics education. Unfortunately, informatics curricula which include the domain of logic programming are still input-orientated or lack detailed competence descriptions. Therefore, the development of a competence model and of descriptions of learning outcomes is essential for the learning process in this domain. Prior research developed both. The next research step is to formulate test items to measure the described learning outcomes. This article describes this procedure and exemplifies test items. It also relates a test in school to the items and shows which misconceptions and typical errors are important to discuss in class. The test result can also confirm or disprove the competence model. Therefore, this school test is important for theoretical research as well as for the concrete planning of lessons. Quantitative analysis in school is important for evaluation and improvement of informatics education.
In this paper, we show how the theory of NP completeness can be introduced to students in secondary schools. The motivation of this research is that although there are difficult issues that require technical backgrounds, students are already familiar with demanding computational problems through games such as Sudoku or Tetris. Our intention is to bring together important concepts in the theory of NP completeness in such a way that students in secondary schools can easily understand them. This is part of our ongoing research about how to teach fundamental issues in Computer Science in secondary schools. We discuss what needs to be taught in which sequence in order to introduce ideas behind NP completeness to students without technical backgrounds.
The process of introducing compulsory ICT education at primary school level in the Czech Republic should be completed next year. Programming and Information, two topics from the basics of computer science, have been included in a new textbook. The question is whether the new chapters of the textbook are comprehensible for primary school teachers who have undergone no training in computer science. The paper reports on a pilot verification project in which pre-service primary school teachers were trained to teach these informatics topics.
Grammatica Grandonica
(2013)
In May 2010, Johann Ernst Hanxleden’s Grammatica Grandonica was rediscovered in Montecompatri (Lazio, Rome). Although historiographers attached much weight to the nearly oldest western grammar of Sanskrit, the precious manuscript was lost for several decades. The first aim of the present digital publication is to offer a photographical reproduction of the manuscript. This facsimile is accompanied by a double edition: a facing diplomatic edition with the Sanskrit in Malayāḷam script, followed by a transliterated established text.
Eye fixation durations during normal reading correlate with processing difficulty, but the specific cognitive mechanisms reflected in these measures are not well understood. This study finds support in German readers’ eye fixations for two distinct difficulty metrics: surprisal, which reflects the change in probabilities across syntactic analyses as new words are integrated, and retrieval, which quantifies comprehension difficulty in terms of working memory constraints. We examine the predictions of both metrics using a family of dependency parsers indexed by an upper limit on the number of candidate syntactic analyses they retain at successive words. Surprisal models all fixation measures and regression probability. By contrast, retrieval does not model any measure in serial processing. As more candidate analyses are considered in parallel at each word, retrieval can account for the same measures as surprisal. This pattern suggests an important role for ranked parallelism in theories of sentence comprehension.
There has been a substantial increase in the percentage of publications with co-authors located in departments from different countries in 12 major journals of psychology. The results are evidence for a remarkable internationalization of psychological research, starting in the mid 1970s and increasing in rate at the beginning of the 1990s. This growth occurs against a constant number of articles with authors from the same country; it is not due to a concomitant increase in the number of co-authors per article. Thus, international collaboration in psychology is obviously on the rise.
The use of nano zerovalent iron (nZVI) for environmental remediation is a promising new technique for in situ remediation. Due to its high surface area and high reactivity, nZVI is able to dechlorinate organic contaminants and render them harmless. Limited mobility, due to fast aggregation and sedimentation of nZVI, limits the capability for source and plume remediation. Carbo-Iron is a newly developed material consisting of activated carbon particles (d50 = 0.8 µm) that are plated with nZVI particles. These particles combine the mobility of activated carbon and the reactivity of nZVI. This paper presents the first results of the transport experiments.
Abstract interpretation-based model checking provides an approach to verifying properties of infinite-state systems. In practice, most previous work on abstract model checking is either restricted to verifying universal properties, or develops special techniques for temporal logics such as modal transition systems or other dual transition systems. By contrast we apply completely standard techniques for constructing abstract interpretations to the abstraction of a CTL semantic function, without restricting the kind of properties that can be verified. Furthermore we show that this leads directly to implementation of abstract model checking algorithms for abstract domains based on constraints, making use of an SMT solver.
A deterministic cycle scheduling of partitions at the operating system level is supposed for a multiprocessor system. In this paper, we propose a tool for generating such schedules. We use constraint based programming and develop methods and concepts for a combined interactive and automatic partition scheduling system. This paper is also devoted to basic methods and techniques for modeling and solving this partition scheduling problem. Initial application of our partition scheduling tool has proved successful and demonstrated the suitability of the methods used.
A constraint programming system combines two essential components: a constraint solver and a search engine. The constraint solver reasons about satisfiability of conjunctions of constraints, and the search engine controls the search for solutions by iteratively exploring a disjunctive search tree defined by the constraint program. The Monadic Constraint Programming framework gives a monadic definition of constraint programming where the solver is defined as a monad threaded through the monadic search tree. Search and search strategies can then be defined as first-class objects that can themselves be built or extended by composable search transformers. Search transformers give a powerful and unifying approach to viewing search in constraint programming, and the resulting constraint programming system is first class and extremely flexible.
The interest in extensions of the logic programming paradigm beyond the class of normal logic programs is motivated by the need for an adequate representation and processing of knowledge. One of the most difficult problems in this area is to find an adequate declarative semantics for logic programs. In the present paper a general preference criterion is proposed that selects the ‘intended’ partial models of generalized logic programs and which is a conservative extension of the stationary semantics for normal logic programs of [Prz91]. The presented preference criterion defines a partial model of a generalized logic program as intended if it is generated by a stationary chain. It turns out that the stationary generated models coincide with the stationary models on the class of normal logic programs. The general well-founded semantics of such a program is defined as the set-theoretical intersection of its stationary generated models. For normal logic programs the general well-founded semantics equals the well-founded semantics.
Different properties of programs implemented in Constraint Handling Rules (CHR) have already been investigated. Proving these properties in CHR is simpler than proving them in any type of imperative programming language, which triggered the proposal of a methodology to map imperative programs into equivalent CHR programs. The equivalence of both programs implies that if a property is satisfied for one, then it is satisfied for the other. The mapping methodology could be put to other beneficial uses. One such use is the automatic generation of global constraints, in an attempt to demonstrate the benefits of having a rule-based implementation for constraint solvers.
In the most abstract definition of its operational semantics, the declarative and concurrent programming language CHR is trivially non-terminating for a significant class of programs. Common refinements of this definition, in closing the gap to real-world implementations, compromise on declarativity and/or concurrency. Building on recent work and the notion of persistent constraints, we introduce an operational semantics avoiding trivial non-termination without compromising on its essential features.
We introduce a simple approach extending the input language of Answer Set Programming (ASP) systems by multi-valued propositions. Our approach is implemented as a (prototypical) preprocessor translating logic programs with multi-valued propositions into logic programs with Boolean propositions only. Our translation is modular and heavily benefits from the expressive input language of ASP. The resulting approach, along with its implementation, allows for solving interesting constraint satisfaction problems in ASP, showing a good performance.
We present the tool Kato which is, to the best of our knowledge, the first tool for plagiarism detection that is directly tailored for answer-set programming (ASP). Kato aims at finding similarities between (segments of) logic programs to help detect cases of plagiarism. Currently, the tool is realised for DLV programs but it is designed to handle various logic-programming syntax versions. We review basic features and the underlying methodology of the tool.
In this talk, I would like to share my experiences gained from participating in four CSP solver competitions and the second ASP solver competition. In particular, I’ll talk about how various programming techniques can make huge differences in solving some of the benchmark problems used in the competitions. These techniques include global constraints, table constraints, and problem-specific propagators and labeling strategies for selecting variables and values. I’ll present these techniques with experimental results from B-Prolog and other CLP(FD) systems.
We describe a framework to support the implementation of web-based systems to manipulate data stored in relational databases. Since the conceptual model of a relational database is often specified as an entity-relationship (ER) model, we propose to use the ER model to generate a complete implementation in the declarative programming language Curry. This implementation contains operations to create and manipulate entities of the data model, supports authentication, authorization, session handling, and the composition of individual operations to user processes. Furthermore and most important, the implementation ensures the consistency of the database w.r.t. the data dependencies specified in the ER model, i.e., updates initiated by the user cannot lead to an inconsistent state of the database. In order to generate a high-level declarative implementation that can be easily adapted to individual customer requirements, the framework exploits previous works on declarative database programming and web user interface construction in Curry.
Preface
(2010)
The workshops on (constraint) logic programming (WLP) are the annual meeting of the Society of Logic Programming (GLP e.V.) and bring together researchers interested in logic programming, constraint programming, and related areas like databases, artificial intelligence and operations research. In this decade, previous workshops took place in Dresden (2008), Würzburg (2007), Vienna (2006), Ulm (2005), Potsdam (2004), Dresden (2002), Kiel (2001), and Würzburg (2000). Contributions to workshops deal with all theoretical, experimental, and application aspects of constraint programming (CP) and logic programming (LP), including foundations of constraint/logic programming. Some of the special topics are constraint solving and optimization, extensions of functional logic programming, deductive databases, data mining, nonmonotonic reasoning, interaction of CP/LP with other formalisms like agents, XML, JAVA, program analysis, program transformation, program verification, meta programming, parallelism and concurrency, answer set programming, implementation and software techniques (e.g., types, modularity, design patterns), applications (e.g., in production, environment, education, internet), constraint/logic programming for semantic web systems and applications, reasoning on the semantic web, data modelling for the web, semistructured data, and web query languages.
In this paper we consider a simple syntactic extension of Answer Set Programming (ASP) for dealing with (nested) existential quantifiers and double negation in the rule bodies, in a close way to the recent proposal RASPL-1. The semantics for this extension just resorts to Equilibrium Logic (or, equivalently, to the General Theory of Stable Models), which provides a logic-programming interpretation for any arbitrary theory in the syntax of Predicate Calculus. We present a translation of this syntactic class into standard logic programs with variables (either disjunctive or normal, depending on the input rule heads), as those allowed by current ASP solvers. The translation relies on the introduction of auxiliary predicates and the main result shows that it preserves strong equivalence modulo the original signature.
We propose a paraconsistent declarative semantics of possibly inconsistent generalized logic programs which allows for arbitrary formulas in the body and in the head of a rule (i.e. does not depend on the presence of any specific connective, such as negation(-as-failure), nor on any specific syntax of rules). For consistent generalized logic programs this semantics coincides with the stable generated models introduced in [HW97], and for normal logic programs it yields the stable models in the sense of [GL88].
A wide range of additional forward chaining applications could be realized with deductive databases if their rule formalism, their immediate consequence operator, and their fixpoint iteration process were more flexible. Deductive databases normally represent knowledge using stratified Datalog programs with default negation. But many practical applications of forward chaining require an extensible set of user-defined built-in predicates. Moreover, they often need function symbols for building complex data structures, and the stratified fixpoint iteration has to be extended by aggregation operations. We present a new language, Datalog*, which extends Datalog by stratified meta-predicates (including default negation), function symbols, and user-defined built-in predicates, which are implemented and evaluated top-down in Prolog. All predicates are subject to the same backtracking mechanism. The bottom-up fixpoint iteration can aggregate the derived facts after each iteration based on user-defined Prolog predicates.
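The bottom-up fixpoint iteration mentioned in this abstract can be illustrated with a minimal Python sketch. It is not Datalog* and implements none of its extensions (meta-predicates, function symbols, aggregation); it only shows plain naive evaluation of a two-rule program. The names `naive_fixpoint` and `path_step`, and the tuple encoding of facts, are assumptions made for this illustration.

```python
def naive_fixpoint(facts, step):
    """Apply an immediate-consequence step until no new facts are derived."""
    derived = set(facts)
    while True:
        new = step(derived) - derived
        if not new:
            return derived
        derived |= new


def path_step(db):
    """One application of the (hypothetical) rules
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    Facts are encoded as tuples (predicate, arg1, arg2)."""
    edges = {(a, b) for (p, a, b) in db if p == "edge"}
    paths = {(a, b) for (p, a, b) in db if p == "path"}
    out = {("path", a, b) for (a, b) in edges}
    out |= {("path", a, c) for (a, b) in paths for (b2, c) in edges if b == b2}
    return out


edge_facts = {("edge", 1, 2), ("edge", 2, 3), ("edge", 3, 4)}
result = naive_fixpoint(edge_facts, path_step)
paths = sorted((a, b) for (p, a, b) in result if p == "path")
print(paths)
```

Each pass derives only facts that follow from the facts already present, so the loop stops as soon as a pass adds nothing new; on a finite set of constants without function symbols this is guaranteed to happen.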
Deductive databases need general formulas in rule bodies, not only conjunctions of literals. This has been well known since the work of Lloyd and Topor on extended logic programming. Of course, formulas must be restricted in such a way that they can be effectively evaluated in finite time, and produce only a finite number of new tuples (in each iteration of the TP-operator: the fixpoint can still be infinite). It is also necessary to respect binding restrictions of built-in predicates: many of these predicates can be executed only when certain arguments are ground. Whereas for standard logic programming rules, questions of safety, allowedness, and range-restriction are relatively easy and well understood, the situation for general formulas is a bit more complicated. We give a syntactic analysis of formulas that guarantees the necessary properties.
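For the standard case that the abstract calls relatively easy, range-restriction of a rule whose body is a conjunction of literals can be checked as in the following sketch. It uses the usual textbook condition (every variable of the head and of each negative literal must occur in some positive body literal) and does not cover the general formulas or built-in binding restrictions the paper analyses. The rule encoding and the function names are assumptions made for this illustration.

```python
def is_var(term):
    """Assumed convention: variables are strings starting with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def variables(args):
    return {t for t in args if is_var(t)}


def is_range_restricted(head, body):
    """head: (predicate, args); body: list of (is_positive, predicate, args).

    Returns True if every variable in the head and in the negative body
    literals also occurs in at least one positive body literal."""
    bound = set()
    for is_positive, _pred, args in body:
        if is_positive:
            bound |= variables(args)
    needed = variables(head[1])
    for is_positive, _pred, args in body:
        if not is_positive:
            needed |= variables(args)
    return needed <= bound


# p(X) :- q(X, Y), not r(Y).
ok_rule = (("p", ("X",)), [(True, "q", ("X", "Y")), (False, "r", ("Y",))])
# p(X, Z) :- q(X).            Z is unbound in the head
bad_head = (("p", ("X", "Z")), [(True, "q", ("X",))])
# p(X) :- q(X), not r(Y).     Y occurs only under negation
bad_neg = (("p", ("X",)), [(True, "q", ("X",)), (False, "r", ("Y",))])

print(is_range_restricted(*ok_rule), is_range_restricted(*bad_head), is_range_restricted(*bad_neg))
```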
Parafoveal Load of Word N+1 Modulates Preprocessing Effectiveness of Word N+2 in Chinese Reading
(2010)
Preview benefits (PBs) from two words to the right of the fixated one (i.e., word N+2) and associated parafoveal-on-foveal effects are critical for proposals of distributed lexical processing during reading. This experiment examined parafoveal processing during reading of Chinese sentences, using a boundary manipulation of N+2-word preview with low- and high-frequency words N+1. The main findings were (a) an identity PB for word N+2 that was (b) primarily observed when word N+1 was of high frequency (i.e., an interaction between frequency of word N+1 and PB for word N+2), and (c) a parafoveal-on-foveal frequency effect of word N+1 for fixation durations on word N. We discuss implications for theories of serial attention shifts and parallel distributed processing of words during reading.
We examined individual differences in masked repetition priming by re-analyzing item-level response-time (RT) data from three experiments. Using a linear mixed model (LMM) with subjects and items specified as crossed random factors, the originally reported priming and word-frequency effects were recovered. In the same LMM, we estimated parameters describing the distributions of these effects across subjects. Subjects’ frequency and priming effects correlated positively with each other and negatively with mean RT. These correlation estimates, however, emerged only with a reciprocal transformation of RT (i.e., -1/RT), justified on the basis of distributional analyses. Different correlations, some with opposite sign, were obtained (1) for untransformed or logarithmic RTs or (2) when correlations were computed using within-subject analyses. We discuss the relevance of the new results for accounts of masked priming, implications of applying RT transformations, and the use of LMMs as a tool for the joint analysis of experimental effects and associated individual differences.
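The reciprocal transformation named in this abstract, -1/RT, can be sketched in a few lines of Python. The conversion from milliseconds to seconds, the function name, and the sample values are assumptions made for this illustration and are not taken from the study.

```python
import math


def reciprocal_rt(rt_ms):
    """Return -1/RT with RT expressed in seconds (assumed unit convention).

    The minus sign keeps the ordering of the raw data: slower responses
    still map to larger values."""
    if rt_ms <= 0:
        raise ValueError("response time must be positive")
    return -1000.0 / rt_ms


# Made-up response times in milliseconds, including one slow outlier.
rts = [400, 500, 800, 2000]
transformed = [reciprocal_rt(rt) for rt in rts]
logged = [math.log(rt) for rt in rts]
print(transformed)
```

Both transformations are monotone, but the reciprocal one compresses the long right tail more strongly than the logarithm does, which is one reason the choice of transformation can change estimated correlations.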
Microsaccades are very small, involuntary flicks in eye position that occur on average once or twice per second during attempted visual fixation. Microsaccades give rise to EMG eye muscle spikes that can distort the spectrum of the scalp EEG and mimic increases in gamma band power. Here we demonstrate that microsaccades are also accompanied by genuine and sizeable cortical activity, manifested in the EEG. In three experiments, high-resolution eye movements were co-recorded with the EEG: during sustained fixation of checkerboard and face stimuli and in a standard visual oddball task that required the counting of target stimuli. Results show that microsaccades as small as 0.15° generate a field potential over occipital cortex and midcentral scalp sites 100–140 ms after movement onset, which resembles the visual lambda response evoked by larger voluntary saccades. This challenges the standard assumption of human brain imaging studies that saccade-related brain activity is precluded by fixation, even when fully complied with. Instead, additional cortical potentials from microsaccades were present in 86% of the oddball task trials and of similar amplitude as the visual response to stimulus onset. Furthermore, microsaccade probability varied systematically according to the proportion of target stimuli in the oddball task, causing modulations of late stimulus-locked event-related potential (ERP) components. Microsaccades present an unrecognized source of visual brain signal that is of interest for vision research and may have influenced the data of many ERP and neuroimaging studies.
It has recently been demonstrated that the presentation of a rare target in a visual oddball paradigm induces a prolonged inhibition of microsaccades. In the field of electrophysiology, the amplitude of the P300 component in event-related potentials (ERP) has been shown to be sensitive to the stimulus category (target vs. non target) of the eliciting stimulus, its overall probability, and the preceding stimulus sequence. In the present study we further specify the functional underpinnings of the prolonged microsaccadic inhibition in the visual oddball task, showing that the stimulus category, the frequency of a stimulus and the preceding stimulus sequence influence microsaccade rate. Furthermore, by co-recording ERPs and eye-movements, we were able to demonstrate that, despite being largely sensitive to the same experimental manipulation, the amplitude of P300 and the microsaccadic inhibition predict each other very weakly, and thus constitute two independent measures of the brain’s response to rare targets in the visual oddball paradigm.
The effect of moderate rates of nitrogen deposition on ground floor vegetation is poorly predicted by uncontrolled surveys or fertilization experiments using high rates of nitrogen (N) addition. We compared the temporal trends of ground floor vegetation in permanent plots with moderate (7–13 kg ha−1 year−1) and lower bulk N deposition (4–6 kg ha−1 year−1) in southern Sweden during 1982–1998. We examined whether trends differed between growth forms (vascular plants and bryophytes) and vegetation types (three types of coniferous forest, deciduous forest, and bog). Trends of site-standardized cover and richness varied among growth forms, vegetation types, and deposition regions. Cover in spruce forests decreased at the same rate with both moderate and low deposition. In pine forests cover decreased faster with moderate deposition and in bogs cover decreased faster with low deposition. Cover of bryophytes in spruce forests increased at the same rate with both moderate and low deposition. In pine forests cover decreased faster with moderate deposition and in bogs and deciduous forests there was a strong non-linear increase with moderate deposition. The trend of number of vascular plants was constant with moderate and decreased with low deposition. We found no trend in the number of bryophyte species. We propose that the decrease of cover and number with low deposition was related to normal ecosystem development (increased shading), suggesting that N deposition maintained or increased the competitiveness of some species in the moderate-deposition region. Deposition had no consistent negative effect on vegetation suggesting that it is less important than normal successional processes.
The emergence of information extraction (IE) oriented pattern engines has been observed during the last decade. Most of them rely heavily on finite-state devices. This paper introduces ExPRESS, a new extraction pattern engine whose rules are regular expressions over flat feature structures. The underlying pattern language is a blend of two previously introduced IE oriented pattern formalisms, namely JAPE, used in the widely known GATE system, and the unification-based XTDL formalism used in SProUT. A brief technical overview of ExPRESS, its pattern language and the pool of its native linguistic components is given. Furthermore, the implementation of the grammar interpreter is addressed.
This paper describes a two-level formalism where feature structures are used in contextual rules. Whereas usual two-level grammars describe rational sets over symbol pairs, this new formalism uses tree-structured regular expressions. They allow an explicit and precise definition of the scope of feature structures. A given surface form may be described using several feature structures. Feature unification is expressed in contextual rules using variables, as in a unification grammar. Grammars are compiled into finite-state multi-tape transducers.
Since Harris’ parser in the late 50s, multiword units have been progressively integrated into parsers. Nevertheless, for the most part, they are still restricted to compound words, which are more stable and less numerous. In fact, language is full of semi-fixed expressions that also form basic semantic units: semi-fixed adverbial expressions (e.g. time) and collocations. As with compounds, the identification of these structures limits the combinatorial complexity induced by lexical ambiguity. In this paper, we detail an experiment that largely integrates these notions in a finite-state procedure of segmentation into super-chunks, preliminary to a parser. We show that the chunker, developed for French, reaches 92.9% precision and 98.7% recall. Moreover, multiword units account for 36.6% of the attachments within nominal and prepositional phrases.
Finite-state methods for natural language processing often require the construction and the intersection of several automata. In this paper, we investigate the question of determining the best order in which these intersections should be performed. We take as an example lexical disambiguation in polarity grammars. We show that there is no efficient way to minimize the state complexity of these intersections.
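To make the ordering question concrete, here is a minimal Python sketch, not the construction used in the paper. It builds the reachable product of deterministic automata and records the size of each intermediate result for a given order. The DFA encoding, the helper names (`intersect`, `num_states`, `intermediate_sizes`) and the three toy automata are assumptions made for illustration only.

```python
def intersect(a, b):
    """Reachable product of two DFAs.

    A DFA is a triple (start, finals, delta), where delta maps
    (state, symbol) -> state; a missing entry means the input is rejected.
    """
    start_a, finals_a, delta_a = a
    start_b, finals_b, delta_b = b
    start = (start_a, start_b)
    delta, finals = {}, set()
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        p, q = state
        if p in finals_a and q in finals_b:
            finals.add(state)
        for (src, sym), dst_a in delta_a.items():
            # Follow only symbols that both automata can read from (p, q).
            if src != p or (q, sym) not in delta_b:
                continue
            nxt = (dst_a, delta_b[(q, sym)])
            delta[(state, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return start, finals, delta


def num_states(dfa):
    """Count the states that occur in a DFA built by intersect()."""
    start, finals, delta = dfa
    states = {start} | set(finals)
    for (src, _sym), dst in delta.items():
        states.update((src, dst))
    return len(states)


def intermediate_sizes(automata):
    """Intersect left to right and return the size of each intermediate product."""
    result, sizes = automata[0], []
    for nxt in automata[1:]:
        result = intersect(result, nxt)
        sizes.append(num_states(result))
    return sizes


def mod3_counter(symbol, alphabet="ab"):
    """Toy DFA: accepts words whose number of `symbol` is a multiple of 3."""
    delta = {}
    for s in range(3):
        for x in alphabet:
            delta[(s, x)] = (s + 1) % 3 if x == symbol else s
    return 0, {0}, delta


A1 = mod3_counter("a")
A2 = mod3_counter("b")
# Toy DFA: accepts only words of length at most 1.
A3 = (0, {0, 1}, {(0, "a"): 1, (0, "b"): 1})

print(intermediate_sizes([A1, A2, A3]))  # large product first
print(intermediate_sizes([A1, A3, A2]))  # restrictive automaton first
```

In this toy case, intersecting with the most restrictive automaton early keeps every intermediate product small. Picking a good order by trying all permutations takes factorial time, which is consistent with the abstract's claim that no efficient general method exists.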
We present an algorithm that computes a function that assigns consecutive integers to trees recognized by a deterministic, acyclic, finite-state, bottom-up tree automaton. Such a function is called minimal perfect hashing. It can be used to identify trees recognized by the automaton. Its value may be seen as an index into some other data structure. We also present an algorithm for inverted hashing.
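The abstract does not spell out the tree algorithm, so the sketch below shows only the well-known string analogue: minimal perfect hashing over a deterministic acyclic word automaton, based on counting the accepted words reachable from each state. It is an illustration under stated assumptions, not the authors' tree construction. The automaton encoding and the names `build_hasher`, `word_to_index` and `index_to_word` are hypothetical.

```python
from functools import lru_cache


def build_hasher(start, finals, delta):
    """Minimal perfect hashing for a deterministic *acyclic* word automaton.

    delta maps state -> {symbol: next_state}. Accepted words are numbered
    0..n-1 in lexicographic order (a word sorts before its extensions).
    """

    @lru_cache(maxsize=None)
    def count(state):
        # Number of accepted words that can be read starting from `state`.
        return (state in finals) + sum(
            count(t) for t in delta.get(state, {}).values()
        )

    def word_to_index(word):
        state, index = start, 0
        for sym in word:
            if state in finals:
                index += 1  # skip the shorter word that ends in this state
            edges = delta.get(state, {})
            if sym not in edges:
                return None
            # Skip every word that continues with a smaller symbol.
            for other in sorted(edges):
                if other < sym:
                    index += count(edges[other])
            state = edges[sym]
        return index if state in finals else None

    def index_to_word(index):
        # Inverted hashing: recover the word from its number.
        if not 0 <= index < count(start):
            return None
        state, out = start, []
        while True:
            if state in finals:
                if index == 0:
                    return "".join(out)
                index -= 1
            for sym in sorted(delta.get(state, {})):
                n = count(delta[state][sym])
                if index < n:
                    out.append(sym)
                    state = delta[state][sym]
                    break
                index -= n

    return word_to_index, index_to_word, count(start)


# Toy automaton accepting exactly: car, cart, cat, do.
delta = {
    0: {"c": 1, "d": 6},
    1: {"a": 2},
    2: {"r": 4, "t": 3},
    4: {"t": 3},
    6: {"o": 3},
}
to_index, to_word, size = build_hasher(0, {3, 4}, delta)

for w in ["car", "cart", "cat", "do", "ca"]:
    print(w, to_index(w))
```

The same counting idea gives both directions: hashing adds up the counts of the branches it skips, and inverted hashing subtracts them while descending.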
In this work, an extension of the CSSR algorithm using Maximum Entropy Models is introduced. Preliminary experiments on Named Entity Recognition with this new system are presented.
We introduce and discuss a number of issues that arise in the process of building a finite-state morphological analyzer for Urdu, in particular issues with potential ambiguity and non-concatenative morphology. Our approach allows for an underlyingly similar treatment of both Urdu and Hindi via a cascade of finite-state transducers that transliterates the very different scripts into a common ASCII transcription system. As this transliteration system is based on the XFST tools in which the Urdu/Hindi common morphological analyzer is also implemented, no compatibility problems arise.
Nested complementation plays an important role in expressing counter-free, i.e. star-free and first-order definable, languages and their hierarchies. In addition, methods that compile phonological rules into finite-state networks use double-nested complementation or “double negation”. This paper reviews how the double-nested complementation extends to a relatively new operation, generalized restriction (GR), coined by the author (Yli-Jyrä and Koskenniemi 2004). This operation encapsulates a double-nested complementation and elimination of a concatenation marker, diamond, whose finite occurrences align concatenations in the arguments of the operation. The paper demonstrates that the GR operation has an interesting potential in expressing regular languages, various kinds of grammars, bimorphisms and relations. This motivates a further study of optimized implementation of the operator.
This article describes an HMM-based word-alignment method that can selectively enforce a contiguity constraint. This method has a direct application in the extraction of a bilingual terminological lexicon from a parallel corpus, but can also be used as a preliminary step for the extraction of phrase pairs in a Phrase-Based Statistical Machine Translation system. Contiguous source words composing terms are aligned to contiguous target language words. The HMM is transformed into a Weighted Finite State Transducer (WFST) and contiguity constraints are enforced by specific multi-tape WFSTs. The proposed method is especially well suited to cases where basic linguistic resources (morphological analyzers, part-of-speech taggers and term extractors) are available for the source language only.
Generalized Two-Level Grammar (GTWOL) provides a new method for compilation of parallel replacement rules into transducers. The current paper identifies the role of generalized lenient composition (GLC) in this method. Thanks to the GLC operation, the compilation method becomes bipartite and easily extendible to capture various application modes. In the light of three notions of obligatoriness, a modification to the compilation method is proposed. We argue that the bipartite design makes implementation of parallel obligatoriness, directionality, length and rank based application modes extremely easy, which is the main result of the paper.
Morphological analyses based on word syntax approaches can encounter difficulties with long distance dependencies. The reason is that in some cases an affix has to have access to the inner structure of the form with which it combines. One solution is the percolation of features from the inner morphemes to the outer morphemes with some process of feature unification. However, the obstacle of percolation constraints or stipulated features has led some linguists to argue in favour of other frameworks such as realizational morphology or parallel approaches like optimality theory. This paper proposes a linguistic analysis of two long distance dependencies in the morphology of Russian verbs, namely secondary imperfectivization and deverbal nominalization. We show how these processes can be reanalysed as local dependencies. Although finite-state frameworks are not bound by such linguistically motivated considerations, we present an implementation of our analysis as proposed in [1] that does not complicate the grammar or enlarge the network disproportionately.