PUaNLP 2015 Abstracts


Full Papers
Paper Nr: 2
Title:

What Did You Mean? - Facing the Challenges of User-generated Software Requirements

Authors:

Michaela Geierhos, Sabine Schulze and Frederik Simon Bäumer

Abstract: Existing approaches to service composition require customers to state their requirements in terms of service templates, service query profiles, or partial process models. However, the non-expert customers addressed may be unable to fill in the slots of service templates as requested or to describe, for example, pre- and postconditions, or may even have difficulties in formalizing their requirements. Thus, our idea is to provide non-experts with suggestions on how to complete or clarify their requirement descriptions written in natural language. Two main issues have to be tackled: (1) the partial or full inability of non-experts to specify their requirements correctly in formal and precise ways, and (2) problems in text analysis due to fuzziness in natural language. We present ideas on how to face these challenges by means of requirement disambiguation and completion. To this end, we conduct ontology-based requirement extraction and similarity retrieval based on requirement descriptions gathered from app marketplaces. The innovative aspect of our work is that we support users without expert knowledge in writing their requirements by simultaneously resolving ambiguity, vagueness, and underspecification in natural language.

Paper Nr: 3
Title:

Building TALAA, a Free General and Categorized Arabic Corpus

Authors:

Essma Selab and Ahmed Guessoum

Abstract: Arabic natural language processing (ANLP) has gained increasing interest over the last decade. However, the development of ANLP tools depends on the availability of large corpora. Unfortunately, the scientific community lacks large and varied Arabic corpora, especially freely accessible ones. As the Internet continues its exponential growth, Arabic Internet content has followed the trend, yielding large amounts of textual data available through different Arabic websites. This paper describes the TALAA corpus, a voluminous general Arabic corpus built from daily Arabic newspaper websites. The corpus is a collection of more than 14 million words with 15,891,729 tokens contained in 57,827 different articles. A part of the TALAA corpus has been tagged to construct an annotated Arabic corpus of about 7,000 tokens, using a POS tagger with a set of 58 detailed tags. The annotated corpus was manually checked by two human experts. The methodology used to construct TALAA is presented, and various metrics are applied to it, showing the usefulness of the corpus. The corpus can be made available to the scientific community upon authorisation.

Paper Nr: 4
Title:

Completing Mixed Language Grammars Through Womb Grammars Plus Ontologies

Authors:

Ife Adebara, Veronica Dahl and Sergio Tessaris

Abstract: Womb Grammars are a recently introduced constraint-based methodology for acquiring linguistic information on a given language from that of another, implemented in CHRG (Constraint Handling Rule Grammars). This position paper discusses their possible adaptation to multilingual text parsing. In particular, we propose to detect unspecified information with appropriate ontologies. Our proposed methodology exploits the descriptive power of constraints both for defining sentence acceptability and for inferring lexical knowledge from a word’s sentential context, even when the word is foreign.

Paper Nr: 5
Title:

Underspecified Relations with a Formal Language of Situation Theory

Authors:

Roussanka Loukanova

Abstract: The paper is an introduction to a formal language of Situation Theory. The language provides algorithmic processing of situated information. We introduce specialized, restricted variables that are recursively constrained to satisfy type-theoretic conditions by restrictions and algorithmic assignments. The restricted variables designate recursively connected networks of memory locations for ‘saving’ parametric information that depends on situations and restrictions over objects. The formal definitions introduce a richly informative typed language for the classification and representation of underspecified, parametric, and partial information that is dependent on situations.