NLPinAI 2019 Abstracts


Full Papers
Paper Nr: 1
Title:

What Kind of Natural Language Inference are NLP Systems Learning: Is this Enough?

Authors:

Jean-Philippe Bernardy and Stergios Chatzikyriakidis

Abstract: In this paper, we look at Natural Language Inference (NLI), arguing that the notion of inference that current NLP systems are learning is much narrower than the range of inference patterns found in human reasoning. We review the history and nature of dataset creation for NLI. We discuss the datasets mainly used today for the relevant tasks and show why they are not enough to generalize to other reasoning tasks, e.g. logical and legal reasoning, or reasoning in dialogue settings. We then propose ways in which this can be remedied, effectively producing more realistic datasets for NLI. Lastly, we argue that the NLP community may have been too hasty in dismissing symbolic approaches to NLI altogether, given that these might still be relevant for more fine-grained cases of reasoning. As such, we argue for a more pluralistic take on tackling NLI, favoring hybrid over non-hybrid approaches.

Paper Nr: 6
Title:

A Comparative Study between Possibilistic and Probabilistic Approaches for Query Translation Disambiguation

Authors:

Wiem Ben Romdhane, Bilel Elayeb and Narjès B. Ben Saoud

Abstract: We propose in this paper a new hybrid possibilistic query translation disambiguation approach combining a probability-to-possibility transformation-based approach with a discriminative possibilistic one in order to take advantage of their respective strengths. The disambiguation process in this approach requires a bilingual lexicon and a parallel text corpus. Given the terms of a source query, the first step consists of selecting the existing noun phrases (NPs) and the remaining single terms not included in any NP. We translate these identified NPs as units through the probability-to-possibility transformation-based approach, as a means to introduce further tolerance, using a language model and translation patterns. The remaining single source query terms are then translated via the discriminative possibilistic approach. In this step, we model the translation relevance of a given single source query term via two measures: the possible relevance excludes irrelevant translations, while the necessary relevance reinforces the translations not removed by the possibility measure. We have carried out a set of experiments using the CLEF-2003 French-English CLIR test collection and the French-English parallel text corpus Europarl. The reported results highlight statistically significant improvements of the hybrid possibilistic approach in CLIR effectiveness across diverse evaluation metrics and scenarios for both long and short queries.

Paper Nr: 7
Title:

Modelling the Semantic Change Dynamics using Diachronic Word Embedding

Authors:

Mohamed A. Boukhaled, Benjamin Fagard and Thierry Poibeau

Abstract: In this contribution, we propose a computational model to predict the semantic evolution of words over time. Though semantic change is very complex and not well suited to analytical manipulation, we believe that computational modelling is a crucial tool for studying this phenomenon. Our aim is to capture the systemic change of words' meanings in an empirical model that can also predict this type of change, making it falsifiable. The model we propose is based on the long short-term memory (LSTM) architecture of recurrent neural networks, trained on diachronic word embeddings. To illustrate the significance of this kind of empirical model, we conducted an experimental evaluation using the Google Books Ngram corpus. The results show that the model is effective in capturing semantic change and can achieve a high degree of accuracy in predicting words' distributional semantics.

Paper Nr: 9
Title:

Prior Probabilities of Allen Interval Relations over Finite Orders

Authors:

Tim Fernando and Carl Vogel

Abstract: The probability that two intervals are related by a particular Allen relation is calculated relative to sample spaces Ωn determined by the number n of points in one case, and of interval names in the other. In both cases, the worlds in the sample space are assumed equiprobable, and Allen relations are classified as short, medium and long, according to the number of shared borders.

Short Papers
Paper Nr: 3
Title:

Chat Language Normalisation using Machine Learning Methods

Authors:

Daiga Deksne

Abstract: This paper reports on the development of a chat language normalisation module for the Latvian language. The model is trained using a random forest classifier that learns to rate normalisation candidates for every word. Candidates are generated using pre-trained word embeddings, N-gram lists, a spelling checker module and several other modules. The use of different means of generating normalisation candidates allows covering a wide spectrum of errors. We plan to use this normalisation module in the development of intelligent virtual assistants, and we have performed tests to determine whether the results of the intent detection module improve when text is pre-processed with the normalisation module.

Paper Nr: 5
Title:

Sentiment Analysis of Czech Texts: An Algorithmic Survey

Authors:

Erion Çano and Ondřej Bojar

Abstract: In the area of online communication, commerce and transactions, analyzing the sentiment polarity of texts written in various natural languages has become crucial. While there have been many contributions in resources and studies for the English language, "smaller" languages like Czech have not received much attention. In this survey, we explore the effectiveness of many existing machine learning algorithms for sentiment analysis of Czech Facebook posts and product reviews. We report the sets of optimal parameter values for each algorithm and the scores on both datasets. We observe that support vector machines are the best classifier, and that efforts to increase performance further with bagging, boosting or voting ensemble schemes fail to do so.

Paper Nr: 8
Title:

Towards a Principled Computational System of Syntactic Ambiguity Detection and Representation

Authors:

Hilton Alers-Valentín, Carlos G. Rivera-Velázquez, J. F. Vega-Riveros and Nayda G. Santiago

Abstract: This paper presents the current status of a research project in computational linguistics/natural language processing whose main objective is to develop a symbolic, principle-based, bottom-up system to process and parse sequences of lexical items as declarative sentences in English. For each input sequence, the parser should produce (maximally) binary trees as generated by the Merge operation on lexical items. Due to parametric variations in the algorithm, the parser should be able to output up to four grammatically feasible structural representations, accounted for by alternative constituent analyses arising from structural ambiguities in the parsing of the input string. Finally, the system should be able to state whether a particular string of lexical items is a possible sentence, on account of its parsability. The system has a scalable software framework that may be suitable for the analysis of typologically diverse natural languages.