Accepted Papers

  • Multilingual Conversation ASCII To Unicode In Indic Script
    Dr. Rajwinder Singh¹ and Charanjiv Singh Saroa²; ¹Department of Punjabi, Punjabi University, Patiala, India; ²Department of Computer Engineering, Punjabi University, Patiala, India

    In this paper we discuss the various ASCII-based scripts that have been created for Indian languages and the problems associated with them. We then present our proposed solution to these problems, a “Multilingual ASCII to Unicode Converter”. We also explain the importance of regional languages for a person's development. The paper also covers Unicode and various other issues related to regional languages.

  • Random Set Theory to Interpret Topic Models in terms of Ontology Concepts
    Md Abul Bashar and Yuefeng Li, Queensland University of Technology (QUT), Brisbane, QLD 4001, Australia

    Topic modelling is a popular technique in text mining. However, topic models are difficult to interpret because of their incoherence and lack of background context. Many applications require topic models to be interpreted so that both users and machines can use them effectively. To address this problem, this paper proposes a model that automatically interpret topic models in terms of concepts in a domain ontology. The interpretation has two parts: (a) a semantic structure, that is, a network of concepts and their semantic relations, which provides human-understandable knowledge for describing the meaning of the topic models; and (b) a contextual structure, which summarises the statistical aspects of the context for using that knowledge in a real case. The model uses random set theory to construct the semantic structure, the core part of the interpretation. By taking advantage of a domain ontology and a set of relevant summary statistics, the model is able to interpret the topic models. It is evaluated against several baseline models on two standard datasets. The results show that the proposed model performs significantly better than the baseline models.

  • Brave, Normal and Cautious Reasoning for Controlled Natural Language Processing
    Rolf Schwitter, Macquarie University, Sydney, NSW 2109, Australia

    Machine-oriented controlled natural language processing requires an expressive knowledge representation language that can deal with non-monotonic knowledge and allows for question answering, in particular in situations where disjunctive uncertainty occurs. We argue that Answer Set Programming is an interesting knowledge representation paradigm for controlled natural language processing and show how this formal language is used in the PENGASP system for processing disjunctive knowledge. The PENGASP system uses the normal reasoning mode when the translation of a specification results in a unique answer set. In addition, we introduce two reasoning modes that can deal with specifications that lead to more than one answer set. In the case of multiple answer sets, either brave reasoning or cautious reasoning can be used to answer questions over the union or the intersection of the answer sets, respectively.
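
    The difference between the two modes can be illustrated with a minimal Python sketch. It assumes the answer sets have already been computed by some ASP solver and are given as sets of ground atoms written as strings; the function names and the example atoms are hypothetical and are not taken from the PENGASP system.

```python
from typing import FrozenSet, List, Set

AnswerSet = FrozenSet[str]


def brave_consequences(answer_sets: List[AnswerSet]) -> Set[str]:
    """Atoms that hold in at least one answer set (the union)."""
    result: Set[str] = set()
    for answer_set in answer_sets:
        result |= answer_set
    return result


def cautious_consequences(answer_sets: List[AnswerSet]) -> Set[str]:
    """Atoms that hold in every answer set (the intersection).

    Simplification: with no answer sets at all, this sketch returns
    the empty set instead of treating every atom as a consequence.
    """
    if not answer_sets:
        return set()
    result: Set[str] = set(answer_sets[0])
    for answer_set in answer_sets[1:]:
        result &= answer_set
    return result


def holds(atom: str, answer_sets: List[AnswerSet], mode: str) -> bool:
    """Answer a yes/no question about a ground atom in the given mode."""
    if mode == "brave":
        return atom in brave_consequences(answer_sets)
    if mode == "cautious":
        return atom in cautious_consequences(answer_sets)
    raise ValueError(f"unknown reasoning mode: {mode}")


# Hypothetical specification: "Bob is a person. Bob is a student or a teacher."
# The disjunction leads to two answer sets.
example_answer_sets: List[AnswerSet] = [
    frozenset({"person(bob)", "student(bob)"}),
    frozenset({"person(bob)", "teacher(bob)"}),
]
```

    With these two answer sets, `holds("student(bob)", example_answer_sets, "brave")` is true because the atom appears in one answer set, while the same question in cautious mode is false because the atom does not appear in both. The atom `person(bob)` holds in either mode.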

  • How to match bilingual tweets?
    Karima Abidi¹ and Kamel Smaïli²; ¹,²SMarT Group, LORIA, INRIA, France

    In this paper, we propose a method for aligning comparable bilingual tweets that not only takes into account the specificity of a tweet, but also handles proper names, dates and numbers in two different languages. This makes it possible to retrieve more relevant target tweets. Matching proper names between Arabic and English is a difficult task because the two languages use different scripts. To address this, we used an approach that projects the sounds of an English proper name into Arabic and aligns it with the most appropriate proper name. We evaluated the method with a classical measure and compared it to the one we developed. The experiments were carried out on two parallel corpora and show that our measure outperforms the baseline by 5.6% at R@1 recall.
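
    The abstract does not define R@1. A minimal Python sketch of recall at rank k is given here under the common reading that it is the fraction of source tweets whose correct target appears among the top k retrieved candidates. The function name, the data layout and the tweet identifiers are hypothetical.

```python
from typing import Dict, List


def recall_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int = 1) -> float:
    """Fraction of source items whose gold target is in the top-k candidates.

    ranked maps a source tweet id to its candidate target ids, best first.
    gold maps a source tweet id to the id of its correct target tweet.
    """
    if not gold:
        return 0.0
    hits = 0
    for source_id, target_id in gold.items():
        top_k = ranked.get(source_id, [])[:k]
        if target_id in top_k:
            hits += 1
    return hits / len(gold)


# Hypothetical toy data: four Arabic tweets with ranked English candidates.
ranked_candidates = {
    "ar1": ["en1", "en7"],
    "ar2": ["en5", "en2"],
    "ar3": ["en3"],
    "ar4": ["en9", "en8"],
}
gold_alignment = {"ar1": "en1", "ar2": "en2", "ar3": "en3", "ar4": "en4"}

r_at_1 = recall_at_k(ranked_candidates, gold_alignment, k=1)
r_at_2 = recall_at_k(ranked_candidates, gold_alignment, k=2)
```

    On this toy data, two of the four gold targets are ranked first, and a third is found within the top two candidates.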