Neural Semantic Parsing, Annotation and Evaluation for Turkish
Embargo Period: Open access
Semantic representation is a way of expressing the meaning of a text in a form that a machine can process to serve a particular natural language processing (NLP) task. Universal Conceptual Cognitive Annotation (UCCA) is one such semantic representation, and it is both cognitively and linguistically inspired. UCCA represents the meaning of a text as a directed acyclic graph (DAG) whose nodes are either terminal or non-terminal: terminal nodes correspond to tokens and multi-token units in the text, non-terminal nodes comprise several tokens that are jointly viewed as a single entity according to some semantic or cognitive consideration, and edges indicate the role of a child in a relation. This thesis follows three research paths within UCCA representation, with a focus on Turkish: semantic parsing, data annotation, and extrinsic evaluation of UCCA representation in other NLP problems. In the first part of the thesis, we present supervised deep learning-based parsing models, covering both transition-based and graph-based approaches, to better analyze these approaches for UCCA representation. We also present an unsupervised deep learning model that leverages pre-trained language models (PLMs) as an external knowledge source. In the second part of the thesis, we present a Turkish UCCA-annotated dataset that was built with the proposed graph-based semantic parser in a semi-automatic pipeline. Finally, we ask whether semantic information in the form of UCCA representation can improve performance on other NLP tasks, and we investigate this as an extrinsic evaluation of UCCA on Semantic Textual Similarity (STS), text classification, and question answering (QA). The results show that semantic information in the form of UCCA representation improves performance in NLP tasks, especially in tasks that require more semantic information, such as QA.
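The graph structure described in the abstract (terminal nodes for tokens, non-terminal nodes grouping them, and edges labeled with the child's role) can be illustrated with a minimal Python sketch. This is not the thesis's data format or parser: the `Node` class, the helper functions, the example sentence, and the choice of edge labels (`A` for Participant, `P` for Process, `E` for Elaborator, `C` for Center) are assumptions made here purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A node in a UCCA-style graph (hypothetical structure for illustration)."""
    node_id: str
    text: Optional[str] = None  # set only on terminal nodes
    # Outgoing edges as (role label, child) pairs.
    edges: List[Tuple[str, "Node"]] = field(default_factory=list)

    @property
    def is_terminal(self) -> bool:
        return self.text is not None


def add_edge(parent: Node, label: str, child: Node) -> None:
    """Attach child to parent; the label names the child's role in the relation."""
    if parent.is_terminal:
        raise ValueError("terminal nodes cannot have children")
    parent.edges.append((label, child))


def terminals(root: Node) -> List[str]:
    """Collect terminal texts depth-first, visiting each node only once."""
    seen, out = set(), []

    def visit(node: Node) -> None:
        if node.node_id in seen:
            return
        seen.add(node.node_id)
        if node.is_terminal:
            out.append(node.text)
        for _, child in node.edges:
            visit(child)

    visit(root)
    return out


def is_acyclic(root: Node) -> bool:
    """Return True if no directed cycle is reachable from root."""
    on_path, done = set(), set()

    def visit(node: Node) -> bool:
        if node.node_id in on_path:
            return False  # back edge: the node is its own ancestor
        if node.node_id in done:
            return True
        on_path.add(node.node_id)
        ok = all(visit(child) for _, child in node.edges)
        on_path.discard(node.node_id)
        done.add(node.node_id)
        return ok

    return visit(root)


# Illustrative graph for the sentence "John kicked the ball".
john, kicked = Node("t1", "John"), Node("t2", "kicked")
the, ball = Node("t3", "the"), Node("t4", "ball")
obj, scene = Node("n2"), Node("n1")  # non-terminal nodes
add_edge(obj, "E", the)
add_edge(obj, "C", ball)
add_edge(scene, "A", john)
add_edge(scene, "P", kicked)
add_edge(scene, "A", obj)
```

Here `terminals(scene)` recovers the tokens in order and `is_acyclic(scene)` checks the DAG property. Because nodes are tracked by `node_id`, a child shared by two parents would be visited only once, which is what distinguishes a DAG from a plain tree.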