Discourse Parsing and its Application to Question Generation






Reading comprehension can be analyzed from three points of view: Semantics, Assessment, and Cognition. Here, Semantics refers to the task of identifying discourse relations in text. Assessment involves utilizing these relations to obtain meaningful question-answer pairs. Cognition means categorizing questions according to their difficulty or complexity levels. This dissertation addresses how to leverage or design natural language processing tools to perform the underlying tasks and ultimately craft a reading comprehension quiz for use in a classroom environment. Previous research has focused on mining shallow, sentence-level semantic relations and using them to craft intra-sentential, factoid questions. These are not very consequential in the context of large documents, as they do not address how sentences coherently come together to comprise the full text. Discourse relations, by contrast, can provide a comprehensive view of the text because they look beyond sentence boundaries. These relations capture how sentences are logically and structurally linked to each other and provide a summarized, high-level overview of the document's semantics. Accordingly, one can expect inter-sentential questions generated from discourse relations to be deep and inferential, capable of testing comprehension abilities such as analysis of the document's structure, identification of the author's intent, and evaluation of stated arguments, among others. Testing these abilities allows one to assess student interpretation of a text effectively. A deep multi-task learning framework is proposed that accurately deduces high-level discourse relations between text spans. The framework uses structure-, syntax-, and context-aware text representations that are robust enough to capture the document's meaning and intent.
A set of syntactic transformations and well-formed transformation templates converts relations into question-answer pairs: the proposed model generates questions that are grammatically valid and intricate enough to gauge text comprehension. A rich, feature-driven classifier then categorizes these questions according to their difficulty levels. Results obtained empirically show that inter-sentential questions that test the ability to deduce high-level semantic relations in the text are more complex and meaningful than intra-sentential ones. These modules are linked into a pipeline. The pipeline's performance is evaluated on benchmark corpora, and it is shown that the pipeline can generate high-quality question-answer pairs that are more purposeful than human-authored ones and those obtained from previously designed systems. By enhancing reading comprehension datasets with such questions, one can hope to advance research in question answering and reading comprehension.
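To illustrate the template-based conversion of discourse relations into question-answer pairs described above, the sketch below shows one minimal, hypothetical version of the idea. The relation labels, the template set, and the question frames are illustrative assumptions, not the dissertation's actual transformations; a real system would apply proper syntactic transformations rather than simple string formatting.

```python
# Illustrative sketch: turning a labeled discourse relation between two
# text spans into a question-answer pair via a fixed template.
# The labels and templates below are assumed for demonstration only.

TEMPLATES = {
    # relation label -> (question frame, which argument holds the answer)
    "Cause":     ("Why is it the case that {arg1}?", "arg2"),
    "Contrast":  ("What does the text contrast with the claim that {arg1}?", "arg2"),
    "Condition": ("Under what condition is it true that {arg1}?", "arg2"),
}

def generate_qa(relation, arg1, arg2):
    """Produce a (question, answer) pair from a discourse relation
    holding between text spans arg1 and arg2."""
    frame, answer_arg = TEMPLATES[relation]
    # Lower-case the first character and drop the final period so the
    # span reads naturally inside the question frame (a crude stand-in
    # for a real syntactic transformation).
    span = (arg1[0].lower() + arg1[1:]).rstrip(".")
    question = frame.format(arg1=span)
    answer = arg2 if answer_arg == "arg2" else arg1
    return question, answer

q, a = generate_qa(
    "Cause",
    "The experiment was repeated.",
    "The initial results were inconclusive.",
)
```

Here the Cause relation yields an inter-sentential "why" question whose answer is the second span, which is precisely the kind of question that cannot be produced from either sentence in isolation.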



Reading comprehension, Discourse analysis, Question (Logic)