Thesis Statement Annotation and Identification in Student Essays
Automatic essay scoring (AES) is the task of using computer technology to evaluate the quality of written essays and provide feedback. AES systems are among the most important educational applications of natural language processing. They help teachers grade student essays and give students useful feedback for improving their writing. In addition, they save considerable manual effort in grading essays for standardized tests such as the TOEFL, SAT, GRE, and GMAT. Although there has been much research on developing AES systems, the task is far from solved. Writing experts state that an AES system should evaluate the quality of the discourse structure of an essay. The thesis statement is one of the important discourse elements that reflect a student's ability to write essays. Therefore, we propose thesis strength scoring as a way to evaluate the quality of a thesis statement. Although thesis strength is an important dimension of essay quality, it has received little attention from researchers. This thesis makes three contributions. First, we publish a new dataset for thesis statement identification and thesis strength scoring. Second, we introduce features that have not been used by other models for discourse element identification, including argumentative features, a cluster feature, a conclusion feature, a coverage feature, contextual features, and sentence specificity features. Finally, we provide a learning-based model for thesis statement identification on our new dataset.