General information

Module title Text Processing
Language English
Module lecturer Barbara Konat
Lecturer's email
Lecturer position Adiunkt
Faculty Faculty of Psychology and Cognitive Science
Semester 2021/2022 (winter)
Duration 60
USOS code 23-PIE-TPR


Module aim (aims)

The Text Processing laboratory develops participants' knowledge and skills in Natural Language Processing, focusing on textual data analysis: text preprocessing (collecting and cleaning the data) and analysis (word counts, statistics, topic modelling). We will also cover topics from language engineering: machine learning models and pipeline construction.

A1 Introduction of NLP methods for Text Processing.
A2 Development of students' programming skills.
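As a rough illustration of the preprocessing and word-count steps named in the module aim, the sketch below cleans a few sentences and counts word frequencies using only the Python standard library. The function names `preprocess` and `word_counts`, the tokenization rule, and the sample sentences are assumptions made for this example; they are not part of the course materials.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and return its alphabetic word tokens.

    Assumed tokenization rule: a token is a run of ASCII letters,
    optionally followed by an apostrophe and more letters ("don't").
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_counts(texts):
    """Count token frequencies across a list of raw text strings."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text))
    return counts


# Hypothetical sample data, for illustration only.
sample = ["The cat sat on the mat.", "The dog sat too!"]
counts = word_counts(sample)
print(counts.most_common(2))
```

A library such as NLTK offers more robust tokenizers; the regular expression here only keeps the example self-contained.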

Pre-requisites in terms of knowledge, skills and social competences (where relevant)

Basic programming skills in Python. Fundamental concepts of linguistics.


Module’s educational effects

After passing the module, a student:
- Is familiar with the processing stages of NLP
- Can create a text corpus using correct methodology for sampling and annotation
- Can perform manual corpus annotation and calculate Inter Annotator Agreement
- Can write a simple computer program for the analysis of annotated corpora
- Uses available literature and other resources for further development of skills and knowledge
- Is familiar with fundamental concepts of computational linguistics and can use them in written text
- Is able to organize information and to draw conclusions (K_K02)

Topics covered

1. Elements of a text processing pipeline
2. Basic tools for text processing (e.g. Python NLTK, Java OpenNLP)
3. Corpus creation with simple binary categories (e.g. positive and negative opinions, spam and non-spam e-mails)
4. Classifier training for binary categories (elements of machine learning)
5. Classifier evaluation (F1 score)
6. Corpora for discourse analysis: data collection and annotation schemes (e.g. Rhetorical Structure Theory, Argument Interchange Format)
7. Annotation tools for discourse
8. Collecting and annotating corpora for discourse analysis
9. Classifier training for complex discursive properties
10. Classifier evaluation methods for multi-class discourse corpora

Evaluation

Report 1 (Text classifier): 20 points
Report 2 (Discourse processing): 20 points
In-class activity: 20 points

Scale

55-60 points: 5
50-54 points: 4.5
45-49 points: 4
40-44 points: 3.5
35-39 points: 3
0-34 points: 2
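Two quantities named above, Inter Annotator Agreement and the F1 score, can be sketched in a few lines of Python. The module description does not say which agreement coefficient is used; Cohen's kappa for two annotators is assumed here as one common choice. The function names and the four-item label lists are hypothetical and serve only as an illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def f1_score(gold, predicted, positive):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical binary opinion labels, for illustration only.
annotator_a = ["pos", "pos", "neg", "neg"]
annotator_b = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(annotator_a, annotator_b)
f1 = f1_score(annotator_a, annotator_b, positive="pos")
print(f"kappa = {kappa:.2f}, F1 = {f1:.2f}")
```

In this toy example the annotators agree on three of four items, but half of that agreement is expected by chance, which is why kappa is lower than the raw agreement rate.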

Reading list

Ingersoll, Grant S., Thomas S. Morton, and Andrew L. Farris. Taming Text: How to Find, Organize, and Manipulate It. Manning Publications Co., 2013.
Apache OpenNLP Developer Documentation.
Natural Language Processing with Python – NLTK.
Stede, Manfred. "Discourse Processing." Synthesis Lectures on Human Language Technologies, 2011.
Janier, Mathilde, and Patrick Saint-Dizier. Argument Mining: Linguistic Foundations. John Wiley & Sons, 2019.