Text Processing - AMU-PIE Platform

General information

Course type	AMUPIE
Module title	Text Processing
Language	English
Module lecturer	Barbara Konat
Lecturer's email	bkonat@amu.edu.pl
Lecturer position	Adiunkt
Faculty	Faculty of Psychology and Cognitive Science
Semester	2021/2022 (winter)
Duration	60
ECTS	8
USOS code	23-PIE-TPR

Timetable

Module aim (aims)

The Text Processing laboratory develops participants' knowledge and skills in Natural Language Processing, focusing on textual data analysis: text preprocessing (collecting and cleaning the data) and analysis (word counts, statistics, topic modelling). We will also cover topics from language engineering: machine learning models and pipeline construction.
A1 Introduction of NLP methods for Text Processing.
A2 Development of students' programming skills.
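As a rough illustration of the preprocessing and word-count steps mentioned above, the sketch below uses only the Python standard library. The function names `preprocess` and `word_counts` and the tokenisation rule (lowercased alphabetic tokens, with an optional internal apostrophe) are assumptions made for this example, not part of the course materials.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and return its alphabetic word tokens.

    Assumption: a token is a run of ASCII letters, optionally followed by
    an apostrophe and more letters (e.g. "didn't"). Real projects would
    normally use a dedicated tokeniser.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_counts(texts):
    """Count token frequencies over a collection of raw text strings."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text))
    return counts


if __name__ == "__main__":
    sample = ["The cat sat.", "The CAT didn't sit!"]
    for word, freq in word_counts(sample).most_common():
        print(word, freq)
```

The same counts could be produced with a library tokeniser; the regular expression is used here only to keep the sketch free of dependencies.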

Pre-requisites in terms of knowledge, skills and social competences (where relevant)

Basic programming skills in Python. Fundamental concepts of linguistics.

Syllabus

Module’s educational effects:
After passing the module, a student:
- Has familiarity with the processing stages of NLP
- Can create a text corpus using a correct methodology for sampling and annotation
- Can perform manual corpus annotation and calculate Inter Annotator Agreement
- Can write a simple computer program for annotated corpora analysis
- Uses available literature and other resources for further development of skills and knowledge
- Has familiarity with fundamental concepts of computational linguistics and can use them in written text
- Has ability to organize information and to draw conclusions K_K02

Topics covered
1. Elements of the text processing pipeline
2. Basic tools for text processing (e.g. Python - NLTK, Java OpenNLP)
3. Corpus creation with simple binary categories (e.g. positive and negative opinions, spam and not-spam e-mails)
4. Classifier training for binary categories (elements of machine learning)
5. Classifier evaluation (F1 score)
6. Corpora for discourse analysis: data collection and annotation schemes (e.g. Rhetorical Structure Theory, Argument Interchange Format)
7. Annotation tools for discourse
8. Collecting and annotating corpora for discourse analysis
9. Classifier training for complex discursive properties
10. Classifier evaluation methods for multi-class discourse corpora

Evaluation:
Report 1 (Text classifier): 20 points
Report 2 (Discourse processing): 20 points
In-class activity: 20 points

Scale:
55-60 points: 5
50-54 points: 4.5
45-49 points: 4
40-44 points: 3.5
35-39 points: 3
0-34 points: 2
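Two of the measures named in this syllabus, Inter Annotator Agreement and the F1 score, can be sketched in a few lines of Python. The example below is a minimal illustration, assuming Cohen's kappa as the agreement measure for two annotators (the syllabus does not say which coefficient is used) and a single positive class for F1. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


def f1_score(gold, predicted, positive):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels for illustration only.
    annotator_a = ["pos", "pos", "neg", "neg"]
    annotator_b = ["pos", "neg", "neg", "neg"]
    print("kappa:", cohen_kappa(annotator_a, annotator_b))
    print("F1:", f1_score(annotator_a, annotator_b, positive="pos"))
```

For multi-class discourse corpora, per-class F1 values computed this way would typically be averaged (macro or weighted), which this sketch does not do.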

Reading list

Ingersoll, Grant S., Thomas S. Morton, and Andrew L. Farris. Taming Text: How to Find, Organize, and Manipulate It. Manning Publications Co., 2013.
Apache OpenNLP Developer Documentation. https://opennlp.apache.org/documentation/1.7.0/manual/opennlp.html
Natural Language Processing with Python – NLTK. http://www.nltk.org/book/
Stede, Manfred. "Discourse processing." Synthesis Lectures on Human Language Technologies, 2011.
Janier, Mathilde, and Patrick Saint-Dizier. Argument Mining: Linguistic Foundations. John Wiley & Sons, 2019.