Week 15: Corpus & Machine-Assisted Research

Traditionally, a corpus is a collection of language examples: written or spoken words, phrases, sentences, or texts. Nowadays a corpus can be any collection of examples, such as human-human interactions, protein interactions, video fragments, or maintenance information. A corpus is collected in order to learn from it, that is, to extract domain-specific information. The examples can be analysed to discover the rules and models that underlie them. Machine learning algorithms are used to extract relationships between examples. Manual structuring of such data (annotation) allows human preferences and knowledge to be integrated into machine learning algorithms.
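
To make the idea of annotation concrete, here is a minimal sketch in Python. It assumes a toy corpus of four sentences, each paired with a human-assigned label. The corpus contents, the label names, and the functions word_counts_by_label and most_typical_label are all hypothetical and chosen only for illustration. Simple per-label word counting stands in for a real machine learning algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each example is (text, human-assigned label).
# The labels are the "annotation" added by a human.
ANNOTATED_CORPUS = [
    ("the pump is leaking oil", "maintenance"),
    ("replace the worn pump seal", "maintenance"),
    ("hello how are you today", "conversation"),
    ("are you free today", "conversation"),
]


def word_counts_by_label(corpus):
    """Count how often each word occurs under each annotation label."""
    counts = defaultdict(Counter)
    for text, label in corpus:
        counts[label].update(text.lower().split())
    return counts


def most_typical_label(word, counts):
    """Return the label under which `word` occurs most often, or None if unseen."""
    best_label, best_count = None, 0
    for label, counter in counts.items():
        if counter[word] > best_count:
            best_label, best_count = label, counter[word]
    return best_label


counts = word_counts_by_label(ANNOTATED_CORPUS)
print(most_typical_label("pump", counts))
print(most_typical_label("today", counts))
```

The human labels are what let the program associate words with a domain. Without them, the same counts would say only how frequent each word is overall.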

Computer-assisted (or computer-aided) qualitative data analysis software (CAQDAS) offers tools that assist with qualitative research tasks such as transcription analysis, coding and text interpretation, recursive abstraction, content analysis, discourse analysis, and grounded theory methodology.