Exploratory Topic Modelling in Python

Topic modelling is a machine-learning technique that finds patterns in language use within a corpus of documents and clusters those documents accordingly. The commonalities in their patterns form “topics”, providing a way to automatically categorise documents by their structural content. The specific type of topic modelling covered in this resource is Latent Dirichlet Allocation (LDA).
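
To make the idea concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python with no third-party libraries. It is not taken from the EHRI notebook, which may well rely on a dedicated library. The function name `fit_lda`, its parameters, and the four toy documents are all assumptions made purely for illustration; the toy documents are invented and are not USHMM transcripts.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Illustrative LDA via collapsed Gibbs sampling (hypothetical helper).

    docs is a list of token lists. Returns (doc_topic, topic_words):
    doc_topic[d][k] is the estimated share of topic k in document d, and
    topic_words[k] is a Counter of the words currently assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [Counter() for _ in range(num_topics)]
    topic_totals = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][word] += 1
            topic_totals[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][word] -= 1
                topic_totals[k] -= 1

                # Weight each topic by how much this document uses it and
                # how strongly the topic is associated with this word.
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][word] + beta)
                    / (topic_totals[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][word] += 1
                topic_totals[k] += 1

    # Smooth the per-document counts into proportions that sum to 1.
    doc_topic = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    return doc_topic, topic_word_counts


# Invented toy corpus, already tokenised (an assumption for illustration).
docs = [
    "train station ticket journey train".split(),
    "journey border train ticket".split(),
    "bread soup kitchen bread".split(),
    "kitchen soup bread potato".split(),
]

doc_topic, topic_words = fit_lda(docs, num_topics=2)

for d, shares in enumerate(doc_topic):
    print(f"document {d}:", [round(share, 2) for share in shares])
for k, counts in enumerate(topic_words):
    print(f"topic {k}:", [word for word, n in counts.most_common(3) if n > 0])
```

Each document receives a mixture of topics rather than a single label, and each topic is characterised by the words most often assigned to it. On a real corpus one would normally add preprocessing (tokenisation, stop-word removal) and use an established library rather than a hand-written sampler.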

This EHRI notebook walks readers through using Python to topic model transcripts obtained through the United States Holocaust Memorial Museum (USHMM). It accompanies the article “Exploratory Topic Modelling in Python”, published in the European Holocaust Research Infrastructure (EHRI) Document Blog.

Learning outcomes

After viewing this training resource, users will be able to:

  • Understand the basic concepts of topic modelling.
  • Walk through the process of topic modelling in Python.
Interested in learning more?

Check out Exploratory Topic Modelling in Python

Cite as

Mike Bryant and Maria Dermentzi (2022). Exploratory Topic Modelling in Python. Version 1.0.0. EHRI. [Training module]. https://blog.ehri-project.eu/2022/07/19/exploratory-topic-modelling-in-python/

Reuse conditions

Resources hosted on DARIAH-Campus are subject to the DARIAH-Campus Training Materials Reuse Charter.

Full metadata

Title: Exploratory Topic Modelling in Python
Authors: Mike Bryant, Maria Dermentzi
Domain: Social Sciences and Humanities
Language: en
Published to DARIAH-Campus: 1/31/2023
Originally published: 6/21/2022
Content type: Training module
Licence: CC BY 4.0
Sources: EHRI
Topics: Data management, Digital Archives, eHeritage
Version: 1.0.0