Tutorial at ICWSM 2022
BERT for Social Sciences & Humanities
Applying Large Language Models to Social Media Data
Join us for a new version of our tutorial at ICWSM, either in person or virtually!
June 6th, 2022
Word Similarity Notebook: https://bit.ly/icwsm-bert-similarity
Classification Notebook: https://bit.ly/icwsm-bert-classify
In this interactive tutorial, we will introduce participants to large language models that are now common in natural language processing (NLP). We will focus on variants of the popular Bidirectional Encoder Representations from Transformers (BERT) model (Devlin et al., 2018). These pre-trained models perform well across a wide range of NLP tasks, but their use poses challenges for researchers in other disciplines. This tutorial will highlight opportunities for social media researchers from the humanities and social sciences to take advantage of these large models.
Participants will gain hands-on experience with downloading and setting up a pre-trained model, using BERT to analyze words in context, adapting or “fine-tuning” a BERT model to perform better on a curated dataset, and using the fine-tuned model for classification tasks. We will also discuss practical details, like how to run these large models using free resources and which open libraries to use. Most importantly, we will discuss nuances of these models that are most relevant for researchers outside of NLP, including example use cases and exploratory uses of these models; limits to these methods and common errors; using datasets of varying sizes, including small, curated collections; and data processing and tokenization choices.
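As a minimal sketch of the kind of analysis the hands-on portion covers, the snippet below compares contextual word embeddings with cosine similarity and labels a new embedding with a nearest-centroid rule. The toy vectors and helper names are illustrative, not from the tutorial notebooks: in practice the vectors would be hidden states extracted from a pre-trained BERT model (e.g. via the Hugging Face transformers library), and the nearest-centroid step is a deliberate simplification of fine-tuning, which updates the model's weights rather than classifying frozen embeddings.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for contextual embeddings of the word "bank" in two
# different sentences; real BERT hidden states would be e.g. 768-dimensional.
bank_river = np.array([0.9, 0.1, 0.2])
bank_money = np.array([0.1, 0.95, 0.3])

print(cosine_similarity(bank_river, bank_river))  # identical contexts -> 1.0
print(cosine_similarity(bank_river, bank_money))  # different senses -> lower

def nearest_centroid(train_vecs, train_labels, query):
    """Label `query` by the class centroid it is most similar to.

    A frozen-embedding baseline classifier, not fine-tuning proper.
    """
    labels = sorted(set(train_labels))
    centroids = {
        label: np.mean([v for v, y in zip(train_vecs, train_labels) if y == label], axis=0)
        for label in labels
    }
    return max(labels, key=lambda label: cosine_similarity(centroids[label], query))
```

Because the classifier only reads embeddings, it works with any fixed vector representation; swapping in genuine fine-tuning means backpropagating a task loss through the BERT weights instead, which is what the classification notebook walks through.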
Requirements: an internet connection and a web browser.
Schedule: 2 hours total
- 30 minutes lecture
- 20 minutes coding together in a collaborative notebook
- 10 minutes break
- 20 minutes lecture
- 30 minutes coding together in a collaborative notebook
- 10 minutes questions
The target audience for this tutorial includes social scientists and scholars studying social media who are comfortable programming, perhaps with some familiarity with machine learning, but who have not yet had the opportunity to use large language models like BERT.