Tutorial at ICWSM 2022

BERT for Social Sciences & Humanities
Applying Large Language Models to Social Media Data

Join us for a new version of our tutorial at ICWSM, either in person or virtually!

June 6th, 2022
Register Here

Hosted by Maria Antoniak, Melanie Walsh, David Mimno, and Matthew Wilkens

Materials

Slides: https://bit.ly/icwsm-bert-slides

Word Similarity Notebook: https://bit.ly/icwsm-bert-similarity

Classification Notebook: https://bit.ly/icwsm-bert-classify

Description

In this interactive tutorial, we will introduce participants to the large language models that are now common in natural language processing (NLP). We will focus on variants of the popular Bidirectional Encoder Representations from Transformers (BERT) model (Devlin et al., 2018). This family of pre-trained models performs well across a wide range of NLP tasks, but using these models poses challenges for researchers in other disciplines. This tutorial will highlight opportunities for social media researchers in the humanities and social sciences to take advantage of these large models.

Participants will gain hands-on experience downloading and setting up a pre-trained model, using BERT to analyze words in context, adapting or “fine-tuning” a BERT model to perform better on a curated dataset, and using the fine-tuned model for classification tasks. We will also cover practical details, such as how to run these large models using free resources and which open libraries to use. Most importantly, we will discuss the nuances of these models that matter most for researchers outside of NLP: example use cases and exploratory uses; the limits of these methods and common errors; working with datasets of varying sizes, including small, curated collections; and data processing and tokenization choices.
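To give a flavor of the tokenization choices mentioned above, here is a minimal sketch of WordPiece-style tokenization, the greedy longest-match-first subword scheme that BERT models use. It is an illustration under stated assumptions, not the tutorial's notebook code: the vocabulary `TOY_VOCAB` and the helper names `wordpiece_tokenize` and `tokenize` are hypothetical, and a real BERT vocabulary has roughly 30,000 entries rather than a handful.

```python
# Toy illustration of WordPiece-style tokenization (greedy longest-match-first).
# TOY_VOCAB is a hypothetical, tiny vocabulary made up for this example.
TOY_VOCAB = {
    "[UNK]", "social", "media", "re", "##search", "##er", "##s",
    "tweet", "##ing", "un", "##follow",
}


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one lowercase word into subword pieces found in `vocab`."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits here, so the whole word maps to the unknown token.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Lowercase, split on whitespace, then apply WordPiece to each word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    return tokens


print(tokenize("Social media researchers unfollowing", TOY_VOCAB))
```

With this toy vocabulary, "researchers" is split into `re`, `##search`, `##er`, `##s`, while a word with no matching pieces (for example "retweet", since `##tweet` is not in the vocabulary) becomes a single `[UNK]` token. In practice, the tokenizer shipped with a pre-trained model handles this step, along with punctuation and casing, so the sketch is only meant to show why rare or platform-specific words can end up fragmented.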

Required Materials

An internet connection and a web browser.

Schedule

2 hours total

  • 30 minutes lecture
  • 20 minutes coding together in a collaborative notebook
  • 10 minutes break
  • 20 minutes lecture
  • 30 minutes coding together in a collaborative notebook
  • 10 minutes questions

Target Audience

This tutorial is aimed at social scientists and scholars studying social media who are comfortable programming, and who perhaps have some familiarity with machine learning, but who have not yet had the opportunity to use large language models like BERT.