Below you can find code tutorials for working with large language models.

Most of the code tutorials are written in Python and use the popular transformers library from Hugging Face. Additionally, most of the tutorials are provided as Colab notebooks because Colab offers free (and optionally paid) access to GPUs, a kind of hardware that is often required when working with LLMs.

Word and Document Embeddings


Measuring Document Similarity with LLMs

📔 [Colab Notebook]

This code notebook demonstrates how you can use LLMs to explore which texts, or documents, are similar to each other in a given dataset. We explore narrative vs. non-narrative texts, historical poetry, and ChatGPT-generated poetry.
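To give a flavor of the workflow, here is a minimal sketch of embedding-based document similarity. It assumes the sentence-transformers library; the model name and sample texts are illustrative stand-ins, not necessarily what the notebook itself uses.

```python
# A minimal sketch of document similarity with embeddings.
# Assumes the sentence-transformers library; the model name and sample
# texts below are illustrative stand-ins.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Once upon a time, a traveler crossed the mountains.",    # narrative
    "The mountain range extends 500 km along the border.",    # non-narrative
    "She walked for days, telling stories at every village.", # narrative
]

# Encode each document into a fixed-length vector
embeddings = model.encode(documents, convert_to_tensor=True)

# Pairwise cosine similarity: values closer to 1 mean more similar documents
similarity_matrix = util.cos_sim(embeddings, embeddings)
print(similarity_matrix)
```

In a real dataset, you would rank the similarity scores for each document to find its nearest neighbors, which is how clusters of narrative vs. non-narrative texts become visible.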


Measuring Word Similarity with BERT

📔 [Full Colab Notebook] [Demo with Results Only]

This code notebook demonstrates how to use a pre-trained BERT model to measure word similarity.

In this example, we look for words that have a similar vector to a query word within a collection of poems. The results illustrate both what BERT vectors capture and the limitations of the tokenization scheme that BERT uses.
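As a rough sketch of the core idea, the snippet below pulls contextual word vectors out of a pre-trained BERT model with the transformers library and compares them with cosine similarity. The sentences and query word here are illustrative; the notebook works with a collection of poems instead.

```python
# A minimal sketch of measuring word similarity with BERT's contextual vectors.
# The sentences and query word are illustrative stand-ins.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    """Return the BERT vector for the first subword of `word` in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # shape: (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    # Note: BERT's WordPiece tokenizer may split a word into subwords,
    # one of the limitations discussed in the notebook.
    idx = tokens.index(tokenizer.tokenize(word)[0])
    return hidden[idx]

query = word_vector("The river ran cold and dark.", "river")
candidate = word_vector("The stream wound through the valley.", "stream")

similarity = torch.cosine_similarity(query, candidate, dim=0)
print(f"cosine similarity: {similarity.item():.3f}")
```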


Measuring Word Similarity with BERT (Spanish)

📔 [Full Colab Notebook Coming Soon!] [Demo with Results Only]


Text Classification


Zero-Shot Prompting with LLMs

📔 [Colab Notebook]

This code notebook demonstrates how users can set up a zero-shot classification task with an LLM and how they can evaluate different prompting strategies. Given the capabilities of current language models, this zero-shot paradigm should be the first approach to try when evaluating a new task.

In this tutorial, we specifically explore how you can prompt a model to predict the genre of a book from its Goodreads review and to classify a given passage as narrative or non-narrative text. You should be able to adapt this workflow for your own text classification needs.
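As a minimal sketch of the idea, the snippet below prompts a chat model to pick a genre for a single review. It uses the OpenAI Python client as a stand-in; the notebook may rely on a different model or library, and the prompt wording, model name, and example review are assumptions for illustration.

```python
# A minimal sketch of zero-shot genre classification by prompting an LLM.
# The OpenAI client, model name, prompt wording, and example review are
# illustrative assumptions, not necessarily what the notebook uses.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

review = "I couldn't put this book down; the detective's final reveal floored me."
genres = ["mystery", "romance", "science fiction", "history"]

# Zero-shot: the prompt describes the task, with no labeled examples
prompt = (
    "You are a book-genre classifier. Read the Goodreads review below and "
    f"answer with exactly one genre from this list: {', '.join(genres)}.\n\n"
    f"Review: {review}\n\nGenre:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
    temperature=0,        # deterministic output makes evaluation easier
)
print(response.choices[0].message.content.strip())
```

To evaluate prompting strategies, you would run variants of the prompt over a labeled sample of reviews and compare accuracy across versions.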


Training and Fine-Tuning BERT for Classification: Classifying Goodreads Reviews By Book Genre

📔 [Colab Notebook]

This code notebook demonstrates how users can train and fine-tune a BERT model for text classification. We fine-tune a BERT model on Goodreads reviews from the UCSD Book Graph with the goal of predicting the genre of the book being reviewed.
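As a compact sketch of the fine-tuning loop, the snippet below uses the Hugging Face Trainer on a tiny inline dataset; the two example reviews and genre labels are placeholders for the UCSD Book Graph data used in the notebook.

```python
# A minimal sketch of fine-tuning BERT for text classification with the
# Hugging Face Trainer. The tiny inline dataset and label set are
# placeholders for the UCSD Book Graph reviews used in the notebook.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

labels = ["fantasy", "history"]
data = Dataset.from_dict({
    "text": ["Dragons and ancient magic fill every chapter.",
             "A detailed account of the 1815 Congress of Vienna."],
    "label": [0, 1],
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))

def tokenize(batch):
    # Convert raw review text into BERT input IDs and attention masks
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="genre-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()
```

In practice, you would also hold out a validation split and pass it to the Trainer so you can track classification accuracy during training.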