MIT Reading Group (Fall 2022):
The Science of Deep Learning

Welcome to the Science of Deep Learning Reading Group! This fall at MIT (2022), we will be reading papers relevant to understanding what deep neural networks do and why they work so well. While deep learning has driven a huge amount of progress in AI over the last decade, this progress has primarily been the result of trial and error, guided by loose heuristics rather than by a principled understanding of these systems. We will be reading work that may help us gain a more principled understanding of deep neural networks.

We are running three sections.

For the sections that run two hours, the first hour will mostly be for reading and the second mostly for discussion, so if you've already done the reading, it may make sense to show up an hour late.

This reading group is being organized by Eric Michaud, Kaivu Hariharan, and Guilhermo Cutrim Costa, with support from the MIT AI Alignment Club (MAIA).

Syllabus

Week 1 - Outlook for the Science of Deep Learning

Is the task of understanding deep learning more like physics or biology?

Week 2 - Scaling Laws

Bigger networks are better, predictably.
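
As a rough illustration of what "predictably" means here (a sketch, not part of the readings themselves): empirical scaling laws are typically reported as power laws in model size, with test loss falling off as

    L(N) \approx (N_c / N)^{\alpha_N}

where N is the number of parameters and N_c, \alpha_N are fitted constants (the symbols are generic placeholders, not from this page).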

Week 3 - Explanations of Scaling Laws

Why are bigger networks better, predictably?

Week 4 - Emergent Capabilities

Neural network performance not so predictable after all?

Week 5 - Induction Bumps

A "phase change" in model performance during training

Week 6 - Grokking

Neural networks sometimes generalize long after memorizing their training data

Week 7 - Adversarial Examples

Deep neural networks are alien minds

Week 8 - The Infinite-Width Limit

Super wide neural networks become simple

Week 9 - Information Theory

Compression vs. fitting in deep networks

Week 10 - The Lottery Ticket Hypothesis

You could have trained a much smaller network, if only you had initialized it right

Week 11 - The Generalization Mystery

Neural networks learn solutions that generalize when they could have simply memorized