The Science of Deep Learning

Welcome to the Science of Deep Learning Reading Group! This fall at MIT (2022), we will be reading
papers relevant to understanding what deep neural networks do and why they work so well. While deep learning has
driven a huge amount of progress in AI in the last decade, this progress has primarily been the result of trial and error,
guided by loose heuristics, rather than by a principled understanding of these systems. We will be reading works that may help us gain a more principled understanding of deep neural networks.

We are running three sections:

- Wednesdays @ 6-8pm in 2-151, led by Eric Michaud
- Fridays @ 7:30-8:30pm in 4-146, led by Eric Michaud and Alex Infanger

This reading group is being organized by Eric Michaud, Kaivu Hariharan, and Guilhermo Cutrim Costa, with support from the MIT AI Alignment Club (MAIA).

- Read the first chapter (Chapter 0) of Roberts and Yaida's textbook The Principles of Deep Learning Theory for a physicist's perspective on the task of understanding deep learning.
- Read Chris Olah's work Zoom In: An Introduction to Circuits, and this more recent Twitter thread. If you want to play around with some visualizations, check out the OpenAI Microscope tool.

- Read Jared Kaplan et al.'s Scaling Laws for Neural Language Models

- Read (or perhaps just skim) Sharma & Kaplan's Scaling Laws from the Data Manifold Dimension.
- Optional: Bahri et al.'s Explaining Neural Scaling Laws.
- Optional: Marcus Hutter's Learning Curve Theory
- Optional: Critical Truths About Power Laws
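To build intuition for the readings above: a neural scaling law posits that loss falls as a power law in some resource, e.g. L(N) ≈ a·N^(−α) in model size N, which appears as a straight line on a log-log plot. Here is a minimal sketch of recovering the exponent by a least-squares fit in log-log space; the data and the constants `true_a` and `true_alpha` are synthetic, chosen only for illustration (not taken from any of the papers).

```python
import numpy as np

# Synthetic losses following L(N) = a * N^(-alpha) with multiplicative noise.
# The coefficients below are illustrative, not from any paper.
rng = np.random.default_rng(0)
N = np.logspace(6, 9, 20)                 # model sizes (parameters)
true_a, true_alpha = 10.0, 0.076          # made-up power-law coefficients
loss = true_a * N ** (-true_alpha) * np.exp(rng.normal(0, 0.01, N.shape))

# A power law is linear in log-log space: log L = log a - alpha * log N,
# so a degree-1 polynomial fit recovers the exponent as minus the slope.
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
alpha_hat, a_hat = -slope, np.exp(intercept)
print(f"alpha ~ {alpha_hat:.3f}, a ~ {a_hat:.2f}")
```

As the Clauset-style "Critical Truths About Power Laws" reading cautions, a straight-looking log-log fit over a couple of decades is weak evidence by itself; keep that in mind when interpreting fitted exponents.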

- Read Emergent Abilities of Large Language Models
- Read section 3.4 of Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

- Read Anthropic's In-context learning and induction heads

- Read OpenAI's Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
- Read Neel Nanda's Mechanistic Interpretability Analysis of Grokking or just his Twitter thread
- Optional: Liu et al.'s Towards Understanding Grokking: An Effective Theory of Representation Learning

- Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
- We may add another reading here, perhaps the original NTK paper.

- Opening the Black Box of Deep Neural Networks via Information
- Optional: On the information bottleneck theory of deep learning