My Personal Notes
Slides On Transformers (Attention Is All You Need)
Content:
Maths + Visualization of each Transformer block
PyTorch code implementation
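As a quick reference alongside these slides, here is a minimal scaled dot-product attention sketch in PyTorch (tensor shapes and names are illustrative, not taken from the slides):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    # (batch, heads, seq_q, seq_k) attention scores
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

# Example: batch of 2, 4 heads, sequence length 10, head dimension 16 (illustrative)
q = k = v = torch.randn(2, 4, 10, 16)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape)   # torch.Size([2, 4, 10, 16])
print(attn.shape)  # torch.Size([2, 4, 10, 10])
```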
Class Notes On Introduction To Statistical Learning Theory
Content:
Basic concentration inequalities
Probably Approximately Correct (PAC) learning framework
Learning via Uniform Convergence
Vapnik-Chervonenkis (VC) dimension and the VC generalization bound
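For quick reference, two of the statements these notes build towards, written informally (constants follow the usual textbook conventions and are hidden in the big-O form):

```latex
% Hoeffding's inequality for i.i.d. X_1, ..., X_m taking values in [0, 1]:
\Pr\left[ \left| \tfrac{1}{m} \textstyle\sum_{i=1}^{m} X_i - \mathbb{E}[X_1] \right| \ge \epsilon \right]
  \le 2 \exp(-2 m \epsilon^2)

% VC generalization bound: with probability at least 1 - \delta,
% simultaneously for every h in \mathcal{H}, with d = \mathrm{VCdim}(\mathcal{H}):
L_{\mathcal{D}}(h) \;\le\; L_{S}(h)
  + O\!\left( \sqrt{ \frac{ d \log(m/d) + \log(1/\delta) }{ m } } \right)
```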
Survey On Generalization Bounds For Over-Parameterized Neural Networks
Abstract:
Modern neural networks are generally considered to be over-parameterized because they have significantly more parameters than the number of training examples needed to generalize well. In the over-parameterized regime, we can analyse the performance of these networks as their width tends to infinity. The infinite-width Neural Tangent Kernel (NTK) plays a crucial role in deriving generalization bounds for such networks. In this survey, we discuss two major results on generalization bounds: one for a 2-layer fully connected network (FCN) and one for a deep FCN.
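For context, the NTK mentioned above is, for a network f(x; θ) with parameters θ, the kernel

```latex
\Theta(x, x') \;=\; \left\langle \nabla_\theta f(x;\theta), \; \nabla_\theta f(x';\theta) \right\rangle ,
```

which becomes deterministic and stays essentially constant during training as the width tends to infinity; the generalization bounds discussed in the survey are stated in terms of this limiting kernel.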
Content:
PAC learning a linear combination of k ReLU activations under the standard Gaussian distribution with respect to the square loss.
An efficient algorithm whose running time is polynomial in the input dimension and the target accuracy.
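Spelled out (notation mine, not the paper's), the target class consists of functions of the form

```latex
f(x) \;=\; \sum_{i=1}^{k} a_i \, \sigma(w_i \cdot x),
\qquad \sigma(t) = \max(0, t), \qquad x \sim \mathcal{N}(0, I_d),
```

and the learner must output a hypothesis whose square loss with respect to f is within the target accuracy.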
Content:
Standard convergence results for Stochastic Gradient Descent (SGD) on convex and non-convex loss functions with fixed or diminishing step sizes.
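As a reminder of the flavour of these results (informal; exact step-size choices, smoothness, and bounded-variance assumptions omitted):

```latex
% Convex objective, averaged iterate \bar{x}_T, diminishing step size \eta_t \propto 1/\sqrt{t}:
\mathbb{E}\left[ f(\bar{x}_T) - f(x^\ast) \right] \;=\; O\!\left( \tfrac{1}{\sqrt{T}} \right)

% Smooth non-convex objective: convergence to an approximate stationary point,
\min_{t \le T} \; \mathbb{E}\left[ \| \nabla f(x_t) \|^2 \right] \;=\; O\!\left( \tfrac{1}{\sqrt{T}} \right)
```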
Content:
Proposes an algorithm called Local SGD, in which SGD runs independently in parallel on different workers and the iterate sequences are averaged only once in a while (a minimal sketch follows below).
Local SGD converges at the same rate as mini-batch SGD in terms of the number of evaluated gradients.
The number of communication rounds can be reduced by up to a factor of T^(1/2) compared to mini-batch SGD, where T is the total number of steps.
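A minimal single-process simulation of the Local SGD scheme described above (the toy quadratic objective, noise model, and parameter names are illustrative choices, not the paper's):

```python
import numpy as np

def local_sgd(grad_fn, x0, workers=4, local_steps=10, rounds=50, lr=0.1, seed=0):
    """Each worker runs SGD independently; iterates are averaged only every `local_steps` steps."""
    rng = np.random.default_rng(seed)
    xs = [x0.copy() for _ in range(workers)]
    for _ in range(rounds):
        for w in range(workers):
            for _ in range(local_steps):
                # stochastic gradient = true gradient + noise (toy noise model)
                xs[w] = xs[w] - lr * (grad_fn(xs[w]) + 0.1 * rng.standard_normal(x0.shape))
        avg = sum(xs) / workers  # one communication round: average the worker iterates
        xs = [avg.copy() for _ in range(workers)]
    return xs[0]

# Toy strongly convex objective f(x) = 0.5 * ||x||^2 with gradient x
x_final = local_sgd(lambda x: x, x0=np.ones(5))
print(x_final)  # should be close to the minimizer at 0
```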
Content:
A class of bilevel programming problems where the inner objective function is strongly convex.
An approximation algorithm for solving this class of problems, with a finite-time convergence analysis under different convexity assumptions on the outer objective function.
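The (assumed) general problem template, with outer objective f and inner objective g strongly convex in y:

```latex
\min_{x} \; f\big(x, \, y^{\ast}(x)\big)
\qquad \text{subject to} \qquad
y^{\ast}(x) \;=\; \arg\min_{y} \; g(x, y),
```

with the finite-time analysis carried out under different convexity assumptions (e.g. strongly convex, convex, or non-convex) on the outer objective f.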
Deep Learning Theory by Matus Telgarsky
Theory of Deep Learning by Sanjeev Arora et al.
The Principles of Deep Learning Theory by Daniel A. Roberts, Sho Yaida and Boris Hanin
Understanding Machine Learning from Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David
Foundations of Machine Learning by Mehryar Mohri, Afshin Rostamizadeh and Ameet Talwalkar
Provably Learning One-Hidden Layer ReLU Neural Networks
2024
A faster and simpler algorithm for learning shallow networks.
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials.
2023
2020
Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models.
Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks.
Generalization Bounds For Neural Networks
2024
2023
2022
2021
2020
A PAC-Bayesian Approach to Generalization Bounds for Graph Neural Networks.
2019
Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks.
Generalization bounds for deep convolutional neural networks.
On Generalization Bounds of a Family of Recurrent Neural Networks.
Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel.
Reconciling modern machine learning practice and the bias-variance trade-off.
2018
Stronger generalization bounds for deep nets via a compression approach.
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers.
Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks.
On Tighter Generalization Bounds for Deep Neural Networks: CNNs, ResNets, and Beyond.
ICLR: Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach.
2017
2016
2015
Optimization & Learning Of Neural Networks
2024
2023
COLT: Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron.
COLT: SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics.
2022
2021
2020
2019
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent.
ICLR: Gradient Descent Maximizes the Margin of Homogeneous Neural Networks.
NeurIPS: On the Convergence Rate of Training Recurrent Neural Networks.
2018
Gradient Descent Provably Optimizes Over-parameterized Neural Networks.
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks.
A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks.
Gradient Descent Finds Global Minima of Deep Neural Networks.
ICML: A Convergence Theory for Deep Learning via Over-Parameterization.
2017
Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs.
NeurIPS: Convergence Analysis of Two-layer Neural Networks with ReLU Activation.
NeurIPS: Gradient descent GAN optimization is locally stable.
2016
Neural Tangent Kernel
2024
2023
2022
2021
2020
NeurIPS: A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks.
Infinite attention: NNGP and NTK for deep attention networks.
2019
Neural Tangents: Fast and Easy Infinite Neural Networks in Python.
NeurIPS: Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels.
Finite Depth and Width Corrections to the Neural Tangent Kernel.
2018
Tensor Programs
2023
Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit.
Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks.
2022
2021
2020
2019