Learning Resources
A comprehensive collection of learning resources organized to match the LLM development roadmap structure.
Part 1: The Foundations
Focus: Core ML concepts, neural networks, traditional models, tokenization, embeddings, transformers
Difficulty: Beginner to Intermediate
Outcome: Solid foundation in ML/NLP fundamentals and transformer architecture
Prerequisites
Mathematics & Statistics:
- 3Blue1Brown - Essence of Linear Algebra
- StatQuest - Statistics Fundamentals
- AP Statistics Intuition
- Immersive Linear Algebra
- Khan Academy - Linear Algebra
- Khan Academy - Calculus
- Khan Academy - Probability and Statistics
- Matrix Calculus Notes
- Review of Differential Calculus
- Derivatives, Backpropagation, and Vectorization
Programming & Python:
Books:
- Speech and Language Processing (2024 pre-release)
- Natural Language Processing
- A Primer on Neural Network Models for Natural Language Processing
- Natural Language Processing with PyTorch
- Natural Language Processing with Transformers
- Generative AI with LangChain
- Build a Large Language Model (From Scratch)
- LLMs-from-scratch
- Build GPT: How AI Works
- Hands-On Large Language Models
- The Chinese Book for Large Language Models
Machine Learning Fundamentals:
- Machine Learning for Everybody
- Udacity - Intro to Machine Learning
- A Course in Machine Learning
- Caltech CS156: Learning from Data
- Stanford CS229: Machine Learning
- Stanford CS224n: Natural Language Processing with Deep Learning
- Oxford Deep NLP
- Making Friends with Machine Learning
- Applied Machine Learning
- Introduction to Machine Learning (Tübingen)
- Machine Learning Lecture (Stefan Harmeling)
- Statistical Machine Learning (Tübingen)
- Probabilistic Machine Learning
- MIT 6.S897: Machine Learning for Healthcare (2019)
- Machine Learning with Graphs (Stanford)
Deep Learning Basics:
- 3Blue1Brown - But What is a Neural Network?
- Deep Learning Crash Course
- Fast.ai - Practical Deep Learning
- Patrick Loeber - PyTorch Posts
- Deep Learning Book
- Neural Networks and Deep Learning
- Introduction to Deep Learning
- Neural Networks: Zero to Hero
- Andrej Karpathy Series
- Umar Jamil Series
- Let's build GPT: from scratch, in code, spelled out
- State of GPT
- MIT: Deep Learning for Art, Aesthetics, and Creativity
- Stanford CS230: Deep Learning (2018)
- Introduction to Deep Learning (MIT)
- CMU Introduction to Deep Learning (11-785)
- Deep Learning: CS 182
- Deep Unsupervised Learning
- NYU Deep Learning SP21
- Foundation Models
- Deep Learning (Tübingen)
- Introduction to Deep Learning and Deep Generative Models
- Parallel Computing and Scientific Machine Learning
1. Neural Networks Foundations for LLMs
Difficulty: Intermediate | Prerequisites: Calculus, linear algebra
Core Textbooks & Courses:
Mathematical Foundations:
Essential Papers & Articles:
- Learning Representations by Backpropagating Errors
- Yes You Should Understand Backprop
- Natural Language Processing (Almost) from Scratch
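To make the backpropagation papers above concrete, here is a minimal sketch (assuming only NumPy) of a two-layer network trained with hand-written backprop on a toy regression task; the architecture, data, and learning rate are illustrative, not tuned.

```python
import numpy as np

# Toy data: learn y = x1 + x2 with a tiny two-layer network.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = X.sum(axis=1, keepdims=True)

W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros((1, 1))
lr = 0.1

for step in range(500):
    # Forward pass
    h = np.tanh(X @ W1 + b1)          # hidden activations
    pred = h @ W2 + b2                # network output
    loss = np.mean((pred - y) ** 2)   # mean squared error

    # Backward pass: chain rule applied layer by layer
    d_pred = 2 * (pred - y) / len(X)
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(axis=0, keepdims=True)
    d_h = d_pred @ W2.T
    d_pre = d_h * (1 - h ** 2)        # derivative of tanh
    dW1 = X.T @ d_pre
    db1 = d_pre.sum(axis=0, keepdims=True)

    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final MSE: {loss:.4f}")
```

Frameworks like PyTorch automate exactly this backward pass via autograd; writing it once by hand makes the gradient flow in the papers above much easier to follow.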
2. Traditional Language Models
Difficulty: Intermediate | Prerequisites: Probability, statistics
Core Textbooks:
N-gram Models:
RNN & LSTM Resources:
- Understanding LSTM Networks
- The Unreasonable Effectiveness of Recurrent Neural Networks
- Sequence Modeling: Recurrent and Recursive Neural Nets
- Vanishing Gradients Jupyter Notebook
- RealPython - NLP with spaCy
- Kaggle - NLP Guide
- Jake Tae - PyTorch RNN from Scratch
Foundational Papers:
- Neural Probabilistic Language Models
- Empirical Evaluation of Gated RNNs
- Learning Long-term Dependencies with Gradient Descent is Difficult
- On the Difficulty of Training Recurrent Neural Networks
Historical Context:
Dependency Parsing:
- Dependency Parsing
- Jurafsky & Martin Chapter 19
- Incrementality in Deterministic Dependency Parsing
- A Fast and Accurate Dependency Parser using Neural Networks
- Globally Normalized Transition-Based Neural Networks
- Universal Stanford Dependencies
- Universal Dependencies Website
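Before moving from count-based models to neural ones, it helps to see how an n-gram model assigns probabilities. A minimal bigram language model sketch in plain Python; the toy corpus and add-k smoothing constant are purely illustrative.

```python
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams and the contexts they condition on
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])
vocab = sorted(set(corpus))

def bigram_prob(prev, word, k=1.0):
    """P(word | prev) with add-k smoothing."""
    return (bigrams[(prev, word)] + k) / (contexts[prev] + k * len(vocab))

print(bigram_prob("the", "cat"))   # relatively high: 'cat' follows 'the' in the corpus
print(bigram_prob("the", "sat"))   # low: 'sat' never follows 'the', only smoothing mass
```

RNNs and LSTMs replace these explicit counts with a learned hidden state, which is what the resources in this section build up to.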
3. Tokenization
Difficulty: Beginner | Prerequisites: Python basics
Core Concepts & Posts:
- Introduction to Tokenization: A Theoretical Perspective
- Understanding BPE Tokenization
- Fast Tokenizers: How Rust is Turbocharging NLP
- Let's build the GPT Tokenizer
- minbpe
Hands-On Implementations:
- Tokenization Techniques
- GPT Tokenizer Implementation
- Build and Push a Tokenizer
- Tokenizer Comparison
- Hugging Face Tokenizers
- New Tokenizer Training
Interactive Tools:
Libraries & Documentation:
Research Papers:
- BPE Research Paper
- RadarLLM: Cross-Modal Tokenization
- CoreMatching: Token-Neuron Synergy
- MOM: Memory-Efficient Token Handling
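To make the BPE resources above concrete, here is a minimal sketch of byte-pair encoding merges on a toy corpus in pure Python; the corpus and the number of merges are illustrative, and real tokenizers add byte-level handling, special tokens, and much faster implementations.

```python
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs across a corpus of tokenized words."""
    counts = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1]); i += 2
            else:
                out.append(symbols[i]); i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: word -> frequency, each word split into characters
words = {tuple("lower"): 2, tuple("lowest"): 1, tuple("newer"): 3}
for _ in range(5):                      # 5 merges, purely illustrative
    pair = get_pair_counts(words).most_common(1)[0][0]
    words = merge_pair(words, pair)
    print("merged", pair)
```

The learned merge order is the tokenizer's vocabulary: frequent character sequences become single tokens, which is the core idea behind the GPT tokenizer walkthroughs listed above.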
4. Embeddings
Difficulty: Beginner-Intermediate | Prerequisites: Linear algebra, Python
Core Concepts & Posts:
- Word Embeddings Deep Dive
- Contextual Embedding Guide
- Sentence Embedding Techniques
- Illustrated Word2Vec
- The Illustrated BERT, ELMo, and co.
- CS224N Lecture 1 - Word Vectors
- Lena Voita - Word Embeddings
Hands-On Implementations:
- Interactive Word2Vec Posts
- Word2vec from Scratch
- Training Sentence Transformers
- Sentence Transformers
- MTEB Leaderboard
Foundational Papers:
Advanced Topics:
- Multilingual BERT
- Bias in Embeddings
- Contextual Word Representations: A Contextual Introduction
- Evaluation Methods for Unsupervised Word Embeddings
- Improving Distributional Similarity with Lessons Learned from Word Embeddings
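A minimal sketch of using sentence embeddings for semantic similarity, assuming the sentence-transformers library is installed; the model name is just a small, commonly used example and not a requirement of the resources above.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Any sentence-embedding model works; this one is small and widely used.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "A cat sits on the mat.",
    "A kitten is resting on a rug.",
    "The stock market fell sharply today.",
]
emb = model.encode(sentences)           # shape: (3, embedding_dim)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb[0], emb[1]))  # semantically close -> higher score
print(cosine(emb[0], emb[2]))  # unrelated -> lower score
```

The same pattern (encode, then compare with cosine similarity) underlies semantic search, clustering, and the retrieval step of RAG covered later in this roadmap.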
5. The Transformer Architecture
Difficulty: Advanced | Prerequisites: Neural networks, linear algebra
Foundational Paper:
Visual Explanations:
- The Illustrated Transformer
- The Annotated Transformer
- Transformer (Google AI Posts)
- Visual Intro to Transformers
- LLM Visualization
- nanoGPT
- Attention? Attention!
- Decoding Strategies in LLMs
- Stanford CS25 - Transformers United
- CS25-Transformers United
- CS324 - Large Language Models
- UWaterloo CS 886: Recent Advances on Foundation Models
- Princeton: Understanding Large Language Models
- XCS224U: Natural Language Understanding (2023)
- NLP Course (Hugging Face)
- CS224N: Natural Language Processing with Deep Learning
- CMU Neural Networks for NLP
- CS224U: Natural Language Understanding
- CMU Advanced NLP 2021
- CMU Advanced NLP 2022
- CMU Advanced NLP 2024
- Multilingual NLP 2020
- Multilingual NLP 2022
- Advanced NLP
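To complement the visual explanations and courses above, a minimal sketch of scaled dot-product attention in PyTorch (single head, no dropout; the shapes are illustrative). This is the core operation the transformer papers and walkthroughs keep returning to.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

# Toy example: batch of 1, sequence length 4, model dimension 8
q = k = v = torch.randn(1, 4, 8)
causal = torch.tril(torch.ones(4, 4))          # causal (decoder-style) mask
out, attn = scaled_dot_product_attention(q, k, v, mask=causal)
print(out.shape, attn.shape)                    # (1, 4, 8) and (1, 4, 4)
```

Multi-head attention simply runs this operation in parallel over several learned projections and concatenates the results.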
Technical Deep Dives:
Implementation Posts:
Textbook Resources:
Applications & Extensions:
Part 2: Building & Training Models
Focus: Data preparation, pre-training, fine-tuning, preference alignment
Difficulty: Intermediate to Advanced
Outcome: Ability to train and fine-tune language models from scratch
Learning Objectives: Learn to prepare high-quality datasets, implement distributed pre-training, create instruction datasets, perform supervised fine-tuning, and align models with human preferences using advanced techniques like RLHF and DPO.
6. Data Preparation
Difficulty: Intermediate | Prerequisites: Python, SQL
Data Collection & Scraping:
- Common Crawl Documentation
- Beautiful Soup Documentation
- Scrapy Documentation
- Hugging Face Datasets Guide
Data Processing Libraries:
Data Quality & Ethics:
Text Preprocessing:
Version Control & Management:
LLM-Specific Resources:
7. Pre-Training Large Language Models
Difficulty: Expert | Prerequisites: Transformers, distributed systems
Foundational Understanding:
Video Resources:
Key Research Papers:
Training Frameworks & Tools:
8. Post-Training Datasets (for Fine-Tuning)
Difficulty: Intermediate | Prerequisites: Data preparation
Instruction Datasets:
Conversation Datasets:
Preference & RLHF Datasets:
Question Answering:
Resources:
9. Supervised Fine-Tuning (SFT)
Difficulty: Advanced | Prerequisites: Pre-training basics
Libraries & Tools:
Research Papers:
Implementation Examples:
Posts:
- Fine-tune Llama 3.1 with Unsloth
- Fine-tune Llama 3 with ORPO
- Fine-tune Mistral-7b with DPO
- Fine-tune Mistral-7b with QLoRA
- Fine-tune CodeLlama using Axolotl
- Fine-tune Llama 2 with QLoRA
- Mastering LLMs
- LoRA Insights
- ChatGPT Prompt Engineering
Parameter-Efficient Methods:
- Parameter-Efficient Transfer Learning for NLP
- The Lottery Ticket Hypothesis
- Few-Shot Learning
- Chain-of-Thought Prompting
- Practical Methodology
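To ground the parameter-efficient methods listed above, here is a minimal sketch of a LoRA-style low-rank adapter wrapped around a frozen linear layer in PyTorch. The rank, scaling factor, and dimensions are illustrative; real fine-tuning would typically use a library such as PEFT rather than hand-rolled modules.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update A @ B."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # freeze the pretrained weights
        self.lora_A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A @ self.lora_B) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
x = torch.randn(2, 512)
print(layer(x).shape)                          # (2, 512)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")     # only the low-rank factors
```

Because lora_B starts at zero, the adapted layer initially behaves exactly like the frozen base model, which is why LoRA fine-tuning is stable from the first step.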
10. Preference Alignment (RL Fine-Tuning)
Difficulty: Expert | Prerequisites: Reinforcement learning basics
Libraries & Frameworks:
Core RLHF Papers:
Constitutional AI & Safety:
Scaling & Evaluation:
- Scaling Instruction-Finetuned Language Models
- AlpacaFarm: A Simulation Framework
- How Far Can Camels Go?
Learning Resources:
- Illustrating RLHF
- LLM Training: RLHF and Alternatives
- Preference Tuning LLMs
- Fine-tune with DPO
- Fine-tune with GRPO
- DPO Wandb Logs
- Why you should work on AI AGENTS!
- Deep Reinforcement Learning
- Reinforcement Learning Lecture Series (DeepMind)
- Reinforcement Learning (Polytechnique Montreal, Fall 2021)
- Foundations of Deep RL
- Stanford CS234: Reinforcement Learning
- Advanced Robotics: UC Berkeley
- Stanford CS330: Deep Multi-Task and Meta Learning
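A minimal sketch of the DPO objective discussed in the resources above: given the summed log-probabilities of chosen and rejected responses under the policy and under a frozen reference model, the loss pushes the policy to prefer the chosen response. The beta value and the input tensors here are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Toy batch of 4 preference pairs (log-probs summed over response tokens)
loss = dpo_loss(torch.tensor([-12.0, -9.5, -11.0, -10.2]),
                torch.tensor([-13.5, -11.0, -12.8, -10.9]),
                torch.tensor([-12.5, -10.0, -11.5, -10.5]),
                torch.tensor([-13.0, -10.8, -12.5, -10.7]))
print(loss)
```

Unlike PPO-based RLHF, there is no separate reward model or rollout loop: the preference data and the reference model define the objective directly, which is why DPO has become a popular starting point for alignment experiments.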
Part 3: Advanced Topics & Specialization
Focus: Evaluation, reasoning, optimization, architectures, enhancement
Difficulty: Expert/Research Level
Outcome: Research credentials, publications, and ability to lead theoretical advances
Learning Objectives: This advanced track develops research-grade expertise in LLM evaluation, reasoning enhancement, model optimization, novel architectures, and model enhancement techniques for cutting-edge research and development.
11. Model Evaluation
Difficulty: Intermediate | Prerequisites: Statistics, model training
Standard Benchmarks:
Evaluation Frameworks:
- HELM
- EleutherAI Evaluation Harness
- AlpacaEval
- Evaluation Guidebook
- Open LLM Leaderboard
- Language Model Evaluation Harness
- Lighteval
- Chatbot Arena
- Ragas
- DeepEval
Specialized Evaluation:
- TruthfulQA
- OpenBookQA
- WebShop: Scalable Real-World Web Interaction
- SWE-bench: GitHub Issues Resolution
- Tau-bench: Tool-Agent-User Interaction
LLM-as-Judge:
Research & Methodology:
- Challenges and Opportunities in NLP Benchmarking
- Measuring Massive Multitask Language Understanding
- Holistic Evaluation of Language Models
12. Reasoning
Difficulty: Intermediate | Prerequisites: Prompt engineering
Core Reasoning Papers:
Tool Use & Action:
Evaluation Datasets:
Advanced Reasoning Systems:
- Learning to Reason with LLMs
- OpenAI o1 System Card
- DeepSeek-R1: Reinforcement Learning for Reasoning
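A minimal sketch of self-consistency over chain-of-thought samples, one of the simplest reasoning techniques covered above: sample several reasoned completions and take a majority vote over the final answers. The `generate` function here is a hypothetical stand-in for any LLM call; the canned outputs just make the sketch runnable.

```python
import random
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical stand-in for an LLM call; replace with any API or local model."""
    return random.choice([
        "3 + 4 = 7, and 7 * 2 = 14. Answer: 14",
        "Double of 3 is 6, double of 4 is 8, 6 + 8 = 14. Answer: 14",
        "3 + 4 = 7, doubled is 15. Answer: 15",   # a wrong reasoning path
    ])

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    prompt = f"{question}\nLet's think step by step, then end with 'Answer: <result>'."
    answers = [generate(prompt).split("Answer:")[-1].strip() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]   # majority vote over final answers

print(self_consistent_answer("What is (3 + 4) * 2?"))
```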
Resources:
13. Quantization
Difficulty: Intermediate | Prerequisites: Model optimization
Quantization Libraries:
Advanced Quantization Methods:
Formats & Standards:
Learning Resources:
- 4-bit LLM Quantization with GPTQ
- Quantize Llama models with llama.cpp
- ExLlamaV2: Fastest Library to Run LLMs
- Understanding AWQ
- SmoothQuant on Llama 2
- DeepSpeed Model Compression
- GPTQ Paper
- AWQ Paper
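A minimal sketch of symmetric per-tensor int8 quantization, to make the ideas behind the libraries above concrete; real methods such as GPTQ and AWQ are considerably more sophisticated (per-group scales, calibration data, error compensation), but the round-and-rescale core is the same.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor quantization: map floats to int8 with one scale."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(1024, 1024)             # a toy weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("mean abs error:", (w - w_hat).abs().mean().item())
# int8 storage is ~4x smaller than fp32 for the same shape
```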
14. Inference Optimization
Difficulty: Advanced | Prerequisites: Model deployment
High-Performance Inference Engines:
Attention Optimization:
Advanced Techniques:
Learning Resources:
- Optimizing Latency
- GPU Inference
- LLM Inference Best Practices
- Optimizing LLMs for Speed and Memory
- Assisted Generation
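A minimal sketch of how the KV cache affects decoding speed, using Hugging Face Transformers with a small model purely as an illustration; the model name and token counts are arbitrary, and larger models on GPU show a much bigger gap.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # small, CPU-friendly example
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Key-value caching lets a decoder reuse past attention states",
             return_tensors="pt")

def timed_generate(use_cache: bool) -> float:
    start = time.perf_counter()
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=64, do_sample=False, use_cache=use_cache)
    return time.perf_counter() - start

print(f"with KV cache:    {timed_generate(True):.2f}s")
print(f"without KV cache: {timed_generate(False):.2f}s  (recomputes attention each step)")
```

Production engines such as vLLM and TensorRT-LLM go much further (paged KV caches, continuous batching, fused kernels), but this toggle shows why cache management dominates inference optimization.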
15. Model Architecture Variants
Difficulty: Advanced | Prerequisites: Transformer architecture
Sparse & Efficient Architectures:
State Space Models:
Long Context Models:
Positional Encodings:
16. Model Enhancement
Difficulty: Advanced | Prerequisites: Model training, optimization
Context Window Extension:
Model Merging & Composition:
Knowledge Transfer:
Learning Resources:
- Merge LLMs with MergeKit
- Create MoEs with MergeKit
- Uncensor any LLM with Abliteration
- Smol Vision
- Large Multimodal Models
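A minimal sketch of the simplest form of model merging, a uniform weight average ("model soup") of two fine-tunes that share an architecture. The checkpoint paths are hypothetical placeholders; tools such as MergeKit implement far richer strategies (SLERP, TIES, DARE) on top of this basic idea.

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical fine-tunes of the same base architecture; replace with real paths.
model_a = AutoModelForCausalLM.from_pretrained("path/to/finetune-a")
model_b = AutoModelForCausalLM.from_pretrained("path/to/finetune-b")

state_a = model_a.state_dict()
state_b = model_b.state_dict()
merged = {}
for name, tensor_a in state_a.items():
    tensor_b = state_b[name]
    if tensor_a.dtype.is_floating_point:
        merged[name] = (tensor_a + tensor_b) / 2.0   # uniform average of weights
    else:
        merged[name] = tensor_a                      # keep integer buffers as-is

model_a.load_state_dict(merged)                      # reuse one model as the container
model_a.save_pretrained("merged-model")
```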
Part 4: Engineering & Applications
Focus: Production deployment, RAG, agents, multimodal, security, ops
Difficulty: Intermediate to Advanced
Outcome: Production-ready LLM applications and systems at scale
Learning Objectives: This production-focused track teaches deployment optimization, inference acceleration, application development with RAG systems and agents, multimodal integration, LLMOps implementation, and responsible AI practices for scalable LLM solutions.
17. Running LLMs & Building Applications
Difficulty: Intermediate | Prerequisites: Web development, APIs
Web Frameworks:
LLM APIs:
- OpenAI API
- Anthropic API
- Hugging Face Inference API
- Google Vertex AI
- OpenRouter
- Together AI
- Hugging Face Hub
- Hugging Face Spaces
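Most of the hosted APIs listed above follow a similar chat-completion pattern. A minimal sketch against the OpenAI Python SDK (v1+); the model name is just an example, and an OPENAI_API_KEY environment variable is assumed.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",                       # example model; any chat model works
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain the KV cache in two sentences."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```

Several of the other providers listed (e.g. OpenRouter, Together AI) expose OpenAI-compatible endpoints, so the same client can often be reused by pointing its base_url at them.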
Local LLM Tools:
Development Tools:
- LLM AutoEval
- LazyMergekit
- LazyAxolotl
- AutoQuant
- Model Family Tree
- ZeroSpace
- AutoAbliteration
- AutoDedup
Technologies:
- Supabase - Database, authentication, storage, and realtime
- LangChain - Building RAG pipelines
- PostHog - Analytics
- FastAPI - Backend framework
- Next.js - Frontend framework
- Resend - Email service
- LiteLLM - LLM compatibility layer
- Ollama - Local LLM serving
- Mistral AI - Open source LLMs
Educational Platforms:
Learning Resources:
- Run LLM with LM Studio
- Prompt Engineering Guide
- Outlines - Quickstart
- LMQL Overview
- Streamlit - Build LLM App
- HF LLM Inference Container
- Philschmid Posts
- SkyPilot
- LLMOps: Building Real-World Applications With Large Language Models
- Evaluating and Debugging Generative AI
- LangChain for LLM Application Development
- LangChain: Chat with Your Data
- Building Systems with the ChatGPT API
- LangChain & Vector Databases in Production
- Building LLM-Powered Apps
- Full Stack LLM Bootcamp
- Full Stack Deep Learning
- Practical Deep Learning for Coders - Part 1
- Practical Deep Learning for Coders - Part 2
- Stanford MLSys Seminars
- Machine Learning Engineering for Production (MLOps)
- MIT Introduction to Data-Centric AI
- llm-course
- GPT in 60 Lines of NumPy
- femtoGPT
- Master and Build Large Language Models
- Test Yourself On Build a Large Language Model (From Scratch)
- PyTorch in One Hour: From Tensors to Training Neural Networks on Multiple GPUs
18. Retrieval Augmented Generation (RAG)
Difficulty: Advanced | Prerequisites: Embeddings, databases
RAG Frameworks:
Vector Databases:
Graph RAG:
Foundational RAG Papers:
Advanced RAG Research:
- Lost in the Middle: How Language Models Use Long Contexts
- In-Context Retrieval-Augmented Language Models
- Scaling Retrieval-Based Language Models
- SILO Language Models
Question Answering:
- SQuAD: 100,000+ Questions for Machine Comprehension
- Bidirectional Attention Flow for Machine Comprehension
- Adversarial Examples for Evaluating Reading Comprehension
Learning Resources:
- LangChain Text Splitters
- Top 7 Vector Databases
- LlamaIndex High-level Concepts
- Model Context Protocol
- Pinecone Retrieval Augmentation
- LangChain Q&A with RAG
- LangChain Memory Types
- RAG Pipeline Metrics
- LangChain Query Construction
- LangChain SQL Posts
- Applying OpenAI's RAG
- DSPy in 8 Steps
- Improve ChatGPT with Knowledge Graphs
- RAG-Fusion
- DSPy
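Tying the RAG resources together, a minimal retrieve-then-generate sketch: embed a few documents, retrieve the most similar ones for a query with cosine similarity, and stuff them into a prompt. The embedding model name is an example, and `generate` is a hypothetical stand-in for any LLM call; frameworks like LangChain and LlamaIndex wrap this same loop with chunking, vector stores, and prompt templates.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # example embedding model

documents = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Python's asyncio library supports cooperative multitasking.",
    "Transformers use self-attention to model token interactions.",
]
doc_emb = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2):
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_emb @ q                              # cosine similarity (vectors are normalized)
    return [documents[i] for i in np.argsort(-scores)[:k]]

def generate(prompt: str) -> str:
    """Hypothetical LLM call; swap in any API or local model."""
    return f"[LLM would answer here given:\n{prompt}]"

query = "When was the Eiffel Tower finished?"
context = "\n".join(retrieve(query))
print(generate(f"Answer using only this context:\n{context}\n\nQuestion: {query}"))
```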
19. Tool Use & AI Agents
Difficulty: Advanced | Prerequisites: Function calling, planning
Agent Frameworks:
Function Calling & Tools:
Microsoft Frameworks:
Learning Resources:
20. Multimodal LLMs
Difficulty: Advanced | Prerequisites: Computer vision, audio processing
Vision-Language Models:
Audio Processing:
Image Generation:
Processing Libraries:
Learning Resources:
- Large Multimodal Models
- Smol Vision
- CS231N: Convolutional Neural Networks for Visual Recognition
- Deep Learning for Computer Vision
- Deep Learning for Computer Vision (DL4CV)
- Deep Learning for Computer Vision (neuralearn.ai)
- AMMI Geometric Deep Learning Course
21. Securing LLMs & Responsible AI (Optional)
Difficulty: Advanced | Prerequisites: Security fundamentals, ethical AI
Security Frameworks:
Attack Vectors & Defense:
Safety & Evaluation:
Privacy Protection:
Learning Resources:
Interpretability Research:
- BERT Rediscovers the Classical NLP Pipeline
- Axiomatic Attribution for Deep Networks
- Faithful, Interpretable Model Explanations via Causal Abstraction
- Investigating Gender Bias Using Causal Mediation Analysis
22. Large Language Model Operations (LLMOps)
Difficulty: Advanced | Prerequisites: DevOps, MLOps, cloud platforms
MLOps Platforms:
Infrastructure & Orchestration:
Monitoring & Observability:
Data Processing:
CI/CD & Model Management: