Traditional Language Models
Traditional language models form the foundation of modern NLP and are crucial for understanding the evolution towards transformer-based architectures. This guide provides an overview of classical approaches to language modeling.
N-gram Language Models and Smoothing Techniques
- N-gram Models (a code sketch follows this list)
  - Probability calculations
  - Maximum Likelihood Estimation
  - Context window approaches
  - Markov assumption
- Smoothing Methods
  - Laplace (Add-1) smoothing
  - Add-k smoothing
  - Good-Turing smoothing
  - Kneser-Ney smoothing
  - Backoff and interpolation
- Evaluation Metrics
  - Perplexity
  - Cross-entropy
  - Out-of-vocabulary handling
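To ground the bigram case, here is a minimal sketch in plain Python of maximum likelihood bigram counts with add-k smoothing and a perplexity computation. The toy corpus, the value of k, and the helper names (`prob_add_k`, `perplexity`) are illustrative choices rather than anything standard.

```python
from collections import Counter
import math

# Toy corpus; real counts would come from a large training set.
train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
test = [["the", "cat", "ran"]]

BOS, EOS = "<s>", "</s>"
vocab = {w for sent in train for w in sent} | {EOS}
V = len(vocab)

# Maximum likelihood counts under the Markov assumption (bigram context).
context_count = Counter()
bigram_count = Counter()
for sent in train:
    tokens = [BOS] + sent + [EOS]
    for prev, cur in zip(tokens, tokens[1:]):
        context_count[prev] += 1
        bigram_count[(prev, cur)] += 1

def prob_add_k(prev, cur, k=0.5):
    """Add-k smoothed bigram probability P(cur | prev); k=1 gives Laplace smoothing."""
    return (bigram_count[(prev, cur)] + k) / (context_count[prev] + k * V)

def perplexity(sentences, k=0.5):
    """exp of the average negative log-probability per predicted token."""
    log_prob, n_tokens = 0.0, 0
    for sent in sentences:
        tokens = [BOS] + sent + [EOS]
        for prev, cur in zip(tokens, tokens[1:]):
            log_prob += math.log(prob_add_k(prev, cur, k))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

print(perplexity(test))  # lower is better; unseen bigrams still receive nonzero mass
```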
Feedforward Neural Language Models
- Architecture Overview (sketched in code after this list)
  - Input representation
  - Projection layer
  - Hidden layers
  - Output softmax
  - Fixed context window
- Training Process
  - Word embeddings
  - Continuous space language models
  - Handling vocabulary size
  - Mini-batch training
- Limitations
  - Fixed context size
  - Lack of parameter sharing
  - Scalability issues
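A minimal sketch of this architecture, assuming PyTorch: a projection (embedding) layer over a fixed window of previous words, one tanh hidden layer, and a softmax output over the vocabulary, followed by a single mini-batch step on random token IDs. The class name, layer sizes, and data are illustrative, not taken from any particular implementation.

```python
import torch
import torch.nn as nn

class FeedforwardLM(nn.Module):
    """Bengio-style feedforward LM: embed a fixed window of previous words,
    concatenate the embeddings, pass them through a hidden layer, and predict the next word."""
    def __init__(self, vocab_size, context_size=3, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)          # projection layer
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)              # scores over the vocabulary

    def forward(self, context_ids):                               # (batch, context_size)
        e = self.embed(context_ids)                               # (batch, context_size, embed_dim)
        h = torch.tanh(self.hidden(e.flatten(start_dim=1)))       # concatenated context window
        return self.out(h)                                        # logits; softmax is applied by the loss

# One mini-batch training step on random data, just to show the shapes.
vocab_size = 1000
model = FeedforwardLM(vocab_size)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()                                   # log-softmax + cross-entropy

contexts = torch.randint(0, vocab_size, (32, 3))                  # 32 windows of 3 previous words
targets = torch.randint(0, vocab_size, (32,))                     # the word to predict for each window
loss = loss_fn(model(contexts), targets)
loss.backward()
optimizer.step()
```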
Recurrent Neural Network Language Models
- Basic RNN Architecture (see the sketch after this list)
  - Hidden state representation
  - Time-step processing
  - Backpropagation through time (BPTT)
  - Vanishing/exploding gradients
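The sketch below, again assuming PyTorch, shows the recurrent counterpart: the hidden state is carried across time steps, every position predicts the next word, and a single `backward()` call performs backpropagation through time over the whole sequence. Names, sizes, and the random data are illustrative; truncated BPTT would split long sequences and detach the hidden state between segments.

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """Minimal RNN language model: an embedding layer, a single tanh RNN,
    and a projection back to the vocabulary at every time step."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, input_ids, hidden=None):
        # `hidden` carries the state across time steps (and across truncated segments).
        embedded = self.embed(input_ids)              # (batch, seq_len, embed_dim)
        output, hidden = self.rnn(embedded, hidden)   # output: the hidden state at every step
        return self.out(output), hidden               # logits: (batch, seq_len, vocab_size)

vocab_size = 1000
model = RNNLM(vocab_size)
loss_fn = nn.CrossEntropyLoss()

# Next-word prediction: the target sequence is the input shifted by one position.
tokens = torch.randint(0, vocab_size, (8, 21))        # batch of 8 sequences, 21 tokens each
inputs, targets = tokens[:, :-1], tokens[:, 1:]
logits, hidden = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()   # backpropagation through time over all 20 steps
# For truncated BPTT, pass hidden.detach() into the next segment instead of backpropagating further.
```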
Long Short-Term Memory (LSTM) Networks
- LSTM Components (written out gate by gate below)
  - Input gate
  - Forget gate
  - Output gate
  - Cell state
  - Hidden state
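To make the gate interactions concrete, here is one LSTM time step written out by hand with PyTorch tensors. Packing all four gates into a single 4h-wide weight matrix and the i/f/o/g ordering are just one convention (libraries differ), and the random parameters are for shape-checking only.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step, gate by gate.
    W: input weights (4h, d), U: recurrent weights (4h, h), b: biases (4h,)."""
    hidden = h_prev.shape[-1]
    gates = x_t @ W.T + h_prev @ U.T + b
    i, f, o, g = gates.split(hidden, dim=-1)
    i = torch.sigmoid(i)            # input gate: how much new information enters the cell
    f = torch.sigmoid(f)            # forget gate: how much of the old cell state to keep
    o = torch.sigmoid(o)            # output gate: how much of the cell to expose
    g = torch.tanh(g)               # candidate cell update
    c_t = f * c_prev + i * g        # cell state: additive update eases gradient flow
    h_t = o * torch.tanh(c_t)       # hidden state passed to the next step / next layer
    return h_t, c_t

# Random parameters, shapes only.
d, h = 16, 32
x = torch.randn(1, d)
h0, c0 = torch.zeros(1, h), torch.zeros(1, h)
W, U, b = torch.randn(4 * h, d), torch.randn(4 * h, h), torch.zeros(4 * h)
h1, c1 = lstm_step(x, h0, c0, W, U, b)
print(h1.shape, c1.shape)           # torch.Size([1, 32]) torch.Size([1, 32])
```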
Gated Recurrent Units (GRUs)
- GRU Architecture (see the sketch after this list)
  - Reset gate
  - Update gate
  - Candidate activation
  - Final activation
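The corresponding GRU step, in the same hand-written style, collapses the LSTM's three gates and separate cell state into a reset gate, an update gate, and a candidate activation. The update `h_t = (1 - z) * h_prev + z * h_tilde` follows the original Cho et al. formulation; some implementations swap the roles of `z`, and the parameter layout here is an illustrative assumption.

```python
import torch

def gru_step(x_t, h_prev, W_rz, U_rz, b_rz, W_h, U_h, b_h):
    """One GRU time step: reset/update gates, candidate activation, final activation."""
    hidden = h_prev.shape[-1]
    rz = torch.sigmoid(x_t @ W_rz.T + h_prev @ U_rz.T + b_rz)
    r, z = rz.split(hidden, dim=-1)            # reset gate r, update gate z
    h_tilde = torch.tanh(x_t @ W_h.T + (r * h_prev) @ U_h.T + b_h)   # candidate activation
    return (1 - z) * h_prev + z * h_tilde      # final activation: gated interpolation

# Random parameters, shapes only.
d, h = 16, 32
x = torch.randn(1, d)
h_prev = torch.zeros(1, h)
params = (torch.randn(2 * h, d), torch.randn(2 * h, h), torch.zeros(2 * h),   # gate weights
          torch.randn(h, d), torch.randn(h, h), torch.zeros(h))               # candidate weights
h_next = gru_step(x, h_prev, *params)
print(h_next.shape)                            # torch.Size([1, 32])
```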
Bidirectional and Multilayer RNNs
- Architecture Types (see the sketch after this list)
  - Bidirectional processing
  - Deep RNN design
  - Residual connections
  - Layer normalization
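A short PyTorch sketch of a two-layer bidirectional LSTM with layer normalization over the concatenated forward/backward states. Because the backward direction conditions on future tokens, this kind of model is used as an encoder or tagger rather than as a left-to-right generative language model. The class name and sizes are illustrative, and residual connections between layers would require stacking single-layer RNNs by hand rather than relying on `num_layers`.

```python
import torch
import torch.nn as nn

class BiDeepEncoder(nn.Module):
    """Two-layer bidirectional LSTM encoder with layer normalization
    applied to the concatenated forward and backward hidden states."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                           bidirectional=True, batch_first=True)
        self.norm = nn.LayerNorm(2 * hidden_dim)   # forward + backward states concatenated
        # Note: nn.LSTM's num_layers stacks layers without residual connections;
        # adding them would require composing single-layer RNNs manually.

    def forward(self, input_ids):
        output, _ = self.rnn(self.embed(input_ids))   # (batch, seq_len, 2 * hidden_dim)
        return self.norm(output)

encoder = BiDeepEncoder(vocab_size=1000)
states = encoder(torch.randint(0, 1000, (4, 12)))
print(states.shape)                                   # torch.Size([4, 12, 256])
```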