1. Intro
The Transformer: the architecture for language modeling behind GPT, BERT, and T5
2. What
A general-purpose architecture
- CNNs for vision
- RNNs for language: process words sequentially
- hard to train: sequential processing prevents parallelism
- tend to forget earlier context in long sequences
Initially designed for translation
- trains efficiently: all positions are processed in parallel
- can exploit huge parallel-text datasets
3. How it works
- Positional Encoding
adds position information to each word embedding, so word order is preserved while training stays parallel
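A minimal NumPy sketch of the sinusoidal positional encoding used in the original Transformer paper (the `10000` base and the sin/cos interleaving come from that paper; the toy sizes are my own):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: each position gets a unique
    pattern of sines and cosines, so the model can recover word order
    even though it processes all positions in parallel."""
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1) position index
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2) frequency index
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)             # odd dimensions: cosine
    return pe

pe = positional_encoding(seq_len=50, d_model=64)
print(pe.shape)  # (50, 64)
```

The encoding is simply added to the word embeddings before the first layer; every value stays in [-1, 1], so it does not swamp the embeddings.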
- Attention
for translation, each output word looks at (attends to) the most relevant input words
- Self-Attention
how the model understands language: a word's meaning is determined by checking the surrounding words (e.g. "bank" in "river bank" vs. "bank account")
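The two attention bullets above can be sketched as single-head scaled dot-product self-attention in NumPy. This is a toy version, not the multi-head implementation from the paper, and the weight matrices here are random placeholders for learned parameters:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Each word's new representation is a weighted average of all
    words' values; the weights come from comparing that word's query
    against every word's key (i.e. "checking the surrounding words")."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # relatedness of every word pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                  # 5 words, embedding dim 8 (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

In self-attention the queries, keys, and values all come from the same sentence; in the translation (cross-attention) case, the queries would come from the output sentence and the keys/values from the input sentence.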