
What Is a Transformer?

1. Intro

A neural network architecture for language tasks, and the model behind GPT, BERT, and T5.

2. What

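A general-purpose model. Before it, each domain had its own architecture:

  • CNNs for vision
  • RNNs for language and other sequential data
  • RNNs are hard to train: they process one word at a time, so training cannot be parallelized
  • RNNs tend to forget the earlier parts of a long sequence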

Initially developed for translation

  • trains efficiently, since all words are processed in parallel
  • can therefore be trained on huge datasets

3. How it works

  1. Positional Encoding

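Add an index to each word that marks its position in the sentence. The model reads all words at once, so this index is how it knows the word order, and it makes training easier.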
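
As a rough illustration of the idea, here is a minimal sketch of one common scheme, the sinusoidal encoding. The notes do not say which scheme is meant, so this choice is an assumption, and the names positional_encoding and add_positions are hypothetical. Each position gets a vector of sines and cosines that is added to the word's embedding.

```python
import math


def positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal position vectors.

    Assumption: the sinusoidal scheme is used; other schemes exist.
    """
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency: 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(embeddings):
    """Add the matching position vector to each word embedding, element-wise."""
    table = positional_encoding(len(embeddings), len(embeddings[0]))
    return [
        [e + p for e, p in zip(embedding, position)]
        for embedding, position in zip(embeddings, table)
    ]
```

Calling add_positions on a list of word embeddings (one list of floats per word) returns embeddings of the same shape that now also carry the word order.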

  2. Attention

When translating, the model looks across the input sentence for the words most related to the word it is currently producing.
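
A minimal sketch of this lookup, assuming the scaled dot-product form of attention: a query vector (the word being produced) is scored against a key vector for each input word, the scores are turned into weights, and the output is the weighted average of the value vectors. The names softmax and attend are hypothetical, and the learned projection matrices of a real model are left out.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the weighted average of `values` and the weights themselves.
    """
    d = len(query)
    # Similarity of the query to every key, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    output = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(dim)
    ]
    return output, weights
```

The returned weights show which input words the model is "looking at": the key most similar to the query receives the largest weight.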

  3. Self-Attention

How does the model understand language? By checking the surrounding words in the same sentence to determine what each word means in context.
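
A minimal sketch of the idea, assuming scaled dot-product scoring: every word vector in a sentence is compared with every word vector in that same sentence (itself included), and each word is replaced by a weighted average of the words it relates to most. The name self_attention is hypothetical, and the learned query, key, and value projections of a real model are omitted to keep the sketch short.

```python
import math


def self_attention(vectors):
    """Let each vector attend to all vectors in the same sequence.

    Returns the context-aware vectors and the weight matrix
    (row i holds how much word i attends to each word).
    """
    d = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this word against every word in the sentence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in vectors
        ]
        # Softmax: convert scores to weights that sum to 1.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # New representation: weighted average of all word vectors.
        outputs.append([
            sum(w * vector[j] for w, vector in zip(weights, vectors))
            for j in range(d)
        ])
        all_weights.append(weights)
    return outputs, all_weights
```

With toy vectors, a word attends more strongly to words whose vectors point in a similar direction, which is how the surrounding context shapes the meaning assigned to each word.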

4. Use