The Transformer Family
A short summary of the Transformer family.
Vanilla Transformer
- Self-attention is applied within each encoder layer and each decoder layer.
- Cross-attention is applied between the encoder output and the decoder.
- attention score = softmax(dot(Query vector, Key vector) / sqrt(d_k)).
- attention value = dot(attention scores, Value vectors), i.e. a weighted sum of the Values (see the sketch after this list).
- Limitation: attention cost grows as O(n^2) in sequence length, so the context window is fixed and dependencies beyond it are lost (no long-term dependency across segments).
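
A minimal NumPy sketch of the scaled dot-product attention step described above; the function and variable names (`scaled_dot_product_attention`, `Q`, `K`, `V`) are illustrative, not from the source.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n, d_k) queries, K: (m, d_k) keys, V: (m, d_v) values -> (n, d_v)."""
    d_k = Q.shape[-1]
    # Attention scores: dot(Query, Key), scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key axis turns scores into attention weights.
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Attention value: weighted sum of the Value vectors.
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions
K = rng.normal(size=(6, 8))   # 6 key/value positions
V = rng.normal(size=(6, 16))
out = scaled_dot_product_attention(Q, K, V)  # shape (4, 16)
```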
Linformer
- Projects the Keys and Values down to a fixed length k with learned matrices, so self-attention complexity drops from O(n^2) to O(n*k), i.e. linear in sequence length.
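
A rough single-head NumPy sketch of that low-rank idea, assuming projection matrices `E` and `F` of shape (k, n) as in the Linformer paper; all names here are illustrative.

```python
import numpy as np

def linformer_attention(Q, K, V, E, F):
    """Q, K, V: (n, d); E, F: (k, n) learned low-rank projections -> (n, d)."""
    d = Q.shape[-1]
    K_proj = E @ K  # (k, d): compress the n keys down to k
    V_proj = F @ V  # (k, d): compress the n values down to k
    # Scores are now (n, k) instead of (n, n), hence the linear cost in n.
    scores = Q @ K_proj.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V_proj
```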
References
- https://lilianweng.github.io/posts/2020-04-07-the-transformer-family/
- Linformer: Self-Attention with Linear Complexity (Wang et al., 2020), https://arxiv.org/abs/2006.04768