Review Note
Last Update: 04/08/2024 10:23 AM
Current Deck: Deeplearning(week 3 & 4)::week 4
Published
Currently Published Content
Text
Self-Attention

For each item in the input sequence we compute three things. What are those? {{c1::A Query, a Key and a Value vector. Each is obtained by multiplying the item's input vector by its own learned weight matrix.}}
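What do we get when we perform the row-wise softmax on the Query-Key scores?
{{c2::Each row becomes a probability distribution: its entries are non-negative and sum to 1. These entries are the attention weights, i.e. how strongly one item attends to every item in the sequence.}}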

Extra
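A minimal NumPy sketch of the two steps on this card, for illustration only. The names `self_attention`, `softmax_rows`, `W_q`, `W_k`, `W_v` and the toy shapes are assumptions, not taken from the note. Dividing the scores by the square root of the key dimension is also an assumption here: it is the usual scaled dot-product variant, which the note itself does not mention.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise softmax: every row becomes a probability distribution."""
    # Subtracting the row maximum keeps exp() numerically stable
    # without changing the result.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over a sequence X of shape (n_items, d_model)."""
    # Query, Key and Value: the input multiplied by learned weight matrices.
    Q = X @ W_q
    K = X @ W_k
    V = X @ W_v

    # Score of every item against every other item (assumed scaled variant).
    scores = Q @ K.T / np.sqrt(K.shape[-1])

    # Row i holds the attention weights of item i over all items.
    weights = softmax_rows(scores)

    # Each output row is a weighted average of the Value vectors.
    return weights @ V, weights


# Toy example: 4 items, model dimension 8, head dimension 5.
# Random matrices stand in for weights that would normally be learned.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 5)) for _ in range(3))

output, weights = self_attention(X, W_q, W_k, W_v)
```

In this sketch `weights` has one row per input item, and each row sums to 1, which is what makes it readable as a distribution over the items being attended to.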
Current Tags: