In our last post, we explored generalised attention and self-attention. Let's now move on to a few more types of attention mechanisms.
(In case you missed part 1)
Multi-head attention is the form of attention used in the Transformer architecture. Instead of computing attention once, the attention module runs several computations in parallel; each of these parallel computations is called an attention head.
Each head independently projects the input sequence into its own queries, keys, and values and computes its own attention output.
The outputs of all heads are then combined to produce the final attention result, so that every nuance of the input sequence is taken into consideration.
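The mechanism above can be sketched in a few lines of NumPy. This is a minimal illustration, not the full Transformer layer: the randomly initialised matrices `Wq`, `Wk`, `Wv`, `Wo` stand in for learned projection weights, and masking and biases are omitted.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(x, num_heads, rng):
    """Sketch of multi-head self-attention over a (seq_len, d_model) input."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Random matrices stand in for the learned projection weights.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    # Project, then split the model dimension across the heads.
    q = (x @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Each head computes scaled dot-product attention independently.
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (heads, seq, seq)
    heads = weights @ v                                            # (heads, seq, d_head)
    # Concatenate the heads and mix them with a final output projection.
    combined = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return combined @ Wo

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))   # 5 tokens, model dimension 8
out = multi_head_self_attention(x, num_heads=2, rng=rng)
print(out.shape)                  # (5, 8)
```

Note that the output has the same shape as the input: each head works in a smaller `d_head` subspace, and concatenating the heads restores the full model dimension.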
Additive attention, also known as the Bahdanau attention mechanism, is based on alignment scores between the decoder state and the encoder hidden states.
These alignment scores are calculated by a small feed-forward network inside the larger neural network.
Words in the source (input) sequence are correlated with words in the target (output) sequence, but the correspondence is soft rather than exact.
This correlation takes all encoder hidden states into account: the alignment scores are normalised into attention weights, and the context vector is the weighted sum of the hidden states.
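A minimal NumPy sketch of the additive score, assuming the standard form score(s, hᵢ) = vᵀ tanh(W s + U hᵢ); the matrices `Wa`, `Ua` and vector `va` are illustrative stand-ins for learned parameters.

```python
import numpy as np

def bahdanau_attention(decoder_state, encoder_states, Wa, Ua, va):
    """Additive (Bahdanau) attention: score(s, h_i) = va . tanh(Wa s + Ua h_i)."""
    # One alignment score per source position, from a tiny feed-forward net.
    scores = np.tanh(decoder_state @ Wa + encoder_states @ Ua) @ va
    # Softmax turns the scores into attention weights over all hidden states.
    e = np.exp(scores - scores.max())
    weights = e / e.sum()
    # The context vector is the weighted sum of the encoder hidden states.
    context = weights @ encoder_states
    return weights, context

rng = np.random.default_rng(0)
H = rng.standard_normal((6, 10))   # 6 encoder hidden states, dim 10
s = rng.standard_normal(12)        # decoder state, dim 12
Wa = rng.standard_normal((12, 8))  # hypothetical attention dimension 8
Ua = rng.standard_normal((10, 8))
va = rng.standard_normal(8)
weights, context = bahdanau_attention(s, H, Wa, Ua, va)
print(weights.sum())               # attention weights sum to 1
```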
Multiplicative attention, also referred to as the Luong mechanism, replaces Bahdanau's additive score with a multiplicative (dot-product) score, an improvement in efficiency over the Bahdanau model.
In neural machine translation, the Luong model comes in two variants: global attention attends to all source words, while local attention first predicts an aligned position in the source sentence and attends only to a small window of words around it.
Both the global and local attention models are viable; they differ mainly in how the context vector is computed.
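The global/local contrast can be sketched as follows, using the simple dot-product score; the window logic is a simplified illustration that takes the predicted centre position as a given argument rather than predicting it from the decoder state.

```python
import numpy as np

def luong_global_attention(decoder_state, encoder_states):
    """Global (Luong) attention with the dot score: score = s . h_i over all words."""
    scores = encoder_states @ decoder_state
    e = np.exp(scores - scores.max())
    weights = e / e.sum()
    return weights, weights @ encoder_states

def luong_local_attention(decoder_state, encoder_states, center, window=2):
    """Local attention: same score, restricted to a window around `center`."""
    lo = max(0, center - window)
    hi = min(len(encoder_states), center + window + 1)
    return luong_global_attention(decoder_state, encoder_states[lo:hi])

rng = np.random.default_rng(1)
H = rng.standard_normal((8, 6))    # 8 source hidden states, dim 6
s = rng.standard_normal(6)         # decoder state, dim 6
w_global, c_global = luong_global_attention(s, H)
w_local, c_local = luong_local_attention(s, H, center=3, window=2)
print(len(w_global), len(w_local)) # 8 positions vs a 5-position window
```

Both variants produce a context vector of the same dimension; the local one simply ignores source words outside the window.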
AI already helps create software; hardware may be next. Several companies, like Circuitmind, Cells and JITX, are starting to use AI for hardware design.
Voxels vs Polygons
I like this paper on generating 3D voxel-based objects: https://alexzhou907.github.io/pvd. Compared to polygon-based models, I think voxels are a more accurate way of modelling actual 3D objects. It also seems closer to how 3D printing would work.