Considers removing the bias term of LogitLens.
Decomposes attention weights and examines their relation to positional encoding.
Decomposes the computation of LayerNormalization.
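
As a rough illustration of the LayerNormalization note, the sketch below shows one common way such a decomposition can be written. It is not taken from the work summarized here. It assumes the input is a sum of component vectors and that LayerNorm has the usual form gamma * (x - mean) / sqrt(var + eps) + beta. Centering is linear, so it can be applied to each component separately, while the scale is computed once from the full input. The names `layer_norm`, `layer_norm_decomposed`, and `components` are hypothetical.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm over a single vector x."""
    return gamma * (x - x.mean()) / np.sqrt(x.var() + eps) + beta

def layer_norm_decomposed(components, gamma, beta, eps=1e-5):
    """Split LayerNorm of sum(components) into per-component terms plus a bias.

    Assumption: `components` has shape (n, d) and the LayerNorm input is their sum.
    """
    x = components.sum(axis=0)
    d = x.shape[0]
    # Mean subtraction as a linear map: x - mean(x) == x @ centering
    centering = np.eye(d) - np.ones((d, d)) / d
    # The scale depends on the full input, so it is shared by all components
    scale = np.sqrt(x.var() + eps)
    per_component = (components @ centering) * gamma / scale
    return per_component, beta

rng = np.random.default_rng(0)
components = rng.normal(size=(3, 8))
gamma = rng.normal(size=8)
beta = rng.normal(size=8)

per_component, bias = layer_norm_decomposed(components, gamma, beta)
reference = layer_norm(components.sum(axis=0), gamma, beta)
# The per-component terms plus the bias recover the ordinary LayerNorm output
assert np.allclose(per_component.sum(axis=0) + bias, reference)
```

Each row of `per_component` is the share of the normalized output attributable to one input component. The bias `beta` is kept separate because it does not depend on the input.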