...
LogitLens without bias

Considers removing the bias term from LogitLens.
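
As a rough illustration of what dropping the bias could mean, the sketch below assumes the LogitLens projection is a final LayerNorm followed by an unembedding matrix, and that the bias in question is the LayerNorm shift `beta`. Neither assumption is confirmed by the note itself. All names (`logit_lens`, `W_U`, `gamma`, `beta`) and the random weights are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 8, 20

# Hypothetical stand-ins for trained parameters.
gamma = rng.normal(size=d_model)         # LayerNorm scale
beta = rng.normal(size=d_model)          # LayerNorm shift (the assumed "bias term")
W_U = rng.normal(size=(d_model, vocab))  # unembedding matrix


def layer_norm(h, gamma, beta, eps=1e-5):
    """Standard LayerNorm over the last axis."""
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return gamma * (h - mu) / np.sqrt(var + eps) + beta


def logit_lens(h, use_bias=True):
    """Project intermediate hidden states to vocabulary logits."""
    shift = beta if use_bias else np.zeros_like(beta)
    return layer_norm(h, gamma, shift) @ W_U


h = rng.normal(size=(3, d_model))  # three hidden states from some layer
with_bias = logit_lens(h, use_bias=True)
without_bias = logit_lens(h, use_bias=False)

# Under these assumptions the bias adds the same vector to every position.
bias_logits = beta @ W_U
assert np.allclose(with_bias - without_bias, bias_logits)
```

Under these assumptions the bias contributes a fixed logit offset `beta @ W_U` that does not depend on the hidden state, which is one reason to look at the projection with and without it.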

...
Attention Bias and Positional Encoding (GPT2)

Decomposes the attention weights and examines their relation to the positional encoding.
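
One way such a decomposition could be set up is sketched below for a single head. It assumes the attention input is the sum of a token embedding `e` and a positional embedding `p`, and it ignores the LayerNorm that GPT2 applies before attention, the causal mask, and the `1/sqrt(d_head)` scaling. The variable names and random weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4

e = rng.normal(size=(seq_len, d_model))  # token embeddings (hypothetical)
p = rng.normal(size=(seq_len, d_model))  # positional embeddings (hypothetical)
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))
b_Q = rng.normal(size=d_head)
b_K = rng.normal(size=d_head)

x = e + p
q = x @ W_Q + b_Q
k = x @ W_K + b_K
scores = q @ k.T  # pre-softmax scores, rows = queries, columns = keys

A = W_Q @ W_K.T  # bilinear form shared by the embedding terms
terms = {
    "token-token": e @ A @ e.T,
    "token-position": e @ A @ p.T,
    "position-token": p @ A @ e.T,
    "position-position": p @ A @ p.T,
    # b_Q interacts with each key: varies along the key axis only.
    "query-bias": np.tile(b_Q @ W_K.T @ x.T, (seq_len, 1)),
    # b_K interacts with each query: varies along the query axis only.
    "key-bias": np.tile((x @ W_Q @ b_K)[:, None], (1, seq_len)),
    "bias-bias": np.full((seq_len, seq_len), b_Q @ b_K),
}
assert np.allclose(sum(terms.values()), scores)


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=-1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)


# Terms that are constant across keys for a given query cancel in the softmax.
row_constant = terms["key-bias"] + terms["bias-bias"]
assert np.allclose(softmax(scores), softmax(scores - row_constant))
```

In this simplified setting the `position-position` term depends only on the two positions, and the key bias drops out of the attention weights entirely, leaving the query bias as the only bias contribution that survives the softmax.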

...
Linearity of LayerNormalization

Decomposes the computation of LayerNormalization into its component steps to examine which of them are linear.
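
A minimal sketch of one such decomposition, assuming the standard definition `gamma * (x - mean) / sqrt(var + eps) + beta`: centering is a fixed linear map, the division by the standard deviation is the only input-dependent step, and the final scale and shift are affine. The names `C`, `sigma`, and `W` are introduced here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
x = rng.normal(size=d)
gamma = rng.normal(size=d)
beta = rng.normal(size=d)
eps = 1e-5


def layer_norm(x):
    """Reference LayerNorm over a single vector."""
    return gamma * (x - x.mean()) / np.sqrt(x.var() + eps) + beta


# Step 1 (linear): centering equals multiplication by C = I - (1/d) * ones.
C = np.eye(d) - np.ones((d, d)) / d
centered = C @ x
assert np.allclose(centered, x - x.mean())

# Step 2 (nonlinear): the normalizer sigma depends on the input itself.
sigma = np.sqrt(np.mean(centered ** 2) + eps)

# Step 3 (affine): with sigma held fixed, the whole map is W @ x + beta.
W = np.diag(gamma) @ C / sigma
assert np.allclose(W @ x + beta, layer_norm(x))

# The map as a whole is not linear: rescaling the input leaves the
# output almost unchanged (exactly unchanged only when eps is zero).
assert np.allclose(layer_norm(2 * x), layer_norm(x), atol=1e-3)
```

Treating `sigma` as a constant for a given input is what makes a linear analysis possible; the approximation is only as good as the assumption that `sigma` does not vary much across the inputs being compared.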