Mechanistic Interpretability

[Franco et al., 2026]
 
Gabriel Franco, Carson Loughridge and Mark Crovella (2026).
Singular Vectors of Attention Heads Align with Features.
Technical Report. doi:10.48550/arXiv.2602.13524
[Franco et al., 2026]
 
Gabriel Franco, Lucas M. Tassis, Azalea Rohr and Mark Crovella (2026).
Finding Highly Interpretable Prompt-Specific Circuits in Language Models.
Technical Report. doi:10.48550/arXiv.2602.13483
[Franco and Crovella, 2025]
 
Gabriel Franco and Mark Crovella (2025).
Pinpointing Attention-Causal Communication in Language Models.
In: Proceedings of NeurIPS. San Diego, CA. Also appeared in the Mechanistic Interpretability Workshop at NeurIPS 2025. doi:TBD
[Franco and Crovella, 2024]
 
Gabriel Franco and Mark Crovella (2024).
Sparse Attention Decomposition Applied to Circuit Tracing.
Technical Report No. 2410.00340. doi:10.48550/arXiv.2410.00340