Pivotal Token Search
Updated Jul 15, 2025 - Python
Adversarial Manipulation of CoT
Analysed the determinism, faithfulness, reasoning patterns, and steering of chain-of-thought (CoT) reasoning. Developed and tested methods to improve control and add fail-safes.
Mechanistic analysis of chain-of-thought faithfulness using GPT-2 Small.
Implementation and analysis of sparse autoencoders for neural network interpretability research. Features an interactive visualization dashboard and Weights & Biases (W&B) integration.
My AI interpretability research journey.
Unofficial implementation that reproduces the experiments from the "Superposition as a Phase Change" section of "Toy Models of Superposition".