Arthur Conmy

[Chart: citations per year. 2022: 5; 2023: 141; 2024: 132]

Alexandre Variengien - ENS de Lyon & EPFL - Verified email at ens-lyon.fr
Jacob Steinhardt - Stanford University - Verified email at cs.stanford.edu
Adrià Garriga-Alonso - Research Scientist, FAR AI - Verified email at far.ai
Stefan Heimersheim - Institute of Astronomy, University of Cambridge - Verified email at cam.ac.uk
Aengus Lynch - PhD Student, University College London - Verified email at ucl.ac.uk
Neel Nanda - Research Engineer, Google DeepMind - Verified email at deepmind.com
Cody Rushing - University of Texas at Austin - Verified email at utexas.edu
Thomas McGrath - Research Scientist, DeepMind - Verified email at google.com
Rowan Wang - Verified email at rdwrs.com
Aaquib Syed - Student, University of Maryland - Verified email at umd.edu
Rhys Gould - Mathematics Undergraduate, University of Cambridge - Verified email at cam.ac.uk
Euan Ong - Research Assistant, University of Cambridge - Verified email at cam.ac.uk
Nicholas Carlini - Google DeepMind - Verified email at google.com
Daniel Paleka - ETH Zurich - Verified email at inf.ethz.ch
Rohin Shah - Research Scientist, Google DeepMind - Verified email at deepmind.com
Janos Kramar - DeepMind - Verified email at google.com
Can Rager - Independent
Lewis Smith - PhD Student, University of Oxford - Verified email at kellogg.ox.ac.uk
Vikrant Varma - DeepMind - Verified email at deepmind.com
Tom Lieberum - Google DeepMind - Verified email at deepmind.com

Arthur Conmy

Google DeepMind

Verified email at google.com - Homepage


Title	Authors	Venue	Cited by	Year
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small	K Wang, A Variengien, A Conmy, B Shlegeris, J Steinhardt	ICLR 2023	174	2022
Towards Automated Circuit Discovery for Mechanistic Interpretability	A Conmy, AN Mavor-Parker, A Lynch, S Heimersheim, A Garriga-Alonso	NeurIPS 2023 Spotlight	80	2023
Copy Suppression: Comprehensively Understanding an Attention Head	C McDougall, A Conmy, C Rushing, T McGrath, N Nanda	NeurIPS 2023 Workshop (Attributing Model Behavior at Scale)	9	2023
Attribution Patching Outperforms Automated Circuit Discovery	A Syed, C Rager, A Conmy	NeurIPS 2023 Workshop (Attributing Model Behavior at Scale)	8	2023
Successor Heads: Recurring, Interpretable Attention Heads In The Wild	R Gould, E Ong, G Ogden, A Conmy	ICLR 2024	5	2023
StyleGAN-induced Data-Driven Regularization for Inverse Problems	A Conmy, S Mukherjee, CB Schönlieb	IEEE ICASSP 2022	4	2022
Stealing Part of a Production Language Model	N Carlini, D Paleka, KD Dvijotham, T Steinke, J Hayase, AF Cooper, ...	ICML 2024	1	2024
Improving Dictionary Learning with Gated Sparse Autoencoders	S Rajamanoharan, A Conmy, L Smith, T Lieberum, V Varma, J Kramár, ...	arXiv preprint arXiv:2404.16014		2024
Sparse Autoencoders Work on Attention Layer Outputs	C Kissane, R Krzyzanowski, A Conmy, N Nanda	AI Alignment Forum		2024


