| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| HateCheck: Functional Tests for Hate Speech Detection Models | P Röttger, B Vidgen, D Nguyen, Z Waseem, H Margetts, J Pierrehumbert | ACL 2021 (Main) | 193 | 2021 |
| Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks | P Röttger, B Vidgen, D Hovy, JB Pierrehumbert | NAACL 2022 (Main) | 96 | 2022 |
| SemEval-2023 Task 10: Explainable Detection of Online Sexism | HR Kirk, W Yin, B Vidgen, P Röttger | ACL 2023 (Main) | 75 | 2023 |
| Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media | P Röttger, JB Pierrehumbert | EMNLP 2021 (Findings) | 52 | 2021 |
| The Benefits, Risks and Bounds of Personalizing the Alignment of Large Language Models to Individuals | HR Kirk, B Vidgen, P Röttger, SA Hale | Nature Machine Intelligence | 49* | 2024 |
| Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate | HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale | NAACL 2022 (Main) | 47 | 2021 |
| Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models | P Röttger, H Seelawi, D Nozza, Z Talat, B Vidgen | NAACL 2022 (WOAH) | 32 | 2022 |
| XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models | P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy | NAACL 2024 (Main) | 24 | 2023 |
| Safety-Tuned LLaMAs: Lessons from Improving the Safety of Large Language Models that Follow Instructions | F Bianchi, M Suzgun, G Attanasio, P Röttger, D Jurafsky, T Hashimoto, ... | ICLR 2024 (Poster) | 23 | 2023 |
| The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values | HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale | EMNLP 2023 (Main) | 11 | 2023 |
| Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages | P Röttger, D Nozza, F Bianchi, D Hovy | EMNLP 2022 (Main) | 9 | 2022 |
| The Ecological Fallacy in Annotation: Modelling Human Label Variation Goes beyond Sociodemographics | M Orlikowski, P Röttger, P Cimiano, D Hovy | ACL 2023 (Main) | 5 | 2023 |
| "My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models | X Wang, B Ma, C Hu, L Weber-Genzel, P Röttger, F Kreuter, D Hovy, ... | arXiv preprint arXiv:2402.14499 | 4 | 2024 |
| The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models | HR Kirk, B Vidgen, P Röttger, SA Hale | NeurIPS 2023 (SoLaR Workshop) | 4 | 2023 |
| Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models | P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy | arXiv preprint arXiv:2402.16786 | 3 | 2024 |
| SimpleSafetyTests: A Test Suite for Identifying Critical Safety Risks in Large Language Models | B Vidgen, HR Kirk, R Qian, N Scherrer, A Kannappan, SA Hale, P Röttger | arXiv preprint arXiv:2311.08370 | 3 | 2023 |
| Near to Mid-term Risks and Opportunities of Open Source Generative AI | F Eiras, A Petrov, B Vidgen, CS de Witt, F Pizzati, K Elkins, ... | ICML 2024 | 1 | 2024 |
| Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ | C Holtermann, P Röttger, T Dill, A Lauscher | arXiv preprint arXiv:2403.03814 | 1 | 2024 |
| Improving the Detection of Multilingual Online Attacks with Rich Social Media Data from Singapore | J Haber, B Vidgen, M Chapman, V Agarwal, RKW Lee, YK Yap, P Röttger | ACL 2023 (Main) | 1 | 2023 |
| Tracking Abuse on Twitter Against Football Players in the 2021–22 Premier League Season | B Vidgen, YL Chung, P Johansson, HR Kirk, A Williams, SA Hale, ... | Available at SSRN 4403913 | 1 | 2022 |