Nouha Dziri


I’m a research scientist at the Allen Institute for AI (AI2), working with Yejin Choi. I currently co-lead the safety and post-training effort at AI2 to build OLMo, a highly capable and truly open LLM to advance AI.

Besides this, I work on understanding the limits of Transformers and their inner workings. Check out “Faith and Fate” to learn about the limits of Transformers on reasoning tasks. I also work on studying alignment in LLMs; check out “Roadmap to Pluralistic Alignment”, “Finegrained RLHF”, and “RewardBench”, the first evaluation tool for reward models.

My work has been featured in TechCrunch, Le Monde, The Economist, and Science News.

I have been fortunate to work with brilliant researchers in the field: Siva Reddy at Mila/McGill; Hannah Rashkin, Tal Linzen, David Reitter, Diyi Yang, and Tom Kwiatkowski at Google Research NYC; and Alessandro Sordoni and Geoff Gordon at Microsoft Research Montreal.

News

Oct 2024 :tada: WildTeaming and WildGuard got accepted at NeurIPS 2024. See you in Vancouver :canada:
Sep 2024 :toolbox: :crossed_swords: New blog post about AI safety: “Current Paradigms of LLMs Safety Alignment are superficial”
Sep 2024 :bellhop_bell: :strawberry: New blog post about o1 models and LLM reasoning: “Have o1 Models Cracked Human Reasoning?”
Aug 2024 :tada: Super excited that our workshop “System 2 Reasoning At Scale” was accepted at NeurIPS 2024 in Vancouver! Mark your calendar for Dec 15, 2024!
Jul 2024 Check out our :fire: new safety moderation tool :fire: WildGuard: a state-of-the-art open tool for assessing safety risks, jailbreaks, and refusals in LLMs.
Jul 2024 New red-teaming method :lion: WildTeaming: an automatic red-teaming framework that discovers novel jailbreaks from in-the-wild user-LLM interactions.
Jul 2024 Check out my interview with Science News Magazine about LLMs’ reasoning skills, featuring “Faith and Fate” and “Generative AI Paradox”.
Jul 2024 I will serve as a Demo Chair for NAACL 2025.
Jun 2024 I will serve as a Senior Area Chair for ACL 2025 in the area of Ethics, Bias, and Fairness.
Jun 2024 Check out my interview with Le Monde (the French equivalent of The New York Times) about hallucinations in LLMs.
May 2024 Invited Talk “What it can create, it may not understand: Studying the Limits of Transformers” at the University of Cambridge.
May 2024 I served as an Area Chair for COLM 2024 in the area of Safety in LLMs.