Nouha Dziri


I’m a research scientist at the Allen Institute for AI (AI2), working with Yejin Choi. I currently co-lead the safety and post-training effort at AI2 to build OLMo, a highly capable and truly open LLM to advance AI.

Besides this, I work on understanding the limits of Transformers and their inner workings. Check out “Faith and Fate” to learn about the limits of Transformers on reasoning tasks. I also study alignment in LLMs; check out “Roadmap to Pluralistic Alignment”, “Fine-Grained RLHF”, and “RewardBench”, the first evaluation tool for reward models.

My work has been featured in TechCrunch, Le Monde, The Economist, and Science News.

I have been fortunate to work with brilliant researchers in the field: Siva Reddy at Mila/McGill; Hannah Rashkin, Tal Linzen, David Reitter, Diyi Yang, and Tom Kwiatkowski at Google Research NYC; and Alessandro Sordoni and Geoff Gordon at Microsoft Research Montreal.

News

Dec 2024 System 2 Reasoning at Scale workshop (NeurIPS 2024) was a success :tada: including keynote talks, a panel, posters, and lightning talks. Check out the details!
Dec 2024 Invited talk “In-Context Learning in LLMs: Potential and Limits” at the Language Gamification Workshop @ NeurIPS 2024 :canada:
Dec 2024 Invited as a panelist at the Meta-Generation Algorithms for Large Language Models Tutorial at NeurIPS 2024 :canada:
Oct 2024 :tada: WildTeaming and WildGuard got accepted at NeurIPS 2024. See you in Vancouver :canada:
Sep 2024 :toolbox: :crossed_swords: New blog post about AI safety: “Current Paradigms of LLMs Safety Alignment are Superficial”.
Sep 2024 :bellhop_bell: :strawberry: New blog post about o1 models and LLMs reasoning: “Have o1 Models Cracked Human Reasoning?”.
Aug 2024 :tada: Super excited that our workshop “System 2 Reasoning At Scale” was accepted to NeurIPS 2024, Vancouver! Mark your calendar for Dec 15, 2024!
Jul 2024 Check out our :fire: new safety moderation tool :fire: WildGuard: a state-of-the-art open tool for assessing safety risks, jailbreaks, and refusals in LLMs.
Jul 2024 New red-teaming method :lion: WildTeaming: an automatic red-teaming framework that discovers novel jailbreaks based on in-the-wild user-LLM interactions.
Jul 2024 Check out my interview with Science News Magazine about “LLMs’ reasoning skills”, featuring “Faith and Fate” and “Generative AI Paradox”.
Jul 2024 I will serve as a Demo Chair for NAACL 2025.
Jun 2024 I will serve as a Senior Area Chair for ACL 2025 in the area of Ethics, Bias, and Fairness.