Announcement_49
New red-teaming method WildTeaming: an automatic red-teaming framework that discovers novel jailbreaks based on in-the-wild user-LLMs interactions.
New red-teaming method WildTeaming: an automatic red-teaming framework that discovers novel jailbreaks based on in-the-wild user-LLMs interactions.