• Universal and Transferable Adversarial Attacks on Aligned Language Models
    • Authors: Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
    • Venue: arXiv, 2023
    • Paper Link
    • Leader: Zainab Altaweel