Adel Bibi

FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction

Runqi Lin, Alasdair Paren, Suqin Yuan, Muyang Li, Philip Torr, Adel Bibi, Tongliang Liu

Beyond Linear Probes: Dynamic Safety Monitoring for Language Models

James Oldfield, Philip Torr, Ioannis Patras, Adel Bibi, Fazl Barez

BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models

Thierry Blankenstein, Jialin Yu, Zixuan Li, Vassilis Plachouras, Sunando Sengupta, Philip Torr, Yarin Gal, Alasdair Paren, Adel Bibi

Attacking multimodal OS agents with malicious image patches

Lukas Aichberger, Alasdair Paren, Guohao Li, Philip H.S. Torr, Yarin Gal, Adel Bibi

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Andrew M Bean, Ryan Othniel Kearns, Angelika Romanou, Franziska Sofia Hafner, Harry Mayne, Jan Batzner, Negar Foroutan, Chris Schmitz, Karolina Korgul, Hunar Batra, Oishi Deb, Emma Beharry, Cornelius Emde, Thomas Foster, Anna Gausen, María Grandury, Simeng Han, Valentin Hofmann, Lujain Ibrahim, Hazel Kim, Hannah Rose Kirk, Fangru Lin, Gabrielle Kaili-May Liu, Lennart Luettgau, Jabez Magomere, Jonathan Rystrøm, Anna Sotnikova, Yushi Yang, Yilun Zhao, Adel Bibi, Antoine Bosselut, Ronald Clark, Arman Cohan, Jakob Foerster, Yarin Gal, Scott A Hale, Inioluwa Deborah Raji, Christopher Summerfield, Philip HS Torr, Cozmin Ududec, Luc Rocher, Adam Mahdi