FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction
Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models
Measuring what Matters: Construct Validity in Large Language Model Benchmarks