January 2026

Simon Willison's Weblog

Quoting Boaz Barak, Gabriel Wu, Jeremy Chen and Manas Joglekar

OpenAI researchers are developing a "confessions" training method where AI models produce a second output that is rewarded solely for honesty. This approach creates an anonymous tip line where models...

Simon Willison's Weblog

Claude Cowork Exfiltrates Files

Researchers discovered a clever attack that bypasses Claude Cowork's security protections by exploiting its trusted domain whitelist to exfiltrate user files. The attack uses the victim's own AI...

Universe of AI

AI News: Gemini UPGRADED, GPT-5.3 LEAKED, Claude Cowork, AI Doctors!

Major AI companies are making strategic moves toward personal, agentic, and specialized AI systems. Google introduced personal intelligence in Gemini, Anthropic launched Claude Cowork for autonomous...

AI Engineer

Identity for AI Agents - Patrick Riley & Carlos Galan, Auth0

Patrick Riley and Carlos Galan from Auth0 present their approach to securing AI agents through identity management. They demonstrate four key pillars for agent security: AI needs to know who you are,...

Simon Willison's Weblog

Anthropic invests $1.5 million in the Python Software Foundation and open source security

Anthropic has committed $1.5 million over two years to the Python Software Foundation, with a focus on ecosystem security. This addresses a critical funding gap after the PSF withdrew from an NSF...
