About

Currently, I work at Constellation, where I plan events and run programs that aim to strengthen the connections between different parts of the AI safety ecosystem.

I'm concerned that we're on track to develop very powerful AI systems soon without being confident that this won't lead to a global catastrophe such as human extinction, human disempowerment, or permanent authoritarian lock-in. The 80,000 Hours website is a good starting point for understanding the risks and the opportunities for reducing them. I plan to spend my professional life working to reduce the danger from AI systems as much as possible.

Previously, I took part in MATS under Ethan Perez and continued that research until joining Constellation in 2025. Before that, I completed a DPhil at the University of Oxford, supervised by Tom Melham and Daniel Kroening, in which I used generative image models to test and evaluate the robustness of image classification models. Earlier, I taught computer science for two years at a comprehensive secondary school through the Teach First Leadership Development Programme, and studied computer science at the University of Cambridge.

Publications

Evaluating Robustness to Context-Sensitive Feature Perturbations of Different Granularities

Isaac Dunn, Hadrien Pouget, Laura Hanu, Daniel Kroening, Tom Melham

We introduce a method that finds context-sensitive feature perturbations (shape, location, texture, colour) by adjusting the activations of a generative network. State-of-the-art classifiers are not robust to these changes, and adversarial training against pixel-space attacks turns out to be counterproductive for coarse-grained perturbations.

A volcano image perturbed until a classifier labels it a goldfish

Adaptive Generation of Unrestricted Adversarial Inputs

Isaac Dunn, Hadrien Pouget, Tom Melham, Daniel Kroening

We introduce an adaptive algorithm for generating unrestricted adversarial inputs that tunes its attacks to the target classifier, runs 400–2000× faster than prior work, and still succeeds against classifiers that have been adversarially trained against it.