About

I work at Constellation, where I plan events and run programs that aim to strengthen connections between different parts of the AI safety ecosystem.

I've chosen to spend my career reducing the danger from advanced AI. I'm concerned that we're building powerful AI systems in a way that might lead to catastrophe in the next ten years: human extinction, human disempowerment, or permanent authoritarian lock-in. The 80,000 Hours website is a good introduction to the risks and what you can do about them.

Previously, I took part in the MATS research program under Ethan Perez, and afterwards continued independently the research I had started there. Before that, I completed a DPhil at the University of Oxford, supervised by Tom Melham and Daniel Kroening, using generative image models to test and evaluate the robustness of image classification models. Before that, I taught computer science for two years at a comprehensive secondary school through the Teach First Leadership Development Programme. My undergraduate degree was in computer science at the University of Cambridge.

Research

Exposing Previously Undetectable Faults in Deep Neural Networks

Isaac Dunn, Hadrien Pouget, Tom Melham, Daniel Kroening

Existing methods for testing DNNs constrain test inputs to lie close to known examples, which limits the faults they can find. By leveraging generative machine learning, we generate fresh test cases that vary in high-level features (shape, location, texture, colour) and expose faults that other methods cannot.

Detecting a fault in a deep neural network image classifier

Evaluating Robustness to Context-Sensitive Feature Perturbations of Different Granularities

Isaac Dunn, Hadrien Pouget, Laura Hanu, Daniel Kroening, Tom Melham

We introduce a method that finds context-sensitive feature perturbations (shape, location, texture, colour) by adjusting the activations of a generative network. State-of-the-art classifiers are not robust to these perturbations, and adversarial training against pixel-space attacks turns out to be counterproductive for coarse-grained ones.

A volcano image perturbed until a classifier labels it a goldfish

Adaptive Generation of Unrestricted Adversarial Inputs

Isaac Dunn, Hadrien Pouget, Tom Melham, Daniel Kroening

We introduce an adaptive algorithm for generating unrestricted adversarial inputs that tunes its attacks to the target classifier, runs 400–2000× faster than prior work, and defeats adversarial training against it.

Contact