My publications in machine learning and alignment are listed here. You can find a full overview of my research publications, including past work in policy and technology security, on my Google Scholar page.
Equal contribution is indicated by *.
-
Evaluating Frontier Models for Dangerous Capabilities
Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum,
Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin,
Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, Allan Dafoe, and Toby Shevlane
arXiv, 2024
@article{phuongEvaluating2024,
title = {Evaluating Frontier Models for Dangerous Capabilities},
author = {Phuong, Mary and Aitchison, Matthew and Catt, Elliot and Cogan, Sarah and Kaskasoli, Alexandre and Krakovna, Victoria and Lindner, David and Rahtz, Matthew and Assael, Yannis and Hodkinson, Sarah and Howard, Heidi and Lieberum, Tom and Kumar, Ramana and Raad, Maria Abi and Webson, Albert and Ho, Lewis and Lin, Sharon and Farquhar, Sebastian and Hutter, Marcus and Deletang, Gregoire and Ruoss, Anian and El-Sayed, Seliem and Brown, Sasha and Dragan, Anca and Shah, Rohin and Dafoe, Allan and Shevlane, Toby},
year = {2024},
eprint = {2403.13793},
archiveprefix = {arXiv},
primaryclass = {cs.LG},
journal = {arXiv}
}
To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs. Our goal is to help advance a rigorous science of dangerous capability evaluation, in preparation for future models.
-
Holistic Safety and Responsibility Evaluations of Advanced AI Models
Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Mikel Rodriguez, Jennifer Beroshi, Dawn Bloxwich, Lev Proleev, Jilin Chen, Sebastian Farquhar, Lewis Ho, Iason Gabriel, Allan Dafoe, and William Isaac
arXiv, 2024
@article{weidingerHolistic2024,
title = {Holistic Safety and Responsibility Evaluations of Advanced AI Models},
author = {Weidinger, Laura and Barnhart, Joslyn and Brennan, Jenny and Butterfield, Christina and Young, Susie and Hawkins, Will and Hendricks, Lisa Anne and Comanescu, Ramona and Chang, Oscar and Rodriguez, Mikel and Beroshi, Jennifer and Bloxwich, Dawn and Proleev, Lev and Chen, Jilin and Farquhar, Sebastian and Ho, Lewis and Gabriel, Iason and Dafoe, Allan and Isaac, William},
year = {2024},
eprint = {2404.14068},
archiveprefix = {arXiv},
primaryclass = {cs.AI},
journal = {arXiv}
}
Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice. In the development of Google DeepMind’s advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation. In this report, we summarise and share elements of our evolving approach as well as lessons learned for a broad audience. Key lessons learned include: First, theoretical underpinnings and frameworks are invaluable to organise the breadth of risk domains, modalities, forms, metrics, and goals. Second, theory and practice of safety evaluation development each benefit from collaboration to clarify goals, methods and challenges, and facilitate the transfer of insights between different stakeholders and disciplines. Third, similar key methods, lessons, and institutions apply across the range of concerns in responsibility and safety - including established and emerging harms. For this reason, it is important that the wide range of actors across the safety evaluation and safety research communities work together to develop, refine, and implement novel evaluation approaches and best practices, rather than operating in silos. The report concludes by outlining the clear need to rapidly advance the science of evaluations, to integrate new evaluations into the development and governance of AI, to establish scientifically-grounded norms and standards, and to promote a robust evaluation ecosystem.
-
Detecting Hallucinations in Large Language Models Using Semantic Entropy
Nature, 2024
@article{farquharDetecting2024,
title = {Detecting Hallucinations in Large Language Models Using Semantic Entropy},
author = {Farquhar, Sebastian and Kossen, Jannik and Kuhn, Lorenz and Gal, Yarin},
year = {2024},
journal = {Nature},
volume = {630},
pages = {625-630},
issue = {8017}
}
Large language model (LLM) systems, such as ChatGPT or Gemini, can show impressive reasoning and question-answering capabilities but often 'hallucinate' false outputs and unsubstantiated answers. Answering unreliably or without the necessary information prevents adoption in diverse fields, with problems including fabrication of legal precedents or untrue facts in news articles and even posing a risk to human life in medical domains such as radiology. Encouraging truthfulness through supervision or reinforcement has been only partially successful. Researchers need a general method for detecting hallucinations in LLMs that works even with new and unseen questions to which humans might not know the answer. Here we develop new methods grounded in statistics, proposing entropy-based uncertainty estimators for LLMs to detect a subset of hallucinations—confabulations—which are arbitrary and incorrect generations. Our method addresses the fact that one idea can be expressed in many ways by computing uncertainty at the level of meaning rather than specific sequences of words. Our method works across datasets and tasks without a priori knowledge of the task, requires no task-specific data and robustly generalizes to new tasks not seen before. By detecting when a prompt is likely to produce a confabulation, our method helps users understand when they must take extra care with LLMs and opens up new possibilities for using LLMs that are otherwise prevented by their unreliability.
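To make the method concrete, here is a minimal sketch of the discrete semantic-entropy estimate described above. It is illustrative only: sample_answer and entails are hypothetical stand-ins for an answer-sampling call to the LLM and a bidirectional-entailment check, and the full method also weights clusters using sequence likelihoods rather than raw counts.

import math

def semantic_entropy(prompt, sample_answer, entails, n_samples=10):
    # Sample several answers to the same prompt at non-zero temperature.
    answers = [sample_answer(prompt) for _ in range(n_samples)]

    # Cluster answers by meaning: two answers share a cluster if each entails the other.
    clusters = []
    for a in answers:
        for c in clusters:
            if entails(a, c[0]) and entails(c[0], a):
                c.append(a)
                break
        else:
            clusters.append([a])

    # Entropy over the distribution of meanings (clusters), not surface strings.
    probs = [len(c) / n_samples for c in clusters]
    return -sum(p * math.log(p) for p in probs)

A high semantic entropy flags prompts likely to produce confabulations; entropy over raw strings would instead over-count rephrasings of the same answer.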
-
Discovering Agents
Artificial Intelligence, Sep 2023
@article{kentonDiscovering2023,
title = {Discovering {{Agents}}},
author = {Kenton, Zachary and Kumar, Ramana and Farquhar, Sebastian and Richens, John and MacDermott, Matt and Everitt, Tom},
year = {2023},
month = sep,
volume = {322},
journal = {Artificial Intelligence}
}
Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial – often the causal model is just assumed by the modeler without much justification – and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents – roughly that agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we derive the first causal discovery algorithm for discovering agents from empirical data, and give algorithms for translating between causal models and game-theoretic influence diagrams. We demonstrate our approach by resolving some previous confusions caused by incorrect causal modelling of agents.
-
Tracr: Compiled Transformers as a Laboratory for Interpretability
David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Thomas McGrath, and Vladimir Mikulik
Neural Information Processing Systems, 2023
@article{lindnerTracr2023,
title = {Tracr: Compiled Transformers as a Laboratory for Interpretability},
author = {Lindner, David and Kramár, János and Farquhar, Sebastian and Rahtz, Matthew and McGrath, Thomas and Mikulik, Vladimir},
year = {2023},
journal = {Neural Information Processing Systems}
}
We show how to "compile" human-readable programs into standard decoder-only transformer models. Our compiler, Tracr, generates models with known structure. This structure can be used to design experiments. For example, we use it to study "superposition" in transformers that execute multi-step algorithms. Additionally, the known structure of Tracr-compiled models can serve as ground-truth for evaluating interpretability methods. Commonly, because the "programs" learned by transformers are unknown, it is unclear whether an interpretation succeeded. We demonstrate our approach by implementing and examining programs including computing token frequencies, sorting, and parenthesis checking.
-
Challenges With Unsupervised LLM Knowledge Discovery
Sebastian Farquhar*, Vikrant Varma*,
Zachary Kenton*, Johannes Gasteiger, Vladimir Mikulik, and Rohin Shah
arXiv, 2023
@article{farquharChallenges2023,
title = {Challenges With Unsupervised LLM Knowledge Discovery},
author = {Farquhar, Sebastian and Varma, Vikrant and Kenton, Zachary and Gasteiger, Johannes and Mikulik, Vladimir and Shah, Rohin},
year = {2023},
eprint = {2312.10029},
archiveprefix = {arXiv},
primaryclass = {cs.LG},
journal = {arXiv}
}
We show that existing unsupervised methods on large language model (LLM) activations do not discover knowledge – instead they seem to discover whatever feature of the activations is most prominent. The idea behind unsupervised knowledge elicitation is that knowledge satisfies a consistency structure, which can be used to discover knowledge. We first prove theoretically that arbitrary features (not just knowledge) satisfy the consistency structure of a particular leading unsupervised knowledge-elicitation method, contrast-consistent search (Burns et al., arXiv:2212.03827). We then present a series of experiments showing settings in which unsupervised methods result in classifiers that do not predict knowledge, but instead predict a different prominent feature. We conclude that existing unsupervised methods for discovering latent knowledge are insufficient, and we contribute sanity checks to apply to evaluating future knowledge elicitation methods. Conceptually, we hypothesise that the identification issues explored here, e.g. distinguishing a model’s knowledge from that of a simulated character, will persist for future unsupervised methods.
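For reference, the consistency structure in question (my rendering of the CCS objective from Burns et al., arXiv:2212.03827) trains a probe $p_\theta$ on the activations of a statement $x^+$ and its negation $x^-$ to minimise

\mathcal{L}_{\mathrm{CCS}} = \big( p_\theta(x^+) - (1 - p_\theta(x^-)) \big)^2 + \min\big( p_\theta(x^+),\, p_\theta(x^-) \big)^2 ,

i.e. a negation-consistency term plus a confidence term. The theoretical result above is that many features unrelated to knowledge also minimise this objective.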
-
Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
ICLR (notable-top-25%), 2023
@article{kuhnSemantic2023,
title = {Semantic {{Uncertainty}}: {{Linguistic Invariances}} for {{Uncertainty Estimation}} in {{Natural Language Generation}}},
author = {Kuhn, Lorenz and Gal, Yarin and Farquhar, Sebastian},
year = {2023},
journal = {ICLR (notable-top-25\%)}
}
We introduce a method to measure uncertainty in large language models. For tasks like question answering, it is essential to know when we can trust the natural language outputs of foundation models. We show that measuring uncertainty in natural language is challenging because of "semantic equivalence"—different sentences can mean the same thing. To overcome these challenges we introduce semantic entropy—an entropy which incorporates linguistic invariances created by shared meanings. Our method is unsupervised, uses only a single model, and requires no modifications to off-the-shelf language models. In comprehensive ablation studies we show that the semantic entropy is more predictive of model accuracy on question answering data sets than comparable baselines.
-
Do Bayesian Neural Networks Need To Be Fully Stochastic?
AI Stats (Oral), 2023
@article{sharmaBayesian2022,
title = {Do {{Bayesian Neural Networks Need To Be Fully Stochastic}}?},
author = {Sharma, Mrinank and Farquhar, Sebastian and Nalisnick, Eric and Rainforth, Tom},
year = {2023},
journal = {AI Stats (Oral)}
}
We investigate the efficacy of treating all the parameters in a Bayesian neural network stochastically and find compelling theoretical and empirical evidence that this standard construction may be unnecessary. To this end, we prove that expressive predictive distributions require only small amounts of stochasticity. In particular, partially stochastic networks with only n stochastic biases are universal probabilistic predictors for n-dimensional predictive problems. In empirical investigations, we find no systematic benefit of full stochasticity across four different inference modalities and eight datasets; partially stochastic networks can match and sometimes even outperform fully stochastic networks, despite their reduced memory costs.
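As an illustration of what "partially stochastic" means here, the sketch below (my own construction, not code from the paper) keeps every weight deterministic and places a variational Gaussian only on the final-layer biases, then draws predictions by sampling those biases.

import numpy as np

rng = np.random.default_rng(0)

class PartiallyStochasticMLP:
    def __init__(self, d_in, d_hidden, d_out):
        # Deterministic weights and hidden biases.
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(d_hidden), (d_hidden, d_out))
        # The only stochastic parameters: a Gaussian over the output biases.
        self.b2_mu = np.zeros(d_out)
        self.b2_log_sigma = np.full(d_out, -1.0)

    def sample_predictions(self, x, n_samples=32):
        h = np.maximum(x @ self.W1 + self.b1, 0.0)
        sigma = np.exp(self.b2_log_sigma)
        # Each draw of the output biases gives one sample from the predictive distribution.
        return np.stack([h @ self.W2 + self.b2_mu + sigma * rng.normal(size=sigma.shape)
                         for _ in range(n_samples)])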
-
Model Evaluation for Extreme Risks
Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, and Allan Dafoe
arXiv, 2023
@article{shevlaneModel2023,
title = {Model Evaluation for Extreme Risks},
author = {Shevlane, Toby and Farquhar, Sebastian and Garfinkel, Ben and Phuong, Mary and Whittlestone, Jess and Leung, Jade and Kokotajlo, Daniel and Marchal, Nahema and Anderljung, Markus and Kolt, Noam and Ho, Lewis and Siddarth, Divya and Avin, Shahar and Hawkins, Will and Kim, Been and Gabriel, Iason and Bolina, Vijay and Clark, Jack and Bengio, Yoshua and Christiano, Paul and Dafoe, Allan},
year = {2023},
eprint = {2305.15324},
archiveprefix = {arXiv},
primaryclass = {cs.AI},
journal = {arXiv}
}
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through "dangerous capability evaluations") and the propensity of models to apply their capabilities for harm (through "alignment evaluations"). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.
-
Prediction-Oriented Bayesian Active Learning
AI Stats, 2023
@article{bickfordsmithPrediction2023,
title = {Prediction-{{Oriented Bayesian Active Learning}}},
author = {Bickford Smith, Freddie and Kirsch, Andreas and Farquhar, Sebastian and Gal, Yarin and Foster, Adam and Rainforth, Tom},
year = {2023},
journal = {AI Stats}
}
We show that the BALD score can be a sub-optimal objective for active learning when our ultimate goal is to maximise predictive performance. BALD prioritises gathering information about model parameters, but it fails to tailor this to the predictive task the model is being trained for. In particular, it does not account for the downstream input distribution and can thus prioritise inconsequential information. To address this, we propose the expected predictive information gain (EPIG), an information-theoretic acquisition function that instead targets data that will be most useful for our downstream predictive task. This is achieved by formulating the information gain in the space of predictions themselves and incorporating knowledge of the downstream input distribution. Empirically we find that EPIG achieves higher label efficiency than BALD across a range of datasets and models.
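A rough Monte Carlo sketch of the EPIG computation, using my own notation (class probabilities from K posterior samples, targets drawn from the downstream input distribution); the authors' implementation may differ in details.

import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=axis)

def epig(probs_pool, probs_targ):
    # probs_pool: [K, C] probabilities for one candidate point x, one row per posterior sample.
    # probs_targ: [K, M, C] probabilities for M target inputs x* from the downstream distribution.
    K = probs_pool.shape[0]
    M = probs_targ.shape[1]
    # Joint predictive p(y, y* | x, x*) = E_theta[ p(y|x,theta) p(y*|x*,theta) ].
    joint = np.einsum('kc,kmd->mcd', probs_pool, probs_targ) / K
    mi = (entropy(probs_pool.mean(0))                 # H[y | x]
          + entropy(probs_targ.mean(0), axis=-1)      # H[y* | x*] for each target
          - entropy(joint.reshape(M, -1), axis=-1))   # H[y, y* | x, x*]
    return mi.mean()                                  # expected gain about downstream predictions

Acquiring the pool point with the highest epig value targets information that transfers to the predictive task, whereas BALD scores information about the parameters themselves.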
-
CLAM: Selective Clarification for Ambiguous Questions with Large Language Models
arXiv, Dec 2022
@article{kuhnCLAM2022,
title = {{{CLAM}}: {{Selective Clarification}} for {{Ambiguous Questions}} with {{Large Language Models}}},
author = {Kuhn, Lorenz and Gal, Yarin and Farquhar, Sebastian},
year = {2022},
month = dec,
journal = {arXiv}
}
State-of-the-art language models are often accurate on many question-answering benchmarks with well-defined questions. Yet, in real settings questions are often unanswerable without asking the user for clarifying information. We show that current SotA models often do not ask the user for clarification when presented with imprecise questions and instead provide incorrect answers or "hallucinate". To address this, we introduce CLAM, a framework that first uses the model to detect ambiguous questions, and if an ambiguous question is detected, prompts the model to ask the user for clarification. Furthermore, we show how to construct a scalable and cost-effective automatic evaluation protocol using an oracle language model with privileged information to provide clarifying information. We show that our method achieves a 20.15 percentage point accuracy improvement over SotA on a novel ambiguous question-answering data set derived from TriviaQA.
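The selective-clarification loop itself is simple; the snippet below is a schematic of the idea (the prompts and the llm callable are placeholders I am assuming, not the paper's actual prompts or evaluation protocol).

def clam_answer(llm, user_question):
    # Step 1: ask the model whether the question is ambiguous.
    verdict = llm(f"Is the following question ambiguous? Answer yes or no.\n{user_question}")
    if verdict.strip().lower().startswith("yes"):
        # Step 2: if so, generate a clarifying question and wait for the user's reply.
        clarifying_q = llm(f"Ask one clarifying question about: {user_question}")
        clarification = input(clarifying_q + " ")
        # Step 3: answer the disambiguated question.
        return llm(f"{user_question}\nClarification: {clarification}\nAnswer:")
    return llm(f"{user_question}\nAnswer:")

In the paper's evaluation, the role of the human is played by an oracle language model given privileged information about the intended interpretation.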
-
What ‘Out-of-distribution’ Is and Is Not
ML Safety workshop at NeurIPS, Dec 2022
@article{farquharWhat2022,
title = {What `{{Out-of-distribution}}' Is and Is Not},
author = {Farquhar, Sebastian and Gal, Yarin},
year = {2022},
month = dec,
journal = {ML Safety workshop at NeurIPS}
}
Researchers want to generalize robustly to ‘out-of-distribution’ (OOD) data. Unfortunately, this term is used ambiguously causing confusion and creating risk—people might believe they have made progress on OOD data and not realize this progress only holds in limited cases. We critique a standard definition of OOD—difference-in-distribution—and then disambiguate four meaningful types of OOD data: transformed-distributions, related-distributions, complement-distributions, and synthetic-distributions. We describe how existing OOD datasets, evaluations, and techniques fit into this framework. We provide a template for researchers to carefully present the scope of distribution shift considered in their work.
-
Prioritized Training on Points That Are Learnable, Worth Learning, and Not Yet Learned
International Conference on Machine Learning, Jun 2022
@article{mindermannPrioritized2022,
title = {Prioritized {{Training}} on {{Points}} That Are {{Learnable}}, {{Worth Learning}}, and {{Not Yet Learned}}},
author = {Mindermann, Soren and Razzak, Muhammad and Xu, Winnie and Kirsch, Andreas and Sharma, Mrinank and Morisot, Adrien and Gomez, Aidan and Farquhar, Sebastian and Brauner, Jan and Gal, Yarin},
year = {2022},
month = jun,
journal = {International Conference on Machine Learning},
pages = {15630--15649}
}
Training on web-scale data can take months. But most computation and time is wasted on redundant and noisy points that are already learnt or not learnable. To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a simple but principled technique which selects approximately those points for training that most reduce the model’s generalization loss. As a result, RHO-LOSS mitigates the weaknesses of existing data selection methods: techniques from the optimization literature typically select ’hard’ (e.g. high loss) points, but such points are often noisy (not learnable) or less task-relevant. Conversely, curriculum learning prioritizes ’easy’ points, but such points need not be trained on once learned. In contrast, RHO-LOSS selects points that are learnable, worth learning, and not yet learnt. RHO-LOSS trains in far fewer steps than prior art, improves accuracy, and speeds up training on a wide range of datasets, hyperparameters, and architectures (MLPs, CNNs, and BERT). On the large web-scraped image dataset Clothing-1M, RHO-LOSS trains in 18x fewer steps and reaches 2% higher final accuracy than uniform data shuffling.
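A minimal sketch of the selection step, as I read it (a simplification, not the authors' code): score each point in a large candidate batch by its reducible holdout loss, i.e. the current model's loss minus the loss of a small "irreducible loss" model trained on holdout data, and train only on the top-scoring points.

import torch
import torch.nn.functional as F

def rho_loss_select(model, irreducible_model, xb, yb, keep_frac=0.1):
    with torch.no_grad():
        train_loss = F.cross_entropy(model(xb), yb, reduction='none')
        irreducible_loss = F.cross_entropy(irreducible_model(xb), yb, reduction='none')
    # Reducible loss is low for points that are noisy (high irreducible loss)
    # or already learnt (low training loss); both kinds are skipped.
    reducible = train_loss - irreducible_loss
    k = max(1, int(keep_frac * len(yb)))
    idx = reducible.topk(k).indices
    return xb[idx], yb[idx]

Only the selected subset is used for the gradient step, so most of the candidate batch never needs a backward pass.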
-
Understanding Approximation for Bayesian Inference in Neural Networks
Sebastian Farquhar
DPhil Thesis at University of Oxford, Apr 2022
@phdthesis{farquharUnderstanding2022,
type = {{{DPhil Thesis}}},
title = {Understanding {{Approximation}} for {{Bayesian Inference}} in {{Neural Networks}}},
author = {Farquhar, Sebastian},
year = {2022},
month = apr,
address = {{Oxford}},
school = {University of Oxford}
}
Bayesian inference has theoretical attractions as a principled framework for reasoning about beliefs. However, the motivations of Bayesian inference which claim it to be the only ‘rational’ kind of reasoning do not apply in practice. They create a binary split in which all approximate inference is equally ‘irrational’. Instead, we should ask ourselves how to define a spectrum of more and less-rational reasoning that explains why we might prefer one Bayesian approximation to another. I explore approximate inference in Bayesian neural networks and consider the unintended interactions between the probabilistic model, approximating distribution, optimization algorithm, and dataset. The complexity of these interactions highlights the difficulty of any strategy for evaluating Bayesian approximations which focuses entirely on the method, outside the context of specific datasets and decision-problems. For given applications, the expected utility of the approximate posterior can measure inference quality. To assess a model’s ability to incorporate different parts of the Bayesian framework we can identify desirable characteristic behaviours of Bayesian reasoning and pick decision-problems that make heavy use of those behaviours. Here, we use continual learning (testing the ability to update sequentially) and active learning (testing the ability to represent credence). But existing continual and active learning set-ups pose challenges that have nothing to do with posterior quality which can distort their ability to evaluate Bayesian approximations. These unrelated challenges can be removed or reduced, allowing better evaluation of approximate inference methods.
-
Prospect Pruning: Finding Trainable Weights at Initialization Using Meta-gradients
International Conference on Learning Representations, Mar 2022
@article{alizadehProspect2022,
title = {Prospect {{Pruning}}: {{Finding Trainable Weights}} at {{Initialization Using Meta-gradients}}},
author = {Alizadeh, Milad and Tailor, Shyam and Zintgraf, Luisa and {van Amersfoort}, Joost and Farquhar, Sebastian and Lane, Nicholas and Gal, Yarin},
year = {2022},
month = mar,
journal = {International Conference on Learning Representations}
}
Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network while consuming fewer computational resources for training and inference. However, current methods are insufficient to enable this optimization and lead to a large degradation in model performance. In this paper, we identify a fundamental limitation in the formulation of current methods, namely that their saliency criteria look at a single step at the start of training without taking into account the trainability of the network. While pruning iteratively and gradually has been shown to improve pruning performance, explicit consideration of the training stage that will immediately follow pruning has so far been absent from the computation of the saliency criterion. To overcome the short-sightedness of existing methods, we propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune. ProsPr combines an estimate of the higher-order effects of pruning on the loss and the optimization trajectory to identify the trainable sub-network. Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
-
Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation
Neural Information Processing Systems, Feb 2022
@article{kossenActive2022,
title = {Active {{Surrogate Estimators}}: {{An Active Learning Approach}} to {{Label-Efficient Model Evaluation}}},
shorttitle = {Active {{Surrogate Estimators}}},
author = {Kossen, Jannik and Farquhar, Sebastian and Gal, Yarin and Rainforth, Tom},
year = {2022},
month = feb,
journal = {Neural Information Processing Systems},
eprint = {2202.06881},
eprinttype = {arxiv},
primaryclass = {cs, stat},
doi = {10.48550/arXiv.2202.06881},
archiveprefix = {arXiv}
}
We propose Active Surrogate Estimators (ASEs), a new method for label-efficient model evaluation. Evaluating model performance is a challenging and important problem when labels are expensive. ASEs address this active testing problem using a surrogate-based estimation approach, whereas previous methods have focused on Monte Carlo estimates. ASEs actively learn the underlying surrogate, and we propose a novel acquisition strategy, XWING, that tailors this learning to the final estimation task. We find that ASEs offer greater label-efficiency than the current state-of-the-art when applied to challenging model evaluation problems for deep neural networks. We further theoretically analyze ASEs’ errors.
-
Path-Specific Objectives for Safer Agent Incentives
AAAI Conference on Artificial Intelligence, Feb 2022
@article{farquharPathspecific2022,
title = {Path-Specific {{Objectives}} for {{Safer Agent Incentives}}},
author = {Farquhar, Sebastian and Carey, Ryan and Everitt, Tom},
year = {2022},
month = feb,
journal = {AAAI Conference on Artificial Intelligence},
volume = {36}
}
We present a general framework for training safe agents whose naive incentives are unsafe. As an example, manipulative or deceptive behaviour can improve rewards but should be avoided. Most approaches fail here: agents maximize expected return by any means necessary. We formally describe settings with ‘delicate’ parts of the state which should not be used as a means to an end. We then train agents to maximize the causal effect of actions on the expected return which is not mediated by the delicate parts of state, using Causal Influence Diagram analysis. The resulting agents have no incentive to control the delicate state. We further show how our framework unifies and generalizes existing proposals.
-
Stochastic Batch Acquisition for Deep Active Learning
SubsetML ICML Workshop, Jan 2022
@article{kirschStochastic2022,
title = {Stochastic {{Batch Acquisition}} for {{Deep Active Learning}}},
author = {Kirsch, Andreas and Farquhar, Sebastian and Atighehchian, Parmida and Jesson, Andrew and {Branchaud-Charron}, Frederic and Gal, Yarin},
year = {2022},
month = jan,
journal = {SubsetML ICML Workshop},
eprint = {2106.12059},
eprinttype = {arxiv},
archiveprefix = {arXiv}
}
In active learning, new labels are commonly acquired in batches. However, common acquisition functions are only meant for one-sample acquisition rounds at a time, and when their scores are used naively for batch acquisition, they result in batches lacking diversity, which deteriorates performance. On the other hand, state-of-the-art batch acquisition functions are costly to compute. In this paper, we present a novel class of stochastic acquisition functions that extend one-sample acquisition functions to the batch setting by observing how one-sample acquisition scores change as additional samples are acquired and modelling this difference for additional batch samples. We simply acquire new samples by sampling from the pool set using a Gibbs distribution based on the acquisition scores. Our acquisition functions are both vastly cheaper to compute and out-perform other batch acquisition functions.
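A sketch of the simplest instance of this idea (my paraphrase): instead of taking the top-k pool points by acquisition score, sample k distinct points from a softmax (Gibbs) distribution over the scores.

import numpy as np

def stochastic_batch_acquire(scores, batch_size, temperature=1.0, rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(scores, dtype=float) / temperature
    probs = np.exp(logits - logits.max())     # softmax over one-sample acquisition scores
    probs /= probs.sum()
    # Sampling without replacement spreads the batch across the pool,
    # avoiding the near-duplicate batches that naive top-k selection produces.
    return rng.choice(len(scores), size=batch_size, replace=False, p=probs)

Because this reuses the one-sample scores already computed, it costs essentially nothing beyond standard acquisition.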
-
Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning
Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry,
Sebastian Farquhar,
Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren,
Tim G. J. Rudner, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jasper Snoek,
Yarin Gal, and Dustin Tran
arXiv, Jun 2021
@article{nadoUncertainty2021,
title = {Uncertainty {{Baselines}}: {{Benchmarks}} for {{Uncertainty}} \& {{Robustness}} in {{Deep Learning}}},
shorttitle = {Uncertainty {{Baselines}}},
author = {Nado, Zachary and Band, Neil and Collier, Mark and Djolonga, Josip and Dusenberry, Michael W. and Farquhar, Sebastian and Filos, Angelos and Havasi, Marton and Jenatton, Rodolphe and Jerfel, Ghassen and Liu, Jeremiah and Mariet, Zelda and Nixon, Jeremy and Padhy, Shreyas and Ren, Jie and Rudner, Tim G. J. and Wen, Yeming and Wenzel, Florian and Murphy, Kevin and Sculley, D. and Lakshminarayanan, Balaji and Snoek, Jasper and Gal, Yarin and Tran, Dustin},
year = {2021},
month = jun,
journal = {arXiv},
eprint = {2106.04015},
eprinttype = {arxiv},
archiveprefix = {arXiv}
}
High-quality estimates of uncertainty and robustness are crucial for numerous real-world applications, especially for deep learning which underlies many deployed ML systems. The ability to compare techniques for improving these estimates is therefore very important for research and practice alike. Yet, competitive comparisons of methods are often lacking due to a range of reasons, including: compute availability for extensive tuning, incorporation of sufficiently many baselines, and concrete documentation for reproducibility. In this paper we introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks. As of this writing, the collection spans 19 methods across 9 tasks, each with at least 5 metrics. Each baseline is a self-contained experiment pipeline with easily reusable and extendable components. Our goal is to provide immediate starting points for experimentation with new methods or applications. Additionally we provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results. Code available at https://github.com/google/uncertainty-baselines.
-
On Statistical Bias In Active Learning: How and When to Fix It
International Conference on Learning Representations (Spotlight), 2021
@article{farquharStatistical2021,
title = {On {{Statistical Bias In Active Learning}}: {{How}} and {{When}} to {{Fix It}}},
shorttitle = {On {{Statistical Bias In Active Learning}}},
author = {Farquhar, Sebastian and Gal, Yarin and Rainforth, Tom},
year = {2021},
journal = {International Conference on Learning Representations (Spotlight)},
langid = {english}
}
Active learning is a powerful tool when labelling data is expensive, but it introduces a bias because the training data no longer follows the population distribution. We formalize this bias and...
-
Active Testing: Sample-Efficient Model Evaluation
International Conference on Machine Learning, 2021
@article{kossenActive2021,
title = {Active {{Testing}}: {{Sample-Efficient Model Evaluation}}},
author = {Kossen, Jannik and Farquhar, Sebastian and Gal, Yarin and Rainforth, Tom},
year = {2021},
journal = {International Conference on Machine Learning}
}
We introduce a new framework for sample-efficient model evaluation that we call active testing. While approaches like active learning reduce the number of labels needed for model training, existing literature largely ignores the cost of labeling test data, typically unrealistically assuming large test sets for model evaluation. This creates a disconnect from real applications, where test labels are important and just as expensive, e.g. for optimizing hyperparameters. Active testing addresses this by carefully selecting the test points to label, ensuring model evaluation is sample-efficient. To this end, we derive theoretically-grounded and intuitive acquisition strategies that are specifically tailored to the goals of active testing, noting these are distinct from those of active learning. As actively selecting labels introduces a bias, we further show how to remove this bias while reducing the variance of the estimator at the same time. Active testing is easy to implement and can be applied to any supervised machine learning method. We demonstrate its effectiveness on models including WideResNets and Gaussian processes on datasets including Fashion-MNIST and CIFAR-100.
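To make the debiasing idea concrete, here is a deliberately simplified estimator (sampling test points with replacement from an acquisition proposal q and importance-weighting the losses). The estimator actually used in the paper samples without replacement and is constructed to reduce variance further, so treat this only as an illustration of why the weighted estimate stays unbiased.

import numpy as np

def active_test_risk(loss_fn, model, pool_x, pool_y, proposal, n_labels, rng=None):
    rng = rng or np.random.default_rng()
    q = np.asarray(proposal, dtype=float)
    q /= q.sum()                                  # acquisition distribution over the unlabelled test pool
    n = len(pool_x)
    idx = rng.choice(n, size=n_labels, p=q)       # actively chosen points to label
    losses = np.array([loss_fn(model, pool_x[i], pool_y[i]) for i in idx])
    weights = 1.0 / (n * q[idx])                  # reweight against the uniform test distribution
    return float((weights * losses).mean())       # unbiased estimate of the average pool loss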
-
Evaluating Approximate Inference in Bayesian Deep Learning
Andrew Gordon Wilson, Pavel Izmailov, Matthew D Hoffman,
Yarin Gal, Yingzhen Li, Melanie F Pradier, Sharad Vikram, Andrew Foong, Sanae Lotfi, and
Sebastian Farquhar
NeurIPS Competition, 2021
@article{wilsonEvaluating2021,
title = {Evaluating {{Approximate Inference}} in {{Bayesian Deep Learning}}},
author = {Wilson, Andrew Gordon and Izmailov, Pavel and Hoffman, Matthew D and Gal, Yarin and Li, Yingzhen and Pradier, Melanie F and Vikram, Sharad and Foong, Andrew and Lotfi, Sanae and Farquhar, Sebastian},
year = {2021},
journal = {NeurIPS Competition},
langid = {english}
}
Uncertainty representation is crucial to the safe and reliable deployment of deep learning. Bayesian methods provide a natural mechanism to represent epistemic uncertainty, leading to improved generalization and calibrated predictive distributions. Bayesian methods are particularly promising for deep neural networks, which can represent many different explanations to a given problem corresponding to different settings of parameters. While approximate inference procedures in Bayesian deep learning are improving in scalability and generalization performance, there has been no way of knowing, until now, whether these methods are working as intended, to provide ever more faithful representations of the Bayesian predictive distribution. In this competition we provide the first opportunity to measure the fidelity of approximate inference procedures in deep learning through comparison to Hamiltonian Monte Carlo (HMC). HMC is a highly efficient and well-studied Markov Chain Monte Carlo (MCMC) method that is guaranteed to asymptotically produce samples from the true posterior, but is prohibitively expensive in modern deep learning. To address this computational challenge, we have parallelized the computation over hundreds of tensor processing unit (TPU) devices.
-
Single Shot Structured Pruning Before Training
arXiv, Jul 2020
@misc{vanamersfoortSingle2020,
title = {Single {{Shot Structured Pruning Before Training}}},
author = {{van Amersfoort}, Joost and Alizadeh, Milad and Farquhar, Sebastian and Lane, Nicholas and Gal, Yarin},
year = {2020},
month = jul,
number = {arXiv:2007.00389},
eprint = {2007.00389},
eprinttype = {arxiv},
primaryclass = {cs, stat},
publisher = {{arXiv}},
archiveprefix = {arXiv}
}
We introduce a method to speed up training by 2x and inference by 3x in deep neural networks using structured pruning applied before training. Unlike previous works on pruning before training which prune individual weights, our work develops a methodology to remove entire channels and hidden units with the explicit aim of speeding up training and inference. We introduce a compute-aware scoring mechanism which enables pruning in units of sensitivity per FLOP removed, allowing even greater speed ups. Our method is fast, easy to implement, and needs just one forward/backward pass on a single batch of data to complete pruning before training begins.
-
Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations
Advances In Neural Information Processing Systems, 2020
@article{farquharLiberty2020,
title = {Liberty or {{Depth}}: {{Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations}}},
author = {Farquhar, Sebastian and Smith, Lewis and Gal, Yarin},
year = {2020},
journal = {Advances In Neural Information Processing Systems},
langid = {english}
}
We challenge the longstanding assumption that the mean-field approximation for variational inference in Bayesian neural networks is severely restrictive, and show this is not the case in deep networks. We prove several results indicating that deep mean-field variational weight posteriors can induce similar distributions in function space to those induced by shallower networks with complex weight posteriors. We validate our theoretical contributions empirically, both through examination of the weight posterior using Hamiltonian Monte Carlo in small models and by comparing diagonal- to structured-covariance in large settings. Since complex variational posteriors are often expensive and cumbersome to implement, our results suggest that using mean-field variational inference in a deeper model is both a practical and theoretically justified alternative to structured approximations.
-
Radial Bayesian Neural Networks: Robust Variational Inference In Big Models
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020
@article{farquharRadial2020,
title = {Radial {{Bayesian Neural Networks}}: {{Robust Variational Inference In Big Models}}},
shorttitle = {Radial {{Bayesian Neural Networks}}},
author = {Farquhar, Sebastian and Osborne, Michael and Gal, Yarin},
year = {2020},
journal = {Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics}
}
We propose Radial Bayesian Neural Networks: a variational distribution for mean field variational inference (MFVI) in Bayesian neural networks that is simple to implement, scalable to large models, and robust to hyperparameter selection. We hypothesize that standard MFVI fails in large models because of a property of the high-dimensional Gaussians used as posteriors. As variances grow, samples come almost entirely from a ‘soap-bubble’ far from the mean. We show that the ad-hoc tweaks used previously in the literature to get MFVI to work served to stop such variances growing. Designing a new posterior distribution, we avoid this pathology in a theoretically principled way. Our distribution improves accuracy and uncertainty over standard MFVI, while scaling to large data where most other VI and MCMC methods struggle. We benchmark Radial BNNs in a real-world task of diabetic retinopathy diagnosis from fundus images, a task with ~100x larger input dimensionality and model size compared to previous demonstrations of MFVI.
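The change of variables at the heart of the method is short enough to sketch (my reading of the sampling procedure): normalise a Gaussian noise vector to get a direction on the unit sphere and scale it by a single Gaussian radius, so samples concentrate near the mean rather than in the thin high-dimensional shell.

import numpy as np

def radial_sample(mu, log_sigma, rng=None):
    rng = rng or np.random.default_rng()
    eps = rng.normal(size=mu.shape)
    direction = eps / np.linalg.norm(eps)   # uniform direction on the unit sphere
    r = rng.normal()                        # a single scalar radius
    return mu + np.exp(log_sigma) * direction * r

Compared with the usual mean-field draw mu + sigma * eps, only the radial part of the noise changes, which is what removes the 'soap-bubble' pathology described above.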
-
A Unifying Bayesian View of Continual Learning
Bayesian Deep Learning Workshop at NeurIPS, Dec 2018
@article{farquharUnifying2018,
title = {A {{Unifying Bayesian View}} of {{Continual Learning}}},
author = {Farquhar, Sebastian and Gal, Yarin},
year = {2018},
month = dec,
journal = {Bayesian Deep Learning Workshop at NeurIPS},
langid = {english}
}
Some machine learning applications require continual learning, where data comes in a sequence of datasets, each of which is used for training and then permanently discarded. From a Bayesian perspective, continual learning seems straightforward: given the model posterior, one would simply use it as the prior for the next task. However, exact posterior evaluation is intractable with many models, especially with Bayesian neural networks (BNNs). Instead, posterior approximations are often sought. Unfortunately, when posterior approximations are used, prior-focused approaches do not succeed in evaluations designed to capture properties of realistic continual learning use cases. As an alternative to prior-focused methods, we introduce a new approximate Bayesian derivation of the continual learning loss. Our loss does not rely on the posterior from earlier tasks, and instead adapts the model itself by changing the likelihood term. We call these approaches likelihood-focused. We then combine prior- and likelihood-focused methods into one objective, tying the two views together under a single unifying framework of approximate Bayesian continual learning.
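For orientation, the prior-focused objective referred to above has the familiar variational form (my notation), in which the approximate posterior from task $t-1$ stands in as the prior for task $t$:

\mathcal{L}_t(q_t) = \mathbb{E}_{q_t(\theta)}\!\left[\log p(\mathcal{D}_t \mid \theta)\right] - \mathrm{KL}\!\left(q_t(\theta) \,\|\, q_{t-1}(\theta)\right).

Likelihood-focused approaches instead keep a fixed prior and adapt the likelihood term, for example by adding terms for data generated to stand in for earlier tasks; the combined objective described above contains both ingredients.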
-
Towards Robust Evaluations of Continual Learning
Lifelong Learning: A Reinforcement Learning Approach Workshop ICML, May 2018
@article{farquharRobust2018,
title = {Towards {{Robust Evaluations}} of {{Continual Learning}}},
author = {Farquhar, Sebastian and Gal, Yarin},
year = {2018},
month = may,
journal = {Lifelong Learning: A Reinforcement Learning Approach Workshop ICML},
eprint = {1805.09733},
eprinttype = {arxiv},
archiveprefix = {arXiv}
}
The experiments used in current continual learning research do not faithfully assess fundamental challenges of learning continually. We examine standard evaluations and show why these evaluations make some types of continual learning approaches look better than they are. In particular, current evaluations are biased towards continual learning approaches that treat previous models as a prior (e.g., EWC, VCL). We introduce desiderata for continual learning evaluations and explain why their absence creates misleading comparisons. Our analysis calls for a reprioritization of research effort by the community.
-
Differentially Private Continual Learning
Privacy in Machine Learning and AI workshop at ICML, Feb 2018
@article{farquharDifferentially2018,
title = {Differentially {{Private Continual Learning}}},
author = {Farquhar, Sebastian and Gal, Yarin},
year = {2018},
month = feb,
journal = {Privacy in Machine Learning and AI workshop at ICML},
langid = {english}
}
Catastrophic forgetting can be a significant problem for institutions that must delete historic data for privacy reasons. For example, hospitals might not be able to retain patient data permanently. But neural networks trained on recent data alone will tend to forget lessons learned on old data. We present a differentially private continual learning framework based on variational inference. We estimate the likelihood of past data given the current model using differentially private generative models of old datasets.