My publications in machine learning and alignment are listed here. You can find a full overview of my research publications, including past work in policy and technology security, on my Google Scholar page.
Equal contribution is indicated by *.
-
Evaluating Frontier Models for Dangerous Capabilities
Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum,
Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin,
Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, Allan Dafoe, and Toby Shevlane
arXiv, 2024
@article{phuongEvaluating2024,
title = {Evaluating Frontier Models for Dangerous Capabilities},
author = {Phuong, Mary and Aitchison, Matthew and Catt, Elliot and Cogan, Sarah and Kaskasoli, Alexandre and Krakovna, Victoria and Lindner, David and Rahtz, Matthew and Assael, Yannis and Hodkinson, Sarah and Howard, Heidi and Lieberum, Tom and Kumar, Ramana and Raad, Maria Abi and Webson, Albert and Ho, Lewis and Lin, Sharon and Farquhar, Sebastian and Hutter, Marcus and Deletang, Gregoire and Ruoss, Anian and El-Sayed, Seliem and Brown, Sasha and Dragan, Anca and Shah, Rohin and Dafoe, Allan and Shevlane, Toby},
year = {2024},
eprint = {2403.13793},
archiveprefix = {arXiv},
primaryclass = {cs.LG},
journal = {arXiv}
}
To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs. Our goal is to help advance a rigorous science of dangerous capability evaluation, in preparation for future models.
-
Holistic Safety and Responsibility Evaluations of Advanced AI Models
Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Mikel Rodriguez, Jennifer Beroshi, Dawn Bloxwich, Lev Proleev, Jilin Chen, Sebastian Farquhar, Lewis Ho, Iason Gabriel, Allan Dafoe, and William Isaac
arXiv, 2024
@article{weidingerHolistic2024,
title = {Holistic Safety and Responsibility Evaluations of Advanced AI Models},
author = {Weidinger, Laura and Barnhart, Joslyn and Brennan, Jenny and Butterfield, Christina and Young, Susie and Hawkins, Will and Hendricks, Lisa Anne and Comanescu, Ramona and Chang, Oscar and Rodriguez, Mikel and Beroshi, Jennifer and Bloxwich, Dawn and Proleev, Lev and Chen, Jilin and Farquhar, Sebastian and Ho, Lewis and Gabriel, Iason and Dafoe, Allan and Isaac, William},
year = {2024},
eprint = {2404.14068},
archiveprefix = {arXiv},
primaryclass = {cs.AI},
journal = {arXiv}
}
Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice. In the development of Google DeepMind’s advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation. In this report, we summarise and share elements of our evolving approach as well as lessons learned for a broad audience. Key lessons learned include: First, theoretical underpinnings and frameworks are invaluable to organise the breadth of risk domains, modalities, forms, metrics, and goals. Second, theory and practice of safety evaluation development each benefit from collaboration to clarify goals, methods and challenges, and facilitate the transfer of insights between different stakeholders and disciplines. Third, similar key methods, lessons, and institutions apply across the range of concerns in responsibility and safety - including established and emerging harms. For this reason, it is important that the wide range of actors across the safety evaluation and safety research communities work together to develop, refine, and implement novel evaluation approaches and best practices, rather than operating in silos. The report concludes by outlining the clear need to rapidly advance the science of evaluations, to integrate new evaluations into the development and governance of AI, to establish scientifically-grounded norms and standards, and to promote a robust evaluation ecosystem.
-
Detecting Hallucinations in Large Language Models Using Semantic Entropy
Nature, 2024
@article{farquharDetecting2024,
title = {Detecting Hallucinations in Large Language Models Using Semantic Entropy},
author = {Farquhar, Sebastian and Kossen, Jannik and Kuhn, Lorenz and Gal, Yarin},
year = {2024},
journal = {Nature},
volume = {630},
pages = {625-630},
issue = {8017}
}
Large language model (LLM) systems, such as ChatGPT or Gemini, can show impressive reasoning and question-answering capabilities but often 'hallucinate' false outputs and unsubstantiated answers. Answering unreliably or without the necessary information prevents adoption in diverse fields, with problems including fabrication of legal precedents or untrue facts in news articles and even posing a risk to human life in medical domains such as radiology. Encouraging truthfulness through supervision or reinforcement has been only partially successful. Researchers need a general method for detecting hallucinations in LLMs that works even with new and unseen questions to which humans might not know the answer. Here we develop new methods grounded in statistics, proposing entropy-based uncertainty estimators for LLMs to detect a subset of hallucinations—confabulations—which are arbitrary and incorrect generations. Our method addresses the fact that one idea can be expressed in many ways by computing uncertainty at the level of meaning rather than specific sequences of words. Our method works across datasets and tasks without a priori knowledge of the task, requires no task-specific data and robustly generalizes to new tasks not seen before. By detecting when a prompt is likely to produce a confabulation, our method helps users understand when they must take extra care with LLMs and opens up new possibilities for using LLMs that are otherwise prevented by their unreliability.
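To make the method concrete, here is a minimal sketch of the discrete semantic-entropy estimate described above. It is illustrative only: sample_answer and entails are hypothetical stand-ins for an answer-sampling call to the LLM and a bidirectional-entailment check, and the full method also weights clusters using sequence likelihoods rather than raw counts.

import math

def semantic_entropy(prompt, sample_answer, entails, n_samples=10):
    # Sample several answers to the same prompt at non-zero temperature.
    answers = [sample_answer(prompt) for _ in range(n_samples)]

    # Cluster answers by meaning: two answers share a cluster if each entails the other.
    clusters = []
    for a in answers:
        for c in clusters:
            if entails(a, c[0]) and entails(c[0], a):
                c.append(a)
                break
        else:
            clusters.append([a])

    # Entropy over the distribution of meanings (clusters), not surface strings.
    probs = [len(c) / n_samples for c in clusters]
    return -sum(p * math.log(p) for p in probs)

A high semantic entropy flags prompts likely to produce confabulations; entropy over raw strings would instead over-count rephrasings of the same answer.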
-
Discovering Agents
Artificial Intelligence, Sep 2023
@article{kentonDiscovering2023,
title = {Discovering {{Agents}}},
author = {Kenton, Zachary and Kumar, Ramana and Farquhar, Sebastian and Richens, John and MacDermott, Matt and Everitt, Tom},
year = {2023},
month = sep,
volume = {322},
journal = {Artificial Intelligence}
}
Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial – often the causal model is just assumed by the modeler without much justification – and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents – roughly that agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we derive the first causal discovery algorithm for discovering agents from empirical data, and give algorithms for translating between causal models and game-theoretic influence diagrams. We demonstrate our approach by resolving some previous confusions caused by incorrect causal modelling of agents.
-
Tracr: Compiled Transformers as a Laboratory for Interpretability
David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Thomas McGrath, and Vladimir Mikulik
Neural Information Processing Systems, 2023
@article{lindnerTracr2023,
title = {Tracr: Compiled Transformers as a Laboratory for Interpretability},
author = {Lindner, David and Kramár, János and Farquhar, Sebastian and Rahtz, Matthew and McGrath, Thomas and Mikulik, Vladimir},
year = {2023},
journal = {Neural Information Processing Systems}
}
We show how to "compile" human-readable programs into standard decoder-only transformer models. Our compiler, Tracr, generates models with known structure. This structure can be used to design experiments. For example, we use it to study "superposition" in transformers that execute multi-step algorithms. Additionally, the known structure of Tracr-compiled models can serve as ground-truth for evaluating interpretability methods. Commonly, because the "programs" learned by transformers are unknown, it is unclear whether an interpretation succeeded. We demonstrate our approach by implementing and examining programs including computing token frequencies, sorting, and parenthesis checking.
-
Challenges With Unsupervised LLM Knowledge Discovery
Sebastian Farquhar*, Vikrant Varma*,
Zachary Kenton*, Johannes Gasteiger, Vladimir Mikulik, and Rohin Shah
arXiv, 2023
@article{farquharChallenges2023,
title = {Challenges With Unsupervised LLM Knowledge Discovery},
author = {Farquhar, Sebastian and Varma, Vikrant and Kenton, Zachary and Gasteiger, Johannes and Mikulik, Vladimir and Shah, Rohin},
year = {2023},
eprint = {2312.10029},
archiveprefix = {arXiv},
primaryclass = {cs.LG},
journal = {arXiv}
}
We show that existing unsupervised methods on large language model (LLM) activations do not discover knowledge – instead they seem to discover whatever feature of the activations is most prominent. The idea behind unsupervised knowledge elicitation is that knowledge satisfies a consistency structure, which can be used to discover knowledge. We first prove theoretically that arbitrary features (not just knowledge) satisfy the consistency structure of a particular leading unsupervised knowledge-elicitation method, contrast-consistent search (Burns et al., arXiv:2212.03827). We then present a series of experiments showing settings in which unsupervised methods result in classifiers that do not predict knowledge, but instead predict a different prominent feature. We conclude that existing unsupervised methods for discovering latent knowledge are insufficient, and we contribute sanity checks to apply to evaluating future knowledge elicitation methods. Conceptually, we hypothesise that the identification issues explored here, e.g. distinguishing a model’s knowledge from that of a simulated character, will persist for future unsupervised methods.
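For reference, the consistency structure in question (my rendering of the CCS objective from Burns et al., arXiv:2212.03827) trains a probe $p_\theta$ on the activations of a statement $x^+$ and its negation $x^-$ to minimise

\mathcal{L}_{\mathrm{CCS}} = \big( p_\theta(x^+) - (1 - p_\theta(x^-)) \big)^2 + \min\big( p_\theta(x^+),\, p_\theta(x^-) \big)^2 ,

i.e. a negation-consistency term plus a confidence term. The theoretical result above is that many features unrelated to knowledge also minimise this objective.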
-
Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
ICLR (notable-top-25%), 2023
@article{kuhnSemantic2023,
title = {Semantic {{Uncertainty}}: {{Linguistic Invariances}} for {{Uncertainty Estimation}} in {{Natural Language Generation}}},
author = {Kuhn, Lorenz and Gal, Yarin and Farquhar, Sebastian},
year = {2023},
journal = {ICLR (notable-top-25\%)}
}
We introduce a method to measure uncertainty in large language models. For tasks like question answering, it is essential to know when we can trust the natural language outputs of foundation models. We show that measuring uncertainty in natural language is challenging because of "semantic equivalence"—different sentences can mean the same thing. To overcome these challenges we introduce semantic entropy—an entropy which incorporates linguistic invariances created by shared meanings. Our method is unsupervised, uses only a single model, and requires no modifications to off-the-shelf language models. In comprehensive ablation studies we show that the semantic entropy is more predictive of model accuracy on question answering data sets than comparable baselines.
-
Do Bayesian Neural Networks Need To Be Fully Stochastic?
AI Stats (Oral), 2023
@article{sharmaBayesian2022,
title = {Do {{Bayesian Neural Networks Need To Be Fully Stochastic}}?},
author = {Sharma, Mrinank and Farquhar, Sebastian and Nalisnick, Eric and Rainforth, Tom},
year = {2023},
journal = {AI Stats (Oral)}
}
We investigate the efficacy of treating all the parameters in a Bayesian neural network stochastically and find compelling theoretical and empirical evidence that this standard construction may be unnecessary. To this end, we prove that expressive predictive distributions require only small amounts of stochasticity. In particular, partially stochastic networks with only n stochastic biases are universal probabilistic predictors for n-dimensional predictive problems. In empirical investigations, we find no systematic benefit of full stochasticity across four different inference modalities and eight datasets; partially stochastic networks can match and sometimes even outperform fully stochastic networks, despite their reduced memory costs.
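As an illustration of what "partially stochastic" means here, the sketch below (my own construction, not code from the paper) keeps every weight deterministic and places a variational Gaussian only on the final-layer biases, then draws predictions by sampling those biases.

import numpy as np

rng = np.random.default_rng(0)

class PartiallyStochasticMLP:
    def __init__(self, d_in, d_hidden, d_out):
        # Deterministic weights and hidden biases.
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(d_hidden), (d_hidden, d_out))
        # The only stochastic parameters: a Gaussian over the output biases.
        self.b2_mu = np.zeros(d_out)
        self.b2_log_sigma = np.full(d_out, -1.0)

    def sample_predictions(self, x, n_samples=32):
        h = np.maximum(x @ self.W1 + self.b1, 0.0)
        sigma = np.exp(self.b2_log_sigma)
        # Each draw of the output biases gives one sample from the predictive distribution.
        return np.stack([h @ self.W2 + self.b2_mu + sigma * rng.normal(size=sigma.shape)
                         for _ in range(n_samples)])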
-
Model Evaluation for Extreme Risks
Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, and Allan Dafoe
arXiv, 2023
@article{shevlaneModel2023,
title = {Model Evaluation for Extreme Risks},
author = {Shevlane, Toby and Farquhar, Sebastian and Garfinkel, Ben and Phuong, Mary and Whittlestone, Jess and Leung, Jade and Kokotajlo, Daniel and Marchal, Nahema and Anderljung, Markus and Kolt, Noam and Ho, Lewis and Siddarth, Divya and Avin, Shahar and Hawkins, Will and Kim, Been and Gabriel, Iason and Bolina, Vijay and Clark, Jack and Bengio, Yoshua and Christiano, Paul and Dafoe, Allan},
year = {2023},
eprint = {2305.15324},
archiveprefix = {arXiv},
primaryclass = {cs.AI},
journal = {arXiv}
}
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through "dangerous capability evaluations") and the propensity of models to apply their capabilities for harm (through "alignment evaluations"). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.
-
Prediction-Oriented Bayesian Active Learning
AI Stats, 2023
@article{bickfordsmithPrediction2023,
title = {Prediction-{{Oriented Bayesian Active Learning}}},
author = {Bickford Smith, Freddie and Kirsch, Andreas and Farquhar, Sebastian and Gal, Yarin and Foster, Adam and Rainforth, Tom},
year = {2023},
journal = {AI Stats}
}
We show that the BALD score can be a sub-optimal objective for active learning when our ultimate goal is to maximise predictive performance. BALD prioritises gathering information about model parameters, but it fails to tailor this to the predictive task the model is being trained for. In particular, it does not account for the downstream input distribution and can thus prioritise inconsequential information. To address this, we propose the expected predictive information gain (EPIG), an information-theoretic acquisition function that instead targets data that will be most useful for our downstream predictive task. This is achieved by formulating the information gain in the space of predictions themselves and incorporating knowledge of the downstream input distribution. Empirically we find that EPIG achieves higher label efficiency than BALD across a range of datasets and models.
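A rough Monte Carlo sketch of the EPIG computation, using my own notation (class probabilities from K posterior samples, targets drawn from the downstream input distribution); the authors' implementation may differ in details.

import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=axis)

def epig(probs_pool, probs_targ):
    # probs_pool: [K, C] probabilities for one candidate point x, one row per posterior sample.
    # probs_targ: [K, M, C] probabilities for M target inputs x* from the downstream distribution.
    K = probs_pool.shape[0]
    M = probs_targ.shape[1]
    # Joint predictive p(y, y* | x, x*) = E_theta[ p(y|x,theta) p(y*|x*,theta) ].
    joint = np.einsum('kc,kmd->mcd', probs_pool, probs_targ) / K
    mi = (entropy(probs_pool.mean(0))                 # H[y | x]
          + entropy(probs_targ.mean(0), axis=-1)      # H[y* | x*] for each target
          - entropy(joint.reshape(M, -1), axis=-1))   # H[y, y* | x, x*]
    return mi.mean()                                  # expected gain about downstream predictions

Acquiring the pool point with the highest epig value targets information that transfers to the predictive task, whereas BALD scores information about the parameters themselves.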
-
CLAM: Selective Clarification for Ambiguous Questions with Large Language Models
arXiv, Dec 2022
@article{kuhnCLAM2022,
title = {{{CLAM}}: {{Selective Clarification}} for {{Ambiguous Questions}} with {{Large Language Models}}},
author = {Kuhn, Lorenz and Gal, Yarin and Farquhar, Sebastian},
year = {2022},
month = dec,
journal = {arXiv}
}
State-of-the-art language models are often accurate on many question-answering benchmarks with well-defined questions. Yet, in real settings questions are often unanswerable without asking the user for clarifying information. We show that current SotA models often do not ask the user for clarification when presented with imprecise questions and instead provide incorrect answers or "hallucinate". To address this, we introduce CLAM, a framework that first uses the model to detect ambiguous questions, and if an ambiguous question is detected, prompts the model to ask the user for clarification. Furthermore, we show how to construct a scalable and cost-effective automatic evaluation protocol using an oracle language model with privileged information to provide clarifying information. We show that our method achieves a 20.15 percentage point accuracy improvement over SotA on a novel ambiguous question-answering data set derived from TriviaQA.
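The selective-clarification loop itself is simple; the snippet below is a schematic of the idea (the prompts and the llm callable are placeholders I am assuming, not the paper's actual prompts or evaluation protocol).

def clam_answer(llm, user_question):
    # Step 1: ask the model whether the question is ambiguous.
    verdict = llm(f"Is the following question ambiguous? Answer yes or no.\n{user_question}")
    if verdict.strip().lower().startswith("yes"):
        # Step 2: if so, generate a clarifying question and wait for the user's reply.
        clarifying_q = llm(f"Ask one clarifying question about: {user_question}")
        clarification = input(clarifying_q + " ")
        # Step 3: answer the disambiguated question.
        return llm(f"{user_question}\nClarification: {clarification}\nAnswer:")
    return llm(f"{user_question}\nAnswer:")

In the paper's evaluation, the role of the human is played by an oracle language model given privileged information about the intended interpretation.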
-
What ‘Out-of-distribution’ Is and Is Not
ML Safety workshop at NeurIPS, Dec 2022
@article{farquharWhat2022,
title = {What `{{Out-of-distribution}}' Is and Is Not},
author = {Farquhar, Sebastian and Gal, Yarin},
year = {2022},
month = dec,
journal = {ML Safety workshop at NeurIPS}
}
Researchers want to generalize robustly to ‘out-of-distribution’ (OOD) data. Unfortunately, this term is used ambiguously causing confusion and creating risk—people might believe they have made progress on OOD data and not realize this progress only holds in limited cases. We critique a standard definition of OOD—difference-in-distribution—and then disambiguate four meaningful types of OOD data: transformed-distributions, related-distributions, complement-distributions, and synthetic-distributions. We describe how existing OOD datasets, evaluations, and techniques fit into this framework. We provide a template for researchers to carefully present the scope of distribution shift considered in their work.
-
Prioritized Training on Points That Are Learnable, Worth Learning, and Not Yet Learned
International Conference on Machine Learning, Jun 2022
@article{mindermannPrioritized2022,
title = {Prioritized {{Training}} on {{Points}} That Are {{Learnable}}, {{Worth Learning}}, and {{Not Yet Learned}}},
author = {Mindermann, Soren and Razzak, Muhammad and Xu, Winnie and Kirsch, Andreas and Sharma, Mrinank and Morisot, Adrien and Gomez, Aidan and Farquhar, Sebastian and Brauner, Jan and Gal, Yarin},
year = {2022},
month = jun,
journal = {International Conference on Machine Learning},
pages = {15630--15649}
}
Training on web-scale data can take months. But most computation and time is wasted on redundant and noisy points that are already learnt or not learnable. To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a simple but principled technique which selects approximately those points for training that most reduce the model’s generalization loss. As a result, RHO-LOSS mitigates the weaknesses of existing data selection methods: techniques from the optimization literature typically select ’hard’ (e.g. high loss) points, but such points are often noisy (not learnable) or less task-relevant. Conversely, curriculum learning prioritizes ’easy’ points, but such points need not be trained on once learned. In contrast, RHO-LOSS selects points that are learnable, worth learning, and not yet learnt. RHO-LOSS trains in far fewer steps than prior art, improves accuracy, and speeds up training on a wide range of datasets, hyperparameters, and architectures (MLPs, CNNs, and BERT). On the large web-scraped image dataset Clothing-1M, RHO-LOSS trains in 18x fewer steps and reaches 2% higher final accuracy than uniform data shuffling.
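A minimal sketch of the selection step, as I read it (a simplification, not the authors' code): score each point in a large candidate batch by its reducible holdout loss, i.e. the current model's loss minus the loss of a small "irreducible loss" model trained on holdout data, and train only on the top-scoring points.

import torch
import torch.nn.functional as F

def rho_loss_select(model, irreducible_model, xb, yb, keep_frac=0.1):
    with torch.no_grad():
        train_loss = F.cross_entropy(model(xb), yb, reduction='none')
        irreducible_loss = F.cross_entropy(irreducible_model(xb), yb, reduction='none')
    # Reducible loss is low for points that are noisy (high irreducible loss)
    # or already learnt (low training loss); both kinds are skipped.
    reducible = train_loss - irreducible_loss
    k = max(1, int(keep_frac * len(yb)))
    idx = reducible.topk(k).indices
    return xb[idx], yb[idx]

Only the selected subset is used for the gradient step, so most of the candidate batch never needs a backward pass.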
-
Understanding Approximation for Bayesian Inference in Neural Networks
Sebastian Farquhar
DPhil Thesis at University of Oxford, Apr 2022
@phdthesis{farquharUnderstanding2022,
type = {{{DPhil Thesis}}},
title = {Understanding {{Approximation}} for {{Bayesian Inference}} in {{Neural Networks}}},
author = {Farquhar, Sebastian},
year = {2022},
month = apr,
address = {{Oxford}},
school = {University of Oxford}
}
Bayesian inference has theoretical attractions as a principled framework for reasoning about beliefs. However, the motivations of Bayesian inference which claim it to be the only ‘rational’ kind of reasoning do not apply in practice. They create a binary split in which all approximate inference is equally ‘irrational’. Instead, we should ask ourselves how to define a spectrum of more and less-rational reasoning that explains why we might prefer one Bayesian approximation to another. I explore approximate inference in Bayesian neural networks and consider the unintended interactions between the probabilistic model, approximating distribution, optimization algorithm, and dataset. The complexity of these interactions highlights the difficulty of any strategy for evaluating Bayesian approximations which focuses entirely on the method, outside the context of specific datasets and decision-problems. For given applications, the expected utility of the approximate posterior can measure inference quality. To assess a model’s ability to incorporate different parts of the Bayesian framework we can identify desirable characteristic behaviours of Bayesian reasoning and pick decision-problems that make heavy use of those behaviours. Here, we use continual learning (testing the ability to update sequentially) and active learning (testing the ability to represent credence). But existing continual and active learning set-ups pose challenges that have nothing to do with posterior quality which can distort their ability to evaluate Bayesian approximations. These unrelated challenges can be removed or reduced, allowing better evaluation of approximate inference methods.
-
Prospect Pruning: Finding Trainable Weights at Initialization Using Meta-gradients
International Conference on Learning Representations, Mar 2022
@article{alizadehProspect2022,
title = {Prospect {{Pruning}}: {{Finding Trainable Weights}} at {{Initialization Using Meta-gradients}}},
author = {Alizadeh, Milad and Tailor, Shyam and Zintgraf, Luisa and {van Amersfoort}, Joost and Farquhar, Sebastian and Lane, Nicholas and Gal, Yarin},
year = {2022},
month = mar,
journal = {International Conference on Learning Representations}
}
Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network while consuming fewer computational resources for training and inference. However, current methods are insufficient to enable this optimization and lead to a large degradation in model performance. In this paper, we identify a fundamental limitation in the formulation of current methods, namely that their saliency criteria look at a single step at the start of training without taking into account the trainability of the network. While pruning iteratively and gradually has been shown to improve pruning performance, explicit consideration of the training stage that will immediately follow pruning has so far been absent from the computation of the saliency criterion. To overcome the short-sightedness of existing methods, we propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune. ProsPr combines an estimate of the higher-order effects of pruning on the loss and the optimization trajectory to identify the trainable sub-network. Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
-
Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation
Neural Information Processing Systems, Feb 2022
@article{kossenActive2022,
title = {Active {{Surrogate Estimators}}: {{An Active Learning Approach}} to {{Label-Efficient Model Evaluation}}},
shorttitle = {Active {{Surrogate Estimators}}},
author = {Kossen, Jannik and Farquhar, Sebastian and Gal, Yarin and Rainforth, Tom},
year = {2022},
month = feb,
journal = {Neural Information Processing Systems},
eprint = {2202.06881},
eprinttype = {arxiv},
primaryclass = {cs, stat},
doi = {10.48550/arXiv.2202.06881},
archiveprefix = {arXiv}
}
We propose Active Surrogate Estimators (ASEs), a new method for label-efficient model evaluation. Evaluating model performance is a challenging and important problem when labels are expensive. ASEs address this active testing problem using a surrogate-based estimation approach, whereas previous methods have focused on Monte Carlo estimates. ASEs actively learn the underlying surrogate, and we propose a novel acquisition strategy, XWING, that tailors this learning to the final estimation task. We find that ASEs offer greater label-efficiency than the current state-of-the-art when applied to challenging model evaluation problems for deep neural networks. We further theoretically analyze ASEs’ errors.
-
Path-Specific Objectives for Safer Agent Incentives
AAAI Conference on Artificial Intelligence, Feb 2022
@article{farquharPathspecific2022,
title = {Path-Specific {{Objectives}} for {{Safer Agent Incentives}}},
author = {Farquhar, Sebastian and Carey, Ryan and Everitt, Tom},
year = {2022},
month = feb,
journal = {AAAI Conference on Artificial Intelligence},
volume = {36}
}
We present a general framework for training safe agents whose naive incentives are unsafe. As an example, manipulative or deceptive behaviour can improve rewards but should be avoided. Most approaches fail here: agents maximize expected return by any means necessary. We formally describe settings with ‘delicate’ parts of the state which should not be used as a means to an end. We then train agents to maximize the causal effect of actions on the expected return which is not mediated by the delicate parts of state, using Causal Influence Diagram analysis. The resulting agents have no incentive to control the delicate state. We further show how our framework unifies and generalizes existing proposals.
-
Stochastic Batch Acquisition for Deep Active Learning
SubsetML ICML Workshop, Jan 2022
@article{kirschStochastic2022,
title = {Stochastic {{Batch Acquisition}} for {{Deep Active Learning}}},
author = {Kirsch, Andreas and Farquhar, Sebastian and Atighehchian, Parmida and Jesson, Andrew and {Branchaud-Charron}, Frederic and Gal, Yarin},
year = {2022},
month = jan,
journal = {SubsetML ICML Workshop},
eprint = {2106.12059},
eprinttype = {arxiv},
archiveprefix = {arXiv}
}
In active learning, new labels are commonly acquired in batches. However, common acquisition functions are only meant for one-sample acquisition rounds at a time, and when their scores are used naively for batch acquisition, they result in batches lacking diversity, which deteriorates performance. On the other hand, state-of-the-art batch acquisition functions are costly to compute. In this paper, we present a novel class of stochastic acquisition functions that extend one-sample acquisition functions to the batch setting by observing how one-sample acquisition scores change as additional samples are acquired and modelling this difference for additional batch samples. We simply acquire new samples by sampling from the pool set using a Gibbs distribution based on the acquisition scores. Our acquisition functions are both vastly cheaper to compute and out-perform other batch acquisition functions.
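A sketch of the simplest instance of this idea (my paraphrase): instead of taking the top-k pool points by acquisition score, sample k distinct points from a softmax (Gibbs) distribution over the scores.

import numpy as np

def stochastic_batch_acquire(scores, batch_size, temperature=1.0, rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(scores, dtype=float) / temperature
    probs = np.exp(logits - logits.max())     # softmax over one-sample acquisition scores
    probs /= probs.sum()
    # Sampling without replacement spreads the batch across the pool,
    # avoiding the near-duplicate batches that naive top-k selection produces.
    return rng.choice(len(scores), size=batch_size, replace=False, p=probs)

Because this reuses the one-sample scores already computed, it costs essentially nothing beyond standard acquisition.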
-
Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning
Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry,
Sebastian Farquhar,
Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren,
Tim G. J. Rudner, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jasper Snoek,
Yarin Gal, and Dustin Tran
arXiv, Jun 2021
@article{nadoUncertainty2021,
title = {Uncertainty {{Baselines}}: {{Benchmarks}} for {{Uncertainty}} \& {{Robustness}} in {{Deep Learning}}},
shorttitle = {Uncertainty {{Baselines}}},
author = {Nado, Zachary and Band, Neil and Collier, Mark and Djolonga, Josip and Dusenberry, Michael W. and Farquhar, Sebastian and Filos, Angelos and Havasi, Marton and Jenatton, Rodolphe and Jerfel, Ghassen and Liu, Jeremiah and Mariet, Zelda and Nixon, Jeremy and Padhy, Shreyas and Ren, Jie and Rudner, Tim G. J. and Wen, Yeming and Wenzel, Florian and Murphy, Kevin and Sculley, D. and Lakshminarayanan, Balaji and Snoek, Jasper and Gal, Yarin and Tran, Dustin},
year = {2021},
month = jun,
journal = {arXiv},
eprint = {2106.04015},
eprinttype = {arxiv},
archiveprefix = {arXiv}
}
High-quality estimates of uncertainty and robustness are crucial for numerous real-world applications, especially for deep learning which underlies many deployed ML systems. The ability to compare techniques for improving these estimates is therefore very important for research and practice alike. Yet, competitive comparisons of methods are often lacking due to a range of reasons, including: compute availability for extensive tuning, incorporation of sufficiently many baselines, and concrete documentation for reproducibility. In this paper we introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks. As of this writing, the collection spans 19 methods across 9 tasks, each with at least 5 metrics. Each baseline is a self-contained experiment pipeline with easily reusable and extendable components. Our goal is to provide immediate starting points for experimentation with new methods or applications. Additionally we provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results. Code available at https://github.com/google/uncertainty-baselines.
-
On Statistical Bias In Active Learning: How and When to Fix It
International Conference on Learning Representations (Spotlight), 2021
@article{farquharStatistical2021,
title = {On {{Statistical Bias In Active Learning}}: {{How}} and {{When}} to {{Fix It}}},
shorttitle = {On {{Statistical Bias In Active Learning}}},
author = {Farquhar, Sebastian and Gal, Yarin and Rainforth, Tom},
year = {2021},
journal = {International Conference on Learning Representations (Spotlight)},
langid = {english}
}
Active learning is a powerful tool when labelling data is expensive, but it introduces a bias because the training data no longer follows the population distribution. We formalize this bias and...
-
Active Testing: Sample-Efficient Model Evaluation
International Conference on Machine Learning, 2021
@article{kossenActive2021,
title = {Active {{Testing}}: {{Sample-Efficient Model Evaluation}}},
author = {Kossen, Jannik and Farquhar, Sebastian and Gal, Yarin and Rainforth, Tom},
year = {2021},
journal = {International Conference on Machine Learning}
}
We introduce a new framework for sample-efficient model evaluation that we call active testing. While approaches like active learning reduce the number of labels needed for model training, existing literature largely ignores the cost of labeling test data, typically unrealistically assuming large test sets for model evaluation. This creates a disconnect from real applications, where test labels are important and just as expensive, e.g. for optimizing hyperparameters. Active testing addresses this by carefully selecting the test points to label, ensuring model evaluation is sample-efficient. To this end, we derive theoretically-grounded and intuitive acquisition strategies that are specifically tailored to the goals of active testing, noting these are distinct from those of active learning. As actively selecting labels introduces a bias, we further show how to remove this bias while reducing the variance of the estimator at the same time. Active testing is easy to implement and can be applied to any supervised machine learning method. We demonstrate its effectiveness on models including WideResNets and Gaussian processes on datasets including Fashion-MNIST and CIFAR-100.
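To make the debiasing idea concrete, here is a deliberately simplified estimator (sampling test points with replacement from an acquisition proposal q and importance-weighting the losses). The estimator actually used in the paper samples without replacement and is constructed to reduce variance further, so treat this only as an illustration of why the weighted estimate stays unbiased.

import numpy as np

def active_test_risk(loss_fn, model, pool_x, pool_y, proposal, n_labels, rng=None):
    rng = rng or np.random.default_rng()
    q = np.asarray(proposal, dtype=float)
    q /= q.sum()                                  # acquisition distribution over the unlabelled test pool
    n = len(pool_x)
    idx = rng.choice(n, size=n_labels, p=q)       # actively chosen points to label
    losses = np.array([loss_fn(model, pool_x[i], pool_y[i]) for i in idx])
    weights = 1.0 / (n * q[idx])                  # reweight against the uniform test distribution
    return float((weights * losses).mean())       # unbiased estimate of the average pool loss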
-
Evaluating Approximate Inference in Bayesian Deep Learning
Andrew Gordon Wilson, Pavel Izmailov, Matthew D Hoffman,
Yarin Gal, Yingzhen Li, Melanie F Pradier, Sharad Vikram, Andrew Foong, Sanae Lotfi, and
Sebastian Farquhar
NeurIPS Competition, 2021
@article{wilsonEvaluating2021,
title = {Evaluating {{Approximate Inference}} in {{Bayesian Deep Learning}}},
author = {Wilson, Andrew Gordon and Izmailov, Pavel and Hoffman, Matthew D and Gal, Yarin and Li, Yingzhen and Pradier, Melanie F and Vikram, Sharad and Foong, Andrew and Lotfi, Sanae and Farquhar, Sebastian},
year = {2021},
journal = {NeurIPS Competition},
langid = {english}
}
Uncertainty representation is crucial to the safe and reliable deployment of deep learning. Bayesian methods provide a natural mechanism to represent epistemic uncertainty, leading to improved generalization and calibrated predictive distributions. Bayesian methods are particularly promising for deep neural networks, which can represent many different explanations to a given problem corresponding to different settings of parameters. While approximate inference procedures in Bayesian deep learning are improving in scalability and generalization performance, there has been no way of knowing, until now, whether these methods are working as intended, to provide ever more faithful representations of the Bayesian predictive distribution. In this competition we provide the first opportunity to measure the fidelity of approximate inference procedures in deep learning through comparison to Hamiltonian Monte Carlo (HMC). HMC is a highly efficient and well-studied Markov Chain Monte Carlo (MCMC) method that is guaranteed to asymptotically produce samples from the true posterior, but is prohibitively expensive in modern deep learning. To address this computational challenge, we have parallelized the computation over hundreds of tensor processing unit (TPU) devices.
-
Single Shot Structured Pruning Before Training
arXiv, Jul 2020
@misc{vanamersfoortSingle2020,
title = {Single {{Shot Structured Pruning Before Training}}},
author = {{van Amersfoort}, Joost and Alizadeh, Milad and Farquhar, Sebastian and Lane, Nicholas and Gal, Yarin},
year = {2020},
month = jul,
number = {arXiv:2007.00389},
eprint = {2007.00389},
eprinttype = {arxiv},
primaryclass = {cs, stat},
publisher = {{arXiv}},
archiveprefix = {arXiv}
}
We introduce a method to speed up training by 2x and inference by 3x in deep neural networks using structured pruning applied before training. Unlike previous works on pruning before training which prune individual weights, our work develops a methodology to remove entire channels and hidden units with the explicit aim of speeding up training and inference. We introduce a compute-aware scoring mechanism which enables pruning in units of sensitivity per FLOP removed, allowing even greater speed ups. Our method is fast, easy to implement, and needs just one forward/backward pass on a single batch of data to complete pruning before training begins.
-
Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations
Advances In Neural Information Processing Systems, 2020
@article{farquharLiberty2020,
title = {Liberty or {{Depth}}: {{Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations}}},
author = {Farquhar, Sebastian and Smith, Lewis and Gal, Yarin},
year = {2020},
journal = {Advances In Neural Information Processing Systems},
langid = {english}
}
We challenge the longstanding assumption that the mean-field approximation for variational inference in Bayesian neural networks is severely restrictive, and show this is not the case in deep networks. We prove several results indicating that deep mean-field variational weight posteriors can induce similar distributions in function space to those induced by shallower networks with complex weight posteriors. We validate our theoretical contributions empirically, both through examination of the weight posterior using Hamiltonian Monte Carlo in small models and by comparing diagonal- to structured-covariance in large settings. Since complex variational posteriors are often expensive and cumbersome to implement, our results suggest that using mean-field variational inference in a deeper model is both a practical and theoretically justified alternative to structured approximations.
-
Radial Bayesian Neural Networks: Robust Variational Inference In Big Models
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020
@article{farquharRadial2020,
title = {Radial {{Bayesian Neural Networks}}: {{Robust Variational Inference In Big Models}}},
shorttitle = {Radial {{Bayesian Neural Networks}}},
author = {Farquhar, Sebastian and Osborne, Michael and Gal, Yarin},
year = {2020},
journal = {Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics}
}
We propose Radial Bayesian Neural Networks: a variational distribution for mean field variational inference (MFVI) in Bayesian neural networks that is simple to implement, scalable to large models, and robust to hyperparameter selection. We hypothesize that standard MFVI fails in large models because of a property of the high-dimensional Gaussians used as posteriors. As variances grow, samples come almost entirely from a ‘soap-bubble’ far from the mean. We show that the ad-hoc tweaks used previously in the literature to get MFVI to work served to stop such variances growing. Designing a new posterior distribution, we avoid this pathology in a theoretically principled way. Our distribution improves accuracy and uncertainty over standard MFVI, while scaling to large data where most other VI and MCMC methods struggle. We benchmark Radial BNNs in a real-world task of diabetic retinopathy diagnosis from fundus images, a task with ~100x larger input dimensionality and model size compared to previous demonstrations of MFVI.
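The change of variables at the heart of the method is short enough to sketch (my reading of the sampling procedure): normalise a Gaussian noise vector to get a direction on the unit sphere and scale it by a single Gaussian radius, so samples concentrate near the mean rather than in the thin high-dimensional shell.

import numpy as np

def radial_sample(mu, log_sigma, rng=None):
    rng = rng or np.random.default_rng()
    eps = rng.normal(size=mu.shape)
    direction = eps / np.linalg.norm(eps)   # uniform direction on the unit sphere
    r = rng.normal()                        # a single scalar radius
    return mu + np.exp(log_sigma) * direction * r

Compared with the usual mean-field draw mu + sigma * eps, only the radial part of the noise changes, which is what removes the 'soap-bubble' pathology described above.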
-
A Unifying Bayesian View of Continual Learning
Bayesian Deep Learning Workshop at NeurIPS, Dec 2018
@article{farquharUnifying2018,
title = {A {{Unifying Bayesian View}} of {{Continual Learning}}},
author = {Farquhar, Sebastian and Gal, Yarin},
year = {2018},
month = dec,
journal = {Bayesian Deep Learning Workshop at NeurIPS},
langid = {english}
}
Some machine learning applications require continual learning, where data comes in a sequence of datasets, each of which is used for training and then permanently discarded. From a Bayesian perspective, continual learning seems straightforward: given the model posterior, one would simply use it as the prior for the next task. However, exact posterior evaluation is intractable with many models, especially with Bayesian neural networks (BNNs). Instead, posterior approximations are often sought. Unfortunately, when posterior approximations are used, prior-focused approaches do not succeed in evaluations designed to capture properties of realistic continual learning use cases. As an alternative to prior-focused methods, we introduce a new approximate Bayesian derivation of the continual learning loss. Our loss does not rely on the posterior from earlier tasks, and instead adapts the model itself by changing the likelihood term. We call these approaches likelihood-focused. We then combine prior- and likelihood-focused methods into one objective, tying the two views together under a single unifying framework of approximate Bayesian continual learning.
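For orientation, the prior-focused objective referred to above has the familiar variational form (my notation), in which the approximate posterior from task $t-1$ stands in as the prior for task $t$:

\mathcal{L}_t(q_t) = \mathbb{E}_{q_t(\theta)}\!\left[\log p(\mathcal{D}_t \mid \theta)\right] - \mathrm{KL}\!\left(q_t(\theta) \,\|\, q_{t-1}(\theta)\right).

Likelihood-focused approaches instead keep a fixed prior and adapt the likelihood term, for example by adding terms for data generated to stand in for earlier tasks; the combined objective described above contains both ingredients.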
-
Towards Robust Evaluations of Continual Learning
Lifelong Learning: A Reinforcement Learning Approach Workshop ICML, May 2018
@article{farquharRobust2018,
title = {Towards {{Robust Evaluations}} of {{Continual Learning}}},
author = {Farquhar, Sebastian and Gal, Yarin},
year = {2018},
month = may,
journal = {Lifelong Learning: A Reinforcement Learning Approach Workshop ICML},
eprint = {1805.09733},
eprinttype = {arxiv},
archiveprefix = {arXiv}
}
The experiments used in current continual learning research do not faithfully assess fundamental challenges of learning continually. We examine standard evaluations and show why these evaluations make some types of continual learning approaches look better than they are. In particular, current evaluations are biased towards continual learning approaches that treat previous models as a prior (e.g., EWC, VCL). We introduce desiderata for continual learning evaluations and explain why their absence creates misleading comparisons. Our analysis calls for a reprioritization of research effort by the community.
-
Differentially Private Continual Learning
Privacy in Machine Learning and AI workshop at ICML, Feb 2018
@article{farquharDifferentially2018,
title = {Differentially {{Private Continual Learning}}},
author = {Farquhar, Sebastian and Gal, Yarin},
year = {2018},
month = feb,
journal = {Privacy in Machine Learning and AI workshop at ICML},
langid = {english}
}
Catastrophic forgetting can be a significant problem for institutions that must delete historic data for privacy reasons. For example, hospitals might not be able to retain patient data permanently. But neural networks trained on recent data alone will tend to forget lessons learned on old data. We present a differentially private continual learning framework based on variational inference. We estimate the likelihood of past data given the current model using differentially private generative models of old datasets.