Red teaming large language models (LLMs) for resilience to scientific disinformation

23 May 2024

This note provides a summary of a red teaming event co-hosted by the Royal Society and Humane Intelligence in the run-up to the 2023 AI Safety Summit (Bletchley Park, UK).

The event took place on 25 October 2023 as part of the Science x AI Safety series of events hosted at the Royal Society, which explored the risks associated with the use of AI in scientific activities. It brought together 40 postgraduate students in health and climate research to scrutinise, and draw attention to, potential vulnerabilities in large language models (LLMs).

Building on the report The online information environment: Understanding how the internet shapes people's engagement with scientific information, published in January 2022, the activity aimed to provide insights into AI-generated scientific disinformation and the efficacy of LLM guardrails in preventing its production and dissemination. An additional objective was to understand the opportunities and limitations of involving scientists in red teaming efforts.

This note summarises preliminary findings from the activity and concludes with areas for further examination, research, and improvements to the design of future red teaming events.