Genomic population structures of microbial pathogens

Discussion meeting

Location

Zoom webinar

Overview

Scientific discussion meeting organised by Professor Mark Achtman FRS, Professor Kathryn Holt and Professor David Aanensen.

GrapeTree of 1610 Ebola genomes from the West African epidemic of 2013–2016. Credit: from Fig. 3 in Zhou et al., Genome Research 28, 1395–1404 (2018)

The comparative genomics of microbial pathogens from all domains of life has become a big data problem. Databases already contain >200,000 assembled genomes from single species but adequate tools to reveal population structures are in their infancy. This meeting will bring together world-class bioinformaticians with experts on bacterial and viral genomes to illustrate multiple approaches to solving this challenge.

Meeting papers will be published in a future issue of Philosophical Transactions of the Royal Society B.

Attending the event

This meeting is intended for researchers in relevant fields.

  • Free to attend
  • Advance registration essential
  • Live subtitles will be available
  • Please note the schedule shows UK times for the sessions. 

Enquiries: contact the Scientific Programmes team

Schedule of talks

28 September

Session 1 13:50-15:30

Chairs

Dr Alison Mather, Quadram Institute Bioscience, UK

13:50-14:00 Open

14:00-14:30 EnteroBase: Hierarchical clustering of >600,000 bacterial genomes

Professor Mark Achtman FRS, University of Warwick, UK

Abstract

The number of sets of genomic short read sequences in the public domain has exploded since 2012. EnteroBase (https://enterobase.warwick.ac.uk/) has assembled >600,000 draft genomes from short reads (from SRA or uploaded by users), assigned allelic designations to sequences from the core genome (cgMLST), and clustered the resulting sequence types at multiple levels (hierarchical clustering; HierCC; PMID: 33823553) for the genera Salmonella, Escherichia/Shigella, Streptococcus, Clostridioides, Vibrio and Yersinia (PMID: 31809257, 32726198, 33055096, 33614977). One HierCC level is a complete replacement for classical taxonomy or ANI because it automatically and reliably identifies species/sub-species. In several genera, two other HierCC levels correspond to ST Complexes and Super-Lineages, which are the predominant population structures in these genera. HierCC levels with even higher resolution are proving useful for tracking transmissions and single-source outbreaks of gastrointestinal disease. HierCC topologies are also consistent with trees based on the presence or absence of all genes in the pan-genome. These conclusions will be illustrated with specialised case studies.
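
To make the idea of clustering cgMLST sequence types at multiple levels concrete, the following is a minimal sketch and not EnteroBase's implementation. It assumes that clusters at each level are formed by single-linkage grouping of allelic profiles whose pairwise allelic distance is at or below a threshold, that an allele value of 0 marks a missing call, and that labels such as `HC2` simply encode the threshold. The function names and the toy profiles are hypothetical.

```python
from itertools import combinations


def allelic_distance(a, b):
    """Count loci where both profiles have a called allele (non-zero) and differ."""
    return sum(1 for x, y in zip(a, b) if x and y and x != y)


def single_linkage(profiles, threshold):
    """Group genomes whose allelic distance is <= threshold (single linkage).

    Returns a dict mapping each genome name to the name of its cluster
    representative (the alphabetically smallest member of the cluster).
    """
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the smaller name as the root so labels are deterministic.
                if rb < ra:
                    ra, rb = rb, ra
                parent[rb] = ra
    return {n: find(n) for n in names}


def hierarchical_levels(profiles, thresholds):
    """Cluster the same profiles at several thresholds, one level per threshold."""
    return {f"HC{t}": single_linkage(profiles, t) for t in thresholds}


# Toy example: four genomes typed at five core-genome loci (hypothetical data).
toy_profiles = {
    "g1": [1, 1, 1, 1, 1],
    "g2": [1, 1, 1, 1, 2],
    "g3": [1, 1, 2, 2, 2],
    "g4": [3, 3, 3, 3, 3],
}
levels = hierarchical_levels(toy_profiles, [0, 1, 2, 5])
```

Because every pair of profiles is compared, this sketch scales quadratically and would not be practical for hundreds of thousands of genomes. It is meant only to show how the same profiles fall into progressively broader clusters as the threshold is relaxed.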

14:30-15:00 Analysing bacterial population structure from millions of genomes

Professor Jukka Corander, University of Oslo and Wellcome Sanger Institute, Norway and UK

Abstract

In less than a decade, bacterial population genomics has progressed from the effort of sequencing dozens to thousands of strains in a single study. There are now >250,000 genomes available even for a single bacterial species and the number of genomes is expected to continue to increase rapidly given the advances in sequencing technology and widespread genomic surveillance initiatives. The biological insights enabled by population genomics are particularly important in evolutionary epidemiology, as the genome sequences provide high-resolution data for the estimation of transmission and evolutionary dynamics, including the horizontal transfer of virulence and resistance elements. Professor Corander will discuss statistical and computational techniques that are amenable to rapidly analysing population structure in data consisting of millions of whole genomes. 

15:00-15:30 Discussion

Dr Francesc Coll, London School of Hygiene & Tropical Medicine, UK
Dr Alison Mather, Quadram Institute Bioscience, UK

Tea break 15:30-16:00

Session 2 16:00-18:00

Chairs

Dr Nicholas Croucher, Imperial College London, UK

16:00-16:30 Beyond the S. aureus comet: what tree shapes occur in large bacterial genomic data?

Professor Caroline Colijn, Simon Fraser University, Canada

Abstract

When methicillin-resistant Staphylococcus aureus (MRSA) arose and disseminated widely, some phylogenetic trees of MRSA-containing types of Staphylococcus aureus had a distinctive 'comet' shape, with a 'comet head' of recently-adapted resistant isolates in the context of a 'comet tail' that was predominantly drug sensitive. Placing an isolate in the context of such a 'comet' helped public health laboratories interpret local data within the broader setting of S aureus evolution. In this work Professor Colijn and her colleagues ask what other tree shapes, analogous to the MRSA comet, are present in bacterial WGS datasets. They extract trees from large bacterial genomic datasets, visualise them as images, and cluster the images. They find nine major groups of tree images, including the 'comet', star-like phylogenies, 'barbell' phylogenies and other shapes, and comment on the evolutionary and epidemiological stories these shapes might illustrate.

16:30-17:00 Genome-scale metabolic network reconstructions of hundreds of diverse Escherichia coli strains reveal strain-specific adaptations and evolutionary trajectories

Dr Jonathan Monk, University of California San Diego, USA

Abstract

Bottom-up approaches to systems biology rely on constructing a mechanistic basis for the biochemical and genetic processes that underlie cellular functions. Genome-scale network reconstructions of metabolism are built from all known metabolic reactions and metabolic genes in a target organism. A network reconstruction can be converted into a mathematical format and thus lend itself to mathematical analysis. Genome-scale models (GEMs) enable a systems approach to characterise the pan and core metabolic capabilities of the E coli species. The models have been used to systematically analyse growth capabilities in more than 650 different growth-supporting environments as well as to predict strain-specific auxotrophies. In this work, genome-scale models were constructed for more than 300 representative strains of E coli across all 295 HC1100 levels. The models were used to study E coli metabolic diversity and speciation on a large scale. The results show that unique strain-specific metabolic capabilities correspond to pathotypes and environmental niches. Genome-scale analysis of multiple strains of a species can thus be used to define the metabolic essence of a microbial species and delineate growth differences that shed light on the adaptation process to a particular microenvironment.

17:00-17:30 New methods with high accuracy and scalability for large-scale phylogenetic estimation

Professor Tandy Warnow, University of Illinois, USA

Abstract

The estimation of phylogenetic trees for individual genes or multi-locus datasets is a basic part of considerable biological research. In order to enable large trees to be computed, Disjoint Tree Mergers (DTMs) have been developed; these methods operate by dividing the input sequence dataset into disjoint sets, constructing trees on each subset, and then combining the subset trees (using auxiliary information) into a tree on the full dataset. DTMs have been used to advantage for multi-locus species tree estimation, enabling highly accurate species trees at reduced computational effort, compared to leading species tree estimation methods. The talk will show that DTMs can be used to improve the accuracy and speed of species tree estimation methods (eg, ASTRAL) as well as gene tree estimation methods (eg, RAxML), thus enabling these methods to run efficiently on much larger datasets than currently possible, and without the need for high-performance computing platforms or massive parallelism. These methods are available in open source form on GitHub.

17:30-18:00 Discussion

Dr Nicholas Croucher, Imperial College London, UK
Dr John Lees, Imperial College London, UK
Dr Cheryl P Andam, University at Albany, State University of New York, USA

29 September

Session 3 13:00-15:00

Chairs

Professor Edward Feil, University of Bath, UK

13:00-13:30 Opening the door to studying nucleotide-resolution genetic variation in bacterial pan-genomes

Dr Zamin Iqbal, The European Bioinformatics Institute, UK

Abstract

When we study the evolution of a bacterial species, we use different models, depending on what we want to achieve or infer. One approach is to reduce to single nucleotide polymorphism (SNP) variation in the 'core genome' (presumably inherited vertically) to study phylogeography or to study an outbreak. In focusing on SNPs (and invariant sites), it has been possible for researchers to build a range of sophisticated phylogenetic models. However, once we try to incorporate genome organisation, chromosomal rearrangements, movement of plasmids, transposons or phage, the modelling problem is far harder. The question of how to properly model bacterial genetic variation is wide open and extremely challenging. A prerequisite for any solution to this is a decision on how to describe the variation in the first place – you cannot model variation until you represent it. Note that this is true even if you have perfect genome assemblies: even if it were possible to multiple sequence align them, this would not really help with how to notice that a SNP at one position in one genome is 'the same' as a SNP somewhere else in another. This talk will cover a solution to this representation problem, showing how it is possible to represent the pan-genome of a species as a network of 'floating' graphs, representing the ensemble of known variation in orthology blocks (using genes and intergenic regions, but this could be done for mobile elements also). In doing so it becomes possible to discover and describe genetic variation at fine (SNP/indel) and coarse (gene order) level, and to compare diverse cohorts of genomes across the full pan-genome.

13:30-14:00 How the interplay between mobile elements shapes bacterial genomes

Dr Eduardo Rocha, Institut Pasteur & CNRS, France

Abstract

Horizontal gene transfer driven by self-mobilisable genetic elements allows the acquisition of complex adaptive traits and their transmission to subsequent generations. Transfer speeds up evolutionary processes as exemplified by the acquisition of virulence traits in emerging infectious agents and by antibiotic resistance in many human pathogens. Transfer is also costly because the vectors of horizontal transfer compete within genomes, have their own mobile elements and are often deadly. As a result, genomes are repositories of multiple defense systems from hosts and from mobile elements that interact in complex ways to drive gene flow in communities. The combination of evolutionary genomics and sequence analysis is now opening up these processes to show how they bring into the genome a constant flux of novel genes that favour the establishment and the invention of novel functions. 

14:00-14:30 Diversification and adaptation of human skin bacteria during health and disease

Professor Tami Lieberman, Massachusetts Institute of Technology, USA

14:30-15:00 Discussion

Professor Edward Feil, University of Bath, UK

Tea break 15:00-15:30

Session 4 15:30-17:30

Chairs

Professor Ross Fitzgerald, Edinburgh Infectious Diseases and University of Edinburgh, UK

15:30-16:00 A scalable analytical approach from bacterial genomes to epidemiology

Professor Xavier Didelot, University of Warwick, UK

Abstract

Recent years have seen a remarkable increase in the practicality of sequencing whole genomes from large numbers of bacterial isolates. The availability of this data source has huge potential to deliver new insights into the evolution and epidemiology of bacterial pathogens, but the analytical methodology has been lagging behind the sequencing technology. Here Professor Didelot presents a step-by-step approach for such genomic epidemiology analyses, from bacterial genomes to epidemiological interpretations. A central component of this approach is the dated phylogeny, which is a phylogenetic tree with branch lengths measured in units of time. The construction of dated phylogenies from bacterial genomic data needs to account for the disruptive effect of recombination on phylogenetic relationships, and Professor Didelot describes how this can be achieved. Dated phylogenies can then be used to perform fine-scale or large-scale epidemiological analyses, depending on the proportion of cases for which genomes are available. A key feature of this approach is computational scalability, and in particular the ability to process hundreds or thousands of genomes within a matter of hours. This is a clear advantage of the step-by-step approach described here. Professor Didelot discusses other advantages and disadvantages of the approach, as well as potential improvements and avenues for future research.

16:00-16:30 Title to be confirmed

16:30-17:00 Unlocking Typhi genomics data to inform public health policy

Professor Kathryn Holt, London School of Hygiene and Tropical Medicine, UK

Abstract

Typhoid fever is a systemic infection caused by Salmonella enterica serovar Typhi (S Typhi). Antimicrobials are the mainstay of typhoid disease control, and effective antimicrobial therapy can reduce the rate of complications from 10–30% down to 1%. A new conjugate vaccine has recently been pre-qualified by WHO and national immunisation programmes are currently being considered by many countries where the disease is endemic; however, data on disease burden, pathogen populations and antimicrobial resistance (AMR) are scarce in most such settings. Where typhoid surveillance is undertaken, namely for routine surveillance of travel-related infections in high income countries and burden studies in low income countries, whole genome sequencing (WGS) has been widely adopted as the primary method for characterisation of S Typhi isolates. WGS data can provide insights into pathogen diversity and transmission dynamics, as well as the emergence, dissemination and prevalence of AMR, much of which has relevance to understanding disease in settings other than those directly sampled (including regional trends, and country-of-acquisition for travel cases). However, the resulting data are not readily accessible to public health decision makers. To fill this gap we are developing an interactive dashboard (TyphiNET, http://typhi.net), which aims to provide a window into genome-derived surveillance information for non-genomics experts.
The dashboard relies on critical infrastructure that is being developed alongside, including (i) a community-driven effort to publicly share S Typhi sequence and source data in a manner that facilitates downstream aggregation for public health surveillance (the Global Typhoid Genomics Consortium, https://www.typhoidgenomics.org/); (ii) the GenoTyphi genotyping scheme, which provides simple, stable, phylogenetically informative, nomenclature to facilitate reporting and communication about pathogen variants; and (iii) Typhi Pathogenwatch, a public genomic epidemiology platform that provides uniform identification of genotypes and AMR determinants from genome data (in addition to whole-genome-based clustering), which is then fed into the TyphiNET dashboard.

17:00-17:30 Discussion

Dr Yogesh Hooda, MRC Laboratory of Molecular Biology, UK
Professor Ross Fitzgerald, Edinburgh Infectious Diseases and University of Edinburgh, UK

30 September

Session 5 08:00-10:00

Chairs

Dr Sebastian Duchene, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia

08:00-08:30 Real-time to Real-life: Phylogenetics & SARS-CoV-2 Variant Tracking

Dr Emma Hodcroft, Institute of Social and Preventive Medicine, University of Bern, Switzerland

Abstract

Since the UK government announced a more transmissible variant of SARS-CoV-2 in December 2020, countries around the world have quickly committed resources to tracking the main variants of concern (VoC). But even variants without obvious changes in viral traits can give insight into how SARS-CoV-2 spreads, help track specific epidemics, and give clues about arising mutations. However, tracking variants can be challenging for scientists unfamiliar with huge numbers of sequences and complex phylogenetic trees – with over 2.5 million publicly available sequences, it is no small task to track and monitor emerging and existing variants. This talk will discuss significant variants and what we have observed about them, how we detect variants of concern and interest, and how both our methods and what we classify as 'variants worth watching' may change in the future. 

08:30-09:00 Bayesian spatiotemporal reconstruction of SARS-CoV-2 spread

Professor Philippe Lemey, KU Leuven, Belgium

Abstract

SARS-CoV-2 genome data has been crucial to track the rapidly changing COVID-19 epidemic. The accumulation of high data volumes over a short time makes time-consuming Bayesian phylogenetic inference impractical for real-time analyses. However, SARS-CoV-2 genomes come with a number of other challenges that can be confronted by Bayesian phylodynamic approaches. Specifically, these methods can take advantage of data integration opportunities and result in more realistic spatiotemporal reconstruction of SARS-CoV-2 spread. This will be demonstrated through the incorporation of global mobility data, individual travel histories and upsampled diversity in phylogeographic reconstructions. Such approaches allow important epidemiological questions to be addressed, such as to what extent lineage persistence and new introductions contributed to the COVID-19 resurgence in Europe in late summer 2020. Various ways of making Bayesian inference more efficient and scalable will be highlighted in different settings.

09:00-09:30 The fitness advantage and effective reproductive number of SARS-CoV-2 variants

Ms Jana S Huisman, ETH Zurich, Switzerland

Abstract

During the COVID-19 pandemic, the Stadler group has set up an extensive nation-wide sequencing effort, covering roughly 8% of all confirmed SARS-CoV-2 cases in Switzerland each week. In addition, the group has developed the method that is federally used to estimate the effective reproductive number of SARS-CoV-2 from a variety of case report data. During this talk, Jana will detail the group’s efforts to track SARS-CoV-2 variants in PCR samples and wastewater, as well as related work to estimate the effective reproductive number and fitness advantage of individual variants.

09:30-10:00 Discussion

Dr Sebastian Duchene, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia

Coffee break 10:00-10:30

Session 6 10:30-12:15

Chairs

Professor Julian Parkhill FRS FMedSci, University of Cambridge, UK

10:30-11:00 Evolutionary and ecological dynamics of emerging viruses

Professor Oliver Pybus, University of Oxford and Royal Veterinary College London, UK

11:00-11:30 Within and between host pathogen genetics as a unique window into transmission and evolution

Professor Christophe Fraser, Nuffield Department of Medicine, University of Oxford, UK

Abstract

Pathogen genomics provides insight into the structure of epidemics, providing a disaggregated view of the epidemic, resolving transmission into clusters. Phylogenetic reconstruction enables the inference of the history of transmission. A particularity of pathogen genetics is that different pathogens within the same infection can be genetically distinct, either because the individual was infected more than once, or more usually because the pathogen has replicated and differentiated during the course of infection. This talk will review advances that allow improved characterisation of within and between host pathogen genetic diversity. Applications for viruses and bacteria will be shown. Improvements in the resolution of transmission will be highlighted. The talk will finish with a research agenda focused on characterising the transmission interface as a key area for improving interventions that limit infection and disease.

11:30-12:00 Discussion

Daniel Falush, Institute Pasteur Shanghai, China
Dr Kate Baker, University of Liverpool, UK
Professor Julian Parkhill FRS FMedSci, University of Cambridge, UK

12:00-12:15 Final comments and close

Professor Mark Achtman FRS, University of Warwick, UK
Professor Kathryn Holt, London School of Hygiene and Tropical Medicine, UK
Professor David Aanensen, University of Oxford and Wellcome Sanger Institute, UK

28 – 30 September 2021
