This page is archived

Links to external sources may no longer work as intended. The content may not represent the latest thinking in this area or the Society’s current position on the topic.

Genomic population structures of microbial pathogens

28 - 30 September 2021 12:50 - 11:15

Scientific discussion meeting organised by Professor Mark Achtman FRS, Professor Kathryn Holt and Professor David Aanensen.

The comparative genomics of microbial pathogens from all domains of life has become a big data problem. Databases already contain >200,000 assembled genomes from single species but adequate tools to reveal population structures are in their infancy. This meeting brought together world-class bioinformaticians with experts on bacterial and viral genomes to illustrate multiple approaches to solving this challenge.

A related journal issue has been published in Philosophical Transactions of the Royal Society B.

Attending the event

This meeting has taken place. You can watch the recording here.

Enquiries: contact the Scientific Programmes team

Organisers

  • Professor Mark Achtman FRS, University of Warwick, UK

    Since 1965, Achtman has founded four highly distinct areas of bacterial genetics: 1) bacterial conjugation involving the Escherichia coli F sex factor (1965-78); 2) E. coli neonatal meningitis (1979-86); 3) epidemic cerebrospinal meningitis caused by Neisseria meningitidis (1983-2000); and 4) since 1998, the population genetics and genomics of bacterial pathogens. In each area he made seminal discoveries, resulting in global recognition, and he is among the world’s most prominent bacterial population geneticists. In the most recent area, he was one of three co-inventors of multilocus sequence typing and has been at the forefront of comparative population genomics. He elucidated the historical associations of Helicobacter pylori with ancient human migrations and the ancient global transmission routes of historical plague, and has introduced dramatic changes to the practice of epidemiological typing of Salmonella enterica. More recently, he has been responsible for developing EnteroBase, which provides access to hundreds of thousands of assembled genomes from a variety of genera containing bacterial pathogens.

    Honours: main prize of the Deutsche Gesellschaft fuer Hygiene und Mikrobiologie, 2004; Foreign Member of the Norwegian Academy of Science and Letters, 2014; Fellow of the Royal Society, 2015; and the Pettenkofer Prize, 2018.

  • Professor Kathryn Holt, London School of Hygiene and Tropical Medicine, UK

    Kat is a computational biologist specialising in infectious disease genomics. She is Professor of Microbial Systems Genomics at LSHTM’s Department of Infection Biology and an Adjunct Professor in the Department of Infectious Diseases at Monash University in Australia. She has a BA/BSc (Hons) majoring in Biochemistry, Applied Statistics and Philosophy (University of Western Australia); a Master of Epidemiology (University of Melbourne); and a PhD in Molecular Biology (University of Cambridge and Sanger Institute). Kat is currently Editor-in-Chief of the UK Microbiology Society journal Microbial Genomics and an HHMI-Gates International Research Scholar. Kat’s research group uses computational genomics and sequencing, phylogenetics, spatiotemporal analysis and epidemiology to study the evolution and transmission of bacterial pathogens, including tropical diseases such as typhoid, dysentery, E. coli diarrhoea and tuberculosis, and hospital-associated pathogens such as Klebsiella and Acinetobacter.

  • Professor David Aanensen, University of Oxford and Wellcome Sanger Institute, UK

    David is Director of The Centre for Genomic Pathogen Surveillance housed between the Wellcome Sanger Institute and The Big Data Institute, University of Oxford.

    David and his team focus on data flow and the use of genome sequencing for translational surveillance of microbial pathogens. Their work combines web and software engineering, methods development for population genomics, and large-scale structured pathogen surveys and sequencing of microbes, with delivery of information for decision making. Working with major public health agencies such as the US CDC, the European CDC, Public Health England and the WHO, the team’s methods and systems are used to interpret data and aid decision making for infection control.

    David is also Director of the NIHR-funded Global Health Research Unit on Genomic Surveillance of Antimicrobial Resistance, working with partners leading national AMR strategies in the Philippines, Colombia, Nigeria and India to implement genomic surveillance and to link and process routine phenotypic and epidemiological data for priority pathogens.

    http://pathogensurveillance.net

    http://ghru.pathogensurveillance.net

     

     

Schedule

Chair

Dr Alison Mather, Quadram Institute Bioscience, UK

12:50 - 13:00 Open

Professor Mark Achtman FRS, University of Warwick, UK

13:00 - 13:30 EnteroBase: Hierarchical clustering of >600,000 bacterial genomes

The number of sets of genomic short read sequences in the public domain has exploded since 2012. EnteroBase (https://enterobase.warwick.ac.uk/) has assembled >600,000 draft genomes from short reads (from SRA or uploaded by users), assigned allelic designations to sequences from the core genome (cgMLST), and clustered the resulting sequence types at multiple levels (hierarchical clustering; HierCC; PMID: 33823553) for the genera Salmonella, Escherichia/Shigella, Streptococcus, Clostridioides, Vibrio and Yersinia (PMID: 31809257, 32726198, 33055096, 33614977). One HierCC level is a complete replacement for classical taxonomy or ANI because it automatically and reliably identifies species/sub-species. In several genera, two other HierCC levels correspond to ST Complexes and Super-Lineages, which are the predominant population structures in these genera. HierCC levels with even higher resolution are proving useful for tracking transmissions and single-source outbreaks of gastrointestinal disease. HierCC topologies are also consistent with trees based on the presence or absence of all genes in the pan-genome. These conclusions will be illustrated with specialised case studies.

Professor Mark Achtman FRS, University of Warwick, UK
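
The idea of clustering cgMLST sequence types at multiple levels can be illustrated with a minimal sketch. It is not EnteroBase's implementation: it assumes single-linkage clustering on the number of differing alleles, and the profiles, the threshold values and the function names are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def allelic_distance(a, b):
    """Count loci whose allele numbers differ; 0 marks a missing call and is skipped."""
    return sum(1 for x, y in zip(a, b) if x and y and x != y)


def single_linkage(profiles, threshold):
    """Assign each sequence type to a cluster, joining any pair of types
    whose allelic distance is at most `threshold` (single linkage)."""
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the alphabetically smaller name as the representative.
                parent[max(root_a, root_b)] = min(root_a, root_b)
    return {name: find(name) for name in profiles}


# Hypothetical allelic profiles over six core-genome loci.
profiles = {
    "ST1": [1, 1, 1, 1, 1, 1],
    "ST2": [1, 1, 1, 1, 1, 2],
    "ST3": [1, 1, 1, 1, 2, 2],
    "ST4": [3, 3, 3, 3, 3, 3],
}

# One clustering per threshold; the level names mimic the HC<n> convention.
levels = {f"HC{t}": single_linkage(profiles, t) for t in (0, 1, 6)}
for level, assignment in levels.items():
    print(level, assignment)
```

In this toy example the clusters are nested: every cluster at a tighter threshold sits inside one cluster at a looser threshold, which is what lets different levels stand for different depths of population structure.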

13:30 - 14:00 Analysing bacterial population structure from millions of genomes

In less than a decade, bacterial population genomics has progressed from sequencing dozens of strains to sequencing thousands in a single study. There are now >250,000 genomes available even for a single bacterial species, and the number of genomes is expected to continue to increase rapidly given the advances in sequencing technology and widespread genomic surveillance initiatives. The biological insights enabled by population genomics are particularly important in evolutionary epidemiology, as the genome sequences provide high-resolution data for the estimation of transmission and evolutionary dynamics, including the horizontal transfer of virulence and resistance elements. Professor Corander will discuss statistical and computational techniques that are amenable to rapidly analysing population structure in data consisting of millions of whole genomes.

Professor Jukka Corander, University of Oslo and Wellcome Sanger Institute, Norway and UK

14:00 - 14:30 Discussion

Dr Francesc Coll, London School of Hygiene & Tropical Medicine, UK

Dr Alison Mather, Quadram Institute Bioscience, UK

Professor Jukka Corander, University of Oslo and Wellcome Sanger Institute, Norway and UK

Professor Mark Achtman FRS, University of Warwick, UK

Chair

Dr Nicholas Croucher, Imperial College London, UK

15:00 - 15:30 Beyond the S. aureus comet: what tree shapes occur in large bacterial genomic data?

When methicillin-resistant Staphylococcus aureus (MRSA) arose and disseminated widely, some phylogenetic trees of MRSA-containing types of Staphylococcus aureus had a distinctive 'comet' shape, with a 'comet head' of recently adapted resistant isolates in the context of a 'comet tail' that was predominantly drug sensitive. Placing an isolate in the context of such a 'comet' helped public health laboratories interpret local data within the broader setting of S. aureus evolution. In this work Professor Colijn and her colleagues ask what other tree shapes, analogous to the MRSA comet, are present in bacterial WGS datasets. They extract trees from large bacterial genomic datasets, visualise them as images, and cluster the images. They find nine major groups of tree images, including the 'comet', star-like phylogenies, 'barbell' phylogenies and other shapes, and comment on the evolutionary and epidemiological stories these shapes might illustrate.

Professor Caroline Colijn, Simon Fraser University, Canada

15:30 - 16:00 Genome-scale metabolic network reconstructions of hundreds of diverse Escherichia coli strains reveal strain-specific adaptations and evolutionary trajectories

Bottom-up approaches to systems biology rely on constructing a mechanistic basis for the biochemical and genetic processes that underlie cellular functions. Genome-scale network reconstructions of metabolism are built from all known metabolic reactions and metabolic genes in a target organism. A network reconstruction can be converted into a mathematical format and thus lends itself to mathematical analysis. Genome-scale models (GEMs) enable a systems approach to characterise the pan and core metabolic capabilities of the E. coli species. The models have been used to systematically analyse growth capabilities in more than 650 different growth-supporting environments as well as to predict strain-specific auxotrophies. In this work, genome-scale models were constructed for more than 300 representative strains of E. coli across all 295 HC1100 levels. The models were used to study E. coli metabolic diversity and speciation on a large scale. The results show that unique strain-specific metabolic capabilities correspond to pathotypes and environmental niches. Genome-scale analysis of multiple strains of a species can thus be used to define the metabolic essence of a microbial species and delineate growth differences that shed light on the adaptation process to a particular microenvironment.

Dr Jonathan Monk, University of California San Diego, USA
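
The notions of pan, core and strain-specific metabolic capabilities in the abstract above can be made concrete with a small sketch. It assumes each strain's model is reduced to a set of reaction identifiers. The strain names and reaction labels are hypothetical placeholders, and the sketch does not reproduce the modelling workflow used in the study.

```python
from collections import Counter

# Hypothetical strain models, each reduced to the set of reactions it contains.
strain_reactions = {
    "strain_A": {"PGI", "PFK", "LACZ"},
    "strain_B": {"PGI", "PFK", "FUCI"},
    "strain_C": {"PGI", "PFK", "LACZ", "FUCI", "CITL"},
}

# Core content: reactions present in every strain.
core = set.intersection(*strain_reactions.values())

# Pan content: reactions present in at least one strain.
pan = set.union(*strain_reactions.values())

# Accessory content: reactions present in some strains but not all.
accessory = pan - core

# Strain-specific content: reactions found in exactly one strain.
counts = Counter(r for reactions in strain_reactions.values() for r in reactions)
unique = {
    strain: {r for r in reactions if counts[r] == 1}
    for strain, reactions in strain_reactions.items()
}

print("core:", sorted(core))
print("pan:", sorted(pan))
print("accessory:", sorted(accessory))
print("unique:", {s: sorted(r) for s, r in unique.items()})
```

With real models the same set operations would be applied to reaction or gene identifiers drawn from each strain's reconstruction. The sets above are only a stand-in for that content.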

16:00 - 16:30 New methods with high accuracy and scalability for large-scale phylogenetic estimation

The estimation of phylogenetic trees for individual genes or multi-locus datasets is a basic part of considerable biological research. In order to enable large trees to be computed, Disjoint Tree Mergers (DTMs) have been developed; these methods operate by dividing the input sequence dataset into disjoint sets, constructing trees on each subset, and then combining the subset trees (using auxiliary information) into a tree on the full dataset. DTMs have been used to advantage for multi-locus species tree estimation, enabling highly accurate species trees at reduced computational effort, compared to leading species tree estimation methods. The talk will show that DTMs can be used to improve the accuracy and speed of methods for species tree estimation (eg, ASTRAL) as well as for gene tree estimation (eg, RAxML), thus enabling these methods to run efficiently on much larger datasets than currently possible, and without the need for high-performance computing platforms or massive parallelism. These methods are available in open source form on GitHub.

Professor Tandy Warnow, University of Illinois, USA

16:30 - 17:00 Discussion

Dr Nicholas Croucher, Imperial College London, UK

Dr John Lees, Imperial College London, UK

Dr Cheryl P Andam, University at Albany, State University of New York, USA

Professor Caroline Colijn, Simon Fraser University, Canada

Dr Jonathan Monk, University of California San Diego, USA

Professor Tandy Warnow, University of Illinois, USA

Chair

Professor Edward Feil, University of Bath, UK

12:00 - 12:30 Opening the door to studying nucleotide-resolution genetic variation in bacterial pan-genomes

When we study the evolution of a bacterial species, we use different models, depending on what we want to achieve or infer. One approach is to reduce the data to single nucleotide polymorphism (SNP) variation in the 'core genome' (presumably inherited vertically) to study phylogeography or to study an outbreak. By focusing on SNPs (and invariant sites), researchers have been able to build a range of sophisticated phylogenetic models. However, once we try to incorporate genome organisation, chromosomal rearrangements, or the movement of plasmids, transposons or phage, the modelling problem becomes far harder. The question of how to properly model bacterial genetic variation is wide open and extremely challenging. A prerequisite for any solution is a decision on how to describe the variation in the first place: you cannot model variation until you represent it. Note that this is true even if you have perfect genome assemblies: even if it were possible to multiple sequence align them, this would not really help with how to notice that a SNP at one position in one genome is 'the same' as a SNP somewhere else in another. This talk will cover a solution to this representation problem, showing how it is possible to represent the pan-genome of a species as a network of 'floating' graphs, representing the ensemble of known variation in orthology blocks (using genes and intergenic regions, but this could be done for mobile elements also). In doing so it becomes possible to discover and describe genetic variation at fine (SNP/indel) and coarse (gene order) level, and to compare diverse cohorts of genomes across the full pan-genome.

Dr Zamin Iqbal, The European Bioinformatics Institute, UK

12:30 - 13:00 How the interplay between mobile elements shapes bacterial genomes

Horizontal gene transfer driven by self-mobilisable genetic elements allows the acquisition of complex adaptive traits and their transmission to subsequent generations. Transfer speeds up evolutionary processes as exemplified by the acquisition of virulence traits in emerging infectious agents and by antibiotic resistance in many human pathogens. Transfer is also costly because the vectors of horizontal transfer compete within genomes, have their own mobile elements and are often deadly. As a result, genomes are repositories of multiple defense systems from hosts and from mobile elements that interact in complex ways to drive gene flow in communities. The combination of evolutionary genomics and sequence analysis is now opening up these processes to show how they bring into the genome a constant flux of novel genes that favour the establishment and the invention of novel functions. 

Dr Eduardo Rocha, Institut Pasteur & CNRS, France

13:00 - 13:30 Diversification and adaptation of human skin bacteria during health and disease

Professor Tami Lieberman, Massachusetts Institute of Technology, USA

13:30 - 14:00 Discussion

Professor Edward Feil, University of Bath, UK

Dr Eduardo Rocha, Institut Pasteur & CNRS, France

Dr Zamin Iqbal, The European Bioinformatics Institute, UK

Professor Tami Lieberman, Massachusetts Institute of Technology, USA

Chair

Professor Ross Fitzgerald, Edinburgh Infectious Diseases and University of Edinburgh, UK

14:30 - 15:00 A scalable analytical approach from bacterial genomes to epidemiology

Recent years have seen a remarkable increase in the practicality of sequencing whole genomes from large numbers of bacterial isolates. The availability of this data source has huge potential to deliver new insights into the evolution and epidemiology of bacterial pathogens, but the analytical methodology has been lagging behind the sequencing technology. Here Professor Didelot presents a step-by-step approach for such genomic epidemiology analyses, from bacterial genomes to epidemiological interpretations. A central component of this approach is the dated phylogeny, which is a phylogenetic tree with branch lengths measured in units of time. The construction of dated phylogenies from bacterial genomic data needs to account for the disruptive effect of recombination on phylogenetic relationships, and Professor Didelot describes how this can be achieved. Dated phylogenies can then be used to perform fine-scale or large-scale epidemiological analyses, depending on the proportion of cases for which genomes are available. A key feature of this approach is computational scalability, and in particular the ability to process hundreds or thousands of genomes within a matter of hours. This is a clear advantage of the step-by-step approach described here. Professor Didelot discusses other advantages and disadvantages of the approach, as well as potential improvements and avenues for future research.

Professor Xavier Didelot, University of Warwick, UK

15:00 - 15:30 Pathogenwatch and data tools to bridge genomics and epidemiology for public health

Professor David Aanensen, University of Oxford and Wellcome Sanger Institute, UK

15:30 - 16:00 Unlocking Typhi genomics data to inform public health policy

Typhoid fever is a systemic infection caused by Salmonella enterica serovar Typhi (S. Typhi). Antimicrobials are the mainstay of typhoid disease control, and effective antimicrobial therapy can reduce the rate of complications from 10–30% down to 1%. A new conjugate vaccine has recently been pre-qualified by WHO, and national immunisation programs are currently being considered by many countries where the disease is endemic; however, data on disease burden, pathogen populations and antimicrobial resistance (AMR) are scarce in most such settings. Where typhoid surveillance is undertaken, namely for routine surveillance of travel-related infections in high-income countries and burden studies in low-income countries, whole genome sequencing (WGS) has been widely adopted as the primary method for characterisation of S. Typhi isolates. WGS data can provide insights into pathogen diversity and transmission dynamics, as well as the emergence, dissemination and prevalence of AMR, much of which has relevance to understanding disease in settings other than those directly sampled (including regional trends, and country-of-acquisition for travel cases). However, the resulting data are not readily accessible to public health decision makers. To fill this gap we are developing an interactive dashboard (TyphiNET, http://typhi.net), which aims to provide a window into genome-derived surveillance information for non-genomics experts.

The dashboard relies on critical infrastructure that is being developed alongside, including (i) a community-driven effort to publicly share S. Typhi sequence and source data in a manner that facilitates downstream aggregation for public health surveillance (the Global Typhoid Genomics Consortium, https://www.typhoidgenomics.org/); (ii) the GenoTyphi genotyping scheme, which provides simple, stable, phylogenetically informative nomenclature to facilitate reporting and communication about pathogen variants; and (iii) Typhi Pathogenwatch, a public genomic epidemiology platform that provides uniform identification of genotypes and AMR determinants from genome data (in addition to whole-genome-based clustering), which is then fed into the TyphiNET dashboard.

Professor Kathryn Holt, London School of Hygiene and Tropical Medicine, UK

16:00 - 16:30 Discussion

Dr Yogesh Hooda, MRC Laboratory of Molecular Biology, UK

Professor Ross Fitzgerald, Edinburgh Infectious Diseases and University of Edinburgh, UK

Professor David Aanensen, University of Oxford and Wellcome Sanger Institute, UK

Professor Kathryn Holt, London School of Hygiene and Tropical Medicine, UK

Professor Xavier Didelot, University of Warwick, UK

Chair

Dr Sebastian Duchene, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia

07:00 - 07:30 Real-time to Real-life: Phylogenetics & SARS-CoV-2 Variant Tracking

Since the UK government announced a more transmissible variant of SARS-CoV-2 in December 2020, countries around the world have quickly committed resources to tracking the main variants of concern (VoC). But even variants without obvious changes in viral traits can give insight into how SARS-CoV-2 spreads, help track specific epidemics, and give clues about arising mutations. However, tracking variants can be challenging for scientists unfamiliar with huge numbers of sequences and complex phylogenetic trees – with over 2.5 million publicly available sequences, it is no small task to track and monitor emerging and existing variants. This talk will discuss significant variants and what we have observed about them, how we detect variants of concern and interest, and how both our methods and what we classify as 'variants worth watching' may change in the future. 

Dr Emma Hodcroft, Institute of Social and Preventive Medicine, University of Bern, Switzerland

07:30 - 08:00 Bayesian spatiotemporal reconstruction of SARS-CoV-2 spread

SARS-CoV-2 genome data has been crucial to track the rapidly changing COVID-19 epidemic. The accumulation of high data volumes over a short time makes time-consuming Bayesian phylogenetic inference impractical for real-time analyses. However, SARS-CoV-2 genomes come with a number of other challenges that can be confronted by Bayesian phylodynamic approaches. Specifically, these methods can take advantage of data integration opportunities and result in more realistic spatiotemporal reconstruction of SARS-CoV-2 spread. This will be demonstrated through the incorporation of global mobility data, individual travel histories and upsampled diversity in phylogeographic reconstructions. Such approaches allow important epidemiological questions to be addressed, such as to what extent lineage persistence and new introductions contributed to the COVID-19 resurgence in Europe in late summer 2020. Various ways of making Bayesian inference more efficient and scalable will be highlighted in different settings.

Professor Philippe Lemey, KU Leuven, Belgium

08:00 - 08:30 The fitness advantage and effective reproductive number of SARS-CoV-2 variants

During the COVID-19 pandemic, the Stadler group has set up an extensive nation-wide sequencing effort, covering roughly 8% of all confirmed SARS-CoV-2 cases in Switzerland each week. In addition, the group has developed the method that is federally used to estimate the effective reproductive number of SARS-CoV-2 from a variety of case report data. During this talk, Jana will detail the group’s efforts to track SARS-CoV-2 variants in PCR samples and wastewater, as well as related work to estimate the effective reproductive number and fitness advantage of individual variants.

Ms Jana S Huisman, ETH Zurich, Switzerland

08:30 - 09:00 Discussion

Dr Sebastian Duchene, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia

Dr Emma Hodcroft, Institute of Social and Preventive Medicine, University of Bern, Switzerland

Ms Jana S Huisman, ETH Zurich, Switzerland

Professor Philippe Lemey, KU Leuven, Belgium

Chair

Professor Julian Parkhill FRS FMedSci, University of Cambridge, UK

09:30 - 10:00 Evolutionary and ecological dynamics of emerging viruses

Professor Oliver Pybus, University of Oxford and Royal Veterinary College London, UK

10:00 - 10:30 Within and between host pathogen genetics as a unique window into transmission and evolution

Pathogen genomics provides insight into the structure of epidemics, giving a disaggregated view of the epidemic and resolving transmission into clusters. Phylogenetic reconstruction enables the inference of the history of transmission. A particularity of pathogen genetics is that different pathogens within the same infection can be genetically distinct, either because the individual was infected more than once, or more usually because the pathogen has replicated and differentiated during the course of infection. This talk will review advances that allow improved characterisation of within and between host pathogen genetic diversity. Applications for viruses and bacteria will be shown. Improvements in the resolution of transmission will be highlighted. The talk will finish with a research agenda focused on characterising the transmission interface as a key area for improving interventions that limit infection and disease.

Professor Christophe Fraser, Nuffield Department of Medicine, University of Oxford, UK

10:30 - 11:00 Discussion

Daniel Falush, Institut Pasteur of Shanghai, China

Dr Kate Baker, University of Liverpool, UK

Professor Julian Parkhill FRS FMedSci, University of Cambridge, UK

Professor Christophe Fraser, Nuffield Department of Medicine, University of Oxford, UK

Professor Oliver Pybus, University of Oxford and Royal Veterinary College London, UK

11:00 - 11:15 Final comments and close

Professor Mark Achtman FRS, University of Warwick, UK

Professor Kathryn Holt, London School of Hygiene and Tropical Medicine, UK

Professor David Aanensen, University of Oxford and Wellcome Sanger Institute, UK