Current Fellow - Maya Wardeh

Big Data approaches to identifying potential sources of emerging pathogens in humans, domesticated animals and crops

Emerging infectious diseases continue to pose major threats to humans, animals and plants. Recent years have seen significant outbreaks of several emerging diseases, ranging from the well known (Ebola and olive quick decline syndrome), to the previously little known (Zika), to the entirely novel (Schmallenberg), to name but a few. It is well established that the ability of a pathogen to infect multiple hosts, particularly hosts in different taxonomic orders or wildlife, is a risk factor for emergence in human and livestock pathogens. Emerging wildlife diseases have also been linked to 'spillover' from humans or domesticated animals. Despite the importance of cross-species disease transmission, relatively little attention has been paid to which species are the most important sources of pathogens across communities (e.g., zoonotic, wildlife to domestic, plants to other kingdoms), which are the most prolific vectors, how those species acquired the pathogens, and by what means the diseases entered new species or populations. A major reason for this limited understanding is the lack of comprehensive data on the pathogens present in animal and plant populations and, in most cases, poor documentation of how they are transmitted, including to humans.

In this fellowship, I will improve and exploit a novel bioinformatic resource developed at the University of Liverpool to investigate how humans, their domesticated animals and crops are connected to the pathogen reservoir in other species, and how these pathogens pass from that reservoir to the focus populations. The resource, which I developed with funding from BBSRC, is the Enhanced Infectious Disease Database (EID2). EID2 uses state-of-the-art text and data mining procedures to extract information from multiple sources, including millions of metadata records accompanying genetic sequences and scientific publications. After processing, EID2 provides evidence for over 60,000 interactions between host and pathogen species and is the most comprehensive data source on the known pathogens of humans, animals and plants and their geographical ranges.

During this fellowship, I aim to investigate the factors that lead to the emergence of pathogens, asking the following questions:

  • What are the characteristics of the networks that connect species via shared pathogens? How central are humans and their domesticated animals and crops in these networks and which other species are each of those communities most closely connected to?
  • What role do different pathogen transmission routes play in shaping these networks? Are the potential species-to-species transmission pathways different for direct, food-borne, water-borne and vector-borne pathogens?
  • What factors determine the host ranges of pathogens? Are host species more likely to become exposed to pathogens that infect a wide range of species? To pathogens from species that are genetically closer to them? Or to pathogens from species with which they often interact?
  • What are we missing? Given the networks, transmission routes and host ranges, what is the risk of each pathogen emerging in new species? Which pathogens should be prioritised as more likely to emerge in the future?
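As a rough illustration of the first and third questions, the sketch below projects host-pathogen records onto a host-host network in which two hosts are linked if they share at least one pathogen, then computes degree centrality, strength (total number of shared pathogens) and the host range of each pathogen. The records are a small hypothetical toy set, not EID2 data, and the names `INTERACTIONS`, `host_ranges`, `shared_pathogen_network` and `centrality` are illustrative assumptions rather than part of EID2.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records (host species, pathogen species); NOT EID2 data.
INTERACTIONS = [
    ("Homo sapiens", "Rabies lyssavirus"),
    ("Canis lupus familiaris", "Rabies lyssavirus"),
    ("Vulpes vulpes", "Rabies lyssavirus"),
    ("Homo sapiens", "Salmonella enterica"),
    ("Bos taurus", "Salmonella enterica"),
    ("Gallus gallus", "Salmonella enterica"),
    ("Homo sapiens", "Leptospira interrogans"),
    ("Canis lupus familiaris", "Leptospira interrogans"),
    ("Bos taurus", "Leptospira interrogans"),
    ("Bos taurus", "Bluetongue virus"),
    ("Ovis aries", "Bluetongue virus"),
]


def host_ranges(pairs):
    """Map each pathogen to the set of hosts it is recorded in."""
    hosts = defaultdict(set)
    for host, pathogen in pairs:
        hosts[pathogen].add(host)
    return dict(hosts)


def shared_pathogen_network(pairs):
    """Project the bipartite host-pathogen graph onto hosts.

    Returns {(host_a, host_b): number of shared pathogens} with host_a < host_b.
    """
    weights = defaultdict(int)
    for hosts in host_ranges(pairs).values():
        for a, b in combinations(sorted(hosts), 2):
            weights[(a, b)] += 1
    return dict(weights)


def centrality(pairs):
    """Return (degree centrality, strength) for every host.

    Degree centrality is the fraction of other hosts a host shares any
    pathogen with; strength is the sum of shared-pathogen edge weights.
    """
    all_hosts = {h for h, _ in pairs}
    degree = {h: 0 for h in all_hosts}
    strength = {h: 0 for h in all_hosts}
    for (a, b), w in shared_pathogen_network(pairs).items():
        for h in (a, b):
            degree[h] += 1  # each (a, b) edge is unique, so this counts neighbours
            strength[h] += w
    n = len(all_hosts)
    return {h: degree[h] / (n - 1) for h in all_hosts}, strength


if __name__ == "__main__":
    deg, strength = centrality(INTERACTIONS)
    for host in sorted(deg, key=lambda h: (-strength[h], h)):
        print(f"{host}: degree={deg[host]:.1f}, strength={strength[host]}")
    print({p: len(h) for p, h in host_ranges(INTERACTIONS).items()})
```

On this toy data humans and cattle tie on degree centrality (0.8), and strength separates them (6 versus 5). Real analyses would likely also use other centrality measures, such as betweenness or eigenvector centrality.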

Training Plan

Ecological networks, in which nodes represent species and links represent interactions between those species, have been used to model and investigate a wide range of important phenomena. In ecological multi-host networks, nodes are host species, linked by the pathogens they share. The relative importance of nodes can be quantified using centrality measures. Central hosts act as interspecies super-spreaders, and identifying them is important for developing surveillance protocols and interventions aimed at preventing future disease emergence in populations of humans, their domesticated animals or crops. Link prediction models that take into account the topology of the observed interaction networks and the evolutionary relationships between hosts can be used to predict missing links between hosts and pathogens. Missing links may indicate pathogens likely to emerge in new hosts in the future, or undocumented interactions between host and pathogen species. Developing the various components of this project requires combining skills in programming, data mining and data management (to mine the information required to build the networks) with mathematical and statistical skills (for network analysis and prediction of missing links), underpinned by an understanding of evolutionary relationships between species (for parametrising and interpreting the link prediction models). The project therefore sits at the interface of data science and network analysis, with statistical components.
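To make the idea of predicting missing host-pathogen links concrete, here is a minimal sketch that uses network topology only. It does not include the evolutionary relationships between hosts that the link prediction models described above would incorporate. As a simple stand-in, it scores each unobserved host-pathogen pair by an unnormalised count of length-three paths in the bipartite graph (host, then a shared pathogen, then another host, then the candidate pathogen). The toy records and the function name `path3_scores` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records (host species, pathogen species); NOT EID2 data.
INTERACTIONS = [
    ("Homo sapiens", "Rabies lyssavirus"),
    ("Canis lupus familiaris", "Rabies lyssavirus"),
    ("Vulpes vulpes", "Rabies lyssavirus"),
    ("Homo sapiens", "Salmonella enterica"),
    ("Bos taurus", "Salmonella enterica"),
    ("Gallus gallus", "Salmonella enterica"),
    ("Homo sapiens", "Leptospira interrogans"),
    ("Canis lupus familiaris", "Leptospira interrogans"),
    ("Bos taurus", "Leptospira interrogans"),
    ("Bos taurus", "Bluetongue virus"),
    ("Ovis aries", "Bluetongue virus"),
]


def path3_scores(pairs):
    """Score every unobserved (host, pathogen) pair by counting 3-step paths.

    Path: host -> pathogen it carries -> other host carrying that pathogen
    -> candidate pathogen. The score is the sum, over all carriers of the
    candidate pathogen, of the number of pathogens they share with the host.
    """
    pathogens_of = defaultdict(set)
    hosts_of = defaultdict(set)
    for host, pathogen in pairs:
        pathogens_of[host].add(pathogen)
        hosts_of[pathogen].add(host)
    scores = {}
    for host, own in pathogens_of.items():
        for pathogen, carriers in hosts_of.items():
            if pathogen in own:
                continue  # link already observed; nothing to predict
            scores[(host, pathogen)] = sum(
                len(own & pathogens_of[other]) for other in carriers
            )
    return scores


if __name__ == "__main__":
    ranked = sorted(path3_scores(INTERACTIONS).items(), key=lambda kv: (-kv[1], kv[0]))
    for (host, pathogen), score in ranked[:3]:
        print(f"{host} - {pathogen}: {score}")
```

High-scoring pairs are only candidates for follow-up. A fuller model would likely normalise for host and pathogen degree and weight hosts by phylogenetic distance.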

Skills Gap
Advanced mathematical and statistical network analysis; mathematical modelling of biological systems; statistical modelling; phylogenetics.

Year 1
Mathematical network analysis skills will be obtained through bi-weekly meetings (running over the three years) with Dr Sharkey (Department of Mathematics). I will take modules in mathematics relevant to modelling biological systems, namely Population Dynamics (MATH227, MATH332), as well as the theory of statistical inference (MATH361). A formal foundation in network science will be obtained by attending either Mathematical Biology (MATH426) or Networks in Theory & Practice (MATH367). The basics of phylogenetics will be obtained by completing EMBL-EBI online courses (free). I will attend relevant short courses in Biostatistics when timetables are released, and will attend seminars in both the Mathematics and Biostatistics departments on a regular basis.

Year 2
Further statistical skills will be obtained by attending the module in linear models (MATH363) and short courses in Bayesian methods (Lancaster University, £460 + £18 travel + £150 subsistence) and statistical learning (Lancaster University, £460 + £18 travel + £150 subsistence). I will gain immersive experience in network modelling by spending 4 weeks visiting the Dynamics of Biological Networks (BioND) group, led by Dr Thilo Gross, at the University of Bristol (£85 travel; £4,200 subsistence [28 days @£150]). I will spend a week at the National Institute of Informatics, Tokyo, working with Associate Professor Mahito Sugiyama on developing tools for quantifying differences between large graphs/networks (£700 travel; £1,400 subsistence [7 days @£200]).

I plan to spend one week visiting Dr Dimitris Rizopoulos at Erasmus University in Rotterdam, to further develop my joint modelling techniques with his leading research team (anticipated costs: £ flight; £1,050 subsistence [7 days @£150]).

Year 3
I will spend 4 weeks at a leading centre of statistical network analysis: the Department of Mathematics and Statistics, Boston University, Massachusetts (£700 travel; £4,200 subsistence [28 days @£150]).

I will attend two UK and two international conferences during the fellowship (averaging £1,500 per international conference and £600 per UK conference, including travel and registration fees).

Supervisory Team
Mathematician with background in network analysis (Dr Kieran Sharkey); Mathematician with background in nonparametric and Bayesian statistics (Dr Kamila Zychaluk); Life scientist with expertise in infectious disease epidemiology (Professor Matthew Baylis).

Expression of Interest by email: 31st January 2019

Applications Close: 17th May 2019

Start the Fellowship: June 2019 - March 2020