Training Plan
High-dimensional data is now routinely collected in many settings, due to the number of different variables being measured on a subject, the number of times a variable is measured and the number of individuals in a study. This data can be used for screening patients to determine their risk of disease, or to classify patients into risk groups.
In the machine learning literature, variational Bayes approximation methods have been shown to give fast and accurate estimates of model parameters in a variety of settings.
My proposal is to derive a mean-field variational Bayes (MFVB) approach to estimating joint models for multiple longitudinal biomarkers of different types and time-to-event data. Making it practicable to fit such models in a reasonable time frame will allow personalised risk prediction and diagnostics to be performed in real time.
This project will be at the interface of statistics and computer science with significant statistical, methodological, and computational components.
Skills gaps to be addressed
Machine and statistical learning; Software for clinical decision-making; Modelling of multidimensional data structures.
Year 1
I will be based within the Department of Biostatistics. I will derive the MFVB approximation to multivariate generalized linear mixed models, and will apply these new models to a number of clinical datasets. I will gain an understanding of the clinical challenges, as well as more experience of the biomedical research environment, by meeting with my clinical collaborators and by attending additional departmental research meetings related to the areas of diabetic retinopathy and cancer metabolomics. I will develop skills in machine learning by attending modules run by the Department of Computer Science: Machine Learning and Bioinspired Optimisation (COMP532) and Data Mining and Visualization (COMP527).
I plan to spend two weeks visiting Professor Matt Wand, a leading expert in variational approximation techniques, in Sydney. This collaboration will help me to develop my statistical and machine learning skills, and will provide excellent guidance for my research proposal. (Cost to visit University of Technology, Sydney: £1400 flight; £2800 subsistence (14 days @ £200))
In addition, I will interact regularly with the Data Mining and Machine Learning research group and the cross-faculty Bayesian Statistics group. These networks will develop my awareness of current research in the fields related to my research project.
Year 2
I will develop MFVB approximations for joint models of multiple longitudinal markers and time-to-event data, and develop risk prediction models in clinical datasets. I will develop my software skills by completing courses run by Jumping Rivers: R for Big Data (£+VAT) and Introduction to Bayesian Inference using RStan (£+VAT). I will learn about optimization techniques in the Optimisation module (COMP557). I will continue interactions with the clinical departments by presenting my work at departmental seminars and through ongoing discussion.
I plan to spend one week visiting Dr Dimitris Rizopoulos at Erasmus University in Rotterdam, to further develop my joint modelling techniques with his leading research team. (Anticipated costs: £ flight; £1050 subsistence (7 days @ £150))
Year 3
Regular meetings with clinical colleagues will take place to discuss the implementation and clinical assessment of the methodology. I will also create an R package containing the code for methods I develop.
During my final year I will participate in the Research Leadership programme at the University of Liverpool. This programme will develop my leadership skills in preparation for leading my own research team and developing grant proposals.
Supervisory Team
Marta Garcia-Fiñana (Multivariate data modelling, Biostatistics; academic sponsor), Frans Oliehoek (Machine learning, Computer Science), Simon Harding (Diabetic retinopathy risk, Ophthalmology), Chris Probert (Cancer metabolomics, Cellular and Molecular Physiology)