Antonia Tsvetanova, BSc MRes PhD
AI Scientist
IQVIA London, UK
Email: antonia.tsvetanova@iqvia.com

I am a PhD-trained Applied AI Scientist at IQVIA, working within the Data Standardization & Analytics team focused on real-world evidence and global health-data networks. In this role I contribute to transforming diverse observational health-care data into the OMOP Common Data Model, thereby enabling large-scale, reproducible analyses across the global federated OHDSI/OMOP network coordinated by IQVIA — a network spanning over 150 databases in 21 countries, with more than 2 billion patient records.

Prior to my current role, I was a Postdoctoral Research Fellow at the MRC Biostatistics Unit (BSU) at the University of Cambridge, working with Dr Pavel Mozgunov within the BSU’s Efficient Study Design theme, and with Dr Nikos Demiris, a lead statistician in the Cambridge Clinical Trials Unit at Addenbrooke’s Hospital. I spent half of my time developing novel statistical methodology for early-phase dose-finding trials, and the other half taking cancer trials from initial ideas and design through to their implementation in clinical practice.

I obtained my PhD in Methodological Statistics from the University of Manchester in December 2023, where I was part of the Prediction Modelling Group within the Centre for Health Informatics. My doctoral research focused on methods for handling missing data throughout the entire clinical prediction model pipeline — from development to validation and deployment. I investigated which missing data handling approaches were compatible across these stages and under which missingness mechanisms they led to bias in the model’s predictive performance. This work was centred on models developed with routinely collected health data, where missing data is a common challenge. I explored commonly used methods such as complete case analysis (CCA) and mean/mode imputation, more sophisticated approaches like regression and multiple imputation, and novel emerging techniques, including the pattern submodel method. The ultimate goal of my research was to develop recommendations for handling missing data in clinical risk prediction, where the aim is to optimise the predictive performance of these models. I was supervised by Dr Glen Martin, Dr Matthew Sperrin, Prof Niels Peek, Dr David Jenkins and Prof Iain Buchan.

During my PhD, I completed two Machine Learning internships, at AstraZeneca and Microsoft. At AstraZeneca, I used large language models (LLMs) to extract critical insights from the clinical trial literature, supporting drug discovery, repurposing, and competitive analysis. At Microsoft, I explored the problem of missing data in clinical risk prediction from a machine learning perspective, assessing whether models are susceptible to changes in the missingness mechanism and in the approach used to handle missing data.

My Google Scholar page can be found here.

Publications

Missing data was handled inconsistently in UK prediction models: a review of method used.
Antonia Tsvetanova, Matthew Sperrin, Niels Peek, Iain Buchan, Stephanie Hyland, Glen P Martin
Journal of Clinical Epidemiology, 2021.
[paper]

Impact of incompatibilities in missing data handling on bias in estimated predictive performance: A simulation study.
Antonia Tsvetanova, Matthew Sperrin, David Jenkins, Niels Peek, Iain Buchan, Stephanie Hyland, Glen P Martin
In preparation for submission to Statistics in Medicine.

The curse of knowing: What happens when ML-algorithms can no longer rely on missingness patterns to make accurate predictions when deployed? A simulation study.
Antonia Tsvetanova, Glen Martin, Matthew Sperrin, David Jenkins, Niels Peek, Iain Buchan, Stephanie Hyland
In preparation for submission.

Invited Talks

July 2023: The Australasian Institute of Digital Health MedINFO23 at Sydney, Australia.
November 2022: American Medical Informatics Association AMIA at Washington, DC, USA.
September 2022: Royal Statistical Society International Conference RSS at Aberdeen, UK.
August 2022: International Society for Clinical Biostatistics ISCB at Newcastle, UK.
January 2021: AI4Health Winter School Paris ai4health at Paris, France.
June 2020: Young Statisticians Meeting YSM2020, online.

Teaching

Autumn 2022: Graduate Teaching Assistant for Design and Analysis of Randomised Controlled Trials as part of the MSc Health Data Science at The University of Manchester
Autumn 2022: Graduate Teaching Assistant for Fundamental Mathematics & Statistics for Health Data as part of the MSc Health Data Science at The University of Manchester
Autumn 2022: Graduate Teaching Assistant for Statistical Modelling and Inference for Health as part of the MSc Health Data Science at The University of Manchester
Autumn 2022: Graduate Teaching Assistant for Health Information Systems as part of the MSc Health Informatics, UCL, London
Autumn 2022: Graduate Teaching Assistant for Scientific Skills: Human Biology and Radiochemistry as part of the MSc Medical Imaging, UoM

Awards

February 2021: Prize for the best presentation as voted by those who gave a talk at the Postgraduate Research Seminar in the Division of Informatics, Imaging and Data Sciences, UoM
January 2020: The best presentation on a literature review and research project introduction by a first year PhD student at the annual PGR student showcase, UoM
September 2019: I was awarded the prestigious EPSRC Industrial CASE studentship, in collaboration with Microsoft as the industrial partner, which allowed me to carry out my PhD research in both academia and industry.