Antonia Tsvetanova, BSc MRes PhD
AI Scientist
IQVIA
London, UK
Email: antonia.tsvetanova@iqvia.com
I am a PhD-trained Applied AI Scientist at IQVIA, working within the Data Standardization & Analytics team, which focuses on real-world evidence and global health-data networks. In this role I contribute to transforming diverse observational healthcare data into the OMOP Common Data Model, enabling large-scale, reproducible analyses across the global federated OHDSI/OMOP network coordinated by IQVIA, which spans over 150 databases in 21 countries and holds more than 2 billion patient records.
Prior to my current role, I was a Postdoctoral Research Fellow at the MRC Biostatistics Unit (BSU) at the University of Cambridge, working with Dr Pavel Mozgunov within the BSU's Efficient Study Design theme and with Dr Nikos Demiris, a lead statistician in the Cambridge Clinical Trials Unit at Addenbrooke’s Hospital. I spent half of my time developing novel statistical methodology for early-phase dose-finding trials, and the other half taking cancer trials from initial ideas and design through to implementation in clinical practice.
I obtained my PhD in Methodological Statistics from the University of Manchester in December 2023, where I was part of the Prediction Modelling Group within the Centre for Health Informatics. My doctoral research focused on methods for handling missing data throughout the entire clinical prediction model pipeline, from development to validation and deployment. I investigated which missing data handling approaches were compatible across these stages and under which missingness mechanisms they led to bias in the model’s predictive performance. This work centred on models developed with routinely collected health data, where missing data is a common challenge. I explored commonly used methods such as complete case analysis (CCA) and mean/mode imputation, more sophisticated approaches such as regression and multiple imputation, and emerging techniques, including the pattern submodel method. The ultimate goal of my research was to develop recommendations for handling missing data in clinical risk prediction, where the aim is to optimise the models' predictive performance. I was supervised by Dr Glen Martin, Dr Matthew Sperrin, Prof Niels Peek, Dr David Jenkins and Prof Iain Buchan.
During my PhD, I completed two machine learning internships, at AstraZeneca and Microsoft. At AstraZeneca, I used large language models (LLMs) to extract critical insights from the clinical trial literature, supporting drug discovery, repurposing, and competitive analysis. At Microsoft, I explored the problem of missing data in clinical risk prediction from a machine learning perspective, assessing whether models are susceptible to changes in the missingness mechanism and in the approach used to handle missing data.
My Google Scholar page can be found here.