My research background

My PhD research:
During my PhD program, I actively participated in the advancement of machine learning models and their applications within the healthcare domain. Specifically, my research focused on the utilization of semi-supervised learning models and their adaptation to the field of survival analysis. The primary focus of my PhD thesis revolved around the development of predictive models tailored for critically ill patients diagnosed with acute kidney injury (AKI). Within my PhD project, I pursued both methodological objectives, contributing to the field of machine learning, and medical application objectives, resulting in enhanced post-ICU policies for individuals with AKI. By leveraging electronic health record (EHR) data, a personalized risk profile was constructed for each AKI survivor upon their discharge from the intensive care unit (ICU). This enabled the creation of a customized follow-up plan, aimed at improving patient care and outcomes. Machine learning-based prediction models were developed to forecast outcomes following stage 3 Acute Kidney Injury (AKI) events in the ICU, including progression to chronic kidney disease (CKD) and mortality. Results demonstrated the superior performance of random forest (RF) and XGBoost models compared to baseline models, with the potential to aid clinicians in making informed decisions for critically ill patients with severe AKI regarding CKD development and survival rates, particularly when unlabeled data is integrated into the analysis.

The following is a selection of my recently published or soon-to-be published research papers. Check out my Google Scholar for the list of all publications.

Supervised fuzzy partitioning

This paper thus presents a generative model extending the centroid-based clustering approach to be applicable to classification and regression tasks. Given an arbitrary loss function, the proposed approach termed Supervised Fuzzy Partitioning (SFP) incorporates label information into its objective function through a surrogate term penalizing the empirical risk. Entropy-based regularization is also employed to fuzzify the partition and to weight features, enabling the method to capture more complex patterns, identify significant features, and yield better performance facing high-dimensional data.

Predicting Survival Outcomes in the Presence of Unlabeled Data

Many clinical studies require the follow-up of patients over time. This is challenging: apart from frequently observed drop-out, there are often also organizational and financial challenges, which can lead to reduced data collection and, in turn, can complicate subsequent analyses. In contrast, there is often plenty of baseline data available of patients with similar characteristics and background information, e.g. from patients that did not consent to be followed over time or from patients that fall outside the study time window. In this article, we investigate whether we can benefit from the inclusion of such unlabeled data instances to predict accurate survival times. We also show that integrating the partial supervision provided by censored data in a semi-supervised wrapper approach generally provides the best results, often achieving high improvements, compared to not using unlabeled data.

Predicting outcomes of acute kidney injury in critically ill patients using machine learning

In this study, machine learning-based prediction models were developed to forecast outcomes after stage 3 Acute Kidney Injury (AKI) in intensive care unit (ICU) patients. The models aimed to predict the progression to chronic kidney disease (CKD) and mortality. A prospective observational study was conducted using medical records of 101 critically ill patients with AKI stage 3. Random forest and XGBoost algorithms demonstrated superior performance compared to baseline models in predicting CKD (AUPR: 0.895 and 0.848) and mortality (c-index: 0.8248) respectively. The incorporation of unlabeled data enhanced the performance of survival analysis tasks. These machine learning models have potential in aiding clinicians' decision-making for post-discharge management of severely AKI patients at risk of CKD development.

Comparison between Cystatin C- and Creatinine-Based Estimated Glomerular Filtration Rate in the Follow-Up of Patients Recovering from a Stage-3 AKI in ICU

Serum creatinine (SCr), the most widely used biomarker to evaluate kidney function, does not always accurately predict the glomerular filtration rate (GFR) since it is affected by some non-GFR determinants such as muscle mass and recent meat ingestion. Researchers and clinicians have gained interest in cystatin C (CysC), another biomarker of kidney function. The study objective was to compare GFR estimation using SCr and CysC in detecting CKD over a 1-year follow-up after an AKI-stage 3 event in the ICU, as well as to analyze the association between eGFR (using SCr and CysC) and mortality after the AKI event.