Causal machine learning for medical texts
- Authors: Alessandro Albano; Chiara Di Maria; Mariangela Sciandra; Antonella Plaia
- Publication year: 2025
- Type: Book chapter or essay
- OA Link: http://hdl.handle.net/10447/674633
Abstract
Text analysis has become increasingly common in medical research, especially for tasks such as patient diagnosis based on medical notes. However, most existing approaches do not account for causal relationships between words and diagnoses. This paper proposes a causal approach using the MIMIC-III dataset to identify words or word pairs that causally affect the probability of receiving a specific diagnosis. We employ causal forests to assess the impact of individual linguistic factors on patient outcomes while adjusting for potential confounders. Our analysis reveals significant causal relationships between specific terms in clinical notes and the presence of a hypothyroidism diagnosis.
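
To illustrate why adjusting for confounders matters when relating a word in a note to a diagnosis, the following minimal sketch compares a naive difference in diagnosis rates with a stratified (confounder-adjusted) difference. It is not the causal forest used in the paper; stratification is swapped in here as the simplest form of adjustment. The field names `has_word`, `age_group`, and `diagnosis`, the helper `make_rows`, and all counts are hypothetical toy values, not taken from MIMIC-III or from the paper's results.

```python
from collections import defaultdict


def make_rows(group, has_word, n, n_diagnosed):
    """Build n toy records; the first n_diagnosed carry the diagnosis (hypothetical data)."""
    return [
        {"age_group": group, "has_word": has_word, "diagnosis": 1 if i < n_diagnosed else 0}
        for i in range(n)
    ]


def rate_difference(rows):
    """Diagnosis rate among notes containing the word minus the rate among notes without it."""
    treated = [r["diagnosis"] for r in rows if r["has_word"]]
    control = [r["diagnosis"] for r in rows if not r["has_word"]]
    if not treated or not control:
        return None  # no overlap: the contrast is undefined for these rows
    return sum(treated) / len(treated) - sum(control) / len(control)


def naive_effect(records):
    """Unadjusted contrast over all records, ignoring confounders."""
    return rate_difference(records)


def adjusted_effect(records, confounder):
    """Average of within-stratum contrasts, weighted by stratum size."""
    strata = defaultdict(list)
    for r in records:
        strata[r[confounder]].append(r)
    weighted_sum, total = 0.0, 0
    for rows in strata.values():
        diff = rate_difference(rows)
        if diff is None:
            continue  # skip strata without both groups
        weighted_sum += diff * len(rows)
        total += len(rows)
    return weighted_sum / total


# Toy data: the word appears more often in the older group, which also has a
# higher baseline diagnosis rate, so age confounds the naive comparison.
records = (
    make_rows("older", True, 30, 18)
    + make_rows("older", False, 10, 4)
    + make_rows("younger", True, 10, 3)
    + make_rows("younger", False, 30, 3)
)

print("naive:", round(naive_effect(records), 3))
print("adjusted:", round(adjusted_effect(records, "age_group"), 3))
```

In this toy setup the naive contrast is larger than the adjusted one because the word and the diagnosis are both more frequent in the older group. A causal forest pursues the same goal with a flexible, data-driven partition of the covariate space rather than a single hand-picked stratifying variable.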