CORE-BEHRT: A Carefully Optimized and Rigorously Evaluated BEHRT
The widespread adoption of Electronic Health Records (EHR) has increased the amount of available healthcare data, enabling the use of models from NLP and computer vision in EHR research. BERT-based models such as BEHRT and Med-BERT have become popular, though their design choices remain underexplored. This study optimizes BERT-based EHR modeling, showing that improvements to data representation and training protocols can enhance performance. Evaluations across 25 clinical tasks showed significant performance increases in 17 of them, highlighting the models' generalizability. These findings provide a foundation for future work and are intended to increase trust in BERT-based EHR models.