BenchmarkEHRShot: Benchmark predictive tasks using EHRShot
Source: R/Task_BenchmarkEHRShot.R, BenchmarkEHRShot.Rd
Task class for running benchmark evaluations on the EHRShot dataset. Supports multiple categories of prediction tasks, including operational outcomes, lab value predictions, new diagnoses, and medical imaging findings.
Details
The BenchmarkEHRShot task class implements the predictive modeling tasks of the EHRShot benchmark. Each task uses clinical codes from the ehrshot table as features and pairs them with task-specific labels to produce samples.
Operational Outcomes (Binary Classification):
guo_los: Predicts whether the length of stay exceeds a threshold
guo_readmission: Predicts hospital readmission
guo_icu: Predicts ICU admission
Lab Values (Multiclass Classification):
lab_thrombocytopenia: Predicts low platelet count severity
lab_hyperkalemia: Predicts high potassium level severity
lab_hypoglycemia: Predicts low blood sugar severity
lab_hyponatremia: Predicts low sodium level severity
lab_anemia: Predicts anemia severity
New Diagnoses (Binary Classification):
new_hypertension: Predicts new hypertension diagnosis
new_hyperlipidemia: Predicts new hyperlipidemia diagnosis
new_pancan: Predicts new pancreatic cancer diagnosis
new_celiac: Predicts new celiac disease diagnosis
new_lupus: Predicts new lupus diagnosis
new_acutemi: Predicts new acute myocardial infarction diagnosis
Medical Imaging (Multilabel Classification):
chexpert: Predicts multiple chest X-ray findings simultaneously (14 possible findings from the CheXpert dataset)
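The task names and prediction types listed above can be captured in a small lookup table. The sketch below is a hypothetical mirror of the tasks_by_category field: the list element names, the output-type strings, and the describe_task() helper are assumptions for illustration, not RHealth's actual values or API.

```r
# Hypothetical mirror of the `tasks_by_category` field. The list layout and
# output-type labels are assumptions, not RHealth's actual values.
tasks_by_category <- list(
  operational = c("guo_los", "guo_readmission", "guo_icu"),
  lab_values = c("lab_thrombocytopenia", "lab_hyperkalemia", "lab_hypoglycemia",
                 "lab_hyponatremia", "lab_anemia"),
  new_diagnoses = c("new_hypertension", "new_hyperlipidemia", "new_pancan",
                    "new_celiac", "new_lupus", "new_acutemi"),
  medical_imaging = c("chexpert")
)

# Prediction type for each category, as described in the task list.
category_output_type <- c(
  operational = "binary",
  lab_values = "multiclass",
  new_diagnoses = "binary",
  medical_imaging = "multilabel"
)

# Return the category and output type of a task, or stop() if it is unknown.
describe_task <- function(task) {
  for (category in names(tasks_by_category)) {
    if (task %in% tasks_by_category[[category]]) {
      return(list(category = category,
                  output_type = unname(category_output_type[category])))
    }
  }
  stop(sprintf("Unknown EHRShot task: '%s'", task))
}

describe_task("lab_anemia")$output_type  # returns "multiclass"
```

Validating the task name up front like this fails fast on typos such as "guo_lso" instead of failing later during sample generation.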
Features
The task uses clinical codes from the ehrshot table as features. Optionally, you can filter events by OMOP table type using the omop_tables parameter to focus on specific types of clinical data (e.g., conditions, procedures, drugs).
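Filtering by OMOP table amounts to a membership test on each event's table name. A minimal base-R sketch, assuming the ehrshot table has a column named omop_table (the argument description suggests this) plus hypothetical patient_id and code columns; filter_by_omop() is an illustrative helper, not part of RHealth.

```r
# Toy event table. Column names and placeholder codes are hypothetical
# stand-ins for the ehrshot table.
events <- data.frame(
  patient_id = c(1, 1, 1, 2),
  code = c("COND_A", "DRUG_B", "LAB_C", "COND_D"),
  omop_table = c("condition_occurrence", "drug_exposure", "measurement",
                 "condition_occurrence"),
  stringsAsFactors = FALSE
)

# Keep only events whose omop_table is listed; NULL means no filtering.
filter_by_omop <- function(events, omop_tables = NULL) {
  if (is.null(omop_tables)) {
    return(events)
  }
  events[events$omop_table %in% omop_tables, , drop = FALSE]
}

kept <- filter_by_omop(events, c("condition_occurrence", "drug_exposure"))
kept$code  # "COND_A" "DRUG_B" "COND_D"
```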
Data Split
The task automatically assigns samples to train/validation/test splits based on the splits table in the EHRShot dataset.
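If the splits table maps each patient to a split, assigning samples reduces to a lookup on the patient identifier, so all samples from one patient land in the same split. The sketch below assumes hypothetical patient_id and split columns and split labels "train", "val", "test"; the actual column names and values in the EHRShot splits table may differ.

```r
# Hypothetical splits table: one row per patient with its assigned split.
splits <- data.frame(
  patient_id = c(101, 102, 103),
  split = c("train", "val", "test"),
  stringsAsFactors = FALSE
)

# Hypothetical samples; one patient can contribute several samples.
samples <- data.frame(
  patient_id = c(101, 101, 103, 104),
  label = c(0, 1, 1, 0)
)

# Look up each sample's split by patient_id. Patients absent from the
# splits table get NA so they can be dropped or flagged explicitly.
samples$split <- splits$split[match(samples$patient_id, splits$patient_id)]
samples
```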
Super class
RHealth::BaseTask -> BenchmarkEHRShot
Public fields
task: Name of the specific benchmark task.
omop_tables: Optional vector of OMOP table names used to filter events.
max_seq_length: Maximum sequence length for codes.
truncation_count: Counter for truncated sequences.
task_name: Full task name (BenchmarkEHRShot/task).
input_schema: Input schema specification.
output_schema: Output schema specification.
tasks_by_category: List of available tasks organized by category.
Methods
Method new()
Initialize the BenchmarkEHRShot task.
Usage
BenchmarkEHRShot$new(task, omop_tables = NULL, max_seq_length = 2000)
Arguments
task: Character. The specific benchmark task to run. Must be one of:
Operational outcomes: "guo_los", "guo_readmission", "guo_icu"
Lab values: "lab_thrombocytopenia", "lab_hyperkalemia", "lab_hypoglycemia", "lab_hyponatremia", "lab_anemia"
New diagnoses: "new_hypertension", "new_hyperlipidemia", "new_pancan", "new_celiac", "new_lupus", "new_acutemi"
Medical imaging: "chexpert"
omop_tables: Optional character vector of OMOP table names used to filter input events. If specified, only events from the ehrshot table whose omop_table value matches one of these names are included as features. Common values include "condition_occurrence", "procedure_occurrence", "drug_exposure", "measurement", and "observation".
max_seq_length: Integer. Maximum sequence length for clinical codes. Longer sequences are truncated to keep the most recent codes. Default is 2000. Set to NULL for no limit (not recommended for large datasets).
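Keeping "the most recent codes" means keeping the tail of a time-ordered sequence. The base-R sketch below pairs that rule with a counter, loosely mirroring the max_seq_length argument and the truncation_count field; make_truncator() is a hypothetical helper, and it assumes codes arrive sorted from oldest to newest.

```r
# Factory returning a truncation function plus a counter (hypothetical
# structure, loosely mirroring max_seq_length and truncation_count).
make_truncator <- function(max_seq_length = 2000) {
  truncation_count <- 0L
  truncate_codes <- function(codes) {
    # NULL disables truncation; short sequences pass through unchanged.
    if (is.null(max_seq_length) || length(codes) <= max_seq_length) {
      return(codes)
    }
    truncation_count <<- truncation_count + 1L
    # Assumes codes are ordered oldest to newest, so the tail is most recent.
    utils::tail(codes, max_seq_length)
  }
  list(truncate_codes = truncate_codes, count = function() truncation_count)
}

truncator <- make_truncator(max_seq_length = 3)
truncator$truncate_codes(paste0("CODE_", 1:5))   # "CODE_3" "CODE_4" "CODE_5"
truncator$truncate_codes(c("CODE_A", "CODE_B"))  # unchanged, not counted
truncator$count()                                # 1
```

Keeping the counter in a closure rather than a global variable lets several task configurations track their truncations independently.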
Examples
if (FALSE) { # \dontrun{
# Basic task initialization
task <- BenchmarkEHRShot$new(task = "guo_los")
# With OMOP table filtering
task <- BenchmarkEHRShot$new(
task = "new_hypertension",
omop_tables = c("condition_occurrence", "drug_exposure")
)
# With custom max sequence length
task <- BenchmarkEHRShot$new(task = "guo_los", max_seq_length = 5000)
} # }
Method pre_filter()
Pre-filter hook to retain only necessary columns and optionally filter by OMOP tables.
Method call()
Process a single patient to generate samples.
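One plausible shape for per-patient processing is sketched below in base R. It assumes that each label row carries a prediction time and that only codes recorded at or before that time become features, which keeps future information out of the inputs. The column names and the build_samples() helper are hypothetical and do not describe RHealth's internals.

```r
# Hypothetical events for one patient (dates kept simple).
patient_events <- data.frame(
  time = as.Date(c("2020-01-05", "2020-03-10", "2020-06-01", "2020-09-15")),
  code = c("COND_A", "DRUG_B", "LAB_C", "COND_D"),
  stringsAsFactors = FALSE
)

# Hypothetical label rows: one prediction time and label value per sample.
patient_labels <- data.frame(
  prediction_time = as.Date(c("2020-04-01", "2020-10-01")),
  value = c(0L, 1L)
)

# Build one sample per label row from codes at or before the prediction
# time, ordered oldest to newest (the cutoff rule is an assumption).
build_samples <- function(events, labels, patient_id, split) {
  events <- events[order(events$time), , drop = FALSE]
  lapply(seq_len(nrow(labels)), function(i) {
    cutoff <- labels$prediction_time[i]
    list(
      patient_id = patient_id,
      codes = events$code[events$time <= cutoff],
      label = labels$value[i],
      split = split
    )
  })
}

samples <- build_samples(patient_events, patient_labels,
                         patient_id = 101, split = "train")
samples[[1]]$codes  # "COND_A" "DRUG_B"
```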
Examples
if (FALSE) { # \dontrun{
library(RHealth)
# Example 1: Binary classification task (operational outcome)
dataset <- EHRShotDataset$new(
root = "/path/to/ehrshot",
tables = c("ehrshot", "splits", "guo_los"),
dev = TRUE
)
task <- BenchmarkEHRShot$new(task = "guo_los")
samples <- dataset$set_task(task = task)
# Example 2: Multiclass classification task (lab values)
dataset <- EHRShotDataset$new(
root = "/path/to/ehrshot",
tables = c("ehrshot", "splits", "lab_thrombocytopenia"),
dev = TRUE
)
task <- BenchmarkEHRShot$new(task = "lab_thrombocytopenia")
samples <- dataset$set_task(task = task)
# Example 3: Multilabel classification task (medical imaging)
dataset <- EHRShotDataset$new(
root = "/path/to/ehrshot",
tables = c("ehrshot", "splits", "chexpert"),
dev = FALSE
)
task <- BenchmarkEHRShot$new(task = "chexpert")
samples <- dataset$set_task(task = task)
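# Example 4: Filter by specific OMOP tables
# Load the label table for this task, as in the examples above
dataset <- EHRShotDataset$new(
root = "/path/to/ehrshot",
tables = c("ehrshot", "splits", "new_hypertension"),
dev = TRUE
)
task <- BenchmarkEHRShot$new(
task = "new_hypertension",
omop_tables = c("condition_occurrence", "drug_exposure")
)
samples <- dataset$set_task(task = task)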
} # }