BaseDataset — R6 infrastructure for clinical event datasets
Source: R/Dataset_BaseDataset.R
Details
The BaseDataset class mirrors rhealth's BaseDataset, providing a
YAML-driven loader that converts multi-table electronic health
records into a single event table. It supports:
URL or local-file ingestion (with automatic .csv/.csv.gz fallback).
Per-table joins as declared in the config.
Flexible timestamp parsing (single or multi-column).
A dev mode that caps the number of patients for rapid prototyping.
Multi-threaded sample generation with progress bars.
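The single or multi-column timestamp parsing listed above can be illustrated with a minimal base R sketch. The helper `parse_event_timestamp()`, the column names, and the format strings are all assumptions made for illustration; the class itself works on polars frames, and its actual parsing code is not shown here.

```r
# Hypothetical helper: build one POSIXct timestamp from one or more columns.
# Column names and format strings are illustrative assumptions.
parse_event_timestamp <- function(df, cols, format = "%Y-%m-%d %H:%M:%S") {
  # Paste the columns row-wise; a single column passes through unchanged.
  raw <- do.call(paste, c(unname(as.list(df[cols])), sep = " "))
  as.POSIXct(raw, format = format, tz = "UTC")
}

admissions <- data.frame(
  admit_date = c("2020-01-05", "2020-02-11"),
  admit_time = c("08:30:00", "23:15:00"),
  stringsAsFactors = FALSE
)

# Multi-column case: date and time live in separate columns.
ts_multi <- parse_event_timestamp(admissions, c("admit_date", "admit_time"))

# Single-column case: only a date is available.
ts_single <- parse_event_timestamp(admissions, "admit_date", format = "%Y-%m-%d")
```

Values that do not match the format come back as NA rather than raising an error, so a real loader would likely want to check for them.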
Down‑stream, it cooperates with BaseTask (task definition),
Patient (per‑subject wrapper), and SampleDataset (collection of
input/output pairs).
Dependencies
Polars is used via the polars R package. Parallelism and progress
reporting require future, future.apply, and progressr.
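Before constructing a dataset, it can help to confirm that the packages named above are installed. The following is a small sketch using only base R; it reports missing packages and installs nothing.

```r
# Packages named in the Dependencies section.
required <- c("polars", "future", "future.apply", "progressr")

# requireNamespace() returns FALSE instead of erroring when a package is absent.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```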
Public fields
root: Root directory (or URL prefix) for data files.
tables: Character vector of table names to ingest.
dataset_name: Human-readable dataset label.
config: Parsed YAML configuration list.
dev: Logical flag. When TRUE, loading is limited to 1000 patients.
global_event_df: A polars LazyFrame with all events combined.
.collected_global_event_df: Polars DataFrame storing all global events.
.unique_patient_ids: Character vector of unique patient IDs.
Methods
Method new()
Instantiate a BaseDataset.
Usage
BaseDataset$new(
  root,
  tables,
  dataset_name = NULL,
  config_path = NULL,
  dev = FALSE
)
Arguments
root: Character. Root directory / URL prefix where CSV files live.
tables: Character vector of table keys defined in the config.
dataset_name: Optional custom name; defaults to the R6 class name.
config_path: Path to YAML or schema describing each table.
dev: Logical. If TRUE, limits to 1000 patients for speed.
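A call to the constructor might look like the sketch below. The root directory, table keys, and config path are placeholders rather than values shipped with the package, and the call is guarded so that the snippet does nothing until the class is loaded and the config file exists.

```r
# Hypothetical inputs: the paths and table keys below are placeholders.
ctor_args <- list(
  root = "data/my_ehr",                   # directory or URL prefix with the CSV files
  tables = c("admissions", "diagnoses"),  # keys that must exist in the config
  config_path = "configs/my_ehr.yaml",    # YAML file describing each table
  dev = TRUE                              # cap the patient count while prototyping
)

# Construct the object only when the class is available and the
# placeholder config path has been replaced with a real file.
if (exists("BaseDataset") && file.exists(ctor_args$config_path)) {
  ds <- do.call(BaseDataset$new, ctor_args)
  print(ds$dataset_name)
}
```

Leaving `dataset_name` out relies on the documented default, which is the R6 class name.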
Method collected_global_event_df()
Materialise (collect) the lazy event dataframe. In dev mode, only the first 1000 patients are kept.
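The dev-mode cap can be pictured with a small base R sketch on a plain data.frame. The helper `limit_patients()` and the `patient_id` column name are assumptions, as is the choice to define "first" by order of appearance; the method itself operates on a polars frame.

```r
# Hypothetical helper: keep only events from the first `n` distinct patients,
# where "first" means order of first appearance (an assumption).
limit_patients <- function(events, n = 1000L, id_col = "patient_id") {
  keep <- head(unique(events[[id_col]]), n)
  events[events[[id_col]] %in% keep, , drop = FALSE]
}

events <- data.frame(
  patient_id = c("p1", "p1", "p2", "p3", "p2"),
  event_type = c("admissions", "diagnoses", "admissions", "admissions", "diagnoses"),
  stringsAsFactors = FALSE
)

# Cap at two patients: every row for p1 and p2 is kept, p3 is dropped.
dev_events <- limit_patients(events, n = 2L)
```

The cap applies to patients, not rows, so each retained patient keeps a complete event history.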
Method load_table()
Load one table, apply joins, lowercase columns, and standardise to the event schema.
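The lowercasing and standardising steps can be sketched on a plain data.frame as follows. The helper `standardise_events()`, the target columns (`patient_id`, `event_type`, `timestamp`), and the `<table>/<attribute>` prefix are assumptions made for illustration, not the confirmed event schema; joins are left out.

```r
# Hypothetical helper mirroring the steps named above on a plain data.frame.
# Target column names and the "<table>/<attribute>" prefix are assumptions.
standardise_events <- function(df, table_name, patient_col, time_col) {
  names(df) <- tolower(names(df))  # lowercase all column names
  patient_col <- tolower(patient_col)
  time_col <- tolower(time_col)
  attrs <- setdiff(names(df), c(patient_col, time_col))
  out <- data.frame(
    patient_id = as.character(df[[patient_col]]),
    event_type = rep(table_name, nrow(df)),
    timestamp = as.POSIXct(df[[time_col]], tz = "UTC"),
    stringsAsFactors = FALSE
  )
  # Carry the remaining columns along, prefixed with the table name.
  for (a in attrs) out[[paste0(table_name, "/", a)]] <- df[[a]]
  out
}

# Illustrative raw table with upper-case column names.
raw <- data.frame(
  SUBJECT_ID = c(1, 2),
  ADMITTIME = c("2020-01-05 08:30:00", "2020-02-11 23:15:00"),
  ADMISSION_TYPE = c("EMERGENCY", "ELECTIVE"),
  stringsAsFactors = FALSE
)

events <- standardise_events(raw, "admissions", "SUBJECT_ID", "ADMITTIME")
```

Prefixing attribute columns with the table name keeps columns from different tables distinct once the tables are stacked into one event table.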
Method set_task()
Apply a BaseTask to build a SampleDataset.
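Conceptually, applying a task means running a function over each patient's events and collecting the results. The sketch below shows that pattern with a hypothetical `task_fn()` standing in for a BaseTask. It uses `future.apply::future_lapply()` when that package is installed and falls back to `lapply()` otherwise; progress reporting is omitted, and the method's actual internals may differ.

```r
# Hypothetical stand-in for a task: turn one patient's events into one sample.
task_fn <- function(patient_events) {
  list(
    patient_id = patient_events$patient_id[1],
    n_events = nrow(patient_events)
  )
}

events <- data.frame(
  patient_id = c("p1", "p1", "p2"),
  event_type = c("admissions", "diagnoses", "admissions"),
  stringsAsFactors = FALSE
)

# One data.frame per patient, named by patient ID.
per_patient <- split(events, events$patient_id)

# Use future_lapply() when available; it honours whatever future::plan()
# the caller has set. Otherwise fall back to sequential lapply().
apply_fn <- if (requireNamespace("future.apply", quietly = TRUE)) {
  future.apply::future_lapply
} else {
  lapply
}

samples <- apply_fn(per_patient, task_fn)
```

Because each patient is processed independently, the work can be spread across workers without changing the result.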