Processor for irregular time series data with missing values. Supports uniform resampling, two imputation strategies (forward-fill and zero-fill), and automatic z-score normalization using training data statistics.
Details
The processor performs three main steps:
Resampling: Converts the irregular time series to a uniform time grid based on sampling_rate
Imputation: Fills missing values using either the forward-fill or the zero-fill strategy
Normalization (default): Applies z-score normalization: (x - mean) / std
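The first two steps can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: `resample_to_grid()` and `impute()` are hypothetical helper names, grid slots are assigned by flooring the elapsed time, and leading gaps with no earlier observation fall back to zero under forward-fill.

```r
# Hypothetical helper: place irregular observations on a uniform grid.
# `timestamps` is a POSIXct vector, `values` a matrix (rows = observations,
# columns = features), `step_secs` the grid spacing in seconds.
resample_to_grid <- function(timestamps, values, step_secs) {
  offsets <- as.numeric(difftime(timestamps, timestamps[1], units = "secs"))
  idx <- floor(offsets / step_secs) + 1   # grid slot of each observation
  grid <- matrix(NA_real_, nrow = max(idx), ncol = ncol(values))
  grid[idx, ] <- values                   # unobserved slots stay NA
  grid
}

# Hypothetical helper: fill NA entries with the chosen strategy.
impute <- function(grid, strategy = c("forward_fill", "zero")) {
  strategy <- match.arg(strategy)
  if (strategy == "forward_fill") {
    for (j in seq_len(ncol(grid))) {
      for (i in seq_len(nrow(grid))[-1]) {
        # carry the previous row's value forward into a gap
        if (is.na(grid[i, j])) grid[i, j] <- grid[i - 1, j]
      }
    }
  }
  # Zero-fill, and (assumption) fallback for leading gaps under forward-fill
  grid[is.na(grid)] <- 0
  grid
}

ts <- as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 02:00:00"), tz = "UTC")
vals <- matrix(c(100, 50, 150, 60), ncol = 2)

grid <- resample_to_grid(ts, vals, step_secs = 3600)  # 3 hourly slots, slot 2 is NA
filled_ff <- impute(grid, "forward_fill")             # slot 2 copies slot 1
filled_zero <- impute(grid, "zero")                   # slot 2 becomes 0
```

With an hourly grid, the two observations two hours apart leave one empty slot in between, which is where the two imputation strategies differ.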
Normalization is enabled by default (normalize = TRUE):
Call fit() on training samples to compute the feature-wise mean and standard deviation
The same statistics are used for all subsequent process() calls (train/val/test)
This ensures no data leakage between training and validation/test sets
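The fit-then-apply pattern described above can be illustrated with a small base-R sketch. The helpers `fit_stats()` and `apply_stats()` are hypothetical and only mirror the idea. Two details are assumptions rather than documented behavior of the processor: the use of the sample standard deviation (`sd()`), and the guard against zero-variance features.

```r
# Hypothetical helper: compute feature-wise statistics from training data only.
fit_stats <- function(train_list) {
  stacked <- do.call(rbind, train_list)  # rows = all training time steps
  stds <- apply(stacked, 2, sd)          # assumption: sample standard deviation
  stds[stds == 0] <- 1                   # assumption: avoid division by zero
  list(means = colMeans(stacked), stds = stds)
}

# Hypothetical helper: z-score a matrix with previously fitted statistics.
apply_stats <- function(x, stats) {
  centered <- sweep(x, 2, stats$means, "-")
  sweep(centered, 2, stats$stds, "/")
}

train <- list(
  matrix(c(100, 50, 150, 60), ncol = 2),
  matrix(c(120, 55, 180, 65), ncol = 2)
)
test <- matrix(c(110, 58), ncol = 2)     # one held-out time step, two features

stats <- fit_stats(train)                # statistics come from training data only
train_z <- lapply(train, apply_stats, stats = stats)
test_z <- apply_stats(test, stats)       # reuses the training statistics
```

Because the held-out matrix never contributes to `stats`, nothing about the validation or test distribution influences how the training data is scaled.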
Super classes
RHealth::Processor -> RHealth::FeatureProcessor -> TimeseriesProcessor
Public fields
sampling_rate: A lubridate duration indicating the sampling step size.
impute_strategy: A character string: 'forward_fill' or 'zero'.
normalize: Logical flag indicating whether to apply z-score normalization. Default TRUE.
feature_means: Numeric vector of feature means (computed during fit).
feature_stds: Numeric vector of feature standard deviations (computed during fit).
.size: Number of features (set on first call to process()).
Methods
Method new()
Initialize the processor with a sampling rate, imputation strategy, and normalization option.
Usage
TimeseriesProcessor$new(
  sampling_rate = lubridate::dhours(1),
  impute_strategy = "forward_fill",
  normalize = TRUE
)
Method process()
Process an irregular time series into a uniformly sampled tensor. Step 1: build a uniform grid of time points and place the observed values at their corresponding positions. Step 2: impute missing entries using the selected strategy. Step 3 (optional): apply z-score normalization using the training statistics.
Examples
if (FALSE) { # \dontrun{
library(torch)
library(lubridate)

# Create training samples with timeseries data
train_samples <- list(
  list(
    patient_id = 1,
    labs = list(
      timestamps = as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 02:00:00"), tz = "UTC"),
      values = matrix(c(100, 50, 150, 60), ncol = 2)
    )
  ),
  list(
    patient_id = 2,
    labs = list(
      timestamps = as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 01:00:00"), tz = "UTC"),
      values = matrix(c(120, 55, 180, 65), ncol = 2)
    )
  )
)

# Example 1: Default behavior (with normalization)
processor <- TimeseriesProcessor$new(
  sampling_rate = dhours(1),
  impute_strategy = "forward_fill"
  # normalize = TRUE by default
)

# Fit on training data to compute statistics
processor$fit(train_samples, "labs")

# Process samples (applies normalization)
result <- processor$process(train_samples[[1]]$labs)

# Check normalization statistics
print(processor$feature_means) # Feature means from training data
print(processor$feature_stds)  # Feature standard deviations

# Example 2: Disable normalization if needed
processor_no_norm <- TimeseriesProcessor$new(
  sampling_rate = dhours(1),
  impute_strategy = "forward_fill",
  normalize = FALSE # Explicitly disable normalization
)
result_no_norm <- processor_no_norm$process(train_samples[[1]]$labs)
} # }