Output Survey

The survey framework computes quality metrics over a BabelBetes output directory and generates an HTML report. It is built around surveys: immutable Parquet files that capture patient and study metric aggregates at a point in time. Surveys make it possible to describe and compare study outputs, and to check for intended or unintended changes in the output, for example after changes to the code base.

Usage

Compute and report

# Compute all metrics and save a survey
python -m babelbetes.survey survey --data-dir data/out

# Generate an HTML report from the latest survey
python -m babelbetes.survey report

# Include circadian and gap/chunk figures (requires raw time series data)
python -m babelbetes.survey report --data-dir data/out

Why --data-dir is optional for report

Most figures are generated from the pre-computed survey files. --data-dir is only needed for figures that cannot be computed from aggregated stats: circadian patterns (moving average by hour of day) and gap/chunk CDFs — both require access to individual timestamps.

Surveys are saved to data/out/survey/surveys/ and the HTML report is written to data/out/survey/reports/report_<timestamp>.html.

Track changes with `diff`

# Compare the two most recent surveys — flags changes > 5%
python -m babelbetes.survey diff

# Or compare two specific surveys
python -m babelbetes.survey diff \
    --a data/out/survey/surveys/20250101_000000_study_stats.parquet \
    --b data/out/survey/surveys/20250201_000000_study_stats.parquet

The diff covers all study-level metrics, including the per-patient aggregates (e.g. tir_study_gm — geometric mean of individual patient TIR values across a study). Metrics that changed by more than 5% are flagged with ⚠️.

Note

The diff currently operates on study-level surveys only. Patient-level diffing is not yet implemented.

Explore a single study or patient

For targeted exploration without saving a survey, call the compute functions directly:

from babelbetes.src import data_store
from babelbetes.survey import compute
import pandas as pd

# Scope to one study, or filter further to one patient
store = data_store.load("data/out", studies=["DCLP3"])
store = {dt: df[df["patient_id"] == "P001"] for dt, df in store.items()}

patient_records = (
    compute.compute_cgm_stats(store["cgm"])
    + compute.compute_basal_stats(store["basal"])
    + compute.compute_bolus_stats(store["bolus"])
    + compute.compute_complete_days(store)
)
tdd_df      = compute.compute_tdd_per_patient(store)
tdd_records = compute.compute_tdd_stats(tdd_df)

stats = pd.DataFrame(patient_records + tdd_records)
print(stats[stats["data_type"] == "cgm"].pivot_table(
    index="patient_id", columns="metric", values="value"
))

Architecture

The framework is split into four modules:

Module	Responsibility
`compute`	Pure functions that take DataFrames and return metrics as `list[dict]`
`survey`	Save and load metric surveys as Parquet files
`figures`	Matplotlib/seaborn figures that accept pre-computed metric DataFrames
`diff`	Compare two study-level surveys and flag significant changes

Two further pieces sit on top of these modules: __main__.py wires them together for the CLI, and the report module renders surveys into a self-contained HTML file.

Metrics are stored in long format — one row per (study, [patient_id,] data_type, metric, value) — so any metric can be filtered, pivoted, or plotted without schema changes.
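As an illustration of why the long format is convenient, the sketch below filters and pivots a handful of rows with pandas. The study name, patient IDs, and values are made up for the example; only the column names come from the description above.

```python
import pandas as pd

# Hypothetical rows in the long format described above (all values are made up).
records = [
    {"study": "StudyA", "patient_id": "P001", "data_type": "cgm", "metric": "tir", "value": 0.72},
    {"study": "StudyA", "patient_id": "P001", "data_type": "cgm", "metric": "tbr", "value": 0.03},
    {"study": "StudyA", "patient_id": "P002", "data_type": "cgm", "metric": "tir", "value": 0.65},
    {"study": "StudyA", "patient_id": "P002", "data_type": "cgm", "metric": "tbr", "value": 0.05},
    {"study": "StudyA", "patient_id": "P001", "data_type": "basal", "metric": "row_count", "value": 288.0},
]
stats = pd.DataFrame(records)

# Filter: one metric across all patients.
tir = stats[(stats["data_type"] == "cgm") & (stats["metric"] == "tir")]

# Pivot: one row per patient, one column per CGM metric.
cgm_wide = stats[stats["data_type"] == "cgm"].pivot_table(
    index=["study", "patient_id"], columns="metric", values="value"
)
print(cgm_wide)
```

Adding a new metric only adds rows, so neither the filter nor the pivot needs to change.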

Survey files

Each survey run produces four Parquet files, all stamped with the same YYYYMMDD_HHMMSS timestamp:

File	Contents
`<ts>_study_stats.parquet`	Study-level aggregates: columns `study`, `data_type`, `metric`, `value`, `survey_id`
`<ts>_patient_stats.parquet`	Per-patient metrics: columns `study`, `patient_id`, `data_type`, `metric`, `value`, `survey_id`
`<ts>_tdd.parquet`	Daily TDD per patient: columns `study`, `patient_id`, `date`, `basal`, `bolus`, `total`, `survey_id`
`<ts>_cdf_quantiles.parquet`	Pre-computed CDF quantiles: columns `study`, `data_type`, `quantile_level`, `value`, `survey_id`
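The naming scheme can be sketched as follows. `survey_paths` is a hypothetical helper written for this example, not part of the package, and it assumes the timestamp is taken from local time.

```python
from datetime import datetime
from pathlib import Path

def survey_paths(survey_dir, survey_id=None):
    """Return the four file paths of one survey run (illustrative helper)."""
    # Assumption: the ID is a local-time timestamp in YYYYMMDD_HHMMSS form.
    if survey_id is None:
        survey_id = datetime.now().strftime("%Y%m%d_%H%M%S")
    kinds = ["study_stats", "patient_stats", "tdd", "cdf_quantiles"]
    return {kind: Path(survey_dir) / f"{survey_id}_{kind}.parquet" for kind in kinds}

paths = survey_paths("data/out/survey/surveys", survey_id="20250101_000000")
for kind, path in paths.items():
    print(kind, path.name)
```

Because all four files share one timestamp, the files belonging to a single run can be matched by their prefix.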

Metrics reference

Per-patient metrics (in patient_stats) — present for each of cgm, basal, bolus:

Metric	Description
`row_count`	Total rows including NaN
`nan_count`	Rows with a missing value
`duplicate_count`	Rows with a duplicated timestamp
`min`, `max`	Range of non-NaN values
`gm`, `gs`	Geometric mean and std of non-NaN values
`patient_days`	Number of unique calendar days with at least one measurement
`data_fraction_days`	`patient_days / calendar_span` (0–1)
`missing_days`	Calendar days in the span with no data
`samples_per_day`	`row_count / calendar_span`
`data_fraction`	Fraction of the total timespan covered by chunks
`chunk_count`	Number of continuous data segments
`gm_chunk_dur_hrs`, `gs_chunk_dur_hrs`	Geometric mean/std of chunk durations (hours)
`gm_gap_dur_hrs`, `gs_gap_dur_hrs`	Geometric mean/std of gap durations (hours)

CGM-specific additions:

Metric	Description
`tir`	Time-in-range fraction (70–180 mg/dL)
`tar`	Time-above-range fraction (> 180 mg/dL)
`tbr`	Time-below-range fraction (< 70 mg/dL)
`outlier_low`	Count of readings < 40 mg/dL
`outlier_high`	Count of readings > 400 mg/dL
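A minimal sketch of these CGM metrics, assuming they are computed as fractions of non-NaN readings (rather than time-weighted) and that both 70 and 180 mg/dL count as in range. `cgm_range_metrics` is a hypothetical name, not the package function.

```python
import numpy as np
import pandas as pd

def cgm_range_metrics(cgm):
    """Return range fractions and outlier counts for a CGM series in mg/dL."""
    values = cgm.dropna()
    return {
        # Assumption: boundaries 70 and 180 are inclusive for time in range.
        "tir": float(values.between(70, 180).mean()),
        "tar": float((values > 180).mean()),
        "tbr": float((values < 70).mean()),
        "outlier_low": int((values < 40).sum()),
        "outlier_high": int((values > 400).sum()),
    }

# Made-up readings; the NaN is ignored.
readings = pd.Series([35.0, 65.0, 70.0, 120.0, 180.0, 181.0, 250.0, 410.0, np.nan, 100.0])
metrics = cgm_range_metrics(readings)
print(metrics)
```

With these boundaries the three fractions partition the non-NaN readings, so they sum to 1.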

Gap thresholds used to split continuous chunks: CGM = 30 min, basal = 6 hr, bolus = 16 hr.
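One way to split a series of timestamps into chunks is sketched below. It assumes a gap is any step strictly larger than the threshold and that a chunk's duration runs from its first to its last timestamp; the actual implementation may differ on both points. `split_chunks` is a hypothetical helper.

```python
import pandas as pd

def split_chunks(timestamps, max_gap):
    """Return (chunk durations, gap durations) in hours for a list of timestamps."""
    ts = pd.Series(pd.to_datetime(timestamps)).sort_values().reset_index(drop=True)
    deltas = ts.diff()
    # A new chunk starts wherever the step from the previous sample exceeds max_gap.
    is_break = deltas > max_gap
    chunk_id = is_break.cumsum()
    spans = ts.groupby(chunk_id).agg(["min", "max"])
    chunk_hrs = (spans["max"] - spans["min"]).dt.total_seconds() / 3600
    gap_hrs = deltas[is_break].dt.total_seconds() / 3600
    return chunk_hrs.tolist(), gap_hrs.tolist()

# Made-up CGM timestamps with one 50-minute gap, split at the 30-minute CGM threshold.
cgm_times = [
    "2025-01-01 00:00", "2025-01-01 00:05", "2025-01-01 00:10",
    "2025-01-01 01:00", "2025-01-01 01:05",
]
chunks, gaps = split_chunks(cgm_times, max_gap=pd.Timedelta(minutes=30))
print(len(chunks), chunks, gaps)
```

The same function applies to basal and bolus data by passing `pd.Timedelta(hours=6)` or `pd.Timedelta(hours=16)`.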

TDD metrics (in patient_stats, data_type="tdd"):

basal_gm, basal_gs, basal_min, basal_max, basal_nan_count, bolus_gm, bolus_gs, bolus_min, bolus_max, bolus_nan_count, tdd_gm, tdd_gs, tdd_min, tdd_max, tdd_nan_count, bolus_basal_ratio.

Study-level metrics (in study_stats):

Every per-patient metric is aggregated across patients as a geometric mean (_study_gm) and geometric std (_study_gs). Additionally:

Metric	Description
`patient_count`	Number of patients per `(study, data_type)`
`patient_days`	Sum of per-patient `patient_days`
`complete_days`	Sum of per-patient days on which CGM, bolus, and basal are all present
`age_min`, `age_max`, `age_gm`, `age_std`	Age statistics (when age data is available)

NaN in study-level stats

_study_gs is undefined for studies with a single patient (std is undefined for n=1). _study_gm is undefined for metrics that are zero or negative for all patients (e.g. duplicate_count), because the geometric mean requires positive values. In both cases the value would be NaN, so the row is absent from the output.
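Both cases can be reproduced with a small sketch. It assumes the geometric statistics are computed on the logarithms of the positive values and that the std uses the sample estimator (ddof=1); `geometric_stats` is a hypothetical helper, not the package function.

```python
import numpy as np
import pandas as pd

def geometric_stats(values):
    """Return (geometric mean, geometric std) of the positive, non-NaN values."""
    x = pd.Series(values, dtype="float64").dropna()
    x = x[x > 0]  # the logarithm is only defined for positive values
    if x.empty:
        return np.nan, np.nan
    logs = np.log(x)
    gm = float(np.exp(logs.mean()))
    # Sample std (ddof=1) of the logs; pandas returns NaN for a single value.
    gs = float(np.exp(logs.std(ddof=1)))
    return gm, gs

gm_three, gs_three = geometric_stats([1.0, 10.0, 100.0])  # several patients
gm_one, gs_one = geometric_stats([5.0])                   # single patient
gm_zero, gs_zero = geometric_stats([0.0, 0.0])            # e.g. duplicate_count
print(gm_three, gs_three, gm_one, gs_one, gm_zero, gs_zero)
```

The single-patient call yields a valid mean but a NaN std, and the all-zero call yields NaN for both.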

API Reference

`babelbetes.survey.compute`

`aggregate_study_stats(patient_stats_df)`

Aggregate per-patient stats to study level using vectorised groupby.

Every per-patient metric is aggregated to geometric mean and std across patients, producing two output metrics suffixed _study_gm / _study_gs. For example, row_count becomes row_count_study_gm and row_count_study_gs — it is not summed, so it does not give the total row count for the study. To get study-level totals, use the explicitly summed metrics below.

Explicitly summed (appear without a suffix):

patient_days: total patient-days across all patients per data_type
complete_days: total complete patient-days (CGM + bolus + basal), summed across all patients

Always present (never NaN):

patient_count: number of patients in each (study, data_type) group

Dropped from the output because the value would be NaN:

_study_gs for any metric where a study has only one patient (std undefined)
_study_gm for any metric that is zero or negative for all patients (geometric mean requires positive values; e.g. duplicate_count_study_gm is absent when no patient has any duplicates)

Parameters:

Name	Type	Description	Default
`patient_stats_df`	`DataFrame`	DataFrame with columns `study`, `patient_id`, `data_type`, `metric`, `value`.	required

Returns:

Type	Description
`list[dict]`	List of dicts with keys `study`, `data_type`, `metric`, `value` (no `patient_id` key). NaN values are dropped.
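The suffixing and NaN-dropping behaviour can be sketched with a vectorised groupby. This is an illustration under assumptions (log-based statistics, sample std, non-positive values excluded), not the actual implementation, and it leaves out the summed metrics and patient_count. `aggregate_sketch` and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

def aggregate_sketch(patient_stats_df):
    """Aggregate long-format per-patient rows to study-level gm/gs records."""
    keys = ["study", "data_type", "metric"]
    df = patient_stats_df[patient_stats_df["value"] > 0].copy()
    df["log_value"] = np.log(df["value"])
    grouped = df.groupby(keys)["log_value"]
    wide = pd.DataFrame({
        "_study_gm": np.exp(grouped.mean()),
        "_study_gs": np.exp(grouped.std(ddof=1)),  # NaN for single-patient groups
    }).reset_index()
    long = wide.melt(id_vars=keys, var_name="suffix", value_name="value")
    long = long.dropna(subset=["value"])
    long["metric"] = long["metric"] + long["suffix"]
    return long.drop(columns="suffix").to_dict("records")

def row(study, patient_id, metric, value):
    return {"study": study, "patient_id": patient_id, "data_type": "cgm",
            "metric": metric, "value": value}

patient_stats = pd.DataFrame([
    row("StudyA", "P001", "row_count", 100.0),
    row("StudyA", "P002", "row_count", 400.0),
    row("StudyA", "P001", "duplicate_count", 0.0),
    row("StudyA", "P002", "duplicate_count", 0.0),
    row("StudyB", "P101", "row_count", 50.0),
])
study_records = aggregate_sketch(patient_stats)
for record in study_records:
    print(record)
```

In this example StudyA yields both row_count_study_gm and row_count_study_gs, StudyB (one patient) yields only row_count_study_gm, and no duplicate_count metric appears at all.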

`compute_age_stats(store)`

Study-level age statistics: min, max, geometric mean, and std.

Parameters:

Name	Type	Description	Default
`store`	`dict[str, DataFrame]`	Dict mapping data type to DataFrame. Uses the `age` key. The DataFrame must have columns `study_name` and `age`.	required

Returns:

Type	Description
`list[dict]`	List of dicts with keys `study`, `data_type`=`age`, `metric`, `value`. Metrics: `age_min`, `age_max`, `age_gm`, `age_std`.

`compute_basal_stats(df)`

Per-patient basal stats: value metrics and temporal coverage.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Basal DataFrame. Required columns: `study_name`, `patient_id`, `datetime`, `basal_rate`.	required

Returns:

Type	Description
`list[dict]`	Long-format records with columns `study`, `patient_id`, `data_type`=`basal`, `metric`, `value`. Metrics: `row_count`, `nan_count`, `duplicate_count`, `min`, `max`, `gm`, `gs`, `patient_days`, `data_fraction_days`, `missing_days`, `samples_per_day`, `data_fraction`, `chunk_count`, `gm_chunk_dur_hrs`, `gs_chunk_dur_hrs`, `gm_gap_dur_hrs`, `gs_gap_dur_hrs`.

`compute_bolus_stats(df)`

Per-patient bolus stats: value metrics and temporal coverage.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Bolus DataFrame. Required columns: `study_name`, `patient_id`, `datetime`, `bolus`.	required

Returns:

Type	Description
`list[dict]`	Long-format records with columns `study`, `patient_id`, `data_type`=`bolus`, `metric`, `value`. Metrics: `row_count`, `nan_count`, `duplicate_count`, `min`, `max`, `gm`, `gs`, `patient_days`, `data_fraction_days`, `missing_days`, `samples_per_day`, `data_fraction`, `chunk_count`, `gm_chunk_dur_hrs`, `gs_chunk_dur_hrs`, `gm_gap_dur_hrs`, `gs_gap_dur_hrs`.

`compute_cdf_quantiles(store, verbose=False)`

Pre-compute CDF quantiles per study for CGM, bolus, and basal.

Parameters:

Name	Type	Description	Default
`store`	`dict[str, DataFrame]`	Dict mapping data type to DataFrame. Uses keys `cgm`, `bolus`, `basal`. Each DataFrame must have a `study_name` column.	required
`verbose`	`bool`	Print per-data-type progress.	`False`

Returns:

Type	Description
`DataFrame`	DataFrame with columns `study`, `data_type`, `quantile_level`, `value`. 401 rows per (`study`, `data_type`) pair (0–100% at 0.25% steps).
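The 401 quantile levels can be generated with `np.linspace`, as in the sketch below for a single data type. Storing the level as a fraction between 0 and 1 is an assumption, and `cdf_quantiles_sketch` is a hypothetical helper rather than the package function.

```python
import numpy as np
import pandas as pd

QUANTILE_LEVELS = np.linspace(0.0, 1.0, 401)  # 0%, 0.25%, ..., 100%
COLUMNS = ["study", "data_type", "quantile_level", "value"]

def cdf_quantiles_sketch(df, value_col, data_type):
    """Return 401 quantiles of value_col per study for one data type."""
    frames = []
    for study, group in df.groupby("study_name"):
        values = group[value_col].dropna().to_numpy()
        if values.size == 0:
            continue  # nothing to summarise for this study
        frames.append(pd.DataFrame({
            "study": study,
            "data_type": data_type,
            "quantile_level": QUANTILE_LEVELS,
            "value": np.quantile(values, QUANTILE_LEVELS),
        }))
    if not frames:
        return pd.DataFrame(columns=COLUMNS)
    return pd.concat(frames, ignore_index=True)

# Made-up CGM values for two studies.
cgm = pd.DataFrame({
    "study_name": ["StudyA"] * 5 + ["StudyB"] * 3,
    "cgm": [80.0, 100.0, 120.0, 140.0, np.nan, 90.0, 110.0, 130.0],
})
cdf = cdf_quantiles_sketch(cgm, value_col="cgm", data_type="cgm")
print(cdf.groupby("study").size())
```

A fixed number of quantiles per study keeps the survey file small regardless of how many raw samples a study contains.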

`compute_cgm_stats(df)`

Per-patient CGM stats: value metrics, temporal coverage, TIR/TAR/TBR, and outliers.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	CGM DataFrame. Required columns: `study_name`, `patient_id`, `datetime`, `cgm`.	required

Returns:

Type	Description
`list[dict]`	Long-format records with columns `study`, `patient_id`, `data_type`=`cgm`, `metric`, `value`. Metrics: `row_count`, `nan_count`, `duplicate_count`, `min`, `max`, `gm`, `gs`, `patient_days`, `data_fraction_days`, `missing_days`, `samples_per_day`, `data_fraction`, `chunk_count`, `gm_chunk_dur_hrs`, `gs_chunk_dur_hrs`, `gm_gap_dur_hrs`, `gs_gap_dur_hrs`, `tir`, `tar`, `tbr`, `outlier_low`, `outlier_high`.

`compute_complete_days(store)`

Per-patient count of days where CGM, bolus, and basal all have at least one sample.

Parameters:

Name	Type	Description	Default
`store`	`dict[str, DataFrame]`	Dict mapping data type to DataFrame. Uses keys `cgm`, `bolus`, `basal`. Each DataFrame must have columns `study_name`, `patient_id`, `datetime`.	required

Returns:

Type	Description
`list[dict]`	Long-format records with columns `study`, `patient_id`, `data_type`=`all`, `metric`=`complete_days`, `value`. Empty list if any of `cgm`/`bolus`/`basal` is missing from the store.
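Counting complete days amounts to intersecting the sets of calendar days per patient across the three data types. The sketch below shows one way to do that with inner merges; `complete_days_sketch` is a hypothetical helper, and unlike the real function it may omit patients with zero complete days.

```python
import pandas as pd

def complete_days_sketch(store):
    """Count days per patient on which cgm, bolus, and basal all have data."""
    required = ("cgm", "bolus", "basal")
    if any(key not in store for key in required):
        return []
    day_sets = []
    for key in required:
        df = store[key]
        days = df.assign(date=pd.to_datetime(df["datetime"]).dt.date)
        day_sets.append(days[["study_name", "patient_id", "date"]].drop_duplicates())
    # Inner merges on the shared columns keep only days present in all three.
    common = day_sets[0].merge(day_sets[1]).merge(day_sets[2])
    counts = common.groupby(["study_name", "patient_id"]).size()
    return [
        {"study": study, "patient_id": pid, "data_type": "all",
         "metric": "complete_days", "value": int(n)}
        for (study, pid), n in counts.items()
    ]

def frame(timestamps):
    # Made-up single-patient data; only the timestamps matter here.
    return pd.DataFrame({"study_name": "StudyA", "patient_id": "P001",
                         "datetime": pd.to_datetime(timestamps)})

store = {
    "cgm": frame(["2025-01-01 08:00", "2025-01-02 08:00", "2025-01-02 12:00", "2025-01-03 08:00"]),
    "bolus": frame(["2025-01-01 12:00", "2025-01-02 12:00", "2025-01-03 12:00"]),
    "basal": frame(["2025-01-02 00:00", "2025-01-03 00:00"]),
}
complete = complete_days_sketch(store)
print(complete)
```

Here 1 January has no basal data, so only 2 and 3 January count as complete.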

`compute_gap_durations(store)`

Raw gap and chunk durations per patient for CDF plotting.

Parameters:

Name	Type	Description	Default
`store`	`dict[str, DataFrame]`	Dict mapping data type to DataFrame. Uses keys `cgm`, `basal`, `bolus`. Each DataFrame must have columns `study_name`, `patient_id`, `datetime`.	required

Returns:

Type	Description
`dict[str, DataFrame]`	Dict keyed by data type. Each value is a DataFrame with columns `study`, `patient_id`, `kind` (`gap` or `chunk`), `dur_hrs`.

`compute_tdd_per_patient(store, verbose=False)`

Daily Total Daily Dose (TDD) per patient for each study.

Requires both bolus and basal keys in the store.

Parameters:

Name	Type	Description	Default
`store`	`dict[str, DataFrame]`	Dict mapping data type to DataFrame. Uses keys `bolus` and `basal`. Each DataFrame must have a `study_name` column.	required
`verbose`	`bool`	Print per-study progress.	`False`

Returns:

Type	Description
`DataFrame`	Wide DataFrame with columns `study`, `patient_id`, `date`, `basal`, `bolus`, `total`.

`compute_tdd_stats(tdd_df)`

Per-patient TDD statistics: gm, gs, min, max, nan_count for basal, bolus, total, and ratio.

Parameters:

Name	Type	Description	Default
`tdd_df`	`DataFrame`	Wide DataFrame from `compute_tdd_per_patient`. Required columns: `study`, `patient_id`, `date`, `basal`, `bolus`, `total`.	required

Returns:

Type	Description
`list[dict]`	Long-format records with columns `study`, `patient_id`, `data_type`=`tdd`, `metric`, `value`. Metrics: `basal_gm`, `basal_gs`, `basal_min`, `basal_max`, `basal_nan_count`, `bolus_gm`, `bolus_gs`, `bolus_min`, `bolus_max`, `bolus_nan_count`, `tdd_gm`, `tdd_gs`, `tdd_min`, `tdd_max`, `tdd_nan_count`, `bolus_basal_ratio`.

`babelbetes.survey.figures`

Figures for the BabelBetes output validation report.

`plot_cdfs(cdf_df)`

CDF per study for CGM, bolus, and basal — one subplot per data type.

Parameters:

Name	Type	Description	Default
`cdf_df`	`DataFrame`	Pre-computed quantile DataFrame from compute.compute_cdf_quantiles(). Columns: [study, data_type, quantile_level, value].	required

`plot_complete_days_treemap(stats_df)`

Treemap: total complete patient-days (CGM + bolus + basal) per study.

Parameters:

Name	Type	Description	Default
`stats_df`	`DataFrame`	Columns [study, data_type, metric, value]. Uses rows where metric="complete_days", data_type="complete".	required

`plot_days_per_study(stats_df)`

Grouped bar chart: patient-days per study, split by data type.

Parameters:

Name	Type	Description	Default
`stats_df`	`DataFrame`	Columns [study, data_type, metric, value]. Uses rows where metric="patient_days" (cgm/bolus/basal) and metric="complete_days" (data_type="complete").	required

`plot_gap_chunk_cdfs(gap_dur_dict)`

CDF of gap and chunk durations per data type, coloured by study.

Parameters:

Name	Type	Description	Default
`gap_dur_dict`	`dict[str, DataFrame]`	Output of compute.compute_gap_durations(). Dict keyed by data_type; each DataFrame has columns [study, patient_id, kind, dur_hrs].	required

`plot_gm_vs_gs(patient_stats_df)`

Scatter plot of per-patient geometric mean vs geometric std, coloured by study.

One subplot per data type (cgm, bolus, basal).

Parameters:

Name	Type	Description	Default
`patient_stats_df`	`DataFrame`	Columns [study, patient_id, data_type, metric, value]. Uses metric="gm" (geometric mean) and metric="gs" (geometric std).	required

`plot_moving_averages(store)`

Moving average of values by hour-of-day per study — one subplot per data type.

Parameters:

Name	Type	Description	Default
`store`	`dict[str, DataFrame]`	{data_type: df} where each df contains a study_name column. cgm columns: patient_id, study_name, datetime, cgm (float, mg/dL); bolus columns: patient_id, study_name, datetime, bolus (float, U); basal columns: patient_id, study_name, datetime, basal_rate (float, U/hr).	required

`plot_subjects_per_study(stats_df)`

Bar chart: number of patients per study.

Parameters:

Name	Type	Description	Default
`stats_df`	`DataFrame`	Columns [study, data_type, metric, value]. Uses rows where metric="patient_count", data_type="all".	required

`plot_tdd_cdfs(tdd_df)`

CDF per study for basal TDD, bolus TDD, and total TDD.

Parameters:

Name	Type	Description	Default
`tdd_df`	`DataFrame`	Columns [study, patient_id, date, basal, bolus, total]. basal/bolus/total are daily insulin doses in U/day.	required

`plot_tdd_split(patient_stats_df, relative=False)`

Stacked bar chart of mean daily basal vs bolus TDD per study.

Parameters:

Name	Type	Description	Default
`patient_stats_df`	`DataFrame`	Columns [study, patient_id, data_type, metric, value]. Uses data_type='tdd', metrics 'basal_gm' and 'bolus_gm'.	required
`relative`	`bool`	When True, show percentage split instead of absolute U/day using a seaborn grouped bar chart. A dashed line at 50% is drawn for reference.	`False`

`babelbetes.survey.survey`

`list_cdf_quantile_surveys()`

Return all CDF quantile survey paths sorted chronologically.

`list_patient_stats_surveys()`

Return all patient stats survey paths sorted chronologically.

`list_study_stats_surveys()`

Return all study stats survey paths sorted chronologically (oldest first).

`list_tdd_surveys()`

Return all TDD survey paths sorted chronologically.
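Because the file names start with a YYYYMMDD_HHMMSS timestamp, sorting them alphabetically also sorts them chronologically. The sketch below illustrates this with empty placeholder files in a temporary directory; `list_surveys_sketch` is a hypothetical helper, not one of the functions above.

```python
import tempfile
from pathlib import Path

def list_surveys_sketch(survey_dir, kind):
    """Return survey paths of one kind, oldest first (illustrative helper)."""
    # Zero-padded timestamps sort lexicographically in chronological order.
    return sorted(Path(survey_dir).glob(f"*_{kind}.parquet"))

with tempfile.TemporaryDirectory() as tmp:
    for name in [
        "20250201_000000_study_stats.parquet",
        "20250101_000000_study_stats.parquet",
        "20250101_000000_patient_stats.parquet",
    ]:
        (Path(tmp) / name).touch()  # empty placeholder, not a real Parquet file
    names = [p.name for p in list_surveys_sketch(tmp, "study_stats")]

print(names)
```

The last element of the sorted list is then the latest survey of that kind.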

`load_cdf_quantiles(path=None)`

Load a CDF quantiles survey.

Parameters:

Name	Type	Description	Default
`path`	`Path \| str \| None`	Path to a specific Parquet file. Defaults to the latest survey.	`None`

Returns:

Type	Description
`DataFrame`	DataFrame with columns `study`, `data_type`, `quantile_level`, `value`, `survey_id`.

`load_patient_stats(path=None)`

Load a patient stats survey.

Parameters:

Name	Type	Description	Default
`path`	`Path \| str \| None`	Path to a specific Parquet file. Defaults to the latest survey.	`None`

Returns:

Type	Description
`DataFrame`	DataFrame with columns `study`, `patient_id`, `data_type`, `metric`, `value`, `survey_id`.

`load_study_stats(path=None)`

Load a study stats survey.

Parameters:

Name	Type	Description	Default
`path`	`Path \| str \| None`	Path to a specific Parquet file. Defaults to the latest survey.	`None`

Returns:

Type	Description
`DataFrame`	DataFrame with columns `study`, `data_type`, `metric`, `value`, `survey_id`.

`load_tdd(path=None)`

Load a TDD survey.

Parameters:

Name	Type	Description	Default
`path`	`Path \| str \| None`	Path to a specific Parquet file. Defaults to the latest survey.	`None`

Returns:

Type	Description
`DataFrame`	DataFrame with columns `study`, `patient_id`, `date`, `basal`, `bolus`, `total`, `survey_id`.

`save_cdf_quantiles(df, survey_id=None)`

Save pre-computed CDF quantiles as a Parquet survey.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	DataFrame with columns `study`, `data_type`, `quantile_level`, `value`.	required
`survey_id`	`str \| None`	Optional timestamp string. Generated if not provided.	`None`

Returns:

Type	Description
`Path`	Path to the saved Parquet file.

`save_patient_stats(records, survey_id=None)`

Save per-patient stats as a long-format Parquet survey.

Parameters:

Name	Type	Description	Default
`records`	`list[dict]`	List of `{study, patient_id, data_type, metric, value}` dicts.	required
`survey_id`	`str \| None`	Optional timestamp string. Generated if not provided.	`None`

Returns:

Type	Description
`Path`	Path to the saved Parquet file.

`save_study_stats(records, survey_id=None)`

Save scalar study-level stats as a long-format Parquet survey.

Parameters:

Name	Type	Description	Default
`records`	`list[dict]`	List of `{study, data_type, metric, value}` dicts.	required
`survey_id`	`str \| None`	Optional timestamp string (YYYYMMDD_HHMMSS). Generated if not provided.	`None`

Returns:

Type	Description
`Path`	Path to the saved Parquet file.

`save_tdd(df, survey_id=None)`

Save per-patient daily TDD as a wide-format Parquet survey.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Wide DataFrame with columns `study`, `patient_id`, `date`, `basal`, `bolus`, `total`.	required
`survey_id`	`str \| None`	Optional timestamp string. Generated if not provided.	`None`

Returns:

Type	Description
`Path`	Path to the saved Parquet file.

`babelbetes.survey.diff`

`diff_study_stats(snap_a, snap_b)`

Diff two study-level surveys.

Uses an outer merge on (study, data_type, metric) so that:

- New metrics in snap_b appear with value_before=NaN (status='added')
- Metrics only in snap_a appear with value_after=NaN (status='removed')
- Metrics in both get delta and pct_change computed

Parameters:

Name	Type	Description	Default
`snap_a`	`DataFrame`	Earlier survey DataFrame from survey.load_study_stats()	required
`snap_b`	`DataFrame`	Later survey DataFrame from survey.load_study_stats()	required

Returns:

Type	Description
`DataFrame`	DataFrame with columns: study, data_type, metric, value_before, value_after, delta, pct_change, status, flagged
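The outer-merge logic can be sketched with pandas as follows. Several details are assumptions made for the example: pct_change is expressed as a fraction of the earlier value, the flag threshold is 0.05, and the status label 'compared' for metrics present in both surveys is invented. `diff_sketch` and the sample metrics are hypothetical.

```python
import numpy as np
import pandas as pd

KEYS = ["study", "data_type", "metric"]

def diff_sketch(snap_a, snap_b, threshold=0.05):
    """Outer-merge two study-level surveys and flag large relative changes."""
    merged = snap_a[KEYS + ["value"]].merge(
        snap_b[KEYS + ["value"]],
        on=KEYS, how="outer", suffixes=("_before", "_after"), indicator=True,
    )
    # 'added' and 'removed' follow the description; 'compared' is a made-up label.
    labels = {"left_only": "removed", "right_only": "added", "both": "compared"}
    merged["status"] = merged["_merge"].astype(str).map(labels)
    merged["delta"] = merged["value_after"] - merged["value_before"]
    # Relative change against the earlier value; a zero baseline yields NaN.
    merged["pct_change"] = merged["delta"] / merged["value_before"].replace(0, np.nan)
    merged["flagged"] = merged["pct_change"].abs() > threshold
    return merged.drop(columns="_merge")

def rows(metric_values):
    return pd.DataFrame(
        [{"study": "StudyA", "data_type": "cgm", "metric": m, "value": v}
         for m, v in metric_values.items()]
    )

before = rows({"tir_study_gm": 0.70, "row_count_study_gm": 1000.0, "old_metric": 1.0})
after = rows({"tir_study_gm": 0.75, "row_count_study_gm": 1010.0, "new_metric": 2.0})
result = diff_sketch(before, after).set_index("metric")
print(result[["value_before", "value_after", "pct_change", "status", "flagged"]])
```

Added and removed metrics have a NaN pct_change, so they are never flagged by the threshold and have to be identified through the status column.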

`format_diff_report(diff_df)`

Format a diff DataFrame as a human-readable text table.
