Output Survey

The survey framework computes quality metrics over a BabelBetes output directory and generates an HTML report. It is built around surveys: immutable Parquet files that capture patient-level and study-level metric aggregates at a point in time. Surveys make it possible to describe and compare study outputs, and to check for intended or unintended changes in the output, for example after changes to the code base.


Usage

Compute and report

# Compute all metrics and save a survey
python -m babelbetes.survey survey --data-dir data/out

# Generate an HTML report from the latest survey
python -m babelbetes.survey report

# Include circadian and gap/chunk figures (requires raw time series data)
python -m babelbetes.survey report --data-dir data/out

Why --data-dir is optional for report

Most figures are generated from the pre-computed survey files. --data-dir is only needed for figures that cannot be computed from aggregated stats: circadian patterns (moving average by hour of day) and gap/chunk CDFs — both require access to individual timestamps.

Surveys are saved to data/out/survey/surveys/ and the HTML report is written to data/out/survey/reports/report_<timestamp>.html.

Track changes with diff

# Compare the two most recent surveys — flags changes > 5%
python -m babelbetes.survey diff

# Or compare two specific surveys
python -m babelbetes.survey diff \
    --a data/out/survey/surveys/20250101_000000_study_stats.parquet \
    --b data/out/survey/surveys/20250201_000000_study_stats.parquet

The diff covers all study-level metrics, including the per-patient aggregates (e.g. tir_study_gm — geometric mean of individual patient TIR values across a study). Metrics that changed by more than 5% are flagged with ⚠️.

Note

The diff currently operates on study-level surveys only. Patient-level diffing is not yet implemented.

Explore a single study or patient

For targeted exploration without saving a survey, call the compute functions directly:

from babelbetes.src import data_store
from babelbetes.survey import compute
import pandas as pd

# Scope to one study, or filter further to one patient
store = data_store.load("data/out", studies=["DCLP3"])
store = {dt: df[df["patient_id"] == "P001"] for dt, df in store.items()}

patient_records = (
    compute.compute_cgm_stats(store["cgm"])
    + compute.compute_basal_stats(store["basal"])
    + compute.compute_bolus_stats(store["bolus"])
    + compute.compute_complete_days(store)
)
tdd_df      = compute.compute_tdd_per_patient(store)
tdd_records = compute.compute_tdd_stats(tdd_df)

stats = pd.DataFrame(patient_records + tdd_records)
print(stats[stats["data_type"] == "cgm"].pivot_table(
    index="patient_id", columns="metric", values="value"
))

Architecture

The framework is split into four modules:

Module | Responsibility
compute | Pure functions that take DataFrames and return metrics as list[dict]
survey | Save and load metric surveys as Parquet files
figures | Matplotlib/seaborn figures that accept pre-computed metric DataFrames
diff | Compare two study-level surveys and flag significant changes

__main__.py wires these together for the CLI, and a separate report module renders surveys into a self-contained HTML file.

Metrics are stored in long format — one row per (study, [patient_id,] data_type, metric, value) — so any metric can be filtered, pivoted, or plotted without schema changes.

Survey files

Each survey run produces four Parquet files, all stamped with the same YYYYMMDD_HHMMSS timestamp:

File | Contents
<ts>_study_stats.parquet | Study-level aggregates: columns study, data_type, metric, value, survey_id
<ts>_patient_stats.parquet | Per-patient metrics: columns study, patient_id, data_type, metric, value, survey_id
<ts>_tdd.parquet | Daily TDD per patient: columns study, patient_id, date, basal, bolus, total, survey_id
<ts>_cdf_quantiles.parquet | Pre-computed CDF quantiles: columns study, data_type, quantile_level, value, survey_id
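The naming scheme in the table can be sketched in a few lines. This is only an illustration, not the framework's own code: the helper name survey_file_names is hypothetical, and only the timestamp format and the four suffixes are taken from the table above.

```python
from datetime import datetime

# The four file suffixes listed in the table above.
SUFFIXES = ("study_stats", "patient_stats", "tdd", "cdf_quantiles")


def survey_file_names(when: datetime) -> list[str]:
    """Hypothetical helper: build the four file names that share one timestamp."""
    ts = when.strftime("%Y%m%d_%H%M%S")
    return [f"{ts}_{suffix}.parquet" for suffix in SUFFIXES]


names = survey_file_names(datetime(2025, 1, 1, 0, 0, 0))
for name in names:
    print(name)
```

Because all four files carry the same prefix, the files belonging to one survey run can be matched by that prefix alone.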

Metrics reference

Per-patient metrics (in patient_stats) — present for each of cgm, basal, bolus:

Metric | Description
row_count | Total rows including NaN
nan_count | Rows with a missing value
duplicate_count | Rows with a duplicated timestamp
min, max | Range of non-NaN values
gm, gs | Geometric mean and std of non-NaN values
patient_days | Number of unique calendar days with at least one measurement
data_fraction_days | patient_days / calendar_span (0–1)
missing_days | Calendar days in the span with no data
samples_per_day | row_count / calendar_span
data_fraction | Fraction of the total timespan covered by chunks
chunk_count | Number of continuous data segments
gm_chunk_dur_hrs, gs_chunk_dur_hrs | Geometric mean/std of chunk durations (hours)
gm_gap_dur_hrs, gs_gap_dur_hrs | Geometric mean/std of gap durations (hours)
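The day-level coverage metrics in the table relate to each other as sketched below for a single patient and data type. The function name day_coverage is hypothetical, and the sketch assumes that calendar_span counts every calendar day from the first to the last measurement, both ends included; the framework may define the span differently.

```python
from datetime import datetime


def day_coverage(timestamps: list[datetime]) -> dict[str, float]:
    """Day-level coverage for one patient and one data type (illustrative only)."""
    days = {t.date() for t in timestamps}
    # Assumption: the span runs from the first to the last measurement day,
    # inclusive on both ends.
    calendar_span = (max(days) - min(days)).days + 1
    patient_days = len(days)
    return {
        "patient_days": patient_days,
        "missing_days": calendar_span - patient_days,
        "data_fraction_days": patient_days / calendar_span,
        "samples_per_day": len(timestamps) / calendar_span,
    }


stamps = [
    datetime(2025, 1, 1, 8, 0),
    datetime(2025, 1, 1, 20, 0),
    datetime(2025, 1, 2, 8, 0),
    datetime(2025, 1, 4, 8, 0),  # 3 January has no data
]
coverage = day_coverage(stamps)
print(coverage)
```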

CGM-specific additions:

Metric | Description
tir | Time-in-range fraction (70–180 mg/dL)
tar | Time-above-range fraction (> 180 mg/dL)
tbr | Time-below-range fraction (< 70 mg/dL)
outlier_low | Count of readings < 40 mg/dL
outlier_high | Count of readings > 400 mg/dL
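A minimal sketch of the CGM-specific metrics, using the thresholds from the table. It assumes that NaN readings are excluded from the denominator, that both range boundaries (70 and 180 mg/dL) count as in range, and that fractions of readings stand in for fractions of time, which holds when samples are evenly spaced. The function name cgm_range_fractions is hypothetical.

```python
import math


def cgm_range_fractions(values: list[float]) -> dict[str, float]:
    """Fractions of non-NaN readings in, above, and below 70-180 mg/dL."""
    valid = [v for v in values if not math.isnan(v)]
    n = len(valid)
    if n == 0:
        return {}  # nothing to report without valid readings
    return {
        "tir": sum(70 <= v <= 180 for v in valid) / n,
        "tar": sum(v > 180 for v in valid) / n,
        "tbr": sum(v < 70 for v in valid) / n,
        "outlier_low": sum(v < 40 for v in valid),    # count, not fraction
        "outlier_high": sum(v > 400 for v in valid),  # count, not fraction
    }


readings = [55.0, 39.0, 100.0, 120.0, 150.0, 180.0,
            200.0, 410.0, 90.0, 110.0, float("nan")]
fractions = cgm_range_fractions(readings)
print(fractions)
```

With these boundary choices the three fractions always sum to 1 over the valid readings.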

Gap thresholds used to split continuous chunks: CGM = 30 min, basal = 6 hr, bolus = 16 hr.
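How a gap threshold splits a time series into chunks can be illustrated as follows. This is a sketch under assumptions rather than the framework's implementation: a gap is taken to be any spacing strictly greater than the threshold, a chunk's duration is measured from its first to its last sample, and split_chunks is a hypothetical name.

```python
from datetime import datetime, timedelta


def split_chunks(timestamps, max_gap):
    """Split timestamps into chunks wherever the spacing exceeds max_gap.

    Returns (chunk_durations_hrs, gap_durations_hrs).
    """
    ordered = sorted(timestamps)
    if not ordered:
        return [], []
    chunk_hrs, gap_hrs = [], []
    chunk_start = ordered[0]
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > max_gap:
            # Close the current chunk at the last sample before the gap.
            chunk_hrs.append((prev - chunk_start).total_seconds() / 3600)
            gap_hrs.append((curr - prev).total_seconds() / 3600)
            chunk_start = curr
    chunk_hrs.append((ordered[-1] - chunk_start).total_seconds() / 3600)
    return chunk_hrs, gap_hrs


start = datetime(2025, 1, 1, 0, 0)
# Two hours of 5-minute CGM samples, a 3-hour outage, then one more hour.
first = [start + timedelta(minutes=5 * i) for i in range(25)]
second = [start + timedelta(hours=5, minutes=5 * i) for i in range(13)]

chunks, gaps = split_chunks(first + second, max_gap=timedelta(minutes=30))
data_fraction = sum(chunks) / (sum(chunks) + sum(gaps))
print(chunks, gaps, data_fraction)
```

The same function applies to basal and bolus data with max_gap set to 6 and 16 hours respectively.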

TDD metrics (in patient_stats, data_type="tdd"):

basal_gm, basal_gs, basal_min, basal_max, basal_nan_count, bolus_gm, bolus_gs, bolus_min, bolus_max, bolus_nan_count, tdd_gm, tdd_gs, tdd_min, tdd_max, tdd_nan_count, bolus_basal_ratio.

Study-level metrics (in study_stats):

Every per-patient metric is aggregated across patients as a geometric mean (_study_gm) and geometric std (_study_gs). Additionally:

Metric | Description
patient_count | Number of patients per (study, data_type)
patient_days | Sum of per-patient patient_days
complete_days | Sum of days where CGM + bolus + basal all present
age_min, age_max, age_gm, age_std | Age statistics (when age data is available)

NaN in study-level stats

_study_gs is NaN for studies with a single patient (std is undefined for n=1). _study_gm for metrics that are zero or negative for all patients (e.g. duplicate_count) will be absent from the output — geometric mean requires positive values.
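The two edge cases above follow from how a geometric mean and geometric standard deviation are typically computed in log space. The sketch below is illustrative only: it assumes the geometric std is the exponential of the sample standard deviation of the logs, and that non-positive values are filtered out before taking logs. The function name geometric_stats is hypothetical.

```python
import math
import statistics


def geometric_stats(values):
    """Return (gm, gs) over the positive values, using NaN where undefined."""
    positive = [v for v in values if v > 0]
    if not positive:
        return math.nan, math.nan  # no positive values: gm is undefined
    logs = [math.log(v) for v in positive]
    gm = math.exp(statistics.fmean(logs))
    # The sample std needs at least two values, so one patient yields NaN.
    gs = math.exp(statistics.stdev(logs)) if len(logs) > 1 else math.nan
    return gm, gs


gm_two, gs_two = geometric_stats([2.0, 8.0])    # two patients
gm_one, gs_one = geometric_stats([5.0])         # single patient: gs is NaN
gm_zero, gs_zero = geometric_stats([0.0, 0.0])  # all zero: gm is NaN
print(gm_two, gs_two, gm_one, gs_one, gm_zero, gs_zero)
```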


API Reference

babelbetes.survey.compute

aggregate_study_stats(patient_stats_df)

Aggregate per-patient stats to study level using vectorised groupby.

Every per-patient metric is aggregated to geometric mean and std across patients, producing two output metrics suffixed _study_gm / _study_gs. For example, row_count becomes row_count_study_gm and row_count_study_gs — it is not summed, so it does not give the total row count for the study. To get study-level totals, use the explicitly summed metrics below.

Explicitly summed (appear without a suffix):

  • patient_days: total patient-days across all patients, per data_type
  • complete_days: total complete patient-days (CGM + bolus + basal all present), summed across all patients

Always present (never NaN):

  • patient_count — number of patients in each (study, data_type) group

May be NaN and are dropped from the output:

  • _study_gs for any metric where a study has only one patient (std undefined)
  • _study_gm for any metric that is zero or negative for all patients (geometric mean requires positive values; e.g. duplicate_count_study_gm is absent when no patient has any duplicates)

Parameters:

Name Type Description Default
patient_stats_df DataFrame

DataFrame with columns study, patient_id, data_type, metric, value.

required

Returns:

Type Description
list[dict]

List of dicts with keys study, data_type, metric, value (no patient_id column). NaN values are dropped.
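The shape of this aggregation can be sketched with a pandas groupby in log space. This is not the function's actual implementation: the variable names are illustrative, and filtering to positive values before taking logs is an assumption consistent with the NaN behaviour described above.

```python
import numpy as np
import pandas as pd

patient_stats_df = pd.DataFrame(
    [
        {"study": "A", "patient_id": "P1", "data_type": "cgm", "metric": "tir", "value": 0.50},
        {"study": "A", "patient_id": "P2", "data_type": "cgm", "metric": "tir", "value": 0.80},
        {"study": "B", "patient_id": "P3", "data_type": "cgm", "metric": "tir", "value": 0.70},
    ]
)

# Work in log space so that mean/std become geometric mean/std after exp().
positive = patient_stats_df[patient_stats_df["value"] > 0].copy()
positive["log_value"] = np.log(positive["value"])
grouped = positive.groupby(["study", "data_type", "metric"])["log_value"]

agg = np.exp(grouped.agg(["mean", "std"])).reset_index()
agg = agg.rename(columns={"mean": "study_gm", "std": "study_gs"})

# Back to long format: one row per suffixed metric, NaN rows dropped.
long = agg.melt(
    id_vars=["study", "data_type", "metric"],
    value_vars=["study_gm", "study_gs"],
    var_name="suffix",
    value_name="value",
).dropna(subset=["value"])
long["metric"] = long["metric"] + "_" + long["suffix"]
study_records = long.drop(columns="suffix").to_dict("records")
print(study_records)
```

Study B has a single patient, so its tir_study_gs is NaN and does not appear in the records.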

compute_age_stats(store)

Study-level age statistics: min, max, geometric mean, and std.

Parameters:

Name Type Description Default
store dict[str, DataFrame]

Dict mapping data type to DataFrame. Uses the age key. The DataFrame must have columns study_name and age.

required

Returns:

Type Description
list[dict]

List of dicts with keys study, data_type=age, metric, value. Metrics: age_min, age_max, age_gm, age_std.

compute_basal_stats(df)

Per-patient basal stats: value metrics and temporal coverage.

Parameters:

Name Type Description Default
df DataFrame

Basal DataFrame. Required columns: study_name, patient_id, datetime, basal_rate.

required

Returns:

Type Description
list[dict]

Long-format records with columns study, patient_id, data_type=basal, metric, value. Metrics: row_count, nan_count, duplicate_count, min, max, gm, gs, patient_days, data_fraction_days, missing_days, samples_per_day, data_fraction, chunk_count, gm_chunk_dur_hrs, gs_chunk_dur_hrs, gm_gap_dur_hrs, gs_gap_dur_hrs.

compute_bolus_stats(df)

Per-patient bolus stats: value metrics and temporal coverage.

Parameters:

Name Type Description Default
df DataFrame

Bolus DataFrame. Required columns: study_name, patient_id, datetime, bolus.

required

Returns:

Type Description
list[dict]

Long-format records with columns study, patient_id, data_type=bolus, metric, value. Metrics: row_count, nan_count, duplicate_count, min, max, gm, gs, patient_days, data_fraction_days, missing_days, samples_per_day, data_fraction, chunk_count, gm_chunk_dur_hrs, gs_chunk_dur_hrs, gm_gap_dur_hrs, gs_gap_dur_hrs.

compute_cdf_quantiles(store, verbose=False)

Pre-compute CDF quantiles per study for CGM, bolus, and basal.

Parameters:

Name Type Description Default
store dict[str, DataFrame]

Dict mapping data type to DataFrame. Uses keys cgm, bolus, basal. Each DataFrame must have a study_name column.

required
verbose bool

Print per-data-type progress.

False

Returns:

Type Description
DataFrame

DataFrame with columns study, data_type, quantile_level, value. 401 rows per (study, data_type) pair (0–100% at 0.25% steps).
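The 401 rows follow directly from the step size: 0% to 100% in 0.25% steps gives 401 levels. The sketch below shows the idea with NumPy on synthetic data. Whether the framework stores quantile_level as a fraction or a percentage, and which interpolation it uses, is not stated here, so both are assumptions.

```python
import numpy as np

# 0 % to 100 % in 0.25 % steps gives 401 levels (stored here as fractions).
quantile_levels = np.linspace(0.0, 1.0, 401)

# Synthetic stand-in for the CGM values of one study.
rng = np.random.default_rng(seed=0)
cgm_values = rng.normal(loc=140.0, scale=40.0, size=10_000)

# One value per level; plotting value (x) against level (y) draws the CDF.
quantile_values = np.quantile(cgm_values, quantile_levels)
print(len(quantile_levels), quantile_values[0], quantile_values[-1])
```

Storing a fixed number of quantiles per (study, data_type) pair keeps the survey file small while still allowing CDF figures to be drawn without the raw time series.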

compute_cgm_stats(df)

Per-patient CGM stats: value metrics, temporal coverage, TIR/TAR/TBR, and outliers.

Parameters:

Name Type Description Default
df DataFrame

CGM DataFrame. Required columns: study_name, patient_id, datetime, cgm.

required

Returns:

Type Description
list[dict]

Long-format records with columns study, patient_id, data_type=cgm, metric, value. Metrics: row_count, nan_count, duplicate_count, min, max, gm, gs, patient_days, data_fraction_days, missing_days, samples_per_day, data_fraction, chunk_count, gm_chunk_dur_hrs, gs_chunk_dur_hrs, gm_gap_dur_hrs, gs_gap_dur_hrs, tir, tar, tbr, outlier_low, outlier_high.

compute_complete_days(store)

Per-patient count of days where CGM, bolus, and basal all have at least one sample.

Parameters:

Name Type Description Default
store dict[str, DataFrame]

Dict mapping data type to DataFrame. Uses keys cgm, bolus, basal. Each DataFrame must have columns study_name, patient_id, datetime.

required

Returns:

Type Description
list[dict]

Long-format records with columns study, patient_id, data_type=all, metric=complete_days, value. Empty list if any of cgm/bolus/basal is missing from the store.
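For a single patient, the definition of a complete day reduces to a set intersection over calendar dates. The sketch below uses plain Python rather than DataFrames. The function name count_complete_days and the input shape (a dict of timestamp lists) are hypothetical, and returning None for a missing data type only loosely mirrors the empty list described above.

```python
from datetime import datetime


def count_complete_days(timestamps_by_type):
    """Count calendar days on which cgm, bolus, and basal all have a sample."""
    required = ("cgm", "bolus", "basal")
    if not all(key in timestamps_by_type for key in required):
        return None  # the documented function returns an empty list here
    day_sets = [{t.date() for t in timestamps_by_type[key]} for key in required]
    return len(set.intersection(*day_sets))


one_patient = {
    "cgm": [datetime(2025, 1, d, 12, 0) for d in (1, 2, 3)],
    "bolus": [datetime(2025, 1, d, 18, 30) for d in (1, 3)],  # no bolus on day 2
    "basal": [datetime(2025, 1, d, 0, 0) for d in (1, 2, 3)],
}
n_complete = count_complete_days(one_patient)
print(n_complete)
```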

compute_gap_durations(store)

Raw gap and chunk durations per patient for CDF plotting.

Parameters:

Name Type Description Default
store dict[str, DataFrame]

Dict mapping data type to DataFrame. Uses keys cgm, basal, bolus. Each DataFrame must have columns study_name, patient_id, datetime.

required

Returns:

Type Description
dict[str, DataFrame]

Dict keyed by data type. Each value is a DataFrame with columns study, patient_id, kind (gap or chunk), dur_hrs.

compute_tdd_per_patient(store, verbose=False)

Daily Total Daily Dose (TDD) per patient for each study.

Requires both bolus and basal keys in the store.

Parameters:

Name Type Description Default
store dict[str, DataFrame]

Dict mapping data type to DataFrame. Uses keys bolus and basal. Each DataFrame must have a study_name column.

required
verbose bool

Print per-study progress.

False

Returns:

Type Description
DataFrame

Wide DataFrame with columns study, patient_id, date, basal, bolus, total.

compute_tdd_stats(tdd_df)

Per-patient TDD statistics: gm, gs, min, max, nan_count for basal, bolus, total, and ratio.

Parameters:

Name Type Description Default
tdd_df DataFrame

Wide DataFrame from compute_tdd_per_patient. Required columns: study, patient_id, date, basal, bolus, total.

required

Returns:

Type Description
list[dict]

Long-format records with columns study, patient_id, data_type=tdd, metric, value. Metrics: basal_gm, basal_gs, basal_min, basal_max, basal_nan_count, bolus_gm, bolus_gs, bolus_min, bolus_max, bolus_nan_count, tdd_gm, tdd_gs, tdd_min, tdd_max, tdd_nan_count, bolus_basal_ratio.

babelbetes.survey.figures

Figures for the BabelBetes output validation report.

plot_cdfs(cdf_df)

CDF per study for CGM, bolus, and basal — one subplot per data type.

Parameters:

Name Type Description Default
cdf_df DataFrame

Pre-computed quantile DataFrame from compute.compute_cdf_quantiles(). Columns: [study, data_type, quantile_level, value].

required

plot_complete_days_treemap(stats_df)

Treemap: total complete patient-days (CGM + bolus + basal) per study.

Parameters:

Name Type Description Default
stats_df DataFrame

Columns [study, data_type, metric, value]. Uses rows where metric="complete_days", data_type="complete".

required

plot_days_per_study(stats_df)

Grouped bar chart: patient-days per study, split by data type.

Parameters:

Name Type Description Default
stats_df DataFrame

Columns [study, data_type, metric, value]. Uses rows where metric="patient_days" (cgm/bolus/basal) and metric="complete_days" (data_type="complete").

required

plot_gap_chunk_cdfs(gap_dur_dict)

CDF of gap and chunk durations per data type, coloured by study.

Parameters:

Name Type Description Default
gap_dur_dict dict[str, DataFrame]

Output of compute.compute_gap_durations(). Dict keyed by data_type; each DataFrame has columns [study, patient_id, kind, dur_hrs].

required

plot_gm_vs_gs(patient_stats_df)

Scatter plot of per-patient geometric mean vs geometric std, coloured by study.

One subplot per data type (cgm, bolus, basal).

Parameters:

Name Type Description Default
patient_stats_df DataFrame

Columns [study, patient_id, data_type, metric, value]. Uses metric="gm" (geometric mean) and metric="gs" (geometric std).

required

plot_moving_averages(store)

Moving average of values by hour-of-day per study — one subplot per data type.

Parameters:

Name Type Description Default
store dict[str, DataFrame]

{data_type: df} where df contains a study_name column. Expected columns per data type:

  • cgm: patient_id, study_name, datetime, cgm (float, mg/dL)
  • bolus: patient_id, study_name, datetime, bolus (float, U)
  • basal: patient_id, study_name, datetime, basal_rate (float, U/hr)

required

plot_subjects_per_study(stats_df)

Bar chart: number of patients per study.

Parameters:

Name Type Description Default
stats_df DataFrame

Columns [study, data_type, metric, value]. Uses rows where metric="patient_count", data_type="all".

required

plot_tdd_cdfs(tdd_df)

CDF per study for basal TDD, bolus TDD, and total TDD.

Parameters:

Name Type Description Default
tdd_df DataFrame

Columns [study, patient_id, date, basal, bolus, total]. basal/bolus/total are daily insulin doses in U/day.

required

plot_tdd_split(patient_stats_df, relative=False)

Stacked bar chart of mean daily basal vs bolus TDD per study.

Parameters:

Name Type Description Default
patient_stats_df DataFrame

Columns [study, patient_id, data_type, metric, value]. Uses data_type='tdd', metrics 'basal_gm' and 'bolus_gm'.

required
relative bool

When True, show percentage split instead of absolute U/day using a seaborn grouped bar chart. A dashed line at 50% is drawn for reference.

False

babelbetes.survey.survey

list_cdf_quantile_surveys()

Return all CDF quantile survey paths sorted chronologically.

list_patient_stats_surveys()

Return all patient stats survey paths sorted chronologically.

list_study_stats_surveys()

Return all study stats survey paths sorted chronologically (oldest first).

list_tdd_surveys()

Return all TDD survey paths sorted chronologically.
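Because survey file names start with a fixed-width, zero-padded YYYYMMDD_HHMMSS timestamp, sorting them by name also sorts them chronologically. The sketch below illustrates this with pathlib. Whether the list_* helpers sort by name or by another attribute is not stated here, so treat this as an assumption about one possible approach.

```python
from pathlib import Path

survey_dir = Path("data/out/survey/surveys")
paths = [
    survey_dir / "20250201_000000_study_stats.parquet",
    survey_dir / "20241231_235959_study_stats.parquet",
    survey_dir / "20250101_000000_study_stats.parquet",
]

# Lexicographic order equals chronological order for this timestamp format.
ordered = sorted(paths)
latest = ordered[-1]
print([p.name for p in ordered])
```

The last element of such a list is then the latest survey, which is what the load_* functions default to when no path is given.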

load_cdf_quantiles(path=None)

Load a CDF quantiles survey.

Parameters:

Name Type Description Default
path Path | str | None

Path to a specific Parquet file. Defaults to the latest survey.

None

Returns:

Type Description
DataFrame

DataFrame with columns study, data_type, quantile_level, value, survey_id.

load_patient_stats(path=None)

Load a patient stats survey.

Parameters:

Name Type Description Default
path Path | str | None

Path to a specific Parquet file. Defaults to the latest survey.

None

Returns:

Type Description
DataFrame

DataFrame with columns study, patient_id, data_type, metric, value, survey_id.

load_study_stats(path=None)

Load a study stats survey.

Parameters:

Name Type Description Default
path Path | str | None

Path to a specific Parquet file. Defaults to the latest survey.

None

Returns:

Type Description
DataFrame

DataFrame with columns study, data_type, metric, value, survey_id.

load_tdd(path=None)

Load a TDD survey.

Parameters:

Name Type Description Default
path Path | str | None

Path to a specific Parquet file. Defaults to the latest survey.

None

Returns:

Type Description
DataFrame

DataFrame with columns study, patient_id, date, basal, bolus, total, survey_id.

save_cdf_quantiles(df, survey_id=None)

Save pre-computed CDF quantiles as a Parquet survey.

Parameters:

Name Type Description Default
df DataFrame

DataFrame with columns study, data_type, quantile_level, value.

required
survey_id str | None

Optional timestamp string. Generated if not provided.

None

Returns:

Type Description
Path

Path to the saved Parquet file.

save_patient_stats(records, survey_id=None)

Save per-patient stats as a long-format Parquet survey.

Parameters:

Name Type Description Default
records list[dict]

List of {study, patient_id, data_type, metric, value} dicts.

required
survey_id str | None

Optional timestamp string. Generated if not provided.

None

Returns:

Type Description
Path

Path to the saved Parquet file.

save_study_stats(records, survey_id=None)

Save scalar study-level stats as a long-format Parquet survey.

Parameters:

Name Type Description Default
records list[dict]

List of {study, data_type, metric, value} dicts.

required
survey_id str | None

Optional timestamp string (YYYYMMDD_HHMMSS). Generated if not provided.

None

Returns:

Type Description
Path

Path to the saved Parquet file.

save_tdd(df, survey_id=None)

Save per-patient daily TDD as a wide-format Parquet survey.

Parameters:

Name Type Description Default
df DataFrame

Wide DataFrame with columns study, patient_id, date, basal, bolus, total.

required
survey_id str | None

Optional timestamp string. Generated if not provided.

None

Returns:

Type Description
Path

Path to the saved Parquet file.

babelbetes.survey.diff

diff_study_stats(snap_a, snap_b)

Diff two stats snapshots.

Uses an outer merge on (study, data_type, metric) so that:

  • New metrics in snap_b appear with value_before=NaN (status='added')
  • Metrics only in snap_a appear with value_after=NaN (status='removed')
  • Metrics in both get delta and pct_change computed

Parameters:

Name Type Description Default
snap_a DataFrame

Earlier survey DataFrame from survey.load_study_stats()

required
snap_b DataFrame

Later survey DataFrame from survey.load_study_stats()

required

Returns:

Type Description
DataFrame

DataFrame with columns: study, data_type, metric, value_before, value_after, delta, pct_change, status, flagged
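The outer-merge logic described above can be sketched with pandas as follows. This is an illustration under assumptions, not the function's source: the status labels "changed" and "unchanged", the choice to leave added and removed metrics unflagged, and the THRESHOLD_PCT constant (set to the 5% quoted in the usage section) are all hypothetical.

```python
import numpy as np
import pandas as pd

KEYS = ["study", "data_type", "metric"]
THRESHOLD_PCT = 5.0  # assumed to match the 5 % flag quoted in the usage section

snap_a = pd.DataFrame(
    [
        {"study": "A", "data_type": "cgm", "metric": "tir_study_gm", "value": 0.60},
        {"study": "A", "data_type": "cgm", "metric": "tar_study_gm", "value": 0.30},
        {"study": "A", "data_type": "cgm", "metric": "old_metric", "value": 1.00},
    ]
)
snap_b = pd.DataFrame(
    [
        {"study": "A", "data_type": "cgm", "metric": "tir_study_gm", "value": 0.66},
        {"study": "A", "data_type": "cgm", "metric": "tar_study_gm", "value": 0.30},
        {"study": "A", "data_type": "cgm", "metric": "new_metric", "value": 2.00},
    ]
)

# The outer merge keeps metrics that exist on only one side (NaN on the other).
diff = snap_a.merge(
    snap_b, on=KEYS, how="outer", suffixes=("_before", "_after"), indicator=True
)
diff["delta"] = diff["value_after"] - diff["value_before"]
diff["pct_change"] = 100.0 * diff["delta"] / diff["value_before"]
diff["status"] = np.select(
    [diff["_merge"] == "left_only", diff["_merge"] == "right_only", diff["delta"] == 0],
    ["removed", "added", "unchanged"],
    default="changed",
)
# NaN percentages (added or removed metrics) compare as False and stay unflagged.
diff["flagged"] = diff["pct_change"].abs() > THRESHOLD_PCT
diff = diff.drop(columns="_merge")
print(diff)
```

Note that a value_before of zero makes pct_change infinite in this sketch; how the framework treats that case is not covered here.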

format_diff_report(diff_df)

Format a diff DataFrame as a human-readable text table.