Output Survey
The survey framework computes quality metrics over a BabelBetes output directory and generates an HTML report. It is built around surveys: immutable Parquet files that capture patient- and study-level metric aggregates at a point in time. Surveys are useful for describing and comparing study outputs, and for checking for intended or unintended changes in the output, for example after changes to the code base.
Usage
Compute and report
```bash
# Compute all metrics and save a survey
python -m babelbetes.survey survey --data-dir data/out

# Generate an HTML report from the latest survey
python -m babelbetes.survey report

# Include circadian and gap/chunk figures (requires raw time series data)
python -m babelbetes.survey report --data-dir data/out
```
Why --data-dir is optional for report
Most figures are generated from the pre-computed survey files.
--data-dir is only needed for figures that cannot be computed from aggregated stats:
circadian patterns (moving average by hour of day) and gap/chunk CDFs —
both require access to individual timestamps.
Surveys are saved to data/out/survey/surveys/ and the HTML report is written to data/out/survey/reports/report_<timestamp>.html.
Track changes with diff
```bash
# Compare the two most recent surveys (flags changes > 5%)
python -m babelbetes.survey diff

# Or compare two specific surveys
python -m babelbetes.survey diff \
  --a data/out/survey/surveys/20250101_000000_study_stats.parquet \
  --b data/out/survey/surveys/20250201_000000_study_stats.parquet
```
The diff covers all study-level metrics, including the per-patient aggregates (e.g. tir_study_gm — geometric mean of individual patient TIR values across a study). Metrics that changed by more than 5% are flagged with ⚠️.
Note
The diff currently operates on study-level surveys only. Patient-level diffing is not yet implemented.
Explore a single study or patient
For targeted exploration without saving a survey, call the compute functions directly:
```python
import pandas as pd

from babelbetes.src import data_store
from babelbetes.survey import compute

# Scope to one study
store = data_store.load("data/out", studies=["DCLP3"])
# Optional: filter further to a single patient
store = {dt: df[df["patient_id"] == "P001"] for dt, df in store.items()}

# Per-patient metrics for each data type, plus complete-day counts
patient_records = (
    compute.compute_cgm_stats(store["cgm"])
    + compute.compute_basal_stats(store["basal"])
    + compute.compute_bolus_stats(store["bolus"])
    + compute.compute_complete_days(store)
)

# Daily TDD per patient, then per-patient TDD summary metrics
tdd_df = compute.compute_tdd_per_patient(store)
tdd_records = compute.compute_tdd_stats(tdd_df)

# Combine into one long-format DataFrame and view CGM metrics one column per metric
stats = pd.DataFrame(patient_records + tdd_records)
print(stats[stats["data_type"] == "cgm"].pivot_table(
    index="patient_id", columns="metric", values="value"
))
```
Architecture
The framework is split into four core modules:
| Module | Responsibility |
|---|---|
| compute | Pure functions that take DataFrames and return metrics as list[dict] |
| survey | Save and load metric surveys as Parquet files |
| figures | Matplotlib/seaborn figures that accept pre-computed metric DataFrames |
| diff | Compare two study-level surveys and flag significant changes |
__main__.py wires these together for the CLI. The report module renders surveys into a self-contained HTML file.
Metrics are stored in long format — one row per (study, [patient_id,] data_type, metric, value) — so any metric can be filtered, pivoted, or plotted without schema changes.
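The sketch below shows why the long format needs no schema changes. It builds a few hand-made records (the study name, patient IDs, and values are made up, not real survey output) and pivots them into one column per metric using plain pandas.

```python
import pandas as pd

# Hypothetical long-format records: one row per (study, patient_id, data_type, metric, value).
# Values are illustrative only, not real survey output.
records = [
    {"study": "StudyA", "patient_id": "P1", "data_type": "cgm", "metric": "tir", "value": 0.62},
    {"study": "StudyA", "patient_id": "P1", "data_type": "cgm", "metric": "row_count", "value": 8640},
    {"study": "StudyA", "patient_id": "P2", "data_type": "cgm", "metric": "tir", "value": 0.71},
    {"study": "StudyA", "patient_id": "P2", "data_type": "cgm", "metric": "row_count", "value": 4320},
    {"study": "StudyA", "patient_id": "P1", "data_type": "bolus", "metric": "row_count", "value": 310},
]
df = pd.DataFrame(records)

# A new metric is just more rows. To get a wide view, filter one data type
# and pivot the metric names into columns.
wide = (
    df[df["data_type"] == "cgm"]
    .pivot_table(index="patient_id", columns="metric", values="value")
)
print(wide)
```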
Survey files
Each survey run produces four Parquet files, all stamped with the same YYYYMMDD_HHMMSS timestamp:
| File | Contents |
|---|---|
| <ts>_study_stats.parquet | Study-level aggregates: columns study, data_type, metric, value, survey_id |
| <ts>_patient_stats.parquet | Per-patient metrics: columns study, patient_id, data_type, metric, value, survey_id |
| <ts>_tdd.parquet | Daily TDD per patient: columns study, patient_id, date, basal, bolus, total, survey_id |
| <ts>_cdf_quantiles.parquet | Pre-computed CDF quantiles: columns study, data_type, quantile_level, value, survey_id |
Metrics reference
Per-patient metrics (in patient_stats) — present for each of cgm, basal, bolus:
| Metric | Description |
|---|---|
| row_count | Total rows including NaN |
| nan_count | Rows with a missing value |
| duplicate_count | Rows with a duplicated timestamp |
| min, max | Range of non-NaN values |
| gm, gs | Geometric mean and std of non-NaN values |
| patient_days | Number of unique calendar days with at least one measurement |
| data_fraction_days | patient_days / calendar_span (0–1) |
| missing_days | Calendar days in the span with no data |
| samples_per_day | row_count / calendar_span |
| data_fraction | Fraction of the total timespan covered by chunks |
| chunk_count | Number of continuous data segments |
| gm_chunk_dur_hrs, gs_chunk_dur_hrs | Geometric mean/std of chunk durations (hours) |
| gm_gap_dur_hrs, gs_gap_dur_hrs | Geometric mean/std of gap durations (hours) |
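The day-level coverage metrics can be re-derived from a patient's timestamps. The sketch below assumes that calendar_span is the inclusive number of calendar days between the first and last timestamp. That definition is an assumption, and `day_coverage` is a hypothetical helper, not part of the BabelBetes API.

```python
from datetime import datetime


def day_coverage(timestamps: list[datetime], row_count: int) -> dict[str, float]:
    """Hypothetical re-derivation of the day-level coverage metrics for one patient.

    Assumes calendar_span is the inclusive number of calendar days between the
    first and last timestamp; the real implementation may define it differently.
    """
    days = {ts.date() for ts in timestamps}
    calendar_span = (max(days) - min(days)).days + 1
    patient_days = len(days)
    return {
        "patient_days": patient_days,
        "data_fraction_days": patient_days / calendar_span,
        "missing_days": calendar_span - patient_days,
        "samples_per_day": row_count / calendar_span,
    }


# Readings on Jan 1, Jan 2 and Jan 4: a 4-day span with one empty day.
ts = [
    datetime(2024, 1, 1, 8, 0),
    datetime(2024, 1, 1, 20, 0),
    datetime(2024, 1, 2, 9, 30),
    datetime(2024, 1, 4, 23, 55),
]
print(day_coverage(ts, row_count=len(ts)))
# {'patient_days': 3, 'data_fraction_days': 0.75, 'missing_days': 1, 'samples_per_day': 1.0}
```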
CGM-specific additions:
| Metric | Description |
|---|---|
| tir | Time-in-range fraction (70–180 mg/dL) |
| tar | Time-above-range fraction (> 180 mg/dL) |
| tbr | Time-below-range fraction (< 70 mg/dL) |
| outlier_low | Count of readings < 40 mg/dL |
| outlier_high | Count of readings > 400 mg/dL |
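The sketch below applies the thresholds from the table to a list of readings. It makes two assumptions that the table does not confirm. First, "time" fractions are approximated by the fraction of non-NaN readings. Second, the 70–180 mg/dL range is treated as inclusive at both ends. `cgm_range_metrics` is a hypothetical helper.

```python
import math


def cgm_range_metrics(values: list[float]) -> dict[str, float]:
    """Hypothetical sketch of the CGM range metrics for one patient.

    Assumptions: fractions are taken over non-NaN readings (a reading-count
    proxy for time), and 70-180 mg/dL is inclusive at both ends.
    """
    valid = [v for v in values if not math.isnan(v)]
    n = len(valid)
    if n == 0:
        raise ValueError("no non-NaN readings")
    return {
        "tir": sum(70 <= v <= 180 for v in valid) / n,
        "tar": sum(v > 180 for v in valid) / n,
        "tbr": sum(v < 70 for v in valid) / n,
        "outlier_low": sum(v < 40 for v in valid),
        "outlier_high": sum(v > 400 for v in valid),
    }


readings = [55.0, 70.0, 120.0, 180.0, 250.0, float("nan"), 35.0, 410.0]
print(cgm_range_metrics(readings))
```

With these boundaries, tbr, tir and tar partition the valid readings, so they always sum to 1. Outliers are counted in addition to the range fractions rather than excluded from them.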
Gap thresholds used to split continuous chunks: CGM = 30 min, basal = 6 hr, bolus = 16 hr.
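The sketch below shows how a gap threshold splits a patient's samples into chunks and gaps. A new chunk starts whenever two consecutive samples are further apart than the threshold. `split_chunks` is a hypothetical helper. It also assumes that data_fraction is total chunk duration divided by the first-to-last span.

```python
from datetime import datetime, timedelta


def split_chunks(timestamps: list[datetime], max_gap: timedelta) -> tuple[list[float], list[float]]:
    """Hypothetical helper: split timestamps into continuous chunks.

    A new chunk starts whenever consecutive samples are more than max_gap apart;
    those intervals are the gaps. Returns (chunk_hours, gap_hours).
    """
    ts = sorted(timestamps)
    chunk_hours: list[float] = []
    gap_hours: list[float] = []
    start = ts[0]
    for prev, cur in zip(ts, ts[1:]):
        if cur - prev > max_gap:
            # Close the current chunk at the last sample before the gap.
            chunk_hours.append((prev - start).total_seconds() / 3600)
            gap_hours.append((cur - prev).total_seconds() / 3600)
            start = cur
    chunk_hours.append((ts[-1] - start).total_seconds() / 3600)
    return chunk_hours, gap_hours


# CGM every 5 min from 00:00 to 01:00, nothing for 2 h, then 03:00 to 03:30.
t0 = datetime(2024, 1, 1)
samples = [t0 + timedelta(minutes=5 * i) for i in range(13)]
samples += [t0 + timedelta(hours=3, minutes=5 * i) for i in range(7)]

chunks, gaps = split_chunks(samples, max_gap=timedelta(minutes=30))  # CGM threshold
span_hours = (max(samples) - min(samples)).total_seconds() / 3600
print(len(chunks), chunks, gaps)          # 2 [1.0, 0.5] [2.0]
print(f"{sum(chunks) / span_hours:.3f}")  # 0.429
```

The same samples would form a single chunk under the 6 hr basal or 16 hr bolus thresholds. This is why sparse data types use much larger thresholds than CGM.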
TDD metrics (in patient_stats, data_type="tdd"):
basal_gm, basal_gs, basal_min, basal_max, basal_nan_count, bolus_gm, bolus_gs, bolus_min, bolus_max, bolus_nan_count, tdd_gm, tdd_gs, tdd_min, tdd_max, tdd_nan_count, bolus_basal_ratio.
Study-level metrics (in study_stats):
Every per-patient metric is aggregated across patients as a geometric mean (_study_gm) and geometric std (_study_gs). Additionally:
| Metric | Description |
|---|---|
| patient_count | Number of patients per (study, data_type) |
| patient_days | Sum of per-patient patient_days |
| complete_days | Sum of days where CGM + bolus + basal all present |
| age_min, age_max, age_gm, age_std | Age statistics (when age data is available) |
NaN in study-level stats
_study_gs is NaN for studies with a single patient (std is undefined for n=1), and _study_gm is NaN
for metrics that are zero or negative for all patients (e.g. duplicate_count), because the geometric mean requires positive values.
In both cases the NaN row is dropped, so the metric is absent from the output.
API Reference
babelbetes.survey.compute
aggregate_study_stats(patient_stats_df)
Aggregate per-patient stats to study level using vectorised groupby.
Every per-patient metric is aggregated to geometric mean and std across
patients, producing two output metrics suffixed _study_gm / _study_gs.
For example, row_count becomes row_count_study_gm and row_count_study_gs
— it is not summed, so it does not give the total row count for the study.
To get study-level totals, use the explicitly summed metrics below.
Explicitly summed (appear without a suffix):
- patient_days: total patient-days across all patients per data_type
- complete_days: total complete patient-days (CGM + bolus + basal), summed across all patients
Always present (never NaN):
- patient_count: number of patients in each (study, data_type) group
May be NaN and are dropped from the output:
- _study_gs for any metric where a study has only one patient (std undefined)
- _study_gm for any metric that is zero or negative for all patients (geometric mean requires positive values; e.g. duplicate_count_study_gm is absent when no patient has any duplicates)
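The sketch below aggregates one metric across patients into a geometric mean and geometric std. The source does not confirm two details, so they are assumptions here. Non-positive values are excluded before taking logs. The geometric std uses the sample (n - 1) standard deviation of the logs, which is why it is undefined for a single patient. `study_gm_gs` is a hypothetical helper.

```python
import math
import statistics


def study_gm_gs(patient_values: list[float]) -> dict[str, float]:
    """Hypothetical sketch of the _study_gm / _study_gs aggregation for one metric.

    Assumptions (not confirmed by the source): non-positive values are excluded
    before taking logs, and the geometric std is exp of the sample (n - 1) std of
    the log values. Undefined results are omitted rather than stored as NaN.
    """
    positive = [v for v in patient_values if v > 0]
    out: dict[str, float] = {}
    if positive:
        out["study_gm"] = statistics.geometric_mean(positive)
    if len(positive) >= 2:
        logs = [math.log(v) for v in positive]
        out["study_gs"] = math.exp(statistics.stdev(logs))
    return out


print(study_gm_gs([2.0, 8.0]))  # study_gm (about 4.0) and study_gs
print(study_gm_gs([5.0]))       # single patient: only study_gm
print(study_gm_gs([0.0, 0.0]))  # {} (e.g. duplicate_count with no duplicates)
```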
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| patient_stats_df | DataFrame | DataFrame with columns | required |
Returns:
| Type | Description |
|---|---|
| list[dict] | List of dicts with keys |
| list[dict] | (no |
compute_age_stats(store)
Study-level age statistics: min, max, geometric mean, and std.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| store | dict[str, DataFrame] | Dict mapping data type to DataFrame. Uses the | required |
Returns:
| Name | Type | Description |
|---|---|---|
| | list[dict] | List of dicts with keys |
| Metrics | list[dict] | |
compute_basal_stats(df)
Per-patient basal stats: value metrics and temporal coverage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Basal DataFrame. Required columns: | required |
Returns:
| Type | Description |
|---|---|
| list[dict] | Long-format records with columns |
compute_bolus_stats(df)
Per-patient bolus stats: value metrics and temporal coverage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Bolus DataFrame. Required columns: | required |
Returns:
| Type | Description |
|---|---|
| list[dict] | Long-format records with columns |
compute_cdf_quantiles(store, verbose=False)
Pre-compute CDF quantiles per study for CGM, bolus, and basal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| store | dict[str, DataFrame] | Dict mapping data type to DataFrame. Uses keys | required |
| verbose | bool | Print per-data-type progress. | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns |
| DataFrame | 401 rows per ( |
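The sketch below summarises a distribution as a fixed set of quantiles, which is what makes it cheap to store a CDF in a survey. It assumes the 401 rows correspond to 401 evenly spaced quantile levels from 0 to 1 (a step of 0.0025). That spacing is an assumption, and `cdf_quantiles` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Assumption: 401 evenly spaced quantile levels from 0 to 1 (step 0.0025).
LEVELS = np.linspace(0.0, 1.0, 401)


def cdf_quantiles(values: pd.Series, study: str, data_type: str) -> pd.DataFrame:
    """Hypothetical sketch: summarise one study's value distribution as quantiles."""
    clean = values.dropna().to_numpy()
    return pd.DataFrame({
        "study": study,
        "data_type": data_type,
        "quantile_level": LEVELS,
        "value": np.quantile(clean, LEVELS),
    })


rng = np.random.default_rng(0)
cgm = pd.Series(rng.normal(150, 40, size=10_000))
q = cdf_quantiles(cgm, study="StudyA", data_type="cgm")
print(len(q))  # 401
```

Plotting value on the x axis against quantile_level on the y axis reproduces the empirical CDF. The survey therefore needs only 401 rows per group instead of every reading.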
compute_cgm_stats(df)
Per-patient CGM stats: value metrics, temporal coverage, TIR/TAR/TBR, and outliers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | CGM DataFrame. Required columns: | required |
Returns:
| Type | Description |
|---|---|
| list[dict] | Long-format records with columns |
compute_complete_days(store)
Per-patient count of days where CGM, bolus, and basal all have at least one sample.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| store | dict[str, DataFrame] | Dict mapping data type to DataFrame. Uses keys | required |
Returns:
| Type | Description |
|---|---|
| list[dict] | Long-format records with columns |
| list[dict] | is missing from the store. |
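A complete day for one patient is the intersection of the calendar days covered by each of the three data types. The sketch below uses a hypothetical helper, `complete_days`, and assumes that days are taken directly from the naive timestamps. The study-level complete_days metric then sums these per-patient counts.

```python
from datetime import date, datetime


def complete_days(cgm: list[datetime], bolus: list[datetime], basal: list[datetime]) -> set[date]:
    """Hypothetical helper: calendar days on which all three data types have a sample."""
    cgm_days = {t.date() for t in cgm}
    bolus_days = {t.date() for t in bolus}
    basal_days = {t.date() for t in basal}
    # A day is complete only if it appears in all three sets.
    return cgm_days & bolus_days & basal_days


cgm = [datetime(2024, 1, d, 12) for d in (1, 2, 3, 4)]
bolus = [datetime(2024, 1, d, 8) for d in (1, 3, 4)]
basal = [datetime(2024, 1, d, 0, 30) for d in (1, 2, 3)]
days = complete_days(cgm, bolus, basal)
print(len(days))  # 2 (Jan 1 and Jan 3)
```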
compute_gap_durations(store)
Raw gap and chunk durations per patient for CDF plotting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| store | dict[str, DataFrame] | Dict mapping data type to DataFrame. Uses keys | required |
Returns:
| Type | Description |
|---|---|
| dict[str, DataFrame] | Dict keyed by data type. Each value is a DataFrame with columns |
compute_tdd_per_patient(store, verbose=False)
Total Daily Dose (TDD) per patient and calendar day, for each study.
Requires both bolus and basal keys in the store.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| store | dict[str, DataFrame] | Dict mapping data type to DataFrame. Uses keys | required |
| verbose | bool | Print per-study progress. | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | Wide DataFrame with columns |
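The sketch below illustrates one way to derive daily basal, bolus, and total insulin for a single patient. It rests on assumptions that do not come from the BabelBetes source. Boluses are summed per calendar day. Each basal rate (U/hr) is held until the next basal record and integrated as a step function, with segments split at midnight. The last basal record contributes nothing because its end time is unknown. `daily_tdd` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_tdd(bolus: list[tuple[datetime, float]],
              basal: list[tuple[datetime, float]]) -> dict[date, dict[str, float]]:
    """Hypothetical sketch of daily basal, bolus and total insulin for one patient.

    Assumptions: boluses are summed per calendar day; each basal rate (U/hr) is
    held until the next basal record and integrated, splitting at midnight; the
    last basal record contributes nothing because its end time is unknown.
    """
    basal_u: dict[date, float] = defaultdict(float)
    rows = sorted(basal)
    for (start, rate), (end, _) in zip(rows, rows[1:]):
        t = start
        while t < end:
            midnight = datetime.combine(t.date() + timedelta(days=1), datetime.min.time())
            seg_end = min(end, midnight)
            basal_u[t.date()] += rate * (seg_end - t).total_seconds() / 3600
            t = seg_end

    bolus_u: dict[date, float] = defaultdict(float)
    for t, units in bolus:
        bolus_u[t.date()] += units

    days = sorted(set(basal_u) | set(bolus_u))
    return {
        d: {"basal": basal_u[d], "bolus": bolus_u[d], "total": basal_u[d] + bolus_u[d]}
        for d in days
    }


# Basal 1.0 U/hr from 22:00, then 0.5 U/hr from 04:00 until a record at 10:00.
basal = [
    (datetime(2024, 1, 1, 22, 0), 1.0),
    (datetime(2024, 1, 2, 4, 0), 0.5),
    (datetime(2024, 1, 2, 10, 0), 0.5),
]
bolus = [(datetime(2024, 1, 1, 19, 0), 4.0), (datetime(2024, 1, 2, 8, 0), 6.0)]
for day, doses in daily_tdd(bolus, basal).items():
    print(day, doses)
# 2024-01-01 {'basal': 2.0, 'bolus': 4.0, 'total': 6.0}
# 2024-01-02 {'basal': 7.0, 'bolus': 6.0, 'total': 13.0}
```

Splitting at midnight matters for overnight basal segments. Without it, the 22:00 to 04:00 segment would be credited entirely to January 1.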
compute_tdd_stats(tdd_df)
Per-patient TDD statistics: gm, gs, min, max, nan_count for basal, bolus, total, and ratio.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tdd_df | DataFrame | Wide DataFrame from | required |
Returns:
| Type | Description |
|---|---|
| list[dict] | Long-format records with columns |
babelbetes.survey.figures
Validation figures for the BabelBetes output validation report.
plot_cdfs(cdf_df)
CDF per study for CGM, bolus, and basal — one subplot per data type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cdf_df | DataFrame | Pre-computed quantile DataFrame from compute.compute_cdf_quantiles(). Columns: [study, data_type, quantile_level, value]. | required |
plot_complete_days_treemap(stats_df)
Treemap: total complete patient-days (CGM + bolus + basal) per study.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| stats_df | DataFrame | Columns [study, data_type, metric, value]. Uses rows where metric="complete_days", data_type="complete". | required |
plot_days_per_study(stats_df)
Grouped bar chart: patient-days per study, split by data type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| stats_df | DataFrame | Columns [study, data_type, metric, value]. Uses rows where metric="patient_days" (cgm/bolus/basal) and metric="complete_days" (data_type="complete"). | required |
plot_gap_chunk_cdfs(gap_dur_dict)
CDF of gap and chunk durations per data type, coloured by study.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gap_dur_dict | dict[str, DataFrame] | Output of compute.compute_gap_durations(). Dict keyed by data_type; each DataFrame has columns [study, patient_id, kind, dur_hrs]. | required |
plot_gm_vs_gs(patient_stats_df)
Scatter plot of per-patient geometric mean vs geometric std, coloured by study.
One subplot per data type (cgm, bolus, basal).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| patient_stats_df | DataFrame | Columns [study, patient_id, data_type, metric, value]. Uses metric="gm" (geometric mean) and metric="gs" (geometric std). | required |
plot_moving_averages(store)
Moving average of values by hour-of-day per study — one subplot per data type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| store | dict[str, DataFrame] | {data_type: df} where each df contains a study_name column. cgm df columns: patient_id, study_name, datetime, cgm (float, mg/dL); bolus df columns: patient_id, study_name, datetime, bolus (float, U); basal df columns: patient_id, study_name, datetime, basal_rate (float, U/hr). | required |
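This figure needs raw timestamps, as the note on --data-dir explains. The sketch below shows only the aggregation step: a plain mean per (study, hour of day) computed with pandas from a hypothetical CGM frame in the documented input shape. The smoothing window the real figure applies is not documented here, so the sketch does not attempt one.

```python
import pandas as pd

# Hypothetical CGM rows in the documented plot_moving_averages input shape.
cgm = pd.DataFrame({
    "patient_id": ["P1", "P1", "P2", "P2", "P3"],
    "study_name": ["StudyA", "StudyA", "StudyA", "StudyA", "StudyB"],
    "datetime": pd.to_datetime([
        "2024-01-01 07:10", "2024-01-01 07:40", "2024-01-02 07:05",
        "2024-01-02 13:00", "2024-01-03 07:30",
    ]),
    "cgm": [100.0, 140.0, 120.0, 180.0, 90.0],
})

# Mean value per (study, hour of day), one column per study.
profile = (
    cgm.assign(hour=cgm["datetime"].dt.hour)
    .groupby(["study_name", "hour"])["cgm"]
    .mean()
    .unstack("study_name")
)
print(profile)
```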
plot_subjects_per_study(stats_df)
Bar chart: number of patients per study.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| stats_df | DataFrame | Columns [study, data_type, metric, value]. Uses rows where metric="patient_count", data_type="all". | required |
plot_tdd_cdfs(tdd_df)
CDF per study for basal TDD, bolus TDD, and total TDD.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tdd_df | DataFrame | Columns [study, patient_id, date, basal, bolus, total]. basal/bolus/total are daily insulin doses in U/day. | required |
plot_tdd_split(patient_stats_df, relative=False)
Stacked bar chart of mean daily basal vs bolus TDD per study.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| patient_stats_df | DataFrame | Columns [study, patient_id, data_type, metric, value]. Uses data_type='tdd', metrics 'basal_gm' and 'bolus_gm'. | required |
| relative | bool | When True, show percentage split instead of absolute U/day using a seaborn grouped bar chart. A dashed line at 50% is drawn for reference. | False |
babelbetes.survey.survey
list_cdf_quantile_surveys()
Return all CDF quantile survey paths sorted chronologically.
list_patient_stats_surveys()
Return all patient stats survey paths sorted chronologically.
list_study_stats_surveys()
Return all study stats survey paths sorted chronologically (oldest first).
list_tdd_surveys()
Return all TDD survey paths sorted chronologically.
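Every survey run writes its four files with the same YYYYMMDD_HHMMSS prefix. That format sorts lexicographically in chronological order, so sorting file names is enough to order surveys and pick the latest one. The sketch below shows this with a hypothetical helper, `latest_survey`, in a temporary directory. It illustrates the ordering only and is not the module's implementation.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_survey(survey_dir: Path, kind: str) -> Optional[Path]:
    """Hypothetical helper: newest '<ts>_<kind>.parquet' file in survey_dir.

    YYYYMMDD_HHMMSS timestamps sort lexicographically in chronological order,
    so sorting by file name is enough (oldest first, newest last).
    """
    paths = sorted(survey_dir.glob(f"*_{kind}.parquet"))
    return paths[-1] if paths else None


with tempfile.TemporaryDirectory() as tmp:
    survey_dir = Path(tmp)
    # Two survey runs; every run writes four files sharing one timestamp.
    for ts in ("20250101_000000", "20250201_000000"):
        for kind in ("study_stats", "patient_stats", "tdd", "cdf_quantiles"):
            (survey_dir / f"{ts}_{kind}.parquet").touch()
    newest = latest_survey(survey_dir, "study_stats")
    newest_name = newest.name if newest else None

print(newest_name)  # 20250201_000000_study_stats.parquet
```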
load_cdf_quantiles(path=None)
Load a CDF quantiles survey.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str \| None | Path to a specific Parquet file. Defaults to the latest survey. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns |
load_patient_stats(path=None)
Load a patient stats survey.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str \| None | Path to a specific Parquet file. Defaults to the latest survey. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns |
load_study_stats(path=None)
Load a study stats survey.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str \| None | Path to a specific Parquet file. Defaults to the latest survey. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns |
load_tdd(path=None)
Load a TDD survey.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str \| None | Path to a specific Parquet file. Defaults to the latest survey. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns |
save_cdf_quantiles(df, survey_id=None)
Save pre-computed CDF quantiles as a Parquet survey.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame with columns | required |
| survey_id | str \| None | Optional timestamp string. Generated if not provided. | None |
Returns:
| Type | Description |
|---|---|
| Path | Path to the saved Parquet file. |
save_patient_stats(records, survey_id=None)
Save per-patient stats as a long-format Parquet survey.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| records | list[dict] | List of | required |
| survey_id | str \| None | Optional timestamp string. Generated if not provided. | None |
Returns:
| Type | Description |
|---|---|
| Path | Path to the saved Parquet file. |
save_study_stats(records, survey_id=None)
Save scalar study-level stats as a long-format Parquet survey.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| records | list[dict] | List of | required |
| survey_id | str \| None | Optional timestamp string (YYYYMMDD_HHMMSS). Generated if not provided. | None |
Returns:
| Type | Description |
|---|---|
| Path | Path to the saved Parquet file. |
save_tdd(df, survey_id=None)
Save per-patient daily TDD as a wide-format Parquet survey.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Wide DataFrame with columns | required |
| survey_id | str \| None | Optional timestamp string. Generated if not provided. | None |
Returns:
| Type | Description |
|---|---|
| Path | Path to the saved Parquet file. |
babelbetes.survey.diff
diff_study_stats(snap_a, snap_b)
Diff two study-level stats surveys.
Uses an outer merge on (study, data_type, metric) so that:
- New metrics in snap_b appear with value_before=NaN (status='added')
- Metrics only in snap_a appear with value_after=NaN (status='removed')
- Metrics in both get delta and pct_change computed
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| snap_a | DataFrame | Earlier survey DataFrame from survey.load_study_stats() | required |
| snap_b | DataFrame | Later survey DataFrame from survey.load_study_stats() | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns: study, data_type, metric, value_before, value_after, delta, pct_change, status, flagged |
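The outer-merge diff described above can be reproduced with pandas `merge(how="outer", indicator=True)`. The sketch below is a hypothetical re-implementation, not the library code. It assumes that pct_change is a fraction relative to |value_before| and that a row is flagged when |pct_change| exceeds 0.05, matching the 5% threshold the CLI reports. The 'changed'/'unchanged' status for rows present in both surveys is also an assumption, because the source only documents 'added' and 'removed'.

```python
import numpy as np
import pandas as pd

KEYS = ["study", "data_type", "metric"]


def diff_stats(snap_a: pd.DataFrame, snap_b: pd.DataFrame, threshold: float = 0.05) -> pd.DataFrame:
    """Hypothetical re-implementation of the documented outer-merge diff.

    Assumptions: pct_change is a fraction relative to |value_before|, a row is
    flagged when |pct_change| > threshold, and rows present in both surveys get
    status 'changed' or 'unchanged'.
    """
    merged = pd.merge(
        snap_a[KEYS + ["value"]].rename(columns={"value": "value_before"}),
        snap_b[KEYS + ["value"]].rename(columns={"value": "value_after"}),
        on=KEYS,
        how="outer",
        indicator=True,  # adds a '_merge' column: left_only / right_only / both
    )
    merged["delta"] = merged["value_after"] - merged["value_before"]
    merged["pct_change"] = merged["delta"] / merged["value_before"].abs()
    merged["status"] = np.select(
        [merged["_merge"] == "left_only", merged["_merge"] == "right_only", merged["delta"] == 0],
        ["removed", "added", "unchanged"],
        default="changed",
    )
    # NaN pct_change (added/removed rows) compares False, so those rows are not flagged.
    merged["flagged"] = merged["pct_change"].abs() > threshold
    return merged.drop(columns="_merge")


before = pd.DataFrame({
    "study": ["S1"] * 3,
    "data_type": ["cgm"] * 3,
    "metric": ["tir_study_gm", "row_count_study_gm", "nan_count_study_gm"],
    "value": [0.60, 1000.0, 5.0],
})
after = pd.DataFrame({
    "study": ["S1"] * 3,
    "data_type": ["cgm"] * 3,
    "metric": ["tir_study_gm", "row_count_study_gm", "chunk_count_study_gm"],
    "value": [0.66, 1010.0, 3.0],
})
result = diff_stats(before, after)
print(result[["metric", "status", "pct_change", "flagged"]])
```

Here tir_study_gm moves by about 10% and is flagged. row_count_study_gm moves by 1% and is not. The removed and added metrics are reported through status rather than flagged.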
format_diff_report(diff_df)
Format a diff DataFrame as a human-readable text table.