Code Reference
This part of the project documentation focuses on
an information-oriented approach. Use it as a
reference for the technical implementation of the
BabelBetes project code.
babelbetes.run_functions
run_functions.py
This script performs data normalization on raw study data found in the data/raw directory.
Execution
python run_functions.py
Process Overview:
1. Identifies the appropriate handler class (subclass of StudyDataset) for each folder in the data/raw directory (see supported studies).
2. Loads the study data into memory.
3. Extracts bolus, basal, CGM event histories, and age data into a standardized format (see Output Format).
4. Saves the extracted data as CSV files.
Output format:
The output format is standardized across all studies and follows the definitions of the studydataset base class.
Boluses
bolus_history.csv: Event stream of all bolus delivery events. Standard boluses are assumed to be delivered immediately.
| Column Name | Type | Description |
|---|---|---|
| patient_id | str | Patient ID |
| datetime | pd.Timestamp | Datetime of the bolus event |
| bolus | float | Actual delivered bolus amount in units |
| delivery_duration | pd.Timedelta | Duration of the bolus delivery |
Basal Rates
basal_history.csv: Event stream of basal rates, accounting for temporary basal adjustments, pump suspends, and closed-loop modes. The basal rates are active until the next rate is reported.
| Column Name | Type | Description |
|---|---|---|
| patient_id | str | Patient ID |
| datetime | pd.Timestamp | Datetime of the basal rate start event |
| basal_rate | float | Basal rate in units per hour |
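Because each basal rate stays active until the next reported rate, the duration of every rate segment can be recovered from the event stream itself. A minimal sketch on toy data (not part of the BabelBetes API):

```python
import pandas as pd

# Toy basal event stream: each rate stays active until the next reported rate.
basal = pd.DataFrame({
    "patient_id": ["p1", "p1", "p1"],
    "datetime": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 06:00", "2024-01-01 08:00"]),
    "basal_rate": [0.5, 0.0, 1.2],  # units/hour; 0.0 encodes a pump suspend
})

# Duration each rate was active: time until the next event within the same patient.
next_start = basal.groupby("patient_id")["datetime"].shift(-1)
basal["active_for"] = next_start - basal["datetime"]
```

The last segment per patient has no successor event, so its duration is NaT and must be closed by convention downstream.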
CGM (Continuous Glucose Monitor)
cgm_history.csv: Event stream of CGM values.
| Column Name | Type | Description |
|---|---|---|
| patient_id | str | Patient ID |
| datetime | pd.Timestamp | Datetime of the CGM measurement |
| cgm | float | CGM value in mg/dL |
Age Data
age_data.csv: Patient age at study enrollment/start.
| Column Name | Type | Description |
|---|---|---|
| patient_id | str | Patient ID |
| age | float | Patient age at study enrollment/start |
Output Files:
For each study, the dataframes are saved in the data/out/<study-name>/ folder:
- To reduce file size, the data is saved in a compressed format using gzip.
- datetimes and timedeltas are saved as unix timestamps (seconds) and integers (seconds) respectively.
- boluses and basals are rounded to 4 decimal places
- cgm values are converted to integers
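The serialization rules above (unix-second timestamps, integer-second durations, rounded doses) can be sketched in plain pandas; column names follow the output format described earlier, but this is an illustration, not the BabelBetes writer itself:

```python
import pandas as pd

bolus = pd.DataFrame({
    "patient_id": ["p1"],
    "datetime": pd.to_datetime(["2024-01-01 12:00:00"]),
    "bolus": [2.123456],
    "delivery_duration": [pd.Timedelta(minutes=30)],
})

out = bolus.copy()
# Datetimes -> unix timestamps (seconds), timedeltas -> integer seconds.
out["datetime"] = (out["datetime"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
out["delivery_duration"] = out["delivery_duration"].dt.total_seconds().astype(int)
# Doses rounded to 4 decimal places.
out["bolus"] = out["bolus"].round(4)
# Compressed CSV output.
out.to_csv("bolus_history.csv.gz", index=False, compression="gzip")
```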
main(load_subset=False, remove_repetitive=True, input_dir=None, output_dir=None, studies=None, data_types=None)
Main function to process study data folders.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| load_subset | bool | If True, runs the script on a limited amount of data (e.g. skipping rows). | False |
| input_dir | str | Custom input directory path. Defaults to 'data/raw'. | None |
| output_dir | str | Custom output directory path. Defaults to 'data/out'. | None |
| studies | list | List of study names to process. If None, all available studies will be processed. Available studies: IOBP2, Flair, PEDAP, DCLP3, DCLP5, ReplaceBG, Loop, T1DEXI, T1DEXIP. | None |
| data_types | list | List of data types to extract ('cgm', 'bolus', 'basal', 'age'). If None, all types are extracted. | None |
Logs
- Information about the current working directory and paths being used.
- Warnings for folders that do not match any known study patterns.
- Errors if no supported studies are found.
- Progress of processing each matched study folder.
process_folder(study, store, progress, remove_repetitive, data_types)
Processes the data for a given study by loading, extracting, and saving bolus, basal, CGM, and age events.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| study | StudyDataset | Study instance to extract data from. | required |
| store | ParquetStore | Store to write data to. | required |
| progress | tqdm | Progress bar to update. | required |
| remove_repetitive | bool | Whether to drop repetitive basal values. | required |
| data_types | list | Data types to extract ('cgm', 'bolus', 'basal', 'age'). | required |
babelbetes.studies.studydataset
StudyDataset
Abstract base class for clinical diabetes datasets with CGM, bolus, basal, and age data.
Subclasses implement four abstract methods:
- _extract_bolus_event_history: Return bolus events as a DataFrame.
- _extract_basal_event_history: Return basal rate events as a DataFrame.
- _extract_cgm_history: Return CGM measurements as a DataFrame.
- _extract_age_data: Return patient age at enrollment as a DataFrame.
Public properties (bolus, basal, cgm, age) validate output against pandera schemas
and cache results via cached_property. Do not override them; override the private
_extract_* methods instead.
For memory management when processing multiple studies, declare raw file cache attributes
in _raw_attrs and call unload_raw() after extraction is complete.
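The override pattern described above can be sketched as a skeleton; this is illustrative only, not the actual BabelBetes base class (pandera schema validation is replaced by a plain column check):

```python
from abc import ABC, abstractmethod
from functools import cached_property
import pandas as pd

class StudyDatasetSketch(ABC):
    """Illustrative skeleton: subclasses override _extract_*, callers use cached properties."""

    _raw_attrs: tuple = ()  # names of raw-file cache attributes freed by unload_raw()

    @abstractmethod
    def _extract_cgm_history(self) -> pd.DataFrame: ...

    @cached_property
    def cgm(self) -> pd.DataFrame:
        df = self._extract_cgm_history()
        # The real class validates against a pandera schema here.
        assert {"patient_id", "datetime", "cgm"} <= set(df.columns)
        return df

    def unload_raw(self) -> None:
        # Free raw file caches; derived cached properties are kept.
        for attr in self._raw_attrs:
            setattr(self, attr, None)

class ToyStudy(StudyDatasetSketch):
    def _extract_cgm_history(self) -> pd.DataFrame:
        return pd.DataFrame({"patient_id": ["p1"],
                             "datetime": [pd.Timestamp("2024-01-01")],
                             "cgm": [110.0]})
```

Because cgm is a cached_property, the extraction runs once per instance and repeated access returns the same DataFrame object.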
age
property
Patient age at enrollment as a validated, cached DataFrame.
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Columns: patient_id (str), age (int, 0–120). |
basal
property
Basal rate event history as a validated, cached DataFrame.
Notes
- Zero basal rates (pump suspends) must be included.
- Rates are active until the next event.
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Columns: patient_id (str), datetime (datetime64), basal_rate (float, units/hour). |
bolus
property
Bolus event history as a validated, cached DataFrame.
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Columns: patient_id (str), datetime (datetime64), bolus (float, units), delivery_duration (timedelta). Standard boluses have delivery_duration of 0 seconds. |
cgm
property
CGM measurements as a validated, cached DataFrame.
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Columns: patient_id (str), datetime (datetime64), cgm (float, mg/dL). |
unload_raw()
Free raw file caches from memory.
Call this after all needed data types have been extracted to release the memory
used by raw file DataFrames. Derived outputs (bolus, basal, cgm, age) are kept.
Raw attributes to clear are declared by subclasses in _raw_attrs.
babelbetes.studies.iobp2.IOBP2
Bases: StudyDataset
babelbetes.studies.flair.Flair
Bases: StudyDataset
get_reported_tdds(method='max')
Retrieves reported total daily doses (TDDs) based on the specified method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | The method used to retrieve the TDDs: 'max' returns the TDD with the maximum reported value for each patient and date; 'sum' returns the sum of all reported TDDs for each patient and date; 'latest' returns the TDD with the latest reported datetime for each patient and date; 'all' returns all TDDs without any grouping or filtering. | 'max' |
|
Returns:
| Type | Description |
|---|---|
| DataFrame | The DataFrame containing the retrieved TDDs based on the specified method. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the method is not one of: 'max', 'sum', 'latest', 'all'. |
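The four aggregation modes can be sketched with a pandas groupby on a toy TDD frame; the column names here are assumptions for illustration, not the Flair study schema:

```python
import pandas as pd

tdds = pd.DataFrame({
    "patient_id": ["p1", "p1", "p1"],
    "date": ["2024-01-01"] * 3,
    "datetime": pd.to_datetime(["2024-01-01 08:00", "2024-01-01 12:00", "2024-01-01 20:00"]),
    "tdd": [30.0, 42.0, 35.0],
})

def reported_tdds(df, method="max"):
    """Sketch of the four retrieval modes described above."""
    if method == "all":
        return df
    g = df.groupby(["patient_id", "date"])
    if method == "max":
        return df.loc[g["tdd"].idxmax()]        # row with the maximum reported TDD
    if method == "sum":
        return g["tdd"].sum().reset_index()     # sum of all reported TDDs
    if method == "latest":
        return df.loc[g["datetime"].idxmax()]   # row with the latest reported datetime
    raise ValueError("method must be one of: 'max', 'sum', 'latest', 'all'")
```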
babelbetes.studies.pedap.PEDAP
Bases: StudyDataset
babelbetes.studies.dclp.DCLP3
Bases: StudyDataset
babelbetes.studies.dclp.DCLP5
Bases: DCLP3
babelbetes.studies.loop.Loop
Bases: StudyDataset
babelbetes.studies.t1dexi.T1DEXI
Bases: StudyDataset
babelbetes.studies.t1dexi.T1DEXIP
Bases: T1DEXI
babelbetes.studies.replacebg.ReplaceBG
Bases: StudyDataset
babelbetes.src
cdf
get_cdf(data, normalize=True)
Get the Cumulative Distribution Function (CDF) of a data array.
Parameters: data (array-like): The data array for which the CDF is to be calculated.
Returns: tuple: A tuple containing two elements: data_sorted (array-like), the sorted data array, and cdf (array-like), the CDF values.
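An empirical CDF matching this description (sorted data plus cumulative fractions) can be computed as follows; this is a sketch, not necessarily the BabelBetes implementation:

```python
import numpy as np

def empirical_cdf(data, normalize=True):
    """Return (sorted data, cumulative counts or fractions)."""
    data_sorted = np.sort(np.asarray(data))
    cdf = np.arange(1, len(data_sorted) + 1)  # cumulative count at each sorted value
    if normalize:
        cdf = cdf / len(data_sorted)          # fractions in (0, 1]
    return data_sorted, cdf
```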
plot_cdf(data, title='CDF', xlabel='Value', ylabel='CDF', ax=None, log_scaled=False, percent_right_axis=False, **kwargs)
Plots the Cumulative Distribution Function (CDF) of a data array.
Parameters:
- data (array-like): The data array for which the CDF is to be plotted.
- title (str): The title of the plot.
- xlabel (str): The label for the x-axis.
- ylabel (str): The label for the y-axis.
- ax (matplotlib.axes._subplots.AxesSubplot): Optional axis to plot on.
- log_scaled (bool): Whether to apply log scale to the y-axis.
- percent_right_axis (bool): Whether to show the right axis in percent.
data_store
ParquetStore
Read/write interface for the partitioned Parquet output store.
Data is stored as Hive-partitioned Parquet files under base_path,
with each data type in its own subdirectory to keep schemas homogeneous:
base_path/cgm/study_name=X/patient_id=Z/*.parquet
base_path/bolus/study_name=X/patient_id=Z/*.parquet
base_path/basal/study_name=X/patient_id=Z/*.parquet
base_path/age/study_name=X/patient_id=Z/*.parquet
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base_path | str | Root directory of the store (e.g. "data/out"). | required |
cleanup(study_name, data_types=None)
Remove existing output for a study to ensure a clean write.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| study_name | str | Study whose output should be removed. | required |
| data_types | list | Specific data types to remove. If None, removes the study from all data type directories. | None |
Returns:
| Type | Description |
|---|---|
| list | Paths that were actually removed. |
load(study=None, data_type=None, patient=None)
Load data from the store, optionally filtered by study, data type, or patient.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| study | str \| list[str] | Filter by study name(s) (e.g. "Flair" or ["Flair", "DCLP3"]). | None |
| data_type | str \| list[str] | One or more of 'cgm', 'bolus', 'basal', 'age'. A single string returns a DataFrame; a list returns a dict[str, DataFrame]. | None |
| patient | str \| list[str] | Filter by patient ID(s). | None |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | When data_type is a single string. |
| dict[str, pd.DataFrame] | When data_type is a list or None (keyed by data type). |
save(df, study_name, data_type)
Write a DataFrame to the store for a given study and data type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Data to save. | required |
| study_name | str | Study identifier (e.g. "Flair"). | required |
| data_type | str | One of 'cgm', 'bolus', 'basal', 'age'. | required |
date_helper
convert_duration_to_timedelta(duration)
Parse a duration string in the format "hours:minutes:seconds" and return a timedelta object.
Args:
- duration (str): The duration string to parse, in the form "hours:minutes:seconds".
Returns: timedelta: A timedelta object representing the parsed duration.
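Parsing an "hours:minutes:seconds" string can be sketched directly with the standard library (illustrative, not the library source):

```python
from datetime import timedelta

def to_timedelta(duration: str) -> timedelta:
    """Parse 'hours:minutes:seconds' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)
```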
parse_flair_dates(dates, format_date='%m/%d/%Y', format_time='%I:%M:%S %p')
Optimized parsing of date strings with or without time components.
drawing
create_axis()
Creates a new figure and axis for plotting.
Returns:
| Name | Type | Description |
|---|---|---|
| figure | Figure | The created figure. |
| axes | Axes | The created axis. |
drawAbsoluteBasalRates(ax, datetimes, rate, **kwargs)
Draws the absolute basal rates on the given axes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ax | Axes | The axes on which to draw the basal rates. | required |
| datetimes | array-like | An array of datetime objects representing the time points. | required |
| rate | array-like | An array of basal rates corresponding to the time points. | required |
| **kwargs | dict | Additional keyword arguments to customize the plot. | {} |
drawBasal(ax, datetimes, rates, color=colors['Basal'], **kwargs)
Draws the basal rates on the given axes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ax | Axes | The axes on which to draw the basal rates. | required |
| datetimes | list of datetime | List of datetime objects representing the time points. | required |
| rates | list of float | List of basal rates corresponding to the datetime points. | required |
| color | str | Color for the basal rates plot. Defaults to colors['Basal']. | colors['Basal'] |
| **kwargs | dict | Additional keyword arguments to customize the plot. | {} |
drawBoluses(ax, datetimes, boluses, **kwargs)
Draws insulin bolus events on a given matplotlib axis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ax | Axes | The axis on which to draw the boluses. | required |
| datetimes | list of datetime.datetime | List of datetime objects representing the times of the boluses. | required |
| boluses | list of float | List of bolus values corresponding to the datetimes. | required |
| **kwargs | dict | Additional keyword arguments passed to the ax.bar() method. | {} |
drawCGM(ax, datetimes, values, color=colors['CGM'], unit='mg/dL', target_range=True, **kwargs)
Draws CGM (Continuous Glucose Monitoring) data on the given axes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ax | Axes | The axes on which to draw the CGM data. | required |
| datetimes | list of datetime | List of datetime objects representing the time points. | required |
| values | list of float | List of glucose values corresponding to the datetime points. | required |
| color | str | Color for the CGM plot. Defaults to colors['CGM']. | colors['CGM'] |
| **kwargs | dict | Additional keyword arguments to customize the plot. | {} |
drawExtendedBoluses(ax, datetimes, boluses_units, duration, color=colors['Bolus'], **kwargs)
Draws extended boluses on the given axes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ax | Axes | The axes on which to draw the boluses. | required |
| datetimes | list of datetime | List of datetime objects representing the times of the boluses. | required |
| boluses_units | list of float | List of bolus units corresponding to each datetime. | required |
| duration | list of numpy.timedelta64 | List of delivery durations for each bolus. | required |
| color | str | Color of the boluses. Default is colors['Bolus']. | colors['Bolus'] |
| **kwargs | dict | Additional keyword arguments to pass to the bar function. | {} |
drawSuspendTimes(ax, start_date, duration)
Draws a bar on the given axis to represent suspend times.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ax | Axes | The axis on which to draw the bar. | required |
| start_date | datetime-like | The starting date and time for the bar. | required |
| duration | timedelta | The duration for which the bar extends. | required |
drawTempBasal(ax, datetimes, temp_basal_rates, temp_basal_durations, temp_basal_types, color=colors['Basal'], **kwargs)
Draws temporary basal rates on the given axes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ax | Axes | The axes on which to draw the temporary basal rates. | required |
| datetimes | list of datetime | List of datetime objects representing the times of the temporary basal rates. | required |
| temp_basal_rates | list of float | List of temporary basal rates corresponding to the datetimes. | required |
| temp_basal_durations | list of numpy.timedelta64 | List of temporary basal durations corresponding to the datetimes. | required |
| color | str | Color of the temporary basal rates. Default is colors['Basal']. | colors['Basal'] |
| **kwargs | dict | Additional keyword arguments passed to the ax.bar() method. | {} |
|
draw_presence_matrix(ax, df, x_col, y_col, offset=0, **kwargs)
Scatter-plots the unique available x_col values for each y_col group in a DataFrame.
Args:
ax (matplotlib.axes.Axes): The matplotlib Axes object to plot on.
df (pd.DataFrame): The input DataFrame containing the data to plot.
x_col (str): The column name in the DataFrame representing the x-axis values (e.g., datetime).
y_col (str): The column name in the DataFrame representing the y-axis values used for grouping (e.g., patient IDs).
offset (int, optional): An offset to apply to the y-axis values. Defaults to 0.
**kwargs: Additional keyword arguments to pass to the ax.scatter method.
Returns:
None
format_time_axis(ax, major_interval_days=1, minor_interval_hours=2, major_format='%-d/%-m/%Y', minor_format='%H:%M:%S')
Formats the x-axis of the given matplotlib axis for time series plots. Sets major ticks to days and minor ticks to hours, with appropriate formatting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ax | Axes | The axis to format. | required |
parse_duration(duration_str)
Parses a duration string in the format "HH:MM:SS" and returns a timedelta object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| duration_str | str | A string representing the duration in the format "HH:MM:SS". | required |
Returns: timedelta: A timedelta object representing the parsed duration.
find_periods
find_periods(df, value_col, time_col, start_trigger_fun, stop_trigger_fun, use_last_start_occurence=False)
Find periods in a DataFrame based on start and stop triggers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The DataFrame to search for periods. | required |
| value_col | str | The name of the column containing the trigger values. | required |
| time_col | str | The name of the column containing the time values. | required |
| start_trigger_fun | callable | Callable that identifies values marking the start of a period. | required |
| stop_trigger_fun | callable | Callable that identifies values marking the end of a period. | required |
| use_last_start_occurence | bool | If True, the last occurrence of the start trigger will be used. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| list | list | A list of named tuples representing the periods found. Each namedtuple contains: start_index (int), the index of the start trigger in the DataFrame; end_index (int), the index of the stop trigger in the DataFrame; start_time, the time value of the start trigger; end_time, the time value of the stop trigger. |
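The scan described above can be sketched as a simple state machine over the rows; this is an illustrative re-implementation on toy data (the use_last_start_occurence option is omitted), not the BabelBetes source:

```python
from collections import namedtuple
import pandas as pd

Period = namedtuple("Period", ["start_index", "end_index", "start_time", "end_time"])

def find_periods_sketch(df, value_col, time_col, start_trigger_fun, stop_trigger_fun):
    """Open a period at a start trigger, close it at the next stop trigger."""
    periods, start = [], None
    for idx, row in df.iterrows():
        if start is None and start_trigger_fun(row[value_col]):
            start = (idx, row[time_col])                      # period opens
        elif start is not None and stop_trigger_fun(row[value_col]):
            periods.append(Period(start[0], idx, start[1], row[time_col]))
            start = None                                      # period closes
    return periods

df = pd.DataFrame({"status": ["run", "suspend", "suspend", "resume", "run"],
                   "t": [0, 1, 2, 3, 4]})
found = find_periods_sketch(df, "status", "t",
                            lambda v: v == "suspend", lambda v: v == "resume")
```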
logger
Logger
get_logger(name, level=logging.DEBUG)
staticmethod
Returns a configured logger instance with the specified name and log level.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the logger. | required |
| level | int | The logging level (e.g., logging.INFO, logging.DEBUG). | DEBUG |
Returns:
| Type | Description |
|---|---|
| Logger | logging.Logger: Configured logger. |
pandas_helper
count_differences_in_duplicates(df, subset)
Counts the number of differences between duplicated rows for all columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The input DataFrame. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| series | Series | A series where the index represents column names and values represent the count of differences. |
extract_surrounding_rows(df, index, n, sort_by)
Extracts rows surrounding a given index after sorting the DataFrame by a subset of columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The input DataFrame. | required |
| index | int | The row index to center on. | required |
| n | int | The number of rows before and after the given index to extract (using logical indexing). | required |
| sort_by | list | List of column names to sort the DataFrame by. | required |
Returns: pd.DataFrame: A DataFrame containing the extracted rows.
get_df(path, usecols=None, subset=False, dtype=None, encoding=None)
Reads a data file from a given path, handling both standard file formats and files within ZIP archives.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | The file path or a path to a file inside a ZIP archive. | required |
| usecols | list | List of column names to include in the df. | None |
| subset | bool | If True, read only a subset of the data for lightweight testing. | False |
| dtype | dict | Data types to enforce for specific columns. | None |
| encoding | str | Encoding to use when reading text files (csv/txt). If None, uses pandas default (utf-8). | None |
Returns:
| Name | Type | Description |
|---|---|---|
| dataframe | DataFrame | The loaded data as a pandas DataFrame. |
get_duplicated_max_indexes(df, check_cols, max_col)
Find duplicate indexes, maximum indexes, and indexes to drop in a dataframe.
Args:
- df (pd.DataFrame): The dataframe to check for duplicates.
- check_cols (list): The columns to check for duplicates.
- max_col (str): The column to use for keeping the maximum value.
Returns: tuple: A tuple containing three elements: duplicated_indexes (np.array), indexes of duplicated rows; max_indexes (np.array), indexes of rows with the maximum value in the max_col; drop_indexes (np.array), indexes of rows to drop.
Example
df = pd.DataFrame({
    'PtID': [1, 1, 1, 2, 2, 2, 3, 3, 3, 1],
    'DataDtTm': [1, 2, 3, 1, 2, 2, 1, 1, 1, 2],
    'CGMValue': [1, 2, 3, 1, 2, 3, 4, 2, 3, 3]
})
dup_indexes, max_indexes, drop_indexes = get_duplicated_max_indexes(df, ['PtID', 'DataDtTm'], 'CGMValue')
print(df.drop(drop_indexes))
grouped_value_counts(df, group_cols, value_cols)
Count the number of NaN, Non-NaN, and Zero values in each group of a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The input DataFrame. | required |
| group_cols | str or list | The column(s) to group by. | required |
| value_cols | str or list | The column(s) to count values for. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| dataframe | DataFrame | A DataFrame containing the count of NaN, Non-NaN, and Zero values for each group. |
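Counting NaN, non-NaN, and zero values per group can be sketched with pandas named aggregation; an illustration of the idea, not the BabelBetes implementation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"patient_id": ["p1", "p1", "p2", "p2"],
                   "cgm": [110.0, np.nan, 0.0, 95.0]})

# Per-group counts of missing, present, and zero values.
counts = df.groupby("patient_id")["cgm"].agg(
    nan=lambda s: s.isna().sum(),
    non_nan=lambda s: s.notna().sum(),
    zero=lambda s: (s == 0).sum(),
)
```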
head_tail(df, n=2)
Returns the first n rows and the last n rows of a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The DataFrame to get the head and tail of. | required |
| n | int | The number of rows to return from the head and tail of the DataFrame. | 2 |
Returns:
| Name | Type | Description |
|---|---|---|
| dataframe | DataFrame | A new pandas DataFrame containing the first n and last n rows. |
overlaps(df, datetime_col, duration_col)
Check for overlapping intervals in a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | A DataFrame containing at least two columns: datetime_col, the start times of the intervals, and duration_col, the durations of the intervals. | required |
Returns: pd.Series: A boolean Series indicating whether each interval overlaps with the next interval.
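The overlap check reduces to comparing each interval's end time against the next interval's start time. A self-contained sketch of that logic (illustrative, not the library source):

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:20", "2024-01-01 01:00"]),
    "delivery_duration": pd.to_timedelta(["30min", "10min", "5min"]),
})

# An interval overlaps the next one when its end time passes the next start time.
end = df["datetime"] + df["delivery_duration"]
overlaps_next = end > df["datetime"].shift(-1)
```

The last row compares against NaT, which evaluates to False, so the final interval never reports an overlap.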
repetitive(df, datetime_col, value_col, max_duration)
Get the indexes of repetitive values in a DataFrame based on a datetime column and a value column.
Args:
- df (pd.DataFrame): The DataFrame to process.
- datetime_col (str): The name of the datetime column.
- value_col (str): The name of the value column.
- max_duration (timedelta, optional): To prevent long gaps between values, defines the maximum duration for which consecutive values are dropped. At least one value is kept whenever the duration exceeds max_duration.
Returns:
| Name | Type | Description |
|---|---|---|
| tuple | tuple | A tuple containing three elements: i_all_rep (np.array), indexes of all repetitive values; i_keep (np.array), indexes of the first occurrence of repetitive values; i_drop (np.array), indexes of values to drop (removing repetitive values after the first occurrence). |
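The core keep/drop split can be sketched by comparing each value with its predecessor; this toy version ignores the max_duration gap handling described above:

```python
import pandas as pd

s = pd.Series([0.5, 0.5, 0.5, 1.0, 1.0, 0.5])

# A value is repetitive when it equals the previous value.
is_rep = s.eq(s.shift())
i_keep = s.index[~is_rep]   # first occurrence of each run
i_drop = s.index[is_rep]    # later occurrences, candidates to drop
```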
split_groups(x, threshold)
Assigns unique group IDs based on the distance between consecutive values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Series | Series of numerical values. | required |
| threshold | same as x.diff() | The maximum distance between two consecutive values to consider them in the same group. Must be the same type as x.diff() values (e.g. int, float, Timedelta). | required |
Returns:
| Type | Description |
|---|---|
Series
|
The Series containing the data. |
Example
df = pd.DataFrame({'sensor': ['a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
                   'y': [0, 1, 2, 3, 10, 11, 12, 13, 50, 51, 70, 71]})
df['sensor_session'] = df.groupby('sensor').y.transform(lambda x: split_groups(x, 5))
start_ends = df.groupby(['sensor', 'sensor_session']).y.agg(['idxmin', 'idxmax']).reset_index()
split_sequences(df, label_col)
Assigns a unique group ID to each sequence of consecutive labels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The DataFrame containing the data. | required |
| label_col | str | The column name for the labels. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| group_ids | Series | The group IDs. |
Example
df = pd.DataFrame({'label': ['A', 'A', 'B', 'B', 'B', 'A', 'A', 'C', 'C', 'A']})
df['sequence'] = split_sequences(df, 'label')
print(df)
start_ends = df.groupby(['label', 'sequence']).apply(
    lambda group: pd.Series({'idxmin': group.index.min(), 'idxmax': group.index.max()}),
    include_groups=False).reset_index()
print(start_ends)
postprocessing
basal_transform(basal_data)
Transform the basal data by aligning timestamps and handling duplicates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| basal_data | DataFrame | A basal data dataframe containing the columns 'datetime' and 'basal_rate'. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| basal_data | DataFrame | The transformed basal-equivalent deliveries with aligned timestamps and duplicates removed. |
bolus_transform(df)
Transform the bolus data by aligning timestamps, handling duplicates, and extending boluses based on durations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | A bolus data dataframe containing the columns 'datetime', 'bolus', and 'delivery_duration'. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bolus_data | DataFrame | Bolus data resampled to 5 minutes and time-aligned at midnight, with columns: datetime, delivery. |
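Midnight-aligned 5-minute resampling of a bolus stream can be sketched with pandas resample, whose bins are anchored at the start of the day by default; this is an illustration of the binning only (extended deliveries are ignored):

```python
import pandas as pd

bolus = pd.DataFrame({
    "datetime": pd.to_datetime(["2024-01-01 07:03", "2024-01-01 07:04", "2024-01-01 12:31"]),
    "bolus": [1.0, 2.0, 4.0],
}).set_index("datetime")

# 5-minute bins anchored at midnight; doses falling in the same bin are summed.
delivery = bolus["bolus"].resample("5min").sum()
```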
cgm_transform(cgm_data)
Time aligns the cgm data to midnight with a 5 minute sampling rate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cgm_data | DataFrame | A cgm data dataframe containing the columns 'datetime' and 'cgm'. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| cgm_data | DataFrame | The transformed cgm data with aligned timestamps. |
tdd
calculate_daily_basal_dose(df)
Calculate the Total Daily Dose (TDD) of basal insulin for each day in the given DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The DataFrame containing the insulin data. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| tdds | DataFrame | A dataframe with two columns: |
Required Column Names
- datetime: The timestamp of each basal insulin rate event.
- basal_rate: The basal insulin rate event [U/hr].
calculate_daily_bolus_dose(df)
Calculate the daily bolus dose for each patient.
Parameters:
- df (pandas.DataFrame): The input DataFrame containing the columns datetime (datetime), the date and time of the bolus dose, and bolus (float), the amount of the bolus dose.
Returns: pandas.DataFrame: A DataFrame with the daily bolus dose for each patient, grouped by patient_id and date.
calculate_tdd(df_bolus, df_basal)
Calculates the total daily dose (TDD) by merging the daily basal dose and daily bolus dose.
Parameters:
- df_bolus (DataFrame): DataFrame containing the bolus dose data, with columns patient_id (int), datetime (datetime), and bolus (float), the amount of the bolus dose.
- df_basal (DataFrame): DataFrame containing the basal dose data, with columns patient_id (int), datetime (datetime), and basal_rate (float), the basal insulin rate event [U/hr].
Returns: tdd (DataFrame): DataFrame containing both the bolus and basal TDD data.
total_delivered(df, datetime_col, rate_col)
Calculate the total delivered insulin over the time intervals in the given DataFrame.
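Integrating a rate stream over its event intervals amounts to rate × time-until-next-event, summed. A simplified sketch (the final open interval is closed at the last event here, which is an assumption, not necessarily how the library handles it):

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 06:00", "2024-01-01 08:00"]),
    "basal_rate": [0.5, 0.0, 1.2],  # units/hour
})

# Hours each rate was active; the last interval contributes zero in this sketch.
hours = df["datetime"].diff().shift(-1).dt.total_seconds().div(3600).fillna(0)
total_units = (df["basal_rate"] * hours).sum()  # 0.5*6 + 0.0*2 = 3.0 units
```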