How to build a dashboard for your Peloton workout data using Dash
Visualize your workout data with Python
Like many other fitness-minded people who have been working from home, I have become obsessed with the Peloton bike I purchased back in 2020. In this tutorial, I will cover how I built my own dashboard to visualize my Peloton workout data using Dash. Dash is a low-code framework for rapidly building data apps in Python. It is written on top of Plotly and React but requires no JavaScript.
In order to follow along with this tutorial, you will need to download your workout data in CSV format by logging into your Peloton account and navigating to workouts under your profile. You can also find this code on GitHub.
Once we have our csv, we can make a directory for our project and place this file inside it. We can call this directory PelotonDash.
mkdir PelotonDash && cd PelotonDash
cp ~/Downloads/workouts.csv .
Now, we are ready to create our virtual environment and install the necessary dependencies for this project.
python -m venv venv && source venv/bin/activate
pip install pandas
pip install dash
pip install dash_bootstrap_components
Here is what we just installed:
- Pandas for loading, analyzing, and transforming our data.
- Dash for creating our web dashboard. This includes the plotly dependencies for creating our visualizations.
- dash_bootstrap_components which provides bootstrap components such as buttons and styling for our dashboard app.
Now that we have all our necessary dependencies, let's create a file named pelotondashboard.py inside this directory. Our workspace should now contain pelotondashboard.py, workouts.csv, and the venv directory.
Inside our pelotondashboard.py file, we can start importing our dependencies and loading our csv data.
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import dash_bootstrap_components as dbc
from plotly.subplots import make_subplots
from dash import Dash, dcc, html
from dash.dependencies import Input, Output, State
df = pd.read_csv('workouts.csv')
Here is what we are importing:
- pandas to load, transform and analyze our data.
- plotly.graph_objects and plotly.express are our graphing dependencies. We will be using these to plot our data. These libraries can be used on their own to create plots; here we use them together with Dash to render the plots on our web dashboard. We are also importing make_subplots, which will allow us to plot multiple series with different axes on the same graph.
- dash_bootstrap_components will be used to provide some styling to our html elements.
- dash will provide us with Dash Core Components (dcc) for interactive elements such as dropdowns and sliders, as well as Dash HTML Components (html) for creating our layout with HTML elements.
- dash.dependencies will allow us to handle callbacks in our dashboard for the interactive elements we will be creating. This will be useful for allowing us to interact with our graphs by updating the plots based on any inputs we select.
After importing dependencies, we are reading our workouts csv into our pandas DataFrame.
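Before going further, it can help to confirm that the export actually contains the columns this tutorial relies on. The sketch below uses only the standard library and a tiny inline sample. The sample row and the helper name missing_columns are made up for illustration, and the column list is simply the set of columns referenced in this article, so treat it as an assumption about your own export.

```python
import csv
import io

# Columns referenced later in this tutorial.
REQUIRED_COLUMNS = [
    "Workout Timestamp",
    "Calories Burned",
    "Total Output",
    "Length (minutes)",
    "Fitness Discipline",
    "Instructor Name",
]


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    return [name for name in required if name not in header]


# Hypothetical sample standing in for workouts.csv (it lacks 'Total Output').
sample = io.StringIO(
    "Workout Timestamp,Instructor Name,Length (minutes),Fitness Discipline,Calories Burned\n"
    "2021-01-04 18:30 (EST),Jane Doe,30,Cycling,250\n"
)
print(missing_columns(sample))  # ['Total Output']
```

To run the same check against the real file, open it with open('workouts.csv', newline='') and pass the file object to missing_columns.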
Now, we are ready to filter and group our data based on timestamps.
ts = 'Workout Timestamp'

def group_data_by_timestamp(start_date=df[ts].min(), end_date=df[ts].max(), freq: str = 'W') -> pd.DataFrame:
    """
    Filters dataframe by date range and groups by frequency of week by default.
    Returns a dataframe with Calories Burned and Total Output summed by the chosen frequency.
    Parameters
    ----------
    start_date : date The start date of the range
    end_date : date The end date of the range
    freq : str The frequency of the group. Default is weekly.
    Returns
    -------
    df : pd.DataFrame The dataframe with the summed columns.
    """
    # Copy the filtered rows so the edits below never touch the original df.
    dff = df.loc[(start_date <= df[ts]) & (df[ts] <= end_date)].copy()
    # Strip the timezone labels, e.g. "(EST)", that appear in the export.
    for tz in ['EST', 'EDT', '-04', '-05']:
        dff[ts] = dff[ts].str.replace(rf"\({tz}\)", '', regex=True)
    # Trim leftover whitespace before parsing; values that fail to parse become NaT.
    dff[ts] = pd.to_datetime(dff[ts].str.strip(), format='%Y-%m-%d %H:%M', errors='coerce')
    return dff.groupby(pd.Grouper(key=ts, freq=freq))[['Calories Burned', 'Total Output']].agg('sum').reset_index()

print(group_data_by_timestamp())
This function takes in a start_date, end_date, and frequency. In the function signature we have provided default values for start_date and end_date which will be the min and max values of timestamps corresponding to the data. Frequency will allow us to group our data by day, week, month or year. I have given this a default value of week. This function will be called with the default values when the page is initially loaded. On interaction with the dashboard, this function will be called with values we pass in through our dcc elements. Now, let us step through what is going on in the function:
- We are using df.loc to filter the DataFrame to the rows between the start and end date passed into the function.
- We are dropping timezones for the purpose of this tutorial by using the .str.replace method with a regular expression for the timezones that appeared in my data. You may replace these values with the timezones found in your own data.
- After dropping the timezones, the column is converted to datetime in the %Y-%m-%d %H:%M format.
- Finally, the data is grouped using pd.Grouper. The data is grouped on the Workout Timestamp key and with the frequency that is passed into the function. This Grouper is passed into the groupby method of the DataFrame to group the Calories Burned and Total Output columns from our DataFrame. We are chaining this with the .agg('sum') method to sum all of the values in each column based on the frequency. Lastly, we are resetting the index since we want the Workout Timestamp to be a column rather than the index.
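The timezone-stripping and parsing steps can be tried in isolation on a single value. The following is a minimal sketch using only the standard library; it assumes timestamps look like "2021-01-04 18:30 (EST)", which is inferred from the format string and timezone list used in the function rather than confirmed against every export. The helper name parse_workout_timestamp is hypothetical.

```python
import re
from datetime import datetime

# Matches a trailing parenthesised zone label such as "(EST)" or "(-05)".
# The zone list is an assumption; extend it to match your own export.
TZ_SUFFIX = re.compile(r"\s*\((?:EST|EDT|-04|-05)\)\s*$")


def parse_workout_timestamp(raw: str) -> datetime:
    """Drop the zone label and parse the remaining 'YYYY-MM-DD HH:MM' text."""
    cleaned = TZ_SUFFIX.sub("", raw).strip()
    return datetime.strptime(cleaned, "%Y-%m-%d %H:%M")


print(parse_workout_timestamp("2021-01-04 18:30 (EST)"))  # 2021-01-04 18:30:00
```

Note that this discards the zone rather than converting it, which matches the simplification made in this tutorial.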
If we print the output of the function, we see output similar to the one shown below.
   Workout Timestamp  Calories Burned  Total Output
0         2020-12-20             21.0           0.0
1         2020-12-27            100.0           0.0
2         2021-01-03            151.0           0.0
3         2021-01-10            403.0           0.0
4         2021-01-17            100.0           0.0
..               ...              ...           ...
64        2022-03-13           1718.0         528.0
65        2022-03-20            513.0         183.0
66        2022-03-27           1269.0         343.0
67        2022-04-03           1663.0         774.0
68        2022-04-10            911.0         411.0

[69 rows x 3 columns]
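Every date in the output above is a Sunday because the 'W' frequency labels each bin with the Sunday that ends the week. A small sketch with made-up workouts shows the grouping on its own; the sample values are invented purely for illustration.

```python
import pandas as pd

# Made-up workouts: two in the week ending Sunday 2021-01-10, one in the next week.
sample = pd.DataFrame({
    "Workout Timestamp": pd.to_datetime(
        ["2021-01-04 18:30", "2021-01-06 07:00", "2021-01-12 12:15"]
    ),
    "Calories Burned": [250, 100, 300],
    "Total Output": [200, 0, 280],
})

# Same pattern as group_data_by_timestamp: bin by week, then sum each column.
weekly = (
    sample.groupby(pd.Grouper(key="Workout Timestamp", freq="W"))[
        ["Calories Burned", "Total Output"]
    ]
    .sum()
    .reset_index()
)
print(weekly)
```

The first two workouts land in the row labelled 2021-01-10 and the third in the row labelled 2021-01-17.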
We are now ready to create our first plot using the data above in Plotly! We will later see how to get this same graph into our dashboard.
def create_timeseries_figure(df: pd.DataFrame, frequency = "W") -> go.Figure:
    """
    Creates a timeseries figure with two subplots. The first subplot graphs
    the Total Output for the week. The second subplot graphs the Calories Burned
    for the week.
    Parameters
    ----------
    df : pd.DataFrame The dataframe to be plotted.
    frequency : str The frequency of the group. Default is weekly.
    Returns
    -------
    go.Figure: The figure to be plotted.
    """
    switcher = {
        'D': 'Day',
        'W': 'Week',
        'M': 'Month'
    }
    freq = switcher.get(frequency, 'Week')
    line = make_subplots(specs=[[{"secondary_y": True}]])
    line.add_trace(
        go.Scatter(x=df['Workout Timestamp'], y=df['Calories Burned'],
                   marker=dict(size=10, color='MediumPurple'),
                   name='Total Calories'),
        secondary_y=False
    )
    line.add_trace(
        go.Scatter(x=df['Workout Timestamp'], y=df['Total Output'],
                   marker=dict(size=10, color='MediumSeaGreen'),
                   name='Total Output'),
        secondary_y=True
    )
    line.update_layout(
        title=f"Calories and Total Output per {freq}",
        title_x=0.5,
        yaxis_title="Calories Burned",
    )
    line.update_yaxes(title_text="Total Output", secondary_y=True)
    line.update_yaxes(title_text="Calories Burned", secondary_y=False)
    line.update_xaxes(title=f"{freq}")
    return line
Here is what we are doing in this function:
- First, we convert the frequency variable passed in into a more human-readable form, which we use for the plot title and axis label. We do this with switcher.get, passing in 'Week' as the default value.
- Next, we create a Plotly figure and specify that it will have two y-axes by passing in {"secondary_y": True}.
- We add both the Calories Burned and Total Output time series to the plot by passing them to line.add_trace as Scatter graph objects. We have given them unique colors and names, and specified Total Output as the secondary y series.
- Lastly, we update our layout with the title and axis labels for each of our series.
This function will create a plot similar to the one shown below.
Now that we have our data and plotting functions we are ready to start building our dashboard!
We will create a bootstrap card with a title, DatePicker (for our filtering) and RadioButton (for grouping our data). Underneath these elements we will display our timeseries figure.
app = Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP], suppress_callback_exceptions=True)

def create_calories_and_output_card(df: pd.DataFrame) -> dbc.Card:
    """
    Creates a dbc card with a title, DatePicker, RadioButton, and a timeseries figure.
    Parameters
    ----------
    df : pd.DataFrame The dataframe to be plotted.
    Returns
    -------
    dbc.Card: Card with header, datepicker, radio buttons and plot.
    """
    return dbc.Card([
        dbc.CardBody([
            html.Center(html.H1("Weekly Calorie and Output Breakdown", className='card-title')),
            html.Center(
                dcc.DatePickerRange(
                    id='date-range-picker-2',
                    min_date_allowed=df['Workout Timestamp'].min(),
                    max_date_allowed=df['Workout Timestamp'].max(),
                    initial_visible_month=df['Workout Timestamp'].min(),
                    start_date=df['Workout Timestamp'].min(),
                    end_date=df['Workout Timestamp'].max(),
                    style={
                        "margin-top": "1rem",
                        "margin-bottom": "1rem"
                    }
                )
            ),
            html.Center(
                dcc.RadioItems(
                    options=[
                        {'label': 'Daily', 'value': 'D'},
                        {'label': 'Weekly', 'value': 'W'},
                        {'label': 'Monthly', 'value': 'M'},
                    ],
                    value='W',
                    id='frequency-radio-2',
                    labelStyle={'padding-right': '20px'}
                )
            ),
            dcc.Graph(
                id='calories-output-graph',
            )
        ])
    ],
        outline=True,
        color='info',
        style={
            "width": "40rem",
            "margin-left": "5rem",
            "margin-bottom": "1rem"
        }
    )

@app.callback(
    Output('calories-output-graph', 'figure'),
    [Input('date-range-picker-2', 'start_date'),
     Input('date-range-picker-2', 'end_date'),
     Input('frequency-radio-2', 'value')]
)
def update_weekly_calories_burned_chart(start_date, end_date, frequency):
    """
    Updates the weekly calories burned chart with the selected date range and frequency.
    """
    grouped_df = group_data_by_timestamp(start_date, end_date, frequency)
    return create_timeseries_figure(grouped_df, frequency)

app.layout = html.Div(
    children=[
        dbc.Row([create_calories_and_output_card(df)])
    ]
)

app.run_server(debug=True)
At the top of this code block, we are initializing our app and passing in the bootstrap stylesheet through external_stylesheets.
There is a lot going on here but let's break down what each function is doing.
create_calories_and_output_card
In this function, we are creating our dbc.Card. This card takes in a list containing the CardBody, in addition to some styling. This styling consists of outline=True and color='info', which create a border around our card. We have also set the width of the card and some margins. Inside the CardBody we define the layout of the card from top to bottom. We use html components for the header and dcc components for the DatePicker and radio buttons, and underneath them we insert our graph. We create this graph with the id calories-output-graph but do not give it a figure, as the next function handles the callback that creates it.
update_weekly_calories_burned_chart
This function handles the callback associated with the calories output graph, as you can see from the @app.callback decorator. Inside this decorator we specify the id of the output component and the property to update, which is figure. In addition, we pass in the id and property of each input needed for this function. This allows our function to receive these inputs when it is called. Inside this function we pass the values over to the group_data_by_timestamp function we created at the beginning of this article for filtering our DataFrame. Finally, we pass this filtered DataFrame over to our plotting function to return the updated plot. This callback function is called when the page is initially loaded and whenever the inputs are modified.
Next, we define the overall layout of our dash web app. We have a div which is passed in a row with the dbc card as its only child.
Finally, we run our app by calling app.run_server(debug=True). On recent Dash releases, app.run(debug=True) is the replacement for this call. We are now able to run this and see our Dash app by navigating to localhost:8050. Let's take a look at how this looks.
We now have our first interactive graph in our dashboard! Let's now create a header in its own dbc card to add to the dashboard.
titleCard = dbc.Card([
    dbc.CardBody([
        html.H1("Welcome to your workout dashboard!", className='card-title'),
    ])
],
    color='dark',
    inverse=True,
    style={
        "width": "55rem",
        "margin-left": "1rem",
        "margin-top": "1rem",
        "margin-bottom": "1rem"
    }
)
We are using the same pattern for our title card as we did for our graph card. We define what goes in the CardBody and then add some styling. We now need to update our app layout to use this card. This card will go on its own row as shown below.
app.layout = html.Div(
    children=[
        dbc.Row([
            html.Center(titleCard),
        ],
            justify="center",
            style={
                'margin-left': '0.5rem'
            }
        ),
        dbc.Row([create_calories_and_output_card(df)])
    ]
)
If we rerun our app we should now see our header.
Let's continue by adding a second plot to show a breakdown of Time spent on each Fitness Discipline category. We can follow the same design pattern of having one function to create our pie graph, another function to create a dbc.Card and finally update our app layout.
def get_fitness_discipline_chart(df: pd.DataFrame) -> go.Figure:
    pie = px.pie(
        df,
        values="Length (minutes)",
        names="Fitness Discipline",
        title="Time Spent Per Fitness Discipline",
        hole=0.2,
    )
    pie.update_traces(textposition='inside', textinfo='percent+label')
    pie.update_layout(width=1600)
    pie.update_layout(title_x=0.5)
    return pie

def create_discipline_card(workout_df: pd.DataFrame) -> dbc.Card:
    return dbc.Card([
        dbc.CardBody([
            html.Center(html.H1("Fitness Discipline Breakdown", className='card-title')),
            html.P("This chart shows the percentage of time spent in minutes for each fitness discipline.", className='card-body'),
            dcc.Graph(
                id='fitness-discipline-by-calories-chart',
                figure=get_fitness_discipline_chart(workout_df)
            )
        ])
    ],
        color='info',
        outline=True,
        style={
            "width": "40rem",
            "margin-bottom": "1rem"
        }
    )

app.layout = html.Div(
    children=[
        dbc.Row([
            html.Center(titleCard),
        ],
            justify="center",
            style={
                'margin-left': '0.5rem'
            }
        ),
        dbc.Row([
            dbc.Col([
                create_calories_and_output_card(df),
                create_discipline_card(df)
            ])
        ])
    ]
)
Notice that we are not using a callback function here since our plot is static (not that it couldn't be made dynamic). Inside the function that creates the card, we now call the function that creates the pie chart directly through the figure argument. This fitness discipline pie chart takes in the whole DataFrame, and we specify that the 'Length (minutes)' column supplies the values used to calculate the percentages. The names argument provides the labels for our pie chart categories, we give the chart a title, and finally we set hole to 0.2 to create a donut hole in the middle of the chart. Next we call update_traces to specify that we want the label text inside the pie chart itself and that the textinfo should include the percentage and label.
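To sanity-check the percentages the pie chart displays, the same shares can be computed by hand. The sketch below assumes the chart totals the minutes for rows sharing a discipline and shows each total as a share of the whole; the sample rows are invented for illustration.

```python
import pandas as pd

# Made-up workouts: 50 minutes of cycling, 10 of strength, 40 of yoga.
sample = pd.DataFrame({
    "Fitness Discipline": ["Cycling", "Cycling", "Strength", "Yoga"],
    "Length (minutes)": [30, 20, 10, 40],
})

# Total minutes per discipline, then each total as a percentage of all minutes.
minutes = sample.groupby("Fitness Discipline")["Length (minutes)"].sum()
share = (minutes / minutes.sum() * 100).round(1)
print(share.to_dict())  # {'Cycling': 50.0, 'Strength': 10.0, 'Yoga': 40.0}
```

If the numbers from a check like this disagree with the chart, rows with missing lengths or disciplines are a likely place to look first.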
Let's see how our updated dashboard looks.
Looking good!
Let's end by adding a final bar chart at the bottom of our dashboard with a count breakdown of how many classes we have taken per instructor and per fitness discipline. This chart will go on its own row under our previous two graphs.
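def get_instructors_by_discipline_chart(df: pd.DataFrame) -> go.Figure:
    # Add the empty 'count' helper column on a copy so the caller's DataFrame is
    # not modified (df.insert would raise if this function were called twice).
    dff = df.assign(count='')
    dff = dff.groupby(['Instructor Name', 'Fitness Discipline'])['count'].agg('count').reset_index()
    # The y-axis shows class counts, so the title says so.
    chart = px.bar(dff, x="Instructor Name", y="count", color="Fitness Discipline", title="Classes Taken by Instructor", width=1600)
    chart.update_layout(
        title_x=0.5
    )
    return chart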

def create_instructor_card(df: pd.DataFrame) -> dbc.Card:
    return dbc.Card([
        dbc.CardBody([
            html.Center(html.H1("Instructor by Fitness Discipline", className='card-title')),
            dcc.Graph(
                id='instructor-by-discipline-chart',
                figure=get_instructors_by_discipline_chart(df),
                style={
                    "margin-top": "1rem",
                    "margin-bottom": "1rem"
                }
            )
        ])
    ],
        color='info',
        outline=True,
        style={
            "margin-top": "1rem",
            "margin-left": "1rem",
            "margin-bottom": "1rem"
        }
    )

app.layout = html.Div(
    children=[
        dbc.Row([
            html.Center(titleCard),
        ],
            justify="center",
            style={
                'margin-left': '0.5rem'
            }
        ),
        dbc.Row([
            dbc.Col([
                create_calories_and_output_card(df),
            ]),
            dbc.Col([
                create_discipline_card(df)
            ])
        ]),
        dbc.Row([
            dbc.Col([
                create_instructor_card(df)
            ])
        ])
    ]
)
Inside the get_instructors_by_discipline_chart function, we first create an empty 'count' column and then chain the groupby method with agg('count') to count the rows for each combination of Instructor Name and Fitness Discipline. We then pass this grouped data into px.bar to create our bar chart. The x-axis consists of instructor names and the y values consist of counts. We specify that we want a unique color for each Fitness Discipline. As with the previous two graphs, we create a card for this graph and place it on its own row in the layout.
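The same per-combination counts can be produced without a helper column by using groupby(...).size(). This is an alternative sketch rather than the approach used above, and the instructor names are made up.

```python
import pandas as pd

# Made-up class history: three classes with Alex, one with Sam.
sample = pd.DataFrame({
    "Instructor Name": ["Alex", "Alex", "Alex", "Sam"],
    "Fitness Discipline": ["Cycling", "Cycling", "Yoga", "Cycling"],
})

# size() counts the rows in each group; reset_index names the resulting column.
counts = (
    sample.groupby(["Instructor Name", "Fitness Discipline"])
    .size()
    .reset_index(name="count")
)
print(counts)
```

The resulting frame has the same three columns the bar chart expects, so it could be passed to px.bar in the same way.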
Let's take a look at our completed workout dashboard!
I hope you've learned something new from this tutorial. Follow me to stay updated on future projects and tutorials.
Have any improvements or modifications you've made to this dashboard? I'd ❤️ to hear about them in the comments!