Coronavirus Dashboard: Technology
I built my COVID-19 dashboard for southeast Michigan in Python, using Beautiful Soup to obtain data from the web, pandas for data cleaning and aggregation, and the duo of Dash and Plotly for the web app and data visualizations. This post is a quick overview of the technologies I used to create and host it.
I started this project to create my own dashboard to follow the pandemic in the area I call home. In the last post, I described the data in my dashboard. The dashboard continues to be useful to me as the pandemic is not yet over.
Self-learning
Learning Python for data science, analysis, and visualization has been one of my interests during the COVID-19 pandemic. I read a few books and worked through the examples and exercises in them. But writing and implementing my own project is the best way to test these skills and keep them fresh. Since I have been following pandemic developments, I decided to fetch local data and present it in a format I found useful.
Technology
Python is a general-purpose programming language, and its multitude of libraries makes it a great tool for downloading, processing, and presenting data. The entire project is written in Python with several libraries, including Beautiful Soup, pandas, Plotly, and Dash. I use Beautiful Soup to parse data from state webpages, and pandas to clean and aggregate the data in the ways I want to present it. Plotly is a library designed for making charts; it is built on top of plotly.js, which itself is built on D3.js and stack.gl. It makes beautiful visualizations, and it works very nicely with pandas objects. Dash is the library I used to make the dashboard website itself. It is built on Flask, which is a library for making websites and web apps.
Hosting the dashboard
The dashboard is hosted in its own Docker container on the DigitalOcean droplet I set up for my website. Nginx runs in one container (via SWAG) as the front-end reverse proxy, while the dashboard runs as a microservice in a different container on the same Docker network. I wrote a Docker Compose file to define and build the container. The dashboard container runs Gunicorn to serve the Dash application.
Updating the data
To update the data, I set up a cron job to run a bash script every day except Sunday. The bash script calls two Python scripts I wrote. The first goes to the State of Michigan website and downloads all the data I want to use for the dashboard. The second cleans and prepares the new data for the dashboard to load. The last part of the bash script rebuilds and restarts the dashboard container so that it picks up the updates and new data.
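The actual pipeline is a bash script under cron. Purely as an illustration, the same sequence of steps could be written in Python as below. The script names and the Compose service name are hypothetical, and in cron itself the Sunday exclusion would normally be written in the day-of-week field (for example 1-6) rather than in code.

```python
import datetime as dt
import subprocess

# Hypothetical script and service names; the real ones are not given in this post.
STEPS = [
    ["python", "download_data.py"],  # fetch the state's published data
    ["python", "prepare_data.py"],   # clean and prepare it for the dashboard
    ["docker", "compose", "up", "-d", "--build", "dashboard"],  # rebuild and restart
]


def should_run(day: dt.date) -> bool:
    """Return False on Sundays, mirroring the every-day-except-Sunday schedule."""
    return day.weekday() != 6  # Monday is 0, Sunday is 6


def run_pipeline(steps=STEPS, runner=subprocess.run) -> None:
    """Run each step in order; check=True raises at the first failing step."""
    for command in steps:
        runner(command, check=True)
```

Stopping at the first failure matters here: if the download step fails, the container is not rebuilt around stale or partial data.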
Hospitalization Data
It was hard to find detailed public information about hospitals at the beginning of the pandemic. That was understandable – how do you report on a novel virus? Fortunately, hospitalization reporting has improved significantly since spring 2020. The state now provides detailed data daily, but to get a time series, I downloaded archival versions of the webpage hosted by the Wayback Machine at the Internet Archive.
First, I used a great piece of software called waybackpack that automated retrieval of the archives. After downloading the HTML files, I wrote a Python script to loop through them and extract data from the relevant tables using Beautiful Soup. After accounting for a change in reporting standards partway through the series and doing some other data cleaning, I merged all the tables together into one pandas DataFrame. To keep the series current, new daily data is appended to the series built from the Wayback Machine archives.
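The loop over archived pages might look roughly like the sketch below. It assumes each snapshot sits in a folder named after its Wayback Machine timestamp (YYYYMMDDhhmmss) and that the first table on the page is the one of interest. Both are assumptions, and the function names are hypothetical.

```python
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup


def parse_snapshot(html: str, timestamp: str) -> pd.DataFrame:
    """Extract the first HTML table and tag its rows with the capture date."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return pd.DataFrame()
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    rows = [row for row in rows if row]
    if len(rows) < 2:
        return pd.DataFrame()
    # Treat the first row as the header; assumes every row has the same width.
    df = pd.DataFrame(rows[1:], columns=rows[0])
    # Keep only the date part of the YYYYMMDDhhmmss timestamp.
    df["date"] = pd.to_datetime(timestamp[:8], format="%Y%m%d")
    return df


def build_series(root: Path) -> pd.DataFrame:
    """Combine all snapshots under <root>/<timestamp>/.../*.html (assumed layout)."""
    frames = []
    for path in sorted(root.glob("*/**/*.html")):
        timestamp = path.relative_to(root).parts[0]
        html = path.read_text(encoding="utf-8", errors="replace")
        frames.append(parse_snapshot(html, timestamp))
    frames = [frame for frame in frames if not frame.empty]
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Because pd.concat aligns on column names, snapshots taken before and after a reporting change can still be stacked. Columns missing from a given snapshot are filled with NaN and can be reconciled in a later cleaning step.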
The end result is a real-time panel data set of the hospitalization data by healthcare region and time. I assume there are some errors and inconsistencies in this data, and there are also gaps when I do not have information. Nonetheless, it is the longest real-time data set I could put together from publicly available information about COVID-19 hospitalizations in the healthcare regions.
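As a rough illustration of what a region-by-date panel with gaps looks like in pandas, the sketch below pivots made-up records into one column per region and reindexes to a full calendar so missing days become visible. The region labels, column names, and numbers are all illustrative.

```python
import pandas as pd

# Illustrative long-format records (made-up numbers); 2020-04-02 is missing.
long_df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-04-01", "2020-04-01", "2020-04-03", "2020-04-03"]
        ),
        "region": ["Region 1", "Region 2", "Region 1", "Region 2"],
        "patients": [40, 25, 46, 31],
    }
)

# One row per date, one column per region.
wide = long_df.pivot(index="date", columns="region", values="patients")

# Reindex to a full daily calendar so days without a snapshot appear as NaN.
calendar = pd.date_range(wide.index.min(), wide.index.max(), freq="D")
wide = wide.reindex(calendar)

# Dates on which no region has any data.
missing_days = wide.index[wide.isna().all(axis=1)]
```

Leaving the gaps as NaN, rather than filling them, keeps the charts honest about where no archived snapshot was available.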
Lessons from this project
My COVID-19 dashboard was a great project and test of my self-learning. First, I learned Plotly and Dash for data visualization and presentation; they are great for creating beautiful, sharable web app charts and figures. Second, I have a better understanding of data structures in Python and pandas after using them extensively to prepare and aggregate data for the dashboard. Third, I learned how to host the app as a containerized microservice on the web server. I could see this being useful for future projects; it is important to share data and results.