Monthly ridership data was gathered from the National Transit Database. The data covered unlinked passenger trips across transit agencies and modes of transit and was current through August 2023. It was provided as an Excel workbook, which was saved as a CSV for import into Python. Yearly commuter data was gathered from the US Census Bureau as one CSV per year from 2018 to 2022.
To process and merge the data, the monthly ridership figures needed to be aggregated into yearly statistics to match the commuter datasets. In addition, because ridership was broken down by transit agency and type of transit, such as bus, rail, or ferry, extra steps were taken to aggregate over urban area and obtain total ridership per urban area.
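The aggregation and merge described above can be sketched as follows. This is a minimal illustration, assuming pandas (the source names only Python) and a long-format table; the column names `urban_area`, `agency`, `mode`, `month`, `upt`, and `work_from_home`, and all values, are hypothetical toy data rather than the project's actual schema or results.

```python
import pandas as pd

# Toy monthly ridership: one row per agency, mode, and month (hypothetical columns).
monthly = pd.DataFrame({
    "urban_area": ["A", "A", "A", "A", "B", "B"],
    "agency": ["X", "X", "Y", "Y", "Z", "Z"],
    "mode": ["bus", "rail", "bus", "bus", "ferry", "ferry"],
    "month": pd.to_datetime([
        "2021-01-01", "2021-02-01", "2021-01-01",
        "2022-01-01", "2021-01-01", "2021-02-01",
    ]),
    "upt": [100, 50, 25, 40, 10, 20],  # unlinked passenger trips
})


def yearly_by_urban_area(df: pd.DataFrame) -> pd.DataFrame:
    """Sum trips over agencies, modes, and months for each urban area and year."""
    return (
        df.assign(year=df["month"].dt.year)
        .groupby(["urban_area", "year"], as_index=False)["upt"]
        .sum()
    )


yearly = yearly_by_urban_area(monthly)

# Toy yearly commuter table keyed on the same two columns (values are made up).
commuters = pd.DataFrame({
    "urban_area": ["A", "A", "B"],
    "year": [2021, 2022, 2021],
    "work_from_home": [10, 12, 20],
})

# An inner join keeps only urban-area/year pairs present in both tables.
merged = yearly.merge(commuters, on=["urban_area", "year"], how="inner")
print(merged)
```

With an inner join, a partial ridership year that has no matching commuter file (such as 2023 here, since the commuter data ends in 2022) would drop out of the merged table.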
The monthly ridership and commuter datasets were imported into Python to be cleaned and profiled. During cleaning, it was found that the US Census Bureau had changed its wording for working at home from "worked at home" to "work from home" between 2021 and 2022. This affected the commuter datasets, so the "worked at home" columns were renamed to use the current wording, "work from home". In the ridership data, the summary statistic rows at the bottom of the dataset were dropped as extraneous.
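A rough sketch of these two cleaning steps is shown here, again assuming pandas. The helper names `harmonize_wfh_columns` and `drop_summary_rows` are hypothetical, the data is toy data, and the rule used to detect summary rows (a missing agency identifier) is an assumption; the source only says that the rows sat at the bottom of the dataset.

```python
import re

import pandas as pd

# Matches the older wording anywhere in a column name, ignoring case.
OLD_WORDING = re.compile(r"worked at home", flags=re.IGNORECASE)


def harmonize_wfh_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns that use the older wording to the current wording."""
    return df.rename(columns=lambda name: OLD_WORDING.sub("work from home", name))


def drop_summary_rows(df: pd.DataFrame, id_col: str) -> pd.DataFrame:
    """Drop rows with no identifier (assumed here to mark summary rows)."""
    return df.dropna(subset=[id_col]).reset_index(drop=True)


# Toy commuter tables with the two wordings (values are made up).
commuters_2021 = pd.DataFrame({"urban_area": ["A", "B"], "worked at home": [10, 20]})
commuters_2022 = pd.DataFrame({"urban_area": ["A", "B"], "work from home": [12, 18]})

# After renaming, the yearly tables share one column and stack cleanly.
commuters = pd.concat(
    [
        harmonize_wfh_columns(commuters_2021).assign(year=2021),
        commuters_2022.assign(year=2022),
    ],
    ignore_index=True,
)

# Toy ridership table whose last row is a summary row with no agency.
ridership_raw = pd.DataFrame({"agency": ["X", "Y", None], "upt": [100, 50, 150]})
ridership = drop_summary_rows(ridership_raw, id_col="agency")
```

Without the rename, stacking the yearly files would produce two partly empty columns for the same measure, one per wording.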
For the analysis, the first step was to explore the data using a correlation matrix and scatterplots. Mean commute time rose as public transportation was used more. Another interesting relationship was that as commuting by car decreased, commuting by public transit rose. However, including year showed that car and public transit usage were not direct inverses. Instead, during the COVID-19 pandemic years of 2020, 2021, and 2022, both public transit usage and car usage decreased, as seen in the scatterplot points with purple hues.
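As an illustration of the correlation step, the snippet below computes a Pearson correlation matrix with pandas `DataFrame.corr()`. The column names and values are hypothetical toy data, constructed to be perfectly linear so the signs are easy to read; they are not the project's results.

```python
import pandas as pd

# Toy commuter table: one row per urban area (made-up values).
commute = pd.DataFrame({
    "transit_share": [1.0, 2.0, 3.0, 4.0],         # % commuting by public transit
    "car_share": [90.0, 88.0, 86.0, 84.0],         # % commuting by car
    "mean_commute_min": [20.0, 22.0, 24.0, 26.0],  # mean commute time in minutes
})

# Pairwise Pearson correlations between all numeric columns.
corr = commute.corr()
print(corr.round(2))
```

A correlation matrix summarizes only pairwise linear relationships. A third variable such as year is easier to see when it is mapped to point color in a scatterplot, which is how the pandemic-year pattern described above shows up.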
Next steps included creating geospatial visualizations and using time series to analyze monthly trips taken from 2010 to August 2023. The time series of monthly transit trips across the United States shows a drastic drop in ridership at the start of the pandemic.
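One way to build such a national monthly series is sketched below, assuming pandas and hypothetical column names (`agency`, `month`, `upt`). The toy values place an artificial drop in April 2020 purely for illustration, and the year-over-year change is an added diagnostic, not something the source reports.

```python
import pandas as pd

months = pd.date_range("2019-01-01", periods=24, freq="MS")
cutoff = pd.Timestamp("2020-04-01")  # artificial drop date for the toy data

# Toy data: two agencies with flat ridership that falls at the cutoff.
rows = []
for agency, before, after in [("X", 60, 12), ("Y", 40, 8)]:
    for month in months:
        rows.append({
            "agency": agency,
            "month": month,
            "upt": before if month < cutoff else after,
        })
monthly = pd.DataFrame(rows)

# National series: total trips per month across all agencies.
national = monthly.groupby("month")["upt"].sum().sort_index()

# Change relative to the same month one year earlier.
yoy_change = national.pct_change(periods=12)
print(yoy_change.dropna().head())
```

Comparing each month with the same month a year earlier removes most seasonal variation, so a sudden drop stands out more clearly than it would in raw month-to-month changes.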
Lastly, the cleaned and merged data was imported into Tableau to create a dashboard showcasing public transit, work from home, and urban area statistics. One of the charts, shown below, showcases the drop in ridership at the start of the COVID-19 pandemic.