Civic Data Analysis with DataHub

I propose a platform for civic data analysis based on DataHub, a project from MIT CSAIL. DataHub is a “GitHub for data”: users can follow other users and their data analysis projects, and fork others’ projects to extend them. It provides an ecosystem of composable applications for data ingestion, cleaning, visualization, and statistical analysis. I argue that three features make DataHub uniquely suited, unlike Socrata and other competing platforms, for creating the excitement around civic data analysis that a large community needs in order to form: its centralization of datasets and analyses on a single platform, its support for extending others’ analyses, and its emphasis on composable applications. Simply put, DataHub dramatically lowers the barriers to gleaning insights from civic data by eliminating standard challenges such as installing a variety of software applications and reformatting data. I suggest a few extensions to DataHub as it exists today, including comment sections on datasets and analyses, news articles about interesting analyses, and a news feed surfacing analyses and datasets that may interest users. I believe these extensions will further solidify the platform as a one-stop shop where users can be inspired by others’ work and quickly go from idea to execution on their own analyses.
