piccard is a Python package that provides an alternative to traditional harmonization techniques for combining spatial data whose geographic units are inconsistent across multiple years. It uses a network representation of nodes and edges that retains all the information available in the data. Each node represents one geographic area (e.g., a census tract or dissemination area) in one year. An edge connects two nodes when the geographic area of the tail node overlaps the geographic area of the head node, from the previous available year, by at least 5% of its area.
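The edge rule above can be sketched in plain Python. This is a hypothetical illustration, not piccard's implementation: areas are simplified to axis-aligned rectangles `(xmin, ymin, xmax, ymax)` instead of real tract polygons, the function names are made up, and the overlap share is assumed to be measured relative to the later-year area.

```python
# Hypothetical sketch: areas are axis-aligned rectangles (xmin, ymin, xmax, ymax),
# not real census polygons, and the overlap share is taken relative to the
# later-year area (an assumption).


def rect_area(r):
    """Area of an axis-aligned rectangle; 0 if it is empty or inverted."""
    xmin, ymin, xmax, ymax = r
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def intersection_area(a, b):
    """Area shared by two rectangles (0 when they do not overlap)."""
    return rect_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))


def link_years(prev_areas, curr_areas, threshold=0.05):
    """Return edges (curr_id, prev_id) for every later-year area that overlaps
    an earlier-year area by at least `threshold` of its own area."""
    edges = []
    for curr_id, curr_rect in curr_areas.items():
        area = rect_area(curr_rect)
        if area == 0:
            continue  # skip degenerate areas to avoid dividing by zero
        for prev_id, prev_rect in prev_areas.items():
            share = intersection_area(curr_rect, prev_rect) / area
            if share >= threshold:
                edges.append((curr_id, prev_id))
    return edges


prev_areas = {"A": (0, 0, 10, 10), "B": (10, 0, 20, 10)}      # earlier year
curr_areas = {"C": (0, 0, 10.2, 10), "D": (10.2, 0, 20, 10)}  # later year
print(link_years(prev_areas, curr_areas))  # [('C', 'A'), ('D', 'B')]
```

In this toy example, area C extends 0.2 units into B, a sliver of about 2% of C's area, so the default 5% threshold drops that connection; lowering `threshold` to 0.01 would keep it.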
The method behind this package can be found in the following research paper:
Dias, F., & Silver, D. (2018). Visualizing demographic evolution using geographically inconsistent census data. California Digital Library (CDL). https://doi.org/10.31235/osf.io/a3gtd
The latest released version is available on the Python Package Index (PyPI):
pip install piccard
from piccard import piccard as pc
piccard.preprocessing(ct_data, year, id)
Returns a cleaned GeoDataFrame of the input data with a new column containing the area of each census tract.
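For polygon geometries, an area column like this holds planar areas in the units of the data's coordinate reference system; in GeoPandas such values typically come from `GeoSeries.area`, which is only meaningful in a projected CRS. The sketch below shows the underlying shoelace computation on plain coordinate lists. The record layout and function names are hypothetical, and this is not necessarily how piccard's `preprocessing` computes or names its column.

```python
# Hypothetical sketch: each tract is a dict with an "id" and a list of (x, y)
# vertices; this is not piccard's internal data layout.


def polygon_area(coords):
    """Planar area of a simple polygon via the shoelace formula."""
    n = len(coords)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def add_area_column(records):
    """Return copies of the tract records with an added "area" field."""
    return [{**rec, "area": polygon_area(rec["coords"])} for rec in records]


tracts = [
    {"id": "t1", "coords": [(0, 0), (2, 0), (2, 2), (0, 2)]},  # 2 x 2 square
    {"id": "t2", "coords": [(0, 0), (4, 0), (0, 3)]},          # right triangle
]
for row in add_area_column(tracts):
    print(row["id"], row["area"])  # prints "t1 4.0", then "t2 6.0"
```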
piccard.create_network(census_dfs, years, id, threshold=0.05)
Creates a network representation of the temporal connections in census_dfs across years, linking each yearly geographic area to the area(s) in the next year that it overlaps by at least threshold (a proportion; the default 0.05 corresponds to 5%).
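A multi-year network of this kind can be sketched as an adjacency dictionary. In this hypothetical example, the pairwise overlap shares are assumed to be precomputed (for instance with a spatial overlay) and passed in directly, nodes carry made-up `"{year}_{id}"` labels, and edges point forward in time from each area to the next-year areas it overlaps by at least threshold. It illustrates the idea and is not piccard's `create_network`.

```python
# Hypothetical sketch: overlap_shares[(y_prev, y_next)] maps (prev_id, next_id)
# to a precomputed overlap share; node labels are "{year}_{id}" strings.


def build_network(years, overlap_shares, threshold=0.05):
    """Return {node: set(successor nodes)} linking consecutive years."""
    graph = {}
    for y_prev, y_next in zip(years, years[1:]):  # consecutive year pairs
        for (prev_id, next_id), share in overlap_shares[(y_prev, y_next)].items():
            u, v = f"{y_prev}_{prev_id}", f"{y_next}_{next_id}"
            graph.setdefault(u, set())  # register both nodes even if no edge
            graph.setdefault(v, set())
            if share >= threshold:      # keep only sufficiently large overlaps
                graph[u].add(v)
    return graph


years = [2006, 2011, 2016]
shares = {
    (2006, 2011): {("A", "C"): 0.9, ("A", "D"): 0.02, ("B", "D"): 1.0},
    (2011, 2016): {("C", "E"): 1.0, ("D", "E"): 0.5, ("D", "F"): 0.5},
}
graph = build_network(years, shares)
print(sorted(graph["2011_D"]))  # ['2016_E', '2016_F']
```

Here the 2% overlap between A and D falls below the default threshold, so 2006_A connects only to 2011_C.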
piccard.create_network_table(census_dfs, years, id, threshold=0.05)
Returns the final network table with all the temporal connections in census_dfs across years, linking each yearly geographic area to the area(s) in the next year that it overlaps by at least threshold (default 0.05, i.e., 5%).
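One plausible reading of such a table is one row per chain of connected areas running from the first census year to the last. The sketch below enumerates those chains from an adjacency dictionary with hypothetical `"{year}_{id}"` node labels; the table layout is an assumption, and piccard's actual output columns may differ.

```python
# Hypothetical sketch: the graph maps "{year}_{id}" labels to sets of next-year
# labels; each output row lists one node per census year.


def network_table(graph, years):
    """Enumerate every chain from a first-year node to a last-year node."""
    first = str(years[0])
    starts = sorted(n for n in graph if n.startswith(first + "_"))
    rows = []

    def walk(node, path):
        path = path + [node]
        if len(path) == len(years):  # reached the last census year
            rows.append(path)
            return
        for nxt in sorted(graph.get(node, ())):  # chains that dead-end are dropped
            walk(nxt, path)

    for start in starts:
        walk(start, [])
    return rows


graph = {
    "2006_A": {"2011_C"}, "2006_B": {"2011_D"},
    "2011_C": {"2016_E"}, "2011_D": {"2016_E", "2016_F"},
    "2016_E": set(), "2016_F": set(),
}
for row in network_table(graph, [2006, 2011, 2016]):
    print(row)
```

A split such as 2011_D into 2016_E and 2016_F produces one row per branch, so a single first-year tract can appear in several rows.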
piccard.draw_subnetwork(network_table, G, num_cts=4)
Draws a subgraph of the network representation, where num_cts is the number of census tracts from the first census year that are followed through all census years.
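Selecting the nodes to draw can be sketched as a breadth-first search that starts from num_cts first-year tracts and follows edges forward through later years. The adjacency dictionary, the `"{year}_{id}"` node labels, and the choice of seeds (the first num_cts labels in sorted order) are assumptions made for illustration.

```python
from collections import deque

# Hypothetical sketch: graph maps "{year}_{id}" labels to sets of next-year labels.


def subnetwork_nodes(graph, first_year, num_cts=4):
    """Return all nodes reachable from the first `num_cts` first-year nodes."""
    prefix = f"{first_year}_"
    seeds = sorted(n for n in graph if n.startswith(prefix))[:num_cts]
    seen = set(seeds)
    queue = deque(seeds)
    while queue:  # breadth-first traversal forward in time
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


graph = {
    "2006_A": {"2011_C"}, "2006_B": {"2011_D"},
    "2011_C": {"2016_E"}, "2011_D": {"2016_E", "2016_F"},
    "2016_E": set(), "2016_F": set(),
}
print(sorted(subnetwork_nodes(graph, 2006, num_cts=1)))  # ['2006_A', '2011_C', '2016_E']
```

With NetworkX, a node set like this could then be passed to `G.subgraph(nodes)` and drawn with `nx.draw`.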
piccard.plot_num_cts(network_table, years, id)
Plots the number of census tracts across all given census years.
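The counts behind such a plot can be sketched by grouping node labels by year. This assumes hypothetical `"{year}_{id}"` labels; the resulting dictionary could then be plotted with Matplotlib, e.g. `plt.plot(list(counts), list(counts.values()))`.

```python
from collections import Counter

# Hypothetical sketch: node labels are "{year}_{id}" strings.


def count_cts_per_year(nodes, years):
    """Return {year: number of census tracts} for the requested years."""
    counts = Counter(label.split("_", 1)[0] for label in nodes)  # group by year prefix
    return {year: counts.get(str(year), 0) for year in years}


nodes = ["2006_A", "2006_B", "2011_C", "2011_D", "2016_E", "2016_F", "2016_G"]
print(count_cts_per_year(nodes, [2006, 2011, 2016]))  # {2006: 2, 2011: 2, 2016: 3}
```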
Note: Further explanation of the parameters and example code for all the above functions can be found in the documentation.
GeoPandas - enables spatial operations in Python, making it easier to work with geospatial data
Matplotlib - a comprehensive library for creating visualizations
NetworkX - adds support for analyzing networks represented by nodes and edges
NumPy - adds support for large, multi-dimensional arrays and matrices, with functions to operate on these arrays
pandas - offers data structures and operations for manipulating numerical tables
Maliha Lodi, Fernando Calderon Figueroa, Daniel Silver