tabulation

Stacked horizontal bar charts for cross-tabulated data.

Functions

tabulate(df, y_cat, x_cat[, v, y_max_n, x_max_n, ...])

Cross-tabulate a numeric variable by two categorical columns.

plot_tabulation(df, y_cat, x_cat[, v, title, y_max_n, ...])

Stacked horizontal bar chart of v ~ f(y_cat, x_cat).

Module Contents

openplaces.viz.tabulation.tabulate(df, y_cat, x_cat, v='n', y_max_n=None, x_max_n=None, y_cat_order=None, x_cat_order=None, show_empty_category=True)

Cross-tabulate a numeric variable by two categorical columns.

Parameters:
  • df (pd.DataFrame or gpd.GeoDataFrame) – Input data.

  • y_cat (str) – Column for y-axis categories.

  • x_cat (str) – Column for x-axis categories (the stacked dimension).

  • v (str) – Numeric column to aggregate; 'n' counts rows.

  • y_max_n (int, optional) – Keep only the top-N y categories by total weight; remainder is collapsed into '(all others)'.

  • x_max_n (int, optional) – Keep only the top-N x categories by total weight; remainder is collapsed into '(all others)'.

  • y_cat_order (list, optional) – Explicit ordering for y-axis values.

  • x_cat_order (list, optional) – Explicit ordering for x-axis values (stack order).

  • show_empty_category (bool) – If True, label NaN categories as '(N/A)' instead of dropping them.

Returns:

Normalized crosstab (values sum to 1), shape (y_cats, x_cats).

Return type:

pd.DataFrame
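
The documented behaviour can be approximated in plain pandas. This is a minimal sketch, not the openplaces implementation: the helper name crosstab_sketch, the sample data, and the choice to normalize over the whole table (so that all cells together sum to 1) are assumptions made for illustration.

```python
import pandas as pd


def crosstab_sketch(df, y_cat, x_cat, v="n", y_max_n=None):
    """Hypothetical stand-in that mimics the documented behaviour."""
    d = df.copy()
    # Label missing categories as '(N/A)' (show_empty_category=True behaviour).
    for col in (y_cat, x_cat):
        d[col] = d[col].where(d[col].notna(), "(N/A)")
    # v='n' counts rows, so every row gets a weight of 1.
    weights = pd.Series(1.0, index=d.index) if v == "n" else d[v].astype(float)
    d = d.assign(_w=weights)
    if y_max_n is not None:
        # Keep the top-N y categories by total weight; collapse the rest.
        top = d.groupby(y_cat)["_w"].sum().nlargest(y_max_n).index
        d[y_cat] = d[y_cat].where(d[y_cat].isin(top), "(all others)")
    table = d.pivot_table(
        index=y_cat, columns=x_cat, values="_w", aggfunc="sum", fill_value=0.0
    )
    # Assumption: normalize over the whole table, not per row.
    return table / table.to_numpy().sum()


# Made-up sample data: one row per place.
places = pd.DataFrame(
    {
        "kind": ["cafe", "cafe", "cafe", "bar", "bar", "shop", None, "cafe"],
        "city": ["A", "B", "A", "A", "B", "A", "B", "A"],
    }
)
table = crosstab_sketch(places, y_cat="kind", x_cat="city", y_max_n=2)
```

With y_max_n=2, the two heaviest kind values keep their own rows and the remaining ones, including the '(N/A)' label, end up in a single '(all others)' row.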

openplaces.viz.tabulation.plot_tabulation(df, y_cat, x_cat, v='n', title=None, y_max_n=None, x_max_n=None, y_cat_order=None, x_cat_order=None, show_empty_category=True, y_lab_maxlength=30, x_lab_maxlength=30, gap_perc=0.01, cmap='tab20b', alpha=0.8, savefig=None, figsize=(7, 5), legend_kwds=None, colors=None, fontsize=9, titlesize=12)

Stacked horizontal bar chart of v ~ f(y_cat, x_cat).

Each bar's height (its thickness along the y-axis) is proportional to the total of its y_cat group; the segment widths within a bar show the x_cat breakdown of that group.
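
The geometry described above can be derived from a normalized crosstab. The following is a rough sketch under stated assumptions, not the library's code: the sample table is made up, and the way gap_perc is applied when positioning bars is a guess based on its parameter description.

```python
import pandas as pd

# Hypothetical normalized crosstab: rows are y_cat groups, columns are x_cat values.
tab = pd.DataFrame(
    {"A": [0.375, 0.125], "B": [0.125, 0.375]},
    index=["cafe", "bar"],
)

# Bar height (thickness) of each y_cat group: its share of the total weight.
heights = tab.sum(axis=1)

# Segment widths: the x_cat breakdown within each group, so each row sums to 1.
widths = tab.div(heights, axis=0)

# Left edge of each segment, so segments stack along the x-axis.
lefts = widths.cumsum(axis=1) - widths

# Assumed placement: bars stacked along the y-axis, separated by gap_perc.
gap_perc = 0.01
position = pd.Series(range(len(heights)), index=heights.index)
bottoms = heights.cumsum() - heights + gap_perc * position
```

These arrays map onto the width, height, and left arguments of matplotlib's Axes.barh, drawn one x_cat column at a time.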

Parameters:
  • df (pd.DataFrame or gpd.GeoDataFrame) – Input data.

  • y_cat (str) – Column for y-axis groups (one bar per value).

  • x_cat (str) – Column for the stacked dimension (legend entries).

  • v (str) – Numeric column to aggregate; 'n' counts rows.

  • title (str, optional) – Plot title. Defaults to '% of <v> by <y_cat>'.

  • y_max_n (int, optional) – Maximum number of y_cat categories shown.

  • x_max_n (int, optional) – Maximum number of x_cat categories shown.

  • y_cat_order (list, optional) – Explicit ordering for y_cat values.

  • x_cat_order (list, optional) – Explicit ordering for x_cat values (stack order).

  • show_empty_category (bool) – Show NaN values as '(N/A)'.

  • y_lab_maxlength (int) – Truncate y_cat labels longer than this.

  • x_lab_maxlength (int) – Truncate x_cat labels longer than this.

  • gap_perc (float) – Gap between bars as a fraction of total data weight.

  • cmap (str) – Matplotlib colormap name, used when colors is None.

  • alpha (float) – Bar opacity.

  • savefig (str or Path, optional) – Save to this path if provided.

  • figsize (tuple, optional) – Figure size (width, height).

  • legend_kwds (dict, optional) – Keyword arguments passed to ax.legend(). Defaults to {'loc': 'upper right', 'bbox_to_anchor': (0.985, 0.985)}. Any key overrides the default; omitted keys keep their default value.

  • colors (list, optional) – Explicit list of colors, one per x_cat value.

  • fontsize (int) – Font size for labels.

  • titlesize (int) – Font size for the title.

Returns:

  • fig (matplotlib.figure.Figure)

  • ax (matplotlib.axes.Axes)
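
The override rule documented for legend_kwds (any key overrides the default; omitted keys keep their default value) amounts to a dictionary merge. A minimal sketch, assuming the merge is a plain key-by-key override; the helper name merge_legend_kwds is hypothetical and not part of openplaces:

```python
# Defaults as stated in the legend_kwds parameter description.
DEFAULT_LEGEND_KWDS = {"loc": "upper right", "bbox_to_anchor": (0.985, 0.985)}


def merge_legend_kwds(legend_kwds=None):
    """Hypothetical helper: user keys win, omitted keys keep their default."""
    return {**DEFAULT_LEGEND_KWDS, **(legend_kwds or {})}


# 'loc' is overridden, 'bbox_to_anchor' keeps its default, 'ncol' is added.
merged = merge_legend_kwds({"loc": "lower left", "ncol": 2})
```

Under this reading, passing only {'loc': 'lower left'} still anchors the legend at (0.985, 0.985), so bbox_to_anchor usually needs to be overridden together with loc.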