admin
Administration
Worldwide administrative referencing and mapping
- Manage global admin files
- Manage globally unique identifiers (admin_ids)
Functions
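| Function | Description |
|---|---|
| get_admin1_iso() | Get a dataframe with country ISO alpha codes and names |
| admin1_id_index_from_admin1_id_a3(gdf) | Give dataframe gdf an admin1_id index from admin1_id_a3 |
| get_admin2_iso() | Get a dataframe with state/province ISO 3166-2 codes and names |
| admin2_id_index_from_admin2_gadm(admin2) | Give dataframe admin2 an admin2_id index based on GADM data |
| clean_geographic_name(name) | Comprehensive cleaning for admin4 geographic names |
| generate_admin_ids(df, ...) | Generate unique two-letter admin unit codes within parent units |
| update_admin_spine(level, admin_recipe_id, test, silent=False) | Update the openplaces admin spine with admin recipe info |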
Module Contents#
- openplaces.io.admin.get_admin1_iso()
Get a dataframe with country ISO alpha codes and names
- openplaces.io.admin.admin1_id_index_from_admin1_id_a3(gdf)
Give dataframe gdf an admin1_id index from admin1_id_a3
Single-use function to create the linkage between GADM and ISO
- openplaces.io.admin.get_admin2_iso()
Get a dataframe with state/province ISO 3166-2 codes and names
- openplaces.io.admin.admin2_id_index_from_admin2_gadm(admin2)
Give dataframe admin2 an admin2_id index based on GADM data
- openplaces.io.admin.clean_geographic_name(name)
Comprehensive cleaning for admin4 geographic names. Returns a tuple (clean_text, digits, letter_suffix, generic_word).
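The decomposition into (clean_text, digits, letter_suffix, generic_word) can be sketched roughly as follows. This is a minimal illustration, not the module's implementation. The generic-word list, the placeholder set that makes "N.A." count as empty text, and the helper name clean_name_sketch are all hypothetical. The sketch is also ASCII-only, so accented letters are dropped, and the real function likely handles many more cases.

```python
import re
from typing import Optional, Tuple

# Hypothetical word lists; the module's actual lists are not documented here
GENERIC_WORDS = {"ward", "zone", "barangay", "district"}
PLACEHOLDERS = {"n a", "na", "unknown"}


def clean_name_sketch(name: str) -> Tuple[str, str, str, Optional[str]]:
    """Split a raw name into (clean_text, digits, letter_suffix, generic_word)."""
    text = name.lower()
    # A number, optionally followed by a single-letter suffix (e.g. "3b")
    match = re.search(r"(\d+)\s*([a-z])?\b", text)
    digits, letter_suffix = "", ""
    if match:
        digits = match.group(1)
        letter_suffix = (match.group(2) or "").upper()
        text = text[: match.start()] + " " + text[match.end():]
    # Replace anything that is not an ASCII letter or whitespace with a space
    words = re.sub(r"[^a-z\s]", " ", text).split()
    if " ".join(words) in PLACEHOLDERS:
        words = []  # treat placeholder names such as "N.A." as having no text
    generic_word = next((w for w in words if w in GENERIC_WORDS), None)
    clean_text = " ".join(words).title()
    return clean_text, digits, letter_suffix, generic_word


print(clean_name_sketch("Ward 3B"))  # → ('Ward', '3', 'B', 'ward')
```

Returning structured parts rather than a single string lets later steps decide independently whether to use the number, the suffix, or the generic word.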
- openplaces.io.admin.generate_admin_ids(df, new_admin_id_col='admin4_id', parent_admin_id_col='admin3_id', name_col='name', id_separator='-', verbose=False)
Generate unique two-letter admin unit codes within parent units.
When no unique two-letter code is available, the function falls back to codes containing digits, three-letter codes, or sequential codes.
The design is level-agnostic: it works for any parent-child relationship, such as admin2 → admin3 (state → county) or admin3 → admin4 (county → town).
Strategy
Each name is first cleaned into structured components: a text portion, digit portion, letter suffix, and detected generic word. IDs are then assigned through a waterfall of prioritized strategies. Each row moves to the next strategy only if it remains unassigned:
1. Pure numeric: If the name reduces to only digits with no text (e.g., “N.A. (12)”), use the number directly.
2. Generic word + number: If a recognized generic word (ward, zone, barangay, district, etc.) is detected alongside a number, prefix the number with the generic word’s initial(s). A letter suffix is appended if present (e.g., “Ward 3B” → “W3B”).
3. Name + number for duplicates: If the same base name appears multiple times under the same parent and a digit is present, disambiguate by combining the name’s initial(s) with the number (and any letter suffix).
4. Initials from multi-word names: For names with two or more words, take the first letter of each of the first two words (e.g., “North East” → “NE”).
5. First two letters: Take the first two characters of the cleaned name, assigned only where the result is unique within the parent.
6. Any two letters: Try all pairwise letter combinations from the cleaned name until a unique code is found.
7. Letter + number combinations: Combine any letter from the name with any digit from the name; fall back to “X” + digit if the name has no letters.
8. Swapping: If a desired two-letter code is taken by another row, check whether that row can be reassigned to an alternative code, freeing the preferred code for the current row.
9. Three-letter codes: Try the first three letters, then all three-letter combinations from the name.
10. Sequential fallback: Assign codes like X01, X02, … (with a letter disambiguator if needed) to any rows that all prior strategies failed to place.
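The waterfall can be sketched as a list of candidate generators applied in order, where each pass only sees rows that are still unassigned. This is a minimal sketch that assumes names are already cleaned and share a single parent. It covers only the initials, first-two-letters, and any-two-letters steps plus a sequential fallback. It omits the numeric, swapping, and three-letter steps, and it resolves collisions first come, first served rather than by the "only where unique" rule. Its fallback skips codes already in use instead of adding a letter disambiguator. All function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Iterator, List

# A strategy yields candidate codes for one cleaned name, best first
Strategy = Callable[[str], Iterator[str]]


def initials(name: str) -> Iterator[str]:
    words = name.split()
    if len(words) >= 2:
        yield (words[0][0] + words[1][0]).upper()


def first_two(name: str) -> Iterator[str]:
    letters = name.replace(" ", "")
    if len(letters) >= 2:
        yield letters[:2].upper()


def any_two(name: str) -> Iterator[str]:
    letters = name.replace(" ", "").upper()
    # Pairs that preserve the letters' order within the name
    for a, b in combinations(letters, 2):
        yield a + b


def assign_codes(names: List[str], strategies: List[Strategy]) -> Dict[int, str]:
    """Assign a code to each name under one parent via a waterfall of strategies."""
    assigned: Dict[int, str] = {}
    used = set()
    for strategy in strategies:
        for i, name in enumerate(names):
            if i in assigned:
                continue  # only still-unassigned rows reach later strategies
            for code in strategy(name):
                if code not in used:  # first come, first served
                    assigned[i] = code
                    used.add(code)
                    break
    # Sequential fallback: X01, X02, ... skipping codes already taken
    counter = 1
    for i in range(len(names)):
        if i not in assigned:
            while f"X{counter:02d}" in used:
                counter += 1
            assigned[i] = f"X{counter:02d}"
            used.add(assigned[i])
    return assigned


print(assign_codes(["North East", "North West", "Northfield"], [initials, first_two, any_two]))
# → {0: 'NE', 1: 'NW', 2: 'NO'}
```

Because each strategy is just a generator of candidates, you add a step by inserting another function into the list at the desired priority.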
After assignment, all IDs are verified to be non-null and globally unique; a ValueError is raised if either condition is violated.
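A sketch of the final check follows. It assumes, hypothetically, that each full ID is the parent ID joined to the unit code with id_separator, so the same two-letter code can repeat under different parents and still yield globally unique IDs. The helper name and the sample parent IDs are illustrative only.

```python
import pandas as pd


def verify_ids_sketch(ids: pd.Series) -> None:
    """Raise ValueError if any ID is missing or appears more than once."""
    missing = int(ids.isna().sum())
    if missing:
        raise ValueError(f"{missing} rows have no ID")
    dupes = ids[ids.duplicated(keep=False)]
    if not dupes.empty:
        raise ValueError(f"Duplicate IDs: {sorted(dupes.unique())}")


# Hypothetical parent IDs and codes; "NE" repeats, but under different parents
df = pd.DataFrame({"admin3_id": ["P1", "P1", "P2"], "code": ["NE", "NW", "NE"]})
full_ids = df["admin3_id"] + "-" + df["code"]
verify_ids_sketch(full_ids)  # passes: P1-NE, P1-NW, P2-NE are distinct
```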
- Parameters:
df (pd.DataFrame) – Input dataframe with administrative unit data
new_admin_id_col (str) – Name for the new administrative ID column (default ‘admin4_id’)
parent_admin_id_col (str) – Column name containing the parent admin ID (default ‘admin3_id’)
name_col (str) – Column name containing the subdivision name (default ‘name’)
name_long_col (str, optional) – Column name containing the long-form name. For example, in the US this might include ‘city’ and ‘township’ suffixes that disambiguate entity names. If None or the column doesn’t exist, city/township detection is skipped.
id_separator (str) – Separator to use in IDs (default ‘-’)
verbose (bool) – If True, prints statistics and other outputs (default False)
- Returns:
DataFrame indexed by new_admin_id_col with a diagnostics column
- Return type:
pd.DataFrame
- Raises:
ValueError – If unable to generate unique IDs for all rows
- openplaces.io.admin.update_admin_spine(level, admin_recipe_id, test, silent=False)
Update the openplaces admin spine with admin recipe info
- Parameters:
level (int) – Administrative level of the spine to update
admin_recipe_id (str) – ID of the admin recipe to update the spine with (this function assumes the recipe has already been ingested)
test (bool) – If True, writes to ‘{file}_test.csv’ instead of the original file
silent (bool) – If True, silences printouts when new admin IDs are added (default False)
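The effect of the test flag on the output file can be illustrated with a small, hypothetical helper. How update_admin_spine actually locates the spine file is not documented here, so the helper name and its Path-based input are assumptions.

```python
from pathlib import Path


def spine_output_path(path: Path, test: bool) -> Path:
    """Return the file to write: '{file}_test.csv' when test is True, else the original."""
    if not test:
        return path
    # Keep the directory and extension, append "_test" to the file stem
    return path.with_name(f"{path.stem}_test{path.suffix}")
```

Writing the test output next to the original keeps it easy to diff against the real spine before committing changes.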