agrc-sweeper 
fix data
Available Sweepers
Addresses
Checks that addresses have minimum required parts and optionally normalizes them.
Duplicates
Checks for duplicate features.
Empties
Checks for empty geometries.
Metadata
Checks to make sure that the metadata meets the SGID Metadata Minimum Requirements Document.
Tags
Checks to make sure that existing tags are cased appropriately. This means that they are title-cased, other than known abbreviations (e.g. AGRC, BLM) and articles (e.g. a, the, of).
This check also verifies that the data set contains a tag that matches the database name (e.g. SGID) and the schema (e.g. Cadastre).
--try-fix adds missing required tags and title-cases any existing tags.
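The casing and required-tag rules above can be sketched roughly as follows. This is a minimal illustration, not sweeper's actual implementation: the helper names, the abbreviation and article lists, and the case-insensitive matching of required tags are all assumptions made for the example.

```python
# Hypothetical sketch of the tag rules; not taken from the sweeper source.
KNOWN_ABBREVIATIONS = {"AGRC", "BLM", "SGID"}  # assumed list
ARTICLES = {"a", "the", "of"}  # assumed list


def title_case_tag(tag):
    """Title-case a tag, keeping abbreviations upper-cased and articles lower-cased."""
    words = []
    for word in tag.split():
        if word.upper() in KNOWN_ABBREVIATIONS:
            words.append(word.upper())
        elif word.lower() in ARTICLES:
            # Assumption: articles are lower-cased wherever they appear in the tag.
            words.append(word.lower())
        else:
            words.append(word.capitalize())
    return " ".join(words)


def missing_required_tags(tags, database, schema):
    """Return the required tags (database name, schema) that are absent from tags."""
    existing = {tag.lower() for tag in tags}  # assumption: case-insensitive match
    return [name for name in (database, schema) if name.lower() not in existing]


print(title_case_tag("bureau of land management"))
print(missing_required_tags(["SGID", "Roads"], "SGID", "Cadastre"))
```

Under these assumptions, a fix step would append whatever `missing_required_tags` returns and replace each existing tag with its `title_case_tag` form.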
Summary
Checks to make sure that the summary is less than 2048 characters (a limitation of AGOL) and that it is shorter than the description.
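As a rough illustration of the two summary rules, the check might look like the sketch below. The function name, the constant name, and the message wording are hypothetical; only the 2048-character limit and the comparison against the description come from the text above.

```python
from typing import List

# Limit stated above (an AGOL restriction); the constant name is hypothetical.
MAX_SUMMARY_LENGTH = 2048


def summary_issues(summary: str, description: str) -> List[str]:
    """Return a list of problems found with the summary (empty if none)."""
    issues = []
    if len(summary) >= MAX_SUMMARY_LENGTH:
        issues.append("summary must be less than 2048 characters")
    if len(summary) >= len(description):
        issues.append("summary must be shorter than the description")
    return issues


print(summary_issues("Road centerlines", "Statewide road centerlines for Utah."))
```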
Description
Checks to make sure that the description contains a link to a data page on gis.utah.gov.
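One way to picture this check is a simple pattern search, sketched below. It assumes that any link on the gis.utah.gov host counts as a data page link; the real check may be stricter. The function name, the pattern, and the example URL path are illustrative only.

```python
import re

# Assumption: any http(s) link on gis.utah.gov satisfies the check.
DATA_PAGE_PATTERN = re.compile(r"https?://(?:www\.)?gis\.utah\.gov/\S*", re.IGNORECASE)


def has_data_page_link(description):
    """Return True if the description contains a link to gis.utah.gov."""
    return DATA_PAGE_PATTERN.search(description) is not None


# The path in this URL is a placeholder for illustration.
print(has_data_page_link("See https://gis.utah.gov/data/example-page/ for details."))
```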
Use Limitations
Checks to make sure that the text in this section matches the official text for AGRC.
--try-fix updates the text to match the official text.
Parsing Addresses
This project contains a module, sweeper.address_parser, that can be used as a standalone address parser. This allows developers to take advantage of sweeper's advanced address parsing and normalization without having to run the entire sweeper process.
Usage Example
from sweeper.address_parser import Address
address = Address('123 South Main Street')
print(address)
'''
--> Parsed Address:
{'address_number': '123',
'normalized': '123 S MAIN ST',
'prefix_direction': 'S',
'street_name': 'MAIN',
'street_type': 'ST'}
'''
Available Address class properties
All properties default to None if there is no parsed value.
address_number
address_number_suffix
prefix_direction
street_name
street_direction
street_type
unit_type
unit_id
If no unit_type is found, this property is prefixed with # (e.g. # 3). If unit_type is found, # is stripped from this property.
city
zip_code
po_box
The PO Box if a po-box-type address was entered (e.g. po_box would be 1 for p.o. box 1).
normalized
A normalized string representing the entire address that was passed into the constructor. PO Boxes are normalized to the format PO BOX <number>.
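The unit_id and PO Box conventions described above can be illustrated with a small standalone sketch. It does not use or reproduce sweeper.address_parser: the function names and the regular expression are hypothetical, and the real parser handles far more input variations.

```python
import re

# Hypothetical pattern for inputs such as "p.o. box 1" or "PO Box 42".
PO_BOX_PATTERN = re.compile(r"^\s*p\.?\s*o\.?\s*box\s*(\d+)\s*$", re.IGNORECASE)


def normalize_po_box(text):
    """Return 'PO BOX <number>' for PO Box input, or None if it is not a PO Box."""
    match = PO_BOX_PATTERN.match(text)
    if match is None:
        return None
    return "PO BOX {}".format(match.group(1))


def format_unit_id(raw_unit_id, unit_type=None):
    """Prefix unit_id with '# ' when there is no unit_type; otherwise strip the '#'."""
    bare = raw_unit_id.lstrip("#").strip()
    if unit_type is None:
        return "# " + bare
    return bare


print(normalize_po_box("p.o. box 1"))
print(format_unit_id("3"))
print(format_unit_id("#3", unit_type="APT"))
```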
Installation (requires Pro 2.5+)
- create conda environment
conda create --clone arcgispro-py3 --name sweeper
- activate environment
activate sweeper
- install sweeper
pip install agrc-sweeper
- run cli for docs
sweeper
Development
- create conda environment
conda create --clone arcgispro-py3 --name sweeper
- activate environment
activate sweeper
test_metadata.py uses a SQL database that needs to be restored via src/sweeper/tests/data/Sweeper.bak to your local SQL Server.
Installing dependencies
- install only required dependencies to run sweeper
pip install -e .
- install required dependencies to work on sweeper
pip install -e ".[develop]"
- install required dependencies to run sweeper tests
pip install -e ".[tests]"
- run tests:
pytest
Uploading to pypi.org
- Bump version in setup.py
- python setup.py sdist bdist_wheel
- twine upload dist/* (pip install twine, if needed)
