Here are 77 public repositories matching this topic...
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
Updated
May 8, 2022
Python
A powerful and modular toolkit for record linkage and duplicate detection in Python
Updated
Apr 19, 2022
Python
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Updated
May 5, 2021
JavaScript
🆔 Command line tool for deduplicating CSV files
Updated
Mar 31, 2020
Python
🆔 Examples for using the dedupe library
Updated
Jan 19, 2022
Python
A list of free data matching and record linkage software.
Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters
Link Discovery Framework for Metric Spaces.
Updated
Mar 11, 2022
JavaScript
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Updated
Apr 26, 2022
Jupyter Notebook
Record Linkage ToolKit (Find and link entities)
Updated
Dec 13, 2021
Python
Link Wikidata items to large catalogs
Updated
Dec 10, 2021
Python
Resources for tackling record linkage / deduplication / data matching problems
Python implementation of anonymous linkage using cryptographic linkage keys
Updated
May 4, 2022
Python
Distributed Bayesian Entity Resolution in Apache Spark
Updated
Jun 10, 2021
Scala
A simple command line interface to the datamade/dedupe library.
Updated
Nov 15, 2021
Jupyter Notebook
CLK hash: hash PII for entity matching
Updated
May 4, 2022
Python
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
Updated
Apr 10, 2022
Python
Merge Dirty Data with Clean Reference Tables
Updated
Aug 3, 2021
Python
Learned string similarity for entity names using optimal transport.
Updated
Nov 17, 2020
Python
A browser user interface for manual labeling of record pairs.
Updated
Apr 29, 2022
JavaScript
Phonetic Spelling Algorithms in R
Fork of the Freely Extensible Biomedical Record Linkage program
Updated
Nov 4, 2016
Python
Privacy Preserving Record Linkage Service
Updated
May 2, 2022
Python
A maximum-strength name parser for record linkage.
Updated
Oct 19, 2021
Python
Examples of spark-lucenerdd
Updated
Jun 2, 2021
Scala
Performs unique entity estimation corresponding to Chen, Shrivastava, Steorts (2018).
Updated
Feb 21, 2019
Python
A Python package for efficient evaluation based on OASIS (Optimal Asymptotic Sequential Importance Sampling).
Updated
Jun 4, 2021
Python
Python implementations of record linkage blocking techniques.
Updated
May 2, 2022
Python
Is your feature request related to a problem? Please describe.
Currently, MapType columns are not supported for Spark DataFrames.
Describe the solution you'd like
Add support for MapType Spark DataFrame columns