Here are 70 public repositories matching this topic...

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
Updated Jul 21, 2021 · Python

Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Updated May 5, 2021 · JavaScript

A toolkit for record linkage and duplicate detection in Python
Updated Apr 28, 2021 · Python

🆔 Command line tool for deduplicating CSV files
Updated Mar 31, 2020 · Python

🆔 Examples for using the dedupe library
Updated Jun 17, 2021 · Python

A list of free data matching and record linkage software.

An Apache Spark implementation of the EM algorithm for estimating the parameters of the Fellegi-Sunter canonical model of record linkage.

Link Discovery Framework for Metric Spaces.
Updated May 12, 2021 · JavaScript

Record Linkage ToolKit (Find and link entities)
Updated Jul 15, 2021 · Python

Link Wikidata items to large catalogs
Updated Aug 19, 2021 · Python

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Updated Jul 16, 2021 · Jupyter Notebook

Resources for tackling record linkage / deduplication / data matching problems

Python implementation of anonymous linkage using cryptographic linkage keys
Updated Aug 17, 2021 · Python

Distributed Bayesian Entity Resolution in Apache Spark
Updated Jun 10, 2021 · Scala

A simple command line interface to the datamade/dedupe library.
Updated Jun 9, 2021 · Jupyter Notebook

CLK hash: hash PII for entity matching
Updated Aug 11, 2021 · Python

Merge Dirty Data with Clean Reference Tables
Updated Aug 3, 2021 · Python

Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
Updated Aug 7, 2021 · Python

A browser user interface for manual labeling of record pairs.
Updated Aug 13, 2021 · JavaScript

Phonetic Spelling Algorithms in R

Learned string similarity for entity names using optimal transport.
Updated Nov 17, 2020 · Python

Fork of the Freely Extensible Biomedical Record Linkage program
Updated Nov 4, 2016 · Python

Privacy Preserving Record Linkage Service
Updated Aug 19, 2021 · Python

Examples of spark-lucenerdd
Updated Jun 2, 2021 · Scala

A maximum-strength name parser for record linkage.
Updated Jul 11, 2021 · Python

Python implementations of record linkage blocking techniques.
Updated May 31, 2021 · Python

Tools for EHR patient de-duplication (aka entity resolution)
Updated May 4, 2018 · Python

A Python package for efficient evaluation based on OASIS (Optimal Asymptotic Sequential Importance Sampling).
Updated Jun 4, 2021 · Python

Is your feature request related to a problem? Please describe.
Currently, MapType columns are not supported for Spark DataFrames.
Describe the solution you'd like
Add support for MapType Spark DataFrame columns.