Entity Resolution

Entity resolution is the task of deciding whether two entity descriptions refer to the same real-world entity.

According to Wikipedia, Entity Resolution or "Record Linkage" is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases).

Entity Resolution is necessary when joining different data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number). A shared identifier may be missing because of differences in record shape, storage location, or curator style or preference. A data set that has undergone record-linkage reconciliation may be referred to as being cross-linked.
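To make the "no common identifier" case concrete, here is a minimal sketch in Python of comparing two records from different sources by normalizing their fields and scoring string similarity. The record fields (`name`, `zip`), the `is_same_entity` rule, and the 0.85 threshold are assumptions chosen for illustration, not a recommended configuration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_same_entity(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: names must be similar and postal codes must agree."""
    name_score = similarity(rec_a["name"], rec_b["name"])
    same_zip = normalize(rec_a["zip"]) == normalize(rec_b["zip"])
    return name_score >= threshold and same_zip


# Two sources describe the same company differently and share no key.
crm_record = {"name": "Acme Corp.", "zip": "55401"}
billing_record = {"name": "ACME corp", "zip": "55401"}
other_record = {"name": "Apex Industries", "zip": "55401"}

print(is_same_entity(crm_record, billing_record))
print(is_same_entity(crm_record, other_record))
```

Real systems typically compare many more fields and avoid scoring every pair of records, but the basic shape (normalize, compare, threshold) is the same.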

Use of Machine Learning to Perform Entity Resolution

In the past, deterministic rules and graph algorithms were used to perform entity resolution. More recently, machine learning models and large language models such as GPT-4 have also been used to find the best matches.
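As an illustration of the older approach, the following Python sketch pairs a deterministic rule with a simple graph step: records that match the rule are linked, and connected groups of linked records are treated as one entity. The rule itself (same email, or same phone digits), the field names, and the sample records are all hypothetical.

```python
from itertools import combinations


def rule_match(a: dict, b: dict) -> bool:
    """Hypothetical deterministic rule: same email, or same phone digits."""
    email_a = a.get("email", "").lower()
    email_b = b.get("email", "").lower()
    same_email = bool(email_a) and email_a == email_b
    digits_a = "".join(ch for ch in a.get("phone", "") if ch.isdigit())
    digits_b = "".join(ch for ch in b.get("phone", "") if ch.isdigit())
    same_phone = bool(digits_a) and digits_a == digits_b
    return same_email or same_phone


def cluster(records: list[dict]) -> list[set[int]]:
    """Group record indices into connected components using union-find."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of records that satisfies the rule.
    for i, j in combinations(range(len(records)), 2):
        if rule_match(records[i], records[j]):
            parent[find(i)] = find(j)

    groups: dict[int, set[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=min)


records = [
    {"email": "ann@example.com", "phone": "612-555-0100"},
    {"email": "ANN@example.com", "phone": ""},
    {"email": "a.lee@example.org", "phone": "(612) 555-0100"},
    {"email": "bob@example.com", "phone": "612-555-0199"},
]

print(cluster(records))
```

Records 1 and 2 never match each other directly, yet they land in the same group because both match record 0. That transitive closure is what the graph step contributes beyond pairwise rules.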

The site Papers with Code has an entity resolution leaderboard that tracks the leading machine-learning implementations.

Vendors

  1. IBM InfoSphere QualityStage
  2. LiveRamp
  3. Signal
  4. Tapad
  5. Amperity
  6. Senzing
  7. Zeta Global
  8. SAS DataFlux
  9. FICO
  10. Throtle
  11. Infutor
  12. Merkle
  13. Criteo
  14. Acxiom
  15. Data Ladder
  16. Neustar
