Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Record Linkage

Data linkage is often not as straightforward as joining on common columns. The data sources may store the same information in very different ways, so the linking may need to probabilistically match entries that appear to relate to the same entity. A typical example is matching names and addresses from different data sources, where one source might record "Jon Smith, 12 High St" and another "John Smith, 12 High Street". The entities being matched could be businesses, patients, or clients.
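To make the idea of approximate matching concrete, here is a minimal sketch in Python using only the standard library. It is not how any particular linkage tool works: a plain string-similarity score from difflib stands in for a proper probabilistic model, and the field names (name, address), the equal field weights, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means identical after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each left record with its best-scoring right record, if good enough.

    The equal weights and the threshold are illustrative assumptions; a real
    linkage project would estimate them from the data.
    """
    links = []
    for i, rec_a in enumerate(left):
        scored = [
            (0.5 * similarity(rec_a["name"], rec_b["name"])
             + 0.5 * similarity(rec_a["address"], rec_b["address"]), j)
            for j, rec_b in enumerate(right)
        ]
        if not scored:
            continue  # nothing to compare against
        best_score, best_j = max(scored)
        if best_score >= threshold:
            links.append((i, best_j, round(best_score, 2)))
    return links


# Hypothetical records from two sources that share no common key.
left = [
    {"name": "Jon Smith", "address": "12 High St, Canberra"},
    {"name": "Mary O'Brien", "address": "4 Lake Road"},
]
right = [
    {"name": "John Smith", "address": "12 High Street Canberra"},
    {"name": "Acme Pty Ltd", "address": "99 Industrial Ave"},
]

links = link_records(left, right)
print(links)
```

Each tuple in the result holds the index of the left record, the index of its best match on the right, and the combined score. Comparing every pair of records becomes expensive on large data sets, which is one reason dedicated record linkage tools exist.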

A very useful tool for this process is the open source Febrl (Freely Extensible Biomedical Record Linkage).



Copyright © 2004-2006 Graham.Williams@togaware.com