Add Name Recognition/Matching #8

atla5 · 2016-04-26T04:38:07Z

A drawback of this system as it stands now, compared with others, is that it has no name recognition and no way to flag typos introduced by the OCR or by whoever originally entered the names.

In what will be the 0.1.0 release of the program, the same person with exactly the same name may be logged multiple times, depending on the parsing preferences set for specificity. This is a bug and should be fixed in later releases (check for previous occurrences on the item.addContributor() call).
Adding each new name to some sort of hash table and running a fuzzy-match algorithm that flags likely typos in the file before the actual export might prevent a lot of errors.
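A rough sketch of the hash table plus fuzzy match idea, in Python for illustration only (this is not the project's code). `NameRegistry`, `normalize`, and the 0.85 cutoff are hypothetical names and values chosen for the example; the fuzzy matching uses `difflib.get_close_matches` from the standard library.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents)
    return " ".join(kept.lower().split())


class NameRegistry:
    """Hypothetical registry: exact duplicates collapse, near matches are flagged."""

    def __init__(self, cutoff: float = 0.85):
        self.cutoff = cutoff  # assumed similarity threshold, tune on real data
        self._by_key = {}     # normalized key -> first spelling seen

    def add(self, name: str):
        """Return (canonical_name, is_new, suggestions)."""
        key = normalize(name)
        if key in self._by_key:
            # Same person, same name: reuse the existing entry.
            return self._by_key[key], False, []
        # Possible typos: existing keys that are similar but not identical.
        close = difflib.get_close_matches(
            key, list(self._by_key), n=3, cutoff=self.cutoff
        )
        suggestions = [self._by_key[k] for k in close]
        self._by_key[key] = name
        return name, True, suggestions
```

Something like `registry.add(name)` could run where item.addContributor() is called: a repeated name returns the existing entry, and a non-empty `suggestions` list can be shown to the user as a possible typo before the export.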
A deeper implementation could attempt to create a unique author identifier that is resistant to shortened names or nicknames (Tom -> Thomas), middle names (George Bush + George W. Bush), etc., which might help sync up otherwise disparate entries.
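A minimal sketch of such an identifier, again in Python and purely illustrative. `author_key` and the tiny `NICKNAMES` table are hypothetical, and the sketch assumes names are written in "First [Middle ...] Last" order.

```python
# Hypothetical nickname table; a real one would need to be far larger.
NICKNAMES = {"tom": "thomas", "bill": "william", "bob": "robert"}


def author_key(name: str) -> str:
    """Build a coarse identifier from the first and last name only.

    Assumes "First [Middle ...] Last" order. Middle names and initials
    are dropped, and known nicknames are expanded.
    """
    tokens = [t.strip(".,").lower() for t in name.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return ""
    first = NICKNAMES.get(tokens[0], tokens[0])
    if len(tokens) == 1:
        return first
    return f"{first} {tokens[-1]}"
```

A key this coarse will also merge different people who share a first and last name, so matches are probably best treated as candidates for review rather than merged automatically.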


atla5 · 2016-06-23T16:42:38Z

See issue #24. I will implement this in another repo and use it as a library.

atla5 added bug enhancement labels Apr 26, 2016

atla5 modified the milestone: Release 2 (0.2.0) Jun 10, 2016

atla5 added [difficulty] med [priority] med labels Jun 10, 2016

atla5 added duplicate and removed bug labels Jun 23, 2016

atla5 closed this as completed Jun 23, 2016
