Add Name Recognition/Matching #8

Closed
3 tasks
atla5 opened this issue Apr 26, 2016 · 1 comment

Comments

@atla5
Member

atla5 commented Apr 26, 2016

A drawback of this system compared with others, as it stands now, is that it has no name recognition and no way to flag typos introduced by the OCR or by whoever originally entered the names.

  • In what will be the 0.1.0 release of the program, the same person with exactly the same name may be logged multiple times, depending on the parsing preferences set for specificity. This is a bug and should be fixed in later releases (check for previous occurrences on each item.addContributor() call).
  • Adding each new name to some sort of hash table and running a fuzzy-match algorithm over it could flag likely typos in the file before the actual export, which might prevent a lot of errors.
  • A deeper implementation could attempt to create a unique author identifier that is resistant to shortened names or nicknames (Tom -> Thomas), middle names (George Bush + George W. Bush), etc. This might help sync up otherwise disparate entries.
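
The three ideas above could be sketched roughly as follows. This is only an illustration, not the project's code: `Item`, `add_contributor`, `suggest_typos`, `author_key`, and the `NICKNAMES` table are all hypothetical names, and the fuzzy match uses `difflib` from the Python standard library as a stand-in for whatever algorithm ends up being chosen.

```python
import difflib

# Hypothetical nickname table; a real one would need to be far larger.
NICKNAMES = {"tom": "thomas", "bill": "william", "bob": "robert"}


class Item:
    """Stand-in for the program's item; only contributor handling is shown."""

    def __init__(self):
        self.contributors = []

    def add_contributor(self, name):
        """Add a name unless the exact same name is already logged."""
        normalized = " ".join(name.split())  # collapse stray whitespace
        if normalized in self.contributors:
            return False
        self.contributors.append(normalized)
        return True


def suggest_typos(names, cutoff=0.85):
    """Return (name, earlier_name) pairs that look like typos of each other."""
    suggestions = []
    seen = []
    for name in names:
        matches = difflib.get_close_matches(name, seen, n=1, cutoff=cutoff)
        if matches and matches[0] != name:
            suggestions.append((name, matches[0]))
        seen.append(name)
    return suggestions


def author_key(name):
    """Build a crude identity key: canonical first name plus last name.

    Middle names and initials are dropped, so this can also merge distinct
    people who share a first and last name. Treat matches as suggestions.
    """
    parts = [p.strip(".,").lower() for p in name.split()]
    parts = [p for p in parts if p]
    if not parts:
        return ""
    first = NICKNAMES.get(parts[0], parts[0])
    if len(parts) == 1:
        return first
    return f"{first} {parts[-1]}"
```

With this sketch, `author_key("George W. Bush")` and `author_key("George Bush")` produce the same key, so a dict keyed by `author_key` could serve as the hash table mentioned above. The `cutoff` value is a guess and would need tuning against real OCR output.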
@atla5
Member Author

atla5 commented Jun 23, 2016

See issue #24. I will implement this in another repo and use it as a library.

@atla5 atla5 closed this as completed Jun 23, 2016