NER et al #130
OCR from SPA
Corrected text
|
Test with chatGPT prompt
|
Test Erik Sparre prompt - very bad OCR
|
Carl Rydqvist
Output
|
WD Q6166587
|
|
|
ChatGPT prompt with Wikidata info ChatGPT
|
@fredrik1984 @Lottabrorsson isn't it easier for you to update Wikidata directly and add the correct date? It is not rocket science...
I think a more structured approach is to transform the book to TEI, at least the articles... with well-defined tags for
some of the issues I see with current Wikidata PM data
get a good linked data structure for Swedish parties - we need an issue for this in your backlog
the same goes for electoral districts... today I see some possibly good articles in sv.Wikipedia and work done by others, but we are missing
SPARQL for Swedish PM first / second chamber
Electoral Districts
Entity schema
One way to get better quality in Wikidata is to use Entity Schemas - I have started on one, see #129
NER on SPA data to quality-check and extract parties...
WD Q1448829 has two SPA properties P4819
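A duplicate like this can be looked for across all items, not just this one. The following is a minimal sketch, assuming P4819 is the SPA identifier property as stated above and using the public Wikidata Query Service endpoint. The function names and the User-Agent string are made up for illustration; `fetch_duplicates` needs network access, while `parse_duplicates` works on any result in the standard SPARQL JSON format.

```python
import json
import urllib.parse
import urllib.request

# Assumption: P4819 is the SPA identifier property, as stated in the note above.
SPA_PROPERTY = "P4819"
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_duplicate_query(prop: str = SPA_PROPERTY) -> str:
    """Return a SPARQL query listing items with more than one value for prop."""
    return (
        "SELECT ?item (COUNT(?value) AS ?n) WHERE {\n"
        f"  ?item wdt:{prop} ?value .\n"
        "}\n"
        "GROUP BY ?item\n"
        "HAVING (COUNT(?value) > 1)"
    )


def parse_duplicates(results: dict) -> dict:
    """Map QID -> number of values, given a SPARQL JSON result document."""
    counts = {}
    for row in results["results"]["bindings"]:
        # The item is returned as a full entity URI; keep only the QID.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        counts[qid] = int(row["n"]["value"])
    return counts


def fetch_duplicates(prop: str = SPA_PROPERTY) -> dict:
    """Run the query against the public endpoint (requires network access)."""
    params = urllib.parse.urlencode(
        {"query": build_duplicate_query(prop), "format": "json"}
    )
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        # Hypothetical User-Agent; replace with your own contact details.
        headers={"User-Agent": "spa-duplicate-check/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_duplicates(json.load(response))
```

Each QID returned still needs a manual look, since two SPA IDs on one item can mean either a data error or two genuine SPA records for the same person.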
I have started to add book references ---> sj9PGLAlnmUAAAAAABfkWw is Tvåkammar-riksdagen 1867–1970
SPA API --> endpoints/portraits.php?id=sj9PGLAlnmUAAAAAABfkWw gives you the text in JSON
SPA "PortraitCatalog": ["Tvåkammar-riksdagen 1867-"]
SPA started as a one-man scanning project, and some good programmers built him an application --> you have structured data, BUT it is not 100 percent reliable, e.g. "PortraitCatalog": ["Tvåkammar-riksdagen 1867-"] could be from the book but is not always
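Since the `PortraitCatalog` value is only a hint, it can at least be used to pick out candidate records for manual review. The following is a rough sketch, assuming each SPA record is a JSON object with a `PortraitCatalog` list of strings as in the fragment above. The function name and the prefix constant are illustrative and not part of the SPA API.

```python
import json

# Assumption: the title prefix to look for, taken from the fragment above.
BOOK_PREFIX = "Tvåkammar-riksdagen 1867"


def mentions_book(record: dict, prefix: str = BOOK_PREFIX) -> bool:
    """True if any PortraitCatalog entry starts with the book title prefix.

    A True result marks a candidate only; per the note above, the text
    is not always from the book even when the catalog says so.
    """
    entries = record.get("PortraitCatalog") or []
    return any(
        isinstance(entry, str) and entry.strip().startswith(prefix)
        for entry in entries
    )


# Usage with the JSON fragment quoted above.
record = json.loads('{"PortraitCatalog": ["Tvåkammar-riksdagen 1867-"]}')
is_candidate = mentions_book(record)
```

Records where this check passes still need to be compared against the book itself before a book reference is added in Wikidata.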