stats: estimate the accuracy of batch accepting best match #1

Open
mguidoti opened this issue Sep 20, 2019 · 1 comment

Comments

@mguidoti
Collaborator

mguidoti commented Sep 20, 2019

@tcatapano suggested (2019-09-17, Paris) an accuracy test based on manually checking a given number of randomly selected entries (10-20% of the Taxodros dataset), repeated multiple times (5-10x).

This should be part of the O3RT paper and cited in the Taxodros paper. The lists of sampled subsets should also be made available for publication purposes.
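A minimal Python sketch of the suggested protocol, assuming each entry can be checked by a function `is_correct` that stands in for the manual review. The helper name, the 15% default fraction, and the seed are illustrative assumptions, not part of the project. Seeding the random generator means the sampled subsets can be regenerated exactly and published alongside the results.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def repeated_subsample_accuracy(
    entries: Sequence[str],
    is_correct: Callable[[str], bool],
    fraction: float = 0.15,
    repetitions: int = 5,
    seed: int = 2019,
) -> Tuple[List[List[str]], List[float]]:
    """Hypothetical helper: draw `repetitions` random subsets of `entries`,
    each holding `fraction` of the dataset, and return every subset together
    with the accuracy measured on it."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed: the same subsets can be regenerated and published
    size = max(1, round(len(entries) * fraction))
    subsets: List[List[str]] = []
    accuracies: List[float] = []
    for _ in range(repetitions):
        subset = rng.sample(list(entries), size)  # no duplicates within one subset
        # is_correct stands in for manually checking an accepted best match
        correct = sum(1 for entry in subset if is_correct(entry))
        subsets.append(subset)
        accuracies.append(correct / size)
    return subsets, accuracies


if __name__ == "__main__":
    # Illustrative data only: 200 fake entries, those ending in "7" treated as wrong.
    entries = [f"ref-{i:03d}" for i in range(200)]
    subsets, accuracies = repeated_subsample_accuracy(
        entries, lambda e: not e.endswith("7"), fraction=0.1, repetitions=5
    )
    print("mean accuracy:", round(statistics.mean(accuracies), 3))
    print("std. deviation:", round(statistics.stdev(accuracies), 3))
```

Reporting the spread across repetitions, not just the mean, is what repeating the draw 5-10 times buys over a single sample.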

@mguidoti mguidoti self-assigned this Sep 20, 2019
@mguidoti mguidoti changed the title Estimate the accuracy of batch accept best match Estimate the accuracy of batch accepting best match Sep 20, 2019
@mguidoti mguidoti changed the title Estimate the accuracy of batch accepting best match stats: estimate the accuracy of batch accepting best match Aug 25, 2020
@mguidoti
Collaborator Author

Testing Protocol

Datasets

| Dataset | # of papers |
| --- | --- |
| Poa@Plazi Members' Publications Dataset | 25 papers, 20 with known DOIs |
| Grazia Dataset | 51 publications, 35 with known DOIs |
| Covid-19 Task Force Database | 25 publications, 25 with known DOIs |

Test 001

Note: Refindit was not querying the DataCite API during this run (cause unknown).
Run on August 20, 2020.

Matching Success Rate

| Dataset | Raw | % |
| --- | --- | --- |
| Poa@Plazi | 17(+1)/20(+1) | 85% |
| Grazia | 32(+2 from Zenodo)/35 | 91.42% |
| Covid-19 | 20/25 | 80% |
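The matching rate appears to be the number of papers for which a best match was accepted divided by the number of papers with a known DOI. That reading reproduces all three percentages, with the Grazia denominator of 35 taken from the Datasets table. A small Python sketch under that assumption (the dictionary layout and function name are illustrative):

```python
from typing import Dict, Tuple

# Counts transcribed from the table above: (matched, papers with a known DOI).
# Assumption: the "(+1)" and "(+2 from Zenodo)" extras are left out of the ratio.
MATCHING: Dict[str, Tuple[int, int]] = {
    "Poa@Plazi": (17, 20),
    "Grazia": (32, 35),
    "Covid-19": (20, 25),
}


def matching_rate(matched: int, known_dois: int) -> float:
    """Share of papers with a known DOI for which a best match was accepted."""
    if known_dois <= 0:
        raise ValueError("known_dois must be positive")
    return matched / known_dois


for name, (matched, known) in MATCHING.items():
    print(f"{name}: {matched}/{known} = {matching_rate(matched, known):.2%}")
```

Formatting with two decimals rounds 32/35 (0.914285...) to 91.43%; the 91.42% reported in the table corresponds to truncating rather than rounding.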

Accuracy

| Dataset | Raw | % | Details |
| --- | --- | --- | --- |
| Poa@Plazi | 18/18 | 100% | 17 matched & equal DOIs; 1 new DOI found; 3 not matched |
| Grazia | 32/32 | 100% | 32 matched & equal DOIs; 1 not matched; 2 not matched (Zenodo) |
| Covid-19 | 19/20 | 95% | 16 matched & equal DOIs; 3 matched with different DOIs due to DOI duplication; 1 matched to a Publons DOI (cause unknown); 5 not matched |
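The Details column suggests accuracy is computed over matched papers only: correct matches divided by all matches, with unmatched papers excluded. The Python sketch below assumes that the new DOI found for Poa@Plazi and the three duplicated-DOI matches for Covid-19 were counted as correct and the Publons DOI as wrong, which reproduces 18/18, 32/32, and 19/20. The `MatchOutcomes` class is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MatchOutcomes:
    """Hypothetical record of manually checked best matches for one dataset."""
    equal_doi: int          # matched, DOI identical to the known one
    other_correct: int = 0  # matched and judged correct, but DOI new or different (assumed)
    wrong: int = 0          # matched, but to the wrong record (assumed)

    @property
    def correct(self) -> int:
        return self.equal_doi + self.other_correct

    @property
    def matched(self) -> int:
        return self.correct + self.wrong

    @property
    def accuracy(self) -> float:
        # Unmatched papers do not enter the denominator.
        if self.matched == 0:
            raise ValueError("no matches to evaluate")
        return self.correct / self.matched


RESULTS: Dict[str, MatchOutcomes] = {
    "Poa@Plazi": MatchOutcomes(equal_doi=17, other_correct=1),
    "Grazia": MatchOutcomes(equal_doi=32),
    "Covid-19": MatchOutcomes(equal_doi=16, other_correct=3, wrong=1),
}

for name, r in RESULTS.items():
    print(f"{name}: {r.correct}/{r.matched} = {r.accuracy:.0%}")
```

Keeping unmatched papers out of the denominator separates the two questions: how often a best match is found (matching rate) and how often an accepted match is right (accuracy).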

Summary

| Dataset | # of papers | Matching rate | Accuracy rate |
| --- | --- | --- | --- |
| Poa@Plazi | 25 | 85% | 100% |
| Grazia | 51 | 91.42% | 100% |
| Covid-19 | 25 | 80% | 95% |
