Fetch an open access PDF via P953 #235

Futur3r · 2023-01-21T10:17:00Z

Wikidata can function as a hub to automatically find an open access PDF version of a Zotero item, if the item has a QID.

Describe the solution you'd like
If the Zotero PDF scraper doesn't find an open access PDF of an article on a web page, Cita could fetch an open access URL for the article from the P953 (full work available at URL) property of its Wikidata item, if available, and give it back to the Zotero PDF scraper for an automatic second try.

For this task, the Hub could be used by building a URL like this:
https://hub.toolforge.org/[QID]?property=P953
The Hub would return the value of P953, for example for the item Q114149071 (test).
Note: if the item has no P953 statement, the Hub returns the URL of the item on wikidata.org (test), so a simple if statement is needed to check whether the item's P953 exists.
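A minimal TypeScript sketch of that check. It assumes, as described above, that the Hub answers the `?property=P953` request by redirecting either to the P953 value or, when the statement is missing, to the item's page on wikidata.org. The helper names are hypothetical (not part of Cita), and the HTTP function is passed in so that inside Zotero it can be whatever request helper Cita already uses.

```typescript
// Hypothetical helpers, not part of Cita. Assumes the Hub redirects a
// "?property=P953" request to the P953 value, or to the item's page on
// wikidata.org when the item has no P953 statement.
const QID_PATTERN = /^Q[1-9][0-9]*$/;

// Hosts whose appearance as the final URL means "no usable P953".
const FALLBACK_HOSTS = ["wikidata.org", "hub.toolforge.org"];

type FetchLike = (
  url: string,
  init: { method: string; redirect: "follow" },
) => Promise<{ url: string }>;

function buildHubUrl(qid: string): string {
  if (!QID_PATTERN.test(qid)) {
    throw new Error(`Not a valid Wikidata QID: ${qid}`);
  }
  return `https://hub.toolforge.org/${qid}?property=P953`;
}

// The "simple if statement": landing on wikidata.org (or staying on the Hub,
// e.g. after an error page) means the item has no P953 to use.
function p953FromResolvedUrl(resolvedUrl: string): string | null {
  const host = new URL(resolvedUrl).hostname;
  const isFallback = FALLBACK_HOSTS.some((h) => host === h || host.endsWith(`.${h}`));
  return isFallback ? null : resolvedUrl;
}

// Follows the Hub's redirect and reports where it ended up. A HEAD request
// avoids downloading the PDF itself; even if the final server rejects HEAD,
// the response's url is still the post-redirect URL.
async function fetchP953ViaHub(qid: string, fetchFn: FetchLike): Promise<string | null> {
  const response = await fetchFn(buildHubUrl(qid), { method: "HEAD", redirect: "follow" });
  return p953FromResolvedUrl(response.url);
}
```

Outside Zotero, the standard global `fetch` can be passed as `fetchFn`, e.g. `fetchP953ViaHub("Q114149071", fetch)`.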

The automatic way:

1. The user uses the Zotero browser add-on.
2. As the Zotero item is created, Cita fetches its QID, if one exists.
3. If Zotero fails to download the PDF, Cita fetches the P953 URL and gives it to the Zotero PDF scraper.

Note: to avoid over-complicating things for the user, if Cita doesn't find the QID on the first try, it should not show any error; a debug() message is enough.
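The automatic way can be sketched as plain control flow. This is only a sketch under assumptions: every lookup (`fetchQid`, `zoteroFindPdf`, `fetchP953`, `attachPdfFromUrl`) is a hypothetical, injected function standing in for real Cita or Zotero calls, and a missing QID or P953 is only passed to `debug`, never shown to the user, as the note suggests.

```typescript
// Hypothetical dependencies: none of these names exist in Cita or Zotero;
// they stand in for the real lookups so the control flow can be read alone.
interface AutoFetchDeps {
  fetchQid(itemKey: string): Promise<string | null>;
  zoteroFindPdf(itemKey: string): Promise<boolean>; // true if Zotero attached a PDF
  fetchP953(qid: string): Promise<string | null>;
  attachPdfFromUrl(itemKey: string, url: string): Promise<boolean>;
  debug(message: string): void;
}

type AutoFetchOutcome = "zotero-pdf" | "p953-pdf" | "no-qid" | "no-p953" | "no-pdf";

// Runs after the browser add-on has created the item (step 1).
async function onItemSaved(itemKey: string, deps: AutoFetchDeps): Promise<AutoFetchOutcome> {
  // Step 2: look up the QID quietly.
  const qid = await deps.fetchQid(itemKey);
  // Zotero's own PDF download is tried first.
  if (await deps.zoteroFindPdf(itemKey)) return "zotero-pdf";
  if (qid === null) {
    // No error prompt for the user, only a debug message.
    deps.debug(`No QID found for item ${itemKey}; skipping P953 fallback`);
    return "no-qid";
  }
  // Step 3: fall back to the P953 URL.
  const url = await deps.fetchP953(qid);
  if (url === null) {
    deps.debug(`${qid} has no P953 statement`);
    return "no-p953";
  }
  return (await deps.attachPdfFromUrl(itemKey, url)) ? "p953-pdf" : "no-pdf";
}
```

Returning an explicit outcome rather than `void` keeps the silent failure paths testable without any user-facing messages.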

The semi-automatic way:

1. The user's Zotero item has no PDF attached.
2. The user runs the function to fetch the QID of the item.
3. If, once the QID is fetched, the item has a P953 value, Cita starts the Zotero PDF scraper with that URL.

Note: the user would enable this auto-scraping of PDFs via Wikidata in the Cita preferences (the functionality would probably be enabled by default).
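A small sketch of the decision made at the end of the semi-automatic way, including the preference gate. The preference name `autoFetchPdfViaWikidata` and the function name are hypothetical; Cita's real preference keys may differ.

```typescript
// Hypothetical preference shape; Cita's real preference names may differ.
interface CitaPdfPrefs {
  autoFetchPdfViaWikidata: boolean;
}

// "Probably enabled by default", as suggested in the note above.
const DEFAULT_PDF_PREFS: CitaPdfPrefs = { autoFetchPdfViaWikidata: true };

// Called right after the user's "fetch QID" action has finished.
// Returns the URL to hand to the Zotero PDF scraper, or null to do nothing.
function urlToScrapeAfterQidFetch(
  prefs: Partial<CitaPdfPrefs>,
  itemHasPdf: boolean,
  p953Url: string | null,
): string | null {
  const enabled = prefs.autoFetchPdfViaWikidata ?? DEFAULT_PDF_PREFS.autoFetchPdfViaWikidata;
  if (!enabled) return null;   // the user switched the feature off
  if (itemHasPdf) return null; // step 1 no longer holds: a PDF is already attached
  return p953Url;              // step 3: null when the item has no P953 value
}
```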

The manual way:

1. The user uses Zotero's "Find Available PDF" function in the right-click menu of the library.
2. If Zotero fails to download the PDF, Cita fetches the P953 URL and gives it to the Zotero PDF scraper.

Note: some P953 values don't point to a PDF but to a web page with the full text, such as this one. In that case, Cita could open the page in the browser (as for QuickStatements), or a snapshot of the page could be saved. Maybe this could be handled directly in the translator for that website; I'm not sure.
Also, does the Zotero PDF scraper need the direct URL of the PDF, or does it use a translator to find the PDF's URL on a web page?
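Whichever way Zotero handles it, Cita would first need to tell a PDF from a web page. A minimal sketch, assuming the decision is made from the `Content-Type` header returned for the P953 URL; `actionForContentType` and the action names are hypothetical, with "open or snapshot" standing for either of the two options mentioned in the note.

```typescript
// Hypothetical actions Cita might take for a P953 URL.
type P953Action = "attach-pdf" | "open-or-snapshot-page" | "ignore";

// Maps a Content-Type header value to an action. Parameters such as
// "; charset=utf-8" are stripped and the comparison is case-insensitive.
function actionForContentType(contentType: string | null): P953Action {
  if (contentType === null) return "ignore";
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  if (mediaType === "application/pdf") return "attach-pdf";
  if (mediaType === "text/html" || mediaType === "application/xhtml+xml") {
    return "open-or-snapshot-page";
  }
  return "ignore";
}
```

Servers do not always send an accurate `Content-Type`, so a real implementation might also check the first bytes of the file (`%PDF`) before attaching it.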

The zotero-scihub add-on implements similar functionality.


Futur3r · 2023-01-21T20:57:18Z

Or, the easy way: once Cita has fetched the QID of a Zotero item, it simply replaces the item's URL field with the P953 value.
That way, there is no need to modify Zotero's existing PDF scraper.
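A sketch of the easy way. It assumes the `getField`/`setField`/`saveTx` methods of a Zotero item (declared locally as an interface so the sketch compiles outside Zotero) and, as proposed above, that Zotero's PDF finder will then try the new URL. `replaceUrlWithP953` is a hypothetical name.

```typescript
// Minimal structural type for the three Zotero item methods used here;
// declared locally so the sketch compiles outside Zotero.
interface ZoteroItemLike {
  getField(field: string): string;
  setField(field: string, value: string): void;
  saveTx(): Promise<unknown>;
}

// Hypothetical helper: replaces the item's URL with the P953 value.
// Returns true only if the item was actually changed and saved.
async function replaceUrlWithP953(item: ZoteroItemLike, p953Url: string | null): Promise<boolean> {
  if (!p953Url) return false;                          // no P953 on the item
  if (item.getField("url") === p953Url) return false;  // already up to date
  item.setField("url", p953Url);
  await item.saveTx(); // persist the change in a transaction
  return true;
}
```

One trade-off of this approach: the item's previous URL (often the publisher page) is overwritten.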

Also, a "Fetch Open Access URL" option could be added to the menus.

Dominic-DallOsto · 2023-01-22T11:20:31Z

This looks like how zotero-scihub adds the attachment:

https://github.com/ethanwillis/zotero-scihub/blob/ecc63def1bea5cee3e342f832f8c743f4d3b61a0/content/zoteroUtil.ts#L7

Basically we just give a URL to the PDF and Zotero should do the rest.
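A sketch of that idea: build the import options from the parent item and the P953 URL, and let Zotero download the file. The option names are an assumption, mirroring what the linked zotero-scihub helper appears to pass to `Zotero.Attachments.importFromURL`; they should be checked against that file and Zotero's source. The importer is injected so the sketch runs without Zotero, and `attachP953Pdf` is a hypothetical name.

```typescript
// Assumed subset of the options for Zotero.Attachments.importFromURL;
// verify the exact names against zotero-scihub and Zotero's source.
interface ImportFromUrlOptions {
  libraryID: number;
  parentItemID: number;
  url: string;
  title: string;
  contentType: string;
}

// Only the item properties this sketch reads.
interface ParentItemLike {
  id: number;
  libraryID: number;
}

type UrlImporter = (options: ImportFromUrlOptions) => Promise<unknown>;

// Hypothetical helper: hands the P953 URL to Zotero, which downloads the file
// and stores it as a child attachment of `item`. Inside Zotero, `importer`
// would wrap Zotero.Attachments.importFromURL.
function attachP953Pdf(item: ParentItemLike, p953Url: string, importer: UrlImporter): Promise<unknown> {
  return importer({
    libraryID: item.libraryID,
    parentItemID: item.id,
    url: p953Url,
    title: "Full Text PDF", // attachment title; an arbitrary choice for this sketch
    contentType: "application/pdf",
  });
}
```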

Adding this as a new function should be easy enough. I'd have to check how easy/hard it is to integrate a new PDF provider into Zotero's "PDF finder".

Is there any way we can quantify the rough number of items (for scholarly articles) that have P953 but Zotero won't already find a PDF for them? Or at least what proportion of scholarly items on WD have P953? Like, is this change likely to find a lot of PDFs that wouldn't already be found?

Futur3r · 2023-01-22T14:10:41Z

And I think the Zotero function for the PDF scraper is this one.

I've started coding an option in the items submenu of the library (the easy way). I'm adding the option "fetch Open Access urls".
It may be redundant, but it's easier for me to code. I'll make a PR.

There are currently 2,564,303 WD items with a P953 statement.
The best query would be this one, but it times out.
I got 1,351,783 for scholarly articles alone that have a P953. The total number of scholarly articles on WD is 38,856,462.
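A rough answer to the proportion question, using only the counts reported above at the time of writing:

$$
\frac{1\,351\,783}{38\,856\,462} \approx 0.0348 \approx 3.5\,\% \quad \text{of scholarly articles on WD have a P953,}
$$

$$
\frac{1\,351\,783}{2\,564\,303} \approx 0.527 \approx 53\,\% \quad \text{of WD items with a P953 are scholarly articles.}
$$

This does not yet say how many of those P953 URLs lead to PDFs that Zotero would not already find; answering that would need a sample of items checked against Zotero's own lookup.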

And the data in WD is constantly growing, and it is easy for humans to check and contribute to, so this number will only go up.
Also, WD is kind of the only way to do this for any scientific work, anywhere on the web.
I frequently find articles, book sections, ... behind paywalls but available on ResearchGate or HAL.

Futur3r · 2023-01-23T20:03:10Z

I've heard that, some years ago, the EU passed a law allowing European researchers to publish the manuscripts of their papers anywhere they want, 6 months after publication in a journal.

So this functionality could be quite handy.

Futur3r changed the title ~~Find an open access PDF via P953~~ Fetch an open access PDF via P953 Jan 21, 2023

Futur3r mentioned this issue Jan 22, 2023 in #237, "Update the URL of an item via wikidata P953 value" (Open)
