Bringing Crossref, Semantic Scholar, Open Citations and Open Alex lookup + auto-import to Cita for Zotero 7 #300

Open: thebluepotato wants to merge 19 commits into base: zotero7

thebluepotato

Hi! I adapted the (now stale) PR #139 to the new Zotero 7 branch so it has a chance to be swept up in the new release. The general logic is unchanged from the other PR, but I've made quite a few updates for efficiency, code clarity and type safety as well as fixed a few failing Promises here and there. I've tested quite a bit already, but it could definitely use more in-depth testing.

And I've also added a button to citations to auto-import that reference into Zotero with one click and then link it. It's similar to what https://github.com/MuiseDestiny/zotero-reference does, but I find that addon confusing at best and it doesn't help that all the info is in Mandarin Chinese...

All in all, probably still a WIP, but happy to receive code reviews and have some people test this!

@thebluepotato marked this pull request as draft on September 22, 2024 21:22
@thebluepotato marked this pull request as ready for review on September 22, 2024 21:22
@Dominic-DallOsto
Collaborator

Thanks a lot for this!

It'll take me a little while to review this in detail sorry, but this is great!

@thebluepotato
Author

One thing to consider: while adding references for which Crossref has a DOI or ISBN is quite robust, adding items as a book or journal article based merely on their title is unsatisfactory. For instance, with DOI:10.1145/2786451.2786465, some of the references are sections from the same book (a different author per section), yet they all appear in Crossref as Author + Book title (instead of section title). Maybe it should be up to the user to choose what is actually imported.

To avoid type errors and to avoid overusing `any`, I copied the TypeScript definitions from zotero/translators and slightly tweaked them.
@thebluepotato changed the title from "Bringing Crossref lookup and auto-import to Cita for Zotero 7" to "Bringing Crossref, Semantic Scholar and Open Alex lookup + auto-import to Cita for Zotero 7" on Sep 26, 2024
@thebluepotato
Author

thebluepotato commented Sep 26, 2024

The latest commit adds a new IndexerBase abstract class that abstracts the common logic between various "indexers" (couldn't think of a better name). This allows us to more simply add various such "indexers", which now includes Semantic Scholar and Open Alex as well. They all have their pros and cons, but this should give the user a lot of options to automatically fetch these citations.
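To make the idea concrete, here is a minimal sketch of how such an abstract base class could be laid out. It is not the code from this PR: apart from `IndexerBase` and `extractSupportedUID`, which are mentioned in this thread, every name, type and signature below (`UID`, `SourceItem`, `fetchReferences`, `toUID`, `getCitedUIDs`, `FakeCrossref`) is a hypothetical stand-in, and the fake indexer returns canned data instead of calling a real API.

```typescript
// Hypothetical sketch: names and signatures are illustrative, not taken from the PR.
type UIDType = "DOI" | "arXiv" | "ISBN";

interface UID {
  type: UIDType;
  id: string;
}

interface SourceItem {
  title: string;
  identifiers: Partial<Record<UIDType, string>>;
}

abstract class IndexerBase<Ref> {
  abstract readonly indexerName: string;
  abstract readonly supportedUIDs: readonly UIDType[];

  // Each concrete indexer only implements the source-specific parts.
  protected abstract fetchReferences(uid: UID): Promise<Ref[]>;
  protected abstract toUID(ref: Ref): UID | null;

  // Shared logic: pick the first identifier this indexer can search by.
  extractSupportedUID(item: SourceItem): UID | null {
    for (const type of this.supportedUIDs) {
      const id = item.identifiers[type];
      if (id) return { type, id };
    }
    return null;
  }

  // Shared logic: look up the item and keep only references with an identifier.
  async getCitedUIDs(item: SourceItem): Promise<UID[]> {
    const uid = this.extractSupportedUID(item);
    if (uid === null) return [];
    const refs = await this.fetchReferences(uid);
    return refs.map((r) => this.toUID(r)).filter((u): u is UID => u !== null);
  }
}

// A stand-in indexer returning canned data instead of calling a real API.
class FakeCrossref extends IndexerBase<{ DOI?: string }> {
  readonly indexerName = "Crossref (fake)";
  readonly supportedUIDs = ["DOI"] as const;

  protected async fetchReferences(_uid: UID): Promise<{ DOI?: string }[]> {
    return [{ DOI: "10.1000/example.1" }, {}];
  }

  protected toUID(ref: { DOI?: string }): UID | null {
    return ref.DOI ? { type: "DOI", id: ref.DOI } : null;
  }
}
```

With this layout, adding another source would mostly mean writing a new subclass with its own `fetchReferences` and `toUID`, while identifier selection and filtering stay in one place.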

Based on initial (limited) experimentation:

  • Crossref: citations seem more "official" than the other sources, but not all items with DOIs have references
  • Semantic Scholar: because it analyses the indexed papers, it includes many references, but also some random entries that are not actually cited
  • Open Alex: usually has fewer citations than the others

One issue that this "abstraction" brings is that the context menu when clicking on an item shows the translation keys instead of the corresponding strings.

@Dominic-DallOsto
Collaborator

Hi, I just had a chance to quickly test this and so far things look nice, thanks so much! I haven't been able to fully review the code yet but here are some observations from testing.

Openalex build error

I get the following build error at the moment because of the openalex-sdk. Did you encounter this on your end?

    node_modules/openalex-sdk/dist/src/utils/works.js:7:37:
      7 │ const fs_1 = __importDefault(require("fs"));
        ╵                                      ~~~~

  The package "fs" wasn't found on the file system but is built into node. Are you trying to bundle
  for node? You can use "platform: 'node'" to do that, which will remove this error.

I removed the openalex SDK to test a bit further.

Auto import citations

Firstly, the auto import by identifier button is really nice! It would solve #40. One thing that might also be nice: if the citation already has a QID attached, it could be applied to the newly created item when it's imported.

Getting Crossref citations

Testing the auto import of citations from crossref I found some bugs, but they're mostly related to crossref's data so it was just unlucky I happened to pick a bad item haha

  1. Add this item by DOI - 10.1007/BF01700692
  2. Get citations from crossref
    • newlines in text aren't rendered properly
    • it says I will get 64 citations

image

  3. Press OK
    • actually I only get 2 citations, and they're both the same
      • Checking the API response, this is actually a crossref problem:
      • we get a response with 64 citations, but 62 are unstructured - maybe the message could be edited to exclude unstructured citations if we don't attempt to parse them, or a message after importing could say "imported 2/64 citations from crossref"
      • here crossref is just a bit strange in that 2 of the references have the same DOI. Could we check for duplicates within the crossref response and remove them?
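Both suggestions (an "imported N/M" message and duplicate removal) could be handled in one pass over the response. The following is only a sketch under stated assumptions: it treats a reference as importable only when it carries a DOI (ISBN handling is left out for brevity), the `CrossrefReference` interface lists just the fields used here, and `filterReferences`, `FilterResult` and `summarize` are hypothetical names, not functions from this PR.

```typescript
// Minimal shape of a Crossref reference entry; only the fields used here.
interface CrossrefReference {
  key: string;
  DOI?: string;
  unstructured?: string;
}

// Hypothetical result type for this sketch.
interface FilterResult {
  importable: CrossrefReference[];
  total: number;
  withoutDOI: number;
  duplicates: number;
}

// DOIs are case-insensitive, so compare them in lower case.
function normalizeDOI(doi: string): string {
  return doi.trim().toLowerCase();
}

// Keeps the first reference for each distinct DOI and counts what was skipped.
function filterReferences(references: CrossrefReference[]): FilterResult {
  const seen = new Set<string>();
  const importable: CrossrefReference[] = [];
  let withoutDOI = 0;
  let duplicates = 0;

  for (const ref of references) {
    if (!ref.DOI) {
      withoutDOI++;
      continue;
    }
    const doi = normalizeDOI(ref.DOI);
    if (seen.has(doi)) {
      duplicates++;
      continue;
    }
    seen.add(doi);
    importable.push(ref);
  }
  return { importable, total: references.length, withoutDOI, duplicates };
}

// Builds the kind of message suggested above.
function summarize(result: FilterResult): string {
  return `Imported ${result.importable.length}/${result.total} citations from Crossref`;
}
```

Keeping the skipped counts separate (`withoutDOI`, `duplicates`) would also let the alert explain why fewer citations were imported than the response contained.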

image

Getting Semantic Scholar citations

I tested using the item with DOI 10.1109/ITW.2015.7133169. It got 11/14 citations because 3 had no identifiers in Semantic Scholar. The request was very slow, though, compared to getting citations from Crossref. Here is an overview of the timing.

image

The slowdown is because the requests to arXiv are really slow. I tested the same request in the browser and it also took ~10 seconds to complete, so it doesn't seem that this is a problem with Cita. Does arXiv have an alternative (faster) API? Maybe a workaround would be to update the progress message with the number of citations already downloaded, so users can see that things are progressing?
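The progress-message workaround could look roughly like the sketch below. It assumes the lookups run one after another and that the UI exposes some way to update a message; `lookupWithProgress`, `ProgressCallback` and the `lookup` parameter are hypothetical names standing in for whatever Cita and Zotero's translators actually provide.

```typescript
// Called after each lookup with the number finished so far and the total.
type ProgressCallback = (done: number, total: number) => void;

// Runs the lookups one at a time and reports progress after each one.
// `lookup` stands in for whatever performs the (possibly slow) identifier import.
async function lookupWithProgress<T>(
  identifiers: string[],
  lookup: (id: string) => Promise<T>,
  onProgress: ProgressCallback,
): Promise<(T | null)[]> {
  const results: (T | null)[] = [];
  for (const id of identifiers) {
    try {
      results.push(await lookup(id));
    } catch {
      // A failed lookup should not abort the remaining ones.
      results.push(null);
    }
    onProgress(results.length, identifiers.length);
  }
  return results;
}
```

A caller would pass a callback that rewrites the progress text, for example to something like "Downloaded 3/14 citations", so a slow arXiv request is visibly one step among many rather than an apparent hang.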

@thebluepotato
Author

thebluepotato commented Sep 30, 2024

Openalex build error

> I get the following build error at the moment because of the openalex-sdk. Did you encounter this on your end?

Yes sorry, I'm actually entirely new to npm so I forgot to commit the patch to openalex-sdk, fixed in latest commit.

Auto import citations

> Firstly, the auto import by identifier button is really nice! It would solve #40. One thing that might also be nice: if the citation already has a QID attached, it could be applied to the newly created item when it's imported.

I didn't really look into the Wikidata side of things, but will definitely look into ensuring the QID is imported as well. Is it usually stored in the Extra field?

  • Import QID

Getting Crossref citations

> Testing the auto import of citations from crossref I found some bugs, but they're mostly related to crossref's data so it was just unlucky I happened to pick a bad item haha

  • Get newlines to show in the alert
  • Rephrase the alert to clarify that "parsed" does not mean the citations will be added in the end
  • Apply duplicate filter to the citations to be added as well

Getting Semantic Scholar citations

> I tested using the item with DOI 10.1109/ITW.2015.7133169. It got 11/14 citations because 3 had no identifiers in Semantic Scholar. The request was very slow, though, compared to getting citations from Crossref. Here is an overview of the timing.

As it currently stands, the PR relies heavily on Zotero's own existing translators to avoid doing too much heavy lifting and to avoid code duplication. Therefore, if it's slow to import with Cita, it's also slow to import when using the "magic wand" tool that imports items based on their identifiers. Will look into alternatives, but it seems likely that Zotero's own translator is already quite optimized as it is.

@thebluepotato
Author

thebluepotato commented Oct 1, 2024

Regarding arXiv, I updated the translator locally (see: zotero/translators#3366) to use another endpoint which, based on limited testing, should be faster than the one the translator currently uses. However, when testing within Cita, it's just as slow...

EDIT: rather, depending on luck I guess, it can be as "fast" as 1s per request, but still can sometimes be as slow as the other endpoint.

@Dominic-DallOsto
Collaborator

Dominic-DallOsto commented Oct 3, 2024

That's great, thanks a lot! And thanks for addressing the issues with the arXiv translator, doing it upstream in Zotero is definitely the right way.

A couple of little things I noticed:

  • If I right click an item, in the Cita menu it says "Get citations from Semantic" instead of "Get citations from Semantic Scholar" like it says in the More... menu
  • If I have an item that only has an ISBN, in the right click menu all the options for getting citations are still enabled, whereas in the More... menu they're all rightfully disabled

Otherwise this all looks good

@thebluepotato changed the title from "Bringing Crossref, Semantic Scholar and Open Alex lookup + auto-import to Cita for Zotero 7" to "Bringing Crossref, Semantic Scholar, Open Citations and Open Alex lookup + auto-import to Cita for Zotero 7" on Oct 4, 2024
@thebluepotato
Author

Got a little crazy and added OpenCitations capabilities again. However, within all the confusion, I need your input on whether we could/should expand the definition of PIDType to include all "IDs" we're now using and that the various indexers support searching for, or at least OpenAlex identifier and Semantic Scholar Corpus ID. In particular, it would streamline the code by using getPID everywhere
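As a rough illustration of the question, widening the type might look like the sketch below. The members of `PIDType` shown here, the `PID` and `ItemPIDs` shapes, the signature of `getPID`, and the per-indexer lists are all assumptions made for the example; none of them are copied from Cita's actual definitions.

```typescript
// Hypothetical sketch: the members of PIDType and the shape of getPID are assumptions.
type PIDType = "DOI" | "ISBN" | "QID" | "arXiv" | "OpenAlex" | "CorpusID";

interface PID {
  type: PIDType;
  id: string;
}

// A pared-down item: a map from identifier type to its value, where known.
type ItemPIDs = Partial<Record<PIDType, string>>;

// Returns the first identifier of the requested types that the item has,
// so every indexer can share one lookup path.
function getPID(item: ItemPIDs, types: readonly PIDType[]): PID | null {
  for (const type of types) {
    const id = item[type];
    if (id) return { type, id };
  }
  return null;
}

// Illustrative per-indexer preferences; the real lists would follow each API.
const crossrefTypes: readonly PIDType[] = ["DOI"];
const semanticScholarTypes: readonly PIDType[] = ["CorpusID", "DOI", "arXiv"];
```

With a single union type, the compiler flags any indexer that asks for an identifier type the rest of the code does not know about, which is the main appeal of using `getPID` everywhere.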

@thebluepotato
Author

> • If I have an item that only has an ISBN, in the right click menu all the options for getting citations are still enabled, whereas in the More... menu they're all rightfully disabled

For this, I'd like to improve the logic so it is only disabled when no supported identifiers are present. While CrossRef requires a DOI, the other indexers often can search with more identifiers.

@Dominic-DallOsto
Collaborator

> Got a little crazy and added OpenCitations capabilities again. However, within all the confusion, I need your input on whether we could/should expand the definition of PIDType to include all "IDs" we're now using and that the various indexers support searching for, or at least OpenAlex identifier and Semantic Scholar Corpus ID. In particular, it would streamline the code by using getPID everywhere

Yeah, I think that's great to abstract this out like you have.

> For this, I'd like to improve the logic so it is only disabled when no supported identifiers are present. While CrossRef requires a DOI, the other indexers often can search with more identifiers.

Yeah, that makes sense. I guess how you've set it up you could just check whether IndexerBase.extractSupportedUID returns null? Maybe it'd be nice to have a specific function that does this check.
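A dedicated check along those lines might be sketched as follows. Only `extractSupportedUID` is named in this thread; its signature here, the `Indexer` interface, and the helpers `canFetchCitations` and `menuState` are hypothetical. The sketch also assumes a menu entry should be enabled as soon as at least one selected item has a supported identifier, which is a design choice rather than something settled in this discussion.

```typescript
// Hypothetical shapes: only extractSupportedUID is named in the discussion above;
// its signature here is an assumption.
interface UID {
  type: string;
  id: string;
}

interface Indexer<Item> {
  indexerName: string;
  extractSupportedUID(item: Item): UID | null;
}

// True when the indexer can look up at least one of the selected items.
function canFetchCitations<Item>(indexer: Indexer<Item>, items: Item[]): boolean {
  return items.some((item) => indexer.extractSupportedUID(item) !== null);
}

// Computes the enabled/disabled state for each indexer's menu entry.
function menuState<Item>(indexers: Indexer<Item>[], items: Item[]): Map<string, boolean> {
  return new Map(
    indexers.map((ix) => [ix.indexerName, canFetchCitations(ix, items)] as const),
  );
}
```

Using the same helper for both the right-click menu and the More... menu would keep the two from drifting apart, as happened with the ISBN-only item mentioned earlier.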
