Accommodate pubMLST authentication change #186

Closed
Vince-janv opened this issue Nov 8, 2024 · 6 comments · Fixed by #188
@Vince-janv
Contributor

Vince-janv commented Nov 8, 2024

Description

According to this information, the pubMLST database will require users to authenticate for any type of request (even GETs).

microSALT fetches data from pubMLST in microSALT/utils/referencer.py. This happens on the first micro-analysis run every day.

Without authentication, microSALT will not be able to fetch recent data.

API docs: https://bigsdb.readthedocs.io/en/latest/rest.html#api-oauth

@Vince-janv
Contributor Author

Technical refinement:

  • Use prod-user to authenticate
  • Use credentials to authenticate the requests

@ahdamin
Contributor

ahdamin commented Nov 28, 2024

To clarify the issue:

New Workflow:

  1. Run the Download Script:
    The user runs the download script with a command similar to this:
python microSALT/utils/bigsdb_downloader.py \
    --key_name testApp \
    --token_dir ./tokens \
    --url https://rest.pubmlst.org/db/pubmlst_neisseria_seqdef/schemes/1/profiles \
    --method GET \
    --output_file output.json \
    --site PubMLST
  2. Prompt for API Key and Secret:
    Upon running the script, the user is prompted to enter their API key and secret:
Please enter your API key: [User inputs API key]
Please enter your API secret: [User inputs API secret]

    The key and secret are obtained from the PubMLST account.

  3. Receive Authorization URL:
    The script generates an authorization URL and instructs the user to visit it:
Please log in using your user account at:
https://pubmlst.org/bigsdb?db=pubmlst_neisseria_seqdef&page=authorizeClient&oauth_token=SomeToken
using a web browser to obtain a verification code.
Please enter verification code:
  4. Authenticate via Web Browser:
  • The user opens the provided URL in a web browser.
  • The user logs in with their PubMLST username and password.
  5. Authorize the Application:
  • After logging in, the user clicks the Authorize button to grant the application access.
  6. Obtain Verification Code:
  • The website displays a verification code (e.g., xxxxxaBC).
  • The user copies this code.
  7. Pass the Verification Code into the Script:
Please enter verification code: [User inputs verification code]
  8. The script confirms authentication and displays the access token and secret (valid forever):
Access Token: [Access Token]
Access Token Secret: [Access Token Secret]
  9. The app is now authorized. To obtain the session token and secret (valid for 12 hours), you need the Client ID and Client Secret in addition to the Access Token and Access Secret.

  10. You are now ready to make API calls. Each call requires:

  • Client ID
  • Client Secret
  • Session Token (NOT Access Token)
  • Session Secret (NOT Access Secret)

Then download the Database:
With authentication completed, the microSALT script can proceed to download data from the database.

@ahdamin
Contributor

ahdamin commented Nov 28, 2024

Oh! I forgot to add that pagination should also be considered when handling responses.
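A minimal sketch of what handling pagination could look like. It assumes each JSON page carries a "paging" object whose "next" entry is the URL of the following page and is absent on the last page; that layout, the `fetch` callable, and the function names are assumptions for illustration, not taken from the microSALT code.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_pages(fetch: Callable[[str], Dict[str, Any]], url: str) -> Iterator[Dict[str, Any]]:
    """Yield every page of a paginated response by following 'next' links."""
    seen = set()
    while url and url not in seen:
        seen.add(url)  # guard against a server that links back to a visited page
        page = fetch(url)
        yield page
        # Assumed layout: {"paging": {"next": "<url>"}}, missing on the last page.
        url = page.get("paging", {}).get("next")


def collect(fetch: Callable[[str], Dict[str, Any]], url: str, key: str) -> List[Any]:
    """Concatenate the list stored under `key` across all pages."""
    items: List[Any] = []
    for page in iter_pages(fetch, url):
        items.extend(page.get(key, []))
    return items
```

Passing the fetch function in as an argument keeps the paging logic independent of how requests are authenticated, so it can be exercised with canned pages.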

@ahdamin ahdamin self-assigned this Dec 4, 2024
@ahdamin ahdamin linked a pull request Dec 7, 2024 that will close this issue
4 tasks
@ahdamin
Contributor

ahdamin commented Dec 14, 2024

Transition from unauthenticated API to the authenticated API

URLs

Non-authenticated endpoints use the http protocol, while authenticated endpoints require https. Any part of the code that handles URLs should be refactored to accommodate this.
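One way to accommodate the scheme change is to normalize URLs before they are requested. The helper below is a sketch only; `force_https` is a hypothetical name and not part of microSALT.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Upgrade an http URL to https; URLs with any other scheme are returned unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

The helper leaves the host, path and query untouched. An explicit port such as :80 would be kept as-is, so URLs with ports would need separate handling.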

Listing databases

The responses appear to be identical once the http/https difference in the URLs is handled.

Fetch Schemes

Same as for databases: the responses match once the URLs are handled.

Fetch Locus Metadata

The unauthenticated API responded with detailed data, while the authenticated API responded with some metadata links (curators, submissions, loci, etc). I will investigate additional endpoints to fetch equivalent data.

Download Locus

The unauthenticated API returned allele sequences in FASTA format, while the authenticated API was trying to return raw bytes. I will investigate that as well.
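If the raw bytes turn out to be FASTA text that simply has not been decoded, a sketch like the following would recover the records. This assumes a UTF-8 encoded payload; the thread does not say how the issue was eventually resolved, and `parse_fasta_bytes` is a hypothetical helper.

```python
from typing import Dict, Optional


def parse_fasta_bytes(payload: bytes, encoding: str = "utf-8") -> Dict[str, str]:
    """Decode a raw FASTA payload and return {record_id: sequence}."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for line in payload.decode(encoding).splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""  # id is the first word of the header
            records[current] = ""
        elif current is None:
            raise ValueError("payload does not start with a FASTA header")
        else:
            records[current] += line  # sequences may span several lines
    return records
```

Raising on a payload without a header makes it obvious when the response is something other than FASTA, for example a JSON error body.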

@ahdamin
Contributor

ahdamin commented Dec 14, 2024

Authentication utils

The suggested solution is to have utilities handle the authentication and interactions with the PubMLST API. Here is a summary of the functionalities:

1. get_credentials.py

This script helps users obtain an access token and access secret.

  • Inputs: The user needs to provide a client id and client secret, which can be requested via the PubMLST account page.
  • Authorization: The script generates a link that directs the user to authorize the application in their browser. After granting authorization, the user must enter the verification code provided.
  • Output: After a successful verification, the script generates and saves the access token and access secret.

2. authentication.py

These functions provide utilities for managing session tokens, which are short-lived and expire after 12 hours. They work with the credentials obtained earlier:

  • Inputs: The following attributes are required:
    • client id (or consumer key)
    • client secret (or consumer secret)
    • access token
    • access secret
  • Functionality: It handles the retrieval of session token and session secret and ensures re-authentication when the session token expires. Tokens will be saved and reused if still valid.
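The save-and-reuse behaviour described above could be sketched as follows. The file layout, the function names and the `fetch_new` callable (standing in for the actual session token request) are assumptions for illustration; only the 12-hour lifetime comes from this thread.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, Optional

SESSION_LIFETIME_S = 12 * 60 * 60  # session tokens expire after 12 hours


def load_session(path: Path, now: Optional[float] = None, margin_s: int = 300) -> Optional[Dict[str, str]]:
    """Return the cached session token if it is still valid, else None."""
    now = time.time() if now is None else now
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        return None  # missing or unreadable cache: caller must re-authenticate
    if not isinstance(data, dict) or not {"token", "secret", "expires_at"} <= set(data):
        return None
    if now >= data["expires_at"] - margin_s:
        return None  # expired, or about to expire within the safety margin
    return {"token": data["token"], "secret": data["secret"]}


def get_session(path: Path, fetch_new: Callable[[], Dict[str, str]],
                now: Optional[float] = None) -> Dict[str, str]:
    """Reuse a valid cached session, or fetch a new one and store it."""
    now = time.time() if now is None else now
    cached = load_session(path, now)
    if cached is not None:
        return cached
    fresh = fetch_new()  # hypothetical: requests a new session token and secret
    record = {"token": fresh["token"], "secret": fresh["secret"],
              "expires_at": now + SESSION_LIFETIME_S}
    path.write_text(json.dumps(record))
    return {"token": fresh["token"], "secret": fresh["secret"]}
```

The small safety margin avoids starting a request with a token that expires mid-run. The `now` parameter exists only so the expiry logic can be checked without waiting.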

For more details about PubMLST authentication check the documentation here

3. client.py

Handles API interaction logic using the authentication utilities.

  • Authentication: All API calls require the following:
    • consumer key
    • consumer secret
    • session token
    • session secret
  • Endpoints:
    As far as I understand, the following will be needed (so far):
    • Query available PubMLST databases.
    • Fetch MLST schemes for a specific database.
    • Download MLST profiles in CSV or JSON format.
    • Download locus sequence files.
    • Check database metadata (e.g., last update time).

Here is the documentation about available resources/endpoints.
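To show why every call needs all four values, here is a sketch of how a request signature is built from them. It assumes the API uses OAuth 1.0 with HMAC-SHA1 signatures as specified in RFC 5849; the function name and argument layout are hypothetical, and in practice an OAuth library would normally do this rather than hand-written code.

```python
import base64
import hashlib
import hmac
from typing import Dict
from urllib.parse import quote


def _enc(value: str) -> str:
    """Percent-encode so that only unreserved characters stay literal (RFC 5849, 3.6)."""
    return quote(str(value), safe="~")


def sign_request(method: str, url: str, params: Dict[str, str],
                 consumer_key: str, consumer_secret: str,
                 session_token: str, session_secret: str,
                 nonce: str, timestamp: str) -> Dict[str, str]:
    """Return the OAuth parameters, including the signature, for one request.

    `url` must be the base URL without a query string; query values go in `params`.
    """
    oauth = {
        "oauth_consumer_key": consumer_key,
        "oauth_token": session_token,
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": timestamp,
        "oauth_nonce": nonce,
        "oauth_version": "1.0",
    }
    all_params = dict(params, **oauth)
    # Encode first, then sort by key and value, then join as k=v pairs.
    pairs = sorted((_enc(k), _enc(v)) for k, v in all_params.items())
    normalized = "&".join("%s=%s" % pair for pair in pairs)
    base_string = "&".join([method.upper(), _enc(url), _enc(normalized)])
    # The signing key combines the consumer secret and the session secret.
    key = "%s&%s" % (_enc(consumer_secret), _enc(session_secret))
    digest = hmac.new(key.encode("ascii"), base_string.encode("ascii"), hashlib.sha1).digest()
    oauth["oauth_signature"] = base64.b64encode(digest).decode("ascii")
    return oauth
```

The consumer key and session token travel with the request, while the two secrets never leave the client and only feed the signing key. That is why a call made with the access token instead of the session token cannot be validated by the server.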

@ahdamin
Contributor

ahdamin commented Dec 17, 2024

Status update

Tested the current PR as follows:

[hiseq.clinical@hasta:/home/proj/stage/bin/git/microSALT] [base]  (add-pubmlst-utils) $ us
[hiseq.clinical@hasta:/home/proj/stage/bin/git/microSALT] [S_base]  (add-pubmlst-utils) $ conda activate S_microSALT

[hiseq.clinical@hasta:/home/proj/stage/bin/git/microSALT] [S_microSALT]  (add-pubmlst-utils) $ cd /home/proj/stage/bin/git/microSALT/
[hiseq.clinical@hasta:/home/proj/stage/bin/git/microSALT] [S_microSALT]  (add-pubmlst-utils) $ git checkout add-pubmlst-utils
Already on 'add-pubmlst-utils'
[hiseq.clinical@hasta:/home/proj/stage/bin/git/microSALT] [S_microSALT]  (add-pubmlst-utils) $ pip install -e .

Obtaining file:///home/proj/stage/bin/git/microSALT
  Preparing metadata (setup.py) ... done
Requirement already satisfied: biopython==1.78 in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from microSALT==4.0.0) (1.78)
Requirement already satisfied: bs4==0.0.1 in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from microSALT==4.0.0) (0.0.1)
Requirement already satisfied: click==7.1.2 in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from microSALT==4.0.0) (7.1.2)
Requirement already satisfied: flask==1.1.2 in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from microSALT==4.0.0) (1.1.2)
Requirement already satisfied: flask_sqlalchemy==2.4.4 in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from microSALT==4.0.0) (2.4.4)
Requirement already satisfied: pymysql==0.10.1 in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from microSALT==4.0.0) (0.10.1)
Requirement already satisfied: pyyaml==5.4.1 in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from microSALT==4.0.0) (5.4.1)
Requirement already satisfied: idna<4,>=2.5 in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from requests->genologics==0.4.6->microSALT==4.0.0) (3.10)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from requests->genologics==0.4.6->microSALT==4.0.0) (1.26.20)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from requests->genologics==0.4.6->microSALT==4.0.0) (2.0.12)
Requirement already satisfied: certifi>=2017.4.17 in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from requests->genologics==0.4.6->microSALT==4.0.0) (2021.5.30)
Requirement already satisfied: dataclasses in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from Werkzeug>=0.15->flask==1.1.2->microSALT==4.0.0) (0.8)
Requirement already satisfied: soupsieve>1.2 in /home/proj/stage/bin/miniconda3/envs/S_microSALT/lib/python3.6/site-packages (from beautifulsoup4->bs4==0.0.1->microSALT==4.0.0) (2.3.2.post1)
Installing collected packages: microSALT
  Attempting uninstall: microSALT
    Found existing installation: microSALT 4.0.0
    Uninstalling microSALT-4.0.0:
      Successfully uninstalled microSALT-4.0.0
  Running setup.py develop for microSALT
Successfully installed microSALT-4.0.0

Then executed the following command (as advised by KN earlier):

MICROSALT_CONFIG=/home/hiseq.clinical/.microSALT/microSALT.json microSALT analyse --input /home/proj/stage/microbial/fastq/adaptedstarfish/ --force_update /home/proj/stage/microbial/queries/adaptedstarfish.json

It seemed to be working:

INFO - Checking versions of references..
WARNING - Unable to find requested organism 'P. mirabilis' in pubMLST database
WARNING - Unable to find requested organism 'M. tuberculosis' in pubMLST database
WARNING - Unable to find requested organism 'M. malmoense' in pubMLST database
WARNING - Unable to find requested organism 'M. fortuitum' in pubMLST database
WARNING - Unable to find requested organism 'M. avium' in pubMLST database
WARNING - Unable to download genome '#N/A' from NCBI
INFO - pubMLST reference for Acinetobacter baumannii updated to 2024-12-05 from 2024-12-05
INFO - Profiles CSV downloaded to /home/proj/production/microbial/references/ST_profiles/acinetobacter_baumannii
INFO - Locus FASTA downloaded: Oxf_gltA.tfa
INFO - Locus FASTA downloaded: Oxf_gyrB.tfa
INFO - Locus FASTA downloaded: Oxf_gdhB.tfa
INFO - Locus FASTA downloaded: Oxf_recA.tfa
INFO - Locus FASTA downloaded: Oxf_cpn60.tfa
INFO - Locus FASTA downloaded: Oxf_gpi.tfa
INFO - Locus FASTA downloaded: Oxf_rpoD.tfa
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/acinetobacter_baumannii
INFO - pubMLST reference for Bacillus cereus updated to 2024-12-10 from 2024-12-10
INFO - Profiles CSV downloaded to /home/proj/production/microbial/references/ST_profiles/bacillus_cereus
INFO - Locus FASTA downloaded: glp.tfa
INFO - Locus FASTA downloaded: gmk.tfa
INFO - Locus FASTA downloaded: ilv.tfa
INFO - Locus FASTA downloaded: pta.tfa
INFO - Locus FASTA downloaded: pur.tfa
INFO - Locus FASTA downloaded: pyc.tfa
INFO - Locus FASTA downloaded: tpi.tfa
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/bacillus_cereus
INFO - pubMLST reference for Bacteroides fragilis updated to 2024-12-12 from 2024-12-12
INFO - Profiles CSV downloaded to /home/proj/production/microbial/references/ST_profiles/bacteroides_fragilis
INFO - Locus FASTA downloaded: dnaJ.tfa
INFO - Locus FASTA downloaded: fusA.tfa
INFO - Locus FASTA downloaded: groL.tfa
.
.
.
INFO - Saved Trailblazer slurm report file to /home/proj/production/microbial/results/reports/trailblazer/ACC16057_slurm_ids.yaml and /home/proj/production/microbial/results/ACC16057_2024.12.17_5.30.38/ACC16057_slurm_ids.yaml
INFO - Execution finished!

Remaining tasks

  • Resolve the test issue on GitHub Actions: the current tests rely on unauthenticated endpoints. Credentials are now required, so we may need to update the tests to generate a credentials file using values from GitHub secrets
  • Review the changes introduced to the servers repo in this PR
  • Deploy to stage
  • Add config changes to microSALT production config file in the servers repo
  • Deploy to production
  • Update documentation
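For the first remaining task, a credentials file could be generated in CI from environment variables populated by GitHub secrets. This is a sketch under assumptions: the variable names, the JSON keys and the `write_credentials` helper are all hypothetical, and the file format microSALT actually expects is not described in this thread.

```python
import json
import os
from pathlib import Path
from typing import Mapping

# Hypothetical environment variable names, one per required credential.
REQUIRED = (
    "PUBMLST_CLIENT_ID",
    "PUBMLST_CLIENT_SECRET",
    "PUBMLST_ACCESS_TOKEN",
    "PUBMLST_ACCESS_SECRET",
)


def write_credentials(path: Path, env: Mapping[str, str] = os.environ) -> None:
    """Write a JSON credentials file from environment variables, failing if any is unset."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # PUBMLST_CLIENT_ID -> client_id, and so on.
    creds = {name[len("PUBMLST_"):].lower(): env[name] for name in REQUIRED}
    path.write_text(json.dumps(creds, indent=2))
    path.chmod(0o600)  # keep the secrets readable by the owner only
```

Failing early on a missing variable gives a clearer CI error than a later authentication failure inside the tests.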

@karlnyr karlnyr closed this as completed Jan 7, 2025
@karlnyr karlnyr mentioned this issue Jan 7, 2025
8 tasks