Find Samples and Genotype from selected haplotype #482

Open
Don-Isdale opened this issue Feb 19, 2025 · 0 comments
@Don-Isdale
Collaborator

Introduction

A recurrent work-flow in User Stories 2 and 3 is to select a number of samples and then click on SNPs to sort the displayed samples by their genotype values at those SNPs. This enables the user to identify a haplotype using Genotype values, i.e. Alt / Ref, at those SNPs, and to find samples which have the chosen haplotype.

This operation sorts the samples which are loaded into the frontend GUI, based on the Genotype values which are loaded in the frontend.
For a dataset with 300 samples this is fine, but for AGG datasets with 30000 samples it becomes necessary to do this operation on the server ("backend"), so that it can report how many samples in the collection have the identified haplotype and retrieve their genotype values.
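
To make the existing frontend operation concrete, here is a minimal sketch of sorting loaded samples by their genotype values at the clicked SNPs. The data shape (`loadedGenotypes`), the function name `sortSamplesBySnps` and the sample / SNP names are assumptions for illustration only, not the application's actual data model or code.

```javascript
// Hypothetical in-memory shape: loadedGenotypes[sampleName][snpName] = 'Ref' | 'Alt'.
// This is an assumption for illustration, not the application's real data model.
const loadedGenotypes = {
  sampleA: { snp1: 'Alt', snp2: 'Ref' },
  sampleB: { snp1: 'Ref', snp2: 'Ref' },
  sampleC: { snp1: 'Alt', snp2: 'Alt' },
  sampleD: { snp1: 'Ref', snp2: 'Alt' },
};

/** Sort sample names by their genotype values at the selected SNPs.
 * SNPs are compared in the order they were selected; Ref sorts before Alt,
 * and missing or unrecognised values sort last.
 */
function sortSamplesBySnps(genotypes, selectedSnps) {
  const rank = { Ref: 0, Alt: 1 };
  const rankOf = (sample, snp) => rank[genotypes[sample][snp]] ?? 2;
  return Object.keys(genotypes).sort((a, b) => {
    for (const snp of selectedSnps) {
      const difference = rankOf(a, snp) - rankOf(b, snp);
      if (difference !== 0) return difference;
    }
    // Samples with identical genotypes at all selected SNPs: order by name.
    return a.localeCompare(b);
  });
}

const sortedSamples = sortSamplesBySnps(loadedGenotypes, ['snp1', 'snp2']);
console.log(sortedSamples); // [ 'sampleB', 'sampleD', 'sampleA', 'sampleC' ]
```

Samples sharing a haplotype end up adjacent after the sort, which is what lets the user pick out a haplotype by eye. This only works while every sample's genotype values are loaded in the frontend.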

This issue discusses a proposed feature enabling this work-flow :

  • after identifying the desired haplotype by selecting SNPs, and identifying Alt/Ref at each SNP, the user will be able to view the samples which have that haplotype
  • the Genotype values can then be requested for these samples
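
The two steps above can be sketched as follows. This assumes a haplotype is represented as a map from SNP name to `'Ref'` or `'Alt'`; the names `collectionGenotypes`, `samplesWithHaplotype` and `genotypesForSamples` are hypothetical, and the in-memory table stands in for the lookup which would be done on the server.

```javascript
// Hypothetical data: collectionGenotypes[sampleName][snpName] = 'Ref' | 'Alt'.
const collectionGenotypes = {
  sampleA: { snp1: 'Alt', snp2: 'Ref' },
  sampleB: { snp1: 'Ref', snp2: 'Ref' },
  sampleC: { snp1: 'Alt', snp2: 'Ref' },
};

/** Step 1: names of samples whose genotype equals the haplotype at every chosen SNP. */
function samplesWithHaplotype(genotypes, haplotype) {
  const chosen = Object.entries(haplotype);
  return Object.keys(genotypes).filter((sample) =>
    chosen.every(([snp, value]) => genotypes[sample][snp] === value));
}

/** Step 2: genotype values for just the matching samples. */
function genotypesForSamples(genotypes, sampleNames) {
  return Object.fromEntries(sampleNames.map((name) => [name, genotypes[name]]));
}

const chosenHaplotype = { snp1: 'Alt', snp2: 'Ref' };
const matchingSamples = samplesWithHaplotype(collectionGenotypes, chosenHaplotype);
console.log(matchingSamples); // [ 'sampleA', 'sampleC' ]
console.log(genotypesForSamples(collectionGenotypes, matchingSamples));
```

Note that in this sketch an empty haplotype (no SNPs chosen) matches every sample.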

GUI design

A simple way to trial this functionality is to add a button in the Genotype table control dialog which will narrow the list of available samples to just those which have the identified haplotype.
The user can then select some or all samples from this list, and proceed with the VCF Lookup request as in existing work-flows.

Design of lookup functionality on the server

The method of using bcftools to implement this will be based on specifying which SNPs have Alt and Ref, using a bcftools filter expression (--include or --exclude) with --regions, e.g. `bcftools view -e 'GT="0/0"' -r chr1:123456,chr2:7891011` (note that -e is the short form of --exclude), as in this LLM answer.
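
As one possible shape for this lookup, the sketch below builds an argument list for `bcftools query` restricted to the selected SNP positions, then filters its per-sample output to the samples whose genotype matches at every position. The function names, the output layout chosen via `-f`, and the example output text are assumptions for illustration; the example lines are made up, not real bcftools output, and this is not necessarily how the server will implement it.

```javascript
/** Build arguments for a `bcftools query` call limited to the given positions.
 * `-r` takes comma-separated regions (an indexed VCF/BCF is needed for this);
 * the `-f` format prints one "CHROM:POS <tab> SAMPLE <tab> GT" line per sample per site.
 */
function buildQueryArgs(vcfFile, positions) {
  return [
    'query',
    '-r', positions.join(','),
    // Backslashes are doubled so that bcftools itself receives \t and \n.
    '-f', '[%CHROM:%POS\\t%SAMPLE\\t%GT\\n]',
    vcfFile,
  ];
}

/** Keep the samples whose GT string equals the wanted value at every position.
 * wantedGt maps 'chrom:pos' to a GT string such as '0/0' or '1/1'.
 * The comparison is an exact string match, so '0|0' (phased) would not match '0/0'.
 */
function samplesMatchingHaplotype(queryOutput, wantedGt) {
  const wantedCount = Object.keys(wantedGt).length;
  const matchCounts = new Map();
  for (const line of queryOutput.split('\n')) {
    if (!line) continue;
    const [position, sample, gt] = line.split('\t');
    if (wantedGt[position] === gt) {
      matchCounts.set(sample, (matchCounts.get(sample) ?? 0) + 1);
    }
  }
  return [...matchCounts]
    .filter(([, count]) => count === wantedCount)
    .map(([sample]) => sample);
}

const wantedGt = { 'chr1:123456': '1/1', 'chr2:7891011': '0/0' };
const exampleArgs = buildQueryArgs('example.vcf.gz', Object.keys(wantedGt));

// Made-up output in the format requested above, for illustration only.
const exampleOutput = [
  'chr1:123456\ts1\t1/1',
  'chr1:123456\ts2\t0/0',
  'chr1:123456\ts3\t1/1',
  'chr2:7891011\ts1\t0/0',
  'chr2:7891011\ts2\t0/0',
  'chr2:7891011\ts3\t0/1',
].join('\n');

console.log(exampleArgs.join(' '));
console.log(samplesMatchingHaplotype(exampleOutput, wantedGt)); // [ 's1' ]
```

On the server the argument list could be passed to something like Node's `child_process.execFile('bcftools', exampleArgs, ...)`. Passing an argument array rather than a shell string avoids quoting problems with the filter and format expressions.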

Task Structure

The work breakdown is as follows :

  • testing of bcftools commands on VCF files with large number of samples to gauge the performance of this function
  • add the server endpoint which implements this function
  • add the button which gets the list of samples and displays it on the available samples list

Each of these will be described and tracked by a sub-issue which will be a part of this issue.

@Don-Isdale Don-Isdale self-assigned this Feb 19, 2025
Don-Isdale added a commit that referenced this issue Feb 24, 2025
closes #484, #485, part of #482.

manage-genotype.hbs : add
 input checkbox filterSamplesByHaplotype
 span.badge .snpsInBrushedDomain.length, with tooltip .snpsInBrushedDomain value{_0,s.{ref,alt}}, .featureFiltersCount, .matchRefNumeric.
 use .matchRefNumeric in Sample Filters : Feature / SNP

manage-genotype.js :
 add filterSamplesByHaplotype in .args.userSettings, to enable filtering of available samples by selected SNPs
 add selectedSNPsInBrushedDomain(), snpsInBrushedDomain() to display SNP count badge.
 vcfGenotypeSamplesDataset() : add filterByHaplotype; in this case update .selectedSamples and .selectedSamplesText, otherwise don't update sampleCache.sampleNames and datasetStoreSampleNames().
  vcfGenotypeSamples(): don't throttle if .filterSamplesByHaplotype - it's valid for user to repeat the request after changing selected SNPs.
  ensureSamplesForDatasetTabEffect() : if .filterSamplesByHaplotype then request samples, regardless of .vcfGenotypeSamplesText being already defined.

feature.js : add matchRefNumeric().

auth.js : genotypeSamples() : add param filter, replacing options which is not required.
block.js :
  genotypeSamples() : add param filter.
  vcfGenotypeSamples() : add param filter, use vcfGenotypeSamplesFiltered() when filter.

vcfGenotypeLookup.bash :
  argVal : recognise command-line parameter GT=gtMatch, not added to preArgs.
   bcftoolsCommand() : add filter_samples command, implemented as bcftools query | grep gtMatch.