Add multiple threshold clustering to linker #2617

Open: RossKen wants to merge 12 commits into master


RossKen (Contributor) commented Feb 11, 2025

Type of PR

  • BUG
  • FEAT
  • MAINT
  • DOC

Is your Pull Request linked to an existing Issue or Pull Request?

Give a brief description for the solution you have provided

Adds linker.clustering.cluster_pairwise_predictions_at_multiple_thresholds to the linker, giving it behaviour similar to linker.clustering.cluster_pairwise_predictions_at_threshold. In particular, it outputs node information alongside the cluster_ids, which the non-linker version does not.
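As a rough illustration of what multi-threshold clustering computes, the sketch below runs connected-components clustering (union-find) over the pairwise predictions once per threshold. Like the linker method described above, it returns an entry for every node, so unlinked records still get their own cluster id. This is a naive, standalone Python sketch, not Splink's implementation: the function name cluster_at_multiple_thresholds, the (node_l, node_r, match_probability) edge format and the convention of labelling each cluster by its smallest node id are all assumptions made for illustration.

```python
def _find(parent, x):
    # Union-find "find" with path halving: walk up to the component's root
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_at_multiple_thresholds(nodes, edges, thresholds):
    """Hypothetical helper: connected-components clustering at several thresholds.

    nodes: iterable of node ids, so singleton records still receive a cluster id
    edges: iterable of (node_l, node_r, match_probability) tuples
    thresholds: iterable of match_probability cut-offs
    Returns {threshold: {node_id: cluster_id}}, where cluster_id is the
    smallest node id in the component (an illustrative convention only).
    """
    nodes = list(nodes)
    edges = list(edges)
    results = {}
    for t in sorted(thresholds):
        # Naive approach: rebuild the union-find structure for each threshold
        parent = {n: n for n in nodes}
        for left, right, prob in edges:
            if prob >= t:
                root_l, root_r = _find(parent, left), _find(parent, right)
                if root_l != root_r:
                    # Keep the smaller root, so each root is its component's minimum id
                    lo, hi = sorted((root_l, root_r))
                    parent[hi] = lo
        results[t] = {n: _find(parent, n) for n in nodes}
    return results
```

For example, with edges (1, 2, 0.99), (2, 3, 0.8) and (4, 5, 0.6), a threshold of 0.5 gives clusters {1, 2, 3} and {4, 5}, while 0.9 keeps only {1, 2} together. Recomputing from scratch at each threshold keeps the sketch simple. Every edge kept at a higher threshold is also kept at a lower one, so a real implementation could reuse work across thresholds.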

This is part of a wider piece of work to generate graph metrics for multi-threshold clustering (computing graph metrics requires a linker).

One decision point I would appreciate a second opinion on is the treatment of cluster_summary_stats. The non-linker version has a parameter to return summary statistics instead of the full result. Here I have added the summary stats to a metadata dictionary (inspired by Andy's graph metrics work). I think this covers all bases, but I'm open to other suggestions, including better ways to structure the code to make it more streamlined.
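One way to picture the metadata approach for cluster_summary_stats: keep the full per-threshold cluster assignments and attach per-threshold summary statistics in a metadata dictionary, so callers can use either without a flag that switches the return type. The sketch below is hypothetical; MultiThresholdClusteringResult, summarise_clusters, build_result and the individual statistic names are illustrative assumptions, not the PR's code or Splink's API.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class MultiThresholdClusteringResult:
    """Hypothetical container: full per-threshold assignments plus metadata."""

    clusters: dict  # {threshold: {node_id: cluster_id}}
    metadata: dict = field(default_factory=dict)


def summarise_clusters(assignments):
    """Hypothetical summary statistics for one threshold's {node_id: cluster_id} map."""
    sizes = Counter(assignments.values())  # cluster_id -> number of nodes
    if not sizes:
        return {"num_clusters": 0, "max_cluster_size": 0,
                "mean_cluster_size": 0, "num_singletons": 0}
    return {
        "num_clusters": len(sizes),
        "max_cluster_size": max(sizes.values()),
        "mean_cluster_size": mean(sizes.values()),
        "num_singletons": sum(1 for s in sizes.values() if s == 1),
    }


def build_result(clusters_by_threshold):
    # Store the full result and, alongside it, the summary stats per threshold
    stats = {t: summarise_clusters(a) for t, a in clusters_by_threshold.items()}
    return MultiThresholdClusteringResult(
        clusters=clusters_by_threshold,
        metadata={"cluster_summary_stats": stats},
    )
```

A caller who only wants the summary would read result.metadata["cluster_summary_stats"], while one who needs node-level detail uses result.clusters. The trade-off is that the full result is always computed, even when only the statistics are wanted.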

PR Checklist

  • Added documentation for changes
  • Added feature to example notebooks or tutorial (if appropriate)
  • Added tests (if appropriate)
  • Updated CHANGELOG.md (if appropriate)
  • Made changes based off the latest version of Splink
  • Run the linter
  • Run the spellchecker (if appropriate)

@RossKen RossKen marked this pull request as ready for review February 12, 2025 11:28
@RossKen RossKen requested review from RobinL and ADBond February 12, 2025 11:28
RossKen (Contributor, Author) commented Feb 12, 2025

FYI, I still need to add tests, but I'm marking this as ready for review to get feedback on the general concepts.
