Association score

The Open Targets Platform allows prioritisation of drug targets based on the strength of their association with a disease.

We allow for the prioritisation of targets by scoring target-disease associations based on evidence from 19 data sources. Similar data sources (e.g. Open Targets Genetics Portal and PheWAS) are grouped together into data types (e.g. Genetic associations). The score for the associations ranges from 0 to 1; the stronger the evidence for an association, the stronger the association score (closer to 1). A score of 0 corresponds to no evidence supporting an association. In the Open Targets Platform, we represent the different scores with varying shades of blue: the darker the blue, the stronger the association.

Our confidence in the evidence behind an association depends on several factors. We assess key factors such as the frequency, severity and significance of the evidence to provide association scores that help you answer these questions:

Which targets have the most evidence for being associated with a disease?
What is the relative weight of the evidence for different targets associated with a disease?

Our scoring framework is a four-tier process: we first score each piece of evidence, then aggregate the evidence scores into data source scores, and then aggregate the data source scores into data type scores. The overall association score is the result of aggregating all data source scores together.

{% hint style="info" %} Pathways & systems biology is the new name for what we used to call Affected pathways in the user interface of the Open Targets Platform. {% endhint %}

At each aggregation step, denoted by the sum symbol above, we apply a harmonic progression to the scores being combined.

Computing the Association Score

We start by generating a score for each piece of evidence from the different data sources (e.g. European Variation Archive) within a data type (e.g. Genetic associations). We define the evidence score as:

s = F * S * C

where

s = score

F = frequency, the relative occurrence of a target-disease evidence

S = severity, the magnitude or strength of the effect described by the evidence

C = confidence, overall confidence for the observation that generates the target-disease evidence

The evidence score summarises the strength of the evidence and depends on factors that affect its relative strength. These factors are specific to the different data sources in the Platform.
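As a minimal sketch of the evidence score defined above, the function below multiplies the three factors. It assumes each factor has already been normalised to the 0 to 1 range, which the text does not state explicitly; the function name and the range check are illustrative only.

```python
def evidence_score(frequency: float, severity: float, confidence: float) -> float:
    """Return s = F * S * C for one piece of evidence.

    Assumption: F, S and C are each already normalised to [0, 1].
    """
    factors = {"frequency": frequency, "severity": severity, "confidence": confidence}
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return frequency * severity * confidence


if __name__ == "__main__":
    # A frequent observation with moderate severity and moderate confidence.
    print(evidence_score(1.0, 0.5, 0.5))
```

Because the factors are multiplied, a value of 0 for any one of them gives an evidence score of 0.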

Genetic associations

Data type	Data source	Description of scoring
Genetic associations	ClinVar (EVA)	Numeric score based on clinical significance assessment from ClinVar: "association not found" = 0.0, "benign" = 0.0, "not provided" = 0.0, "likely benign" = 0.0, "conflicting interpretations of pathogenicity" = 0.3, "other" = 0.3, "uncertain significance" = 0.3, "risk factor" = 0.5, "affects" = 0.5, "likely pathogenic" = 1, "association" = 1, "drug response" = 1, "protective" = 1, "pathogenic" = 1
Genetic associations	PheWAS Catalog	Product of the normalised p-value and the normalised sample size
Genetic associations	Gene2Phenotype	Numeric score based on confidence level that curators assign to the gene-disease evidence: "Confirmed" = 1, "Probable" = 0.5, "Possible" = 0.25, "Both RD and IF" = 1, "Child IF" = 1
Genetic associations	Genomics England PanelApp	Gene-disease associations are curated and crowdsourced by experts and all are given the highest score of 1
Genetic associations	Open Targets Genetics Portal	Locus-to-gene (L2G) score, filtered to use scores above 0.05
Genetic associations	UniProt literature	Curator inference score based on the strength of the evidence for the gene's involvement in the disease: 1 if the evidence is strong, 0.5 if the curator deems it not to be strong
Genetic associations	ClinGen	Gene-disease pairs are curated by experts using a standardised approach and controlled vocabulary that corresponds to specific evidence scores: "Definitive" = 1, "Strong" = 1, "Moderate" = 0.5, "Limited" = 0.01, "Disputed" = 0.01, "No Reported Evidence" = 0.01, "Conflicting Evidence" = 0.01
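Several of the sources in this table map a curated label directly to a fixed score. The sketch below shows that lookup for the ClinVar (EVA) row, using the values listed in the table; the dictionary name, the function name and the case-insensitive matching are assumptions made for illustration, not part of the Platform's code.

```python
# Scores copied from the ClinVar (EVA) row of the table above.
CLINVAR_SCORES = {
    "association not found": 0.0,
    "benign": 0.0,
    "not provided": 0.0,
    "likely benign": 0.0,
    "conflicting interpretations of pathogenicity": 0.3,
    "other": 0.3,
    "uncertain significance": 0.3,
    "risk factor": 0.5,
    "affects": 0.5,
    "likely pathogenic": 1.0,
    "association": 1.0,
    "drug response": 1.0,
    "protective": 1.0,
    "pathogenic": 1.0,
}


def clinvar_score(clinical_significance: str) -> float:
    """Look up the evidence score for a ClinVar clinical significance term.

    Raises KeyError for terms that are not listed in the table.
    """
    return CLINVAR_SCORES[clinical_significance.strip().lower()]


if __name__ == "__main__":
    print(clinvar_score("Pathogenic"))
    print(clinvar_score("uncertain significance"))
```

The same pattern would apply to the Gene2Phenotype and ClinGen rows, each with its own label-to-score mapping.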

Somatic mutations

Data type	Data source	Description of scoring
Somatic mutations	Cancer Gene Census	Score based on tier of gene and frequency of mutations: 0.5 for Tier 2 genes; 0.25, 0.5, 0.75 or 1 for Tier 1 genes, from a base score of 0.5 modified as follows: -0.25 if only 1 mutated sample; +0.25 if gene mutated more frequently in particular disease compared to all other diseases; +0.25 if mutations in a gene occur more frequently than in other genes of similar length in the same disease
Somatic mutations	ClinVar somatic (EVA)	Confidence of evidence-disease association, currently a fixed value of 1
Somatic mutations	IntOGen	Normalised combined q-value of driver identification methods
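The Cancer Gene Census rule is the most procedural of the somatic mutation scores, so a small sketch may help. The function below follows the tier and modifier values given above; the function and parameter names are hypothetical, and how the two frequency comparisons are computed upstream is not covered here.

```python
def cancer_gene_census_score(
    tier: int,
    mutated_samples: int,
    more_frequent_in_disease: bool,
    more_frequent_than_similar_genes: bool,
) -> float:
    """Score Cancer Gene Census evidence from gene tier and mutation frequency."""
    if tier == 2:
        return 0.5
    if tier != 1:
        raise ValueError(f"tier must be 1 or 2, got {tier}")

    score = 0.5  # base score for Tier 1 genes
    if mutated_samples == 1:
        score -= 0.25  # only one mutated sample
    if more_frequent_in_disease:
        score += 0.25  # mutated more often in this disease than in all others
    if more_frequent_than_similar_genes:
        score += 0.25  # mutated more often than genes of similar length
    return score


if __name__ == "__main__":
    print(cancer_gene_census_score(1, 12, True, True))
    print(cancer_gene_census_score(1, 1, False, False))
```

With these modifiers a Tier 1 gene can only land on 0.25, 0.5, 0.75 or 1, which matches the values listed for that tier.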

Drugs

Data type	Data source	Description of scoring
Drugs	ChEMBL	Product of: (1) clinical trials phase binned score: Phase 0 = 0.09, Phase I = 0.1, Phase II = 0.2, Phase III = 0.7, Phase IV = 1.0; (2) confidence of the gene being the target of the drug, currently a fixed value of 1
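The ChEMBL score can be sketched as a lookup followed by a product. The phase values and the fixed target confidence of 1 come from the description above; representing phases as integers 0 to 4 and the names used here are assumptions for illustration.

```python
# Binned scores per clinical trial phase, as listed above.
PHASE_SCORES = {0: 0.09, 1: 0.1, 2: 0.2, 3: 0.7, 4: 1.0}

# Confidence that the gene is the target of the drug (currently fixed).
TARGET_CONFIDENCE = 1.0


def chembl_score(phase: int) -> float:
    """Return the ChEMBL evidence score for a clinical trial phase (0-4)."""
    if phase not in PHASE_SCORES:
        raise ValueError(f"unknown clinical trial phase: {phase}")
    return PHASE_SCORES[phase] * TARGET_CONFIDENCE


if __name__ == "__main__":
    for phase in sorted(PHASE_SCORES):
        print(phase, chembl_score(phase))
```

The large step between Phase II (0.2) and Phase III (0.7) means late-stage trials contribute far more to the score than early-stage ones.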

Pathways and systems biology (previously known as Affected pathways)

Data type	Data source	Description of scoring
Pathways and systems biology	Reactome	Fixed value of 1 since association is inferred by a curator
Pathways and systems biology	Sysbio	Scoring depends on whether the original data contains a score: p-values and rank-based scores are normalised to the 0.5-1 range; if there is no score, a fixed value of 0.75 is used
Pathways and systems biology	SLAPenrich	Scored according to Iorio F et al 2018, followed by quantifying, in large cohorts of cancer patients, the divergence of the total number of samples with genomic alterations in a pathway from its expectation, accounting for mutational burdens and total exonic block lengths of genes in that pathway
Pathways and systems biology	PROGENy	Scored per sample and pathway following a modification of the original implementation described by Schubert et al. 2016. Further details are available here.
Pathways and systems biology	Project Score (CRISPR)	Score based on the priority score divided by 100. The priority score is described by Behan et al. 2019; it varies from 0 to 100 (any value above 40 is significant) and is available in supplementary table 6 of the publication
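For Project Score (CRISPR), the conversion is simple enough to show directly. The sketch below divides the priority score by 100 and flags values above the significance threshold of 40 mentioned in the table; the function name and the returned tuple are hypothetical choices for this example.

```python
def project_score_evidence(priority_score: float) -> tuple[float, bool]:
    """Convert a Project Score priority score (0-100) to an evidence score.

    Returns the evidence score and whether the priority score is above
    the significance threshold of 40.
    """
    if not 0 <= priority_score <= 100:
        raise ValueError(f"priority score must be in 0-100, got {priority_score}")
    return priority_score / 100, priority_score > 40


if __name__ == "__main__":
    print(project_score_evidence(55))
    print(project_score_evidence(40))
```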

RNA expression

Data type	Data source	Description of scoring
RNA expression	Expression Atlas	Product of: normalised p-value, normalised expression fold change, and normalised percentile rank

Text mining

Data type	Data source	Description of scoring
Text mining	EuropePMC	Score based on weighted document sections, sentence locations, and title for full text articles and abstracts as described in Kafkas et al., 2016

Animal models

Data type	Data source	Description of scoring
Animal models	PhenoDigm	Similarity score between a mouse model and a human disease, as described by Smedley et al. 2013

Once we have the scores for each piece of evidence, we calculate an overall score for a data source (e.g. Genomics England PanelApp) followed by a score for a data type (e.g. Genetic associations). In this step, we take into account that although multiple occurrences of evidence can suggest a strong association, each additional piece of evidence should have a progressively smaller impact on the overall score. For this reason, we calculate the harmonic sum of the scores and adjust the contribution of each of them using a heuristic weight.

Throughout this process, the value of the score is always capped at 1, the highest association score.
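The aggregation step can be sketched as follows. The exact formula is not reproduced on this page, so the form used here is an assumption: scores are sorted in descending order and the i-th score is divided by i squared, which is how the harmonic sum is described in Open Targets publications. The function name and the `cap` parameter are illustrative.

```python
def harmonic_sum(scores: list[float], cap: float = 1.0) -> float:
    """Aggregate scores with a harmonic progression, capped at `cap`.

    Assumed form: s_1 + s_2 / 2**2 + s_3 / 3**2 + ..., with the scores
    sorted from strongest to weakest.
    """
    ordered = sorted(scores, reverse=True)
    total = sum(score / rank**2 for rank, score in enumerate(ordered, start=1))
    return min(total, cap)


if __name__ == "__main__":
    # The strongest score dominates; further evidence adds less and less.
    print(harmonic_sum([0.5]))
    print(harmonic_sum([0.5, 0.4]))
    print(harmonic_sum([1.0, 1.0, 1.0]))  # exceeds 1 before capping
```

Under this assumed form, a second score of 0.4 adds only 0.1 to the total, and any total above 1 is cut back to 1.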

To learn more about our approach to scoring, please see our latest publication, Open Targets Platform: supporting systematic drug–target identification and prioritisation.

{% hint style="info" %} Evidence from PROGENy, SLAPenrich and Sysbio is down-weighted by a factor of 2, whereas evidence from Expression Atlas, PhenoDigm and Europe PMC is down-weighted by a factor of 5.

Also, since our 18.12 release, we no longer apply a sigmoid scaling to the scores for target-disease associations, which was based on the number of hits per expression study (for RNA Expression) and on the number of targets per publication (for Text mining). {% endhint %}
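To illustrate how the down-weighting factors could enter the overall score, the sketch below divides each data source score by its factor and then aggregates the results. Several things here are assumptions rather than statements about the Platform's code: the point at which the weight is applied, the harmonic sum form (i-th strongest score divided by i squared), and the lowercase source identifiers used as dictionary keys.

```python
# Down-weighting factors from the note above; unlisted sources use 1.
DOWN_WEIGHT = {
    "progeny": 2,
    "slapenrich": 2,
    "sysbio": 2,
    "expression_atlas": 5,
    "phenodigm": 5,
    "europepmc": 5,
}


def harmonic_sum(scores: list[float], cap: float = 1.0) -> float:
    """Assumed harmonic sum: i-th strongest score divided by i**2, capped."""
    ordered = sorted(scores, reverse=True)
    total = sum(score / rank**2 for rank, score in enumerate(ordered, start=1))
    return min(total, cap)


def overall_score(data_source_scores: dict[str, float]) -> float:
    """Aggregate data source scores after applying down-weighting factors."""
    weighted = [
        score / DOWN_WEIGHT.get(source, 1)
        for source, score in data_source_scores.items()
    ]
    return harmonic_sum(weighted)


if __name__ == "__main__":
    print(overall_score({"chembl": 0.7, "europepmc": 1.0}))
```

In this example the text mining score of 1.0 is reduced to 0.2 before aggregation, so it ranks below the ChEMBL score and contributes only 0.05 to the total.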
