/target/members/count counts more data than /target/members/pages shows #394

Open
jakhag opened this issue Jul 18, 2017 · 6 comments

@jakhag jakhag added this to the 2.2 milestone Jul 18, 2017
@ianwdunlop
Member

Looking at the SPARQL for the pages query, it seems that the members have no dcterms:title. However, they do have an rdfs:label, so maybe we should use that instead. But is that correct? When I changed the query to use rdfs:label I got the following (abridged to save space). Note that not all of the items have info attached. Is this to be expected, or is there something else going on? It is possible that some expected data is also missing.

<items>
<item href="http://purl.uniprot.org/uniprot/E0TXE1"/>
<item href="http://purl.uniprot.org/uniprot/E1UV19"/>
<item href="http://purl.uniprot.org/uniprot/E3E2E2"/>
<item href="http://purl.uniprot.org/uniprot/E3UUE6"/>
<item href="http://purl.uniprot.org/uniprot/E7FHP1"/>
<item href="http://purl.uniprot.org/uniprot/O14975">
  <prefLabel>Very long-chain acyl-CoA synthetase</prefLabel>
  <exactMatch href="http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL4326">
    <prefLabel>Fatty acid transport protein 2</prefLabel>
    <type href="http://rdf.ebi.ac.uk/terms/chembl#SingleProtein"/>
    <inDataset href="http://www.ebi.ac.uk/chembl"/>
    <target_organism>Homo sapiens</target_organism>
  </exactMatch>
  <inDataset href="http://purl.uniprot.org"/>
  <target_organism_uri href="http://purl.uniprot.org/taxonomy/9606"/>
</item>
<item href="http://purl.uniprot.org/uniprot/O22898"/>
</items>
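
To check how many items in a response like this come back without any attached info, a small script can help. This is only a sketch: it assumes the response is well-formed XML with the element names shown above, and the sample data is a shortened, hand-made copy of that response rather than real API output. The `split_by_label` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Shortened, hand-made copy of the response above (assumed well-formed).
SAMPLE = """<items>
<item href="http://purl.uniprot.org/uniprot/E0TXE1"/>
<item href="http://purl.uniprot.org/uniprot/O14975">
  <prefLabel>Very long-chain acyl-CoA synthetase</prefLabel>
  <inDataset href="http://purl.uniprot.org"/>
</item>
<item href="http://purl.uniprot.org/uniprot/O22898"/>
</items>"""


def split_by_label(xml_text):
    """Return (labelled, unlabelled) lists of item hrefs."""
    root = ET.fromstring(xml_text)
    labelled, unlabelled = [], []
    for item in root.findall("item"):
        # Only a direct <prefLabel> child counts; a label nested inside
        # <exactMatch> belongs to the mapped target, not to the item itself.
        bucket = labelled if item.find("prefLabel") is not None else unlabelled
        bucket.append(item.get("href"))
    return labelled, unlabelled


labelled, unlabelled = split_by_label(SAMPLE)
print(len(labelled), "with prefLabel,", len(unlabelled), "without")
```

Running this over each page would give a quick picture of how many members are returned as bare hrefs.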

@ianwdunlop
Member

Here is the SPARQL query below. I changed it to look for ?item dcterms:title|rdfs:label ?chembl_name, i.e. either dcterms:title or rdfs:label. BTW, this API call is one of those two-part ones where it first finds all the items and then gets the properties in a different call. Not really sure if it needs those OPTIONAL blocks or not.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://www.semantic-systems-biology.org/ontology/rdf/OBO#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX chembl: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX goa: <http://www.semantic-systems-biology.org/ontology/rdf/GOA#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?item WHERE {
  VALUES ?g { <http://purl.uniprot.org/enzyme/inference> <http://www.ebi.ac.uk/chembl/target/inference> <http://www.geneontology.org/inference> }
  VALUES ?node_uri { <http://purl.uniprot.org/enzyme/6.2.-.-> }
  GRAPH ?g {
    ?child_node rdfs:subClassOf ?node_uri .
    FILTER ( isURI(?child_node) )
  }
  { ?item obo:C ?child_node .
    ?item uniprot:reviewed true }
  UNION { ?item obo:F ?child_node .
    ?item uniprot:reviewed true }
  UNION { ?item obo:P ?child_node .
    ?item uniprot:reviewed true }
  UNION { ?item uniprot:enzyme|uniprot:domain/uniprot:enzyme|chembl:hasProteinClassification ?child_node }
  VALUES ?g2 { <http://purl.uniprot.org> <http://www.ebi.ac.uk/chembl> <http://www.openphacts.org/goa> }
  GRAPH ?g2 {
    ?item [] []
  }
  { ?item dcterms:title|rdfs:label ?chembl_name
    FILTER ( ?chembl_name != '' ) }
  UNION { ?item goa:description ?uniprot_name
    FILTER ( ?uniprot_name != '' ) }
  OPTIONAL {
    { ?mapping skos:relatedMatch/skos:exactMatch ?item }
    UNION { ?item skos:relatedMatch/skos:exactMatch ?mapping }
    MINUS { ?mapping a chembl:ProteinComplexGroup }
    { ?mapping goa:description ?mapping_name }
    UNION { ?mapping dcterms:title ?mapping_name }
    FILTER ( ?mapping_name != '' )
    { ?mapping uniprot:organism ?mapping_org_uri }
    UNION { ?mapping chembl:organismName ?mapping_org
      GRAPH <http://www.ebi.ac.uk/chembl> {
        ?mapping a ?mapping_type
        FILTER ( ?mapping_type != chembl:UniprotRef )
      }
    }
    BIND(IF(BOUND(?mapping_org), <http://www.ebi.ac.uk/chembl>, <http://purl.uniprot.org>) AS ?mapping_dataset)
  }
  OPTIONAL { ?item uniprot:organism ?uniprot_organism
    BIND (?item AS ?uniprot_target) }
  OPTIONAL {
    GRAPH <http://www.ebi.ac.uk/chembl> {
      ?item a ?target_type
    }
  }
  OPTIONAL { ?item chembl:organismName ?chembl_organism
    BIND (?item AS ?chembl_target) }
}
ORDER BY ?item LIMIT 500 OFFSET 500
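
As a rough illustration of why the dcterms:title|rdfs:label alternative matters, the toy model below mimics the required name pattern with plain Python. The triples and the `items_with_name` helper are made up for illustration; this is not how the SPARQL engine evaluates the query, and it ignores the goa:description branch.

```python
DCTERMS_TITLE = "dcterms:title"
RDFS_LABEL = "rdfs:label"

# Hypothetical triples: one item has only a title, one has only a label,
# and one has neither.
TRIPLES = [
    ("item:A", DCTERMS_TITLE, "Target A"),
    ("item:B", RDFS_LABEL, "Target B"),
    ("item:C", "uniprot:reviewed", "true"),
]


def items_with_name(triples, predicates):
    """Items that have a non-empty value for any of the given predicates."""
    return sorted({s for s, p, o in triples if p in predicates and o != ""})


# Matching on dcterms:title alone misses the item that only has a label.
print(items_with_name(TRIPLES, {DCTERMS_TITLE}))
# Accepting either predicate picks it up.
print(items_with_name(TRIPLES, {DCTERMS_TITLE, RDFS_LABEL}))
```

In this toy data, item:C has neither predicate, so a required name pattern still excludes it in both cases.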

@danidi

danidi commented Jul 18, 2017

I would expect a prefLabel from Uniprot for each of the items, but not necessarily from ChEMBL.

@ianwdunlop
Member

The query takes about 0.6 seconds with the OPTIONAL blocks removed, compared to 1.8 seconds with them in place.

@danidi

danidi commented Jul 18, 2017

The organism and organism name are filters for the query. Do the OPTIONAL blocks make the query slower even if the user sets no filter parameter?

@ianwdunlop
Member

OK, thanks @danidi. So the OPTIONAL blocks are there for the filters. They do make the query slower when no filters are set, but there is no real way to avoid that without a bit of a code rewrite. It's not massively slower, though.
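
One shape such a rewrite could take is to append an organism OPTIONAL block only when the matching filter is set. The sketch below is a guess at the idea, not the actual API code: the parameter names (`target_organism`, `target_organism_name`), the `build_optional_blocks` helper, and the mapping of each filter to a block are all assumptions.

```python
# OPTIONAL blocks copied from the query above.
ORGANISM_URI_BLOCK = (
    "OPTIONAL { ?item uniprot:organism ?uniprot_organism\n"
    " BIND (?item AS ?uniprot_target) }"
)
ORGANISM_NAME_BLOCK = (
    "OPTIONAL { ?item chembl:organismName ?chembl_organism\n"
    " BIND (?item AS ?chembl_target) }"
)


def build_optional_blocks(target_organism=None, target_organism_name=None):
    """Return only the OPTIONAL blocks needed by the filters that are set.

    Which filter needs which block is an assumption made for this sketch.
    """
    blocks = []
    if target_organism is not None:
        blocks.append(ORGANISM_URI_BLOCK)
    if target_organism_name is not None:
        blocks.append(ORGANISM_NAME_BLOCK)
    return "\n".join(blocks)


# No filters set: nothing extra is added to the query.
print(repr(build_optional_blocks()))
# Only the organism-name filter is set: one block is added.
print(build_optional_blocks(target_organism_name="Homo sapiens"))
```

With no filters set the helper returns an empty string, so the unfiltered query would skip the extra joins entirely.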
