Add fulltext Lucene analyzers for faster search in text data #15

Open
jarasch opened this issue Jul 20, 2020 · 7 comments
Labels: Status: Suggested, Type: Data Analysis, Type: Data Source

Comments

@jarasch

jarasch commented Jul 20, 2020

We are preparing Cypher queries for users who want to query the data via either Cypher (Neo4j Browser) or Neo4j Bloom.

Therefore we need to build text analyzers (fulltext indexes) on the text properties of the following label/property pairs:

  • Fragment.text
  • Paper.title
  • GeneSymbol.sid
  • Gene.name
  • Protein.name
  • PatentClaim.text
  • PatentTitle.text
  • PatentAbstract.text
  • Entity.name
@jarasch
Author

jarasch commented Jul 21, 2020

CALL db.index.fulltext.createNodeIndex("textOfPapersAndPatents",["Fragment", "Abstract", "Paper", "Patent", "PatentTitle", "PatentClaim","PatentAbstract"],["title", "text"])
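Once an index like this exists, it can be searched with Neo4j's `db.index.fulltext.queryNodes` procedure, which takes a Lucene query string. A minimal Python sketch of how user input might be prepared for that call, assuming search terms should be matched literally: the helper names `escape_lucene` and `build_search` are hypothetical, and nothing here is taken from the CovidGraph code base.

```python
# Characters with special meaning in the Lucene query syntax.
LUCENE_SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\/&|')

# Parameterized Cypher; the parameter names are illustrative.
SEARCH_QUERY = (
    "CALL db.index.fulltext.queryNodes($index, $query) "
    "YIELD node, score RETURN node, score LIMIT $limit"
)


def escape_lucene(term: str) -> str:
    """Backslash-escape Lucene special characters so the term is matched literally."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in term)


def build_search(index: str, term: str, limit: int = 10):
    """Return the Cypher text and its parameters (hypothetical helper)."""
    params = {"index": index, "query": escape_lucene(term), "limit": limit}
    return SEARCH_QUERY, params


cypher, params = build_search("textOfPapersAndPatents", "SARS-CoV-2")
print(cypher)
print(params)
```

Without escaping, the hyphens in a term such as `SARS-CoV-2` could be read as Lucene operators, depending on the analyzer in use. The returned pair can be passed to whichever Neo4j driver the project uses.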

@jarasch
Author

jarasch commented Jul 22, 2020

// Fulltext index on GeneSymbol where the gene name is stored in property sid
CALL db.index.fulltext.createNodeIndex("GeneSymbolFullTextIndex",["GeneSymbol"],["sid"])

@jarasch
Author

jarasch commented Jul 22, 2020

// Fulltext index on author names
CALL db.index.fulltext.createNodeIndex("AuthorFullTextIndex",["Author"],["first", "middle","last"])

@jarasch
Author

jarasch commented Jul 22, 2020

// Fulltext index on entity names like company names
CALL db.index.fulltext.createNodeIndex("EntityFullTextIndex",["Entity"],["name"])

@motey
Member

motey commented Jul 22, 2020

A dedicated loader that creates all the needed text indexes would make sense. This loader could be mounted into the motherlode pipeline.

Can one create text indexes on nodes that do not exist yet?

If yes, we can collect all text indexes (including those that already exist in other loaders) in one place and create them at the beginning of the pipeline.
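The idea of collecting all index definitions in one place could look roughly like the following Python sketch. It only renders the `CALL` statements from this thread; the `FulltextIndex` class and `INDEXES` list are hypothetical names, and the sketch assumes the indexes may be created before the data is loaded, which the question above leaves open.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FulltextIndex:
    """One fulltext node index definition (hypothetical helper type)."""
    name: str
    labels: Tuple[str, ...]
    properties: Tuple[str, ...]

    def to_cypher(self) -> str:
        # json.dumps yields double-quoted strings and bracketed lists,
        # which are valid Cypher literals for plain ASCII names.
        return "CALL db.index.fulltext.createNodeIndex({}, {}, {})".format(
            json.dumps(self.name),
            json.dumps(list(self.labels)),
            json.dumps(list(self.properties)),
        )


# Index definitions copied from the comments in this issue.
INDEXES = [
    FulltextIndex(
        "textOfPapersAndPatents",
        ("Fragment", "Abstract", "Paper", "Patent",
         "PatentTitle", "PatentClaim", "PatentAbstract"),
        ("title", "text"),
    ),
    FulltextIndex("GeneSymbolFullTextIndex", ("GeneSymbol",), ("sid",)),
    FulltextIndex("AuthorFullTextIndex", ("Author",), ("first", "middle", "last")),
    FulltextIndex("EntityFullTextIndex", ("Entity",), ("name",)),
]

for index in INDEXES:
    print(index.to_cypher())
```

Each printed statement could then be executed by the loader at the start of the pipeline, so that new indexes only need to be added to the one list.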

@motey
Member

motey commented Aug 5, 2020

https://github.com/covidgraph/graph-processing_fulltext-indexes

Will run this against DEV today

@motey
Member

motey commented Aug 5, 2020

Indexes are on DEV and PRD

@motey motey closed this as completed Aug 5, 2020
@Jiros Jiros reopened this Nov 24, 2020
@Jiros Jiros transferred this issue from covidgraph/documentation Dec 7, 2020
@Jiros Jiros added Type: Data Source To identify an issue as a data source Status: Suggested This issue is a suggestion for doing something new or different in CovidGraph Type: Data Analysis To identify an issue as data analysis labels Dec 7, 2020