Add fulltext Lucene analyzers for faster search in text data #15

jarasch · 2020-07-20T14:17:43Z

We are preparing Cypher queries for users who want to query data either via Cypher (Neo4j-Browser) or Neo4j-Bloom.

Therefore, we need to build text analyzers on the text properties of the following labels/properties:

Fragment.text
Paper.title
GeneSymbol.sid
Gene.name
Protein.name
PatentClaim.text
PatentTitle.text
PatentAbstract.text
Entity.name

jarasch · 2020-07-21T04:06:29Z

// Fulltext index on the title and text properties of paper and patent nodes
CALL db.index.fulltext.createNodeIndex("textOfPapersAndPatents",["Fragment", "Abstract", "Paper", "Patent", "PatentTitle", "PatentClaim","PatentAbstract"],["title", "text"])

jarasch · 2020-07-22T15:21:56Z

// Fulltext index on GeneSymbol where the gene name is stored in property sid
CALL db.index.fulltext.createNodeIndex("GeneSymbolFullTextIndex",["GeneSymbol"],["sid"])

jarasch · 2020-07-22T15:25:38Z

// Fulltext index on author names
CALL db.index.fulltext.createNodeIndex("AuthorFullTextIndex",["Author"],["first", "middle","last"])

jarasch · 2020-07-22T15:26:42Z

// Fulltext index on entity names like company names
CALL db.index.fulltext.createNodeIndex("EntityFullTextIndex",["Entity"],["name"])
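
Once indexes like these exist, they can be queried through the `db.index.fulltext.queryNodes` procedure, which yields each matching node together with a relevance score. The following is only a minimal sketch of how a client might assemble such a query. The helper name `build_fulltext_query`, the `limit` parameter, and the sample search term are hypothetical and not part of this issue, and the returned pair would still have to be passed to whatever Neo4j driver session is in use.

```python
from typing import Any, Dict, Tuple


def build_fulltext_query(
    index_name: str, search: str, limit: int = 10
) -> Tuple[str, Dict[str, Any]]:
    """Return a parameterized Cypher query against a fulltext index.

    Hypothetical helper: it only builds the query text and its parameters,
    it does not open a database connection.
    """
    cypher = (
        "CALL db.index.fulltext.queryNodes($index, $search) "
        "YIELD node, score "
        "RETURN node, score ORDER BY score DESC LIMIT $limit"
    )
    return cypher, {"index": index_name, "search": search, "limit": limit}


# Example: look up entities by name (the search term is made up).
query, params = build_fulltext_query("EntityFullTextIndex", "pharma")
print(query)
print(params)
```

Passing the index name and search string as parameters keeps user input out of the query text itself.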

motey · 2020-07-22T19:35:33Z

A dedicated loader to create all needed text indexes would make sense. This loader can be mounted into the motherlode pipeline.

Can one create text indexes on nodes that do not exist yet?

If yes, we can collect all text indexes (including those from other loaders that already exist) in one place and create them at the beginning of the pipeline.
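
As far as I know, Neo4j fulltext indexes are defined against label and property names rather than against existing nodes, so creating them up front should work, with the index being populated as data arrives. This is worth confirming on DEV first. Under that assumption, a minimal sketch of collecting all definitions in one place could look like this. The names `FULLTEXT_INDEXES` and `render_create_statement` are hypothetical, the index definitions are copied from the comments above, and the sketch only renders the statements without running them.

```python
import json
from typing import Dict, List, Tuple

# Index name -> (labels, properties), copied from the comments in this issue.
FULLTEXT_INDEXES: Dict[str, Tuple[List[str], List[str]]] = {
    "textOfPapersAndPatents": (
        ["Fragment", "Abstract", "Paper", "Patent",
         "PatentTitle", "PatentClaim", "PatentAbstract"],
        ["title", "text"],
    ),
    "GeneSymbolFullTextIndex": (["GeneSymbol"], ["sid"]),
    "AuthorFullTextIndex": (["Author"], ["first", "middle", "last"]),
    "EntityFullTextIndex": (["Entity"], ["name"]),
}


def render_create_statement(
    name: str, labels: List[str], properties: List[str]
) -> str:
    """Render one createNodeIndex call as a Cypher string.

    json.dumps produces double-quoted strings and bracketed lists, which
    read as Cypher literals for plain identifiers like the ones used here.
    """
    return "CALL db.index.fulltext.createNodeIndex({}, {}, {})".format(
        json.dumps(name), json.dumps(labels), json.dumps(properties)
    )


# Render every statement; a real loader would send these to the database.
statements = [
    render_create_statement(name, labels, props)
    for name, (labels, props) in FULLTEXT_INDEXES.items()
]
for statement in statements:
    print(statement)
```

Keeping the definitions in a single mapping means other loaders only need to add an entry there instead of issuing their own index calls.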

motey · 2020-08-05T10:18:03Z

https://github.com/covidgraph/graph-processing_fulltext-indexes

Will run this against DEV today

motey · 2020-08-05T11:35:37Z

Indexes are on DEV and PRD

jarasch assigned sarmbruster and mpreusse Jul 20, 2020

jarasch unassigned sarmbruster Jul 21, 2020

motey closed this as completed Aug 5, 2020

Jiros reopened this Nov 24, 2020

Jiros transferred this issue from covidgraph/documentation Dec 7, 2020

Jiros added the labels Type: Data Source, Status: Suggested, and Type: Data Analysis Dec 7, 2020
