
Raven

Raven is a cloud-native search engine database.

Implemented Features

  • Indexing
    • Inverted Index
  • Search
    • TF-IDF
    • BM25
  • Natural Language Processing
    • Subword Tokenization
    • Stopword Removal
    • Language Detection
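The sketch below is a minimal in-memory Go example of how the indexing and ranking features above fit together. It is not Raven's code. The names `Index`, `Add`, `TFIDF`, `BM25`, and `Rank` are hypothetical. Tokenization is plain lowercase whitespace splitting, with no subword tokenization, stopword removal, or language detection. TF-IDF uses the simple `tf * ln(N/df)` form. BM25 uses the non-negative IDF `ln(1 + (N - df + 0.5)/(df + 0.5))` with the common defaults k1 = 1.2 and b = 0.75.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// Index is a hypothetical in-memory inverted index: term -> docID -> term frequency.
type Index struct {
	postings map[string]map[int]int
	docLen   map[int]int // number of tokens per document
	totalLen int         // total tokens across all documents
}

func NewIndex() *Index {
	return &Index{postings: map[string]map[int]int{}, docLen: map[int]int{}}
}

// Add lowercases and whitespace-splits text (an assumed, naive tokenizer)
// and records each term's frequency in the posting list for docID.
func (ix *Index) Add(docID int, text string) {
	tokens := strings.Fields(strings.ToLower(text))
	for _, tok := range tokens {
		if ix.postings[tok] == nil {
			ix.postings[tok] = map[int]int{}
		}
		ix.postings[tok][docID]++
	}
	ix.docLen[docID] += len(tokens)
	ix.totalLen += len(tokens)
}

// TFIDF scores each matching document as the sum over query terms of tf * ln(N / df).
func (ix *Index) TFIDF(query string) map[int]float64 {
	n := float64(len(ix.docLen))
	scores := map[int]float64{}
	for _, term := range strings.Fields(strings.ToLower(query)) {
		posting := ix.postings[term]
		if len(posting) == 0 {
			continue
		}
		idf := math.Log(n / float64(len(posting)))
		for doc, tf := range posting {
			scores[doc] += float64(tf) * idf
		}
	}
	return scores
}

// BM25 scores documents with Okapi BM25: term frequency saturates via k1,
// and b controls normalization by document length relative to the average.
func (ix *Index) BM25(query string, k1, b float64) map[int]float64 {
	n := float64(len(ix.docLen))
	avgdl := float64(ix.totalLen) / n
	scores := map[int]float64{}
	for _, term := range strings.Fields(strings.ToLower(query)) {
		posting := ix.postings[term]
		df := float64(len(posting))
		if df == 0 {
			continue
		}
		idf := math.Log(1 + (n-df+0.5)/(df+0.5))
		for doc, tf := range posting {
			f := float64(tf)
			dl := float64(ix.docLen[doc])
			scores[doc] += idf * f * (k1 + 1) / (f + k1*(1-b+b*dl/avgdl))
		}
	}
	return scores
}

// Rank returns document IDs by descending score, breaking ties by ascending ID.
func Rank(scores map[int]float64) []int {
	ids := make([]int, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if scores[ids[i]] != scores[ids[j]] {
			return scores[ids[i]] > scores[ids[j]]
		}
		return ids[i] < ids[j]
	})
	return ids
}

func main() {
	ix := NewIndex()
	ix.Add(1, "the quick brown fox")
	ix.Add(2, "the lazy dog")
	ix.Add(3, "quick quick search engine")
	fmt.Println(Rank(ix.TFIDF("quick fox")))          // [1 3]
	fmt.Println(Rank(ix.BM25("quick fox", 1.2, 0.75))) // [1 3]
}
```

BM25 differs from raw TF-IDF in two ways. It saturates term frequency through k1, and it normalizes for document length through b. As a result, a long document that repeats a term many times does not automatically dominate the ranking.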

Profiling and Tracing

Raven uses Go's standard net/http/pprof package for profiling and tracing.

To view the profiling and tracing data, open http://localhost:6060/debug/pprof/ in your browser while Raven is running.

ToDos

  • Use a Bloom filter to screen out unknown (UNK) tokens
  • Enable the search engine to save to and load from the file system
    • Build the index by reading and parsing raw text files
    • Save the index and Bloom filters to files and load them back
    • Add support for incremental indexing with flushing
  • Enhance the full-text search (FTS) features
    • Build fuzzy full-text search using a suffix tree (or B-tree)
    • Spell correction using Levenshtein distance
    • Pseudo-relevance feedback
  • Add support for vector indexes
    • HNSW
    • Flat
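The planned Bloom filter for UNK tokens could look roughly like the following sketch. Everything in it is an assumption: the `BloomFilter` type, the bit count m, the hash count k, and the use of a 64-bit FNV-1a hash split into two halves for double hashing of the k bit positions. A Bloom filter may return false positives but never false negatives. A token it reports as absent is therefore certainly outside the vocabulary and can be mapped to UNK without a full dictionary lookup.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// BloomFilter is a hypothetical fixed-size Bloom filter over strings.
type BloomFilter struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of hash functions
}

func NewBloomFilter(m, k uint64) *BloomFilter {
	return &BloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes splits one 64-bit FNV-1a hash into two base hashes; bit position i
// is (h1 + i*h2) mod m (double hashing).
func (bf *BloomFilter) hashes(s string) (uint64, uint64) {
	h := fnv.New64a()
	h.Write([]byte(s))
	sum := h.Sum64()
	h1 := sum & 0xffffffff
	h2 := (sum >> 32) | 1 // force odd so the step is never zero
	return h1, h2
}

// Add sets the k bits for s.
func (bf *BloomFilter) Add(s string) {
	h1, h2 := bf.hashes(s)
	for i := uint64(0); i < bf.k; i++ {
		idx := (h1 + i*h2) % bf.m
		bf.bits[idx/64] |= 1 << (idx % 64)
	}
}

// MayContain returns false only if s was definitely never added.
func (bf *BloomFilter) MayContain(s string) bool {
	h1, h2 := bf.hashes(s)
	for i := uint64(0); i < bf.k; i++ {
		idx := (h1 + i*h2) % bf.m
		if bf.bits[idx/64]&(1<<(idx%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	vocab := NewBloomFilter(1024, 4)
	for _, tok := range []string{"search", "engine", "index"} {
		vocab.Add(tok)
	}
	for _, tok := range []string{"search", "raven"} {
		if vocab.MayContain(tok) {
			fmt.Println(tok, "is probably known")
		} else {
			fmt.Println(tok, "is UNK")
		}
	}
}
```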
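Levenshtein-based spell correction could start from the classic dynamic-programming edit distance plus a nearest-word lookup, as in this sketch. The `Levenshtein`, `Correct`, and `min3` functions are hypothetical. Scanning the whole vocabulary is only practical for small vocabularies. A larger one would need candidate pruning, for example with a BK-tree or a length filter.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// Levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions that turn a into b. Only two rows of the
// dynamic-programming table are kept, so memory is O(len(b)).
func Levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// Correct returns the vocabulary word closest to word, or word itself when
// no candidate is within maxDist edits. Ties keep the earliest candidate.
func Correct(word string, vocab []string, maxDist int) string {
	best, bestDist := word, maxDist+1
	for _, v := range vocab {
		if d := Levenshtein(word, v); d < bestDist {
			best, bestDist = v, d
		}
	}
	return best
}

func main() {
	fmt.Println(Levenshtein("kitten", "sitting"))                             // 3
	fmt.Println(Correct("serch", []string{"search", "engine", "index"}, 2)) // search
}
```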

References