
A simple CLI-based Boolean Retrieval Implementation that processes boolean and phrase queries using C++ and Python


davinjason09/Simple-Boolean-Retrieval


Simple Boolean Retrieval Model

A simple CLI-based Boolean Retrieval Model that processes boolean, positional and phrase queries using C++ and Python. This model is capable of handling queries with the following operators:

  • AND / &
  • OR / | / /
  • NOT / ~
  • ( )

This code supports the following query types:

w1
w1 AND w2
w1 OR w2
NOT w1
w1 AND/OR NOT w2
w1 AND/OR (w2 AND/OR w3)

and any combination of the above query types. This implementation currently doesn't support phrase queries.
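As a rough illustration of how these query types map onto an inverted index, the sketch below treats each term's postings as a set of document IDs and uses set operations for AND, OR, and NOT. The `build_index` and `postings` helpers and the four sample documents are hypothetical and are not taken from this repository's code.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical toy collection, used only for illustration.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
index = build_index(docs)
all_docs = set(docs)  # universe of document IDs, needed for NOT

def postings(term):
    """Return the posting set for a term (empty set if the term is unseen)."""
    return index.get(term.lower(), set())

both = postings("sales") & postings("july")              # w1 AND w2
either = postings("new") | postings("rise")              # w1 OR w2
negated = all_docs - postings("july")                    # NOT w1
mixed = postings("home") & (all_docs - postings("new"))  # w1 AND NOT w2

print(both, either, negated, mixed)
```

NOT is computed as a complement against the full set of document IDs, so the universe has to be tracked alongside the index.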

Operators can be combined freely within a single query. The Python implementation currently supports only the word forms AND, OR, and NOT, not their symbol counterparts.

Operators are evaluated in the following order of precedence, highest first:

() > NOT > AND > OR
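One common way to honor this precedence is to convert the query to postfix form with the shunting-yard algorithm and then evaluate it with a stack. The sketch below is an assumption about how this could be done, not a description of the repository's parser. The names `tokenize`, `to_postfix`, and `evaluate` are hypothetical, and only the uppercase word operators are recognized.

```python
import re

# Higher number binds tighter: NOT > AND > OR (parentheses handled separately).
PRECEDENCE = {"NOT": 3, "AND": 2, "OR": 1}

def tokenize(query):
    """Split a query into parentheses and whitespace-delimited words."""
    return re.findall(r"\(|\)|[^\s()]+", query)

def to_postfix(tokens):
    """Shunting-yard conversion; operators must be uppercase AND/OR/NOT."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # NOT is a prefix operator, so it never pops pending operators.
            while (tok != "NOT" and stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok.lower())  # search terms are stored lowercased
    while stack:
        top = stack.pop()
        if top == "(":
            raise ValueError("unbalanced parentheses")
        output.append(top)
    return output

def evaluate(query, index, all_docs):
    """Evaluate a well-formed query against a term -> set-of-doc-IDs index."""
    stack = []
    for tok in to_postfix(tokenize(query)):
        if tok == "NOT":
            stack.append(all_docs - stack.pop())
        elif tok in ("AND", "OR"):
            right, left = stack.pop(), stack.pop()
            stack.append(left & right if tok == "AND" else left | right)
        else:
            stack.append(index.get(tok, set()))
    return stack.pop()

# Hypothetical postings for three terms over four documents.
index = {"w1": {1, 2}, "w2": {2, 3}, "w3": {3, 4}}
all_docs = {1, 2, 3, 4}

print(evaluate("w1 OR w2 AND w3", index, all_docs))        # AND binds before OR
print(evaluate("(w1 OR w2) AND w3", index, all_docs))      # parentheses override
print(evaluate("w1 AND NOT (w2 OR w3)", index, all_docs))
```

Because operands are lowercased and operators stay uppercase, the evaluator can tell them apart in the postfix list without extra tagging. Malformed queries, such as a dangling operator, are not validated here and would raise an `IndexError`.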

Usage

C++ Implementation

You can run the C++ implementation using the following commands:

g++ -O3 -mtune=native -march=native BooleanRetrieval.cpp InvertedIndex.cpp Main.cpp -o Main -pthread
./Main  # Linux
./Main.exe  # Windows

CMake build support is planned but not yet available.

Python Implementation

You can use the ipynb file to run the Python implementation.

Speed Comparison

Inverted Index Construction:

C++        Python
2.015 s    20.971 s

Query Retrieval:

Query retrieval time depends on the query and on the number of documents in the collection. The C++ implementation is much more stable than the Python implementation and averages around 65 ms per query.
