Moves distance metrics into their own namespace, liblevenshtein::distance, to promote their independence from the core library; removes workaround for absl linking for protobuf; updates the README.md
dylon committed Feb 14, 2024
1 parent 6b0118e commit a94d962
Showing 20 changed files with 168 additions and 118 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ set(CMAKE_VERBOSE_MAKEFILE ON)

include(GNUInstallDirs)

option(BUILD_BASELINE_METRICS "Builds baseline distance metrics for validation" ON)
option(BUILD_BASELINE_METRICS "Builds baseline distance metrics for validation" OFF)
option(BUILD_TESTS "Build liblevenshtein testing suite" OFF)
option(ENABLE_TEST_COVERAGE "Generate test coverage report" OFF)
option(ENABLE_LINTING "Enables the source code linter" OFF)
Expand Down
116 changes: 85 additions & 31 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,21 +10,22 @@
A library for generating Finite State Transducers based on Levenshtein Automata.

NOTE: This library is currently in rc phase. I'll have it production ready as
soon as possible. Currently, the top-level components have >90% test coverage
and the library is usable as described below.
soon as possible. Currently, there is >90% test coverage over the sources and
the library is usable as described below.

To make my life easier, this library takes advantage of C++20 features. If you
need compatibility with an older standard, please either submit a pull request
or create an issue stating the standard you need compatibility with and I'll get
around to adding its support when I get time.
Due to limited resources on my part, this library requires C++20 features (or
whichever is the latest standard). If you need compatibility with an older
standard, please either submit a pull request (preferably) or create an issue
stating the standard you need compatibility with and I will comply if I can.

For a demonstration, please reference the [example app](example/).

## Initialization

To ease dependency management during development,
[Anaconda](https://www.anaconda.com/) is used. If you do not have a working
installation, I recommend the
[Anaconda](https://www.anaconda.com/) is used but should not be required if you
have the necessary libraries installed. If you do not have a working
[Anaconda](https://www.anaconda.com/) installation, I recommend the
[Mamba](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html)
variant:

Expand All @@ -40,6 +41,8 @@ wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge
bash Miniforge3-MacOSX-arm64.sh -b
```

TODO: Add instructions for Windows.

Initialize the `base` environment:

```bash
Expand Down Expand Up @@ -74,7 +77,7 @@ conda activate ll-cpp
```shell
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=Debug -D CMAKE_INSTALL_PREFIX=/usr/local ..
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX ..
make
make install
```
Expand Down Expand Up @@ -135,36 +138,36 @@ ${CMAKE_INSTALL_PREFIX}
11 directories, 37 files
```

### Disabling tests
### Enabling tests

If you want to build the library without tests, use the same instructions but
add the CMake option `BUILD_TESTS=OFF`, as described below:
If you want to build the library with tests, use the same instructions but
add the CMake option `BUILD_TESTS=ON`, as described below:

```shell
# ...
cmake -D CMAKE_BUILD_TYPE=Debug \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D BUILD_TESTS=OFF \
cmake -DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
-DBUILD_TESTS=ON \
..
# ...
```

### Disabling baseline metrics
### Enabling baseline metrics

If you want to disable the baseline metrics used for validation, you need to
disable both tests and the metrics. If you disable the metrics but enable tests
then they will be built anyway because they are required for the tests.
If you want to enable the baseline metrics for validation, you must pass
`-DBUILD_BASELINE_METRICS=ON` to CMake:

```shell
# ...
cmake -D CMAKE_BUILD_TYPE=Debug \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D BUILD_BASELINE_METRICS=OFF \
-D BUILD_TESTS=OFF \
cmake -DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
-DBUILD_BASELINE_METRICS=ON \
..
# ...
```

The baseline metrics are intended for validation of the search results but might
be useful if you need to compute edit distances among individual pairs of terms.

NOTE: The baseline metrics are required for the tests and will be implicitly
enabled for them if the baseline metrics are not explicitly enabled.

## Usage

### Algorithms
Expand Down Expand Up @@ -219,7 +222,47 @@ operation is an edit operation that errs in a penalty of 1 unit.

### Example

```cmake
# file: CMakeLists.txt
cmake_minimum_required(VERSION 3.20 FATAL_ERROR)
project(liblevenshtein-demo
VERSION 1.0.0
DESCRIPTION "Demonstrates how to use liblevenshtein-cpp."
HOMEPAGE_URL "https://github.com/universal-automata/liblevenshtein-cpp"
LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
SET(CMAKE_CXX_FLAGS_DEBUG "-g -O0")
SET(CMAKE_C_FLAGS_DEBUG "-g -O0")
set(CMAKE_COMPILE_WARNING_AS_ERROR ON)
set(CMAKE_VERBOSE_MAKEFILE ON)
include(GNUInstallDirs)
find_package(Protobuf REQUIRED)
find_package(liblevenshtein REQUIRED)
add_executable(${PROJECT_NAME}
"command_line.cpp"
"main.cpp")
target_link_libraries(${PROJECT_NAME}
PRIVATE
protobuf::libprotobuf
levenshtein)
```

```c++
// file: main.cpp

#include <algorithm>
#include <cstddef>
#include <string>
Expand Down Expand Up @@ -289,7 +332,7 @@ int main(int argc, char *argv[]) {
*/

// save the dictionary for reuse
serialize_protobuf(dawg, serialization_path);
ll::serialize_protobuf(dawg, serialization_path);

delete dawg;

Expand All @@ -308,6 +351,17 @@ int main(int argc, char *argv[]) {
```

### Dependencies
1. [Google Test](https://github.com/google/googletest)
2. [RapidCheck](https://github.com/emil-e/rapidcheck)
3. [yaml-cpp](https://github.com/jbeder/yaml-cpp)
1. [CMake](https://cmake.org/)
2. [Make](https://www.gnu.org/software/make/)
3. C++ Compiler
- Linux
- [g++](https://gcc.gnu.org/)
- [clang++](https://clang.llvm.org/)
- MacOS
- [clang++](https://clang.llvm.org/)
- Windows
- [vc++](https://visualstudio.microsoft.com/)
4. [Protocol Buffers](https://protobuf.dev/)
5. [Google Test](https://github.com/google/googletest)
6. [RapidCheck](https://github.com/emil-e/rapidcheck)
7. [yaml-cpp](https://github.com/jbeder/yaml-cpp)
2 changes: 0 additions & 2 deletions example/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,6 @@ set(CMAKE_VERBOSE_MAKEFILE ON)

include(GNUInstallDirs)

find_package(absl REQUIRED) # workaround for protobuf linking bug
find_package(Protobuf REQUIRED)
find_package(liblevenshtein REQUIRED)

Expand All @@ -29,6 +28,5 @@ add_executable(${PROJECT_NAME}

target_link_libraries(${PROJECT_NAME}
PRIVATE
absl::log_internal_check_op # workaround for protobuf linking bug
protobuf::libprotobuf
levenshtein)
10 changes: 2 additions & 8 deletions proto/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -12,20 +12,14 @@ target_sources(levenshtein

find_package(Protobuf REQUIRED)

find_package(absl REQUIRED) # workaround for protobuf linking bug

target_include_directories(levenshtein PUBLIC
"${Protobuf_INCLUDE_DIRS}"
$<BUILD_INTERFACE:${CMAKE_BINARY_DIR}/generated>
$<INSTALL_INTERFACE:include>
)
$<INSTALL_INTERFACE:include>)

target_link_libraries(levenshtein
PUBLIC
protobuf::libprotobuf
PRIVATE
absl::log_internal_check_op # workaround for protobuf linking bug
)
protobuf::libprotobuf)

protobuf_generate(
TARGET levenshtein
Expand Down
2 changes: 1 addition & 1 deletion src/liblevenshtein/distance/distance.h
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

#include <string>

namespace liblevenshtein {
namespace liblevenshtein::distance {

class Distance {
public:
Expand Down
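For downstream code, the namespace move means that names previously reached as `liblevenshtein::X` now live under `liblevenshtein::distance::X`. A minimal sketch of how call sites might adapt, assuming a C++17 or later compiler; the `length_gap` function and the `lld` alias are hypothetical stand-ins for illustration and are not part of the library:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for a library header: only the namespace layout matters here.
namespace liblevenshtein::distance { // C++17 nested namespace definition

// Hypothetical helper used only by this sketch (NOT a library function):
// the absolute difference between the lengths of two terms.
inline auto length_gap(const std::string &v, const std::string &w)
    -> std::size_t {
  return v.size() > w.size() ? v.size() - w.size() : w.size() - v.size();
}

} // namespace liblevenshtein::distance

// Option 1: a namespace alias keeps qualified call sites short.
namespace lld = liblevenshtein::distance;

// Option 2: a using-declaration pulls a single name into the local scope.
inline void demo_namespace_migration() {
  using liblevenshtein::distance::length_gap;
  assert(length_gap("kitten", "sit") == 3); // via the using-declaration
  assert(lld::length_gap("a", "abc") == 2); // via the namespace alias
}
```

Either option limits the change at each call site to a single line, which may help when migrating code written against the old flat namespace.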
5 changes: 2 additions & 3 deletions src/liblevenshtein/distance/memoized_distance.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,7 @@

using namespace std::literals;


namespace liblevenshtein {
namespace liblevenshtein::distance {

auto MemoizedDistance::operator()(const std::string &v, const std::string &w)
-> std::size_t {
Expand Down Expand Up @@ -36,4 +35,4 @@ auto MemoizedDistance::f(const std::string &u, std::size_t const t) -> std::stri
return "";
}

} // namespace liblevenshtein
} // namespace liblevenshtein::distance
28 changes: 14 additions & 14 deletions src/liblevenshtein/distance/memoized_distance.h
Original file line number Diff line number Diff line change
Expand Up @@ -8,23 +8,23 @@
#include "liblevenshtein/distance/distance.h"
#include "liblevenshtein/distance/symmetric_pair.h"

namespace liblevenshtein {
namespace liblevenshtein::distance {

class MemoizedDistance : public Distance {
public:
auto operator()(const std::string &v, const std::string &w) -> std::size_t override;
class MemoizedDistance : public Distance {
public:
auto operator()(const std::string &v, const std::string &w) -> std::size_t override;

protected:
auto get(const SymmetricPair &key, std::size_t &distance) -> bool;
auto set(const SymmetricPair &key, const std::size_t &distance)
-> std::size_t;
static auto f(const std::string &u, std::size_t t) -> std::string;
protected:
auto get(const SymmetricPair &key, std::size_t &distance) -> bool;
auto set(const SymmetricPair &key, const std::size_t &distance)
-> std::size_t;
static auto f(const std::string &u, std::size_t t) -> std::string;

private:
std::unordered_map<SymmetricPair, std::size_t> memo;
mutable std::shared_mutex mutex;
};
private:
std::unordered_map<SymmetricPair, std::size_t> memo;
mutable std::shared_mutex mutex;
};

} // namespace liblevenshtein
} // namespace liblevenshtein::distance

#endif // LIBLEVENSHTEIN_DISTANCE_MEMOIZED_DISTANCE_H
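The header above pairs an `std::unordered_map` with a `mutable std::shared_mutex`, which suggests a reader/writer-locked cache behind `get` and `set`. The bodies of those methods are not shown in this diff, so the following is only a sketch of that general pattern under stated assumptions: the `Memo` class is hypothetical, and a plain `std::string` key stands in for `SymmetricPair`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

namespace sketch {

// Hypothetical cache illustrating the shared_mutex pattern; not library code.
class Memo {
public:
  // On a hit, writes the cached value to `distance` and returns true.
  // On a miss, leaves `distance` untouched and returns false.
  auto get(const std::string &key, std::size_t &distance) const -> bool {
    std::shared_lock<std::shared_mutex> lock(mutex_); // readers may overlap
    const auto it = memo_.find(key);
    if (it == memo_.end()) {
      return false;
    }
    distance = it->second;
    return true;
  }

  // Stores the value and returns it, so callers can `return set(key, d);`.
  auto set(const std::string &key, std::size_t distance) -> std::size_t {
    std::unique_lock<std::shared_mutex> lock(mutex_); // writers are exclusive
    memo_[key] = distance;
    return distance;
  }

private:
  std::unordered_map<std::string, std::size_t> memo_;
  mutable std::shared_mutex mutex_; // mutable so that const get() can lock
};

} // namespace sketch

inline void demo_memo() {
  sketch::Memo memo;
  std::size_t d = 0;
  assert(!memo.get("kitten|sitting", d)); // nothing cached yet
  assert(memo.set("kitten|sitting", 3) == 3);
  assert(memo.get("kitten|sitting", d) && d == 3);
}
```

Returning the stored value from `set` matches the `return set(key, 1 + min_distance);` idiom visible in the `.cpp` files of this commit.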
4 changes: 2 additions & 2 deletions src/liblevenshtein/distance/merge_and_split_distance.cpp
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#include "liblevenshtein/distance/merge_and_split_distance.h"
#include "liblevenshtein/distance/symmetric_pair.h"

namespace liblevenshtein {
namespace liblevenshtein::distance {

auto MergeAndSplitDistance::between(std::string v, std::string w)
-> std::size_t {
Expand Down Expand Up @@ -95,4 +95,4 @@ auto MergeAndSplitDistance::between(std::string v, std::string w)
return set(key, 1 + min_distance);
}

} // namespace liblevenshtein
} // namespace liblevenshtein::distance
12 changes: 6 additions & 6 deletions src/liblevenshtein/distance/merge_and_split_distance.h
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,13 @@

#include "liblevenshtein/distance/memoized_distance.h"

namespace liblevenshtein {
namespace liblevenshtein::distance {

class MergeAndSplitDistance : public MemoizedDistance {
public:
auto between(std::string v, std::string w) -> std::size_t override;
};
class MergeAndSplitDistance : public MemoizedDistance {
public:
auto between(std::string v, std::string w) -> std::size_t override;
};

} // namespace liblevenshtein
} // namespace liblevenshtein::distance

#endif // LIBLEVENSHTEIN_DISTANCE_MERGE_AND_SPLIT_DISTANCE_H
4 changes: 2 additions & 2 deletions src/liblevenshtein/distance/standard_distance.cpp
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#include "liblevenshtein/distance/standard_distance.h"
#include "liblevenshtein/distance/symmetric_pair.h"

namespace liblevenshtein {
namespace liblevenshtein::distance {

auto StandardDistance::between(std::string v, std::string w) -> std::size_t {
const SymmetricPair key(v, w);
Expand Down Expand Up @@ -64,4 +64,4 @@ auto StandardDistance::between(std::string v, std::string w) -> std::size_t {
return set(key, 1 + min_distance);
}

} // namespace liblevenshtein
} // namespace liblevenshtein::distance
12 changes: 6 additions & 6 deletions src/liblevenshtein/distance/standard_distance.h
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,13 @@

#include "liblevenshtein/distance/memoized_distance.h"

namespace liblevenshtein {
namespace liblevenshtein::distance {

class StandardDistance : public MemoizedDistance {
public:
auto between(std::string v, std::string w) -> std::size_t override;
};
class StandardDistance : public MemoizedDistance {
public:
auto between(std::string v, std::string w) -> std::size_t override;
};

} // namespace liblevenshtein
} // namespace liblevenshtein::distance

#endif // LIBLEVENSHTEIN_DISTANCE_STANDARD_DISTANCE_H
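The body of `StandardDistance::between` is mostly elided in this diff; only the symmetric key and the final `return set(key, 1 + min_distance);` are visible. For orientation, here is a sketch of the textbook memoized recursion for Levenshtein distance that ends in the same shape. It is an assumption about the general technique, not the library's implementation; the `sketch` namespace, the `Key` alias, and the `std::map` memo are hypothetical, and locking is omitted for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical single-threaded illustration; not the library's class.
class StandardDistance {
public:
  auto between(const std::string &v, const std::string &w) -> std::size_t {
    // Order the pair so that (v, w) and (w, v) share one memo entry.
    const Key key = v < w ? Key(v, w) : Key(w, v);
    const auto hit = memo_.find(key);
    if (hit != memo_.end()) {
      return hit->second;
    }

    if (v.empty()) return set(key, w.size()); // insert all of w
    if (w.empty()) return set(key, v.size()); // delete all of v

    const std::string s = v.substr(1); // v without its first character
    const std::string t = w.substr(1); // w without its first character
    if (v[0] == w[0]) {
      return set(key, between(s, t)); // matching prefix costs nothing
    }

    const std::size_t min_distance = std::min({
        between(s, w),   // delete v[0]
        between(v, t),   // insert w[0]
        between(s, t)}); // substitute v[0] with w[0]
    return set(key, 1 + min_distance);
  }

private:
  using Key = std::pair<std::string, std::string>;

  auto set(const Key &key, std::size_t distance) -> std::size_t {
    memo_[key] = distance;
    return distance;
  }

  std::map<Key, std::size_t> memo_;
};

} // namespace sketch

inline void demo_standard_distance() {
  sketch::StandardDistance distance;
  assert(distance.between("kitten", "sitting") == 3);
  assert(distance.between("sitting", "kitten") == 3); // symmetric
}
```

Copying suffixes with `substr` keeps the sketch short at the cost of extra allocations; an index-based formulation would avoid them.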
8 changes: 4 additions & 4 deletions src/liblevenshtein/distance/symmetric_pair.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

#include "liblevenshtein/distance/symmetric_pair.h"

namespace liblevenshtein {
namespace liblevenshtein::distance {

SymmetricPair::SymmetricPair(const std::string &first, const std::string &second) {
if (first.compare(second) < 0) {
Expand All @@ -30,10 +30,10 @@ auto operator<<(std::ostream &out, const SymmetricPair &pair)
return out;
}

} // namespace liblevenshtein
} // namespace liblevenshtein::distance

auto std::hash<liblevenshtein::SymmetricPair>::operator()(
const liblevenshtein::SymmetricPair &pair) const -> std::size_t {
auto std::hash<liblevenshtein::distance::SymmetricPair>::operator()(
const liblevenshtein::distance::SymmetricPair &pair) const -> std::size_t {
std::uint64_t hash_code = 0xDEADBEEF;
hash_code = MurmurHash64A(pair.first.c_str(), (int) pair.first.length(), hash_code);
return MurmurHash64A(pair.second.c_str(), (int) pair.second.length(), hash_code);
Expand Down
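Because `SymmetricPair` now lives in `liblevenshtein::distance`, the `std::hash` specialization has to name the fully qualified type, as the hunk above shows. The sketch below illustrates the same idea with a self-contained, hypothetical `sketch::SymmetricPair`. Note the swap: it combines `std::hash<std::string>` values instead of calling `MurmurHash64A`, and the `operator==` is an assumption, since the original's equality operator is not visible in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

namespace sketch {

// Order-insensitive pair: (a, b) and (b, a) normalize to the same key.
struct SymmetricPair {
  std::string first;
  std::string second;

  SymmetricPair(const std::string &u, const std::string &v) {
    if (u.compare(v) < 0) {
      first = u;
      second = v;
    } else {
      first = v;
      second = u;
    }
  }

  auto operator==(const SymmetricPair &other) const -> bool {
    return first == other.first && second == other.second;
  }
};

} // namespace sketch

namespace std {

// The specialization must refer to the type by its fully qualified name.
template <> struct hash<sketch::SymmetricPair> {
  auto operator()(const sketch::SymmetricPair &pair) const noexcept
      -> std::size_t {
    const std::size_t h1 = std::hash<std::string>{}(pair.first);
    const std::size_t h2 = std::hash<std::string>{}(pair.second);
    // Simple hash combine; a stand-in for the MurmurHash64A chaining above.
    return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
  }
};

} // namespace std

inline void demo_symmetric_pair() {
  std::unordered_map<sketch::SymmetricPair, std::size_t> memo;
  memo[sketch::SymmetricPair("foo", "bar")] = 3;
  // The reversed pair resolves to the same entry.
  assert(memo.count(sketch::SymmetricPair("bar", "foo")) == 1);
  assert(memo.size() == 1);
}
```

Normalizing the order in the constructor is what lets a memo table treat `d(v, w)` and `d(w, v)` as one entry.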