Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spatial join area #1713

Open
wants to merge 31 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
c7a1ea8
untested changes
Jonathan24680 Dec 17, 2024
a266201
continued development
Jonathan24680 Dec 18, 2024
2cc8504
added all areas to the dataset and some tests
Jonathan24680 Dec 23, 2024
d784513
make file_buffer_size in the test index be a parameter
Jonathan24680 Dec 28, 2024
f6e1647
bugfix area dataset
Jonathan24680 Dec 28, 2024
1a4ff8f
backup commit
Jonathan24680 Dec 30, 2024
e3a2644
area test with points and areas working
Jonathan24680 Dec 31, 2024
f38cb1d
moved some stuff from dev space into the project
Jonathan24680 Jan 2, 2025
4d71c4e
new tests
Jonathan24680 Jan 4, 2025
38c9585
first test works now for the areas as well
Jonathan24680 Jan 11, 2025
ddea6d3
all tests are now executed and passed
Jonathan24680 Jan 12, 2025
1902d44
clang format
Jonathan24680 Jan 12, 2025
31e7054
merge master
Jonathan24680 Jan 12, 2025
cba4324
backup with removed filebuffersize compiling but not testing
Jonathan24680 Jan 17, 2025
d5c934c
make buffer size of test index variable
Jonathan24680 Jan 17, 2025
c1dc963
clean up code
Jonathan24680 Jan 17, 2025
c5284ef
sonarqube
Jonathan24680 Jan 20, 2025
fc1a1fe
correct bug for function without inline in header file
Jonathan24680 Jan 21, 2025
ec75e89
arbitrary geometry types and areas dont need to be approximated as po…
Jonathan24680 Jan 25, 2025
4fd698c
PR feedback
Jonathan24680 Jan 31, 2025
173e646
PR feedback
Jonathan24680 Jan 31, 2025
7dd8156
backup
Jonathan24680 Feb 1, 2025
1c60bf9
PR feedback
Jonathan24680 Feb 1, 2025
3b8ea1a
spelling mistake
Jonathan24680 Feb 1, 2025
a121ef2
Sonarqube issues
Jonathan24680 Feb 4, 2025
1168f32
PR changes
Jonathan24680 Feb 4, 2025
ca7f08e
PR changes
Jonathan24680 Feb 4, 2025
9d2956e
PR changes
Jonathan24680 Feb 4, 2025
14032bd
PR changes
Jonathan24680 Feb 4, 2025
29171b5
Sonarqube issues
Jonathan24680 Feb 4, 2025
457f8d4
format check
Jonathan24680 Feb 4, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions src/engine/SpatialJoin.h
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,10 @@ class SpatialJoin : public Operation {
return childRight_;
}

PreparedSpatialJoinParams onlyForTestingGetPrepareJoin() const {
return prepareJoin();
}

private:
// helper function to generate a variable to column map from `childRight_`
// that only contains the columns selected by `config_.payloadVariables_`
Expand Down
289 changes: 245 additions & 44 deletions src/engine/SpatialJoinAlgorithms.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@

#include <cmath>

#include "engine/ExportQueryExecutionTrees.h"
#include "engine/SpatialJoin.h"
#include "util/GeoSparqlHelpers.h"

Expand All @@ -37,19 +38,181 @@
: std::nullopt;
};

// ____________________________________________________________________________
std::string_view SpatialJoinAlgorithms::betweenQuotes(
std::string_view extractFrom) const {
size_t pos1 = extractFrom.find("\"", 0);
size_t pos2 = extractFrom.find("\"", pos1 + 1);
if (pos1 != std::string::npos && pos2 != std::string::npos) {
return extractFrom.substr(pos1 + 1, pos2 - pos1 - 1);
} else {
return extractFrom;
}

Check warning on line 50 in src/engine/SpatialJoinAlgorithms.cpp

View check run for this annotation

Codecov / codecov/patch

src/engine/SpatialJoinAlgorithms.cpp#L49-L50

Added lines #L49 - L50 were not covered by tests
}

std::optional<AnyGeometry> SpatialJoinAlgorithms::getAnyGeometry(
const IdTable* idtable, size_t row, size_t col) const {
auto printWarning = [alreadyWarned = false,
&spatialJoin = spatialJoin_]() mutable {
if (!alreadyWarned) {
std::string warning =
"The input to a spatial join contained at least one element, "
"that is not a Point, Linestring, Polygon, MultiPoint, "
"MultiLinestring or MultiPolygon geometry and is thus skipped. Note "
"that QLever currently only accepts those geometries for "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alreadyWarned doesn't work here, as it is reset to false for each call of the function.
The easiest way is to implement the lambda outside (you have a loop etc. where you call the getAnyGeometry repeatedly, and there you then do
auto bla = getAnyGeometry; if (!bla.has_value()) {printWarning();}...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I have an alternative idea:
the outer function here (getAnyGeometry) gets a reference to a counter variable (the number of previously encountered invalid elements). This is then captured by reference with the lambda. and incremented. Then you only have to manage the counter outside this function and can even use it to print statistics (the number of invalid elements). But first do something that doesn't warn n times, as this might break the UI etc.

"the spatial joins";
AD_LOG_WARN << warning << std::endl;
alreadyWarned = true;
if (spatialJoin.has_value()) {
AD_CORRECTNESS_CHECK(spatialJoin.value() != nullptr);
spatialJoin.value()->addWarning(warning);
}
}
};

// unfortunately, the current implementation requires the fully materialized
// string. In the future this might get changed. When only the bounding box
// is needed, one could store it in an ID similar to GeoPoint (but with less
// precision), and then the full geometry would only need to be read, when
// the exact distance is wanted
std::string str(betweenQuotes(ExportQueryExecutionTrees::idToStringAndType(
Jonathan24680 marked this conversation as resolved.
Show resolved Hide resolved
qec_->getIndex(), idtable->at(row, col), {})
.value()
.first));
AnyGeometry geometry;
try {
bg::read_wkt(str, geometry);
} catch (...) {
printWarning();
return std::nullopt;
}
return geometry;
}

// ____________________________________________________________________________
double SpatialJoinAlgorithms::computeDist(const AnyGeometry& geometry1,
const AnyGeometry& geometry2) const {
return boost::apply_visitor(DistanceVisitor(), geometry1, geometry2) * 78.630;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is the multiplication doing here, the magic constant should be explained.
Is the function name actually computeDistanceInMeters (you first compute it, and then do some magic multiplication to convert)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a very verbose explanation of this is in the header file. But you're right, the function should be called different, as it first gets the distance in degrees and then converts it. I called it computeDist, since that's what the other compute distance functions are called. Then it's consistent and depending on the parameters available, one can call one or the other computeDist function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clarification, I am much less confused now:)

};

// ____________________________________________________________________________
Point SpatialJoinAlgorithms::convertGeoPointToPoint(GeoPoint point) const {
return Point(point.getLng(), point.getLat());
};

Check warning on line 101 in src/engine/SpatialJoinAlgorithms.cpp

View check run for this annotation

Codecov / codecov/patch

src/engine/SpatialJoinAlgorithms.cpp#L99-L101

Added lines #L99 - L101 were not covered by tests

// ____________________________________________________________________________
Id SpatialJoinAlgorithms::computeDist(const rtreeEntry& geo1,
const rtreeEntry& geo2) const {
auto convertPoint = [&](const AnyGeometry& geometry,
const std::optional<Box>& bbox) {
Box areaBox;
areaBox =
bbox.value_or(boost::apply_visitor(BoundingBoxVisitor(), geometry));
Comment on lines +109 to +110
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That syntax looks nice, unfortunately it always (even if not needed) calls the apply_visitor, and in that unneeded case also copies the bounding box (the first thing is probably worse).
If boxes are rather cheap to copy, then you can take again the optional<Box> by value, and then do
if (bbox.has_value() { bbox = apply_visitor(...)}; const auto& areaBox = bbox.value(); ...

Point p = calculateMidpointOfBox(areaBox);
return GeoPoint(p.get<1>(), p.get<0>());
};
// use the already parsed geometries to calculate the distance
if (useMidpointForAreas_) {
auto point1 = geo1.geoPoint_;
auto point2 = geo2.geoPoint_;
if (!point1) {
point1 = convertPoint(geo1.geometry_.value(), geo1.boundingBox_);
}
if (!point2) {
point2 = convertPoint(geo2.geometry_.value(), geo2.boundingBox_);
}
return Id::makeFromDouble(
ad_utility::detail::wktDistImpl(point1.value(), point2.value()));
} else {
if (geo1.geoPoint_.has_value() && geo2.geoPoint_.has_value()) {
return Id::makeFromDouble(ad_utility::detail::wktDistImpl(
geo1.geoPoint_.value(), geo2.geoPoint_.value()));

Check warning on line 129 in src/engine/SpatialJoinAlgorithms.cpp

View check run for this annotation

Codecov / codecov/patch

src/engine/SpatialJoinAlgorithms.cpp#L128-L129

Added lines #L128 - L129 were not covered by tests
} else if (geo1.geometry_.has_value() && geo2.geometry_.has_value()) {
return Id::makeFromDouble(
computeDist(geo1.geometry_.value(), geo2.geometry_.value()));
} else {
// one point and one area
std::optional<AnyGeometry> geom1 = geo1.geometry_;
std::optional<AnyGeometry> geom2 = geo2.geometry_;

Check warning on line 136 in src/engine/SpatialJoinAlgorithms.cpp

View check run for this annotation

Codecov / codecov/patch

src/engine/SpatialJoinAlgorithms.cpp#L135-L136

Added lines #L135 - L136 were not covered by tests
Comment on lines +135 to +136
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This copies the geometry. I think you can do const std::optional<AnyG>& geom1 = ... or even const auto& geom1 = ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now see what you are doing, and what you are trying to do, and my thing doesn't work.

what about

if (! geo1.geometry_.has_value() { geo1.geometry_ = geoPointToPoint(geo1.geoPoint_.value());

(same for geo2, then you alsways have the geometry_ value.
If you do this directly when reading those geometries from the IdTable, then the geometry_ member doesn't even have to be optional, and this code gets much simpler. Otherwise, if you do it here (if I am missing something obvious), then factor that code snippet into a separate function.

if (geo1.geoPoint_.has_value()) {
geom1 = convertGeoPointToPoint(geo1.geoPoint_.value());

Check warning on line 138 in src/engine/SpatialJoinAlgorithms.cpp

View check run for this annotation

Codecov / codecov/patch

src/engine/SpatialJoinAlgorithms.cpp#L138

Added line #L138 was not covered by tests
} else if (geo2.geoPoint_.has_value()) {
geom2 = convertGeoPointToPoint(geo2.geoPoint_.value());
}
return Id::makeFromDouble(computeDist(geom1.value(), geom2.value()));
}

Check warning on line 143 in src/engine/SpatialJoinAlgorithms.cpp

View check run for this annotation

Codecov / codecov/patch

src/engine/SpatialJoinAlgorithms.cpp#L140-L143

Added lines #L140 - L143 were not covered by tests
}
}

// ____________________________________________________________________________
Id SpatialJoinAlgorithms::computeDist(const IdTable* idTableLeft,
const IdTable* idTableRight,
size_t rowLeft, size_t rowRight,
ColumnIndex leftPointCol,
ColumnIndex rightPointCol) const {
auto point1 = getPoint(idTableLeft, rowLeft, leftPointCol);
auto point2 = getPoint(idTableRight, rowRight, rightPointCol);
if (!point1.has_value() || !point2.has_value()) {
return Id::makeUndefined();
auto getAreaOrPointGeometry = [&](const IdTable* idtable, size_t row,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One questions:

  1. In the current implementation, this function should never be used, as you have already had at least one of the geometries available because you had to build an index for them. Is that correct?

  2. If that is correct, The you can drop it. If it once becomes needed again, then because we have stored bounding boxes or something else , we in that case have to refactor these whole functions anyway, and currently this function consists of a lot of duplicated code anyway.

Please let me know if I am missing a point.

size_t col, std::optional<GeoPoint> point) {
std::optional<AnyGeometry> geometry;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easier control flow:
if (point.has_value) { return point; } else { return getAnyGeometry(...)} Everything else then is not nested in the if`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'(with my suggestion for the AnyGeometry)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand what you mean

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (point.has_value()) {
  return std::optional{AnyGeometry{convertPoint...()}};
}
return std::optional{getAnyGeometry(...);

You can drop the std::optional if you explicitly set the return type of the lambda to std::optional<AnyGeometry>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course only relevant, if we still have this code at all.

if (!point) {
// nothing to do. When parsing a point or an area fails, a warning
// message gets printed at another place and the point/area just gets
// skipped
geometry = getAnyGeometry(idtable, row, col);
} else {
geometry = convertGeoPointToPoint(point.value());
}

Check warning on line 164 in src/engine/SpatialJoinAlgorithms.cpp

View check run for this annotation

Codecov / codecov/patch

src/engine/SpatialJoinAlgorithms.cpp#L163-L164

Added lines #L163 - L164 were not covered by tests

return geometry;
};

// for now we need to get the data from the disk, but in the future, this
// information will be stored in an ID, just like GeoPoint
auto getAreaPoint = [&](const IdTable* idtable, size_t row,
size_t col) -> std::optional<GeoPoint> {
std::optional<AnyGeometry> geometry = getAnyGeometry(idtable, row, col);
if (!geometry.has_value()) {
// nothing to do. When parsing a point or an area fails, a warning message
// gets printed at another place and the point/area just gets skipped
return std::nullopt;
}

Box areaBox = boost::apply_visitor(BoundingBoxVisitor(), geometry.value());

Point p = calculateMidpointOfBox(areaBox);
return GeoPoint(p.get<1>(), p.get<0>());
};

rtreeEntry entryLeft{rowLeft, std::nullopt, std::nullopt, std::nullopt};
rtreeEntry entryRight{rowRight, std::nullopt, std::nullopt, std::nullopt};
entryLeft.geoPoint_ = getPoint(idTableLeft, rowLeft, leftPointCol);
entryRight.geoPoint_ = getPoint(idTableRight, rowRight, rightPointCol);
if (entryLeft.geoPoint_ && entryRight.geoPoint_) {
return computeDist(entryLeft, entryRight);
} else if (useMidpointForAreas_) {
if (!entryLeft.geoPoint_) {
entryLeft.geoPoint_ = getAreaPoint(idTableLeft, rowLeft, leftPointCol);
}
if (!entryRight.geoPoint_) {
entryRight.geoPoint_ =
getAreaPoint(idTableRight, rowRight, rightPointCol);
}
if (entryLeft.geoPoint_ && entryRight.geoPoint_) {
return computeDist(entryLeft, entryRight);
} else {
return Id::makeUndefined();
}
} else {
entryLeft.geometry_ = getAreaOrPointGeometry(
idTableLeft, rowLeft, leftPointCol, entryLeft.geoPoint_);
entryRight.geometry_ = getAreaOrPointGeometry(
idTableRight, rowRight, rightPointCol, entryRight.geoPoint_);
if (entryLeft.geometry_ && entryRight.geometry_) {
return computeDist(entryLeft, entryRight);
} else {
return Id::makeUndefined();
}

Check warning on line 214 in src/engine/SpatialJoinAlgorithms.cpp

View check run for this annotation

Codecov / codecov/patch

src/engine/SpatialJoinAlgorithms.cpp#L213-L214

Added lines #L213 - L214 were not covered by tests
}
return Id::makeFromDouble(
ad_utility::detail::wktDistImpl(point1.value(), point2.value()));
}

// ____________________________________________________________________________
Expand Down Expand Up @@ -241,7 +404,7 @@

// ____________________________________________________________________________
std::vector<Box> SpatialJoinAlgorithms::computeBoundingBox(
const Point& startPoint) const {
const Point& startPoint, double additionalDist) const {
const auto [idTableLeft, resultLeft, idTableRight, resultRight, leftJoinCol,
rightJoinCol, rightSelectedCols, numColumns, maxDist,
maxResults] = params_;
Expand All @@ -254,8 +417,9 @@
auto archaversine = [](double theta) { return std::acos(1 - 2 * theta); };

// safety buffer for numerical inaccuracies
double maxDistInMetersBuffer;
if (maxDist.value() < 10) {
double maxDistInMetersBuffer =
static_cast<double>(maxDist.value()) + additionalDist;
if (maxDistInMetersBuffer < 10) {
maxDistInMetersBuffer = 10;
} else if (static_cast<double>(maxDist.value()) <
static_cast<double>(std::numeric_limits<long long>::max()) /
Expand Down Expand Up @@ -442,23 +606,49 @@
return std::array{northPoleReached, southPoleReached};
}

// ____________________________________________________________________________
Point SpatialJoinAlgorithms::calculateMidpointOfBox(const Box& box) const {
double lng = (box.min_corner().get<0>() + box.max_corner().get<0>()) / 2.0;
double lat = (box.min_corner().get<1>() + box.max_corner().get<1>()) / 2.0;
return Point(lng, lat);
}

// ____________________________________________________________________________
double SpatialJoinAlgorithms::getMaxDistFromMidpointToAnyPointInsideTheBox(
const Box& box, std::optional<Point> midpoint) const {
if (!midpoint) {
midpoint = calculateMidpointOfBox(box);
}
double distLng =
std::abs(box.min_corner().get<0>() - midpoint.value().get<0>());
double distLat =
std::abs(box.min_corner().get<1>() - midpoint.value().get<1>());
// convert to meters and return
return (distLng + distLat) * 40075000 / 360;
}

// ____________________________________________________________________________
Result SpatialJoinAlgorithms::BoundingBoxAlgorithm() {
auto printWarning = [alreadyWarned = false,
&spatialJoin = spatialJoin_]() mutable {
if (!alreadyWarned) {
std::string warning =
"The input to a spatial join contained at least one element, "
"that is not a point geometry and is thus skipped. Note that "
"QLever currently only accepts point geometries for the "
"spatial joins";
AD_LOG_WARN << warning << std::endl;
alreadyWarned = true;
if (spatialJoin.has_value()) {
AD_CORRECTNESS_CHECK(spatialJoin.value() != nullptr);
spatialJoin.value()->addWarning(warning);
auto getRtreeEntry = [&](const IdTable* idTable, const size_t row,
const ColumnIndex col) -> std::optional<rtreeEntry> {
rtreeEntry entry{row, std::nullopt, std::nullopt, std::nullopt};
entry.geoPoint_ = getPoint(idTable, row, col);

if (!entry.geoPoint_) {
entry.geometry_ = getAnyGeometry(idTable, row, col);
if (!entry.geometry_.has_value()) {
return std::nullopt;
}
entry.boundingBox_ =
boost::apply_visitor(BoundingBoxVisitor(), entry.geometry_.value());
} else {
entry.boundingBox_ =
Box(Point(entry.geoPoint_.value().getLng(),
entry.geoPoint_.value().getLat()),
Point(entry.geoPoint_.value().getLng() + 0.00000001,
entry.geoPoint_.value().getLat() + 0.00000001));
}
return entry;
};

const auto [idTableLeft, resultLeft, idTableRight, resultRight, leftJoinCol,
Expand All @@ -478,52 +668,63 @@
std::swap(smallerResJoinCol, otherResJoinCol);
}

// build rtree with one child
bgi::rtree<Value, bgi::quadratic<16>, bgi::indexable<Value>,
bgi::equal_to<Value>, ad_utility::AllocatorWithLimit<Value>>
rtree(bgi::quadratic<16>{}, bgi::indexable<Value>{},
bgi::equal_to<Value>{}, qec_->getAllocator());
for (size_t i = 0; i < smallerResult->numRows(); i++) {
// get point of row i
auto geopoint = getPoint(smallerResult, i, smallerResJoinCol);

if (!geopoint) {
printWarning();
// add every box together with the additional information into the rtree
std::optional<rtreeEntry> entry =
getRtreeEntry(smallerResult, i, smallerResJoinCol);
if (!entry) {
// nothing to do. When parsing a point or an area fails, a warning
// message gets printed at another place and the point/area just gets
// skipped
continue;
}

Point p(geopoint.value().getLng(), geopoint.value().getLat());

// add every point together with the row number into the rtree
rtree.insert(std::make_pair(std::move(p), i));
rtree.insert(std::make_pair(entry.value().boundingBox_.value(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rtree.insert(std::make_pair(entry.value().boundingBox_.value(),
rtree.insert(std::pair(entry.value().boundingBox_.value(),

technically with the function call, the second argument can be evaluated first, and then you can have a moved out value etc...
(an oddity of the C-standard, the pair{...} constructor directly doesn't have this issue.

std::move(entry.value())));
}

// query rtree with the other child
std::vector<Value, ad_utility::AllocatorWithLimit<Value>> results{
qec_->getAllocator()};
for (size_t i = 0; i < otherResult->numRows(); i++) {
auto geopoint1 = getPoint(otherResult, i, otherResJoinCol);

if (!geopoint1) {
printWarning();
std::optional<rtreeEntry> entry =
getRtreeEntry(otherResult, i, otherResJoinCol);
if (!entry) {
// nothing to do. When parsing a point or an area fails, a warning
// message gets printed at another place and the point/area just gets
// skipped
continue;
}
std::vector<Box> queryBox;
if (!entry.value().geoPoint_) {
auto midpoint =
calculateMidpointOfBox(entry.value().boundingBox_.value());
queryBox = computeBoundingBox(
midpoint, getMaxDistFromMidpointToAnyPointInsideTheBox(
entry.value().boundingBox_.value(), midpoint));
Comment on lines +704 to +708
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it always uses a midpoint approximation, even if the midpoint setting is not activated, why don't we use the exact bounding box?
If this case is handled by one of the nested functions called here, then this should be noted in a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, maybe I recall that you use your padding of the midpoint, s.t. the create box is always correct...

} else {
queryBox =
computeBoundingBox(Point(entry.value().geoPoint_.value().getLng(),
entry.value().geoPoint_.value().getLat()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole thing (get the vector for a query with the added distance) should also be a separate function.

}

Point p(geopoint1.value().getLng(), geopoint1.value().getLat());

// query the other rtree for every point using the following bounding box
std::vector<Box> bbox = computeBoundingBox(p);
results.clear();

ql::ranges::for_each(bbox, [&](const Box& bbox) {
ql::ranges::for_each(queryBox, [&](const Box& bbox) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think technically you might get the same result multiple times from different query boxes, so if you have multiple query boxes, you also have to deduplicate.
  2. Are your results (including the possibly long and large geometries) copied here? As I have suggested earlier, you probably shouldn't store the complete geometries in the rtree, but only pointers or indices or something, s.t. they don't get copied the whole time.

rtree.query(bgi::intersects(bbox), std::back_inserter(results));
});

ql::ranges::for_each(results, [&](const Value& res) {
size_t rowLeft = res.second;
size_t rowLeft = res.second.row_;
size_t rowRight = i;
if (!leftResSmaller) {
std::swap(rowLeft, rowRight);
}
auto distance = computeDist(idTableLeft, idTableRight, rowLeft, rowRight,
leftJoinCol, rightJoinCol);
auto distance = computeDist(res.second, entry.value());
AD_CORRECTNESS_CHECK(distance.getDatatype() == Datatype::Double);
if (distance.getDouble() * 1000 <= static_cast<double>(maxDist.value())) {
addResultTableEntry(&result, idTableLeft, idTableRight, rowLeft,
Expand Down
Loading
Loading