-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spatial join area #1713
base: master
Are you sure you want to change the base?
Spatial join area #1713
Changes from all commits
c7a1ea8
a266201
2cc8504
d784513
f6e1647
1a4ff8f
e3a2644
f38cb1d
4d71c4e
38c9585
ddea6d3
1902d44
31e7054
cba4324
d5c934c
c1dc963
c5284ef
fc1a1fe
ec75e89
4fd698c
173e646
7dd8156
1c60bf9
3b8ea1a
a121ef2
1168f32
ca7f08e
9d2956e
14032bd
29171b5
457f8d4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -13,6 +13,7 @@ | |||||
|
||||||
#include <cmath> | ||||||
|
||||||
#include "engine/ExportQueryExecutionTrees.h" | ||||||
#include "engine/SpatialJoin.h" | ||||||
#include "util/GeoSparqlHelpers.h" | ||||||
|
||||||
|
@@ -37,19 +38,181 @@ | |||||
: std::nullopt; | ||||||
}; | ||||||
|
||||||
// ____________________________________________________________________________ | ||||||
std::string_view SpatialJoinAlgorithms::betweenQuotes( | ||||||
std::string_view extractFrom) const { | ||||||
size_t pos1 = extractFrom.find("\"", 0); | ||||||
size_t pos2 = extractFrom.find("\"", pos1 + 1); | ||||||
if (pos1 != std::string::npos && pos2 != std::string::npos) { | ||||||
return extractFrom.substr(pos1 + 1, pos2 - pos1 - 1); | ||||||
} else { | ||||||
return extractFrom; | ||||||
} | ||||||
} | ||||||
|
||||||
std::optional<AnyGeometry> SpatialJoinAlgorithms::getAnyGeometry( | ||||||
const IdTable* idtable, size_t row, size_t col) const { | ||||||
auto printWarning = [alreadyWarned = false, | ||||||
&spatialJoin = spatialJoin_]() mutable { | ||||||
if (!alreadyWarned) { | ||||||
std::string warning = | ||||||
"The input to a spatial join contained at least one element, " | ||||||
"that is not a Point, Linestring, Polygon, MultiPoint, " | ||||||
"MultiLinestring or MultiPolygon geometry and is thus skipped. Note " | ||||||
"that QLever currently only accepts those geometries for " | ||||||
"the spatial joins"; | ||||||
AD_LOG_WARN << warning << std::endl; | ||||||
alreadyWarned = true; | ||||||
if (spatialJoin.has_value()) { | ||||||
AD_CORRECTNESS_CHECK(spatialJoin.value() != nullptr); | ||||||
spatialJoin.value()->addWarning(warning); | ||||||
} | ||||||
} | ||||||
}; | ||||||
|
||||||
// unfortunately, the current implementation requires the fully materialized | ||||||
// string. In the future this might get changed. When only the bounding box | ||||||
// is needed, one could store it in an ID similar to GeoPoint (but with less | ||||||
// precision), and then the full geometry would only need to be read, when | ||||||
// the exact distance is wanted | ||||||
std::string str(betweenQuotes(ExportQueryExecutionTrees::idToStringAndType( | ||||||
Jonathan24680 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
qec_->getIndex(), idtable->at(row, col), {}) | ||||||
.value() | ||||||
.first)); | ||||||
AnyGeometry geometry; | ||||||
try { | ||||||
bg::read_wkt(str, geometry); | ||||||
} catch (...) { | ||||||
printWarning(); | ||||||
return std::nullopt; | ||||||
} | ||||||
return geometry; | ||||||
} | ||||||
|
||||||
// ____________________________________________________________________________ | ||||||
double SpatialJoinAlgorithms::computeDist(const AnyGeometry& geometry1, | ||||||
const AnyGeometry& geometry2) const { | ||||||
return boost::apply_visitor(DistanceVisitor(), geometry1, geometry2) * 78.630; | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What exactly is the multiplication doing here, the magic constant should be explained. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. a very verbose explanation of this is in the header file. But you're right, the function should be called different, as it first gets the distance in degrees and then converts it. I called it computeDist, since that's what the other compute distance functions are called. Then it's consistent and depending on the parameters available, one can call one or the other computeDist function There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for the clarification, I am much less confused now:) |
||||||
}; | ||||||
|
||||||
// ____________________________________________________________________________ | ||||||
Point SpatialJoinAlgorithms::convertGeoPointToPoint(GeoPoint point) const { | ||||||
return Point(point.getLng(), point.getLat()); | ||||||
}; | ||||||
|
||||||
// ____________________________________________________________________________ | ||||||
Id SpatialJoinAlgorithms::computeDist(const rtreeEntry& geo1, | ||||||
const rtreeEntry& geo2) const { | ||||||
auto convertPoint = [&](const AnyGeometry& geometry, | ||||||
const std::optional<Box>& bbox) { | ||||||
Box areaBox; | ||||||
areaBox = | ||||||
bbox.value_or(boost::apply_visitor(BoundingBoxVisitor(), geometry)); | ||||||
Comment on lines
+109
to
+110
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That syntax looks nice, unfortunately it always (even if not needed) calls the |
||||||
Point p = calculateMidpointOfBox(areaBox); | ||||||
return GeoPoint(p.get<1>(), p.get<0>()); | ||||||
}; | ||||||
// use the already parsed geometries to calculate the distance | ||||||
if (useMidpointForAreas_) { | ||||||
auto point1 = geo1.geoPoint_; | ||||||
auto point2 = geo2.geoPoint_; | ||||||
if (!point1) { | ||||||
point1 = convertPoint(geo1.geometry_.value(), geo1.boundingBox_); | ||||||
} | ||||||
if (!point2) { | ||||||
point2 = convertPoint(geo2.geometry_.value(), geo2.boundingBox_); | ||||||
} | ||||||
return Id::makeFromDouble( | ||||||
ad_utility::detail::wktDistImpl(point1.value(), point2.value())); | ||||||
} else { | ||||||
if (geo1.geoPoint_.has_value() && geo2.geoPoint_.has_value()) { | ||||||
return Id::makeFromDouble(ad_utility::detail::wktDistImpl( | ||||||
geo1.geoPoint_.value(), geo2.geoPoint_.value())); | ||||||
} else if (geo1.geometry_.has_value() && geo2.geometry_.has_value()) { | ||||||
return Id::makeFromDouble( | ||||||
computeDist(geo1.geometry_.value(), geo2.geometry_.value())); | ||||||
} else { | ||||||
// one point and one area | ||||||
std::optional<AnyGeometry> geom1 = geo1.geometry_; | ||||||
std::optional<AnyGeometry> geom2 = geo2.geometry_; | ||||||
Comment on lines
+135
to
+136
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This copies the geometry. I think you can do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I now see what you are doing, and what you are trying to do, and my thing doesn't work. what about
(same for |
||||||
if (geo1.geoPoint_.has_value()) { | ||||||
geom1 = convertGeoPointToPoint(geo1.geoPoint_.value()); | ||||||
} else if (geo2.geoPoint_.has_value()) { | ||||||
geom2 = convertGeoPointToPoint(geo2.geoPoint_.value()); | ||||||
} | ||||||
return Id::makeFromDouble(computeDist(geom1.value(), geom2.value())); | ||||||
} | ||||||
} | ||||||
} | ||||||
|
||||||
// ____________________________________________________________________________ | ||||||
Id SpatialJoinAlgorithms::computeDist(const IdTable* idTableLeft, | ||||||
const IdTable* idTableRight, | ||||||
size_t rowLeft, size_t rowRight, | ||||||
ColumnIndex leftPointCol, | ||||||
ColumnIndex rightPointCol) const { | ||||||
auto point1 = getPoint(idTableLeft, rowLeft, leftPointCol); | ||||||
auto point2 = getPoint(idTableRight, rowRight, rightPointCol); | ||||||
if (!point1.has_value() || !point2.has_value()) { | ||||||
return Id::makeUndefined(); | ||||||
auto getAreaOrPointGeometry = [&](const IdTable* idtable, size_t row, | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One questions:
Please let me know if I am missing a point. |
||||||
size_t col, std::optional<GeoPoint> point) { | ||||||
std::optional<AnyGeometry> geometry; | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Easier control flow: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. '(with my suggestion for the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i don't understand what you mean There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
You can drop the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Of course only relevant, if we still have this code at all. |
||||||
if (!point) { | ||||||
// nothing to do. When parsing a point or an area fails, a warning | ||||||
// message gets printed at another place and the point/area just gets | ||||||
// skipped | ||||||
geometry = getAnyGeometry(idtable, row, col); | ||||||
} else { | ||||||
geometry = convertGeoPointToPoint(point.value()); | ||||||
} | ||||||
|
||||||
return geometry; | ||||||
}; | ||||||
|
||||||
// for now we need to get the data from the disk, but in the future, this | ||||||
// information will be stored in an ID, just like GeoPoint | ||||||
auto getAreaPoint = [&](const IdTable* idtable, size_t row, | ||||||
size_t col) -> std::optional<GeoPoint> { | ||||||
std::optional<AnyGeometry> geometry = getAnyGeometry(idtable, row, col); | ||||||
if (!geometry.has_value()) { | ||||||
// nothing to do. When parsing a point or an area fails, a warning message | ||||||
// gets printed at another place and the point/area just gets skipped | ||||||
return std::nullopt; | ||||||
} | ||||||
|
||||||
Box areaBox = boost::apply_visitor(BoundingBoxVisitor(), geometry.value()); | ||||||
|
||||||
Point p = calculateMidpointOfBox(areaBox); | ||||||
return GeoPoint(p.get<1>(), p.get<0>()); | ||||||
}; | ||||||
|
||||||
rtreeEntry entryLeft{rowLeft, std::nullopt, std::nullopt, std::nullopt}; | ||||||
rtreeEntry entryRight{rowRight, std::nullopt, std::nullopt, std::nullopt}; | ||||||
entryLeft.geoPoint_ = getPoint(idTableLeft, rowLeft, leftPointCol); | ||||||
entryRight.geoPoint_ = getPoint(idTableRight, rowRight, rightPointCol); | ||||||
if (entryLeft.geoPoint_ && entryRight.geoPoint_) { | ||||||
return computeDist(entryLeft, entryRight); | ||||||
} else if (useMidpointForAreas_) { | ||||||
if (!entryLeft.geoPoint_) { | ||||||
entryLeft.geoPoint_ = getAreaPoint(idTableLeft, rowLeft, leftPointCol); | ||||||
} | ||||||
if (!entryRight.geoPoint_) { | ||||||
entryRight.geoPoint_ = | ||||||
getAreaPoint(idTableRight, rowRight, rightPointCol); | ||||||
} | ||||||
if (entryLeft.geoPoint_ && entryRight.geoPoint_) { | ||||||
return computeDist(entryLeft, entryRight); | ||||||
} else { | ||||||
return Id::makeUndefined(); | ||||||
} | ||||||
} else { | ||||||
entryLeft.geometry_ = getAreaOrPointGeometry( | ||||||
idTableLeft, rowLeft, leftPointCol, entryLeft.geoPoint_); | ||||||
entryRight.geometry_ = getAreaOrPointGeometry( | ||||||
idTableRight, rowRight, rightPointCol, entryRight.geoPoint_); | ||||||
if (entryLeft.geometry_ && entryRight.geometry_) { | ||||||
return computeDist(entryLeft, entryRight); | ||||||
} else { | ||||||
return Id::makeUndefined(); | ||||||
} | ||||||
} | ||||||
return Id::makeFromDouble( | ||||||
ad_utility::detail::wktDistImpl(point1.value(), point2.value())); | ||||||
} | ||||||
|
||||||
// ____________________________________________________________________________ | ||||||
|
@@ -241,7 +404,7 @@ | |||||
|
||||||
// ____________________________________________________________________________ | ||||||
std::vector<Box> SpatialJoinAlgorithms::computeBoundingBox( | ||||||
const Point& startPoint) const { | ||||||
const Point& startPoint, double additionalDist) const { | ||||||
const auto [idTableLeft, resultLeft, idTableRight, resultRight, leftJoinCol, | ||||||
rightJoinCol, rightSelectedCols, numColumns, maxDist, | ||||||
maxResults] = params_; | ||||||
|
@@ -254,8 +417,9 @@ | |||||
auto archaversine = [](double theta) { return std::acos(1 - 2 * theta); }; | ||||||
|
||||||
// safety buffer for numerical inaccuracies | ||||||
double maxDistInMetersBuffer; | ||||||
if (maxDist.value() < 10) { | ||||||
double maxDistInMetersBuffer = | ||||||
static_cast<double>(maxDist.value()) + additionalDist; | ||||||
if (maxDistInMetersBuffer < 10) { | ||||||
maxDistInMetersBuffer = 10; | ||||||
} else if (static_cast<double>(maxDist.value()) < | ||||||
static_cast<double>(std::numeric_limits<long long>::max()) / | ||||||
|
@@ -442,23 +606,49 @@ | |||||
return std::array{northPoleReached, southPoleReached}; | ||||||
} | ||||||
|
||||||
// ____________________________________________________________________________ | ||||||
Point SpatialJoinAlgorithms::calculateMidpointOfBox(const Box& box) const { | ||||||
double lng = (box.min_corner().get<0>() + box.max_corner().get<0>()) / 2.0; | ||||||
double lat = (box.min_corner().get<1>() + box.max_corner().get<1>()) / 2.0; | ||||||
return Point(lng, lat); | ||||||
} | ||||||
|
||||||
// ____________________________________________________________________________ | ||||||
double SpatialJoinAlgorithms::getMaxDistFromMidpointToAnyPointInsideTheBox( | ||||||
const Box& box, std::optional<Point> midpoint) const { | ||||||
if (!midpoint) { | ||||||
midpoint = calculateMidpointOfBox(box); | ||||||
} | ||||||
double distLng = | ||||||
std::abs(box.min_corner().get<0>() - midpoint.value().get<0>()); | ||||||
double distLat = | ||||||
std::abs(box.min_corner().get<1>() - midpoint.value().get<1>()); | ||||||
// convert to meters and return | ||||||
return (distLng + distLat) * 40075000 / 360; | ||||||
} | ||||||
|
||||||
// ____________________________________________________________________________ | ||||||
Result SpatialJoinAlgorithms::BoundingBoxAlgorithm() { | ||||||
auto printWarning = [alreadyWarned = false, | ||||||
&spatialJoin = spatialJoin_]() mutable { | ||||||
if (!alreadyWarned) { | ||||||
std::string warning = | ||||||
"The input to a spatial join contained at least one element, " | ||||||
"that is not a point geometry and is thus skipped. Note that " | ||||||
"QLever currently only accepts point geometries for the " | ||||||
"spatial joins"; | ||||||
AD_LOG_WARN << warning << std::endl; | ||||||
alreadyWarned = true; | ||||||
if (spatialJoin.has_value()) { | ||||||
AD_CORRECTNESS_CHECK(spatialJoin.value() != nullptr); | ||||||
spatialJoin.value()->addWarning(warning); | ||||||
auto getRtreeEntry = [&](const IdTable* idTable, const size_t row, | ||||||
const ColumnIndex col) -> std::optional<rtreeEntry> { | ||||||
rtreeEntry entry{row, std::nullopt, std::nullopt, std::nullopt}; | ||||||
entry.geoPoint_ = getPoint(idTable, row, col); | ||||||
|
||||||
if (!entry.geoPoint_) { | ||||||
entry.geometry_ = getAnyGeometry(idTable, row, col); | ||||||
if (!entry.geometry_.has_value()) { | ||||||
return std::nullopt; | ||||||
} | ||||||
entry.boundingBox_ = | ||||||
boost::apply_visitor(BoundingBoxVisitor(), entry.geometry_.value()); | ||||||
} else { | ||||||
entry.boundingBox_ = | ||||||
Box(Point(entry.geoPoint_.value().getLng(), | ||||||
entry.geoPoint_.value().getLat()), | ||||||
Point(entry.geoPoint_.value().getLng() + 0.00000001, | ||||||
entry.geoPoint_.value().getLat() + 0.00000001)); | ||||||
} | ||||||
return entry; | ||||||
}; | ||||||
|
||||||
const auto [idTableLeft, resultLeft, idTableRight, resultRight, leftJoinCol, | ||||||
|
@@ -478,52 +668,63 @@ | |||||
std::swap(smallerResJoinCol, otherResJoinCol); | ||||||
} | ||||||
|
||||||
// build rtree with one child | ||||||
bgi::rtree<Value, bgi::quadratic<16>, bgi::indexable<Value>, | ||||||
bgi::equal_to<Value>, ad_utility::AllocatorWithLimit<Value>> | ||||||
rtree(bgi::quadratic<16>{}, bgi::indexable<Value>{}, | ||||||
bgi::equal_to<Value>{}, qec_->getAllocator()); | ||||||
for (size_t i = 0; i < smallerResult->numRows(); i++) { | ||||||
// get point of row i | ||||||
auto geopoint = getPoint(smallerResult, i, smallerResJoinCol); | ||||||
|
||||||
if (!geopoint) { | ||||||
printWarning(); | ||||||
// add every box together with the additional information into the rtree | ||||||
std::optional<rtreeEntry> entry = | ||||||
getRtreeEntry(smallerResult, i, smallerResJoinCol); | ||||||
if (!entry) { | ||||||
// nothing to do. When parsing a point or an area fails, a warning | ||||||
// message gets printed at another place and the point/area just gets | ||||||
// skipped | ||||||
continue; | ||||||
} | ||||||
|
||||||
Point p(geopoint.value().getLng(), geopoint.value().getLat()); | ||||||
|
||||||
// add every point together with the row number into the rtree | ||||||
rtree.insert(std::make_pair(std::move(p), i)); | ||||||
rtree.insert(std::make_pair(entry.value().boundingBox_.value(), | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
technically with the function call, the second argument can be evaluated first, and then you can have a moved out value etc... |
||||||
std::move(entry.value()))); | ||||||
} | ||||||
|
||||||
// query rtree with the other child | ||||||
std::vector<Value, ad_utility::AllocatorWithLimit<Value>> results{ | ||||||
qec_->getAllocator()}; | ||||||
for (size_t i = 0; i < otherResult->numRows(); i++) { | ||||||
auto geopoint1 = getPoint(otherResult, i, otherResJoinCol); | ||||||
|
||||||
if (!geopoint1) { | ||||||
printWarning(); | ||||||
std::optional<rtreeEntry> entry = | ||||||
getRtreeEntry(otherResult, i, otherResJoinCol); | ||||||
if (!entry) { | ||||||
// nothing to do. When parsing a point or an area fails, a warning | ||||||
// message gets printed at another place and the point/area just gets | ||||||
// skipped | ||||||
continue; | ||||||
} | ||||||
std::vector<Box> queryBox; | ||||||
if (!entry.value().geoPoint_) { | ||||||
auto midpoint = | ||||||
calculateMidpointOfBox(entry.value().boundingBox_.value()); | ||||||
queryBox = computeBoundingBox( | ||||||
midpoint, getMaxDistFromMidpointToAnyPointInsideTheBox( | ||||||
entry.value().boundingBox_.value(), midpoint)); | ||||||
Comment on lines
+704
to
+708
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This looks like it always uses a midpoint approximation, even if the midpoint setting is not activated, why don't we use the exact bounding box? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, maybe I recall that you use your padding of the midpoint, s.t. the create box is always correct... |
||||||
} else { | ||||||
queryBox = | ||||||
computeBoundingBox(Point(entry.value().geoPoint_.value().getLng(), | ||||||
entry.value().geoPoint_.value().getLat())); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This whole thing (get the vector for a query with the added distance) should also be a separate function. |
||||||
} | ||||||
|
||||||
Point p(geopoint1.value().getLng(), geopoint1.value().getLat()); | ||||||
|
||||||
// query the other rtree for every point using the following bounding box | ||||||
std::vector<Box> bbox = computeBoundingBox(p); | ||||||
results.clear(); | ||||||
|
||||||
ql::ranges::for_each(bbox, [&](const Box& bbox) { | ||||||
ql::ranges::for_each(queryBox, [&](const Box& bbox) { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||||||
rtree.query(bgi::intersects(bbox), std::back_inserter(results)); | ||||||
}); | ||||||
|
||||||
ql::ranges::for_each(results, [&](const Value& res) { | ||||||
size_t rowLeft = res.second; | ||||||
size_t rowLeft = res.second.row_; | ||||||
size_t rowRight = i; | ||||||
if (!leftResSmaller) { | ||||||
std::swap(rowLeft, rowRight); | ||||||
} | ||||||
auto distance = computeDist(idTableLeft, idTableRight, rowLeft, rowRight, | ||||||
leftJoinCol, rightJoinCol); | ||||||
auto distance = computeDist(res.second, entry.value()); | ||||||
AD_CORRECTNESS_CHECK(distance.getDatatype() == Datatype::Double); | ||||||
if (distance.getDouble() * 1000 <= static_cast<double>(maxDist.value())) { | ||||||
addResultTableEntry(&result, idTableLeft, idTableRight, rowLeft, | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
alreadyWarned
doesn't work here, as it is reset tofalse
for each call of the function.The easiest way is to implement the lambda outside (you have a loop etc. where you call the
getAnyGeometry
repeatedly, and there you then doauto bla = getAnyGeometry; if (!bla.has_value()) {printWarning();}...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I have an alternative idea:
the outer function here (
getAnyGeometry
) gets a reference to a counter variable (the number of previously encountered invalid elements). This is then captured by reference with the lambda. and incremented. Then you only have to manage the counter outside this function and can even use it to print statistics (the number of invalid elements). But first do something that doesn't warnn
times, as this might break the UI etc.