This repository has been archived by the owner on Jun 3, 2024. It is now read-only.

Commit

Migrate mimir to ElasticSearch7.x (#277) (#279)
* Migrate mimir to ElasticSearch7.x (#277)

* fix lint

* keep httpx 1.18 in master

* PoC: compatibility with es7

* tests migration from ES2 to ES7 for mimir

* fix lint

* fix linting warning of pylint 2.11

* black reformat

* update getting type with ES7

* only maintained ES7 for mimir

* fix linting test

* fix wiki tests

* Build docker image for es7 branch with github action

* remove encoding warning from lintage + weird import position

* disable='unspecified-encoding' for the lintage

* disable='unspecified-encoding' for the lintage

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>
Co-authored-by: Adrien Matissart <a.matissart@qwantresearch.com>

* Hotel pricing Tripadvisor api endpoint (#281)

* add hotel pricing tripadvisor api call endpoint

* limit docker elastic memory size + upgrade to latest elasticsearch version

* fix review (add todo cleanup endpoint and clean useless fixture)

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* Add tripadvisor feeds endpoint with categories (#285)

* add tripadvisor feeds endpoint with categories

* fix format + add enum for type of POI called from mimir

* fix tests

* fix formatting

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* add logger url bragi

* QMAPS-2334 fix readonly fs gunicorn pid file (#286)

Co-authored-by: Aureliano Sinatra <a.sinatra@qwant.com>

* Refacto datasources with a factory (#288)

* simply add tripadvisor trigger logic to suggest

* add tests for tripadvisor

* first version refacto datasources with factory pattern

* unravel circular dependencies: multiple utils module files were shared between multiple modules -> I migrated them to the utils module in a "half-dirty" way

* improve factory and clean `get_name` function that was in utils but only used once

* use classic factory instead to make tests pass

* divide `get_places_bbox_impl` into sub-functions

* create two functions to differentiate france and worldwide behavior for the datasource selection

* fix review

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* fix linting

* Trigger instant answer with tripadvisor  (#290)

* add tests instant answer tripadvisor with intention detected

* create OSMPoi and TripadvisorPoi and rename index to poi-tripadvisor + tests

* fix previous behavior/tests. New feature test is still in red state

* fix new behavioral tests to fallback on pages if no hotels were fetched with tripadvisor for a single poi

* fix async task and trigger order between OSM and PagesJaunes

* reverse pagesjaunes and osm priority

* Update default argument with Optional type hint

Co-authored-by: Rémi Dupré <r.dupre@qwant.com>

* Make POI an abstract class

* add type source choice on Bragi init

* fix lint

* fix instantiate POI type depending on id

* Update idunn/places/poi.py

Co-authored-by: Rémi Dupré <r.dupre@qwant.com>

* convert poi-tripadvisor index to poi_tripadvisor + with review

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>
Co-authored-by: Rémi Dupré <r.dupre@qwant.com>

* Add MCID tripadvisor url (#292)

* add MCID tripadvisor url in PlaceMeta object with source_url and contribute_url

* fix format

* fix review

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* Bragi new params (#293)

* update bragi API model

* codes is optional with new Bragi

* geocoder/models/params: stops type is not used

* Handle POI detail for tripadvisor (#294)

* fix review

* fix consistent naming

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* Add bubble star url for tripadvisor (#296)

* add rating url for tripadvisor

* fix lint

* fix tests

* format

* add review block for tripadvisor

* fix format

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* Fix wrong merge (#299)

* add rating url for tripadvisor

* fix lint

* fix tests

* format

* add review block for tripadvisor

* fix format

* fix instant answer for the bragi version (#298)

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* update dependencies (#302)

* update dependencies

* disable failing pylint

* add healthcheck for all external sources (#300)

* add healthcheck for all external sources

* use dict to display healthcheck

* fix lint

* fix tests

* add bragi healthcheck

* fix lint

* add tagger/classifier healthcheck and clean bragi healthcheck

* add redis healthcheck

* fix format

* remove print

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* fix wrong import

* add timeout on status call

* build tripadvisor url with lang (#303)

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* Catch exception on healthcheck timeout (#304)

* catch exception on healthcheck

* fix format

* log error

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* Qmaps 2453 fix TA module priority (#305)

* dirty code to test now module tripadvisor priority

* fix tests

* add comment

* fix lint

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* fix property address and category in poi list (#306)

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* fix None type exception

* Add TRIPADVISOR_ENABLED flag (#307)

* add TRIPADVISOR_ENABLED flag

* fix call hotel pj when tripadvisor is disabled

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>

* wrap tripadvisor's ratings image in thumbr (#308)

* wrapping tripadvisor's ratings image in thumbr

* tripadvisor: fix rating url for full values

* add small comment

* image block: resize images to fit what is displayed in erdapfel (#309)

* image block: resize images to fit what is displayed in erdapfel

Some images can get pretty big, especially when displaying several of
them in the list view.

If we want more granularity it could make sense to specify that as a
parameter, but I don't think this would make any improvement in our
current use.

* image block: fix broken tests due to Thumbr url change

* instant_answer: mix OSM and tripadvisor in direct query (#311)

* instant_answer: mix OSM and tripadvisor in direct query

As weights for TA POIs are rather high, they are already prioritized
compared to OSM POIs, and this won't be too aggressive by hiding
potentially much more relevant OSM documents.

* instant_answer: we do not need to filter tripadvisor results

* instant_answer: fix exception for unsupported nlu langs (#312)

* urlsolver: do not raise an exception for 4xx/5xx and fix errors for 3xx (#313)

httpx seems to have changed its behavior, and it doesn't really make
sense to raise an exception in Idunn when provided with an invalid URL
anyway.

* fix import get_mimir_elasticsearch

* fix lint

* fix lint

Co-authored-by: sdirollo <sebastien.dirollo@gmail.com>
Co-authored-by: Adrien Matissart <a.matissart@qwantresearch.com>
Co-authored-by: aureliano sinatra <sinaure@gmail.com>
Co-authored-by: Aureliano Sinatra <a.sinatra@qwant.com>
Co-authored-by: Rémi Dupré <r.dupre@qwant.com>
Co-authored-by: Rémi Dupré <remi@dupre.io>
7 people authored Mar 16, 2022
1 parent b9b9c4a commit f560d55
Showing 78 changed files with 9,342 additions and 654 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/docker.yaml
@@ -46,6 +46,8 @@ jobs:
IMAGE_TAG=$GITHUB_SHA
elif [ "$VERSION" == "master" ]; then
IMAGE_TAG=latest
elif [ "$VERSION" == "es7" ]; then
IMAGE_TAG=latest-es7
else
IMAGE_TAG=$VERSION
fi
2 changes: 1 addition & 1 deletion Dockerfile
@@ -27,5 +27,5 @@ EXPOSE 5000

# You can set the number of workers by passing --workers=${NB_WORKER} to the docker run command.
# For some reason, an array is required here to accept other params on run.
ENTRYPOINT ["gunicorn", "app:app", "--bind=0.0.0.0:5000", "--pid=pid", \
ENTRYPOINT ["gunicorn", "app:app", "--bind=0.0.0.0:5000", "--pid=/tmp/gunicorn.pid", \
"-k", "uvicorn.workers.UvicornWorker", "--preload"]
3 changes: 2 additions & 1 deletion Pipfile
@@ -27,7 +27,8 @@ pylint = "*"

[packages]
fastapi="==0.65.*"
elasticsearch = ">=2.0.0,<3.0.0"
elasticsearch2 = ">=2.0.0,<3.0.0"
elasticsearch = "==7.15.1"
requests = ">=2.20.0"
tzwhere = "*"
babel = "*"
598 changes: 401 additions & 197 deletions Pipfile.lock

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions idunn/api/__init__.py
@@ -0,0 +1 @@
from .places_list import PlacesQueryParam
15 changes: 9 additions & 6 deletions idunn/api/closest.py
@@ -4,10 +4,12 @@
from pydantic import confloat

from idunn import settings
from idunn.utils.es_wrapper import get_elasticsearch
from idunn.utils.es_wrapper import get_mimir_elasticsearch
from idunn.utils import prometheus
from idunn.places import Street, Address, Place
from idunn.datasources.mimirsbrunn import fetch_closest

from idunn.datasources.mimirsbrunn import fetch_closest, get_es_place_type

from idunn.utils.verbosity import Verbosity

logger = logging.getLogger(__name__)
@@ -18,21 +20,22 @@

def get_closest_place(lat: float, lon: float, es=None):
if es is None:
es = get_elasticsearch()
es = get_mimir_elasticsearch()
es_addr = fetch_closest(lat, lon, es=es, max_distance=MAX_DISTANCE_IN_METERS)

places = {
"addr": Address,
"street": Street,
}
loader = places.get(es_addr.get("_type"))
places_type = get_es_place_type(es_addr)
loader = places.get(places_type)

if loader is None:
logger.warning("Found a place with the wrong type")
prometheus.exception("FoundPlaceWithWrongType")
raise HTTPException(
status_code=404,
detail=f"Closest address to '{lat}:{lon}' has a wrong type: '{es_addr.get('_type')}'",
detail=f"Closest address to '{lat}:{lon}' has a wrong type: '{places_type}'",
)

return loader(es_addr["_source"])
@@ -46,7 +49,7 @@ def closest_address(
) -> Place:
"""Find the closest address to a point."""

es = get_elasticsearch()
es = get_mimir_elasticsearch()

if not lang:
lang = settings["DEFAULT_LANGUAGE"]
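The hunks above stop reading `_type` from the Elasticsearch hit and call `get_es_place_type(es_addr)` instead. That helper's body is not shown in this excerpt. As a rough sketch only, and assuming the document type is stored in a `type` field of `_source` (an assumption, since Elasticsearch 7 documents typically report `_type` as `_doc`), a lookup could look like this. The name `guess_place_type` is hypothetical and is not Idunn's API.

```python
from typing import Any, Mapping, Optional


def guess_place_type(hit: Mapping[str, Any]) -> Optional[str]:
    """Hypothetical helper: recover a place type from an Elasticsearch hit."""
    # Assumption: ES7-style documents carry their type inside `_source`.
    source = hit.get("_source") or {}
    doc_type = source.get("type")
    if doc_type:
        return doc_type

    # Fall back to the legacy mapping type, ignoring the generic `_doc` marker.
    legacy_type = hit.get("_type")
    if legacy_type and legacy_type != "_doc":
        return legacy_type
    return None
```

With a helper of this shape, the `places.get(...)` lookup in `get_closest_place` keeps working for both hit layouts, and an unknown type still ends up in the 404 branch.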
1 change: 1 addition & 0 deletions idunn/api/constants.py
@@ -4,6 +4,7 @@
class PoiSource(str, Enum):
OSM = "osm"
PAGESJAUNES = "pages_jaunes"
TRIPADVISOR = "tripadvisor"


ALL_POI_SOURCES = [s.value for s in PoiSource]
8 changes: 4 additions & 4 deletions idunn/api/directions.py
@@ -2,11 +2,11 @@
from pydantic import confloat

from idunn import settings
from idunn.places import Latlon, place_from_id
from idunn.places import Latlon
from idunn.places.exceptions import IdunnPlaceError
from idunn.utils.rate_limiter import IdunnRateLimiter
from ..directions.client import directions_client

from ..utils.place import place_from_id

rate_limiter = IdunnRateLimiter(
resource="idunn.api.directions",
@@ -58,8 +58,8 @@ def get_directions(
):
"""Get directions to get from a places to another."""
try:
from_place = place_from_id(origin, follow_redirect=True)
to_place = place_from_id(destination, follow_redirect=True)
from_place = place_from_id(origin, language, follow_redirect=True)
to_place = place_from_id(destination, language, follow_redirect=True)
except IdunnPlaceError as exc:
raise HTTPException(status_code=404, detail=exc.message) from exc

88 changes: 88 additions & 0 deletions idunn/api/hotel_pricing.py
@@ -0,0 +1,88 @@
import json
from enum import Enum
from typing import Optional

from fastapi import Depends, Query
from pydantic import BaseModel

from idunn import settings
from idunn.datasources.tripadvisor import tripadvisor_api
from idunn.places.models.ta_api_reponse import HotelPricingResponse


class HotelIdType(Enum):
TRIPADVISOR = "TA"
HOTELS = "EAN"
PRICELINE = "PCLN"
BOOKING = "BCOM"
EXPEDIA = "EXPE"


class CommonQueryParam(BaseModel):
key: str = Query(
settings.get("TA_API_KEY"),
description="Access key to the API. "
"This value is provided by your Tripadvisor Account Manager",
)
check_in: str = Query(None, description="The check-in date in the YYYY-MM-DD format")
check_out: str = Query(None, description="The check-out date in the YYYY-MM-DD format")
user_agent: str = Query(
None,
description="When making this request client side set this value to 'infer'"
" and the service will automatically resolve the user-agent",
)
ip_address: str = Query(
None,
description="When making this request client side, set this value to 'infer' "
"and the service will automatically resolve the user-agent",
)
num_adults: Optional[int] = Query(
None,
description="The total number of adults that will stay at the accommodation."
" Supported values are between 1 to 4. The default value is '1'",
)
num_rooms: Optional[int] = Query(
None, description="The number of rooms to be booked. Supported values are between 1 to 8"
)
currency: Optional[str] = Query(
None,
description="The currency code to be used to display prices. "
"It should follow ISO 4217 format. The default value is 'USD'",
)
locale: Optional[str] = Query(
None,
description="The Preferred locale for the current request following the RFC 3066 format",
)
custom_tracking_var1: Optional[str] = Query(
None,
description="The Preferred locale for the current request following the RFC 3066 format",
)


class SearchHotelByLocationParam(CommonQueryParam):
location_id: str = Query(
None,
description="The Tripadvisor location ID of"
" the location for which you are requesting hotels",
)
limit: Optional[int] = Query(
None, description="The maximum number of hotels to return for the given location"
)


class SearchHotelByIdParam(CommonQueryParam):
hotel_ids: str = Query(
None,
description="Unique IDs of the hotel(s) for which you are requesting pricing."
" Used in conjunction with hotel_id_type",
)
hotel_id_type: Optional[HotelIdType] = Query(
None,
description="The hotel ID type for corresponding hotel_id. "
"If not provided, defaults to Tripadvisor hotel ID",
)


async def get_hotel_pricing(params: SearchHotelByIdParam = Depends()) -> HotelPricingResponse:
"""Get availability and price for a given hotel Id with TripAdvisor api"""
return await tripadvisor_api.get_hotel_pricing_by_hotel_id(json.loads(params.json()))
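`get_hotel_pricing` passes `json.loads(params.json())` to the client rather than the model itself. One likely effect of that round trip is that enum members such as `HotelIdType.BOOKING` become plain strings before they are used as request parameters. The sketch below illustrates the same idea with the standard library only. `IdType`, `Params`, and `to_plain_dict` are hypothetical stand-ins, not the classes from this commit.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class IdType(Enum):  # hypothetical stand-in for HotelIdType
    TRIPADVISOR = "TA"
    BOOKING = "BCOM"


@dataclass
class Params:  # hypothetical stand-in for SearchHotelByIdParam
    hotel_ids: str
    hotel_id_type: Optional[IdType] = None


def to_plain_dict(params: Params) -> dict:
    """Round-trip through JSON so that enum members become their values."""
    # `default` is only called for objects json cannot encode (here: enums).
    encoded = json.dumps(asdict(params), default=lambda member: member.value)
    return json.loads(encoded)
```

For example, `to_plain_dict(Params("123", IdType.BOOKING))` yields a dictionary containing only strings, which is easier to forward as query parameters than a model holding enum objects.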
127 changes: 101 additions & 26 deletions idunn/api/instant_answer.py
@@ -13,20 +13,35 @@
from idunn.geocoder.bragi_client import bragi_client
from idunn.geocoder.models import QueryParams
from idunn.geocoder.models.geocodejson import IntentionType
from idunn.places import place_from_id, Place
from idunn.places import Place
from idunn.api.places_list import get_places_bbox_impl, PlacesQueryParam
from idunn.utils import maps_urls
from idunn.utils.regions import get_region_lonlat
from idunn.utils.result_filter import ResultFilter
from idunn.instant_answer import normalize
from .constants import PoiSource

from ..utils.place import place_from_id
from ..utils.verbosity import Verbosity
from copy import deepcopy


logger = logging.getLogger(__name__)
result_filter = ResultFilter()

nlu_allowed_languages = settings["NLU_ALLOWED_LANGUAGES"].split(",")
ia_max_query_length = int(settings["IA_MAX_QUERY_LENGTH"])
AVAILABLE_CLASS_TYPE_TRIPADVISOR = [
"class_hotel",
"class_lodging",
"class_restaurant",
"class_leisure",
]

AVAILABLE_CLASS_TYPE_HOTEL_TRIPADVISOR = [
"class_hotel",
"class_lodging",
]


class InstantAnswerResult(BaseModel):
@@ -37,7 +52,7 @@ class InstantAnswerResult(BaseModel):
source: Optional[PoiSource] = Field(
description=(
"Data source for the returned place, or data provider for the list of results. This "
"field is not provided when the instant answer relates to an admnistrative area or an "
"field is not provided when the instant answer relates to an administrative area or an "
"address."
)
)
@@ -115,9 +130,11 @@ def build_response(result: InstantAnswerResult, query: str, lang: str):
)


def get_instant_answer_single_place(place_id: str, query: str, lang: str) -> Response:
def get_instant_answer_single_place(
place_id: str, query: str, lang: str, type: Optional[str] = None
) -> Response:
try:
place = place_from_id(place_id, follow_redirect=True)
place = place_from_id(place_id, lang, type=type, follow_redirect=True)
except Exception:
logger.warning(
"get_instant_answer: Failed to get place with id '%s'", place_id, exc_info=True
@@ -226,6 +243,7 @@ async def get_instant_answer(
return build_response(result, query=q, lang=lang)
return no_instant_answer(query=q, lang=lang, region=user_country)

intentions = []
if lang in nlu_allowed_languages:
try:
intentions = await nlu_client.get_intentions(
@@ -249,6 +267,9 @@ async def get_instant_answer(

# Direct geocoding query
query = QueryParams.build(q=normalized_query, lang=lang, limit=5, **extra_geocoder_params)
if settings["TRIPADVISOR_ENABLED"]:
query_tripadvisor = deepcopy(query)
query_tripadvisor.poi_dataset += ["tripadvisor", "default"]

async def fetch_pj_response():
if not (settings["IA_CALL_PJ_POI"] and user_country == "fr" and intentions):
@@ -260,31 +281,85 @@ async def fetch_pj_response():
intentions[0].description._place_in_query,
)

# Query PJ API asynchronously as a task which may be cancelled if bragi
# finds results.
# Query PJ API and Bragi osm asynchronously as a task which may be cancelled
fetch_pj = asyncio.create_task(fetch_pj_response())
bragi_response = await bragi_client.autocomplete(query)
bragi_features = result_filter.filter_bragi_features(
normalized_query, bragi_response["features"]
)

if bragi_features:
fetch_pj.cancel()
place_id = bragi_features[0]["properties"]["geocoding"]["id"]
return await run_in_threadpool(
get_instant_answer_single_place, query=q, place_id=place_id, lang=lang
fetch_bragi_osm = asyncio.create_task(bragi_client.autocomplete(query))
if settings["TRIPADVISOR_ENABLED"]:
bragi_tripadvisor_response = await bragi_client.autocomplete(query_tripadvisor)
bragi_tripadvisor_features = result_filter.filter_bragi_features(
normalized_query, bragi_tripadvisor_response["features"]
)

pj_response = result_filter.filter_places(normalized_query, await fetch_pj)
# Select datasource instant answer in France
if (
intentions
and intentions[0].filter is not None
and intentions[0].filter.bbox is not None
and pj_source.bbox_is_covered(intentions[0].filter.bbox)
):
if settings["TRIPADVISOR_ENABLED"]:
for bragi_tripadvisor_feature in bragi_tripadvisor_features:
feature_properties = bragi_tripadvisor_feature["properties"]["geocoding"]
if (
"poi_types" in feature_properties
and feature_properties["poi_types"][0]["id"].split(":")[0]
in AVAILABLE_CLASS_TYPE_HOTEL_TRIPADVISOR
):
fetch_pj.cancel()
fetch_bragi_osm.cancel()
place_id = feature_properties["id"]
return await run_in_threadpool(
get_instant_answer_single_place,
query=q,
place_id=place_id,
lang=lang,
type="poi_tripadvisor",
)

pj_response = result_filter.filter_places(normalized_query, await fetch_pj)

if pj_response:
fetch_bragi_osm.cancel()
place_id = pj_response[0].get_id()
result = InstantAnswerResult(
places=[pj_response[0].load_place(lang)],
source=pj_response[0].get_source(),
maps_url=maps_urls.get_place_url(place_id),
maps_frame_url=maps_urls.get_place_url(place_id, no_ui=True),
)
return build_response(result, query=q, lang=lang)
# Call tripadvisor datasource instant answer outside France or
# without intention location detection
else:
if settings["TRIPADVISOR_ENABLED"]:
bragi_tripadvisor_feature = next(iter(bragi_tripadvisor_features), None)

if pj_response:
place_id = pj_response[0].get_id()
result = InstantAnswerResult(
places=[pj_response[0].load_place(lang)],
source=pj_response[0].get_source(),
maps_url=maps_urls.get_place_url(place_id),
maps_frame_url=maps_urls.get_place_url(place_id, no_ui=True),
)
return build_response(result, query=q, lang=lang)
if bragi_tripadvisor_feature is not None:
fetch_pj.cancel()
fetch_bragi_osm.cancel()
feature_properties = bragi_tripadvisor_feature["properties"]["geocoding"]
place_id = feature_properties["id"]
return await run_in_threadpool(
get_instant_answer_single_place,
query=q,
place_id=place_id,
lang=lang,
)

return await instant_answer_fallback(fetch_bragi_osm, lang, normalized_query, q, user_country)


async def instant_answer_fallback(fetch_bragi_osm, lang, normalized_query, q, user_country):
"""
Call OSM datasource as fallback. Return No instant answer if there are no results found
"""
bragi_osm_response = await fetch_bragi_osm
bragi_osm_features = result_filter.filter_bragi_features(
normalized_query, bragi_osm_response["features"]
)
if bragi_osm_features:
place_id = bragi_osm_features[0]["properties"]["geocoding"]["id"]
return await run_in_threadpool(
get_instant_answer_single_place, query=q, place_id=place_id, lang=lang
)
return no_instant_answer(query=q, lang=lang, region=user_country)
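The rewritten `get_instant_answer` starts the PagesJaunes and OSM requests as tasks, awaits the Tripadvisor query, and cancels the pending tasks once a preferred result is found. The snippet below is a minimal, self-contained sketch of that pattern, not the code from this commit: `fake_fetch` and `pick_answer` are hypothetical names, and `asyncio.sleep` stands in for the real network calls.

```python
import asyncio
from typing import List, Tuple


async def fake_fetch(delay: float, results: List[str]) -> List[str]:
    """Stand-in for a remote call (hypothetical; not part of Idunn)."""
    await asyncio.sleep(delay)
    return results


async def pick_answer(preferred: List[str], fallback: List[str]) -> Tuple[str, str]:
    # Start the fallback request right away so it runs concurrently.
    fallback_task = asyncio.create_task(fake_fetch(0.05, fallback))

    preferred_results = await fake_fetch(0.01, preferred)
    if preferred_results:
        # The fallback is no longer needed: request its cancellation.
        fallback_task.cancel()
        return "preferred", preferred_results[0]

    # Nothing preferred was found, so wait for the fallback to finish.
    fallback_results = await fallback_task
    if fallback_results:
        return "fallback", fallback_results[0]
    return "none", ""
```

Running `asyncio.run(pick_answer(["ta-hotel"], ["osm-poi"]))` takes the preferred branch, while an empty first list falls through to the fallback task. Note that `Task.cancel()` only requests cancellation, which takes effect the next time the task reaches an `await`.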