[98] add query for curator comments #154

Merged
merged 16 commits into from
Oct 28, 2024

stanislaw-zakrzewski
Collaborator

Solves: #98

Changes:

  • Free-text search can now search through curator comments
  • Highlighter now works with quoted combinations of terms
  • DataGuideDialog component content was updated
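
As a rough illustration of the quoted-term highlighting described above, here is a minimal Python sketch. It is not the code from this PR: the function names `split_query_terms` and `highlight` and the `<mark>` wrapper are hypothetical, chosen only to show the idea of treating a quoted phrase as a single term.

```python
import re


def split_query_terms(query: str) -> list[str]:
    """Split a free-text query into single words and quoted phrases."""
    # Each match is a (phrase, word) tuple; exactly one element is non-empty.
    matches = re.findall(r'"([^"]+)"|(\S+)', query)
    return [phrase or word for phrase, word in matches]


def highlight(text: str, terms: list[str]) -> str:
    """Wrap every case-insensitive occurrence of a term in <mark> tags."""
    if not terms:
        return text
    # Longest terms first so a phrase wins over a word it contains;
    # re.escape keeps characters such as "$" or "/" literal.
    ordered = sorted(set(terms), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", text)
```

For example, `split_query_terms('fever "travel history" cough')` yields three terms, with `travel history` kept together as one phrase that `highlight` then marks as a unit.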

Screenshots:

Searching for a case with free-text (content in curators comment):
image
image

Updated DataGuideDialog content:
image

@stanislaw-zakrzewski stanislaw-zakrzewski added the work in progress There's still something left to do. label Oct 18, 2024
@stanislaw-zakrzewski stanislaw-zakrzewski self-assigned this Oct 18, 2024
@stanislaw-zakrzewski stanislaw-zakrzewski linked an issue Oct 18, 2024 that may be closed by this pull request
@stanislaw-zakrzewski stanislaw-zakrzewski removed the work in progress There's still something left to do. label Oct 18, 2024

@shaileshmahajanBCH shaileshmahajanBCH left a comment

Hello @stanislaw-zakrzewski ,

  • The data guide and single-word search look good.
    image

image

  • The indexes look good
    image

  • I do see the following issue when searching for terms like hello $, hello /, hello, this. Can you please check whether there are restrictions on search terms?

image

@stanislaw-zakrzewski
Collaborator Author

Hey @shaileshmahajanBCH
There were some issues with combinations of filters and text search. Hopefully I managed to resolve them, as well as searching for special characters.

@shaileshmahajanBCH shaileshmahajanBCH left a comment

Thanks @stanislaw-zakrzewski for all the work on this PR.

  • I did notice that the search sometimes does not return case records. I have 2 cases containing the word "this", but the search result does not return them.

image

image

@stanislaw-zakrzewski
Collaborator Author

@shaileshmahajanBCH Initially I thought it was an issue with parsing the query, but it is much more interesting 😮

The problem with searching for this term is actually caused by a MongoDB feature. When we create indexes for the text search, we specify the language:

    {
        "name": "commentIdx",
        "key": {
            "comment": -1
        },
        "collation": {
            "locale": "en_US",
            "strength": 2
        }
    },

According to the documentation, one of the features of text search with a specified language (locale) is dropping "stop words" from the search query: https://www.mongodb.com/docs/v5.3/core/index-text/#supported-languages-and-stop-words (full list of stop words for the English language: https://github.com/igorbrigadir/stopwords/blob/master/en/mongodb.txt). One of the stop words is "this", which causes MongoDB to ignore it in a search query.
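
To make the effect concrete, here is a small Python sketch that only mimics stop-word removal; it is not MongoDB's tokenizer (which also applies stemming). `ASSUMED_STOP_WORDS` is a tiny illustrative subset, not the full list linked above, and `effective_terms` is a hypothetical helper name.

```python
import re

# Tiny illustrative subset only; MongoDB's real English list is much longer.
ASSUMED_STOP_WORDS = {"a", "and", "is", "the", "this"}


def effective_terms(query, stop_words=ASSUMED_STOP_WORDS):
    """Return the lowercase words of a query that survive stop-word removal."""
    words = re.findall(r"\w+", query.lower())
    return [w for w in words if w not in stop_words]
```

With this subset, a query consisting only of "this" leaves no terms to match on, which is consistent with the empty result seen above, while passing an empty `stop_words` set keeps the word.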

We can resolve this problem by disabling all language-specific features (stemming and stop words) by adding "default_language": "none" to the textIdx definition:

    {
        "name": "textIdx",
        "key": {
            "demographics.occupation": "text",
            "location.country": "text",
            "location.admin1": "text",
            "location.admin2": "text",
            "location.admin3": "text",
            "caseReference.sourceUrl": "text",
            "caseStatus": "text",
            "comment": "text"
        },
        "default_language": "none"
    },

And now we can search for "this":
image
image
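
For anyone who wants to try the same textIdx definition from a script, the following is a hedged sketch assuming PyMongo. The helper names `text_index_spec` and `create_text_index` are hypothetical, and this is not how the repository applies its index definitions. A collection can hold only one text index, so an existing textIdx with different options would have to be dropped before this call succeeds.

```python
# Field names copied from the textIdx definition in this comment.
TEXT_FIELDS = [
    "demographics.occupation",
    "location.country",
    "location.admin1",
    "location.admin2",
    "location.admin3",
    "caseReference.sourceUrl",
    "caseStatus",
    "comment",
]


def text_index_spec(fields, default_language="none"):
    """Build the key list and options for a text index over the given fields."""
    keys = [(field, "text") for field in fields]
    options = {"name": "textIdx", "default_language": default_language}
    return keys, options


def create_text_index(collection, fields=TEXT_FIELDS):
    """Create the index; `collection` is assumed to be a pymongo Collection."""
    keys, options = text_index_spec(fields)
    return collection.create_index(keys, **options)
```

Keeping the spec in a pure function makes it easy to inspect the keys and options without a running database.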

But I'm not sure we should do that. Maybe we can just add a disclaimer that some "meaningless" words (stop words) are ignored by the search?

@shaileshmahajanBCH shaileshmahajanBCH self-requested a review October 24, 2024 14:50

@shaileshmahajanBCH shaileshmahajanBCH left a comment

@stanislaw-zakrzewski You are correct. Stop words are not indexed.

@stanislaw-zakrzewski stanislaw-zakrzewski merged commit 881e39e into main Oct 28, 2024
2 checks passed
@stanislaw-zakrzewski stanislaw-zakrzewski deleted the 98-add-query-for-curator-comments branch October 28, 2024 14:56
@stanislaw-zakrzewski stanislaw-zakrzewski restored the 98-add-query-for-curator-comments branch October 30, 2024 08:07