Refactor the parsing of the text index builder #1695

Merged: 19 commits, merged on Jan 22, 2025.

Commits:
b93fde4  Extra classes for Words- and Docsfile parsing (Flixtastic, Dec 28, 2024)
9c40084  Added method to tokenize and normalize at the same time. (Flixtastic, Dec 28, 2024)
c365935  Added the tokenization to the ql_utility namespace (Flixtastic, Dec 28, 2024)
479b763  Revert "Added the tokenization to the ql_utility namespace" (Flixtastic, Dec 28, 2024)
d0ec708  Used the custom InputRangeMixin to lazily tokenize and normalize word… (Flixtastic, Jan 2, 2025)
a7823fb  Merge branch 'ad-freiburg:master' into words-and-docs-file-parsing (Flixtastic, Jan 4, 2025)
5f28add  Merge branch 'ad-freiburg:master' into words-and-docs-file-parsing (Flixtastic, Jan 6, 2025)
f129ecd  Added comments and necessary tests to WordsAndDocsFileParser (Flixtastic, Jan 8, 2025)
b699551  Merge branch 'ad-freiburg:master' into words-and-docs-file-parsing (Flixtastic, Jan 8, 2025)
1642175  Merge branch 'ad-freiburg:master' into words-and-docs-file-parsing (Flixtastic, Jan 9, 2025)
8c8a1a1  Added comments to WordsAndDcosFileParser.h. Improved useability of te… (Flixtastic, Jan 9, 2025)
0369de6  Rewrite the tokenizer as a view. (joka921, Jan 10, 2025)
c412983  Improved comment, addressed small requested changes (Flixtastic, Jan 10, 2025)
46fbb98  Addressed sonar issues (Flixtastic, Jan 10, 2025)
1e0fc14  Removed the temporary localeManagers in WordsAndDocsFileParserTest.cpp (Flixtastic, Jan 10, 2025)
9f9738c  Addressed more SonarQube problems (Flixtastic, Jan 11, 2025)
a55f2be  For now excluding helper functions from code coverage since they coul… (Flixtastic, Jan 11, 2025)
bea5936  Reverting last commit (Flixtastic, Jan 11, 2025)
349be6d  Small improvement (Flixtastic, Jan 15, 2025)
For now excluding helper functions from code coverage since they could be outsourced in further refactorings
Flixtastic committed Jan 11, 2025
commit a55f2be2ba377e2f1f8c8b3c0e90b419a6fbb55a
src/index/IndexImpl.Text.cpp: 2 changes (2 additions, 0 deletions)
```diff
@@ -62,6 +62,7 @@
   }
 }

+// LCOV_EXCL_START
 // _____________________________________________________________________________
 void IndexImpl::processEntityCaseDuringInvertedListProcessing(
     const WordsFileLine& line,
```
```diff
@@ -91,10 +92,10 @@
   bool ret = textVocab_.getId(line.word_, &vid);
   WordIndex wid = vid.get();
   if (!ret) {
     LOG(ERROR) << "ERROR: word \"" << line.word_ << "\" "
                << "not found in textVocab. Terminating\n";
     AD_FAIL();
   }
   wordsInContext[wid] += line.score_;
 }

```

Codecov (codecov/patch) warning: added lines L95-L98 of src/index/IndexImpl.Text.cpp were not covered by tests.
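The second hunk shows the word case of the inverted-list processing: the word is looked up in `textVocab_`, a missing word is a hard error, and otherwise the line's score is added to that word's total in `wordsInContext`. The following is a minimal standalone sketch of that logic, not QLever's code. It assumes a plain `std::unordered_map` as the vocabulary, simplified `WordIndex`/`Score` aliases, and throws a `std::runtime_error` where the original logs and calls `AD_FAIL()`; `WordsFileLineSketch` and `processWordCaseSketch` are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical simplified types; QLever's real WordIndex, Score and
// WordsFileLine are defined differently.
using WordIndex = std::size_t;
using Score = float;

struct WordsFileLineSketch {
  std::string word_;
  Score score_;
};

// Adds the line's score to the running total of its word in the current
// context. A word missing from the vocabulary is treated as a hard error;
// this sketch throws where the original calls AD_FAIL().
void processWordCaseSketch(
    const WordsFileLineSketch& line,
    const std::unordered_map<std::string, WordIndex>& textVocab,
    std::unordered_map<WordIndex, Score>& wordsInContext) {
  auto it = textVocab.find(line.word_);
  if (it == textVocab.end()) {
    throw std::runtime_error("word \"" + line.word_ +
                             "\" not found in textVocab");
  }
  // operator[] value-initializes a missing entry to 0 before the addition.
  wordsInContext[it->second] += line.score_;
}
```

Relying on `operator[]` keeps the accumulation to a single statement, because the first occurrence of a word in a context starts from a zero score.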

```diff
@@ -104,13 +105,14 @@
   if (entityNotFoundErrorMsgCount < 20) {
     LOG(WARN) << "Entity from text not in KB: " << word << '\n';
     if (++entityNotFoundErrorMsgCount == 20) {
       LOG(WARN) << "There are more entities not in the KB..."
                 << " suppressing further warnings...\n";
     }
   } else {
     entityNotFoundErrorMsgCount++;
   }
 }
+// LCOV_EXCL_STOP

 // _____________________________________________________________________________
 void IndexImpl::addTextFromContextFile(const string& contextFile,
```

Codecov (codecov/patch) warnings: added lines L108-L110 and L112-L113 of src/index/IndexImpl.Text.cpp were not covered by tests.
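The last hunk throttles warnings about entities that are missing from the knowledge base: the first 20 are logged, reaching the 20th also prints a one-time suppression notice, and every later miss only increments the counter. The sketch below reproduces that pattern in isolation under stated assumptions: it writes to a `std::ostream` instead of QLever's `LOG(WARN)`, makes the limit a parameter, and the names `logEntityNotFoundSketch`, `maxWarnings` and `runDemo` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Prints a warning for each of the first `maxWarnings` missing entities and a
// one-time suppression notice when that limit is reached. Later calls only
// increment `count`, so the caller still learns the total number of misses.
void logEntityNotFoundSketch(const std::string& word, std::size_t& count,
                             std::ostream& out, std::size_t maxWarnings = 20) {
  assert(maxWarnings > 0);
  if (count < maxWarnings) {
    out << "Entity from text not in KB: " << word << '\n';
    if (++count == maxWarnings) {
      out << "There are more entities not in the KB..."
          << " suppressing further warnings...\n";
    }
  } else {
    ++count;
  }
}

// Usage helper: feeds `n` missing entities ("e0", "e1", ...) through the
// logger and returns everything it printed.
std::string runDemo(std::size_t n, std::size_t maxWarnings,
                    std::size_t& count) {
  std::ostringstream out;
  for (std::size_t i = 0; i < n; ++i) {
    logEntityNotFoundSketch("e" + std::to_string(i), count, out, maxWarnings);
  }
  return out.str();
}
```

Passing the counter by reference, as the original does with `entityNotFoundErrorMsgCount`, keeps the function stateless, so the same helper can serve several independent passes over the input.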