fix: add tokenizer for the query string #3

Open: wants to merge 2 commits into base: master
31 changes: 21 additions & 10 deletions lunr-repro.tsx
@@ -93,19 +93,30 @@ const Search: React.FC<{
   query: string;
   docs: Record<string, string>;
 }> = function ({ query, docs }) {
-  const lunrIndex = useMemo((() => lunr(function () {
-    this.ref('name');
-    this.field('body');
-    for (const [name, body] of Object.entries(docs)) {
-      this.use((lunr as any).ja);
-      this.add({ name, body });
-    }
-    //console.debug(`Indexed ${docs.length} docs`);
-  })), [docs]);
+  const lunrIndex = useMemo(
+    () =>
+      lunr(function () {
+        this.use((lunr as any).ja);
+        this.ref("name");
+        this.field("body");
+        for (const [name, body] of Object.entries(docs)) {
+          this.add({ name, body });
+        }
+        //console.debug(`Indexed ${docs.length} docs`);
+      }),
+    [docs]
+  );

   const [results, error] = useMemo(() => {
+    // Example input: 標準製品
+    // In Japanese this is two words, 標準 and 製品, and the docs are tokenized that way (as separate terms).
+    // To search for 標準製品, split the query with the same tokenizer.
+    // Build the tokenized query with the tokenizer function that is applied to the docs.
+    const queryTokenized = lunr.ja.tokenizer(query);
+    // Join the tokens back into a lunr query string, e.g. "標準 製品".
+    const queryTerm = queryTokenized.join(" ");
     try {
-      return [lunrIndex.search(query) ?? [], null];
+      return [lunrIndex.search(queryTerm) ?? [], null];
     } catch (e) {
       return [[], `${e.message}`];
     }
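
The comments in this hunk argue that the query has to be split by the same tokenizer that was applied to the docs. Below is a minimal sketch of why that matters, using a toy setup rather than lunr itself. The `VOCAB` list, `tokenize`, `buildIndex`, and `search` are all hypothetical stand-ins invented for illustration; they are not lunr's or lunr-languages' API, and the real Japanese tokenizer works differently.

```typescript
// Toy illustration only: a hypothetical longest-match tokenizer and a tiny
// inverted index. This is NOT how lunr or lunr-languages is implemented.

const VOCAB = ["標準", "製品", "検索"]; // assumed vocabulary for the demo

// Split text into the longest known words; unknown characters become
// single-character tokens and whitespace is skipped.
function tokenize(text: string): string[] {
  const tokens: string[] = [];
  let i = 0;
  while (i < text.length) {
    if (/\s/.test(text[i])) {
      i++;
      continue;
    }
    const match = VOCAB
      .filter((w) => text.startsWith(w, i))
      .sort((a, b) => b.length - a.length)[0];
    const token = match ?? text[i];
    tokens.push(token);
    i += token.length;
  }
  return tokens;
}

// Map each token to the set of document names that contain it.
function buildIndex(docs: Record<string, string>): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const [name, body] of Object.entries(docs)) {
    for (const token of tokenize(body)) {
      if (!index.has(token)) index.set(token, new Set<string>());
      index.get(token)!.add(name);
    }
  }
  return index;
}

// Look up each whitespace-separated term exactly and union the hits.
function search(index: Map<string, Set<string>>, queryString: string): string[] {
  const hits = new Set<string>();
  for (const term of queryString.split(/\s+/).filter(Boolean)) {
    index.get(term)?.forEach((name) => hits.add(name));
  }
  return Array.from(hits).sort();
}

const demoIndex = buildIndex({ a: "標準製品", b: "製品検索" });
const rawQuery = "標準製品";
const tokenizedQuery = tokenize(rawQuery).join(" "); // "標準 製品"

console.log(search(demoIndex, rawQuery)); // [] because no single indexed token equals the raw query
console.log(search(demoIndex, tokenizedQuery)); // [ 'a', 'b' ]
```

In this toy model the raw query never matches because the index only holds the separated tokens, which mirrors the motivation stated in the comments for joining the tokenizer output with spaces before calling `search`.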