Support for Linker with Query type #148
Comments
Hm, is the generated QueryString the one you describe? The outer The reason why the library does not support Lucene queries is that Lucene boolean queries are not serializable. Hence, I decided to allow only QueryString. |
From the user's list I found this explanation :
Can we build a custom Object and wrap the query similar to what is described here? |
Yes, you can definitely build a custom object and wrap the query as described in your link above. The only issue is that the query object cannot be defined at runtime; it has to be defined at compile time and submitted to the executors. For example, I use the "trick" mentioned in the link for Lucene's analyzers, which are not serializable, see |
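For readers who have not seen the wrapping trick, here is a minimal sketch of the general pattern, not the code used in spark-lucenerdd. The names `NotSerializableThing`, `Lazily` and `LazilyDemo` are hypothetical stand-ins: the holder serializes only a factory function and rebuilds the non-serializable value lazily on whichever JVM deserializes it.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a non-serializable class such as a Lucene Analyzer or Query.
final class NotSerializableThing(val name: String)

// Serializable holder: only the factory is shipped; the value itself is
// marked transient and is rebuilt on first access after deserialization.
final class Lazily[T](factory: () => T) extends Serializable {
  @transient lazy val value: T = factory()
}

object LazilyDemo {
  // Java-serialization round trip, similar in spirit to shipping a closure to an executor.
  def roundTrip[A <: AnyRef](a: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(a)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val holder = new Lazily(() => new NotSerializableThing("standard"))
    val copy = roundTrip(holder)
    // The value is constructed here, on the "receiving" side.
    println(copy.value.name)
  }
}
```

The limitation mentioned above still applies: the factory has to be known when the job is defined, so this does not by itself let each record produce its own query.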
What can we do about this? At the moment, I can simulate min_should_match manually, but ideally I'd like to use the facilities Lucene provides and all the different queries we could run using its builders. |
I think you have a valid point here, and I have put some effort into this. I have a prototype (see below), where you can specify a linker as a function https://github.com/zouzias/spark-lucenerdd/pull/154/files#diff-4a4a384e7770d218cc77733cd7d485a9R547 Thanks a lot for the feedback / suggestion, I think such an improvement will greatly improve the API of If you agree on the feature, I can quickly make a snapshot release to test. |
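To illustrate what "a linker as a function" could look like from the caller's side, here is a hedged sketch that only uses Lucene's own query classes. The `Person` record, the field names and `LinkerSketch` are hypothetical, and the sketch deliberately does not call any spark-lucenerdd method, since the exact signature lives in the pull request linked above. It assumes `lucene-core` is on the classpath.

```scala
import org.apache.lucene.index.Term
import org.apache.lucene.search.{BooleanClause, BooleanQuery, Query, TermQuery}

// Hypothetical record type for the RDD being linked.
final case class Person(name: String, city: String)

object LinkerSketch {
  // The function is what gets shipped to the executors; the Query object
  // itself is only created when the function is applied to a record.
  val linker: Person => Query = { p =>
    new BooleanQuery.Builder()
      .add(new TermQuery(new Term("name", p.name.toLowerCase)), BooleanClause.Occur.MUST)
      .add(new TermQuery(new Term("city", p.city.toLowerCase)), BooleanClause.Occur.SHOULD)
      .build()
  }

  def main(args: Array[String]): Unit = {
    val q = linker(Person("Alice", "Denver"))
    println(q)
  }
}
```

Note that `TermQuery` does not analyze its input, so the lowercasing here is a manual assumption about how the indexed terms look.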
I believe it looks as expected, but it is hard for me to tell from just the diff. I would need to test and see what happens. Please release a snapshot so I can test it. Thanks a lot for the effort; this will definitely improve the API if implemented successfully. |
It is currently available under One of my concerns is that since we specify the |
I believe there will be a mismatch if we don't specify the analyzer with the query. Is there a way we could 'overwrite' it before running the query? For example, how can I set up a custom analyzer? I am actually not sure how analyzers work at the moment with this project other than through the config file. Could you please clarify? |
I tested the query building and it works as expected. You did an amazing job, thank you! The only question that remains is the analyzer. |
I believe something is not working as expected. Given my config:
Normally, using the Standard Analyzer at index time, I expect LLC and L.L.C to match. Is this incorrect? |
My assumption is incorrect. The character "." is not part of the characters removed by the Standard Analyzer. |
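A quick way to check assumptions like this one is to print the tokens the analyzer actually emits. The sketch below assumes a Lucene version where `StandardAnalyzer` has a no-argument constructor and lives in `org.apache.lucene.analysis.standard`; the field name `"name"` and the object name `AnalyzerCheck` are arbitrary.

```scala
import org.apache.lucene.analysis.Analyzer
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute

object AnalyzerCheck {
  // Runs the analyzer over the text and collects the emitted terms.
  def tokens(analyzer: Analyzer, text: String): List[String] = {
    val ts = analyzer.tokenStream("name", text)
    val term = ts.addAttribute(classOf[CharTermAttribute])
    val out = List.newBuilder[String]
    try {
      ts.reset()
      while (ts.incrementToken()) out += term.toString
      ts.end()
    } finally ts.close()
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val analyzer = new StandardAnalyzer()
    // Compare the two spellings side by side.
    println(tokens(analyzer, "Acme LLC"))
    println(tokens(analyzer, "Acme L.L.C"))
    analyzer.close()
  }
}
```

If the two lists differ, the two spellings will not match on an exact term query unless a custom analyzer normalizes them.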
Query / index time analysis can be tricky. |
I believe this issue can be considered done unless we find any other bug. #156 addresses the possibility of custom analyzers, so it is probably better to move the conversation there. As far as I know, queries use the analyzer defined by the type or a custom one at search time. Using a query string or an object should not make any difference in that (?) |
It seems that not all the examples allow a Query. For example this example does not seem to work with a Query type
|
Many other types do not support query as well. For example :
I am aware you specified which types are supported in your release, but I believe we should re-open this issue to track the work needed for other types. |
Yes only the block linkage methods support Query type for now.
…On Tue, 19 Mar 2019, 15:59 Yeikel, ***@***.***> wrote:
Many other types do not support query as well. For example :
def dedup[T1: ClassTag](searchQueryGen: T1 => String,
                        topK: Int = DefaultTopK,
                        linkerMethod: String = getLinkerMethod)
  : RDD[(T1, Array[SparkScoreDoc])] = {
  // FIXME: is this asInstanceOf necessary?
  link[T1](this.asInstanceOf[RDD[T1]], searchQueryGen, topK, linkerMethod)
}
|
Can I help to increase the support? Or do you have it planned? I'd like to test all the different options in this library using a query rather than a query string. |
I tried to enhance it, but I am not sure why it works sometimes and why it is producing a serialization error for me. How were you able to serialize it? I am currently trying to add it to
|
So I still don't understand why it is not failing for the tests you wrote. |
I added a little bit more of support in this commit yeikel@3e408c7. Could you please review it? |
Describe the solution you'd like
I'd like to use the Lucene query builder to generate my queries instead of the query string. The toString method does not guarantee compatibility and it fails in some cases. For example:
Produces the following exception :
(address:harvard~2 address:denver~2 )~3': Encountered " <FUZZY_SLOP> "~3 "" at line 1, column 95.
Doing a search and replace for the token "~" is not really an option because the query won't behave as expected
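As an illustration of the kind of query involved, the sketch below builds a boolean query of fuzzy clauses with a minimum-should-match constraint and prints its string form. It is not the original snippet from this issue: the two terms and the threshold of 2 are assumptions chosen to keep the example small, and it assumes `lucene-core` is on the classpath.

```scala
import org.apache.lucene.index.Term
import org.apache.lucene.search.{BooleanClause, BooleanQuery, FuzzyQuery}

object MinShouldMatchSketch {
  // Builds one fuzzy SHOULD clause per word and requires `minMatch` of them to hit.
  def build(field: String, words: Seq[String], minMatch: Int): BooleanQuery = {
    val builder = new BooleanQuery.Builder()
    words.foreach { w =>
      // Max edit distance of 2, which renders as "~2" in the string form.
      builder.add(new FuzzyQuery(new Term(field, w), 2), BooleanClause.Occur.SHOULD)
    }
    builder.setMinimumNumberShouldMatch(minMatch)
    builder.build()
  }

  def main(args: Array[String]): Unit = {
    val query = build("address", Seq("harvard", "denver"), 2)
    // The string form appends the minimum-should-match count after the closing
    // parenthesis; that trailing "~N" is what the query string parser rejects above.
    println(query.toString)
  }
}
```

Passing the `Query` object itself, rather than its `toString` output, avoids the round trip through the query parser entirely.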