Combining Querys with BooleanQuerys #11
Replies: 7 comments
-
That's right. Unfortunately some of the query types have static constructors, so there's no way to subclass them. But there is a workaround: classmethods
Another observation: the phrase queries don't automatically parse. So it should be
-
Better documented in 164f99c.
-
Thank you, this is super helpful and makes a lot more sense! In case anyone stumbles upon this and has a similar question, here is an example of how you would apply any() assuming the index I described above.
query_str: str = 'lupyne is great'
query = query_engine.all(indexer.fields['date'].range(start_date, None), query_engine.phrase('text', *query_str.split(' ')))
I've got one last question if you don't mind. The other field I'm trying to filter on is 'company_name', which honestly I thought would be the simplest, but is actually giving me some trouble. During indexing, I've tried it a couple of ways: leaving it as the default field type, and explicitly setting the field type to engine.Field.Text. Either way, I'm unable to do a prefix search as you demonstrate in the examples here, unless I set it as a NestedField and use a separator that doesn't occur in any of the company names. That works, but it seems a bit awkward and makes me think I'm just doing something wrong. Also, any prefix searches I construct as if the field were just a regular text field don't seem to work. Term, Terms, and Phrase: none of them seem to work when the field is set as the default or engine.Field.Text. Is there something I'm missing? Here are some examples I've tried that don't work, and the one that does seem to.
# We know there exists a result for this query, but none are returned
query = query_engine.all(indexer.fields['date'].range(start_date, None), query_engine.prefix('company_name', 'company a'))
query = query_engine.all(indexer.fields['date'].range(start_date, None), query_engine.phrase('company_name', *'company a'.split(' ')))
# If I create the field as follows things seem to work as expected.
indexer.set('company_name', engine.NestedField, sep='#', stored=True)
Thanks again for your time!
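One possible reading of the empty prefix results, sketched with a toy in-memory index rather than Lucene itself: if the field is tokenized, the index holds the separate terms `company` and `a`, so no single term starts with `company a`. The `ToyIndex` class and `tokenize` helper below are hypothetical stand-ins, and a real analyzer may split text differently.

```python
# Toy model only: hypothetical stand-ins, not Lucene or Lupyne APIs.

def tokenize(value):
    """Mimic a very simple analyzer: lowercase and split on whitespace."""
    return value.lower().split()


class ToyIndex:
    """Maps each indexed term to the set of document ids containing it."""

    def __init__(self, tokenized):
        self.tokenized = tokenized
        self.postings = {}

    def add(self, doc_id, value):
        # A tokenized field stores one term per word; an untokenized
        # field stores the whole value as a single term.
        terms = tokenize(value) if self.tokenized else [value]
        for term in terms:
            self.postings.setdefault(term, set()).add(doc_id)

    def prefix(self, text):
        """Return ids of documents with at least one term starting with text."""
        hits = set()
        for term, doc_ids in self.postings.items():
            if term.startswith(text):
                hits |= doc_ids
        return hits


names = {1: "company a", 2: "company b"}
text_field = ToyIndex(tokenized=True)     # one term per word
string_field = ToyIndex(tokenized=False)  # whole value is one term
for doc_id, name in names.items():
    text_field.add(doc_id, name)
    string_field.add(doc_id, name)

tokenized_hits = text_field.prefix("company a")      # no term contains a space
whole_value_hits = string_field.prefix("company a")  # matches the full name
single_word_hits = text_field.prefix("company")      # matches both documents
```

In this toy model the multi-word prefix only matches when the whole company name was indexed as a single term, which would be consistent with the separator-based field working where the text field did not.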
-
Lucene doesn't really have a default field type (i.e. its default does nothing). A It's hard to say from this example, but I don't think you want If it's a if it's a
-
Yes, that's exactly what I thought. But setting the field as Thanks so much for all your help and hard work! Lupyne was invaluable to me in figuring out how to get everything set up, even compilation of PyLucene and JCC. Also, for what it's worth, I was able to make JCC generate a semi-portable wheel file for PyLucene using the somewhat new, but undocumented
-
Hi @ZeroCool2u, I'm currently playing with PyLucene and Lupyne, and I'm very interested in your method for creating a wheel. By the way, I think adoption of Lupyne would be much greater if PyLucene were easier to install as a dependency. I would be glad to review your PR if you submit one :) Thanks, and thanks to @coady for the hard work behind Lupyne!
-
@ljak I've been busy, so I haven't gotten around to doing this yet, but I am happy to. We'll have to be very careful to clarify that the portability is more limited than the wheel file name implies: it still requires a working OpenJDK 8 installation (I haven't tried with JDK 11) set up similarly to the one that existed at compilation time. However, it seems that as long as the JDK and toolchain match up, the wheel file can still be used and you can dodge dealing with compilation.
-
Hi @coady, thanks for all your hard work on lupyne, it's been super helpful for me! I used your Dockerfile as a basis for compiling JCC & PyLucene to wheel files in my own non-Docker environment, and I've now been able to run some of the examples, set up my own 14 GB corpus, index it to a directory, and do some basic searches based on the examples you provided in the docs.
Right now I'm trying to write a slightly more complex query, but was having some trouble and hoping you might be able to point me in the right direction.
I have a fairly simple index that has 4 stored fields. A text field containing the article text, a text field containing the name of the company (the list of company names is finite and each document is associated with exactly one company), a datetime field that contains the date the article was published, and an article id.
I'm trying to write a query that does the following: find all documents that contain the phrase "lupyne is great", were published within some arbitrary date range, and have a company_name field value of 'company a', 'company_b', or 'company_c'.
I've tried the following:
Any suggestions on how I might go about this? Thanks again for all the hard work!
EDIT: So, it looks like this might be because Query.ranges() doesn't return a lupyne Query object as seen here, but instead directly returns a pylucene query object. Any good way to get around this?
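For readers following along, the situation in the edit above can be pictured with toy stand-in classes: an operator such as `&` only works when the left-hand object defines it, whereas a classmethod combinator can accept any query object. This is a sketch under that assumption, not Lupyne's or PyLucene's implementation; `RawQuery`, `Query`, and the string descriptions are hypothetical.

```python
# Toy stand-ins only: hypothetical classes, not the PyLucene or Lupyne ones.

class RawQuery:
    """Plays the role of a plain query object with no operator support."""

    def __init__(self, description):
        self.description = description


class Query:
    """Plays the role of a wrapper that adds operators and combinators."""

    def __init__(self, description):
        self.description = description

    def __and__(self, other):
        return Query.all(self, other)

    @classmethod
    def all(cls, *queries):
        # Accepts any object with a description, wrapped or not.
        return cls("(" + " AND ".join(q.description for q in queries) + ")")

    @classmethod
    def any(cls, *queries):
        return cls("(" + " OR ".join(q.description for q in queries) + ")")


date_range = RawQuery("date:[start TO *]")  # plain object, no & operator
phrase = Query('text:"lupyne is great"')

try:
    date_range & phrase
    operator_worked = True
except TypeError:
    # The left operand defines no __and__, and Query defines no __rand__.
    operator_worked = False

# The classmethod does not depend on the operands' own operators.
companies = Query.any(Query("company_name:a"), Query("company_name:b"))
combined = Query.all(date_range, phrase, companies)
```

The design point is that the combinator lives on the wrapper class rather than on the operands, so objects that cannot be subclassed or wrapped can still take part in a combined query.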