Group and term query combination in Lucene - search

I am new to Lucene and I want to filter my search results based on three criteria:
value of field document_type should be Product
value of field brand_id should be 4
value of field family_id should be any of the values (121, 232, 343)
So what I basically want is to have combinations like the following in the search result:
document_type:Product AND brand_id:4 AND family_id:121
document_type:Product AND brand_id:4 AND family_id:232
document_type:Product AND brand_id:4 AND family_id:343
I thought document_type:Product AND brand_id:4 AND family_id:(121 232 343) should do the trick, but while parsing this query the standard analyzer lowercases Product to product, even though the document_type field was indexed with Field.Index.NOT_ANALYZED and Field.Store.YES for the value Product.
I was wondering if it is possible to create a boolean query by combining 3 possible queries for the given 3 cases.
I am quite new with Lucene, could someone help me with it?
Thanks.

Query.combine(Query[]) worked like a charm for the given situation.
The documentation for the given method is available here.
Once combine was applied, the query looked like this:
(+document_type:Product +brand_id:4 +family_id:121) (+document_type:Product +brand_id:4 +family_id:232) (+document_type:Product +brand_id:4 +family_id:343)
Thanks.
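To make the shape of the combined query concrete, here is a minimal Python sketch that only assembles the query string shown above. It does not call Lucene, and the helper name build_combined_query is hypothetical; the field names and values are the ones from the question.

```python
def build_combined_query(fixed_terms, varying_field, varying_values):
    """Return one parenthesised group of required (+) terms per varying value."""
    groups = []
    for value in varying_values:
        terms = list(fixed_terms) + [(varying_field, value)]
        # Every term is prefixed with '+', i.e. it is required within its group.
        clause = " ".join("+%s:%s" % (field, val) for field, val in terms)
        groups.append("(" + clause + ")")
    # The groups are simply separated by spaces, as in the query printed above.
    return " ".join(groups)


query = build_combined_query(
    [("document_type", "Product"), ("brand_id", 4)],
    "family_id",
    [121, 232, 343],
)
print(query)
```

Each group repeats the two fixed terms, which is why the result grows with the number of family_id values.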

Related

flux query: filter out all records related to one matching the condition

I'm trying to filter an InfluxDB query (using the Node.js influxdb-client library).
As far as I can tell, it only works with "flux" queries.
I would like to filter out all records that share a specific attribute with any record that matches a particular condition. I'm filtering using the filter-function, but I'm not sure how I can continue from there. Is this possible in a single query?
My filter looks something like this:
|> filter(fn:(r) => r["_value"] == 1 and r["button"] == "1" )
I would like to leave out all the records that have the same r["session"] as any record that matches this filter.
Do I need two queries; one to get those r["session"]s and one to filter on those, or is it possible in one?
Update:
Trying the two-step process. Got the list of r["session"]s into an array, and attempting to use the contains() flux function now to filter values included in that array called sessionsExclude.
Flux query section:
|> filter(fn:(r) => contains(value: r["session"], set: ${sessionsExclude}))
This produces the error: unexpected token for property key: INT ("102"). I'm not sure why. It looks like Flux tries to turn the values into integers? r["session"] is a string (and the example in the docs also uses an array of strings)...
Ended up doing it in two queries. Still confused about the Strings vs Integers, but casting the value as an Int and printing out the array of r["session"] within the query seems to work like this:
'|> filter(fn:(r) => not contains(value: int(v: r["session"]), set: [${sessionsExclude.join(",")}]))'
Added the "not" to exclude instead of retain the values matching the array...
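A possible explanation for the INT error, offered as an assumption rather than a confirmed diagnosis: interpolating an array straight into the query text yields 102,103 without brackets or quotes, so Flux sees bare integers. The Python sketch below shows one way to render the list as a quoted Flux array literal instead. The helper name build_exclude_filter is hypothetical, and it assumes the session values are plain strings that need no special escaping.

```python
import json


def build_exclude_filter(sessions):
    """Build a Flux filter line that drops records whose session is in `sessions`."""
    # json.dumps adds the surrounding brackets and wraps every element in
    # double quotes, giving an array literal such as ["102", "103"].
    flux_array = json.dumps([str(s) for s in sessions])
    return '|> filter(fn: (r) => not contains(value: r["session"], set: %s))' % flux_array


line = build_exclude_filter(["102", "103"])
print(line)
```

With the elements quoted, r["session"] can be compared as a string and the int() cast should no longer be needed.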

solr distinct query I want only certain fields to be listed

I want to find the number of unique records based on the value of myparam.
The results contain a very large distinctValues array, whereas I just want to get the countDistinct value.
url:
http://xxxxxxx:18282/solr/2022/select?q=*:*&wt=json&rows=0&stats=on&stats.calcdistinct=true&stats.field=myparam
In fact, it would be great if I could get a result like the one below.
result:
{
  "responseHeader":{
    "status":0,
    "QTime":10627,
    "params":{
      "q":"*:*",
      "stats.calcdistinct":"true",
      "stats":"on",
      "rows":"0",
      "wt":"json",
      "stats.field":"myparam"}},
  "response":{"numFound":816091,"start":0,"docs":[]
  },
  "stats":{
    "stats_fields":{
      "myparam":{
        "countDistinct":5}}}}
I found the answer I was looking for in Solr's JSON facet functions.
adress:18282/solr/2021/select?q=*:*&wt=json&json.facet={x:'unique(username)'}&rows=0
With this I can find out how many different users there are.
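As a small illustration of the request above, this Python sketch assembles the same parameters with proper URL encoding. The host, core name and the helper build_unique_count_url are hypothetical placeholders; only the query parameters are taken from the answer.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_unique_count_url(select_url, field):
    """Return a select URL asking only for the number of unique values of `field`."""
    params = {
        "q": "*:*",   # match all documents
        "rows": 0,    # no documents in the response, only the facet result
        "wt": "json",
        "json.facet": "{x:'unique(%s)'}" % field,
    }
    return select_url + "?" + urlencode(params)


# Hypothetical host and core, used only to show the resulting URL.
url = build_unique_count_url("http://localhost:8983/solr/mycore/select", "username")
decoded = parse_qs(urlsplit(url).query)
print(url)
```

Encoding the parameters avoids problems with the braces and quotes in json.facet when the URL is sent by a client other than a browser.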

Select one column from Type-ORM query - Node

I have a TypeORM query that returns five columns. I just want the company column returned, but I need to select all five columns to generate the correct response.
Is there a way to wrap my query in another select statement or transform the results to just get the company column I want?
See my code below:
This is what the query returns currently:
https://i.stack.imgur.com/MghEJ.png
I want it to return:
https://i.stack.imgur.com/qkXJK.png
const qb = createQueryBuilder(Entity, 'stats_table');
qb.select('stats_table.company', 'company');
qb.addSelect('stats_table.title', 'title');
qb.addSelect('city_code');
qb.addSelect('country_code');
qb.addSelect('SUM(count)', 'sum');
qb.where('city_code IS NOT NULL OR country_code IS NOT NULL');
qb.addGroupBy('company');
qb.addGroupBy('stats_table.title');
qb.addGroupBy('country_code');
qb.addGroupBy('city_code');
qb.addOrderBy('sum', 'DESC');
qb.addOrderBy('company');
qb.addOrderBy('title');
qb.limit(3);
qb.cache(true);
return qb.getRawMany();
TypeORM didn't meet my criteria, so I'm not experienced with it, but as long as it doesn't cause problems with TypeORM, I see an easy SQL solution and an almost as easy TypeScript solution.
The SQL solution is to simply not select the undesired columns. SQL will allow you to use fields you did not select in WHERE, GROUP BY, and/or ORDER BY clauses, though obviously you'll need to use 'SUM(count)' instead of 'sum' for the order. I have encountered some ORMs that are not happy with this though.
The TS solution is to map the result of qb.getRawMany() so that you only have the field you're interested in. getRawMany() returns a Promise, so await it first. Assuming it resolves to an array of objects, that would look something like this:
(await qb.getRawMany()).map(companyRecord => ({ company: companyRecord.company }));
That may not be exactly correct, I've taken the day off precisely because I'm sick and my brain is fuzzy enough I was making too many stupid mistakes, but the concept should work even if the code itself doesn't.
EDIT: Also note that map returns a new array, it does not modify the existing array, so you would use this in place of the getRawMany() when assigning, not after the assignment.
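The SQL approach described above can be tried out in isolation. The following Python sketch uses an in-memory SQLite database as a stand-in, since the question does not say which database is used. The sample rows and the helper name top_companies are made up for illustration; the table and column names come from the question.

```python
import sqlite3


def top_companies(conn, limit=3):
    """Select only company, while grouping and ordering by unselected columns."""
    sql = """
        SELECT company
        FROM stats_table
        WHERE city_code IS NOT NULL OR country_code IS NOT NULL
        GROUP BY company, title, country_code, city_code
        ORDER BY SUM("count") DESC, company, title
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (limit,))]


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE stats_table '
    '(company TEXT, title TEXT, city_code TEXT, country_code TEXT, "count" INTEGER)'
)
# Made-up rows: two Acme rows fall into one group, Initech has no location.
conn.executemany(
    "INSERT INTO stats_table VALUES (?, ?, ?, ?, ?)",
    [
        ("Acme", "Engineer", "NYC", "US", 5),
        ("Acme", "Engineer", "NYC", "US", 7),
        ("Globex", "Analyst", None, "DE", 20),
        ("Initech", "Clerk", None, None, 99),
    ],
)
print(top_companies(conn))
```

Whether a given ORM accepts ordering by an aggregate that is not selected is a separate question, as the answer notes.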

Sum or Difference operation of two keys in document using Mongoengine

I have defined a model like
from mongoengine import Document, fields

class Orders(Document):
    orderAmount = fields.FloatField()
    cashbackAmount = fields.FloatField()
    meta = {'strict': False}
I want to get all orders where orderAmount - cashbackAmount > 500, and I want to do this with Mongoengine. I am not using the Django framework, so Django-specific solutions do not apply.
Let's first approach this as if you had to do it without Mongoengine. You would start by dividing the problem into two steps:
1) How to get the difference between two fields and output it as the new field?
2) How to filter all the documents based on that field's value?
You can see that it consists of several steps, so it looks like a great use case for the aggregation framework.
The first problem can be solved using the $addFields stage and the $subtract operator.
{$addFields: {difference: {$subtract: ["$a", "$b"]}}}
which translates to "for every document add a new field called difference where difference=a-b".
The second problem is a simple filtering:
{$match: {difference:{$gt: 500}}}
"give me all documents where difference field is greater than 500"
So the whole query in MongoDB would look like this
db.collectionName.aggregate([{$addFields: {difference: {$subtract: ["$a", "$b"]}}}, {$match: {difference:{$gt: 500}}}])
Now we have to translate it into Mongoengine. It turns out that the queryset has an aggregate method, so only small adjustments are needed to make this query work (Diff is a test model with fields a and b; substitute your own model and field names).
Diff.objects.aggregate({"$addFields": {"difference": {"$subtract": ["$a", "$b"]}}}, {"$match": {"difference":{"$gt": 500}}})
As a result, you get a CommandCursor. You can iterate over that object or convert it to a list to get a list of dictionaries.
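Applied to the field names from the question, the two stages could be built as in the sketch below. The helper build_difference_pipeline is a hypothetical name, and the block only constructs the pipeline, so it runs without a database connection.

```python
def build_difference_pipeline(minuend, subtrahend, threshold):
    """Return stages keeping documents where minuend - subtrahend > threshold."""
    return [
        # Stage 1: compute the difference and store it in a new field.
        {"$addFields": {"difference": {"$subtract": ["$" + minuend, "$" + subtrahend]}}},
        # Stage 2: keep only documents whose difference exceeds the threshold.
        {"$match": {"difference": {"$gt": threshold}}},
    ]


pipeline = build_difference_pipeline("orderAmount", "cashbackAmount", 500)
print(pipeline)
```

With a connected model, the list could then be handed to Orders.objects.aggregate. Whether the stages are passed as one list or as separate arguments depends on the Mongoengine version, so check the one you have installed.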

mongo: "2d" index and normal index

location: {lat: Number,
lng: Number}
location has a 2d index in my MongoDB and I have been using it for geospatial search, which is working fine.
Now if I need to search as db.find({lat:12.121212, lng:70.707070}), will it use the same index, or do I need to define a new index? If so, how?
I am using mongoose driver in node.js
The 2d index used for the geospatial commands is not going to help for an equality match on the two fields. For that you will need to define a compound index on the two sub-fields, something like this:
db.collection.ensureIndex({"location.lat" : 1, "location.lng" : 1})
This worked best for me with a test set of data - you can also define a normal index on the location field itself but that will be less efficient. You can test out the relative performance using hint and explain for any index combination. For example:
db.collection.find({"location.lat" : 179.45, "location.lng" : 90.23}).hint("location.lat_1_location.lng_1").explain()
You can do this for any index you wish in fact, though to check the results returned you will need to drop the .explain()
Please also bear in mind that a query can only use one index at a time, so if you are looking to combine the two (equivalency and a geospatial search) then the 2d index will be the only one used.
Note: all of the above examples are from the MongoDB JS shell, not node.js
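The string passed to hint above is the index's default name, which MongoDB derives by joining each indexed field and its direction with underscores. A minimal Python sketch of that naming rule follows; default_index_name is a hypothetical helper, not a driver function.

```python
def default_index_name(keys):
    """Join every (field, direction) pair with underscores, e.g. a_1_b_-1."""
    return "_".join("%s_%s" % (field, direction) for field, direction in keys)


name = default_index_name([("location.lat", 1), ("location.lng", 1)])
print(name)
```

If the index was created with an explicit name, that name has to be used in the hint instead.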

Resources