Shopware filter uses the parent's price even for products where only variants are used - shopware

We have a Sage integration with Shopware 6 that also imports product variants.
BUT: for the parent product of variant products, the integration writes 999.999 € into the price field. The variants themselves do have correct prices.
The problem is: we use the Store API to fetch all products of a category together with their filters (see https://shopware.stoplight.io/docs/store-api/cf2592a37b40b-fetch-a-product-listing-by-category).
The price filter shows the very high 999.999 € price, even though no product variant has this price - only the (unused/not shown) parents do.
Is there a way around this, especially without maintaining the parent price manually? Ideally the Shopware API would not include the prices of the parent products... Can this be configured maybe?
Any hint appreciated :wink:
Example response from /store-api/product-listing/CATEGORY_ID:
"aggregations": {
  "price": {
    "min": "249.0000",
    "max": "999999.9900",
    "avg": null,
    "sum": null,
    "apiAlias": "price_aggregation"
  },
I called the API endpoint for a category that contains variant products, whose parent products have 999.999 € as their price. As the filter aggregation for price I get a max of 999999, whereas I would expect the highest price across all products and variants, without the variants' parent prices included.
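For reference, a minimal sketch of how the listing (and with it the price aggregation) is fetched from the Store API; the host, access key and category id are placeholders:

// Sketch of the Store API call; SW_ACCESS_KEY and CATEGORY_ID are placeholders.
const response = await fetch(`https://shop.example.com/store-api/product-listing/${CATEGORY_ID}`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'sw-access-key': SW_ACCESS_KEY,
  },
  body: JSON.stringify({ limit: 24 }),
});
const listing = await response.json();
// The price filter boundaries come from this aggregation, which currently
// reports the 999999.99 parent placeholder as "max".
console.log(listing.aggregations.price);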

One possible approach would be to subscribe to the ProductListingCriteriaEvent and wrap the original price aggregation in a FilterAggregation. With the filter aggregation you can limit the range of the inner StatsAggregation named price.
use Shopware\Core\Content\Product\Events\ProductListingCriteriaEvent;
use Shopware\Core\Content\Product\Events\ProductSearchCriteriaEvent;
use Shopware\Core\Framework\DataAbstractionLayer\Search\Aggregation\Bucket\FilterAggregation;
use Shopware\Core\Framework\DataAbstractionLayer\Search\Filter\RangeFilter;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

class MyProductListingSubscriber implements EventSubscriberInterface
{
    public static function getSubscribedEvents(): array
    {
        // use a priority below -100 so the listener runs
        // after the original aggregations have been set
        return [
            ProductListingCriteriaEvent::class => [
                ['enhanceCriteria', -101],
            ],
            ProductSearchCriteriaEvent::class => [
                ['enhanceCriteria', -101],
            ],
        ];
    }

    public function enhanceCriteria(ProductListingCriteriaEvent $event): void
    {
        $criteria = $event->getCriteria();
        $aggregations = $criteria->getAggregations();
        $criteria->resetAggregations(); // remove all aggregations

        foreach ($aggregations as $aggregation) {
            if ($aggregation->getName() !== 'price') {
                // re-add every aggregation that is not named 'price'
                $criteria->addAggregation($aggregation);
                continue;
            }

            // wrap the original aggregation and set a `RangeFilter` for the gross price
            $filterAggregation = new FilterAggregation(
                'filter-price-stats',
                $aggregation,
                [
                    new RangeFilter(
                        'cheapestPrice.gross',
                        [
                            RangeFilter::LT => 999999.99,
                        ]
                    ),
                ]
            );

            $criteria->addAggregation($filterAggregation);
        }
    }
}
This should yield a max price for the filter that excludes all prices equal to or greater than 999999.99.

In our case the actual cause was the indexing behaviour. This installation had products before, which were deleted, and new products were imported afterwards; something must have gone wrong with the indexing.
What solved our issue was refreshing the index via the console:
bin/console dal:refresh:index

Related

Nodejs Elasticsearch query default behaviour

On a daily basis, I'm pushing data (time series) to Elasticsearch. My indices are named myindex_*, where * is the current date, and an index pattern has been set up for them. Thus after a week I have: myindex_2022-06-20, myindex_2022-06-21, ... myindex_2022-06-27.
Let's assume my indices hold products' prices. Inside each myindex_* I have got:
myindex_2022-06-26 includes many product prices like this:
{
  "reference_code": "123456789",
  "price": 10.00
},
...
myindex_2022-06-27:
{
  "reference_code": "123456789",
  "price": 12.00
},
I'm using this query to get the reference code and the corresponding prices. And it works great.
const data = await elasticClient.search({
  index: 'myindex_2022-06-27',
  body: {
    query: {
      match: {
        "reference_code": "123456789"
      }
    }
  }
});
But I would like a query that, if there is no data in the index for 2022-06-27, checks the previous index 2022-06-26, and so on (up to e.g. 10 indices back).
I'm not sure, but it seems to do this when I replace myindex_2022-06-27 with myindex_* (not sure it's the default behaviour).
The issue is that when I query this way, I do get prices from other indices, but it seems to use the oldest one. I would like to get the newest one instead.
How should I proceed?
If you query with an index wildcard, it returns a list of documents, where every document includes some meta fields such as _index and _id.
You can sort by _index to make Elasticsearch return the latest document at position [0] in your list.
const data = await elasticClient.search({
  index: 'myindex_2022-*',
  body: {
    query: {
      match: {
        "reference_code": "123456789"
      }
    },
    sort: { "_index": "desc" }
  }
});
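If only the single newest price per reference is needed, the hit list can also be capped to one result; a sketch under the same assumptions (same client and index naming):

// Sketch: sort newest index first and keep only the top hit.
const latest = await elasticClient.search({
  index: 'myindex_2022-*',
  body: {
    query: { match: { "reference_code": "123456789" } },
    sort: [{ _index: 'desc' }],
    size: 1
  }
});
// Depending on the client version, the hit is at latest.hits.hits[0]
// or latest.body.hits.hits[0].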

MongoDb aggregation condition based on elements in array

Following is the structure of my document:
{
  _id: 1,
  testId: 3,
  marksObtained: 10,
  subjects: [
    {
      subjectId: 1,
      marksObtained: 6
    },
    {
      subjectId: 2,
      marksObtained: 4
    }
  ]
}
Now, I need to calculate, for each subject, the number of students who scored higher and update it in the document. I can do this at the test level but can't figure out how to do it at the level of the array elements.
So let's say there is one student who scored higher for subject 1 and two students who scored higher for subject 2; then I need the document to look like the following (a possible aggregation sketch follows after the expected document):
{
  _id: 1,
  testId: 3,
  marksObtained: 10,
  subjects: [
    {
      subjectId: 1,
      marksObtained: 6,
      rank: 2
    },
    {
      subjectId: 2,
      marksObtained: 4,
      rank: 3
    }
  ]
}
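A hedged sketch of one way this per-subject rank could be computed with the aggregation pipeline (the collection name results is an assumption, and $lookup with a pipeline requires MongoDB 3.6+; the result would still have to be written back, e.g. with $merge or per-document updates):

// Sketch only: for every subject entry, count how many documents of the same
// test have a higher mark for that subjectId, then rank = higher + 1.
// "results" is a hypothetical collection name.
db.results.aggregate([
  { $unwind: "$subjects" },
  { $lookup: {
      from: "results",
      let: {
        testId: "$testId",
        subjectId: "$subjects.subjectId",
        marks: "$subjects.marksObtained"
      },
      pipeline: [
        { $match: { $expr: { $eq: ["$testId", "$$testId"] } } },
        { $unwind: "$subjects" },
        { $match: { $expr: { $and: [
          { $eq: ["$subjects.subjectId", "$$subjectId"] },
          { $gt: ["$subjects.marksObtained", "$$marks"] }
        ] } } },
        { $count: "higher" }
      ],
      as: "higherScores"
  }},
  // rank = number of students with a higher mark in this subject, plus one
  { $addFields: {
      "subjects.rank": {
        $add: [{ $ifNull: [{ $arrayElemAt: ["$higherScores.higher", 0] }, 0] }, 1]
      }
  }},
  // fold the per-subject entries back into one document per student
  { $group: {
      _id: "$_id",
      testId: { $first: "$testId" },
      marksObtained: { $first: "$marksObtained" },
      subjects: { $push: "$subjects" }
  }}
])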

CouchDB Count Reduce with timestamp filtering

Let's say I have documents like so:
{
  _id: "a98798978s978dd98d",
  type: "signature",
  uid: "u12345",
  category: "cat_1",
  timestamp: UNIX_TIMESTAMP
}
My goal is to be able to count all signatures created by a certain uid while being able to filter by timestamp.
Thanks to Alexis, I've gotten this far with a map function used with the built-in _count reduce:
function (doc) {
  if (doc.type === "signature") {
    emit([doc.uid, doc.timestamp], 1);
  }
}
With the following queries:
start_key=[null,lowerTimestamp]
end_key=[{},higherTimestamp]
reduce=true
group_level=1
Response:
{
  "rows": [
    {
      "key": [ "u11111" ],
      "value": 3
    },
    {
      "key": [ "u12345" ],
      "value": 26
    }
  ]
}
It counts per uid correctly, but the timestamp filter doesn't work properly. At first I thought it might be a CouchDB 2.2 bug, but I tried on Cloudant and got the same response.
Does anyone have any ideas on how I could get this to work while being able to filter by timestamp?
When using compound keys in MapReduce (i.e. the key is an array of things), you cannot query a range of keys with a "leading" array element missing. That is, you can query a range of uids and get the results ordered by timestamp within each uid, but your use case is the other way round - you want to query uids by time.
I'd be tempted to put time first in the array, but Unix timestamps are not so good for grouping ;). I don't know the ins and outs of your application, but if you were to index a date instead of a timestamp, like so:
function (doc) {
  if (doc.type === "signature") {
    // multiply doc.timestamp by 1000 first if it is in seconds rather than milliseconds
    var date = new Date(doc.timestamp)
    var datestr = date.toISOString().split('T')[0]
    emit([datestr, doc.uid], 1);
  }
}
This would allow you to query a range of dates (to the resolution of a whole day):
?startkey=["2018-01-01"]&endkey=["2018-02-01"]&group_level=2
albeit with your uids grouped by day.

Mongoose query returning repeated results

The query receives a pair of coordinates, a maximum distance radius, a "skip" integer and a "limit" integer. The function should return the closest and newest locations for the given position. There is no visible error in my code; however, when I call the query again, it returns results that repeat. The "skip" variable is updated according to the number of results returned.
Example:
1) I make query with skip = 0, limit = 10. I receive 10 non-repeated locations.
2) Query is called again now, skip = 10, limit = 10. I receive another 10 locations with repeated results from the first query.
QUERY
Locations.find({
  coordinates: {
    $near: [x, y],
    $maxDistance: maxDistance
  }
})
  .sort('date_created')
  .skip(skip)
  .limit(limit)
  .exec(function(err, locations) {
    console.log("[+]Found Locations");
    callback(locations);
  });
SCHEMA
var locationSchema = new Schema({
  date_created: { type: Date },
  coordinates: [],
  text: { type: String }
});
I have tried looking everywhere for a solution. My only remaining guess is the versions? I use Mongoose 4.x.x and MongoDB 2.5.6, I believe. Any ideas?
There are a couple of things to consider here about the sort of results you want, the first being that you have a "secondary" sort criterion, "date_created", to deal with.
The basic problem is that the $near operator and similar operators in MongoDB do not at present "project" any field indicating the "distance" from the queried location; they simply "default sort" the data. So in order to do that "secondary" sort, a field with the "distance" needs to be present, and there are other options for obtaining it.
The second point is that "skip" and "limit" style paging is horrible for performance on large sets of data and should be avoided where you can. It is better to select data based on a "range" of where it occurs rather than to "skip" through all the results you have previously displayed.
The first thing to do is use a command that can "project" the distance into the document along with the other information. The $geoNear aggregation stage is good for this, especially since we want to do other sorting:
var seenIds = [],
    lastDistance = null,
    lastDate = null;

Locations.aggregate(
  [
    { "$geoNear": {
      "near": [x, y],
      "maxDistance": maxDistance,
      "distanceField": "dist",
      "limit": 10
    }},
    { "$sort": { "dist": 1, "date_created": -1 } }
  ],
  function(err, results) {
    results.forEach(function(result) {
      if ( ( result.dist != lastDistance ) || ( result.date_created != lastDate ) ) {
        seenIds = [];
        lastDistance = result.dist;
        lastDate = result.date_created;
      }
      seenIds.push(result._id);
    });
    // save those variables to session or other persistence
    // do something with results
  }
)
That is the first iteration of your results, where you fetch the first 10. Note the logic inside the loop: each document in the results is inspected for a change in either "date_created" or the projected "dist" field now present in the document, and where such a change occurs the "seenIds" array is wiped of all current entries. In general, all the variables are tested and possibly updated on each iteration, and where there is no change, items are added to the list of "seenIds".
All three variables being worked on need to be stored somewhere awaiting the next request. For web applications the session store is ideal, but approaches vary. You just want those values to be recalled when the next request starts, as on the next and subsequent iterations we alter the query a bit:
Locations.aggregate(
  [
    { "$geoNear": {
      "near": [x, y],
      "maxDistance": maxDistance,
      "minDistance": lastDistance,
      "distanceField": "dist",
      "limit": 10,
      "query": {
        "_id": { "$nin": seenIds },
        "date_created": { "$lt": lastDate }
      }
    }},
    { "$sort": { "dist": 1, "date_created": -1 } }
  ],
  function(err, results) {
    results.forEach(function(result) {
      if ( ( result.dist != lastDistance ) || ( result.date_created != lastDate ) ) {
        seenIds = [];
        lastDistance = result.dist;
        lastDate = result.date_created;
      }
      seenIds.push(result._id);
    });
    // save those variables to session or other persistence
    // do something with results
  }
)
So there the "minDistance" parameter is entered as you want to exclude any of the "nearer" results that have already been seen, and the additional checks are placed in the query with the "date_created" needing to be "less than" the "lastDistance" recorded as well since we are in descending order of sort, with the final "sure" filter in excluding any "_id" values that were recorded within the list because the values had not changed.
Now, with geospatial data that "seenIds" list is not likely to grow much, as you are generally not going to find many things at exactly the same distance, but this is the general process for paging a sorted list of data like this, so it is worth understanding the concept.
So if you want to sort on a secondary field with geospatial data while also considering the "near" distance, this is the general approach: project a distance value into the document results, and store the last seen values together with the "_id" values recorded at those values.
The general concept is "advancing the minimum distance" so that each page of results gets gradually "further away" from the point of origin used in the query.

Limit ElasticSearch aggregation to top n query results

I have a set of 2.8 million docs with sets of tags that I'm querying with ElasticSearch, but many of these docs can be grouped together by one ID. I want to query my data using the tags, and then aggregate them by the ID that repeats. Often my search results have tens of thousands of documents, but I only want to aggregate the top 100 results of the search. How can I constrain an aggregation to only the top 100 results from a query?
Sampler Aggregation:
A filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents.
"aggs": {
"bestDocs": {
"sampler": {
// "field": "<FIELD>", <-- optional, Controls diversity using a field
"shard_size":100
},
"aggs": {
"bestBuckets": {
"terms": {
"field": "id"
}
}
}
}
}
This query will limit the sub aggregation to top 100 docs from the result and then bucket them by ID.
Optionally, you can use the field or script and max_docs_per_value settings (in current Elasticsearch versions these live on the diversified_sampler variant) to control the maximum number of documents collected on any one shard which share a common value.
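A hedged sketch of that diversified variant, written as a JavaScript search body (author_id is only an illustrative field name, not taken from the question):

// Sketch: diversified_sampler caps how many of the sampled top-scoring documents
// may share the same "author_id" value (hypothetical field), then buckets by ID.
const body = {
  aggs: {
    bestDocs: {
      diversified_sampler: {
        shard_size: 100,
        field: 'author_id',
        max_docs_per_value: 3
      },
      aggs: {
        bestBuckets: {
          terms: { field: 'id' }
        }
      }
    }
  }
};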
The size parameter can be set to define how many term buckets should be returned out of the overall terms list.
By default, the node coordinating the search process will request each shard to provide its own top size term buckets and once all shards respond, it will reduce the results to the final list that will then be returned to the client. This means that if the number of unique terms is greater than size, the returned list is slightly off and not accurate (it could be that the term counts are slightly off and it could even be that a term that should have been in the top size buckets was not returned).
If set to 0, the size will be set to Integer.MAX_VALUE.
Here is example code to return the top 100 term buckets:
{
  "aggs": {
    "products": {
      "terms": {
        "field": "product",
        "size": 100
      }
    }
  }
}
You can refer to this for more information.
You can also use the min_doc_count parameter, which only returns term buckets that contain at least that many documents:
{
  "aggs": {
    "products": {
      "terms": {
        "field": "product",
        "min_doc_count": 100
      }
    }
  }
}
