Cloudant Query 2.0 unexpected behavior - couchdb

I create an index using the below function:
function (doc) {
  if (doc.type === 'Property') {
    if (doc.Beds_Max) {
      try {
        index("Beds_Max", parseInt(doc.Beds_Max));
      }
      catch(err) {
        //ooopss
      }
    }
    if (doc.YearBuilt) {
      try {
        index("YearBuilt", parseInt(doc.YearBuilt));
      }
      catch(err) {
        //ooopss
      }
    }
  }
}
I create it through the Cloudant dashboard (Design Documents -> New Search Index), and after the index is built I can issue queries like
"YearBuilt": [2010 TO Infinity]
But if I try to query the same index using Cloudant Query, I see weird behavior. If I go to Cloudant Dashboard -> Query and pass something like
{
  "limit": 5,
  "selector": {
    "_id": {
      "$gt": null
    },
    "Beds_Max": {"$gte": 7}
  },
  "fields": ["_id"]
}
I see a huge spike in data transmission: the dashboard keeps receiving large amounts of data, even for very selective queries that should return no more than one or two results, and then my computer hangs, so something is clearly wrong. When I use the Pouchdb-find npm module, which supports Cloudant Query 2.0, and issue the same selector as above, the behavior is inconsistent: sometimes it returns 0 rows and sometimes it fails with an ETIMEOUTERROR. If I change the index to leave out parseInt, I can query through both Pouchdb-find and Cloudant Dashboard -> Query and get results, but then I lose the ability to use inequality operators, which is a no-go for me.
I'm open to workarounds, and even to entirely different features, to achieve the desired result.
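A side note on the index function above: parseInt does not throw on non-numeric input, it returns NaN, so the try/catch blocks never fire. A minimal sketch of an explicit guard follows. The helper name toIndexableInt is hypothetical (it is not part of Cloudant's API), and this only shows how to avoid indexing NaN; it is not claimed to explain the query behavior described above.

```javascript
// Hypothetical helper: returns an integer that is safe to index, or null.
function toIndexableInt(value) {
  var n = parseInt(value, 10); // parseInt returns NaN instead of throwing
  return isNaN(n) ? null : n;
}

// Inside the index function, the guard would replace the try/catch, e.g.:
//   var beds = toIndexableInt(doc.Beds_Max);
//   if (beds !== null) { index("Beds_Max", beds); }
```

With this shape, a value such as "abc" is skipped rather than being passed to index() as NaN.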

Related

Nodejs Elasticsearch query default behaviour

On a daily basis, I'm pushing time-series data to Elasticsearch. I created an index pattern, and my indices are named myindex_*, where * is the current date. Thus, after a week, I have: myindex_2022-06-20, myindex_2022-06-21, ... myindex_2022-06-27.
Let's assume my indices hold product prices. Inside each myindex_*, I then have documents like the following.
myindex_2022-06-26 contains many product prices like this:
{
"reference_code": "123456789",
"price": 10.00
},
...
myindex_2022-06-27:
{
"reference_code": "123456789",
"price": 12.00
},
I'm using this query to get the reference code and the corresponding prices. And it works great.
const data = await elasticClient.search({
  index: "myindex_2022-06-27",
  body: {
    query: {
      match: {
        "reference_code": "123456789"
      }
    }
  }
});
However, I would like a query that, if the index for 2022-06-27 contains no data, checks the previous index (2022-06-26), and so on (up to, e.g., 10 times).
I'm not sure, but it seems to do this when I replace myindex_2022-06-27 with myindex_* (I don't know whether that is the default behaviour).
The issue is that, when I query this way, I get prices from the other indices, but it seems to use the oldest one. I would like to get the newest one instead.
How should I proceed?
If you query with an index wildcard, it should return a list of documents, each of which includes meta fields such as _index and _id.
You can sort by _index so that Elasticsearch returns the latest document at position [0] in the list.
const data = await elasticClient.search({
  index: "myindex_2022-*",
  body: {
    query: {
      match: {
        "reference_code": "123456789"
      }
    },
    sort: { "_index": "desc" }
  }
});
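If the sort cannot be done on the server, the same "newest index wins" choice can be sketched on the client. This assumes the index names end in an ISO date (as in myindex_2022-06-27), so that lexicographic order matches chronological order. newestHit is a hypothetical helper name, and hits is assumed to be the array of hit objects taken from the search response.

```javascript
// Hypothetical helper: picks the hit that comes from the most recent index.
// Relies on index names ending in YYYY-MM-DD, so string comparison is enough.
function newestHit(hits) {
  if (!hits || hits.length === 0) {
    return null; // nothing matched in any index
  }
  return hits.reduce(function (best, hit) {
    return hit._index > best._index ? hit : best;
  });
}
```

For example, given hits from myindex_2022-06-25 and myindex_2022-06-27, the function returns the one from myindex_2022-06-27.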

Convert a string parameter into float in Cloudant query

I need to run a query against a Cloudant database that compares a decimal number stored as a string with another number sent from the server. The problem is that the values are compared as strings, and I need a numerical comparison. Is there any way to perform this search by converting the database field to a float during the query? Or is there another way to do this query?
This is the query on the server; value.precio is sent from the client as a string.
value.precio = value.precio.split("-");
var precio_init = value.precio[0];
var precio_final = value.precio[1];
value.precio = {
"$gte":precio_init,
"$lte":precio_final
};
And this is the field in my database that I want to search:
"precio": "13.39"
Thanks
I don't think you will be able to do this with Cloudant Query, but you could try Cloudant Search. Create a new search index similar to the following:
Design Doc: myDesignDoc
Index Name: byPrecio
Index:
function (doc) {
if (doc.precio) {
index("precio", parseFloat(doc.precio));
}
}
Then you can use ranges to search. For example:
precio:[13 TO 14]
A full search request on Cloudant would look like this:
https://xxx.cloudant.com/YOUR_DB/_design/myDesignDoc/_search/byPrecio?q=precio:[13%20TO%2014]&include_docs=true
Sample response:
{
"total_rows":1,
"bookmark":"g2wAAAAxxx",
"rows":[
{
"id":"74fa6ff1b6dbca8c10d677832f6a3de2",
"order":[
1.0,
0
],
"fields":{
},
"doc":{
"_id":"74fa6ff1b6dbca8c10d677832f6a3de2",
"_rev":"2-17c984e51102b719fe9f80fc5d5bc78e",
"precio":"13.39",
"otherField":"otherValue"
}
}
]
}
More info on Cloudant Search here
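Tying this back to the question, the "min-max" string sent by the client can be turned into such a range query. This is a rough sketch: buildPrecioSearchUrl and its arguments are hypothetical names, the design document and index names are the ones used above, and the URL layout simply follows the example request. It assumes non-negative prices, since it splits on "-".

```javascript
// Hypothetical helper: builds a Cloudant Search URL from a "min-max" string.
function buildPrecioSearchUrl(baseUrl, db, precioRange) {
  var parts = precioRange.split("-");
  var min = parseFloat(parts[0]);
  var max = parseFloat(parts[1]);
  if (isNaN(min) || isNaN(max)) {
    throw new Error("Invalid price range: " + precioRange);
  }
  // Numeric range in Lucene syntax, as in precio:[13 TO 14]
  var q = "precio:[" + min + " TO " + max + "]";
  return baseUrl + "/" + encodeURIComponent(db) +
    "/_design/myDesignDoc/_search/byPrecio?q=" + encodeURIComponent(q) +
    "&include_docs=true";
}
```

Because the index stores parseFloat(doc.precio), the range is evaluated numerically even though the documents keep precio as a string.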

IBM Bluemix Discovery - query parameter

I have created a Discovery service on my bluemix account. I want to query my documents from a nodejs application.
I have built a query with some aggregation, tested it using the bluemix online tool and it's working well.
Now when I query the collection from my code, whatever my parameters are, I always receive all of my documents with the enriched text and so on. I think I am missing how to send the query attributes to the service (like filters and aggregations).
Here is my code:
var queryParams = {
query:'CHLOE RICHARDS',
return:'title',
count:1,
aggregations:'nested(enriched_text.entities).filter(enriched_text.entities.type:Person).term(enriched_text.entities.text, count:5)'
};
discovery.query({environment_id:that.environment_id, collection_id:that.collection_id, query_options:queryParams }, function(error, data) {
if(error){
console.error(error);
reject(error);
}
else{
console.log(JSON.stringify(data, null, 2));
resolve(data.matching_results);
}
});
And the result is always:
{
"matching_results": 28,
"results": [
{
"id": "fe5e2a38e6cccfbd97dbdd0c33c9c8fd",
"score": 1,
"extracted_metadata": {
"publicationdate": "2016-01-05",
"sha1": "28434b0a7e2a94dd62cabe9b5a82e98766584dd412",
"author": "Richardson, Heather S",
"filename": "whatever.docx",
"file_type": "word",
"title": "no title"
},
"text": "......
This happens regardless of the value of the query_options parameter. Can you help me?
EDIT
Instead of the query_options:queryParams, I have used query:"text:CHLOE RICHARDS" and it's working well. Now my problem still remains to find the right parameter format to add the aggregations I want
EDIT 2
So I have looked at IBM's example on Github more carefully, and the parameters are now formatted like this:
const queryParams = {
count: 5,
return: 'title,enrichedTitle.text',
query: '"CHLOE RICHARDS"',
aggregations: [ 'nested(enriched_text.entities).filter(enriched_text.entities.type:Person).term(enriched_text.entities.text, count:5)' ],
environment_id: '1111111111',
collection_id: '11111111111'
};
It works well if I use only the query attribute. Now if I only use the aggregations one, all the documents are sent back as a result (which is understandable), but I have no aggregation part, so I cannot access the list of proper names in my documents.
Your query does not look right. If you are going to use query, then you will need to construct a query search like text:"CHLOE RICHARDS"
If you want to perform a natural language query then you should be setting the parameter natural_language_query.

Using MapReduce for results of geospatial indexes in Cloudant

I am using a geospatial index in Cloudant for retrieving all documents inside a polygon. Now I want to calculate some basic statistics for those documents (e.g. average age and sum of earnings in a region).
Is it possible to query the geo index and then pass the result on to the MapReduce function?
How can I achieve this, preferably inside the database? Can I avoid querying for the document ids inside the polygon first and then sending the retrieved ids for performing the MapReduce (I am working with large data sets)?
What is working so far is querying the index as well as using the view (separately).
My geo index
function (doc) {
if (doc.geometry && doc.geometry.coordinates) {
st_index(doc.geometry);
}
}
My view
function (doc) {
var beitrag = doc.properties.beitrag;
var schadenaufwand = doc.schadenaufwand;
if(beitrag !== null && typeof beitrag === 'number' ) {
emit(doc._id, doc.properties.beitrag);
}
}
A sample geoJson document (original data looks similar)
{
"_id": "01bff77f642fc4249e787d2ded011504",
"_rev": "1-25a9a1a15939d5b21af3fbcc5c2d6ed1",
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
7.2316,
40.99
]
},
"properties": {
"age": 34,
"earnings": 982.7
}
}
This question is similar, but did not really help me: Cloudant - apply a view/mapReduce to a geospatial query
This demo could be something in the right direction: https://examples.cloudant.com/simplegeo_places/_design/geo/index.html
It seems like it would be a useful feature, but the answer to this is 'no'. The Geo indexer can't perform aggregations over the data.
I think you'll have to do as you were thinking -- use the returned list of doc ids to distribute the calculation in another map-reduce system.
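If the aggregation ends up outside the database, as suggested above, the calculation itself is small. The sketch below takes an array of GeoJSON documents (however they were fetched after the geo query) and uses the age and earnings properties from the sample document. summarize is a hypothetical helper name.

```javascript
// Hypothetical helper: average age and total earnings over a set of documents.
function summarize(docs) {
  var count = 0;
  var ageSum = 0;
  var earningsSum = 0;
  docs.forEach(function (doc) {
    var p = doc.properties;
    // Skip documents that lack either numeric property
    if (p && typeof p.age === "number" && typeof p.earnings === "number") {
      count += 1;
      ageSum += p.age;
      earningsSum += p.earnings;
    }
  });
  return {
    count: count,
    averageAge: count > 0 ? ageSum / count : null,
    totalEarnings: earningsSum
  };
}
```

For large result sets the documents can be fed through in pages, keeping running sums instead of holding everything in memory.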

CouchDB - Map Reduce similar to SQL Group by

Consider following sample documents stored in CouchDB
{
"_id":....,
"rev":....,
"type":"orders",
"Period":"2013-01",
"Region":"East",
"Category":"Stationary",
"Product":"Pen",
"Rate":1,
"Qty":10,
"Amount":10
}
{
"_id":....,
"rev":....,
"type":"orders",
"Period":"2013-02",
"Region":"South",
"Category":"Food",
"Product":"Biscuit",
"Rate":7,
"Qty":5,
"Amount":35
}
Consider following SQL query
SELECT Period, Region,Category, Product, Min(Rate),Max(Rate),Count(Rate), Sum(Qty),Sum(Amount)
FROM Sales
GROUP BY Period,Region,Category, Product;
Is it possible to create map/reduce views in couchdb equivalent to the above SQL query and to produce output like
[
{
"Period":"2013-01",
"Region":"East",
"Category":"Stationary",
"Product":"Pen",
"MinRate":1,
"MaxRate":2,
"OrdersCount":20,
"TotQty":1000,
"Amount":1750
},
{
...
}
]
Up front, I believe @benedolph's answer is best-practice and best-case-scenario. Each reduce should ideally return 1 scalar value to keep the code as simple as possible.
However, it is true you'd have to issue multiple queries to retrieve the full result set described by your question. If you don't have the option to run queries in parallel, or it is really important to keep the number of queries down, it is possible to do it all at once.
Your map function will remain pretty simple:
function (doc) {
emit([ doc.Period, doc.Region, doc.Category, doc.Product ], doc);
}
The reduce function is where it gets lengthy:
function (key, values, rereduce) {
  // helper function to sum all the values of a specified field in an array of objects
  function sumField(arr, field) {
    return arr.reduce(function (prev, cur) {
      return prev + cur[field];
    }, 0);
  }
  // helper function to create an array of just a single property from an array of objects
  // (this function came from underscore.js, at least its name and concept)
  function pluck(arr, field) {
    return arr.map(function (item) {
      return item[field];
    });
  }
  // rereduce made this more challenging, and I could not thoroughly test this right now
  // see the CouchDB wiki for more information
  if (rereduce) {
    // a rereduce handles intermediate values
    // (so the "values" below are the results of previous reduce functions, not the map function)
    return {
      OrdersCount: sumField(values, "OrdersCount"),
      MinRate: Math.min.apply(Math, pluck(values, "MinRate")),
      MaxRate: Math.max.apply(Math, pluck(values, "MaxRate")),
      TotQty: sumField(values, "TotQty"),
      Amount: sumField(values, "Amount")
    };
  } else {
    var rates = pluck(values, "Rate");
    // This takes a group of documents and gives you the stats you were asking for
    return {
      OrdersCount: values.length,
      MinRate: Math.min.apply(Math, rates),
      MaxRate: Math.max.apply(Math, rates),
      TotQty: sumField(values, "Qty"),
      Amount: sumField(values, "Amount")
    };
  }
}
I was not able to test the "rereduce" branch of this code at all; you'll have to do that on your end (but it should work). See the wiki for information about reduce vs. rereduce.
The helper functions I added at the top actually made the code much shorter and easier to read overall; they're largely influenced by my experience with Underscore.js. However, you can't include CommonJS modules in reduce functions, so they have to be written manually.
Again, the best-case scenario is to have each aggregated field get its own map/reduce index, but if that isn't an option for you, the above code should get you what you've described in the question.
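One way to exercise the rereduce branch outside CouchDB is to run the function by hand: reduce everything in one pass, then reduce two chunks separately and rereduce the partial results, and check that both paths agree. The sketch below uses a condensed copy of the reduce function above under the hypothetical name reduceOrders, with made-up sample orders. It only approximates how CouchDB splits the work.

```javascript
// Condensed copy of the reduce function above, so it can run outside CouchDB.
function reduceOrders(key, values, rereduce) {
  function sumField(arr, field) {
    return arr.reduce(function (prev, cur) { return prev + cur[field]; }, 0);
  }
  function pluck(arr, field) {
    return arr.map(function (item) { return item[field]; });
  }
  if (rereduce) {
    return {
      OrdersCount: sumField(values, "OrdersCount"),
      MinRate: Math.min.apply(Math, pluck(values, "MinRate")),
      MaxRate: Math.max.apply(Math, pluck(values, "MaxRate")),
      TotQty: sumField(values, "TotQty"),
      Amount: sumField(values, "Amount")
    };
  }
  var rates = pluck(values, "Rate");
  return {
    OrdersCount: values.length,
    MinRate: Math.min.apply(Math, rates),
    MaxRate: Math.max.apply(Math, rates),
    TotQty: sumField(values, "Qty"),
    Amount: sumField(values, "Amount")
  };
}

// Made-up sample orders, all assumed to share one key.
var orders = [
  { Rate: 1, Qty: 10, Amount: 10 },
  { Rate: 2, Qty: 5, Amount: 10 },
  { Rate: 1.5, Qty: 4, Amount: 6 }
];

// Path 1: reduce all mapped values in one pass.
var direct = reduceOrders(null, orders, false);

// Path 2: reduce two chunks, then rereduce the partial results.
var partials = [
  reduceOrders(null, orders.slice(0, 2), false),
  reduceOrders(null, orders.slice(2), false)
];
var combined = reduceOrders(null, partials, true);

console.log(JSON.stringify(direct) === JSON.stringify(combined)); // prints true
```

If the two paths ever disagree for some split of the input, the rereduce branch is not combining partial results correctly.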
I will propose a very simple solution that requires one view per variable you want to aggregate in your "select" clause. While it is certainly possible to aggregate all variables in a single view, the reduce function would be far more complex.
The design document looks like this:
{
"_id": "_design/ddoc",
"_rev": "...",
"language": "javascript",
"views": {
"rates": {
"map": "function(doc) {\n emit([doc.Period, doc.Region, doc.Category, doc.Product], doc.Rate);\n}",
"reduce": "_stats"
},
"qty": {
"map": "function(doc) {\n emit([doc.Period, doc.Region, doc.Category, doc.Product], doc.Qty);\n}",
"reduce": "_stats"
}
}
}
Now, you can query <couchdb>/<database>/_design/ddoc/_view/rates?group_level=4 to get the statistics about the "Rate" variable. The result should look like this:
{"rows":[
{"key":["2013-01","East","Stationary","Pen"],"value":{"sum":4,"count":3,"min":1,"max":2,"sumsqr":6}},
{"key":["2013-01","North","Stationary","Pen"],"value":{"sum":1,"count":1,"min":1,"max":1,"sumsqr":1}},
{"key":["2013-01","South","Stationary","Pen"],"value":{"sum":0.5,"count":1,"min":0.5,"max":0.5,"sumsqr":0.25}},
{"key":["2013-02","South","Food","Biscuit"],"value":{"sum":7,"count":1,"min":7,"max":7,"sumsqr":49}}
]}
For the "Qty" variable, the query would be <couchdb>/<database>/_design/ddoc/_view/qty?group_level=4.
With the group_level property you can control over which levels the aggregation is to be performed. For example, querying with group_level=2 will aggregate up to "Period" and "Region".
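To get rows shaped like the output in the question, the results of the two views can be merged on the client. This is a sketch under the assumption that both views were queried with group_level=4, so their keys line up. mergeStats is a hypothetical helper name, and the output field names are taken from the question.

```javascript
// Hypothetical helper: joins the "rates" and "qty" _stats rows by key.
function mergeStats(rateRows, qtyRows) {
  var qtyByKey = {};
  qtyRows.forEach(function (row) {
    qtyByKey[JSON.stringify(row.key)] = row.value;
  });
  return rateRows.map(function (row) {
    // Fall back to an empty object if the qty view has no row for this key
    var qty = qtyByKey[JSON.stringify(row.key)] || {};
    return {
      Period: row.key[0],
      Region: row.key[1],
      Category: row.key[2],
      Product: row.key[3],
      MinRate: row.value.min,
      MaxRate: row.value.max,
      OrdersCount: row.value.count,
      TotQty: qty.sum
    };
  });
}
```

A third view emitting doc.Amount with the same key could be merged in the same way to fill the Amount column.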
