I am a contributor at http://airpollution.online/, an open-source environmental web platform that uses IBM Cloudant as its database service.
The platform's architecture requires fetching the latest data for each air pollution measurement device from a collection. Based on my experience with MongoDB, I wrote an aggregate query that fetches each device's latest data according to the epoch time key present in every document of the respective collection.
A sample aggregate query:
db.collection("hourly_analysis").aggregate([
    {
        $sort: {
            "time": -1,
            "Id": -1
        }
    },
    {
        $project: {
            "Id": 1,
            "data": 1,
            "_id": 0
        }
    },
    {
        $group: {
            "_id": "$Id",
            "data": {
                "$last": "$$ROOT"
            }
        }
    }
])
If anyone has ideas or suggestions about how I can write design documents in IBM Cloudant, please help me! Thanks!
P.S. We still have to make the backend open source for this project (it may take some time).
In CouchDB/Cloudant this is usually better done as a view than an ad-hoc query. It's a tricky one, but try this:
- a map step that emits the device ID and timestamp as two parts of a composite key, plus the device reading as the value
- a reduce step that looks for the largest timestamp and returns both the biggest (most recent) timestamp and the reading that goes with it (both values are needed because when rereducing, we need to know the timestamp so we can compare them)
- the view with group_level set to 1 will give you the newest reading for each device.
In most cases in Cloudant you can use the built-in reduce functions but here you want a function of a key.
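The steps above could be sketched as a pair of view functions (a sketch only: the design document and view names are made up here, and the field names Id, time and data follow the aggregate query in the question):

```javascript
// Map: emit a composite key [device id, epoch time]; value = the reading.
var mapFn = function (doc) {
  if (doc.Id !== undefined && doc.time !== undefined) {
    emit([doc.Id, doc.time], doc.data);
  }
};

// Reduce: keep whichever reading carries the largest timestamp.
// The timestamp is returned alongside the reading so that rereduce
// calls can still compare values coming from different batches.
var reduceFn = function (keys, values, rereduce) {
  if (rereduce) {
    // values are {time, data} objects produced by earlier reduce calls
    return values.reduce(function (a, b) { return a.time >= b.time ? a : b; });
  }
  // keys[i] is [[Id, time], docId]; pick the value with the largest time
  var best = { time: keys[0][0][1], data: values[0] };
  for (var i = 1; i < keys.length; i++) {
    if (keys[i][0][1] > best.time) {
      best = { time: keys[i][0][1], data: values[i] };
    }
  }
  return best;
};
```

In the design document itself, both functions are stored as strings (e.g. under views.by_device.map and views.by_device.reduce in a hypothetical _design/latest document); querying the view with ?group_level=1 then yields one {time, data} row per device.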
(The way that I solved this problem previously was to copy incoming data into a "newest readings" store as well as writing it to a database in the normal way. This makes it very quick to access if you only ever want the newest reading.)
Related
I'm developing an app that uses the Cosmos DB SQL API. The intention is to identify whether a potential construction site is within various restricted zones, such as national parks and sites of special scientific interest. This information is very useful in obtaining all the appropriate planning permissions.
I have created a container named 'geodata' containing 15 documents that I imported using data from a GeoJSON file provided by the UK National Parks. I have confirmed that all the polygons are valid using an ST_ISVALIDDETAILED SQL statement. I have also checked that the coordinates are anti-clockwise. A few documents contain MultiPolygons. The Geospatial Configuration of the container is 'Geography'.
I am using the Azure Cosmos Data Explorer to work out the correct format of a SELECT statement that determines whether given coordinates (a Point) fall within any of the polygons in the 15 documents.
SELECT c.properties.npark18nm
FROM c
WHERE ST_WITHIN({"type": "Point", "coordinates":[-3.139638969259495,54.595188276959284]}, c.geometry)
The embedded coordinates are within a National Park, in this case, the Lake District in the UK (it also happens to be my favourite coffee haunt).
'c.geometry' is the JSON field within the documents.
"type": "Feature",
"properties": {
"objectid": 3,
"npark18cd": "E26000004",
"npark18nm": "Northumberland National Park",
"npark18nmw": " ",
"bng_e": 385044,
"bng_n": 600169,
"long": -2.2370801,
"lat": 55.29539871,
"st_areashape": 1050982397.6985701,
"st_lengthshape": 339810.592994494
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-2.182235310191206,
55.586659699934806
],
[
-2.183754259805564,
55.58706479201416
], ......
Link to the full document: https://www.dropbox.com/s/yul6ft2rweod75s/lakedistrictnationlpark.json?dl=0
I have not been able to format the SELECT query to return the name of the park successfully.
Is what I want to achieve possible? Any guidance would be appreciated.
You haven't mentioned what error you are getting. Also double-check the spelling of "c.geometry".
This should work
SELECT c.properties.npark18nm
FROM c
WHERE ST_WITHIN({"type": "Point", "coordinates": [-3.139638969259495,54.595188276959284]}, c.geometry)
When running the query against your sample document, I was able to get the correct response.
So this particular document is fine and the query in your question works too. Can you recheck your query on the explorer again? Also, are you referring to the incorrect database/collection by any chance?
A full screenshot of the Cosmos Data Explorer showing the databases/collections, your query, and the response would also help.
I have fixed this problem, not by altering the SQL statement, but by deleting the container the data was held in, recreating it, and reloading the data.
The SQL statement now produces the expected results.
If this question seems too trivial, then please let me know in the comments and I will do further research on how to solve it.
I have a collection called products where I store details of a particular product from different retailers. The schema of a document looks like this -
{
    "_id": "uuid of a product",
    "created_at": "timestamp",
    "offers": [
        {
            "retailer_id": 123,
            "product_url": "url-of-a-product.com",
            "price": "1"
        },
        {
            "retailer_id": 456,
            "product_url": "url-of-a-product.com",
            "price": "1"
        }
    ]
}
The _id of a product is system-generated. Consider a product like 'iPhone X': it will be a single document with URLs and prices from multiple retailers like Amazon, eBay, etc.
Now, if a new URL comes into the system, I need to query whether this URL already exists in our database. The obvious way to do this is to iterate over every offer of every product document and check whether the product_url field matches the input URL. That would require loading all the documents into memory and iterating through the offers of every product one by one. This raises my questions:
Is there a simpler method to achieve this? Using pymongo?
Or my database schema needs to be changed since this basic check_if_product_url_exists() is too complex?
MongoDB provides searching within arrays using dot notation.
So your query would be:
db.collection.find({'offers.product_url': 'url-of-a-product.com'})
The same syntax works in MongoDB shell or pymongo.
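For illustration, here is a minimal sketch of the same lookup from Node.js (the helper name and the commented-out collection wiring are assumptions, not part of the question):

```javascript
// Builds the dot-notation filter: it matches a document if ANY element
// of the "offers" array has a matching product_url, so no client-side
// iteration over documents is needed.
function urlFilter(url) {
  return { "offers.product_url": url };
}

// With the official Node.js driver (pymongo accepts the same filter
// shape), assuming a connected "products" collection -- not run here:
// const exists = await products.countDocuments(urlFilter("url-of-a-product.com"), { limit: 1 }) > 0;
//
// A (multikey) index on the array field keeps this lookup from
// scanning the whole collection:
// await products.createIndex({ "offers.product_url": 1 });
```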
I am trying to find out how RUs work in order to optimize the requests made to the DB.
I have a simple query where I select by id:
SELECT * FROM c WHERE c.id='cl0'
That query costs 277.08 RUs
Then I have another query where I select by another property
SELECT * FROM c WHERE c.name[0].id='35bfea78-ccda-4cc5-9539-bd7ff1dd474b'
That query costs 2.95 RUs
I can't figure out why there is such a big difference in the RUs consumed by these two queries.
The two queries return the exact same result:
[
    {
        "label": "class",
        "id": "cl0",
        "_id": "cl0",
        "name": [
            {
                "_value": "C0.Iklos0",
                "id": "35bfea78-ccda-4cc5-9539-bd7ff1dd474b"
            }
        ],
        "_rid": "6Ds6AJHyfgBfAAAAADFT==",
        "_self": "dbs/6Ds4FA==/colls/6Ds6DFewfgA=/docs/6Ds6AJHyfgBdESFAAAAAAA==/",
        "_etag": "\"00007200-0000-0000-0000-w3we73140000\"",
        "_attachments": "attachments/",
        "_ts": 1528722196
    }
]
I faced a similar issue previously, so you are not the only person facing this. I can offer two solutions.
1. SELECT * FROM c WHERE c.id='cl0' queries documents across the whole database. If you make an appropriate field the partition key, it will greatly improve your performance.
You could refer to this doc to learn how to choose a partition key.
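For illustration, once such a partition key exists, the id lookup can include it so the query is routed to a single partition (a sketch only: the partitionKey field name here is purely hypothetical):

```javascript
// Hypothetical helper producing a parameterized Cosmos SQL query spec
// that filters on the partition key as well as the id, avoiding a
// cross-partition fan-out.
function byIdQuery(id, partitionKeyValue) {
  return {
    query: "SELECT * FROM c WHERE c.partitionKey = @pk AND c.id = @id",
    parameters: [
      { name: "@pk", value: partitionKeyValue },
      { name: "@id", value: id }
    ]
  };
}

// With the @azure/cosmos SDK (not run here):
// const { resources } = await container.items.query(byIdQuery("cl0", "somePk")).fetchAll();
```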
2. I found the answer below in the thread Azure DocumentDB Query by Id is very slow:
Microsoft support responded and they've resolved the issue. They've added IndexVersion 2 for the collection. Unfortunately, it is not yet available from the portal, and newly created accounts/collections are still not using the new version. You'll have to contact Microsoft Support to make changes to your accounts.
I suggest submitting feedback here to track this announcement.
Hope it helps you.
-- Edit
To upgrade to index version 2 use the following code
var collection = (await client.ReadDocumentCollectionAsync(string.Format("/dbs/{0}/colls/{1}", databaseId, collectionId))).Resource;
collection.SetPropertyValue("IndexVersion", 2);
var replacedCollection = await client.ReplaceDocumentCollectionAsync(collection);
RU consumption depends on your document size and your query. I highly recommend the link below on query metrics; if you want to tune your query or understand latency, check the query feed details:
x-ms-documentdb-query-metrics:
totalExecutionTimeInMs=33.67;queryCompileTimeInMs=0.06;queryLogicalPlanBuildTimeInMs=0.02;queryPhysicalPlanBuildTimeInMs=0.10;queryOptimizationTimeInMs=0.00;VMExecutionTimeInMs=32.56;indexLookupTimeInMs=0.36;documentLoadTimeInMs=9.58;systemFunctionExecuteTimeInMs=0.00;userFunctionExecuteTimeInMs=0.00;retrievedDocumentCount=2000;retrievedDocumentSize=1125600;outputDocumentCount=2000;writeOutputTimeInMs=18.10;indexUtilizationRatio=1.00
x-ms-request-charge: 604.42
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sql-query-metrics
Let's say I have these documents in my CosmosDB (DocumentDB API, .NET SDK):
{
    // partition key of the collection
    "userId": "0000-0000-0000-0000",
    "emailAddresses": [
        "someaddress#somedomain.com",
        "Another.Address#someotherdomain.com"
    ]
    // some more fields
}
I now need to find out if I have a document for a given email address. However, I need the query to be case insensitive.
There are ways to do a case-insensitive search on a field (they do a full scan, however):
How to do a Case Insensitive search on Azure DocumentDb?
select * from json j where LOWER(j.name) = 'timbaktu'
e => e.Id.ToLower() == key.ToLower()
These do not work for arrays. Is there an alternative way? A user defined function looks like it could help.
I am mainly looking for a temporary low-effort solution to support the scenario (I have multiple collections like this). I probably need to switch to a data structure like this at some point:
{
"userId" : "0000-0000-0000-0000",
// Option A
"emailAddresses": [
{
"displayName": "someaddress#somedomain.com",
"normalizedName" : "someaddress#somedomain.com"
},
{
"displayName": "Another.Address#someotherdomain.com",
"normalizedName" : "another.address#someotherdomain.com"
}
],
// Option B
"emailAddressesNormalized": {
"someaddress#somedomain.com", "another.address#someotherdomain.com"
}
}
Unfortunately, my production database already contains documents that would need to be updated to support the new structure.
My production collections contain only 100s of these items, so I am even tempted to just get all items and do the comparison in memory on the client.
If performance matters, then you should consider one of the normalization solutions you have proposed yourself in the question. Then you could index the normalized field and get results without doing a full scan.
If for some reason you really don't want to touch the documents, then perhaps the feature you are missing is a simple join.
Example query which will do case-insensitive search from within array with a scan:
SELECT c FROM c
join email in c.emailAddresses
where lower(email) = lower('ANOTHER.ADDRESS#someotherdomain.com')
You can find more examples about joining from Getting started with SQL commands in Cosmos DB.
Note that where-criteria in given example cannot use an index, so consider using it only along another more selective (indexed) criteria.
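As the question suggests, a user-defined function is another option. A sketch (the UDF name is made up, and like the join it still cannot use an index, so the same caveat applies):

```javascript
// Registered in Cosmos DB as a UDF, then callable from SQL as e.g.
// SELECT * FROM c WHERE udf.containsIgnoreCase(c.emailAddresses, 'ANOTHER.ADDRESS#someotherdomain.com')
function containsIgnoreCase(arr, needle) {
  if (!Array.isArray(arr) || typeof needle !== "string") return false;
  var target = needle.toLowerCase();
  for (var i = 0; i < arr.length; i++) {
    if (typeof arr[i] === "string" && arr[i].toLowerCase() === target) {
      return true;
    }
  }
  return false;
}
```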
Current MongoDB documentation states the following:
You may only have 1 geospatial index per collection, for now. While
MongoDB may allow to create multiple indexes, this behavior is
unsupported. Because MongoDB can only use one index to support a
single query, in most cases, having multiple geo indexes will produce
undesirable behavior.
However, when I create two geospatial indices in a collection (using Mongoose), they work just fine:
MySchema.index({
'loc1': '2d',
extraField1: 1,
extraField2: 1
});
MySchema.index({
'loc2': '2d',
extraField1: 1,
extraField2: 1
});
My question is this: while it seems to work, the MongoDB documentation says this could "produce undesirable behavior". So far, nothing undesirable has been discovered in either testing or use.
Should I be concerned about this? If so, what would you recommend as a workaround?
It is still not supported, so even though you can create two of them, it doesn't mean they are both actually used properly. I would investigate the explain output in the mongo shell and issue a few queries that use the loc1 and loc2 fields in a geospatial way. For example with:
use yourDbName
db.yourCollection.find( { loc1: { $nearSphere: [ 0, 0 ] } } ).explain();
and:
db.yourCollection.find( { loc2: { $nearSphere: [ 0, 0 ] } } ).explain();
And then compare what the explain information gives you. You will likely see that only the first created geo index is used for both searches. There are a few tickets in JIRA for this that you might want to vote on:
https://jira.mongodb.org/browse/SERVER-2331
https://jira.mongodb.org/browse/SERVER-3653