Set Firestore documents with partially merged fields - Firestore limitation - python-3.x

I am using the Python library of Firestore to communicate with Firestore.
I have now run into a limitation of Firestore and I am wondering if there is a way around it.
Imagine we have this map / Dict (dictVar1):
dictVar1 = {
    "testArray": ["Yes"],
    "testMap": {
        "test1": 1,
        "test2": 1
    }
}
To begin with, I stored my testMap in an array, but due to a Firestore query limitation (you can only have a single array-contains clause in a query), I changed my structure to a map instead (as you can see in the dictVar1 structure above). If Firestore queries did not have this limitation, I would not have changed my structure from an array.
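For reference, this is the sort of query the map structure makes possible (a rough sketch using the classic positional .where() form; the field values are only illustrative):
from google.cloud import firestore

db = firestore.Client.from_service_account_json("firebaseKeysDev.json")

# With the old array structure only ONE array_contains clause is allowed per query,
# so something like this is rejected by Firestore:
#
#   db.collection("test") \
#       .where("testArray", "array_contains", "test1") \
#       .where("testArray", "array_contains", "test2")
#
# With the map structure, several equality filters on the map's keys are fine:
query = (
    db.collection("test")
    .where("testMap.test1", "==", 1)
    .where("testMap.test2", "==", 1)
)
results = list(query.stream())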
Now I am facing another Firestore limitation due to the new structure.
What I would like to do & other conditions:
I want to add this map / dict to a Firestore document.
I would like to do it in one Firestore operation using Firestore batch
I don't know if the document exists or not before updating/creating
One batch can contain anything between 1 and 500 operations
If the document exists, I do not want to remove any other fields from the existing document if these fields are not present in dictVar1 dict / map.
The fields in dictVar1 dict / map should replace the fields in the document completely
So if the existing document contained this data:
{
    "doNotChange": "String",
    "testMap": {
        "test0": 1
    }
}
It would be updated to the following ("test0" is removed from the inner map, which is basically how an array field would behave):
{
    "doNotChange": "String",
    "testArray": ["Yes"],
    "testMap": {
        "test1": 1,
        "test2": 1
    }
}
And if the document doesn't exist, the document would be set to:
{
    "testArray": ["Yes"],
    "testMap": {
        "test1": 1,
        "test2": 1
    }
}
I see two ways to do this:
Do this in two operations
Instead of using testMap as a map, replace it with an array.
99% of the time the document exists, so I am fine with doing this in two operations when the document doesn't exist, and in one operation when it does.
This could be done using Firestore's update function, but since I am using batch and potentially updating hundreds of documents in one batch, a single missing document would fail the whole batch operation.
Another potential solution would be to:
Run the batch with updates; if it succeeds, great. If a 404 (document not found) is raised, then:
Change the operation for that document from update to set and redo the batch, looping until the batch succeeds.
Two potential problems I see with this:
Will I be fully charged for all the failed batch operations, or will I just be charged one read per failed batch operation? If I get fully charged for the batch, then this is still not a good solution.
Is it possible to easily change the operation type for a specific document reference to a different operation type without having to recreate the batch operation totally from scratch?
Do you have any ideas on how I could solve one of these problems?
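For what it's worth, here is roughly what that second idea looks like as a sketch (it assumes batch.commit() raises google.api_core.exceptions.NotFound when one of the updated documents is missing; docsToWrite and useSetInstead are hypothetical names, and the part marked as the open question is exactly what I am asking about):
from google.api_core import exceptions
from google.cloud import firestore

db = firestore.Client.from_service_account_json("firebaseKeysDev.json")

# Hypothetical input: document references paired with the data whose fields
# should fully replace the corresponding fields in the document.
docsToWrite = [
    (db.collection("test").document("testDoc"),
     {"testArray": ["Yes"], "testMap": {"test1": 1, "test2": 1}}),
]
useSetInstead = set()  # document paths that should be written with set() instead of update()

while True:
    batch = db.batch()
    for ref, data in docsToWrite:
        if ref.path in useSetInstead:
            batch.set(ref, data)      # document known to be missing
        else:
            # update() replaces the named fields wholesale (including testMap),
            # but fails if the document does not exist
            batch.update(ref, data)
    try:
        batch.commit()
        break
    except exceptions.NotFound:
        # The open question: I would need to know *which* document was missing so
        # that only its operation is switched from update to set before redoing
        # the batch. Until then, just give up here.
        raise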
Here is the Python code to test out:
from json import dumps
from google.cloud import firestore

db = firestore.Client.from_service_account_json("firebaseKeysDev.json")

originalDoc = {
    "doNotChange": "String",
    "testMap": {
        "test0": 1
    }
}

dictVar1 = {
    "testArray": ["Yes"],
    "testMap": {
        "test1": 1,
        "test2": 1
    }
}

prefOutput = {
    "doNotChange": "String",
    "testArray": [
        "Yes"
    ],
    "testMap": {
        "test1": 1,
        "test2": 1
    }
}

# Let's first create the document with the original dict / map
originalSetOp = db.collection("test").document("testDoc").set(originalDoc)

# Now let's get the original map / dict from Firestore
originalOpDoc = db.collection("test").document("testDoc").get()
# Convert to Python dict
originalOpDocDict = originalOpDoc.to_dict()

# Now let's print out the original document dict
print("Here is the original map:")
print(dumps(originalOpDocDict, ensure_ascii=False, sort_keys=True, indent=4))

# Print the map / dict we want to merge in
print("\nHere is the map we want to merge:")
print(dumps(dictVar1, ensure_ascii=False, sort_keys=True, indent=4))

# Now let's merge the original dict / map with our dictVar1 dict / map
mergeDictVar1WithODoc = db.collection("test").document("testDoc").set(dictVar1, merge=True)

# Now let's get the new merged map / dict from Firestore
newDictDoc = db.collection("test").document("testDoc").get()
# Convert to Python dict
newDictDocDict = newDictDoc.to_dict()

# Let's print the new merged dict / map
print("\nHere is the merged map:")
print(dumps(newDictDocDict, ensure_ascii=False, sort_keys=True, indent=4))

print("\nHere is the output we want:")
print(dumps(prefOutput, ensure_ascii=False, sort_keys=True, indent=4))
Output:
Here is the original map:
{
    "doNotChange": "String",
    "testMap": {
        "test0": 1
    }
}

Here is the map we want to merge:
{
    "testArray": [
        "Yes"
    ],
    "testMap": {
        "test1": 1,
        "test2": 1
    }
}

Here is the merged map:
{
    "doNotChange": "String",
    "testArray": [
        "Yes"
    ],
    "testMap": {
        "test0": 1,
        "test1": 1,
        "test2": 1
    }
}

Here is the output we want:
{
    "doNotChange": "String",
    "testArray": [
        "Yes"
    ],
    "testMap": {
        "test1": 1,
        "test2": 1
    }
}

You can try using .set() with SetOptions of merge or mergeFields instead of .update() - the field, in this case, would be your map.
Specifically, .set() will create a document if it doesn't exist. It seems (I'm not on the Firebase team) that the purpose of .update() failing is to signal that the document doesn't already exist.
I use this extensively in a wrapper library I created for Firestore in my app.
Documented Here
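As a rough illustration of that suggestion in the Python client (a sketch, not a tested solution to the question's exact case: merge accepts either True or a list of field paths, and whether listing "testMap" replaces the whole map rather than deep-merging it is worth verifying against your client version):
from google.cloud import firestore

db = firestore.Client.from_service_account_json("firebaseKeysDev.json")

dictVar1 = {
    "testArray": ["Yes"],
    "testMap": {"test1": 1, "test2": 1}
}

batch = db.batch()
ref = db.collection("test").document("testDoc")

# set() creates the document if it is missing, so a batch of these cannot fail
# with "document not found" the way update() does.
# Passing a list of field paths (the mergeFields-style option) only touches the
# named fields and leaves e.g. "doNotChange" alone.
batch.set(ref, dictVar1, merge=["testArray", "testMap"])

batch.commit()
Note that plain merge=True deep-merges nested maps (which is what produces the unwanted "test0" in the question's output), so the behaviour with an explicit field list is the thing to check here.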

Related

MongoDB: Searching a text field using mathematical operators

I have documents in a MongoDB as below -
[
    {
        "_id": "17tegruebfjt73efdci342132",
        "name": "Test User1",
        "obj": "health=8,type=warrior",
    },
    {
        "_id": "wefewfefh32j3h42kvci342132",
        "name": "Test User2",
        "obj": "health=6,type=magician",
    }
    .
    .
]
I want to run a query say health>6 and it should return the "Test User1" entry. The obj key is indexed as a text field so I can do {$text:{$search:"health=8"}} to get an exact match but I am trying to incorporate mathematical operators into the search.
I am aware of the $gt and $lt operators, however, it cannot be used in this case as health is not a key of the document. The easiest way out is to make health a key of the document for sure, but I cannot change the document structure due to certain constraints.
Is there any way this can be achieved? I am aware that Mongo supports running JavaScript code; I'm not sure if that can help in this case.
I don't think it's possible with a $text search index, but you can transform your object conditions into an array of objects using an aggregation query:
$split to split obj by "," and it will return an array
$map to iterate loop of the above split result array
$split to split current condition by "=" and it will return an array
$let to declare the variable cond to store the result of the above split result
$first to return the first element from the above split result in k as a key of condition
$last to return the last element from the above split result in v as a value of the condition
Now we have an array of condition objects, with keys and values as strings:
"objTransform": [
{ "k": "health", "v": "9" },
{ "k": "type", "v": "warrior" }
]
$match condition for key and value to match in the same object using $elemMatch
$unset to remove transform array objTransform, because it's not needed
db.collection.aggregate([
  {
    $addFields: {
      objTransform: {
        $map: {
          input: { $split: ["$obj", ","] },
          in: {
            $let: {
              vars: {
                cond: { $split: ["$$this", "="] }
              },
              in: {
                k: { $first: "$$cond" },
                v: { $last: "$$cond" }
              }
            }
          }
        }
      }
    }
  },
  {
    $match: {
      objTransform: {
        $elemMatch: {
          k: "health",
          v: { $gt: "8" }
        }
      }
    }
  },
  { $unset: "objTransform" }
])
Playground
A second, lighter version of the above aggregation query that does less work in the condition transformation, if you can manage the rest on your client side:
$split to split obj by "," and it will return an array
$map to iterate loop of the above split result array
$split to split current condition by "=" and it will return an array
Now we have a nested array of string conditions:
"objTransform": [
["type", "warrior"],
["health", "9"]
]
$match condition for key and value using $elemMatch on the array element; "0" matches the first position of the array and "1" matches the second position
$unset to remove transform array objTransform, because it's not needed
db.collection.aggregate([
  {
    $addFields: {
      objTransform: {
        $map: {
          input: { $split: ["$obj", ","] },
          in: { $split: ["$$this", "="] }
        }
      }
    }
  },
  {
    $match: {
      objTransform: {
        $elemMatch: {
          "0": "health",
          "1": { $gt: "8" }
        }
      }
    }
  },
  { $unset: "objTransform" }
])
Playground
Using JavaScript is one way of doing what you want. Below is a find that uses the index on obj by finding documents that have health= text followed by an integer (if you want, you can anchor that with ^ in the regex).
It then uses a JavaScript function to parse out the actual integer after substringing your way past the health= part, doing a parseInt to get the int, and then the comparison operator/value you mentioned in the question.
db.collection.find({
  // use the index on obj to potentially speed up the query
  "obj": /health=\d+/,
  // now apply a function to narrow down and do the math
  $where: function() {
    var i = this.obj.indexOf("health=") + 7;
    var s = this.obj.substring(i);
    var m = s.match(/\d+/);
    if (m)
      return parseInt(m[0]) > 6;
    return false;
  }
})
You can of course tweak it to your heart's content to use other operators.
NOTE: I'm using the JavaScript regex capability, which may not be supported by your MongoDB version. I used Mongo Shell r4.2.6, where it is supported. If it isn't supported in your environment, you will have to extract the integer in the JavaScript a different way.
I provided a Mongo Playground to try it out in if you want to tweak it, but you'll get
Invalid query:
Line 3: Javascript regex are not supported. Use "$regex" instead
until you change it to account for the regex issue noted above. Still, if you're using the latest and greatest, this shouldn't be a limitation.
Performance
Disclaimer: This analysis is not rigorous.
I ran two queries against a small collection (a bigger one could possibly have resulted in different results) with Explain Plan in MongoDB Compass. The first query is the one above; the second is the same query, but with the obj filter removed.
The plans are different: the number of documents examined is fewer for the first query, and the first query uses the index.
The execution times are meaningless because the collection is small. The results do seem to square with the documentation, but the documentation seems a little at odds with itself. Here are two excerpts
Use the $where operator to pass either a string containing a JavaScript expression or a full JavaScript function to the query system. The $where provides greater flexibility, but requires that the database processes the JavaScript expression or function for each document in the collection.
and
Using normal non-$where query statements provides the following performance advantages:
MongoDB will evaluate non-$where components of query before $where statements. If the non-$where statements match no documents, MongoDB will not perform any query evaluation using $where.
The non-$where query statements may use an index.
I'm not totally sure what to make of this, TBH. As a general solution it might be useful because it seems you could generate queries that can handle all of your operators.

DynamoDB, dynamic atomic update of mapped values with AWS Lambda (NodeJS runtime)

I am trying to figure out how I could perform atomic updates on an item where the source data contains mapped values with the keys of those maps being dynamic.
If you look at the sample data below, I am trying to figure out how I could do atomic updates of the values in BSSentDestIp and BSRecvDestIp over the same item. I was reading the documentation but the only thing I could find was list_append, which would leave me with a list of appended keys/values that I would need to traverse and sum later.
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.UpdateExpressions.html
Example of input data:
{
    "RecordId": 31,
    "UUID": "170ae748-f8cf-4df9-6e08-c0c8a5f029d4",
    "UserId": "username",
    "DeviceId": "e0:cb:4e:53:ae:ff",
    "ExpireTime": 1501445446,
    "StartTime": 1501441846,
    "EndTime": 1501441856,
    "MinuteId": 10,
    "PacketCount": 1028,
    "ByteSum": 834111,
    "BSSent": 98035,
    "BSRecv": 736076,
    "BSSentDestIp": {
        "151.101.129.69": 2518,
        "192.168.1.254": 4780,
        "192.168.1.80": 14089,
        "192.33.31.162": 2386,
        "54.239.30.232": 21815,
        "54.239.31.129": 6423,
        "54.239.31.69": 3255,
        "54.239.31.83": 18447,
        "98.138.253.109": 3020
    },
    "BSRecvDestIp": {
        "151.101.129.69": 42414,
        "151.101.57.174": 20792,
        "192.230.66.108": 130175,
        "192.33.31.162": 56398,
        "23.194.140.100": 26209,
        "54.239.26.209": 57210,
        "54.239.31.129": 188747,
        "54.239.31.69": 41115,
        "98.138.253.109": 111775
    }
}
NodeJS function executed via Lambda to update Dynamo:
function updateItem(UserIdValue, MinuteIdValue) {
    var UpdateExpressionString = "set PacketCount = PacketCount + :PacketCount, \
        ByteSum = ByteSum + :ByteSum, \
        BSSent = BSSent + :BSSent, \
        BSRecv = BSRecv + :BSRecv";
    var params = {
        TableName: gDynamoTable,
        Key: {
            "UserId": UserIdValue,
            "MinuteId": MinuteIdValue
        },
        UpdateExpression: UpdateExpressionString,
        ExpressionAttributeValues: {
            ":PacketCount": gRecordObject.PacketCount,
            ":ByteSum": gRecordObject.ByteSum,
            ":BSSent": gRecordObject.BSSent,
            ":BSRecv": gRecordObject.BSRecv
        },
        ReturnValues: "UPDATED_NEW"
    };
    dynamo.updateItem(params, function(err, data) {
        if (err) {
            console.log("updateItem Error: " + err);
        } else {
            console.log("updateItem Success: " + JSON.stringify(data));
        }
    });
}
Updating a single item is atomic in DynamoDB: if you read an item and then call PutItem, the write is guaranteed to be atomic. It either updates all fields or none of them.
The only issue I see with that is that you can get write conflicts. Say one process reads an item and updates one map while another process does the same thing in parallel; one PutItem will overwrite the other's recent update and you can lose data.
To solve this you can use conditional updates. In a nutshell, they allow you to update an item only if a specified condition is met. What you can do is maintain a version number with every item. When you update an item you increment a version attribute, and when you write the item, you check that the version number is the one you expect. Otherwise you need to read the item again (somebody updated it while you were working with it), perform your update again, and try the write again.
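For illustration, here is a minimal sketch of that optimistic-locking pattern in Python with boto3 (the table name and "Version" attribute are made up for the example, and it assumes the item already exists; the same ConditionExpression works from the Node.js SDK used in the question):
import boto3
from botocore.exceptions import ClientError

# Hypothetical table holding the items from the question.
table = boto3.resource("dynamodb").Table("TrafficStats")

def merge_sent_dest_ip(user_id, minute_id, new_counts, retries=5):
    """Read-modify-write of the BSSentDestIp map, guarded by a version attribute."""
    for _ in range(retries):
        item = table.get_item(Key={"UserId": user_id, "MinuteId": minute_id})["Item"]
        expected = item.get("Version", 0)

        # Merge the per-IP counters locally.
        merged = dict(item.get("BSSentDestIp", {}))
        for ip, count in new_counts.items():
            merged[ip] = merged.get(ip, 0) + count
        item["BSSentDestIp"] = merged
        item["Version"] = expected + 1

        try:
            # The write only succeeds if nobody bumped Version in the meantime.
            table.put_item(
                Item=item,
                ConditionExpression="attribute_not_exists(#v) OR #v = :expected",
                ExpressionAttributeNames={"#v": "Version"},
                ExpressionAttributeValues={":expected": expected},
            )
            return
        except ClientError as err:
            if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise
            # Lost the race: re-read the item and try again.
    raise RuntimeError("Gave up after repeated write conflicts")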

Using MapReduce for results of geospatial indexes in Cloudant

I am using a geospatial index in Cloudant for retrieving all documents inside a polygon. Now I want to calculate some basic static values for those documents (e.g. average age and sum of earnings in a region).
Is it possible to query the geo index and then pass the result on to the MapReduce function?
How can I achieve this, preferably inside the database? Can I avoid querying for the document ids inside the polygon first and then sending the retrieved ids off to perform the MapReduce (I am working with large data sets)?
What is working so far is querying the index as well as using the view (separately).
My geo index
function (doc) {
    if (doc.geometry && doc.geometry.coordinates) {
        st_index(doc.geometry);
    }
}
My view
function (doc) {
    var beitrag = doc.properties.beitrag;
    var schadenaufwand = doc.schadenaufwand;
    if (beitrag !== null && typeof beitrag === 'number') {
        emit(doc._id, doc.properties.beitrag);
    }
}
A sample geoJson document (original data looks similar)
{
    "_id": "01bff77f642fc4249e787d2ded011504",
    "_rev": "1-25a9a1a15939d5b21af3fbcc5c2d6ed1",
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [
            7.2316,
            40.99
        ]
    },
    "properties": {
        "age": 34,
        "earnings": 982.7
    }
}
This question is similar, but did not really help me: Cloudant - apply a view/mapReduce to a geospatial query
This demo could be something in the right direction: https://examples.cloudant.com/simplegeo_places/_design/geo/index.html
It seems like it would be a useful feature, but the answer to this is 'no'. The Geo indexer can't perform aggregations over the data.
I think you'll have to do as you were thinking -- use the returned list of doc ids to distribute the calculation in another map-reduce system.
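As a rough sketch of that client-side fallback (the URL shape, query parameters, design document and index names here are assumptions based on the Cloudant Geo HTTP API and the sample document above, so adjust them to your setup):
import requests

ACCOUNT = "https://myaccount.cloudant.com"   # hypothetical account URL
DB = "mydatabase"                            # hypothetical database name
POLYGON = "POLYGON((7.1 40.9, 7.3 40.9, 7.3 41.1, 7.1 41.1, 7.1 40.9))"

# Query the geo index for everything inside the polygon, asking for the docs too.
resp = requests.get(
    f"{ACCOUNT}/{DB}/_design/geodd/_geo/geoidx",
    params={"g": POLYGON, "relation": "intersects", "include_docs": "true"},
    auth=("apikey", "apipassword"),          # hypothetical credentials
)
resp.raise_for_status()
docs = [row["doc"] for row in resp.json()["rows"]]

# Compute the statistics outside the database.
ages = [d["properties"]["age"] for d in docs if "age" in d.get("properties", {})]
earnings = [d["properties"].get("earnings", 0) for d in docs]

print("documents in polygon:", len(docs))
print("average age:", sum(ages) / len(ages) if ages else None)
print("sum of earnings:", sum(earnings))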

MongoDB update/insert document and Increment the matched array element

I use Node.js and MongoDB with monk.js and I want to do the logging in a minimal way, with one document per hour, like:
final doc:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1 }, {action: action2, count: 27 }, {action: action3, count: 5 } ] }
The complete document should be created by incrementing one value.
E.g. someone visits a webpage first in this hour, and the increment of action1 should create the following document with a query:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1} ] }
Another user visits another webpage in this hour and the document should be extended to:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1}, {action: action2, count: 1} ] }
The values in count should be incremented as the different webpages are visited.
At the moment I create a doc for each action:
tracking.update({
    time: moment().format('YYYY-MM-DD_HH'),
    action: action,
    info: info
}, { $inc: { count: 1 } }, { upsert: true }, function (err) {});
Is this possible with monk.js / mongodb?
EDIT:
Thank you. Your solution looks clean and elegant, but it looks like my server can't handle it, or I am too much of a noob to make it work.
I wrote an extremely dirty solution with the action name as key:
tracking.update({ time: time, ts: ts }, JSON.parse('{ "$inc": { "' + action + '": 1 } }'), { upsert: true }, function (err) {});
Yes, it is very possible, and it is a well considered question. The only variation I would make on the approach is to calculate the "time" value as a real Date object (quite useful in MongoDB, and easy to manipulate as well), simply "rounding" the values with basic date math. You could use "moment.js" for the same result, but I find the math simple.
The other main consideration here is that mixing array "push" actions with possible "upsert" document actions can be a real problem, so it is best to handle this with "multiple" update statements, where only the condition you want is going to change anything.
The best way to do that, is with MongoDB Bulk Operations.
Consider that your data comes in something like this:
{ "timestamp": 1439381722531, "action": "action1" }
Where the "timestamp" is an epoch timestamp value acurate to the millisecond. So the handling of this looks like:
// Just adding for the listing, assuming already defined otherwise
var payload = { "timestamp": 1439381722531, "action": "action1" };

// Round to hour
var hour = new Date(
    payload.timestamp - ( payload.timestamp % ( 1000 * 60 * 60 ) )
);

// Init transaction
var bulk = db.collection.initializeOrderedBulkOp();

// Try to increment where array element exists in document
bulk.find({
    "time": hour,
    "log.action": payload.action
}).updateOne({
    "$inc": { "log.$.count": 1 }
});

// Try to upsert where document does not exist
bulk.find({ "time": hour }).upsert().updateOne({
    "$setOnInsert": {
        "log": [{ "action": payload.action, "count": 1 }]
    }
});

// Try to "push" where array element does not exist in matched document
bulk.find({
    "time": hour,
    "log.action": { "$ne": payload.action }
}).updateOne({
    "$push": { "log": { "action": payload.action, "count": 1 } }
});

bulk.execute();
So if you look through the logic there, you will see that it is only ever possible for "one" of those statements to be true for any given state of the document, whether it exists or not. Technically speaking, the statement with the "upsert" can actually match a document when it exists; however, the $setOnInsert operation used makes sure that no changes are made unless the action actually "inserts" a new document.
Since all operations are fired in "Bulk", the only time the server is contacted is on the .execute() call. So there is only "one" request to the server and only "one" response, despite the multiple operations.
In this way the conditions are all met:
Create a new document for the current period where one does not exist and insert initial data to the array.
Add a new item to the array where the current "action" classification does not exist and add an initial count.
Increment the count property of the specified action within the array upon execution of the statement.
All in all, yes, possible, and also a great idea for storage as long as the action classifications do not grow too large within a period (500 array elements should be used as a maximum guide); the updating is very efficient and self-contained within a single document for each time sample.
The structure is also nice and well suited to other query and possibly additional aggregation purposes as well.

$addToSet and return all new items added?

Is it possible to $addToSet and determine which items were added to the set?
i.e. $addToSet tags to a post and return which ones were actually added
Not really, and not with a single statement. The closest you can get is the findAndModify() method, comparing the original document form to the fields that you submitted in your $addToSet statement:
So considering an initial document:
{
    "fields": [ "B", "C" ]
}
And then processing this code:
var setInfo = [ "A", "B" ];
var matched = [];

// findAndModify returns the document as it was *before* the update by default
var doc = db.collection.findAndModify({
    query: { "_id": "myid" },
    update: {
        "$addToSet": { "fields": { "$each": setInfo } }
    }
});

// collect the submitted values that were already present;
// whatever is left over in setInfo is what $addToSet actually added
doc.fields.forEach(function(field) {
    if ( setInfo.indexOf(field) != -1 ) {
        matched.push(field);
    }
});

printjson(matched);
So that is a basic JavaScript abstraction of the methods and not actually nodejs general syntax for either the native node driver or the Mongoose syntax, but it does describe the basic premise.
So as long as you are using a "default" implementation method that returns the "original" state of the document before it was modified, then you can play "spot the difference", as it were, as shown in the code example.
But doing this over general "update" operations is just not possible, as they are designed to possibly affect one or more objects and never return this detail.
