Scenario:
Some code is listening on collection A for changes. When one occurs, it does some calculation and updates collection B.
Time between changes in A: 20-50ms.
Time for actual calculation: 20-30ms.
Time for roundtrip sending updates to firebase: 250-300ms.
So the code is something like this:
const runUpdates = async (snapshot) => {
  const inputData = (snapshot && snapshot.exists() && snapshot.val()) || undefined
  if (inputData) {
    const calculatedData = calculateStuff(inputData)
    await firebase.database().ref().update({ 'collectionB': calculatedData })
  }
}

firebase.database().ref('collectionA').on('value', runUpdates)
I'm using Firebase Realtime Database.
Actual question:
Does the firebase package (using its local cache or any other means necessary) assure that the updates reach Firebase in the same order that I issued them in my code, or do I need to await every update before I can move on to my next computation & update?
More details:
There is a mechanism in place for cases when a trigger event arrives but the calculation/update is not yet finished. I'm purposefully ignoring that for clarity.
I'm trying to improve this code, and it seems that in many cases the calculation is relatively short, but then I need to wait for the response from Firebase before I can start the next calculation.
I've been told that the firebase package keeps a local cache (on the server running my code), that my update command applies the change to that cache first (and is therefore "immediate"), and then works to propagate the change to Firebase itself. Sequential updates would be propagated the same way, with their order assured.
(Needless to say, I tried looking around for this info in the docs etc)
Queries to Realtime Database are pipelined over a single socket connection. The results will be delivered in the order that the queries were issued.
If you need to know when the results of a write have been fully committed to the server, you will need to pay attention to the promise returned by update(). That promise will be fulfilled only after the write completes on the server, not merely when the changes are available locally.
Whether you use await or then on that promise doesn't really matter. Either way, you will know the result of the update.
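As a minimal sketch of what that means for the question's code (calculateStuff and the refs are taken from the question, the rest is illustrative):

const runUpdates = (snapshot) => {
  const inputData = (snapshot && snapshot.exists() && snapshot.val()) || undefined
  if (!inputData) return
  const calculatedData = calculateStuff(inputData)
  // Not awaited: the write is queued on the same socket, so it reaches
  // the server after any earlier writes, in order.
  const pending = firebase.database().ref().update({ 'collectionB': calculatedData })
  // Only the fulfilled promise tells you the server has committed the write.
  pending.catch((err) => console.error('update failed:', err))
}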
Related
The title might be misleading (I couldn't come up with a better title to be honest) so please read my explanation:
Let's say we are trying to create a user and also update the cache:
1. Create user and insert into the database.
2. Update the cache with the created user.
OR we are trying to publish an event after the user is created (for example in microservices):
1. Create user and insert into the database.
2. Publish an event with the created user.
OR we are trying to do n things and we want to ensure either all of them complete or none do:
1. Create user and insert into the database.
2. Update the cache.
3. Send an email.
4. Send an SMS.
5. Publish an event, ... (the list goes on)
In a perfect world with no failures, we could just run these in order and be done, but what happens when we have a failure after user creation completes (before adding to the cache, sending the event, etc.)?
These examples are made up; the following one illustrates the cache case:
const data = {
  id: 1
};

const user = database.createUser(data);
// Power goes out here (or any kind of failure)
cache.setCache(user);
Here, we've successfully created the user but failed to update the cache.
Let's give another example using database transactions:
const data = {
  id: 1
};

const transaction = database.startTransaction();
try {
  const user = database.createUser(data);
  cache.setCache(user);
  // Power goes out here (or any kind of failure)
  transaction.commit();
} catch (err) {
  transaction.rollback();
}
Here, we've successfully updated the cache, but the user was never created because of the failure.
Thank you for your time.
When working with microservices, the usual ACID transactions that we are used to working with won't apply. Instead you could have a look at BASE transactions.
See here : https://www.johndcook.com/blog/2009/07/06/brewer-cap-theorem-base/
An alternative to ACID is BASE:
Basic Availability
Soft-state
Eventual consistency
Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state. (Accounting systems do this all the time. It’s called “closing out the books.”) It’s OK to use stale data, and it’s OK to give approximate answers.
Technically it means that you're going to have to find a clean way to deal with failure, for example by sending events in case of failure (which means the user you created should be removed from the cache, or you could even send an email saying there's been an error).
We often see examples of this in order or payment systems, where you can receive an email saying that the order could not be processed.
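As a rough sketch of that idea applied to the question's cache example (createUser and setCache are the question's made-up APIs; the retryQueue used for the compensating action is hypothetical):

async function registerUser(data) {
  // The single authoritative write: if this fails, nothing else happens.
  const user = await database.createUser(data);
  try {
    await cache.setCache(user);
  } catch (err) {
    // The user exists but the cache update failed: don't roll back,
    // schedule a compensating/retry action and stay eventually consistent.
    await retryQueue.enqueue({ action: 'SET_CACHE', payload: user });
  }
  return user;
}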
How does one cache database data in a Firebase function?
Based on this SO answer, firebase caches data for as long as there is an active listener.
Considering the following example:
exports.myTrigger = functions.database.ref("some/data/path").onWrite((data, context) => {
  var dbRootRef = data.after.ref.root;
  // attach a no-op listener so the data stays cached
  dbRootRef.child("another/data/path").on("value", function() {});
  return dbRootRef.child("another/data/path").once("value").then(function(snap) { /* process data */ });
});
This will cache the data, but the question is: is this a valid approach for the server side? Should I call .off() at some point so it doesn't cause problems, since this can scale quickly and produce tons of .on() listeners? Or is it OK to keep .on() indefinitely?
Since active data is kept in memory, your code will keep a snapshot of the latest data at another/data/path in memory as long as the listener is active. Since you never call off in your code, that will be as long as the container that runs the function is active, not just for the duration that this function is active.
Even if you have other Cloud Functions in that container, and those other functions don't need this data, it'll still be using memory.
If that is the behavior you want, then it's a valid approach. I'd just recommend doing a cost/benefit analysis, because I expect this may lead to hard-to-understand behavior at some point.
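For example, if you decide against the container-lifetime cache, a sketch of the tear-down variant (same paths as the question; assumes a Node runtime with Promise.prototype.finally) could look like this:

exports.myTrigger = functions.database.ref("some/data/path").onWrite((data, context) => {
  var otherRef = data.after.ref.root.child("another/data/path");
  var keepWarm = function() {}; // no-op listener that keeps the data cached
  otherRef.on("value", keepWarm);
  return otherRef.once("value")
    .then(function(snap) { /* process data */ })
    .finally(function() {
      otherRef.off("value", keepWarm); // release the cache when this invocation is done
    });
});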
I am running a transaction to update an item that needs to be stored in two keys. To accomplish this, I have set up a nested transaction as follows, and it seems to run as expected:
firebaseOOO.child('relationships/main').child(accountID).child(friendAccountID).transaction(function(data) {
  data = data || {};
  data.prop = 'newval';
  // mirror the same write under the reversed key pair
  firebaseOOO.child('relationships/main').child(friendAccountID).child(accountID).transaction(function(mirror) {
    mirror = mirror || {};
    mirror.prop = 'newval';
    return mirror;
  });
  return data;
});
Are there any gotchas or possible unexpected implications to this? I am most worried about getting stuck in some sort of transaction loop under load, where each transaction cancels the other out forcing them both to restart, or similar.
Is there a better way of doing this?
I am using the NodeJS client.
You probably don't want to start another transaction from within the callback to the first one. There is no guarantee as to how many times the function for your first transaction will run, particularly if there is a lot of contention at the location you are trying to update.
A better solution, which I believe you hit on in your other question, is to start the second transaction from the completion callback, after checking that the first one committed.
Is this a "proper" way to run Firebase transactions that depend on each other sequentially using the NodeJS client:
ref.child('relationships/main').child(accountID).transaction(function(data) {
  // modify data as needed
  return data;
}, function(error, committed, snapshot) {
  if (error) { /* handle the error */ }
  else if (!committed) { /* transaction aborted */ }
  else {
    runNextTransaction();
  }
});
Originally I was going to put runNextTransaction() in the core function because transactions first run locally, but wouldn't that then hold open the original transaction until the last transaction in the chain is complete, possibly causing issues? (Also I need good data for the next step so I would have to handle collisions before moving on.)
Transactions run asynchronously, so kicking off the next transaction from within the first one would work, but it may not do what you want. Transaction functions can run more than once, and you likely don't want to initiate multiple secondary transactions in that case. What you have looks like the right way to do serial transactions. If you're interested in making things a little cleaner, especially if you're going to chain multiple transactions, consider looking into Promises.
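A sketch of that Promise approach, wrapping the callback-style transaction() from the question (runNextTransaction and the update function are placeholders; newer SDKs return a promise directly):

function runTransaction(ref, updateFn) {
  return new Promise(function(resolve, reject) {
    ref.transaction(updateFn, function(error, committed, snapshot) {
      if (error) reject(error);
      else resolve({ committed: committed, snapshot: snapshot });
    });
  });
}

runTransaction(ref.child('relationships/main').child(accountID), function(data) {
  // modify data as needed
  return data;
}).then(function(result) {
  if (result.committed) return runNextTransaction();
}).catch(function(error) {
  console.error(error);
});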
I'm playing around with node.js and redis and installed the hiredis library via this command
npm install hiredis redis
I looked at the multi examples here:
https://github.com/mranney/node_redis/blob/master/examples/multi2.js
At line 17 it says
// you can re-run the same transaction if you like
which implies that the internal multi.queue object is never cleared once the commands have finished executing.
My question is: How would you handle the situation in an http environment? For example, tracking the last connected user (this doesn't really need multi as it just executes one command but it's easy to follow)
var http = require('http');
var redis = require('redis');

var client = redis.createClient();
var multi = client.multi();

http.createServer(function (request, response) {
  multi.set('lastconnected', request.ip); // won't work, just an example
  multi.exec(function (err, replies) {
    console.log(replies);
  });
}).listen(8080);
In this case, multi.exec would send 1 command for the first connected user, and 100 commands for the 100th user (because the internal multi.queue object is never cleared).
Option 1: Should I create the multi object inside the http.createServer callback function, which would effectively kill it at the end of the function's execution? How expensive, in terms of CPU cycles, would creating and destroying this object be?
Option 2: The other option would be to create a new version of multi.exec(), something like multi.execAndClear() which will clear the queue the moment redis executed that bunch of commands.
Which option would you take? I suppose option 1 is better - we're killing one object instead of cherry picking parts of it - I just want to be sure as I'm brand new to both node and javascript.
The multi objects in node_redis are very inexpensive to create. As a side-effect, I thought it would be fun to let you re-use them, but this is obviously only useful under some circumstances. Go ahead and create a new multi object every time you need a new transaction.
One thing to keep in mind is that you should only use multi if you actually need all of the operations to execute atomically in the Redis server. If you just want to batch up a series of commands efficiently to save network bandwidth and reduce the number of callbacks you have to manage, just send the individual commands, one after the other. node_redis will automatically "pipeline" these requests to the server in order, and the individual command callbacks, if any, will be invoked in order.
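A sketch of both suggestions against the question's example (the port number is arbitrary):

var http = require('http');
var redis = require('redis');

var client = redis.createClient();

http.createServer(function (request, response) {
  // Option 1: a fresh, cheap multi per request, when atomicity matters.
  var multi = client.multi();
  multi.set('lastconnected', request.connection.remoteAddress);
  multi.exec(function (err, replies) {
    console.log(replies);
    response.end();
  });
  // If atomicity isn't needed, skip multi entirely: plain commands are
  // pipelined to the server in order anyway, e.g.
  // client.set('lastconnected', request.connection.remoteAddress);
}).listen(8080);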