Updating and retrieving keys in Redis - multithreading

I am using Redis key-value pairs to store data. The data for a particular key can change at any time, so after every retrieval request I asynchronously update the data stored under the requested key, so that the next request is served with updated data.
I have done quite a bit of testing, but I am still wondering whether there are cases where this approach might have negative consequences.
PS: The data is consolidated from multiple servers.
Thanks in advance for any help/suggestions.

If you already know the value to be stored, you can use GETSET (or a transaction if it is not a simple string type).
If the new value is some manipulation of the old value, i.e. f(value), you should do it in a Lua script.
Otherwise some other client might read the old value before you update it.
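A minimal sketch of the f(value) case follows. It assumes a connected node-redis v4 client, whose eval method takes the script plus a { keys, arguments } object (other clients, such as ioredis, use a different call shape). The "keep the larger value" rule and the function name setIfGreater are hypothetical stand-ins for whatever f is in your application.

```javascript
// The whole script runs atomically inside Redis, so no other client can
// read or write the key between the GET and the SET.
const SET_IF_GREATER = `
local current = tonumber(redis.call('GET', KEYS[1]) or '')
local candidate = tonumber(ARGV[1])
if not current or candidate > current then
  redis.call('SET', KEYS[1], ARGV[1])
  return 1
end
return 0
`;

// `client` is assumed to be a connected node-redis v4 client.
// Resolves to 1 if the key was updated and 0 if the old value was kept.
function setIfGreater(client, key, candidate) {
  return client.eval(SET_IF_GREATER, {
    keys: [key],
    arguments: [String(candidate)],
  });
}
```

When the new value does not depend on the old one, a single GETSET call is enough and no script is needed.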

Related

What is the best practice for storing rarely modified database values in NodeJS?

I've got a node app that works with Salesforce for a few different things. One of the features is letting users fill in a form and pushing it to Salesforce.
The form has a dropdown list, so I query Salesforce to get the list of available dropdown items and make them available to my form via res.locals. Currently I get these values via some middleware and store them in the user's session. On each request I check whether the session value is set: if it is, I use it; if not, I query Salesforce and pull the values in.
This works, but it means every user's session data in Mongo holds a whole bunch of picklist values (they are the same for all users). I very rarely change the values on the Salesforce side, so I'm wondering if there is a "proper" way of storing these values in my app?
I could pull them into a Mongo collection, and trigger a manual refresh of them whenever they change. I could expire them in Mongo (but realistically if they do need to change, it's because someone needs to access the new values immediately), so not sure that makes the most sense...
Is storing them in everyone's session the best way to tackle this, or is there something else I should be doing?
To answer your question quickly, you could add them to a singleton object (instead of session data, which is per user). But not sure how you will manage their lifetime (i.e. pull them again when they change). A singleton can be implemented using a simple script file that can be required which returns a simple object...
But if I was to do something like this, I would go about doing it differently:
I would create an API endpoint that returns your list data (possibly giving it a query parameters to return different lists).
If you can afford the data being outdated for a short period of time, you can have your API return a cacheable response (HTTP cache, with a short lifetime).
If your data has to be fresh in real time, your API should return an ETag header in its response. The ETag basically acts as a checksum for your data; a good checksum would be the "last updated date" across all the records in a collection. Upon receiving a request, you check for the "If-None-Match" header, which contains the checksum the client last saw. You then make a "lite" call to your database to pull just the current checksum. If it matches, you return HTTP 304 (Not Modified); otherwise you pull the full data and return it (alongside the new checksum in the response's ETag header). Basically you are letting the browser do the caching...
Note that you can also combine the two caching approaches above (short-lived HTTP cache and ETag) and use them together.
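The ETag flow described above can be sketched as a plain function, independent of any web framework. The names conditionalGet, loadChecksum and loadData are hypothetical, and the in-memory picklist object stands in for the database. The sketch compares a single strong ETag only; a real If-None-Match header may also carry several tags or weak validators.

```javascript
// Decide how to answer a GET, given the request's If-None-Match header.
// loadChecksum: cheap lookup (for example the latest "last updated" date).
// loadData: the expensive full query, only called when the checksum differs.
function conditionalGet(ifNoneMatch, loadChecksum, loadData) {
  const etag = `"${loadChecksum()}"`; // ETag values are quoted strings
  if (ifNoneMatch === etag) {
    return { status: 304, headers: { ETag: etag }, body: null };
  }
  return {
    status: 200,
    // Optional short-lived cache; drop Cache-Control if data must always be fresh.
    headers: { ETag: etag, 'Cache-Control': 'max-age=60' },
    body: JSON.stringify(loadData()),
  };
}

// Usage with hypothetical in-memory data standing in for the database.
const picklist = { updatedAt: '2020-01-01T00:00:00Z', items: ['A', 'B'] };
let fullQueries = 0;
const checksum = () => picklist.updatedAt;
const data = () => { fullQueries += 1; return picklist.items; };

const first = conditionalGet(undefined, checksum, data);
const second = conditionalGet(first.headers.ETag, checksum, data);
console.log(first.status, second.status, fullQueries); // 200 304 1
```

The second request costs only the checksum lookup, which is the point of the approach.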
More resources on this here:
https://devcenter.heroku.com/articles/increasing-application-performance-with-http-cache-headers
https://developers.facebook.com/docs/marketing-api/etags

Store and retrieve data from coordinates in Redis

I'm using Redis as a cache to avoid unnecessary calls to an external API. My idea: after getting a result from the API, store the coordinate (as the key, for example) together with the data in Redis. On a later search, before calling the external API again, check whether the new coordinate matches a saved coordinate (within some distance in meters, say), and if it does, return the stored data.
I have searched for at least an hour and could not find anything that fits my needs. For example, GEOADD does not help because its members are not expired automatically by Redis.
The only solution I can see is to store the coordinates as the key (example: -51.356156,-50.356945) with a JSON value, and check every key (coordinate) in application code to see whether it matches another coordinate. But that seems inelegant and would perform badly.
Any ideas?
I'm using Redis in NodeJS (Express).
If I understand correctly, you want to be able to:
1. Cache the API's response for a given tuple of coordinates
2. Perform an efficient radius search over the cached responses
3. Use Redis' expiration to invalidate old cache entries
To satisfy #1, you've already outlined the right approach - store each API call under its own key. You can name the key by your coordinates, or use their geohash value (computing it can be done in the client or with a temporary element in Redis). Also don't forget setting a TTL on that key and the global maxmemory eviction policy for eviction to actually work.
The 2nd requirement calls for using a Geo Set. Store the coordinates and the key names in it. Perform your query by calling GEORADIUS (or GEOSEARCH, which supersedes it from Redis 6.2 on) and then fetch the relevant keys according to the reply.
While fetching the keys from #2's query, you may find that some of them have been evicted from the keyspace but are still in the Geo Set. Call ZREM for each of these to keep a semblance of sync between your index (the Geo Set) and the keyspace. Additionally, you can run a periodic background task that walks the Geo Set with ZSCAN and does housekeeping. That should take care of #3.
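To make the flow concrete, here is a minimal in-memory sketch, not real Redis calls. The Map named keyspace stands in for keys with a TTL, the Map named geoIndex stands in for the Geo Set, and the comments note which Redis command each step corresponds to. All function names are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```javascript
const keyspace = new Map(); // key -> { value, expiresAt }
const geoIndex = new Map(); // key -> { lat, lon }

// Great-circle distance in meters (haversine formula).
function distanceMeters(a, b) {
  const R = 6371000;
  const rad = (deg) => (deg * Math.PI) / 180;
  const dLat = rad(b.lat - a.lat);
  const dLon = rad(b.lon - a.lon);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

function cachePut(lat, lon, value, ttlMs, now) {
  const key = `${lat},${lon}`;
  keyspace.set(key, { value, expiresAt: now + ttlMs }); // SET key value PX ttl
  geoIndex.set(key, { lat, lon });                      // GEOADD index lon lat key
}

function cacheGetNear(lat, lon, radiusMeters, now) {
  const hits = [];
  for (const [key, point] of geoIndex) {                // GEORADIUS index ...
    if (distanceMeters({ lat, lon }, point) > radiusMeters) continue;
    const entry = keyspace.get(key);                    // GET key
    if (!entry || entry.expiresAt <= now) {
      keyspace.delete(key);
      geoIndex.delete(key);                             // ZREM index key
    } else {
      hits.push(entry.value);
    }
  }
  return hits;
}

cachePut(-51.356156, -50.356945, { temp: 21 }, 1000, 0);
const fresh = cacheGetNear(-51.3562, -50.357, 100, 500);  // within TTL
const stale = cacheGetNear(-51.3562, -50.357, 100, 2000); // expired, index pruned
console.log(fresh.length, stale.length, geoIndex.size);
```

In Redis the expiry check disappears, because an expired key simply comes back as nil; the pruning of the index on a miss stays the same.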

Cloud Functions Http Request return cached Firebase database

I'm new to Node.js and Cloud Functions for Firebase, so I'll try to be specific in my question.
I have a Firebase database with objects that include a "score" field. I want the data to be retrieved ordered by that field, which can be done easily on the client side.
The issue is that if the database grows large, I'm worried that the query will take too long to return and/or consume a lot of resources. That's why I was thinking of an HTTP service using Cloud Functions that stores a cache of the top N objects, and uses a listener to update the cache whenever the score of any object changes.
Then, client side just has to call something like https://myexampleprojectroute/givemethetoplevels to receive a Json with the top N levels.
Is it reasonable? If so, how can I approach that? Which structures do I need to use this cache, and how to return them in json format via http?
At the moment I'll keep doing it client side but I'd really like to have that both for performance and learning purpose.
Thanks in advance.
EDIT:
In the end I did not implement the optimization. First, the Firebase database does not provide a "child count", so with my newbie JavaScript knowledge I didn't find a way to implement it. Second, and most important, I'm pretty sure the data won't scale up to millions (at most 10K entries), and Firebase has rules for optimizing sorted reads. For more information please check out this link.
Also, I'll post a simple code snippet to retrieve data from your database via http request using cloud-functions in case someone is looking for it. Hope this helps!
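// Simple test function to retrieve a JSON object from the DB.
// Warning: no security measures (authentication, request-method checks, etc.) are applied.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.request_all_levels = functions.https.onRequest((req, res) => {
  const ref = admin.database().ref('CustomLevels');
  ref.once('value')
    .then((snapshot) => res.status(200).json(snapshot.val()))
    // Report read failures instead of leaving the request hanging.
    .catch((err) => res.status(500).send(err.message));
});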
You're duplicating data upon writes, to gain better read performance. That's a completely reasonable approach. In fact, it is so common in NoSQL databases to keep such derived data structures that it even has a name: denormalization.
A few things to keep in mind:
While Cloud Functions run in a more predictable environment than the average client, the resources are still limited. So reading a huge list of items to determine the latest 10 items is still a suboptimal approach. For simple operations, you'll want to keep the derived data structure up to date on every write operation.
So if you have a "latest 10" and a new item comes in, you remove the oldest item and add the new one. With this approach you have at most 11 items to consider, compared to having your Cloud Function query the full list of items for the latest 10 upon every write, which is an O(something-with-n) operation.
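As a rough illustration of that "latest 10" bookkeeping, the sketch below keeps the derived list as a plain array. The function name addToLatest and the item shape are hypothetical; in a Cloud Function the array would be read from and written back to the denormalized location on each write.

```javascript
// Keep only the newest `limit` items. Each write touches at most limit + 1
// entries, no matter how many items the full list holds.
function addToLatest(latest, item, limit = 10) {
  const next = [...latest, item]; // newest goes last
  return next.length > limit ? next.slice(next.length - limit) : next;
}

let latest = [];
for (let i = 1; i <= 12; i += 1) {
  latest = addToLatest(latest, { id: i });
}
console.log(latest.map((x) => x.id)); // ids 3 through 12
```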
Same for an averaging operation: you'll find a moving average to be most performant, because it doesn't require any of the previous data.
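One way to read the averaging advice is an incrementally updated (cumulative) mean, which needs only the previous mean and the sample count rather than the earlier values. The sketch below assumes that reading; the state shape { count, mean } and the name addSample are hypothetical.

```javascript
// Update the mean from the previous state and the new value only.
function addSample(state, value) {
  const count = state.count + 1;
  const mean = state.mean + (value - state.mean) / count;
  return { count, mean };
}

let stats = { count: 0, mean: 0 };
for (const score of [10, 20, 30, 40]) {
  stats = addSample(stats, score);
}
console.log(stats); // { count: 4, mean: 25 }
```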

Multiple node instances with a single database

I'm currently writing a Node app and I'm thinking ahead in scaling. As I understand, horizontal scaling is one of the easier ways to scale up an application to handle more concurrent requests. My working copy currently uses MongoDb on the backend.
My question is thus this: I have a data structure that resembles a linked list that requires the order to be strictly maintained. My (imaginary) concern is that when there is a race condition to the database via multiple node instances, it is possible that the resolution of the linked list will be incorrect.
To give an example: imagine the server having this list a->b. Instance 1 comes in with object c and instance 2 comes in with object d. It is possible that there is a race condition in which both instances read a->b and decide to append their own objects to the list. Instance 1 will then imagine its insertion to be a->b->c while instance 2 thinks it's a->b->d, when the database actually holds a->b->c->d.
In general, this sounds like a job for optimistic locking. However, as I understand it, neither MongoDB nor Redis (the other database that I am considering) does transactions in the SQL manner.
I therefore imagine the solution to be one of the below:
Implement my own transaction in MongoDB using flags. The client does a findAndModify on the lock variable and if successful, performs the operations. If unsuccessful, the client retries after a certain timeout.
Use Redis transactions and pubsub to achieve the same effect. I'm not exactly sure how to do this yet, but it sounds like it might be plausible.
Implement some sort of smart load balancing. If multiple clients are operating on the same item, route them to the same instance. Since JS is single threaded, the problem would be solved. Unfortunately, I didn't find a straightforward solution to that.
I'm sure there is a better, more elegant way to achieve the above, and I would love to hear any solutions or suggestions. Thank you!
If I understood correctly and the list is stored as a single document, you might be looking at row versioning. Add a property to the document that holds the version. When you update, you increase (or change) the version and make the update conditional on the version you read:
//update(condition, value)
update({version: whateverYouReceivedWhenYouDidFind}, newValue)
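Here is a minimal in-memory sketch of that read, modify, conditional-write loop, including the retry that a failed version check calls for. The store object and the names find, updateIfVersion and append are hypothetical stand-ins; with MongoDB the conditional write would be an update whose filter includes the version that was read. The beforeWrite hook exists only to simulate a second instance writing in between.

```javascript
// In-memory stand-in for a single stored document.
const store = { doc: { version: 1, list: ['a', 'b'] } };

const find = () => ({ version: store.doc.version, list: [...store.doc.list] });

// Conditional update: succeeds only if the stored version still matches.
function updateIfVersion(expectedVersion, newList) {
  if (store.doc.version !== expectedVersion) return false;
  store.doc = { version: expectedVersion + 1, list: newList };
  return true;
}

// Read, modify, conditionally write; retry on a version conflict.
function append(item, maxRetries = 5, beforeWrite = () => {}) {
  for (let attempt = 0; attempt < maxRetries; attempt += 1) {
    const current = find();
    beforeWrite(attempt); // hook used below to simulate a concurrent writer
    if (updateIfVersion(current.version, [...current.list, item])) return true;
  }
  return false;
}

// "Instance 2" appends d between instance 1's read and write of c.
append('c', 5, (attempt) => { if (attempt === 0) append('d'); });
console.log(store.doc.list, store.doc.version); // [ 'a', 'b', 'd', 'c' ] 3
```

The first attempt to write c fails the version check, so it is retried against the list that already contains d and neither append is lost.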
Hope it helps.
Gus
You want the findAndModify command in MongoDB, which guarantees an atomic modification and can return the newly modified doc (pass new: true; by default it returns the document as it was before the change). As the changes are serial and atomic, instance 1 will have a->b->c and instance 2 will have a->b->c->d.
Cheers
If all you are doing is adding new elements to the list, you could use a Redis list and include the time in every value you add. The list may be unsorted in Redis but should be quickly sortable when retrieved.
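A small sketch of that idea, with the Redis calls left out: each value carries its insertion time, and the client sorts after reading the list. The JSON envelope { t, item } and the function names are assumptions for illustration, and the ordering is only as trustworthy as the clocks of the instances that produce the timestamps.

```javascript
// Encode the insertion time into each value before pushing it to the list.
const encode = (item, timeMs) => JSON.stringify({ t: timeMs, item });

// Sort raw values (as returned by reading the whole list) by their timestamp.
function sortByTime(rawValues) {
  return rawValues
    .map((raw) => JSON.parse(raw))
    .sort((x, y) => x.t - y.t)
    .map((entry) => entry.item);
}

// Values as they might arrive when several instances push concurrently.
const raw = [encode('d', 30), encode('a', 10), encode('b', 20)];
console.log(sortByTime(raw)); // [ 'a', 'b', 'd' ]
```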

Get Timestamp after Insert/Update

In Azure Table storage, is there a way to get the new timestamp value after an update or insert? I am writing a 3-phase commit protocol to get Table storage to support distributed transactions, and it involves multiple writes to the same entity. The operation order goes like this: Read Entity, Write Entity (Lock Item), Write Entity (Commit new values). I would like to get the new timestamp after the lock-item operation so I don't have to unnecessarily read the item again before the commit operation. Does anyone know how to efficiently get the new timestamp value after a savechanges operation?
I don't think you need to do anything special/extra. When you read your entity you will get an ETag for it. When you save that entity (setting someLock=true), the save will only succeed if nobody else has updated the entity since your read. Hence you know you have the lock. And then you can do your second write as you please.
I don't believe it is possible. I would use your own timestamp and/or guid to mark entries.
If you're willing to go back to the Update REST API call, it does return the time that the response was generated. It probably won't be exactly the same as the time stamp on the record, but it will be close I'm sure.
You may need to hack your Azure table drivers.
In the Azure Python lib (TableStorage), for example, the Timestamp is simply skipped over.
# exclude the Timestamp since it is auto added by azure when
# inserting entity. We don't want this to mix with real properties
if name in ['Timestamp']:
    continue
