I have a Node.js application that operates on user data in the form of JSON. The size of the JSON data is variable and depends on the user; it is usually around 30 KB.
Every time any value in the JSON changes, I recalculate the JSON object, stringify it, and encrypt the string using RSA encryption.
Calculating the JSON involves the following steps:
1. Get the data required to form the JSON from the database. This involves querying at least 4 tables in a nested for loop.
2. Use Object.assign() to combine the data into a larger JSON object inside the same two-level nested for loop.
3. Once the final object is formed, stringify it and encrypt it using the crypto module of Node.js.
All of this overwhelms my CPU when the data is huge, since that means a large number of iterations and a lot of data to encrypt.
We use a PostgreSQL database, hence I have to recalculate the entire JSON object even if only a single parameter value changed.
I was wondering whether a Node.js worker pool could be a solution to this. I would also like to know how worker threads handle tasks under the hood. Suggestions for an alternative solution to this problem are also welcome.
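For reference, here is a minimal sketch of what offloading the stringify step to a worker thread could look like, using the built-in worker_threads module. The function name stringifyInWorker and the inline worker source are hypothetical; the encryption step is left out and would go in the same place inside the worker.

```javascript
const { Worker } = require('worker_threads');

// Source run inside the worker thread. Hypothetical task: stringify the
// assembled object. The encryption step would be added here as well.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  parentPort.postMessage(JSON.stringify(workerData));
`;

// Runs the CPU-bound step off the main thread and resolves with its result.
function stringifyInWorker(data) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: data });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Each Worker runs in its own V8 isolate with its own event loop, and workerData is copied to it with the structured clone algorithm, so the main thread stays free to serve requests while the worker is busy. A real pool would reuse a fixed number of workers instead of creating one per task.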
We have a map of custom key objects to custom (complex) value objects. We set the in-memory-format to OBJECT, but IMap.get takes more time when the retrieved object is big. We cannot afford latency here, because the value is required for further processing. IMap.get is called in the JVM where the cluster is started. Is there a way to get the objects quickly irrespective of their size?
This is partly the price you pay for in-memory-format==OBJECT.
To confirm, try in-memory-format==BINARY and compare the difference.
Store and retrieve are slower with OBJECT, but some queries will be faster. If you run enough of those queries, the penalty is justified.
If you do get(X) and the value is stored deserialized (OBJECT), the following sequence occurs:
1 - the object is serialized from object to byte[]
2 - the byte array is sent to the caller, possibly across the network
3 - the object is deserialized by the caller, byte[] to object.
If you change to store serialized (BINARY), step 1 isn't needed.
If the caller is the same process, step 2 isn't needed.
If you can, it's worth upgrading (latest is 5.1.3) as there are some newer options that may perform better. See this blog post explaining.
You also don't necessarily have to return the entire object to the caller. A read-only EntryProcessor can extract part of the data you need to return across the network. A smaller network packet will help, but if the cost is in the serialization then the difference may not be remarkable.
If you're retrieving a non-local map entry (either because you're using client-server deployment model, or an embedded deployment with multiple nodes so that some retrievals are remote), then a retrieval is going to require moving data across the network. There is no way to move data across the network that isn't affected by object size; so the solution is to find a way to make the objects more compact.
You don't mention what serialization method you're using, but the default Java serialization is horribly inefficient ... any other option would be an improvement. If your code is all Java, IdentifiedDataSerializable is the most performant. See the following blog for some numbers:
https://hazelcast.com/blog/comparing-serialization-options/
Also, if your data is stored in BINARY format, then it's stored in serialized form (whatever serialization option you've chosen), so at retrieval time the data is ready to be put on the wire. By storing in OBJECT form, you'll have to perform the serialization at retrieval time. This will make your GET operation slower. The trade-off is that if you're doing server-side compute (using the distributed executor service, EntryProcessors, or Jet pipelines), the server-side compute is faster if the data is in OBJECT format because it doesn't have to deserialize the data to access the data fields. So if you aren't using those server-side compute capabilities, you're better off with BINARY storage format.
Finally, if your objects are large, do you really need to be retrieving the entire object? Using the SQL API, you can do a SELECT of just certain fields in the object, rather than retrieving the entire object. (You can also do this with Projections and the older Predicate API but the SQL method is the preferred way to do this). If the client code doesn't need the entire object, selecting certain fields can save network bandwidth on the object transfer.
I have a JSON object of about 350 KB containing a list of about 1,500 items. I don't want to keep it on the front end, but I don't want to use a database either. Can I store it in Node.js and fetch it from there each time the data is needed? I have no idea whether this would be considered bad practice. What do you think?
I'm new to Node.js and Cloud Functions for Firebase, so I'll try to be specific in my question.
I have a Firebase database with objects that include a "score" field. I want the data to be retrieved based on that field, which can be done easily on the client side.
The issue is that, if the database grows big, I'm worried that the query will take too long to return and/or consume a lot of resources. That's why I was thinking of an HTTP service using Cloud Functions that keeps a cache of the top N objects and updates it through a listener whenever the score of any object changes.
Then the client side just has to call something like https://myexampleprojectroute/givemethetoplevels to receive a JSON with the top N levels.
Is this reasonable? If so, how can I approach it? Which structures do I need for this cache, and how do I return them in JSON format via HTTP?
At the moment I'll keep doing it client side but I'd really like to have that both for performance and learning purpose.
Thanks in advance.
EDIT:
In the end I did not implement the optimization. First, the Firebase database does not provide a "child count", so I didn't find a way to implement it with my newbie JavaScript knowledge. Second, and most important, I'm pretty sure the data won't scale up to millions of entries (10K at most), and Firebase has rules for sorted reading optimization. For more information please check out this link.
Also, I'll post a simple code snippet to retrieve data from your database via http request using cloud-functions in case someone is looking for it. Hope this helps!
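// Simple test function to retrieve a JSON object from the DB.
// Warning: no security measures are used (authentication, request-method checks, etc.).
// Assumed setup: the standard firebase-functions and firebase-admin imports.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.request_all_levels = functions.https.onRequest((req, res) => {
  const ref = admin.database().ref('CustomLevels');
  ref.once('value')
    .then((snapshot) => res.status(200).send(JSON.stringify(snapshot.val())))
    // Without a catch, a failed read would leave the request hanging until it times out.
    .catch((err) => res.status(500).send(String(err)));
});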
You're duplicating data upon writes, to gain better read performance. That's a completely reasonable approach. In fact, it is so common in NoSQL databases to keep such derived data structures that it even has a name: denormalization.
A few things to keep in mind:
While Cloud Functions run in a more predictable environment than the average client, the resources are still limited. So reading a huge list of items to determine the latest 10 items is still a suboptimal approach. For simple operations, you'll want to keep the derived data structure up to date on every write operation.
So if you have a "latest 10" and a new item comes in, you remove the oldest item and add the new one. With this approach you have at most 11 items to consider, compared to having your Cloud Function query the list of items for the latest 10 upon every write, which is a O(something-with-n) operation.
Same for an averaging operation: you'll find a moving average to be most performant, because it doesn't require any of the previous data.
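The two incremental updates described above can be sketched as plain functions. This is a minimal sketch, not Firebase-specific code: the names pushLatest and updateAverage and the state shape { count, mean } are assumptions made for illustration, and the average shown is a cumulative running mean rather than a windowed one.

```javascript
// Keep only the newest `limit` items. Each write looks at no more than
// limit + 1 entries instead of re-querying the whole list.
function pushLatest(latest, item, limit = 10) {
  const next = [...latest, item];
  return next.length > limit ? next.slice(next.length - limit) : next;
}

// Running mean that needs only the previous mean and count,
// not any of the earlier values.
function updateAverage(state, value) {
  const count = state.count + 1;
  return { count, mean: state.mean + (value - state.mean) / count };
}
```

In a Cloud Function, a write trigger would read the small derived structure, apply one of these updates, and write it back, so the cost per write stays constant no matter how large the main list grows.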
I am using Redis key-value pairs for storing the data. The data stored against a particular key can change at any point in time, so after every retrieval request I asynchronously update the data stored against the requested key so that the next request can be served with updated data.
I have done quite a bit of testing but still I am wondering if there could be any case where this approach might have some negative consequences?
PS: The data is consolidated from multiple servers.
Thanks in advance for any help/suggestions.
If you already know the value to be stored, you can use GETSET (deprecated since Redis 6.2 in favor of SET with the GET option), or a transaction if it is not a simple string type.
If the new value is some manipulation of the old value, i.e. f(value), you should do it in a Lua script.
Otherwise some other client might read the old value before you update it.
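A minimal sketch of the Lua approach, assuming an ioredis-style client whose eval method takes (script, numKeys, ...keys, ...args). The function name readThenAppend is hypothetical, and the f(value) used here (appending a suffix) is purely illustrative.

```javascript
// Lua source that Redis runs atomically: read the old value, store
// f(old), and return the old value. Here f appends ARGV[1] to it.
const APPEND_SCRIPT = `
local old = redis.call('GET', KEYS[1])
redis.call('SET', KEYS[1], (old or '') .. ARGV[1])
return old
`;

// `client` is assumed to expose eval(script, numKeys, ...keys, ...args),
// as ioredis does. No other client can read between the GET and the SET.
function readThenAppend(client, key, suffix) {
  return client.eval(APPEND_SCRIPT, 1, key, suffix);
}
```

Because the whole script executes as one unit on the server, the read and the update cannot interleave with commands from other clients.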
What's the best approach for generating a page that is the result of complex calculation/data manipulation/API calls (e.g. 5 minutes per page)? Obviously I can't do the calculation within my Rails web request.
A scheduled task can produce some data, but where should I store it? Should I store it in a Postgres table? Should I store it in a document-oriented database? Should I store it in memory? Should I generate an HTML file?
I have the feeling of being second-level ignorant about the subject. Is there a well known set of tools to deal with this kind of architectural problem?
Thanks.
I would suggest the following approach:
1. Once you receive the initial request:
When you receive the first request with the input for the calculation, start processing in a separate thread and send back a token/unique identifier for the request.
2. Store the result:
Run the calculation and store the result in memory, keyed by that token, using a tool like memcached.
3. Poll for the result:
The client then keeps polling for the result using the generated token/unique request identifier. As Adriano said, you can use AJAX for that (I am assuming the requests come from a web browser).
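The three steps above can be sketched as follows. This is a language-neutral illustration written in JavaScript, not Rails code: the names startJob and pollJob are hypothetical, and a plain in-memory Map stands in for memcached. In a real application the work would run in a background worker or job queue rather than in the web process.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical in-memory result store standing in for memcached.
const jobs = new Map();

// Step 1: register the job, kick off the work, return a token immediately.
// Step 2: when the work settles, store the outcome under that token.
function startJob(compute) {
  const token = randomUUID();
  jobs.set(token, { status: 'pending' });
  Promise.resolve()
    .then(compute)
    .then(
      (result) => jobs.set(token, { status: 'done', result }),
      (err) => jobs.set(token, { status: 'failed', error: String(err) })
    );
  return token;
}

// Step 3: the client calls this repeatedly with its token until the
// status is no longer 'pending'.
function pollJob(token) {
  return jobs.get(token) || { status: 'unknown' };
}
```

The initial request handler would return the token from startJob, and a second endpoint would return pollJob(token) as JSON for the browser to poll.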