I keep running into cases where I have more complicated information to store than what can fit into any of Redis' simple data structures. I still want to use Redis, but I was wondering if there are any standard alternatives people use when ideally they'd like to use a nested structure?
You have basically two strategies:
You can serialize your complex objects and store them as strings. We suggest JSON or MessagePack (msgpack) as the serialization format. This is easy enough to manipulate from most client-side languages. If server-side access is needed, a server-side Lua script can easily encode/decode such objects, since Redis is compiled with msgpack and JSON support for Lua.
You can split your objects across several keys. Instead of storing a complex data structure under a single user:id key, you can store several keys such as user:id, user:id:address_list, user:id:document_lists, etc ... If you need atomicity, pipelined MULTI/EXEC blocks can be used to guarantee data consistency and aggregate the round trips (a sketch of both strategies follows below).
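This is a minimal sketch, assuming the node-redis v4 client and hypothetical key names (user:42 for the first strategy, user:43 for the second):

// A hedged sketch of both strategies; keys and field names are illustrative only.
const { createClient } = require('redis');

async function demo() {
    const client = createClient();
    await client.connect();

    // Strategy 1: serialize the whole object and store it as a plain string.
    const user = { id: 42, name: 'Alice', addresses: [{ city: 'Lyon' }] };
    await client.set('user:42', JSON.stringify(user));
    const loaded = JSON.parse(await client.get('user:42'));

    // Strategy 2: split the object across several keys, and write them
    // atomically in a single MULTI/EXEC round trip.
    await client.multi()
        .hSet('user:43', { name: 'Bob' })
        .rPush('user:43:address_list', JSON.stringify({ city: 'Paris' }))
        .exec();

    await client.quit();
    return loaded;
}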
See a simple example in this answer:
Will the LPUSH command work on a record that was initialized from JSON?
Finally, Redis is not a document-oriented database. If you really have a lot of complex documents, you may be better served by solutions such as MongoDB, ArangoDB, CouchDB, Couchbase, etc ...
When you need to modify the object, serializing the complex object to a string and saving the string to Redis is very inefficient: you have to fetch the string back to the client side, deserialize it into an object, modify it, serialize it to a string again, and save it back to Redis. That's a lot of work...
Now, in 2019, there are some new Redis modules that enable Redis to support nested data structures, e.g. RedisJSON and redis-protobuf.
Disclaimer: I'm the author of redis-protobuf, so I'll give some examples with this module. Also, redis-protobuf is faster and more memory efficient than RedisJSON, since it uses a binary format rather than a text format to serialize/deserialize the data.
First of all, you need to define your nested data structure in Protobuf format and save it to a local file:
syntax = "proto3";
message SubMsg {
string s = 1;
int32 i = 2;
}
message Msg {
int32 i = 1;
SubMsg sub = 2;
repeated int32 arr = 3;
}
Then load the module with the following directive in redis.conf:
loadmodule /path/to/libredis-protobuf.so --dir proto-directory
After that, you can read and write the nested data structure:
PB.SET key Msg '{"i" : 1, "sub" : {"s" : "string", "i" : 2}, "arr" : [1, 2, 3]}'
PB.SET key Msg.i 10
PB.GET key Msg.i
PB.SET key Msg.sub.s redis-protobuf
PB.GET key Msg.sub.s
PB.SET key Msg.arr[0] 2
PB.GET key Msg.arr[0]
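If your client library has no dedicated redis-protobuf wrapper, the commands can be sent as raw commands. A minimal sketch, assuming the node-redis v4 client and the key and Msg type used above:

const { createClient } = require('redis');

async function demo() {
    const client = createClient();
    await client.connect();

    // PB.SET / PB.GET are module commands, so they are dispatched as raw commands.
    await client.sendCommand([
        'PB.SET', 'key', 'Msg',
        '{"i" : 1, "sub" : {"s" : "string", "i" : 2}, "arr" : [1, 2, 3]}'
    ]);
    const i = await client.sendCommand(['PB.GET', 'key', 'Msg.i']);
    console.log(i);

    await client.quit();
}

demo();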
Please check the doc for details. If you have any problems with redis-protobuf, feel free to let me know.
Related
We have a map with a custom object key and a custom value object (a complex object). We set the in-memory-format to OBJECT, but IMap.get takes more time to return the value when the retrieved object is big. We cannot afford that latency here, and the value is required for further processing. IMap.get is called in the JVM where the cluster is started. Is there a way to get the objects quickly irrespective of their size?
This is partly the price you pay for in-memory-format==OBJECT
To confirm, try in-memory-format==BINARY and compare the difference.
Store and retrieve are slower with OBJECT, but some queries will be faster. If you run enough of those queries, the penalty is justified.
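For reference, the in-memory format is configured per map on the cluster members; a minimal sketch of the declarative XML configuration, assuming a map named myMap:

<hazelcast>
    <map name="myMap">
        <in-memory-format>BINARY</in-memory-format>
    </map>
</hazelcast>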
If you do get(X) and the value is stored deserialized (OBJECT), the following sequence occurs:
1 - the object is serialized from object to byte[]
2 - the byte array is sent to the caller, possibly across the network
3 - the object is deserialized by the caller, byte[] to object.
If you change to store serialized (BINARY), step 1 isn't needed.
If the caller is the same process, step 2 isn't needed.
If you can, it's worth upgrading (the latest is 5.1.3), as there are some newer options that may perform better. See this blog post for an explanation.
You also don't necessarily have to return the entire object to the caller. A read-only EntryProcessor can extract part of the data you need to return across the network. A smaller network packet will help, but if the cost is in the serialization then the difference may not be remarkable.
If you're retrieving a non-local map entry (either because you're using client-server deployment model, or an embedded deployment with multiple nodes so that some retrievals are remote), then a retrieval is going to require moving data across the network. There is no way to move data across the network that isn't affected by object size; so the solution is to find a way to make the objects more compact.
You don't mention what serialization method you're using, but the default Java serialization is horribly inefficient ... any other option would be an improvement. If your code is all Java, IdentifiedDataSerializable is the most performant. See the following blog for some numbers:
https://hazelcast.com/blog/comparing-serialization-options/
Also, if your data is stored in BINARY format, then it's stored in serialized form (whatever serialization option you've chosen), so at retrieval time the data is ready to be put on the wire. By storing in OBJECT form, you'll have to perform the serialization at retrieval time. This will make your GET operation slower. The trade-off is that if you're doing server-side compute (using the distributed executor service, EntryProcessors, or Jet pipelines), the server-side compute is faster if the data is in OBJECT format because it doesn't have to deserialize the data to access the data fields. So if you aren't using those server-side compute capabilities, you're better off with BINARY storage format.
Finally, if your objects are large, do you really need to be retrieving the entire object? Using the SQL API, you can do a SELECT of just certain fields in the object, rather than retrieving the entire object. (You can also do this with Projections and the older Predicate API but the SQL method is the preferred way to do this). If the client code doesn't need the entire object, selecting certain fields can save network bandwidth on the object transfer.
I have a nodeJS application that operates on user data in the form of JSON. The size of the JSON data is variable and depends on the user. Usually the size is around 30KB.
Every time any value of the JSON parameters changes, I recalculate the JSON object, stringify it and encrypt the string using RSA encryption.
Calculating the JSON involves the following steps:
From the database, get the data required to form the JSON. This involves querying at least 4 tables in a nested for loop.
Use Object.assign() to combine the data into a larger JSON object, in the same nested for loop (order 2).
Once the final object is formed, stringify it and encrypt the string using the crypto module of Node.js.
All of this is causing my CPU to crash when the data is huge, which means a large number of iterations and a lot of data to encrypt.
We use a Postgres database, so I have to recalculate the entire JSON object even if only a single parameter value changes.
I was wondering if a Node.js worker pool could be a solution to this, but I would like to know how worker threads handle tasks under the hood. Suggestions for an alternative solution to this problem are also welcome.
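For reference, a minimal sketch of handing the CPU-heavy step off to a worker thread with Node's built-in worker_threads module; the payload shape here is hypothetical, and the real application would also run the encryption inside the worker:

const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
    // Main thread: hand the payload to a worker and await its message.
    function runInWorker(payload) {
        return new Promise((resolve, reject) => {
            const worker = new Worker(__filename, { workerData: payload });
            worker.on('message', resolve);
            worker.on('error', reject);
        });
    }

    runInWorker({ uid: 'u1', items: [1, 2, 3] })   // hypothetical payload
        .then(json => console.log('serialized length:', json.length));
} else {
    // Worker thread: do the CPU-heavy work off the main event loop.
    const json = JSON.stringify(workerData);
    parentPort.postMessage(json);
}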
I want to store JSON data in Redis. How can I do that, and what is the best option for storing JSON in Redis? My object will look like:
{"name":"b123.home.group.title", "value":"Hellow World", "locale":"en-us", "uid":"b456"}
I want to update the object based on value and locale. I also want to get this object with any condition, and I want TTL support here (so that I can remove the object if it is no longer required).
So what is the best way to store this data in Redis without memory issues, while supporting all these operations quickly?
I currently have a table in my Postgres database with about 115k rows that I feel is too slow for my serverless functions. The only thing I need that table for is to look up values using operators like ILIKE, and I believe the network barrier is slowing things down a lot.
My thought was to take the table and turn it into a JavaScript array of objects, as it doesn't change often, if ever. Now I have it in a file such as array.ts, and inside is:
export default [
{}, {}, {},...
]
What is the best way to query this huge array? Is it best to just use the .filter function? I am currently trying to import the array and filter it, but it seems to just hang and never actually complete. It is MUCH slower than the current DB approach, so I am unsure if this is the right approach.
Make the database faster
As people have commented, it's likely that the database will actually perform better than anything else given that databases are good at indexing large data sets. It may just be a case of adding the right index, or changing the way your serverless functions handle the connection pool.
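For example, ILIKE '%term%' lookups can often be made fast with a trigram index from the pg_trgm extension. A hedged sketch of a one-off setup script, assuming the node-postgres (pg) client and hypothetical items / item_name names:

// Table and column names (items, item_name) are assumptions for illustration.
const { Pool } = require('pg');
const pool = new Pool(); // connection settings come from the PG* environment variables

async function addTrigramIndex() {
    await pool.query('CREATE EXTENSION IF NOT EXISTS pg_trgm');
    await pool.query(
        'CREATE INDEX IF NOT EXISTS items_name_trgm_idx ' +
        'ON items USING gin (item_name gin_trgm_ops)'
    );
    await pool.end();
}

addTrigramIndex().catch(console.error);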
Make local files faster
If you want to do it without the database, there are a couple of things that will make a big difference:
Read the file and then use JSON.parse, do not use require(...)
JavaScript is much slower to parse than JSON. You can therefore make things load much faster by storing the data as plain JSON and parsing it with JSON.parse, rather than requiring it as a JavaScript module.
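A minimal sketch of the difference, assuming the array has been exported to a hypothetical array.json file and that items have a name field:

const fs = require('fs');

// Parse the data as JSON instead of require('./array'), which would make V8
// parse and compile it as JavaScript.
const items = JSON.parse(fs.readFileSync('./array.json', 'utf8'));

const matches = items.filter(item => item.name.toLowerCase().includes('foo'));
console.log(matches.length);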
Find a way to split up the data
Especially in a serverless environment, you're unlikely to need all the data for every request, and the serverless function will probably only serve a few requests before it is shut down and a new one is started.
If you could split your files up such that you typically only need to load an array of 1,000 or so items, things will run much faster.
Depending on the size of the objects, you might consider having one file that contains only the ids of the objects and the fields needed to filter them, and then a separate file for each object, so you can load the full object after you have filtered.
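A hedged sketch of that layout, assuming a hypothetical index.json holding { id, name } entries and one objects/<id>.json file per full object:

const fs = require('fs');

// index.json only contains the fields needed for filtering.
const index = JSON.parse(fs.readFileSync('./index.json', 'utf8'));

function findFull(term) {
    return index
        .filter(entry => entry.name.toLowerCase().includes(term))
        .map(entry => JSON.parse(fs.readFileSync(`./objects/${entry.id}.json`, 'utf8')));
}

console.log(findFull('foo').length);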
Use a local database
If the issue is genuinely the network latency, and you can't find a good way to split up the files, you could try using a local database engine.
@databases/sqlite can be used to query an SQLite database file that you could pre-populate with your array of values and index appropriately.
const openDatabase = require('@databases/sqlite');
const {sql} = require('@databases/sqlite');

const db = openDatabase('mydata.db');

async function query(pattern) {
    // return the rows so the caller actually receives the results
    return db.query(sql`SELECT * FROM items WHERE item_name LIKE ${pattern}`);
}

query('%foo%').then(results => console.log(results));
I want to store all objects of a class in the Redis cache and be able to retrieve them. As I understand it, hashmaps are used for storing objects, but each one requires its own key to be saved under. So I can't save them all under a single key, e.g. "items", and retrieve them by that key. The only way I can do it is something like this:
items.forEach(item => {
    redis.hmset(`item${item.id}`, item);
});
But this feels wrong, and I have to use a for loop again when I want to get this data. Is there a better solution?
Also, there is the problem of associated objects: I can't find anywhere how they are stored and used in Redis.
As I understand it, you want to save different keys with the same prefix.
You can use MSET to store them, and MGET to retrieve them, passing your keys as params.
In case you still want to use HMSET, use a pipeline in the loop, so that everything is sent to Redis in a single round trip. A sketch of both approaches follows.
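A hedged sketch of both approaches, assuming the node-redis v4 client and items that serialize cleanly to JSON:

const { createClient } = require('redis');

async function saveAndLoad(items) {
    const client = createClient();
    await client.connect();

    // Option 1: one MSET / MGET round trip, storing each item as a JSON string.
    await client.mSet(items.flatMap(item => [`item${item.id}`, JSON.stringify(item)]));
    const values = await client.mGet(items.map(item => `item${item.id}`));
    const loaded = values.map(value => JSON.parse(value));

    // Option 2: keep one hash per item, but pipeline the writes so they are
    // sent to Redis in a single round trip.
    const multi = client.multi();
    for (const item of items) multi.hSet(`item${item.id}`, item);
    await multi.exec();

    await client.quit();
    return loaded;
}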