Arrays in Azure Table Storage Entities (Non-Byte)

I'm looking to store arrays in Azure Table entities. At present, the only type of array supported natively is byte-array, limited to 64k length. The size is enough, but I'd like to store arrays of longs, doubles and timestamps in an entity.
I can obviously cast multiple bytes to the requested type myself, but I was wondering if there's any best-practice to achieve that.
To clarify, these are fixed length arrays (e.g. 1000 cells) associated with a single key.
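A minimal sketch of the byte-packing the asker mentions, i.e. converting a fixed-length numeric array to and from a byte array that fits in a single binary property (1000 doubles = 8000 bytes, well under 64K). The question targets .NET, but the conversion is the same in any language; Java's ByteBuffer is used here purely for illustration, and the class name is made up.

import java.nio.ByteBuffer;

// Hypothetical helper: pack a fixed-length double[] into a byte[] and back.
public final class DoubleArrayPacker {

    public static byte[] pack(double[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Double.BYTES);
        for (double v : values) {
            buffer.putDouble(v);
        }
        return buffer.array();
    }

    public static double[] unpack(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        double[] values = new double[bytes.length / Double.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getDouble();
        }
        return values;
    }
}

The same approach works for long[] (putLong/getLong) and for timestamps stored as epoch milliseconds.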

I have written an Azure table storage client, called Lucifure Stash, which supports arrays, enums, large data, serialization, public and private properties and fields, and more.
You can get it at https://github.com/hocho/LucifureStash

I've been trying to think of a nice way to do this other than the method you've already mentioned, and I'm at a loss. The simplest solution I can come up with is to take the array, binary-serialize it, and store it in a binary array property (a sketch of this follows at the end of this answer).
Other options I've come up with but dismissed:
If storing it natively is important, you could keep this information in another child table (I know Azure Tables don't technically have relationships, but that doesn't mean you can't represent this type of thing). The downside of this being that it will be considerably slower than your original.
Take the array, XML serialize it and store it in a string property. This would mean that you could see the contents of your array when using 3rd party data explorer tools and you could run (inefficient) queries that look for an exact match on the contents of the array.
Use Lokad Cloud fat entities to store your data. This essentially takes your whole object, binary-serializes it and splits the result into 64KB blocks across the properties of the table entity. This does solve problems like the one you're experiencing, but you will only be able to access your data using tools that support this framework.
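A minimal sketch of the binary-serialization option from the first paragraph of this answer. Java's built-in serialization is used here for illustration only; the equivalent .NET machinery of the time (e.g. BinaryFormatter) plays the same role, and the class name is made up.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Serialize a long[] to bytes (to go into the entity's binary property)
// and deserialize it again. The result must stay under the 64K limit.
public final class ArrayBlob {

    public static byte[] toBytes(long[] values) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bos);
        out.writeObject(values);
        out.close();
        return bos.toByteArray();
    }

    public static long[] fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes));
        return (long[]) in.readObject();
    }
}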

If you have just a key-value collection to store, then you can also check out Azure Blobs. They can rather efficiently store arrays of up to 25M time-value points per single blob (with random access within the dataset).

If you choose to store your object in blob storage and need more than one "key" to get it, you can just create an Azure table (or two, or n) where you store the key you want to look up and a reference to the exact blob item.

Why don't you store the values as CSV strings?
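A quick sketch of the CSV round trip (Java used for illustration only; note that the text form of 1000 doubles is considerably larger than the 8-bytes-per-value binary form, so it eats into the property size limit faster).

import java.util.StringJoiner;

// Join a double[] into a comma-separated string and parse it back.
public final class CsvArray {

    public static String toCsv(double[] values) {
        StringJoiner joiner = new StringJoiner(",");
        for (double v : values) {
            joiner.add(Double.toString(v));
        }
        return joiner.toString();
    }

    public static double[] fromCsv(String csv) {
        String[] parts = csv.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return values;
    }
}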

You could serialize your array as a JSON string using the .NET JavaScript serializer:
http://msdn.microsoft.com/en-us/library/system.web.script.serialization.javascriptserializer.aspx
This class has a "MaxJsonLength" property you can use to ensure your arrays don't exceed 64K when you serialize them, and you can use the same class to deserialize your stored objects.

Related

is redisJSON better than plain redis when keeping data for boardgame session data?

I just loaded up a Redis server for my backend with ioredis.
I'm learning that if I want to store data as JSON, I have to use the RedisJSON module instead, since hashes are string-typed and flat. However, I'm only storing one object per user instance, containing fewer than 10 fields typed as string/number/array. Is it better to go without RedisJSON? On one hand, RedisJSON lets me fetch an object in a single query. On the other, I can store multiple data types and query across those sets/hashes with a consistent naming convention.
Does anyone know which is the better usage, and what the pitfalls of each approach are?
The backend serves a websocket for a multiplayer board game.
The answer is: it depends, and it requires several trade-offs to be made for each project.
Performance: RedisJSON uses a tree structure for storing all elements in a document.
Compared to a string: the advantage is that updating sub-elements of a document will be faster than manipulating a string containing a serialised JSON object. But retrieving (reassembling) and writing the entire document will be more expensive compared to strings.
Compared to a hash: when manipulating a flat document (1 level deep), RedisJSON and HSET performance are comparable.
Maintainability: using several native data types in Redis to represent your object can perform very well, but the code will be more complex to maintain. There can be additional migration/refactoring work when the structure of the document is altered.
Querying: RediSearch has support for indexing and querying the content of RedisJSON documents. This matters, of course, only if your use case requires secondary indexing and querying documents by something other than their key. You can still build your own secondary indexing with Redis data structures, but that is also a trade-off in maintainability.
disclaimer: I work at Redis, creator and maintainer of RediSearch and RedisJSON

What is an efficient way of storing snapshots of an in-memory key-value store in Java?

I am trying to design an in-memory key-value store that maps strings to strings of variable length. I also want to give it the ability to take snapshots of its key-value data sets for any particular moment in time. Moreover, modifications to the key-value store should not affect past snapshots. I am currently using a HashMap for this, and for snapshots I maintain a mapping of timestamps to deep-copies of the respective HashMap's entry sets (with simple String compression). Are there any other more effective methods of doing this in-memory?
I am wondering, is it perhaps more memory-efficient, since I am working with strings of characters, to use tries instead?
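One observation worth adding: since Java Strings are immutable, a snapshot does not need to deep-copy the strings themselves; copying the map structure is enough, and the keys and values are shared between the live map and the snapshot. A minimal sketch of that idea, with made-up class and method names:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical snapshotting store: the live map is copied (structure only,
// Strings are shared) each time a snapshot is taken. A persistent, structurally
// shared map (e.g. the Ctrie suggested in the answer below) avoids even that copy.
public final class SnapshottingStore {

    private final Map<String, String> current = new HashMap<>();
    private final TreeMap<Long, Map<String, String>> snapshots = new TreeMap<>();

    public void put(String key, String value) {
        current.put(key, value);
    }

    public String get(String key) {
        return current.get(key);
    }

    // Freeze the current state under the given timestamp.
    public void snapshot(long timestampMillis) {
        snapshots.put(timestampMillis, Collections.unmodifiableMap(new HashMap<>(current)));
    }

    // Look up a key as of the most recent snapshot at or before the timestamp.
    public String getAt(long timestampMillis, String key) {
        Map.Entry<Long, Map<String, String>> entry = snapshots.floorEntry(timestampMillis);
        return entry == null ? null : entry.getValue().get(key);
    }
}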
Interesting. A little research shows that a Ctrie might be what you are looking for. Wiki: https://en.wikipedia.org/wiki/Ctrie
ctrie: Concurrent Tries with Efficient Non-Blocking Snapshots
It looks like there is code available in multiple languages: Java, Haskell, Python, and C++.
Also related:
Creating a ConcurrentHashMap that supports "snapshots"
and searching Stackoverflow: https://stackoverflow.com/search?q=ctrie

redis performance, store json object as a string

I need to save a User model, something like:
{ "nickname": "alan",
"email": ...,
"password":...,
...} // and a couple of other fields
Today, I use a Set: users
In this Set, I have a member like user:alan
In this member I have the hash above
This is working fine, but I was just wondering whether, instead of the above approach, it could make sense to use the following one:
Still use users Set (to easily get the users (members) list)
In this set only use a key / value storage like:
key: alan
value: the stringified version of the above user hash
Retrieving a record would then be easier (I would then just have to parse it as JSON).
I'm very new to Redis and I am not sure which would be best. What do you think?
You can use the Redis hash data structure to store your JSON object's fields and values. For example, your "users" set can still be used to keep the list of all users, and each individual JSON object can be stored in a hash like this:
db.hmset("user:id", jsonObj); // pass the object's fields directly; hmset expects field/value pairs, not a JSON string
Now you can get all users by key, or only a specific one (and get/set only the fields/values you need).
EDIT: (sorry I didn't realize that we talked about this earlier)
Retrieving a record would then be easier (I would then just have to parse it as JSON).
This is true, but with the hash data structure you can get/set only the field/value you need to work with. Retrieving the entire JSON object can hurt performance (depending on how often you do it) if you only want to change part of the object, and you will need to stringify/parse the object every time.
One additional merit of JSON over hashes is that it maintains types: in a hash, 123.3 becomes the string "123.3", and depending on the library, Null/None can accidentally be cast to the string "null".
Both are a bit tedious, as they require writing a transformer for extracting the strings and converting them back to their expected types.
For space/memory consumption, I've started leaning towards storing just the values as a JSON list ["my_type_version", 123.5, null, ... ], so I don't pay the overhead of N * sum(len(JSON key names)), which in my case was +60% of Redis's used memory footprint.
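A small illustration of the values-only JSON list mentioned above (using the org.json library; the version tag and values are made up):

import org.json.JSONArray;
import org.json.JSONObject;

public class ValuesOnlyListExample {
    public static void main(String[] args) throws Exception {
        // Field order is fixed by a schema/version tag, so key names are not
        // repeated in every stored record.
        JSONArray values = new JSONArray()
                .put("my_type_version")   // hypothetical version tag
                .put(123.5)               // numeric type survives the round trip
                .put(JSONObject.NULL);    // explicit null, not the string "null"

        String payload = values.toString();      // ["my_type_version",123.5,null]
        JSONArray back = new JSONArray(payload);
        double number = back.getDouble(1);       // 123.5, still a number
    }
}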
Bear in mind: hashes cannot store nested objects; JSON can.
Truthfully, either way works fine. The way you store it is a design decision you will need to make. It depends on how you want to retrieve the user information, etc.
In terms of performance, storing the JSON encoded version of the user object will use less memory and take less time for storage/retrieval. That is, JSON parsing is probably faster than retrieving each field from Redis. And, even if not, it is probably more memory efficient. The difference in performance is probably minimal anyway.
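For completeness, a small Java sketch of the two options being compared in this thread, using the Jedis client as an illustration (key and field names mirror the question; error handling and connection pooling are omitted):

import java.util.HashMap;
import java.util.Map;
import redis.clients.jedis.Jedis;

public class UserStorageExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.sadd("users", "alan");  // membership set, as in the question

            // Option 1: one hash per user - individual fields can be read or
            // updated without touching the rest of the object.
            Map<String, String> fields = new HashMap<String, String>();
            fields.put("nickname", "alan");
            fields.put("email", "alan@example.com");
            jedis.hmset("user:alan", fields);
            String email = jedis.hget("user:alan", "email");

            // Option 2: one string per user holding serialized JSON - cheap to
            // read/write as a whole, but any partial change means parse + re-stringify.
            jedis.set("user:alan:json", "{\"nickname\":\"alan\",\"email\":\"alan@example.com\"}");
            String json = jedis.get("user:alan:json");
        }
    }
}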

Store large amounts of data in RMS

I need to store a large amount of data using the RMS API in J2ME.
The data is multi-column and multi-row, and it has to be hardcoded into the application, so I need a way to store all of those columns and rows.
How can I do this? Should I use something like structs?
Well, RMS only allows you to store records that are arrays of bytes. You will have to decide for yourself how a record is stored, and if you want to store your data in a single or in multiple records. If you use the DataInputStream and DataOutputStream classes, you'll be able to read/write Strings, booleans, integers, etc. The API documentation includes a decent example of how you can do this.
If you have complex data to store, or a lot of different objects, you may want to create a simple library for RMS I/O, that allows you to pass objects implementing e.g. "Storable" to a library class that writes away your object into RMS.
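A minimal MIDP sketch of the DataOutputStream/DataInputStream approach described above, where each record holds one "row" of typed columns (MIDlet plumbing and error handling are left out; names are made up):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import javax.microedition.rms.RecordStore;
import javax.microedition.rms.RecordStoreException;

public class RmsRows {

    // Pack one row of typed columns into a byte[] and add it as a record.
    public static int writeRow(RecordStore store, String name, int value, boolean flag)
            throws IOException, RecordStoreException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeUTF(name);      // column 1: a string
        out.writeInt(value);     // column 2: an int
        out.writeBoolean(flag);  // column 3: a flag
        byte[] record = bos.toByteArray();
        return store.addRecord(record, 0, record.length);
    }

    // Read a record back and unpack the columns in the same order.
    public static void readRow(RecordStore store, int recordId)
            throws IOException, RecordStoreException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(store.getRecord(recordId)));
        String name = in.readUTF();
        int value = in.readInt();
        boolean flag = in.readBoolean();
        // ... use the values
    }
}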
See my question on exactly the same topic. In the end we bought a commercial BTree implementation and extended it to work across multiple record stores.
As Jeroen said, RMS is quite basic: you can only store arrays of bytes. But even so, it's quite easy to implement a more complex storage structure, with an index stored in one record store addressing other record stores that contain the data.
Have a look at this page: Understanding the Record Management System

What is the best way to store and search through object transactions?

We have a decent sized object-oriented application. Whenever an object in the app is changed, the object changes are saved back to the DB. However, this has become less than ideal.
Currently, transactions are stored as a transaction and a set of transactionLI's.
The transaction table has fields for who, what, when, why, foreignKey, and foreignTable. The first four are self-explanatory. ForeignKey and foreignTable are used to determine which object changed.
TransactionLI has timestamp, key, val, oldVal, and a transactionID. This is basically a key/value/oldValue storage system.
The problem is that these two tables are used for every object in the application, so they're pretty big tables now. Using them for anything is slow. Indexes only help so much.
So we're thinking about other ways to do something like this. Things we've considered so far:
- Sharding these tables by something like the timestamp.
- Denormalizing the two tables and merge them into one.
- A combination of the two above.
- Doing something along the lines of serializing each object after a change and storing it in subversion.
- Probably something else, but I can't think of it right now.
The whole problem is that we'd like to have some mechanism for properly storing and searching through transactional data. Yeah you can force feed that into a relational database, but really, it's transactional data and should be stored accordingly.
What is everyone else doing?
We have taken the following approach:
All objects are serialised (using the standard XmlSerializer), but we have decorated our classes with serialisation attributes so that the resultant XML is much smaller (storing elements as attributes and dropping vowels on field names, for example). This could be taken a stage further by compressing the XML if necessary.
The object repository is accessed via a SQL view. The view fronts a number of tables that are identical in structure but have the table name appended with a GUID. A new table is generated when the previous table has reached critical mass (a pre-determined number of rows).
We run a nightly archiving routine that generates the new tables and modifies the views accordingly so that calling applications do not see any differences.
Finally, as part of the overnight routine we archive any old object instances that are no longer required to disk (and then tape).
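As an illustration of the "elements as attributes, shortened field names" trick from the first point: the answer above uses .NET's XmlSerializer attributes, so the same idea is sketched here with JAXB annotations instead, and the class/attribute names are made up.

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;

// Serialises to a single element with attributes, e.g.
//   <ord id="17" cst="smith" ttl="99.5"/>
// rather than a tree of child elements, which keeps the stored XML small.
@XmlRootElement(name = "ord")
@XmlAccessorType(XmlAccessType.FIELD)
public class OrderRevision {

    @XmlAttribute(name = "id")
    private long id;

    @XmlAttribute(name = "cst")   // "customer", vowels dropped to shrink the XML
    private String customer;

    @XmlAttribute(name = "ttl")   // "total"
    private double total;
}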
I've never found a great end-all solution for this type of problem. One thing you can try, if your DB supports partitioning (and even if it doesn't, you can implement the same concept yourself), is to partition this log table by object type, and then partition further by date/time or by your object ID (if your ID is numeric this works nicely; I'm not sure how well a GUID would partition).
This will help keep the table size manageable and keep all the transactions for a single object instance together.
One idea you could explore is, instead of storing each field in a name/value pair table, storing the data as a blob (either text or binary). For example, serialize the object to XML and store it in a field.
The downside of this is that as your object changes, you have to consider how this affects all your historical data. If you're using XML there are easy ways to update the historical XML structures; if you're using binary there are ways too, but you have to be more conscious of the effort.
I've had awesome success storing a rather complex object model with tons of interrelations as a blob (the XML serializer in .NET didn't handle the relationships between the objects). I could very easily see myself storing the binary data. A huge downside of storing it as binary data is that you have to take it out of the database to access it; with XML, if you're using a modern database like MSSQL, you can access the data in place.
One last approach is to split the two patterns: you could define a difference schema (and I assume more than one property changes at a time). So, for example, imagine storing this XML:
<objectDiff>
<field name="firstName" newValue="Josh" oldValue="joshua"/>
<field name="lastName" newValue="Box" oldValue="boxer"/>
</objectDiff>
This will help reduce the number of rows, and if you're using MSSQL you can define an XML schema and get some of the rich querying ability around the object. You can still partition the table.
Josh
Depending on the characteristics of your specific application, an alternative approach is to keep revisions of the entities themselves in their respective tables, together with the who, what, why and when per revision. The who, what and when can still be foreign keys.
I would be very careful with this approach, though, since it is only viable for applications with a relatively small number of changes per entity/entity type.
If querying the data is important, I would use true partitioning in SQL Server 2005 and above, if you have Enterprise Edition. We have millions of rows partitioned by year, down to day for the current month; you can be as granular as your application demands, with a maximum of 1,000 partitions.
Alternatively, if you are using SQL 2008 you could look into filtered indexes.
These are solutions that will enable you to retain the simplified structure you have whilst providing the performance you need to query that data.
Splitting/Archiving older changes obviously should be considered.
