What is an efficient way of storing snapshots of an in-memory key-value store in Java?

I am trying to design an in-memory key-value store that maps strings to strings of variable length. I also want to give it the ability to take snapshots of its key-value data sets for any particular moment in time. Moreover, modifications to the key-value store should not affect past snapshots. I am currently using a HashMap for this, and for snapshots I maintain a mapping of timestamps to deep-copies of the respective HashMap's entry sets (with simple String compression). Are there any other more effective methods of doing this in-memory?
Since I am working with strings of characters, I am also wondering: would it perhaps be more memory-efficient to use tries instead?
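For reference, here is a minimal Java sketch of the approach described above: a timestamp-to-deep-copy map. Names are illustrative, and since Java strings are immutable, copying the entry set is enough for a "deep" copy.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    // Minimal sketch of the approach described above: each snapshot is an
    // immutable deep copy of the live map, keyed by the time it was taken.
    class SnapshottingStore {
        private final Map<String, String> live = new HashMap<>();
        private final TreeMap<Long, Map<String, String>> snapshots = new TreeMap<>();

        void put(String key, String value) { live.put(key, value); }

        String get(String key) { return live.get(key); }

        // O(n) in the number of entries: this is the cost the question asks about.
        void snapshot(long timestamp) {
            snapshots.put(timestamp, Collections.unmodifiableMap(new HashMap<>(live)));
        }

        // Returns the snapshot taken at or before the given timestamp, if any.
        Map<String, String> at(long timestamp) {
            Map.Entry<Long, Map<String, String>> e = snapshots.floorEntry(timestamp);
            return e == null ? null : e.getValue();
        }
    }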

Interesting. A little research shows that a Ctrie might be what you are looking for. Wiki: https://en.wikipedia.org/wiki/Ctrie
The paper: Concurrent Tries with Efficient Non-Blocking Snapshots.
It looks like there are implementations available in multiple languages: Java, Haskell, Python, and C++.
A related question: Creating a ConcurrentHashMap that supports "snapshots".
Searching Stack Overflow turns up more: https://stackoverflow.com/search?q=ctrie
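To see what a Ctrie's cheap snapshots buy you, here is a copy-on-write sketch in Java. It is not a Ctrie (every write copies the whole map, whereas a Ctrie shares structure between versions), but the snapshot semantics are the same: snapshot() is O(1) and later writes never affect it.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    // Copy-on-write sketch, not a real Ctrie: writes are O(n) here, but
    // snapshot() is O(1) and hands out an immutable version that later
    // writes can never touch.
    class CowStore {
        private volatile Map<String, String> current = Collections.emptyMap();

        synchronized void put(String key, String value) {
            Map<String, String> next = new HashMap<>(current);
            next.put(key, value);
            current = Collections.unmodifiableMap(next);
        }

        String get(String key) { return current.get(key); }

        // O(1): just hand out the current immutable version.
        Map<String, String> snapshot() { return current; }
    }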

Related

Is RedisJSON better than plain Redis for keeping board game session data?

Just loaded up a Redis server for my backend with ioredis.
I'm learning that if I want to store data in JSON spec, I have to use the RedisJSON module instead, since hashes are only string-typed and they are flat. However, if I'm only storing one object per user instance, containing fewer than 10 fields typed string/number or array, is it better to go without RedisJSON? On one hand, RedisJSON lets me fetch an object in one query. On the other, I can just store multiple data types and query across those sets/hashes with a consistent naming convention.
Does anyone know the better usage or the pitfalls of either approach?
The backend serves a websocket for a multiplayer board game.
The answer is "it depends": there are several trade-offs to be weighed for each project.
Performance: RedisJSON uses a tree structure for storing all elements in a document.
Compared to a string: the advantage is that updating sub-elements of a document is faster than manipulating a string containing a serialised JSON object, but retrieving (reassembling) and writing the entire document is more expensive than with strings.
Compared to a hash: when manipulating a flat document (one level deep), RedisJSON and HSET performance are comparable.
Maintainability: using several native Redis data types to represent your object can perform very well, but the code will be more complex to maintain, and there can be additional migration/refactoring work when the structure of the document changes.
Querying: RediSearch supports indexing and querying the content of RedisJSON documents. This matters if your use case requires secondary indexing and querying documents by something other than their key. You can still build your own secondary indexes with Redis data structures, but that is also a maintainability trade-off.
Disclaimer: I work at Redis, the creator and maintainer of RediSearch and RedisJSON.
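To make the trade-off concrete, here is a rough sketch of both approaches in Java. It assumes Jedis 4.x (whose JedisPooled exposes hset as well as jsonSet/jsonGet) and a server with the RedisJSON module loaded; the equivalent ioredis calls are analogous. Key names and fields are illustrative.

    import java.util.List;
    import java.util.Map;
    import redis.clients.jedis.JedisPooled;

    public class SessionStore {
        public static void main(String[] args) {
            JedisPooled redis = new JedisPooled("localhost", 6379);

            // Plain hash: flat, string-typed fields; one round trip reads it all.
            redis.hset("session:42", Map.of("player", "alice", "score", "10"));
            Map<String, String> flat = redis.hgetAll("session:42");

            // RedisJSON (module required): nested values and in-place
            // sub-element updates, at the cost of the extra dependency.
            redis.jsonSet("session:43", Map.of("player", "bob", "moves", List.of("e4", "e5")));
            Object doc = redis.jsonGet("session:43");

            System.out.println(flat + " / " + doc);
        }
    }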

Does there exist a language with the characteristic of storing variables in persistent storage?

I had this idea this morning, and was thinking about how to implement it when it occurred to me that somebody has probably already done this. I searched but found nothing, so here's my idea:
In short, all variable storage is stored in persistent storage. I don't mean battery backed up RAM. I mean more like a database.
To use common technologies to explain what I mean: let's say you were to use an SQL database for this persistent storage. An array/list would be stored as a table with one column. An ordered list would be stored as two columns, with the first being a sequence number. A hash would be a table with two columns, the first being the key and the second being the value. All simple stuff (see the sketch after this question). But what I'm getting at is that you could do large data moving/calculating/reporting operations with native language constructs, without all that mucking about in hyper... I mean, without all that SQL and loading data from the database.
I was thinking sort of like the way you can do matrix math in APL. It would be native to the language and all the underpinning storage would just work. And in reality it would use a record manager more than a SQL database. That was just to explain.
Of course this would be horribly slow, but solid state disk is getting bigger faster and cheaper, so this might not be as unwieldy as it might first seem.
Anyway, is this a novel idea or has somebody done this before?
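To make the mapping in the question concrete, here is a rough JDBC sketch of the hash-as-table idea. The SQLite driver URL and table layout are illustrative; any JDBC-backed store would do.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Sketch of the mapping above: a "hash" variable whose backing storage
    // is a two-column SQL table (key, value), so its contents persist
    // across program runs.
    class PersistentHash {
        private final Connection db;

        // Assumes the SQLite JDBC driver is on the classpath.
        PersistentHash() throws SQLException {
            db = DriverManager.getConnection("jdbc:sqlite:variables.db");
            try (Statement s = db.createStatement()) {
                s.execute("CREATE TABLE IF NOT EXISTS vars (k TEXT PRIMARY KEY, v TEXT)");
            }
        }

        // hash[key] = value
        void put(String key, String value) throws SQLException {
            try (PreparedStatement p = db.prepareStatement(
                    "INSERT OR REPLACE INTO vars (k, v) VALUES (?, ?)")) {
                p.setString(1, key);
                p.setString(2, value);
                p.executeUpdate();
            }
        }

        // value = hash[key]
        String get(String key) throws SQLException {
            try (PreparedStatement p = db.prepareStatement(
                    "SELECT v FROM vars WHERE k = ?")) {
                p.setString(1, key);
                try (ResultSet rs = p.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }
        }
    }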
MUMPS has something like that.
Database interaction is transparently built into the language. The MUMPS language provides a hierarchical database made up of persistent sparse arrays, which is implicitly “opened” for every MUMPS application. All variable names prefixed with the caret character (“^”) use permanent (instead of RAM) storage, will maintain their values after the application exits, and will be visible to (and modifiable by) other running applications.
Of course, it’s explicit—thus not applied to all variables—but still automatic.
How persistent are you talking? The localStorage API works well (persists across browser tabs and sessions) so long as you know users can choose to clear it out. Your question sounds eerily like WebKit client-side database storage though.
Well, to point out the obvious, there is SQL.

Space efficient embedded Haskell persistence solution

I'm looking for a persistence solution (maybe a NoSQL db? or something else...) that has the following criteria:
1) Has a Haskell API
2) Is disk space efficient--the db could easily get to many gigabytes of data but I need it to run well on a typical desktop. I need something that stores the data as efficiently as possible. So, for example, storing field names in a record would be bad.
3) High performance for reading sequential records. The typical use case is start somewhere and then read forward straight through the data--reading through possibly millions of records as quickly as possible.
4) Data is basically never changed (would only be changed if it was discovered data was incorrect somehow), just logged
5) It should act directly on file(s) that can be easily moved/copied around. It should not be calling a separate running server.
If you drop the requirement of a single file with no other running process, everything else can be fulfilled by virtually any standard RDBMS, and depending on the type of data, sometimes especially well by columnar stores in particular.
The only single-file solution I know of is sqlite. Sqlite mainly founders when a single db needs to be accessed by multiple concurrent processes. If that isn't the case, then I wouldn't be surprised if you could scale it up significantly.
Additionally, if you're only looking for sequential scans and key-value stores, you could just go with berkeleydb, which is known to be high-performance for very large data sets.
There are high quality Haskell bindings for talking to both sqlite and berkeleydb.
Edit: For sequential access only, it's also blindingly straightforward to roll your own layer with the binary or cereal packages -- you basically need to write a helper function to wrap reading records from a file sequentially rather than all at once. An abstraction for folding over them is nice as well. Then you can decide to append to a single file, or spread your writes across files as you go. Either way, that's the most lightweight and straightforward option of all. The only drawback is having to worry about durability -- safe writes in the presence of interrupts, and all the other stuff that a good DB solution should take care of for you.
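The roll-your-own layout is easy to sketch in any language; here is a minimal length-prefixed record log in Java, just to show the shape of it (the Haskell version would use binary/cereal's encode and decode the same way). As noted above, this ignores durability entirely.

    import java.io.*;

    // Minimal length-prefixed record log: append records to a file, then
    // read them back sequentially. Safe writes on interrupt are ignored.
    class RecordLog {
        static void append(File f, byte[] record) throws IOException {
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(f, true)))) {
                out.writeInt(record.length); // length prefix
                out.write(record);
            }
        }

        // Fold over every record in order, like the abstraction suggested above.
        static void forEach(File f, java.util.function.Consumer<byte[]> fn)
                throws IOException {
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(f)))) {
                while (true) {
                    int len;
                    try { len = in.readInt(); } catch (EOFException e) { break; }
                    byte[] rec = new byte[len];
                    in.readFully(rec);
                    fn.accept(rec);
                }
            }
        }
    }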
CouchDB ticks most of your boxes:
1) http://hackage.haskell.org/package/CouchDB
2) Depends on how you use it. You can store any binary data in it, but it's up to you to know what it means. Or you can store XML or JSON, which is less space-efficient but easier to migrate as your schema evolves (which it will).
3) Don't know, but its used for big web sites.
4) CouchDB uses a CM-like concept of updates and baselines, so old data stays around. It can be purged later as obsolete, but I think that's optional.
5) No. It's written in Erlang and runs (I believe) as a separate process. But why is that a problem?

CouchDB Hovercraft limitations: storing arbitrary Erlang terms in CouchDB

So I've been messing with Hovercraft and ran into some annoying limitations, which are probably there because internally CouchDB deals with the key/value pairs associated with a document as opaque (JSON) strings.
Namely:
- doc _ids can only be binary strings (UTF-8): no complex Erlang terms allowed here
- key/value pairs can only be binary strings, atoms, or lists (no tuples or arbitrary binaries allowed)
I was looking forward to storing arbitrary Erlang terms in there without encoding them as JSON first. Yes, this is possible, but then the entire view system (and the HTTP API, notifications, verification, indexing) just stops working.
That too is fine; I could code around it: not use Futon, map/reduce over documents manually, and store the results as documents (which is actually better, since those results can then be replicated to other DBs/nodes, unlike view results, which don't replicate; correct me if I'm wrong).
The real problem seems to be that without views one cannot get a list of all the keys stored in a db, at least not via the current Hovercraft API. That is a showstopper for map-reducing manually over an entire db without knowing beforehand what the doc _ids are.
Any ideas as to how I can get a list of these keys in a db, via Erlang calls, possibly into the internals of CouchDB?
It's even more obvious to me now that the direct Erlang API for CouchDB was a total afterthought.
As the author of Hovercraft, I agree with the statement "the direct Erlang API for CouchDB was a total afterthought."
You should only use Hovercraft if you are converting CouchDB from an HTTP server to say, an SMTP server. HTTP will scale much better than Hovercraft.
It should be possible to use the internal _changes API to iterate over all the docs in the database and maintain a secondary index incrementally.
As for storing non-JSON data in CouchDB, that sounds risky as no one will be looking out to make sure we don't break your use case.
But if you are having fun, by all means, continue. And I love getting patches to Hovercraft, so any little thing will probably get rolled back in.
Thanks,
Chris

Store large amounts of data in RMS

I need to store a large amount of data using the RMS API in J2ME.
The data must be hardcoded, and I need to store multiple columns and rows of it.
How can I do this? Should I use structs?
Well, RMS only allows you to store records that are arrays of bytes. You will have to decide for yourself how a record is stored, and whether you want to store your data in a single record or in multiple records. If you use the DataInputStream and DataOutputStream classes, you'll be able to read/write Strings, booleans, integers, etc. The API documentation includes a decent example of how you can do this.
If you have complex data to store, or a lot of different objects, you may want to create a simple library for RMS I/O, that allows you to pass objects implementing e.g. "Storable" to a library class that writes away your object into RMS.
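For example, here is a rough sketch of the DataOutputStream approach described above. Open the store with RecordStore.openRecordStore("rows", true); the field names and record layout are illustrative.

    import java.io.*;
    import javax.microedition.rms.*;

    // Sketch of serialising one multi-field "row" into a single RMS record
    // using DataOutputStream, as described above.
    class RowStore {
        static int addRow(RecordStore rs, String name, int score) throws Exception {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(name);   // column 1
            out.writeInt(score);  // column 2
            out.flush();
            byte[] data = buf.toByteArray();
            return rs.addRecord(data, 0, data.length);
        }

        static void readRow(RecordStore rs, int id) throws Exception {
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(rs.getRecord(id)));
            String name = in.readUTF();
            int score = in.readInt();
            System.out.println(name + ": " + score);
        }
    }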
See my question on exactly the same topic. In the end we bought a commercial BTree implementation and extended it to work across multiple record stores.
As Jeroen said, RMS is quite basic. You can only store arrays of bytes. But, though it's basic, it's quite easy to implement a more complex structure, with an index stored in one record store addressing other record stores that contain the data.
Have a look at this page: Understanding the Record Management System
