I need to store a large amount of data using the RMS API in J2ME.
How can I store multiple-column data? The values must be hardcoded, so I need to store data with multiple columns and rows.
How can I do this in RMS? Should I use something like structs?
Well, RMS only allows you to store records that are arrays of bytes. You will have to decide for yourself how a record is stored, and if you want to store your data in a single or in multiple records. If you use the DataInputStream and DataOutputStream classes, you'll be able to read/write Strings, booleans, integers, etc. The API documentation includes a decent example of how you can do this.
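For example, one row with a few columns could be packed into a single record roughly like this (the class, column names and types below are just placeholders for whatever your hardcoded data looks like):

import java.io.*;
import javax.microedition.rms.*;

// Packs one "row" (name, age, active) into a byte[] and stores it as one RMS record.
public class RowStore {

    public static int saveRow(RecordStore rs, String name, int age, boolean active)
            throws IOException, RecordStoreException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        dos.writeUTF(name);       // column 1
        dos.writeInt(age);        // column 2
        dos.writeBoolean(active); // column 3
        dos.close();
        byte[] record = baos.toByteArray();
        return rs.addRecord(record, 0, record.length);
    }

    public static void printRow(RecordStore rs, int recordId)
            throws IOException, RecordStoreException {
        DataInputStream dis = new DataInputStream(
                new ByteArrayInputStream(rs.getRecord(recordId)));
        // Read the fields back in exactly the order they were written.
        String name = dis.readUTF();
        int age = dis.readInt();
        boolean active = dis.readBoolean();
        dis.close();
        System.out.println(name + " | " + age + " | " + active);
    }
}

You would open the store once with RecordStore.openRecordStore("rows", true) and reuse it for all rows; the record id returned by addRecord is what you keep if you need to find a particular row again.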
If you have complex data to store, or a lot of different objects, you may want to create a simple library for RMS I/O that lets you pass objects implementing e.g. "Storable" to a library class, which then writes the object out to RMS.
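A hypothetical "Storable" contract for such a library could be as small as this (the method names are made up, only the idea matters):

import java.io.*;

// Hypothetical interface: each object knows how to serialise itself to/from a record.
public interface Storable {
    void writeTo(DataOutputStream out) throws IOException;
    void readFrom(DataInputStream in) throws IOException;
}

The library class then only deals with byte arrays and record ids, while each object decides its own layout.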
See my question on exactly the same topic. In the end we bought a commercial BTree implementation and extended it to work across multiple record stores.
As Jeroen said, RMS is quite basic. You can only store arrays of bytes. But, though it's basic, it's quite easy to implement a more complex storage structure, with an index stored in one record store that addresses other record stores containing the data.
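As a rough sketch of that idea, each index record can simply map a logical key to the name of a data record store and a record id inside it (all names below are illustrative):

import java.io.*;
import javax.microedition.rms.*;

// Illustrative index entry: logical key -> (data store name, record id).
public class IndexEntry {
    public String key;        // e.g. "customer:42"
    public String storeName;  // record store that holds the actual data
    public int recordId;      // record id inside that store

    public byte[] toBytes() throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        dos.writeUTF(key);
        dos.writeUTF(storeName);
        dos.writeInt(recordId);
        dos.close();
        return baos.toByteArray();
    }

    public static IndexEntry fromBytes(byte[] data) throws IOException {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
        IndexEntry e = new IndexEntry();
        e.key = dis.readUTF();
        e.storeName = dis.readUTF();
        e.recordId = dis.readInt();
        return e;
    }

    // Follow an index entry to the raw data bytes it points at.
    public byte[] load() throws RecordStoreException {
        RecordStore data = RecordStore.openRecordStore(storeName, false);
        try {
            return data.getRecord(recordId);
        } finally {
            data.closeRecordStore();
        }
    }
}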
Have a look at this page: Understanding the Record Management System
I just loaded up a Redis server for my backend, with ioredis.
I'm learning that if I want to store data as JSON, I have to use the RedisJSON module instead, since hashes are string-typed only and they are flat. However, if I'm only storing one object per user instance, containing fewer than 10 fields typed as string/number or array, is it better to just go without RedisJSON? On one hand, RedisJSON lets me fetch an object in a single query. On the other, I can just store multiple data types and query across those sets/hashes with a consistent naming convention.
Does anyone know which usage is better, or the pitfalls of either approach?
The backend serves a WebSocket for a multiplayer board game.
The answer is: it depends, and it requires several trade-offs to be made for each project.
Performance: RedisJSON uses a tree structure for storing all elements in a document.
Comparing to a string: the advantage is that updating sub-elements of a document will be faster than manipulating a string containing a serialised JSON object. But retrieving (reassembling) and writing the entire document will be more expensive compared to Strings. Read more here.
Comparing to Hash: when manipulating a flat document (1 level deep), RedisJSON and HSET performance are comparable.
Maintainability: using several native data types in Redis to represent your object can perform really well, but the code will be more complex to maintain. There can be additional migration/refactoring work when the structure of the document is altered.
Querying: RediSearch has support for indexing and querying the content of RedisJSON documents. This matters, of course, only if your use case requires secondary indexing and querying documents by something other than their key. You can still build your own secondary indexing with Redis data structures, but this is also a trade-off in maintainability.
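To make the flat case concrete, here is a minimal sketch of the two options for a small per-player object. The asker is on ioredis, but the commands are the same; the sketch below uses the Jedis client in Java purely for illustration, and every key name and field in it is made up.

import java.util.HashMap;
import java.util.Map;
import redis.clients.jedis.Jedis;

public class PlayerState {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("localhost", 6379);

        // Option A: one flat hash per player, addressed by a naming convention.
        Map<String, String> fields = new HashMap<String, String>();
        fields.put("name", "alice");
        fields.put("score", "42");
        fields.put("pieces", "rook,knight,pawn"); // arrays flattened into a delimited string
        jedis.hset("game:123:player:alice", fields);
        Map<String, String> state = jedis.hgetAll("game:123:player:alice");

        // Option B: the whole object serialised to JSON in a plain string key
        // (or stored via RedisJSON if partial updates / secondary indexing are needed later).
        jedis.set("game:123:player:alice:json",
                "{\"name\":\"alice\",\"score\":42,\"pieces\":[\"rook\",\"knight\",\"pawn\"]}");
        String json = jedis.get("game:123:player:alice:json");

        jedis.close();
    }
}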
disclaimer: I work at Redis, creator and maintainer of RediSearch and RedisJSON
I'm looking to store arrays in Azure Table entities. At present, the only type of array supported natively is byte-array, limited to 64k length. The size is enough, but I'd like to store arrays of longs, doubles and timestamps in an entity.
I can obviously cast multiple bytes to the requested type myself, but I was wondering if there's any best-practice to achieve that.
To clarify, these are fixed length arrays (e.g. 1000 cells) associated with a single key.
I have written an Azure table storage client, called Lucifure Stash, which supports arrays, enums, large data, serialization, public and private properties and fields, and more.
You can get it at https://github.com/hocho/LucifureStash
I've been trying to think of a nice way to do this other than the method you've already mentioned, and I'm at a loss. The simplest solution I can come up with is to take the array, binary serialize it and store in a binary array property.
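If you do go the manual route the asker mentions, the packing itself is mechanical: for fixed-length arrays of longs or doubles you just lay the values out back to back in the byte array. A rough sketch of the idea (shown in Java with ByteBuffer purely for illustration; in .NET, BitConverter or Buffer.BlockCopy plays the same role):

import java.nio.ByteBuffer;

// Packs a fixed-length long[] into a byte[] for a single binary table property,
// and unpacks it again; doubles (and timestamps stored as ticks) work the same way.
public class ArrayPacking {

    public static byte[] pack(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 8);
        for (int i = 0; i < values.length; i++) {
            buf.putLong(values[i]);
        }
        return buf.array();
    }

    public static long[] unpack(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long[] values = new long[bytes.length / 8];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getLong();
        }
        return values;
    }
}

A 1000-element array of longs packs into 8,000 bytes, comfortably under the 64k property limit.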
Other options I've come up with but dismissed:
If storing it natively is important, you could keep this information in another child table (I know Azure Tables don't technically have relationships, but that doesn't mean you can't represent this type of thing). The downside of this is that it will be considerably slower than your original approach.
Take the array, XML serialize it and store it in a string property. This would mean that you could see the contents of your array when using 3rd party data explorer tools and you could run (inefficient) queries that look for an exact match on the contents of the array.
Use Lokad Cloud fat entities to store your data. This essentially takes your whole object, binary serializes it and splits the results into 64kb blocks across the properties of the table entity. This does solve problems like the one you're experiencing, but you will only be able to access your data using tools that support this framework.
If you have just a key-value collection to store, then you can also check out Azure BLOBs. They can rather efficiently store arrays of up to 25M time-value points per single blob (with random access within the dataset).
If you choose to store your object in blob storage and need more than one "key" to get at it, you can just create an Azure table (or two, or n) where you store the key you want to look up and a reference to the exact blob item.
Why don't you store the values as CSV strings?
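That is, join the values into one delimited string property and split it again on read; a trivial sketch of the idea (Java shown only for illustration):

// Joins a long[] into "1,2,3" for a string property, and parses it back.
public class CsvArrays {

    public static String toCsv(long[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(values[i]);
        }
        return sb.toString();
    }

    public static long[] fromCsv(String csv) {
        String[] parts = csv.split(",");
        long[] values = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Long.parseLong(parts[i]);
        }
        return values;
    }
}

This keeps the data human-readable in table explorer tools, at the cost of being less compact than the raw binary encoding.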
You could serialize your array as a JSON string using the .NET JavaScript serializer:
http://msdn.microsoft.com/en-us/library/system.web.script.serialization.javascriptserializer.aspx
This class has a "MaxJsonLength" property you could use to ensure your arrays didn't exceed 64K when you were serializing them. And you can use the same class to deserialize your stored objects.
I need to know:
What is the best data structure to use when transferring and storing large amounts of data across different COM objects in an MFC application?
(The data is usually large strings, XML files, images, etc.)
Is there any memory issue if I use CList, CMap, etc.?
Thanks
1) Which data structures to use depends entirely on the application and the data that needs to be stored. Whichever data structure you use will not affect the result, but it will definitely affect the runtime of the algorithm. I liked the following statement, so I'm pasting it here.
The universal properties of data structures are the amount of memory used in storing the contents, and the time and additional memory each operation takes. You come to know those for some important kinds of data structures and look for a fit with the requirements on footprint or responsiveness.
2) Personally, I don't think there will be any memory issue if you properly manage the data/objects the data structure stores on the heap/stack.
Which data structures should I store real-life 'objects' in?
I am not looking for a computer representation. I am looking at different data structures for different items in real life, for access/storage etc. Is there any study on this?
Update:
Based upon the comments, I should take the 'data' out of 'data structures': I am simply looking for structures to store various objects in, based upon usability rules.
Your question is a bit too vague to answer well, but in general you can think about using existing "objects"/models/representations of the abstract things you want to model or manipulate.
If those don't exist then you build your own.
Which data structure to use completely depends on the type of operations you are going to perform on your data.
Some data structures are useful for random access (arrays), while others are fast for insert/delete operations (linked lists).
Some store key-value pairs (HashMap or TreeMap).
Different operations vary from each other in their time and space costs, so use the data structure that properly suits your requirements.
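For example (Java collections, purely as an illustration of matching the structure to the dominant operation):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class ChoosingStructures {
    public static void main(String[] args) {
        // Random access by position: an array-backed list gives O(1) get(i).
        List<String> shelf = new ArrayList<String>();
        shelf.add("book");
        String first = shelf.get(0);

        // Frequent inserts/removals at the ends: a linked list avoids shifting elements.
        LinkedList<String> toDo = new LinkedList<String>();
        toDo.addFirst("urgent task");
        toDo.removeLast();

        // Lookup by key: a hash map gives O(1) average-case access by key.
        Map<String, String> locationOf = new HashMap<String, String>();
        locationOf.put("passport", "top drawer");
        String where = locationOf.get("passport");
    }
}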
We have a decent sized object-oriented application. Whenever an object in the app is changed, the object changes are saved back to the DB. However, this has become less than ideal.
Currently, transactions are stored as a transaction and a set of transactionLI's.
The transaction table has fields for who, what, when, why, foreignKey, and foreignTable. The first four are self-explanatory. ForeignKey and foreignTable are used to determine which object changed.
TransactionLI has timestamp, key, val, oldVal, and a transactionID. This is basically a key/value/oldValue storage system.
The problem is that these two tables are used for every object in the application, so they're pretty big tables now. Using them for anything is slow. Indexes only help so much.
So we're thinking about other ways to do something like this. Things we've considered so far:
- Sharding these tables by something like the timestamp.
- Denormalizing the two tables and merge them into one.
- A combination of the two above.
- Doing something along the lines of serializing each object after a change and storing it in subversion.
- Probably something else, but I can't think of it right now.
The whole problem is that we'd like to have some mechanism for properly storing and searching through transactional data. Yeah you can force feed that into a relational database, but really, it's transactional data and should be stored accordingly.
What is everyone else doing?
We have taken the following approach:-
All objects are serialised (using the standard XmlSerializer) but we have decorated our classes with serialisation attributes so that the resultant XML is much smaller (storing elements as attributes and dropping vowels from field names, for example). This could be taken a stage further by compressing the XML if necessary.
The object repository is accessed via a SQL view. The view fronts a number of tables that are identical in structure, but each table name has a GUID appended. A new table is generated when the previous table has reached critical mass (a pre-determined number of rows).
We run a nightly archiving routine that generates the new tables and modifies the views accordingly so that calling applications do not see any differences.
Finally, as part of the overnight routine we archive any old object instances that are no longer required to disk (and then tape).
I've never found a great end-all solution for this type of problem. One thing you can try, if your DB supports partitioning (and even if it doesn't, you can implement the same concept yourself), is to partition this log table by object type, and then further partition by date/time or by your object ID (if your ID is numeric this works nicely; I'm not sure how well a GUID would partition).
This will help maintain the size of the table and keep all related transactions to a single instance of an object to itself.
One idea you could explore is, instead of storing each field in a name-value pair table, storing the data as a blob (either text or binary). For example, serialize the object to XML and store it in a field.
The downside of this is that as your object changes, you have to consider how this affects all historical data. If you're using XML there are easy ways to update the historical XML structures; if you're using binary there are ways too, but you have to be more conscious of the effort.
I've had awesome success storing a rather complex object model that has tons of interrelations as a blob (the XML serializer in .NET didn't handle the relationships between the objects). I could very easily see myself storing the binary data. A huge downside of storing it as binary data is that to access it you have to take it out of the database; with XML, if you're using a modern database like MSSQL, you can access the data in place.
One last approach is to split the difference between the two patterns: you could define a Difference Schema (and I assume more than one property changes at a time), so for example imagine storing this XML:
<objectDiff>
<field name="firstName" newValue="Josh" oldValue="joshua"/>
<field name="lastName" newValue="Box" oldValue="boxer"/>
</objectDiff>
This will help reduce the number of rows, and if you're using MSSQL you can define an XML schema and get some of the rich querying ability around the object. You can still partition the table.
Josh
Depending on the characteristics of your specific application an alternative approach is to keep revisions of the entities themselves in their respective tables, together with the who, what, why and when per revision. The who, what and when can still be foreign keys.
I would be very careful with this approach, though, since it is only viable for applications with a relatively small number of changes per entity/entity type.
If querying the data is important, I would use true partitioning in SQL Server 2005 and above, if you have the Enterprise edition of SQL Server. We have millions of rows partitioned by year, down to day for the current month; you can be as granular as your application demands, with a maximum of 1,000 partitions.
Alternatively, if you are using SQL 2008, you could look into filtered indexes.
These are solutions that will enable you to retain the simplified structure you have whilst providing the performance you need to query that data.
Splitting/Archiving older changes obviously should be considered.