choosing right data structure - visual-c++

I need to know
What is the best data structure to use when transferring and storing large amounts of data across different COM objects in MFC application.
(the data is usually large strings, xml files, images etc)
Is there any memory issue if I use CList, CMap etc
Thanks

1) Data structures to use is totally depend on the application and data needs to be store. Which ever data structure you will use is not going to affect the result but it defiantly affect runtime algorithm. I liked the following statement so pasting here.
The universal properties of data structures are the amount of memory used in storing the contents, and the time and additional memory each operation takes. You come to know those for some important kinds ofdata structures and look for a fit with the requirements on footprint or responsiveness.
2) Personally I dont think their will be any memory issue if you properly manage data/objects stored into heap/stack from the data structure.

Related

Persisting only part of a data source

I'm using intake to access the catalog catalog.ocean.GFDL_CM2_6.GFDL_CM2_6_control_ocean_surface.
At the moment I only work with small patches of that data, but accessing that data every single time is still quite costly (it's on Google Cloud Storage). So I want to use the persist option of intake to store that data locally. However as far as I've understood from the docs, it looks like one can only persist the whole dataset. For that specific dataset that would amount to almost 400 dollars if I take a cost of 0.1$ per GB, since the total data is 3976GB.
Hence my questions:
Is there a way (especially for a zarr file which in theory should make this quite easy) to persist only parts of the data (for instance only a subset of the variables)
This is probably more complicated, but can I push things further, by persisting regions of data I'm interested in (in terms of coordinates values for instance)?
There is no direct Intake way to do what you are asking for. Intake was conceived as a way to get your data into a format that you can then manipulate as you normally do, i.e., deal with only the loading part, so that a persisted data-set is the same as the original.
However, it is not hard to accomplish manually: you should grab the xarray, filter for the region you need, and call to_zarr to save the new dataset. You can then point a simple catalogue entry like the old one at the new location.
You could have done this manipulation in a driver directly if this was a specific pattern that would repeat a lot. In fact, we have mooted the idea of whether/how to implement such processing steps in Intake, but there is no plan yet. In the end, we may take the work on pipelines in Holoviews to describe processing steps.

Is Monotouch.Dialog a suitable replacement for all UITableviews?

This question is mainly targeted towards Miguel as the creator of MT.Dialog but I would like to hear opinions of others as well.
I'm currently refactoring a project that has many table views. I'm wondering if I should replace All of them with MT.Dialog.
My pros are:
easy to use
simple code
hope that Xamarin will offer it cross platform one day
Cons:
my cells are complete custom made. Does it make sense in that case?
performance? Is that an issue?
breaking the MVC paradigms (source no longer separated from view and controller)
Is it in general better to just use MT.Dialog or inherit from it for specific use cases? What are your experiences?
To address some of your questions.
The major difference between MonoTouch.Dialog and UITableView is that with the former you "load" all the data that you want to render upfront, and then forget about it. You let MonoTouch.Dialog take care of rendering it, pushing views and taking care of sections/elements. With UITableView you need so provide callback methods for returning the number of sections, the titles for the sections and the data itself.
UITableView has the advantage that to render say a million rows with the same size and the same cells, you dont really have to load all the data upfront, you can just wait to be called back. That being said, this breaks quickly if you use cells with different heights, as UITableView will have to query for the sizes of all of your rows.
So in short:
(1) yes, even if you use custom cells, you will benefit from shorter code and a simpler programming model. Whether you use the other features of it or not, is up to you.
(2) For performance, the issue boils down to how many rows you will have. Like I mentioned before, if you are browsing a potentially large data set, you would have to load all of those cells in memory up front, or like TweetStation, add features to load "on-demand".
The reality is that it will consume more memory, because you need to load your data in MonoTouch.Dialog. Your best optimization technique is to keep your Elements very lightweight. Tweetstation for example uses a "TweetElement" that merely holds the ID to the tweet, and loads the actual contents on demand, to keep the size of the TweetElement in memory very small.
With UITableView, you do not pay for that price. But if you are not using a database of some sort, the data will still be in memory.
If your application calls for the data to be in memory, then you might as well move the data to be elements and use that as your model.
(3) This is a little bit of a straw man. Your data "source" is never really independent of UIKit. I know that people like to talk about these models as being reusable, but in practice, you wont ever be able to reuse a UITableViewSource as a source for anything but a UITableView. It's main use is to support scalable controls that do not require data to be loaded in memory up-front, it is not really about separating the Model from the View.
So what you really have is an adaptor class that bridges the world of the UITableView with your actual data model (a database, an XML list, an in-memory array, a Redis connection).
With UITableView, your adaptor code lives in the constructor and the UITableViewSource. With MonoTouch.Dialog your adatpro code lives in the code that populates the initial RootElement to DialogViewController.
So there are reasons to use UITableView over MonoTouch.Dialog, but it is none of those three Cons.
I use MonoTouch.Dialog (and it's brother QuickDialog for objc) pretty much every time I use a tableview. It does help a lot to simplify the code, and gives you a better abstraction of a table.
There's one exception, though, which is when the table will have thousands and thousands of rows, and the data is in a database. MT.D/QD requires you to load all the data upfront, so you can create the sections, and that's simply too slow if you don't already have the objects in memory.
Regarding "breaking MVC", I kind of agree with you. I never really use the reflection bindings in MT.D because of that fact. Usually I end up creating the root from scratch in code, or use something like JSON (in my fork https://github.com/escoz/MonoMobile.Forms), so that my domain objects don't need to know about MT.D.

Space efficient embedded Haskell persistence solution

I'm looking for a persistence solution (maybe a NoSQL db? or something else...) that has the following criteria:
1) Has a Haskell API
2) Is disk space efficient--the db could easily get to many gigabytes of data but I need it to run well on a typical desktop. I need something that stores the data as efficiently as possible. So, for example, storing field names in a record would be bad.
3) High performance for reading sequential records. The typical use case is start somewhere and then read forward straight through the data--reading through possibly millions of records as quickly as possible.
4) Data is basically never changed (would only be changed if it was discovered data was incorrect somehow), just logged
5) It should act directly on file(s) that can be easily moved/copied around. It should not be calling a separate running server.
If you remove the "single file" requirement with no other running process, everything else can be fulfilled by every standard RDBMS, and depending on the type of data, sometimes especially well by columnar stores in particular.
The only single-file solution I know of is sqlite. Mainly sqlite founders when a single db needs to be accessed by multiple concurrent processes. If that isn't the case, then I wouldn't be surprised if you could scale it up singificantly.
Additionally, if you're only looking for sequential scans and key-value stores, you could just go with berkeleydb, which is known to be high-performance for very large data sets.
There are high quality Haskell bindings for talking to both sqlite and berkeleydb.
Edit: For sequential access only, its also blindingly straightforward to roll your own layer with the binary or cereal packages -- you basically need to write a helper function to wrap reading records from a file sequentially rather than all at once. An abstraction for folding over them is nice as well. Then you can decide to append to a single file, or spread your writes across files as you go. Either way, that's the most lightweight and straightforward option of all. The only drawback is having to worry about durability -- safe writes in the presence of interrupts, and all that other stuff that a good DB solution should take care of for you.
CouchDB ticks most of your boxes:
1) http://hackage.haskell.org/package/CouchDB
2) Depends on how you use it. You can store any binary data in it, but its up to you to know what it means. Or you can store XML or JSON, which is less space efficient but easier to migrate as your schema evolves (which it will).
3) Don't know, but its used for big web sites.
4) CouchDB uses a CM-like concept of updates and baselines, so old data stays around. It can be purged later as obsolete, but I think thats optional.
5) No. Its written in Erlang and runs (I believe) as a separate process. But why is that a problem?

Data structure for Objects

Which data structures I should store the real life 'objects' in?
I am not looking for computer representation. I am looking for different data structures for different item in real life access/storage etc. Is there any study on this?
Update:
Based upon comments, I should remove the 'data' from data structures and simply looking for structures to store various objects in based upon usability rules.
Your question is a bit too vague to answer well, but in general you can think about using existing "objects"/models/representations of the abstract things you want to model or manipulate.
If those don't exist then you build your own.
Which data structure to use completely depends on the type of action you are going to perform on your data.
Some data structures are useful for random access(Arrays) while others are fast for insert delete operation( linked list )
Some store key value pair( HashMap or TreeMap)
Different operation varies arithmetically from each other in terms of time and space.So use the data structure that suit your requirements properly.

Concerns about Core Data

I'm getting ready to dive into my first Core Data adventure. While evaluating the framework two questions came up that really got me thinking about using Core Data at all for this project or to stick with SQLite.
My app will heavily rely upon importing data from an external source. I'm aware that one can import into Core Data but handling complex relationships seems complicated and tedious. Is there an easy way to accomplish complex imports?
The app has to be able to execute complex queries spanning multiple tables or having multiple conditions. Building these predicates and expressions simply scares me...
Is it worth to take the plunge and use Core Data or should I stick with SQLite?
As I and others have said before, Core Data is really an object-graph management framework. It manages the relationships between model objects, including constraints on their cardinality, and manages cascading deletes etc. It also manages constraints on individual attributes. Core Data just happens to also be able to persist that object graph to disk. It can do this in a number of formats, including XML, binary, and via SQLite. Thus, Core Data is really orthogonal to SQLite. If your task is dealing with an embedded SQL-compatible database, go with SQLite. If your task is managing the model layer of an MVC app, go with Core Data. In specific answers to your questions:
There is no magic that can automatically import complex data into any model. That said, it is relatively easy in Core Data. Taking a multi-pass approach and using the SQLite backend can help with memory consumption by allowing you to keep only a subset of the data in memory at a time. If the data sets can be kept in memory, you can write a custom persistent store format that reads/writes directly to your legacy data format from within Core Data (see the Atomic Store Programming Guide).
Building a complex NSPredicate declaratively is somewhat verbose but shouldn't scare you. The Predicate Programming Guide is a good place to start. You can, of course, also write predicates using a string format, much like a string-formatted SQL statement. It's worth noting that, as described above, the predicates in Core Data are on the objects and object graph, not on the SQL tables. If you really want to think at the level of tables, stick with SQLite and write your own wrapper.
I can't really speak to your first point.
However, regarding your second point, using Core Data means you don't have to really worry about complex queries since you can just pretend that all the relationships are properly established in memory already (Apple's implementation details aside). It doesn't matter how complex a join it might be in a database environment because you really aren't in a database environment. If you need to get the fourth child of the grandparent of your current object and then find that child's pet's name and breed, all you do is traverse up the object tree in code using a series of messages or properties. No worries about joins or anything. The only problem is it might be really slow depending on your objects' relationships, but I can't really speak accurately to that since I haven't actually implemented anything using Core Data (I've just read about it extensively on Apple's and others' websites).
If the data importer from an external source is written based on the same core data model (for the targeted/destination side of the import) - nothing will be conceptually different as compare to using/updating the same data (through the core data stack from your actual application).
If you create the data importer without using the core data stack, make sure you learn well the db schema that would be generated/expected by the core data based model. There is nothing magic there - just make sure you follow how the cross entity relationships are implemented and how entity hierarchies are stored.
I had to create recently a data importer from Access database into the core data based Sqlite store as a .NET app. Once my destination core data model was define, I created a small app that populated the Sqlite store with randomly generated entities (including all the expected relationships). Then, I reverse engineered how the core data actually created the Sqlite store for the model and how it handles the relationships by learning from the generated and persisted data. Then, I implemented the .NET based importer/data-transformer according to my observations. At the end, I got perfect core data friendly data store that could be open an modified from the application that was using the core data stack on Mac OSX.

Resources