If yes, then why is a priority queue a non-linear data structure? Are non-linear data structures worse in performance than linear ones? If so, why? Please explain in detail.
Linear data structures are lists and arrays. A priority queue is an abstract data structure (abstract meaning it can be implemented in terms of other data structures in multiple ways), usually implemented in terms of a heap. For performance measurement, the asymptotic cost of operations is usually used. For example, how much time do N insertion operations take?
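To make that concrete, here is a minimal sketch of one possible heap-backed priority queue in Haskell (a pairing heap; this is just one of many ways to implement the abstract structure). insert and findMin are O(1), and deleteMin is O(log n) amortized, so N insertions cost O(N):

-- A pairing heap: one of several heap structures that can back a priority
-- queue. Smaller values have higher priority here.
data Heap a = Empty | Node a [Heap a]
  deriving Show

insert :: Ord a => a -> Heap a -> Heap a
insert x h = merge (Node x []) h

merge :: Ord a => Heap a -> Heap a -> Heap a
merge Empty h = h
merge h Empty = h
merge h1@(Node x hs1) h2@(Node y hs2)
  | x <= y    = Node x (h2 : hs1)
  | otherwise = Node y (h1 : hs2)

findMin :: Heap a -> Maybe a
findMin Empty      = Nothing
findMin (Node x _) = Just x

deleteMin :: Ord a => Heap a -> Heap a
deleteMin Empty       = Empty
deleteMin (Node _ hs) = mergePairs hs
  where
    mergePairs []             = Empty
    mergePairs [h]            = h
    mergePairs (h1 : h2 : hs) = merge (merge h1 h2) (mergePairs hs)

Note that the heap itself is a tree, which is why a priority queue is usually classified as non-linear, even though it could also be simulated (more slowly) with a sorted list.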
Sorry, this is an incomplete answer; a complete answer is beyond the scope of SO.
CAS belongs to the read-modify-write (RMW) family, a set of algorithms that allow you to perform complex transactions atomically.
Specifically, Wikipedia says that
CAS is used to implement synchronization primitives like semaphores and mutexes, as well as more sophisticated lock-free and wait-free algorithms. [...] CAS can implement more of these algorithms than atomic read, write, or fetch-and-add, and assuming a fairly large amount of memory, [...] it can implement all of them.
https://en.wikipedia.org/wiki/Compare-and-swap#Overview
So it seems that a CAS algorithm is the "one size fits all" product of its category. Why is that so? What do other RMW algorithms lack? If CAS is the best tool, what are the other algorithms for?
CAS belongs to a class of objects called "consensus objects", each of which has a consensus number: the maximum number of threads for which a given consensus object can solve the consensus problem.
The consensus problem goes like this: each of n threads proposes some value p, and the threads must decide on one of the proposed values d such that all n threads agree on d.
CAS is the most "powerful" consensus object because its consensus number is infinite. That is, CAS can be used to solve the consensus problem among a theoretically infinite number of threads. It even does it in a wait-free manner.
This can't be done with atomic registers, test-and-set, fetch-add, or stacks, since they all have finite consensus numbers. There are proofs for these consensus numbers, but that is another story...
The significance of all of this is that it can be proven that there exists a wait-free implementation of an object for n threads using a consensus object with a consensus number of at least n. CAS is especially powerful because you can use it to implement wait-free objects for an arbitrary number of threads.
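To give a flavour of what that looks like, here is a hedged Haskell sketch of a wait-free consensus protocol built on a CAS-style primitive; atomicModifyIORef' stands in for a hardware CAS here, and in real lock-free code you would use an actual CAS instruction (e.g. via the atomic-primops package):

import Data.IORef

-- The decision cell starts out as Nothing. The first thread whose
-- compare-and-swap from Nothing succeeds installs its proposal; every other
-- thread decides whatever value is already in the cell. Each call finishes
-- in a bounded number of steps, hence wait-free.
decide :: IORef (Maybe a) -> a -> IO a
decide cell proposal =
  atomicModifyIORef' cell $ \current ->
    case current of
      Nothing -> (Just proposal, proposal)  -- we won the race
      Just d  -> (Just d, d)                -- someone already decided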
As to why other RMW operations are useful: some problems in multiprocessing don't really involve solving the consensus problem for an arbitrary number of threads. For example, mutual exclusion can be solved using less powerful RMW operations like test-and-set (a simple TAS lock), fetch-add (ticket lock), or atomic swap (CLH lock).
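For comparison, here is a toy sketch of the test-and-set idea in Haskell; atomicModifyIORef' again stands in for the hardware instruction, so treat this as an illustration of the shape of a TAS lock rather than production code:

import Control.Monad (when)
import Data.IORef

newtype SpinLock = SpinLock (IORef Bool)

newSpinLock :: IO SpinLock
newSpinLock = SpinLock <$> newIORef False

-- "Test and set": atomically set the flag and report whether it was already set.
testAndSet :: SpinLock -> IO Bool
testAndSet (SpinLock ref) = atomicModifyIORef' ref (\held -> (True, held))

lock :: SpinLock -> IO ()
lock l = do
  alreadyHeld <- testAndSet l
  when alreadyHeld (lock l)  -- spin until we are the ones who flip False -> True

unlock :: SpinLock -> IO ()
unlock (SpinLock ref) = atomicWriteIORef ref False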
More information on consensus for shared memory is in the "In shared-memory systems" section of the Wikipedia article on Consensus (computer science).
Also, there's a whole chapter on consensus and universal constructions in Herlihy and Shavit's The Art of Multiprocessor Programming (WorldCat) that I highly recommend.
So I'm working on a solver for the Hungarian Rings puzzle (https://www.jaapsch.net/puzzles/rings.htm) in Haskell.
I'm not great at the language and still have a lot of blind spots. I'm struggling to figure out what data structure to use to represent the puzzle, and would love any hints, tips or answers! (By the way, my current idea represents the coloured balls as a series of numbers that will be in order when the puzzle is solved.)
Much like representing a Rubik's Cube in a data structure, a naive model contains redundant information, while the most compact model depends on an algebraic analysis of the object. So, on the one hand, an operation on a model with redundant information may be inefficient; on the other, an operation on a compact model (e.g. a permutation group) may be quite abstract to translate into physical operations.
So you may find that a permutation group of a higher order describes it more easily; here's a quote from the Rubik's Cube group article on Wikipedia:
The Rubik's Cube group is the subgroup of the symmetric group S₄₈ generated by the six permutations corresponding to the six clockwise cube moves.
And that might well correspond to a set of double-ended queues, as luqui suggests, as long as you take into account that rotating one queue affects the other.
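As a hedged starting point for the "redundant but simple" representation, something like the following sketch may help; the intersection handling is only stubbed out, because it depends on the puzzle geometry described on jaapsch.net:

type Ball = Int     -- balls numbered so that the solved state is ascending order
type Ring = [Ball]  -- one ring, read clockwise from a fixed reference slot

data Puzzle = Puzzle { leftRing :: Ring, rightRing :: Ring }
  deriving (Show, Eq)

-- One clockwise step of a ring is a cyclic rotation of its list.
rotateCW :: Ring -> Ring
rotateCW []       = []
rotateCW (x : xs) = xs ++ [x]

-- Turning the left ring: rotate it, then propagate the balls that now occupy
-- the two shared slots into the right ring.
turnLeft :: Puzzle -> Puzzle
turnLeft (Puzzle l r) = syncIntersections (Puzzle (rotateCW l) r)

-- Placeholder: copy the balls at the shared slots between the two rings here,
-- using the real intersection indices of the physical puzzle.
syncIntersections :: Puzzle -> Puzzle
syncIntersections = id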
I want to write a simulation of a multi-entity system. I believe such systems motivated the creation of Simula and OOP, where each object would maintain its own state and the runtime would manage the entire system (e.g. stop threads, serialize data).
On the other hand, I would like to have ability to rewind, change the simulation parameters and compare the results. Thus, immutability sounds great (at least up to almost certain garbage collection issues caused by keeping track of possibly redundant data).
However, I don't know how to model this. Does it mean that I must put every interacting entity into a single, huge structure, where each object update would require locating it first?
I'm worried that such an approach would hurt performance because of GC overhead and constant structure traversals, as opposed to keeping each entity at a fixed address in memory.
UPDATE
To clarify, this question asks whether there is any design option other than creating a single structure that contains all possibly interacting entities under one root. Intuitively, such a structure would imply a logarithmic penalty per update unless updates are "clustered" somehow to amortize the cost.
Is there a known system where interactions could be modelled differently? For example, as in cold/hot data storage optimization?
After some research, there seems to be a connection with N-body simulation where systems can be clustered but I'm not familiar with it yet. Even so, would that also mean I need to have a single structure of clusters?
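To make the "single structure" option concrete, here is roughly what I have in mind (a rough sketch; the entity fields are invented for illustration). Every update has to locate the entity in a Map first, which is where the logarithmic per-update penalty comes from:

import qualified Data.Map.Strict as Map

type EntityId = Int

data Entity = Entity
  { position :: (Double, Double)
  , velocity :: (Double, Double)
  } deriving (Show, Eq)

-- The whole simulation state is one immutable structure rooted in a Map.
newtype World = World (Map.Map EntityId Entity)
  deriving (Show, Eq)

-- Updating one entity means locating it by key first: O(log n) per update.
updateEntity :: EntityId -> (Entity -> Entity) -> World -> World
updateEntity eid f (World m) = World (Map.adjust f eid m)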
While I agree with the people commenting that this is a vague question, I'll still try to address some of the issues put forth.
It's true that there's some performance overhead from immutability, because when you use mutable state, you can update some values in-place, whereas with immutable state, some copying has to take place.
It is, however, a common misconception that this causes problems with big 'object' graphs. It doesn't have to.
Consider a Haskell data structure:
data BigDataStructure = BigDataStructure {
    bigChild1 :: AnotherBigDataStructure
  , bigChild2 :: YetAnotherBigDataStructure
  -- more elements go here...
  , bigChildN :: Whatever }
  deriving (Show, Eq)
Imagine that each of these child elements is big and complex in itself. If you want to change, say, bigChild2, you could write something like:
updatedValue = myValue { bigChild2 = updatedChild }
When you do that, some data copying takes place, but it's often less than most people think. This expression does create a new BigDataStructure record, but it doesn't 'deep copy' any of its values. It just reuses bigChild1, updatedChild, bigChildN, and all the other values, because they're immutable.
In theory (but we'll get back to that in a minute), the flatter your data structures are, the more data sharing is enabled. If, on the other hand, you have some deeply nested data structures and you need to update the leaves, you'll need to create a copy of the immediate parents of those leaves, plus the parents of those parents, and their parents as well, all the way to the root. That might be expensive.
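To make the nested case concrete, here's a small sketch (the types are invented for illustration): bumping the innermost value allocates a new record for every node on the path from that leaf to the root, while everything off the path is shared.

data Leaf   = Leaf   { leafValue :: Int }                      deriving (Show, Eq)
data Middle = Middle { leaf :: Leaf, otherChild :: Int }       deriving (Show, Eq)
data Root   = Root   { middle :: Middle, bigSibling :: [Int] } deriving (Show, Eq)

-- A new Leaf, a new Middle and a new Root are allocated (the "path"), but
-- otherChild and bigSibling are reused from the old value, not copied.
bumpLeaf :: Root -> Root
bumpLeaf r =
  r { middle = (middle r) { leaf = Leaf (leafValue (leaf (middle r)) + 1) } }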
That's the theory, though, and we've known for decades that it's impractical to try to predict how software will perform. Instead, try to measure it.
While the OP suggests that significant data is involved, the question doesn't state how much, nor does it state the hardware specs of the system that's going to run the simulation. So, as Eric Lippert explains so well, the person who can best answer questions about performance is you.
P.S. It's my experience that when I start to encounter performance problems, I need to get creative with how I design my system. Efficient data structures can address many performance issues. This is just as much the case in OOP as it is in FP.
Is it correct to say that one can choose to design a concurrent/parallel system using atomic operations OR using immutable types? That is, do atomic operations lead to designing a system of atomically shared state, while immutable types lead to designing a system that avoids sharing altogether? Are these two designs essentially substitutes for each other (solving the same underlying problem), or do they address different problems (meaning it might be necessary to use both atomic operations and immutable types to design a fully concurrency-safe system)?
While both concepts are relevant for concurrent systems, and both guarantee no intermediate state is ever read, they're very different and fit different scenarios. For example, iterating an immutable data structure guarantees correct and "safe" iteration, while mutable data structures are not safe for iteration even if mutating operations are atomic. On the other hand, atomically mutating a shared/central data store guarantees no incorrect data is ever read, while immutability is irrelevant since the store has to change.
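A small Haskell sketch of that distinction (the names are made up): the shared store itself is swapped atomically, so no reader ever sees a half-applied write, while readers iterate an immutable snapshot that cannot change under them.

import Data.IORef
import qualified Data.Map.Strict as Map

type Store = Map.Map String Int

-- Writers mutate the shared cell atomically.
addEntry :: IORef Store -> String -> Int -> IO ()
addEntry ref k v = atomicModifyIORef' ref (\m -> (Map.insert k v m, ()))

-- Readers take an immutable snapshot and iterate it safely, even while
-- writers keep swapping in new versions of the store.
sumEntries :: IORef Store -> IO Int
sumEntries ref = do
  snapshot <- readIORef ref
  pure (sum (Map.elems snapshot))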
Is it correct to say that one can choose to design a concurrent/parallel system using atomic operations OR using immutable types?
This seems somewhat like a forced argument to me. In the example you give, it seems like these two concepts are related but only tenuously. I'd say that these two different concepts both assist in implementing and optimizing concurrent/parallel architectures.
Are these two designs essentially substitutes for each other (solving the same underlying problem), or do they address different problems (meaning it might be necessary to use both atomic operations and immutable types to design a fully concurrency-safe system)?
They are certainly not "substitutes". I would say that atomicity addresses a specific problem with concurrent systems but immutability is more of a "nice-to-have".
If you are locking between concurrent threads then you will need atomic operations to perform (for example) the test-and-set and unlock-and-wait operations that are required to share data and do signaling. There are probably other ways to accomplish this without atomicity, but it gets a lot harder.
Immutable types are a hallmark of concurrent/parallel architectures because threads can share data safely without worrying about modification. Immutability also helps with data-visibility optimizations. But that said, immutability is not a requirement.
That is, do atomic operations lead to designing a system of atomically shared state, while immutable types lead to designing a system that avoids sharing altogether?
Not really. Atomicity is about avoiding race conditions when threads are updating the same state, as mentioned above; however, sharing of state isn't necessarily atomic.
You can certainly share data between two threads with neither atomicity nor immutable types: for example, thread #1 can update State1, which is shared by all threads, while thread #2 updates State2, which is also shared.
Our Domain has a need to deal with large amounts (possibly more than 1000 records' worth) of objects as domain concepts. This is largely historical data that Domain business logic needs to use. Normally this kind of processing depends on a Stored Procedure or some other service to do this kind of work, but since it is all intimately Domain related, and we want to maintain the validity of the Model, we'd like to find a solution that allows the Aggregate to manage all of the business logic and rules required to work with the data.
Essentially, we're talking about past transaction data. Our idea was to build a lightweight class and create an instance for each transaction we need to work with from the database. We're uncomfortable with this because of the volume of objects we'd be instantiating and the potential performance hit, but we're equally uncomfortable with offloading this Domain logic to a stored procedure since that would break the consistency of our Model.
Any ideas on how we can approach this?
"1000" isn't really that big a number when it comes to simple objects. I know that a given thread in the system I work on may be holding on to tens of thousands of domain objects at a given time, all while other threads are doing the same at the same time. By the time you consider all of the different things going on in a reasonably complicated application, 1000 objects is kind of a drop in the bucket.
YMMV depending on what sort of resources those objects are holding on to, system load, hard performance requirements, or any number of other factors, but if, as you say, they're just "lightweight" objects, I'd make sure you actually have a performance problem on your hands before you try getting too fancy.
Lazy loading is one technique for mitigating this problem, and most of the popular object-relational mapping (ORM) solutions implement it. It has detractors (for example, see this answer to Lazy loading - what’s the best approach?), but others consider lazy loading indispensable.
Pros
Can reduce the memory footprint of your aggregates to a manageable level.
Lets your ORM infrastructure manage your units of work for you.
In cases where you don't need a lot of child data, it can be faster than fully materializing ("hydrating") your aggregate root.
Cons
Chattier than materializing your aggregates all at once. You make a lot of small trips to the database.
Usually requires architectural changes to your domain entity classes, which can compromise your own design. (For example, NHibernate just requires you to expose a default constructor and make your entities virtual to take advantage of lazy loading - but I've seen other solutions that are much more intrusive.)
By contrast, another approach would be to create multiple classes to represent each entity. These classes would essentially be partial aggregates tailored to specific use cases. The main drawback to this is that you risk inflating the number of classes and the amount of logic that your domain clients need to deal with.
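Sketching the "multiple classes per entity" idea with invented types (in Haskell, the language used elsewhere on this page; treat it as a shape, not a prescription): a full Transaction aggregate sits next to a slimmer summary type tailored to the historical-processing use case, so loading 1000+ of them stays cheap.

import Data.Time (UTCTime)

-- Full aggregate: carries everything the business invariants need.
data Transaction = Transaction
  { transactionId :: Int
  , occurredAt    :: UTCTime
  , amount        :: Rational
  , lineItems     :: [LineItem]
  , auditTrail    :: [AuditEntry]
  }

data LineItem   = LineItem   { sku :: String, quantity :: Int, price :: Rational }
data AuditEntry = AuditEntry { changedAt :: UTCTime, changedBy :: String }

-- Partial view for the historical-processing use case: only the fields the
-- calculation actually needs.
data TransactionSummary = TransactionSummary
  { summaryId    :: Int
  , summaryWhen  :: UTCTime
  , summaryTotal :: Rational
  }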
When you say 1000 records worth, do you mean 1000 tables or 1000 rows? How much data would be loaded into memory?
It all depends on the memory footprint of your objects. Lazy loading can indeed help if the objects in question reference other objects which are not of interest in your process.
If you end up with a performance hog, you must ask yourself (or perhaps your client) if the process must run synchronously, or if it can be offloaded to a batch process somewhere else.
See also: Using DDD, How Does One Implement Batch Processing?