Which concurrency models do multi-process/thread programming belong to?

The Wikipedia article on concurrency (computer science) says:
A number of formalisms for modeling and understanding concurrent
systems have been developed, including:[5]
The parallel random-access machine[6]
The actor model
Computational bridging models such as the bulk synchronous parallel (BSP) model
Petri nets
Process calculi
Calculus of communicating systems (CCS)
Communicating sequential processes (CSP) model
π-calculus
Tuple spaces, e.g., Linda
Simple Concurrent Object-Oriented Programming (SCOOP)
Reo Coordination Language
Which model(s) does multi-process programming (as in the Linux API, MPI, Java, Python) belong to?
Which model(s) does multi-threaded programming (as in Pthreads, Java, Python) belong to?

Let me add a few thoughts:
occam-pi is a true-[PARALLEL] language that fits well onto the parallel INMOS T414 Transputer hardware ( actually a hardware network of Transputers ). Process-flow was based on the theory of lambda-calculus, with a guaranteed scheduling strategy, and coordination was thanks to the seminal work of Hoare's CSP, which was not a constraint to achieving a true-[PARALLEL] execution, a pure-[SERIAL] execution ( where feasible ) and an opportunistic "just"-[CONCURRENT] one, where required.
So the language ( paradigm ) does not uniquely map onto any single archetype of parallelism listed by Wikipedia above. The properties of the external code-execution eco-system matter, too.
Python, on the other hand, has been, since Guido van Rossum's original design decision, a purely sequential interpreter process: whatever number of threads one might have instantiated, the central Global Interpreter Lock ( the GIL ) knowingly chops the flow of time and permits one and only one thread to execute Python bytecode at a time, all others waiting for the GIL. The interpreter's own internals thus principally avoid any form of just-[CONCURRENT] collisions ( race conditions to acquire a resource, read(s) colliding with write(s) et al ), although application code that spans several bytecodes can still interleave between threads.
Python can use message passing via MPI or ZeroMQ, can use a CSP-paradigm module, and has modules that enjoy actor-model behaviour ( as an example, mimicking the Xerox PARC invention of Model-View-Controller coordination ), so the language typically does not constrain the paradigm, being used on a higher layer of abstraction. The lower-level constraints do, however, limit how hard-real-time any such abstracted form of execution may get: any low-level limitation extends the latency of all upper abstraction layers and may introduce fine-grain blocking state(s) that are outside the domain of control of the upper-layer code.
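As a rough illustration of the message-passing style mentioned above, here is a minimal sketch that uses only the standard library. It is not one of the CSP modules or MPI/ZeroMQ bindings referred to; queue.Queue merely stands in for a bounded channel, and the names producer, consumer and run_pipeline are hypothetical.

```python
import queue
import threading


def producer(channel, items):
    """Send each item over the channel, then a sentinel marking end-of-stream."""
    for item in items:
        channel.put(item)   # blocks while the bounded channel is full
    channel.put(None)       # sentinel: no more messages


def consumer(channel, results):
    """Receive messages until the sentinel arrives and square each one."""
    while True:
        item = channel.get()
        if item is None:
            break
        results.append(item * item)


def run_pipeline(items):
    """Connect one producer and one consumer through a small bounded channel."""
    channel = queue.Queue(maxsize=2)   # small bound, so the sender waits for the receiver
    results = []                       # written only by the consumer thread
    workers = [
        threading.Thread(target=producer, args=(channel, items)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

The two threads coordinate only through the channel, which is the point of the paradigm; under CPython they still take turns on the GIL, so this shows the coordination style, not a speedup.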
Python can use multiprocessing ( joblib-decorated or not ), which helps to partially escape from the principal GIL-lock. Guido van Rossum has himself expressed that the GIL will remain a natural part of the Python interpreter unless an immense total re-design of the whole interpreter concept is undertaken ( which is not, in his view, a probable direction of further efforts ). Attempts to escape from the otherwise ever-present, GIL-orchestrated re-[SERIAL]-isation of any number of threads were developed, yet each one comes at a cost. Human-related: refactoring the code. System-related: re-spawning full copies of the original Python interpreter state ( the only option under Windows-class O/S-es; partial or ad-hoc copies with Linux fork or forkserver ). Both newbies and practitioners get into trouble by ignoring or wrongly guessing the add-on costs that Amdahl's Law must account for due to process instantiation ( spawn TIME + RAM-allocation TIME + RAM-to-RAM copy TIME + parameter / interprocess SER/DES add-on TIME ), the sum of which may easily wipe out any promised or wished-for speedup from going into a "just"-[CONCURRENT] or a true-[PARALLEL] code-execution domain.
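To make the add-on cost argument concrete, here is a small sketch of Amdahl's Law with one extra term. The overhead_fraction parameter is an assumption introduced for illustration: it lumps all process-instantiation, copy and SER/DES costs into a single number, expressed as a fraction of the original serial runtime. It is a simplification, not a formula taken from the text above.

```python
def speedup(parallel_fraction, workers, overhead_fraction=0.0):
    """Amdahl-style speedup with an illustrative add-on overhead term.

    parallel_fraction: share of the original runtime that can run in parallel (0..1)
    workers:           number of processes or threads working on that share
    overhead_fraction: setup + copy + SER/DES costs, as a fraction of the
                       original serial runtime (assumed, lumped-together term)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be within [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    new_runtime = serial_fraction + parallel_fraction / workers + overhead_fraction
    return 1.0 / new_runtime
```

With 90 % parallel work on 4 workers and no overhead, the result is 1 / 0.325, roughly 3.08. If the add-on costs amount to 80 % of the original runtime, the same call yields 1 / 1.125, roughly 0.89, i.e. a slowdown.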
Python can, as most of the other mentioned examples, participate in a distributed-computing-infrastructure, where higher-layer paradigms control the mode-of-cooperative execution, so the macro-system may have higher-levels of concurrency, not visible from "inside" a python-node.
The above-listed "forms" are somewhat academic ( they miss hardware-based ILP-parallelism and the AND-based and OR-based forms of fine-grain parallelism ), PRAM-s having been a subject of C/S research as far back as the late 1960s and early 1970s, when it was concluded that even PRAM-based architectures cannot escape from the Class-2 computing taxonomy.
"Section 4.3 ( IS THERE ANY CHANCE FOR A GIANT LEAP ) BEYOND THE CLASS-2 COMPUTER BOUNDARIES 2
The main practical - though negative - implication of the previous thoughts is a fact
that within the Class-2 computing, there is not to be expected any efficient solution
for sequentially intractable problems.
Nevertheless, a question arises here, whether some other sort of parallel computers could be imaginable,
that would be computationally more efficient than Class-2 computers.
Indications, coming from many known, conceptually different C2 class computer models,
suggest that without adding some other, fundamental computing capability, the parallelism per se
does not suffice to overcome C2 class boundaries,
irrespective of how we try to modify, within all thinkable possibilities,
the architectures of such computers.
As a matter of fact, it turns out, that C2 class boundaries will be crossed,
if there would be a non-determinism added to an MIMD-type parallelism ( Ref. Section 3.5 ).
Non-deterministic PRAM (+)
can, as an example, solve ( intractable ) problems from NPTIME class
in polylogarithmic time and problems of a demonstrably exponential sequential complexity in polynomial time.
Because, in the context of computers, where the non-determinism is equally well technically feasible to be implemented
as a clairvoyance, the C2 computer class seems to represent, from the efficiency point of view,
the ultimate class for the parallel computers, the borders of which will never be crossed.
+) PRAM: a Parallel-RAM, not a SIMD-only processor, demonstrated by Savitch, Stimson in 1979 (1)
(1) SAVITCH, W. J. - STIMSON, M. J.: Time bounded random access machines with parallel processing. J. ACM 26, 1979, Pg. 103-118.
(2) WIEDERMANN, J.: Efficiency boundaries of parallel computing systems. ( Medze efektivnosti paralelných výpočtových systémov ).
Advances in Mathematics, Physics and Astronomy ( Pokroky matematiky, fyziky a astronomie ), Vol. 33 (1988), No. 2, Pg. 81--94"
Both process-based and thread-based code may, per se, use or participate in a gang of coordinated actors in almost any of the above-listed forms of concurrency.
The code implementation plus all the underlying resource-management constraints ( hardware + O/S + resource-management policy in the respective context of use ) actually decide which forms remain achievable in the field, and when and how any piece of code gets executed. Your code design may be of any level of genius architecture-wise, yet if the O/S policy forces your code to execute on one and only one CPU-core ( due to affinity-mapping constraints enforced through the user-process effective rights ), any such smart code will end up in a re-[SERIAL]-ised code-execution. It pays all the add-on overhead costs of the wished-for [CONCURRENT] execution but gets nothing in return, and ends up running just like straightforward, pure-[SERIAL] code [ which itself remains free from any such wasted add-on costs, so produces results faster, often also enjoying the benefit of non-depleted CPU-core local L1/L2 cache hierarchies, if HPC-grade computing was carefully designed-in :o) ]
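One way to check the affinity constraint described above before paying for any concurrency is sketched below. It assumes CPython; os.sched_getaffinity is available only on some platforms ( Linux among them ), so the helper falls back to os.cpu_count() elsewhere. The name usable_cores is hypothetical.

```python
import os


def usable_cores():
    """Return how many CPU-cores this process is actually allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Honours the affinity mask imposed on the current process (pid 0 = self).
        return len(os.sched_getaffinity(0))
    # Fallback: total core count, which may overstate what the O/S will grant.
    return os.cpu_count() or 1
```

If this returns 1, spawning more workers only adds overhead on top of an execution that will be serialised anyway.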

Related

Where to get hardware model data?

I have a task which consists of 3 concurrent self-defined (recursive to each other) processes. I need somehow to make it execute on computer, but any attempt to convert a requirement to program code with just my brain fails since first iteration produces 3^3 entities with 27^2 cross-relations, but it needs to implement at least several iterations to try if program even works at all.
So I decided to give up on trying to understand the whole system and formalized the problem and now want to map it to hardware to generate an algorithm and run. Language doesn't matter (maybe even directly to machine/assembly one?).
I have never done anything like this before. All the topics I searched through, like algorithm synthesis, software and hardware co-design, etc., mention a hardware model as the second half (in addition to the problem model) of solution generation, but I have never seen one. The whole workflow is supposed to look like this:
I don't know yet at what level the hardware model is described, so I can't decide how the problem model must be formalized to fit the hardware model layer.
For example, target system may contain CPU and GPGPU, let's say target solution having 2 concurrent processes. System must decide which process to run on CPU and which on GPGPU. The highest level solution may come from comparing computational intensity of processes with target hardware, which is ~300 for CPUs and ~50 for GPGPUs.
But a realistic model has to be much more complete, covering at least the cache hierarchy, memory access batch size, etc.
Another example is implementing k-ary trees. A synthesized algorithm could address children and parents either by computing k * i + c and ( i - 1 ) / k, or by storing direct pointers, depending on the ratio of computation cost to memory latency.
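The index arithmetic for an array-backed k-ary tree can be sketched as follows. This assumes the usual layout with the root at index 0 and children numbered c = 1..k; the function names are hypothetical.

```python
def child_index(i, c, k):
    """Index of the c-th child (c in 1..k) of node i in an array-backed k-ary tree."""
    if not 1 <= c <= k:
        raise ValueError("c must be within 1..k")
    return k * i + c


def parent_index(i, k):
    """Index of the parent of node i; the root (index 0) has no parent."""
    if i <= 0:
        raise ValueError("the root has no parent")
    return (i - 1) // k
```

For k = 3, node 1 has children 4, 5 and 6, and parent_index maps each of them back to 1. The pointer-based alternative trades this arithmetic for one extra memory load per step.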
Where can I get a hardware model or data to use? Any hardware would suffice for now - to just see how it can look like - later would be awesome to get models of modern processors, GPGPUs and common heterogeneous clusters.
Do manufacturers supply such kinds of models? Description of how their systems work in any formal language.

I'm not quite sure whether this fits your case, but since you mention modeling, I thought of Modelica. It is used to model physical systems and, combined with a simulation environment, you can run simulations on the models.

How to send multiple timeframe data from MQL4 to a Node.js?

I am trying to get multiple time-frame data of different trading instrument ( _Symbol ) from MetaTrader4 Terminal to a node.
How can I do it?
Can we do it from the same EA inside a MetaTrader4 Terminal?

A.1: Yes, we can.
A.2: No, that initial idea is not a good one.
While the intention is clear, the idea of using a single EA to send live data for multiple trading instruments does not serve that purpose well.
The MQL4 code-execution environment has some fixed, hard-wired internal logic. Because of this, plus the reality of how Capital Markets and Broker-type market-access mediators work, a solo EA will never fit these requirements.
A simple call to
iOpen( aTradingInstrumentSymbolNAME,     // same call signature for iHigh, iLow, iClose, iVolume, iTime
       aSelectedTimeFrameDefinedCODE,
       aRelativeBarPTR
       )
is by far not enough.
A professional solution will require a lot of care for real-time handling capabilities, for unmasking the actual flow of mutually hiding events, and for achieving minimal processing latencies, so quite a high level of engineering expertise will be needed.
Start with learning the basics about Scripts, benchmark all your critical code-sections by recording their actual durations in [us], and make sure your code remains non-blocking under all circumstances. This will decide whether more than one code-execution thread will be necessary in prime-time / peak hours.
Having managed that, your way just started to lead in a direction towards your expected result.
Next one has to decide about a feasible inter-process / distributed-computing data-flow and signalling, needed for inter-platform integration.
Last, but not least, an important point is the legal side of such an undertaking. It depends both on your local jurisdiction and on the Broker's Terms & Conditions, as no one would enjoy celebrating a technically well-mastered Project from inside a jail.
All in all, quite an interesting Project.

iOpen(Symbol(),PERIOD_M1,1) is the way to get data from M1 ( the last closed bar ). If you need another timeframe, replace PERIOD_M1 with another ENUM_TIMEFRAMES value. So what is the problem? StackOverflow usually expects an MCVE before people can help you.

Nools and Drools

I was really happy to see a rules engine in Node. I was also looking at Drools in the Java world and, while reading the documentation (specifically: http://docs.jboss.org/drools/release/6.1.0.Final/drools-docs/html_single/index.html#PHREAK), found that Drools 6.0 has evolved and now uses the PHREAK method for rules matching. The specific paragraph of interest is:
Each successful join attempt in RETE produces a tuple (or token, or
partial match) that will be propagated to the child nodes. For this
reason it is characterised as a tuple oriented algorithm. For each
child node that it reaches it will attempt to join with the other side
of the node, again each successful join attempt will be propagated
straight away. This creates a descent recursion effect. Thrashing the
network of nodes as it ripples up and down, left and right from the
point of entry into the beta network to all the reachable leaf nodes.
For complex rules, and for rule counts over a certain limit, the above quote says that the RETE-based method thrashes the network of nodes quite a lot, which is why it evolved into PHREAK.
Since nools is based on the Rete algorithm, is the above valid for it too? Are there any optimizations similar to PHREAK? Have any comparisons been done w.r.t. Drools?

The network thrashing is only an issue when you want to apply concurrency and parallelism, which requires locking in some areas. As NodeJS is single threaded, that won't be an issue. We haven't attempted to solve this area in Drools yet either, but the Phreak work was preparation with this in mind, learning from the issues we found in our Rete implementation. On a separate note, Rete has used partitioning algorithms in the past for parallelism, and this work targets the same problem area.
For single threaded machines lazy rule evaluation is much more interesting. However as the document notes a single rule of joins will not differ in performance between Phreak and Rete. As you add lots of rules, the lazy nature avoids potential work, thus saving over-all cpu cycles. The algorithm is also more forgiving for a larger number of badly written rules, and should degrade less in performance. For instance it doesn't need the traditional Rete root "Context" object that is used to drive rule selection and short-circuit wasteful matching - this would be seen as anti-pattern in Phreak and may actually slow it down, as you blow away matches it might use again in the future.
http://www.dzone.com/links/rip_rete_time_to_get_phreaky.html
Also the collection oriented propagation is relevant when multiple subnetworks are used in rules, such as with multiple accumulates.
http://blog.athico.com/2014/02/drools-6-performance-with-phreak.html
I also did a follow up on the backward chaining and stack evaluation infrastructure:
http://blog.athico.com/2014/01/drools-phreak-stack-based-evaluations.html
Mark (Creator of Phreak)

Improving simulation performance via concurrency

Consider this sequential procedure on a data structure containing collections (for simplicity, call them lists) of Doubles. For as long as I feel like, do:
Select two different lists from the structure at random
Calculate a statistic based on those lists
Flip a coin based on that statistic
Possibly modify one of the lists, based on the outcome of the coin toss
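The four steps above can be sketched sequentially as follows. The statistic ( difference of means ), the coin's bias and the modification rule are hypothetical placeholders, since the question does not specify them; only the overall loop shape follows the description.

```python
import random


def mean(values):
    return sum(values) / len(values)


def step(lists, rng):
    """One iteration of the procedure; returns True if a list was modified."""
    i, j = rng.sample(range(len(lists)), 2)        # 1. select two different lists
    stat = mean(lists[i]) - mean(lists[j])         # 2. statistic (placeholder)
    heads = rng.random() < min(1.0, abs(stat))     # 3. coin biased by the statistic
    if heads:                                      # 4. possibly modify one list
        k = rng.randrange(len(lists[i]))
        lists[i][k] = mean(lists[j])
    return heads


def simulate(lists, iterations, seed=0):
    """Run the procedure and return how many iterations modified a list."""
    rng = random.Random(seed)
    return sum(step(lists, rng) for _ in range(iterations))
```

Each iteration touches at most two lists, which is what makes the procedure a candidate for running several iterations at once on disjoint pairs.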
The goal is to eventually achieve convergence to something, so the 'solution' is linear in the number of iterations. An implementation of this procedure can be seen in the SO question here, and here is an intuitive visualization:
It seems that this procedure could be better performed - that is, convergence could be achieved faster - by using several workers executing concurrently on separate OS threads, ex:
I guess a perfectly-realized implementation of this should be able to achieve a solution in O(n/P) time, for P the number of available compute resources.
Reading up on Haskell concurrency has left my head spinning with terms like MVar, TVar, TChan, acid-state, etc. What seems clear is that a concurrent implementation of this procedure would look very different from the one I linked above. But, the procedure itself seems to essentially be a pretty tame algorithm on what is essentially an in-memory database, which is a problem that I'm sure somebody has come across before.
I'm guessing I will have to use some kind of mutable, concurrent data structure that supports decent random access (that is, to random idle elements) & modification. I am getting a bit lost when I try to piece together all the things that this might require with a view towards improving performance (STM seems dubious, for example).
What data structures, concurrency concepts, etc. are suitable for this kind of task, if the goal is a performance boost over a sequential implementation?

Keep it simple:
forkIO for lightweight, super-cheap threads.
MVar, for fast, thread safe shared memory.
and the appropriate sequence type (probably vector, maybe lists if you only prepend)
a good stats package
and a fast random number source (e.g. mersenne-random-pure64)
You can try the fancier stuff later. For raw performance, keep things simple first: keep the number of locks down (e.g. one per buffer); make sure to compile your code with optimizations and the threaded runtime (ghc -O2 -threaded), run it with +RTS -N to use all cores, and you should be off to a great start.
RWH (Real World Haskell) has an intro chapter covering the basics of concurrent Haskell.
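The "one lock per buffer" structure recommended above can be sketched like this. It is written in Python rather than Haskell, so threading.Thread and threading.Lock only stand in for forkIO and MVar, and the update rule is a hypothetical placeholder. Under CPython's GIL it illustrates the locking discipline, not a speedup.

```python
import random
import threading


def worker(lists, locks, iterations, seed):
    """Repeatedly pick two different lists and update one of them under both locks."""
    rng = random.Random(seed)
    for _ in range(iterations):
        i, j = rng.sample(range(len(lists)), 2)
        first, second = sorted((i, j))           # fixed lock order prevents deadlock
        with locks[first], locks[second]:
            mean_j = sum(lists[j]) / len(lists[j])
            k = rng.randrange(len(lists[i]))
            lists[i][k] = (lists[i][k] + mean_j) / 2.0   # placeholder update


def run(lists, n_workers=4, iterations=1000):
    """Start n_workers threads sharing the lists, with one lock per list."""
    locks = [threading.Lock() for _ in lists]
    threads = [
        threading.Thread(target=worker, args=(lists, locks, iterations, seed))
        for seed in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lists
```

Workers that happen to pick disjoint pairs never wait for each other; only overlapping pairs contend, which keeps the number of contended locks low.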

Meaning of Leaky Abstraction?

What does the term "Leaky Abstraction" mean? (Please explain with examples. I often have a hard time grokking a mere theory.)

Here's a meatspace example:
Automobiles have abstractions for drivers. In its purest form, there's a steering wheel, accelerator and brake. This abstraction hides a lot of detail about what's under the hood: engine, cams, timing belt, spark plugs, radiator, etc.
The neat thing about this abstraction is that we can replace parts of the implementation with improved parts without retraining the user. Let's say we replace the distributor cap with electronic ignition, and we replace the fixed cam with a variable cam. These changes improve performance but the user still steers with the wheel and uses the pedals to start and stop.
It's actually quite remarkable... a 16 year old or an 80 year old can operate this complicated piece of machinery without really knowing much about how it works inside!
But there are leaks. The transmission is a small leak. In an automatic transmission you can feel the car lose power for a moment as it switches gears, whereas in CVT you feel smooth torque all the way up.
There are bigger leaks, too. If you rev the engine too fast, you may do damage to it. If the engine block is too cold, the car may not start or it may have poor performance. And if you crank the radio, headlights, and AC all at the same time, you'll see your gas mileage go down.

It simply means that your abstraction exposes some of the implementation details, or that you need to be aware of the implementation details when using the abstraction. The term is attributed to Joel Spolsky, circa 2002. See the wikipedia article for more information.
A classic example is network libraries that allow you to treat remote files as local. The developer using this abstraction must be aware that network problems may cause it to fail in ways that local files do not. You then need to write code that specifically handles errors outside the abstraction the network library provides.

Wikipedia has a pretty good definition for this
A leaky abstraction refers to any implemented abstraction, intended to reduce (or hide) complexity, where the underlying details are not completely hidden
Or in other words for software it's when you can observe implementation details of a feature via limitations or side effects in the program.
A quick example would be C# / VB.Net closures and their inability to capture ref / out parameters. The reason they cannot be captured is due to an implementation detail of how the lifting process occurs. This is not to say though that there is a better way of doing this.

Here's an example familiar to .NET developers: ASP.NET's Page class attempts to hide the details of HTTP operations, particularly the management of form data, so that developers don't have to deal with posted values (because it automatically maps form values to server controls).
But if you wander beyond the most basic usage scenarios the Page abstraction begins to leak and it becomes hard to work with pages unless you understand the class' implementation details.
One common example is dynamically adding controls to a page - the value of dynamically-added controls won't be mapped for you unless you add them at just the right time: before the underlying engine maps the incoming form values to the appropriate controls. When you have to learn that, the abstraction has leaked.

Well, in a way it is a purely theoretical thing, though not unimportant.
We use abstractions to make things easier to comprehend. I may operate on a string class in some language to hide the fact that I'm dealing with an ordered set of characters that are individual items. I deal with an ordered set of characters to hide the fact that I'm dealing with numbers. I deal with numbers to hide the fact that I'm dealing with 1s and 0s.
A leaky abstraction is one that doesn't hide the details it's meant to hide. If I call string.Length on a 5-character string in Java or .NET, I could get any answer from 5 to 10, because what those languages call characters are really UTF-16 code units, each of which can represent either a whole character or half of one. The abstraction has leaked. Not leaking it, though, would mean that finding the length would either require more storage space (to store the real length) or change from being O(1) to O(n) (to work out what the real length is). If I care about the real answer (often you don't, really), I need to work from the knowledge of what is really going on.
More debatable are cases where a method or property lets you get at the inner workings: whether these are abstraction leaks or well-defined ways to move to a lower level of abstraction can be a matter on which people disagree.
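The string-length leak can be reproduced in a small sketch. Python's len() counts code points, so the UTF-16 view that Java and .NET expose is obtained here by encoding; the helper name utf16_code_units is hypothetical.

```python
def utf16_code_units(text):
    """Number of UTF-16 code units, i.e. what a UTF-16 based string length reports."""
    # utf-16-le writes no byte-order mark, and every code unit is exactly 2 bytes.
    return len(text.encode("utf-16-le")) // 2


ascii_word = "hello"              # 5 characters inside the Basic Multilingual Plane
emoji_word = "\U0001F600" * 5     # 5 characters outside it (each needs a surrogate pair)
```

Here len(ascii_word) and utf16_code_units(ascii_word) are both 5, while emoji_word has 5 code points but 10 UTF-16 code units, which is the 5-to-10 range described above.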

I'll continue in the vein of giving examples by using RPC.
In the ideal world of RPC, a remote procedure call should look like a local procedure call (or so the story goes). It should be completely transparent to the programmer such that when they call SomeObject.someFunction() they have no idea if SomeObject (or just someFunction for that matter) are locally stored and executed or remotely stored and executed. The theory goes that this makes programming simpler.
The reality is different because there's a HUGE difference between making a local function call (even if you're using the world's slowest interpreted language) and:
calling through a proxy object
serializing your parameters
making a network connection (if not already established)
transmitting the data to the remote proxy
having the remote proxy restore the data and call the remote function on your behalf
serializing the return value(s)
transmitting the return values to the local proxy
reassembling the serialized data
returning the response from the remote function
In time alone that's about three orders (or more!) of magnitude difference. Those three+ orders of magnitude are going to make a huge difference in performance that will make your abstraction of a procedure call leak rather obviously the first time you mistakenly treat an RPC as a real function call. Further a real function call, barring serious problems in your code, will have very few failure points outside of implementation bugs. An RPC call has all of the following possible problems that will get slathered on as failure cases over and above what you'd expect from a regular local call:
you might not be able to instantiate your local proxy
you might not be able to instantiate your remote proxy
the proxies may not be able to connect
the parameters you send may not make it intact or at all
the return value the remote sends may not make it intact or at all
So now your RPC call which is "just like a local function call" has a whole buttload of extra failure conditions you don't have to contend with when doing local function calls. The abstraction has leaked again, even harder.
In the end RPC is a bad abstraction because it leaks like a sieve at every level -- when successful and when failing both.
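A toy sketch of why the call "looks local" but is not. No real network or RPC library is involved: FakeTransport, RemoteProxy and TransportError are hypothetical names, and JSON stands in for whatever serialization a real RPC stack would use.

```python
import json


class TransportError(Exception):
    """A failure mode that a plain local call never has."""


def add(a, b):
    """The 'remote' function; calling it directly is the local case."""
    return a + b


class FakeTransport:
    """Pretends to be the network: deserializes, calls the handler, serializes the reply."""

    def __init__(self, handler, fail=False):
        self.handler = handler
        self.fail = fail

    def send(self, payload):
        if self.fail:
            raise TransportError("connection lost")
        request = json.loads(payload.decode("utf-8"))
        result = self.handler(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class RemoteProxy:
    """Local proxy: the call site looks like add(a, b), the work behind it does not."""

    def __init__(self, transport):
        self.transport = transport

    def add(self, a, b):
        payload = json.dumps({"args": [a, b]}).encode("utf-8")   # serialize parameters
        reply = self.transport.send(payload)                      # may raise TransportError
        return json.loads(reply.decode("utf-8"))["result"]       # reassemble the result
```

RemoteProxy(FakeTransport(add)).add(2, 3) returns the same value as add(2, 3), but only the proxy version can raise TransportError, so the caller has to know which one it is holding.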

What is abstraction?
Abstraction is a way of simplifying the world.
It means you don't have to worry about what is actually happening under the hood.
Example: Flying a 737/747 is "abstracted" away
Planes are complicated systems involving jet engines, oxygen systems, electrical systems, landing gear systems etc.
...but the pilot doesn't have to worry about it... all of that is "abstracted away". The only thing a pilot needs to focus on is yoke (i.e. steering wheel of the plane).
He pushes the yoke left to go left, and right to go right etc.
....that is in an ideal world. In reality, flying a plane is much more complicated. Because many details ARE NOT "abstracted away".
Leaky Abstractions in 737 Example
Pilots in reality have to worry about a LOT of things: wind speed, thrust, angles of attack, fuel, altitude, weather problems, angles of descent. Computers can help the pilot in these tasks, but not everything is automated / simplified......not everything is "abstracted away".
e.g. If the pilot pulls up too hard on the column - the plane will obey, but then the plane might stall, and that's really bad.
In other words, it is not enough for the pilot to simply control the steering wheel without knowing anything else.........nooooo.......the pilot must know about the underlying risks and limitations of the plane before the pilot flies one.......the pilot must know how the plane works, and how the plane flies; the pilot must know implementation details..... that pulling up too hard will lead to a stall, or that landing too steeply will destroy the plane etc.
Those things are not abstracted away. A lot of things are abstracted, but not everything. The abstraction is "leaky".
Leaky Abstractions in Code
......it's the same thing in your code. If you don't know the underlying implementation details, then you're gonna have problems.
ORMs abstract a lot of the hassle in dealing with database queries, but if you've ever done something like:
User.all.each do |user|
  puts user.name # let's print each user's name
end
Then you will realise that's a nice way to kill your app. You need to know that calling User.all with 25 million users is going to spike your memory usage and cause problems. You need to know some underlying details. The abstraction is leaky.
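The same leak can be sketched outside of an ORM. This example uses Python's sqlite3 module with an in-memory database; the table layout and function names are assumptions made for illustration, not part of the Rails example above.

```python
import sqlite3


def make_db(n):
    """Create an in-memory users table with n rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)",
        ((f"user{i}",) for i in range(n)),
    )
    return conn


def names_all_at_once(conn):
    """Materializes every row in memory first, like iterating over 'all users'."""
    rows = conn.execute("SELECT name FROM users ORDER BY id").fetchall()
    return [name for (name,) in rows]


def names_in_batches(conn, batch_size=100):
    """Holds at most batch_size rows in memory at any time."""
    cursor = conn.execute("SELECT name FROM users ORDER BY id")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for (name,) in rows:
            yield name
```

Both functions produce the same names in the same order; the difference is only in how much is held in memory at once, which is exactly the detail the abstraction hides.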

An example from the Django ORM many-to-many documentation:
Notice in the Sample API Usage that you need to .save() the base Article object a1 before you can add Publication objects to the many-to-many attribute. And notice that updating the many-to-many attribute saves to the underlying database immediately, whereas updating a singular attribute is not reflected in the db until .save() is called.
The abstraction is that we are working with an object graph, where single-value attributes and multi-value attributes are just attributes. But the implementation as a relational-database-backed data store leaks, as the integrity system of the RDBMS shows through the thin veneer of an object interface.

It refers to the fact that at some point, determined by your scale and execution, you will need to become familiar with the implementation details of your abstraction framework in order to understand why it behaves the way it does.
For example, consider this SQL query:
SELECT id, first_name, last_name, age, subject FROM student_details;
And its alternative:
SELECT * FROM student_details;
They look like logically equivalent solutions, but the first one can perform better because naming the individual columns lets the database read and transfer only what is needed.
It's a trivial example but eventually it comes back to Joel Spolsky quote:
All non-trivial abstractions, to some degree, are leaky.
At some point, when you reach a certain scale in your operation, you will want to optimize the way your DB (SQL) works. To do that, you will need to know how relational databases work. That was abstracted away from you in the beginning, but the abstraction is leaky. You need to learn it at some point.
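One way to see such a difference is to ask the database for its query plan. The sketch below uses SQLite through Python's sqlite3 module; the index, the WHERE clause and the helper names are assumptions added for illustration, and other database engines report their plans differently.

```python
import sqlite3


def make_student_db():
    """In-memory table matching the column names used in the queries above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student_details ("
        "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, "
        "age INTEGER, subject TEXT)"
    )
    # Hypothetical index that contains everything the narrow query needs.
    conn.execute("CREATE INDEX idx_age_first_name ON student_details (age, first_name)")
    return conn


def query_plan(conn, sql):
    """Return SQLite's textual plan for a query (the last column of each plan row)."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


NARROW = "SELECT first_name FROM student_details WHERE age = 20"
STAR = "SELECT * FROM student_details WHERE age = 20"
```

For NARROW, SQLite can answer from the index alone and reports a covering index; for STAR it still uses the index to find rows but must then visit the table for the remaining columns. Nothing in the SQL text hints at this, which is the leak.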

Assume, we have the following code in a library:
Object[] fetchDeviceColorAndModel(String serialNumberOfDevice)
{
    // Fetch the device color and device model from the DB.
    // Create a new Object[] and set the 0th field to the color and the 1st field to the model.
}
When the consumer calls the API, they get an Object[]. The consumer has to understand that the first field of the object array has color value and second field is the model value. Here the abstraction has leaked from library to the consumer code.
One of the solutions is to return an object which encapsulates Model and Color of the Device. The consumer can call that object to get the model and color value.
DeviceColorAndModel fetchDeviceColorAndModel(String serialNumberOfTheDevice)
{
    // Fetch the device color and device model from the DB.
    return new DeviceColorAndModel(color, model);
}
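The same idea, sketched in Python with a NamedTuple so that callers use field names instead of remembering positions. The lookup table and serial number are hypothetical stand-ins for the database access.

```python
from typing import NamedTuple


class DeviceColorAndModel(NamedTuple):
    color: str
    model: str


def fetch_device_color_and_model(serial_number):
    """Return a named result, so the field order is not part of the contract."""
    # Hypothetical stand-in for the database lookup.
    records = {"SN-001": ("black", "X100")}
    color, model = records[serial_number]
    return DeviceColorAndModel(color=color, model=model)
```

A caller writes fetch_device_color_and_model("SN-001").model and never needs to know which position the model occupies.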

Leaky abstraction is all about encapsulating state. A very simple example of a leaky abstraction:
$currentTime = new DateTime();
$bankAccount1->setLastRefresh($currentTime);
$bankAccount2->setLastRefresh($currentTime);
// Mutating the shared object silently changes both accounts.
$currentTime->setTimestamp($aTimestamp);
class BankAccount {
    // ...
    public function setLastRefresh(DateTime $lastRefresh)
    {
        $this->lastRefresh = $lastRefresh;
    }
}
and the right way (not a leaky abstraction):
class BankAccount
{
    // ...
    public function setLastRefresh(DateTime $lastRefresh)
    {
        $this->lastRefresh = clone $lastRefresh;
    }
}
more description here.
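The same contrast can be sketched in Python. Because Python's own datetime objects are immutable, a small mutable Timestamp class is assumed here to play the role of PHP's DateTime; all class names are hypothetical.

```python
import copy
from dataclasses import dataclass


@dataclass
class Timestamp:
    """Mutable stand-in for PHP's DateTime."""
    seconds: int


class LeakyAccount:
    def set_last_refresh(self, ts):
        self.last_refresh = ts              # keeps a reference to the caller's object


class SafeAccount:
    def set_last_refresh(self, ts):
        self.last_refresh = copy.copy(ts)   # defensive copy, like PHP's clone


shared = Timestamp(100)
leaky, safe = LeakyAccount(), SafeAccount()
leaky.set_last_refresh(shared)
safe.set_last_refresh(shared)
shared.seconds = 999   # the caller mutates its own object afterwards
```

After the mutation, leaky.last_refresh.seconds has changed to 999 while safe.last_refresh.seconds is still 100: the first class leaked its internal state to whoever held the original object.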
