Java: Use Disruptor or Not? (disruptor-pattern)

Hi,
Currently I am developing a program that takes two values from an AMQ queue and performs a series of mathematical calculations on them. A topic has been created on the AMQ server, to which my program subscribes and receives messages via callbacks (listeners).
Now, whenever a message arrives, the two values are taken out of it and added to a SynchronizedDescriptiveStatistics object. After each addition to the list of values, the whole sequence of calculations is performed all over again (this is actually part of the requirement).
The problem I am facing right now is that, since I am using listeners, sometimes one or more messages are received in the middle of the calculations. Although SynchronizedDescriptiveStatistics takes care of the thread-safety issues itself, it adds all the waiting values to its list of numbers at once when the lock is released, whereas what I need is to add one value, perform the calculations on it, then add the second value, and so on.
The solution I came up with is to use job queues in my program (not AMQ queues). That way, whenever the calculations are over, the program looks for further jobs in the queue and proceeds accordingly.
Since I am also looking for efficiency and speed, I thought the Disruptor framework might be a good fit for this problem, as it is optimized for threaded situations. But I am not sure if it's worth the trouble of implementing the Disruptor in my application, because a regular standard queue might be enough for what I am trying to do.
Let me also tell you that there is a lot of data on which the calculations need to be performed, it will keep on coming, and the whole set of calculations needs to be repeated for each single value added, in a continuous fashion. So, keeping in mind efficiency and the huge volume of data, what do you think will be useful in the long run?
Waiting for a reply. . .
Regards.

I'll give our typical answer to this question: test first, and make your decision based on your results.
Although you talk about efficiency, you don't specifically say that performance is a fundamental requirement. If you have an idea of your performance requirements, you could mock up a simple prototype using queues versus a basic implementation of the Disruptor, and take measurements of the performance of both.
If one comes off substantially better than the other, that's your answer. If, however, one is much more effort to implement, especially if it's also not giving you the efficiency you require, or if you don't have any hard performance requirements in the first place, then that suggests that solution is not the right one.
Measure first, and decide based on your results.
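As a starting point for the queue-based side of that comparison, here is a minimal sketch, assuming Apache Commons Math for the statistics (class and method names other than the JDK and Commons Math ones are made up for illustration). The listener only enqueues values; a single consumer thread adds them to the statistics object one at a time and reruns the calculations after each addition, which is exactly the ordering the question asks for. With a single consumer, the plain DescriptiveStatistics is enough and the synchronized variant is not needed.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;

    public class CalcWorker implements Runnable {
        // The AMQ listener only calls submit(); this worker is the only thread touching the stats.
        private final BlockingQueue<Double> queue = new ArrayBlockingQueue<>(10_000);
        private final DescriptiveStatistics stats = new DescriptiveStatistics();

        public void submit(double value) throws InterruptedException {
            queue.put(value); // called from the message listener callback
        }

        @Override
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    double value = queue.take(); // blocks until the next value arrives
                    stats.addValue(value);       // exactly one addition ...
                    recalculate();               // ... followed by the full recalculation
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        private void recalculate() {
            // placeholder for the real sequence of calculations
            double mean = stats.getMean();
            double sd = stats.getStandardDeviation();
        }
    }

If a prototype like this keeps up with your real message rate, a standard BlockingQueue is probably sufficient; if it doesn't, the same consumer logic can be moved into a Disruptor event handler and the two measured against each other.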

Related

How to deal with (Apache Beam) high IO bottlenecks?

Let's say, to cite a simple example, that I have a very simple Beam pipeline which just reads from a file and dumps the data into an output file. Now let's consider that the input file is huge (some GBs in size, the type of file you typically can't open in a text editor). Since the direct-runner implementation is quite simple (it reads the whole input set into memory), it won't be able to read and output those huge files (unless you assign an impractically high amount of memory to the Java VM process). So my question is: how do production runners like Flink/Spark/Cloud Dataflow deal with this 'huge dataset' problem, assuming they don't just try to put the whole file(s)/dataset into memory?
I'd expect production runners' implementations to work "in parts or batches" (reading/processing/outputting in parts) to avoid trying to fit huge datasets into memory at any specific point in time. Can somebody please share their feedback regarding how production runners deal with this "huge data" situation?
Generalizing, please notice this applies to other input/output mechanisms too. For example, if my input is a PCollection coming from a huge database table (broadly speaking, huge in both row size and row count), does the internal implementation of the production runner somehow divide the given input SQL statement into many internally generated sub-statements, each taking a smaller subset (for example, by internally generating a count(*) statement, followed by N statements each taking count(*)/N elements)? The direct-runner won't do this and will just pass the given query 1:1 to the DB. Or is it my responsibility as a developer to "iterate in batches" and divide the problem? And if this is indeed the case, what are the best practices here, i.e. having one pipeline for this or many? And if only one, should I somehow parametrise the pipeline to read/write in batches, or iterate over a simple pipeline and manage the necessary metadata externally to the pipeline?
thanks in advance, any feedback would be greatly appreciated!
EDIT (reflecting David's feedback):
David, your feedback is highly valuable and definitely touches the point I'm interested in. Having a work-discovery phase for splitting a source, and a read phase to concurrently read the split partitions, is definitely what I was interested in hearing about, so thanks for pointing me in the right direction. I have a couple of small follow-up questions, if you don't mind:
1 - The article points out under the section "Generic enumerator-reader communication mechanism" the following:
"The SplitEnumerator and SourceReader are both user implemented class.
It is not rare that the implementation require some communication
between these two components. In order to facilitate such use cases [....]"
So my question here would be: is that "splitting + reading" behaviour triggered by some user-provided (i.e. developer-provided) implementation (specifically SplitEnumerator and SourceReader), or can I benefit from it out of the box without any custom code?
2 - Probably just delving deeper into the question above: if I have a batch/bounded workload (let's say I'm using Apache Flink) and I'm interested in processing a "huge file" as described in the original post, will the pipeline work "out of the box" (doing the behind-the-scenes "work preparation phase" splits and the parallel reads), or would that require some custom code implemented by the developer?
Thanks in advance for all your valuable feedback!
Note that when the inputs are bounded and known in advance (i.e., a batch workload as opposed to streaming), this is more straightforward.
In Flink, which is designed with streaming in mind, this is done by separating "work discovery" from "reading". A single SplitEnumerator runs once and enumerates the chunks to be read (the splits/partitions), and assigns them to parallel readers. In the batch case a split is defined by a range of offsets, while in the streaming case, the end offset for each split is set to LONG_MAX.
This is described in more detail in FLIP-27: Refactor Source Interface.
Just to provide some closure to this question, the motivation behind it was to find out whether Apache Beam, when coupled with a production runner (like Flink, Spark or Google Cloud Dataflow), offers out-of-the-box mechanisms for splitting work, i.e. reading/writing/manipulating huge files (or data sources in general). The comment provided by David Anderson above proved of great value in hinting at how Apache Flink deals with these workflows.
At this point I've implemented solutions using huge files (to test possible IO bottlenecks) with a "Beam on Flink" based pipeline, and I can confirm that Flink will create an execution plan which includes splitting sources and dividing work in such a way that no memory problem arises. There can of course be conditions under which stability or IO performance is compromised, but at least I can confirm that the workflow carried out behind the pipeline abstraction uses the filesystem when carrying out tasks, avoiding fitting all data in memory and thus avoiding trivial out-of-memory errors. Conclusion: yes, "Beam on Flink" (and likely Spark and Dataflow too) does offer proper work preparation, work splitting and filesystem usage so that the available volatile memory is used efficiently.
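For reference, the "huge file" tests above need nothing beyond a plain read/write pipeline; a minimal sketch is shown below (the file paths are placeholders), and it is the runner, not the pipeline code, that splits the source and parallelizes the reads.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class CopyHugeFile {
        public static void main(String[] args) {
            // Pass --runner=FlinkRunner (plus the Flink-specific options) on the command line.
            PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
            Pipeline p = Pipeline.create(options);

            p.apply("ReadHugeFile", TextIO.read().from("/data/huge-input.txt")) // split into ranges by the runner
             .apply("WriteCopy", TextIO.write().to("/data/output/part"));       // written as sharded files

            p.run().waitUntilFinish();
        }
    }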
Update about data sources: regarding DBs as data sources, Flink won't (and can't; it is not trivial) optimize/split/distribute work related to DB data sources in the same way it optimizes reading from the filesystem. There are still approaches to reading huge amounts of records from a DB, but the implementation details need to be addressed by the developer instead of being the responsibility of the framework. I've found this article (https://nl.devoteam.com/expert-view/querying-jdbc-database-in-parallel-with-google-dataflow-apache-beam/) very helpful in addressing the point of reading massive amounts of records from a DB in Beam (the article uses the Cloud Dataflow runner, but I used Flink and it worked just fine), splitting queries and distributing the processing.
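To illustrate the query-splitting pattern from that article in rough form, here is a sketch of one way to do it; the table, column and helper names are invented for the example, and real code would need the usual care around connection handling. The id space is split into ranges up front (a manual "work discovery" step), and a ParDo then issues one bounded query per range, so the runner can distribute the per-range reads across its workers.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.coders.KvCoder;
    import org.apache.beam.sdk.coders.VarLongCoder;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    public class RangedJdbcRead {
        /** Reads big_table in parallel by splitting the id space into [lo, hi) ranges. */
        static PCollection<String> readBigTableInRanges(Pipeline p, String jdbcUrl,
                                                        long minId, long maxId, long rangeSize) {
            List<KV<Long, Long>> ranges = new ArrayList<>();
            for (long lo = minId; lo < maxId; lo += rangeSize) {
                ranges.add(KV.of(lo, Math.min(lo + rangeSize, maxId)));
            }
            return p.apply("CreateRanges", Create.of(ranges)
                            .withCoder(KvCoder.of(VarLongCoder.of(), VarLongCoder.of())))
                    .apply("ReadRange", ParDo.of(new DoFn<KV<Long, Long>, String>() {
                        @ProcessElement
                        public void process(ProcessContext c) throws Exception {
                            // One bounded query per range; each range is an independent unit of work.
                            try (Connection conn = DriverManager.getConnection(jdbcUrl);
                                 PreparedStatement ps = conn.prepareStatement(
                                         "SELECT payload FROM big_table WHERE id >= ? AND id < ?")) {
                                ps.setLong(1, c.element().getKey());
                                ps.setLong(2, c.element().getValue());
                                try (ResultSet rs = ps.executeQuery()) {
                                    while (rs.next()) {
                                        c.output(rs.getString("payload"));
                                    }
                                }
                            }
                        }
                    }));
        }
    }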

How to send multiple timeframe data from MQL4 to Node.js?

I am trying to get multiple-timeframe data for different trading instruments ( _Symbol ) from a MetaTrader4 Terminal to a Node.js process.
How can I do it?
Can we do it from the same EA inside a MetaTrader4 Terminal?
A.1: Yes, we can.
A.2: No, that initial idea is not a good one.
While the intention is clear, the idea of using a single EA to send live data for multiple trading instruments does not serve that goal well.
The MQL4 code-execution environment has some fixed, hard-wired internal logic, and because of that, plus the reality of how Capital Markets and Broker-type market-access mediators work, a solo-EA will never fit these requirements.
A simple call to
iOpen( aTradingInstrumentSymbolNAME,    // similarly iHigh, iLow, iClose, iVolume, iTime
       aSelectedTimeFrameDefinedCODE,
       aRelativeBarPTR
       )
is by far not enough.
A professional solution will require a lot of care for real-time handling capabilities, for unmasking the actual flow of mutually hiding events, and for achieving minimal processing latencies, so quite a high level of engineering expertise will be needed.
Start by learning the basics about Scripts, benchmark all your critical code sections by recording their actual durations in [us], and make sure your code remains non-blocking under all circumstances. This will decide whether more than one code-execution thread will be necessary at prime time / for the peak hours.
Having managed that, you have only just started moving in the direction of your expected result.
Next, one has to decide on a feasible inter-process / distributed-computing data flow and the signalling needed for inter-platform integration.
Last, but not least, an important point is the legal side of such an undertaking. It depends both on your local jurisdiction and on the Broker's Terms & Conditions, as no one would enjoy celebrating a technically well-mastered Project from inside a jail.
All that, quite an interesting Project.
iOpen(Symbol(), PERIOD_M1, 1) is the way to get data from M1 (the last completed bar); if you need another timeframe, replace PERIOD_M1 with another ENUM_TIMEFRAMES value. So what is the problem? Usually Stack Overflow requires an MCVE-based example in order to help you.

fast parallel picking/checking of some ID (think non-game application of loot lag)

So I am verifying new operational management systems, and one of these systems sends pick lists to a scalable number of handheld devices. It sends these using messages, and the pick lists may contain overlapping jobs. So, in my virtual world, I need to make sure that two simulated humans don't pick the same job: whenever someone picks a job, all the job lists get refreshed so that the picked job no longer appears on anyone else's handheld, but for me the message is still in the queue being handled, so I have to make sure to discard that option.
Basically I have this giant list with a mutex, and the more "people" hit it, and the faster they do, the slower I can handle messages, to the point where I'm no longer running in real time. That is bad, because I can't actually validate the system if I can't keep up with the messages. (Two guys in the same aisle will recognize that one is going to pick one object and the next guy should pick the second item, but I need to check every single job I'm about to pick and see whether it has already been claimed by someone else.)
I've considered localized binning of the lists, but it doesn't actually solve the problem in the degenerate case that breaks it anyway: tons of people working on the same row. Granted, this would probably be confusing for real people as well, since in real life they need to do the same resolution, but I'm curious what the currently accepted "best" solution to this problem is.
PS - I am already implementing this in C++ and it's fast, fast enough that in any practical test I don't "need" this question answered; I'm asking more out of curiosity.
Thanks in advance!
I see a problem in the design "giant list with (one) mutex". You simply can't provide the whole list in a synchronized fashion if the list size and/or access rate is unbounded; basic math works against you. So what I would do is put a mutexed flag on each job. You can't prevent a job from being displayed on someone's screen, but you can ensure that they get a graceful "no more available" error and THEN the updated list. If you have ever tried to reserve a seat at a highly popular gig, you may have witnessed this solution.
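A minimal sketch of that per-job flag idea, written in Java for illustration (the question's code is C++, but the technique maps directly onto std::atomic<bool> there); the JobBoard name and job-ID strings are made up:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicBoolean;

    // One claim flag per job instead of one mutex over the whole list.
    // Pickers race on individual jobs, so unrelated picks never contend with each other.
    public class JobBoard {
        private final Map<String, AtomicBoolean> claimed = new ConcurrentHashMap<>();

        public void addJob(String jobId) {
            claimed.putIfAbsent(jobId, new AtomicBoolean(false));
        }

        /** Returns true if this caller won the job, false if someone else already claimed it. */
        public boolean tryClaim(String jobId) {
            AtomicBoolean flag = claimed.get(jobId);
            return flag != null && flag.compareAndSet(false, true);
        }
    }

Each simulated picker calls tryClaim before acting on a job from a possibly stale message; if it returns false, the message is simply discarded and the refreshed list is used instead.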

Improving simulation performance via concurrency

Consider this sequential procedure on a data structure containing collections (for simplicity, call them lists) of Doubles. For as long as I feel like, do:
Select two different lists from the structure at random
Calculate a statistic based on those lists
Flip a coin based on that statistic
Possibly modify one of the lists, based on the outcome of the coin toss
The goal is to eventually achieve convergence to something, so the 'solution' is linear in the number of iterations. An implementation of this procedure can be seen in the SO question here, together with an intuitive visualization.
It seems that this procedure could be performed better - that is, convergence could be achieved faster - by using several workers executing concurrently on separate OS threads.
I guess a perfectly-realized implementation of this should be able to achieve a solution in O(n/P) time, for P the number of available compute resources.
Reading up on Haskell concurrency has left my head spinning with terms like MVar, TVar, TChan, acid-state, etc. What seems clear is that a concurrent implementation of this procedure would look very different from the one I linked above. But the procedure itself seems to be a pretty tame algorithm over what is essentially an in-memory database, which is a problem that I'm sure somebody has come across before.
I'm guessing I will have to use some kind of mutable, concurrent data structure that supports decent random access (that is, to random idle elements) & modification. I am getting a bit lost when I try to piece together all the things that this might require with a view towards improving performance (STM seems dubious, for example).
What data structures, concurrency concepts, etc. are suitable for this kind of task, if the goal is a performance boost over a sequential implementation?
Keep it simple:
forkIO for lightweight, super-cheap threads.
MVar, for fast, thread-safe shared memory.
and the appropriate sequence type (probably vector, maybe lists if you only prepend)
a good stats package
and a fast random number source (e.g. mersenne-random-pure64)
You can try the fancier stuff later. For raw performance, keep things simple first: keep the number of locks down (e.g. one per buffer), make sure to compile your code with optimizations and link against the threaded runtime (ghc -O2 -threaded), and you should be off to a great start.
RWH has an intro chapter covering the basics of concurrent Haskell.

What is the best way to search multiple sources simultaneously?

I'm writing a phonebook search that will query multiple remote sources, but I'm wondering how best to approach this task.
The easiest way to do this is to take the query, start a thread per remote source (limiting max results to, say, 10), wait for the results from all threads, aggregate the list into a total of 10 entries, and return them.
BUT... which of the remote sources is more important? If all sources return at least 10 results, I would then have to do a search over the search results. While this would yield accurate information, it seems inefficient and unlikely to scale up well.
Is there a solution commercial or open source that I could use and extend, or is there a clever algorithm I can use that I've missed?
Thanks
John, I believe what you want is federated search. I suggest you check out Solr as a framework for this. I agree with Nick that you will have to evaluate the relative quality of the different sources yourself, and build a merge function. Solr has some infrastructure for this, as this email thread shows.
To be honest, I haven't seen a ready solution, but this is why we programmers exist: to create a solution if one is not readily available :-)
The way I would do it is similar to what you describe: using threads. If this is a web application, then Ajax is your friend for speed and usability; for a desktop app, the GUI representation is not even an issue.
It sounds like you can't determine or guess upfront which source is the best in terms of reliability, speed and number of results. So you need to set up your program so that it determines the best results on the fly. Let's say you have 10 data sources, and therefore 10 threads. When you fire up your threads, wait for the first one to return with results > 0. This is going to be your "master" result. As other threads return, you can compare them to your "master" result and add new results. There is really no way to avoid this if you want to provide unique results. You can start displaying results as soon as you have your first thread's results. You don't have to update your screen right away with all the new results as they come in, but if it takes some time the user may become agitated. You can just have some sort of indicator that shows that more results are available, if you have more than 10 for instance.
If you only have a few sources, like 10, and you limit the number of results per source you are waiting for to, say, 10, it really shouldn't take that much time to sort through them in any programming language. Also make sure you can recover if your remote sources are not available. If, let's say, you are waiting for all 10 sources to come back before displaying data, you may be in for a long wait if one of the sources is down.
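A rough Java sketch of the thread-per-source approach with a timeout guarding against slow or dead sources; the PhonebookSource interface, its search method and the 2-second budget are all made up for the example:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;

    interface PhonebookSource {
        List<String> search(String query, int maxResults) throws Exception;
    }

    public class FederatedSearch {
        public static List<String> search(List<PhonebookSource> sources, String query)
                throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(sources.size());
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (PhonebookSource source : sources) {
                tasks.add(() -> source.search(query, 10)); // cap each source at 10 results
            }
            // Give the sources at most 2 seconds; anything still running is cancelled.
            List<Future<List<String>>> futures = pool.invokeAll(tasks, 2, TimeUnit.SECONDS);
            pool.shutdown();

            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                try {
                    merged.addAll(f.get()); // completed sources only
                } catch (Exception slowOrFailedSource) {
                    // timed out, cancelled or failed: skip this source, keep the rest
                }
            }
            return merged.size() > 10 ? merged.subList(0, 10) : merged;
        }
    }

Ranking and de-duplicating the merged list (the "compare to the master result" step above) is deliberately left out, since that part depends entirely on how you score the individual sources.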
The other approach is to fool the user, sort of like airfare search sites do, where they make you wait a few seconds while they collect and sort results. I really like Kayak.com's implementation, as it makes me feel like it's doing something, unlike some other sites.
Hope that helps.
