How to version logic in a distributed event-sourced system

How to version logic in a distributed event-sourced system - domain-driven-design

Example
My distributed event-sourced system simulates houses being built and purchased over a period of time. For simplicity sake, we will use the year as the distributed clock value (forgetting vector clocks for now).
Houses take 1 year to build in version 1 of the system, but take twice as long in version 2. This is a change in logic rather than structure.
To cope with this change, events recorded in version 1 must also be replayed by version 1 when rebuilding state/snapshots. When version 2 of the log is reached, the application switches over to version 2 of the logic and continues replaying the residual events. A valid snapshot is built.
Problem
The nodes in my distributed system will be updated to version 2 at different times, creating a window whereby multiple versions are running simultaneously. My current understanding is this window can only be reduced through techniques like feature switching, but cannot be completely removed (unless you sacrifice availability by bringing the entire system down for an upgrade).
This creates a problem when merging the event logs from distributed nodes. The event versions bleed into each other, making it impossible to simply upgrade from version 1 to 2 during the replay. E.g.:
Node Clock Event
... pre-merge ...
A 2000 HouseBuildStarted('Alpha')
A 2001 HousePurchased('Alpha') <- 'HouseBuilt' event is implicit (inferred through logic).
A 2002 NodeUpgradedTo('V2')
B 2002 HouseBuildStarted('Bravo')
B 2003 HousePurchased('Bravo')
B 2004 NodeUpgradedTo('V2')
... post-merge ...
A 2000 HouseBuildStarted('Alpha')
A 2001 HousePurchased('Alpha')
B 2002 HouseBuildStarted('Bravo')
A 2002 NodeUpgradedTo('V2')
B 2003 HousePurchased('Bravo') <- 'Bravo' does not exist yet (1 year early)
B 2004 NodeUpgradedTo('V2')
How is this usually handled in systems where taking all the nodes down is not acceptable?

The questions of upgrading logic and distributing upgrades are different. If you need to upgrade an event stream (e.g. your "HouseBuilt event is implicit"), then you should do so. Your read models will have to be rebuilt by playing the event stream over again, with logic that does the upgrade. This is really no different in concept than patching a database when you upgrade your program. Facts about the persisted data now need to be reconsidered in light of your newer representations (default values may have to be substituted, obsolete events ignored, etc.)
How you determine which node runs which version of the code is a separate question. If you have a no-downtime-upgrade policy, what do you normally do? Have some customers serviced by old versions and some by new? The same thing could happen, building new read models while the old ones are online, servicing old aggregates and application services while the new ones are being deployed, etc.

Related

How to keep the monitoring working always on the recent added indice in opendistro or opensearch?

I am trying to setup monitors that detect any penetration attempts to a website. For that, I am using the Alert functionality in open distro ( older version of opensearch ).
The problem is we have indices for the actual day and the 29 day before. And as far as I know, I can only select one of these 30 options.
Is there a workaround where I can create monitors for indices that haven´t been created yet or a persistent monitor that I don´t have to modify every time a new day/indice comes up ?
Every answer or comment is very much appreciated !

How to change the middle node location in the torrc?

I am trying to edit my torrc and make all of the nodes funnel through one country.
So far I am able to force the entry and exit nodes but don't know how to change the middle node... any ideas?
I have already tried "MiddleNodes" and "RelayNodes"
EntryNodes {us},{ca}
ExitNodes {us},{ca}
StrictNodes 1

It's possible to restrict to MiddleNodes per Tor docs: https://2019.www.torproject.org/docs/tor-manual.html.en
MiddleNodes node,node,…
A list of identity fingerprints and country
codes of nodes to use for "middle" hops in your normal circuits.
Normal circuits include all circuits except for direct connections to
directory servers. Middle hops are all hops other than exit and entry.
This is an experimental feature that is meant to be used by
researchers and developers to test new features in the Tor network
safely. Using it without care will strongly influence your anonymity.
This feature might get removed in the future. The HSLayer2Node and
HSLayer3Node options override this option for onion service circuits,
if they are set. The vanguards addon will read this option, and if
set, it will set HSLayer2Nodes and HSLayer3Nodes to nodes from this
set. The ExcludeNodes option overrides this option: any node listed in
both MiddleNodes and ExcludeNodes is treated as excluded. See the
ExcludeNodes option for more information on how to specify nodes.

Edit: See new answer by #user1652110 describing MiddleNodes option which was added in January 2019.
There is no option to do so. The closest option you can try is ExcludeNodes by using as large a list of country codes as you can come up with that doesn't include the countries you do want to use.
Also note, at the time of writing, limiting your circuits' entry and exit points to relays in the US and Canada might severely limit your performance, anonymity, and reliability since there just aren't that many high-bandwidth exits and guards in these two countries.

Hazelcast High Response Times

We have a Java 1.6 application that uses Hazelcast 3.7.4 version,
with a topology of two nodes. The application operates mainly with 3 maps.
In normal application working, response times when consulting the maps are
generally in values around some milliseconds tens.
I have observed that in some circumstances such as for example with network
cuts, the response time increases to huge values such as for example, 20 or 30 seconds!!
And this is impacting the application performance.
I would like to know if this kind of situation with network micro-cuts can lead
to increase searches response time in this manner. I do not know if some concrete configuration can be done to minimize this, and also which other elements can provoke so high times.
I provide some examples of some executed consults
Example 1:
String sqlPredicate = "acui='"+acui+"'";
Collection<Agent> agents =
(Collection<Agent>) data.getMapAgents().values(new SqlPredicate(sqlPredicate));
Example 2:
boolean exist = data.getMapAgents().containsKey(agent);
Thanks so much for your help.
Best Regards,
Jorge

The Map operations are all TCP Socket based and thus are subject to your Operating Systems TCP Driver implementation.
See TCP_NODELAY

multithreading or shared memory - Architecture

There are 3 parts to my application:
A numerical simulator solving a 21 variable diff equation by runge-kutta method - direct from numerical recipes in C, step size is 0.0001 s
A C code pinging a PIC based micrprocessor every 1s and receiving data at about 3600 samples per second over the USB-COM port; It sends relevant data to the front end over TCP/IP
A JAVA front end reading the data from the numerical simulator via SWIG (for the C code) and JNI, modifying the parameters with input from the microprocessor and finally plotting it to the GUI.
I want to recode the JAVA front end in C++ now, with the option of using HTML/Javascript for plotting.
Would rewriting the front end in C++ so that the numerical simulator runs on a separate thread be a good approach?
I don't understand threading though I have used it for the listening and plotting functions in the JAVA code. It seems like having it all run on multiple threads instead of separate processes would slow down my simulations.
Can I combine 1 , 2 and 3 into a single program or should they remain separate to retain the 0.0001 ms simulation speed and the ability to handle the large amount to microprocessor data.
Please help me pick a path forward!
Thanks in Advance!

On a multicore platform, multithreading will generally improve performance. However, GPOS such as Linux and Windows are not deterministic, so there are no guarantees.
That said, the computational performance of a modern PC is such that it will hardly be stretched by this task and data rate,so it hardly matters perhaps?

How to search for Possibilities to parallelize?

I have some serial code that I have started to parallelize using Intel's TBB. My first aim was to parallelize almost all the for loops in the code (I have even parallelized for within for loop)and right now having done that I get some speedup.I am looking for more places/ideas/options to parallelize...I know this might sound a bit vague without having much reference to the problem but I am looking for generic ideas here which I can explore in my code.
Overview of algo( the following algo is run over all levels of the image starting with shortest and increasing width and height by 2 each time till you reach actual height and width).
For all image pairs starting with the smallest pair
For height = 2 to image_height - 2
Create a 5 by image_width ROI of both left and right images.
For width = 2 to image_width - 2
Create a 5 by 5 window of the left ROI centered around width and find best match in the right ROI using NCC
Create a 5 by 5 window of the right ROI centered around width and find best match in the left ROI using NCC
Disparity = current_width - best match
The edge pixels that did not receive a disparity gets the disparity of its neighbors
For height = 0 to image_height
For width = 0 to image_width
Check smoothness, uniqueness and order constraints*(parallelized separately)
For height = 0 to image_height
For width = 0 to image_width
For disparity that failed constraints, use the average disparity of
neighbors that passed the constraints
Normalize all disparity and output to screen

Just for some perspective, it may not always be worthwhile to parallelize something.
Just because you have a for loop where each iteration can be done independently of each other, doesn't always mean you should.
TBB has some overhead for starting those parallel_for loops, so unless you're looping a large number of times, you probably shouldn't parallelize it.
But, if each loop is extremely expensive (Like in CirrusFlyer's example) then feel free to parallelize it.
More specifically, look for times where the overhead of the parallel computation is small relative to the cost of having it parallelized.
Also, be careful about doing nested parallel_for loops, as this can get expensive. You may want to just stick with paralellizing the outer for loop.

The silly answer is anything that is time consuming or iterative. I use Microsoft's .NET v4.0 Task Parallel Library and one of the interesting things about their setup is its "expressed parallelism." An interesting term to describe "attempted parallelism." Though, your coding statements may say "use the TPL here" if the host platform doesn't have the necessary cores it will simply invoke the old fashion serial code in its place.
I have begun to use the TPL on all my projects. Any place there are loops especially (this requires that I design my classes and methods such that there are no dependencies between the loop iterations). But any place that might have been just good old fashion multithreaded code I look to see if it's something I can place on different cores now.
My favorite so far has been an application I have that downloads ~7,800 different URL's to analyze the contents of the pages, and if it finds information that it's looking for does some additional processing .... this used to take between 26 - 29 minutes to complete. My Dell T7500 workstation with dual quad core Xeon 3GHz processors, with 24GB of RAM, and Windows 7 Ultimate 64-bit edition now crunches the entire thing in about 5 minutes. A huge difference for me.
I also have a publish / subscribe communication engine that I have been refactoring to take advantage of TPL (especially on "push" data from the Server to Clients ... you may have 10,000 client computers who have stated their interest in specific things, that once that event occurs, I need to push data to all of them). I don't have this done yet but I'm REALLY LOOKING FORWARD to seeing the results on this one.
Food for thought ...

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string