I have a problem where my CICS application gives a different result when run normally than when run under CEDF. I have a program that makes some updates to a DB2 table and writes to VSAM. When I run the application, the task always fails. But when I run it under CEDF, the task completes successfully with no errors. Does anybody have any idea why this happens?
I am developing fault tolerance mechanisms for a distributed application in Rust. I need to simulate the failure of one node (and eventually more). The kind of failure to simulate is a node crash. I want the application to exit completely with an error in a controlled manner. I want to choose which node fails and when it does (as much as possible).
The different nodes of the application communicate with each other peer-to-peer. Each node executes two threads, and it would be best if both were terminated.
In my testing environment I have each node running on a thread (and this thread creates a second one) on my laptop, with a network port assigned to each.
A preliminary idea would be to randomly exit a thread with a given probability. This idea does not give me the control I need to exit only one node, at the exact moment of the application where I want to test my fault tolerance mechanisms. Also, this would leave the second thread of a node executing (as far as I know).
I am looking for a way to simulate the node crash in a way I can control and reproduce the same crash whenever I need.
How does a program like Folding@home work? Does my computer individually perform a unit of "work" completely separately from the other computers running Folding@home, and then send the answer back when it's completed?
Or does Folding@home see all the computers connected to it as one project with, let's say, 1000 cores, so that doing the work is the equivalent of running something like make -j <total number of cores>?
Projects like Folding@home and BOINC are examples of loosely-coupled parallel computing, where each task is fully self-contained and can be completed without communication with other computing entities. They are also examples of a pattern known as controller/worker (formerly known as master/worker), in which a central controller splits a large task into a pool of small(er) subtasks and distributes them to a bunch of worker processes on a first-come, first-served basis, which corresponds to your first point.
In F@H (and BOINC), client computers connect to the server, request a task, work on it until it's complete, then connect to the server again to return the result and request a new task. The benefits of this are automatic load balancing, fault tolerance (via redundancy), and no need for scheduling.
When you run make -j #cores, make launches a number of parallel jobs but those jobs are usually interdependent, so make has to schedule them in an optimal way. The jobs are then run as processes on the same computer which affords make full process control. If a build step fails, the entire build job aborts immediately and the user can quickly look into the problem, fix it, and restart the build. This is not a viable model for when a client computer could have an arbitrary compute speed, could connect and disconnect at any time, and/or could decide to simply stop processing tasks. There are distributed versions of make like dmake that run different parts of the build process on different remote nodes, but that still happens in a tightly controlled environment, typically on a build cluster.
Note that on a very high level of abstraction the two are basically equivalent with the main difference being whether jobs are pushed or pulled. While job pulling works fine on all kinds of systems, job pushing usually requires (tightly-coupled) systems with predictable characteristics and good scheduling algorithms to be efficient.
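The pull model described above can be sketched in a few lines. This is a minimal, single-machine illustration in Python, not how the real projects are implemented: threads stand in for client computers, a queue stands in for the server's pool of work units, and the names run_pull_model and worker, as well as the squaring "computation", are made up for the example.

```python
import queue
import threading


def run_pull_model(work_units, num_workers):
    """Controller/worker sketch: workers pull units until the pool is empty."""
    tasks = queue.Queue()
    for unit in work_units:
        tasks.put(unit)          # the controller fills the pool of subtasks

    results = {}                 # unit -> answer
    handled_by = {}              # unit -> id of the worker that processed it
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                unit = tasks.get_nowait()   # "request a task"
            except queue.Empty:
                return                      # nothing left to pull
            answer = unit * unit            # stand-in for a self-contained computation
            with lock:
                results[unit] = answer      # "return the result"
                handled_by[unit] = worker_id

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, handled_by


results, handled_by = run_pull_model(range(10), num_workers=3)
print(sorted(results.items()))
```

Nothing in the controller decides which worker gets which unit. A faster worker simply comes back for more work sooner, which is where the automatic load balancing comes from.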
I have a node.js script that runs once a day on an Ubuntu EC2 instance. This script pulls data from a few hundred thousand remote APIs and saves it to our local database. Is there any way we can monitor this node.js script on the remote server? There have been a few instances where the script crashed for some reason and we were unable to figure out why without SSHing into the instance and checking the logs. After the first few crashes I did create a small system that sends us an email whenever the script crashes due to an uncaught exception, and also when the script completes execution.
However, we need to develop a better system where we can monitor the progress of the script via the web interface of our admin application, which is deployed on another instance, and also trigger start/stop of the script via this interface. What are the possible options for achieving this?
If you would like to stay in Node.js, there are several process monitoring tools:
PM2 comes with lots of other features besides monitoring processes. You can monitor your processes via the CLI or the official web interface: https://keymetrics.io/. A quick search on npm also turns up a bunch of nice unofficial GUI tools: https://www.npmjs.com/search?q=pm2+web
Forever is not as feature-rich as PM2 but will do the basic process operations, and a couple of GUIs are also available on npm.
There are two problems here that you are trying to solve:
Scheduling work to be done
Monitoring a process for failure
At a simple level, this is easy: schedule a cron job and restart failed things so they keep trying.
However, when things don't go smoothly, it helps to have much finer-grained control over what you are scheduling and how it is executed. This also gives you visibility into each little piece of work.
Adding a little more complexity, you can end up with something like this:
Schedule the script that starts everything (via cron, if that's comfortable)
That script generates the jobs that need to be executed and puts them into a queue
A worker process (or n worker processes) consumes that queue and executes pending jobs
You can monitor both the progress of the jobs, as well as the state of each worker (# of crashes, failures, jobs completed, etc.). The other tools mentioned above are good candidates for this (forever, pm2, etc.)
When jobs fail, other workers can pick up the small piece of work that was in progress and restart it. This is much more efficient than restarting the entire process, and also lets you parallelize things across n workers based on how you can split up the workloads.
You could easily throw the status onto a web app so you can check in periodically rather than have to dig through server logs.
You can also get more intelligent with different types of failures. Network error? Retry 5 times. Rate limited? Gradual back-off. Crash? Don't retry and notify via email. Etc.
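A failure policy like the one just described might look roughly like this. It is a sketch in Python under stated assumptions: the exception classes NetworkError and RateLimited, the functions decide and run_with_policy, and the choice of exponential back-off for the "gradual back-off" case are all hypothetical, and the "notify" action only returns a value where a real system would send the email.

```python
import time


class NetworkError(Exception):
    """Hypothetical error type for a failed connection."""


class RateLimited(Exception):
    """Hypothetical error type for an API that asked us to slow down."""


def decide(error, retries_so_far, max_network_retries=5, base_delay=1.0):
    """Return (action, delay_in_seconds) for a job that just failed."""
    if isinstance(error, NetworkError):
        if retries_so_far < max_network_retries:
            return ("retry", 0.0)
        return ("notify", 0.0)                       # gave up after 5 retries
    if isinstance(error, RateLimited):
        # Assumed back-off shape: 1s, 2s, 4s, ... (not bounded in this sketch)
        return ("retry", base_delay * 2 ** retries_so_far)
    return ("notify", 0.0)                           # anything else counts as a crash


def run_with_policy(job, sleep=time.sleep):
    """Run job() until it succeeds or the policy says to stop retrying."""
    retries = 0
    while True:
        try:
            return ("done", job())
        except Exception as error:
            action, delay = decide(error, retries)
            if action == "notify":
                return ("failed", error)
            sleep(delay)
            retries += 1


# Usage: a job that fails twice with a network error, then succeeds.
attempts = []

def flaky_job():
    attempts.append(1)
    if len(attempts) < 3:
        raise NetworkError("connection reset")
    return 42

outcome = run_with_policy(flaky_job, sleep=lambda seconds: None)
print(outcome)
```

Keeping the decision in one small function makes the policy easy to test on its own, separately from the queue and the workers.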
I have tried this with pm2. You can get the info for the task, then cat out or grab the log files. Or you could have a logging server; see also: https://github.com/papertrail/remote_syslog2
We use clustering with our Express apps on multi-CPU boxes. It works well, and we get the maximum use out of our AWS Linux servers.
We inherited an app we are fixing up. It's unusual in that it has two processes. It has an Express API portion to take incoming requests. But the process that acts on those requests can run for several minutes, so it was built as a separate background process, with node calling Python and Maya.
Originally the two were tightly coupled, with the Python script called by the request to upload the data. But this of course was suboptimal, as it would leave the client waiting for a response for the time it took to run, so it was rewritten as a background process that runs in a loop, checking for new uploads and processing them sequentially.
So my question is this: if we have this separate node process running in the background, and we run a cluster which starts up a process for each CPU, how is that going to work? Are we not going to get two node processes competing for the same CPU? We were getting a bit of weird behaviour and crashing yesterday, without a lot of error messages (god I love node), so it's a bit concerning. I'm assuming Linux will just swap the processes in and out as they are being used. But I wonder if it will be problematic, and I also wonder about someone getting their web session swapped out for several minutes while the longer-running process runs.
The smart thing to do would be to rewrite this to run on two different servers, but the files that Maya uses/creates are on the server's file system, and we were not given the budget to rebuild the way we should. So, we're stuck with this architecture for now.
Any thoughts on possible problems and how to avoid them would be appreciated.
From an overall architecture perspective, spawning 1 nodejs process per core is a great way to go. You have a lot of interdependencies though: the nodejs processes are calling Maya, which may use multiple threads (keep that in mind).
The part that is concerning to me is your random crashes and your "process that runs in a loop". If that process is just checking the file system you probably have a race condition where the nodejs processes are competing to work on the same input/output files.
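If several processes really are scanning the same directory, one common way to avoid that kind of race is to have each process "claim" a file with an atomic rename before touching it. The following is a sketch in Python, not the app's actual code: the function name claim, the ".claimed-by-N" suffix, and the file name are hypothetical. Node's fs.rename can play the same role.

```python
import os
import tempfile


def claim(path, worker_id):
    """Try to claim an upload by renaming it; return the new path, or None."""
    claimed = f"{path}.claimed-by-{worker_id}"
    try:
        # A rename within one filesystem is atomic on POSIX systems, so only
        # one of several competing processes can succeed for a given file.
        os.rename(path, claimed)
        return claimed
    except FileNotFoundError:
        return None   # someone else already claimed (or removed) the file


# Usage: two workers race for the same upload; only the first one gets it.
with tempfile.TemporaryDirectory() as inbox:
    upload = os.path.join(inbox, "scene-001.dat")   # hypothetical file name
    open(upload, "w").close()
    first = claim(upload, worker_id=1)
    second = claim(upload, worker_id=2)

print(first is not None, second)
```

The worker that gets a path back processes the file. Every other worker gets None and moves on to the next candidate, so no two processes ever work on the same input.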
In theory, 1 nodejs process per core will work great and should help you use all of your CPU capacity. Linux always swaps processes in and out of the CPU, so that is not an issue. You could start multiple nodejs processes per core and still not have an issue.
One last note: be sure to keep an eye on your memory usage. Several Linux distributions on EC2 do not have a swap file enabled by default, and running out of memory can be another silent app killer. It is best to add a swap file in case you run into memory issues.
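A quick way to check whether any swap is configured is to read the SwapTotal line from /proc/meminfo. The sketch below is in Python and assumes a Linux host. The helper name swap_total_kb is made up, and the sample text is illustrative input, not output captured from a real server.

```python
import os


def swap_total_kb(meminfo_text):
    """Return the SwapTotal value (in kB) from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return None


# Illustrative input only; real values come from /proc/meminfo on the server.
sample = (
    "MemTotal:        1014572 kB\n"
    "SwapTotal:             0 kB\n"
    "SwapFree:              0 kB\n"
)
no_swap = swap_total_kb(sample) == 0

if os.path.exists("/proc/meminfo"):   # the file only exists on Linux
    with open("/proc/meminfo") as f:
        print("swap configured (kB):", swap_total_kb(f.read()))
```

A SwapTotal of 0 means no swap is enabled, which is the situation the note above warns about.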
We are facing a few issues while executing Coded UI Test scripts.
We regularly have to execute automated scripts with Coded UI Test; earlier we used Test Partner for execution. Recently we migrated a few of our Test Partner scripts to Coded UI Test. However, we observed that the Coded UI Test scripts take longer to execute than they did in Test Partner. Our automated scripts were written entirely by hand; nowhere did we use the record and playback feature.
A few of our observations were:
The IE browser hangs when executing Coded UI Test scripts on Windows XP. Every time, we have to kill the process and recreate the scenario to continue the execution. This defeats the purpose of automation, as someone has to monitor every run to check that the script executes without a browser hang. It's a very frequent problem on XP.
If we execute Coded UI Test scripts on Windows 7, execution is quite slow. It takes more time than on XP. So our execution time drags, though the script runs fine without a browser hang. We tried to execute the scripts in release mode as well, but whenever a script halts we have to execute it again in debug mode.
Could you please advise on this? What exactly are we missing? Can we improve the execution time by changing tool settings? Thanks for the support.
First of all, you should enable logging and see why the search takes up so much time.
You can also find useful information in the debug output, which gives warnings when operations take more time than expected.
Here are two useful links for enabling those logs:
For VS/MTM 2010 and 2012 beta: http://blogs.msdn.com/b/gautamg/archive/2009/11/29/how-to-enable-tracing-for-ui-test-components.aspx
For VS/MTM 2012 : http://blogs.msdn.com/b/visualstudioalm/archive/2012/06/05/enabling-coded-ui-test-playback-logs-in-visual-studio-2012-release-candidate.aspx
A friendly .html file with logs should be created in %temp%\UITestLogs*\LastRun\ directory.
As for a possible explanation of your issue: it doesn't matter whether you recorded your tests or coded them by hand. Either way they end up calling WpfControl.Find() or the equivalent in one of the deriving classes, and if the search fails at first, playback will move on to performing heuristics to find the targeted control anyway.
You can set the MatchExactHierarchy setting of your Playback to true and stop using the smart match feature
(more on it here, together with a few other useful performance tips:
http://blogs.msdn.com/b/mathew_aniyan/archive/2009/08/10/configuring-playback-in-vstt-2010.aspx)