Bash - how to redirect stdout of a certain thread? - linux

Suppose I have a C program that creates threads to do different tasks. Now, is it possible to redirect the stdout of a certain thread from a bash script?
You can assume that I always have a way to get the process ID and thread ID; I only want to know whether this can be done with bash scripts, and how.
Note: this is about threads, not processes, and I haven't found any existing questions about this.

There is only one console, not one per thread. So when five threads write to stdout in parallel, all of that goes into a single sink, interleaved in an essentially nondeterministic order.
So unless each line contains a specific string that identifies the originating thread, you can't take that output apart after the fact.
Alternatively, you could have your threads write to different files! When you don't throw all the output together, it is much easier to get at the individual sources later on.
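To illustrate the idea (not your C program, just the concept), here is a minimal sketch in Python where each worker thread writes to its own file instead of sharing stdout; the file naming and the work done per thread are made up for the example:
import threading

def worker(thread_index):
    # Each thread writes to its own file instead of the shared stdout.
    with open(f"thread-{thread_index}.log", "w") as out:
        for step in range(3):
            out.write(f"thread {thread_index}: step {step}\n")

threads = [threading.Thread(target=worker, args=(n,)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
In a C program the equivalent is to give each thread its own FILE* (or file descriptor) and never write to stdout from the worker threads.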

Related

Two threads accessing the same file in Scala

I have a shell script that copies files into a location and another one that picks these up for further processing. I want to use multithreading to pick up files in parallel in Scala using a thread pool.
However, if there are two threads and two files, both of them pick up the same file. I have run the program many times, and it always ends up like this. I need the threads to pick up different files in parallel.
Can someone help me out? What approaches can I use? If you could point me in the right direction that would be enough.
I think you can use a parallel sequence to do the processing in parallel.
You don't have to handle this logic yourself. For example, the code could look like this:
val newFiles: Seq[String] = listCurrentFilesNames()
newFiles.par.foreach { fileName =>
  processFile(fileName)
}
This code will be executed in parallel, and you can set the number of threads to a specific value as mentioned here: https://stackoverflow.com/a/37725987/2201566
You can also try using actors - for example, for your reference: https://github.com/tsheppard01/akka-parallel-read-csv-file
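If it helps to see the same idea outside Scala, here is a rough Python sketch of the underlying principle: list the directory once, then hand the fixed list of names to a pool, so each file is dispatched exactly once and no two workers ever pick up the same file (process_file and the file names are placeholders):
from concurrent.futures import ThreadPoolExecutor

def process_file(file_name):
    # Placeholder for the real processing step.
    print(f"processing {file_name}")

# List the directory once, up front, so the set of work items is fixed.
new_files = ["a.csv", "b.csv"]

# Each file name is submitted exactly once, so no two workers get the same file.
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(process_file, new_files))
The key point is that the duplication in the question comes from each thread re-scanning the directory; partitioning a single listing avoids it.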

How to run parallel fork as single thread in perl?

I am trying to check response messages in a Perl program that takes requests through the Amazon API and returns responses. How can I run a parallel fork as a single thread in Perl? I'm using the LWP::UserAgent module and I want to debug the HTTP requests.
As a word of warning - threads and forks are different things in Perl. Very different.
The long and short of it is: you can't, at least not trivially - a fork is a separate process. A fork actually happens when you run any external command in Perl; it's just that by default Perl sits and waits for that command to finish and return its output.
However, if you've got access to the code, you can amend it to run single-threaded - sometimes that's as simple as reducing the parallelism with a config parameter. (Quite often, in fact, debugging parallel code is much more complicated than debugging sequential code, so getting it working before running in parallel is really important.)
You might be able to embed a waitpid into your primary code so you've only got one thing running at once. Without a code example though, it's impossible to say for sure.
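The waitpid idea looks roughly like this; sketched in Python rather than Perl for illustration, with run_one_request standing in for the real request code, the parent forks a child and blocks until that child exits before starting the next one, so only one thing runs at a time:
import os

def run_one_request(i):
    # Placeholder for the real request/debugging work.
    print(f"child {os.getpid()} handling request {i}")

for i in range(3):
    pid = os.fork()          # Unix only
    if pid == 0:             # child
        run_one_request(i)
        os._exit(0)
    os.waitpid(pid, 0)       # parent blocks here until this child exits
The Perl equivalent is the same shape: fork, then call waitpid on the child's PID before looping.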

Having intercommunicating asynchronous processes in wxPython

I am working on a big project that places a high priority on performance. I have a little bit of experience using wxPython to create windows and dialog boxes for software, but I have no experience in getting processes to work in parallel during the course of a single program.
So basically, what I want to accomplish is the following:
I want one main class that controls the high level program. It sets up a configuration either from a config file or from user input. This much I have accomplished on my own.
I need PROCESS #1 to read in a file and a list of commands, execute the commands, and then pass the modified file to PROCESS #2 (this requires that PROCESS #2 is ready to accept new input.) Once the file is passed, PROCESS #1 would begin work on the next set of inputs and wait for PROCESS #2 to finish before the cycle repeats.
PROCESS #2 takes input from PROCESS #1 and writes output to a log file. Once the output is complete, it waits for the next set of output from PROCESS #1.
I know how to use wxTimers and the events associated with that, but what I have found is that a timer event will not execute if the program is otherwise occupied (like in the middle of a method.)
I have seen threads about "threading" and "Pool", but the terminology tends to go over my head, and I haven't gotten any of that sort of stuff to work.
If anybody can point me in the right direction, I would be greatly appreciative.
If you use threads, then I think this would be fairly easy to do. Here's what I would suggest:
Create a button (or some other widget) to execute process #1 in a thread. The thread itself will run BOTH processes. Here's some pseudo-code that might help:
# this is in your thread code:
result = self.call_process_1(args)
self.call_process_2(result)
This will allow you to start another process #1/#2 pair with a new set of commands every time you press the button. Since each pair is encapsulated in its own thread, a new run doesn't have to wait for a previous process #2 to finish. You will probably need to write to separate logs for the output to make sense, or at least label each entry with a timestamp and a thread number or a UUID.
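Fleshing out the pseudo-code a little: the button handler just starts a plain thread, and the thread runs process #1 and then process #2 back to back. This is only a sketch with made-up call_process_1/call_process_2 functions; in a real wxPython app you would hand results back to the GUI with wx.CallAfter rather than touching widgets from the worker thread:
import threading

def call_process_1(args):
    # Read the file, run the commands, return the modified data (made up here).
    return f"modified({args})"

def call_process_2(result):
    # Write the result to the log file (made up here).
    print(f"logging {result}")

def worker(args):
    result = call_process_1(args)   # process #1
    call_process_2(result)          # process #2 starts as soon as #1 is done

def on_button_click(args):
    # In a wx button handler you would do exactly this and return immediately,
    # so the GUI stays responsive; update widgets from the thread via wx.CallAfter.
    threading.Thread(target=worker, args=(args,)).start()

on_button_click("commands.txt")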
Depending on how many of these processes you need to do, you might need to look into setting up a cluster that's driven with celery or some such. But I think this is a good starting place.

Waiting on many parallel shell commands with Perl

Concise-ish problem explanation:
I'd like to be able to run multiple (we'll say a few hundred) shell commands, each of which starts a long running process and blocks for hours or days with at most a line or two of output (this command is simply a job submission to a cluster). This blocking is helpful so I can know exactly when each finishes, because I'd like to investigate each result and possibly re-run each multiple times in case they fail. My program will act as a sort of controller for these programs.
for all commands in parallel {
    submit_job_and_wait()
    tries = 1
    while (! job_was_successful and tries < 3) {
        resubmit_with_extra_memory_and_wait()
        tries++
    }
}
What I've tried/investigated:
I was so far thinking it would be best to create a thread for each submission which just blocks waiting for input. There is enough memory for quite a few waiting threads. But from what I've read, perl threads are closer to duplicate processes than in other languages, so creating hundreds of them is not feasible (nor does it feel right).
There also seem to be a variety of event-loop-ish cooperative systems like AnyEvent and Coro, but these seem to require you to rely on asynchronous libraries, otherwise you can't really do anything concurrently. I can't figure out how to make multiple shell commands with it. I've tried using AnyEvent::Util::run_cmd, but after I submit multiple commands, I have to specify the order in which I want to wait for them. I don't know in advance how long each submission will take, so I can't recv without sometimes getting very unlucky. This isn't really parallel.
my $cv1 = run_cmd("qsub -sync y 'sleep $RANDOM'");
my $cv2 = run_cmd("qsub -sync y 'sleep $RANDOM'");
# Now should I $cv1->recv first or $cv2->recv? Who knows!
# Out of 100 submissions, I may have to wait on the longest one before processing any.
My understanding of AnyEvent and friends may be wrong, so please correct me if so. :)
The other option is to run the job submission in its non-blocking form and have it communicate its completion back to my process, but the inter-process communication required to accomplish and coordinate this across different machines daunts me a little. I'm hoping to find a local solution before resorting to that.
Is there a solution I've overlooked?
You could instead use scientific workflow software such as Fireworks or Pegasus, which is designed to help scientists submit large numbers of computing jobs to shared or dedicated resources. They can also do much more, so they might be overkill for your problem, but they are still worth a look.
If your goal is to find the tightest memory requirements for your job, you could also simply submit it with a large amount of requested memory and then extract the actual memory usage from accounting (qacct) or, cluster policy permitting, log on to the compute node(s) where your job is running and view the memory usage with top or ps.
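For what it's worth, the submit-wait-retry controller from the question maps fairly directly onto a thread pool, since each worker spends almost all its time blocked on the synchronous submission command. Here is a rough sketch of that idea in Python; the qsub command line, the h_vmem resource flag, and the memory-doubling policy are placeholders for whatever your cluster actually expects:
import subprocess
from concurrent.futures import ThreadPoolExecutor

def submit_and_wait(job_script, mem_gb):
    # Blocks until the synchronous submission command returns.
    cmd = ["qsub", "-sync", "y", "-l", f"h_vmem={mem_gb}G", job_script]
    return subprocess.run(cmd).returncode == 0

def run_with_retries(job_script):
    mem_gb = 4
    for attempt in range(3):
        if submit_and_wait(job_script, mem_gb):
            return True
        mem_gb *= 2              # resubmit with extra memory
    return False

jobs = [f"job_{n}.sh" for n in range(200)]
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(run_with_retries, jobs))
print(f"{sum(results)} of {len(jobs)} jobs succeeded")
A few hundred mostly-blocked threads is usually fine memory-wise in Python; in Perl the same shape would need genuinely lightweight waiting, which is why the event-loop route keeps coming up.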

Is it OK to use a non-zero return code for a process that executed successfully?

I'm implementing a simple job scheduler, which spawns a new process for every job to run. When a job exits, I'd like it to report the number of actions executed to the scheduler.
The simplest way I could find is to exit with the number of actions as the return code. The process would, for example, exit with return code 3 for "3 actions executed".
But since the standard (AFAIK) is to use return code 0 when a process exited successfully and any other value when there was an error, would this approach risk creating any problems?
Note: the child process is not an executable script, but a fork of the parent, so it is not accessible from the outside world.
What you are looking for is inter-process communication - and there are plenty of ways to do it:
Sockets
Shared memory
Pipes
Exclusive file descriptors (to some extent; rather go for something else if you can)
...
Return conventions are not something a regular programmer should dare to violate.
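Since the child here is a fork of the parent, the pipe option from the list above is particularly easy. A minimal sketch in Python of the pattern (the action count is just an example payload; the exit status stays a plain success/failure flag):
import os

read_end, write_end = os.pipe()
pid = os.fork()

if pid == 0:                        # child: the job process
    os.close(read_end)
    actions_executed = 3            # whatever the job actually counted
    os.write(write_end, str(actions_executed).encode())
    os.close(write_end)
    os._exit(0)                     # 0 still means "success"
else:                               # parent: the scheduler
    os.close(write_end)
    count = int(os.read(read_end, 16).decode() or "0")
    os.close(read_end)
    os.waitpid(pid, 0)
    print(f"child reported {count} actions executed")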
The only risk is confusing a calling script. What you describe makes sense, since what you really want is the count. As Joe said, use negative values for failures, and consider including a --help option that explains the return values ... so you can figure out what this code is doing when you try to use it next month.
I would use logs for it: log the number of actions executed to the scheduler. This way you can also log datetimes and other extra info.
I would not change the return convention...
If the scheduler spawns a child and you are the one writing it, you could also open a pipe per child, or use named pipes or maybe Unix domain sockets, and use that for inter-process communication, writing the number of processed jobs there.
I would stick with conventions, namely returning 0 for success, especially if your program is visible to or usable by other people; in any case, document such decisions well.
Apart from conventions, there are also standards.
