I have a very long command running on a very large file. It chains sort, uniq, grep and awk in a single pipeline, with each command feeding its results to the next.
Once I start this command, the prompt doesn't return until the whole pipeline has finished.
Is there a way to see how far the command has progressed, or at least how much of its work a particular stage inside the pipeline has completed?
Without knowing exactly what you're doing I can't say whether or not it would work for you, but have a look at pv. It might fit the bill.
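For context, pv is a pass-through filter: placed at the front of a pipeline (for example `pv bigfile | sort | uniq -c`, where bigfile stands in for your input), it copies its input to its output unchanged and reports on stderr how much data has gone by. The sketch below imitates that idea in Python. `copy_with_progress` and its parameters are made-up names for illustration, not part of pv.

```python
import io
import sys


def copy_with_progress(src, dst, total_bytes=None, chunk_size=65536, report=None):
    """Copy src to dst unchanged and report the running byte count on `report`."""
    report = report if report is not None else sys.stderr
    seen = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # end of input
            break
        dst.write(chunk)
        seen += len(chunk)
        if total_bytes:
            # A known total (e.g. the input file's size) allows a percentage.
            report.write(f"\r{seen}/{total_bytes} bytes ({100 * seen // total_bytes}%)")
        else:
            report.write(f"\r{seen} bytes")
        report.flush()
    report.write("\n")
    return seen


# Demo on in-memory data; a real filter would pass sys.stdin.buffer and sys.stdout.buffer.
data = b"line\n" * 1000
copied = copy_with_progress(io.BytesIO(data), io.BytesIO(), total_bytes=len(data), chunk_size=1024)
```

Keep in mind that a counter like this only shows how quickly the first stage consumes its input. sort has to read everything before it emits anything, so the figure says little about the stages after it.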
Perl was originally created because AWK wasn't quite powerful enough for the task at hand. With built-in functions like sort and grep, and a syntax very similar to AWK's, it should not be hard to translate a command line using those programs into a short Perl script.
The advantage of Perl is that you can easily communicate the progress of your script via print statements. For example, you could indicate when the input file was done being loaded, when the sort was completed, etc.
I have a batch processing system that can execute a number of commands sequentially. These commands are specified as a list of words, which are executed by Python's subprocess.call() function, without using a shell. For various reasons I do not want to change the processing system.
I would like to write something to a file, so a subsequent command can use it. Unfortunately, all the ways I can think of to write something to the disk involve some sort of redirection, which is a shell concept.
So is there a way to write a Linux command line that will take its argument and write it to a file, in a context where it is executed outside a shell?
Well, one could write a generalised parser and process manager that could handle this for you, but, luckily, one already comes with Linux. All you have to do is tell it what command to run, and it will handle the redirection for you.
So, if you were to modify your commands a bit, you could easily do this. Just join the words together with spaces, quoting any word that may contain spaces or other special characters, and append the redirection to the resulting string. The whole string then becomes the third element of a list such as:
/bin/sh, -c, "{your new string here} > /some/file"
Et voila, stuff written to disk. :)
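A minimal sketch of that wrapping step, assuming /bin/sh and echo are available. `wrap_with_redirect` is a hypothetical helper name, and shlex.quote does the quoting the answer describes.

```python
import os
import shlex
import subprocess
import tempfile


def wrap_with_redirect(words, out_path):
    """Build an argument list that runs `words` under /bin/sh with stdout sent to out_path."""
    # shlex.quote protects spaces and shell metacharacters in each word.
    command = " ".join(shlex.quote(word) for word in words)
    return ["/bin/sh", "-c", f"{command} > {shlex.quote(out_path)}"]


# Demo: the batch system would receive `args` as its list of words.
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
args = wrap_with_redirect(["echo", "hello world; not a second command"], out_path)
subprocess.call(args, shell=False)
with open(out_path) as handle:
    written = handle.read()
```

Because each word is quoted before joining, the semicolon in the demo text stays part of echo's argument instead of starting a second command.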
Looking at the docs for subprocess.call, I see it has extra parameters:
subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False)
If you set stdout= to a file object you have opened, the command's output goes to that file, which is basically the same behaviour as a shell redirection.
I don't know your exact use case, but this is certainly a way to synthesise the command-line redirection behaviours with little coding change.
Note that the docs also warn against using stdout=PIPE or stderr=PIPE with this function. Nothing reads from the pipe while call() is waiting, so the child stalls as soon as it has written enough to fill the pipe buffer.
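As a rough illustration of the stdout= route, here is a small sketch. It only applies where the calling code can be changed, which the question says is not wanted, and `run_to_file` is a made-up name.

```python
import os
import subprocess
import sys
import tempfile


def run_to_file(args, out_path):
    """Run args without a shell, sending the child's stdout to out_path."""
    with open(out_path, "wb") as out:
        # The child inherits the open file as its stdout; no shell is involved.
        return subprocess.call(args, stdout=out)


# Demo: a child Python process stands in for the real command.
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
status = run_to_file([sys.executable, "-c", "print('hello')"], out_path)
with open(out_path) as handle:
    written = handle.read()
```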
I am trying to use the suckless ii IRC client. I can listen to a channel by running tail -f on the out file. However is it also possible for me to input into the same console by starting an echo or cat command?
If I background the process, it actually displays the output in this console, but that doesn't seem to be the right way? Logically, I think I need to get the fd of the console (but how to do that) and then force the tail output to that fd and probably background it. And then use the present bash to start a cat > in.
Is it actually fine to do this, or am I creating a lot of process overhead for a simple task? In other words, piping a lot of stuff is nice, but doesn't it create a lot of overhead for work that would ideally be done in a single process if you are going to repeat the task a lot?
However is it also possible for me to input into the same console by starting an echo or cat command?
Simply: no. cat writes out the current content of a file; it has no idea that the content will grow later. echo writes out its command-line arguments, after the shell has expanded any variables and command substitutions in them. echo itself is not made for writing the content of files.
If I background the process, it actually displays the output in this console, but that doesn't seem to be the right way?
If you do not redirect the output, the output goes to the console. That is the way it is designed :-)
Logically, I think I need to get the fd of the console (but how to do that) and then force the tail output to that fd and probably background it.
As I understand it, that is the opposite direction. If you want to write to the stdin of a process, you can simply use a pipe for that. The (useless) example below shows that cat writes to the pipe and the next command reads from the pipe. You can extend this to any other pipe read/write scenario. See the links given below.
Example:
cat main.cpp | cat /dev/stdin
cat main.cpp | tail -f
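On some tail implementations the last one will not exit, because it waits for more content to arrive on the pipe, which never happens. (POSIX says -f is ignored when standard input is a pipe, so a conforming tail exits once cat finishes.)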
Is it actually fine to do this, or am I creating a lot of process overhead for a simple task? In other words, piping a lot of stuff is nice, but doesn't it create a lot of overhead for work that would ideally be done in a single process if you are going to repeat the task a lot?
I have no idea how time-critical your job is, but I believe the overhead is quite low. Doing the same thing in a self-written program is not necessarily faster. If everything is done in a single process and no access to the file system is required, it will be much faster. But if you also use system calls, e.g. for file system access, I believe it will not be much faster. You always have to pay for the work you get.
For IO redirection please read:
http://www.tldp.org/LDP/abs/html/io-redirection.html
If your scenario is more complex, you can think of named pipes instead of IO redirection. For that you can have a look at:
http://www.linuxjournal.com/content/using-named-pipes-fifos-bash
I created some Slurm scripts and then tried to execute them with sbatch. But the output file is not updated frequently (maybe once a minute).
Is there a way to change the output buffering latency in sbatch? I know stdbuf is used in such situations but I could not make it work with sbatch.
The issue is certainly with buffering. If you are running Python code, pass flush=True to each print call, as in print(..., flush=True).
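When stdout is a file rather than a terminal, as with an sbatch output file, Python buffers prints in blocks, which would explain the delayed updates. A small sketch of the flush=True approach follows; `report_steps` and the step names are invented for the example.

```python
import io
import sys
import time


def report_steps(steps, stream=None, delay=0.0):
    """Print one progress line per step and push it out immediately."""
    stream = stream if stream is not None else sys.stdout
    for number, step in enumerate(steps, start=1):
        # flush=True forces the line out of Python's buffer right away.
        print(f"[{number}/{len(steps)}] {step}", file=stream, flush=True)
        time.sleep(delay)  # stand-in for the real work of this step


report_steps(["load", "train", "save"])
```

If editing every print call is impractical, running the script with python -u or setting PYTHONUNBUFFERED=1 in the job script should have a similar effect.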
What is the most straightforward way to create a "virtual" file in Linux that would allow read operations on it, always returning the output of some particular command (run every time the file is read)? So, every read operation would cause the command to be executed, with its output captured and passed back as the "content" of the file.
There is no way to create such a so-called "virtual file" directly. On the other hand, you can achieve this behaviour by implementing a simple synthetic filesystem in userspace via FUSE. Moreover, you don't have to use C; there are bindings even for scripting languages such as Python.
Edit: And chances are that something like this already exists: see for example scriptfs.
This is a great answer I copied below.
Basically, named pipes let you do this in scripting, and FUSE lets you do it easily in Python.
You may be looking for a named pipe.
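mkfifo f
# Write in the background: opening a FIFO for writing blocks until a reader opens it.
{
  echo 'V cebqhpr bhgchg.'
  sleep 2
  echo 'Urer vf zber bhgchg.'
} >f &
# rot13 is assumed to be installed; tr 'A-Za-z' 'N-ZA-Mn-za-m' does the same job.
rot13 < f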
Writing to the pipe doesn't start the listening program. If you want to process input in a loop, you need to keep a listening program running.
while true; do rot13 <f >decoded-output-$(date +%s.%N); done
Note that all data written to the pipe is merged, even if there are multiple processes writing. If multiple processes are reading, only one gets the data. So a pipe may not be suitable for concurrent situations.
A named socket can handle concurrent connections, but this is beyond the capabilities of basic shell scripts.
At the most complex end of the scale are custom filesystems, which let you design and mount a filesystem where each open, write, etc., triggers a function in a program. The minimum investment is tens of lines of nontrivial coding, for example in Python. If you only want to execute commands when reading files, you can use scriptfs or fuseflt.
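The named-pipe variant can also be sketched in Python, with the writer running the command only once a reader has opened the pipe. This is an illustration under assumptions, not a full solution: `serve_once` and the file name virtual-file are invented, and a child Python process stands in for the real command.

```python
import os
import subprocess
import sys
import tempfile
import threading


def serve_once(fifo_path, command):
    """Wait for one reader to open the FIFO, then run command and send it the output."""
    # Opening a FIFO for writing blocks until some process opens it for reading.
    with open(fifo_path, "wb") as fifo:
        fifo.write(subprocess.check_output(command))


fifo_path = os.path.join(tempfile.mkdtemp(), "virtual-file")
os.mkfifo(fifo_path)
command = [sys.executable, "-c", "print('generated on read')"]

# The writer waits in a background thread, like a backgrounded shell job.
writer = threading.Thread(target=serve_once, args=(fifo_path, command))
writer.start()
with open(fifo_path, "rb") as reader:  # what `cat virtual-file` would do
    content = reader.read()
writer.join()
```

Serving repeated reads means calling `serve_once` in a loop, much like the shell while loop, and the caveats about concurrent readers and writers still apply.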
No one has mentioned this, but if you can choose the path to the file, you can use the standard input, /dev/stdin.
Every time the cat program runs, it reads the output of whatever program is writing to the pipe, which here is simply echo my input:
for i in 1 2 3; do
    echo my input | cat /dev/stdin
done
outputs:
my input
my input
my input
I'm afraid this is not easily possible. When a process reads from a file, it uses system calls like open, fstat, read. You would need to intercept these calls and output something different from what they would return. This would require writing some sort of kernel module, and even then it may turn out to be impossible.
However, if you simply need to trigger something whenever a certain file is accessed, you could play with inotifywait:
#!/bin/bash
while inotifywait -qq -e access /path/to/file; do
    date +%s >> /tmp/access.txt
done
Run this as a background process, and you will get an entry in /tmp/access.txt each time your file is being read.
I'm not sure if what I am trying to do is possible, and I'm fairly new to Perl, so I'd appreciate any help.
My perl application will use system() to issue commands to Perforce that will create a devel/workspace, integrate, sync, etc. But obviously I can't integrate until my devel is created, and I can't sync unless some condition is met, so on and so forth. Also when my code is synced and I run it, I'm not sure how to tell if it finished or not either.
So I'm wondering how to say (slack pseudo code):
system(create my devel);
wait until devel created
system(integrate blah);
wait until integration complete
system (launch test);
wait until test complete;
etc...
I looked at other questions and saw the possibility of using forks, but I am not familiar with how to code that in this context.
Thanks
Normally, the system function in Perl waits until the command you asked it to run has completed. This works exactly as if you had entered the command at a shell prompt: the program runs, and the prompt reappears only when the command has finished whatever it is doing. system also returns the command's wait status (0 on success), so you can check whether each step succeeded before starting the next one.
Perforce has a free Perl module downloadable from http://www.perforce.com/downloads/Perforce/20-User?qt-perforce_downloads_step_3=6#qt-perforce_downloads_step_3#52, with documentation at http://www.perforce.com/perforce/r12.1/manuals/p4script/02_perl.html#1047731.
But it sounds like you need more experience with Perl multiprogramming and IPC. Have you read the Camel book?