I have some code running. Because of its complexity and length, I thought I might use some code to make my life easier. The code is run using >commandA
output
results
are
popping
...
here
I want to count the number of times banana appears in the output of commandA (which is running), and when the count reaches 10, I want to stop the process (using CTRL+Z) and
echo "************we reached 10**********************"
and start again.
I am writing the code in Perl on a Unix system.
EDIT: I cannot use grep here, as the command has already run (or will be run, but without grep). Before the command runs, I will start my program to look for the specific words in the terminal output. It would be very easy to use grep, but I don't know which function in Perl actually takes the output written to the terminal as stdin.
You can start the other program by opening a pipe from it to your Perl program, and then read its output line by line until you reach the terminating condition:
open my $pipe, 'commandA |'
    or die "Error opening pipe from commandA: $!\n";

my $n = 0;
while (<$pipe>) {
    $n++ if /banana/;
    last if $n >= 10;
}
close $pipe;  # kills the command with SIGPIPE if it's not done yet

print "commandA printed 'banana' ", ($n >= 10 ? "at least 10" : $n), " times.\n";
There are a couple of pitfalls to note here, though. One is that closing the pipe will only kill the other program when it next tries to print something. If the other program might run for a long time without generating any output, you may want to kill it explicitly.
For this, you will need to know its process ID, but, conveniently, that's exactly what open returns when you open a pipe. However, you may want to use the multi-arg version of open, so that the PID returned will be that of the actual commandA process, rather than of a shell used to launch it:
my $pid = open my $pipe, '-|', 'commandA', @args   # @args holds commandA's arguments, if any
    or die "Error opening pipe from commandA: $!\n";
# ...
kill 'INT', $pid;   # make sure the process dies
close $pipe;
Another pitfall is output buffering. Most programs don't actually send their output directly to the output stream, but will buffer it until enough has accumulated or until the buffer is explicitly flushed. The reason you don't usually notice this is that, by default, many programs (including Perl) will flush their output buffer at the end of every output line (i.e. whenever a \n is printed) if they detect that the output stream goes to an interactive terminal (i.e. a tty).
However, when you pipe the output of a program to another program, the I/O libraries used by the first program may notice that the output goes to a pipe rather than to a tty, and may enable more aggressive output buffering. Often this won't be a problem, but in some problematic cases it could add a substantial delay between the time when the other program prints a string and the time when your program receives it.
Unfortunately, if you can't modify the other program, there's not much you can easily do about this. It is possible to replace the pipe with something called a "pseudo-tty", which looks like an interactive terminal to the other command, but that gets a bit complicated. There's a CPAN module to simplify it a bit, though, called IO::Pty.
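As a rough sketch of what that looks like with IO::Pty (based on its documented interface, untested here; commandA is just the placeholder command from the question):

use strict;
use warnings;
use IO::Pty;    # the CPAN module mentioned above

my $pty   = IO::Pty->new;
my $slave = $pty->slave;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {                              # child: run commandA with the pty as its output
    $pty->make_slave_controlling_terminal;
    close $pty;
    open STDOUT, '>&', $slave or die "can't dup stdout: $!";
    open STDERR, '>&', $slave or die "can't dup stderr: $!";
    close $slave;
    exec 'commandA' or die "can't exec commandA: $!";
}

close $slave;                                 # parent reads from the master side
my $n = 0;
while (<$pty>) {                              # commandA now believes it is writing to a tty
    $n++ if /banana/;
    last if $n >= 10;
}
kill 'INT', $pid;                             # stop commandA once we've seen 10 bananas
waitpid $pid, 0;

The upside is that commandA will keep line-buffering its output as if it were talking to a terminal, so the banana count stays up to date.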
(If you can modify the other program, it's a lot easier. For example, if it's another Perl script, you can just add $| = 1; at the beginning of the script to enable output autoflushing.)
Related
I am trying to use the suckless ii IRC client. I can listen to a channel by running tail -f on the out file. However, is it also possible for me to provide input on the same console by starting an echo or cat command?
If I background the process, it actually displays the output in this console, but that doesn't seem to be the right way? Logically, I think I need to get the fd of the console (but how do I do that?) and then force the tail output to that fd, probably backgrounded, and then use the present bash to start a cat > in.
Is it actually fine to do this, or am I creating a lot of process overhead for a simple task? In other words, piping a lot of stuff is nice, but doesn't it create a lot of overhead that would ideally live in a single process if you are going to repeat the task a lot?
However, is it also possible for me to provide input on the same console by starting an echo or cat command?
Simply NO! cat writes the current content; cat has no idea that the content will grow later. echo writes its arguments and the results of command substitutions on the given command line; echo itself is not made for writing the content of files.
If I background the process, it actually displays the output in this console, but that doesn't seem to be the right way?
If you do not redirect the output, the output goes to the console. That is the way it is designed :-)
Logically, I think I need to get the fd of the console (but how do I do that?) and then force the tail output to that fd, probably backgrounded.
As I understand it, that is the opposite direction. If you want to write to the stdin of a process, you can simply use a pipe for that. The (admittedly useless) examples below show that cat writes to the pipe and the next command reads from the pipe. You can extend this to any other pipe read/write scenario. See the link given below.
Example:
cat main.cpp | cat /dev/stdin
cat main.cpp | tail -f
The last one will not exit, because it waits for the pipe to receive more content, which never happens.
Is it actually fine to do this, or am I creating a lot of process overhead for a simple task? In other words, piping a lot of stuff is nice, but doesn't it create a lot of overhead that would ideally live in a single process if you are going to repeat the task a lot?
I have no idea how time-critical your job is, but I believe the overhead is quite low. Doing the same thing in a self-written program is not necessarily faster. If everything is done in a single process and no file system access is required, it will be much faster; but if you also make system calls, e.g. for file system access, it will not be much faster, I believe. You always have to pay for the work you get.
For IO redirection please read:
http://www.tldp.org/LDP/abs/html/io-redirection.html
If your scenario is more complex, you can think of named pipes instead of IO redirection. For that you can have a look at:
http://www.linuxjournal.com/content/using-named-pipes-fifos-bash
What is the most straightforward way to create a "virtual" file in Linux that would allow read operations on it, always returning the output of some particular command (run every time the file is read from)? So, every read operation would cause an execution of a command, catching its output and passing it on as the "content" of the file.
There is no way to create such a so-called "virtual file". On the other hand, you can achieve this behaviour by implementing a simple synthetic filesystem in userspace via FUSE. Moreover, you don't have to use C; there are bindings even for scripting languages such as Python.
Edit: And chances are that something like this already exists: see for example scriptfs.
This is a great answer, which I copied below. Basically, named pipes let you do this in scripting, and FUSE lets you do it easily in Python.
You may be looking for a named pipe.
mkfifo f
{
echo 'V cebqhpr bhgchg.'
sleep 2
echo 'Urer vf zber bhgchg.'
} >f
rot13 < f
Writing to the pipe doesn't start the listening program. If you want to process input in a loop, you need to keep a listening program running.
while true; do rot13 <f >decoded-output-$(date +%s.%N); done
Note that all data written to the pipe is merged, even if there are multiple processes writing. If multiple processes are reading, only one gets the data. So a pipe may not be suitable for concurrent situations.
A named socket can handle concurrent connections, but this is beyond the capabilities of basic shell scripts.
At the most complex end of the scale are custom filesystems, which let you design and mount a filesystem where each open, write, etc. triggers a function in a program. The minimum investment is tens of lines of nontrivial code, for example in Python. If you only want to execute commands when reading files, you can use scriptfs or fuseflt.
No one has mentioned this, but if you can choose the path to the file, you can use the standard input /dev/stdin.
Every time the cat program runs, it ends up reading the output of the program writing to the pipe, which here is simply echo my input:
for i in 1 2 3; do
echo my input | cat /dev/stdin
done
outputs:
my input
my input
my input
I'm afraid this is not easily possible. When a process reads from a file, it uses system calls like open, fstat, read. You would need to intercept these calls and output something different from what they would return. This would require writing some sort of kernel module, and even then it may turn out to be impossible.
However, if you simply need to trigger something whenever a certain file is accessed, you could play with inotifywait:
#!/bin/bash
while inotifywait -qq -e access /path/to/file; do
echo "$(date +%s)" >> /tmp/access.txt
done
Run this as a background process, and you will get an entry in /tmp/access.txt each time your file is being read.
I have come across a strange scenario: when I try to redirect the stdout logs of a Perl script into a log file, all the logs get written at the end of execution, when the script completes, instead of during the execution of the script.
While the script is running and I do tail -f "filename", I can only see the log once the script has completed its execution, not while it is running.
My script details are given below:
/root/Application/download_mornings.pl >> "/var/log/file_manage/file_manage-$(date +\%Y-\%m-\%d).txt"
But when I run it without redirecting to a log file, I can see the logs on the command prompt as the script progresses.
Let me know if you need any other details.
Thanks in advance for any light that you all might be able to shed on what's going on.
Santosh
Perl buffers the output by default. You can say:
$| = 1;
(at the beginning of the script) to disable buffering. Quoting perldoc perlvar:
$|
If set to nonzero, forces a flush right away and after every write or
print on the currently selected output channel. Default is 0
(regardless of whether the channel is really buffered by the system or
not; $| tells you only whether you've asked Perl explicitly to flush
after each write). STDOUT will typically be line buffered if output is
to the terminal and block buffered otherwise. Setting this variable is
useful primarily when you are outputting to a pipe or socket, such as
when you are running a Perl program under rsh and want to see the
output as it's happening. This has no effect on input buffering. See
getc for that. See select on how to select the output channel. See
also IO::Handle.
You might also want to refer to Suffering from Buffering?.
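As a side note, if you'd rather not set the global $| (or you need to flush a handle other than the currently selected one), the object-oriented form from IO::Handle does the same job; the log path below is just illustrative:

use IO::Handle;                # loading it explicitly doesn't hurt on older Perls

STDOUT->autoflush(1);          # equivalent to $| = 1 for STDOUT

# For an arbitrary handle, e.g. a log file of your own:
open my $log, '>>', '/var/log/file_manage/file_manage.txt'
    or die "can't open log: $!";
$log->autoflush(1);            # every print to $log is flushed immediately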
Whenever I need to limit shell command output, I use less to paginate the results:
cat file_with_long_content | less
which works fine and dandy. But what I'm curious about is that less still works even if the output is never-ending. Consider having the following script in a file inf.sh:
while true; do date; done
then I run
sh inf.sh | less
And it's still able to paginate the results. So is it correct to say that the pipe streams the result rather than waiting for the command to finish before outputting it?
Yes, when you run sh inf.sh | less the two commands are run in parallel. Data written into the pipe by the first process is buffered (by the kernel) until it is read by the second. If the buffer is full (i.e., if the first command writes to the pipe faster than the second can read) then the next write operation will block until further space is available. A similar condition occurs when reading from an empty pipe: if the pipe buffer is empty but the input end is still open, a read will block for more data.
See the pipe(7) manual for details.
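If you want to see this from Perl (to stay with the language used earlier in this thread), a small sketch like the one below prints each line as it arrives, roughly once a second, rather than all at once when the producer finishes; the producer is just a slowed-down variant of the inf.sh loop:

use strict;
use warnings;

# Producer: an illustrative shell loop that prints the date once a second.
open my $pipe, '-|', 'sh', '-c', 'while true; do date; sleep 1; done'
    or die "can't start producer: $!";

while (my $line = <$pipe>) {
    print "received at ", time(), ": ", $line;   # each line shows up as it is written
    last if $. >= 5;                             # stop after five lines
}
close $pipe;   # the producer gets SIGPIPE on its next write and exits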
It is correct. Pipes are streams.
You can code your own version of the less tool in very few lines of C code. Take the time to do it, including a bit of research on files and pipes, and you'll emerge with the understanding to answer your own question and more :).
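As a rough illustration (written in Perl rather than C, to keep one language throughout this thread), a toy pager could look like the sketch below. It reads stdin as a stream and pauses every screenful, so it copes with never-ending input such as sh inf.sh | ./tinyless.pl (the script name and page size are arbitrary):

#!/usr/bin/perl
use strict;
use warnings;

my $page = 20;                                   # lines per screenful (arbitrary)
open my $tty, '<', '/dev/tty' or die "can't open /dev/tty: $!";

my $count = 0;
while (my $line = <STDIN>) {                     # blocks until the writer produces data
    print $line;
    if (++$count % $page == 0) {
        print STDERR "--More-- (press Enter) ";
        <$tty>;                                  # wait for the user, not for stdin
    }
}

Reading keystrokes from /dev/tty rather than stdin is the key trick: stdin is busy carrying the piped data.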
When I do this:
find . -name "pattern" | grep "another-pattern"
Are the processes, find and grep, spawned together? My guess is yes. If so, then how does this work?:
yes | command_that_prompts_for_confirmations
If yes is continuously sending 'y' to stdout and command_that_prompts_for_confirmations reads 'y' whenever it's reading its stdin, how does yes know when to terminate? Because if I run yes alone without piping its output to some other command, it never ends.
But if piping commands together doesn't spawn all the processes simultaneously, then how does yes know how many 'y's to output? It's a catch-22 for me here. Can someone explain to me how this piping works in *nix?
From the wikipedia page: "By itself, the yes command outputs 'y' or whatever is specified as an argument, followed by a newline, until stopped by the user or otherwise killed; when piped into a command, it will continue until the pipe breaks (i.e., the program completes its execution)."
yes does not "know" when to terminate. However, at some point outputting "y" to stdout will cause an error because the other process has finished; that causes a broken pipe, and yes terminates.
The sequence is:
other program terminates
operating system closes pipe
yes tries to output character
error happens (broken pipe)
yes terminates
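You can watch this sequence with a minimal Perl stand-in for yes (a sketch, not the real implementation); save it as, say, my_yes.pl and run ./my_yes.pl | head -n 3. It terminates as soon as head exits and the pipe breaks:

#!/usr/bin/perl
use strict;
use warnings;

# Report the broken pipe instead of dying silently; the default SIGPIPE
# disposition would simply kill the process, which is what the real yes does.
$SIG{PIPE} = sub { warn "got SIGPIPE: the reader is gone\n"; exit 0 };

$| = 1;                    # flush each line so the reader sees it promptly
print "y\n" while 1;       # runs forever until a write hits the broken pipe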
Yes, (generally speaking) all of the processes in a pipeline are spawned together. With regard to yes and similar situations, a signal is passed back up the pipeline to indicate that it is no longer accepting input. Specifically: SIGPIPE, details here and here. Lots more fun infomation on *nix pipelining is available on wikipedia.
You can see the SIGPIPE in action if you interrupt a command that doesn't expect it, as you get a Broken Pipe error. I can't seem to find an example that does it off the top of my head on my Ubuntu setup though.
Other answers have covered termination. The other facet is that yes will only output a limited number of y's - there is a buffer in the pipe, and once that is full yes will block in its write request. Thus yes doesn't consume infinite CPU time.
The stdout of the first process is connected to the stdin of the second process, and so on down the line. "yes" exits when the second process is done because it no longer has a stdout to write to.