Linux bash: grep from stream and write to file [duplicate]

This question already has answers here:
How to 'grep' a continuous stream?
(13 answers)
Closed 6 years ago.
I have a log file A that is constantly updated (but also rolled over), and I need to continuously filter its content and write the matches to a persistent file.
TL;DR
I need to:
tail -f A.log | grep "keyword" >> B.log
But this command does not write anything to B.log.
My research only turned up complex solutions that don't apply to my case. My guess is that I'm missing some simple concept.
This is not the same as the question marked as a possible duplicate: the grep works and I see its output as long as I don't try to write it to a file. The problem is the file.

If plain grep, without writing to the file, works, you have run into a buffering "problem". Unless a program implements it manually, I/O buffering is handled by the libc. If the program's stdout is a terminal, output is line-buffered. If not, the libc buffers output until the buffer reaches a size limit.
On Linux, meaning with glibc, you can use the stdbuf command to configure that buffering:
tail -f A.log | stdbuf -oL grep "keyword" >> B.log
-oL specifies that the output stream should be line-buffered.
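If stdbuf isn't available, GNU grep's own --line-buffered option achieves the same effect for this particular pipeline (shown as an alternative sketch; it applies if your grep is GNU grep):
tail -f A.log | grep --line-buffered "keyword" >> B.log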


Does any magic "stdout" file exist? [duplicate]

This question already has answers here:
pass stdout as file name for command line util?
(6 answers)
Closed 5 years ago.
Some utilities cannot output to stdout.
Example
util out.txt
It works. But sometimes I want to pipe the output to some other program like:
util out.txt | grep test
Is there a magic "stdout" file in Linux, so that when I replace out.txt above with it, the data will be redirected to the stdout pipe?
Note: I know util out.txt && cat out.txt | grep test, so please do not post answers like this.
You could use /dev/stdout. But that won't always work if a program needs to lseek(2) (or mmap(2)) it.
Usually /dev/stdout is a symlink to /proc/self/fd/1 (see proc(5)).
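Applied to the example above, and assuming util simply opens and writes the file name it is given, that would be:
util /dev/stdout | grep test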
IIRC, some programs (GNU awk, probably) handle the /dev/stdout filename specially (e.g. to be able to work without /proc/ being mounted).
A common, but not universal, convention for program arguments is to treat -, when used as a file name, as standing for stdout (or stdin). For example, see tar(1) used with -f -.
If you write some utility, I recommend following that - convention when possible and documenting whether stdout needs to be seekable.
Some programs test whether stdout or stdin is a terminal (e.g. using isatty(3)) and behave differently if so, e.g. by using ncurses. If you write such a program, I recommend providing an option to disable that detection.
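For example, a shell script can make the same kind of check with the test operator -t, which reports whether a file descriptor refers to a terminal (a minimal sketch, independent of any particular utility):
# -t 1 is true only when file descriptor 1 (stdout) is a terminal.
if [ -t 1 ]; then
    echo "stdout is a terminal: interactive output is fine"
else
    echo "stdout is a file or a pipe: emit plain output"
fi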

How frequently does the unix tee command write stdout/terminal output to a file if the output is big?

I am piping a tool's stdout to the tee command so that current progress can be seen on the terminal as well as in the log file.
Here is the snippet where I run the tool and feed its stdout to tee (this snippet is invoked from a tcl script):
$(EH_SUBMIT) $(ICC_EXEC) $(OPTIONS) -f ./scripts/$#.tcl | tee -i ./logs/$#.log
I can see real-time progress on the terminal, but not in the log file: stdout is written to the log file only chunk by chunk.
How does tee work? Does it write by blocks or time or both?
If by block, what is the minimum block size? If by time, what is the minimum duration?
I need to parse log entries in real time for some data analytics (I read the log file via tail -f and push new data as the log file grows).
Unless a program handles buffering on its own, buffering of I/O streams is handled in the libc. The standard behaviour is: buffer output line-wise if it goes to a terminal, buffer output block-wise if it goes to a non-terminal, i.e. a file or a pipe. That's why the output appears in the log file as you described it: chunk by chunk. This behaviour is a performance optimization.
On Linux the stdbuf command can be used to run a program with adjusted buffers. You need to run your program like this:
stdbuf -oL your_program >> your.log &
tail -f your.log
-oL means buffer stdout linewise.
From the POSIX spec for tee, note especially the last sentence:
The tee utility shall copy standard input to standard output, making a copy in zero or more files. The tee utility shall not buffer output.
So tee isn't your problem. Almost certainly your program is buffering what it writes to stdout (which is the default in many language runtimes, including C's, when stdout is not a TTY).
stdbuf -oL yourProgram | tee file
...will, if your program relies on the standard C library to determine its default buffering, override that default with line buffering.
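Applied to the snippet from the question, and assuming the tool relies on the standard C library's default buffering (and that $(EH_SUBMIT) passes the rest of the line through as a command to run), that would look roughly like:
$(EH_SUBMIT) stdbuf -oL $(ICC_EXEC) $(OPTIONS) -f ./scripts/$#.tcl | tee -i ./logs/$#.log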

Read from an endless pipe in bash [duplicate]

This question already has answers here:
How do you pipe input through grep to another utility?
(3 answers)
Closed 7 years ago.
I want to create a script that runs another script for each line the first one receives through a pipe.
Like this:
journalctl -f | myScript1.sh
myScript1.sh will then run another script like this:
./myScript2.sh $line_in_pipe
The problem I found is that every piece of code I tested only works with a finite pipe (i.e. until EOF).
But when I pipe in programs like tail -f, it just won't execute. I think it waits for EOF before running the loop.
EDIT:
the endless pipe is like this:
tail -f /var/log/apache2/access.log | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | script_ip_check.sh
so the idea on script_ip_check.sh is doing something like this:
#!/bin/bash
for line in $(cat); do
echo "process:$line"
nmap -sV -p1234 --open -T4 $line | grep 'open' -B3 | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' >> list_of_ip_mapped &
done
For each line, in this case an IP, I spawn an nmap process to scan something specific on that host.
I will use it to scan IPs that try to connect to some "hidden" port on my server.
So my script must run all the time, until I cancel it or it receives an EOF.
EDIT2:
I just found out that grep buffers its output when it writes to a pipe, so that's why it wasn't working.
I use --line-buffered to force grep to output each line as it is processed.
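With that flag, the pipeline from the first edit becomes:
tail -f /var/log/apache2/access.log | grep --line-buffered -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | script_ip_check.sh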
We can't say definitively without knowing what's in your script.
For instance, if you're doing this:
# DON'T DO THIS: Violates http://mywiki.wooledge.org/DontReadLinesWithFor
for line in $(cat); do
    : ...do something with "$line"...
done
...that'll wait until all stdin is available, resulting in the hang you describe.
However, if you're following best practices (per BashFAQ #1), your code will operate more like this:
while IFS= read -r line; do
    : ...do something with "$line"
done
...and that'll actually behave properly, subject to any buffering performed by the writer. For hints on controlling buffering, see BashFAQ #9.
Finally, quoting from DontReadLinesWithFor:
The final issue with reading lines with for is inefficiency. A while read loop reads one line at a time from an input stream; $(<afile) slurps the entire file into memory all at once. For small files, this is not a problem, but if you're reading large files, the memory requirement will be enormous. (Bash will have to allocate one string to hold the file, and another set of strings to hold the word-split results... essentially, the memory allocated will be twice the size of the input file.)
Obviously, if the input is endless, the memory requirements and completion time are likewise unbounded.
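Putting that together for script_ip_check.sh from the question, a sketch using a while/read loop (untested; it assumes the grep feeding it runs with --line-buffered, as noted in EDIT2) could look like this:
#!/bin/bash
# Process each incoming line (an IP) as soon as it arrives, instead of waiting for EOF.
while IFS= read -r ip; do
    echo "process:$ip"
    # Quote the variable so unexpected input can't be word-split or glob-expanded.
    nmap -sV -p1234 --open -T4 "$ip" | grep 'open' -B3 | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' >> list_of_ip_mapped &
done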

How to make nohup.out update with a perl script?

I have a perl script that copies a large number of files. It prints some text to standard out and also writes a logfile. However, when running under nohup, both of these show a blank file:
tail -f nohup.out
tail -f logfile.log
The files don't update until the script is done running. Moreover, for some reason tailing the .log file does work if I don't use nohup!
I found a similar question for Python (How come I can't tail my log?).
Is there a similar way to flush the output in perl?
I would use tmux or screen, but they don't exist on this server.
Check perldoc:
HANDLE->autoflush( EXPR );
To disable buffering on standard output, that would be:
STDOUT->autoflush(1);

Getting the tee command to output promptly, even for one command

I am new to using the tee command.
I am trying to run one of my program which takes long time to finish but it prints out information as it progresses. I am using 'tee' to save the output to a file as well as to see the output in the shell (bash).
But the problem is that tee doesn't forward the output to the terminal until my command has finished.
Is there any way to do that?
I am using Debian and bash.
This actually depends on the amount of output and on the implementation of whatever command you are running. No program is obliged to print straight to stdout or stderr and flush all the time. Most C runtime implementations flush after a certain amount of data has been written through one of the runtime routines, such as printf, but this may vary between implementations.
If tee doesn't output it right away, it is likely only receiving the input at the very end of your command's run. It might be helpful to mention which exact command it is.
The problem you are experiencing is most probably related to buffering.
You may have a look at stdbuf command, which does the following:
stdbuf - Run COMMAND, with modified buffering operations for its standard streams.
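For example, where your_long_running_program stands in for whatever command you are running (and assuming it relies on the C library's default buffering):
stdbuf -oL your_long_running_program | tee output.log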
If you were to post your usage I could give a better answer, but as it is
(for i in `seq 10`; do echo $i; sleep 1s; done) | tee ./tmp
is proper usage of the tee command and seems to work. Replace the part before the pipe with your command and you should be good to go.
