Bash function can indent piped lines multiple times without waiting for whole input - linux

What I want to achieve:
Define a function that input can be piped into, e.g. echo input | my_function.
This function modifies every input line, e.g. it adds indentation at the beginning.
The function can be chained, and in that case applies its modification once per invocation, e.g. echo input | my_function | my_function results in \t\tinput (a double indent).
The function does not wait for the whole input to be supplied; it prints each line as soon as it is read, without seeing the rest of the input.
Here is my test script:
#!/usr/bin/env bash

main() {
    echo 'first:'
    echo 'once' | tab_indent_to_right
    echo 'twice' | tab_indent_to_right | tab_indent_to_right
    {
        echo 'wait 2 sec'
        sleep 2
        echo 'wait 2 sec'
        sleep 2
        echo 'waited'
    } | tab_indent_to_right
}

tab_indent_to_right() {
    # while read -r line; do echo $'\t'"$line"; done   # 🔴 double indent not working
    # awk -v prefix='\t' '{print prefix $0}'            # 🔴 buffering, lines not flushed
    # sed 's/^/\t/'                                      # 🔴 buffering, lines not flushed
    # xargs -I {} echo $'\t{}'                           # 🔴 double indent not working
    # xargs -L1 echo $'\t'                               # 🔴 double indent not working
    : # placeholder so the function body is not empty; uncomment one attempt above to test
}
main
Each commented line in tab_indent_to_right is a failed attempt to solve the problem. They fail in one of two ways:
Double indenting does not work, e.g. tab_indent_to_right | tab_indent_to_right indents only once.
Or, the lines are not flushed/printed immediately but are buffered instead. In other words, the function waits through all the sleeps and prints everything at once, instead of printing the lines as they come.
How can I write this function so that chaining two calls gives me the double modification I want, and the script does not wait for the piped commands to finish before printing?

I suggest setting IFS to the empty string so that leading spaces/tabs are preserved:
tab_indent_to_right() {
    while IFS= read -r line; do echo $'\t'"$line"; done
}
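With IFS= in place, chaining the function yields two tabs; a quick check (cat -A displays a tab as ^I and end of line as $):
$ echo 'twice' | tab_indent_to_right | tab_indent_to_right | cat -A
^I^Itwice$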

In recentish versions of awk, you can use the fflush() function to force it to send output lines immediately:
awk -v prefix='\t' '{print prefix $0; fflush()}'
Also, depending on the version of sed you have, you may be able to add either the -u flag (for "unbuffered"; GNU sed supports this) or the -l flag ("line buffered"; bsd sed supports this). If you have some other version of sed... check its man page to see if it has a similar option.
BTW, the bsd version of sed doesn't support \t for tab; but if you use $'s/^/\t/', bash will convert \t to a tab before passing it to sed. This works with both versions of sed, but not with shells that don't support ANSI quoting (e.g. dash).
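Putting those pieces together, one sketch of a function that satisfies both requirements (this assumes GNU sed, for the -u flag; with bsd sed use -l instead):
tab_indent_to_right() {
    # -u: unbuffered, so each line is emitted as soon as it arrives;
    # $'...' lets bash turn \t into a real tab, and sed does not strip
    # leading whitespace, so chaining the function adds one tab per call
    sed -u $'s/^/\t/'
}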

quickest way to select/copy lines containing string from huge txt.gz file

So I have the following sed one liner:
sed -e '/^S|/d' -e '/^T|/d' -e '/^#D=/d' -e '/^##/d' -e 's/H|/,H|/g' -e 's/Q|/,,Q|/g' -e '1 i\,,,' sample_1.txt > sample_2.txt
I have many lines that start with either:
S|
T|
#D=
##
H|
Q|
The idea is to not copy the lines starting with one of the first four, and
to replace H| (at the beginning of lines) by ,H| and Q| (at the beginning of lines) by ,,Q|
But now I would need to:
use the fastest way possible (internet suggests (m)awk is faster than sed)
read from a .txt.gz file and save the result in a .txt.gz file, avoiding, if possible, the intermediate un-zip/re-zip
there are in fact several hundred .txt.gz files, each about ~1 GB, to process in this way (all in the same folder). Is there a CLI way to run the code in parallel on all of them (so that each core gets assigned a subset of the files in the directory)?
I use Linux (Ubuntu).
Untested, but likely pretty close to this with GNU Parallel.
First make output directory so as not to overwrite any valuable data:
mkdir -p output
Now declare a function that does one file and export it to subprocesses so jobs started by GNU Parallel can find it:
doit() {
    echo "Processing $1"
    gzcat "$1" | awk '
        /^[ST]\|/ || /^#D=/ || /^##/ {next}   # ignore lines starting S|, T|, #D=, ##
        /^H\|/  {printf ","}                  # prefix "H|" lines with ","
        /^Q\|/  {printf ",,"}                 # prefix "Q|" lines with ",,"
        1                                     # print every remaining line
    ' | gzip > output/"$1"
}
export -f doit
Now process all txt.gz files in parallel and show progress bar too:
parallel --bar doit ::: *txt.gz
Was something like this what you had in mind?
#!/bin/bash
export LC_ALL=C

zcat sample_1.txt.gz | gawk '
$1 !~ /^([ST]\||#D=|##)/ {
    switch ($0) {
    case /^H\|/:
        print "," $0
        break
    case /^Q\|/:
        print ",," $0
        break
    default:
        print $0
    }
}' | gzip > sample_2.txt.gz
The export LC_ALL=C tells your environment you aren't expecting extended characters, and can profoundly speed up execution. zcat expands and dumps a gz file to stdout. That is piped into gawk, which checks that the first part of each line does not match the first four character groupings you have in your question. For lines that pass that test, output to stdout (massaged as requested). As gawk executes, its stdout gets piped into gzip and written to a .txt.gz file.
It might be possible to use xargs with the -P and -n switches to parallelize your processing, but I think GNU parallel might be easier to work with.
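A rough, untested sketch of that xargs variant, assuming the filtering pipeline from above is wrapped in a function (the function name and the output directory are placeholders):
mkdir -p output
process_one() {
    zcat "$1" |
        gawk '/^([ST]\||#D=|##)/ {next}   # drop S|, T|, #D=, ## lines
              /^H\|/ {printf ","}         # prefix H| lines with ","
              /^Q\|/ {printf ",,"}        # prefix Q| lines with ",,"
              1' |
        gzip > output/"$1"
}
export -f process_one
# -n 1: one file per job; -P: number of concurrent jobs (nproc = CPU count)
printf '%s\0' *.txt.gz | xargs -0 -n 1 -P "$(nproc)" bash -c 'process_one "$1"' _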

Echo output of a piped command

I am trying to just echo a command within my bash script code.
OVERRUN_ERRORS="$ifconfig | egrep -i "RX errors" | awk '{print $7}'"
echo ${OVERRUN_ERRORS}
however it gives me an error and the $7 does not show up in the command. I have to store it in a variable, because I will process the output (OVERRUN_ERRORS) at a later point in time. What's the right syntax for doing this? Thanks.
On Bash Syntax
foo="bar | baz"
...is assigning the string "bar | baz" to the variable named foo; it doesn't run bar | baz as a pipeline. To do that, you want to use command substitution, in either its modern $() syntax or antiquated backtick-based form:
foo="$(bar | baz)"
On Storing Code For Later Execution
Since your intent isn't clear in the question --
The correct way to store code is with a function, whereas the correct way to store output is in a string:
# store code in a function; this also works with pipelines
get_rx_errors() { cat /sys/class/net/"$1"/statistics/rx_errors; }
# store result of calling that function in a string
eth0_errors="$(get_rx_errors eth0)"
sleep 1 # wait a second for demonstration purposes, then...
# compare: echoing the stored value, vs calculating a new value
echo "One second ago, the number of rx errors was ${eth0_errors}"
etho "Right now, it is $(get_rx_errors eth0)"
See BashFAQ #50 for an extended discussion of the pitfalls of storing code in a string, and alternatives to same. Also relevant is BashFAQ #48, which describes in detail the security risks associated with eval, which is often suggested as a workaround.
On Collecting Interface Error Counts
Don't use ifconfig, or grep, or awk for this at all -- just ask your kernel for the number you want:
#!/bin/bash
for device in /sys/class/net/*; do
    [[ -e $device/statistics/rx_errors ]] || continue
    rx_errors=$(<"${device}/statistics/rx_errors")
    echo "Number of rx_errors for ${device##*/} is $rx_errors"
done
Use $(...) to capture the output of a command, not double quotes.
overrun_errors=$(ifconfig | egrep -i "RX errors" | awk '{print $7}')
Your nested double quotes around RX errors are a problem; use single quotes there, and command substitution to actually run the pipeline:
OVERRUN_ERRORS=$(ifconfig | egrep -i 'RX errors' | awk '{print $7}')
To see the commands as they are executing, you can use
set -v
or
set -x
For example:
set -x
OVERRUN_ERRORS=$(ifconfig | egrep -i 'RX errors' | awk '{print $7}')
set +x

Bash shell script update and print a variable overwriting the same line [duplicate]

I have been trying to print a variable on the same line, for a script that is meant to automate a process. The content is the output of this:
sed "s/Read/\n/g" /tmp/Air/test.txt | tail -1 test.txt | grep ARP
So I put this in a while loop:
while true; do
    out= sed "s/Read/\n/g" /tmp/Air/test.txt | tail -1 test.txt | grep ARP
    echo -n "$out"
    sleep 1
done
I read other questions here and tried different options like echo -ne, echo -ne "$out" \r, printf "\r", or printf "%s", and had no luck with any of them. All the other examples don't have a variable to print, just counters or system variables.
Update
It seems that echo -n repeats $out on the same line: if out="this is a test", the output of echo -n is "this is a test this is a test this is a test this is a test ....". Maybe I'm missing some option?
Update 2
Sorry for the misunderstanding; perhaps I was not very clear. What I want is to overwrite the same line with the value of $out. The source of $out is the output of the aireplay-ng command that runs along with the script.
The output is something like this:
102415 packets (got 5 ARP requests and 15438 ACKs), sent 37085 packets...(499 pps)
but the number of ARP requests changes constantly.
This code, for example, uses echo -ne and overwrites the same line:
#!/bin/bash
for pc in $(seq 1 100); do
    echo -ne "$pc%\033[0K\r"
    sleep 1
done
The output of this is like a percentage indicator that shows "10%" and keeps counting up on the same line, instead of printing "1% 2% 3% 4% 5% ..". I already tried it like this, but with no luck.
If you are trying to execute the sed pipeline, use command substitution (backticks):
`sed "s/Read/\n/g" /tmp/Air/test.txt | tail -1 test.txt | grep ARP`
First of all, you are assigning the output of a bash command to the variable incorrectly. Use:
out=$(sed "s/Read/\n/g" /tmp/Air/test.txt | tail -1 test.txt | grep ARP)
Then you can print all your output on one line, as you wrote:
echo -n $out
The recent addendum to your question reads like you're miscommunicating your intent: this is a test this is a test this is a test is what a plain reading of your question indicates you to be asking for (printing this is a test over and over in a loop without newlines, after all, can be expected to do nothing else); why you'd describe this in a context that makes it sound like a bug is thus surprising.
If you want to send the cursor back to the beginning of your current line and overwrite that line, that might be something like the following:
#!/bin/bash
# ^^^^ not /bin/sh; this enables bash extensions
# ask the shell to keep $COLUMNS up-to-date
shopt -s checkwinsize
# defaults to 80-character terminal width, but uses $COLUMNS if available
printf "%-${COLUMNS:-80}s\r" "$out"`
...which prints your string, pads out to 80 characters with spaces, and then returns the cursor to the beginning of the line, such that the next thing you write will overwrite that string.
Of course, if you print that line and then return to a shell prompt, the prompt will start at the beginning of the same line and overwrite the text, so be sure to follow up with an echo.
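Putting that together with the loop from the question, a minimal sketch might look like this (the sed pipeline and file path are the ones from the question; the stray test.txt argument is dropped so tail reads from the pipe):
#!/bin/bash
shopt -s checkwinsize                     # keep $COLUMNS up to date
while true; do                            # stop with Ctrl-C
    out=$(sed "s/Read/\n/g" /tmp/Air/test.txt | tail -1 | grep ARP)
    printf "%-${COLUMNS:-80}s\r" "$out"   # pad to the terminal width, return to start of line
    sleep 1
done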

How to stop sed from buffering?

I have a program that writes to fd3 and I want to process that data with grep and sed. Here is how the code looks so far:
exec 3> >(grep "good:"|sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
Nothing is output until I do a
exec 3>&-
Then, everything that I wanted finally arrives as I expected:
I got: data2
It seems to reply immediately if I use only a grep or only a sed, but mixing them seems to cause some sort of buffering. How can I get immediate output from fd3?
I think I found it. grep doesn't line-buffer its output when it is writing to a pipe. I added the --line-buffered option to grep and now it responds immediately.
You only need to tell grep and sed not to buffer lines:
grep --line-buffered
and
sed -u
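Applied to the example from the question, that is:
exec 3> >(grep --line-buffered "good:" | sed -u "s/.*:\(.*\)/I got: \1/")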
An alternate means to stop sed from buffering is to run it through the s2p sed-to-Perl translator and insert a directive to have it command-buffered, perhaps like
BEGIN { $| = 1 }
The other reason to do this is that it gives you the more convenient notation from EREs instead of the backslash-annoying legacy BREs. You also get the full complement of Unicode properties, which is often critical.
But you don’t need the translator for such a simple sed command. And you do not need both grep and sed, either. These all work:
perl -nle 'BEGIN{$|=1} if (/good:/) { s/.*:(.*)/I got: $1/; print }'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:(.*)/I got: $1/; print'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:/I got: /; print'
Now you also have access to the minimal quantifier, *?, +?, ??, {N,}?, and {N,M}?. These now allow things like .*? or \S+? or [\p{Pd}.]??, which may well be preferable.
You can merge the grep into the sed like so:
exec 3> >(sed -une '/^good:/s//I got: /p')
echo "bad:data1">&3
echo "good:data2">&3
Unpacking that a bit: You can put a regexp (between slashes as usual) before any sed command, which makes it only be applied to lines that match that regexp. If the first regexp argument to the s command is the empty string (s//whatever/) then it will reuse the last regexp that matched, which in this case is the prefix, so that saves having to repeat yourself. And finally, the -n option tells sed to print only what it is specifically told to print, and the /p suffix on the s command tells it to print the result of the substitution.
The -e option is not strictly necessary but is good style, it just means "the next argument is the sed script, not a filename".
Always put sed scripts in single quotes unless you need to substitute a shell variable in there, and even then I would put everything but the shell variable in single quotes (the shell variable is, of course, double-quoted). You avoid a bunch of backslash-related grief that way.
On a Mac, brew install coreutils and use gstdbuf to control buffering of grep and sed.
Turning off buffering in the pipe seems to be the easiest and most generic answer, using stdbuf (coreutils):
exec 3> >(stdbuf -oL grep "good:" | sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
I got: data2
Buffering has other dependencies too, for example on whether mawk or gawk is reading this pipe:
exec 3> >(stdbuf -oL grep "good:" | awk '{ sub(".*:", "I got: "); print }')
In that case, mawk would buffer the input, while gawk wouldn't.
See also How to fix stdio buffering

Quick unix command to display specific lines in the middle of a file?

Trying to debug an issue with a server and my only log file is a 20GB log file (with no timestamps even! Why do people use System.out.println() as logging? In production?!)
Using grep, I've found an area of the file that I'd like to take a look at, line 347340107.
Other than doing something like
head -<$LINENUM + 10> filename | tail -20
... which would require head to read through the first 347 million lines of the log file, is there a quick and easy command that would dump lines 347340100 - 347340200 (for example) to the console?
update I totally forgot that grep can print the context around a match ... this works well. Thanks!
I found two other solutions if you know the line number but nothing else (no grep possible):
Assuming you need lines 20 to 40,
sed -n '20,40p;41q' file_name
or
awk 'FNR>=20 && FNR<=40' file_name
When using sed it is more efficient to quit processing after having printed the last line than to continue processing until the end of the file. This is especially important for large files when printing lines near the beginning. To do so, the sed command above includes the instruction 41q, which stops processing after line 41, since in the example we are only interested in lines 20-40. Change the 41 to whatever your last line of interest is, plus one.
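A rough way to see the difference yourself (the test file name and size are arbitrary):
seq 1 10000000 > big.txt             # 10 million lines of test data
time sed -n '20,40p;41q' big.txt     # quits after line 41, returns almost instantly
time sed -n '20,40p' big.txt         # same output, but reads the whole file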
# print line number 52
sed -n '52p' # method 1
sed '52!d' # method 2
sed '52q;d' # method 3, efficient on large files
with GNU-grep you could just say
grep --context=10 ...
No there isn't, files are not line-addressable.
There is no constant-time way to find the start of line n in a text file. You must stream through the file and count newlines.
Use the simplest/fastest tool you have to do the job. To me, using head makes much more sense than grep, since the latter is way more complicated. I'm not saying "grep is slow", it really isn't, but I would be surprised if it's faster than head for this case. That'd be a bug in head, basically.
What about:
tail -n +347340107 filename | head -n 100
I didn't test it, but I think that would work.
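A quick sanity check with a small generated file:
seq 1 1000 > t.txt
tail -n +500 t.txt | head -n 3
# 500
# 501
# 502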
I prefer just going into less and
typing 50% to go halfway through the file,
43210G to go to line 43210
:43210 to do the same
and stuff like that.
Even better: hit v to start editing (in vim, of course!), at that location. Now, note that vim has the same key bindings!
You can use the ex command, a standard Unix editor (part of Vim now), e.g.
display a single line (e.g. 2nd one):
ex +2p -scq file.txt
corresponding sed syntax: sed -n '2p' file.txt
range of lines (e.g. 2-5 lines):
ex +2,5p -scq file.txt
sed syntax: sed -n '2,5p' file.txt
from the given line till the end (e.g. 5th to the end of the file):
ex +'5,$p' -scq file.txt
sed syntax: sed -n '5,$p' file.txt
multiple line ranges (e.g. 2-4 and 6-8 lines):
ex +2,4p +6,8p -scq file.txt
sed syntax: sed -n '2,4p;6,8p' file.txt
Above commands can be tested with the following test file:
seq 1 20 > file.txt
Explanation:
+ or -c followed by a command - execute the (vi/vim) command after the file has been read,
-s - silent mode, which also uses the current terminal as the default output,
q after -c is the command that quits the editor (add ! to force quit, e.g. -scq!).
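For instance, against that test file the range example prints:
$ seq 1 20 > file.txt
$ ex +2,5p -scq file.txt
2
3
4
5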
I'd first split the file into a few smaller ones, like this:
$ split --lines=50000 /path/to/large/file /path/to/output/file/prefix
and then grep on the resulting files.
If the line number you want to read is 100:
head -100 filename | tail -1
Get ack
Ubuntu/Debian install:
$ sudo apt-get install ack-grep
Then run:
$ ack --lines=$START-$END filename
Example:
$ ack --lines=10-20 filename
From $ man ack:
--lines=NUM
Only print line NUM of each file. Multiple lines can be given with multiple --lines options or as a comma separated list (--lines=3,5,7). --lines=4-7 also works.
The lines are always output in ascending order, no matter the order given on the command line.
sed will need to read the data too to count the lines.
The only way a shortcut would be possible is if there were context/order in the file to operate on, for example if the log lines were prefixed with a fixed-width time/date, etc.
Then you could use the look Unix utility to binary-search through the files for particular dates/times.
Use
x=`cat -n <file> | grep <match> | awk '{print $1}'`
Here you will get the line number where the match occurred.
Now you can use the following command to print 100 lines
awk -v var="$x" 'NR>=var && NR<=var+100{print}' <file>
or you can use "sed" as well
sed -n "${x},${x+100}p" <file>
With sed -e '1,N d; M q' you'll print lines N+1 through M. This is probably a bit better than grep -C as it doesn't try to match lines to a pattern.
Building on Sklivvz' answer, here's a nice function one can put in a .bash_aliases file. It is efficient on huge files when printing stuff from the front of the file.
function middle()
{
    startidx=$1
    len=$2
    endidx=$(($startidx + $len))
    filename=$3

    awk "FNR>=${startidx} && FNR<=${endidx} { print NR\" \"\$0 }; FNR>${endidx} { print \"END HERE\"; exit }" "$filename"
}
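Usage, for the line range from the question, would look something like:
middle 347340100 100 filename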
To display a line from a <textfile> by its <line#>, just do this:
perl -wne 'print if $. == <line#>' <textfile>
If you want a more powerful way to show a range of lines with regular expressions -- I won't say why grep is a bad idea for doing this, it should be fairly obvious -- this simple expression will show you your range in a single pass which is what you want when dealing with ~20GB text files:
perl -wne 'print if m/<regex1>/ .. m/<regex2>/' <filename>
(tip: if your regex has / in it, use something like m!<regex>! instead)
This would print out <filename> starting with the line that matches <regex1> up until (and including) the line that matches <regex2>.
It doesn't take a wizard to see how a few tweaks can make it even more powerful.
Last thing: perl, since it is a mature language, has many hidden enhancements to favor speed and performance. This makes it the obvious choice for such an operation, since it was originally developed for handling large log files, text, databases, etc.
print line 5
sed -n '5p' file.txt
sed '5q;d' file.txt
print everything except line 5
sed '5d' file.txt
And my own creation, pieced together with Google:
#!/bin/bash
# removeline.sh
# removes a line from INPUTFILE; with -o the removed line is appended to
# OUTPUTFILE, which effectively moves the line

usage() {                 # Function: Print a help message.
    echo "Usage: $0 -l LINENUMBER -i INPUTFILE [ -o OUTPUTFILE ]"
    echo "line is removed from INPUTFILE"
    echo "line is appended to OUTPUTFILE"
}

exit_abnormal() {         # Function: Exit with error.
    usage
    exit 1
}

while getopts l:i:o:b flag
do
    case "${flag}" in
        l) line=${OPTARG};;
        i) input=${OPTARG};;
        o) output=${OPTARG};;
    esac
done

if [ -f tmp ]; then
    echo "Temp file tmp exists; delete it yourself :)"
    exit
fi

if [ -f "$input" ]; then
    re_isanum='^[0-9]+$'
    if ! [[ $line =~ $re_isanum ]] ; then
        echo "Error: LINENUMBER must be a positive, whole number."
        exit 1
    elif [ "$line" -eq "0" ]; then
        echo "Error: LINENUMBER must be greater than zero."
        exit_abnormal
    fi
    if [ ! -z "$output" ]; then
        sed -n "${line}p" "$input" >> "$output"
    fi
    if [ ! -z "$input" ]; then
        # remove the line from INPUTFILE; together with the append above,
        # this moves the line to the other file
        sed "${line}d" "$input" > tmp && cp tmp "$input"
    fi
fi

if [ -f tmp ]; then
    rm tmp
fi
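For example, to move line 5 from input.txt to moved.txt (hypothetical file names):
bash removeline.sh -l 5 -i input.txt -o moved.txt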
You could try this command:
egrep -n "*" <filename> | egrep "<line number>"
Easy with perl! If you want to get lines 1, 3, and 5 from a file, say /etc/passwd:
perl -e 'while(<>){if(++$l~~[1,3,5]){print}}' < /etc/passwd
I am surprised that only one other answer (by Ramana Reddy) suggested adding line numbers to the output. The following searches for the required line number and colours the output.
file=FILE
lineno=LINENO
wb="107"; bf="30;1"; rb="101"; yb="103"
cat -n ${file} | { GREP_COLORS="se=${wb};${bf}:cx=${wb};${bf}:ms=${rb};${bf}:sl=${yb};${bf}" grep --color -C 10 "^[[:space:]]\\+${lineno}[[:space:]]"; }
