Print previous line if condition is met - linux

I would like to grep a word and then find the second column in the line and check if it is bigger than a value. If yes, I want to print the previous line.
Ex:
Input file
AAAAAAAAAAAAA
BB 2
CCCCCCCCCCCCC
BB 0.1
Output
AAAAAAAAAAAAA
Now, I want to search for BB and if the second column (2 or 0.1) in that line is bigger than 1, I want to print the previous line.
Can somebody help me with grep and awk? Any other suggestions are also welcome. Thanks.

This can be a way:
$ awk '$1=="BB" && $2>1 {print f} {f=$0}' file
AAAAAAAAAAAAA
Explanation
$1=="BB" && $2>1 {print f} if the 1st field is exactly BB and 2nd field is bigger than 1, then print f, a stored value.
{f=$1} store the current line in f, so that it is accessible when reading the next line.
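If the threshold needs to change, the same idiom can take it as a variable; a minimal sketch (the variable name limit is arbitrary):
awk -v limit=1 '$1=="BB" && $2>limit {print prev} {prev=$0}' file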

Another option: reverse the file and print the next line if the condition matches:
tac file | awk '$1 == "BB" && $2 > 1 {getline; print}' | tac

Concerning generality
I think it needs to be mentioned that the most general solution to this class of problem involves two passes:
the first pass to add a decimal row number ($REC) to the front of each line, effectively grouping lines into records by $REC
the second pass to trigger on the first instance of each new value of $REC as a record boundary (resetting $CURREC), and thereafter to process the following lines that match $CURREC in the native AWK idiom.
In the intermediate file, each line begins with a run of decimal digits followed by a separator (for human reasons, typically an added tab or space); this prefix is parsed, and conceptually snipped off, as out-of-band with respect to the baseline file.
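As a minimal sketch of that second pass, assume the first pass has prepended a record number and a tab to every line, that the original lines contain no tabs themselves, and that the result is in a hypothetical intermediate file numbered.txt (or arrives on a pipe); the script only has to watch column 1 for a change:
awk -F'\t' '$1 != cur { cur = $1; n = 0 }   # new record number: reset per-record state
            { n++ }                          # count lines within the current record
            { print "record " cur ", line " n ": " $2 }' numbered.txt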
Command line paste monster
Even confined to the command line, it's an easy matter to ensure that the intermediate file never hits disk. You just need a shell such as ZSH (my own favourite) or Bash that supports process substitution:
paste <( <input.txt awk "BEGIN { R=0; N=0; } /Header pattern/ { N=1; } { R=R+N; N=0; print R; }" ) input.txt | awk -f yourscript.awk
Let's render that one-liner more suitable for exposition:
P="/Header pattern/"
X="BEGIN { R=0; N=0; } $P { N=1; } { R=R+N; N=0; print R; }"
paste <( <input.txt awk "$X" ) input.txt | awk -f yourscript.awk
This starts three processes: the trivial inline AWK script, paste, and the AWK script you really wanted to run in the first place.
Behind the scenes, the <() command line construct creates a named pipe and passes the pipe name to paste as the name of its first input file. For paste's second input file, we give it the name of our original input file (this file is thus read sequentially, in parallel, by two different processes, which will consume between them at most one read from disk, if the input file is cold).
The magic named pipe in the middle is an in-memory FIFO that ancient Unix probably managed at about 16 kB of average size (intermittently pausing the paste process if the yourscript.awk process is sluggish in draining this FIFO back down).
Perhaps modern Unix throws a bigger buffer in there because it can, but it's certainly not a scarce resource you should be concerned about, until you write your first truly advanced command line with process redirection involving these by the hundreds or thousands :-)
Additional performance considerations
On modern CPUs, all three of these processes could easily find themselves running on separate cores.
The first two of these processes border on the truly trivial: an AWK script with a single pattern match and some minor bookkeeping, paste called with two arguments. yourscript.awk will be hard pressed to run faster than these.
On any modern development machine there are almost always lightly loaded cores to spare, which renders this shell solution pattern essentially free in the execution domain: magic pipes for nothing and cores for free.
As a final performance consideration, if you don't want the overhead of parsing actual record numbers:
X="BEGIN { N=0; } $P { N=1; } { print N; N=0; }"
Now your in-FIFO intermediate file is annotated with just two additional characters prepended to each line ('0' or '1' plus the default separator character added by paste), with '1' marking the first line of each record.
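A consuming script then only has to test that one flag column; a minimal sketch reading the annotated stream on stdin, assuming paste's default tab separator and tab-free input lines:
awk -F'\t' '$1 == 1 { rec++ }                # the flag is 1 on the first line of each record
            { print "record " rec ": " $2 }'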
Named FIFOs
Under the hood, these are no different than the magic FIFOs instantiated by Unix when you write any normal pipe command:
cat file | proc1 | proc2 | proc3
Three unnamed pipes (and a whole process devoted to cat you didn't even need).
It's almost unfortunate that the truly exceptional convenience of the default stdin/stdout streams as premanaged by the shell obscures the reality that paste $magictemppipe1 $magictemppipe2 bears no additional performance considerations worth thinking about, in 99% of all cases.
"Use the <() Y-joint, Luke."
Your instinctive reflex toward natural semantic decomposition in the problem domain will herewith benefit immensely.
If anyone had had the wits to name the shell construct <() as the YODA operator in the first place, I suspect it would have been pressed into universal service at least a solid decade ago.

Combining sed & awk you get this:
sed 'N;s/\n/ /' < file | awk '$3>1{print $1}'
sed 'N;s/\n/ /': append the next line to the pattern space (N) and replace the embedded newline with a space, joining each pair of lines
awk '$3>1{print $1}': print $1 (the 1st column) if $3 (the 3rd column) is greater than 1
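With the sample input from the question, the intermediate output of the sed step looks like this, which is what the awk filter then sees:
$ sed 'N;s/\n/ /' < file
AAAAAAAAAAAAA BB 2
CCCCCCCCCCCCC BB 0.1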

Related

Is there a way to trap the output of Linux "selection" commands that do not meet their match criteria?

By "selection" commands, I mean commands that do filtering such as grep, find, etc.
Background
There are at least a few different IBM mainframe environments that support pipeline processing (CMS Pipelines, for example). It's not a shell construct like it is in Bash, but usually a dedicated PIPE command that has its own built-in subcommands (stages) that perform the filtering and data processing.
One of the cooler features, in my opinion, is that "selection stages" that perform some kind of filtering usually support multiple output streams. Those data lines that meet the selection criteria are passed to the primary output stream, and if specified, those that do not are passed to a secondary output stream, where they can undergo an entirely different processing sequence.
Taking the example from the Wikipedia page linked above, which might appear in a REXX program:
'PIPE (END ?) < INPUT TXT', /* read contents of file INPUT TXT */
'|A: LOCATE /Hello/', /* find all lines containing "Hello" */
'|INSERT / World!/ AFTER', /* give those to INSERT to append " World!" */
'|B: FANINANY', /* pass to FANINANY, accepts multiple input streams */
'|> NEWFILE TXT A', /* write all contents to file NEWFILE TXT A */
'?A:', /* end this pipeline, 2nd output of LOCATE goes here */
'|XLATE UPPER', /* translate text to uppercase */
'|B:' /* feed back into FANINANY stage above */
The second occurrence of the label A: connects the second output stream of LOCATE (in this case, lines from the input file that do not match "Hello") to the input stream of XLATE, which converts the data to uppercase and passes that back to the first B: label (FANINANY). FANINANY accepts more than one input stream and will read from all connected input streams simultaneously, preserving the order of the data.
The question mark ? serves as the end character in this example, and tells the command processor "this is the end of the first pipeline" so that whatever follows can be used to independently connect another pipeline to other labeled stages, allowing you to specify the entire pipe in one command.
Example INPUT TXT file:
foo
Hello
bar
Hello
baz
Hello
After this PIPE, the NEWFILE TXT A file would then contain:
FOO
Hello World!
BAR
Hello World!
BAZ
Hello World!
Question
My main question is: is it possible to achieve something like this in Bash?
I think the framework is in place with named pipes (mkfifo, etc.) and process substitution (both of which I am familiar with).
But the critical piece of the puzzle is this: I assume that whether a Linux/UNIX command can send its output to different places depends on the individual command and whether it was written to do that. If not, I suspect the command would have to be modified, after which point I could conceivably use Bash constructs to achieve this kind of thing.
Unix (/Linux) programs don't generally support multiple output streams, although with some like awk and perl you could certainly write scripts that send output to multiple places. Here's a simple awk script that sends matching lines to stdout, and non-matching ones to a named pipe:
awk -v nomatch="/path/to/pipe" '{if ($0 ~ /Hello/) {print} else {print $0 > nomatch}}'
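For example, the non-matching stream can be post-processed by another command reading from that pipe; a minimal sketch with hypothetical file names, where the reader has to be started before awk tries to open the FIFO for writing (and the input, like the example above, has both matching and non-matching lines):
mkfifo /tmp/nomatch                                    # hypothetical pipe path
tr '[:lower:]' '[:upper:]' </tmp/nomatch >upper.txt &  # reader for the non-matching lines
awk -v nomatch="/tmp/nomatch" \
    '{if ($0 ~ /Hello/) {print} else {print $0 > nomatch}}' input.txt >hello.txt
wait
rm /tmp/nomatch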
But it sounds like you want to recombine the streams in a coherent way (i.e. keeping the line order), and there's no good way to do that, since the data flows independently through each stream (maybe at different speeds, with independent buffering etc).
But it sounds like what you really want is just a single program that transforms different lines differently, and there are a number of unix programs that'll do that. In roughly increasing order of power and flexibility (and complexity), the standard ones are sed, awk, and perl, but you can use pretty much anything (even bash itself!). Here's an example in awk:
awk '{if ($0 ~ /Hello/) {print $0 " World!"} else {print toupper($0)}}' input.txt >newfile.txt
And here's the equivalent in bash itself:
while IFS= read -r line; do
    if [[ "$line" =~ Hello ]]; then
        printf '%s World!\n' "$line"
    else
        tr "[:lower:]" "[:upper:]" <<<"$line"
    fi
done <input.txt >newfile.txt
(with newer versions of bash, you could replace that tr command with just printf '%s\n' "${line^^}")
You can easily do both transformations in a single command with GNU sed's negative match operator:
$ cd "$(mktemp --directory)"
$ cat > input.txt <<'EOF'
foo
Hello
bar
Hello
baz
Hello
EOF
$ sed '/Hello/! s/\(.*\)/\U\1/;s/\(Hello\)/\1 World!/;' input.txt
FOO
Hello World!
BAR
Hello World!
BAZ
Hello World!
I don't think there's an efficient way to send part of standard input to one command and part of it to another command (relevant question 1, 2), since any command which filters standard input will necessarily also consume all of it. Something like
while IFS= read -r line
do
    if [[ "$line" == 'Hello' ]]
    then
        echo "${line} World!"
    else
        echo "${line^^}"
    fi
done < input.txt
is unfortunately very slow.
As an alternative you can send the same file to two different input streams using COMMAND < input.txt 3< input.txt.
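A minimal sketch of reading the same file on two descriptors at once (the processing here is only illustrative):
# fd 0 and fd 3 both read input.txt, but each keeps its own file offset
while IFS= read -r a && IFS= read -r b <&3; do
    printf 'fd0=%s  fd3=%s\n' "$a" "$b"
done < input.txt 3< input.txt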

Bash Scripting, Reading From A File [duplicate]

This question already has answers here:
While loop stops reading after the first line in Bash
(5 answers)
Closed 2 years ago.
I'm trying to select the lines that start with F from my .txt file, and then find the average of the numbers. Here's my code; I don't know why it's not adding up.
#!/bin/bash
function F1()
{
    count=1;
    total=0;
    file='users.txt'
    while read line; do
        if (grep \^"F");
        then
            for i in $( awk '{ print $3; }')
            do
                total=$(echo $total+$i )
                var=$((var+1))
            done
        fi
        echo "scale=2; $total / $count"
        echo $line
    done < $file
}
F1
If you are using Awk anyway, use its capabilities.
awk '/^F/ { sum+=$3; count++ } END { print (count ? sum/count : 0) }' users.txt
This is a standard Awk idiom which will typically be an exercise within the first hour of any slow-paced basic beginner Awk tutorial, or within ten minutes of more directed learning.
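For instance, with a hypothetical users.txt whose third column holds the numbers to average:
$ cat users.txt
F alice 10
M bob 20
F carol 30
$ awk '/^F/ { sum+=$3; count++ } END { print (count ? sum/count : 0) }' users.txt
20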
Your shell script had at least the following errors:
grep without a file argument will read the rest of your input lines in one go. Running it in a subshell (i.e. inside parentheses) does nothing useful, and costs you a process.
The shell does not perform any arithmetic unless you separately request it; variables are simply strings. If you want to add two numbers, you need to explicitly use an arithmetic evaluation like ((total+=i)) ... like you already did in another place.
read without any options will mangle backslashes in your input; if you don't specifically require this behavior, you should always use read -r.
All except one of those semicolons are useless.
You should generally quote all your variables; see When to wrap quotes around a shell variable
If you are using Awk anyway, you should probably use it for as much as possible. The shell is very slow and inefficient when it comes to processing a file line by line, and horribly clunky when it comes to integer arithmetic (let alone dividing numbers which are not even multiples of each other).
This is probably not complete; also try http://shellcheck.net/ before asking for human assistance.
On Stack Overflow, try to reduce your problem so that it really only asks a single question. There are duplicate questions for all of these issues, but I'll just mark this as a duplicate of the one you are actually asking about.
You don't perform an addition for the variable total. On the first iteration, the line
total=$(echo $total+$i )
(assuming that i, for instance, is 4711) is expanded to
total=0+4711
So the variable total is set to a 6-character string, not a number. To actually add here, you would have to write
((total = total + i))
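For completeness, here is a minimal corrected sketch of the same loop in pure shell, assuming the third column holds integers; the original echo "scale=2; ..." line looks like it was meant to be piped to bc, so this sketch does that. It is still far slower than the awk one-liner above:
total=0
count=0
while read -r c1 c2 c3 rest; do
    if [[ $c1 == F* ]]; then
        total=$((total + c3))
        count=$((count + 1))
    fi
done < users.txt
if ((count > 0)); then
    echo "scale=2; $total / $count" | bc
fi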

Grep multiple expressions in one pass, output matches to each expression in separate file

I want to use grep on the top utility. Iterating through top 5 times, here are my criteria:
grep two different expressions in a single pass on the top output
expression 1: grep for the line on overall cpu usage, then output it to file: cpu_stats.txt
expression 2: grep for the line on overall memory usage, then output it to file: memory_stats.txt
Here is what I have now:
top -b -n 5 | egrep "\%Cpu\(s\):|KiB Mem :" > both_cpu_and_memory.txt
This successfully grabs the desired top output, but notice it is putting both expression matches in the exact same file.
Where I am stuck: I do not know how to, in a single pass, output the matches from one expression to one file, and how to output the matches from the other expression to another file.
Is this possible? Is it possible to grep multiple expressions in one pass, with the matches for each expression written to a separate file?
Unless you do something convoluted with saving the output in a temporary file and making a couple of passes over it, you can't do what you want with grep. It's really easy with awk, though:
top -b -n 5 | awk '/%Cpu\(s\):/ { print > "cpu_stats.txt" }
/KiB Mem :/ { print > "memory_stats.txt" }'
grep cannot do what you ask. This is a job for awk, or indeed any other scripting language capable of parsing text and using regular expressions.

How can I remove lines that contain more than N words

Is there a good one-liner in bash to remove lines containing more than N words from a file?
example input:
I want this, not that, but thank you it is very nice of you to offer.
The very long sentence finding form ordering system always and redundantly requires an initial, albeit annoying and sometimes nonsensical use of commas, completion of the form A-1 followed, after this has been processed by the finance department and is legal, by a positive approval that allows for the form B-1 to be completed after the affirmative response to the form A-1 is received.
example output:
I want this, not that, but thank you it is very nice of you to offer.
In Python I would code something like this:
if len(line.split()) < 40:
    print line
To only show lines containing less than 40 words, you can use awk:
awk 'NF < 40' file
Using the default field separator, each word is treated as a field. Lines with less than 40 fields are printed.
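If the limit should be adjustable, it can be passed in as a variable (the name maxwords is arbitrary):
awk -v maxwords=40 'NF < maxwords' file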
Note: this answer assumes a character-based reading of the question, namely how to print those lines that are shorter than a given number of characters (rather than words).
Use awk with length():
awk 'length($0)<40' file
You can even give the length as a parameter:
awk -v maxsize=40 'length($0) < maxsize' file
A test with 10 characters:
$ cat a
hello
how are you
i am fine but
i would like
to do other
things
$ awk 'length($0)<10' a
hello
things
If you feel like using sed for this, you can say:
sed -rn '/^.{,39}$/p' file
This checks if the line contains less than 40 characters. If so, it prints it.

What are the differences among grep, awk & sed? [duplicate]

This question already has answers here:
What are the differences between Perl, Python, AWK and sed? [closed]
(5 answers)
What is the difference between sed and awk? [closed]
(3 answers)
Closed last month.
I am confused about the differences between grep, awk and sed in terms of their role in Unix/Linux system administration and text processing.
Short definition:
grep: search for specific terms in a file
#usage
$ grep This file.txt
Every line containing "This"
Every line containing "This"
Every line containing "This"
Every line containing "This"
$ cat file.txt
Every line containing "This"
Every line containing "This"
Every line containing "That"
Every line containing "This"
Every line containing "This"
Now awk and sed are completely different from grep.
awk and sed are text processors. Not only do they have the ability to find what you are looking for in text, they have the ability to remove, add and modify the text as well (and much more).
awk is mostly used for data extraction and reporting; sed is a stream editor.
Each one of them has its own functionality and specialties.
Example
Sed
$ sed -i 's/cat/dog/' file.txt
# this will replace the first occurrence of 'cat' with 'dog' on each line of file.txt (add the g flag to replace every occurrence)
Awk
$ awk '{print $2}' file.txt
# this will print the second column of file.txt
Basic awk usage:
Compute sum/average/max/min/etc., whatever you may need.
$ cat file.txt
A 10
B 20
C 60
$ awk 'BEGIN {sum=0; count=0} {sum+=$2; count++} END {print "Average:", sum/count}' file.txt
Average: 30
I recommend that you read this book: Sed & Awk: 2nd Ed.
It will help you become a proficient sed/awk user on any unix-like environment.
Grep is useful if you want to quickly search for lines that match in a file. It can also return some other simple information like matching line numbers, match count, and file name lists.
Awk is an entire programming language built around reading delimited, record-oriented text files (whitespace-separated fields by default), processing the records, and optionally printing out a result data set. It can do many things but it is not the easiest tool to use for simple tasks.
Sed is useful when you want to make changes to a file based on regular expressions. It allows you to easily match parts of lines, make modifications, and print out results. It's less expressive than awk, but that lends it to somewhat easier use for simple tasks. It has many more complicated operators you can use (I think it's even Turing complete), but in general you won't use those features.
I just want to mention one thing: there are many tools that can do text processing, e.g.
sort, cut, split, join, paste, comm, uniq, column, rev, tac, tr, nl, pr, head, tail.....
They are very handy, but you have to learn their options etc.
A lazy way (not the best way) to learn text processing might be: only learn grep, sed and awk. With these three tools, you can solve almost 99% of text processing problems and don't need to memorize all the different commands and options above. :)
And if you've learned and used the three, you'll know the difference. Actually, the difference here means which tool is good at solving which kind of problem.
An even lazier way might be learning a scripting language (Python, Perl or Ruby) and doing all your text processing with it.
