shell script block deletion (linux, awk)

My input file has blocks like the ones below. Please help me delete a block and its contents using awk or sed.
[abc]
para1=123
para2=456
para3=111
[pqr]
para1=333
para2=765
para3=1345
[xyz]
para1=888
para2=236
para3=964
Now how do I delete a block and its parameters completely? Please help me achieve this with an awk command. Thanks in advance.

You can use RS to split the input into blocks. (NOTE: NR>1 is needed because awk generates an empty record before the first [.)
awk -v RS='[' -v remove="pqr" '
NR>1 && $0 !~ "^"remove"]" {printf "%s", "["$0}
' file
you get:
[abc]
para1=123
para2=456
para3=111
[xyz]
para1=888
para2=236
para3=964
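Since the block name is an ordinary awk variable, the same command handles any section; for instance, to drop [xyz] instead (same file layout assumed):
awk -v RS='[' -v remove="xyz" '
NR>1 && $0 !~ "^"remove"]" {printf "%s", "["$0}
' file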

Depends on how you want to filter. Note that these paragraph-mode (RS=) commands assume the blocks are separated by blank lines. If you want to delete the block with the header '[pqr]':
awk '!/^\[pqr\]/' RS= ORS='\n\n' input
or, with a string comparison on the header field (safer than $1 !~ "[pqr]", where the brackets would form a regex character class):
awk '$1 != "[pqr]"' RS= ORS='\n\n' input
If you want to omit the 2nd record (the same as above)
awk 'NR!=2' RS= ORS='\n\n' input
If you want to omit the record in which para2=765,
awk '$3 !~ "765"' RS= ORS='\n\n' input
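The header can also come from a shell variable rather than being hard-coded in the pattern; a sketch using -v and a plain string comparison on the first field:
remove='[pqr]'
awk -v hdr="$remove" '$1 != hdr' RS= ORS='\n\n' input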

Perl solution to remove block [abc]
perl -lne 'BEGIN{$/=""} print "$_\n" unless /^\[abc\]/' file
-n loops over every record of the input file, puts the record in the $_ variable, and does not automatically print it
-l strips the record separator before processing and adds it back when printing
-e executes the perl code
$/ is the input record separator. Setting it to "" in a BEGIN{} block puts Perl into paragraph mode, so each record is a whole block.
$_ is the current record (a whole block here, not a single line).
/^\[abc\]/ is a regular expression anchored to the start of the record, so only the block whose header is [abc] is suppressed.
output:
[pqr]
para1=333
para2=765
para3=1345
[xyz]
para1=888
para2=236
para3=964
This variation enables argument parsing with -s and passes the escaped pattern \[abc\] to the variable $b:
perl -slne 'BEGIN{$/=""} print "$_\n" unless /^$b/' -- -b='\[abc\]' file
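perl's -i switch composes with the flags above if you want to edit the file in place; a sketch, assuming a .bak backup of the original is wanted:
perl -i.bak -slne 'BEGIN{$/=""} print "$_\n" unless /^$b/' -- -b='\[abc\]' file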

I propose a slightly different solution using only shell.
#!/bin/sh
# Specify which block (counted from 1) to withhold.
# Blocks are assumed to be separated by blank lines.
WITHHOLD=2
COUNT=1
INAWHITESP=0
while read -r i
do
    # A blank line ends the current block: bump the counter.
    if [ -z "$i" ] && [ "$INAWHITESP" -eq 0 ]
    then
        COUNT=$(( COUNT + 1 ))
        INAWHITESP=1
    fi
    # A non-blank line means we are back inside a block.
    if [ -n "$i" ] && [ "$INAWHITESP" -eq 1 ]
    then
        INAWHITESP=0
    fi
    # Print every line that does not belong to the withheld block.
    if [ "$COUNT" -ne "$WITHHOLD" ]
    then
        printf "%s\n" "$i"
    fi
done < inputfile > outputfile
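To pick the block at run time instead of editing the script, the assignment can fall back on the first positional parameter (my tweak, not part of the original script):
WITHHOLD=${1:-2}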

To remove block abc
awk 'BEGIN{RS=""} !/\[abc\]/'
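As written this joins the surviving blocks with single newlines; setting ORS keeps the blank-line separators (paragraph mode, like the RS= answers above, assumes blank lines between the blocks):
awk 'BEGIN{RS=""; ORS="\n\n"} !/\[abc\]/' input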

Related

Bash function with input fails awk command

I am writing a function in a bash shell script that should return lines from CSV files with headers, where a line has more commas than the header. This can happen because some values inside these files may themselves contain commas. For quality control, I must identify these lines so they can be cleaned up later. What I have currently:
#!/bin/bash
get_bad_lines () {
    local correct_no_of_commas=$(head -n 1 $1/$1_0_0_0.csv | tr -cd , | wc -c)
    local no_of_files=$(ls $1 | wc -l)
    for i in $(seq 0 $(( ${no_of_files}-1 )))
    do
        # Check that the file exists
        if [ ! -f "$1/$1_0_${i}_0.csv" ]; then
            echo "File: $1_0_${i}_0.csv not found!"
            continue
        fi
        # Search for error-lines inside the file and print them out
        echo "$1_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
        grep -o -n '[,]' "$1/$1_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk '$1 > $correct_no_of_commas {print}'
    done
}
get_bad_lines products
get_bad_lines users
The output of this program is now all the comma counts with all of the line numbers in all the files, and I suspect this is because the input $1 (the folder name, i.e. products and users) conflicts with the reference to $1 in the awk call (where I wish to grab the first column, the comma count for that line in the current file).
Is this the issue? And if so, would it be solvable by referencing the first column or the folder name by different variable names instead of both of them using $1?
Example, current output:
5 6667
5 6668
5 6669
5 6670
(should only show lines for that file having more than 5 commas).
I tried variable declaration in the call to awk as well, with the same effect (as in the accepted answer to Awk field variable clash with function argument):
get_bad_lines () {
    local table_name=$1
    local correct_no_of_commas=$(head -n 1 $table_name/${table_name}_0_0_0.csv | tr -cd , | wc -c)
    local no_of_files=$(ls $table_name | wc -l)
    for i in $(seq 0 $(( ${no_of_files}-1 )))
    do
        # Check that the file exists
        if [ ! -f "$table_name/${table_name}_0_${i}_0.csv" ]; then
            echo "File: ${table_name}_0_${i}_0.csv not found!"
            continue
        fi
        # Search for error-lines inside the file and print them out
        echo "${table_name}_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
        grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk -v table_name="$table_name" '$1 > $correct_no_of_commas {print}'
    done
}
You can use awk the whole way to achieve that:
get_bad_lines () {
    find "$1" -maxdepth 1 -name "$1_0_*_0.csv" | while read -r my_file ; do
        awk -v table_name="$1" '
            NR==1 { num_comma=gsub(/,/, ""); }
            /,/ { if (gsub(/,/, ",", $0) > num_comma) wrong_array[wrong++]=NR":"$0; }
            END {
                if (wrong > 0) {
                    print(FILENAME" has over "num_comma" commas in the following lines:");
                    for (i=0; i<wrong; i++) { print(wrong_array[i]); }
                }
            }' "${my_file}"
    done
}
As for why your original awk command failed to print only the lines with too many commas: you are using the shell variable correct_no_of_commas inside a single-quoted awk statement ('$1 > $correct_no_of_commas {print}'). There is therefore no substitution by the shell; awk reads $correct_no_of_commas as is and looks up correct_no_of_commas as an awk variable. That variable is undefined in the awk script, so it is an empty string, and awk evaluates $1 > $"" as the condition. Since $"" is equivalent to $0, awk compares the count in $1 with the full input line. The uniq -c output begins with padding blanks, which compare less than the digits of the count, so the condition is always true and every line is printed.
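The usual fix is to hand the count to awk through -v, exactly what the second listing attempts for table_name, but applied to the variable the condition actually uses. A minimal sketch of the corrected pipeline stage (the awk variable name limit is my choice):
grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk -v limit="$correct_no_of_commas" '$1 > limit {print}'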
You can identify all the bad lines with a single awk command (this uses ENDFILE, a GNU awk extension):
awk -F, 'FNR==1{print FILENAME; headerCount=NF;} NF>headerCount{print} ENDFILE{print "#######\n"}' /path/here/*.csv
If you want the line number also to be printed, use this
awk -F, 'FNR==1{print FILENAME"\nLine#\tLine"; headerCount=NF;} NF>headerCount{print FNR"\t"$0} ENDFILE{print "#######\n"}' /path/here/*.csv

How to print lines with a specific column matching members of an array in bash

#!/bin/bash
awk '$1 == "abc" {print}' file # print lines whose first column matches "abc"
How do I print lines when the first column matches a member of an array ("12", "34", or "56")?
#!/bin/bash
ARR=("12" "34" "56")
Edit:
Also, how do I print lines when the first column exactly matches a member of the array ("12", "34", or "56")?
You could use bash to interpolate the array into a regex pattern for awk, by changing the IFS value to a | character and doing an array expansion as below:
ARR=("12" "34" "56")
regex=$( IFS='|'; echo "${ARR[*]}" )
awk -v str="$regex" '$1 ~ str' file
The array expansion converts the list elements to a string delimited with |, e.g. 12|34|56 in this case.
The $( ) runs in a sub-shell, so the modified IFS value is not reflected in the parent shell. You could make it a one-liner as
awk -v str="$( IFS='|'; echo "${ARR[*]}" )" '$1 ~ str' file
OP had also asked for an exact match of the strings from the array in the file; in that case grep with its ERE support can do the job
regex=$( IFS='|'; echo "${ARR[*]}" )
egrep -w "$regex" file
(or)
grep -Ew "$regex" file
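Note that grep -w accepts the words anywhere on the line; if the exact match must be restricted to the first column, an anchored dynamic regex in awk does it. A sketch reusing the regex variable built above:
regex=$( IFS='|'; echo "${ARR[*]}" )
awk -v str="$regex" '$1 ~ ("^(" str ")$")' file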
awk one-liner
awk -v var="${ARR[*]}" 'BEGIN{split(var,array," "); for(i in array) a[array[i]] } ($1 in a){print $0}' file
The following code does the trick; note that test must be checked against 1 so that the matching lines are the ones printed (the original posting tested for 0, which inverts the logic):
awk 'BEGIN{ myarray[0]="aaa"; myarray[1]="bbb" }
{
    test=0
    for (x in myarray) {
        if ($1 == myarray[x]) {
            test=1
            break
        }
    }
    if (test==1) print
}' file
If you need to pass a variable to awk use the -v option; for an array it is a bit trickier, but the following syntax should work. "${A[*]}" flattens the array into one space-separated string that awk can split again (passing plain "$A" only carries the first element), and the loop sits in BEGIN so awk does not wait for input:
A=( $( ls -1p ) ) # example of a list to be passed to awk (adapt to your needs)
awk -v var="${A[*]}" 'BEGIN{ n=split(var, list, " "); for (i=1; i<=n; i++) print list[i] }'
Nearly the same as Inian's, but building a BRE for grep with shell pattern substitution:
ARR=("34" "56" "12")
regex="${ARR[*]}"            # "34 56 12"
regex="^${regex// /\\|^}"    # becomes "^34\|^56\|^12"
grep -w "$regex" infile

AWK: how to put the output of a command into a variable

I am trying to get the number of the line where the word "nel" appears in prueba.txt into the variable "line", with the help of the progtesis.awk script I am writing.
I am running this in the terminal:
awk -f progtesis.awk prueba.txt
And progtesis.awk looks as follows:
line=$(awk -f '/nel/{print NR}' FILENAME}
echo "$line"
Any suggestions?
No need for an external awk script:
line=$(awk '/nel/ {print NR; exit}' "${filename}")
echo "${line}"
will display the number of the first line matching /nel/.
Otherwise, if progtesis.awk contains
/nel/ {print NR; exit}
The bash commands can be
line=$(awk -f progtesis.awk "${filename}")
echo "${line}"

Remove lines containing space in unix

Below is my comma-separated input.txt file. I want to read the columns and write a line to output.txt only when no column has a space.
Content of input.txt:
1,Hello,world
2,worl d,hell o
3,h e l l o, world
4,Hello_Hello,World#c#
5,Hello,W orld
Content of output.txt:
1,Hello,world
4,Hello_Hello,World#c#
Is it possible to achieve this using awk? Please help!
A simple way to filter out lines with spaces is using inverted matching with grep:
grep -v ' ' input.txt
If you must use awk:
awk '!/ /' input.txt
Or perl:
perl -ne '/ / || print' input.txt
Or pure bash:
while IFS= read -r line; do [[ $line == *' '* ]] || echo "$line"; done < input.txt
# or
while IFS= read -r line; do [[ $line =~ ' ' ]] || echo "$line"; done < input.txt
UPDATE
To check if, say, field 2 contains a space, you could use awk like this:
awk -F, '$2 !~ / /' input.txt
To check if, say, field 2 OR field 3 contains a space:
awk -F, '!($2 ~ / / || $3 ~ / /)' input.txt
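If any field at all may hold a space, a loop over the fields generalizes this; a sketch (for whole-line filtering it is equivalent to !/ /, but it keeps working once you need per-field logic):
awk -F, '{for (i=1; i<=NF; i++) if ($i ~ / /) next; print}' input.txt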
For your follow-up question in comments
To do the same using sed, I only know these awkward solutions:
# remove lines if 2nd field contains space
sed -e '/^[^,]*,[^,]* /d' input.txt
# remove lines if 2nd or 3rd field contains space
sed -e '/^[^,]*,[^,]* /d' -e '/^[^,]*,[^,]*,[^,]* /d' input.txt
For your 2nd follow-up question in comments
To disregard leading spaces in the 2nd or 3rd fields:
awk -F', *' '!($2 ~ / / || $3 ~ / /)' input.txt
# or perhaps what you really want is this:
awk -F', *' -v OFS=, '!($2 ~ / / || $3 ~ / /) { print $1, $2, $3 }' input.txt
This can also be done easily with sed
sed '/ /d' input.txt
Try this one-liner:
awk 'NF==1' file
As @jwpat7 pointed out, it won't give correct output if a line has only a leading space; in that case this regex-based line should do, though it has already been posted in janos's answer:
awk '!/ /' file
or, with a field separator matching runs of one or more spaces (so a leading space still produces an extra, empty field):
awk -F' +' 'NF==1'
Pure bash for the fun of it...
#!/bin/bash
while IFS= read -r line
do
    if [[ ! $line =~ " " ]]
    then
        echo "$line"
    fi
done < input.txt
To delete lines whose Nth column contains a space, the repetition count can be computed from the column number; double quotes are needed so the shell expands the count inside the sed expression:
columnWithSpace=2
columnBef=$(( columnWithSpace - 1 ))
sed "/^\([^,]*,\)\{$columnBef\}[^ ,]* /d" input.txt
if you know the column directly (for example, column 3):
sed '/^\([^,]*,\)\{2\}[^ ,]* /d' input.txt
If you can trust the input to always have no more than three fields, simply finding a space somewhere after a comma is sufficient.
grep ',.* ' input.txt
If there can be (or usually are) more fields, you can pull that off with grep -E and a suitable ERE, but you are fast approaching the point at which the equivalent Awk solution will be more readable and maintainable.
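For the three-field sample here, such an ERE could bound how many complete fields may precede the one holding the space; a sketch:
grep -E '^([^,]*,){0,2}[^ ,]* ' input.txt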

Replacing a line with two new lines

I have a file named abc.csv which contains these 6 lines:
xxx,one
yyy,two
zzz,all
aaa,one
bbb,two
ccc,all
Now whenever all appears in a line, that line should be replaced by both a one and a two line, that is:
xxx,one
yyy,two
zzz,one
zzz,two
aaa,one
bbb,two
ccc,one
ccc,two
Can someone show how to do this?
$ awk -F, -v OFS=, '/all/ { print $1, "one"; print $1, "two"; next }1' foo.input
xxx,one
yyy,two
zzz,one
zzz,two
aaa,one
bbb,two
ccc,one
ccc,two
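If the string all could also occur inside the first column, keying on the second field rather than a bare /all/ is safer; a sketch assuming the two-column layout shown:
awk -F, -v OFS=, '$2 == "all" { print $1, "one"; print $1, "two"; next }1' foo.input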
If you want to stick to a shell-only solution:
while IFS= read -r line; do
    if [[ "${line}" = *all* ]]; then
        echo "${line%,*},one"
        echo "${line%,*},two"
    else
        echo "${line}"
    fi
done < foo.input
In sed:
sed '/,all$/{ s/,all$/,one/p; s/,one$/,two/; }'
When a line ends with ,all, first substitute all with one and print the result (the p flag); then substitute one with two and let the automatic print output the second copy.
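A usage sketch against the file from the question (output.csv is my name for the result):
sed '/,all$/{ s/,all$/,one/p; s/,one$/,two/; }' abc.csv > output.csv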
