awk add string to each line except last blank line - linux

I have a file with a blank line at the end. I need to add a suffix to each line except the last blank line.
I use:
awk '$0=$0"suffix"' | sed 's/^suffix$//'
But maybe it can be done without sed?
UPDATE:
I want to skip all lines that contain only the '\n' character.
EXAMPLE:
I have file test.tsv:
a\tb\t1\n
\t\t\n
c\td\t2\n
\n
I run cat test.tsv | awk '$0=$0"\t2"' | sed 's/^\t2$//':
a\tb\t1\t2\n
\t\t\t2\n
c\td\t2\t2\n
\n

It sounds like this is what you need:
awk 'NR>1{print prev "suffix"} {prev=$0} END{ if (NR) print prev (prev == "" ? "" : "suffix") }' file
The test for NR in the END is to avoid printing a blank line given an empty input file. It's untested, of course, since you didn't provide any sample input/output in your question.
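For example, with the test.tsv from the question and a suffix of "\t2" (output shown in the same \t/\n notation the question uses), this should produce:
$ awk 'NR>1{print prev "\t2"} {prev=$0} END{ if (NR) print prev (prev == "" ? "" : "\t2") }' test.tsv
a\tb\t1\t2\n
\t\t\t2\n
c\td\t2\t2\n
\n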
To treat all empty lines the same:
awk '{print $0 (/./ ? "suffix" : "")}' file

Try:
awk 'NF{print $0 "suffix"}' Input_file

This will skip adding the suffix to all blank lines (the blank lines are still printed):
awk 'NF{$0=$0 "suffix"}1' file
To only skip the last line if it is blank:
awk 'NR>1{print p "suffix"} {p=$0} END{print p (NF?"suffix":"") }' file
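An illustrative run contrasting the two NF-based approaches: the print form drops blank lines entirely, while the }1 form keeps them without the suffix:
$ printf 'x\n\ny\n' | awk 'NF{print $0 "suffix"}'
xsuffix
ysuffix
$ printf 'x\n\ny\n' | awk 'NF{$0=$0 "suffix"}1'
xsuffix

ysuffix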

If perl is okay:
$ cat ip.txt
a	b	1
		
c	d	2

$ perl -lpe '$_ .= "\t 2" if !(eof && /^$/)' ip.txt
a	b	1	 2
			 2
c	d	2	 2
$ # no blank line for empty file as well
$ printf '' | perl -lpe '$_ .= "\t 2" if !(eof && /^$/)'
$
-l strips the newline from input and adds it back when the line is printed at the end of the code, courtesy of the -p option
eof checks for end of file
/^$/ matches a blank line
$_ .= "\t 2" appends to the input line

Try this -
$ cat f    ### blank line only at the end of the file
-11.2
hello

$ awk '{print (/./?$0"suffix":"")}' f
-11.2suffix
hellosuffix

$
OR
$ cat f    ### blank lines in the middle and at the end of the file
-11.2

hello

$ awk -v val=$(wc -l < f) '{print (/./ || NR!=val?$0"suffix":"")}' f
-11.2suffix
suffix
hellosuffix

$
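If you'd rather avoid the separate wc -l call, a two-pass awk can determine the last line number itself; a sketch of the same logic:
$ awk 'NR==FNR{last=NR; next} {print (/./ || FNR!=last ? $0 "suffix" : "")}' f f
-11.2suffix
suffix
hellosuffix

$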

How to get 1st field of a file only when 2nd field matches a given string?
$ cat temp.txt
Ankit pass
amit pass
aman fail
abhay pass
asha fail
ashu fail
cat temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'*
gives no output
Another syntax with awk:
awk '$2 ~ /^fail$/{print $1}' input_file
The useless 'cat' command is dropped.
^ matches the start of the string
$ matches the end of the string
Anchoring both ends is the best way to match the pattern exactly.
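A quick illustration of why the anchors matter (hypothetical input where field 2 is "failed", not "fail"):
$ printf 'bob\tfailed\n' | awk '$2 ~ /fail/ {print $1}'
bob
$ printf 'bob\tfailed\n' | awk '$2 ~ /^fail$/ {print $1}'
$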
Either:
your fields are not tab-separated, or
you have blanks at the end of the relevant lines, or
you have DOS line-endings and so there are CRs at the end of every line, and so also at the end of every $2 in every line (see Why does my tool output overwrite itself and how do I fix it?)
With GNU cat you can run cat -Tev temp.txt to see tabs (^I), CRs (^M) and line endings ($).
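If DOS line endings turn out to be the culprit, one option (a sketch, not the only fix) is to strip the CR before comparing; modifying $0 makes awk re-split the fields:
awk -F'\t' '{sub(/\r$/,"")} $2 == "fail" {print $1}' temp.txt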
Your code seems to work fine when I remove the * at the end
cat temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'
The other thing to check is if your file is using tab or spaces. My copy/paste of your data file copied spaces, so I needed this line:
cat temp.txt | awk '$2 == "fail" { print $1 }'
The other way of doing this is with grep:
cat temp.txt | grep 'fail$' | awk '{ print $1 }'

Count number of ';' in column

I use the following command to count the number of ; characters in the first line of a file:
awk -F';' '(NR==1){print NF;}' $filename
I would like to do the same with all lines in the file, that is to say, print the count of ; for every line.
What I have:
$ awk -F';' '(NR==1){print NF;}' $filename
11
What I would like to have:
11
11
11
11
11
11
A straightforward method to count ; per line:
awk '{print gsub(/;/,"&")}' Input_file
To remove empty lines try:
awk 'NF{print gsub(/;/,"&")}' Input_file
To do it the OP's way, subtract 1 from the value of NF:
awk -F';' '{print (NF-1)}' Input_file
OR
awk -F';' 'NF{print (NF-1)}' Input_file
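A quick sanity check that both approaches agree (illustrative):
$ echo 'a;b;c' | awk '{print gsub(/;/,"&")}'
2
$ echo 'a;b;c' | awk -F';' '{print (NF-1)}'
2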
I'd say you can solve your problem with the following:
awk -F';' '{if (NF) {a += NF-1;}} END {print a}' test.txt
You want to keep a running count of all the occurrences made (variable a).
As NF will return the number of fields, which is one more than the number of separators, you'll need to subtract 1 for each line. This is the NF-1 part.
However, you don't want to count "-1" for the lines in which there is no separator at all. To skip those you need the if (NF) part.
Here's a (perhaps contrived) example:
$ cat test.txt
;;
; ; ; ;;
; asd ;;a
a ; ;
$ awk -F';' '{if (NF) {a += NF-1;}} END {print a}' test.txt
12
Notice the empty line at the end (to test against the "no separator" case).
A different approach using tr and wc:
$ tr -cd ';' < file | wc -c
42
Your code returns a number one more than the number of semicolons; NF is the number of fields you get from splitting on a semicolon (so for example, if there is one semicolon, the line is split in two).
If you want to sum this number over all lines, that's easy:
awk -F ';' '{ sum += NF-1 } END { print sum }' "$filename"
If the number of fields is consistent, you could also just count the number of lines and multiply:
awk -F ';' 'END { print NR * (NF-1) }' "$filename"
But that's obviously wrong if you can't guarantee that all lines contain exactly the same number of fields.
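Note the empty-line caveat from the earlier answer applies here too; on the contrived test.txt above (which ends with an empty line, so NF-1 contributes -1 there), the unguarded sum comes out one short:
$ awk -F';' '{ sum += NF-1 } END { print sum }' test.txt
11
$ awk -F';' 'NF { sum += NF-1 } END { print sum }' test.txt
12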

Extracting whole line if a character is present at certain position

I am a Java programmer and a newbie to shell scripting. I have a daunting task: parse multi-gigabyte logs and look for lines where '1' (just 1, no quotes) is present at the 446th position of the line. I am able to verify that the character 1 is present by running cat *.log | cut -c 446-446 | sort | uniq -c, but I am not able to extract the matching lines and print them to an output file.
awk '{if (substr($0,446,1) == "1") {print $0}}' file
is the basic approach.
You can use FILENAME in the print to add the file name to the output, so you could do
awk '{if (substr($0,446,1) == "1") {print FILENAME ":" $0}}' file1 file2 ...
IHTH
Try adding grep to the pipe:
grep '^.\{445\}1.*$'
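The same match with extended regexes, where the braces need no backslashes (file.log stands in for your logs):
grep -E '^.{445}1' file.log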
You can use an awk command for that:
awk 'substr($0, 446, 1) == "1"' file.log
The substr function extracts 1 character at position 446, and == "1" ensures that character is 1.
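To collect the matches from all the logs into an output file (matches.txt is just a hypothetical name):
awk 'substr($0, 446, 1) == "1"' *.log > matches.txt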
Another in awk. With an empty field separator every character becomes its own field (a GNU awk extension). To make a more sane example, we print lines where the third char is 3:
$ cat file
123 # this
456 # not this
$ awk -F '' '$3==3' file
123 # this
based on that example but untested:
$ awk -F '' '$446==1' file

tr command in awk to change the column values

I am using the tr command inside awk in my shell script to mask data. In the example below, only the first line of my file is affected when I use tr inside awk. When I use the same command in a while loop and call awk inside it, it works fine, but it takes a very long time to complete. Now my requirement: I want to mask several columns [for example: $1, $5, $9] in the same file (file.txt), it should affect the whole file, not just the first line, and I want the masking to run as fast as possible. Please advise.
cat file.txt
========
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek,lskjsjshsh
abcbchs,degehek
abcbchs,degehek,lskjsjshsh
OUTPUT
awk -F"," -v OFS="," '{ "echo \""$1"\" | tr \"a-c\" \"e-f\" | tr \"0-5\" \"6-9\"" | getline $1 }7' file.txt
effffhs,degehek
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek,lskjsjshsh
abcbchs,degehek
abcbchs,degehek,lskjsjshsh
Expected output
effffhs,degehek
effffhs,degehek
effffhs,degehek
effffhs,degehek
effffhs,degehek
effffhs,degehek,lskjsjshsh
effffhs,degehek
effffhs,degehek,lskjsjshsh
The code you found runs an external shell command pipeline on each input line. As you discovered, that's an awfully inefficient way to do what you are asking. Awk isn't really an ideal choice for this task at all. Maybe try Perl.
perl -F, -lane 'for my $i (0, 4, 8) { next if $i >= @F; $F[$i] =~ tr/a-c/e-f/; $F[$i] =~ tr/0-5/6-9/ } print join(",", @F)' file
The -F, option sets the field separator as in Awk, but Perl doesn't automatically split the input line. With -a it does, splitting into an array named @F, and with -n it loops over all input lines. The -l is a convenience that removes the newline from each input line and adds one back when you print.
Notice how the columns are numbered from zero, not from one as in Awk; so the indices in the loop access the first, fifth, and ninth elements of @F. The next guard skips indices past the end of shorter lines so they don't gain empty trailing fields.
You forgot to close() the command after every invocation. Here's the correct way to write it:
$ cat tst.awk
BEGIN { FS=OFS="," }
{
    cmd = "echo '" $1 "' | tr 'a-c' 'e-f' | tr '0-5' '6-9'"
    $1 = ( (cmd | getline line) > 0 ? line : $1 )
    close(cmd)
    print
}
$ awk -f tst.awk file
effffhs,degehek
effffhs,degehek
effffhs,degehek
effffhs,degehek
effffhs,degehek
effffhs,degehek,lskjsjshsh
effffhs,degehek
effffhs,degehek,lskjsjshsh
You also didn't protect yourself from getline failures, hence the extra complexity around the getline call, see http://awk.info/?tip/getline.
Given your comments, this shows how to modify multiple fields (1, 3, and 5 in this case) simultaneously:
$ cat tst.awk
BEGIN { FS=OFS="," }
{
    cmd = "echo '" $0 "' | tr 'a-c' 'e-f' | tr '0-5' '6-9'"
    new = ( (cmd | getline line) > 0 ? line : $0 )
    close(cmd)
    split(new,tmp)
    for (i in tmp) {
        if (i ~ /^(1|3|5)$/) {
            $i = tmp[i]
        }
    }
    print
}
$ cat file
abc,abc,abc,abc,abc
abc,abc,abc,abc,abc,abc,abc
abc,abc,abc,abc,abc,abc
abc,abc,abc,abc
$ awk -f tst.awk file
eff,abc,eff,abc,eff
eff,abc,eff,abc,eff,abc,abc
eff,abc,eff,abc,eff,abc
eff,abc,eff,abc
To handle quotes in the input data:
$ cat tst.awk
BEGIN { FS=OFS="," }
{
    gsub(/'/,SUBSEP)
    cmd = "echo '" $0 "' | tr 'a-c' 'e-f' | tr '0-5' '6-9'"
    new = ( (cmd | getline line) > 0 ? line : $0 )
    close(cmd)
    split(new,tmp)
    for (i in tmp) {
        if (i ~ /^(1|3|5)$/) {
            $i = tmp[i]
        }
    }
    gsub(SUBSEP,"'")
    print
}
$ cat file
a'c,abc,a"c,abc,abc
abc,a'c,abc,a"c,abc,abc,abc
abc,abc,abc,abc,abc,abc
abc,abc,abc,abc
$ awk -f tst.awk file
e'f,abc,e"f,abc,eff
eff,a'c,eff,a"c,eff,abc,abc
eff,abc,eff,abc,eff,abc
eff,abc,eff,abc
If you don't have any particular control char that's guaranteed not to appear in your input, you can create a non-existent string to use instead of SUBSEP above by using the technique described at the end of https://stackoverflow.com/a/29237745/1745001
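For completeness, the masking can also be done natively in awk with no subprocess per line at all. This is only a sketch, not the answer above: the from/to strings spell out the same character mappings as tr 'a-c' 'e-f' and tr '0-5' '6-9' (a->e, b->f, c->f, 0->6 ... 3->9, 4->9, 5->9), and columns 1, 3 and 5 are assumed:
$ cat mask.awk
BEGIN {
    FS = OFS = ","
    from = "abc012345"        # characters to translate
    to   = "eff678999"        # their replacements, position for position
    split("1 3 5", want, " ") # columns to mask
}
{
    for (w in want) {
        f = want[w]
        if (f > NF) continue             # skip columns missing on short lines
        out = ""
        for (j = 1; j <= length($f); j++) {
            c = substr($f, j, 1)
            k = index(from, c)           # 0 if the char is not in "from"
            out = out (k ? substr(to, k, 1) : c)
        }
        $f = out
    }
    print
}
$ awk -f mask.awk file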

Linux file splitting

I am using sed to split a file in two.
I have a file that has a custom separator "/-sep-/", and I want to split the file where the separator is.
Currently I have:
sed -n '1,/-sep-/ {p}' /export/data.temp > /export/data.sql.md5
sed -n '/-sep-/,$ {p}' /export/data.temp > /export/data.sql
But file 1 contains /-sep-/ at the end, and file 2 begins with /-sep-/.
How can I handle this?
Note that in file 1 I should remove a line break and the /-sep-/, and in file 2 remove the /-sep-/ and a line break.
Reverse it: tell sed what not to print instead.
sed '/-sep-/Q' /export/data.temp > /export/data.sql.md5
sed '1,/-sep-/d' /export/data.temp > /export/data.sql
(Q quits without printing the current line; it is a GNU sed extension. Regarding that line break, I did not understand it. A sample input would probably help.)
By the way, your original code needs only minor addition to do what you want:
sed -n '1,/-sep-/{/-sep-/!p}' /export/data.temp > /export/data.sql.md5
sed -n '/-sep-/,${/-sep-/!p}' /export/data.temp > /export/data.sql
$ cat >testfile
a
a
a
a
/-sep-/
b
b
b
b
and then
$ csplit testfile '/-sep-/' '//'
8
8
8
$ head -n 999 xx*
==> xx00 <==
a
a
a
a
==> xx01 <==
/-sep-/
==> xx02 <==
b
b
b
b
sed -n '/-sep-/q; p' /export/data.temp > /export/data.sql.md5
sed -n '/-sep-/,$ {p}' /export/data.temp | sed '1d' > /export/data.sql
Might be easier to do in one pass with awk:
awk -v out=/export/data.sql.md5 -v f2=/export/data.sql '
    /-sep-/ { out=f2; next }
    { print > out }
' /export/data.temp
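For instance, applied to the testfile from the csplit answer above (part1 and part2 are hypothetical output names):
$ awk -v out=part1 -v f2=part2 '/-sep-/ { out=f2; next } { print > out }' testfile
$ head -n 999 part1 part2
==> part1 <==
a
a
a
a

==> part2 <==
b
b
b
b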
