Using Grep or AWK command - linux

I have a file with this text:
http://=
en.domain.com/registration.html#/?doitoken=1D7f1ad404-f84b-4a3b-8931=
-4f40b619730e
http://=
en.domain.com/registration.html#/?doitoken=5D8172f6e6-240f-42e6-8512=
-6d7f6bd61c2d
http://=
en.domain.com/registration.html#/?doitoken=8D8172f6e6-240f-42e6-8512=
-6d7f6bd61c2d
How can I do this using a grep or awk command in Linux bash:
http://en.domain.com/registration.html#/?doitoken=1D7f1ad404-f84b-4a3b-8931-4f40b619730e
http://en.domain.com/registration.html#/?doitoken=5D8172f6e6-240f-42e6-8512-6d7f6bd61c2d
http://en.domain.com/registration.html#/?doitoken=8D8172f6e6-240f-42e6-8512-6d7f6bd61c2d
Thanks for your answers!

awk 'BEGIN{FS="=\n"; RS=""; OFS=""} {print $1, $2, $3}' input_file
Note that RS="" puts awk in paragraph mode, so this assumes each three-line group is separated by a blank line. You could also drop OFS="" and remove the commas from the print statement.

Save the program as pr.awk, and run awk -f pr.awk input.dat
NF {
n++
sub(/=$/, "")
ans = ans $0
}
n==3 { # flush
print ans
ans = ""; n = 0
}
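As a quick sanity check, the same program can be run inline instead of via -f pr.awk (the three-line sample URL here is made up for illustration):

```shell
# One continued URL, split across three lines with trailing "=" markers.
printf 'http://=\nen.domain.com/x=\n-tail\n' |
awk 'NF{n++; sub(/=$/,""); ans=ans $0} n==3{print ans; ans=""; n=0}'
# → http://en.domain.com/x-tail
```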

$ awk '/=$/{sub(/=$/,""); printf "%s",$0;next} /./{print}' file
http://en.domain.com/registration.html#/?doitoken=1D7f1ad404-f84b-4a3b-8931-4f40b619730e
http://en.domain.com/registration.html#/?doitoken=5D8172f6e6-240f-42e6-8512-6d7f6bd61c2d
http://en.domain.com/registration.html#/?doitoken=8D8172f6e6-240f-42e6-8512-6d7f6bd61c2d
How it works:
/=$/{sub(/=$/,""); printf "%s",$0;next}
If the line ends with =, then remove the trailing =, print the result (without a trailing newline) and jump to the next line.
/./{print}
If we get here, then this line does not end with = and we just print it normally (with the trailing newline).
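To see it in action without creating a file, one of the sample records can be piped in directly:

```shell
# The first record from the question, as three raw lines.
printf 'http://=\nen.domain.com/registration.html#/?doitoken=1D7f1ad404-f84b-4a3b-8931=\n-4f40b619730e\n' |
awk '/=$/{sub(/=$/,""); printf "%s",$0; next} /./{print}'
# → http://en.domain.com/registration.html#/?doitoken=1D7f1ad404-f84b-4a3b-8931-4f40b619730e
```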

Related

Regex issue for match a column value

I wrote a script to extract a column value from a file when it doesn't match the pattern defined in the column-metadata file.
But it is not returning the right output. Can anyone point out the issue here? I was trying to match a string with double quotes; the quotes also need to be matched.
Code:
awk -F'|' -v n="$col_pos" -v m="$col_patt" 'NR!=1 && $n !~ "^" m "$" {
printf "%s:%s:%s\n", FILENAME, FNR, $0 > "/dev/stderr"
count++
}
END {print count}' $input_file
run output :-
++ awk '-F|' -v n=4 -v 'm="[a-z]+#gmail.com"' 'NR!=1 && $n !~ "^" m "$" {
printf "%s:%s:%s\n", FILENAME, FNR, $0 > "/dev/stderr"
count++
}
END {print count}' /test/data/infa_shared/dev/SrcFiles/datawarehouse/poc/BNX.csv
10,22,"00AF","abc#gmail.com",197,10,1/1/2020 12:06:10.260 PM,"BNX","Hard b","50","Us",1,"25" -- this line is not expected in output as it matches the email pattern "[a-z]+#gmail.com". pattern is extracted from the below file
Input file for pattern extraction file_col_metadata
FILE_ID~col_POS~COL_START_POS~COL_END_POS~datatype~delimited_ind~col_format~columnlength
5~4~~~char~Y~"[a-z]+#gmail.com"~100
If you replace awk -F'|' ... with awk -F',' ... it will work: the input file is comma-delimited, so with | as the field separator the line is never split, $4 is empty, and every data line fails the pattern match.
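A minimal sketch of the corrected call (the two-row CSV and the second email here are made up for illustration; note the pattern keeps the literal double quotes because the field is quoted in the data):

```shell
# Field 4 is the quoted email column; count rows whose email does not match.
printf '%s\n' 'id,qty,code,email' \
  '10,22,"00AF","abc#gmail.com"' \
  '11,23,"00AG","BAD#gmail.com"' |
awk -F',' -v n=4 -v m='"[a-z]+#gmail.com"' \
  'NR!=1 && $n !~ "^" m "$" {count++} END {print count+0}'
# → 1
```

Only the uppercase "BAD#gmail.com" row is counted; "abc#gmail.com" matches the pattern and is skipped.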

How to get 1st field of a file only when 2nd field matches a string?

How to get 1st field of a file only when 2nd field matches a given string?
#cat temp.txt
Ankit pass
amit pass
aman fail
abhay pass
asha fail
ashu fail
cat temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'*
gives no output
Another syntax with awk:
awk '$2 ~ /^fail$/{print $1}' input_file
This also drops the unnecessary cat command.
^ matches the start of the string
$ matches the end of the string
Anchoring both ends is a reliable way to match the exact pattern.
Either:
Your fields are not tab-separated or
You have blanks at the end of the relevant lines or
You have DOS line-endings and so there are CRs at the end of every line, and so also at the end of every $2 in every line (see Why does my tool output overwrite itself and how do I fix it?).
With GNU cat you can run cat -Tev temp.txt to see tabs (^I), CRs (^M) and line endings ($).
Your code seems to work fine when I remove the * at the end
cat temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'
The other thing to check is whether your file uses tabs or spaces. My copy/paste of your data file copied spaces, so I needed this line:
cat temp.txt | awk '$2 == "fail" { print $1 }'
The other way of doing this is with grep:
cat temp.txt | grep fail$ | awk '{ print $1 }'
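If DOS line endings (the third cause listed above) turn out to be the culprit, one sketch of a fix is to strip the CR inside awk before testing $2; this relies on sub() on $0 re-splitting the fields, which POSIX awk guarantees:

```shell
# Sample input with literal tabs and a trailing CR on every line.
printf 'Ankit\tpass\r\naman\tfail\r\n' |
awk -F'\t' '{sub(/\r$/,"")} $2 == "fail" {print $1}'
# → aman
```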

Replacing multiple command calls

I am able to trim and transpose the data below with sed, but it takes considerable time. I hope it would be faster with awk. Any suggestions are welcome.
Input Sample Data:
[INX_8_60L ] :9:Y
[INX_8_60L ] :9:N
[INX_8_60L ] :9:Y
[INX_8_60Z ] :9:Y
[INX_8_60Z ] :9:Y
Required Output:
INX?_8_60L¦INX?_8_60L¦INX?_8_60L¦INX?_8_60Z¦INX?_8_60Z
Just use awk, e.g.
awk -v n=0 '{printf (n?"!%s":"%s", substr ($0,2,match($0,/[ \t]+/)-2)); n=1} END {print ""}' file
This will be orders of magnitude faster. It just picks out the wanted substring (e.g. "INX_8_60L") using substr() and match(). n is simply used as a false/true (0/1) flag to prevent outputting a "!" before the first string.
Example Use/Output
With your data in file you would get:
$ awk -v n=0 '{printf (n?"!%s":"%s", substr ($0,2,match($0,/[ \t]+/)-2)); n=1} END {print ""}' file
INX_8_60L!INX_8_60L!INX_8_60L!INX_8_60Z!INX_8_60Z
Which appears to be what you are after. (Note: I'm not sure what your separator character is, so just change above as needed) If not, let me know and I'm happy to help further.
Edit Per-Changes
Including the '?' isn't difficult, and I just copied the character, so you would now have:
awk -v n=0 '{s=substr($0,2,match($0,/[ \t]+/)-2); sub(/_/,"?_",s); printf n?"¦%s":"%s", s; n=1}
END {print ""}' file
Example Output
INX?_8_60L¦INX?_8_60L¦INX?_8_60L¦INX?_8_60Z¦INX?_8_60Z
And to simplify, just operating on the first field as in #JamesBrown's answer, that would reduce to:
awk -v n=0 '{s=substr($1,2); sub(/_/,"?_",s); printf n?"¦%s":"%s", s; n=1} END {print ""}' file
Let me know if that needs more changes.
Don't start so many sed commands; separate the sed operations with semicolons instead.
Better yet, process the data in a single job and avoid regex. Below, substr() reads the static-sized first block and the ? is inserted while outputting.
$ awk '{
b=b (b==""?"":";") substr($1,2,3) "?" substr($1,5)
}
END {
print b
}' file
Output:
INX?_8_60L;INX?_8_60L;INX?_8_60L;INX?_8_60Z;INX?_8_60Z
If the fields are not that static in size:
$ awk '
BEGIN {
FS="[[_ ]" # split field with regex
}
{
printf "%s%s?_%s_%s",(i++?";":""), $2,$3,$4 # output semicolons and fields
}
END {
print ""
}' file
Performance of solutions for 20 M records:
Former:
real 0m8.017s
user 0m7.856s
sys 0m0.160s
Latter:
real 0m24.731s
user 0m24.620s
sys 0m0.112s
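To sanity-check the field-splitting variant, the sample data can be inlined (shortened to two lines here; separator kept as ';' as above):

```shell
# FS="[[_ ]" splits on "[", "_" or space, so $2..$4 are the three name parts.
printf '[INX_8_60L  ] :9:Y\n[INX_8_60Z  ] :9:Y\n' |
awk 'BEGIN{FS="[[_ ]"} {printf "%s%s?_%s_%s",(i++?";":""),$2,$3,$4} END{print ""}'
# → INX?_8_60L;INX?_8_60Z
```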
sed can be very fast when used gingerly, so for simplicity and speed you might wish to consider:
sed -e 's/ .*//' -e 's/\[INX/INX?/' | tr '\n' '|' | sed -e '$s/|$//'
The second call to sed is there to satisfy the requirement that there is no trailing |.
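For example, on a two-line sample (using | as the joining character, as in this answer) the pipeline produces:

```shell
# Strip everything from the first space, rewrite the "[INX" prefix,
# join lines with "|", then drop the trailing "|".
printf '[INX_8_60L  ] :9:Y\n[INX_8_60Z  ] :9:Y\n' |
sed -e 's/ .*//' -e 's/\[INX/INX?/' | tr '\n' '|' | sed -e '$s/|$//'
# → INX?_8_60L|INX?_8_60Z
```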
Another solution using GNU awk:
awk -F'[[ ]+' '
{printf "%s%s",(o?"¦":""),gensub(/INX/,"INX?",1,$2);o=1}
END{print ""}
' file
The field separator is set (with the -F option) so that $2 is the wanted parameter.
The main statement prints that parameter with the ? character inserted.
The variable o keeps track of whether the delimiter ¦ is needed.

awk add string to each line except last blank line

I have a file with a blank line at the end. I need to add a suffix to each line except that last blank line.
I use:
awk '$0=$0"suffix"' | sed 's/^suffix$//'
But maybe it can be done without sed?
UPDATE:
I want to skip all lines that contain only the '\n' symbol.
EXAMPLE:
I have file test.tsv:
a\tb\t1\n
\t\t\n
c\td\t2\n
\n
I run cat test.tsv | awk '$0=$0"\t2"' | sed 's/^\t2$//':
a\tb\t1\t2\n
\t\t\t2\n
c\td\t2\t2\n
\n
It sounds like this is what you need:
awk 'NR>1{print prev "suffix"} {prev=$0} END{ if (NR) print prev (prev == "" ? "" : "suffix") }' file
The test for NR in the END is to avoid printing a blank line given an empty input file. It's untested, of course, since you didn't provide any sample input/output in your question.
To treat all empty lines the same:
awk '{print $0 (/./ ? "suffix" : "")}' file
#try:
awk 'NF{print $0 "suffix"}' Input_file
this will skip all blank lines
awk 'NF{$0=$0 "suffix"}1' file
to only skip the last line if blank
awk 'NR>1{print p "suffix"} {p=$0} END{print p (NF?"suffix":"") }' file
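A quick check of that last variant (it relies on NF still holding the last record's field count inside the END block, which POSIX awk guarantees):

```shell
# Two content lines plus a trailing blank line; only the blank line
# at the end of the file is left without the suffix.
printf 'one\ntwo\n\n' |
awk 'NR>1{print p "suffix"} {p=$0} END{print p (NF?"suffix":"")}'
# → onesuffix, twosuffix, then the final blank line kept without a suffix
```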
If perl is okay:
$ cat ip.txt
a b 1

c d 2
$ perl -lpe '$_ .= "\t 2" if !(eof && /^$/)' ip.txt
a b 1 2
2
c d 2 2
$ # no blank line for empty file as well
$ printf '' | perl -lpe '$_ .= "\t 2" if !(eof && /^$/)'
$
-l strips newline from input, adds back when line is printed at end of code due to -p option
eof to check end of file
/^$/ blank line
$_ .= "\t 2" append to input line
Try this -
$ cat f ###Blank line only in the end of file
-11.2
hello
$ awk '{print (/./?$0"suffix":"")}' f
-11.2suffix
hellosuffix
$
OR
$ cat f ####blank line in middle and end of file
-11.2
hello
$ awk -v val=$(wc -l < f) '{print (/./ || NR!=val?$0"suffix":"")}' f
-11.2suffix
suffix
hellosuffix
$

Replacing a line with two new lines

I have a file named abc.csv which contains these 6 lines:
xxx,one
yyy,two
zzz,all
aaa,one
bbb,two
ccc,all
Now whenever all appears in a line, that line should be replaced by two lines, one with one and one with two, like this:
xxx,one
yyy,two
zzz,one
zzz,two
aaa,one
bbb,two
ccc,one
ccc,two
Can someone help how to do this?
$ awk -F, -v OFS=, '/all/ { print $1, "one"; print $1, "two"; next }1' foo.input
xxx,one
yyy,two
zzz,one
zzz,two
aaa,one
bbb,two
ccc,one
ccc,two
If you want to stick to a shell-only solution:
while read line; do
if [[ "${line}" = *all* ]]; then
echo "${line%,*},one"
echo "${line%,*},two"
else
echo "${line}"
fi
done < foo.input
In sed:
sed '/,all$/{ s/,all$/,one/p; s/,one$/,two/; }'
When the line matches ,all at the end, first substitute all with one and print it; then substitute one with two and let the automatic print do the printing.
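For example, on a shortened two-line sample:

```shell
# "zzz,all" is printed twice (as ",one" then ",two"); other lines pass through.
printf 'zzz,all\nbbb,two\n' |
sed '/,all$/{ s/,all$/,one/p; s/,one$/,two/; }'
# → zzz,one / zzz,two / bbb,two
```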
