awk - Delimiter as combination of number and | (pipe) not working - linux

I have an input file with some records as below,
input.txt
Record|111|aaa|aaa|11|1-bb|bb|1111|cccc|cccc
Record|11|1-aaa|aaa|111|bb|bb|1111|cccc|cccc
Record|111|aaa|aaa|11|1-bb|bb|1111|cccc|cccc
Record|111|aaa|aaa|111|bb|bb|11|1-cccc|cccc
Record|22|aaa|aaa|222|bb|bb|2222|cccc|cccc|11|1-dddd|dd
Record|333|aaa|aaa|11|1-bb|bb|333|cccc|cccc
Record|11|1-aaa|aaa|102|bb|bb|1111|cccc|cccc
I want to use |11| as the delimiter in awk and get the second field. I tried the most common way, as below:
Command
awk -F'|11|' '{print $2}' input.txt
Output
1|aaa|aaa|
|1-aaa|aaa|
1|aaa|aaa|
1|aaa|aaa|
|1-dddd|dd
|1-bb|bb|333|cccc|cccc
|1-aaa|aaa|102|bb|bb|
Expected Output
1-bb|bb|1111|cccc|cccc
1-aaa|aaa|111|bb|bb|1111|cccc|cccc
1-bb|bb|1111|cccc|cccc
1-cccc|cccc
1-dddd|dd
1-bb|bb|333|cccc|cccc
1-aaa|aaa|102|bb|bb|1111|cccc|cccc
Basically, it's not considering the last | of the delimiter |11|; instead it seems to be using |11 as the delimiter.
I tried all of the following; none gave me the expected output:
awk -F"|11|" '{print $2}' input.txt # gives wrong output
awk -F\|11\| '{print $2}' input.txt # gives wrong output
awk -v FS='|11|' '{print $2}' input.txt # gives wrong output
Finally, I had to write a for loop inside awk with | as the delimiter to make it work. I would like to know why the simple solution doesn't work.

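The argument to -F is a regex, and | is the regex alternation operator. Left unescaped, |11| therefore ends up splitting on 11 alone, which is what your output shows. Escape the pipes so that they match literally: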
awk -F "\\\|11\\\|" '{print $2}' file
or
awk -F '\\|11\\|' '{print $2}' file
or (Thanks to EdMorton)
awk -F'[|]11[|]' '{print $2}' input.txt
Output:
1-bb|bb|1111|cccc|cccc
1-aaa|aaa|111|bb|bb|1111|cccc|cccc
1-bb|bb|1111|cccc|cccc
1-cccc|cccc
1-dddd|dd
1-bb|bb|333|cccc|cccc
1-aaa|aaa|102|bb|bb|1111|cccc|cccc
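
As a quick check of the bracket-expression form, here is a minimal sketch that runs it on one record from the question and also counts the fields. The variable names line, second and count are only illustrative.

```shell
# Sample record taken from input.txt in the question.
line='Record|111|aaa|aaa|11|1-bb|bb|1111|cccc|cccc'

# [|] is a bracket expression holding a literal pipe, so the separator
# regex matches the exact string |11| and nothing shorter.
second=$(printf '%s\n' "$line" | awk -F'[|]11[|]' '{ print $2 }')
printf '%s\n' "$second"

# Counting fields shows how many times the record was cut.
count=$(printf '%s\n' "$line" | awk -F'[|]11[|]' '{ print NF }')
printf '%s\n' "$count"
```

The record contains |111| and |1111| as well, but neither contains the full string |11|, so the record is cut in exactly one place.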

Cyrus explained why your delimiter does not work as expected (a combination of regular expression quoting issues).
With sed, removing everything up to and including the |11| on each line:
$ sed 's/.*|11|//' input.txt
1-bb|bb|1111|cccc|cccc
1-aaa|aaa|111|bb|bb|1111|cccc|cccc
1-bb|bb|1111|cccc|cccc
1-cccc|cccc
1-dddd|dd
1-bb|bb|333|cccc|cccc
1-aaa|aaa|102|bb|bb|1111|cccc|cccc

Related

Exact Match of Word using grep

I have data in file.txt as follows
BRAD CHICAGO|NORTH SAMSONCHESTER|
CORA|NEW ERICA|
CAMP LOGAN|KINGBERG|
NCHICAGOS|ESTING|
CHICAGO|MANKING|
OCREAN|CHICAGO|
CHICAGO PIT|BULL|
CHICAGO |NEWYORK|
Question 1:
I want to search for an exact match of the word "CHICAGO" in the first column and print the second column.
Output should look like:
MANKING
NEWYORK
Question 2:
If multiple matches are found, can we limit the output to only one, so that the output is only MANKING or NEWYORK?
I tried the below:
grep -E -i "^CHICAGO" file.txt | awk -F '|' '{print $2}'
but I am getting the below output:
MANKING
BULL
NEWYORK
Expected output for Question 1:
MANKING
NEWYORK
Expected output for Question 2:
MANKING

Here are some more ways. The " *" before the pipe allows for trailing spaces after CHICAGO, as in the CHICAGO |NEWYORK| row.
Using grep and cut:
grep "^CHICAGO *|" file.txt | cut -d'|' -f2
Using awk:
awk -F"|" '/^CHICAGO *\|/{print $2}' file.txt
For question 2, simply pipe it to head, i.e.:
grep "^CHICAGO *|" file.txt | cut -d'|' -f2 | head -n1
Similarly for the awk command.

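How about an awk solution?
awk -F'|' '$1 == "CHICAGO"{print $2}' file
Note that == is a strict string comparison, so this does not match the row with a trailing space (CHICAGO |NEWYORK|).
To print only one result, exit once you have a match, i.e.
awk -F'|' '$1 == "CHICAGO"{print $2; exit}' file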
Making that more generic, you can pass in a variable, i.e.
awk -v trgt="CHICAGO" -F'|' '{targ="^" trgt " *$"; if ( $1 ~ targ ) {print $2}}' file
The " *$" part of the regex allows zero or more trailing spaces, but no other characters, after the target string. So this matches both CHICAGO and "CHICAGO " while skipping CHICAGO PIT|BULL.
This can be further reduced to
awk -v trgt="CHICAGO" -F'|' '{ if ( $1 ~ "^" trgt " *$" ) {print $2}}' file
constructing the regex in place as part of the comparison.
So you could use more verbose variable names to "describe" how the regex is being constructed from the input and the regex "wrappers" (as in the 3rd example), or you can just combine the input variable with the regex syntax in place. That is just a matter of taste or documentation conventions.
You might want to include a comment to explain that you are constructing a regex test that would look like $1 ~ /^CHICAGO *$/.
IHTH
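
One caveat with building a regex from a variable is that any regex metacharacters in the target (a dot, for instance) are interpreted rather than matched literally. A possible alternative, sketched here under the assumption that only trailing spaces need to be ignored, is to strip them from a copy of $1 and compare as a plain string. The names data, key and matches are illustrative and not from the original answers.

```shell
# Sample rows from file.txt in the question, including the trailing-space case.
data='CHICAGO|MANKING|
CHICAGO PIT|BULL|
CHICAGO |NEWYORK|
NCHICAGOS|ESTING|'

# Copy $1, strip trailing spaces from the copy, then compare as a string.
matches=$(printf '%s\n' "$data" |
  awk -F'|' -v trgt="CHICAGO" '{ key = $1; sub(/ +$/, "", key) } key == trgt { print $2 }')
printf '%s\n' "$matches"
```

Adding "; exit" after the print would stop at the first match, as in the earlier command.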

replace sed command text inline

I have this file
file.txt
unknown#mail.com||unknown#mail.com||
unknown#mail2.com||unknown#mail2.com||
unknown#mail3.com||unknown#mail3.com||
unknown#mail4.com||unknown#mail4.com||
unknownpass
unknownpass2
unknownpass3
unknownpass4
How can I use the sed command to obtain this:
unknown#mail.com|unknownpass|unknown#mail.com|unknownpass|
unknown#mail2.com|unknownpass2|unknown#mail2.com|unknownpass2|
unknown#mail3.com|unknownpass3|unknown#mail3.com|unknownpass3|
unknown#mail4.com|unknownpass4|unknown#mail4.com|unknownpass4|

This might work for you (GNU sed):
sed ':a;N;/\n[^|\n]*$/!ba;s/||\([^|]*\)||\(\n.*\)*\n\(.*\)$/|\3|\1|\3|\2/;P;D' file
Slurp the first part of the file, plus one of the replacement lines, into the pattern space; substitute; print and delete the first line; then repeat.

Well, this does use sed anyway:
{ sed -n 5,\$p file.txt; sed 4q file.txt; } | awk 'NR<5{a[NR]=$0; next}
{$2=a[NR-4]; $4=a[NR-4]} 1' FS=\| OFS=\|

awk to the rescue!
awk 'BEGIN {FS=OFS="|"}
NR==FNR {if(NF==1) a[++c]=$1; next}
NF>4 {$2=a[FNR]; $4=$2; print}' file{,}
This is a two-pass algorithm: it caches the entries in the first pass and inserts them into the empty fields in the second. It assumes the number of items matches.
Here is another approach with one pass, using awk wrapped in tac:
tac file |
awk 'BEGIN {FS=OFS="|"}
NF==1 {a[++c]=$1}
NF>4 {$2=a[++i]; $4=$2; print}' |
tac
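
A single pass is also possible without tac if the address lines are buffered rather than the passwords. The following is only a sketch: it assumes every address line comes before every password line and that the counts match, and the names input, first, third and merged are made up for the illustration.

```shell
# Shortened sample in the same layout as file.txt.
input='unknown#mail.com||unknown#mail.com||
unknown#mail2.com||unknown#mail2.com||
unknownpass
unknownpass2'

merged=$(printf '%s\n' "$input" | awk '
BEGIN { FS = OFS = "|" }
# Address line: remember fields 1 and 3 and move on.
NF > 1 { first[++n] = $1; third[n] = $3; next }
# Password line: emit the merged record for the next buffered address.
{ ++i; print first[i], $1, third[i], $1, "" }
')
printf '%s\n' "$merged"
```

The trailing empty string in the print statement reproduces the closing | of the desired output.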

I would combine the related lines with paste and reshuffle the elements with awk (I assume the related lines are exactly half a file away):
n=$(wc -l < file.txt)
paste -d'|' <(head -n $((n/2)) file.txt) <(tail -n $((n/2)) file.txt) |
awk '{ print $1, $6, $3, $6, "" }' FS='|' OFS='|'
Output:
unknown#mail.com|unknownpass|unknown#mail.com|unknownpass|
unknown#mail2.com|unknownpass2|unknown#mail2.com|unknownpass2|
unknown#mail3.com|unknownpass3|unknown#mail3.com|unknownpass3|
unknown#mail4.com|unknownpass4|unknown#mail4.com|unknownpass4|

Grep entire line after word

What would be the grep command to get everything in the line after a match?
For example on a file path:
/home/usr/we/This/is/the/file/path
and I want the output to be
/we/This/is/the/file/path
Matching the /we as the regex.

grep -o does what you want.
grep -o '/we.*'

The OP would like to use we as a trigger. Using awk:
awk -F/ '{for (i=1;i<=NF;i++) {if ($i~/we/) f=1;if (f) printf "/%s",$i}print ""}' file
/we/This/is/the/file/path
Using gnu awk
awk '{print gensub(/.*(\/we)/,"\\1","g")}' file
/we/This/is/the/file/path

YourInput | sed 's|/home/usr\(/we.*\)|\1|'
assuming the path always (and only) starts with /home/usr;
otherwise
YourInput | sed -n 's|^.*\(/we.*\)|\1|p'
which returns only the line(s) containing /we and removes the text before /we.
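
If the path is already in a shell variable, the same trimming can be done without grep or sed. This is a minimal sketch using POSIX parameter expansion; the variable names path and rest are illustrative.

```shell
path='/home/usr/we/This/is/the/file/path'

# ${path#*/we} drops the shortest prefix that ends in /we;
# the marker itself is then put back in front.
rest="/we${path#*/we}"
printf '%s\n' "$rest"
```

Unlike sed -n with the p flag, this does not detect a missing /we: if the marker is absent the expansion leaves the value unchanged, so a separate check would be needed in that case.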

awk print value without quote sign

I have this value
option 'staticip' '5.5.5.1'
I want to print only 5.5.5.1, without the quote signs. I have used
cat /etc/filename | grep staticip | awk '{print $3}'
but the result comes out as '5.5.5.1'

Or, you can use tr to remove the offending characters:
cat /etc/filename | grep staticip | awk '{print $3}' | tr -d \'

You can use awk's gsub() function to change the quotes to nothing.
awk '{gsub(/'"'"'/, "", $3); print $3}'
Note this is really gsub(/'/, "", $3). The ugliness comes from the need to glue quotes together.
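
One way to avoid gluing quotes together is to write the single quote as the octal escape \047 inside the awk regex. A small sketch, assuming the line format shown in the question; it also folds the grep step into awk's own pattern. The variable names line and ip are illustrative.

```shell
# The sample line from the question.
line="option 'staticip' '5.5.5.1'"

# \047 is the octal code for a single quote, so no shell quote juggling is needed.
ip=$(printf '%s\n' "$line" | awk '/staticip/ { gsub(/\047/, "", $3); print $3 }')
printf '%s\n' "$ip"
```

With a real file, the printf would be replaced by passing /etc/filename as an argument to awk.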

awk '$2=="staticip" && $0=$4' FS="'"
Result
5.5.5.1

To remove the ' from the awk output, you can use
sed "s/^'//;s/'$//"
This command removes the ' only at the beginning and the end of the output line. It is lighter than another awk call and more targeted than tr:
awk is much bigger in memory, and tr removes every ' from the output, which is not always intended.

You could use awk's substr function, or pipe the output to the cut command. I leave you to read the man page for awk's substr.

Multisplitting in AWK

I would like to perform two splits using AWK (I have two field separators). The string of data I'm working on looks something like this:
data;digit&int&string&int&digit;data;digit&int&string&int&digit
As you can see, the outer field separator is a semicolon and the nested one is an ampersand.
What I'm doing with awk is this (suppose the string is in a variable named test):
echo ${test} | awk '{FS=";"} {print $2}' | awk '{FS="&"} {print $3}'
This should catch the "string" word, but for some reason it is not working.
It seems like the second pipe is not being applied, as I see only the result of the first awk command.
Any advice?

use awk arrays
echo $test | awk -F';' '{split($2, arr, "&"); print(arr[3])}'

The other answers give working solutions, but they don't really explain the problem.
The problem is that setting FS inside a regular { ... } block of the awk script won't cause $1, $2, etc. to be recalculated for the current line. FS will be set for any later lines, but the very first line will already have been split on whitespace. To set FS before the script runs, you can use a BEGIN block (which is run before the first line is read), or you can use the -F command-line option.
Making either of those changes will fix your command:
echo "$test" | awk 'BEGIN{FS=";"} {print $2}' | awk 'BEGIN{FS="&"} {print $3}'
echo "$test" | awk -F';' '{print $2}' | awk -F'&' '{print $3}'
(I also took the liberty of wrapping $test in double-quotes, since unquoted parameter-expansions are a recipe for trouble. With your value of $test it would have been fine, but I make it a habit to always use double-quotes, just in case.)
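
A small check makes the timing visible. This is a sketch with made-up two-line input (the input in the question has only one line), it assumes a POSIX-conforming awk such as gawk or mawk, and the variable names late and early are illustrative.

```shell
# FS assigned in a main block, as in the question: it only applies
# to records read after the assignment.
late=$(printf 'a;b\nc;d\n' | awk '{ FS = ";" } { print $1 }')

# FS assigned in BEGIN: it is in place before the first record is read.
early=$(printf 'a;b\nc;d\n' | awk 'BEGIN { FS = ";" } { print $1 }')

printf 'late:\n%s\nearly:\n%s\n' "$late" "$early"
```

With such an awk, late holds a;b followed by c, because the first record was already split on whitespace before FS changed, while early holds a followed by c.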

Try this:
echo "$test" | awk -F'[;&]' '{print $4}'
Here -F'[;&]' specifies both separators at once, so the line is split on either ; or &.
