Multisplitting in AWK - linux

I would like to perform two splits using AWK (I have two field separators). The string of data I'm working on looks something like this:
data;digit&int&string&int&digit;data;digit&int&string&int&digit
As you can see the outer field separator is a semicolon, and the nested one is an ampersand.
What I'm doing with awk is this (suppose the string is in a variable named test):
echo ${test} | awk '{FS=";"} {print $2}' | awk '{FS="&"} {print $3}'
This should catch the word "string", but for some reason it is not working.
It seems the second awk in the pipeline is not being applied, as I see only the result of the first awk command.
Any advice?

Use awk's split() function to break the second field into an array:
echo $test | awk -F';' '{split($2, arr, "&"); print(arr[3])}'
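The same split() idea can be applied to every outer field, not just $2. A minimal sketch, assuming the sample string from the question; the names nested and parts are only illustrative:

```shell
test='data;digit&int&string&int&digit;data;digit&int&string&int&digit'

# Split on ";" first, then split each outer field on "&" and keep the
# third inner item whenever the field has at least three of them.
nested=$(printf '%s\n' "$test" | awk -F';' '{
    for (i = 1; i <= NF; i++) {
        n = split($i, parts, "&")
        if (n >= 3) print parts[3]
    }
}')

printf '%s\n' "$nested"
```

With the sample string this prints the word string once for each nested group.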

The other answers give working solutions, but they don't really explain the problem.
The problem is that setting FS inside a regular { ... } block of the awk script won't cause $1, $2, etc. to be re-calculated for the current line; so FS will be set for any later lines, but the very first line will already have been split by whitespace. To set FS before running the script, you can use a BEGIN block (which is run before the first line); or, you can use the -F command-line option.
Making either of those changes will fix your command:
echo "$test" | awk 'BEGIN{FS=";"} {print $2}' | awk 'BEGIN{FS="&"} {print $3}'
echo "$test" | awk -F';' '{print $2}' | awk -F'&' '{print $3}'
(I also took the liberty of wrapping $test in double-quotes, since unquoted parameter-expansions are a recipe for trouble. With your value of $test it would have been fine, but I make it a habit to always use double-quotes, just in case.)
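To see the timing difference side by side, here is a small sketch using the sample string from the question. The variable names late, early and flag are only illustrative, and the behaviour of the first command is what a POSIX-conforming awk such as gawk should do; some older implementations split fields lazily and may differ.

```shell
test='data;digit&int&string&int&digit;data;digit&int&string&int&digit'

# FS assigned in the main block: the first record has normally been split
# on whitespace before this assignment runs, so $2 here is not the
# semicolon-delimited field.
late=$(printf '%s\n' "$test" | awk '{FS=";"} {print $2}')

# FS assigned in BEGIN: in effect before the first record is read.
early=$(printf '%s\n' "$test" | awk 'BEGIN{FS=";"} {print $2}')

# -F on the command line has the same effect as the BEGIN assignment.
flag=$(printf '%s\n' "$test" | awk -F';' '{print $2}')

printf 'late=[%s]\nearly=[%s]\nflag=[%s]\n' "$late" "$early" "$flag"
```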

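Try this:
echo "$test" | awk -F'[;&]' '{print $4}'
The bracket expression in -F'[;&]' makes both ; and & field separators, so the line is split in a single pass and "string" is the fourth field.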
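To check which field number a value lands in when both characters are separators, the fields can be printed with their index. A minimal sketch, assuming the sample string from the question (the variable name numbered is only illustrative):

```shell
test='data;digit&int&string&int&digit;data;digit&int&string&int&digit'

# Print the first six fields with their index when both ";" and "&"
# act as field separators.
numbered=$(printf '%s\n' "$test" | awk -F'[;&]' '{
    for (i = 1; i <= 6; i++) print i ": " $i
}')

printf '%s\n' "$numbered"
```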

Related

awk - Delimiter as combination of number and | (pipe) not working

I have an input file with some records as below,
input.txt
Record|111|aaa|aaa|11|1-bb|bb|1111|cccc|cccc
Record|11|1-aaa|aaa|111|bb|bb|1111|cccc|cccc
Record|111|aaa|aaa|11|1-bb|bb|1111|cccc|cccc
Record|111|aaa|aaa|111|bb|bb|11|1-cccc|cccc
Record|22|aaa|aaa|222|bb|bb|2222|cccc|cccc|11|1-dddd|dd
Record|333|aaa|aaa|11|1-bb|bb|333|cccc|cccc
Record|11|1-aaa|aaa|102|bb|bb|1111|cccc|cccc
I want to use |11| as the delimiter in awk and get the second field. I tried the most common way, as below:
Command
awk -F'|11|' '{print $2}' input.txt
Output
1|aaa|aaa|
|1-aaa|aaa|
1|aaa|aaa|
1|aaa|aaa|
|1-dddd|dd
|1-bb|bb|333|cccc|cccc
|1-aaa|aaa|102|bb|bb|
Expected Output
1-bb|bb|1111|cccc|cccc
1-aaa|aaa|111|bb|bb|1111|cccc|cccc
1-bb|bb|1111|cccc|cccc
1-cccc|cccc
1-dddd|dd
1-bb|bb|333|cccc|cccc
1-aaa|aaa|102|bb|bb|1111|cccc|cccc
Basically it's not considering the last | of the delimiter |11|; instead it seems to be using |11 as the delimiter.
I tried all of the below; none gave me the expected output:
awk -F"|11|" '{print $2}' input.txt # gives wrong output
awk -F\|11\| '{print $2}' input.txt # gives wrong output
awk -v FS='|11|' '{print $2}' input.txt # gives wrong output
Finally I had to write a for loop inside awk with | as the delimiter to make it work. I would like to know why the simple solution doesn't work.
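The argument to -F is a regex, and | is the regex alternation operator, so a literal | has to be escaped or placed in a bracket expression.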
awk -F "\\\|11\\\|" '{print $2}' file
or
awk -F '\\|11\\|' '{print $2}' file
or (Thanks to EdMorton)
awk -F'[|]11[|]' '{print $2}' input.txt
Output:
1-bb|bb|1111|cccc|cccc
1-aaa|aaa|111|bb|bb|1111|cccc|cccc
1-bb|bb|1111|cccc|cccc
1-cccc|cccc
1-dddd|dd
1-bb|bb|333|cccc|cccc
1-aaa|aaa|102|bb|bb|1111|cccc|cccc
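As a quick check on a single record, the bracket form can be compared with an approach that avoids regular expressions altogether. This is only a sketch: the index()/substr() variant is an alternative not taken from the answers above, the names by_regex, by_index and sep are illustrative, and the two forms differ when the delimiter occurs more than once (the second keeps everything after the first occurrence).

```shell
line='Record|111|aaa|aaa|11|1-bb|bb|1111|cccc|cccc'

# Bracket expression: each | is literal inside [...], so the separator
# is exactly the four characters |11|.
by_regex=$(printf '%s\n' "$line" | awk -F'[|]11[|]' '{print $2}')

# No regex at all: index() finds the literal string and substr() keeps
# everything after its first occurrence.
by_index=$(printf '%s\n' "$line" | awk -v sep='|11|' '{
    p = index($0, sep)
    if (p) print substr($0, p + length(sep))
}')

printf '%s\n%s\n' "$by_regex" "$by_index"
```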
Cyrus explained why your delimiter does not work as expected (a combination of regular expression quoting issues).
With sed, removing everything up to and including the |11| on each line:
$ sed 's/.*|11|//' input.txt
1-bb|bb|1111|cccc|cccc
1-aaa|aaa|111|bb|bb|1111|cccc|cccc
1-bb|bb|1111|cccc|cccc
1-cccc|cccc
1-dddd|dd
1-bb|bb|333|cccc|cccc
1-aaa|aaa|102|bb|bb|1111|cccc|cccc

BASH shell execution from string with positional parameters

When I try to run the code below, the shell replaces $4 and $2 with blanks (because they are not defined as bash variables). My question is: how do I keep bash from evaluating awk's positional parameters as its own variables?
I've tried putting double and single quotes around the positional parameters; however, that did not stop bash from interpreting them as shell variables instead of literal strings.
This is what is returned when I echo "$x$i$y"
date -r /root/capture/capture11.mp4 | awk '{print }' | awk -F":" '/1/ {print }'
Code:
#!/bin/sh
i=$(cat /etc/hour.conf)
x="date -r /root/capture/capture"
y=".mp4 | awk '{print $4}' | awk -F\":\" '/1/ {print $2}'"
$x$i$y
Any help would be greatly appreciated!
Variables are interpolated inside double quotes. Use single quotes, or escape them like \$2.
However, the way you're trying to split up the command into separate variables won't work. Instead, you should use a function. Then you don't need to deal with quotes and escaping at all. For instance:
do_thing() {
date -r "/root/capture/capture$1.mp4" | awk '{print $4}' | awk -F':' '/1/ {print $2}'
}
do_thing "$(cat /etc/hour.conf)"
$4 is double-quoted. Although there are single quotes around it, they sit inside the double quotes, so they are just part of the string and do not preserve the literal meaning of $. You can escape the $:
y=".mp4 | awk '{print \$4}' | awk -F\":\" '/1/ {print \$2}'"
Or, use single quotes around the whole part:
y='.mp4 | awk "{print \$4}" | awk -F':' "/1/ {print \$2}"'
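Concatenating variables like that to build a command line sort of works, but quotes within the variables don't quote anything; they'll just be taken as literal quotes. Likewise, a | stored in a variable is passed along as an ordinary argument rather than starting a pipeline.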
This sort of works (but is horrible):
$ x=prin; y="tf %f\\n"; z=" 123456789"
$ $x$y$z
123456789.000000
This doesn't do what you want:
$ z='"foo bar"'; printf $z ; echo
"foo
Instead of one argument foo bar, printf gets the two arguments "foo and bar".
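The difference can be made visible by counting arguments. A minimal sketch in plain POSIX sh; the variable names unquoted_count, first_word and quoted_count are only illustrative:

```shell
z='"foo bar"'

# Unquoted expansion: the value is split on whitespace and the embedded
# double quotes are kept as ordinary characters.
set -- $z
unquoted_count=$#
first_word=$1

# Real shell quoting: one argument containing a space.
set -- "foo bar"
quoted_count=$#

printf '%s %s %s\n' "$unquoted_count" "$first_word" "$quoted_count"
```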

Comma separated value within double quote

I have a data file separated by comma, data enclosed by "":
$ head file.txt
"HD","Sep 13 2016 1:05AM","0001"
"DT","273093045","192534"
"DT","273097637","192534" ..
I want to get the 3rd column value (0001) to be assigned to my variable.
I tried
FILE_VER=`cat file.txt | awk -F',' '{if ($1 == "HD") print $3}'`
I don't get any value assigned to FILE_VER. Please help me with correct syntax.
Another awk version:
awk -F'"' '$2 == "HD"{print $6}' file
You were almost there. Simply removing the quotes should be good enough:
foo=$(awk -F, '$1=="\"HD\""{gsub(/"/,"",$3);print $3}' file)
Not sure this is the best way, but it works:
FILE_VER=$(awk -F',' '$1 == "\"HD\"" {gsub("\"","",$3); print $3}' file.txt)
test for HD between quotes
remove quotes before printing result
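The reason the original attempt printed nothing is that, with a comma separator, the first field still includes its double quotes. A small sketch on the header line from the question (raw_first is an illustrative name):

```shell
line='"HD","Sep 13 2016 1:05AM","0001"'

# With "," as the separator, $1 still carries its double quotes,
# so comparing it with HD alone can never match.
raw_first=$(printf '%s\n' "$line" | awk -F',' '{print $1}')

# With " as the separator, the quoted values land in $2, $4 and $6.
FILE_VER=$(printf '%s\n' "$line" | awk -F'"' '$2 == "HD" {print $6}')

printf '%s %s\n' "$raw_first" "$FILE_VER"
```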
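You can translate the commas and quotes to tabs and then split on tabs (the leading quote becomes an empty first field, so the values are in $2, $3 and $4):
tr -s '",' '\t\t' < filename | awk -F'\t' '$2 == "HD" {print $4}'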
Maybe there is a solution using only awk, but this works just fine!

Extract text with any command in linux shell

How do I extract the text from the following text and store it to the variables:
05:21-09:32, 14:21-19:30
Here, I want to store 05 in one variable, 21 in another, 09 in another and so on. All the values must be stored in an array or in separate variables.
I have tried:
k="05:21-09:32, 14:21-19:30"
part1=($k | awk -F"-" '{print $1}' | awk -F":" '{print $1}')
part2=($k | awk -F"-" '{print $2}' | awk -F":" '{print $1}')
part3=($k | awk -F"," '{print $2}' | awk -F":" '{print $1}')
part4=($k | awk -F"-" '{print $3}' | awk -F":" '{print $1}')
I need a clearer or shorter solution.
You can use read with the -a option, which reads the words into an array:
IFS=':-, ' read -ra my_arr <<< "05:21-09:32, 14:21-19:30"
The above code will split the input string on colons, hyphens, commas, and spaces:
$ echo "${my_arr[0]}" "${my_arr[1]}" "${my_arr[2]}" "${my_arr[3]}"
05 21 09 32
Your code has a number of problems.
You can't pipe the value of k to standard output with just $k -- you want something like printf '%s\n' "$k" or perhaps the less portable echo "$k"
Notice also the quoting in the expression above; without it, the shell will perform wildcard expansion and whitespace tokenization on the value
Spawning two Awk processes for a simple string substitution is excessive
Spawning a separate pipeline for each value you want to extract is inefficient; if at all possible, extract everything in one go.
Something like IFS=':-, '; set -- $k will assign the parts to $1, $2, $3, $4, and so on in one go.
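If separate named variables are preferred and the script has to run under plain sh, read with a here-document is one option. This is only a sketch; the variable names h1, m1 and so on are made up for illustration.

```shell
k='05:21-09:32, 14:21-19:30'

# The IFS assignment applies only to this read command, so the shell's
# normal word splitting is unaffected afterwards.
IFS=':-, ' read -r h1 m1 h2 m2 h3 m3 h4 m4 <<EOF
$k
EOF

printf '%s %s %s %s %s %s %s %s\n' "$h1" "$m1" "$h2" "$m2" "$h3" "$m3" "$h4" "$m4"
```

Unlike the set -- approach, this leaves IFS and the positional parameters of the script untouched.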

awk print value without quote sign

I have this value
option 'staticip' '5.5.5.1'
I want to print only 5.5.5.1 without the quote signs. I have used
cat /etc/filename | grep staticip | awk '{print $3}'
but the result comes out as '5.5.5.1'
Or, you can use tr to remove the offending characters:
cat /etc/filename | grep staticip | awk '{print $3}' | tr -d \'
You can use awk's gsub() function to change the quotes to nothing.
awk '{gsub(/'"'"'/, "", $3); print $3}'
Note this is really gsub(/'/, "", $3). The ugliness comes from the need to glue quotes together.
awk '$2=="staticip" && $0=$4' FS="'"
Result
5.5.5.1
To remove the ' from the awk output you can use
sed "s/^'//;s/'$//"
This command removes the ' only at the beginning and the end of the output line. It is lighter than awk and less sweeping than tr:
awk uses much more memory, and tr removes every ' from the output, which is not always intended.
You could use awk's substr function or pipe the output to the cut command. I leave you to read the man page for awk substr.
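For reference, both suggestions can be sketched as follows, assuming the value is always wrapped in exactly one pair of single quotes as in the sample line (ip_substr and ip_cut are illustrative names):

```shell
line="option 'staticip' '5.5.5.1'"

# substr(): drop the first and last character of the third field.
ip_substr=$(printf '%s\n' "$line" | awk '{print substr($3, 2, length($3) - 2)}')

# cut: with ' as the delimiter, the address is the fourth field.
ip_cut=$(printf '%s\n' "$line" | cut -d"'" -f4)

printf '%s %s\n' "$ip_substr" "$ip_cut"
```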

Resources