I am trying to use two pipe characters, ||, as the delimiter in an awk command, but I cannot get it to work. My file uses || as its field separator, in the same way other files use a tab or a comma.
Just tell awk to interpret the | literally with []:
awk -F'[|][|]' ...
Example:
» echo "1 || 2" | awk -F'[|][|]' '{ print $2 }'
2
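As a slightly larger illustration of the same separator, the sketch below splits a line with three fields and prints each one with its index. The sample line is made up for the example.

```shell
# Hypothetical sample line; fields are separated by a literal "||".
line='alpha||beta||gamma'

# [|] is a bracket expression, so each pipe is matched literally
# instead of being read as regex alternation.
fields=$(printf '%s\n' "$line" |
  awk -F'[|][|]' '{ for (i = 1; i <= NF; i++) print i ": " $i }')

printf '%s\n' "$fields"
```

A single pipe inside the data would stay part of its field, since only two adjacent pipes match the separator.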
Related
I can trim and transpose the data below with sed, but it takes considerable time. I hope it would be faster with awk. Any suggestions are welcome.
Input Sample Data:
[INX_8_60L ] :9:Y
[INX_8_60L ] :9:N
[INX_8_60L ] :9:Y
[INX_8_60Z ] :9:Y
[INX_8_60Z ] :9:Y
Required Output:
INX?_8_60L¦INX?_8_60L¦INX?_8_60L¦INX?_8_60Z¦INX?_8_60Z
Just use awk, e.g.
awk -v n=0 '{printf (n?"!%s":"%s", substr ($0,2,match($0,/[ \t]+/)-2)); n=1} END {print ""}' file
This should be much faster than a chain of sed calls. It picks out the substring (e.g. "INX_8_60L") using substr and match. n is simply a false/true (0/1) flag that prevents a "!" from being output before the first string.
Example Use/Output
With your data in file you would get:
$ awk -v n=0 '{printf (n?"!%s":"%s", substr ($0,2,match($0,/[ \t]+/)-2)); n=1} END {print ""}' file
INX_8_60L!INX_8_60L!INX_8_60L!INX_8_60Z!INX_8_60Z
Which appears to be what you are after. (Note: I'm not sure what your separator character is, so just change above as needed) If not, let me know and I'm happy to help further.
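To see how substr and match cooperate on a single record, here is a minimal sketch using one line of the sample data. The variable names pos and name are only for illustration.

```shell
line='[INX_8_60L ] :9:Y'

# match() returns the position of the first run of blanks (11 here).
# substr() starts at character 2 (skipping "[") and takes pos - 2 characters,
# which stops just before that blank.
name=$(printf '%s\n' "$line" |
  awk '{ pos = match($0, /[ \t]+/); print substr($0, 2, pos - 2) }')

printf '%s\n' "$name"
```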
Edit Per-Changes
Including the '?' isn't difficult, and I just copied the character, so you would now have:
awk -v n=0 '{s=substr($0,2,match($0,/[ \t]+/)-2); sub(/_/,"?_",s); printf n?"¦%s":"%s", s; n=1}
END {print ""}' file
Example Output
INX?_8_60L¦INX?_8_60L¦INX?_8_60L¦INX?_8_60Z¦INX?_8_60Z
And to simplify, operating only on the first field as in @JamesBrown's answer, that would reduce to:
awk -v n=0 '{s=substr($1,2); sub(/_/,"?_",s); printf n?"¦%s":"%s", s; n=1} END {print ""}' file
Let me know if that needs more changes.
Don't start so many sed commands; separate the sed operations with semicolons instead.
Better still, process the data in a single job and avoid regex. The version below uses substr() to read the fixed-size first block and inserts the ? while building the output.
$ awk '{
b=b (b==""?"":";") substr($1,2,3) "?" substr($1,5)
}
END {
print b
}' file
Output:
INX?_8_60L;INX?_8_60L;INX?_8_60L;INX?_8_60Z;INX?_8_60Z
If the fields are not that static in size:
$ awk '
BEGIN {
FS="[[_ ]" # split field with regex
}
{
printf "%s%s?_%s_%s",(i++?";":""), $2,$3,$4 # output semicolons and fields
}
END {
print ""
}' file
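To make the field numbering in the regex-separator version easier to follow, this sketch prints the first four fields of one sample line. It assumes the same FS as above; the angle brackets are only there to make the empty first field visible.

```shell
line='[INX_8_60L ] :9:Y'

# With FS set to the bracket expression [[_ ], every "[", "_" or space
# ends a field. The leading "[" therefore leaves $1 empty.
split_view=$(printf '%s\n' "$line" |
  awk 'BEGIN { FS = "[[_ ]" } { for (i = 1; i <= 4; i++) printf "%d=<%s>\n", i, $i }')

printf '%s\n' "$split_view"
```

This is why the answer prints $2, $3 and $4 rather than $1, $2 and $3.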
Performance of solutions for 20 M records:
Former:
real 0m8.017s
user 0m7.856s
sys 0m0.160s
Latter:
real 0m24.731s
user 0m24.620s
sys 0m0.112s
sed can be very fast when used gingerly, so for simplicity and speed you might wish to consider:
sed -e 's/ .*//' -e 's/\[INX/INX?/' | tr '\n' '|' | sed -e '$s/|$//'
The second call to sed is there to satisfy the requirement that there is no trailing |.
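A possible variation, not part of the answer above: paste -s joins lines with a delimiter and does not add a trailing one, so it could stand in for the tr and second sed steps. The sketch assumes a POSIX paste is available and feeds two of the sample lines on standard input.

```shell
# Two lines taken from the sample data in the question.
joined=$(printf '%s\n' '[INX_8_60L ] :9:Y' '[INX_8_60Z ] :9:Y' |
  sed -e 's/ .*//' -e 's/\[INX/INX?/' |
  paste -s -d '|' -)

printf '%s\n' "$joined"
```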
Another solution using GNU awk:
awk -F'[[ ]+' '
{printf "%s%s",(o?"¦":""),gensub(/INX/,"INX?",1,$2);o=1}
END{print ""}
' file
The field separator (set with the -F option) is chosen so that the wanted name ends up in $2.
The main statement prints that name with the ? character inserted.
The variable o keeps track of whether a ¦ delimiter is needed before the next name.
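gensub() is a GNU awk extension. If gawk is not available, a similar result can probably be had with the standard sub() on a copy of the field. This is a sketch under that assumption; the variable s is introduced only for the example.

```shell
# Two lines taken from the sample data in the question.
out=$(printf '%s\n' '[INX_8_60L ] :9:Y' '[INX_8_60Z ] :9:Y' |
  awk -F'[[ ]+' '
    {
      s = $2                       # copy the field so $0 is not rebuilt
      sub(/INX/, "INX?", s)        # sub() edits s in place
      printf "%s%s", (o ? "¦" : ""), s
      o = 1                        # from now on, emit the delimiter first
    }
    END { print "" }')

printf '%s\n' "$out"
```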
I have a file named abc.lst, and I have stored its path in a variable. The file contains a three-part string. From it I want to take the second word, cut out the part from expdp to .dmp, and store that in a variable.
example:-
REFLIST_OP=/tmp/abc.lst
cat $REFLIST_OP
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
Desired Output:-
expdp_TEST_P119_*_18112017.dmp
I have tried the command below:
FULL_DMP_NAME=`cat $REFLIST_OP|grep /orabackup|awk '{print $2}'`
echo $FULL_DMP_NAME
/data/abc/GOon/expdp_TEST_P119_*_18112017.dmp
REFLIST_OP=/tmp/abc.lst
awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
Test Results:
$ REFLIST_OP=/tmp/abc.lst
$ cat "$REFLIST_OP"
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
$ awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
expdp_TEST_P119_*_18112017.dmp
To save in variable
myvar=$( awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP" )
The following awk may help:
awk -F'/| ' '{print $6}' Input_file
OR
awk -F'/| ' '{print $6}' "$REFLIST_OP"
Explanation: this makes space and / the field separators (as in the Input_file you showed) and then prints the 6th field of the line, which is what the OP asked for.
To see each field number and its value, you could also use the following command:
awk -F'/| ' '{for(i=1;i<=NF;i++){print i,$i}}' "$REFLIST_OP"
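One thing worth keeping in mind with a fixed field number is that it depends on the directory depth. The sketch below shows $6 on the line from the question and on a hypothetical line with one extra directory level (the directory name "extra" is made up).

```shell
line='34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM'
sixth=$(printf '%s\n' "$line" | awk -F'/| ' '{ print $6 }')

# Hypothetical input: same line with an additional directory in the path.
deeper='34 /data/abc/extra/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM'
shifted=$(printf '%s\n' "$deeper" | awk -F'/| ' '{ print $6 }')

printf '%s\n' "$sixth" "$shifted"
```

With the deeper path, $6 is a directory name rather than the dump file, so this approach fits best when the path layout is fixed.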
Using sed with one of these regexes:
sed -e 's/.*\/\([^[:space:]]*\).*/\1/' abc.lst  # capture the non-space characters after the last /, printing only the captured part
sed -re 's|.*/([^[:space:]]*).*|\1|' abc.lst  # same as above, but with | as the separator, so the / needs no escaping; -r enables extended regex, so ( needs no backslash
sed -e 's|.*/||' -e 's|[[:space:]].*||' abc.lst  # two steps: remove up to the last /, then remove from the first space to the end (may be easiest to read/understand)
myvar=$(<abc.lst); myvar=${myvar##*/}; myvar=${myvar%% *}; echo "$myvar"
Use this if you want to avoid an external command (sed). Quoting "$myvar" keeps the shell from expanding the * in the file name.
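The two parameter expansions can be read one step at a time. The sketch below works on the line from the question held in a variable rather than read from abc.lst; the names tail and name are only for illustration.

```shell
line='34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM'

tail=${line##*/}   # longest match of "*/" from the start: drops through the last "/"
name=${tail%% *}   # longest match of " *" from the end: drops from the first space on

# Quoted, so the literal "*" in the name is not glob-expanded.
printf '%s\n' "$name"
```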
I am a beginner in shell scripting.
I have a variable whose value contains an = character.
I want to put quotes around the part after the = character.
abc="source=TDG"
echo $abc|awk -F"=" '{print $2}'
My code prints only one field.
My expected output is
source='TDG'
$ abc='source=TDG'
$ echo "$abc" | sed 's/[^=]*$/\x27&\x27/'
source='TDG'
[^=]*$ match non = characters at end of line
\x27&\x27 add single quotes around the matched text
With awk
$ echo "$abc" | awk -F= '{print $1 FS "\047" $2 "\047"}'
source='TDG'
-F= input field separator is =
print $1 FS "\047" $2 "\047" print first field, followed by input field separator, followed by single quotes then second field and another single quotes
See how to escape single quote in awk inside printf
for more ways of handling single quotes in print
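One of those other ways, sketched here as an option rather than a recommendation, is to pass the quote character in as an awk variable with -v. The variable name q is arbitrary.

```shell
abc='source=TDG'

# q holds a single quote; passing it with -v avoids escaping inside the awk program.
quoted=$(printf '%s\n' "$abc" | awk -F= -v q="'" '{ print $1 FS q $2 q }')

printf '%s\n' "$quoted"
```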
With bash parameter expansion
$ echo "${abc%=*}='${abc#*=}'"
source='TDG'
${abc%=*} will delete last occurrence of = and zero or more characters after it
${abc#*=} will delete zero or more characters and first = from start of string
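If the value itself may contain an = character, the two expansions above cut at different places. A sketch of one way to handle that, assuming the split should happen at the first =, uses %% instead of %. The input 'opt=a=b' is a made-up example.

```shell
# Hypothetical value with a second "=" inside it.
abc='opt=a=b'

as_written="${abc%=*}='${abc#*=}'"   # % cuts at the last "=", # cuts at the first

key=${abc%%=*}                        # %% removes the longest "=*" suffix: leaves "opt"
value=${abc#*=}                       # removes through the first "=": leaves "a=b"
at_first="$key='$value'"

printf '%s\n' "$as_written" "$at_first"
```

For the single-= input in the question both forms give the same result.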
Sed would be the better choice:
echo "$abc" | sed "s/[^=]*$/'&'/"
Awk can do it but needs extra bits:
echo "$abc" | awk -F= 'gsub(/(^|$)/,"\047",$2)' OFS==
What is taking place?
Use sub to surround TDG with single quotes, written as the octal escape \047 to avoid quoting problems.
echo "$abc" | awk '{sub(/TDG/,"\047TDG\047")}1'
source='TDG'
I have a string like this:
my/path/to/home/file.txt
Now I want to get the number of parts in this string, split on the delimiter (/). For the string above the answer would be 5. I need this in my Linux shell script. How can I get it without using a for loop?
$ awk -F'/' '{print NF}' <<< "my/path/to/home/file.txt"
5
-F'/' : This tells awk that fields are separated by / .
NF : This is the number of fields, which is also the index of the last field. In this case "my" is field 1, "path" is field 2, and so on up to "file.txt", which is field 5.
{print NF}: This prints that number.
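NF and $NF are easy to mix up, so here is a small sketch that prints both for the same string. The variable names count and last are only for illustration.

```shell
path='my/path/to/home/file.txt'

count=$(printf '%s\n' "$path" | awk -F/ '{ print NF }')    # number of fields
last=$(printf '%s\n' "$path" | awk -F/ '{ print $NF }')    # content of the last field

printf '%s %s\n' "$count" "$last"
```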
It greps the delimiter, counts the occurrences and adds 1:
echo $(($(echo "my/path/to/home/file.txt" | grep -o "/" | wc -l)+1))
#=> 5
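The same count-the-delimiters idea can also be sketched with tr and wc, which avoids relying on grep -o. This is an alternative under the assumption that the string is a single line.

```shell
path='my/path/to/home/file.txt'

# tr -cd '/' deletes every character except "/", so wc -c counts the slashes.
slashes=$(printf '%s' "$path" | tr -cd '/' | wc -c)
count=$((slashes + 1))

printf '%s\n' "$count"
```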
You can use awk:
string="my/path/to/home/file.txt"
count="$(awk -F/ '{print NF}' <<< "$string")"
-F/ splits the string into fields using / as the delimiter. NF holds the number of those fields ($NF, by contrast, is the content of the last field).
Pure bash only (fastest way):
#!/bin/bash
a=my/path/to/home/file.txt
b=${a//[^\/]}   # keep only the slashes
echo $(( ${#b} + 1 ))
I am new to awk and want to extract strings that contain spaces.
vboxmanage list vms is my command, and its output is below:
"VMOne" {5559eb92-2665-4c52-a75d-b57c248c74db}
"VM Second" {9bc754f8-4dfd-44e5-9469-dd824d438832}
My expected output is VMOne;VM Second. Below is what I have tried:
vboxmanage list vms | awk '{print $1,";"}' | sed 's/"//g' | awk -vORS="" '1'
But it gives me output like VMOne ;VM ; it cuts off the second word and adds a space before the ;
Any suggestion would be helpful, thanks.
awk -F\" '{ printf (NR > 1 ? ";%s" : "%s"), $2 } END { if (NR) print "" }' file
Output:
VMOne;VM Second
if (NR) is optional if expected input always has lines. You can also remove the END block completely if you don't need to terminate the output with newline on the end.
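For a quick check without a file, the two lines from the question can be fed on standard input. This sketch assumes the names never contain a double quote themselves.

```shell
# The two sample lines from the question, supplied via printf instead of a file.
names=$(printf '%s\n' \
  '"VMOne" {5559eb92-2665-4c52-a75d-b57c248c74db}' \
  '"VM Second" {9bc754f8-4dfd-44e5-9469-dd824d438832}' |
  awk -F'"' '{ printf (NR > 1 ? ";%s" : "%s"), $2 } END { if (NR) print "" }')

printf '%s\n' "$names"
```

With " as the separator, $1 is the empty text before the first quote and $2 is the whole name, spaces included.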