Extract text with any command in linux shell - linux

How do I extract the text from the following text and store it to the variables:
05:21-09:32, 14:21-19:30
Here, I want to store 05 in one variable, 21 in another, 09 in another, and so on. All the values must be stored in an array or in separate variables.
I have tried:
k="05:21-09:32, 14:21-19:30"
part1=($k | awk -F"-" '{print $1}' | awk -F":" '{print $1}')
part2=($k | awk -F"-" '{print $2}' | awk -F":" '{print $1}')
part3=($k | awk -F"," '{print $2}' | awk -F":" '{print $1}')
part4=($k | awk -F"-" '{print $3}' | awk -F":" '{print $1}')
I need a clearer or shorter solution.

You can use read with the -a (array) option:
IFS=':-, ' read -ra my_arr <<< "05:21-09:32, 14:21-19:30"
The above code will split the input string on colons, hyphens, commas, and spaces:
$ echo "${my_arr[0]}" "${my_arr[1]}" "${my_arr[2]}" "${my_arr[3]}"
05 21 09 32

Your code has a number of problems.
You can't pipe the value of k to standard output with just $k -- you want something like printf '%s\n' "$k" or perhaps the less portable echo "$k"
Notice also the quoting in the expression above; without it, the shell will perform wildcard expansion and whitespace tokenization on the value
Spawning two Awk processes for a simple string substitution is excessive
Spawning a separate pipeline for each value you want to extract is inefficient; if at all possible, extract everything in one go.
Something like IFS=':-, '; set -- $k will assign the parts to $1, $2, $3, $4, and so on in one go.
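A minimal sketch of that last suggestion, wrapped in a hypothetical helper named split_times (the name is not from the question). The function body runs in a subshell, so the changed IFS and the disabled globbing should not leak into the rest of the script:

```shell
k="05:21-09:32, 14:21-19:30"

# split_times is a hypothetical helper name used only for this sketch.
# The ( ... ) body runs in a subshell, so IFS and "set -f" stay local.
split_times() (
    set -f            # no wildcard expansion on the unquoted $1 below
    IFS=':-, '        # split on colon, hyphen, comma and space
    set -- $1         # the fields become $1, $2, ... inside the subshell
    IFS=' '           # "$*" joins the fields with the first character of IFS
    printf '%s\n' "$*"
)

parts=$(split_times "$k")
printf '%s\n' "$parts"
```

Inside the function the individual values are available as $1, $2, and so on; the sketch only joins them with spaces to make the result easy to inspect.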

Related

Strip a part of string in linux

Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 is my string and the result I want is vm-1.0.3
What is the best way to do this
Below is what I tried
$ echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 | awk -F _ {'print $2'} | awk -F - {'print $1,$2'}
vm 1.0.3
I also tried
$ echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 | awk -F _ {'print $2'} | awk -F - {'print $1"-",$2'}
vm- 1.0.3
Here I do not need space in between
I tried using cut and I got the expected result
$ echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 | awk -F _ {'print $2'} | cut -c 1-8
vm-1.0.3
What is the best way to do the same?
Making assumptions from the one example you provided about the general form of your input, so that the command handles it robustly, using any sed:
$ echo 'Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2' |
sed 's/^[^-]*-[^-]*-[^_]*_\(.*\)-[^-]*$/\1/'
vm-1.0.3
or any awk:
$ echo 'Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2' |
awk 'sub(/^[^-]+-[^-]+-[^_]+_/,"") && sub(/-[^-]+$/,"")'
vm-1.0.3
You don't need two calls to awk, but here is your approach with the single quotes placed outside the curly braces and the hyphen printed without a space:
echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 |
awk -F_ '{print $2}' | awk -F- '{print $1 "-" $2}'
If your string has the same format, let the field separator be either - or _
echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 | awk -F"[-_]" '{print $4 "-" $5}'
Or split the second field on - and print the first 2 parts
echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 | awk -F_ '{
split($2,a,"-")
print a[1] "-" a[2]
}'
Or, with GNU awk, a more specific match using a capture group:
echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 |
awk 'match($0, /^Apps-[^_]*_(vm-[0-9]+\.[0-9]+\.[0-9]+)/, a) {print a[1]}'
Output
vm-1.0.3
This is the easiest I can think of:
echo "Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2" | cut -c 25-32
Obviously you need to be sure about the position of your characters. On top of that, you seem to have two separators, '_' and '-', and both characters are also part of the name of your entry.
echo 'Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2' | sed -E 's/^.*_vm-([0-9]+)\.([0-9]+)\.([0-9]+)-.*/vm-\1.\2.\3/'
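For comparison, the same result can probably be had without any external command, using POSIX parameter expansion. This is only a sketch; like the answers above, it assumes that the wanted part starts after the first underscore and ends before the last hyphen:

```shell
s='Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2'

t=${s#*_}     # drop the shortest prefix ending in "_" (up to the first underscore)
t=${t%-*}     # drop the shortest suffix starting with "-" (the last hyphen and what follows)
printf '%s\n' "$t"
```

If the input can contain more underscores or a different number of hyphens, the sed and awk versions are easier to adapt.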

How to split string with awk using '\t|' string as a separator?

I am trying to split the string using custom field separator like this:
$ echo -e "abc\t|def" | awk -F '\t|' '{print $1, $2}'
abc |def
I expect output to be:
abc def
But instead it also includes | character which is part of separator:
abc |def
If using '\t#' as a separator I am getting expected output:
$ echo -e "abc\t#def" | awk -F '\t#' '{print $1, $2}'
abc def
So for some reason the | character in the field separator does not work as expected.
How can I make it work?
It should be:
awk -F '\t[|]' '{print $1, $2}'
-F is evaluated as a regex, and | is the alternation operator in a regex, so it is not matched literally. Put the | into a character class to make it a literal character.
Alternatively you can use:
awk -F '\t\\|' '{print $1, $2}'
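A quick check of the character-class form. The sketch uses printf rather than echo -e to produce the tab; that is a portability choice made here, not something the question requires:

```shell
# Feed "abc<TAB>|def" to awk and split on a tab followed by a literal "|".
out=$(printf 'abc\t|def\n' | awk -F '\t[|]' '{print $1, $2}')
printf '%s\n' "$out"
```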

bash, extract string from text file with space delimiter

I have a text files with a line like this in them:
MC exp. sig-250-0 events & $0.98 \pm 0.15$ & $3.57 \pm 0.23$ \\
sig-250-0 is something that can change from file to file (but I always know what it is for each file). There are lines before and above this, but the string "MC exp. sig-250-0 events" is unique in the file.
For a particular file, is there a good way to extract the second number 3.57 in the above example using bash?
Use awk for this:
awk '/MC exp. sig-250-0/ {print $10}' your.txt
Note that this will print: $3.57 - with the leading $, if you don't like this, pipe the output to tr:
awk '/MC exp. sig-250-0/ {print $10}' your.txt | tr -d '$'
In comments you wrote that you need to call it in a script like this:
while read p ; do
echo $p,awk '/MC exp. sig-$p/ {print $10}' filename | tr -d '$'
done < grid.txt
Note that you need a command substitution, $(...), around the awk pipeline. Like this:
echo "$p",$(awk '/MC exp. sig-$p/ {print $10}' filename | tr -d '$')
However, $p is not expanded inside the single-quoted awk program. To pass a shell variable into the awk pattern, use -v and a dynamic regex match:
awk -v p="MC exp. sig-$p" '$0 ~ p {print $10}' a.txt | tr -d '$'
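Put together, the loop could look roughly like the sketch below. The temporary directory and the sample contents of filename and grid.txt are made up for illustration, and the sketch assumes that each line of grid.txt holds just the part after "sig-". It uses index() for a literal substring match, which is a swapped-in choice so that the dots in "exp." are not treated as regex wildcards:

```shell
# Hypothetical sample data standing in for the real files.
tmpdir=$(mktemp -d)
printf '%s\n' 'MC exp. sig-250-0 events & $0.98 \pm 0.15$ & $3.57 \pm 0.23$ \\' > "$tmpdir/filename"
printf '%s\n' '250-0' > "$tmpdir/grid.txt"

result=$(
    while IFS= read -r p; do
        # Pass the key with -v; index() returns non-zero when the line contains it.
        value=$(awk -v key="MC exp. sig-$p events" 'index($0, key) {print $10}' \
                    "$tmpdir/filename" | tr -d '$')
        printf '%s,%s\n' "$p" "$value"
    done < "$tmpdir/grid.txt"
)
printf '%s\n' "$result"
rm -rf "$tmpdir"
```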
More sample lines would have been nice, but I guess you want a simple awk solution.
awk '{print $N}' $file
If you don't tell awk which field separator to use, it splits each line on runs of whitespace (spaces and tabs). Now you just have to count the fields to find the number of the one you want. In your case it would be 10.
awk '{print $10}' file.txt
$3.57
Don't want the $?
Pipe your awk result to cut:
awk '{print $10}' foo | cut -d '$' -f2
-d makes cut use $ as the field separator and -f2 selects the second field.
If you know you always have the same number of fields, then
#!/bin/bash
file=$1
key=$2
while read -ra f; do
if [[ "${f[0]} ${f[1]} ${f[2]} ${f[3]}" == "MC exp. $key events" ]]; then
echo "${f[9]}"
fi
done < "$file"

Multisplitting in AWK

I would like to perform two splits using AWK (I have two field separators). The string of data I'm working on looks something like this:
data;digit&int&string&int&digit;data;digit&int&string&int&digit
As you can see the outer field separator is a semicolon, and the nested one is an ampersand.
What I'm doing with awk is this (suppose that the string is in a variable named test):
echo ${test} | awk '{FS=";"} {print $2}' | awk '{FS="&"} {print $3}'
This should catch the word "string", but for some reason it is not working.
It seems like the second pipe is not being applied, as I see only the result of the first awk command.
Any advice?
Use awk's split() function to fill an array:
echo $test | awk -F';' '{split($2, arr, "&"); print(arr[3])}'
The other answers give working solutions, but they don't really explain the problem.
The problem is that setting FS inside a regular { ... } block of the awk script does not cause $1, $2, etc. to be recalculated for the current line; FS takes effect for later lines, but the very first line has already been split on whitespace. To set FS before the first line is read, you can use a BEGIN block (which runs before any input is read) or the -F command-line option.
Making either of those changes will fix your command:
echo "$test" | awk 'BEGIN{FS=";"} {print $2}' | awk 'BEGIN{FS="&"} {print $3}'
echo "$test" | awk -F';' '{print $2}' | awk -F'&' '{print $3}'
(I also took the liberty of wrapping $test in double-quotes, since unquoted parameter-expansions are a recipe for trouble. With your value of $test it would have been fine, but I make it a habit to always use double-quotes, just in case.)
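A small sketch that contrasts the two placements of the FS assignment, using the sample string from the question. The variable names late and early are only for illustration. On an awk that follows POSIX here, the first command is expected to print an empty second field, because the line was already split on whitespace when FS changed:

```shell
test='data;digit&int&string&int&digit;data;digit&int&string&int&digit'

# FS assigned in a main block: too late for the line that was already read.
late=$(printf '%s\n' "$test" | awk '{FS=";"} {print $2}')

# FS assigned in BEGIN: in effect before the first line is split.
early=$(printf '%s\n' "$test" | awk 'BEGIN{FS=";"} {print $2}')

printf 'late:  [%s]\nearly: [%s]\n' "$late" "$early"
```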
Try this:
echo "$test" | awk -F'[;&]' '{print $4}'
The bracket expression in -F'[;&]' makes both ; and & act as field separators, so the whole line is split in one pass.

Unix/Linux Shell Grep to cut

I have a file, say 'names' that looks like this
first middle last userid
Brian Duke Willy willybd
...
Whenever I use the following
line=`grep "willybd" /dir/names`
name=`echo $line | cut -f1-3 -d' '`
echo $name
It prints the following:
Brian Duke Willy willybd
Brian Duke Willy
My question is, how would I get it to print just "Brian Duke Willy" without first printing the original line that I cut?
The usual way to do this sort of thing is:
awk '/willybd/{ print $1, $2, $3 }' /dir/names
or, to be more specific
awk '$4 ~ /willybd/ { print $1, $2, $3 }' /dir/names
or
awk '$4 == "willybd" { print $1, $2, $3 }' /dir/names
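To see why the exact comparison can matter, here is a sketch against a temporary copy of the file. The third row is invented for this example: its userid merely contains "willybd", so a regex or grep match would also pick it up, while the == test should not. The user id is passed with -v, which is a choice of this sketch rather than part of the answer above:

```shell
# Hypothetical sample file; the last row is made up for the demonstration.
names=$(mktemp)
printf '%s\n' \
    'first middle last userid' \
    'Brian Duke Willy willybd' \
    'Ann Lee Stone willybd2' > "$names"

# Print the first three fields of the row whose fourth field is exactly the id.
name=$(awk -v id="willybd" '$4 == id { print $1, $2, $3 }' "$names")
printf '%s\n' "$name"
rm -f "$names"
```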
grep "willybd" /dir/names | cut "-d " -f1-3
The default delimiter for cut is tab, not space.
Unless you need the intermediate variables, you can use
grep "willybd" /dir/names | cut -f1-3 -d' '
One of the beautiful features of Linux is that most commands can be used as filters: they read from stdin and write to stdout, which means you can "pipe" the output of one command into the next command. That's what the | character does. It's pronounced "pipe".
