bash, extract string from text file with space delimiter

I have text files with a line like this in them:
MC exp. sig-250-0 events & $0.98 \pm 0.15$ & $3.57 \pm 0.23$ \\
sig-250-0 is something that can change from file to file (but I always know what it is for each file). There are lines before and after this one, but the string "MC exp. sig-250-0 events" is unique in the file.
For a particular file, is there a good way to extract the second number 3.57 in the above example using bash?

use awk for this:
awk '/MC exp. sig-250-0/ {print $10}' your.txt
Note that this will print $3.57 with the leading $. If you don't want that, pipe the output to tr:
awk '/MC exp. sig-250-0/ {print $10}' your.txt | tr -d '$'
In comments you wrote that you need to call it in a script like this:
while read p ; do
echo $p,awk '/MC exp. sig-$p/ {print $10}' filename | tr -d '$'
done < grid.txt
Note that you need command substitution $(...) around the awk pipeline, like this:
echo "$p",$(awk '/MC exp. sig-$p/ {print $10}' filename | tr -d '$')
If you want to pass a shell variable to the awk pattern, use -v and match against the variable with ~ (a plain /p/ would match a literal p, not the variable):
awk -v p="MC exp. sig-$p" '$0 ~ p {print $10}' a.txt | tr -d '$'
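Putting that together with your loop, a sketch (filename and grid.txt as in your snippet above):
while read -r p; do
    # match the unique "MC exp. sig-<suffix> events" line and print field 10 without the $
    echo "$p,$(awk -v pat="MC exp. sig-$p" '$0 ~ pat {print $10}' filename | tr -d '$')"
done < grid.txt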

More sample lines would have been nice, but I guess you just want a simple awk one-liner.
awk '{print $N}' "$file"
If you don't tell awk which field separator to use, it defaults to whitespace. Now you just have to count the fields to find the one you want; in your case it is field 10.
awk '{print $10}' file.txt
$3.57
Don't want the $?
Pipe your awk result to cut:
awk '{print $10}' foo | cut -d '$' -f2
-d uses the $ as the field separator and -f selects the second field.
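You could also strip the leading $ inside awk itself, a sketch using the same field number:
# remove a leading literal $ from field 10 before printing it
awk '{ sub(/^\$/, "", $10); print $10 }' file.txt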

If you know you always have the same number of fields, then
#!/bin/bash
file=$1
key=$2
while read -ra f; do
    if [[ "${f[0]} ${f[1]} ${f[2]} ${f[3]}" == "MC exp. $key events" ]]; then
        echo "${f[9]}"
    fi
done < "$file"
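Assuming you save that as, say, extract.sh (the name is just an example), usage would look like:
./extract.sh your.txt sig-250-0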

Related

Using awk to separate output containing tab and "/" separators into a delimited format

I'd appreciate help converting this output to a pipe-delimited format.
I have the following output
abcde1234 /path/A/file1
test23455 /path/B/file2345
But I would like it in this format:
abcde1234|file1
test23455|file2345
In awk, if you set FS to [[:blank:]]+/|/, you can print the first and last fields:
awk -v FS='[[:blank:]]+/|/' -v OFS='|' '{print $1, $NF}' file
abcde1234|file1
test23455|file2345
Here is a one-liner awk solution:
awk -v FS='[ \t].*/' -v OFS='|' '{$1=$1}1' file
and, a sed one-liner:
sed 's%[[:blank:]].*/%|%' file
and a pure bash one
while read -r; do echo "${REPLY%%[[:blank:]]*}|${REPLY##*/}"; done < file
Try using cut 🤷🏻‍♀️:
abcde1234 /path/A/file1
test23455 /path/B/file2345
while IFS= read -r line; do
    value1=$(echo "$line" | cut -d ' ' -f1)
    value2=$(echo "$line" | cut -d '/' -f4)
    printf '%s|%s\n' "$value1" "$value2"
done < list

Extract field after colon for lines where field before colon matches pattern

I have a file file1 which looks as below:
tool1v1:1.4.4
tool1v2:1.5.3
tool2v1:1.5.2.c8.5.2.r1981122221118
tool2v2:32.5.0.abc.r20123433554
I want to extract value of tool2v1 and tool2v2
My output should be 1.5.2.c8.5.2.r1981122221118 and 32.5.0.abc.r20123433554.
I have written the following awk but it is not giving the correct result:
awk -F: '/^tool2v1/ {print $2}' file1
awk -F: '/^tool2v2/ {print $2}' file1
grep -E can also do the job:
grep -E "tool2v[12]" file1 |sed 's/^.*://'
If you have a grep that supports Perl compatible regular expressions such as GNU grep, you can use a variable-sized look-behind:
$ grep -Po '^tool2v[12]:\K.*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
The -o option retains just the match instead of the whole matching line; \K means "the line must match the things to the left, but they are not included in the reported match".
You could also use a normal look-behind:
$ grep -Po '(?<=^tool2v[12]:).*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
And finally, to fix your awk which was almost correct (and as pointed out in a comment):
$ awk -F: '/^tool2v[12]/ { print $2 }' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
You can filter with grep:
grep '\(tool2v1\|tool2v2\)'
And then remove the part before the : with sed:
sed 's/^.*://'
This sed operation means:
^ - match from the beginning of the string
.*: - all characters up to and including the :
... and replace this matched content with nothing.
The format is sed 's/<MATCH>/<REPLACE>/'
Whole command:
grep '\(tool2v1\|tool2v2\)' file1|sed 's/^.*://'
Result:
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
The question has already been answered, but you can also use pure bash to achieve the desired result:
#!/usr/bin/env bash
while read -r line; do
    if [[ "$line" =~ ^tool2v ]]; then
        echo "${line#*:}"
    fi
done < ./file1.txt
The while loop reads every line of file1.txt; =~ does a regex match to check whether the value of the $line variable starts with tool2v, and ${line#*:} then strips everything up to and including the first :.
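An equivalent sketch that avoids the regex by letting read split on the colon and using a glob match:
while IFS=: read -r name value; do
    # keep only the tool2v1/tool2v2 lines and print what followed the colon
    [[ $name == tool2v[12] ]] && echo "$value"
done < ./file1.txt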

Replace comma with space in shell script

Replace comma with space using a shell script
Given the following input:
Test,10.10.10.10,"80,22,3306",connect
I need to get the output below using a bash script:
Test 10.10.10.10 "80,22,3306" connect
If you have gawk, you can use FPAT (field pattern), setting it to a regular expression.
awk -v FPAT='([^,]+)|(\"[^"]+\")' '{ for(i=1;i<=NF;i++) { printf "%s ",$i } }' <<< "Test,10.10.10.10,\"80,22,3306\",connect"
We set FPAT so that a field is either a run of characters that are not commas, or data enclosed in quotation marks (which may itself contain commas). We then print all the fields with a space in between.
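A slightly shorter sketch of the same FPAT idea (gawk only), letting awk rebuild the record with the default output field separator, a space:
awk -v FPAT='([^,]+)|("[^"]+")' '{$1=$1}1' <<< 'Test,10.10.10.10,"80,22,3306",connect'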
If your Input_file is the same as the shown sample, then the following sed may help too:
sed 's/\(.[^,]*\),\([^,]*\),\(".*"\),\(.*\)/\1 \2 \3 \4/g' Input_file
Assuming you can read your input from a file, this works:
#!/usr/bin/bash
while read -r line; do
    begin=$(echo "$line" | awk -F'"' '{print $1}' | tr ',' ' ')
    end=$(echo "$line" | awk -F'"' '{print $3}' | tr ',' ' ')
    middle=$(echo "$line" | awk -F'"' '{print $2}')
    echo "$begin\"$middle\"$end"
done < connect_file
Edit: I see that you want to keep the commas between the port numbers. I have edited the script.
echo 'Test,10.10.10.10,"80,22,3306",connect' | awk '{sub(/,/," "); gsub(/,"80,22,3306",/," \4280,22,3306\42 ")}1'
Test 10.10.10.10 "80,22,3306" connect
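A more general sketch that does not hardcode the quoted field, assuming at most one double-quoted field per line as in the sample:
echo 'Test,10.10.10.10,"80,22,3306",connect' | awk -F'"' '{ gsub(/,/, " ", $1); gsub(/,/, " ", $3); print $1 "\"" $2 "\"" $3 }'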

store awk output in variable

I don't know what the problem with this code is:
#! /bin/bash
File1=$1
for (( j=1; j<=3; j++ ))
{
output=$(`awk -F; 'NR=='$j'{print $3}' "${File1}"`)
echo ${output}
}
File1 looks like this :
Char1;2;3;89
char2;9;6;66
char5;3;77;8
I want to extract field 3 from every line in the loop,
so the result will be
3
6
77
It should be like this:
#! /bin/bash
File1=$1
for (( j=1; j<=3; j++ ))
{
    output=$(awk -F ';' 'NR=='$j' {print $3}' "${File1}")
    echo "${output}"
}
It works well on my CentOS.
You are mixing single quotes and backticks all over the place and not escaping them.
You shouldn't splice bash variables into an awk script; pass them in with the -v flag instead.
awk already loops over the input itself, so there is no reason to loop the loop...
Just:
awk -F";" '{print $3}' "${file1}"
Will do exactly what your entire script is trying to do now.
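And since the title asks about storing the output in a variable, a sketch (file1 holding the path, as above):
output=$(awk -F';' '{print $3}' "${file1}")
echo "$output"    # prints one value per line: 3 6 77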
Even easier, use the cut utility: cut -d';' -f3 will produce the result you're looking for, where -d specifies the delimiter to use and -f the field/column you're looking for (1-indexed).
If you simply want to extract a column out from a structured file like the one you have, use the cut utility.
cut will allow you to specify what the delimiter is in your data (;) and what column(s) you'd like to extract (column 3).
cut -d';' -f3 "$file1"
If you would like to loop over the result of this, use a while loop and read the values one by one:
cut -d';' -f3 "$file1" |
while read data; do
echo "data is $data"
done
If you want the values in a single variable, do this:
var=$( cut -d';' -f3 "$file1" | tr '\n' ' ' )
The tr '\n' ' ' bit replaces newlines with spaces, so you would get 3 6 77 as a string.
To get them into an array:
declare -a var=( $( cut -d';' -f3 "$file1" ) )
(the tr is not needed here)
You may then access the values as ${var[0]}, ${var[1]} etc.
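For example, to loop over the array (a sketch):
for v in "${var[@]}"; do
    echo "value: $v"
done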

How to extract words between two characters in linux?

I have the following stored in a file named tmp.txt
user/config/jars/content-config-factory-3.2.0.0.jar
I need to store this word in a variable:
$variable=content-config-factory
I have written the following
while read line
do
var=$(echo $line | awk 'BEGIN{FS="\/"; OFS=" "} {print $NF}' )
var=$(echo $var | awk 'BEGIN{FS="-"; OFS=" "} {print $(1)}' )
echo $var
done < tmp.txt
This returns the result "content" instead of "content-config-factory".
Can anyone please tell me how to extract a word between two characters from a string efficiently.
An awk solution would be:
awk -F/ '{sub("-[^-]+$", "", $NF); print $NF}'
Test
$ echo "user/config/jars/content-config-factory-3.2.0.0.jar" | awk -F/ '{sub("-[^-]+$", "", $NF); print $NF}'
content-config-factory
You can also try this way to get your expected result:
variable=$(sed 's:.*/\(.*\)-.*:\1:' FileName)
echo $variable
Output:
content-config-factory
You could use grep:
grep -oP '(?<=/)[^/]*(?=-\d+\.)' file
Example:
$ var=$(echo 'user/config/jars/content-config-factory-3.2.0.0.jar' | grep -oP '(?<=/)[^/]*(?=-\d+\.)')
$ echo "$var"
content-config-factory
