Shell script tokenizer

I'm writing a script that queries my JBoss server for some database related data. The thing that is returned after the query looks like this:
ConnectionCount=7
ConnectionCreatedCount=98
MaxConnectionsInUseCount=10
ConnectionDestroyedCount=91
AvailableConnectionCount=10
InUseConnectionCount=0
MaxSize=10
I would like to tokenize this data so that the numbers on the right-hand side are stored in a variable in the format 7,98,10,91,10,0,10. I tried using IFS with the equals sign, but that still keeps the parameter names (only the equals signs are eliminated).

I put your input data into file d.txt. The one-liner below extracts the numbers, comma-delimits them and assigns all that to variable TAB (tested with Korn shell):
$ TAB=$(awk -F= '{print $2}' d.txt | xargs echo | sed 's/ /,/g')
$ echo $TAB
7,98,10,91,10,0,10

Or just use cut/tr (note that this gives you a bash array F rather than a comma-separated string):
F=($(cut -d'=' -f2 input | tr '\n' ' '))

You can do it with one sed command too:
sed -n 's/^.*=\(.*\)/\1,/;H;${g;s/\n//g;s/,$//;p;}' file
7,98,10,91,10,0,10

A simple cut without any pipes:
arr=( $(cut -d'=' -f2 file) )
Output:
printf '%s\n' "${arr[@]}"
7
98
10
91
10
0
10
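
If the goal is the single comma-separated string from the question rather than an array, paste can join the lines directly. This is only a minimal sketch, assuming the input is key=value lines on standard input; the function name values_csv is made up for illustration and does not come from the answers above.

```shell
# values_csv: read key=value lines on stdin, print the values joined by commas.
# (hypothetical helper name, used here only for illustration)
values_csv() {
    # cut keeps what follows the first '='; paste -s joins all lines with ','
    cut -d'=' -f2 | paste -sd, -
}

printf '%s\n' 'ConnectionCount=7' 'MaxSize=10' 'InUseConnectionCount=0' | values_csv
```

To keep the result, capture it with command substitution, for example TAB=$(values_csv < d.txt).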

Related

String split and extract the last field in bash

I have a text file FILENAME. I want to split the string at - of the first column field and extract the last element from each line. Here "$(echo $line | cut -d, -f1 | cut -d- -f4)"; alone is not giving me the right result.
FILENAME:
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
code I tried:
while read line; do \
DNA="$(echo $line | cut -d, -f1 | cut -d- -f4)";
echo $DNA
done < ${FILENAME}
Result I want
1195060301
1195060302
1195060311

Would you please try the following:
while IFS=, read -r f1 _; do # set field separator to ",", assigns f1 to the 1st field and _ to the rest
dna=${f1##*-} # removes everything before the rightmost "-" from "$f1"
echo "$dna"
done < "$FILENAME"

Well, I had to do it with two lines of code. Maybe someone has a better approach.
while read -r line; do
DNA="$(echo "$line" | cut -d, -f1 | rev)"
DNA="$(echo "$DNA" | cut -d- -f1 | rev)"
echo "$DNA"
done < "${FILENAME}"

I do not know the constraints on your input file, but if what you are looking for is a 10-digit number, and there is only ever one 10-digit number per line, this should do nicely:
grep -Eo '[0-9]{10,}' input.txt
1195060301
1195060302
1195060311
This essentially says: show me every run of 10 or more digits in this file.
input.txt
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram

A sed approach:
sed -nE 's/.*-([[:digit:]]+),.*/\1/p' input_file
sed options:
-n: Do not print every line automatically; print only where the p flag says so.
-E: Use extended regular expressions, so groups and + need no backslashes.
sed extended regex:
's/.*-([[:digit:]]+),.*/\1/p': Match anything followed by a dash, capture one or more digits in group 1, then match a comma and the rest of the line; replace the whole line with the captured group and print it.

Using awk:
awk -F, '{ n = split($1, arr, "-"); print arr[n] }' FILENAME
Using , as the field separator, take the first field and split it further into the array arr on - with awk's split function, which returns the number of pieces n. We then print arr[n], the last element.

Linux: Extract string from a line including delimiter character using sed command [duplicate]

For example
echo "abc-1234a :" | grep <do-something>
to print only abc-1234a

I think these are closer to what you're getting at, but without knowing what you're really trying to achieve, it's hard to say.
echo "abc-1234a :" | grep -Eo '^[^:]+'
... though this will also match lines that have no colon. If you only want lines with colons, and you must use only grep, this might work:
echo "abc-1234a :" | grep : | grep -Eo '^[^:]+'
Of course, this only makes sense if your echo "abc-1234a :" is an example that would be replaced with possibly multiple lines of input.
The smallest tool you could use is probably cut:
echo "abc-1234a :" | cut -d: -f1
And sed is always available...
echo "abc-1234a :" | sed 's/ *:.*//'
For this last one, if you only want to print lines that include a colon, change it to:
echo "abc-1234a :" | sed -ne 's/ *:.*//p'
Heck, you could even do this in pure bash:
while read line; do
field="${line%%:*}"
# do stuff with $field
done <<<"abc-1234a :"
For information on the %% bit, you can man bash and search for "Parameter Expansion".
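
To make the %% bit concrete, here is a small sketch of the four prefix/suffix removal operators side by side. The sample string and variable names are made up for illustration; the operators themselves are standard POSIX parameter expansion.

```shell
s='a:b:c'

first=${s%%:*}          # remove the longest suffix matching ':*'  -> a
all_but_last=${s%:*}    # remove the shortest suffix matching ':*' -> a:b
all_but_first=${s#*:}   # remove the shortest prefix matching '*:' -> b:c
last=${s##*:}           # remove the longest prefix matching '*:'  -> c

printf '%s\n' "$first" "$all_but_last" "$all_but_first" "$last"
```

A single % or # removes as little as possible, the doubled form removes as much as possible.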
UPDATE:
You said:
It's the characters in the first line of input before the colon. The
input could have multiple line though.
The solutions with grep probably aren't your best choice, then, since they'll also print data from subsequent lines that might include colons. Of course, there are many ways to solve this requirement as well. We'll start with sample input:
$ function sample { printf "abc-1234a:foo\nbar baz:\nNarf\n"; }
$ sample
abc-1234a:foo
bar baz:
Narf
You could use multiple pipes, for example:
$ sample | head -1 | grep -Eo '^[^:]*'
abc-1234a
$ sample | head -1 | cut -d: -f1
abc-1234a
Or you could use sed to process only the first line:
$ sample | sed -ne '1s/:.*//p'
abc-1234a
Or tell sed to exit after printing the first line (which is faster than reading the whole file):
$ sample | sed 's/:.*//;q'
abc-1234a
Or do the same thing but only show output if a colon was found (for safety):
$ sample | sed -ne 's/:.*//p;q'
abc-1234a
Or have awk do the same thing (as the last 3 examples, respectively):
$ sample | awk '{sub(/:.*/,"")} NR==1'
abc-1234a
$ sample | awk 'NR>1{nextfile} {sub(/:.*/,"")} 1'
abc-1234a
$ sample | awk 'NR>1{nextfile} sub(/:.*/,"")'
abc-1234a
Or in bash, with no pipes at all:
$ read line < <(sample)
$ printf '%s\n' "${line%%:*}"
abc-1234a

It is possible to do what you want with only sed.
Here is an example:
#!/bin/sh
filename=$1
pattern=yourpattern
# the -n flag disables the default behavior of printing every line
sed -n "
1,/$pattern/ {
/$pattern/q # stop at the line containing the pattern, without printing it
p # print lines from line 1 up to the pattern
}
" "$filename"
exit 0
This works at least for GNU's sed. It should work for other sed too, except
regarding the comments (some implementations of sed don't support comments).
Source: https://www.grymoire.com/Unix/Sed.html
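
A shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: quit at the first matching line and print everything before it. The function name print_until is hypothetical, and the sketch assumes the pattern is a basic regular expression that contains no slash.

```shell
# print_until PATTERN FILE
# Print FILE up to, but not including, the first line matching PATTERN.
# (hypothetical helper; PATTERN must not contain '/')
print_until() {
    # with -n nothing is printed automatically, so q exits silently on the
    # matching line and p prints every line seen before it
    sed -n -e "/$1/q" -e 'p' "$2"
}

tmp=$(mktemp)
printf '%s\n' 'first' 'second' 'STOP here' 'third' > "$tmp"
print_until 'STOP' "$tmp"
rm -f "$tmp"
```

Because sed quits at the match, the rest of the file is never read.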

store awk output in variable

I can't see what the problem is with this code:
#! /bin/bash
File1=$1
for (( j=1; j<=3; j++ ))
{
output=$(`awk -F; 'NR=='$j'{print $3}' "${File1}"`)
echo ${output}
}
File1 looks like this :
Char1;2;3;89
char2;9;6;66
char5;3;77;8
I want to extract field 3 from each line as I loop over them,
so the result will be
3
6
77

It should be like this:
#!/bin/bash
File1=$1
for (( j=1; j<=3; j++ ))
do
output=$(awk -F ';' 'NR=='$j' {print $3}' "${File1}")
echo "${output}"
done
It works well on my CentOS.

You are mixing single quotes and backticks all over the place and not escaping them.
The clean way to use a bash variable in an awk script is to pass it with the -v flag.
awk already loops over the input lines, so there is no reason to wrap it in another loop.
Just:
awk -F";" '{print $3}' "${File1}"
will do exactly what your entire script is trying to do now.
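
If you really do need one specific row, the -v flag mentioned above can carry the row number into awk. A minimal sketch, assuming a ';'-separated file; the function name field_at and the awk variable names row and col are made up for illustration.

```shell
# field_at ROW COL FILE
# Print field COL of line ROW from a ';'-separated FILE.
# (hypothetical helper, not part of the original script)
field_at() {
    # -v copies the shell values into awk variables; exit stops after the hit
    awk -F';' -v row="$1" -v col="$2" 'NR == row { print $col; exit }' "$3"
}

tmp=$(mktemp)
printf '%s\n' 'Char1;2;3;89' 'char2;9;6;66' 'char5;3;77;8' > "$tmp"
field_at 2 3 "$tmp"
rm -f "$tmp"
```

With the sample data from the question, the call above asks for field 3 of line 2.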

Even easier, use the cut utility : cut -d';' -f3 will produce the result you're looking for, where -d specifies the delimiter to use and -f the field/column you're looking for (1-indexed).

If you simply want to extract a column out from a structured file like the one you have, use the cut utility.
cut will allow you to specify what the delimiter is in your data (;) and what column(s) you'd like to extract (column 3).
cut -d';' -f3 "$file1"
If you would like to loop over the result of this, use a while loop and read the values one by one:
cut -d';' -f3 "$file1" |
while read data; do
echo "data is $data"
done
If you want the values in a variable, do this:
var=$( cut -d';' -f3 "$file1" | tr '\n' ' ' )
The tr '\n' ' ' bit replaces newlines with spaces, so you would get 3 6 77 as a string.
To get them into an array:
declare -a var=( $( cut -d';' -f3 "$file1" ) )
(the tr is not needed here)
You may then access the values as ${var[0]}, ${var[1]} etc.
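
One thing to watch with the piped while loop: in bash (and several other shells) the loop on the right side of a pipe runs in a subshell, so variables set inside it are gone once the loop ends. A sketch that keeps them, assuming you want to sum the column, feeds the loop from a here-document instead. The temporary file and the variable name total are only there to make the example self-contained.

```shell
# sample data from the question, written to a temporary file
file1=$(mktemp)
printf '%s\n' 'Char1;2;3;89' 'char2;9;6;66' 'char5;3;77;8' > "$file1"

total=0
# the loop runs in the current shell, so total survives after "done"
while IFS= read -r data; do
    total=$((total + data))
done <<EOF
$(cut -d';' -f3 "$file1")
EOF

echo "total is $total"
rm -f "$file1"
```

This uses only POSIX shell features, so it should behave the same under sh, bash and ksh.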

bash, extract string from text file with space delimiter

I have text files with a line like this in them:
MC exp. sig-250-0 events & $0.98 \pm 0.15$ & $3.57 \pm 0.23$ \\
sig-250-0 is something that can change from file to file (but I always know what it is for each file). There are lines before and after this one, but the string "MC exp. sig-250-0 events" is unique in the file.
For a particular file, is there a good way to extract the second number 3.57 in the above example using bash?

use awk for this:
awk '/MC exp. sig-250-0/ {print $10}' your.txt
Note that this will print: $3.57 - with the leading $, if you don't like this, pipe the output to tr:
awk '/MC exp. sig-250-0/ {print $10}' your.txt | tr -d '$'
In comments you wrote that you need to call it in a script like this:
while read p ; do
echo $p,awk '/MC exp. sig-$p/ {print $10}' filename | tr -d '$'
done < grid.txt
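Note that you need a command substitution $() around the awk pipeline, and that the shell does not expand $p inside the single-quoted awk program. Pass the shell variable in with -v and match against it:
echo "$p",$(awk -v pat="MC exp. sig-$p" '$0 ~ pat {print $10}' filename | tr -d '$')
The same syntax works on its own whenever you want a shell variable in the awk pattern:
awk -v pat="MC exp. sig-$p" '$0 ~ pat {print $10}' a.txt | tr -d '$'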
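
Since the key contains characters such as . that are special in a regex, a literal substring test may be safer than a pattern match. The following is only a sketch under that assumption: it uses awk's index() function for the literal test and strips the dollar signs inside awk. The function name second_value is hypothetical.

```shell
# second_value KEY FILE
# From lines containing the literal text "MC exp. sig-KEY events",
# print the 10th whitespace-separated field with any '$' removed.
# (hypothetical helper name, for illustration only)
second_value() {
    awk -v key="MC exp. sig-$1 events" '
        index($0, key) {          # literal substring test, no regex involved
            v = $10
            gsub(/\$/, "", v)     # drop the LaTeX dollar signs
            print v
        }' "$2"
}

tmp=$(mktemp)
printf '%s\n' 'MC exp. sig-250-0 events & $0.98 \pm 0.15$ & $3.57 \pm 0.23$ \\' > "$tmp"
second_value 250-0 "$tmp"
rm -f "$tmp"
```

Inside the grid.txt loop this would be called as echo "$p,$(second_value "$p" filename)".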

More sample lines would have been nice, but I guess a simple awk command will do, where N is the number of the field you want:
awk '{print $N}' "$file"
If you don't tell awk which field separator to use, it splits on runs of whitespace. Now you just have to count the fields to find the one you want. In your case it is field 10:
awk '{print $10}' file.txt
$3.57
Don't want the $?
Pipe your awk result to cut:
awk '{print $10}' foo | cut -d '$' -f2
-d makes $ the field separator and -f selects the second field.

If you know you always have the same number of fields, then
#!/bin/bash
file=$1
key=$2
while read -ra f; do
if [[ "${f[0]} ${f[1]} ${f[2]} ${f[3]}" == "MC exp. $key events" ]]; then
echo ${f[9]}
fi
done < "$file"

linux bash, print line containing string in a column

Let's say my file /etc/passwd contains
ntp:x:38:40::/etc/ntp:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
haldaemon:x:38:68:HAL daemon:/:/sbin/nologin
pulse:x:497:495:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
gdm:x:42:38::/var/lib/gdm:/sbin/nologin
sshd:x:388:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
tcpdump:x:38:72::/:/sbin/nologin
What I'm trying to do is print the lines containing "38" in the third column, something which will print this:
ntp:x:38:40::/etc/ntp:/sbin/nologin
haldaemon:x:38:68:HAL daemon:/:/sbin/nologin
tcpdump:x:38:72::/:/sbin/nologin
I tried something like
cat "/etc/passwd" | cut -d ":" -f3 | grep "38"
but it only shows the "38", not the entire line.
Thanks

You may try this:
awk -F: '$3~/38/' /etc/passwd
Note that lines whose third column merely contains 38, such as 338 or 388, will be printed as well.

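You could use grep, although this version also matches a 38 in any later column:
grep '^.*:.*:38:' /etc/passwd
Improved version after tripleee's comment, which pins the match to the third column:
grep -E '^[^:]*:[^:]*:38:' /etc/passwd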

You can use awk:
awk -F: '$3==38{print}' file
In general, I would suggest you avoid parsing /etc/passwd directly. Instead you can use getent passwd to read the passwd database.
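
To combine the two suggestions, the awk filter can read from standard input so that it works on the output of getent passwd as well as on a file. A minimal sketch; the function name uid_lines and the awk variable uid are made up for illustration.

```shell
# uid_lines UID
# Print the passwd-format lines from stdin whose third field equals UID exactly.
# (hypothetical helper name)
uid_lines() {
    awk -F: -v uid="$1" '$3 == uid'
}

printf '%s\n' \
    'ntp:x:38:40::/etc/ntp:/sbin/nologin' \
    'gdm:x:42:38::/var/lib/gdm:/sbin/nologin' \
    'sshd:x:388:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin' |
    uid_lines 38
```

On a real system the call would be getent passwd | uid_lines 38. The comparison is an equality test, so 388 in the third column or 38 in the fourth does not match.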

You can do this:
grep -E '^[[:alnum:]]*:[[:alnum:]]*:38:' /etc/passwd
This uses the alphanumeric character class, so it will miss user names that contain other characters such as - or _.

In pure bash (awk is the way to go though!):
$ while IFS= read -r line; do IFS=: read -r -a array <<< "$line"; [[ ${array[2]} == 38 ]] && printf '%s\n' "$line"; done < input
ntp:x:38:40::/etc/ntp:/sbin/nologin
haldaemon:x:38:68:HAL daemon:/:/sbin/nologin
tcpdump:x:38:72::/:/sbin/nologin

Only sed was left :)
sed -n '/^[^:]*:[^:]*:38:/p' /etc/passwd
