bash: Find irregular values in strings

I'm looking to fetch a value after a match in a string.
Let's say I have two strings:
string1="Name: John Doe Age: 28 City: Oklahoma City"
string2="Name: Jane Age: 29 Years City: Boston"
Now I want to set three parameters: Name, Age and City.
If I were to do:
name=$(echo "$string1" | awk '{ print $2 $3 }')
age=$(echo "$string1" | awk '{ print $5 }')
city=$(echo "$string1" | awk '{ print $8 $9 }')
It would work for string1, but obviously not for string2.
After some googling I believe I should put it in some kind of array, but I do not really know how to proceed.
Basically, I want everything after Name: and before Age: to be parameter $name. Everything between Age: and City: to be $age, and so on.
Best regards

Needs bash version 3 or higher:
if [[ $string1 =~ ^Name:\ (.*)\ Age:\ (.*)\ City:\ (.*) ]] ; then
name=${BASH_REMATCH[1]}
age=${BASH_REMATCH[2]}
city=${BASH_REMATCH[3]}
fi
You might need Age:\ ([0-9]*).*\ City: if you do not want "Years" to be included in $age.
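As a minimal sketch, the same regex can be kept in a variable and wrapped in a function so that both sample strings go through one code path. The function name parse_person and the variable re are made up for illustration; the pattern uses the numeric-age variant so that "Years" is dropped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sets name, age and city from one input string.
parse_person() {
    # Keeping the pattern in a variable avoids backslash-escaping the spaces.
    local re='^Name: (.*) Age: ([0-9]+).* City: (.*)$'
    [[ $1 =~ $re ]] || return 1
    name=${BASH_REMATCH[1]}
    age=${BASH_REMATCH[2]}
    city=${BASH_REMATCH[3]}
}

parse_person "Name: Jane Age: 29 Years City: Boston" &&
    printf '%s|%s|%s\n' "$name" "$age" "$city"   # prints: Jane|29|Boston
```

The function returns non-zero when the string does not match, so the caller can decide what to do with malformed input.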

awk is the best solution for this because you can set the field separator to a regex and then your fields are $2, $3 and $4
name=$(awk -F'[[:alpha:]]+: ' '{print $2}' <<<"$string1")
age=$(awk -F'[[:alpha:]]+: ' '{print $3}' <<<"$string1")
city=$(awk -F'[[:alpha:]]+: ' '{print $4}' <<<"$string1")
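One thing to watch with that separator: the space in front of "Age:" stays attached to the previous field, so $2 comes out as "John Doe " with a trailing blank. A possible variation, sketched below, makes the leading space part of the separator and reads all three values from a single awk call. The "|" output delimiter is an assumption and only works if the values never contain that character.

```bash
string1="Name: John Doe Age: 28 City: Oklahoma City"

# One awk pass; the optional leading space in the separator keeps the
# fields free of trailing blanks. "|" is an assumed-safe output delimiter.
IFS='|' read -r name age city <<EOF
$(printf '%s\n' "$string1" | awk -F' ?[[:alpha:]]+: ' -v OFS='|' '{ print $2, $3, $4 }')
EOF

printf '[%s] [%s] [%s]\n' "$name" "$age" "$city"
# prints: [John Doe] [28] [Oklahoma City]
```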

Perl solution (taken partly from my answer here):
Capture name:
name=`perl -ne 'print $1 if /Name: ([a-zA-Z ]+) Age:/' <<< $string`
Capture age:
age=`perl -ne 'print $1 if /Age: ([0-9a-zA-Z ]+) City:/' <<< $string`
-ne tells perl to loop the specified one-liner over the input file or standard input without printing anything by default (you could call it awk emulation mode).
The parens in the regexes specify the bits you're interested in capturing. The other fragments act as delimiters.
After running both of these through $string1 of your example I get 'John Doe' and '28'.
Edit: replaced echo $string with <<< $string, which is nice.

Something like this might work:
string1="Name: John Doe Age: 28 City: Oklahoma City"
string1ByRow=$(echo "$string1" | perl -pe 's/(\w+:)/\n$1\n/g' | sed '/^$/d' | sed 's/^ *//')
string1Keys=$(echo "$string1ByRow" | grep ':$' | sed 's/:$//')
string1Vals=$(echo "$string1ByRow" | grep -v ':$')
echo "$string1Keys"
Name
Age
City
echo "$string1Vals"
John Doe
28
Oklahoma City

Consider these commands:
name=$(awk -F": |Age" '{print $2}' <<< $string1)
age=$(awk -F": |City|Years" '{print $3}' <<< $string1)
city=$(awk -F"City: " '{print $2}' <<< $string1)

You can use three perl one-liners for assigning value to your variables -
name=$(perl -pe 's/.*(?<=Name: )([A-Za-z ]+)(?=Age).*/\1/' file)
age=$(perl -pe 's/.*(?<=Age: )([A-Za-z0-9 ]+)(?=City).*/\1/' file)
OR
age=$(perl -pe 's/.*(?<=Age: )([0-9 ]+)(?=Years|City).*/\1/' file)
city=$(perl -pe 's/.*(?<=City: )([A-Za-z ]+)"/\1/' file)
Test File:
[jaypal:~/Temp] cat file
string1="Name: John Doe Age: 28 City: Oklahoma City"
string2="Name: Jane Age: 29 Years City: Boston"
Name:
[jaypal:~/Temp] perl -pe 's/.*(?<=Name: )([A-Za-z ]+)(?=Age).*/\1/' file
John Doe
Jane
Age:
[jaypal:~/Temp] perl -pe 's/.*(?<=Age: )([A-Za-z0-9 ]+)(?=City).*/\1/' file
28
29 Years
OR
if you just want the age and not years then
[jaypal:~/Temp] perl -pe 's/.*(?<=Age: )([0-9 ]+)(?=Years|City).*/\1/' file
28
29
City:
[jaypal:~/Temp] perl -pe 's/.*(?<=City: )([A-Za-z ]+)"/\1/' file
Oklahoma City
Boston

I propose a generic solution:
keys=() values=()
for word in $string; do
wlen=${#word}
if [[ ${word:wlen-1:wlen} = : ]]; then
keys+=("${word:0:wlen-1}") values+=("")
else
alen=${#values[@]}
values[alen-1]=${values[alen-1]:+${values[alen-1]} }$word
fi
done
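For illustration, here is a runnable sketch of the same loop applied to string2, followed by a printout of the collected pairs. It assumes the string starts with a key and that no word contains glob characters, since $string is deliberately left unquoted for word splitting. The variable last is a name introduced here.

```bash
#!/usr/bin/env bash
string="Name: Jane Age: 29 Years City: Boston"

keys=() values=()
for word in $string; do
    if [[ $word == *: ]]; then
        # A word ending in ":" starts a new key with an empty value.
        keys+=("${word%:}") values+=("")
    else
        # Any other word is appended to the value of the most recent key.
        last=$(( ${#values[@]} - 1 ))
        values[last]="${values[last]:+${values[last]} }$word"
    fi
done

for i in "${!keys[@]}"; do
    printf '%s=%s\n' "${keys[i]}" "${values[i]}"
done
# prints:
# Name=Jane
# Age=29 Years
# City=Boston
```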

bash-3.2$ cat sample.log
string1="Name: John Doe Age: 28 City: Oklahoma City"
string2="Name: Jane Age: 29 Years City: Boston"
Using the built-in match function (the three-argument form with an array requires gawk):
awk ' { match($0,/Name:([A-Za-z ]*)Age:/,a); match($0,/Age:([ 0-9]*)/,b); match($0,/City:([A-Za-z ]*)/,c); print a[1]":" b[1]":"c[1] } ' sample.log
Output:
John Doe : 28 : Oklahoma City
Jane : 29 : Boston

Related

Linux: concatenate two substrings to a new line from infile read with cat

I have an infile with, let's say:
01;Masters;Robin;Atlanta;38
02;Jarau;Jennifer;Washington;29
03;Clavell;James;New York;78
...
I want to create an output which looks like this:
Robin Masters, 38
Jennifer Jarau, 29
James Clavell, 78
But I will NOT use the 'while read in; do ...; done' loop, because read is very, very slow for bigger files.
I would love to have a solution with 'cat', like this:
cat infile | echo $3" "$2", "$4 >> staff.list
(I have read that $0, $1, $2 are input parameters...)
Is there a solution with cat, maybe in combination with awk or cut?
Thank you in advance,
-Linuxfluesterer
$ awk '{printf "%s %s, %s\n", $3, $2, $5}' FS=\; infile
Robin Masters, 38
Jennifer Jarau, 29
James Clavell, 78
or
$ awk '{print $3, $2 ",", $5}' FS=\; infile
Robin Masters, 38
Jennifer Jarau, 29
James Clavell, 78
It's not as pretty with sed, but you can also do:
$ v='\([^;]*\)'
$ sed -e "s/$v;$v;$v;$v;$v/\3 \2, \5/" infile
Robin Masters, 38
Jennifer Jarau, 29
James Clavell, 78
There's certainly no need for cat.

using sed to change numbers in a csv file

I have a csv file with 3 columns like below
Jones Smith 656220665
I would like to convert it to
Jones Smith 000000000
The problem I have is that not all the numbers are the same length; some are only 7 digits long. I can't seem to find a way to change them from their current format to 0s, and I have to use sed and cut.
Here are two of the commands I tried and attempted to adapt to my needs:
sed 's/\([^ ]*\) \([^_]*\)_\(.*\)/\1 \3/g' Input_file
and
$ sed 's/\(\([^,]\+,\)\{1\}\)\([^,]\+,\)\(.*\)/\1\3\3\4/' /path/to/your/file
Instead of using sed, how about this:
echo 'Jones Smith 656220665' | tr '0-9' '0'
Jones Smith 000000000
For the whole file that's then:
tr '0-9' '0' < file > file.tmp
Edit 1:
added a sed solution:
sed 's/[0-9]/0/g' smith
Jones Smith 000000000
Edit 2:
cat ClientData.csv > ClientData.csv.bak
sed 's/[0-9]/0/g; w ClientData.csv' ClientData.csv.bak | cut -d" " -f 1-2
You can do this very simply with sed general substitution of:
sed 's/[0-9]/0/g'
(where the 'g' provides a global replacement of all instances of [0-9] with '0'), e.g.
$ echo "Jones Smith 656220665" | sed 's/[0-9]/0/g'
Jones Smith 000000000
Give it a shot and let me know if you have further issues.
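If the goal is to rewrite the file itself, a small sketch along these lines may help. It runs on a throwaway temp file with two made-up rows (the second row is invented to show the 7-digit case) and uses sed's -i option with a backup suffix. Note that -i is not part of POSIX sed, although the attached-suffix form is accepted by both GNU and BSD sed.

```bash
# Work on a throwaway copy so the demo does not touch real data.
tmp=$(mktemp)
printf '%s\n' 'Jones Smith 656220665' 'Mary Brown 1234567' > "$tmp"

# -i.bak edits the file in place and keeps the original as "$tmp.bak".
sed -i.bak 's/[0-9]/0/g' "$tmp"

result=$(cat "$tmp")
echo "$result"
# prints:
# Jones Smith 000000000
# Mary Brown 0000000

rm -f "$tmp" "$tmp.bak"
```

Because every digit is replaced one for one, the differing number lengths need no special handling.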

Print out only last 4 digits of mac addresses from 2nd column using awk in linux

I have made a shell script that gets a list of MAC addresses using awk and the arp-scan command. I want to strip each MAC address down to its last 4 digits, i.e. I want to print only the yy characters:
ac:1e:04:0e:yy:yy
ax:8d:5c:27:yy:yy
ax:ee:fb:55:yy:yy
dx:37:42:c9:yy:yy
cx:bf:9c:a4:yy:yy
Try cut -d: -f5-
(Options meaning: delimiter : and fields 5 and up.)
EDIT: Or in awk, as you requested:
awk -F: '{ print $5 ":" $6 }'
Here are a few options:
line=cx:bf:9c:a4:yy:yy
echo ${line:(-5)}
line=cx:bf:9c:a4:yy:yy
echo $line | cut -d":" -f5-
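Another option, sketched here with plain parameter expansion, strips the first four octets by pattern instead of by character position, so it does not depend on the exact string length. The variable names tail and compact are introduced for this example.

```bash
mac='cx:bf:9c:a4:yy:yy'

# Remove the shortest prefix containing four colons (the first four octets).
tail=${mac#*:*:*:*:}
echo "$tail"       # prints: yy:yy

# Drop the remaining colon if only the four characters are wanted.
compact=$(printf '%s' "$tail" | tr -d ':')
echo "$compact"    # prints: yyyy
```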
I imagine you want to strip the trailing spaces, but it isn't clear whether you want yy:yy or yyyy.
Anyhow, there are multiple ways to do it, but you are already running AWK and have the MAC in $2.
In the first case it would be:
awk '{match($2,/([^:]{2}:[^:]{2}) *$/,m); print m[1]}'
yy:yy
In the second (no colon :):
awk '{match($2,/([^:]{2}):([^:]{2}) *$/,m); print m[1]m[2]}'
yyyy
If you would rather not use the three-argument match (a gawk extension), gensub (also specific to gawk) does the same job:
awk '{print gensub(/.*([^:]{2}:[^:]{2}) *$/,"\\1","g",$2)}'
yy:yy
or:
awk '{print gensub(/.*([^:]{2}):([^:]{2}) *$/,"\\1\\2","g",$0)}'
yyyy
Edit:
I now realized the trailing spaces were added by anubhava in his edit; they were not present in the original question! You can then simply keep the last n characters:
awk '{print substr($2,13,5)}'
yy:yy
or:
awk '{print substr($2,13,2)substr($2,16,2)}'
yyyy
Since a MAC address always has 6 octets, you could probably just do something like this to get the last 2 octets:
awk '{print substr($0,13)}' input.txt
While testing on the fly with arp -an, I noticed that the output does not always contain a MAC address; in some cases it returns something like:
(169.254.113.54) at (incomplete) on en4 [ethernet]
It is therefore probably better to filter the input to guarantee a MAC address, which can be done with this regex:
^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$
Applying the regex in awk and printing only the last 2 octets:
arp -an | awk '{if ($4 ~ /^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$/) print substr($4,13)}'
This filters on column $4, verifies that it is a valid MAC address, and then uses substr to return only the last two octets.
You could also split by : and print the output in multiple ways, for example:
awk '{if ($4 ~ /^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$/) {split($4,a,":"); print a[5] ":" a[6]}}'
Notice the exp ~ /regexp/
This is true if the expression exp (taken as a string) is matched by regexp.
The following example matches, or selects, all input records with the upper-case letter `J' somewhere in the first field:
$ awk '$1 ~ /J/' inventory-shipped
-| Jan 13 25 15 115
-| Jun 31 42 75 492
-| Jul 24 34 67 436
-| Jan 21 36 64 620
So does this:
awk '{ if ($1 ~ /J/) print }' inventory-shipped

How to get the first number after a string in bash

For example, if I have those line in a file called input.txt
name: Tom
age: 12
name: Bob
age: 13
name: Jim
age: 14
name: Joe
age:15
I want the first number after Jim, which is 14. Thanks :)
There are multiple solutions to this, using tools including sed, awk, cut, etc. However I prefer perl.
perl -0777 -nle 'print $1 if /^name:\s*Jim\s*\nage:\s*([\d.]+)/gm' file.txt
Explanation:
([\d.]+) matches any number after the age: on the next line after Jim.
-0777 is Perl Slurp mode used in combination with /m for multiline matching.
A solution using grep:
cat file.txt | grep -A2 'Jim' | grep -Eo '[0-9]*'
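For comparison, here is a rough awk sketch of the same idea: remember when the wanted name has been seen, then print the value of the next age: line and stop. The sample data is written to a temp file so the snippet runs on its own; the variable who and the flag found are names chosen for this example.

```bash
tmp=$(mktemp)
printf '%s\n' 'name: Tom' 'age: 12' 'name: Jim' 'age: 14' 'name: Joe' 'age:15' > "$tmp"

age=$(awk -v who='Jim' '
    # Flag the record whose name matches exactly.
    $0 ~ ("^name:[[:space:]]*" who "[[:space:]]*$") { found = 1; next }
    # On the first age line after the flag, strip the label and stop.
    found && /^age:/ { sub(/^age:[[:space:]]*/, ""); print; exit }
' "$tmp")

echo "$age"    # prints: 14
rm -f "$tmp"
```

Unlike the grep pipeline, this does not pick up digits from other lines, and it copes with both "age: 14" and "age:15" spacing.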

How do I parse out a text file with AWK and fprint in BASH?

I have a sample.txt file as follows:
Name City ST Zip CTY
John Smith   BrooklynNY10050USA
Paul DavidsonQueens  NY10040USA
Michael SmithNY      NY10030USA
George HermanBronx   NY10020USA
Desired output is the same data split into separate columns (shown as an image in the original post).
I tried this:
#!/bin/bash
awk '{printf "%13-s %-8s %-2s %-5s %-3s\n", $1, $2, $3, $4, $5}' sample.txt > new.txt
And it's unsuccessful with this result:
Name City ST Zip CTY
John Smith BrooklynNY10050USA
Paul DavidsonQueens NY10040USA
Michael SmithNY NY10030USA
George HermanBronx NY10020USA
Would appreciate it if anyone could tweak this so the text file will be in delimited format as shown above. Thank you so much!!
You can use sed to insert spaces to specific positions:
cat data.txt | sed -e 's#\(.\{13\}\)\(.*\)#\1 \2#g' | sed -e 's#\(.\{22\}\)\(.*\)#\1 \2#g' |sed -e '1s#\(.\{29\}\)\(.*\)#\1 \2#g' | sed -e '2,$s#\(.\{25\}\)\(.*\)#\1 \2#g' | sed -e 's#\(.\{31\}\)\(.*\)#\1 \2#g'
With gawk you can set the input field widths in the BEGIN block:
$ gawk 'BEGIN { FIELDWIDTHS = "13 8 2 5 3" } { print $1, $2, $3, $4, $5 }' fw.txt
Name City ST Zip CTY
John Smith Brooklyn NY 10050 USA
Paul Davidson Queens NY 10040 USA
Michael Smith NY NY 10030 USA
George Herman Bronx NY 10020 USA
If your awk does not have FIELDWIDTHS, it's a bit tedious but you can use substr:
$ awk '{ print substr($0,1,13), substr($0,14,8), substr($0,22,2), substr($0,24,5), substr($0,29,3) }' fw.txt
Name City ST Zip CTY
John Smith Brooklyn NY 10050 USA
Paul Davidson Queens NY 10040 USA
Michael Smith NY NY 10030 USA
George Herman Bronx NY 10020 USA
You can split the field lengths into an array then loop over $0 and gather the substrings in regular awk:
awk 'BEGIN {n=split("13 8 2 5 3",ar)}
{
j=1
s=""
sep="\t"
for(i=1;i<n;i++)
{s=s substr($0, j, ar[i]) sep; j+=ar[i]}
s=s substr($0, j, ar[i])
print s
}' file
That uses a tab to delimit the fields, but you can also use a space if preferred.
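If a truly delimited file is the goal, a variation on the same loop can trim the padding from each field and join the fields with commas. This is only a sketch: the two sample rows are reconstructed on the assumption that the columns are 13, 8, 2, 5 and 3 characters wide, as in the FIELDWIDTHS example, and the comma delimiter is my choice.

```bash
csv=$(printf '%s\n' \
    'John Smith   BrooklynNY10050USA' \
    'Paul DavidsonQueens  NY10040USA' |
awk 'BEGIN { n = split("13 8 2 5 3", w, " ") }
{
    pos = 1; out = ""
    for (i = 1; i <= n; i++) {
        f = substr($0, pos, w[i])     # cut the next fixed-width field
        pos += w[i]
        sub(/ +$/, "", f)             # drop the trailing padding
        out = out (i > 1 ? "," : "") f
    }
    print out
}')

echo "$csv"
# prints:
# John Smith,Brooklyn,NY,10050,USA
# Paul Davidson,Queens,NY,10040,USA
```

This sticks to POSIX awk features, so it should not need gawk.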

Resources