How to get the first number after a string in bash - linux

For example, suppose I have these lines in a file called input.txt:
name: Tom
age: 12
name: Bob
age: 13
name: Jim
age: 14
name: Joe
age:15
I want the first number after Jim, which is 14. Thanks :)

There are multiple solutions to this using tools such as sed, awk, and cut. However, I prefer perl.
perl -0777 -nle 'print $1 if /^name:\s*Jim\s*\nage:\s*([\d.]+)/m' file.txt
Explanation:
([\d.]+) captures the number (digits and decimal points) after age: on the line following Jim.
-0777 is Perl's slurp mode, which reads the whole file as one string; the /m modifier lets ^ match at the start of each line within that string.

A solution using grep:
grep -A1 'Jim' file.txt | grep -Eo '[0-9]+'
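The same idea can be sketched in awk. This is only a minimal sketch, assuming the layout shown in the question (a name: line followed by an age: line); the temporary file and the variable names input and first_number are illustrative, not part of the original answers.

```shell
# Recreate the sample data from the question in a temporary file.
input=$(mktemp)
printf '%s\n' 'name: Tom' 'age: 12' 'name: Bob' 'age: 13' \
              'name: Jim' 'age: 14' 'name: Joe' 'age:15' > "$input"

# Once the "Jim" line has been seen, print the first run of digits
# found on a later line and stop reading.
first_number=$(awk '
    found && match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH); exit }
    /^name: *Jim *$/             { found = 1 }
' "$input")

echo "$first_number"
rm -f "$input"
```

Because the rule that sets found comes second, the Jim line itself is never searched for digits; only the lines after it are.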

Related

using sed to change numbers in a csv file

I have a csv file with 3 columns like below
Jones Smith 656220665
I would like to convert it to
Jones Smith 000000000
The problem I have is that not all the numbers are the same length; some are 7 digits long. I can't seem to find a way to change them from their current format to 0s, and I have to use sed and cut.
Here are two of the commands I tried to adapt to my needs:
sed 's/\([^ ]*\) \([^_]*\)_\(.*\)/\1 \3/g' Input_file
and
$ sed 's/\(\([^,]\+,\)\{1\}\)\([^,]\+,\)\(.*\)/\1\3\3\4/' /path/to/your/file
Instead of using sed, how about this:
echo 'Jones Smith 656220665' | tr '0-9' '0'
Jones Smith 000000000
For the whole file, that becomes:
tr '0-9' '0' < file > file.tmp
Edit 1:
added a sed solution:
sed 's/[0-9]/0/g' smith
Jones Smith 000000000
Edit 2:
cat ClientData.csv > ClientData.csv.bak
sed 's/[0-9]/0/g; w ClientData.csv' ClientData.csv.bak | cut -d" " -f 1-2
You can do this very simply with a global sed substitution:
sed 's/[0-9]/0/g'
(where the 'g' flag replaces every instance of [0-9] with '0'), e.g.
$ echo "Jones Smith 656220665" | sed 's/[0-9]/0/g'
Jones Smith 000000000
Give it a shot and let me know if you have further issues.
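If the first two columns could themselves contain digits, the substitution can be limited to the number column. The sketch below is a hedged alternative that uses awk rather than the sed and cut the question asks for, and it assumes space-separated columns with the number in the third field, as in the sample line. The second input line and the variable name zeroed are made up for illustration.

```shell
# Replace digits with 0 in the third field only; the trailing 1 prints every line.
zeroed=$(printf '%s\n' 'Jones Smith 656220665' 'R2 D2 1234567' |
    awk '{ gsub(/[0-9]/, "0", $3) } 1')

echo "$zeroed"
```

Modifying $3 makes awk rebuild the line with single spaces between fields, which matches the sample format here.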

grep shows occurrences of pattern on a per line basis

From the input file:
I am Peter
I am Mary
I am Peter Peter Peter
I am Peter Peter
I want output to be like this:
1 I am Peter
3 I am Peter Peter Peter
2 I am Peter Peter
Where 1, 3 and 2 are occurrences of "Peter".
I tried this, but the info is not formatted the way I wanted:
grep -o -n Peter inputfile
This is not easily solved with grep, I would suggest moving "two tools up" to awk:
awk '$0 ~ FS { print NF-1, $0 }' FS="Peter" inputfile
Output:
1 I am Peter
3 I am Peter Peter Peter
2 I am Peter Peter
Edit:
To answer a question in the comments:
What if I want case insensitive? and what if I want multiple pattern
like "Peter|Mary|Paul", so "I am Peter peter pAul Mary marY John",
will yield the count of 5?
If you are using GNU awk, you do it by enabling IGNORECASE and setting the pattern in FS like this:
awk '$0 ~ FS { print NF-1, $0 }' IGNORECASE=1 FS="Peter|Mary|Paul" inputfile
Output:
1 I am Peter
1 I am Mary
3 I am Peter Peter Peter
2 I am Peter Peter
5 I am Peter peter pAul Mary marY John
You don’t need -o or -n. From grep --help:
-o, --only-matching show only the part of a line matching PATTERN
...
-n, --line-number print line number with output lines
Remove them and your output will be better. I think you’re misinterpreting -n -- it just shows the line number, not the occurrence count.
It looks like you’re trying to get the count of “Peter” appearances per line. You’d need something beyond a single grep for that. awk could be a good choice. Or you could loop over each line, split it into words (say, an array), and grep -c those words to print the line’s count.
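A rough sketch of the loop approach follows. It swaps the word array and grep -c for grep -o piped to wc -l, which counts every match on a line; the temporary file and the variable names inputfile and counts are illustrative only.

```shell
# Sample input from the question, kept in a temporary file.
inputfile=$(mktemp)
printf '%s\n' 'I am Peter' 'I am Mary' 'I am Peter Peter Peter' \
              'I am Peter Peter' > "$inputfile"

# For each line, grep -o prints every match on its own line and wc -l counts them.
counts=$(while IFS= read -r line; do
    n=$(printf '%s\n' "$line" | grep -o 'Peter' | wc -l)
    n=$((n))    # strip padding that some wc implementations add
    if [ "$n" -gt 0 ]; then
        printf '%s %s\n' "$n" "$line"
    fi
done < "$inputfile")

echo "$counts"
rm -f "$inputfile"
```

This starts two processes per input line, so the awk one-liner is likely to be the better choice for large files.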

replace a word in a string if there is a given string using sed

Consider the following strings:
function 12345 filename.pdf 6789 12
function 12345 filename.doc 7789 4567
Is there a way to search the strings using sed to see if they contain pdf or doc substrings, and replace the strings to the following?
function_pdf 12345 filename.pdf 6789 12
function_doc 12345 filename.doc 7789 4567
You really have not specified the problem adequately, but perhaps you are looking for:
sed -e '/\.pdf/s/function/function_pdf/g' -e '/\.doc/s/function/function_doc/g'
Through sed,
$ sed 's/^\([^[:space:]]\+\)\( [^[:space:]]\+ [^[:space:]]\+\.\)\(pdf\|doc\)/\1_\3\2\3/g' file
function_pdf 12345 filename.pdf 6789 12
function_doc 12345 filename.doc 7789 4567
Using sed :
~$ cat i.txt
function 12345 filename.pdf 6789
function 12345 filename.doc 7789
function 12345 filename.txt 8888
~$ sed -e 's/\(function\) \(.*\)\(pdf\|doc\)\(.*\)/\1_\3 \2\3\4/' i.txt
function_pdf 12345 filename.pdf 6789
function_doc 12345 filename.doc 7789
function 12345 filename.txt 8888
Capture the extension with the regexp you want, then insert it where you want using \N back-references (\1, \2, and so on).
From man sed:
the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
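To make the back-reference numbering concrete, here is a tiny sketch on a made-up input string; the variable name swapped is illustrative only.

```shell
# \1 is the part before the last dot, \2 is the extension after it.
swapped=$(echo 'filename.pdf' | sed 's/\(.*\)\.\(.*\)/\2:\1/')

echo "$swapped"
```

Groups are numbered by the position of their opening \( from left to right, and the replacement may use them in any order.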
With awk:
awk '$1=="function" && ($3 ~ /\.(pdf|doc)$/) {$1=$1 "_" substr($3,length($3)-2)}7'
sed 's/\( .*\.\)\([^ ]*\)\(.*\)/_\2&/' YourFile
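This is the simplest sed command I found for this (sed seems very efficient here). Note that it appends whatever follows the last dot, so it is not limited to pdf or doc.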

How to extract the integer or decimal at beginning of each input line, using Linux/Unix utilities?

Given input such as:
1
1a
1.1b
2.0c
How to extract the integer/decimal number at beginning of each input line, using only Linux/Unix command line utilities?
Using awk, you could say:
awk '{print $0+0}'
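Awk is available on Linux, BSD, and many other Unix-like operating systems. When it converts a string to a number it keeps only the leading numeric prefix, as these single-line examples show (on multi-line input, a+=$0 prints a running total, so use $0+0 there):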
echo "1" | awk '{a+=$0; print a}' # output 1
echo "1a" | awk '{a+=$0; print a}' # output 1
echo "1.1b" | awk '{a+=$0; print a}' # output 1.1
echo "2.0c" | awk '{a+=$0; print a}' # output 2
Some more awk
For extracting only digits
$ awk 'gsub(/[[:alpha:]].*/,x,$1) + 1' << EOF
1
1a
1.1b
2.0c
EOF
1
1
1.1
2.0
For integer
$ awk '{print int($0)}' << EOF
1
1a
1.1b
2.0c
EOF
1
1
1
2
---edit---
If the file contains blank lines, the following avoids printing 0 for them:
$ awk 'NF{$0+=0}1' << EOF
1
1a
1.1b
2foot4c
2
EOF
1
1
1.1
2
2
Here is a way to do this with sed:
echo "12.3abc" | sed -n 's/^\([0-9.][0-9.]*\).*/\1/p'
Output:
12.3
The block in parentheses matches the digits and periods '.' that occur at the beginning of the line. Everything after that is matched by the '.*'.
The \1 says to replace the entire line with just the portion that was matched in the parentheses.
Assuming your version of grep supports -o:
grep -o '^[0-9.]\+' data.in
NB: This will match any sequence of digits and decimal points at the start of the line.
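If inputs such as 3..4 should not be accepted whole, a stricter pattern may help. The following is a sketch, assuming a grep that supports -o and -E; the extra 3..4x line and the names data and numbers are made up for illustration.

```shell
# Sample input from the question plus one malformed line.
data=$(mktemp)
printf '%s\n' '1' '1a' '1.1b' '2.0c' '3..4x' > "$data"

# Digits, optionally followed by a single dot and more digits.
numbers=$(grep -oE '^[0-9]+(\.[0-9]+)?' "$data")

echo "$numbers"
rm -f "$data"
```

With this pattern the malformed line yields only its leading integer, where the looser bracket expression would return 3..4.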

bash: Find irregular values in strings

I'm looking to fetch a value after a match in a string.
Let's say I have two strings:
string1="Name: John Doe Age: 28 City: Oklahoma City"
string2="Name: Jane Age: 29 Years City: Boston"
Now I want to set three parameters: Name, Age and City.
If I were to do:
name=$(echo "$string1" | awk '{ print $2, $3 }')
age=$(echo "$string1" | awk '{ print $5 }')
city=$(echo "$string1" | awk '{ print $8, $9 }')
It would work for string1, but obviously not for string2.
After some googling I believe I should put it in some kind of array, but I do not really know how to proceed.
Basically, I want everything after Name: and before Age: to be parameter $name. Everything between Age: and City: to be $age, and so on.
Best regards
Needs bash version 3 or higher:
if [[ $string1 =~ ^Name:\ (.*)\ Age:\ (.*)\ City:\ (.*) ]] ; then
name=${BASH_REMATCH[1]}
age=${BASH_REMATCH[2]}
city=${BASH_REMATCH[3]}
fi
You might need Age:\ ([0-9]*).*\ City: if you do not want "Years" to be included in $age.
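Both points can be combined into one helper. This is a sketch only: parse_record is a hypothetical function name, it requires bash for [[ =~ ]] and BASH_REMATCH, and it assumes the fields always appear in the order Name, Age, City.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "Name: ... Age: ... City: ..." into three variables.
parse_record() {
    # Keeping the pattern in a variable avoids escaping the spaces.
    local re='^Name: (.*) Age: ([0-9]+).* City: (.*)$'
    if [[ $1 =~ $re ]]; then
        name=${BASH_REMATCH[1]}
        age=${BASH_REMATCH[2]}     # digits only, so "Years" is dropped
        city=${BASH_REMATCH[3]}
    else
        return 1
    fi
}

parse_record "Name: Jane Age: 29 Years City: Boston"
echo "$name|$age|$city"
```

The function returns a non-zero status when the string does not match, so callers can test it with if before using the variables.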
awk is the best solution for this because you can set the field separator to a regex and then your fields are $2, $3 and $4
name=$(awk -F'[[:alpha:]]+: ' '{print $2}' <<<"$string1")
age=$(awk -F'[[:alpha:]]+: ' '{print $3}' <<<"$string1")
city=$(awk -F'[[:alpha:]]+: ' '{print $4}' <<<"$string1")
Perl solution (taken partly from my answer here):
Capture name:
name=`perl -ne 'print $1 if /Name: ([a-zA-Z ]+) Age:/' <<< $string`
Capture age:
age=`perl -ne 'print $1 if /Age: ([0-9a-zA-Z ]+) City:/' <<< $string`
-ne tells perl to loop the specified one-liner over the input file or standard input without printing anything by default (you could call it awk emulation mode).
The parens in the regexes specify the bits you're interested in capturing. The other fragments act as delimiters.
After running both of these through $string1 of your example I get 'John Doe' and '28'.
Edit: replaced echo $string with <<< $string, which is nice.
Something like this might work:
string1="Name: John Doe Age: 28 City: Oklahoma City"
string1ByRow=$(echo "$string1" | perl -pe 's/(\w+:)/\n$1\n/g' | sed '/^$/d' | sed 's/^ *//')
string1Keys=$(echo "$string1ByRow" | grep ':$' | sed 's/:$//')
string1Vals=$(echo "$string1ByRow" | grep -v ':$')
echo "$string1Keys"
Name
Age
City
echo "$string1Vals"
John Doe
28
Oklahoma City
Consider these commands:
name=$(awk -F": |Age" '{print $2}' <<< $string1)
age=$(awk -F": |City|Years" '{print $3}' <<< $string1)
city=$(awk -F"City: " '{print $2}' <<< $string1)
You can use three perl one-liners for assigning value to your variables -
name=$(perl -pe 's/.*(?<=Name: )([A-Za-z ]+)(?=Age).*/\1/' file)
age=$(perl -pe 's/.*(?<=Age: )([A-Za-z0-9 ]+)(?=City).*/\1/' file)
OR
age=$(perl -pe 's/.*(?<=Age: )([0-9 ]+)(?=Years|City).*/\1/' file)
city=$(perl -pe 's/.*(?<=City: )([A-Za-z ]+)"/\1/' file)
Test File:
[jaypal:~/Temp] cat file
string1="Name: John Doe Age: 28 City: Oklahoma City"
string2="Name: Jane Age: 29 Years City: Boston"
Name:
[jaypal:~/Temp] perl -pe 's/.*(?<=Name: )([A-Za-z ]+)(?=Age).*/\1/' file
John Doe
Jane
Age:
[jaypal:~/Temp] perl -pe 's/.*(?<=Age: )([A-Za-z0-9 ]+)(?=City).*/\1/' file
28
29 Years
OR
if you just want the age and not years then
[jaypal:~/Temp] perl -pe 's/.*(?<=Age: )([0-9 ]+)(?=Years|City).*/\1/' file
28
29
City:
[jaypal:~/Temp] perl -pe 's/.*(?<=City: )([A-Za-z ]+)"/\1/' file
Oklahoma City
Boston
I propose a generic solution:
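keys=() values=()
for word in $string; do
    wlen=${#word}
    if [[ ${word:wlen-1:wlen} = : ]]; then
        # A word ending in ":" starts a new key with an empty value.
        keys+=("${word:0:wlen-1}") values+=("")
    else
        # Any other word is appended to the value of the most recent key.
        alen=${#values[@]}
        values[alen-1]=${values[alen-1]:+${values[alen-1]} }$word
    fi
done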
bash-3.2$ cat sample.log
string1="Name: John Doe Age: 28 City: Oklahoma City"
string2="Name: Jane Age: 29 Years City: Boston"
Using the built-in match function of GNU awk (the third, array argument is a gawk extension):
awk ' { match($0,/Name:([A-Za-z ]*)Age:/,a); match($0,/Age:([ 0-9]*)/,b); match($0,/City:([A-Za-z ]*)/,c); print a[1]":" b[1]":"c[1] } ' sample.log
Output:
John Doe : 28 : Oklahoma City
Jane : 29 : Boston
