Split a string with two patterns - linux

I have a string id=12345&data=23456
I want to print 12345 23456
Currently I only know how to extract one of them at a time with awk:
echo 'id=12345&data=23456' | awk -F"id=" '{print substr($2,1,5)}'
and it's similar for data.
How can I combine those awk commands to get the desired result?

Regex capture groups are one solution. POSIX awk's match() cannot return capture groups, but gawk's three-argument match() can.
Example
echo "id=12345&data=23456" | gawk 'match($0, /^id=([^&]*)&data=(.*)$/, groups) {print groups[1] " " groups[2]}'
Output
12345 23456

There's no need for external processes. You can use the builtin read to extract the two numbers:
$ IFS="=&" read _ num1 _ num2 <<< "id=12345&data=23456"
$ printf "%s\n" "$num1" "$num2"
12345
23456

With awk:
echo "id=12345&data=23456" | awk -F'[&=]' '{ print $2, $4 }'
With grep and tr:
echo "id=12345&data=23456" | grep -o '[0-9]\+' | tr '\n' ' '
Note: the command above leaves a trailing space (and no final newline) at the end of the output.

Looks like a query string to me...I would suggest parsing it as such. For example, using PHP:
echo "id=12345&data=23456" | php -r 'parse_str(fgets(STDIN), $query); print_r($query);'
This gives the output:
Array
(
[id] => 12345
[data] => 23456
)
So to get the output you were looking for, you could go for:
$ echo "id=12345&data=23456" | php -r 'parse_str(fgets(STDIN), $query); echo $query["id"] . " " . $query["data"];'
12345 23456
For a quick and dirty alternative, you could use sed:
$ echo "id=12345&data=23456" | sed -r 's/id=([^&]+)&data=([^&]+)/\1 \2/'
12345 23456
This captures the part following id= up to the & and the part following &data= up to the next & (if there is one). The disadvantage of this approach is that it breaks if the two parts of the query string are in the opposite order but it might be good enough for your use case.
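If the order of the parameters is not guaranteed, a small shell function can look each key up by name instead. This is only a sketch: get_param is a hypothetical helper name, it does no URL decoding, and it assumes the keys contain no glob characters.

```shell
# get_param QUERY KEY: print the value of KEY in a "k1=v1&k2=v2" string.
# The body runs in a subshell ( ... ), so the IFS and set -f changes
# do not leak into the calling shell.
get_param() (
    IFS='&'
    set -f                          # no filename expansion while splitting
    for pair in $1; do              # one "key=value" pair per iteration
        case $pair in
            "$2"=*) printf '%s\n' "${pair#*=}"; exit 0 ;;
        esac
    done
    exit 1                          # key not found
)

q='data=23456&id=12345'             # opposite order on purpose
echo "$(get_param "$q" id) $(get_param "$q" data)"
# prints: 12345 23456
```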

Alternative Code
echo "id=12345&data=23456" | tr -s '&' '\n' | cut -d '=' -f 2 | tr '\n' ' '

Depending what you're going to do with the output, one of these may be all you need:
$ echo 'id=12345&data=23456' | tr -c -s '[0-9]' ' '
12345 23456 $
$ echo 'id=12345&data=23456' | tr -s '[a-z=&]' ' '
12345 23456
$

Related

How can I add a new line at the end of the output? (Linux help)

I am using this code
cut -c1 | tr -d '\n'
to take and print the first letter of every line. The problem is that I need a newline at the end, but only at the end, after the word "caroline" (these are the contents of the test file).
Cannot use AWK, basename, grep, egrep, fgrep or rgrep
Use echo
echo "$( cut -c1 | tr -d '\n' )"
cut -c1 | tr -d '\n'; echo
Try the awk utility, something like the following:
awk -F\| '$1 > 0 { print substr($1,1,1)}' testfile.txt
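Since the question rules out awk and grep, the same result can also be sketched with shell builtins only. The function name first_letters and the sample input are made up for illustration; the original test file is not shown.

```shell
# Print the first character of every input line, then a single newline.
first_letters() {
    # "|| [ -n "$line" ]" also handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # ${line#?} is the line minus its first character; removing that
        # suffix from $line leaves only the first character.
        printf '%s' "${line%"${line#?}"}"
    done
    printf '\n'
}

printf 'caroline\nanna\nbob\n' | first_letters
# prints: cab
```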

Extract last digits from each word in a string with multiple words using bash

Given a string with multiple words like below, all in one line:
first-second-third-201805241346 first-second-third-201805241348 first-second-third-201805241548 first-second-third-201705241540
I am trying to get the maximum number from the string; in this case the answer should be 201805241548
I have tried using awk and grep, but I am only getting the answer as last word in the string.
I am interested in how to get this accomplished.
echo 'first-second-third-201805241346 first-second-third-201805241348 first-second-third-201805241548 first-second-third-201705241540' |\
grep -o '[0-9]\+' | sort -n | tail -n 1
The relevant part is grep -o '[0-9]\+' | sort -n | tail -n 1.
Using single gnu awk command:
s='first-second-third-201805241346 first-second-third-201805241348 first-second-third-201805241548 first-second-third-201705241540'
awk -F- -v RS='[[:blank:]]+' '$NF>max{max=$NF} END{print max}' <<< "$s"
201805241548
Or using grep + awk (if gnu awk is not available):
grep -Eo '[0-9]+' <<< "$s" | awk '$1>max{max=$1} END{print max}'
Another awk
echo 'first-...-201705241540' | awk -v RS='[^0-9]+' '$0>max{max=$0} END{print max}'
Gnarly pure bash (the +(...) pattern needs shopt -s extglob):
n='first-second-third-201805241346 \
first-second-third-201805241348 \
first-second-third-201805241548 \
first-second-third-201705241540'
z="${n//+([a-z-])/;p=}"
p=0 m=0 eval echo -n "${z//\;/\;m=\$((m>p?m:p))\;};m=\$((m>p?m:p))"
echo $m
Output:
201805241548
How it works: This code constructs code, then runs it.
z="${n//+([a-z-])/;p=}" substitutes non-numbers with some pre-code
-- setting $p to the value of each number, (useless on its own). At this point echo $z would output:
;p=201805241346 \ ;p=201805241348 \ ;p=201805241548 \ ;p=201705241540
The second line replaces each added ; with code that sets $m to the
larger of $m and $p. That generated code needs eval to run; what
eval actually executes looks like this:
p=0 m=0
m=$((m>p?m:p));p=201805241346
m=$((m>p?m:p));p=201805241348
m=$((m>p?m:p));p=201805241548
m=$((m>p?m:p));p=201705241540
m=$((m>p?m:p))
Print $m.
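For comparison, here is a sketch of the same idea without eval or generated code. It assumes every word ends in a hyphen followed by a non-negative integer, as in the sample string; max_number is a hypothetical name.

```shell
# Print the largest trailing number among whitespace-separated words.
max_number() {
    max=0
    for word in $1; do              # unquoted on purpose: split on whitespace
        n=${word##*-}               # keep what follows the last hyphen
        [ "$n" -gt "$max" ] && max=$n
    done
    printf '%s\n' "$max"
}

s='first-second-third-201805241346 first-second-third-201805241348 first-second-third-201805241548 first-second-third-201705241540'
max_number "$s"
# prints: 201805241548
```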

Count number of patterns with a single command

I'd like to count the number of occurrences in a string. For example, in this string :
'apache2|ntpd'
there are 2 different strings separated by | character.
Another example :
'apache2|ntpd|authd|freeradius'
In this case there are 4 different strings separated by | character.
Would you know a shell or perl command that could simply count this for me?
You can use awk as below:
echo "apache2|ntpd" | awk -F'|' '{print NF}'
-F'|' sets the field separator to |
NF is the number of fields
Example:
user@host:/tmp$ echo 'apache2|ntpd|authd|freeradius' | awk -F'|' '{print NF}'
4
You can also use this:
user@host:/tmp$ echo "apache2|ntpd" | tr '|' ' ' | wc -w
2
user@host:/tmp$ echo 'apache2|ntpd|authd|freeradius' | tr '|' ' ' | wc -w
4
tr '|' ' ' : translate | to space
wc -w : print the word count
If the string contains spaces, wc -w gives the wrong result, so translate | to newlines and count lines instead:
user@host:/tmp$ echo 'apac he2|ntpd' | tr '|' ' ' | wc -w
3 --> not correct
user@host:/tmp$ echo 'apac he2|ntpd' | tr '|' '\n' | wc -l
2
tr '|' '\n' : translate | to newline
wc -l : print the number of lines
You can do this within bash alone, without calling external languages like awk or external programs like grep and tr.
data='apache2|ntpd|authd|freeradius'
res=${data//[!|]/}
num_strings=$(( ${#res} + 1 ))
echo $num_strings
Let me explain.
res=${data//[!|]/} removes all characters that are not (that's the !) pipes (|).
${#res} gives the length of the resulting string.
num_strings=$(( ${#res} + 1 )) adds one to the number of pipes to get the number of fields.
It's that simple.
Another pure bash technique using positional-parameters
$ userString="apache2|ntpd|authd|freeradius"
$ printf "%s\n" $(IFS=\|; set -- $userString; printf "%s\n" "$#")
4
Thanks to cdarke's suggestion in the comments, the above command can store the count directly in a variable
$ printf -v count "%d" $(IFS=\|; set -- $userString; printf "%s\n" "$#")
$ printf "%d\n" "$count"
4
With wc and parameter expansion:
$ data='apache2|ntpd|authd|freeradius'
$ wc -w <<< ${data//|/ }
4
Using parameter expansion, all pipes are replaced with spaces. The result string is passed to wc -w for word count.
As @gniourf_gniourf mentioned, it works with what at first look like process names but will fail if the strings contain spaces.
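The positional-parameter technique shown earlier does not have that limitation, because it splits on | only. A minimal sketch wrapped as a function (count_fields is a hypothetical name); note that with standard field splitting a trailing | does not add an empty field.

```shell
# Count |-separated fields; spaces inside a field are left alone.
# The body runs in a subshell, so IFS and set -f stay local.
count_fields() (
    IFS='|'
    set -f                  # no filename expansion while splitting
    set -- $1               # split the argument into positional parameters
    echo "$#"
)

count_fields 'apac he2|ntpd'
# prints: 2
count_fields 'apache2|ntpd|authd|freeradius'
# prints: 4
```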
You can do this with grep as well-
echo "apache2|ntpd|authd|freeradius" | grep -o "|" | wc -l
Output-
3
That output is the number of pipes.
To get the number of commands-
var=$(echo "apache2|ntpd|authd|freeradius" | grep -o "|" | wc -l)
echo $((var + 1))
Output -
4
You could use awk to count the occurrences of the delimiter and add 1:
$ awk '{print gsub(/\|/,"")+1}' <(echo "apache2|ntpd|authd|freeradius")
4
Maybe this will help you.
IN="apache2|ntpd"
mails=$(echo $IN | tr "|" "\n")
for addr in $mails
do
echo "> [$addr]"
done

printing "grep -o" output in single line

How can I print the output of grep -o on a single line? I am trying to print:
$ echo "Hello Guys!" |grep -E '[A-Z]'
Hello Guys!
$ echo "Hello Guys!" |grep -Eo '[A-Z]' <----Multiple lines
H
G
$ echo "Hello Guys!" |grep -Eo '[A-Z]'
Desired output:
HG
I am able to cheaply achieve it using the following command, but the issue is that the number of letters (3 in this case) could be dynamic, so this approach cannot be used.
echo "HEllo Guys!" |grep -oE '[A-Z]' |xargs -L3 |sed 's/ //g'
HEG
You could do it all with this sed instruction
echo "Hello Guys!" |sed 's/[^A-Z]//g'
UPDATE
Breakdown of sed command:
The s/// is sed's substitute command. It replaces whatever matches the first regex (the one between the first and the second slash) with the expression between the second and third slash. The trailing g stands for global, i.e., do this for every match of the regex in the current line. Without the g, sed would stop after the first match. The regex itself matches any character that is not a capital letter, and each match is replaced with nothing, i.e., effectively deleted.
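A quick way to see the effect of the g flag is to run the substitution with and without it. LC_ALL=C is added here as an assumption, so that [A-Z] means exactly the ASCII capitals regardless of locale.

```shell
# Without g: only the first non-capital character ("e") is removed.
echo "Hello Guys!" | LC_ALL=C sed 's/[^A-Z]//'
# prints: Hllo Guys!

# With g: every non-capital character on the line is removed.
echo "Hello Guys!" | LC_ALL=C sed 's/[^A-Z]//g'
# prints: HG
```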
You can use awk:
echo "Hello Guys!" | awk '{ gsub(/[^A-Z]/,"", $0); print;}'
HG
Also with tr:
echo "Hello Guys!" | tr -cd '[:upper:]'
HG
Also with sed :
echo "Hello Guys!" | sed 's/[^[:upper:]]//g'
HG
You just need to remove the newline characters. You can use tr for that:
echo "HEllo Guys!" |grep -Eo '[A-Z]' |tr -d '\n'
HEG
Though, it cuts the last newline too.
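If the final newline matters, one option is to let command substitution strip the newlines and have printf add exactly one back. A sketch, with capitals as a hypothetical function name:

```shell
# Join the grep -o matches on one line, ending with a single newline.
capitals() {
    # tr removes the newline after each match; printf '%s\n' appends
    # exactly one newline to the joined result.
    printf '%s\n' "$(grep -o '[A-Z]' | tr -d '\n')"
}

echo "HEllo Guys!" | capitals
# prints: HEG
```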
You can use perl instead of grep
echo 'HEllo Guys!' | perl -lne 'print /([A-Z])/g'
HEG

Select first part of string

How do I pull a substring from a string? For example, from the string:
'/home/auto/gift/surprise'
take only:
'/home/auto/'
Note that '/home/auto/gift/surprise' may vary, i.e., instead of having 4 directory levels, it may go to 6 or 8, yet I'm only interested in the first 2 folders.
Here's what I've tried so far, without success:
$ pwd
'/home/auto/gift/surprise'
$ pwd | sed 's,^\(.*/\)\?\([^/]*\),\1,'
'/home/auto/gift/'
I think it is better to use cut for this:
$ echo "/home/auto/gift/surpris" | cut -d/ -f1-3
/home/auto
$ echo "/home/auto/gift/surpris/bla/bla" | cut -d/ -f1-3
/home/auto
Note that cut -d/ -f1-3 means: split the string on the delimiter /, then print the 1st through 3rd fields. The 1st field is empty because the path starts with /, so the output is /home/auto.
Or also awk:
$ echo "/home/auto/gift/surpris" | awk -F/ -v OFS=/ '{print $1,$2,$3}'
/home/auto
$ echo "/home/auto/gift/surpris/bla/bla" | awk -F/ -v OFS=/ '{print $1,$2,$3}'
/home/auto
You may use parameter expansion, which is defined by POSIX:
$ s="/home/auto/gift/surprise"
$ echo "${s%/*/*}"
/home/auto
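Note that ${s%/*/*} removes the last two components, so it yields /home/auto only when the path has exactly four levels. For paths of varying depth, a sketch that keeps the first two components instead could look like this (first_two_dirs is a hypothetical name, and an absolute path with at least two components is assumed):

```shell
# Print the first two directory levels of an absolute path.
# The body runs in a subshell, so IFS and set -f stay local.
first_two_dirs() (
    IFS=/
    set -f                  # no filename expansion while splitting
    set -- $1               # "/home/auto/gift" -> "", "home", "auto", "gift"
    printf '/%s/%s\n' "$2" "$3"
)

first_two_dirs /home/auto/gift/surprise
# prints: /home/auto
first_two_dirs /home/auto/gift/surprise/bla/bla
# prints: /home/auto
```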

Resources