Capture a set of numbers in sed


I have the following string
Text1 Text2 v2010.0_1.3 Tue Jun 6 14:38:31 PDT 2017
I am trying to capture only v2010.0_1.3 using
echo "Text1 Text2 v2010.0_1.3 Tue Jun 6 14:38:31 PDT 2017" |
sed -nE 's/.*(v.*\s).*/\1/p'
and I get the following result: v2010.0_1.3 Tue Jun 6 14:38:31 PDT. It looks like sed is not stopping at the first occurrence of the space, but at the last one. How can I capture only up to the first occurrence?

Using sed
sed's regular expressions are "greedy" (more precisely, they are leftmost-longest matches). You need to work around that. For example:
$ s="Text1 Text2 v2010.0_1.3 Tue Jun 6 14:38:31 PDT 2017"
$ echo "$s" | sed -nE 's/.*(v[^[:blank:]]*).*/\1/p'
v2010.0_1.3
Notes:
The expression (v[^[:blank:]]*) will capture as a group any string of non-blanks that begins with v.
\s is non-portable (GNU only). [[:blank:]] will work reliably to match spaces and tabs in a unicode-safe way.
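To see the greedy match and the bounded match side by side, here is a minimal sketch in portable sed syntax (basic regular expressions, with a literal space instead of \s). The function names greedy_capture and bounded_capture are made up for this illustration.

```shell
s="Text1 Text2 v2010.0_1.3 Tue Jun 6 14:38:31 PDT 2017"

# Greedy: "v.* " is allowed to cross spaces, so it runs up to the LAST space.
greedy_capture() {
  printf '%s\n' "$1" | sed -n 's/.*\(v.* \).*/\1/p'
}

# Bounded: "[^[:blank:]]*" cannot cross a space or tab, so it stops at the
# end of the word that starts with "v".
bounded_capture() {
  printf '%s\n' "$1" | sed -n 's/.*\(v[^[:blank:]]*\).*/\1/p'
}

greedy_capture "$s"    # everything from v2010.0_1.3 through "PDT "
bounded_capture "$s"   # v2010.0_1.3
```

Note that the leading .* is greedy as well: if the line held several words starting with v, both variants would start at the last one.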
Using awk
$ echo "$s" | awk '/^v/' RS=' '
v2010.0_1.3
RS=' ' tells awk to treat a space as a record separator. /^v/ will print any record that begins with v.

Related

Get a string if two words not from the string match

On a Linux system I have some output like this:
Subject = CN=User_A,OU=users
Status = Valid Kind = IKE Serial = 98505 DP = 9
Not_Before: Wed Jun 15 13:53:55 2022 Not_After: Sun Jun 25 08:25:20 2023
Subject = CN=User_B,OU=users
Status = Valid Kind = IKE Serial = 98934 DP = 8
Not_Before: Sun Apr 18 18:24:16 2021 Not_After: Fri Apr 21 18:24:16 2023
I can use | grep 2022 | grep Jun to find certain data, but how can I get the Subject line in the output? I need to get the username whose certificate is about to expire. Something like "Show me the Subject if "grep 2022 | grep Jun"".
Thank you in advance!

What about this:
grep -B 2 "2022" test.txt | grep -o "CN=[A-Za-z0-9_]*" | cut -c 4-
grep -B 2 // show the matching line and the two lines before it.
grep -o "CN=[...]" // show only the matched part: "CN=" followed by ...
[A-Za-z0-9_]* // any run of letters, digits and underscores
cut -c 4- // instead of "CN=...", only show "..." (starting at the 4th character)
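If both words must appear on the same line, one option is to let awk remember the most recent Subject line and print it when a later line matches. This is only a sketch: subject_for is a hypothetical helper name, and it assumes that each Subject line comes before the date line of the same certificate, as in the sample output.

```shell
# Usage: subject_for WORD1 WORD2 < input
subject_for() {
  awk -v a="$1" -v b="$2" '
    /^Subject/ { subject = $0; next }                # remember the latest Subject line
    index($0, a) && index($0, b) { print subject }   # both words found on this line
  '
}

subject_for Jun 2022 <<'EOF'
Subject = CN=User_A,OU=users
Status = Valid Kind = IKE Serial = 98505 DP = 9
Not_Before: Wed Jun 15 13:53:55 2022 Not_After: Sun Jun 25 08:25:20 2023
Subject = CN=User_B,OU=users
Status = Valid Kind = IKE Serial = 98934 DP = 8
Not_Before: Sun Apr 18 18:24:16 2021 Not_After: Fri Apr 21 18:24:16 2023
EOF
```

With the sample data this prints only the Subject line of User_A. index() does plain substring matching, so the words are not treated as regular expressions.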

Using grep
$ grep -m2 -e '^Subject' -e 'Jun' -e '2022' input_file
Subject = CN=User_A,OU=users
Not_Before: Wed Jun 15 13:53:55 2022 Not_After: Sun Jun 25 08:25:20 2023
Using sed
$ sed -n '/^Subject/{p;:a;n;/Jun\|2022/p;ba}' input_file
Subject = CN=User_A,OU=users
Not_Before: Wed Jun 15 13:53:55 2022 Not_After: Sun Jun 25 08:25:20 2023

Remove first n "words" from string variable in Bash

I want to remove the first 4 words from my string variable "DATES".
Does someone have a simple solution for this?
Here my example:
DATES="31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01"
WC=$(echo $DATES | wc -w)
DATE_COUNT=$(( $WC / 4 - 1 ))
for i in {0..$DATE_COUNT}
do
YEAR=$(echo $DATES | awk '{print $3}')
MONTH=$(echo $DATES | awk '{print $2}')
MONTH=$( date --date="$(printf "01 %s" $MONTH)" +"%m")
DAY=$(echo $DATES | awk '{print $1}')
TIME=$(echo $DATES | awk '{print $4}' | sed 's/://g')
DATE_ARRAY[$i]="$YEAR$MONTH$DAY$TIME"
#Remove first 4 words from string
done

Use cut.
DATES="31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01"
echo $DATES | cut -d' ' -f 5-
Output:
30 May 2021 10:23:01 29 May 2021 10:24:01
You can even use it for a cleaner solution than awk, like this:
YEAR=$(echo $DATES | cut -d' ' -f 3)
General version to remove n first words
remove_n_first_words(){
echo $2 | cut -d' ' -f $(($1+1))-
}
remove_n_first_words 4 "$DATES"

Using bash regex operator =~:
$ [[ $DATES =~ ^(([^ ]+ +){4})(.*) ]] && echo ${BASH_REMATCH[3]}
30 May 2021 10:23:01 29 May 2021 10:24:01

Maybe use read ?
DATES="31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01"
read -ra dates <<< "$DATES"; echo "${dates[@]:4}"
Or just store the data in an array directly.
DATES=(31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01)
echo "${DATES[@]:4}"
To get the total number of words/elements, like with wc -w
echo "${#DATES[*]}"
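For shells without arrays, the positional parameters can do the same job. The following is a sketch under the assumption that the words are separated by default IFS whitespace; drop_words is a hypothetical name, not something from the answers above.

```shell
# Usage: drop_words N "string"
drop_words() {
  n=$1
  set -f        # disable globbing while the string is split
  set -- $2     # unquoted on purpose: split the string into $1, $2, ...
  set +f
  while [ "$n" -gt 0 ] && [ "$#" -gt 0 ]; do
    shift
    n=$((n - 1))
  done
  printf '%s\n' "$*"   # remaining words, joined by single spaces
}

DATES="31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01"
drop_words 4 "$DATES"   # prints: 30 May 2021 10:23:01 29 May 2021 10:24:01
```

Dropping more words than the string contains simply prints an empty line.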

how can i cut off the strings from an output in Bash shell?

The command i run is as follows:
rpm -qi setup | grep Install
The output of the command:
Install Date: Do 30 Jul 2020 15:55:28 CEST
I would like to trim this output further so that only the following remains:
30 Jul 2020
And the rest of the output not to be displayed.
What is the simplest way in bash to get this result?

Use grep -Po like so (-P = use Perl regex engine, and -o = print just the match, not the entire line):
echo 'Install Date: Do 30 Jul 2020 15:55:28 CEST' | grep -Po '\d{1,2}\s+\w{3}\s+\d{4}'
You can also use cut like so (-d' ' = split on spaces, -f4-6 = print fields 4 through 6):
echo 'Install Date: Do 30 Jul 2020 15:55:28 CEST' | cut -d' ' -f4-6
Output:
30 Jul 2020
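The grep and cut steps can also be folded into a single awk call. This is a sketch that assumes the line has the layout shown in the question (weekday in field 3, then day, month and year); install_date is a made-up helper name.

```shell
# Reads rpm -qi style output on stdin and prints day, month and year
# from the "Install Date:" line.
install_date() {
  awk '$1 == "Install" && $2 == "Date:" { print $4, $5, $6 }'
}

printf '%s\n' 'Install Date: Do 30 Jul 2020 15:55:28 CEST' | install_date
# prints: 30 Jul 2020
```

Unlike cut -d' ', awk treats a run of spaces as one separator, so extra padding between fields does not shift the field numbers.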

You can do it using just rpm --queryformat and the bash printf builtin:
$ printf '%(%d %b %Y)T\n' $(rpm -q --queryformat '%{INSTALLTIME}\n' setup)
29 Apr 2020

Filter between version names and version numbers

When I run the script kit_version.bash I get the following output
# ./kit_version.bash
--- USAW Kits ---
RPM Kits Installed Time
------------------------------------ ---------------------------------
APP-IR-LRPS-1.1.0.0-01 Thu 15 Nov 2012 11:10:20 AM IST
APP-V-LRPS-4.3.7.0-01 Mon 15 Oct 2012 04:27:54 PM IST
batter-ic-4.3.0.0-04 Mon 24 Feb 2014 02:10:21 PM IST
CSHRS-Monitoring-5.0.0.0-03 Mon 24 Feb 2014 03:32:43 PM IST
CS-RH-watchdog-conf-5.0.0.0-03 Mon 24 Feb 2014 03:32:42 PM IST
CSe-OSP-Bin-5.0.0.0-01 Mon 24 Feb 2014 03:28:00 PM IST
sca_core_2.5.7.0-7 Sun 29 Mar 2015 02:36:46 PM IDT
sca_data:80.7.0-7 Sun 29 Mar 2015 02:37:04 PM IDT
.
.
.
How can I filter the output so that the first field contains only the package name and the second field only the version number, as follows:
./kit_version.bash | ......
APP-IR-LRPS 1.1.0.0-01
APP-V-LRPS 4.3.7.0-01
batter-ic 4.3.0.0-04
CSHRS-Monitoring 5.0.0.0-03
CS-RH-watchdog-conf 5.0.0.0-03
CSe-OSP-Bin 5.0.0.0-01
sca_core 2.5.7.0-7
sca_data 80.7.0-7
Remark – the separator between the package name and the version number could be a different character

With GNU awk, I can imagine
./kit_version.bash | gawk '{ print gensub(/.([0-9.]+-[0-9.]+)$/, "\t\\1", 1, $1) }'
This will replace the character before a string matching a version number at the end of the first field with a tab and print the result of that substitution. To cut off the first three lines, use
gawk 'NR > 3 { print gensub(/.([0-9.]+-[0-9.]+)$/, "\t\\1", 1, $1) }'
that is, add the NR > 3 condition.
Alternatively with sed:
./kit_version.bash | sed '1d;2d;3d;s/[[:space:]].*//;s/.\([0-9.]\+-[0-9.]\+\)$/\t\1/'
That is:
1d # first three lines: delete
2d
3d
s/[[:space:]].*// # remove everything after the first space,
# i.e., everything except the first field
s/.\([0-9.]\+-[0-9.]\+\)$/\t\1/ # then substitute as before.
This depends on no packages ending with a number while also being delimited from the version number by a period. That is to say,
# vvvvvvvv-- if this is supposed to be the version
somepackage2.3.4.5-10
will not work properly (it will give somepackag 2.3.4.5-10). It seems unlikely that this format is allowed, though.
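A variant that avoids gensub, and therefore does not need GNU awk, can use match() and substr(). This is a sketch under stated assumptions: the separator is one of - _ : or ., the version starts with a digit and ends in -digits, and split_name_version is a hypothetical name.

```shell
# Reads the kit listing on stdin; prints "name version" for every line whose
# first field ends in something that looks like a version number.
split_name_version() {
  awk 'match($1, /[-_:.][0-9][0-9.]*-[0-9]+$/) {
    # RSTART points at the separator character
    print substr($1, 1, RSTART - 1), substr($1, RSTART + 1)
  }'
}

printf '%s\n' \
  '--- USAW Kits ---' \
  'APP-IR-LRPS-1.1.0.0-01 Thu 15 Nov 2012 11:10:20 AM IST' \
  'sca_core_2.5.7.0-7 Sun 29 Mar 2015 02:36:46 PM IDT' \
  'sca_data:80.7.0-7 Sun 29 Mar 2015 02:37:04 PM IDT' \
  | split_name_version
```

Header lines fall out on their own because their first field does not match the version pattern, so no line-number condition is needed. Because the separator must be one of the listed characters, a name ending in a letter directly followed by the version is skipped rather than truncated.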

./kit_version.bash \
| sed 's/^[[:space:]]*\([^[:space:]]*\).*/\1/;T clean;s/[-._]\([0-9][0-9._-]*\)$/\t\1/;t;:clean;s/.*//'
Reformat the line (remove leading whitespace and trailing info).
If nothing was substituted, jump to cleaning the line.
Otherwise, separate the version from the name.
This works only with GNU sed because of the T command (on a POSIX sed, replace it with t jump;b clean^J:jump^J, where ^J is a real newline).

Replace strings with evaluated string based on matched group (elegant way, not using for .. in)

I'm looking for a way to replace strings of a file, matched by a regular expression, with another string that will be generated/evaluated out of the matched string.
For example, I want to replace the timestamps (timestamp + duration) in this file
1357222500 3600 ...
Maybe intermediate strings...
1357226100 3600 ...
Maybe intermediate strings...
...
By human readable date representations (date range).
Until now, I always used shell scripts like Bash to iterate over each line, matching for the line X, getting the matched group string and printing the line after processing, for example this way (from memory):
IFS="
"
for L in `cat file.txt`; do
if [[ "${L}" =~ ^([0-9]{1,10})\ ([0-9]{1,4})\ .*$ ]]; then
# Written as three lines for better readability/recognition
echo -n "`date --date=@${BASH_REMATCH[1]}` - "
echo -n "`date --date=@$(( ${BASH_REMATCH[1]} + ${BASH_REMATCH[2]} ))`"
echo ""
else
echo "$L"
fi
done
I wonder if there's something like this with a fictional(?) "sed-2.0":
cat file.txt | sed-2.0 's+/^\([0-9]\{1,10\}\) \([0-9]\{1,4\}\) .*$+`date --date="@\1"` - `date --date="@$(( \1 + \2 ))`'
Whereas the backticks in the sed-2.0 replacement will be evaluated as shell command passing the matched groups \1 and \2.
I know that this does not work as expected, but I'd like to write something like this.
Edit 1
Edit of question above: added missing echo "" in if of Bash script example.
This should be the expected output:
Do 3. Jan 15:15:00 CET 2013 - Do 3. Jan 16:15:00 CET 2013
Maybe intermediate strings...
Do 3. Jan 16:15:00 CET 2013 - Do 3. Jan 17:15:00 CET 2013
Maybe intermediate strings...
...
Note, that the timestamp depends on the timezone.
Edit 2
Edit of question above: fixed syntax error of Bash script example, added comment.
Edit 3
Edit of question above: fixed syntax error of Bash script example. Changed the phrase "old-school example" to "Bash script example".
Summary of Kent's and glenn jackman's answer
There's a huge difference in both approaches: the execution time. I've compared all four methods, here are the results:
gawk using strftime()
/usr/bin/time gawk '/^[0-9]+ [0-9]+ / {t1=$1; $1=strftime("%c -",t1); $2=strftime("%c",t1+$2)} 1' /tmp/test
...
0.06user 0.12system 0:00.30elapsed 60%CPU (0avgtext+0avgdata 1148maxresident)k
0inputs+0outputs (0major+327minor)pagefaults 0swaps
gawk using execution through getline (GNU AWK Manual)
/usr/bin/time gawk '/^[0-9]{1,10} [0-9]{1,4}/{l=$1+$2; "date --date=@"$1|getline d1; "date --date=@"l|getline d2;print d1" - "d2;next;}1' /tmp/test
...
1.89user 7.59system 0:10.34elapsed 91%CPU (0avgtext+0avgdata 5376maxresident)k
0inputs+0outputs (0major+557419minor)pagefaults 0swaps
Custom Bash script
./sed-2.0.sh /tmp/test
...
3.98user 10.33system 0:15.41elapsed 92%CPU (0avgtext+0avgdata 1536maxresident)k
0inputs+0outputs (0major+759829minor)pagefaults 0swaps
sed using e option
/usr/bin/time sed -r 's#^([0-9]{1,10}) ([0-9]{1,4})(.*$)#echo $(date --date=@\1 )" - "$(date --date=@$((\1+\2)))#ge' /tmp/test
...
3.88user 16.76system 0:21.89elapsed 94%CPU (0avgtext+0avgdata 1272maxresident)k
0inputs+0outputs (0major+1253409minor)pagefaults 0swaps
Input data
for N in `seq 1 1000`; do echo -e "$(( 1357226100 + ( $N * 3600 ) )) 3600 ...\nSomething else ..." >> /tmp/test ; done
We can see that AWK using the strftime() method is the fastest. But even the Bash script is faster than sed with shell execution.
Kent showed us a more generic, universal way to accomplish what I've asked for. My question actually was not only limited to my timestamp example. In this case I had to do exactly this (replacing timestamp + duration by human readable date representation), but I had situations where I had to execute other code.
glenn jackman showed us a specific solution which is suitable for situations where you can do string operations and calculations directly in AWK.
So, it depends on the time you have (or time your script may run), the amount of the data and use case which method should be preferred.

based on your sample input:
gawk '/^[0-9]+ [0-9]+ / {t1=$1; $1=strftime("%c -",t1); $2=strftime("%c",t1+$2)} 1'
outputs
Thu 03 Jan 2013 09:15:00 AM EST - Thu 03 Jan 2013 10:15:00 AM EST ...
Maybe intermediate strings...
Thu 03 Jan 2013 10:15:00 AM EST - Thu 03 Jan 2013 11:15:00 AM EST ...
Maybe intermediate strings...
...

awk oneliner: (the datetime format could be different from your output)
awk '/^[0-9]{1,10} [0-9]{1,4}/{c1="date --date=@"$1; c2="date --date=@"($1+$2); c1|getline d1; close(c1); c2|getline d2; close(c2); print d1" - "d2; next}1' file
test:
kent$ echo "1357222500 3600 ...
Maybe intermediate strings...
1357226100 3600 ...
Maybe intermediate strings...
..."|awk '/^[0-9]{1,10} [0-9]{1,4}/{c1="date --date=@"$1; c2="date --date=@"($1+$2); c1|getline d1; close(c1); c2|getline d2; close(c2); print d1" - "d2; next}1'
Thu Jan 3 15:15:00 CET 2013 - Thu Jan 3 16:15:00 CET 2013
Maybe intermediate strings...
Thu Jan 3 16:15:00 CET 2013 - Thu Jan 3 17:15:00 CET 2013
Maybe intermediate strings...
...
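One detail matters when awk reads command output through getline: the pipe stays open under its exact command string until close() is called, so running the same command string again reads from the already exhausted pipe and returns nothing. In the timestamp example this happens whenever one line's end time equals the next line's start time. A small sketch, with echo standing in for date and two made-up function names:

```shell
# Without close(): the third record reuses the command string of the first,
# the pipe is already at end-of-file, and r keeps its default value.
without_close() {
  awk '{ cmd = "echo got-" $1; r = "(nothing)"; cmd | getline r; print r }'
}

# With close(): every record starts a fresh command.
with_close() {
  awk '{ cmd = "echo got-" $1; r = "(nothing)"; cmd | getline r; close(cmd); print r }'
}

printf '%s\n' a b a | without_close   # got-a, got-b, (nothing)
printf '%s\n' a b a | with_close      # got-a, got-b, got-a
```

Closing each command also keeps the number of simultaneously open pipes small on large inputs.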
Gnu sed
If you have GNU sed, the idea from your "not working" sed line can work in the real world by using GNU sed's s/foo/shell cmds/ge; see below:
sed -r 's#^([0-9]{1,10}) ([0-9]{1,4})(.*$)#echo $(date --date=@\1 )" - "$(date --date=@$((\1+\2)))#ge' file
test
kent$ echo "1357222500 3600 ...
Maybe intermediate strings...
1357226100 3600 ...
Maybe intermediate strings...
..."|sed -r 's#^([0-9]{1,10}) ([0-9]{1,4})(.*$)#echo $(date --date=@\1 )" - "$(date --date=@$((\1+\2)))#ge'
Thu Jan 3 15:15:00 CET 2013 - Thu Jan 3 16:15:00 CET 2013
Maybe intermediate strings...
Thu Jan 3 16:15:00 CET 2013 - Thu Jan 3 17:15:00 CET 2013
Maybe intermediate strings...
...
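To see the e flag in isolation, here is a small sketch that does not depend on date or the time zone. It requires GNU sed; the "sum" line format and the name eval_sums are invented for the illustration.

```shell
# GNU sed only: after a successful substitution, the e flag runs the pattern
# space as a shell command and replaces it with the command's output.
eval_sums() {
  sed -E 's/^sum ([0-9]+) ([0-9]+)$/echo $((\1 + \2))/e'
}

printf '%s\n' 'sum 3 4' 'keep me' | eval_sums
# prints:
# 7
# keep me
```

Because the matched text becomes part of a shell command, it is safer to capture only tightly restricted groups such as [0-9]+, as here, and never arbitrary input.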
If I were working on this, I would personally go with awk, because it is straightforward and easy to write.
Finally, here is my sed/awk version info:
kent$ sed --version|head -1
sed (GNU sed) 4.2.2
kent$ awk -V|head -1
GNU Awk 4.0.1
