Halt sed after first replacement - linux

I'm writing a script and I need to search for months in a line, eg 01, 02..., 12 and replace it with its abbreviation. However, though I need to search for all months, I only need to replace the first instance of a month number that I find. For instance, if we have a line that looks like this:
05 06 07
I need sed to perform the following:
May 06 07
The current command I'm using produces:
May Jun Jul
Which is not desirable. Here's what I'm using:
date=$(echo $line |cut -d , -f 1 | sed 's/-/ /g;s/:00//;s/:/ /g;s/01/Jan/;s/02/Feb/;s/03/Mar/;s/04/April/;s/05/May/;s/06/Jun/;s/07/Jul/;s/08/Aug/;s/09/Sep/;s/10/Oct/;s/11/Nov/;s/12/Dec/')
Thanks for the help in advance.

With GNU date:
echo "05 06 07" | while read -r mon rest
do
mon=$(date -d "2014-$mon-01" +%b)
echo $mon $rest
done

Try something like
> echo $line
foo foo05 06 07 foo 02
> [[ $line =~ [0-9]{2}' '[0-9]{2}' '[0-9]{2} ]] && date -d $(echo "${BASH_REMATCH[0]}" | tr ' ' '/') '+%b %d %y'
May 06 07

This is awkward, but it seems to work: the t command (with no label) branches to the end of the script as soon as the preceding s/// has made a substitution, so only the first month number found is replaced:
... sed 's/-/ /g;s/:00//;s/:/ /g
s/01/Jan/
t
s/02/Feb/
t
s/03/Mar/
t
s/04/April/
t
s/05/May/
t
s/06/Jun/
t
s/07/Jul/
t
s/08/Aug/
t
s/09/Sep/
t
s/10/Oct/
t
s/11/Nov/
t
s/12/Dec/'
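As a quick check on the sample line (same t-after-each-substitution idea, shortened to just the relevant months; plain POSIX sed is enough for this):
echo "05 06 07" | sed 's/05/May/
t
s/06/Jun/
t
s/07/Jul/'
May 06 07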

Related

Remove first n "words" from string variable in Bash

I want to remove the first 4 words from my string variable "DATES".
Does someone have a simple solution for this?
Here my example:
DATES="31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01"
WC=$(echo $DATES | wc -w)
DATE_COUNT=$(( $WC / 4 - 1 ))
for i in {0..$DATE_COUNT}
do
YEAR=$(echo $DATES | awk '{print $3}')
MONTH=$(echo $DATES | awk '{print $2}')
MONTH=$( date --date="$(printf "01 %s" $MONTH)" +"%m")
DAY=$(echo $DATES | awk '{print $1}')
TIME=$(echo $DATES | awk '{print $4}' | sed 's/://g')
DATE_ARRAY[$i]="$YEAR$MONTH$DAY$TIME"
#Remove first 4 words from string
done
Use cut.
DATES="31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01"
echo $DATES | cut -d' ' -f 5-
Output:
30 May 2021 10:23:01 29 May 2021 10:24:01
You can even use it for a cleaner solution than awk, like this:
YEAR=$(echo $DATES | cut -d' ' -f 3)
General version to remove n first words
remove_n_first_words(){
echo $2 | cut -d' ' -f $(($1+1))-
}
remove_n_first_words 4 "$DATES"
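With the sample DATES above, that call should print everything from the fifth word on:
$ remove_n_first_words 4 "$DATES"
30 May 2021 10:23:01 29 May 2021 10:24:01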
Using bash regex operator =~:
$ [[ $DATES =~ ^(([^ ]+ +){4})(.*) ]] && echo ${BASH_REMATCH[3]}
30 May 2021 10:23:01 29 May 2021 10:24:01
Maybe use read ?
DATES="31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01"
read -ra dates <<< "$DATES"; echo "${dates[@]:4}"
Or just store the data in an array directly.
DATES=(31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01)
echo "${DATES[#]:4}"
To get the total number of words/elements, like with wc -w:
echo "${#DATES[*]}"

how can i cut off the strings from an output in Bash shell?

The command I run is as follows:
rpm -qi setup | grep Install
The output of the command:
Install Date: Do 30 Jul 2020 15:55:28 CEST
I would like to edit this output further in order to be left with just:
30 Jul 2020
And the rest of the output not to be displayed.
What is the simplest way to get this end result in bash?
Use grep -Po like so (-P = use Perl regex engine, and -o = print just the match, not the entire line):
echo 'Install Date: Do 30 Jul 2020 15:55:28 CEST' | grep -Po '\d{1,2}\s+\w{3}\s+\d{4}'
You can also use cut like so (-d' ' = split on blanks, -f4-6 =
print fields 4 through 6):
echo 'Install Date: Do 30 Jul 2020 15:55:28 CEST' | cut -d' ' -f4-6
Output:
30 Jul 2020
You can do it using just rpm's --queryformat and bash's printf:
$ printf '%(%d %b %Y)T\n' $(rpm -q --queryformat '%{INSTALLTIME}\n' setup)
29 Apr 2020
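If the bash printf '%(...)T' time format is not available (it needs bash 4.2 or later), GNU date can do the same conversion from the epoch value, a sketch:
date -d "@$(rpm -q --queryformat '%{INSTALLTIME}' setup)" '+%d %b %Y'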

How do I use a list of Dates from .txt as input in shell script?

I'm trying to extract dates from a .nc file and I wanted to write a script to automate the process by using a .txt file, the data looks like this:
1995 04 05
1995 06 12
1995 06 30
1995 07 16
1995 07 19
1995 07 20
1995 07 28
1996 03 09
1996 04 25
1996 08 13
I want to assign separate variables for the year, month and day, so that the script takes the date from each line and uses it in a command like this:
cdo seltimestep,$DD "mon_$MM.nc" "/Desktop/2020/output/$YYYY-$MM-$DD.nc"
I previously made a similar script, but I had to input each date manually.
You can read the input file line by line and then use a bash array to split the dates.
#!/bin/bash
while read -r line; do
dateArray=( $line )
echo "YYYY: ${dateArray[0]}, MM: ${dateArray[1]}, DD: ${dateArray[2]}"
done < input.file
Using awk:
awk '{printf("YYYY: %04d, MM: %02d, DD: %02d\n", $1, $2, $3)}' input.txt
The actual awk program is very straightforward, printing formatted fields of each record line:
{
printf("YYYY: %04d, MM: %02d, DD: %02d\n", $1, $2, $3)
}
read can create all the needed variables from each line at once:
while read -r year month day; do
echo "Year: $year Month: $month Day: $day"
done < file
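Putting it together with the cdo command from the question, a sketch (dates.txt is a placeholder for your list file; the file naming comes straight from the question):
#!/bin/bash
# for each "YYYY MM DD" line, run cdo on the matching monthly file
while read -r YYYY MM DD; do
cdo seltimestep,"$DD" "mon_$MM.nc" "/Desktop/2020/output/$YYYY-$MM-$DD.nc"
done < dates.txt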

AWK adding if statement to add zero to number range 0 to 9 ( NEED TO USE AWK)

Hi, I need to format the date command output using awk and add a zero before days 1 to 9.
today=`date | awk {'print $1 " " $2 " " $3'}`
So in the above the output is
Wed Mar 2
I need to add a 0 to the 2 so that days of the month 1 through 9 are two digits:
Wed Mar 02
How can I do this within the awk command?
for i in 0{1..9}; do echo $i; done
So I need to add a 0/zero to $3 when it's between 1 and 9.
I tried doing it this way, but something is not working and I get an error:
a3=`date|awk '{
if ($3 <=9)
print $1" "$2" " "0"$3;
else
print $1" "$2" " $3;
}'`
echo $a3
Can you please assist?
Regards
If I were you I'd just specify a format directly:
$ date '+%a %b %d'
Wed Mar 02
date takes a format string preceded by a + as its final argument.
If you must do it in awk, you can use printf for formatted printing:
$ echo 1 2 10 20 | awk -v RS=" " '{printf "%s\t-> %02d\n",$1,$1}'
1 -> 01
2 -> 02
10 -> 10
20 -> 20
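Applied to the original date output, the same %02d idea could look like this (a sketch; it assumes the default date output where the day of the month is the third field):
today=$(date | awk '{printf "%s %s %02d\n", $1, $2, $3}')
echo "$today"      # e.g. Wed Mar 02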

Replace strings with evaluated string based on matched group (elegant way, not using for .. in)

I'm looking for a way to replace strings of a file, matched by a regular expression, with another string that will be generated/evaluated out of the matched string.
For example, I want to replace the timestamps (timestamp + duration) in this file
1357222500 3600 ...
Maybe intermediate strings...
1357226100 3600 ...
Maybe intermediate strings...
...
by human-readable date representations (date ranges).
Until now, I have always used shell scripts like Bash to iterate over each line, match the line, extract the matched group strings, and print the line after processing, for example this way (from memory):
IFS="
"
for L in `cat file.txt`; do
if [[ "${L}" =~ ^([0-9]{1,10})\ ([0-9]{1,4})\ .*$ ]]; then
# Written as three lines for better readability/recognition
echo -n "`date --date=#${BASH_REMATCH[1]}` - "
echo -n "`date --date=#$(( ${BASH_REMATCH[1]} + ${BASH_REMATCH[2]} ))`"
echo ""
else
echo "$L"
fi
done
I wonder if there's something like this with a fictional(?) "sed-2.0":
cat file.txt | sed-2.0 's+/^\([0-9]\{1,10\}\) \([0-9]\{1,4\}\) .*$+`date --date="#\1"` - `date --date="#$(( \1 + \2 ))`'
Whereas the backticks in the sed-2.0 replacement will be evaluated as shell command passing the matched groups \1 and \2.
I know that this does not work as expected, but I'd like to write something like this.
Edit 1
Edit of question above: added missing echo "" in if of Bash script example.
This should be the expected output:
Do 3. Jan 15:15:00 CET 2013 - Do 3. Jan 16:15:00 CET 2013
Maybe intermediate strings...
Do 3. Jan 16:15:00 CET 2013 - Do 3. Jan 17:15:00 CET 2013
Maybe intermediate strings...
...
Note, that the timestamp depends on the timezone.
Edit 2
Edit of question above: fixed syntax error of Bash script example, added comment.
Edit 3
Edit of question above: fixed syntax error of Bash script example. Changed the phrase "old-school example" to "Bash script example".
Summary of Kent's and glenn jackman's answers
There's a huge difference in both approaches: the execution time. I've compared all four methods, here are the results:
gawk using strftime()
/usr/bin/time gawk '/^[0-9]+ [0-9]+ / {t1=$1; $1=strftime("%c -",t1); $2=strftime("%c",t1+$2)} 1' /tmp/test
...
0.06user 0.12system 0:00.30elapsed 60%CPU (0avgtext+0avgdata 1148maxresident)k
0inputs+0outputs (0major+327minor)pagefaults 0swaps
gawk using execution through getline (Gnu AWK Manual)
/usr/bin/time gawk '/^[0-9]{1,10} [0-9]{1,4}/{l=$1+$2; "date --date=#"$1|getline d1; "date --date=#"l|getline d2;print d1" - "d2;next;}1' /tmp/test
...
1.89user 7.59system 0:10.34elapsed 91%CPU (0avgtext+0avgdata 5376maxresident)k
0inputs+0outputs (0major+557419minor)pagefaults 0swaps
Custom Bash script
./sed-2.0.sh /tmp/test
...
3.98user 10.33system 0:15.41elapsed 92%CPU (0avgtext+0avgdata 1536maxresident)k
0inputs+0outputs (0major+759829minor)pagefaults 0swaps
sed using e option
/usr/bin/time sed -r 's#^([0-9]{1,10}) ([0-9]{1,4})(.*$)#echo $(date --date=#\1 )" - "$(date --date=#$((\1+\2)))#ge' /tmp/test
...
3.88user 16.76system 0:21.89elapsed 94%CPU (0avgtext+0avgdata 1272maxresident)k
0inputs+0outputs (0major+1253409minor)pagefaults 0swaps
Input data
for N in `seq 1 1000`; do echo -e "$(( 1357226100 + ( $N * 3600 ) )) 3600 ...\nSomething else ..." >> /tmp/test ; done
We can see that awk using the strftime() method is by far the fastest, since it does all the formatting in-process instead of spawning a date command for every line. But even the Bash script is faster than sed with shell execution.
Kent showed us a more generic, universal way to accomplish what I've asked for. My question actually was not only limited to my timestamp example. In this case I had to do exactly this (replacing timestamp + duration by human readable date representation), but I had situations where I had to execute other code.
glenn jackman showed us a specific solution which is suitable for situations where you can do the string operations and calculations directly in AWK.
So, it depends on the time you have (or time your script may run), the amount of the data and use case which method should be preferred.
based on your sample input:
gawk '/^[0-9]+ [0-9]+ / {t1=$1; $1=strftime("%c -",t1); $2=strftime("%c",t1+$2)} 1'
outputs
Thu 03 Jan 2013 09:15:00 AM EST - Thu 03 Jan 2013 10:15:00 AM EST ...
Maybe intermediate strings...
Thu 03 Jan 2013 10:15:00 AM EST - Thu 03 Jan 2013 11:15:00 AM EST ...
Maybe intermediate strings...
...
awk one-liner (the datetime format could differ from your output):
awk '/^[0-9]{1,10} [0-9]{1,4}/{l=$1+$2; "date --date=#"$1|getline d1; "date --date=#"l|getline d2;print d1" - "d2;next;}1' file
test:
kent$ echo "1357222500 3600 ...
Maybe intermediate strings...
1357226100 3600 ...
Maybe intermediate strings...
..."|awk '/^[0-9]{1,10} [0-9]{1,4}/{l=$1+$2; "date --date=#"$1|getline d1; "date --date=#"l|getline d2;print d1" - "d2;next;}1'
Thu Jan 3 15:15:00 CET 2013 - Thu Jan 3 16:15:00 CET 2013
Maybe intermediate strings...
Thu Jan 3 16:15:00 CET 2013 - Thu Jan 3 17:15:00 CET 2013
Maybe intermediate strings...
...
Gnu sed
If you have GNU sed, the idea from your "not working" sed line could work in the real world by applying GNU sed's s/foo/shell cmds/ge; see below:
sed -r 's#^([0-9]{1,10}) ([0-9]{1,4})(.*$)#echo $(date --date=#\1 )" - "$(date --date=#$((\1+\2)))#ge' file
test
kent$ echo "1357222500 3600 ...
Maybe intermediate strings...
1357226100 3600 ...
Maybe intermediate strings...
..."|sed -r 's#^([0-9]{1,10}) ([0-9]{1,4})(.*$)#echo $(date --date=#\1 )" - "$(date --date=#$((\1+\2)))#ge'
Thu Jan 3 15:15:00 CET 2013 - Thu Jan 3 16:15:00 CET 2013
Maybe intermediate strings...
Thu Jan 3 16:15:00 CET 2013 - Thu Jan 3 17:15:00 CET 2013
Maybe intermediate strings...
...
If I were to work on this, personally I would go with awk, because it is straightforward and easy to write.
At the end I paste my sed/awk version info:
kent$ sed --version|head -1
sed (GNU sed) 4.2.2
kent$ awk -V|head -1
GNU Awk 4.0.1
