I need your expertise again. I am trying to use a conditional in awk to choose which columns to print.
Field $5 sometimes holds a year and sometimes a time.
When it holds a year, I want to print it; when it holds a time such as 05:17:27, I need to print the last field instead. These are the values in $5:
2021
2021
05:17:27
20:33:17
05:17:20
2020
2020
2021
2020
2021
Below is my sample data (data_file):
yogutdb01 Mon 28 Jun 2021 11:19:56 PM MST
yogutdb02 Thu 30 Sep 2021 02:02:53 AM MST
yogutdb03 Thu Jul 13 05:17:27 2017
yogutdb04 Fri Jun 23 20:33:17 2017
yogutdb05 Thu Jul 13 05:17:20 2017
yogutdb06 Wed 24 Jun 2020 03:49:16 PM MST
yogutdb07 Wed 24 Jun 2020 04:05:10 PM MST
yogutdb08 Sat 22 May 2021 04:19:14 AM MST
yogutdb09 Thu 09 Apr 2020 12:16:32 PM CEST
yogutdb10 Tue 11 May 2021 03:03:02 PM MST
My attempt: I am using the command below, but I get a syntax error on the else branch.
$ awk '{ ($5=="[^0-9]+$")print $1,$2,$3,$4,$5; else print $1,$2,$3,$4,$NF}' my_data.text
The desired output should be:
yogutdb01 2021
yogutdb02 2021
yogutdb03 2017
yogutdb04 2017
yogutdb05 2017
yogutdb06 2020
yogutdb07 2020
yogutdb08 2021
yogutdb09 2020
yogutdb10 2021
OR
yogutdb01 Mon 28 Jun 2021
yogutdb02 Thu 30 Sep 2021
yogutdb03 Thu Jul 13 2017
yogutdb04 Fri Jun 23 2017
yogutdb05 Thu Jul 13 2017
yogutdb06 Wed 24 Jun 2020
yogutdb07 Wed 24 Jun 2020
yogutdb08 Sat 22 May 2021
yogutdb09 Thu 09 Apr 2020
yogutdb10 Tue 11 May 2021
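The syntax error comes from the condition missing the if keyword. Beyond that, you cannot use the == operator to test a regex match, because == compares the two strings literally. Use the match() function or the ~ operator instead.
Also, the ^ anchor belongs in front of [0-9], not inside the brackets: inside, [^0-9] means "any character except a digit".
Then please try: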
awk '{if (match($5,/^[0-9]+$/)) print $1, $2, $3, $4, $5; else print $1, $2, $3, $4, $NF}' my_data.text
Output:
yogutdb01 Mon 28 Jun 2021
yogutdb02 Thu 30 Sep 2021
yogutdb03 Thu Jul 13 2017
yogutdb04 Fri Jun 23 2017
yogutdb05 Thu Jul 13 2017
yogutdb06 Wed 24 Jun 2020
yogutdb07 Wed 24 Jun 2020
yogutdb08 Sat 22 May 2021
yogutdb09 Thu 09 Apr 2020
yogutdb10 Tue 11 May 2021
Here is an alternative using ~ operator:
awk '$5 ~ /^[0-9]+$/ {print $1, $2, $3, $4, $5; next} {print $1, $2, $3, $4, $NF}' my_data.text
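To see the difference between a literal comparison and a regex match, here is a small Python check. Python's re module treats these simple patterns (the ^ and $ anchors and a negated [^0-9] class) the same way awk does; the three function names are made up for illustration:

```python
import re

def literal_equal(field: str) -> bool:
    # Mirrors the asker's $5=="[^0-9]+$": a plain string comparison, no regex.
    return field == "[^0-9]+$"

def ends_in_non_digits(field: str) -> bool:
    # ^ inside the brackets negates the class: "one or more non-digits at the end".
    return re.search(r"[^0-9]+$", field) is not None

def digits_only(field: str) -> bool:
    # ^ outside the brackets anchors the start: "nothing but digits".
    return re.search(r"^[0-9]+$", field) is not None

for field in ["2021", "05:17:27"]:
    print(field, literal_equal(field), ends_in_non_digits(field), digits_only(field))
```

Only digits_only separates a year from a time, which is why the answers anchor the pattern as ^[0-9]+$.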
For your desired output, try the following. It uses the regex-match operator in its negated form, !~, to test whether $5 contains a colon (that is, holds a time):
$ awk '{ if ($5 !~ /:/) { print $1,$2,$3,$4,$5; next } { print $1,$2,$3,$4, $NF } }' exampl_data1
Result:
yogutdb01 Mon 28 Jun 2021
yogutdb02 Thu 30 Sep 2021
yogutdb03 Thu Jul 13 2017
yogutdb04 Fri Jun 23 2017
yogutdb05 Thu Jul 13 2017
yogutdb06 Wed 24 Jun 2020
yogutdb07 Wed 24 Jun 2020
yogutdb08 Sat 22 May 2021
yogutdb09 Thu 09 Apr 2020
yogutdb10 Tue 11 May 2021
As @tshiono also asked in the comments, if you want the day before the month in every row, swap $3 and $4 in the second branch:
$ awk '{ if ($5 !~ /:/) { print $1, $2, $3, $4, $5; next } { print $1, $2, $4, $3, $NF } }' exampl_data1
You could print the first 4 fields and check whether the 5th field consists only of digits. If it does not, print the last field instead.
awk '{print $1, $2, $3, $4, ($5 ~ /^[0-9]+$/ ? $5 : $NF)}' my_data.text
Output
yogutdb01 Mon 28 Jun 2021
yogutdb02 Thu 30 Sep 2021
yogutdb03 Thu Jul 13 2017
yogutdb04 Fri Jun 23 2017
yogutdb05 Thu Jul 13 2017
yogutdb06 Wed 24 Jun 2020
yogutdb07 Wed 24 Jun 2020
yogutdb08 Sat 22 May 2021
yogutdb09 Thu 09 Apr 2020
yogutdb10 Tue 11 May 2021
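If you want the check to be stricter than "only digits", you can require exactly four digits. Here is a small Python sketch of the same ternary logic; pick_columns and the three sample lines are just for illustration:

```python
import re

# Stricter than ^[0-9]+$: exactly four digits.
YEAR = re.compile(r"[0-9]{4}")

def pick_columns(line: str) -> str:
    """Hypothetical helper: first 4 fields, then $5 if it is a year, else $NF."""
    f = line.split()
    year = f[4] if YEAR.fullmatch(f[4]) else f[-1]
    return " ".join(f[:4] + [year])

sample = [
    "yogutdb01 Mon 28 Jun 2021 11:19:56 PM MST",
    "yogutdb03 Thu Jul 13 05:17:27 2017",
    "yogutdb09 Thu 09 Apr 2020 12:16:32 PM CEST",
]
for line in sample:
    print(pick_columns(line))
```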
UPDATE: a new version that also fixes rows where the month and day are swapped in columns 3 and 4:
echo "${aaaaa}" \
\
| mawk 'NF=_+!($_=$(!+$NF?_:NF))*($3=$(2+2^(\
__= $4 ~ /^[0-3][0-9]$/)) \
substr("",$4=$(4-__)))' \_=5
yogutdb01 Mon 28 Jun 2021
yogutdb02 Thu 30 Sep 2021
yogutdb03 Thu 13 Jul 2017 *** fixed these 3 rows
yogutdb04 Fri 23 Jun 2017 ***
yogutdb05 Thu 13 Jul 2017 ***
yogutdb06 Wed 24 Jun 2020
yogutdb07 Wed 24 Jun 2020
yogutdb08 Sat 22 May 2021
yogutdb09 Thu 09 Apr 2020
yogutdb10 Tue 11 May 2021
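The mawk one-liner above is hard to read, so here is a rough Python rendering of what it appears to do. This is a reading of the code, not the answerer's own version, and normalize is a hypothetical name: take $NF as the year when it is numeric, otherwise $5, and swap columns 3 and 4 when column 4 looks like a two-digit day.

```python
import re

# Two-digit day, the same test as the one-liner's /^[0-3][0-9]$/ on $4.
DAY = re.compile(r"[0-3][0-9]")

def normalize(line: str) -> str:
    """Hypothetical helper: return 'host weekday day month year'."""
    f = line.split()
    # Year: the last field when it is numeric (old-style lines), otherwise $5.
    year = f[-1] if f[-1].isdigit() else f[4]
    day_month = [f[2], f[3]]
    if DAY.fullmatch(f[3]):
        # Month came first (e.g. "Jul 13"), so swap to "13 Jul".
        day_month = [f[3], f[2]]
    return " ".join([f[0], f[1], *day_month, year])

print(normalize("yogutdb04 Fri Jun 23 20:33:17 2017"))  # yogutdb04 Fri 23 Jun 2017
```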
The first option below assumes that $NF is never numeric unless it is the 4-digit year.
The second option performs a more thorough check: it tests whether the line ends in a space followed by a 4-digit year. Both assign the proper year to $5 and then assign to NF to trim all the fields to the right of it.
mawk 'NF=_^($_=$(!+$NF?_:NF))^!_' \_=5 < datafile.txt
or
mawk 'NF= +_+($_=$(/[ ][012][0-9][0-9][0-9]$/? NF :_))*!_' \_=5 < datafile.txt
gawk 'NF= _+!($_=$(/[ ][0-2][0-9]{3}$/ ? NF :_))' \_=5 < datafile.txt
yogutdb01 Mon 28 Jun 2021
yogutdb02 Thu 30 Sep 2021
yogutdb03 Thu Jul 13 2017
yogutdb04 Fri Jun 23 2017
yogutdb05 Thu Jul 13 2017
yogutdb06 Wed 24 Jun 2020
yogutdb07 Wed 24 Jun 2020
yogutdb08 Sat 22 May 2021
yogutdb09 Thu 09 Apr 2020
yogutdb10 Tue 11 May 2021
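A rough Python rendering of the "assign the year into $5, then set NF=5" idea, using the same line-end test as the second option. trim_to_year is a hypothetical name used only for this sketch:

```python
import re

# Same test as the second option's /[ ][012][0-9][0-9][0-9]$/:
# the line ends in a space followed by a 4-digit year.
ENDS_WITH_YEAR = re.compile(r" [012][0-9]{3}$")

def trim_to_year(line: str) -> str:
    """Hypothetical helper mirroring '$5 = year; NF = 5'."""
    line = line.rstrip()
    fields = line.split()
    if ENDS_WITH_YEAR.search(line):
        fields[4] = fields[-1]   # like $5 = $NF
    return " ".join(fields[:5])  # like NF = 5: drop everything after $5

for line in [
    "yogutdb02 Thu 30 Sep 2021 02:02:53 AM MST",
    "yogutdb05 Thu Jul 13 05:17:20 2017",
]:
    print(trim_to_year(line))
```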
I have a use case where I want to run a monthly job at 2:30 PM on the first Friday of every month, starting from January.
The cron expression I use:
0 30 14 ? 1/1 6#1
This works absolutely fine.
Sample fire times:
Fri Jan 03 14:30:00 UTC 2020
Fri Feb 07 14:30:00 UTC 2020
Fri Mar 06 14:30:00 UTC 2020
Fri Apr 03 14:30:00 UTC 2020
Fri May 01 14:30:00 UTC 2020
Fri Jun 05 14:30:00 UTC 2020
Fri Jul 03 14:30:00 UTC 2020
But if I use the same expression with December as the starting month,
0 30 14 ? 12/1 6#1
it starts failing:
Fri Dec 04 14:30:00 UTC 2020
Fri Dec 03 14:30:00 UTC 2021
Fri Dec 02 14:30:00 UTC 2022
Fri Dec 01 14:30:00 UTC 2023
It effectively becomes a yearly schedule.
I don't see any issue with the expression I am using. How can I resolve this, or is there a workaround?
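IMO the problem is the month field. In 12/1, the / means "start at month 12 and repeat every 1 month", but the month field ends at 12 and the increments never wrap into the next year, so December is the only match. The expression is therefore the same as
0 30 14 ? 12 6#1
and your first expression (1/1, starting in January) is equivalent to
0 30 14 ? * 6#1
(* means every month).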
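To see why 12/1 only ever yields December, here is a toy Python sketch. expand_month_field and first_friday_1430 are hypothetical helpers, not part of Quartz; the month parser only understands '*', a single number and 'start/step', and it models the increment rule described above rather than Quartz's actual parser.

```python
import calendar
from datetime import datetime

def expand_month_field(field: str) -> list:
    """Expand a simplified month field ('*', 'N' or 'start/step') to month numbers."""
    if field == "*":
        return list(range(1, 13))
    if "/" in field:
        start, step = (int(x) for x in field.split("/"))
        # Increments stop at 12: they never wrap into the next year.
        return list(range(start, 13, step))
    return [int(field)]

def first_friday_1430(year: int, month: int) -> datetime:
    # calendar.weekday(): Monday == 0 ... Friday == 4
    first_weekday = calendar.weekday(year, month, 1)
    day = 1 + (calendar.FRIDAY - first_weekday) % 7
    return datetime(year, month, day, 14, 30)

print(expand_month_field("1/1"))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(expand_month_field("12/1"))  # [12]
print(first_friday_1430(2020, 12)) # 2020-12-04 14:30:00
```

The first-Friday dates it computes for 2020 match the sample fire times in the question.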
I need to analyse a large dataset with dates in several different formats:
Mon, 04 Nov 2019 06:12:44 -0800 (PST)
Mon, 4 Nov 2019 15:16:58 +0100 (CET)
Mon, 4 Nov 2019 08:03:13 +0000 (UTC)
Mon, 4 Nov 2019 12:05:54 +0100
dfMail.Date = pd.to_datetime(dfMail.Date, format = "%a, %d %b %Y %H:%M:%S %z")
raises the error: ValueError: unconverted data remains: (PST)
What is the best strategy to convert these dates?
Thanks
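The parenthesised zone name at the end, such as (PST), is what the format string cannot consume. Since the numeric offset already carries the time zone, you can simply strip the suffix and let pandas parse the rest (regex=True is needed on pandas 2.0 and later, where str.replace no longer treats the pattern as a regex by default):
pd.to_datetime(dfMail.Date.str.replace(r' \(.*\)', '', regex=True), utc=True)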
Input:
Date
0 Mon, 04 Nov 2019 06:12:44 -0800 (PST)
1 Mon, 4 Nov 2019 15:16:58 +0100 (CET)
2 Mon, 4 Nov 2019 08:03:13 +0000 (UTC)
3 Mon, 4 Nov 2019 12:05:54 +0100
4 Thu, 17 Oct 2019 23:19:41 +0100 (GMT+01:00)
Output:
0 2019-11-04 14:12:44+00:00
1 2019-11-04 14:16:58+00:00
2 2019-11-04 08:03:13+00:00
3 2019-11-04 11:05:54+00:00
4 2019-10-17 22:19:41+00:00
Name: 0, dtype: datetime64[ns, UTC]
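If pandas isn't at hand, the same strip-then-parse idea works with the standard library alone. A rough sketch, assuming an English/C locale for %a and %b and that the zone label is always a trailing parenthesised suffix; to_utc is a hypothetical name:

```python
import re
from datetime import datetime, timezone

RAW = [
    "Mon, 04 Nov 2019 06:12:44 -0800 (PST)",
    "Mon, 4 Nov 2019 15:16:58 +0100 (CET)",
    "Mon, 4 Nov 2019 08:03:13 +0000 (UTC)",
    "Mon, 4 Nov 2019 12:05:54 +0100",
    "Thu, 17 Oct 2019 23:19:41 +0100 (GMT+01:00)",
]

FMT = "%a, %d %b %Y %H:%M:%S %z"

def to_utc(s: str) -> datetime:
    # Drop a trailing " (...)" zone label; the numeric offset is kept.
    cleaned = re.sub(r" \(.*\)$", "", s)
    # %d accepts both "04" and "4"; %z parses offsets like -0800.
    return datetime.strptime(cleaned, FMT).astimezone(timezone.utc)

for s in RAW:
    print(to_utc(s).isoformat())
```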
I have a domain class Work with id and day.
day holds values from March up to the current date.
I need the list of days for the current week and the previous two weeks.
For example, if today is Tuesday (04/22), what I need is:
Week1: 06-12 April
Week2: 13-19 April
Current week: 20-26 April.
Please help, thanks.
I have a domain class Work with id and day; day holds dates from January to now.
I get the current time with:
def current = new Date()
I'd like to get the list of days for the last 2 weeks, including this week. I tried the following code, but it doesn't work:
def getWeek = current.Time - 13  // 13 days back + today = 2 weeks
Please help me solve it.
Not 100% sure I understand, but you should be able to use a Range:
def current = new Date().clearTime()
def listOfDays = (current - 13)..current
listOfDays.each { println it }
That prints:
Wed Apr 09 00:00:00 BST 2014
Thu Apr 10 00:00:00 BST 2014
Fri Apr 11 00:00:00 BST 2014
Sat Apr 12 00:00:00 BST 2014
Sun Apr 13 00:00:00 BST 2014
Mon Apr 14 00:00:00 BST 2014
Tue Apr 15 00:00:00 BST 2014
Wed Apr 16 00:00:00 BST 2014
Thu Apr 17 00:00:00 BST 2014
Fri Apr 18 00:00:00 BST 2014
Sat Apr 19 00:00:00 BST 2014
Sun Apr 20 00:00:00 BST 2014
Mon Apr 21 00:00:00 BST 2014
Tue Apr 22 00:00:00 BST 2014
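For reference, the same 14-day window in Python. This is only a sketch; date(2014, 4, 22) is a fixed stand-in for "today" so the result lines up with the output above, and last_14_days is a made-up name:

```python
from datetime import date, timedelta

def last_14_days(today: date) -> list:
    # Equivalent of (current - 13)..current: 13 days back plus today.
    return [today - timedelta(days=n) for n in range(13, -1, -1)]

days = last_14_days(date(2014, 4, 22))
print(days[0], days[-1], len(days))  # 2014-04-09 2014-04-22 14
```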
If you mean you want the entire 2 weeks before the current week AND the current week, you could do:
def current = new Date().clearTime()
int currentDay = Calendar.instance.with {
time = current
get( Calendar.DAY_OF_WEEK )
}
def listOfDays = (current - 13 - currentDay)..(current + 7 - currentDay)
listOfDays.each {
println it
}
Which prints:
Sun Apr 06 00:00:00 BST 2014
Mon Apr 07 00:00:00 BST 2014
Tue Apr 08 00:00:00 BST 2014
Wed Apr 09 00:00:00 BST 2014
Thu Apr 10 00:00:00 BST 2014
Fri Apr 11 00:00:00 BST 2014
Sat Apr 12 00:00:00 BST 2014
Sun Apr 13 00:00:00 BST 2014
Mon Apr 14 00:00:00 BST 2014
Tue Apr 15 00:00:00 BST 2014
Wed Apr 16 00:00:00 BST 2014
Thu Apr 17 00:00:00 BST 2014
Fri Apr 18 00:00:00 BST 2014
Sat Apr 19 00:00:00 BST 2014
Sun Apr 20 00:00:00 BST 2014
Mon Apr 21 00:00:00 BST 2014
Tue Apr 22 00:00:00 BST 2014
Wed Apr 23 00:00:00 BST 2014
Thu Apr 24 00:00:00 BST 2014
Fri Apr 25 00:00:00 BST 2014
Sat Apr 26 00:00:00 BST 2014
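And a rough Python equivalent of the three Sunday-to-Saturday weeks, again pinning "today" to 2014-04-22 so it can be compared with the Groovy output. It assumes Sunday-start weeks, which is what the Groovy code gets from Calendar.DAY_OF_WEEK (Sunday == 1); the function name is made up for illustration:

```python
from datetime import date, timedelta

def current_and_previous_two_weeks(today: date) -> list:
    # date.weekday(): Monday == 0 ... Sunday == 6; shift so Sunday == 0.
    days_since_sunday = (today.weekday() + 1) % 7
    start = today - timedelta(days=days_since_sunday + 14)  # Sunday two weeks back
    return [start + timedelta(days=n) for n in range(21)]   # through this Saturday

days = current_and_previous_two_weeks(date(2014, 4, 22))
weeks = [days[i:i + 7] for i in (0, 7, 14)]
for week in weeks:
    print(week[0], week[-1])
```

Each printed pair is the Sunday and Saturday of one week, matching the 06-12, 13-19 and 20-26 April ranges from the question.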