Problem converting local time to UTC during the winter time change - python-3.x

I have an issue when I try to convert local date/time to UTC date/time around the winter time change.
I have a list of every date/time of a given year in 30-minute steps.
Those dates are in local time.
Then I am converting them into UTC. I use:
import pytz

local = pytz.timezone("Europe/Paris")
utc = pytz.timezone("UTC")
Local_time.astimezone(utc).strftime("%Y-%m-%dT%H:%M:%SZ")  # format matching the UTC output below
It worked well during the first time change of the year (spring forward).
Here is the local time (UTC is one hour behind):
2019-03-31 01:00:00 2019-03-31 01:30:00
2019-03-31 01:30:00 2019-03-31 03:00:00
2019-03-31 03:00:00 2019-03-31 03:30:00
Then the output is:
2019-03-31T00:00:00Z 2019-03-31T00:30:00Z
2019-03-31T00:30:00Z 2019-03-31T01:00:00Z
2019-03-31T01:00:00Z 2019-03-31T01:30:00Z
So here everything is fine.
But during the second time change of the year (fall back):
Here is the local time (around the repeated hour):
2019-10-27 01:30:00 2019-10-27 02:00:00
2019-10-27 02:00:00 2019-10-27 02:30:00
2019-10-27 02:30:00 2019-10-27 02:00:00
2019-10-27 02:00:00 2019-10-27 02:30:00
2019-10-27 02:30:00 2019-10-27 03:00:00
Then in UTC, it gives me:
2019-10-26T23:30:00Z 2019-10-27T00:00:00Z
2019-10-27T00:00:00Z 2019-10-27T00:30:00Z
2019-10-27T00:30:00Z 2019-10-27T00:00:00Z
2019-10-27T00:00:00Z 2019-10-27T00:30:00Z
2019-10-27T00:30:00Z 2019-10-27T02:00:00Z
I cannot figure out why it subtracts 2 hours everywhere, whereas it should subtract only one hour from 03:00 local time onward?

I am sorry to have asked this question, I actually found the answer by myself.
It was quite obvious: I used the UTC offset I had in the input, with the dateutil library.
It worked!
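For reference, here is a minimal sketch of that idea, assuming each input timestamp carries its own UTC offset (+02:00 before the change, +01:00 after); parsing it with dateutil keeps the offset, which removes the ambiguity of the repeated hour:

from datetime import timezone
from dateutil import parser

# Hypothetical input strings; the explicit offsets are assumed to be
# available in the source data, as described above.
local_strings = ["2019-10-27 02:30:00+02:00", "2019-10-27 02:30:00+01:00"]

for s in local_strings:
    local_dt = parser.parse(s)                  # offset-aware datetime
    utc_dt = local_dt.astimezone(timezone.utc)  # unambiguous conversion to UTC
    print(utc_dt.strftime("%Y-%m-%dT%H:%M:%SZ"))
# 2019-10-27T00:30:00Z
# 2019-10-27T01:30:00Z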

Related

Pyspark unix_timestamp stripping the last zeros while converting from datetime to unix time

I have the following date dataframe:
en_dt_time
2020-10-12 04:00:00
2020-10-11 04:00:00
2020-10-10 04:00:00
2020-10-09 04:00:00
2020-10-08 04:00:00
While converting these dates to a Unix timestamp, the trailing zeros are not appearing, giving me an incorrect time in Unix.
This is what I am applying:
df = df.withColumn('unix', F.unix_timestamp('en_dt_time'))
The output is missing the last 3 zeros (000):
en_dt_time unix
2020-10-12 04:00:00 1602475200
2020-10-11 04:00:00 1602388800
2020-10-10 04:00:00 1602302400
2020-10-09 04:00:00 1602216000
2020-10-08 04:00:00 1602129600
2020-10-07 04:00:00 1602043200
The desired output is:
en_dt_time unix
2020-10-12 04:00:00 1602475200000
2020-10-11 04:00:00 1602388800000
2020-10-10 04:00:00 1602302400000
2020-10-09 04:00:00 1602216000000
2020-10-08 04:00:00 1602129600000
2020-10-07 04:00:00 1602043200000
How can I get this precision while converting to a Unix timestamp?
I was able to generate this by multiplying the output by 1000:
df = df.withColumn('unix', F.unix_timestamp('en_dt_time')*1000)
Is this the right approach?
That's correct behavior. From the function's description:
Convert time string with given pattern (‘yyyy-MM-dd HH:mm:ss’, by default) to Unix time stamp (in seconds), using the default timezone and the default locale
So if you want milliseconds, you just need to convert seconds to milliseconds, as you are doing right now.
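A minimal, self-contained sketch of that approach (the sample values and column name follow the question; unix_ms is my own name for the new column, and the exact values depend on the Spark session's timezone):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("2020-10-12 04:00:00",), ("2020-10-11 04:00:00",)],
    ["en_dt_time"],
)

# unix_timestamp() returns whole seconds, so multiply by 1000 for milliseconds
df = df.withColumn("unix_ms", F.unix_timestamp("en_dt_time") * 1000)
df.show(truncate=False)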

Pandas: How to Convert UTC Time to Local Time?

I have a Pandas Series time of dates and times, like:
UTC:
0 2015-01-01 00:00:00
1 2015-01-01 01:00:00
2 2015-01-01 02:00:00
3 2015-01-01 03:00:00
4 2015-01-01 04:00:00
Name: DT, dtype: datetime64[ns]
That I'd like to convert to another timezone:
time2 = time.dt.tz_localize('UTC').dt.tz_convert('Europe/Rome')
print("CET: ",'\n', time2)
CET:
0 2015-01-01 01:00:00+01:00
1 2015-01-01 02:00:00+01:00
2 2015-01-01 03:00:00+01:00
3 2015-01-01 04:00:00+01:00
4 2015-01-01 05:00:00+01:00
Name: DT, dtype: datetime64[ns, Europe/Rome]
But the result is not what I need. I want it in the form 2015-01-01 02:00:00 (the local time corresponding to UTC 01:00:00), not 2015-01-01 01:00:00+01:00.
How can I do that?
EDIT: While there is another question that deals with this issue (Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone), I think this question is more to the point, providing a clear and concise example of what appears to be a common problem.
It turns out that my question already has an answer here:
Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone
I just wasn't able to phrase my question correctly. Anyway, what works is:
time3 = time2.dt.tz_localize(None)
print("Naive: ",'\n', time3)
Naive:
0 2015-01-01 01:00:00
1 2015-01-01 02:00:00
2 2015-01-01 03:00:00
3 2015-01-01 04:00:00
4 2015-01-01 05:00:00
Name: DT, dtype: datetime64[ns]
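Putting the two steps together, a minimal self-contained sketch (the Series name and sample values follow the question):

import pandas as pd

time = pd.Series(pd.date_range("2015-01-01", periods=5, freq="h"), name="DT")

local = (
    time.dt.tz_localize("UTC")          # mark the naive values as UTC
        .dt.tz_convert("Europe/Rome")   # shift to Rome wall-clock time
        .dt.tz_localize(None)           # drop the +01:00 offset, keep naive local time
)
print(local)
# 0   2015-01-01 01:00:00
# 1   2015-01-01 02:00:00
# ...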

Pandas: Find original index of a value with a grouped dataframe

I have a dataframe with a RangeIndex, timestamps in the first column, and several thousand hourly temperature observations in the second.
It is easy enough to group the observations by 24 and find the daily Tmax and Tmin. But I also want the timestamp of each day's max and min values.
How can I do that?
I hope I can get help without posting a working example, because the nature of the data makes that impractical.
EDIT: Here's some data, spanning two days.
DT T-C
0 2015-01-01 00:00:00 -2.5
1 2015-01-01 01:00:00 -2.1
2 2015-01-01 02:00:00 -2.3
3 2015-01-01 03:00:00 -2.3
4 2015-01-01 04:00:00 -2.3
5 2015-01-01 05:00:00 -2.0
...
24 2015-01-02 00:00:00 1.1
25 2015-01-02 01:00:00 1.1
26 2015-01-02 02:00:00 0.8
27 2015-01-02 03:00:00 0.5
28 2015-01-02 04:00:00 1.0
29 2015-01-02 05:00:00 0.7
First create a DatetimeIndex, then aggregate with a daily Grouper, using idxmax and
idxmin to get the datetimes of the maximum and minimum temperatures:
import pandas as pd

df['DT'] = pd.to_datetime(df['DT'])
df = df.set_index('DT')
df = df.groupby(pd.Grouper(freq='D'))['T-C'].agg(['idxmax','idxmin','max','min'])
print (df)
idxmax idxmin max min
DT
2015-01-01 2015-01-01 05:00:00 2015-01-01 00:00:00 -2.0 -2.5
2015-01-02 2015-01-02 00:00:00 2015-01-02 03:00:00 1.1 0.5
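If you prefer more descriptive column names, a small optional follow-up (the names Tmax_time, Tmin_time, Tmax, and Tmin are my own choice):

df = df.rename(columns={'idxmax': 'Tmax_time', 'idxmin': 'Tmin_time',
                        'max': 'Tmax', 'min': 'Tmin'})
print(df)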

Setting start time from previous night without dates from CSV using pandas

I would like to run a time series analysis on repeated-measures data (time only, no dates) taken overnight from 22:00:00 to 09:00:00 the next morning.
How can the time be set so that the time series starts at 22:00:00? At the moment, even when plotting, it starts at 00:00:00 and ends at 23:00:00, with a flat line between 09:00:00 and 23:00:00.
df = pd.read_csv('1310.csv', parse_dates=True)
df['Time'] = pd.to_datetime(df['Time'])
df['Time'].apply( lambda d : d.time() )
df = df.set_index('Time')
df['2017-05-16 22:00:00'] + pd.Timedelta('-1 day')
Note: the date in the last line of code is added automatically (it appears when df['Time'] is executed), so I used the same date format in the last line for 22:00:00.
This is the error:
TypeError: Could not operate Timedelta('-1 days +00:00:00') with block values unsupported operand type(s) for +: 'numpy.ndarray' and 'Timedelta'
You should treat your timestamps as pd.Timedelta values and add a day to the samples that fall before your start time.
Create some example data:
import numpy as np
import pandas as pd

d = pd.date_range(start='22:00:00', periods=12, freq='h')
s = pd.Series(d).dt.time
df = pd.DataFrame(np.random.randn(len(s)), index=s, columns=['value'])
df.to_csv('data.csv')
df
value
22:00:00 -0.214977
23:00:00 -0.006585
00:00:00 0.568259
01:00:00 0.603196
02:00:00 0.358124
03:00:00 0.027835
04:00:00 -0.436322
05:00:00 0.627624
06:00:00 0.168189
07:00:00 -0.321916
08:00:00 0.737383
09:00:00 1.100500
Read the file back in, make the index a timedelta, add a day to the timedeltas before the start time, then assign back to the index.
df2 = pd.read_csv('data.csv', index_col=0)
df2.index = pd.to_timedelta(df2.index)
s = pd.Series(df2.index)
s[s < pd.Timedelta('22:00:00')] += pd.Timedelta('1d')
df2.index = pd.to_datetime(s)
df2
value
1970-01-01 22:00:00 -0.214977
1970-01-01 23:00:00 -0.006585
1970-01-02 00:00:00 0.568259
1970-01-02 01:00:00 0.603196
1970-01-02 02:00:00 0.358124
1970-01-02 03:00:00 0.027835
1970-01-02 04:00:00 -0.436322
1970-01-02 05:00:00 0.627624
1970-01-02 06:00:00 0.168189
1970-01-02 07:00:00 -0.321916
1970-01-02 08:00:00 0.737383
1970-01-02 09:00:00 1.100500
If you want to set the date of the first day:
df2.index += (pd.Timestamp('2015-06-06') - pd.Timestamp(0))
df2
value
2015-06-06 22:00:00 -0.214977
2015-06-06 23:00:00 -0.006585
2015-06-07 00:00:00 0.568259
2015-06-07 01:00:00 0.603196
2015-06-07 02:00:00 0.358124
2015-06-07 03:00:00 0.027835
2015-06-07 04:00:00 -0.436322
2015-06-07 05:00:00 0.627624
2015-06-07 06:00:00 0.168189
2015-06-07 07:00:00 -0.321916
2015-06-07 08:00:00 0.737383
2015-06-07 09:00:00 1.100500

Calculate hours outside normal working time

From my sample xls (Office 2007) listed below, I am trying to calculate the hours worked from 17:00 onwards. Could someone show me how I would write the formula to achieve this?
Many thanks...
Leaving Time Arrival Time DeliveryTime RTB Time per Job hoursforDaysWork
09:30:00 11:05:00 11:15:00 11:15:00 01:45:00 12:00:00
11:15:00 12:15:00 12:30:00 13:30:00 02:15:00
13:30:00 15:30:00 15:45:00 15:45:00 02:15:00
15:45:00 18:15:00 18:30:00 18:30:00 02:45:00
18:30:00 19:00:00 19:45:00 21:30:00 03:00:00
Time in Excel is stored as a fraction of a whole day (so an hour is 1/24), as explained in this article.
So assuming that you care about hours after 17:00, but not after midnight, the following should help you out:
=IF(C2<17/24,0,C2-17/24)*24
C2 is the cell the calculation is for, with 17/24 giving the serial value for 17:00 (5pm). The *24 at the end converts the result from a fraction of a day into hours.
