How to convert the column Time (nanoseconds) with object data type to datetime? - python-3.x

I have the dataset below, stored with the object data type. I want to change the data type to datetime.
0 00:00:00.000:
1 00:00:00.000:
2 00:00:00.000:
3 00:00:00.000:
4 00:00:00.000:
...
4943983 16:11:21.000:
4943984 16:11:24.000:
4943986 16:11:39.000:
4943987 16:11:51.000:
Name: Time, Length: 4943988, dtype: object
I tried the command below, but it replaced all the values with NaN.
timefmt = "%H:%M:%S"
dadr['Time'] = pd.to_datetime(dadr['Time'], errors='coerce').dt.strftime(timefmt)
Output:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
..
4943983 NaN
4943984 NaN
4943985 NaN
4943986 NaN
4943987 NaN
Name: Time, Length: 4943988, dtype: float64
I would like to add that there are time fields with non-zero values after the seconds place, which produce errors such as: time data '07:05:15.026:' does not match format '%H:%M:%S.000:' (match)

You can pass timefmt to the format= parameter of pd.to_datetime:
timefmt = "%H:%M:%S.000:"
df['Time'] = pd.to_datetime(df['Time'], format=timefmt)
print(df)
Prints:
idx Time
0 4943983 1900-01-01 16:11:21
1 4943984 1900-01-01 16:11:24
2 4943985 1900-01-01 16:11:38
3 4943986 1900-01-01 16:11:39
4 4943987 1900-01-01 16:11:51
EDIT: To parse the fractional seconds after the ., you can use %f:
timefmt = "%H:%M:%S.%f:"
df['Time'] = pd.to_datetime(df['Time'], format=timefmt)
print(df)
Prints:
idx Time
0 4943983 1900-01-01 16:11:21.100
1 4943984 1900-01-01 16:11:24.200
2 4943985 1900-01-01 16:11:38.300
3 4943986 1900-01-01 16:11:39.400
4 4943987 1900-01-01 16:11:51.500
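As a minimal sketch of the same idea (the three sample values are made up for illustration): the trailing colon is probably what kept pd.to_datetime from parsing the strings without an explicit format, so errors='coerce' turned every row into NaT. If only the time of day matters, .dt.time or pd.to_timedelta avoids the placeholder 1900-01-01 date.

```python
import pandas as pd

# Hypothetical sample in the same "HH:MM:SS.fff:" layout as the question.
s = pd.Series(['00:00:00.000:', '07:05:15.026:', '16:11:21.000:'], name='Time')

# The literal trailing colon is part of the format; %f reads the fraction.
parsed = pd.to_datetime(s, format='%H:%M:%S.%f:')

# Keep only the time of day (object column of datetime.time values).
as_time = parsed.dt.time

# Alternative: strip the colon and store the values as durations since midnight.
as_delta = pd.to_timedelta(s.str.rstrip(':'))

print(parsed)
print(as_delta)
```

Which of the two representations fits better depends on whether the values are later combined with a date column (datetime) or only compared and subtracted (timedelta).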

You can try this:
import pandas as pd

# build your dataset
times = [
    '00:00:00.000:',
    '16:11:21.000:',
    '16:11:24.000:',
    '16:11:38.000:',
    '16:11:39.000:',
    '16:11:51.000:'
]
# remove the trailing colon from your timestamps (also works for non-zero fractions)
times = [x.rstrip(':') for x in times]
# create your dataset
df = pd.DataFrame(pd.Series(times))
# perform your request (the result is a column of formatted strings)
df['time'] = df[0].apply(lambda x: pd.Timestamp(x).strftime('%H:%M:%S.%f'))
Then you have:

Related

Time series resampling with column of type object

Good evening,
I want to resample an irregular time series that has a column of type object, but it does not work.
Here is my sample data:
Actual start date Ingredients NumberShortage
2002-01-01 LEVOBUNOLOL HYDROCHLORIDE 1
2006-07-30 LEVETIRACETAM 1
2008-03-19 FLAVOXATE HYDROCHLORIDE 1
2010-01-01 LEVOTHYROXINE SODIUM 1
2011-04-01 BIMATOPROST 1
I tried to re-sample my data frame daily but it does not work with my code which is as follows:
df3 = df1.resample('D', on='Actual start date').sum()
and here is what it gives:
Actual start date NumberShortage
2002-01-01 1
2002-01-02 0
2002-01-03 0
2002-01-04 0
2002-01-05 0
and what I want as a result:
Actual start date Ingredients NumberShortage
2002-01-01 LEVOBUNOLOL HYDROCHLORIDE 1
2002-01-02 NAN 0
2002-01-03 NAN 0
2002-01-04 NAN 0
2002-01-05 NAN 0
Any ideas?
Details on the data:
I use an Excel file (originally a CSV file) that contains several attributes; it can be downloaded from https://www.drugshortagescanada.ca/search?perform=0 . I then group by 'Actual start date' and 'Ingredients' to obtain 'NumberShortage'.
and here is the source code:
import pandas as pd
df = pd.read_excel("Data/Data.xlsx")
df = df.dropna(how='any')
df = df.groupby(['Actual start date','Ingredients']).size().reset_index(name='NumberShortage')
Finally, after applying your code, here is the error I get:
And here is the sample Excel file:
Brand name Company Name Ingredients Actual start date
ACETAMINOPHEN PHARMASCIENCE INC ACETAMINOPHEN CODEINE 2017-03-23
PMS-METHYLPHENIDATE ER PHARMASCIENCE INC METHYLPHENIDATE 2017-03-28
Rather than resampling, you need to reindex, using date_range as the source of the new dates and the date column as a temporary index:
df['Actual start date'] = pd.to_datetime(df['Actual start date'])
(df
 .set_index('Actual start date')
 .reindex(pd.date_range(df['Actual start date'].min(),
                        df['Actual start date'].max(), freq='D'))
 # fill the gaps with 0 and restore the integer dtype
 .fillna({'NumberShortage': 0})
 .astype({'NumberShortage': int})
 .reset_index()
)
output:
index Ingredients NumberShortage
0 2002-01-01 LEVOBUNOLOL HYDROCHLORIDE 1
1 2002-01-02 NaN 0
2 2002-01-03 NaN 0
3 2002-01-04 NaN 0
4 2002-01-05 NaN 0
... ... ... ...
3373 2011-03-28 NaN 0
3374 2011-03-29 NaN 0
3375 2011-03-30 NaN 0
3376 2011-03-31 NaN 0
3377 2011-04-01 BIMATOPROST 1
[3378 rows x 3 columns]
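If the same date can appear more than once (for example, several ingredients starting on the same day), reindexing raises an error because the temporary index is not unique. A hedged alternative is to left-merge a full daily calendar onto the data. The sample rows and the helper frame named calendar are assumptions made for this sketch, not part of the original data:

```python
import pandas as pd

# Hypothetical grouped data with a duplicated start date.
df = pd.DataFrame({
    'Actual start date': pd.to_datetime(['2002-01-01', '2002-01-01', '2002-01-04']),
    'Ingredients': ['A', 'B', 'C'],
    'NumberShortage': [1, 2, 1],
})

# One row per calendar day between the first and last start date.
calendar = pd.DataFrame({
    'Actual start date': pd.date_range(df['Actual start date'].min(),
                                       df['Actual start date'].max(), freq='D')
})

# A left merge keeps every day; days without data get NaN in the other columns.
daily = calendar.merge(df, on='Actual start date', how='left')
daily['NumberShortage'] = daily['NumberShortage'].fillna(0).astype(int)

print(daily)
```

Days with several ingredients keep one row per ingredient, while empty days appear once with NaN in Ingredients and 0 in NumberShortage.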

Convert 6 digits date format to standard one in Pandas

I'm working with a dataframe that has one messy date column with irregular formats, i.e.:
date
0 19.01.01
1 19.02.01
2 1991/01/01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
Is it possible to convert it to the standard format XXXX-XX-XX, which represents year-month-day? Thank you.
date
0 2019-01-01
1 2019-02-01
2 1991-01-01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
Use pd.to_datetime with yearfirst=True
Ex:
df = pd.DataFrame({"date": ['19.01.01', '19.02.01', '1991/01/01', '1996-01-01', '1996-06-30', '1995-12-31', '1997-01-01']})
df['date'] = pd.to_datetime(df['date'], yearfirst=True).dt.strftime("%Y-%m-%d")
print(df)
Output:
date
0 2019-01-01
1 2019-02-01
2 1991-01-01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
It depends on the formats; the most general solution is to specify each format and use Series.combine_first:
date1 = pd.to_datetime(df['date'], format='%y.%m.%d', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%Y/%m/%d', errors='coerce')
date3 = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
df['date'] = date1.combine_first(date2).combine_first(date3)
print (df)
date
0 2019-01-01
1 2019-02-01
2 1991-01-01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
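The same approach can be wrapped in a small helper when there are more than a few formats. This is only a sketch; parse_with_formats is a hypothetical name, and the formats are tried in the order given, so an earlier match wins:

```python
import pandas as pd

def parse_with_formats(values, formats):
    """Try each format in turn and keep the first successful parse per row."""
    parsed = pd.Series(pd.NaT, index=values.index, dtype='datetime64[ns]')
    for fmt in formats:
        attempt = pd.to_datetime(values, format=fmt, errors='coerce')
        # Only rows that are still NaT are filled from this attempt.
        parsed = parsed.fillna(attempt)
    return parsed

df = pd.DataFrame({"date": ['19.01.01', '19.02.01', '1991/01/01', '1996-01-01',
                            '1996-06-30', '1995-12-31', '1997-01-01']})
df['date'] = parse_with_formats(df['date'], ['%y.%m.%d', '%Y/%m/%d', '%Y-%m-%d'])
print(df)
```

Rows that match none of the formats stay NaT, which makes them easy to find with df['date'].isna().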
Try the following:
df['date'] = df['date'].replace(r'[/.]', '-', regex=True)
Then use pd.to_datetime():
pd.to_datetime(df['date'])
Output:
0 2001-01-19
1 2001-02-19
2 1991-01-01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
Name: 0, dtype: datetime64[ns]

How to multiply values with a group of data from a pandas series without loop iteration

I have two pandas time series with different lengths and indexes, plus a Boolean series. series_1 holds the last value of each month, indexed by the last day of the month; series_2 holds daily data with a daily index; the Boolean series is True on the last day of each month and False otherwise.
I want to multiply a value from series_1 (e.g. s1[0]) by the daily values of one month from series_2 (s2[1:n]). Is there a way to do it without a loop?
series_1 = 2010-06-30 1
2010-07-30 2
2010-08-31 5
2010-09-30 7
series_2 = 2010-07-01 2
2010-07-02 3
2010-07-03 5
2010-07-04 6
.....
2010-07-30 7
2010-08-01 6
2010-08-02 7
2010-08-03 5
.....
2010-08-31 6
Boolean = False
false
....
True
False
False
....
True
(with only the end of each month True)
As a result I want a series s = series_1[i] * series_2[j:j+n] (the n values from the same month).
How can I do this?
Thanks in advance
Not sure if I got your question completely right but this should get you there:
series_1 = pd.Series({
'2010-07-30': 2,
'2010-08-31': 5
})
series_2 = pd.Series({
'2010-07-01': 2,
'2010-07-02': 3,
'2010-07-03': 5,
'2010-07-04': 6,
'2010-07-30': 7,
'2010-08-01': 6,
'2010-08-02': 7,
'2010-08-03': 5,
'2010-08-31': 6
})
Make the series Datetime aware and resample them to daily frequency:
series_1.index = pd.DatetimeIndex(series_1.index)
series_1 = series_1.resample('1D').asfreq()
series_2.index = pd.DatetimeIndex(series_2.index)
series_2 = series_2.resample('1D').asfreq()
Put them in a dataframe and perform basic multiplication:
df = pd.DataFrame()
df['1'] = series_1
df['2'] = series_2
df['product'] = df['1'] * df['2']
Result:
>>> df
1 2 product
2010-07-30 2.0 7.0 14.0
2010-07-31 NaN NaN NaN
2010-08-01 NaN 6.0 NaN
2010-08-02 NaN 7.0 NaN
2010-08-03 NaN 5.0 NaN
[...]
2010-08-27 NaN NaN NaN
2010-08-28 NaN NaN NaN
2010-08-29 NaN NaN NaN
2010-08-30 NaN NaN NaN
2010-08-31 5.0 6.0 30.0
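If the goal is instead to multiply every daily value by the value of series_1 for its month, one possible sketch is to match both series on the calendar month. This assumes "same month" means the same calendar month (the question could also be read as the previous month-end value) and uses a shortened, made-up version of the data:

```python
import pandas as pd

# Shortened sample data with datetime indexes.
series_1 = pd.Series([2, 5], index=pd.to_datetime(['2010-07-30', '2010-08-31']))
series_2 = pd.Series(
    [2, 3, 7, 6, 6],
    index=pd.to_datetime(['2010-07-01', '2010-07-02', '2010-07-30',
                          '2010-08-01', '2010-08-31']),
)

# Re-key the monthly values by calendar month (one entry per month).
monthly = series_1.copy()
monthly.index = monthly.index.to_period('M')

# Look up the monthly factor for every daily row, then multiply elementwise.
factors = monthly.reindex(series_2.index.to_period('M')).to_numpy()
result = series_2 * factors

print(result)
```

No Python-level loop is involved; the month lookup and the multiplication are both vectorized.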

Create a pandas column based on a lookup value from another dataframe

I have a pandas dataframe that has some data values by hour (which is also the index of this lookup dataframe). The dataframe looks like this:
In [1] print (df_lookup)
Out[1] 0 1.109248
1 1.102435
2 1.085014
3 1.073487
4 1.079385
5 1.088759
6 1.044708
7 0.902482
8 0.852348
9 0.995912
10 1.031643
11 1.023458
12 1.006961
...
23 0.889541
I want to multiply the values from this lookup dataframe to create a column of another dataframe, which has datetime as index.
The dataframe looks like this:
In [2] print (df)
Out[2]
Date_Label ID data-1 data-2 data-3
2015-08-09 00:00:00 1 2513.0 2502 NaN
2015-08-09 00:00:00 1 2113.0 2102 NaN
2015-08-09 01:00:00 2 2006.0 1988 NaN
2015-08-09 02:00:00 3 2016.0 2003 NaN
...
2018-07-19 23:00:00 33 3216.0 333 NaN
I want to calculate the data-3 column from data-2 column, where the weight given to 'data-2' column depends on corresponding value in df_lookup. I get the desired values by looping over the index as follows, but that is too slow:
for idx in df.index:
    df.loc[idx, 'data-3'] = df.loc[idx, 'data-2'] * df_lookup.at[idx.hour]
Is there a faster way someone could suggest?
Using .loc
df['data-2']*df_lookup.loc[df.index.hour].values
Out[275]:
Date_Label
2015-08-09 00:00:00 2775.338496
2015-08-09 00:00:00 2331.639296
2015-08-09 01:00:00 2191.640780
2015-08-09 02:00:00 2173.283042
Name: data-2, dtype: float64
#df['data-3']=df['data-2']*df_lookup.loc[df.index.hour].values
I'd probably try doing a join.
# Fix column name
df_lookup.columns = ['multiplier']
# Get hour index
df['hour'] = df.index.hour
# Join
df = df.join(df_lookup, how='left', on=['hour'])
df['data-3'] = df['data-2'] * df['multiplier']
df = df.drop(['multiplier', 'hour'], axis=1)
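For a self-contained illustration of the vectorized lookup, here is a small sketch with made-up numbers. It assumes the lookup is a Series indexed by hour (the name lookup and all values are hypothetical):

```python
import pandas as pd

# Hypothetical hourly multipliers, indexed by hour of day.
lookup = pd.Series([1.5, 2.0, 0.5], index=[0, 1, 2])

# Hypothetical data with a datetime index, including a repeated timestamp.
df = pd.DataFrame(
    {'data-2': [2, 4, 6, 8]},
    index=pd.to_datetime(['2015-08-09 00:00', '2015-08-09 00:00',
                          '2015-08-09 01:00', '2015-08-09 02:00']),
)

# Map each row's hour to its multiplier, then multiply the whole column at once.
factors = pd.Series(df.index.hour).map(lookup).to_numpy()
df['data-3'] = df['data-2'] * factors

print(df)
```

Converting the factors to a NumPy array sidesteps index alignment, which matters here because the datetime index contains duplicates.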

Convert a numerical relative index (=months) to datetime

Given is a Pandas DataFrame with a numerical index representing the relative number of months:
import numpy as np
import pandas as pd
df = pd.DataFrame(columns=['A', 'B'], index=np.arange(1, 100))
df
A B
1 NaN NaN
2 NaN NaN
3 NaN NaN
...
How can the index be converted to a DatetimeIndex by specifying a start date (e.g., 2018-11-01)?
magic_function(df, start='2018-11-01', delta='month')
A B
2018-11-01 NaN NaN
2018-12-01 NaN NaN
2019-01-01 NaN NaN
...
I would favor a general solution that also works with arbitrary deltas, e.g. daily or yearly series.
Using date_range
idx=pd.date_range(start='2018-11-01',periods =len(df),freq='MS')
df.index=idx
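For arbitrary deltas, one option is to pass a pd.DateOffset as freq, so the step is counted from the start date itself rather than from an anchored alias. This is only a sketch: relative_to_datetime_index is a hypothetical helper name, and it assumes the numeric index is consecutive (1, 2, 3, ...):

```python
import numpy as np
import pandas as pd

def relative_to_datetime_index(df, start, offset):
    """Return a copy of df whose index starts at `start` and steps by `offset`."""
    out = df.copy()
    out.index = pd.date_range(start=start, periods=len(out), freq=offset)
    return out

df = pd.DataFrame(columns=['A', 'B'], index=np.arange(1, 5))

monthly = relative_to_datetime_index(df, '2018-11-01', pd.DateOffset(months=1))
yearly = relative_to_datetime_index(df, '2018-11-01', pd.DateOffset(years=1))
daily = relative_to_datetime_index(df, '2018-11-01', pd.DateOffset(days=1))

print(monthly.index)
print(yearly.index)
```

With an anchored alias such as 'YS' the first date would snap to the next 1 January, whereas the DateOffset version keeps 2018-11-01 as the first entry.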
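I'm not sure about Pandas, but with plain datetime can't you just do this?
import datetime
start = datetime.date(2018, 1, 1)
months = 15
# count months from year 0 so that overflow past December rolls into the year
total = start.year * 12 + (start.month - 1) + months
adjusted = start.replace(year=total // 12, month=total % 12 + 1)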
