I have the following dataframe:
0 1
0 0.224960 -1.376689
1 0.059706 -1.330823
2 -0.133850 -1.251549
3 -0.234644 -1.190972
4 -0.281469 -1.156635
... ... ...
295 0.655912 -1.040209
296 0.618599 -1.068238
297 0.594964 -1.109484
298 0.578758 -1.151496
299 0.570207 -1.179523
I added the index as a column and then generated a fake time from this column like this:
df['timestamp'] = df.index
# convert the column (it holds integers, not strings) to datetime type;
# plain integers are interpreted as nanoseconds since the epoch
datetime_series = pd.to_datetime(df['timestamp'])
# create a datetime index passing the datetime series
datetime_index = pd.DatetimeIndex(datetime_series.values, name='timestamp')
# set the datetime index on the dataframe and drop the helper column
df = df.set_index(datetime_index).drop(columns='timestamp')
df
The result is:
0 1
timestamp
1970-01-01 00:00:00.000000000 0.224960 -1.376689
1970-01-01 00:00:00.000000001 0.059706 -1.330823
1970-01-01 00:00:00.000000002 -0.133850 -1.251549
1970-01-01 00:00:00.000000003 -0.234644 -1.190972
1970-01-01 00:00:00.000000004 -0.281469 -1.156635
... ... ...
1970-01-01 00:00:00.000000295 0.655912 -1.040209
1970-01-01 00:00:00.000000296 0.618599 -1.068238
1970-01-01 00:00:00.000000297 0.594964 -1.109484
1970-01-01 00:00:00.000000298 0.578758 -1.151496
1970-01-01 00:00:00.000000299 0.570207 -1.179523
I want the timestamps to be 1970-01-01 00:00:00, then 1970-01-01 00:00:01, and so on.
This turned out to be the correct answer: starting from the original integer index, build a mapping from each integer to the timestamp it should become (unit='s' makes pd.Timestamp read the integer as seconds), then replace the index through it.
d = {i: pd.Timestamp(i, unit='s') for i in df.index}
df.index = pd.Series(df.index).replace(d)
I tested the replace mechanism as follows:
tem = pd.DataFrame({'0':[1,2,3,4,5],'1':[3,4,5,6,7]},columns=['0','1'])
d = {0:"asdf",1:"asdf",2:"sdfs",3:"sdfs",4:"sdfs"}
tem.index = pd.Series(tem.index).replace(d)
Printing tem gives:
0 1
0 1 3
1 2 4
2 3 5
3 4 6
4 5 7
Printing d gives:
{0: 'asdf', 1: 'asdf', 2: 'sdfs', 3: 'sdfs', 4: 'sdfs'}
The resulting tem prints as:
0 1
asdf 1 3
asdf 2 4
sdfs 3 5
sdfs 4 6
sdfs 5 7
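A much shorter route, as a sketch (assuming the index is still the original 0..299 integers), is to let pandas interpret the integers as seconds directly:
# unit='s' reads each integer as seconds since the epoch
df.index = pd.to_datetime(df.index, unit='s')
This yields 1970-01-01 00:00:00, 1970-01-01 00:00:01, and so on, without building a dictionary.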
Related
I have a pandas DataFrame:
date
0 2010-03
1 2017-09-14
2 2020-10-26
3 2004-12
4 2012-04-01
5 2017-02-01
6 2013-01
I basically want to filter for rows where the date is after 2015-12 (Dec 2015).
To get this:
date
0 2017-09-14
1 2020-10-26
2 2017-02-01
I tried this
df = df[(df['date']> "2015-12")]
but I'm getting an error
ValueError: Wrong number of items passed 17, placement implies 1
First, your original solution works correctly for me:
df = df[(df['date']> "2015-12")]
print (df)
date
1 2017-09-14
2 2020-10-26
5 2017-02-01
If you convert to datetimes, which should be more robust, it works for me too:
df = df[pd.to_datetime(df['date']) > "2015-12"]
print (df)
date
1 2017-09-14
2 2020-10-26
5 2017-02-01
Detail:
print (pd.to_datetime(df['date']))
0 2010-03-01
1 2017-09-14
2 2020-10-26
3 2004-12-01
4 2012-04-01
5 2017-02-01
6 2013-01-01
Name: date, dtype: datetime64[ns]
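Both variants rely on two facts worth spelling out: ISO-style strings sort lexicographically in chronological order, and pd.to_datetime fills a missing day or month with the first of that period, so '2015-12' behaves like 2015-12-01. A minimal standalone sketch rebuilding the sample data:
import pandas as pd

df = pd.DataFrame({'date': ['2010-03', '2017-09-14', '2020-10-26',
                            '2004-12', '2012-04-01', '2017-02-01', '2013-01']})
# plain string comparison: lexicographic order on ISO dates matches date order
print(df[df['date'] > '2015-12'])
# datetime comparison: '2015-12' is coerced to Timestamp('2015-12-01')
print(df[pd.to_datetime(df['date']) > '2015-12'])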
I have the following pandas DataFrame:
data = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                     "end_time": ["2016-01-13", "2016-01-01", "2016-11-12", "2016-01-17", "2016-03-13"]})
I want to transform the end_time column into a column of datetime objects. But when I do it like this (as suggested everywhere):
data["end"] = data["end_time"].apply(lambda x: datetime.datetime.strptime(x, "%Y-%m-%d"))
the output is still a string column:
id end_time end
0 1 2016-01-13 2016-01-13
1 2 2016-01-01 2016-01-01
2 3 2016-11-12 2016-11-12
3 4 2016-01-17 2016-01-17
4 5 2016-03-13 2016-03-13
How to solve this?
strftime is designed to return a string object; see the datetime documentation for details.
If we want to convert end_time to datetime64[ns] and assign it to a new column named end, we can use:
data['end'] = pd.to_datetime(data.end_time)
strptime will also convert the strings to datetime64[ns], but the to_datetime method is preferable.
data["end"] = data["end_time"].apply(lambda x: datetime.datetime.strptime(x, "%Y-%m-%d"))
Output
id end_time end
0 1 2016-01-13 2016-01-13
1 2 2016-01-01 2016-01-01
2 3 2016-11-12 2016-11-12
3 4 2016-01-17 2016-01-17
4 5 2016-03-13 2016-03-13
Datatypes:
data.info()
Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   id        5 non-null      int64
 1   end_time  5 non-null      object
 2   end       5 non-null      datetime64[ns]
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 248.0+ bytes
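If every string shares one layout, you can also hand to_datetime an explicit format (a standard parameter of pd.to_datetime); skipping format inference tends to be faster on large frames. A sketch:
# parse all strings with a single known layout
data['end'] = pd.to_datetime(data['end_time'], format='%Y-%m-%d')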
My DataFrame "dataframe_time" looks like:
INSERTED_UTC
0 2018-05-29
1 2018-05-22
2 2018-02-10
3 2018-04-30
4 2018-03-02
5 2018-11-26
6 2018-03-07
7 2018-05-12
8 2019-02-03
9 2018-08-03
10 2018-04-27
print(type(dataframe_time['INSERTED_UTC'].iloc[1]))
<class 'datetime.date'>
I am trying to group the dates together and find the count of their occurrences quarterly. Desired output:
Quarter Count
2018-03-31 3
2018-06-30 5
2018-09-30 1
2018-12-31 1
2019-03-31 1
2019-06-30 0
I am running the following command to group them together:
dataframe_time['INSERTED_UTC'].groupby(pd.Grouper(freq='Q'))
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
First convert the dates to datetimes, then use DataFrame.resample with the on parameter to select the datetime column:
dataframe_time.INSERTED_UTC = pd.to_datetime(dataframe_time.INSERTED_UTC)
df = dataframe_time.resample('Q', on='INSERTED_UTC').size().reset_index(name='Count')
Or your solution can be changed to:
df = (dataframe_time.groupby(pd.Grouper(freq='Q', key='INSERTED_UTC'))
.size()
.reset_index(name='Count'))
print (df)
INSERTED_UTC Count
0 2018-03-31 3
1 2018-06-30 5
2 2018-09-30 1
3 2018-12-31 1
4 2019-03-31 1
You can convert the dates to quarters by to_period('Q') and group by those:
df.INSERTED_UTC = pd.to_datetime(df.INSERTED_UTC)
df.groupby(df.INSERTED_UTC.dt.to_period('Q')).size()
You can also use value_counts (adding sort_index, so the quarters print in chronological order rather than by count):
df.INSERTED_UTC.dt.to_period('Q').value_counts().sort_index()
Output:
INSERTED_UTC
2018Q1 3
2018Q2 5
2018Q3 1
2018Q4 1
2019Q1 1
Freq: Q-DEC, dtype: int64
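Note that the desired output also lists 2019-06-30 (2019Q2) with a count of 0, which none of the snippets above produce, because the data ends in 2019Q1. A sketch, assuming the closing quarter is known in advance, extends the result with pd.period_range:
counts = df.INSERTED_UTC.dt.to_period('Q').value_counts().sort_index()
# reindex over the full quarter range, filling the missing quarters with 0
counts = counts.reindex(pd.period_range('2018Q1', '2019Q2', freq='Q'), fill_value=0)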
I'm working with a dataframe that has one messy date column in irregular formats, i.e.:
date
0 19.01.01
1 19.02.01
2 1991/01/01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
Is it possible to convert it to the standard format XXXX-XX-XX, which represents year-month-day? Thank you. The desired output:
date
0 2019-01-01
1 2019-02-01
2 1991-01-01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
Use pd.to_datetime with yearfirst=True
Ex:
df = pd.DataFrame({"date": ['19.01.01', '19.02.01', '1991/01/01', '1996-01-01', '1996-06-30', '1995-12-31', '1997-01-01']})
df['date'] = pd.to_datetime(df['date'], yearfirst=True).dt.strftime("%Y-%m-%d")
print(df)
Output:
date
0 2019-01-01
1 2019-02-01
2 1991-01-01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
It depends on the formats; the most general solution is to specify each format and use Series.combine_first:
date1 = pd.to_datetime(df['date'], format='%y.%m.%d', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%Y/%m/%d', errors='coerce')
date3 = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
df['date'] = date1.combine_first(date2).combine_first(date3)
print (df)
date
0 2019-01-01
1 2019-02-01
2 1991-01-01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
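If more formats turn up later, the same combine_first idea generalizes to a loop over a format list (a sketch; the exact list is an assumption):
formats = ['%y.%m.%d', '%Y/%m/%d', '%Y-%m-%d']
# start from an all-NaT datetime series and fill it in, one format at a time
parsed = pd.Series(pd.NaT, index=df.index)
for fmt in formats:
    parsed = parsed.combine_first(pd.to_datetime(df['date'], format=fmt, errors='coerce'))
df['date'] = parsed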
Try the following, which normalizes the separators to dashes (the dot must be escaped or bracketed, since a bare . matches any character):
df['date'] = df['date'].replace(r'[/.]', '-', regex=True)
Then use pd.to_datetime():
pd.to_datetime(df['date'])
Output:
0 2001-01-19
1 2001-02-19
2 1991-01-01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
Name: date, dtype: datetime64[ns]
Note that the first two rows come out as 2001-01-19 and 2001-02-19 rather than 2019-01-01 and 2019-02-01; pass yearfirst=True, as in the first answer, if the leading two-digit field is the year.
I have a DF like this:
ID Time
1 20:29
1 20:45
1 23:16
2 11:00
2 13:00
3 01:00
I want to create a new column that puts a 1 next to the largest time value within each ID grouping like so:
ID Time Value
1 20:29 0
1 20:45 0
1 23:16 1
2 11:00 0
2 13:00 1
3 01:00 1
I know the answer involves a groupby mechanism and have been fiddling around with something like:
df.groupby('ID')['Time'].max() = 1
The idea is to write an anonymous function that operates on each of your groups and feed this to your groupby using apply:
df['Value'] = df.groupby('ID', as_index=False).apply(lambda x: x.Time == max(x.Time)).values.astype(int)
Assuming that your 'Time' column is already datetime64, you want to group by the 'ID' column and then call transform with a lambda to create a series whose index is aligned with your original df:
In [92]:
df['Value'] = df.groupby('ID')['Time'].transform(lambda x: (x == x.max())).dt.nanosecond
df
Out[92]:
ID Time Value
0 1 2015-11-20 20:29:00 0
1 1 2015-11-20 20:45:00 0
2 1 2015-11-20 23:16:00 1
3 2 2015-11-20 11:00:00 0
4 2 2015-11-20 13:00:00 1
5 3 2015-11-20 01:00:00 1
The dt.nanosecond call is needed because the dtype returned is, for some reason, a datetime rather than a boolean:
In [93]:
df.groupby('ID')['Time'].transform(lambda x: (x == x.max()))
Out[93]:
0 1970-01-01 00:00:00.000000000
1 1970-01-01 00:00:00.000000000
2 1970-01-01 00:00:00.000000001
3 1970-01-01 00:00:00.000000000
4 1970-01-01 00:00:00.000000001
5 1970-01-01 00:00:00.000000001
Name: Time, dtype: datetime64[ns]
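On current pandas versions the transform returns a proper boolean series, so the dt.nanosecond workaround is no longer needed; a simpler sketch (not from the original answer) compares each row against its group maximum and casts to int:
# 1 where the row holds its group's latest time, 0 elsewhere
df['Value'] = (df['Time'] == df.groupby('ID')['Time'].transform('max')).astype(int)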