id date_original
1 20200305
2 2020305
3 2020035
4 202035
How can I convert the 'date_original' column into a 'date' column in a pandas dataframe?
id date
1 20200305
2 20200305
3 20200305
4 20200305
For me, all of the formats work well if a format matching YYYYMMDD is used (tested in pandas 1.1.3):
df['date_original'] = pd.to_datetime(df['date_original'], format='%Y%m%d', errors='coerce')
print (df)
id date_original
0 1 2020-03-05
1 2 2020-03-05
2 3 2020-03-05
3 4 2020-03-05
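If the desired 'date' column should hold the literal YYYYMMDD string from the expected output rather than a datetime, a possible follow-up is to format the parsed values back (a sketch, assuming the conversion above succeeded; rows coerced to NaT become NaN):
df['date'] = df['date_original'].dt.strftime('%Y%m%d')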
I have a pandas dataframe
date
0 2010-03
1 2017-09-14
2 2020-10-26
3 2004-12
4 2012-04-01
5 2017-02-01
6 2013-01
I basically want to filter where dates are after 2015-12 (Dec 2015)
To get this:
date
0 2017-09-14
1 2020-10-26
2 2017-02-01
I tried this
df = df[(df['date']> "2015-12")]
but I'm getting an error
ValueError: Wrong number of items passed 17, placement implies 1
First, your solution works correctly for me:
df = df[(df['date']> "2015-12")]
print (df)
date
1 2017-09-14
2 2020-10-26
5 2017-02-01
Converting to datetimes, which should be more robust, works for me too:
df = df[(pd.to_datetime(df['date'])> "2015-12")]
print (df)
date
1 2017-09-14
2 2020-10-26
5 2017-02-01
Detail:
print (pd.to_datetime(df['date']))
0 2010-03-01
1 2017-09-14
2 2020-10-26
3 2004-12-01
4 2012-04-01
5 2017-02-01
6 2013-01-01
Name: date, dtype: datetime64[ns]
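For reference, a self-contained sketch that builds the sample column from the question and applies the datetime-based filter (values taken from the question; newer pandas versions may need format='mixed' in to_datetime for strings of varying precision):
import pandas as pd

df = pd.DataFrame({'date': ['2010-03', '2017-09-14', '2020-10-26', '2004-12',
                            '2012-04-01', '2017-02-01', '2013-01']})
# Parse the mixed-precision strings to datetimes, then compare against the cutoff
out = df[pd.to_datetime(df['date']) > '2015-12']
print (out)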
My dataframe, dataframe_time, looks like this:
INSERTED_UTC
0 2018-05-29
1 2018-05-22
2 2018-02-10
3 2018-04-30
4 2018-03-02
5 2018-11-26
6 2018-03-07
7 2018-05-12
8 2019-02-03
9 2018-08-03
10 2018-04-27
print(type(dataframe_time['INSERTED_UTC'].iloc[1]))
<class 'datetime.date'>
I am trying to group the dates together and find the count of their occurrences per quarter. Desired output -
Quarter Count
2018-03-31 3
2018-06-30 5
2018-09-30 1
2018-12-31 1
2019-03-31 1
2019-06-30 0
I am running the following command to group them together
dataframe_time['INSERTED_UTC'].groupby(pd.Grouper(freq='Q'))
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
First, convert the dates to datetimes, and then use DataFrame.resample with the on parameter to specify the datetime column:
dataframe_time.INSERTED_UTC = pd.to_datetime(dataframe_time.INSERTED_UTC)
df = dataframe_time.resample('Q', on='INSERTED_UTC').size().reset_index(name='Count')
Or your solution can be changed to:
df = (dataframe_time.groupby(pd.Grouper(freq='Q', key='INSERTED_UTC'))
.size()
.reset_index(name='Count'))
print (df)
INSERTED_UTC Count
0 2018-03-31 3
1 2018-06-30 5
2 2018-09-30 1
3 2018-12-31 1
4 2019-03-31 1
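The original groupby also works once the datetime column is moved into the index, which is what the Int64Index error is complaining about (a sketch, assuming INSERTED_UTC has already been converted to datetimes as above):
df = (dataframe_time.set_index('INSERTED_UTC')
                    .groupby(pd.Grouper(freq='Q'))
                    .size()
                    .reset_index(name='Count'))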
You can convert the dates to quarters with to_period('Q') and group by those:
df.INSERTED_UTC = pd.to_datetime(df.INSERTED_UTC)
df.groupby(df.INSERTED_UTC.dt.to_period('Q')).size()
You can also use value_counts:
df.INSERTED_UTC.dt.to_period('Q').value_counts()
Output:
INSERTED_UTC
2018Q1 3
2018Q2 5
2018Q3 1
2018Q4 1
2019Q1 1
Freq: Q-DEC, dtype: int64
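If quarter-end dates like those in the desired output are preferred over period labels such as 2018Q1, the PeriodIndex can be converted back to timestamps (a sketch based on the value_counts result above):
counts = df.INSERTED_UTC.dt.to_period('Q').value_counts().sort_index()
# to_timestamp(how='end') gives the last instant of each quarter; normalize() drops the time part
counts.index = counts.index.to_timestamp(how='end').normalize()
print (counts)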
I'm working with a dataframe that has one messy date column with irregular formats, i.e.:
date
0 19.01.01
1 19.02.01
2 1991/01/01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
Is it possible to convert it to the standard format XXXX-XX-XX, which represents year-month-day? Thank you.
date
0 2019-01-01
1 2019-02-01
2 1991-01-01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
Use pd.to_datetime with yearfirst=True
Ex:
df = pd.DataFrame({"date": ['19.01.01', '19.02.01', '1991/01/01', '1996-01-01', '1996-06-30', '1995-12-31', '1997-01-01']})
df['date'] = pd.to_datetime(df['date'], yearfirst=True).dt.strftime("%Y-%m-%d")
print(df)
Output:
date
0 2019-01-01
1 2019-02-01
2 1991-01-01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
It depends on the formats; the most general solution is to specify each format separately and use Series.combine_first:
date1 = pd.to_datetime(df['date'], format='%y.%m.%d', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%Y/%m/%d', errors='coerce')
date3 = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
df['date'] = date1.combine_first(date2).combine_first(date3)
print (df)
date
0 2019-01-01
1 2019-02-01
2 1991-01-01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
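With more formats to cover, the same idea can be written as a loop (a sketch; the format list is an assumption based on the sample values above):
formats = ['%y.%m.%d', '%Y/%m/%d', '%Y-%m-%d']
parsed = pd.to_datetime(df['date'], format=formats[0], errors='coerce')
for fmt in formats[1:]:
    # combine_first keeps already-parsed values and fills NaT gaps from the next attempt
    parsed = parsed.combine_first(pd.to_datetime(df['date'], format=fmt, errors='coerce'))
df['date'] = parsed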
Try the following to normalize the separators (the dot needs escaping so it matches only a literal dot):
df['date'] = df['date'].replace(r'/|\.', '-', regex=True)
Then use pd.to_datetime():
pd.to_datetime(df['date'])
Output:
0 2001-01-19
1 2001-02-19
2 1991-01-01
3 1996-01-01
4 1996-06-30
5 1995-12-31
6 1997-01-01
Name: 0, dtype: datetime64[ns]
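Note that with default parsing the two-digit years come out as 2001 rather than the 2019 shown in the desired output; the separator cleanup can be combined with yearfirst=True from the answer above (a sketch combining the two answers; behaviour with mixed formats can vary between pandas versions):
pd.to_datetime(df['date'].replace(r'/|\.', '-', regex=True), yearfirst=True)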
I have two data frames; each has an id column and a date column.
I want to find rows in both data frames that have the same id with a date difference of more than 2 days.
Normally it's helpful to include a dataframe so that the responder doesn't need to create it. :)
import pandas as pd
from datetime import timedelta
Create two dataframes:
df1 = pd.DataFrame(data={"id":[0,1,2,3,4], "date":["2019-01-01","2019-01-03","2019-01-05","2019-01-07","2019-01-09"]})
df1["date"] = pd.to_datetime(df1["date"])
df2 = pd.DataFrame(data={"id":[0,1,2,8,4], "date":["2019-01-02","2019-01-06","2019-01-09","2019-01-07","2019-01-10"]})
df2["date"] = pd.to_datetime(df2["date"])
They will look like this:
DF1
id date
0 0 2019-01-01
1 1 2019-01-03
2 2 2019-01-05
3 3 2019-01-07
4 4 2019-01-09
DF2
id date
0 0 2019-01-02
1 1 2019-01-06
2 2 2019-01-09
3 8 2019-01-07
4 4 2019-01-10
Merge the two dataframes on the 'id' column:
df_result = df1.merge(df2, on="id")
Resulting in:
id date_x date_y
0 0 2019-01-01 2019-01-02
1 1 2019-01-03 2019-01-06
2 2 2019-01-05 2019-01-09
3 4 2019-01-09 2019-01-10
Then subtract the two date columns and filter for differences greater than two days.
df_result[(df_result["date_y"] - df_result["date_x"]) > timedelta(days=2)]
id date_x date_y
1 1 2019-01-03 2019-01-06
2 2 2019-01-05 2019-01-09
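If the difference can go in either direction, a possible variant is to take the absolute difference (an assumption; the answer above only checks date_y minus date_x):
df_result[(df_result["date_y"] - df_result["date_x"]).abs() > timedelta(days=2)]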
I am new to pandas. I have a dataframe, df, with 3 columns: (date), (name) and (count).
For each day, is there an easy way to create a new dataframe from the original one that contains a column for each unique name in the original name column, holding the respective count values in the correct columns?
date name count
0 2017-08-07 ABC 12
1 2017-08-08 ABC 5
2 2017-08-08 TTT 6
3 2017-08-09 TAC 5
4 2017-08-09 ABC 10
It should now be
date ABC TTT TAC
0 2017-08-07 12 0 0
1 2017-08-08 5 6 0
3 2017-08-09 10 0 5
df = pd.DataFrame({"date":["2017-08-07","2017-08-08","2017-08-08","2017-08-09","2017-08-09"],"name":["ABC","ABC","TTT","TAC","ABC"], "count":["12","5","6","5","10"]})
df = df.pivot(index='date', columns='name', values='count').reset_index().fillna(0)
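If a (date, name) pair can appear more than once, pivot raises an error about duplicate entries; pivot_table with an aggregation is a more forgiving variant (a sketch, assuming duplicate counts should be summed per day):
import pandas as pd

df = pd.DataFrame({"date": ["2017-08-07", "2017-08-08", "2017-08-08", "2017-08-09", "2017-08-09"],
                   "name": ["ABC", "ABC", "TTT", "TAC", "ABC"],
                   "count": [12, 5, 6, 5, 10]})
# pivot_table aggregates duplicates, and fill_value=0 handles missing date/name combinations
df = df.pivot_table(index='date', columns='name', values='count',
                    aggfunc='sum', fill_value=0).reset_index()
print (df)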