Convert object column dates in same format - Python - python-3.x

I have an object column that contains dates, which I extracted from a text column, so they come in several different formats (shown below). All of them follow a month/day/year order, such as mm/dd/yyyy or mm/dd/yy.
How can I convert this column to mm/dd/yyyy format? Most of the values are already in mm/dd/yyyy format, but a number of values are in the other formats shown.
date_df = pd.DataFrame(data=['01/14/2019',
'1/14/2019',
'1/3/2019',
'1/03/2018',
'01/09/19',
'1/09/17',
'1/9/19',
'1/09/13'])
date_df:
01/14/2019
1/14/2019
1/3/2019
1/03/2018
01/09/19
1/09/17
1/9/19
1/09/13
Expected result :
01/14/2019
01/14/2019
01/03/2019
01/03/2018
01/09/2019
01/09/2017
01/09/2019
01/09/2013

Use to_datetime to parse the strings, then Series.dt.strftime to write them back as strings (object dtype) in a custom format. If you only need datetimes, omit dt.strftime:
df['col'] = pd.to_datetime(df['col']).dt.strftime('%m/%d/%Y')
print (df)
col
0 01/14/2019
1 01/14/2019
2 01/03/2019
3 01/03/2018
4 01/09/2019
5 01/09/2017
6 01/09/2019
7 01/09/2013
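Depending on the pandas version, a single to_datetime call may reject a column that mixes two- and four-digit years. A minimal sketch of a version-independent alternative is to parse the column once per expected format and combine the results. The variable names are illustrative, and the two format strings are assumptions based on the sample data in the question.

```python
import pandas as pd

dates = pd.Series(['01/14/2019', '1/14/2019', '1/3/2019', '1/03/2018',
                   '01/09/19', '1/09/17', '1/9/19', '1/09/13'])

# Parse with each expected format; values that do not match become NaT.
four_digit = pd.to_datetime(dates, format='%m/%d/%Y', errors='coerce')
two_digit = pd.to_datetime(dates, format='%m/%d/%y', errors='coerce')

# Prefer the four-digit-year parse and fall back to the two-digit one.
parsed = four_digit.fillna(two_digit)

# Write the dates back as zero-padded mm/dd/yyyy strings.
normalized = parsed.dt.strftime('%m/%d/%Y')
print(normalized.tolist())
```

Any value that matches neither format stays NaT, so a quick parsed.isna().sum() shows how many rows still need attention.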

Related

Pyspark : Convert Julian Date to Calendar date

I have a PySpark DataFrame column with Julian dates. I tried to convert these to calendar dates.
number  julian_date
1       17196
2       17199
3       17281
I tried with the below code:
spdf = spdf.withColumn('date_new',functions.to_date(functions.from_unixtime("julian_date")))
However, I am getting output as:
number  julian_date  date_new
1       17196        1970-01-01
2       17199        1970-01-01
3       17281        1970-01-01
Please help. Thanks in advance
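This Julian date format consists of a 2-digit year followed by a 3-digit day of year.
For example, 17196 is the 196th day of 2017, which is 2017-07-15.
from_unixtime instead treats the value as seconds since 1970-01-01, which is why every row comes out as 1970-01-01.
You can use to_date with a year (y) and day-of-year (D) format instead. (ref: date pattern)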
df.withColumn('date_new', functions.to_date(df.julian_date, 'yyDDD'))
# If julian_date is not a string column, cast it first:
# df.julian_date.cast(StringType())
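As a quick cross-check of the yyDDD interpretation outside Spark, the same conversion can be sketched with Python's standard library, where %y is the two-digit year and %j is the day of year. The helper name julian_to_date is hypothetical, and padding to five characters is an assumption about how integer values should be read.

```python
from datetime import datetime


def julian_to_date(value):
    """Convert a yyDDD value (2-digit year + day of year) to a date."""
    # zfill restores leading zeros lost when the value is stored as an
    # integer, e.g. 5001 -> '05001' (year 2005, day 1).
    text = str(value).zfill(5)
    return datetime.strptime(text, '%y%j').date()


for value in (17196, 17199, 17281):
    print(value, julian_to_date(value))
```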

How to convert data frame for time series analysis in Python?

I have a dataset of around 13,000 rows and 2 columns (text and date) covering a two-year period. The date column is in yyyy-mm-dd format. I want to perform time series analysis where the x axis is the date (each day) and the y axis is the number of texts on that date.
I think creating a new data frame with the unique dates and the number of texts on each date would solve my problem.
Sample data
How can I create a new column with frequency of text each day? For example:
Thanks in Advance!
Depending on the task you are trying to solve, I can see two options for this dataset.
Either, as in your example, count the number of rows for each day, regardless of the value of the text field.
Or count the number of occurrences of each unique value of the text field for each day. You will then have one column per possible value of the text field, which may make more sense if the values are purely categorical.
First things to do:
import pandas as pd
df = pd.DataFrame(data={'Date':['2018-01-01','2018-01-01','2018-01-01', '2018-01-02', '2018-01-03'], 'Text':['A','B','C','A','A']})
df['Date'] = pd.to_datetime(df['Date']) #convert to datetime type if not already done
Date Text
0 2018-01-01 A
1 2018-01-01 B
2 2018-01-01 C
3 2018-01-02 A
4 2018-01-03 A
Then for option one :
df = df.groupby('Date').count()
Text
Date
2018-01-01 3
2018-01-02 1
2018-01-03 1
For option two :
df = df.join(pd.get_dummies(df['Text']))  # one indicator column per Text value, labeled by that value
df = df.drop('Text', axis=1)
df = df.groupby('Date').sum()
A B C
Date
2018-01-01 1 1 1
2018-01-02 1 0 0
2018-01-03 1 0 0
The get_dummies function will create one column per possible value of the Text field. Each column is then a boolean indicator for each row of the dataframe, telling us which value of the Text field occurred in this row. We can then simply make a sum aggregation with a groupby by the Date field.
If you are not familiar with groupby and aggregation operations, I recommend reading this guide first.
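For a time series it can also help to have a row for every calendar day, including days with no text at all. The sketch below shows one way to get both results: groupby with size and asfreq for the daily total, and pd.crosstab for the per-value counts. The sample data deliberately skips 2018-01-03, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-02', '2018-01-04'],
    'Text': ['A', 'B', 'C', 'A', 'A'],
})
df['Date'] = pd.to_datetime(df['Date'])

# Option one: rows per day, with missing days filled in as 0.
daily_counts = df.groupby('Date').size().asfreq('D', fill_value=0)

# Option two: rows per day and per Text value (only days present in the data).
per_value = pd.crosstab(df['Date'], df['Text'])

print(daily_counts)
print(per_value)
```

Here daily_counts has one entry per day from the first to the last date, so it can be plotted directly with the date on the x axis.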

get rows by date regardless of format of date in pandas

I have data as follows:
Col1,ColDate
a,2020-09-11 08:43:00
b,2020-09-12 09:43:00
c,13-09-2020 09:43:00
d,09/16/2020 10:43:00
e,09/19/2020 12:43:00
f,09/12/2020 15:43:00
The intention is to get all rows between 11 September and 13 September, regardless of the date format, in pandas.
I am trying the following:
df[df["ColDate"].between('11-09-2020','13-09-2020')]
I get an empty dataframe.
You can try this,
df[pd.to_datetime(df['ColDate']).dt.strftime('%d-%m-%Y').between('11-09-2020','13-09-2020')]
Col1 ColDate
0 a 2020-09-11 08:43:00
1 b 2020-09-12 09:43:00
2 c 13-09-2020 09:43:00
5 f 09/12/2020 15:43:00
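However, because the date formats are mixed, it is hard to say which part will be treated as the month and which as the day. Note also that this compares dd-mm-yyyy strings character by character, so it is only reliable when every row falls in the same month and year as the bounds.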
Please check the snippet below. First convert ColDate with pd.to_datetime, then apply a mask to it. The lower bound is inclusive and the upper bound is the start of the 14th, so that every time on the 13th is included.
df['ColDate'] = pd.to_datetime(df['ColDate'])
mask = (df['ColDate'] >= '2020-09-11') & (df['ColDate'] < '2020-09-14')
df = df.loc[mask]
Output
Col1 ColDate
0 a 2020-09-11 08:43:00
1 b 2020-09-12 09:43:00
2 c 2020-09-13 09:43:00
5 f 2020-09-12 15:43:00
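When the set of formats is known, a more explicit variant is to parse the column once per format and combine the results, which removes the guesswork about month and day order. This is a sketch only: the three format strings are assumptions read off the sample data, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': list('abcdef'),
    'ColDate': ['2020-09-11 08:43:00', '2020-09-12 09:43:00',
                '13-09-2020 09:43:00', '09/16/2020 10:43:00',
                '09/19/2020 12:43:00', '09/12/2020 15:43:00'],
})

# Assumed formats: ISO, day-first with dashes, month-first with slashes.
formats = ['%Y-%m-%d %H:%M:%S', '%d-%m-%Y %H:%M:%S', '%m/%d/%Y %H:%M:%S']

parsed = pd.to_datetime(df['ColDate'], format=formats[0], errors='coerce')
for fmt in formats[1:]:
    # Fill only the rows that earlier formats could not parse.
    parsed = parsed.fillna(pd.to_datetime(df['ColDate'], format=fmt, errors='coerce'))

# Compare on the calendar date so that times on the 13th are included.
days = parsed.dt.normalize()
mask = days.between(pd.Timestamp('2020-09-11'), pd.Timestamp('2020-09-13'))
result = df.loc[mask]
print(result)
```

Rows that match none of the listed formats stay NaT and are excluded by the mask, so they can be inspected separately with parsed.isna().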

Excel remove timestamp from date and subtract days

I have a column, say Date1, that holds datetime stamps. I want to subtract 10 days from the Date1 column and store the result in another column, say Date2. I only want to subtract ten days from the date, not from the datetime.
How can I remove the timestamp? I have read many solutions online but could not find one for Excel.
Input table
Date1
26-03-2000 21:00:00
25-04-2000 00:00:00
21-03-2000 01:00:00
31-03-2000 13:00:00
05-03-2012 12:00:00
Expected output
Date1 Date2 Date1_no_timestamp
26-03-2000 21:00:00 16-03-2000 26-03-2000
25-04-2000 00:00:00 15-04-2000 25-04-2000
21-03-2000 01:00:00 11-03-2000 21-03-2000
31-03-2000 13:00:00 21-03-2000 31-03-2000
05-03-2012 12:00:00 24-02-2012 05-03-2012 and so on
You could use the TEXT() function.
=TEXT(B2, "DD-MM-YYYY")
Alternatively, since the TEXT() format codes can behave differently depending on regional settings, you could keep only the text before the first space (this assumes the cell holds text rather than a true date value):
=LEFT(B2, FIND(" ",B2,1)-1)
Place either of the above in C2 (assuming those headers exist) and drag down.
You could use:
Method 1:
Date1_no_timestamp:
=TEXT(A2,"dd-mm-yyyy")
Date2:
=TEXT(A2-10,"dd-mm-yyyy")
Method 2
Date1_no_timestamp:
=RIGHT("0"&DAY(A2),2)&"-"&RIGHT("0"&MONTH(A2),2) & "-" & YEAR(A2)
Date2:
=TEXT(DATEVALUE(E2)-10,"dd-mm-yyyy")
Results:
You can also use the INT() and TRUNC() functions:
=INT(A2)
=TRUNC(A2)
Their behavior is identical for positive numbers: the decimal part, which holds the time of day, is sliced off.
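To check the expected output independently of Excel, the same two steps (drop the time, then subtract ten days) can be sketched in Python with the standard library. This is only a cross-check of the arithmetic, not an Excel feature, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

date1_values = ['26-03-2000 21:00:00', '25-04-2000 00:00:00',
                '21-03-2000 01:00:00', '31-03-2000 13:00:00',
                '05-03-2012 12:00:00']

rows = []
for text in date1_values:
    stamp = datetime.strptime(text, '%d-%m-%Y %H:%M:%S')
    day = stamp.date()                   # like INT(A2): drop the time part
    earlier = day - timedelta(days=10)   # like A2-10, applied to the date only
    rows.append((earlier.strftime('%d-%m-%Y'), day.strftime('%d-%m-%Y')))

# Each tuple is (Date2, Date1_no_timestamp).
for date2, date1_no_timestamp in rows:
    print(date2, date1_no_timestamp)
```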

Is there any method in pandas to convert a DataFrame from day to default d/m/y format?

I would like to convert every day value in the DataFrame into day/Feb/2020 format.
Here the date field contains only the day of the month.
I want to convert the date field from the first form to the second.
My current approach is:
import datetime
y = []
for day in planned_ds.Date:
    x = datetime.datetime(2020, 5, day)
    print(x)
Is there an easy way to convert all the day values in the DataFrame to d/m/y format?
One way, assuming you have data like
df = pd.DataFrame([1,2,3,4,5], columns=["date"])
is to treat the values as day offsets and add them to the day before the month starts (pd.datetime is no longer available in recent pandas versions, so pd.Timestamp is used here):
pd.Timestamp(2020, 1, 31) + pd.to_timedelta(df["date"], unit="D")
this results in
0 2020-02-01
1 2020-02-02
2 2020-02-03
3 2020-02-04
4 2020-02-05
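Another option is to assemble the dates from year, month and day parts and then format them as strings. The sketch below assumes the year 2020 and the month February, taken from the "day/feb/2020" wording in the question. The names parts, full_dates and formatted are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': [1, 2, 3, 4, 5]})

# Build full dates from fixed year/month values and the day column.
parts = pd.DataFrame({'year': 2020, 'month': 2, 'day': df['date']})
full_dates = pd.to_datetime(parts)

# Render as day/month/year strings.
df['formatted'] = full_dates.dt.strftime('%d/%m/%Y')
print(df)
```

Unlike the offset approach, this raises an error for a day that does not exist in the chosen month (for example 31 in February) instead of silently rolling over into the next month.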
