JDE Julian date to calendar date in PySpark - apache-spark

I have a DataFrame df with a column named ILDGL that stores dates in JDE Julian date format. I tried to convert the Julian date into a calendar date and store it in the column ILDGL_Normal, but without success.
df = df.withColumn("ILDGL_Normal", to_date(concat(lit("20"), col("ILDGL")), "yyMMdd"))
The Julian date 123002 means 2023-01-02, and 000001 means 1900-01-01. How can I convert a JDE Julian date into the normal date format YYYY-MM-DD?

The JDE Julian date format is CYYDDD:
C - century (0 = 19xx, 1 = 20xx), YY - year, DDD - day of year
We can ignore the century at first and convert the YYDDD part to yyyy-MM-dd with the to_date function, then add years according to the century digit.
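from pyspark.sql.functions import add_months, substring, to_date, when
# The 'yyDDD' pattern is parsed here with the legacy time parser
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
df = spark.createDataFrame([('123002',), ('201002',), ('301002',)], ['jde_julian_date'])
df.withColumn("std_date",
              add_months(to_date(substring("jde_julian_date", 2, 5), 'yyDDD'),
                         # For century digits above 1, shift by (C - 1) * 100 years, expressed in months.
                         # Century 0 (19YY) is not handled here: otherwise(0) adds nothing,
                         # so the result depends on how 'yy' resolves the two-digit year.
                         when(substring("jde_julian_date", 1, 1) > 1,
                              (substring("jde_julian_date", 1, 1) - 1) * 100 * 12)
                         .otherwise(0))).show()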
+---------------+----------+
|jde_julian_date|  std_date|
+---------------+----------+
|         123002|2023-01-02|
|         201002|2101-01-02|
|         301002|2201-01-02|
+---------------+----------+
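The CYYDDD rule can also be written as plain arithmetic, which avoids two-digit-year parsing and covers century 0 as well. This is a minimal sketch in plain Python, assuming the value arrives as a string or integer of at most six digits; the helper name jde_julian_to_date is made up for illustration and is not part of PySpark or any JDE library.

```python
from datetime import date, timedelta


def jde_julian_to_date(value):
    """Convert a JDE Julian date (CYYDDD) to a calendar date.

    Hypothetical helper for illustration only.
    """
    text = str(value).zfill(6)  # integer 1 becomes '000001'
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"not a CYYDDD value: {value!r}")
    century, yy, ddd = int(text[0]), int(text[1:3]), int(text[3:])
    year = 1900 + 100 * century + yy  # C = 0 -> 19YY, C = 1 -> 20YY
    if not 1 <= ddd <= 366:
        raise ValueError(f"day of year out of range: {ddd}")
    result = date(year, 1, 1) + timedelta(days=ddd - 1)
    if result.year != year:  # day 366 in a non-leap year
        raise ValueError(f"{year} has no day {ddd}")
    return result


print(jde_julian_to_date("123002"))  # 2023-01-02
print(jde_julian_to_date("000001"))  # 1900-01-01
```

Both sample values from the question come out as expected, including 000001, which the add_months approach above does not shift into the 1900s.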

Related

Pyspark : Convert Julian Date to Calendar date

I have a PySpark DataFrame column with Julian dates. I tried to convert them to calendar dates.
number  julian_date
1       17196
2       17199
3       17281
I tried with the below code:
spdf = spdf.withColumn('date_new',functions.to_date(functions.from_unixtime("julian_date")))
However, I am getting output as:
number  julian_date  date_new
1       17196        1970-01-01
2       17199        1970-01-01
3       17281        1970-01-01
Please help. Thanks in advance
This Julian date consists of a 2-digit year followed by a 3-digit day of year.
For example, 17196 is the 196th day of 2017, which is 2017-07-15.
Thus, you can use to_date with a year (y) and day-of-year (D) pattern. (ref: date pattern)
df.withColumn('date_new', functions.to_date(df.julian_date, 'yyDDD'))
# If julian_date is not a string column, cast it first:
# df.julian_date.cast(StringType())
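A small plain-Python sketch (no Spark needed) may help show why the first attempt returned 1970-01-01 and what the year plus day-of-year parsing does. It assumes the values are YYDDD as described above; yyddd_to_date is a hypothetical helper name, and Python's %y pivot (00-68 map to 2000-2068) is Python's own rule and may differ from Spark's.

```python
from datetime import datetime, timezone


def yyddd_to_date(value):
    """Parse a YYDDD value (2-digit year + day of year). Illustrative helper."""
    text = str(value).zfill(5)  # keep leading zeros if the value was numeric
    return datetime.strptime(text, "%y%j").date()


# Read as seconds since the Unix epoch, 17196 is still the first day of 1970.
as_epoch = datetime.fromtimestamp(17196, tz=timezone.utc).date()
print(as_epoch)  # 1970-01-01

for value in (17196, 17199, 17281):
    print(value, yyddd_to_date(value))
# 17196 2017-07-15
# 17199 2017-07-18
# 17281 2017-10-08
```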

How to extract or validate date format from a text using python?

I'm trying to execute this code:
import datefinder
string_with_dates = 'The stock has a 04/30/2009 great record of positive Sept 1st, 2005 earnings surprises, having beaten the trade Consensus EPS estimate in each of the last four quarters. In its last earnings report on May 8, 2018, Triple-S Management reported EPS of $0.6 vs.the trade Consensus of $0.24 while it beat the consensus revenue estimate by 4.93%.'
matches = datefinder.find_dates(string_with_dates)
for match in matches:
    print(match)
The output is:
2009-04-30 00:00:00
2005-09-01 00:00:00
2018-05-08 00:00:00
2019-02-04 00:00:00
The last date comes from the percentage value 4.93%. How can I avoid this?
I cannot fix the datefinder module issue. You stated that you needed a solution, so I put this together for you. It's a work in progress, which means that you can adjust it as needed. Also, some of the regexes could have been consolidated, but I wanted to break them out for you. Hopefully, this answer helps you until you find another solution that works better for your needs.
import re
string_with_dates = 'The stock has a 04/30/2009 great record of positive Sept 1st, 2005 earnings surprises having beaten the trade Consensus EPS estimate in each of the last ' \
'four quarters In its last earnings report on March 8, 2018, Triple-S Management reported EPS of $0.6 vs.the trade Consensus of $0.24 while it beat the ' \
'consensus revenue estimate by 4.93%. The next trading day will occur at 2019-02-15T12:00:00-06:30'
def find_dates(text):
    '''
    This function extracts date strings from the provided text.
    Symbol references:
    YYYY = four-digit year
    MM = two-digit month (01=January, etc.)
    DD = two-digit day of month (01 through 31)
    hh = two digits of hour (00 through 23) (am/pm NOT allowed)
    mm = two digits of minute (00 through 59)
    ss = two digits of second (00 through 59)
    s = one or more digits representing a decimal fraction of a second
    TZD = time zone designator (Z or +hh:mm or -hh:mm)
    :param text: text to search
    :return: None; the first match of each pattern is printed
    '''
    date_formats = [
        # Matches date format MM/DD/YYYY
        r'(\d{2}\/\d{2}\/\d{4})',
        # Matches date format MM-DD-YYYY
        r'(\d{2}-\d{2}-\d{4})',
        # Matches date format YYYY/MM/DD
        r'(\d{4}\/\d{1,2}\/\d{1,2})',
        # Matches ISO 8601 format (YYYY-MM-DD)
        r'(\d{4}-\d{1,2}-\d{1,2})',
        # Matches ISO 8601 format YYYYMMDD
        r'(\d{4}\d{2}\d{2})',
        # Matches full_month_name dd, YYYY or full_month_name dd[suffixes], YYYY
        r'(January|February|March|April|May|June|July|August|September|October|November|December)(\s\d{1,2}\W\s\d{4}|\s\d(st|nd|rd|th)\W\s\d{4})',
        # Matches abbreviated_month_name dd, YYYY or abbreviated_month_name dd[suffixes], YYYY
        r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept|Oct|Nov|Dec)(\s\d{1,2}\W\s\d{4}|\s\d(st|nd|rd|th)\W\s\d{4})',
        # Matches ISO 8601 format with time and time zone
        # yyyy-mm-ddThh:mm:ss.nnnnnn+|-hh:mm
        r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\+|-)\d{2}:\d{2}',
        # Matches ISO 8601 format Datetime with timezone
        # yyyymmddThhmmssZ
        r'\d{8}T\d{6}Z',
        # Matches ISO 8601 format Datetime with timezone
        # yyyymmddThhmmss+|-hhmm
        r'\d{8}T\d{6}(\+|-)\d{4}'
    ]
    for item in date_formats:
        # Raw strings keep the regex escapes intact; \b anchors each pattern on word boundaries
        date_format = re.compile(r'\b{}\b'.format(item), re.IGNORECASE|re.MULTILINE)
        # re.search returns only the first match for each pattern
        find_date = re.search(date_format, text)
        if find_date:
            print(find_date.group(0))

find_dates(string_with_dates)
# outputs
04/30/2009
March 8, 2018
Sept 1st, 2005
2019-02-15T12:00:00-06:30
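One way to screen out false positives such as 4.93% is to accept a candidate only when it both matches a strict pattern and parses as a real calendar date. The sketch below works under that assumption and covers just two numeric formats; extract_valid_dates, PATTERN and FORMATS are illustrative names, not part of datefinder.

```python
import re
from datetime import datetime

# Candidate shapes: MM/DD/YYYY or YYYY-MM-DD (illustrative subset only)
PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b|\b\d{4}-\d{2}-\d{2}\b")
FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def extract_valid_dates(text):
    """Return every candidate in text that parses as a real date."""
    found = []
    for match in PATTERN.finditer(text):  # finditer yields all matches, not just the first
        for fmt in FORMATS:
            try:
                found.append(datetime.strptime(match.group(0), fmt).date())
                break
            except ValueError:
                continue  # wrong format or impossible date such as month 13
    return found


sample = "paid 04/30/2009, due 2019-02-15, bogus 13/45/2009, up 4.93%"
print(extract_valid_dates(sample))
# [datetime.date(2009, 4, 30), datetime.date(2019, 2, 15)]
```

Month-name formats like those in the answer above would need their own patterns and format strings.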

How to create a date in excel

I have an Excel file with 2 columns, year and month. The year is given in YYYY format and the month as the full month name, like January or February.
Sample data:
column1, Column2
2014, January
2014, February
2018, March
So far I have tried the formula below, which gives an error.
=date(A2, B2, 01)
It does not return any date values.
Requesting your assistance.
A solution in SAS code would also work for me.
Try:
=DATEVALUE(CONCATENATE(B3," 1, ",A3))
This will concatenate the month, followed by a 1 and the year (e.g. January 1, 2014), and then convert the result to a date value (dates are stored as numbers in Excel). If you format the cell as a date (MMMM YYYY) you will have the desired result.
Try the following:
=DATEVALUE("1-" & LEFT(B1,3) & "-" & A1)
If you were importing the data into SAS, here is a solution to create a date from month name and year variables.
data have;
    length month $ 15;
    infile datalines delimiter=',';
    input year month $;
    datalines;
2014, January
2014, February
2018, March
;
Run;
Data want(keep=date);
    Set Have;
    /*Length mon $3 yr $4 dt $15;*/
    /*Mon=substr(month,1,3);*/
    /*Yr=put(year,4.);*/
    /*Dt=cats('01',mon,yr);*/
    /*Date = input(dt,date9.);*/
    /* All in one line */
    Date = input(cats('01',substr(month,1,3),put(year,4.)),date9.);
    Format DATE mmddyy10.;
Run;
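For comparison, the same idea (build the first day of the month from a year and a month name) can be sketched in Python. This assumes English month names and is only an illustration of the logic, not something Excel or SAS runs; first_of_month and MONTHS are made-up names.

```python
from datetime import date

# English month names, spelled out so the lookup does not depend on the locale
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def first_of_month(year, month_name):
    """Return the first day of the given month; raises ValueError for unknown names."""
    month = MONTHS.index(month_name.strip().capitalize()) + 1
    return date(int(year), month, 1)


for year, month_name in [(2014, "January"), (2014, "February"), (2018, "March")]:
    print(first_of_month(year, month_name))
# 2014-01-01
# 2014-02-01
# 2018-03-01
```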

Invalid Dates Python

I am new to Python. I was wondering how to write code that treats any date beyond a certain date as invalid input. For example, if the user inputs anything after 12/02/2013, it should produce an error. Everything before that date should work normally.
As glibdud suggested, use datetime objects.
date = datetime.date(YYYY, MM, DD)
where (YYYY, MM, DD) are integers representing years, months, and days. The condition can then be checked in your script with
inputDate > maxDate
for example:
import datetime
maxDate = datetime.date(2013, 12, 2)
y = int(input('Enter year:'))
m = int(input('Enter numerical month (1-12):'))
d = int(input('Enter numerical day (1-31):'))
inputDate = datetime.date(y, m, d)
if inputDate > maxDate:
    print('Error - date after 02 December 2013')
else:
    print('Success!')
Gives:
Enter year:2018
Enter numerical month (1-12):1
Enter numerical day (1-31):1
Error - date after 02 December 2013
and
Enter year:2000
Enter numerical month (1-12):1
Enter numerical day (1-31):1
Success!
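If the date is typed as a single string, the same comparison can be combined with a format check. This is a minimal sketch, assuming 12/02/2013 is read as MM/DD/YYYY (2 December 2013, as in the answer above); parse_on_or_before and MAX_DATE are hypothetical names.

```python
from datetime import date, datetime

MAX_DATE = date(2013, 12, 2)


def parse_on_or_before(text, max_date=MAX_DATE, fmt="%m/%d/%Y"):
    """Parse text as a date and reject anything later than max_date."""
    # strptime raises ValueError for malformed input and impossible dates (e.g. 02/30/2013)
    parsed = datetime.strptime(text, fmt).date()
    if parsed > max_date:
        raise ValueError(f"{text} is after {max_date:%m/%d/%Y}")
    return parsed


for candidate in ("01/01/2000", "01/01/2018", "02/30/2013"):
    try:
        print("Success!", parse_on_or_before(candidate))
    except ValueError as error:
        print("Error -", error)
```

Raising ValueError for both cases lets the caller handle bad formats and out-of-range dates in one place.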

Convert quarter-year to MM/DD/YYYY in Excel

How can I convert this quarter-year format into MM/DD/YYYY in Excel, where each quarter converts to the first day of that quarter? For example,
Q1-2014 to 1/1/2014,
Q2-2015 to 4/1/2015,
Q3-2016 to 7/1/2016,
Q4-2017 to 10/1/2017
Try
=DATE(RIGHT(A1,4),(MID(A1,2,1)*3)-2,1)
That will return a date. Format to display in whatever date format you want.
Here is a formula you could use
=DATE(MID(A1,4,4),IF(MID(A1,1,2)= "Q1", 1, IF(MID(A1, 1, 2) = "Q2", 4, IF(MID(A1,1,2)="Q3", 7, 10))), 1)
Assumptions
The format is Q[1/2/3/4]-YYYY
Formula
Take the last 4 characters as the year
Take the month as 1 if Q1, 4 if Q2, 7 if Q3, otherwise 10
Take the day as 1
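The month arithmetic behind both formulas (first month of quarter q is 3q - 2) can be checked with a short Python sketch. It assumes the Q[1-4]-YYYY format stated above; quarter_start is an illustrative name, not an Excel function.

```python
import re
from datetime import date


def quarter_start(label):
    """Return the first day of the quarter for a label like 'Q3-2016'."""
    match = re.fullmatch(r"Q([1-4])-(\d{4})", label.strip())
    if match is None:
        raise ValueError(f"expected Q[1-4]-YYYY, got {label!r}")
    quarter, year = int(match.group(1)), int(match.group(2))
    return date(year, 3 * quarter - 2, 1)  # Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10


for label in ("Q1-2014", "Q2-2015", "Q3-2016", "Q4-2017"):
    d = quarter_start(label)
    print(label, "to", f"{d.month}/{d.day}/{d.year}")
# Q1-2014 to 1/1/2014
# Q2-2015 to 4/1/2015
# Q3-2016 to 7/1/2016
# Q4-2017 to 10/1/2017
```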

Resources