Is there a way to compare DDMMYYYY format date with current date? - mainframe

I have a requirement to extract today's transactions into a separate file through JCL SORT. The date in my input is in DDMMYYYY format.

INCLUDE COND=(20,10,CH,EQ,DATE1) will not work because DATE1 returns the date in C'yyyymmdd' format.
Try reformatting the date in the input file and then comparing it with DATE1: INREC BUILD appends the date rearranged as yyyymmdd (year 25,4 then month 23,2 then day 21,2) at position 30, where OUTFIL's INCLUDE can compare it against DATE1. Refer to the SORT card below:
----+----1----+----2----+----3----+----4----+-
//SORTIN DD *
DATA1               01102020
DATA2               07102020
DATA3               07102020
DATA4               01092020
DATA5               01102010
DATA6               01102019
/*
//SORTOUT DD SYSOUT=*
//SYSIN DD *
OPTION COPY
INREC BUILD=(1,28,X,25,4,23,2,21,2)
OUTFIL REMOVECC,
BUILD=(1,28),INCLUDE=(30,08,CH,EQ,DATE1)
/*
Output will be:
DATA2               07102020
DATA3               07102020

Related

How can I convert a specific string date to date or datetime in Spark?

I have this string pattern in my Spark dataframe: 'Sep 14, 2014, 1:34:36 PM'.
I want to convert this to date or datetime format, using Databricks and Spark.
I've already tried the cast and to_date functions, but nothing works and I get a null return every time.
How can I do that?
Thanks in advance!
If we create a DataFrame like this:
import spark.implicits._
import org.apache.spark.sql.functions.{col, to_timestamp}

var ds = spark.sparkContext.parallelize(Seq(
  "Sep 14, 2014, 01:34:36 PM"
)).toDF("date")
Through the following statement:
ds = ds.withColumn("casted", to_timestamp(col("date"), "MMM dd, yyyy, hh:mm:ss a"))
You get this result:
+-------------------------+-------------------+
|date |casted |
+-------------------------+-------------------+
|Sep 14, 2014, 01:34:36 PM|2014-09-14 13:34:36|
+-------------------------+-------------------+
which should be useful to you. From there you can use to_date or other APIs that require a datetime column. Good luck!
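For reference, a minimal PySpark equivalent (a sketch assuming the same sample string) chains to_date on top of to_timestamp to get a DateType column:

from pyspark.sql.functions import col, to_date, to_timestamp

# Parse the string to a timestamp, then truncate it to a date (DateType)
df = spark.createDataFrame([("Sep 14, 2014, 01:34:36 PM",)], ["date"])
df = df.withColumn(
    "casted_date",
    to_date(to_timestamp(col("date"), "MMM dd, yyyy, hh:mm:ss a"))
)
# casted_date is 2014-09-14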
Your date/time stamp string is incorrect: you have 1 instead of 01, and the hh pattern expects a two-digit hour.
#
# 1 - Create sample dataframe + view
#

# required library
from pyspark.sql.functions import col, to_timestamp

# array of tuples - data
dat1 = [
    ("1", "Sep 14, 2014, 01:34:36 PM")
]

# array of names - columns
col1 = ["row_id", "date_string1"]

# make data frame
df1 = spark.createDataFrame(data=dat1, schema=col1)

# parse the date string into a timestamp column
df1 = df1.withColumn("time_stamp1", to_timestamp(col("date_string1"), "MMM dd, yyyy, hh:mm:ss a"))

# show schema
df1.printSchema()

# show data
display(df1)
This code produces the correct answer. If the data has 1:34:36 (a one-digit hour), the hh pattern fails; you can use a when clause to pick the correct conversion, as sketched below.
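A minimal sketch of that idea, reusing df1 and date_string1 from above: try the two-digit-hour pattern first and fall back to a one-digit-hour pattern (h instead of hh) when it parses to null.

from pyspark.sql.functions import col, to_timestamp, when

# Where the hh (two-digit hour) pattern yields null, retry with h (one-digit hour)
df2 = df1.withColumn(
    "time_stamp1",
    when(
        to_timestamp(col("date_string1"), "MMM dd, yyyy, hh:mm:ss a").isNotNull(),
        to_timestamp(col("date_string1"), "MMM dd, yyyy, hh:mm:ss a"),
    ).otherwise(
        to_timestamp(col("date_string1"), "MMM dd, yyyy, h:mm:ss a")
    ),
)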

spark dataframe: date formatting not working

I have a CSV file in which a date column has values like 01080600, basically MM-dd-HH-mm.
I want to add a column in dataframe which shows this in a more readable format.
I do:
spark.sql("SELECT date...")
.withColumn("readable date", to_date(col("date"), "MM:dd HH:mm"))
.show(10)
But readable date is returned as null.
What am I missing here?
When formatting or converting to a date or timestamp, you need to provide a format string that actually follows the pattern of your input. In your case you need to modify your format as below; the result can then be rendered in whatever final form you want for the date column by using date_format.
References for the various patterns and parsing rules can be found here.
To Timestamp
spark.sql("""
  SELECT
    TO_TIMESTAMP('01080600','MMddHHmm') as date,
    DATE_FORMAT(TO_TIMESTAMP('01080600','MMddHHmm'),'MM/dd HH:mm') as formatted_date
""").show()
+-------------------+--------------+
|               date|formatted_date|
+-------------------+--------------+
|1970-01-08 06:00:00|   01/08 06:00|
+-------------------+--------------+
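Equivalently on a dataframe column, as in your withColumn attempt (a minimal sketch assuming the column is named date):

from pyspark.sql.functions import col, date_format, to_timestamp

# Parse MMddHHmm into a timestamp, then render it in a more readable form
df = spark.createDataFrame([("01080600",)], ["date"])
df.withColumn(
    "readable date",
    date_format(to_timestamp(col("date"), "MMddHHmm"), "MM/dd HH:mm")
).show(10)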

Pyspark parse datetime field with day and month names into timestamp

I'm not even sure where to start. I want to parse a column that is currently a string into a timestamp. The records look like the following:
Thu, 28 Jan 2021 02:54:17 +0000
What is the best way to parse this as a timestamp? I wasn't even sure where to start, since it's not a super common way to store dates.
You could probably start from the docs Datetime Patterns for Formatting and Parsing:
import pyspark.sql.functions as F
df = spark.createDataFrame([("Thu, 28 Jan 2021 02:54:17 +0000",)], ['timestamp'])
df.withColumn(
    "timestamp",
    F.to_timestamp("timestamp", "E, dd MMM yyyy HH:mm:ss Z")
).show()
#+-------------------+
#| timestamp|
#+-------------------+
#|2021-01-28 02:54:17|
#+-------------------+
However, since Spark version 3.0, you can no longer use some symbols like E while parsing to timestamp:

Symbols of ‘E’, ‘F’, ‘q’ and ‘Q’ can only be used for datetime formatting, e.g. date_format. They are not allowed to be used for datetime parsing, e.g. to_timestamp.
You can either set the time parser to legacy:
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
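With the legacy policy in place, the original pattern above works again (a sketch assuming the same df and F import as above):

# The legacy policy restores SimpleDateFormat behaviour, where E is valid for parsing
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
df.withColumn(
    "timestamp",
    F.to_timestamp("timestamp", "E, dd MMM yyyy HH:mm:ss Z")
).show()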
Or use some string functions to remove the day-name part from the string before using to_timestamp (note the leading space in the pattern, which matches the space left after the comma):
df.withColumn(
    "timestamp",
    F.to_timestamp(F.split("timestamp", ",")[1], " dd MMM yyyy HH:mm:ss Z")
).show()

How to convert a string datatype column to date format in hive

Could you please guide me with the below query?
I need to convert the below string column to a date.
Input and expected output is provided in screenshot.
Input table: column maturity_date is in string datatype.
I tried the below, but it is not working as expected:
to_date(from_unixtime(unix_timestamp(maturity_date,'MM/DD/YYYY H:mm:ss'),'yyyy-mm-dd'))
Try using lower-case letters. Upper case means something else (D is day-of-year and Y is week-year).
to_date(from_unixtime(unix_timestamp(maturity_date,'MM/dd/yyyy H:mm:ss'),'yyyy-MM-dd'))
The correct input format is 'MM/dd/yyyy H:mm:ss', not 'MM/DD/YYYY H:mm:ss'.
The correct output format is 'yyyy-MM-dd', not 'yyyy-mm-dd': mm is minutes, MM is month.
Read more about the date formats used in Hive here: SimpleDateFormat
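As a quick sanity check of the corrected pattern, here is a sketch run through Spark SQL, which supports the same functions; the '12/02/2013 0:00:00' literal is a hypothetical stand-in for the maturity_date column, since the question's screenshot is not available:

# Hypothetical sample value in place of the maturity_date column
spark.sql("""
  SELECT to_date(
           from_unixtime(
             unix_timestamp('12/02/2013 0:00:00', 'MM/dd/yyyy H:mm:ss'),
             'yyyy-MM-dd')) AS maturity_date
""").show()
# +-------------+
# |maturity_date|
# +-------------+
# |   2013-12-02|
# +-------------+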

Batch formatting of Date in multiple datasets

I have extracted a number of Excel spreadsheets into SAS using PROC IMPORT. However, I now need to join the datasets together, and I need a uniform date format for all datasets. The dates are currently in character format, and some are structured as '1999Q1' while others are like '12/02/2013'. Any help on how I can change the formats for all dates in all datasets?
You will need to use the INPUT() function to convert the strings into dates so that you can merge on them. Let's make some sample datasets to simulate what you might have imported from your Excel sheets.
data have1;
  date='1999Q1';
  var1=1;
run;

data have2;
  date='02DEC2013'd;
  format date yymmdd10.;
  var2=2;
run;
Now let's get the variable names and types from those datasets.
proc contents data=work._all_ noprint out=contents; run;
We can use this metadata to write some code to convert the strings into dates.
filename code temp;
data _null_;
  set contents;
  where upcase(name)='DATE' and type=2;
  file code;
  length dsn $41;
  dsn=catx('.',libname,memname);
  put 'data ' dsn ';'
    / ' set ' dsn ';'
    / ' datenum=input(date,anydtdte.);'
    / ' format datenum yymmdd10.;'
    / ' rename datenum=date date=datechar;'
    / 'run;'
  ;
run;
%inc code / source2;
Now we can merge the datasets.
data want;
  merge have1 have2;
  by date;
run;
