Write column as date with format Java-Spark

I'm using Java-Spark.
I have the following table in a Dataset object:
creationDate
15/06/2018 09:15:28
I select this column:
Dataset<Row> ds = dataframe.select(new Column("creationDate").as("mydate").cast("date"));
And I write it with:
ds.write().mode(mode).save(hdfsDirectory);
I also tried:
ds.write().option("dateFormat","dd/MM/yyyy HH:mm:ss").mode(mode).save(hdfsDirectory);
But when I look at my table, the column mydate is null.
How can I write my date into my Hive table? I know the default date format should be dd-MM-yyyy, but my text is in dd/MM/yyyy format and I can't change it.
Any suggestions?
Thanks.
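A minimal sketch of the usual fix, assuming Spark 2.2+ where functions.to_date(Column, String) is available: cast("date") expects ISO-style yyyy-MM-dd strings, which is why mydate comes out null here, so parse the string with its actual pattern instead (dataframe, mode and hdfsDirectory are the variables from the question):
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.to_date;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Parse the string with its real pattern instead of a bare cast("date").
Dataset<Row> ds = dataframe.select(
        to_date(col("creationDate"), "dd/MM/yyyy HH:mm:ss").as("mydate"));
ds.write().mode(mode).save(hdfsDirectory);
If the time component matters, functions.to_timestamp with the same pattern is the analogous route and produces a timestamp column instead of a date.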

Related

How to extract the year and quarter from the String date in databricks SQL

Can someone show me how to extract the year from the String date in databricks SQL?
I am based in the UK and our date format is normally as follows:
dd/mm/yyyy
The field containing the dates is set as StringType()
I am trying to extract the year from the string as follows:
select year(cast(financials_0_accountsDate as Date)) from `financiallimited_csv`
I'm using the following code to extract the quarter:
select quarter(cast(financials_0_accountsDate as Date)) from `financiallimited_csv`
However, both result in NULL values.
Any thoughts on how to extract the year and quarter from dates with StringType() dd/mm/yyyy?
Could you try the to_date function?
select year(to_date(financials_0_accountsDate, 'dd/MM/yyyy')) from `financiallimited_csv`
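For completeness, a hedged sketch of the same fix covering both year and quarter, written against the Spark Java API (the spark session, the yr/qtr aliases and the registered financiallimited_csv view are assumptions for illustration):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Same idea for the quarter: parse the dd/MM/yyyy string first, then extract.
// `spark` is an assumed SparkSession; `financiallimited_csv` is assumed to be a registered view.
Dataset<Row> result = spark.sql(
    "SELECT year(to_date(financials_0_accountsDate, 'dd/MM/yyyy')) AS yr, "
  + "quarter(to_date(financials_0_accountsDate, 'dd/MM/yyyy')) AS qtr "
  + "FROM financiallimited_csv");
result.show();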

spark dataframe: date formatting not working

I have a csv file in which a date column has values like 01080600, basically MM-dd-HH-mm.
I want to add a column in dataframe which shows this in a more readable format.
I do:
spark.sql("SELECT date...")
.withColumn("readable date", to_date(col("date"), "MM:dd HH:mm"))
.show(10)
But readable date comes back as null.
What am I missing here?
When formatting or converting to a date or timestamp, you need to provide a format that matches your input pattern. In your case you need to modify your format as below; the result can then be reshaped into whatever final form you want your date column to take using date_format.
References to the various patterns and parsing options can be found in the Spark documentation.
To Timestamp
sql.sql("""
SELECT
TO_TIMESTAMP('01080600','ddMMhhmm') as date,
DATE_FORMAT(TO_TIMESTAMP('01080600','ddMMhhmm'),'MM/dd hh:mm') as formated_date
""").show()
+-------------------+-------------+
| date|formated_date|
+-------------------+-------------+
|1970-08-01 06:00:00| 08/01 06:00|
+-------------------+-------------+
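If the value is sitting in a DataFrame column rather than a literal, a rough Java equivalent of the same idea might look like the following (a sketch only: the Dataset<Row> df and the column names are assumed, and it keeps the answer's ddMMhhmm pattern):
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.date_format;
import static org.apache.spark.sql.functions.to_timestamp;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// `df` is an assumed Dataset<Row> holding the raw string column `date` (e.g. "01080600").
Dataset<Row> withReadable = df
    .withColumn("ts", to_timestamp(col("date"), "ddMMhhmm"))
    .withColumn("readable_date", date_format(col("ts"), "MM/dd hh:mm"));
withReadable.show(10);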

How to convert a string datatype column to date format in hive

Could you please guide me with the below query?
I need to convert the below string column to a date.
Input and expected output are provided in the screenshot.
Input table: column maturity_date is of string datatype.
I tried the below, but it's not working as expected:
to_date(from_unixtime(unix_timestamp(maturity_date,'MM/DD/YYYY H:mm:ss'),'yyyy-mm-dd'))
Try using lowercase letters. Uppercase letters mean something else (D is day-of-year and Y is week-year).
to_date(from_unixtime(unix_timestamp(maturity_date,'MM/dd/yyyy H:mm:ss'),'yyyy-MM-dd'))
The correct input format is 'MM/dd/yyyy H:mm:ss', not 'MM/DD/YYYY H:mm:ss'.
The correct output format is yyyy-MM-dd, not yyyy-mm-dd: mm is minutes, MM is month.
Read more about the date formats used in Hive in the SimpleDateFormat documentation.
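As a quick, hedged sanity check of the corrected pattern (run through spark.sql here for convenience, though the expression is plain Hive SQL; the '03/25/2021 0:00:00' literal is just a made-up sample standing in for maturity_date):
// `spark` is an assumed SparkSession; the literal stands in for the maturity_date column.
spark.sql(
    "SELECT to_date(from_unixtime("
  + "unix_timestamp('03/25/2021 0:00:00', 'MM/dd/yyyy H:mm:ss'), 'yyyy-MM-dd')) AS maturity_date")
  .show();   // expected: 2021-03-25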

Spark-Java:How to convert Dataset string column of format "yyyy-MM-ddThh:mm:ss.SSS+0000" to timestamp with a format?

I have a Dataset with one column lastModified of type string with format "yyyy-MM-ddThh:mm:ss.SSS+0000" (sample data: 2018-08-17T19:58:46.000+0000).
I have to add a new column lastModif_mapped of type Timestamp by converting the lastModified's value to format "yyyy-MM-dd hh:mm:ss.SSS".
I tried the code below, but the new column ends up with null values:
Dataset<Row> filtered = null;
filtered = ds1.select(ds1.col("id"),ds1.col("lastmodified"))
.withColumn("lastModif_mapped", functions.unix_timestamp(ds1.col("lastmodified"), "yyyy-MM-dd HH:mm:ss.SSS").cast("timestamp")).alias("lastModif_mapped");
Where am I going wrong?
As I answered in your original question, your input String field doesn't correspond to the allowed formats of unix_timestamp(Column s, String p):
If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
For your case, you need to use to_timestamp(Column s, String fmt):
import static org.apache.spark.sql.functions.to_timestamp;
...
to_timestamp(ds1.col("lastmodified"), "yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
And you don't need to cast explicitly to Timestamp, since to_timestamp already returns a Timestamp.
When you use withColumn("lastModif_mapped", ...), you don't need to add alias("lastModif_mapped"), because withColumn creates a new column with the provided name.
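Putting the pieces together, a minimal sketch of the full statement using the pattern from the answer above (same ds1 as in the question; to_timestamp(Column, String) is available from Spark 2.2 on, and depending on the Spark version a '+0000' offset may need 'XX' rather than 'XXX'):
import static org.apache.spark.sql.functions.to_timestamp;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Parse "2018-08-17T19:58:46.000+0000" straight into a timestamp column; no cast, no alias.
Dataset<Row> filtered = ds1
    .select(ds1.col("id"), ds1.col("lastmodified"))
    .withColumn("lastModif_mapped",
        to_timestamp(ds1.col("lastmodified"), "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"));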

How to create hive table with date format 'dd-MMM-yyyy'?

I'm trying to create a Hive table for importing CSV data, where the date format in the CSV file is 'dd-MMM-yyyy' (for example 20-Mar-2018). When I created the table in Hive, the entire date column turned into null values. Can anyone suggest how to figure this out?
My Query:
create external table new_stock (
  Symbol String, Series String, Dat date, Prev_Close float, Open_Price float,
  High_Price float, Low_Price float, Last_Price float, Close_Price float,
  Avg_Price float, Volume int, Turn_Over float, Trades int, Del_Qty int, DQPQ_Per float)
row format delimited fields terminated by ','
stored as textfile
LOCATION '/stock_details/'
Finally, with some help from @leftjoin, I solved the problem of converting the string date from format (dd-MMM-yyyy) to (dd-MM-yyyy) by using a select query. It works fine:
select from_unixtime(unix_timestamp(columnname ,'dd-MMM-yyyy'), 'dd-MM-yyyy') from tablename;
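And a hedged sanity check of that conversion on the sample value from the question (run through spark.sql here for convenience; the expression itself is the same in Hive):
// `spark` is an assumed SparkSession; '20-Mar-2018' is the sample value from the question.
spark.sql(
    "SELECT from_unixtime(unix_timestamp('20-Mar-2018', 'dd-MMM-yyyy'), 'dd-MM-yyyy') AS dt")
  .show();   // expected: 20-03-2018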
