I have a CSV file in which a date column has values like 01080600, basically MM-dd-HH-mm with no separators.
I want to add a column to the DataFrame which shows this in a more readable format.
I do:
spark.sql("SELECT date...")
.withColumn("readable date", to_date(col("date"), "MM:dd HH:mm"))
.show(10)
But readable date comes back as null.
What am I missing here?
When converting a string to a date or timestamp, the pattern you pass has to describe the input exactly as it is written. Your values contain no separators, so a pattern with ':' and a space cannot match, and the result is null. In your case, change the pattern as shown below; the parsed value can then be rendered in whatever final layout you want with date_format.
The available pattern letters and parsing rules are listed in Spark's datetime patterns documentation.
To Timestamp
sql.sql("""
SELECT
  TO_TIMESTAMP('01080600','MMddHHmm') as date,
  DATE_FORMAT(TO_TIMESTAMP('01080600','MMddHHmm'),'MM/dd HH:mm') as formated_date
""").show()
+-------------------+-------------+
|               date|formated_date|
+-------------------+-------------+
|1970-01-08 06:00:00|  01/08 06:00|
+-------------------+-------------+
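To see the pattern matching in isolation, here is a small sketch in plain Java using java.time, whose pattern letters Spark 3 largely follows. The class and method names are made up for illustration, and defaulting the missing year to 1970 is an assumption chosen to mirror the Spark output above. Note that HH is the 24-hour clock; hh is the 12-hour clock and cannot represent afternoon hours without an am/pm marker.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;

class CompactDateDemo {
    // Build a parser for the given pattern. The data carries no year,
    // so default it to 1970 (assumption, mirrors the Spark result).
    static DateTimeFormatter parser(String pattern) {
        return new DateTimeFormatterBuilder()
                .appendPattern(pattern)
                .parseDefaulting(ChronoField.YEAR, 1970)
                .toFormatter();
    }

    // Readable layout for the parsed value.
    static final DateTimeFormatter OUTPUT = DateTimeFormatter.ofPattern("MM/dd HH:mm");

    // Parse MMddHHmm and render it as MM/dd HH:mm.
    static String toReadable(String raw) {
        return LocalDateTime.parse(raw, parser("MMddHHmm")).format(OUTPUT);
    }

    // True if the raw value can be parsed with the pattern, false otherwise.
    static boolean parses(String raw, String pattern) {
        try {
            LocalDateTime.parse(raw, parser(pattern));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(toReadable("01080600"));            // 01/08 06:00
        System.out.println(parses("01080600", "MM:dd HH:mm")); // false: no ':' or space in the input
    }
}
```

The second call fails for the same reason the original to_date returned null: the pattern expects separators that the data does not contain.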
Related
I have an input column with values like 12/03/08, i.e. dd/MM/yy, and I need it in the format dd/MM/yyyy. So I used the transformation below:
toString(toDate(col,'dd/MM/yy'),'dd/MM/yyyy')
This works fine.
But in the end I need to convert this column to a date datatype with the same format, dd/MM/yyyy. In the Cast transformation I converted it to date and gave the format as dd/MM/yyyy. It returns the date, but in the default format, dd-MM-yyyy. I need slashes instead of dashes.
I also tried toDate(col, 'dd/MM/yyyy'), but I still get the default format dd-MM-yyyy.
How can I fix this?
When you CAST in SQL for conversion, the resulting format is determined by the default format of your database. If you want to ensure that the date is formatted in a specific way, you can use the DATE_FORMAT function in combination with the toDate function.
To turn dd/MM/yy into dd/MM/yyyy, you can use the following transformation:
toString(toDate(col, 'dd/MM/yy'), 'dd/MM/yyyy')
This first converts the string to a date type using toDate and then formats it as a string with the wanted format using the toString function.
However, if you then want to convert the string column back to a date datatype with the format dd/MM/yyyy, you can't simply use the CAST function with the desired format, as the resulting format will still be determined by the default format of your database. Instead, you can use the DATE_FORMAT function to format the date as a string with the desired format, and then convert it back to a date datatype using the toDate function.
To convert the string column col back to a date datatype with the format dd/MM/yyyy, you can use the following transformation:
toDate(DATE_FORMAT(col, 'dd/MM/yyyy'), 'dd/MM/yyyy')
This first formats the date in your string column as a string with the desired format using the DATE_FORMAT function, and then converts it back to a date datatype with the same format using the toDate function.
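The difference between a date value and its string rendering can be illustrated outside the data flow expression language. The sketch below is plain Java with java.time, not ADF code, and the class and method names are hypothetical. In java.time, at least, a date value carries no format of its own: it prints in a default layout, and the slashes only exist on the string produced by formatting.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class SlashDateDemo {
    static final DateTimeFormatter TWO_DIGIT_YEAR = DateTimeFormatter.ofPattern("dd/MM/yy");
    static final DateTimeFormatter FOUR_DIGIT_YEAR = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // String -> date value (comparable to toDate(col, 'dd/MM/yy')).
    static LocalDate parse(String raw) {
        return LocalDate.parse(raw, TWO_DIGIT_YEAR);
    }

    // Date value -> string in the wanted layout (comparable to toString(date, 'dd/MM/yyyy')).
    static String render(LocalDate date) {
        return date.format(FOUR_DIGIT_YEAR);
    }

    public static void main(String[] args) {
        LocalDate date = parse("12/03/08");
        System.out.println(date);         // 2008-03-12 (default layout of the date type)
        System.out.println(render(date)); // 12/03/2008 (a string, no longer a date)
    }
}
```

Parsing the formatted string back into a date yields the same value, which again prints in the default layout.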
Could you please help with the query below?
I need to convert the string column below to a date.
The input and expected output are provided in the screenshot.
Input table: the column maturity_date has a string datatype.
I tried the following, but it is not working as expected:
to_date(from_unixtime(unix_timestamp(maturity_date,'MM/DD/YYYY H:mm:ss'),'yyyy-mm-dd'))
Try using lower case letters. Upper case means another thing (day of year (D) and week-year (Y)).
to_date(from_unixtime(unix_timestamp(maturity_date,'MM/dd/yyyy H:mm:ss'),'yyyy-MM-dd'))
Correct input format is 'MM/dd/yyyy H:mm:ss', not 'MM/DD/YYYY H:mm:ss'
Correct output format is yyyy-MM-dd, not yyyy-mm-dd. mm is minutes; MM is month.
Read more about the date format used in Hive in the SimpleDateFormat documentation.
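The effect of the letter case can be checked directly with java.text.SimpleDateFormat. This is a minimal sketch: the sample value is hypothetical (the screenshot is not available here), the class and helper names are made up, and the time zone is pinned to UTC only to keep the result reproducible.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class PatternCaseDemo {
    // SimpleDateFormat pinned to UTC and Locale.US so results do not depend on the machine.
    static SimpleDateFormat utc(String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    // Parse raw with inPattern, then render the result with outPattern.
    static String reformat(String raw, String inPattern, String outPattern) {
        try {
            Date parsed = utc(inPattern).parse(raw);
            return utc(outPattern).format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Input does not match " + inPattern, e);
        }
    }

    public static void main(String[] args) {
        String sample = "05/15/2019 7:30:00"; // hypothetical maturity_date value
        String input = "MM/dd/yyyy H:mm:ss";
        System.out.println(reformat(sample, input, "yyyy-MM-dd")); // 2019-05-15
        System.out.println(reformat(sample, input, "yyyy-mm-dd")); // 2019-30-15 (mm prints the minutes)
        System.out.println(reformat(sample, input, "MM/DD/YYYY")); // 05/135/2019 (DD is the day of the year)
    }
}
```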
I have a Dataset with one column lastModified of type string with format "yyyy-MM-ddThh:mm:ss.SSS+0000" (sample data: 2018-08-17T19:58:46.000+0000).
I have to add a new column lastModif_mapped of type Timestamp by converting the lastModified's value to format "yyyy-MM-dd hh:mm:ss.SSS".
I tried the code below, but the new column is getting the value null in it:
Dataset<Row> filtered = null;
filtered = ds1.select(ds1.col("id"),ds1.col("lastmodified"))
.withColumn("lastModif_mapped", functions.unix_timestamp(ds1.col("lastmodified"), "yyyy-MM-dd HH:mm:ss.SSS").cast("timestamp")).alias("lastModif_mapped");
Where am I going wrong?
As I answered on your original question, your input string does not correspond to the formats allowed by unix_timestamp(Column s, String p):
If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
For your case, you need to use to_timestamp(Column s, String fmt):
import static org.apache.spark.sql.functions.to_timestamp;
...
to_timestamp(ds1.col("lastmodified"), "yyyy-MM-dd'T'HH:mm:ss.SSSZ")
And you don't need to cast explicitly to Timestamp, since to_timestamp already returns a Timestamp.
When you use withColumn("lastModif_mapped",...) you don't need to add alias("lastModif_mapped"), because withColumn would create a new column with the provided name.
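The pattern has to match the sample value character by character, including the literal T and the way the offset is written. The sketch below checks a few candidate patterns against the sample with plain java.time, without Spark. The class and method names are hypothetical, and it assumes the java.time pattern semantics that Spark 3 uses. In those semantics XXX expects an offset with a colon (+00:00), while Z accepts +0000.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class OffsetPatternDemo {
    // True if raw can be parsed with the given pattern.
    static boolean matches(String raw, String pattern) {
        try {
            OffsetDateTime.parse(raw, DateTimeFormatter.ofPattern(pattern));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Parse the ISO-like input and render it in the target layout.
    static String normalize(String raw) {
        OffsetDateTime parsed = OffsetDateTime.parse(
                raw, DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ"));
        return parsed.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS"));
    }

    public static void main(String[] args) {
        String sample = "2018-08-17T19:58:46.000+0000";
        System.out.println(matches(sample, "yyyy-MM-dd HH:mm:ss.SSS"));        // false: no T, no offset
        System.out.println(matches(sample, "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"));   // false: XXX wants +00:00
        System.out.println(matches(sample, "yyyy-MM-dd'T'HH:mm:ss.SSSZ"));     // true
        System.out.println(normalize(sample));                                 // 2018-08-17 19:58:46.000
    }
}
```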
I have a table which stores datetimes as varchar.
The format looks like this: 2018-07-16 15:00:00.0
I want to parse this and extract only the date part, so that I can compare it with a date in string format, such as '2018-07-20', in a WHERE clause. What is the best way to achieve this in Presto?
This particular format (based on the example value 2018-07-16 15:00:00.0 in the question) is understood by a cast from varchar to timestamp. You then need to extract the date part with another cast:
presto> SELECT CAST(CAST('2018-07-16 15:00:00.0' AS timestamp) AS date);
_col0
------------
2018-07-16
(1 row)
I'm using Java-Spark.
I have the following table in Dataset object:
creationDate
15/06/2018 09:15:28
I select this column:
Dataset<Row> ds = dataframe.select(new Column("creationDate").as("mydate").cast("date"));
And I write it with:
ds.write().mode(mode).save(hdfsDirectory);
I also tried:
ds.write().option("dateFormat","dd/MM/yyyy HH:mm:ss").mode(mode).save(hdfsDirectory);
But when I look at my table, the column mydate is null.
How can I write my date into my Hive table? I know the default date format should be dd-MM-yyyy but my text is with dd/MM/yyyy format and I can't change it.
Any suggestions?
Thanks.