How to create a Hive table with date format 'dd-MMM-yyyy'?

I'm trying to create a Hive table to import CSV data in which the date format is 'dd-MMM-yyyy' (for example, 20-Mar-2018). After I create the table, every value in the date column comes back as NULL. Can anyone suggest how to fix this?
My query:
create external table new_stock (
Symbol String, Series String, Dat date,
Prev_Close float, Open_Price float, High_Price float, Low_Price float,
Last_Price float, Close_Price float, Avg_Price float,
Volume int, Turn_Over float, Trades int, Del_Qty int, DQPQ_Per float
)
row format delimited fields terminated by ','
stored as textfile
location '/stock_details/';

With some help from #leftjoin, I solved the problem by converting the date string from 'dd-MMM-yyyy' to 'dd-MM-yyyy' in a select query, which works fine:
select from_unixtime(unix_timestamp(columnname ,'dd-MMM-yyyy'), 'dd-MM-yyyy') from tablename;
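The column comes back NULL because Hive expects the text in a date column to be in yyyy-MM-dd form, so 20-Mar-2018 cannot be read directly. A minimal sketch of one way to get a real DATE value, assuming the Dat column is declared as string in new_stock so the raw text is preserved (the alias trade_date is made up for illustration):

```sql
-- Assumes Dat is declared as STRING in new_stock, holding text such as 20-Mar-2018.
-- unix_timestamp parses the text with the given pattern,
-- from_unixtime renders it as yyyy-MM-dd, and the cast yields a DATE.
SELECT
  Symbol,
  CAST(from_unixtime(unix_timestamp(Dat, 'dd-MMM-yyyy'), 'yyyy-MM-dd') AS DATE) AS trade_date
FROM new_stock;
```

The same expression could feed an insert ... select into a second table whose column is typed date.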

Related

Azure Databricks Delta Table modifies the TIMESTAMP format while writing from Spark DataFrame

I am new to Azure Databricks. I am trying to write a DataFrame to a Delta table that contains a TIMESTAMP column, but the TIMESTAMP pattern changes after writing to the Delta table.
My DataFrame output column holds the value in this format: 2022-05-13 17:52:09.771
But after writing it to the table, the column value is populated as
2022-05-13T17:52:09.771+0000
I am using the code below to generate this DataFrame output:
val pretsUTCText = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
val tsUTCText: String = pretsUTCText.format(ts)
val tsUTCCol : Column = lit(tsUTCText)
val df = df2.withColumn(to_timestamp(timestampConverter.tsUTCCol,"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
The DataFrame output shows 2022-05-13 17:52:09.771 as the TIMESTAMP pattern, but after writing to the Delta table the same value is populated as 2022-05-13T17:52:09.771+0000.
Thanks in advance. I could not find any solution.
I have just found the same behaviour on Databricks, and it differs from what the Databricks documentation describes. It seems that from some version onward Databricks shows the timezone by default, which is why you see the additional +0000. If you don't want it, I think you can use the date_format function when you populate the data. Also, I think you don't need 'Z' in the format text, as it is for the timezone.
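A minimal sketch of the date_format suggestion in Spark SQL, assuming a hypothetical table events with a TIMESTAMP column event_ts (neither name comes from the question). Note that date_format returns a string, so the stored TIMESTAMP value itself is unchanged and only its textual rendering differs.

```sql
-- Hypothetical table and column names, for illustration only.
-- date_format renders the TIMESTAMP as text without the zone offset.
SELECT
  event_ts,
  date_format(event_ts, 'yyyy-MM-dd HH:mm:ss.SSS') AS event_ts_text
FROM events;
```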

HIVE rendered timestamp column data as NULL

I am trying to create an external table using Hive. Below is the Hive query I ran:
create external table trips_raw
(
VendorID int,
tpep_pickup_datetime timestamp,
tpep_dropoff_datetime timestamp
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' location '/user/taxi_trips/';
When I looked at the output from the 'trips_raw' table created by the query above, both the 'tpep_pickup_datetime' and 'tpep_dropoff_datetime' columns were NULL in all rows. Other threads say the reason is that the '1/1/2018 11:13:00 AM' timestamp format is not accepted by Hive, but that is the timestamp format in my CSV source data.
I could declare those two timestamp columns as 'string' and Hive would render them correctly, but I still want them to be of 'timestamp' type, so declaring them as 'string' is not a viable option here.
I also tried the following technique, recommended on this site (https://community.hortonworks.com/questions/55266/hive-date-time-problem.html), but had no success:
Step 1: Create the 'trips_raw' table using 'string' as the type for the two timestamp columns. This allows the resulting table to render the timestamps correctly, albeit as 'string' type. The Hive command I used is shown below:
create external table trips_raw
(
VendorID int,
tpep_pickup_datetime string,
tpep_dropoff_datetime string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' location
'/user/taxi_trips/';
In the resulting table, the dates are shown as strings.
But as I mentioned earlier, I want the time columns to be of timestamp type, not string type. Therefore, in the next two steps I tried to create a blank table and then insert the data from the table created in Step 1, converting the strings to timestamps this time.
Step 2: Create a blank external table called 'trips_not_raw' using the following Hive commands:
create external table trips_not_raw
(VendorID int,
tpep_pickup_datetime timestamp,
tpep_dropoff_datetime timestamp
);
Step 3: Insert data from the 'trips_raw' table (created in Step 1), using the Hive commands below:
insert into table trips_not_raw select vendorid,
from_unixtime(unix_timestamp(tpep_pickup_datetime, 'MM/dd/yyyy HH:mm:ss aa')) as tpep_pickup_datetime,
from_unixtime(unix_timestamp(tpep_dropoff_datetime, 'MM/dd/yyyy HH:mm:ss aa')) as tpep_dropoff_datetime
from trips_raw;
This inserts the rows into the blank table 'trips_not_raw', but the two timestamp columns still show NULL.
Is there a simple way to store the two time columns as 'timestamp' type rather than 'string', and still render them correctly in the output without seeing NULL/None?
I'm afraid you need to parse the timestamp column and then cast the string as a timestamp. For example:
select cast(regexp_replace('1/1/2018 11:13:00 AM', '(\\d{1,2})/(\\d{1,2})/(\\d{4})\\s(\\d{2}:\\d{2}:\\d{2}) \\w{2}', '$3-$1-$2 $4') as timestamp)
You can create and use a macro for convenience, e.g.:
create temporary macro parse_date (ts string)
cast(regexp_replace(ts, '(\\d{1,2})/(\\d{1,2})/(\\d{4})\\s(\\d{2}:\\d{2}:\\d{2}) \\w{2}', '$3-$1-$2 $4') as timestamp);
Then use it as follows:
select parse_date('1/1/2018 11:13:00 AM');
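Note that the regular expression above discards the AM/PM marker, so an evening time such as 11:13:00 PM would come out as 11:13:00. A minimal alternative sketch, assuming the string-typed trips_raw table from the question and that unix_timestamp accepts Java date patterns (where h is the 12-hour clock hour and a is the AM/PM marker):

```sql
-- Assumes trips_raw stores both columns as STRING, e.g. '1/1/2018 11:13:00 AM'.
-- Pattern letters: M = month, d = day, h = 12-hour clock hour, a = AM/PM marker.
SELECT
  VendorID,
  CAST(from_unixtime(unix_timestamp(tpep_pickup_datetime, 'M/d/yyyy h:mm:ss a')) AS TIMESTAMP) AS tpep_pickup_datetime,
  CAST(from_unixtime(unix_timestamp(tpep_dropoff_datetime, 'M/d/yyyy h:mm:ss a')) AS TIMESTAMP) AS tpep_dropoff_datetime
FROM trips_raw;
```

The result of this select could be used in an insert into table trips_not_raw statement in place of the original expressions.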

Write column as date with format Java-Spark

I'm using Java-Spark.
I have the following table in a Dataset object:
creationDate
15/06/2018 09:15:28
I select this column with:
Dataset<Row> ds = dataframe.select(new Column("creationDate").as("mydate").cast("date"));
And I write it with:
ds.write().mode(mode).save(hdfsDirectory);
I also tried:
ds.write().option("dateFormat","dd/MM/yyyy HH:mm:ss").mode(mode).save(hdfsDirectory);
But when I look at my table, the column mydate is null.
How can I write my date into my Hive table? I know the default date format should be dd-MM-yyyy, but my text is in dd/MM/yyyy format and I can't change it.
Any suggestions?
Thanks.
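The cast to date most likely yields null because a plain cast expects text in the yyyy-MM-dd form (which is the default, rather than dd-MM-yyyy), not dd/MM/yyyy. A minimal sketch in Spark SQL, assuming the Dataset has been registered as a temporary view named creation_events (a hypothetical name) and that the Spark version provides the two-argument to_date:

```sql
-- creation_events is a hypothetical temp view over the Dataset from the question.
-- to_date parses the text with the explicit pattern and keeps only the date part.
SELECT
  to_date(creationDate, 'dd/MM/yyyy HH:mm:ss') AS mydate
FROM creation_events;
```

In the Java API the corresponding call is functions.to_date(Column, String), which can be used directly inside select.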

How to read the ROW FORMAT DELIMITED with SEQUENCEFILE in Spark SQL

I have the following Hive table definition:
CREATE EXTERNAL TABLE english_1grams (
gram string,
year int,
occurrences bigint,
pages bigint,
books bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS SEQUENCEFILE
location 's3://datasets.elasticmapreduce/ngrams/books/20090715/eng-all/1gram/';
From: http://aws.amazon.com/articles/Elastic-MapReduce/5249664154115844
It works just fine in Hive. However, when I try to use it with Spark, it gives an error:
Operation not allowed: ROW FORMAT DELIMITED is only compatible with 'textfile', not 'sequencefile'(line 1, pos 0)
How can I read this table in Spark SQL? I've removed the ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' from the definition but that returns just gibberish instead of the actual data.

Date format issue in ssis

I have to import data from an Excel file using SSIS, but I am facing a problem with the date column: in the Excel sheet the date format is yyyy/mm/dd, and when it is uploaded to the database it changes to yyyy/dd/mmm format.
How to fix this?
Use the SUBSTRING function in a Derived Column while importing the date:
(LEN(TRIM(SUBSTRING(ReceivedDateTime,1,8))) > 0 ? (DT_DBDATE)(SUBSTRING(ReceivedDateTime,1,4) + "-" + SUBSTRING(ReceivedDateTime,5,2) + "-" + SUBSTRING(ReceivedDateTime,7,2)) : (DT_DBDATE)NULL(DT_WSTR,5))
If the data is present, the SUBSTRING function extracts the exact date to store in the database; if the date does not exist, NULL is inserted instead.
I see two options:
1. Use a Data Conversion Transformation to convert to a text string in the appropriate format, using SSIS data types.
2. Add a Script Task that converts the data type, using VB data types.
First, create the table in your database using the command below:
CREATE TABLE [dbo].[Manual] (
[Name] nvarchar(255),
[Location] nvarchar(255),
[Date] datetime
)
SET DATEFORMAT YDM
With DATEFORMAT YDM, the date will be imported in YYYY/DD/MM format. Before running the package, edit it and, in the column mapping step, select the check box "Delete Rows in Destination Table".
Then execute the package. It will work.
