HIVE rendered timestamp column data as NULL - string

I am trying to create an external table using Hive. Below is the Hive query I ran:
create external table trips_raw
(
VendorID int,
tpep_pickup_datetime timestamp,
tpep_dropoff_datetime timestamp
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' location '/user/taxi_trips/';
When I looked at the output of the 'trips_raw' table created by the query above, both the 'tpep_pickup_datetime' and 'tpep_dropoff_datetime' columns were 'NULL' in every row. Other threads say the reason is that Hive does not accept the '1/1/2018 11:13:00 AM' timestamp format, but that is the format in my CSV source data (as you can see from the screenshot here).
I could declare those 2 timestamp columns as 'string' and Hive would render them correctly, but I still want them to be of 'timestamp' type, so declaring them as 'string' is not a viable option here.
I also tried the following technique, based on a recommendation from this site (https://community.hortonworks.com/questions/55266/hive-date-time-problem.html), but had no success:
Step 1: Create the 'trips_raw' table using 'string' as the type for the 2 timestamp columns. This allows the resulting table to render the timestamps correctly, albeit as 'string' type. The Hive command I used is shown below:
create external table trips_raw
(
VendorID int,
tpep_pickup_datetime string,
tpep_dropoff_datetime string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' location
'/user/taxi_trips/';
When I look at the resulting table, the dates are shown as strings, as you can see from the screenshot below.
But as I mentioned earlier, I want the time columns to be of timestamp type, not string type. Therefore, in the next 2 steps I tried to create a blank table and then insert the data from the table created in Step 1, this time converting the strings to timestamps.
Step 2: Create a blank external table called 'trips_not_raw' using the following Hive commands:
create external table trips_not_raw
(VendorID int,
tpep_pickup_datetime timestamp,
tpep_dropoff_datetime timestamp
);
Step 3: Insert data from the 'trips_raw' table (created in Step 1), using the Hive commands below:
insert into table trips_not_raw select vendorid,
from_unixtime(unix_timestamp(tpep_pickup_datetime, 'MM/dd/yyyy HH:mm:ss aa')) as tpep_pickup_datetime,
from_unixtime(unix_timestamp(tpep_dropoff_datetime, 'MM/dd/yyyy HH:mm:ss aa')) as tpep_dropoff_datetime
from trips_raw;
Doing this inserts the rows into the blank table 'trips_not_raw', but the 2 timestamp columns still show as 'NULL', as you can see from the screenshot below:
Is there a simple way to store the 2 time columns as 'timestamp' type and not 'string', but still be able to render them correctly in the output without seeing 'Null/None'?

I'm afraid you need to parse the timestamp column and then cast the resulting string as a timestamp. For example:
select cast(regexp_replace('1/1/2018 11:13:00 AM', '(\\d{1,2})/(\\d{1,2})/(\\d{4})\\s(\\d{2}:\\d{2}:\\d{2}) \\w{2}', '$3-$1-$2 $4') as timestamp)
You can create and use a macro function for convenience, e.g.,
create temporary macro parse_date (ts string)
cast(regexp_replace(ts, '(\\d{1,2})/(\\d{1,2})/(\\d{4})\\s(\\d{2}:\\d{2}:\\d{2}) \\w{2}', '$3-$1-$2 $4') as timestamp);
Then use it as follows:
select parse_date('1/1/2018 11:13:00 AM');
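Note that the regex above discards the AM/PM marker, so '1/1/2018 11:13:00 PM' would come out as 11:13 rather than 23:13. If preprocessing the CSV before loading it is an option, the conversion could be sketched as follows. This is Python rather than HiveQL, the function name `to_hive_timestamp` is hypothetical, and it assumes every value follows the month/day/year 12-hour layout shown in the question. The output uses the yyyy-MM-dd HH:mm:ss layout that Hive's timestamp type reads from text files.

```python
from datetime import datetime

# Layout seen in the question's CSV, e.g. "1/1/2018 11:13:00 AM".
# Assumption: month/day/year order with a 12-hour clock and AM/PM marker.
SOURCE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# Layout Hive's timestamp type reads from text files: yyyy-MM-dd HH:mm:ss.
HIVE_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_hive_timestamp(raw: str) -> str:
    """Convert one CSV timestamp to the 24-hour layout Hive accepts."""
    parsed = datetime.strptime(raw.strip(), SOURCE_FORMAT)
    return parsed.strftime(HIVE_FORMAT)


print(to_hive_timestamp("1/1/2018 11:13:00 AM"))  # 2018-01-01 11:13:00
print(to_hive_timestamp("1/1/2018 11:13:00 PM"))  # 2018-01-01 23:13:00
```

With the file rewritten this way, the columns can be declared as timestamp directly in the external table.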

Related

how to store double colon values in oracle database table

I have an Excel file which I am trying to import into an Oracle database table.
Some of the values in the Excel file contain colons, for example 14:39.5. What data type should I use in the Oracle table to store such a value?
Currently I have declared the column as a varchar data type, and it throws an error during import:
Conversion error! Value: "00:12:01.615518000" to data type: "Number". Row ignored! Value is '00:12:01.615518000'. Cannot be converted to a decimal number object. Valid format: 'Unformatted'
You can store it as an INTERVAL DAY(0) TO SECOND(9) data type:
CREATE TABLE table_name (
time INTERVAL DAY(0) TO SECOND(9)
);
Then you can use TO_DSINTERVAL passing your value with '0 ' prepended to the start:
INSERT INTO table_name (time)
VALUES ( TO_DSINTERVAL('0 ' || '00:12:01.615518000') );
If it is part of a date/time stamp then you could store it as DATE or TIMESTAMP if you could add the date component. Oracle doesn't have just a TIME data type.
If you can't add a date component to it, then assuming it is a time interval you could convert it to seconds or microseconds (lose the colons) and store it as a NUMBER.
If you want to maintain the exact formatting as shown, your only option is to store it as text using VARCHAR2 or something similar.
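For the NUMBER option, the conversion to microseconds could be done before the import. The following is a minimal sketch in Python, not part of the original answers. The function name `interval_to_microseconds` is hypothetical, and it assumes a two-part value such as 14:39.5 means minutes and seconds, which may not match how your spreadsheet formats the cell.

```python
def interval_to_microseconds(value: str) -> int:
    """Convert 'HH:MM:SS.fff...' (or 'MM:SS.f') to whole microseconds.

    Digits beyond microsecond precision are truncated, not rounded.
    """
    parts = value.strip().split(":")
    # Assumption: a value with only two parts is minutes:seconds.
    while len(parts) < 3:
        parts.insert(0, "0")
    hours, minutes, seconds = parts

    whole, _, fraction = seconds.partition(".")
    # Pad or cut the fractional part to exactly 6 digits (microseconds).
    micros = int((fraction + "000000")[:6])

    total_seconds = int(hours) * 3600 + int(minutes) * 60 + int(whole)
    return total_seconds * 1_000_000 + micros


print(interval_to_microseconds("00:12:01.615518000"))  # 721615518
print(interval_to_microseconds("14:39.5"))             # 879500000
```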

SQLite returns int instead of real

I'm working on a website in Node.js and using SQLite as a database for the first time.
I want to be able to use REAL for some form data, and I noticed that every REAL value in my database is converted to an integer once the query is made.
To visualize the database I am using DB Browser, and I checked that the columns are defined as REAL, which they are.
If I query a value that is set to 0.1 in my DB, I get this:
sqlite> select step_variable
from variables
where id=38;
0.0
After running TYPEOF(step_variable), as suggested, it returned:
0.0|real
In the SQLite CREATE TABLE command, one defines a data type affinity, not a data type. SQLite supports the following five column affinities: TEXT, NUMERIC, INTEGER, REAL, and NONE (called BLOB in current SQLite documentation).
Thus the data type you specify when creating a table does not enforce a certain data type. You can supply any data type you want or even omit the data type.
CREATE TABLE table1(
column1 ABC,
column2 Others,
column3 WHATEVER);
CREATE TABLE table2(column1, column2, column3);
Populate tables:
INSERT INTO table1 VALUES( 1, 'my text', 123.45);
INSERT INTO table2 VALUES( 1, 'my text', 123.45);
Now let us check what SQLite made out of it:
SELECT column1, TYPEOF(column1) from table1;
SELECT column2, TYPEOF(column2) from table1;
SELECT column3, TYPEOF(column3) from table1;
With:
column     TYPEOF(column)
------------------------
1          integer
my text    text
123.45     real
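The same check can be reproduced from a script. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The `variables` schema is an assumption modeled on the question, not the asker's actual table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arbitrary type names are accepted; they only determine the column affinity.
conn.execute("CREATE TABLE table1(column1 ABC, column2 Others, column3 WHATEVER)")
conn.execute("INSERT INTO table1 VALUES (1, 'my text', 123.45)")

# Hypothetical stand-in for the table in the question, with a REAL column.
conn.execute("CREATE TABLE variables(id INTEGER PRIMARY KEY, step_variable REAL)")
conn.execute("INSERT INTO variables VALUES (38, 0.1)")

# typeof() reports the storage class of each stored value.
storage = conn.execute(
    "SELECT typeof(column1), typeof(column2), typeof(column3) FROM table1"
).fetchone()
step = conn.execute(
    "SELECT step_variable, typeof(step_variable) FROM variables WHERE id = 38"
).fetchone()

print(storage)  # ('integer', 'text', 'real')
print(step)     # (0.1, 'real')

conn.close()
```

A value stored as 0.1 in a REAL column comes back as 0.1, so a result of 0.0 points at the data that was written rather than at the column definition.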
When you go through a query result, e.g. by using sqlite3_step, you can use the sqlite3_column_type function to confirm the column type, unless you already know the result type and simply cast the result to the expected data type.
Martin
I found the solution: I simply hadn't saved my file after modifying it.

org.apache.spark.sql.AnalysisException: while saving Spark Dataframe

I have 2 tables in my database. I am trying to save data from the first table to the second table using insertInto.
CREATE TABLE if not exists dbname.tablename_csv ( id STRING, location STRING, city STRING, country STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE ;
CREATE TABLE if not exists dbname.tablename_orc ( id String,location STRING, country String) PARTITIONED BY (city string) CLUSTERED BY (country) into 4 buckets ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS ORCFILE tblproperties("orc.compress"="SNAPPY");
var query=spark.sql("select id,location,city,country from dbname.tablename_csv")
query.write.insertInto("dbname.tablename_orc")
But it gives this error:
"org.apache.spark.sql.AnalysisException: `dbname`.`tablename_orc` requires that the data to be inserted have the same number of columns as the target table: target table has 3 column(s) but the inserted data has 4 column(s), including 0 partition column(s) having constant value(s).;"
Could someone please give me a hint about what else needs to be added? I also tried adding partitionBy, but got the same error, and it reported that partitionBy is not required.
query.write.partitionBy("city").insertInto("dbname.tablename_orc")
Use saveAsTable(...) with mode = "append" instead.

How to create hive table with date format 'dd-MMM-yyyy'?

I'm trying to create a Hive table for importing CSV data, where the date format in the CSV file is 'dd-MMM-yyyy' (for example 20-Mar-2018). When I created the table in Hive, the entire date column turned into null values. Can anyone suggest how to fix this?
My Query:
create external table new_stock (Symbol String,Series String,Dat date,Prev_Close float,Open_Price float,High_Price float,Low_Price float,Last_Price float,Close_Price float,Avg_Price float,Volume int,Turn_Over float,Trades int,Del_Qty int,DQPQ_Per float) row format delimited fields terminated by ',' stored as textfile LOCATION '/stock_details/'
Finally, with some help from #leftjoin, I solved the problem by converting the string date from the format (dd-MMM-yyyy) to (dd-MM-yyyy) in a select query. It works fine.
select from_unixtime(unix_timestamp(columnname ,'dd-MMM-yyyy'), 'dd-MM-yyyy') from tablename;
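To sanity-check the expected result of that conversion outside Hive, the same reformatting can be sketched in Python. The function name `reformat_date` is hypothetical, and matching month abbreviations such as "Mar" assumes an English locale. Note that Hive's date type itself expects yyyy-MM-dd, so the second call shows that layout as well.

```python
from datetime import datetime


def reformat_date(raw: str, target: str = "%d-%m-%Y") -> str:
    """Parse a 'dd-MMM-yyyy' string (e.g. 20-Mar-2018) and re-emit it."""
    # %b matches the abbreviated month name (locale-dependent).
    return datetime.strptime(raw.strip(), "%d-%b-%Y").strftime(target)


print(reformat_date("20-Mar-2018"))              # 20-03-2018
print(reformat_date("20-Mar-2018", "%Y-%m-%d"))  # 2018-03-20
```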

How to read the ROW FORMAT DELIMITED with SEQUENCEFILE in Spark SQL

I have the following Hive table definition:
CREATE EXTERNAL TABLE english_1grams (
gram string,
year int,
occurrences bigint,
pages bigint,
books bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS SEQUENCEFILE
location 's3://datasets.elasticmapreduce/ngrams/books/20090715/eng-all/1gram/';
From: http://aws.amazon.com/articles/Elastic-MapReduce/5249664154115844
It works just fine in Hive. However, when trying to use it with Spark, it gives an error:
Operation not allowed: ROW FORMAT DELIMITED is only compatible with 'textfile', not 'sequencefile'(line 1, pos 0)
How can I read this table in Spark SQL? I've removed the ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' from the definition but that returns just gibberish instead of the actual data.
