I am trying to create an external table using Hive. Below is the Hive query I ran:
create external table trips_raw
(
VendorID int,
tpep_pickup_datetime timestamp,
tpep_dropoff_datetime timestamp
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' location '/user/taxi_trips/';
When I looked at the output of the 'trips_raw' table created by the query above, I saw that both the 'tpep_pickup_datetime' and 'tpep_dropoff_datetime' columns are NULL in all rows. I have seen other threads saying the reason is that a timestamp format like '1/1/2018 11:13:00 AM' is not accepted by Hive, but the problem is that this is exactly the timestamp format in my CSV source data.
I could declare those two columns as 'string', and Hive would then render them correctly, but I want those two columns to be of 'timestamp' type, so declaring them as 'string' is not a viable option here.
I also tried the following technique, based on a recommendation from this site (https://community.hortonworks.com/questions/55266/hive-date-time-problem.html), but had no success:
Step 1: Create the 'trips_raw' table using 'string' as the type for the two timestamp columns. This allows the resulting table to render the timestamps correctly, albeit as 'string'. The Hive command I used is shown below:
create external table trips_raw
(
VendorID int,
tpep_pickup_datetime string,
tpep_dropoff_datetime string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' location
'/user/taxi_trips/';
When I look at the resulting table, the dates are rendered correctly, but as strings.
But as mentioned earlier, I want the time columns to be of timestamp type, not string type. Therefore, in the next two steps I tried to create a blank table and then insert into it the data from the table created in Step 1, converting the strings to timestamps this time.
Step 2: Create a blank external table called 'trips_not_raw' using the following Hive command:
create external table trips_not_raw
(VendorID int,
tpep_pickup_datetime timestamp,
tpep_dropoff_datetime timestamp
);
Step 3: Insert data from the 'trips_raw' table (mentioned earlier in this question) using the Hive command below:
insert into table trips_not_raw
select vendorid,
  from_unixtime(unix_timestamp(tpep_pickup_datetime, 'MM/dd/yyyy HH:mm:ss aa')) as tpep_pickup_datetime,
  from_unixtime(unix_timestamp(tpep_dropoff_datetime, 'MM/dd/yyyy HH:mm:ss aa')) as tpep_dropoff_datetime
from trips_raw;
Doing this inserts the rows into the blank table 'trips_not_raw', but the two timestamp columns still show NULL in every row.
Is there a simple way to store the two time columns as 'timestamp' rather than 'string', but still have them render correctly in the output without showing NULL/None?
I'm afraid you need to parse the timestamp column yourself and then cast the string as a timestamp. For example,
select cast(regexp_replace('1/1/2018 11:13:00 AM', '(\\d{1,2})/(\\d{1,2})/(\\d{4})\\s(\\d{2}:\\d{2}:\\d{2}) \\w{2}', '$3-$1-$2 $4') as timestamp)
You can create and use a macro function for convenience, e.g.,
create temporary macro parse_date (ts string)
cast(regexp_replace(ts, '(\\d{1,2})/(\\d{1,2})/(\\d{4})\\s(\\d{2}:\\d{2}:\\d{2}) \\w{2}', '$3-$1-$2 $4') as timestamp);
then use it as follows
select parse_date('1/1/2018 11:13:00 AM');
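If you prefer to avoid the regex, the same conversion can also be written with unix_timestamp. This is a sketch, assuming Hive's unix_timestamp takes Java SimpleDateFormat patterns, where 'hh' is the 12-hour clock and 'a' the AM/PM marker (the attempt in the question used 'HH', the 24-hour pattern, alongside an AM/PM marker):

-- Parse '1/1/2018 11:13:00 AM' with a 12-hour SimpleDateFormat pattern,
-- then cast the canonical yyyy-MM-dd HH:mm:ss string to a timestamp.
select cast(from_unixtime(unix_timestamp('1/1/2018 11:13:00 AM', 'M/d/yyyy hh:mm:ss a')) as timestamp);

Unlike the regexp_replace version, this keeps the AM/PM information, so a value like '1/1/2018 11:13:00 PM' comes out as 23:13:00 rather than 11:13:00.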
Related
I have an Excel file which I am trying to import into an Oracle database table.
Some of the values in the Excel file are time values containing colons, for example 14:39.5. What data type should I give the column in the Oracle table to store this value?
Currently I have given it the VARCHAR data type, and the import throws this error:
Conversion error! Value: "00:12:01.615518000" to data type: "Number". Row ignored! Value is '00:12:01.615518000'. Cannot be converted to a decimal number object. Valid format: 'Unformatted'
You can store it as an INTERVAL DAY(0) TO SECOND(9) data type:
CREATE TABLE table_name (
time INTERVAL DAY(0) TO SECOND(9)
);
Then you can use TO_DSINTERVAL passing your value with '0 ' prepended to the start:
INSERT INTO table_name (time)
VALUES ( TO_DSINTERVAL('0 ' || '00:12:01.615518000') );
If it is part of a date/time stamp then you could store it as a DATE or TIMESTAMP, provided you can add the date component; Oracle doesn't have a time-only data type.
If you can't add a date component to it, then, assuming it is a time interval, you could convert it to seconds or microseconds (losing the colons) and store it as a NUMBER, as sketched below.
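For example, a sketch of that conversion, reusing the TO_DSINTERVAL trick above (and assuming the values always have the form HH:MI:SS.FF9):

-- Convert '00:12:01.615518000' into a plain NUMBER of seconds.
SELECT EXTRACT(HOUR FROM i) * 3600
     + EXTRACT(MINUTE FROM i) * 60
     + EXTRACT(SECOND FROM i) AS seconds
FROM (SELECT TO_DSINTERVAL('0 ' || '00:12:01.615518000') AS i FROM dual);
-- => 721.615518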
If you want to maintain the exact formatting as shown, your only option is to store it as text using VARCHAR2 or something similar.
I'm working on a website in Node.js and am using SQLite as a database for the first time.
I want to be able to use REAL for some form data, and I noticed that every REAL in my database gets converted to an integer once the query is made.
To visualize the database I am using DB Browser, and I checked that the columns are defined as REAL, which they are.
If I query a value stored as 0.1 in my DB I get this:
sqlite> select step_variable
from variables
where id=38;
0.0
After trying the suggested TYPEOF(step_variable), it returned:
0.0|real
In the SQLite CREATE TABLE command, one defines a data type affinity, not a data type. SQLite supports the following five column affinities: TEXT, NUMERIC, INTEGER, REAL, NONE.
Thus the data type you specify when creating a table does not enforce a certain data type. You can supply any type name you want, or even omit it entirely:
CREATE TABLE table1(
column1 ABC,
column2 Others,
column3 WHATEVER);
CREATE TABLE table2(column1, column2, column3);
Populate tables:
INSERT INTO table1 VALUES( 1, 'my text', 123.45);
INSERT INTO table2 VALUES( 1, 'my text', 123.45);
Now let us check what SQLite made out of it:
SELECT column1, TYPEOF(column1) FROM table1;
SELECT column2, TYPEOF(column2) FROM table1;
SELECT column3, TYPEOF(column3) FROM table1;
This returns:

column      TYPEOF(column)
----------  --------------
1           integer
my text     text
123.45      real
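Applied to the question's setup, a minimal sketch (hypothetical table name, mirroring the 'step_variable' column):

-- REAL affinity coerces numeric-looking input to 8-byte floats on insert.
CREATE TABLE variables_demo(step_variable REAL);
INSERT INTO variables_demo VALUES ('0.1');  -- text input, coerced to real 0.1
INSERT INTO variables_demo VALUES (1);      -- integer input, stored as 1.0
SELECT step_variable, TYPEOF(step_variable) FROM variables_demo;
-- 0.1|real
-- 1.0|real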
When you step through a query result, e.g. with sqlite3_step, you can use the sqlite3_column_type function to confirm the column type, unless you know the result anyway and simply cast it to the expected data type.
I found the solution: it was simply that I didn't save my file after modifying it.
I have two tables in my database. I am trying to save data from the first table to the second table using insertInto.
CREATE TABLE IF NOT EXISTS dbname.tablename_csv (
  id STRING,
  location STRING,
  city STRING,
  country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
CREATE TABLE IF NOT EXISTS dbname.tablename_orc (
  id STRING,
  location STRING,
  country STRING)
PARTITIONED BY (city STRING)
CLUSTERED BY (country) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");
val query = spark.sql("select id, location, city, country from dbname.tablename_csv")
query.write.insertInto("dbname.tablename_orc")
but it gives this error:
"org.apache.spark.sql.AnalysisException: `dbname`.`tablename_orc` requires that the data to be inserted have the same number of columns as the target table: target table has 3 column(s) but the inserted data has 4 column(s), including 0 partition column(s) having constant value(s).;"
Please can someone give me a hint about what else I need to add? I also tried adding partitionBy, but I got the same error, and it said partitionBy is not required:
query.write.partitionBy("city").insertInto("dbname.tablename_orc")
Use saveAsTable(...) with mode = "append" instead of insertInto; saveAsTable resolves columns by name, whereas insertInto matches them by position.
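Alternatively, the same load can be written in plain Hive SQL rather than through the DataFrame API. A sketch, assuming dynamic partitioning is enabled; note the partition column (city) has to come last in the SELECT:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE dbname.tablename_orc PARTITION (city)
SELECT id, location, country, city
FROM dbname.tablename_csv;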
I'm trying to create a Hive table for importing CSV data where the date format in the CSV file is 'dd-MMM-yyyy' (for example 20-Mar-2018). When I created the table in Hive, the entire date column turned into NULL values. Can anyone suggest how to fix this?
My Query:
create external table new_stock (
  Symbol String,
  Series String,
  Dat date,
  Prev_Close float,
  Open_Price float,
  High_Price float,
  Low_Price float,
  Last_Price float,
  Close_Price float,
  Avg_Price float,
  Volume int,
  Turn_Over float,
  Trades int,
  Del_Qty int,
  DQPQ_Per float)
row format delimited fields terminated by ','
stored as textfile
LOCATION '/stock_details/';
Finally, with some help from @leftjoin, I solved the problem of converting string dates in dd-MMM-yyyy format to dd-MM-yyyy by using a select query. It works fine:
select from_unixtime(unix_timestamp(columnname, 'dd-MMM-yyyy'), 'dd-MM-yyyy') from tablename;
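Note this returns a reformatted string, not a date. If an actual date-typed column is the goal, a sketch along the same lines (assuming the Dat column is declared as string so the raw dd-MMM-yyyy text loads in the first place):

-- Convert to Hive's canonical yyyy-MM-dd form, then cast to date.
select cast(from_unixtime(unix_timestamp(Dat, 'dd-MMM-yyyy'), 'yyyy-MM-dd') as date)
from new_stock;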
I have the following Hive table definition:
CREATE EXTERNAL TABLE english_1grams (
gram string,
year int,
occurrences bigint,
pages bigint,
books bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS SEQUENCEFILE
location 's3://datasets.elasticmapreduce/ngrams/books/20090715/eng-all/1gram/';
From: http://aws.amazon.com/articles/Elastic-MapReduce/5249664154115844
It works just fine in Hive. However, when trying to use it with Spark, it gives an error:
Operation not allowed: ROW FORMAT DELIMITED is only compatible with 'textfile', not 'sequencefile'(line 1, pos 0)
How can I read this table in Spark SQL? I've removed the ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' from the definition but that returns just gibberish instead of the actual data.