SingleStore 1.0.1 JDBC ResultSetMetaData returns type name VARSTRING while getColumns returns VARCHAR - singlestore

JDBC driver version: 1.0.1
Server version: 7.6
A table is defined as follows:
create table TVCHAR ( RNUM integer not null , CVCHAR varchar(32) null , SHARD KEY ( RNUM ) ) ;
DatabaseMetaData.getColumns returns a type name of VARCHAR(32).
When the query select * from TVCHAR is executed, the ResultSetMetaData returned by the driver describes the column CVCHAR as VARSTRING rather than VARCHAR. I would expect a consistent type name from both result sets.
Example shown using SQLSquirrel.
Any advice?

Try updating your 1.0.1 JDBC driver to a more stable release, or it might be that your varchar(32) data exceeds its limit and the driver therefore interpreted the column as VARSTRING. The driver converts the data type in the result set metadata, usually when something is off.
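For reference, here is a minimal sketch of how the two metadata paths can be compared side by side from plain JDBC; the connection URL and credentials are placeholders, and the table is the TVCHAR table from the question:
import java.sql.*;

public class MetadataCompare {
    public static void main(String[] args) throws SQLException {
        // Placeholder URL/credentials - adjust for your SingleStore cluster
        String url = "jdbc:singlestore://localhost:3306/test";
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            // 1) Catalog metadata: DatabaseMetaData.getColumns
            DatabaseMetaData dbmd = conn.getMetaData();
            try (ResultSet cols = dbmd.getColumns(null, null, "TVCHAR", "CVCHAR")) {
                while (cols.next()) {
                    System.out.println("getColumns TYPE_NAME = " + cols.getString("TYPE_NAME")
                            + " (size " + cols.getInt("COLUMN_SIZE") + ")");
                }
            }
            // 2) Result set metadata from an actual query
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("select * from TVCHAR")) {
                ResultSetMetaData rsmd = rs.getMetaData();
                for (int i = 1; i <= rsmd.getColumnCount(); i++) {
                    // per the question, CVCHAR is reported here as VARSTRING
                    System.out.println(rsmd.getColumnName(i) + " -> " + rsmd.getColumnTypeName(i));
                }
            }
        }
    }
}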

Related

How to convert NVARCHAR from the T-SQL dialect to HiveQL?

I'm doing an ETL task of translating queries from one SQL dialect to another. The old database uses T-SQL, the new one HiveQL.
SELECT CAST(CONCAT(FMH.FLUIDMODELID,'_',RESERVOIR,'_',PRESSUREPSIA) AS NVARCHAR(255)) AS FACT_RRFP_INJ_PRESS_R_PHK
, FMH.FluidModelID ,FMH.FluidModelName ,[AnalysisDate]
FROM dbo.LZ_RRFP_FluidModelInj fmi
LEFT JOIN DBO.LZ_RRFP_FluidModelHeader fmh ON fmi.FluidModelIDFK = fmh.FluidModelID
LEFT JOIN LZ_RRFP_FluidModelAss fma on fma.InjectionFluidModelIDFK = fmi.FluidModelIDFK
WHERE FMA.RESERVOIR IN (SELECT RESERVOIR_CD FROM ATT_RESERVOIR)
The error is:
org.apache.spark.sql.catalyst.parser.ParseException:
DataType nvarchar(255) is not supported.
How do I convert the NVARCHAR?
Hive uses UTF-8 in STRING and VARCHAR columns, so you are fine using VARCHAR or STRING instead of NVARCHAR.
VARCHAR in Hive is the same as STRING plus length validation. As @NickW mentioned in the comments, you can drop the CAST entirely: if you insert the result into a table column defined as VARCHAR(255), it behaves the same without the CAST.
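Since the error comes from Spark's SQL parser, here is a minimal sketch of the rewritten cast run through a SparkSession; the session setup is an assumption, the table names come from the question:
import org.apache.spark.sql.SparkSession;

public class NvarcharRewrite {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("nvarchar-rewrite")
                .enableHiveSupport()   // assumes the LZ_RRFP_* tables are Hive tables
                .getOrCreate();

        // NVARCHAR(255) is not a Hive/Spark type; cast to STRING (or VARCHAR(255)) instead.
        // T-SQL artifacts are also dropped: the dbo. schema prefix and [bracket] quoting are not valid HiveQL.
        spark.sql(
            "SELECT CAST(CONCAT(FMH.FLUIDMODELID, '_', RESERVOIR, '_', PRESSUREPSIA) AS STRING) " +
            "       AS FACT_RRFP_INJ_PRESS_R_PHK, " +
            "       FMH.FluidModelID, FMH.FluidModelName, AnalysisDate " +
            "FROM LZ_RRFP_FluidModelInj fmi " +
            "LEFT JOIN LZ_RRFP_FluidModelHeader fmh ON fmi.FluidModelIDFK = fmh.FluidModelID " +
            "LEFT JOIN LZ_RRFP_FluidModelAss fma ON fma.InjectionFluidModelIDFK = fmi.FluidModelIDFK " +
            "WHERE fma.RESERVOIR IN (SELECT RESERVOIR_CD FROM ATT_RESERVOIR)"
        ).show();
    }
}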

Type 'INTERVAL' is not supported in Spark SQL 2.4.3 - what is the workaround?

EDIT: Apparently Spark 2.4.3 does not support INTERVAL. I cannot upgrade to Spark 3.0.0 for the time being (admin policy). Is there a workaround or an alternative approach to INTERVAL at the moment? Thanks
I am running a query on Spark SQL in Databricks, and the query fails on the INTERVAL line. I am trying to left-join the table to itself on the same user ID with a one-month offset.
Error in SQL statement: ParseException:
Literals of type 'INTERVAL' are currently not supported.
Does Spark SQL not support intervals?
Here is my try:
%sql
;WITH act_months AS (
SELECT DISTINCT
DATE_TRUNC('month', data_date) ::DATE AS act_month,
user_id
FROM user_sessions)
SELECT
prev.act_month,
prev.user_id,
curr.user_id IS NULL AS churned_next_month
FROM act_months AS prev
LEFT JOIN act_months AS curr
ON prev.user_id = curr.user_id
AND prev.act_month = (curr.act_month - INTERVAL '1 MONTH')
ORDER BY prev.act_month ASC, prev.user_id ASC;
here is my data structure
+----------+----------+
| data_date| user_id|
+----------+----------+
|2020-01-01|22600560aa|
|2020-01-01|17148900ab|
|2020-01-01|21900230aa|
|2020-01-01|35900050ac|
|2020-01-01|22300280ad|
|2020-01-02|19702160ac|
|2020-02-02|17900020aa|
|2020-02-02|16900120aa|
|2020-02-02|11160900aa|
|2020-03-02|16900290aa|
+----------+----------+
(Disclaimer: I am not a Spark user - and this is me reposting my comment as an answer):
From my reading of Spark's documentation, INTERVAL is only supported by Spark 3.0.0 or later.
You said you're running Spark 2.4.3, so INTERVAL is not supported in your system.
However, you can use ADD_MONTHS (and DATE_ADD), which are supported since (at least) Spark 2.3.0.
Try this:
;WITH q AS (
SELECT
DISTINCT
DATE_TRUNC( 'month', data_date ) AS act_year_month, -- in Spark SQL, DATE_TRUNC( fmt, ts ) takes the unit first; truncating to 'month' keeps the year and month and resets the day/time components.
user_id
FROM
user_sessions
)
SELECT
prev.act_year_month,
prev.user_id,
( curr.user_id IS NULL ) AS churned_next_month
FROM
q AS prev
LEFT JOIN q AS curr ON
prev.user_id = curr.user_id
AND
prev.act_year_month = ADD_MONTHS( curr.act_year_month, -1 )
ORDER BY
prev.act_year_month,
prev.user_id;
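As a quick sanity check of the two functions on Spark 2.4, a small sketch, assuming an existing SparkSession named spark:
// Sanity-check DATE_TRUNC and ADD_MONTHS before using them in the full query
spark.sql(
    "SELECT date_trunc('month', to_date('2020-03-15')) AS month_start, " +
    "       add_months(to_date('2020-03-15'), -1)      AS one_month_back"
).show();
// expected: month_start = 2020-03-01 00:00:00, one_month_back = 2020-02-15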

Azure HDInsight cluster with HBase + Phoenix using a local index

We have an HDInsight cluster running HBase (Ambari).
We have created a table using Phoenix:
CREATE TABLE IF NOT EXISTS Results (
    Col1 VARCHAR(255) NOT NULL,
    Col2 INTEGER NOT NULL,
    Col3 INTEGER NOT NULL,
    Destination VARCHAR(255) NOT NULL
    CONSTRAINT pk PRIMARY KEY (Col1, Col2, Col3)
) IMMUTABLE_ROWS=true
We have loaded some data into this table (using some Java code).
Later, we decided we wanted to create a local index on the destination column, as follows:
CREATE LOCAL INDEX DESTINATION_IDX ON RESULTS (destination) ASYNC
We have run the index tool to fill the index as follows
hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table
RESULTS --index-table DESTINATION_IDX --output-path
DESTINATION_IDX_HFILES
When we run queries and filter on the destination column, everything is OK. For example:
select /*+ NO_CACHE, SKIP_SCAN */ COL1, COL2, COL3, DESTINATION from
Results where COL1='data' AND DESTINATION='some value';
But if we do not use DESTINATION in the WHERE clause, we get a NullPointerException in BaseResultIterators.class
(from phoenix-core-4.7.0-HBase-1.1.jar)
This exception is thrown only when we use the new local index. If we query ignoring the index, like this:
select /*+ NO_CACHE, SKIP_SCAN, NO_INDEX */ COL1, COL2, COL3, DESTINATION from
Results where COL1='data' AND DESTINATION='some value';
we do not get the exception.
Here is some relevant code from the area where we get the exception:
...
catch (StaleRegionBoundaryCacheException e2) {
// Catch only to try to recover from region boundary cache being out of date
if (!clearedCache) { // Clear cache once so that we rejigger job based on new boundaries
services.clearTableRegionCache(physicalTableName);
context.getOverallQueryMetrics().cacheRefreshedDueToSplits();
}
// Resubmit just this portion of work again
Scan oldScan = scanPair.getFirst();
byte[] startKey = oldScan.getAttribute(SCAN_ACTUAL_START_ROW);
byte[] endKey = oldScan.getStopRow();
// ==================== Note: isLocalIndex is true here ====================
if (isLocalIndex) {
endKey = oldScan.getAttribute(EXPECTED_UPPER_REGION_KEY);
//endKey is null for some reason in this point and the next function
//will fail inside it with NPE
}
List<List<Scan>> newNestedScans = this.getParallelScans(startKey, endKey);
We must use this version of the jar since we run inside Azure HDInsight and cannot select a newer jar version.
Any ideas how to solve this?
What does "recover from region boundary cache being out of date" mean? It seems to be related to the problem.
It appears that the Phoenix core version shipped with Azure HDInsight (phoenix-core-4.7.0.2.6.5.3004-13.jar) has the bug, but with a slightly newer version (phoenix-core-4.7.0.2.6.5.8-2.jar, from http://nexus-private.hortonworks.com:8081/nexus/content/repositories/hwxreleases/org/apache/phoenix/phoenix-core/4.7.0.2.6.5.8-2/) we no longer see the bug.
Note that it is not possible to move to a much newer version such as 4.8.0, since in that case the server throws a version-mismatch error.

Cannot query based on TimeUUID in Spark SQL

I am trying to query a Cassandra database using the Spark SQL terminal.
Query:
select * from keyspace.tablename
where user_id = e3a119e0-8744-11e5-a557-e789fe3b4cc1;
Error: java.lang.RuntimeException: [1.88] failure: ``union'' expected but identifier e5 found
I also tried:
user_id = UUID.fromString(`e3a119e0-8744-11e5-a557-e789fe3b4cc1`)
user_id = 'e3a119e0-8744-11e5-a557-e789fe3b4cc1'
token(user_id) = token(`e3a119e0-8744-11e5-a557-e789fe3b4cc1`)
I am not sure how I can query data on a timeuuid column.
TimeUUIDs are not supported as a type in Spark SQL, so you can only do direct string comparisons. Represent the TIMEUUID as a string:
select * from keyspace.tablename where user_id = "e3a119e0-8744-11e5-a557-e789fe3b4cc1"
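A minimal sketch of the same string comparison issued from code; the SparkSession named spark and the assumption that keyspace.tablename is already registered (for example via the spark-cassandra-connector) are mine, not the original poster's:
// The TimeUUID is compared as plain text, so it is quoted like any other string literal
spark.sql(
        "select * from keyspace.tablename " +
        "where user_id = 'e3a119e0-8744-11e5-a557-e789fe3b4cc1'"
).show();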

How to insert timeuuid into Cassandra using the DataStax Java driver OR Invalid version for TimeUUID

I have one column in the Cassandra keyspace that is of type timeuuid. When I try to insert a record from Java code (using DataStax Java driver 1.0.3), I get the following exception:
com.datastax.driver.core.exceptions.InvalidQueryException: Invalid version for TimeUUID type.
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:269)
at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:183)
at com.datastax.driver.core.Session.execute(Session.java:111)
Here is my sample code:
PreparedStatement statement = session.prepare("INSERT INTO keyspace.table " +
    "(id, subscriber_id, transaction_id) VALUES (now(), ?, ?);");
BoundStatement boundStatement = new BoundStatement(statement);
session.execute(boundStatement.bind(UUID.fromString(requestData.getsubscriberId()),
    requestData.getTxnId()));
I have also tried using UUIDs.timeBased() instead of now(), but I am getting the same exception.
Any help on how to insert into / read from the timeuuid data type would be appreciated.
By mistake I had created the column as
id uuid
That is why, when I tried to insert a timeuuid into the uuid-typed field, I was getting that exception.
Now I have changed the type of id to timeuuid and everything is working fine.
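For completeness, a minimal sketch of the corrected flow with the driver API used in the question; the contact point, keyspace, table, and column types other than id are assumptions:
import java.util.UUID;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.utils.UUIDs;

public class TimeUuidInsert {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");   // hypothetical keyspace

        // Schema assumed for this sketch - note id is timeuuid, not uuid:
        //   CREATE TABLE my_table (id timeuuid PRIMARY KEY, subscriber_id uuid, transaction_id text);
        PreparedStatement statement = session.prepare(
                "INSERT INTO my_table (id, subscriber_id, transaction_id) VALUES (?, ?, ?)");

        BoundStatement bound = new BoundStatement(statement);
        session.execute(bound.bind(
                UUIDs.timeBased(),                                       // client-generated TimeUUID
                UUID.fromString("e3a119e0-8744-11e5-a557-e789fe3b4cc1"), // subscriber_id (uuid)
                "txn-42"));                                              // transaction_id (assumed text)

        // shut down when done: cluster.shutdown() on driver 1.x, cluster.close() on 2.x+
    }
}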
