Load connecting tables from Cassandra in QlikView with DataSatx ODBC - cassandra

I am new to both Cassandra (2.0) and QlikView (11).
I have two keyspaces (tables) with large amount of data in Cassandra and I want to load them to QlikView.
Since I can not load the entire set, filtering is necessary.
// In QlikView's edit script
ODBC CONNECT TO [DataStax Cassandra ODBC DSN64];
LOAD idsession,
logintime,
"h_id" as hid;
SQL SELECT *
FROM Cassandra.test.sessions
WHERE logintime > '2015-06-09'
ALLOW FILTERING;
LOAD idhost,
site;
SQL SELECT *
FROM Cassandra.test.hosts
WHERE idhost in hid;
The second query does not work, error from qlikview line 3:16 no viable alternative at input 'hid'.
My question: is it possible to get the h_ids from the first query and only collect the corresponding entities from the second table?
I assume that you can't do an Exists in the DataSyntax ODBC which may help. DataStax doc
This could be done with an external program like (C#) but I really want to do this in QlikView's script file:
// Not complete code
query = select * from sessions where loginTime > '2015-06-09';
foreach (var id in query) {
query2 = "select * from hosts where idhost = " + i;
}
EDIT
This can be solved when loading XML files:
TableA:
LOAD id,
itema
FROM
[C:\test1data.xlsx]
(ooxml, embedded labels);
TableB:
LOAD idb,
itemb,
ida
FROM
[C:\test2data.xlsx]
(ooxml, embedded labels) where(Exists (id,ida));
EDIT2
Besides the great answer from #i_saw_drones another solutions is to loop through ids.
For i = 1 to NoOfRows('Sessions')
Let cur_id = Peek('hid',i - 1,'Sessions');
LOAD
idhost,
site;
SQL SELECT *
FROM Cassandra.test.hosts
WHERE idhost = $(cur_id);
NEXT i
Nevertheless was the performance not the great. It took about 30 minutes to load around 300 K lines from Cassandra. The same queries were tested in a C# program with the connector and it took 9 sec. But that was just the query. Then you should write it to XML and then load it to QlikView.

The reason that the second query fails is because the WHERE clause is expecting to find a literal string list of values to look "in". For example:
LOAD
idhost,
site;
SQL SELECT *
FROM Cassandra.test.hosts
WHERE idhost in ('ID1', 'ID2', 'ID3', 'ID4');
The hid field returned by the first query is a QlikView list and as such cannot be immediately coerced into a string. We have to do a little more scripting to obtain a list of values from the first query in literal form, and then add that to the second query as part of the WHERE clause. The easiest way to do this is to concatenate all of your hids into a string and then use the string as part of your WHERE IN clause.
ODBC CONNECT TO [DataStax Cassandra ODBC DSN64];
MyData:
LOAD
idsession,
logintime,
"h_id" as hid;
SQL SELECT *
FROM Cassandra.test.sessions
WHERE logintime > '2015-06-09'
ALLOW FILTERING;
hid_entries:
LOAD
chr(39) & hids & chr(39) as hids;
LOAD
concat(hid, chr(39) & ',' & chr(39)) as hids;
LOAD DISTINCT
hid
RESIDENT MyData;
LET hid_values = '(' & peek('hids',0,'hid_entries') & ')';
DROP TABLE hid_entries;
LOAD
idhost,
site;
SQL SELECT *
FROM Cassandra.test.hosts
WHERE idhost in $(hid_values);

Related

psycopg2 SELECT query with inbuilt functions

I have the following SQL statement where i am reading the database to get the records for 1 day. Here is what i tried in pgAdmin console -
SELECT * FROM public.orders WHERE createdat >= now()::date AND type='t_order'
I want to convert this to the syntax of psycopg2but somehow it throws me errors -
Database connection failed due to invalid input syntax for type timestamp: "now()::date"
Here is what i am doing -
query = f"SELECT * FROM {table} WHERE (createdat>=%s AND type=%s)"
cur.execute(query, ("now()::date", "t_order"))
records = cur.fetchall()
Any help is deeply appreciated.
DO NOT use f strings. Use proper Parameter Passing
now()::date is better expressed as current_date. See Current Date/Time.
You want:
query = "SELECT * FROM public.orders WHERE (createdat>=current_date AND type=%s)"
cur.execute(query, ["t_order"])
If you want dynamic identifiers, table/column names then:
from psycopg2 import sql
query = sql.SQL("SELECT * FROM {} WHERE (createdat>=current_date AND type=%s)").format(sql.Identifier(table))
cur.execute(query, ["t_order"])
For more information see sql.

Spark 3.0 JdbcRDD Java - Issue in specifying lowerBound and upperBound for views with no ID column

I am using Spark 3.0, in my Java program I am querying data from views which are in Oracle DB. I used the Java API JdbcRDD to query the views.
The problem I have is that the view doesn't contain any ID or timestamp columns. So, I am unable to construct my SQL query with lowerBound and upperBound values.
Please find below the example query I need to run in Spark. Here exp_stg.usr and exp_stg.prtcpnt are the two views exposed to me.
"SELECT a.participant,
a.desc,
b.firstname,
b.lastname,
b.dept,
b.telno,
b.emailaddr
FROM usr_stg.prtcpnt a
LEFT OUTER JOIN usr_stg.usr b
ON a.participant = b.participant
WHERE a.class = 'SpSession' "
I tried using temp tables in spark and join, but the query performance is bad as there are around ~13,000,000 rows in each view. Hence I tried to use the join query in the Oracle DB.
I was able to overcome the constraint using ROWNUM in the query. Using ROWNUM as the lowerBound and upperBound I am now able to get the data using JdbcRDD.
`
SELECT ROWNUM as id, a.participant,
a.desc,
b.firstname,
b.lastname,
b.dept,
b.telno,
b.emailaddr
FROM usr_stg.prtcpnt a
LEFT OUTER JOIN usr_stg.usr b
ON a.participant = b.participant
WHERE a.class = 'SpSession' and ?<=ROWNUM and ROWNUM<=?"`

UcanAccess retrieve stored query sql

I'm trying to retrieve the SQL that makes up a stored query inside an Access database.
I'm using a combination of UcanAccess 4.0.2, and jaydebeapi and the ucanaccess console. The ultimate goal is to be able to do the following from a python script with no user intervention.
When UCanAccess loads, it successfully loads the query:
Please, enter the full path to the access file (.mdb or .accdb): /Users/.../SnohomishRiverEstuaryHydrology_RAW.accdb
Loaded Tables:
Sensor Data, Sensor Details, Site Details
Loaded Queries:
Jeff_Test
Loaded Procedures:
Loaded Indexes:
Primary Key on Sensor Data Columns: (ID)
, Primary Key on Sensor Details Columns: (ID)
, Primary Key on Site Details Columns: (ID)
, Index on Sensor Details Columns: (SiteID)
, Index on Site Details Columns: (SiteID)
UCanAccess>
When I run, from the UCanAccess console a query like
SELECT * FROM JEFF_TEST;
I get the expected results of the query.
I tried things including this monstrous query from inside a python script even using the sysSchema=True option (from here: http://www.sqlquery.com/Microsoft_Access_useful_queries.html):
SELECT DISTINCT MSysObjects.Name,
IIf([Flags]=0,"Select",IIf([Flags]=16,"Crosstab",IIf([Flags]=32,"Delete",IIf
([Flags]=48,"Update",IIf([flags]=64,"Append",IIf([flags]=128,"Union",
[Flags])))))) AS Type
FROM MSysObjects INNER JOIN MSysQueries ON MSysObjects.Id =
MSysQueries.ObjectId;
But get an object not found or insufficient privileges error.
At this point, I've tried mdbtools and can successfully retrieve metadata, and data from access. I just need to get the queries out too.
If anyone can point me in the right direction, I'd appreciate it. Windows is not a viable option.
Cheers, Seth
***********************************
* SOLUTION
***********************************
from jpype import *
startJVM(getDefaultJVMPath(), "-ea", "-Djava.class.path=/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/ucanaccess-4.0.2.jar:/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/lib/commons-lang-2.6.jar:/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/lib/commons-logging-1.1.1.jar:/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/lib/hsqldb.jar:/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/lib/jackcess-2.1.6.jar")
conn = java.sql.DriverManager.getConnection("jdbc:ucanaccess:///Users/seth.urion/PycharmProjects/pyAccess/FE_Hall_2010_2016_SnohomishRiverEstuaryHydrology_RAW.accdb")
for query in conn.getDbIO().getQueries():
print(query.getName())
print(query.toSQLString())
If you can find a satisfactory way to call Java methods from within Python then you could use the Jackcess Query#toSQLString() method to extract the SQL for a saved query. For example, I just got this to work under Jython:
from java.sql import DriverManager
def get_query_sql(conn, query_name):
sql = ''
for query in conn.getDbIO().getQueries():
if query.getName() == query_name:
sql = query.toSQLString()
break
return sql
# usage example
if __name__ == '__main__':
conn = DriverManager.getConnection("jdbc:ucanaccess:///home/gord/UCanAccessTest.accdb")
query_name = 'Jeff_Test'
query_sql = get_query_sql(conn, query_name)
if query_sql == '':
print '(Query not found.)'
else:
print 'SQL for query [%s]:' % (query_name)
print
print query_sql
conn.close()
producing
SQL for query [Jeff_Test]:
SELECT Invoice.InvoiceNumber, Invoice.InvoiceDate
FROM Invoice
WHERE (((Invoice.InvoiceNumber)>1));

python oracle where clause containing date greater than comparison

I am trying to use cx_Oracle to query a table in oracle DB (version 11.2) and get rows with values in a column between a datetime range.
I have tried the following approaches:
Tried between clause as described here, but cursor gets 0 rows
parameters = (startDateTime, endDateTime)
query = "select * from employee where joining_date between :1 and :2"
cur = con.cursor()
cur.execute(query, parameters)
Tried the TO_DATE() function and Date'' qualifiers. Still no result for Between or >= operator. Noteworthy is that < operator works. I also got the same query and tried in a sql client, and the query returns results. Code:
#returns no rows:
query = "select * from employee where joining_date >= TO_DATE('" + startDateTime.strftime("%Y-%m-%d") + "','yyyy-mm-dd')"
cur = con.cursor()
cur.execute(query)
#tried following just to ensure that some query runs fine, it returns results:
query = query.replace(">=", "<")
cur.execute(query)
Any pointers about why the between and >= operators are failing for me? (my second approach was in line with the answer in Oracle date comparison in where clause but still doesn't work for me)
I am using python 3.4.3 and used cx_Oracle 5.3 and 5.2 with oracle client 11g on windows 7 machine
Assume that your employee table contains the field emp_id and the row with emp_id=1234567 should be retrieved by your query.
Make two copies of your a program that execute the following queries
query = "select to_char(:1,'YYYY-MM-DD HH24:MI:SS')||' >= '||to_char(joining_date,'YYYY-MM-DD HH24:MI:SS')||' >= '||to_char(:2,'YYYY-MM-DD HH24:MI:SS') resultstring from employee where emp_id=1234567"
and
query="select to_char(joining_date,'YYYY-MM-DD HH24:MI:SS')||' >= '||to_char(TO_DATE('" + startDateTime.strftime("%Y-%m-%d") + "','yyyy-mm-dd'),'YYYY-MM-DD HH24:MI:SS') resultstring from employee where emp_id=1234567"
Show us the code and the value of the column resultstring
You are constructing SQL queries as strings when you should be using parameterized queries. You can't use parameterization to substitute the comparison operators, but you should use it for the dates.
Also, note that the referenced answer uses the PostgreSQL parameterisation format, whereas Oracle requires you to use the ":name" format.

ServiceStack.OrmLite Using Limit in SQL.In filter

I have a parent/child table setup - Items/ItemDetails. This part works:
var q = db.From<Item>(); //various where clauses based on request
items = db.Select<Item>(q);
q = q.Select(a => a.ITEM_NO);
itemDetails = db.Select<ItemDetail>(x => Sql.In(x.ITEM_NO, q));
Trying to add paging to improve the performance of this request for large data sets, I'm having trouble getting the .Limit(skip, rows) function to work in the SQL.In statement of the child table.
var q = db.From<Item>().Limit(skip, rows);
items = db.Select<Item>(q);
q = q.Select(a => a.ITEM_NO);
itemDetails = db.Select<ItemDetail>(x => Sql.In(x.ITEM_NO, q));
It works when limiting the results in the first select, but when used in the child data pull I get "Only one expression can be specified in the select list when the subquery is not introduced with EXISTS."
The SQL that comes out changes the where subquery to:
WHERE "ITEM_NO" IN (SELECT * FROM (SELECT "ITEM_NO", ROW_NUMBER() OVER
(ORDER BY "ITEM"."ITEM_NO") As RowNum
FROM "ITEM") AS RowConstrainedResult WHERE RowNum > 5 AND RowNum <= 15)
I understand the SQL error is because I am selecting more than one column in the IN clause. Is there a better way to write this to avoid the error?
Thanks
If you're using SQL Server 2012 or later you should use SqlServer2012Dialect.Provider, e.g:
container.Register<IDbConnectionFactory>(c =>
new OrmLiteConnectionFactory(connString, SqlServer2012Dialect.Provider));
Which lets OrmLite use the paging support added in SQL Server 2012 instead of resorting to use the windowing function hack required to implement paging for earlier versions of SQL Server.

Resources