I'm running a bunch of prepared statements against Cassandra 2.0.8 from a Python script. I use the python-cassandra driver 2.1.4 under Python 3, installed via pip. The code looks like this:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

auth_provider = PlainTextAuthProvider(username='xxx', password='xxx')
cluster = Cluster(['xxx'], auth_provider=auth_provider, control_connection_timeout=None)
session = cluster.connect()
session.default_timeout = None
session.set_keyspace(sphere)
select_query = session.prepare("SELECT tag FROM r WHERE r_id=?")
for id in ids:
    res = session.execute(select_query, (int(id),))
    if not res:
        complain()  # pseudocode: report the missing row
These ids came from Cassandra in the first place, and I checked every case where the query returned nothing with cqlsh against the database: they are all there! Can you suggest what might be wrong? This used to work before!
By the way, I have another prepared statement in the same session that updates this table. Could this be a problem?
When my application has been running for a long time, everything works well. But when I change a column's type from int to text (by dropping and recreating the table), I get an exception:
com.datastax.oss.driver.api.core.type.codec.CodecNotFoundException: Codec not found for requested operation: [INT <-> java.lang.String]
at com.datastax.oss.driver.internal.core.type.codec.registry.CachingCodecRegistry.createCodec(CachingCodecRegistry.java:609)
at com.datastax.oss.driver.internal.core.type.codec.registry.DefaultCodecRegistry$1.load(DefaultCodecRegistry.java:95)
at com.datastax.oss.driver.internal.core.type.codec.registry.DefaultCodecRegistry$1.load(DefaultCodecRegistry.java:92)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2276)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache.get(LocalCache.java:3951)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache.getOrLoad(LocalCache.java:3973)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4957)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4963)
at com.datastax.oss.driver.internal.core.type.codec.registry.DefaultCodecRegistry.getCachedCodec(DefaultCodecRegistry.java:117)
at com.datastax.oss.driver.internal.core.type.codec.registry.CachingCodecRegistry.codecFor(CachingCodecRegistry.java:215)
at com.datastax.oss.driver.api.core.data.SettableByIndex.set(SettableByIndex.java:132)
at com.datastax.oss.driver.api.core.data.SettableByIndex.setString(SettableByIndex.java:338)
This exception appears occasionally. I'm using a PreparedStatement to execute the query; I believe it is cached by the DataStax driver.
I'm using AWS Keyspaces (Cassandra version 3.11.2) with DataStax driver 4.6.
Here is my application.conf:
datastax-java-driver {
  basic.request {
    timeout = 5 seconds
    consistency = LOCAL_ONE
  }
  advanced.connection {
    max-requests-per-connection = 1024
    pool {
      local.size = 1
      remote.size = 1
    }
  }
  advanced.reconnect-on-init = true
  advanced.reconnection-policy {
    class = ExponentialReconnectionPolicy
    base-delay = 1 second
    max-delay = 60 seconds
  }
  advanced.retry-policy {
    class = DefaultRetryPolicy
  }
  advanced.protocol {
    version = V4
  }
  advanced.heartbeat {
    interval = 30 seconds
    timeout = 1 second
  }
  advanced.session-leak.threshold = 8
  advanced.metadata.token-map.enabled = false
}
Yes, Java driver 4.x caches prepared statements - it's a difference from driver 3.x. From the documentation:
the session has a built-in cache, it’s OK to prepare the same string twice.
...
Note that caching is based on: the query string exactly as you provided it: the driver does not perform any kind of trimming or sanitizing.
I'm not 100% sure about the source code, but the relevant cache entries may not be cleared when the table is dropped. I suggest opening a JIRA against the Java driver, although such type changes are often not really recommended anyway - it's better to introduce a new field with the new type, even when re-creating the table is possible.
That's correct. Prepared statements are cached -- that's the optimisation that makes prepared statements more efficient when they are reused, since they only need to be prepared once (the query doesn't need to be parsed again).
But I suspect the underlying issue in your case is that your queries involve SELECT *. The best-practice recommendation (regardless of the database you're using) is to explicitly enumerate the columns you are retrieving from the table.
In the prepared statement, each of the columns is bound to a data type. When you alter the schema by adding or dropping columns, the order of the columns (and their data types) no longer matches the data types of the result set, so you end up in situations where the driver gets an int when it's expecting a text, or vice-versa. Cheers!
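To illustrate the point about enumerating columns, here is a minimal sketch using the Python driver from the first question (the contact point, keyspace, table t and columns id, col1, col2 are hypothetical); the same idea applies to the Java driver:
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])          # placeholder contact point
session = cluster.connect('my_keyspace')  # placeholder keyspace

# Enumerate the columns instead of SELECT *, so the statement's result metadata
# keeps matching the columns you actually read, even if other columns are added or dropped.
select_query = session.prepare("SELECT col1, col2 FROM t WHERE id = ?")
rows = session.execute(select_query, (42,))
for row in rows:
    print(row.col1, row.col2)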
I have a Lagom application and I'm using Postgres with Lagom JDBC for the read-side.
Tables are created and everything works fine. But after a restart, when a table with a multi-column index already exists, I always get an error:
org.postgresql.util.PSQLException: ERROR: relation "article_number_fulfiller_idx" already exists
My table looks like this:
class ArticleTable(tag: Tag) extends Table[ArticleTableData](tag, ArticleTable.TableName) {
  def entityId = column[UUID](ArticleTable.ColEntityId, O.PrimaryKey)
  def articleBaseNumber = column[String](ArticleTable.ColArticleBaseNumber)
  def articleSpecificationNumber = column[Option[String]](ArticleTable.ColArticleSpecificationNumber)
  def fulfillerVendorNumber = column[String](ArticleTable.ColFulfillerVendorNumber)
  def fulfillerName = column[String](ArticleTable.ColFulfillerName)
  def availability = column[String](ArticleTable.ColAvailability)
  def completeArticleNumber = column[String]("complete_article_number")
  def idxKey = index("article_number_fulfiller_idx", (completeArticleNumber, fulfillerVendorNumber), unique = true)
  def * = (entityId, articleBaseNumber, articleSpecificationNumber, fulfillerVendorNumber, fulfillerName, availability, completeArticleNumber) <> ((ArticleTableData.apply _).tupled, ArticleTableData.unapply)
}
And my build handler is here:
override def buildHandler(): ReadSideProcessor.ReadSideHandler[Article.Event] = readSide
  .builder[Article.Event](ArticleTable.TableName + "_offset")
  .setGlobalPrepare(table.schema.createIfNotExists)
  .setEventHandler[ArticleCreated](insert)
  .setEventHandler[DescriptionAdded](_ => DBIOAction.successful(Done))
  .setEventHandler[DescriptionRemoved](_ => DBIOAction.successful(Done))
  .build()
I updated my sbt build to use the latest versions. Instead of this
lagomScaladslPersistenceJdbc
I now use this
"com.lightbend.lagom" %% "lagom-scaladsl-persistence-jdbc" % "1.6.2",
"com.typesafe.slick" %% "slick" % "3.3.2"
The exception above is only ONE of the exceptions I got; I get an exception for every multi-column index :(
On every restart, Lagom will try to create the tables and indexes again. To avoid these errors, do not force the creation of indexes and tables; use "create if not exists". If Slick does not allow this, look at native queries.
You are right about the Slick issue.
The documentation page describes this as useful for dev and test environments.
In this case, you can try:
a "create if not exists" query, as #vladislav-kievski suggested,
catching the exception with the appropriate error code, since some databases (e.g. SQL Server) do not support such queries.
I do not recommend using globalPrepare() for creating tables and indexes. The main difficulty with this approach is table changes: you need to think about versioning your database scripts (what will you do when you need to add or remove an index, or remove columns?).
I can connect to Redshift with psycopg2 by:
import psycopg2

conn = psycopg2.connect(host=__credential__.host_redshift,
                        dbname=__credential__.dbname_redshift,
                        user=__credential__.user_redshift,
                        password=__credential__.password_redshift,
                        port=__credential__.port_redshift)
cur = conn.cursor()
Also, I can update an existing table in the database with:
cur.execute("""
UPDATE tb
SET col2='updated_target_row'
WHERE col1='target_row';
""")
conn.commit()
Now, I'd like to update the table in Redshift with Rows from a Spark DataFrame. I searched and found a fairly recent question about it (which, I'd like to point out, is not a duplicate of another question at all).
The solution there seems pretty straightforward. However, I cannot even pass the Row object to a method that uses the cursor.
What I am trying now:
def update_info(row):
    cur.execute("""
        UPDATE tb
        SET col2='updated_target_row'
        WHERE col1='target_row';
    """)
df.rdd.foreach(update_info)
conn.commit()
And I got this error:
TypeError: can't pickle psycopg2.extensions.cursor objects
Interestingly, this doesn't seem to be a common issue. Any help is appreciated.
P.S.:
Versions:
python=3.6
pyspark=2.2.0
psycopg2=2.7.4
The full error message can be found in the pastebin.
I have tried rdd.map instead of rdd.foreach, with no luck.
Connection objects and cursors are not serializable and cannot be sent to the workers. You should use foreachPartition:
def update_info(rows):
    conn = psycopg2.connect(...)
    cur = conn.cursor()
    for row in rows:
        cur.execute(...)

df.rdd.foreachPartition(update_info)
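For completeness, a hedged sketch of what the partition handler might look like, reusing the table and columns from the UPDATE in the question (the connection credentials and the Row attribute names are placeholders/assumptions):
import psycopg2

def update_info(rows):
    # Open the connection on the worker; psycopg2 connections and cursors cannot be pickled
    conn = psycopg2.connect(host='redshift-host', dbname='db', user='user',
                            password='secret', port=5439)  # placeholder credentials
    cur = conn.cursor()
    for row in rows:
        # col1/col2 come from the UPDATE in the question; row.col1/row.col2 are assumed attribute names
        cur.execute("UPDATE tb SET col2 = %s WHERE col1 = %s;", (row.col2, row.col1))
    conn.commit()
    conn.close()

df.rdd.foreachPartition(update_info)
Opening one connection per partition (instead of per row) keeps the number of connections manageable and lets you commit once per partition.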
I'm trying to retrieve the SQL that makes up a stored query inside an Access database.
I'm using a combination of UCanAccess 4.0.2, jaydebeapi, and the UCanAccess console. The ultimate goal is to be able to do the following from a Python script with no user intervention.
When UCanAccess loads, it successfully loads the query:
Please, enter the full path to the access file (.mdb or .accdb): /Users/.../SnohomishRiverEstuaryHydrology_RAW.accdb
Loaded Tables:
Sensor Data, Sensor Details, Site Details
Loaded Queries:
Jeff_Test
Loaded Procedures:
Loaded Indexes:
Primary Key on Sensor Data Columns: (ID)
, Primary Key on Sensor Details Columns: (ID)
, Primary Key on Site Details Columns: (ID)
, Index on Sensor Details Columns: (SiteID)
, Index on Site Details Columns: (SiteID)
UCanAccess>
When I run, from the UCanAccess console, a query like
SELECT * FROM JEFF_TEST;
I get the expected results of the query.
I tried things including this monstrous query from inside a Python script, even using the sysSchema=True option (from here: http://www.sqlquery.com/Microsoft_Access_useful_queries.html):
SELECT DISTINCT MSysObjects.Name,
IIf([Flags]=0,"Select",IIf([Flags]=16,"Crosstab",IIf([Flags]=32,"Delete",IIf
([Flags]=48,"Update",IIf([flags]=64,"Append",IIf([flags]=128,"Union",
[Flags])))))) AS Type
FROM MSysObjects INNER JOIN MSysQueries ON MSysObjects.Id =
MSysQueries.ObjectId;
But I get an "object not found" or "insufficient privileges" error.
At this point, I've tried mdbtools and can successfully retrieve metadata and data from Access. I just need to get the queries out too.
If anyone can point me in the right direction, I'd appreciate it. Windows is not a viable option.
Cheers, Seth
***********************************
* SOLUTION
***********************************
from jpype import *
startJVM(getDefaultJVMPath(), "-ea", "-Djava.class.path=/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/ucanaccess-4.0.2.jar:/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/lib/commons-lang-2.6.jar:/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/lib/commons-logging-1.1.1.jar:/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/lib/hsqldb.jar:/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/lib/jackcess-2.1.6.jar")
conn = java.sql.DriverManager.getConnection("jdbc:ucanaccess:///Users/seth.urion/PycharmProjects/pyAccess/FE_Hall_2010_2016_SnohomishRiverEstuaryHydrology_RAW.accdb")
for query in conn.getDbIO().getQueries():
    print(query.getName())
    print(query.toSQLString())
If you can find a satisfactory way to call Java methods from within Python then you could use the Jackcess Query#toSQLString() method to extract the SQL for a saved query. For example, I just got this to work under Jython:
from java.sql import DriverManager
def get_query_sql(conn, query_name):
    sql = ''
    for query in conn.getDbIO().getQueries():
        if query.getName() == query_name:
            sql = query.toSQLString()
            break
    return sql

# usage example
if __name__ == '__main__':
    conn = DriverManager.getConnection("jdbc:ucanaccess:///home/gord/UCanAccessTest.accdb")
    query_name = 'Jeff_Test'
    query_sql = get_query_sql(conn, query_name)
    if query_sql == '':
        print '(Query not found.)'
    else:
        print 'SQL for query [%s]:' % (query_name)
        print
        print query_sql
    conn.close()
producing
SQL for query [Jeff_Test]:
SELECT Invoice.InvoiceNumber, Invoice.InvoiceDate
FROM Invoice
WHERE (((Invoice.InvoiceNumber)>1));
--Update: it seems it was a glitch with the virtual machine. I restarted the Cassandra service and it works as expected.
--Update: It seems that the problem is not in the code. I tried to execute the insert statement in a Cassandra client and I get the same behavior: no error is displayed, and nothing is inserted.
The column that causes this behavior is of type timestamp; the issue occurs when I set this column to certain values (e.g. '2015-08-25 22:15:12').
The table is:
create table player
(msisdn varchar primary key,
game int,keyword varchar, inserted timestamp,lang int,mo int,mt int,qid int,score int)
I am new to Cassandra and downloaded the VirtualBox snapshot to test it.
I tried the batch example code and it did nothing, so, as people suggested, I tried executing the prepared statement directly.
var addPlayer = casDemoSession.Prepare("INSERT INTO player (msisdn,qid,keyword,mo,mt,game,score,lang,inserted) Values (?,?,?,?,?,?,?,?,?)");
for (int i = 0; i < 20; i++) {
    var bs = addPlayer.Bind(getRandMSISDN(), 1, "", 1, 0, 0, 10, 0, DateTime.Now);
    bs.EnableTracing(true);
    casDemoSession.Execute(bs);
}
The code above does not throw any exceptions, nor does it insert any data. I tried to trace the query, but it does not show the actual CQL query.
PlannetCassandra V0.1 VM running Cassandra 2.0.1
datastax driver 2.6 https://github.com/datastax/csharp-driver
One thing that might be missing is the keyspace name for your player table. Usually you would have "INSERT INTO <keyspace>.player (..."
If you're able to run cqlsh, could you add the output of "DESCRIBE TABLE <keyspace>.player" to your question, and show what happens when you attempt the insert in cqlsh?