Multiple keys index leads in an PSQLException with Slick, Postgres on a lagom project - slick

I have a lagom application and using for the readside a postgres with lagom jdbc.
Tables are created and works fine. After a restart and an already created table with a multiple key index I get always an error:
org.postgresql.util.PSQLException: ERROR: relation "article_number_fulfiller_idx" already exists
My table looks like this:
class ArticleTable(tag: Tag) extends Table[ArticleTableData](tag,ArticleTable.TableName) {
def entityId = column[UUID](ArticleTable.ColEntityId,O.PrimaryKey)
def articleBaseNumber = column[String](ArticleTable.ColArticleBaseNumber)
def articleSpecificationNumber = column[Option[String]](ArticleTable.ColArticleSpecificationNumber)
def fulfillerVendorNumber = column[String](ArticleTable.ColFulfillerVendorNumber)
def fulfillerName = column[String](ArticleTable.ColFulfillerName)
def availability = column[String](ArticleTable.ColAvailability)
def completeArticleNumber = column[String]("complete_article_number")
def idxKey = index("article_number_fulfiller_idx",(completeArticleNumber,fulfillerVendorNumber),unique = true)
def * = (entityId,articleBaseNumber,articleSpecificationNumber,fulfillerVendorNumber,fulfillerName,availability,completeArticleNumber) <> ( (ArticleTableData.apply _).tupled, ArticleTableData.unapply )
}
And my build handler is here:
override def buildHandler(): ReadSideProcessor.ReadSideHandler[Article.Event] = readSide
.builder[Article.Event](ArticleTable.TableName+"_offset")
.setGlobalPrepare(table.schema.createIfNotExists)
.setEventHandler[ArticleCreated](insert)
.setEventHandler[DescriptionAdded](_ => DBIOAction.successful(Done) )
.setEventHandler[DescriptionRemoved](_ => DBIOAction.successful(Done) )
.build()
I updated my sbt to use the latest:
Instead of this
lagomScaladslPersistenceJdbc
I use now this
"com.lightbend.lagom" %% "lagom-scaladsl-persistence-jdbc" % "1.6.2",
"com.typesafe.slick" %% "slick" % "3.3.2"
The exception is only ONE of the exceptions I got. I have for every multiple key index an exception :(

Lagom every restart will try to create new tables and indexes. To avoid getting these errors, do not force the creation of indexes and tables, use create if not exist. If slick does not allow to do it, look on native queries.

You are right about the slick issue.
The page describes that it is useful for dev and test environments.
In this case, you can try to have:
"create if not exist" query as #vladislav-kievski suggested,
some of the databases do not support such queries (SQL Server) and you can catch an exception with appropriate error code
I do not recommend to use globalPrepare() for creating tables and indexes. The main difficulty with this approach is table changes. In this case, you need to think about versioning your database scripts (what will you do in case index removing/adding or removing columns?).

Related

Azure HDInsight cluster with Hbase + pheonix using Local Index

We have an HDInsight cluster running HBase (Ambari)
We have created a table using Pheonix
CREATE TABLE IF NOT EXISTS Results (Col1 VARCHAR(255) NOT NULL,Col1
INTEGER NOT NULL ,Col3 INTEGER NOT NULL,Destination VARCHAR(255)
NOT NULL CONSTRAINT pk PRIMARY KEY (Col1,Col2,Col3) )
IMMUTABLE_ROWS=true
We have filled some data into this table (using some java code)
Later, we decided we want to create a local index on the destination column as follows
CREATE LOCAL INDEX DESTINATION_IDX ON RESULTS (destination) ASYNC
We have run the index tool to fill the index as follows
hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table
RESULTS --index-table DESTINATION_IDX --output-path
DESTINATION_IDX_HFILES
When we run queries and filter using the destination columns everything is ok. For example
select /*+ NO_CACHE, SKIP_SCAN */ COL1,COL2,COL3,DESTINATION from
Results where COL1='data' DESTINATION='some value' ;
But, if we do not use the DESTINATION in the where query, then we will get a NullPointerException in BaseResultIterators.class
(from phoenix-core-4.7.0-HBase-1.1.jar)
This exception is thrown only when we use the new local index. If we query ignoring the index like this
select /*+ NO_CACHE, SKIP_SCAN ,NO_INDEX */ COL1,COL2,COL3,DESTINATION from
Results where COL1='data' DESTINATION='some value' ;
we will not get the exception
Showing some relevant code from the area where we get the exception
...
catch (StaleRegionBoundaryCacheException e2) {
// Catch only to try to recover from region boundary cache being out of date
if (!clearedCache) { // Clear cache once so that we rejigger job based on new boundaries
services.clearTableRegionCache(physicalTableName);
context.getOverallQueryMetrics().cacheRefreshedDueToSplits();
}
// Resubmit just this portion of work again
Scan oldScan = scanPair.getFirst();
byte[] startKey = oldScan.getAttribute(SCAN_ACTUAL_START_ROW);
byte[] endKey = oldScan.getStopRow();
====================Note the isLocalIndex is true ==================
if (isLocalIndex) {
endKey = oldScan.getAttribute(EXPECTED_UPPER_REGION_KEY);
//endKey is null for some reason in this point and the next function
//will fail inside it with NPE
}
List<List<Scan>> newNestedScans = this.getParallelScans(startKey, endKey);
We must use this version of the Jar since we run inside Azure HDInsight
and we can not select a newer jar version
Any ideas how to solve this?
What does "recover from region boundary cache being out of date" mean? it seems to be related to the problem
It appears that the version that azure HDInsight has for phoenix core (phoenix-core-4.7.0.2.6.5.3004-13.jar) has the bug but if i use a bit newer version (phoenix-core-4.7.0.2.6.5.8-2.jar, from http://nexus-private.hortonworks.com:8081/nexus/content/repositories/hwxreleases/org/apache/phoenix/phoenix-core/4.7.0.2.6.5.8-2/) we do not see the bug any more
note that it is not possible to take a much newer version like 4.8.0 since in this case the server will throw a version mismatch error

UcanAccess retrieve stored query sql

I'm trying to retrieve the SQL that makes up a stored query inside an Access database.
I'm using a combination of UcanAccess 4.0.2, and jaydebeapi and the ucanaccess console. The ultimate goal is to be able to do the following from a python script with no user intervention.
When UCanAccess loads, it successfully loads the query:
Please, enter the full path to the access file (.mdb or .accdb): /Users/.../SnohomishRiverEstuaryHydrology_RAW.accdb
Loaded Tables:
Sensor Data, Sensor Details, Site Details
Loaded Queries:
Jeff_Test
Loaded Procedures:
Loaded Indexes:
Primary Key on Sensor Data Columns: (ID)
, Primary Key on Sensor Details Columns: (ID)
, Primary Key on Site Details Columns: (ID)
, Index on Sensor Details Columns: (SiteID)
, Index on Site Details Columns: (SiteID)
UCanAccess>
When I run, from the UCanAccess console a query like
SELECT * FROM JEFF_TEST;
I get the expected results of the query.
I tried things including this monstrous query from inside a python script even using the sysSchema=True option (from here: http://www.sqlquery.com/Microsoft_Access_useful_queries.html):
SELECT DISTINCT MSysObjects.Name,
IIf([Flags]=0,"Select",IIf([Flags]=16,"Crosstab",IIf([Flags]=32,"Delete",IIf
([Flags]=48,"Update",IIf([flags]=64,"Append",IIf([flags]=128,"Union",
[Flags])))))) AS Type
FROM MSysObjects INNER JOIN MSysQueries ON MSysObjects.Id =
MSysQueries.ObjectId;
But get an object not found or insufficient privileges error.
At this point, I've tried mdbtools and can successfully retrieve metadata, and data from access. I just need to get the queries out too.
If anyone can point me in the right direction, I'd appreciate it. Windows is not a viable option.
Cheers, Seth
***********************************
* SOLUTION
***********************************
from jpype import *
startJVM(getDefaultJVMPath(), "-ea", "-Djava.class.path=/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/ucanaccess-4.0.2.jar:/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/lib/commons-lang-2.6.jar:/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/lib/commons-logging-1.1.1.jar:/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/lib/hsqldb.jar:/Users/seth.urion/local/access/UCanAccess-4.0.2-bin/lib/jackcess-2.1.6.jar")
conn = java.sql.DriverManager.getConnection("jdbc:ucanaccess:///Users/seth.urion/PycharmProjects/pyAccess/FE_Hall_2010_2016_SnohomishRiverEstuaryHydrology_RAW.accdb")
for query in conn.getDbIO().getQueries():
print(query.getName())
print(query.toSQLString())
If you can find a satisfactory way to call Java methods from within Python then you could use the Jackcess Query#toSQLString() method to extract the SQL for a saved query. For example, I just got this to work under Jython:
from java.sql import DriverManager
def get_query_sql(conn, query_name):
sql = ''
for query in conn.getDbIO().getQueries():
if query.getName() == query_name:
sql = query.toSQLString()
break
return sql
# usage example
if __name__ == '__main__':
conn = DriverManager.getConnection("jdbc:ucanaccess:///home/gord/UCanAccessTest.accdb")
query_name = 'Jeff_Test'
query_sql = get_query_sql(conn, query_name)
if query_sql == '':
print '(Query not found.)'
else:
print 'SQL for query [%s]:' % (query_name)
print
print query_sql
conn.close()
producing
SQL for query [Jeff_Test]:
SELECT Invoice.InvoiceNumber, Invoice.InvoiceDate
FROM Invoice
WHERE (((Invoice.InvoiceNumber)>1));

How to extend Spark Catalyst optimizer with custom rules?

I want to use Catalyst rules to transform star-schema (https://en.wikipedia.org/wiki/Star_schema) SQL query to SQL query to denormalized star-schema where some fields from dimensions tables are represented in facts table.
I tried to find some extension points to add own rules to make a transformation described above. But I didn't find any extension points. So there are the following questions:
How can I add own rules to catalyst optimizer?
Is there another solution to implement a functionality described above?
Following #Ambling advice you can use the sparkSession.experimental.extraStrategies to add your functionality to the SparkPlanner.
An example strategy that simply prints "Hello world" on the console
object MyStrategy extends Strategy {
def apply(plan: LogicalPlan): Seq[SparkPlan] = {
println("Hello world!")
Nil
}
}
with example run:
val spark = SparkSession.builder().master("local").getOrCreate()
spark.experimental.extraStrategies = Seq(MyStrategy)
val q = spark.catalog.listTables.filter(t => t.name == "five")
q.explain(true)
spark.stop()
You can find a working example project on friend's GitHub: https://github.com/bartekkalinka/spark-custom-rule-executor
As a clue, now in Spark 2.0, you can import extraStrategies and extraOptimizations through SparkSession.experimental.

Scala slick 2.0 updateAll equivalent to insertALL?

Looking for a way to do a batch update using slick. Is there an equivalent updateAll to insertALL? Goole research has failed me thus far.
I have a list of case classes that have varying status. Each one having a different numeric value so I cannot run the typical update query. At the same time, I want to save the multiple update requests as there could be thousands of records I want to update at the same time.
Sorry to answer my own question, but what i ended up doing is just dropping down to JDBC and doing batchUpdate.
private def batchUpdateQuery = "update table set value = ? where id = ?"
/**
* Dropping to jdbc b/c slick doesnt support this batched update
*/
def batchUpate(batch:List[MyCaseClass])(implicit subject:Subject, session:Session) = {
val pstmt = session.conn.prepareStatement(batchUpdateQuery)
batch map { myCaseClass =>
pstmt.setString(1, myCaseClass.value)
pstmt.setString(2, myCaseClass.id)
pstmt.addBatch()
}
session.withTransaction {
pstmt.executeBatch()
}
}
It's not clear to me what you are trying to achieve, insert and update are two different operation, for insert makes sense to have a bulk function, for update it doesn't in my opinion, in fact in SQL you can just write something like this
UPDATE
SomeTable
SET SomeColumn = SomeValue
WHERE AnotherColumn = AnotherValue
Which translates to update SomeColumn with the value SomeValue for all the rows which have AnotherColumn equal to AnotherValue.
In Slick this is a simple filter combined with map and update
table
.filter(_.someCulomn === someValue)
.map(_.FieldToUpdate)
.update(NewValue)
If instead you want to update the whole row just drop the map and pass a Row object to the update function.
Edit:
If you want to update different case classes I'm lead to think that these case classes are rows defined in your schema and if that's the case you can pass them directly to the update function since it's so defined:
def update(value: T)(implicit session: Backend#Session): Int
For the second problem I can't suggest you a solution, looking at the JdbcInvokerComponent trait it looks like the update function invokes the execute method immediately
def update(value: T)(implicit session: Backend#Session): Int = session.withPreparedStatement(updateStatement) { st =>
st.clearParameters
val pp = new PositionedParameters(st)
converter.set(value, pp, true)
sres.setter(pp, param)
st.executeUpdate
}
Probably because you can actually run one update query at the time per table and not multiple update on multiple tables as stated also on this SO question, but you could of course update multiple rows on the same table.

Querying with cqlengine

I am trying to hook the cqlengine CQL 3 object mapper with my web application running on CherryPy. Athough the documentation is very clear about querying, I am still not aware how to make queries on an existing table(and an existing keyspace) in my cassandra database. For instance I already have this table Movies containing the fields Title, rating, Year. I want to make the CQL query
SELECT * FROM Movies
How do I go ahead with the query after establishing the connection with
from cqlengine import connection
connection.setup(['127.0.0.1:9160'])
The KEYSPACE is called "TEST1".
Abhiroop Sarkar,
I highly suggest that you read through all of the documentation at:
Current Object Mapper Documentation
Legacy CQLEngine Documentation
Installation: pip install cassandra-driver
And take a look at this example project by the creator of CQLEngine, rustyrazorblade:
Example Project - Meat bot
Keep in mind, CQLEngine has been merged into the DataStax Cassandra-driver:
Official Python Cassandra Driver Documentation
You'll want to do something like this:
CQLEngine <= 0.21.0:
from cqlengine.connection import setup
setup(['127.0.0.1'], 'keyspace_name', retry_connect=True)
If you need to create the keyspace still:
from cqlengine.management import create_keyspace
create_keyspace(
'keyspace_name',
replication_factor=1,
strategy_class='SimpleStrategy'
)
Setup your Cassandra Data Model
You can do this in the same .py or in your models.py:
import datetime
import uuid
from cqlengine import columns, Model
class YourModel(Model):
__key_space__ = 'keyspace_name' # Not Required
__table_name__ = 'columnfamily_name' # Not Required
some_int = columns.Integer(
primary_key=True,
partition_key=True
)
time = columns.TimeUUID(
primary_key=True,
clustering_order='DESC',
default=uuid.uuid1,
)
some_uuid = columns.UUID(primary_key=True, default=uuid.uuid4)
created = columns.DateTime(default=datetime.datetime.utcnow)
some_text = columns.Text(required=True)
def __str__(self):
return self.some_text
def to_dict(self):
data = {
'text': self.some_text,
'created': self.created,
'some_int': self.some_int,
}
return data
Sync your Cassandra ColumnFamilies
from cqlengine.management import sync_table
from .models import YourModel
sync_table(YourModel)
Considering everything above, you can put all of the connection and syncing together, as many examples have outlined, say this is connection.py in our project:
from cqlengine.connection import setup
from cqlengine.management import sync_table
from .models import YourTable
def cass_connect():
setup(['127.0.0.1'], 'keyspace_name', retry_connect=True)
sync_table(YourTable)
Actually Using the Model and Data
from __future__ import print_function
from .connection import cass_connect
from .models import YourTable
def add_data():
cass_connect()
YourTable.create(
some_int=5,
some_text='Test0'
)
YourTable.create(
some_int=6,
some_text='Test1'
)
YourTable.create(
some_int=5,
some_text='Test2'
)
def query_data():
cass_connect()
query = YourTable.objects.filter(some_int=5)
# This will output each YourTable entry where some_int = 5
for item in query:
print(item)
Feel free to let ask for further clarification, if necessary.
The most straightforward way to achieve this is to make model classes which mirror the schema of your existing cql tables, then run queries on them
cqlengine is primarily an Object Mapper for Cassandra. It does not interrogate an existing database in order to create objects for existing tables. Rather it is usually intended to be used in the opposite direction (i.e. create tables from python classes). If you want to query an existing table using cqlengine you will need to create python models that exactly correspond to your existing tables.
For example, if your current Movies table had 3 columns, id, title, and release_date you would need to create a cqlengine model that had those three columns. Additionally, you would need to ensure that the table_name attribute on the class was exactly the same as the table name in the database.
from cqlengine import columns, Model
class Movie(Model):
__table_name__ = "movies"
id = columns.UUID(primary_key=True)
title = columns.Text()
release_date = columns.Date()
The key thing is to make sure that model exactly mirrors the existing table. If there are small differences you may be able to use sync_table(MyModel) to update the table to match your model.

Resources