How to set Presto session properties in Spark - apache-spark

Is there any way to set Presto session parameters in Spark while building a DataFrame from it?
public Dataset<Row> readPrestoTbl() {
    Dataset<Row> stgTblDF = sparksession
            .read()
            .jdbc(dcrIdentity.getProperty(env + "." + "presto_url")
                            + "?SSL="
                            + dcrIdentity.getProperty(env + "." + "presto_client_SSL"),
                    demoLckQuery, getDBProperties());
    return stgTblDF;
}
private Properties getDBProperties() {
    Properties dbProperties = new Properties();
    dbProperties.put("user", prestoCredentials.getUsername());
    dbProperties.put("password", prestoCredentials.getPassword());
    dbProperties.put("Driver", "io.prestosql.jdbc.PrestoDriver");
    dbProperties.put("task.max-worker-threads", "10");
    return dbProperties;
}
In the same way that I have set the task.max-worker-threads property here, is there any option to set session properties such as required_workers_count or query_max_run_time?
I also tried the options below, but every time it says Unrecognized connection property 'sessionProperties'.
While adding it to the properties:
dbProperties.put("sessionProperties", "task.max-worker-threads:10");
While loading in Spark:
.option("sessionProperties", "task.max-worker-threads:10")

The Trino (formerly PrestoSQL) JDBC driver supports the sessionProperties property:
https://trino.io/docs/current/installation/jdbc.html?highlight=sessionproperties#parameter-reference
There is also a blog post about the rebranding:
https://trino.io/blog/2020/12/27/announcing-trino.html
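As a rough sketch of how this could look with the Trino JDBC driver (the driver class name io.trino.jdbc.TrinoDriver and the chosen session property values are assumptions, so verify them against your driver version), the session properties are passed as semicolon-separated name:value pairs in a single sessionProperties connection property:
private Properties getDBProperties() {
    Properties dbProperties = new Properties();
    dbProperties.put("user", prestoCredentials.getUsername());
    dbProperties.put("password", prestoCredentials.getPassword());
    // Assumed Trino driver class; the io.prestosql.jdbc.PrestoDriver class was renamed.
    dbProperties.put("Driver", "io.trino.jdbc.TrinoDriver");
    // Session properties as name:value pairs separated by semicolons.
    dbProperties.put("sessionProperties",
            "query_max_run_time:2h;required_workers_count:4");
    return dbProperties;
}
If the property is still reported as unrecognized, the driver version on the classpath is the first thing to check.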

Related

Insert data into cassandra using datastax driver

We are trying to insert data from a CSV file into Cassandra using the DataStax driver for Java. What are the available methods to do so?
We are currently running cqlsh to load from a CSV file.
The question is quite vague. Usually, you should be able to provide code, and give an example of something that isn't working quite right for you.
That being said, I just taught a class (this week) on this subject for our developers at work. So I can give you some quick examples.
First of all, you should have a separate class built to handle your Cassandra connection objects. I usually build it with a couple of constructors so that it can be called in a couple of different ways. But each essentially calls a connect method, which looks something like this:
public void connect(String[] nodes, String user, String pwd, String dc) {
    QueryOptions qo = new QueryOptions();
    qo.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
    cluster = Cluster.builder()
            .addContactPoints(nodes)
            .withCredentials(user, pwd)
            .withQueryOptions(qo)
            .withLoadBalancingPolicy(
                    new TokenAwarePolicy(
                            DCAwareRoundRobinPolicy.builder()
                                    .withLocalDc(dc)
                                    .build()))
            .build();
    session = cluster.connect();
}
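For context, here is a minimal sketch of how such a connection class might wrap that method (the class shape, field names, and constructor are assumptions rather than the original code):
public class CassandraConnection {
    private Cluster cluster;
    private Session session;

    // One of the constructors mentioned above; it simply delegates to connect().
    public CassandraConnection(String[] nodes, String user, String pwd, String dc) {
        connect(nodes, user, pwd, dc);
    }

    // connect() as shown above; the query() and prepare() helpers follow below.
}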
With that in place, I also write a few simple methods to expose some functionality of the session object:
public ResultSet query(String strCQL) {
    return session.execute(strCQL);
}

public PreparedStatement prepare(String strCQL) {
    return session.prepare(strCQL);
}

public ResultSet query(BoundStatement bStatement) {
    return session.execute(bStatement);
}
With those in place, I can then call these methods from within a service layer. A simple INSERT (preparing a statement and binding values to it) looks like this:
String[] nodes = {"10.6.8.2","10.6.6.4"};
CassandraConnection conn = new CassandraConnection(nodes, "aploetz", "flynnLives", "West-DC");
String userID = "Aaron";
String value = "whatever";
String strINSERT = "INSERT INTO stackoverflow.timestamptest "
+ "(userid, activetime, value) "
+ "VALUES (?,dateof(now()),?)";
PreparedStatement pIStatement = conn.prepare(strINSERT);
BoundStatement bIStatement = new BoundStatement(pIStatement);
bIStatement.bind(userID, value);
conn.query(bIStatement);
In addition, the DataStax Java driver has a folder called "examples" in its Git repo. Here's a link to the "basic" examples, which I recommend reading through.

Cassandra Trigger Exception: InvalidQueryException: table of additional mutation does not match primary update table

I am using a Cassandra trigger on a table. I am following the example and loading the trigger jar with 'nodetool reloadtriggers'. Then I am using the
'CREATE TRIGGER mytrigger ON ..'
command from cqlsh to create the trigger on my table.
When adding an entry to that table, my audit table is populated.
But when calling a method from within my Java application, which persists an entry into my table using
'session.execute(BoundStatement)', I get this exception:
InvalidQueryException: table of additional mutation does not match primary update table
Why do the insertion into the table and the audit work when doing it directly with cqlsh, and why does it fail when doing pretty much exactly the same thing from the Java application?
I am using this as the AuditTrigger, very simplified (everything other than row insertion is left out):
public class AuditTrigger implements ITrigger {
    private Properties properties = loadProperties();

    public Collection<Mutation> augment(Partition update) {
        String auditKeyspace = properties.getProperty("keyspace");
        String auditTable = properties.getProperty("table");
        CFMetaData metadata = Schema.instance.getCFMetaData(auditKeyspace, auditTable);
        PartitionUpdate.SimpleBuilder audit =
                PartitionUpdate.simpleBuilder(metadata, UUIDGen.getTimeUUID());

        // 'row' is obtained by iterating the partition's rows (omitted here for brevity)
        if (row.primaryKeyLivenessInfo().timestamp() != Long.MIN_VALUE) {
            // Row insertion
            JSONObject obj = new JSONObject();
            obj.put("message_id", update.metadata().getKeyValidator()
                    .getString(update.partitionKey().getKey()));
            audit.row().add("operation", "ROW INSERTION");
        }
        audit.row().add("keyspace_name", update.metadata().ksName)
                .add("table_name", update.metadata().cfName)
                .add("primary_key", update.metadata().getKeyValidator()
                        .getString(update.partitionKey().getKey()));

        return Collections.singletonList(audit.buildAsMutation());
    }
}
It seems that when using a BoundStatement, the trigger fails:
session.execute(boundStatement);
Using a regular CQL query string works, though:
session.execute(query);
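For illustration, a minimal sketch of the two call paths being compared (the keyspace, table, and column names here are assumptions, not the real schema):
// Hypothetical example; keyspace, table, and column names are assumed.
PreparedStatement prepared = session.prepare(
        "INSERT INTO mykeyspace.mytable (id, value) VALUES (?, ?)");
BoundStatement bound = prepared.bind("someId", "someValue");
session.execute(bound);   // trigger throws InvalidQueryException

// Plain CQL string: the trigger fires and the audit row is written.
session.execute(
        "INSERT INTO mykeyspace.mytable (id, value) VALUES ('someId', 'someValue')");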
We are using BoundStatement everywhere within our application, though, and cannot change that.
Any help would be appreciated.
Thanks

Not able to persist Spark data set to orientdb

I am trying to fetch data from a SQL Server database and have created a Spark dataset from it. When I persist the dataset to OrientDB, I am not able to do so.
I get the error below:
Exception in thread "main" java.lang.RuntimeException: Connection Exception Occurred: Error on opening database 'jdbc:orient:REMOTE:localhost/test'
Here is my code:
Map<String, String> options = new HashMap<>();
options.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver");
options.put("url", "jdbc:sqlserver://localhost:1433;databaseName=sample");
options.put("user", "username");
options.put("password", "password");
DataFrameReader jdbcDF = spark.read().format("jdbc").options(options);
Dataset<Row> tableDataSet = jdbcDF.option("dbtable", "Table1").load();
tableDataSet.createOrReplaceTempView("TEMP_V");
Dataset<Row> tableDataset1 = spark.sql("SELECT ID AS DEPT_ID, NAME AS DEPT_NAME FROM TEMP_V");
tableDataset1.write().format("org.apache.spark.orientdb.graphs")
.option("dburl", "jdbc:orient:remote:localhost/test")
.option("user", "root")
.option("password", "root")
.option("spark", "true")
.option("vertextype", "DEPARTMENT")
.mode(SaveMode.Overwrite)
.save();
At the time of writing, OrientDB's JDBC driver isn't able to persist a Spark dataset; it would need to be patched to improve Spark compatibility. It is, however, able to read from OrientDB and load a dataset.
Please open an issue.
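For reference, a rough sketch of the read path mentioned above, using Spark's generic JDBC source on top of OrientDB's JDBC driver (the driver class name and table name are assumptions, so verify them against your OrientDB version):
// Hypothetical read sketch; driver class and table name are assumptions.
Dataset<Row> deptDataset = spark.read()
        .format("jdbc")
        .option("driver", "com.orientechnologies.orient.jdbc.OrientJdbcDriver")
        .option("url", "jdbc:orient:remote:localhost/test")   // same URL as in the question
        .option("user", "root")
        .option("password", "root")
        .option("dbtable", "DEPARTMENT")
        .load();
deptDataset.show();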

Does Accessor of datastax cassandra java driver use pagination?

DataStax's Java driver for Cassandra provides Accessor. Refer here.
With reference to their example below, do they do pagination and fetch records in batches, or is there a risk of the queries timing out?
@Accessor
public interface UserAccessor {
    @Query("SELECT * FROM user")
    Result<User> getAll();
}
When I say pagination, I mean: do they internally do something similar to the below?
Statement stmt = new SimpleStatement("SELECT * FROM user");
stmt.setFetchSize(24);
ResultSet rs = session.execute(stmt);
Yes, there is a fetch size used behind the scenes. The driver will auto-page for you as needed.
You will probably want to set a fetch size via @QueryParameters. The default at this time is 5k; see DEFAULT_FETCH_SIZE.
Here is an example of how I am using fetchSize in the @QueryParameters annotation within an Accessor:
@Accessor
public interface UserAccessor {
    @Query("SELECT * FROM users")
    @QueryParameters(fetchSize = 1000)
    Result<User> getAllUsers();
}
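As a usage sketch (assuming the driver's object mapper and a mapped User entity; the session setup is omitted), iterating the Result transparently pulls in further pages:
// Hypothetical usage; assumes a mapped User entity and an open Session.
MappingManager manager = new MappingManager(session);
UserAccessor accessor = manager.createAccessor(UserAccessor.class);

// The driver fetches 1000 rows per page (per the fetchSize above) and
// requests the next page automatically as the iteration advances.
for (User user : accessor.getAllUsers()) {
    System.out.println(user);
}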

Configure HikariCP in Slick 3

I have a project that is using Slick 3.1.0 with jTDS as the JDBC driver. When I enable connection pooling, which uses HikariCP, I get the following exception:
java.sql.SQLException: JDBC4 Connection.isValid()
method not supported, connection test query must be configured
For SQL Server, the test query would be SELECT 1. My question is: when using Slick, how do I set properties for HikariCP? Is there some property to set in the config file? I tried the following, to no effect:
jtds {
  driver = "net.sourceforge.jtds.jdbc.Driver"
  url = "jdbc:jtds:sqlserver://foobar.org:1433/somedatabase"
  user = "theUser"
  password = "theSecretPassword"
  properties {
    connectionTestQuery = "SELECT 1"
  }
}
I found the solution to my own question. The HikariCP properties go straight into the config block. For example, setting the connection test query worked like this:
jtds {
  driver = "net.sourceforge.jtds.jdbc.Driver"
  url = "jdbc:jtds:sqlserver://foobar.org:1433/somedatabase"
  user = "theUser"
  password = "theSecretPassword"
  connectionTestQuery = "SELECT 1"
}
