How DataStax PreparedStatements work - Cassandra

When we create a PreparedStatement object, is it cached on the server side? How is it different from a PreparedStatement in the Oracle driver? If a prepared statement is reused, what data is sent to the Cassandra server: only the parameter values?
From what I understand, one Session object in the Java driver holds multiple connections to multiple nodes in the cluster. If we reuse the same prepared statement in multiple threads of our application, will that make us use only one connection to one Cassandra node? I guess the statement is prepared on one connection only... What happens when the routing key changes on each execute call?
What are the benefits of using prepared statements?
Thank you

Yes, only the statement ID and parameters need to be sent after preparing the statement.
The driver tracks statement IDs for each server in its connection pool; it's transparent to your application.
The benefit is improved performance, because the server does not have to parse and prepare the statement again for each query.
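To make this concrete, here is a toy model in Python (standard library only) of the behaviour described above. ToyNode and ToyClient are hypothetical classes, not part of any driver: the client prepares a statement once, afterwards sends only the statement ID and the bound values, and transparently re-prepares on a node that reports it does not know the ID (in the real native protocol that report is an UNPREPARED error). Deriving the ID from an MD5 hash of the query text is an assumption of this model.

```python
import hashlib


class ToyNode:
    """Hypothetical stand-in for one server node and its prepared-statement cache."""

    def __init__(self, name):
        self.name = name
        self.prepared = {}      # statement ID -> query text
        self.parse_count = 0    # how many times this node parsed a query

    def prepare(self, query):
        self.parse_count += 1   # parsing happens only at prepare time
        stmt_id = hashlib.md5(query.encode("utf-8")).hexdigest()
        self.prepared[stmt_id] = query
        return stmt_id

    def execute(self, stmt_id, params):
        # Only the ID and the bound values arrive here, never the query text.
        if stmt_id not in self.prepared:
            raise KeyError("UNPREPARED")
        return (self.name, self.prepared[stmt_id], params)


class ToyClient:
    """Hypothetical client: round-robins over nodes, re-prepares transparently."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.queries = {}       # statement ID -> query text, kept for re-preparing
        self._next = 0

    def prepare(self, query):
        stmt_id = self.nodes[0].prepare(query)
        self.queries[stmt_id] = query
        return stmt_id

    def execute(self, stmt_id, params):
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        try:
            return node.execute(stmt_id, params)
        except KeyError:
            # This node has not seen the statement yet: prepare it there, retry.
            node.prepare(self.queries[stmt_id])
            return node.execute(stmt_id, params)


nodes = [ToyNode("node-a"), ToyNode("node-b")]
client = ToyClient(nodes)
stmt = client.prepare("SELECT * FROM users WHERE id = ?")
results = [client.execute(stmt, (user_id,)) for user_id in range(10)]
```

Ten executions cost each node a single parse; every later call reuses the cached ID. The model deliberately ignores routing keys, load-balancing policies and thread safety.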

Related

sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. In Django 2.2

I'm facing this issue on the test server but not in production. I tried some of the suggested solutions, such as running python manage.py runserver --noreload and editing the file
/lib/python3.6/site-packages/django/utils/autoreload.py,
as mentioned in this commit:
https://github.com/django/django/commit/5bf2c87ece216b00a55a6ec0d6c824c9edabf188
This is what the error message looks like:
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140000522213120 and this is thread id 140000744696768.
Please suggest a solution to this problem. Has anyone faced this issue before?
The problem here is that, by default, Python's sqlite3 module only allows a connection object (and the cursors created from it) to be used in the thread that created it; using it from any other thread raises this ProgrammingError. This typically results from one of the following scenarios:
- a global connection object is created and later accessed by different threads
- connection objects are not closed properly between requests, so a connection created in one thread is reused in another
It's generally recommended to use an ORM to work with databases, since it manages connection lifecycles efficiently. For SQLite in Python, a widely used ORM is SQLAlchemy. Using an ORM will likely fix the issue.
However, for very simple applications where an ORM would be overkill, you can change the way the connection to the SQLite database is created so that it can be used from multiple threads. Do this by setting the check_same_thread parameter to False when establishing the connection:
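import sqlite3
# Method of a class that owns the connection; self.create_table_str and
# self.create_detail_table_str are defined elsewhere in that class.
def initDB(self, file_path):
    self.file_path = file_path
    # check_same_thread=False lets threads other than the creator use self.cx
    self.cx = sqlite3.connect(file_path, check_same_thread=False)
    self.cx.execute(self.create_table_str)
    self.cx.execute(self.create_detail_table_str)
    print("init the table structure successfully")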
Having said that, setting up the SQLite connection this way moves the responsibility for handling concurrency from the database module to the application: you must ensure that write operations on the shared connection are serialized to avoid dirty writes/updates.
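A minimal sketch of that advice, assuming a single shared connection and a threading.Lock that the application uses to serialize every write (the events table and the worker count are illustrative, not from the question):

```python
import sqlite3
import threading

# One connection shared by all threads; check_same_thread=False disables
# the sqlite3 module's "creating thread only" check.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE events (worker INTEGER, n INTEGER)")
write_lock = threading.Lock()  # serializes every write on the shared connection


def worker(worker_id, count):
    for n in range(count):
        with write_lock:  # only one thread writes and commits at a time
            conn.execute("INSERT INTO events VALUES (?, ?)", (worker_id, n))
            conn.commit()


threads = [threading.Thread(target=worker, args=(i, 50)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with write_lock:
    total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(total)  # 200
```

The lock is the whole point: without it, nothing in the application stops two threads from interleaving statements and commits on the same connection.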
Note: when using SQLAlchemy, it's important to use the right libraries and to keep your code properly separated. I found this post particularly helpful as well.

MongoDB Performance when connecting to multiple databases via parent-child connections

When connecting to a MongoDB server that contains multiple databases, which approach performs better with the node-mongodb-native driver?
Let's say I have 8 databases (db1...db8) on the same MongoDB server. My Node app needs to connect to all 8, depending on the queries it receives. Which is the better option for me:
1) Create 8 separate connections (1 with each db)
OR
2) Create one parent connection to the server on the test db and then call db.db 8 times to create 8 child connections under that parent. As I read in the docs (http://mongodb.github.io/node-mongodb-native/2.0/api/Db.html#db), all 8 child connections will run over the same socket.
Has anyone researched this, or does anyone have background or thoughts that could help me decide on the right course of action?
How granular is MongoDB concurrency? This depends on the version and storage engine. Since MongoDB 3.0, with the WiredTiger storage engine, many operations lock at the document level; older versions and storage engines locked more coarsely (at the collection or database level). Some operations still lock the entire instance (i.e., the server). This means that an operation (most likely one involving multiple databases) can sometimes block an entire instance, affecting all databases within it. https://docs.mongodb.com/manual/faq/concurrency/#how-granular-are-locks-in-mongodb
Threading model: Node.js is asynchronous while MongoDB is not; the MongoDB server uses one thread per socket (connection). If you see operations blocking each other, you should keep separate connection pools. http://mongodb.github.io/node-mongodb-native/2.2/reference/faq/

Using Sessions in Cassandra

When using the DataStax Java driver for Cassandra, when would I use multiple sessions under the same cluster? I can't find any good use case for having one cluster and multiple sessions.
My application has multiple components/modules that access Cassandra. Based on the answer, I'll decide whether to have one session per component/module or a single session shared across all components of my application.
Update: Everywhere on the internet the recommendation is to use one session. I get that, but my question is: "in what scenario would you create multiple sessions for one cluster?" If there is no such scenario, why does the library allow creating multiple sessions, instead of just providing a method that returns a singleton session object?
Use just one Session across all your components.
In the Cassandra Java driver, a Session is a heavy object and it is thread-safe: it maintains multiple connections, cached prepared statements, etc.
Here is the Javadoc:
A session holds connections to a Cassandra cluster, allowing it to be queried. Each session maintains multiple connections to the cluster nodes, provides policies to choose which node to use for each query (round-robin on all nodes of the cluster by default), and handles retries for failed query (when it makes sense), etc...
Session instances are thread-safe and usually a single instance is enough per application. As a given session can only be "logged" into one keyspace at a time (where the "logged" keyspace is the one used by query if the query doesn't explicitely use a fully qualified table name), it can make sense to create one session per keyspace used. This is however not necessary to query multiple keyspaces since it is always possible to use a single session with fully qualified table name in queries.
Sources:
https://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Session.html
https://ahappyknockoutmouse.wordpress.com/2014/11/12/246/
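If several components need Cassandra, one way to share a single session is a lazily created, process-wide accessor. The sketch below uses a hypothetical FakeSession class as a stand-in for the driver's Session (it only counts how often it is constructed); in a real application, get_session() is where you would build the cluster and connect.

```python
import threading


class FakeSession:
    """Hypothetical stand-in for a driver Session: costly to build, safe to share."""
    instances = 0

    def __init__(self, contact_point):
        FakeSession.instances += 1
        self.contact_point = contact_point


_session = None
_session_lock = threading.Lock()


def get_session():
    """Return the one shared session, creating it on first use."""
    global _session
    if _session is None:
        with _session_lock:
            if _session is None:  # re-check: another thread may have created it
                _session = FakeSession("127.0.0.1")
    return _session


# Hypothetical components, each calling get_session() from its own threads.
seen = []
seen_lock = threading.Lock()


def component(name):
    session = get_session()
    with seen_lock:
        seen.append((name, session))


threads = [threading.Thread(target=component, args=(f"module-{i % 2}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every component ends up holding the same object, so connections and prepared-statement caches are not duplicated per module.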

Cassandra - how to manage sessions

I am new to Cassandra, and I would like to ask you something. I have some events, and on each event, the application responds with some code that is similar to this:
Cluster cluster = Cluster.builder().addContactPoint(CONTACT_POINT).build();
Session session = cluster.connect(KEYSPACE);
Statement statement = QueryBuilder.update(KEYSPACE, TABLE_NAME)
.with(set(STATE_COLUMN, status.toString()))
.and(set(PERCENT_DONE_COLUMN, percentDone))
.where(eq(FILE_ID_COLUMN, id));
//or whatever query I might have
session.execute(statement);
cluster.close();
My question is this:
Is it better to call cluster.connect() and cluster.close() each time, or just call cluster.connect() once at application start up?
Thanks
Connections in Cassandra are designed to be persistent, so they should not be opened and closed for each CQL statement. Setting up a connection is somewhat expensive, since it creates thread pools and obtains a lot of metadata from the cluster.
You want to set up the connection once at application startup and close it when your application is shutting down. If you have multiple threads within your application, you generally want them to all share a single connection.
You should connect and close as rarely as possible.
http://docs.datastax.com/en/developer/java-driver/2.1/java-driver/fourSimpleRules.html
While the session instance is centered around query execution, the Session also manages the per-node connection pools. The session instance is a long-lived object, and it should not be used in a request-response, short-lived fashion. The code should share the same cluster and session instances across your application.
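The startup/shutdown lifecycle described above can be sketched like this, with hypothetical FakeCluster and FakeSession classes standing in for the driver (the keyspace, table and statement are illustrative). The event handler only executes statements; connect() runs once before any event and close() once at shutdown.

```python
import threading


class FakeSession:
    """Hypothetical stand-in for a driver Session: long-lived and thread-safe."""

    def __init__(self, keyspace):
        self.keyspace = keyspace
        self.executed = 0
        self._lock = threading.Lock()

    def execute(self, query, params=()):
        with self._lock:
            self.executed += 1


class FakeCluster:
    """Hypothetical stand-in for the driver's Cluster; connect() is the costly step."""

    def __init__(self):
        self.connect_calls = 0
        self.closed = False

    def connect(self, keyspace):
        self.connect_calls += 1
        return FakeSession(keyspace)

    def close(self):
        self.closed = True


def handle_event(session, file_id, status):
    # Per-event work: run the statement on the shared session, nothing else.
    session.execute("UPDATE files SET state = ? WHERE file_id = ?", (status, file_id))


cluster = FakeCluster()
session = cluster.connect("my_keyspace")  # once, at application startup
try:
    workers = [threading.Thread(target=handle_event, args=(session, i, "DONE"))
               for i in range(20)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
finally:
    cluster.close()  # once, at application shutdown
```

Twenty events cost one connect() instead of twenty, and all handler threads share the same session.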

JDBC: Can I share a connection in a multithreading app, and enjoy nice transactions?

It seems like the classical way to handle transactions with JDBC is to set auto-commit to false. This creates a new transaction, and each call to commit marks the beginning of the next transaction.
On multithreading app, I understand that it is common practice to open a new connection for each thread.
I am writing an RMI-based multi-client server application, so my server basically spawns one thread for each new connection.
To handle transactions correctly, should I create a new connection for each of those threads?
Isn't the cost of such an architecture prohibitive?
Yes, in general you need to create a new connection for each thread. You don't have control over how the operating system timeslices execution of threads (notwithstanding defining your own critical sections), so you could inadvertently have multiple threads trying to send data down that one pipe.
Note that the same applies to any network communication, for instance if you had two threads trying to share one socket for an HTTP connection:
Thread 1 makes a request
Thread 2 makes a request
Thread 1 reads bytes from the socket, unwittingly reading the response from thread 2's request
If you wrapped all your transactions in critical sections, thereby locking out all other threads for an entire begin/commit cycle, you might be able to share a database connection between threads. But I wouldn't do that even then, unless you have intimate knowledge of the JDBC protocol.
If most of your threads have infrequent need for database connections (or no need at all), you might be able to designate one thread to do your database work, and have other threads queue their requests to that one thread. That would reduce the overhead of so many connections. But you'll have to figure out how to manage connections per thread in your environment (or ask another specific question about that on StackOverflow).
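The "one database thread" design can be sketched in Python, using the standard sqlite3 module as a stand-in for a JDBC connection (an assumption for illustration; the queue-and-worker structure is what matters). Other threads never touch the connection: they put a request on a queue and wait on a per-request reply queue. The jobs table is illustrative.

```python
import queue
import sqlite3
import threading

requests = queue.Queue()  # items: (sql, params, reply_queue); None means "stop"


def db_worker(path):
    # The connection is created and used only in this thread, so sqlite3's
    # default check_same_thread=True is never violated.
    conn = sqlite3.connect(path)
    while True:
        item = requests.get()
        if item is None:
            break
        sql, params, reply = item
        try:
            with conn:  # commit on success, roll back on error
                rows = conn.execute(sql, params).fetchall()
            reply.put((True, rows))
        except Exception as exc:
            reply.put((False, exc))
    conn.close()


def run(sql, params=()):
    """Called from any thread: queue the statement and block for its result."""
    reply = queue.Queue(maxsize=1)
    requests.put((sql, params, reply))
    ok, value = reply.get()
    if not ok:
        raise value
    return value


worker = threading.Thread(target=db_worker, args=(":memory:",))
worker.start()
run("CREATE TABLE jobs (id INTEGER PRIMARY KEY, owner TEXT)")
clients = [threading.Thread(target=run,
                            args=("INSERT INTO jobs (owner) VALUES (?)", (f"client-{i}",)))
           for i in range(5)]
for c in clients:
    c.start()
for c in clients:
    c.join()
count = run("SELECT COUNT(*) FROM jobs")[0][0]
requests.put(None)  # shut the database thread down
worker.join()
```

Because each request is a complete unit of work, the single connection never has two transactions in flight; the trade-off is that all database work is serialized through one thread.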
update: To answer your question in the comment, most database brands don't support multiple concurrent transactions on a single connection (InterBase/Firebird is the only exception I know of).
It'd be nice to have a separate transaction object, and to be able to start and commit multiple transactions per connection. But vendors simply don't support it.
Likewise, standard vendor-independent APIs like JDBC and ODBC make the same assumption, that transaction state is merely a property of the connection object.
It's uncommon practice to open a new connection for each thread.
Usually you use a connection pool, such as the c3p0 library.
If you are in an application server, or using Hibernate for example, look at the documentation and you will find how to configure the connection pool.
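A connection pool lends each thread a connection for one unit of work and takes it back afterwards. The toy pool below (standard library only; SimplePool is a hypothetical class, not c3p0's API) shows the borrow/return cycle with sqlite3 connections to a temporary file, with each borrowed connection scoping exactly one transaction. The log table and the pool size are illustrative.

```python
import contextlib
import os
import queue
import sqlite3
import tempfile
import threading


class SimplePool:
    """Hypothetical fixed-size pool; a toy, not a replacement for c3p0."""

    def __init__(self, path, size):
        self._idle = queue.Queue()
        for _ in range(size):
            # A pooled connection moves between threads over time (but is used
            # by one thread at a time), so the same-thread check is disabled.
            self._idle.put(sqlite3.connect(path, check_same_thread=False, timeout=10))

    @contextlib.contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return to the pool instead of closing

    def close_all(self):
        while not self._idle.empty():
            self._idle.get_nowait().close()


def worker(pool, n):
    for i in range(n):
        with pool.connection() as conn:
            with conn:  # one transaction per borrowed connection
                conn.execute("INSERT INTO log (msg) VALUES (?)", (f"msg {i}",))


with tempfile.TemporaryDirectory() as tmp:
    pool = SimplePool(os.path.join(tmp, "app.db"), size=3)
    with pool.connection() as conn:
        with conn:
            conn.execute("CREATE TABLE log (msg TEXT)")
    threads = [threading.Thread(target=worker, args=(pool, 10)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with pool.connection() as conn:
        total = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
    pool.close_all()
```

Eight threads share three connections; a thread that finds the pool empty simply waits until another thread returns one.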
The same connection object can be used to create multiple statement objects, and these statement objects can then be used by different threads concurrently. Most modern databases accessed through JDBC can do that, which lets JDBC make use of concurrent cursors as follows. PostgreSQL is no exception here; see for example:
http://doc.postgresintl.com/jdbc/ch10.html
This allows connection pooling where connections are used only for a short time, namely to create the statement object, and are then returned to the pool. This short-lived pooling is only recommended when the JDBC connection also parallelizes statement operations; otherwise normal connection pooling may give better results. In either case, the thread can continue to work with the statement object and later close the statement, but not the connection.
Step  Thread 1            Thread 2
1.    opens statement
2.                        opens statement
3.    does something      does something
4.    ...                 ...
5.    closes statement    ...
6.                        closes statement
The above only works in auto-commit mode. If transactions are needed, there is still no need to tie a transaction to a thread: you can partition the pooling along transactions and use the same approach as above. This is needed not because of any socket connection limitation, but because JDBC then equates the session ID with the transaction ID.
If I remember correctly, there are APIs and products with a less simplistic design, where the session ID and the transaction ID are not equated. With such APIs you could write your server with one single database connection object, even when it does transactions. I will need to check and tell you later which APIs and products these are.
