What is the disadvantage of using a bulk-enabled connection for non-bulk operations - sap-ase

There are lots of articles on how to use the Sybase Open Client bulk interface, saying that it will only work if the connection used is "bulk enabled". Which raises the question: why wouldn't I always create connections that are bulk enabled, even if I may not use them for bulk operations? I can't find anything that discusses this.

Bulk operations do minimal to no logging in the database. This means that in case of a server failure, transactions could be lost, and there would be fewer recovery options available.
So the default is to make things as recoverable as possible by leaving bulk operations disabled, and to let the developer and DBA decide whether to enable bulk-enabled connections and live with the possible impact on operations. (In Open Client, a connection is typically marked bulk-enabled by setting the CS_BULK_LOGIN connection property before connecting.)

Related

Limiting Cassandra query syntax for clients

We plan to use Cassandra 3.x, and we want to allow our customers to connect to Cassandra directly to export the data into their data warehouses.
They will connect remotely via ODBC.
Is there any way to prevent customers from executing huge or bad SELECT statements that would result in high load on all nodes? We use an extra data center in our replication strategy where only customers can connect, so the live system will not be affected. But we want to set up some workers that will run on this shadow system as well. The most important thing is that a connected remote client must not have any noticeable impact on other remote connections or on our local worker jobs. There is already a materialized view, and I want to force customers to get data based on the primary key only (i.e., disallow the usage of ALLOW FILTERING). It would also be great if the number of rows returned could be limited (e.g., to 1 million) to prevent a pull of all the data.
Is there a best practice for this use case?
I know of BlackRock's video on multi-tenant strategy in C*, which advises using a tenant_id in the schema. That is what we're doing already, but how can I ensure security/isolation for ODBC-connected tenants/customers? Or do I have to write an API of my own which handles security?
I would recommend exposing access via an API, not via ODBC - at least you would have greater control over what is executed, and could enforce tenant_id and other checks, such as limits (see the sketch below). You can also try to utilize Cassandra's CQL parser to decompose each query and put all the required pieces back.
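As a rough sketch of that approach (the route, table name, and authentication middleware are all hypothetical placeholders), the API can hard-code a partition-key-only query and cap the returned rows:

    // Sketch: a small API in front of Cassandra enforcing tenant_id and a row cap.
    // The route, table, and auth middleware are hypothetical placeholders.
    const express = require('express');
    const cassandra = require('cassandra-driver');

    const app = express();
    const client = new cassandra.Client({
      contactPoints: ['127.0.0.1'],
      localDataCenter: 'dc_customers' // the customer-facing data center
    });

    const MAX_ROWS = 1000000; // the 1 million row cap from the question

    // Hypothetical auth middleware: in practice this would validate credentials
    // and map them to a tenant_id.
    app.use((req, res, next) => { req.user = { tenantId: 'tenant-1' }; next(); });

    app.get('/export/:exportKey', (req, res) => {
      const tenantId = req.user.tenantId; // from authentication, never from client input
      // The CQL is fixed server-side and keyed on the partition key only,
      // so ALLOW FILTERING is impossible for the caller to inject.
      client.execute(
        'SELECT * FROM exports WHERE tenant_id = ? AND export_key = ? LIMIT ?',
        [tenantId, req.params.exportKey, MAX_ROWS],
        { prepare: true }
      ).then(result => res.json(result.rows))
       .catch(err => res.status(500).json({ error: err.message }));
    });

    app.listen(3000);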
Theoretically, you could utilize Apache Calcite, for example. It has a JDBC driver implementation that could be used, plus there is an existing Cassandra adapter that you could modify to accomplish your task (mapping authentication onto tenant_ids, etc.), but this would be quite a lot of work.

Does the Cassandra driver have its own speculative-retry mechanism?

Since v2.0.2, Cassandra has had a mechanism named Rapid Read Protection, described in detail here. The important notes from the blog post for this question are:
The mechanism is controlled by a per-table speculative_retry setting.
The coordinator node is responsible for applying this mechanism - it starts a new read request if the retry condition is satisfied.
But the documentation for the Cassandra java-driver describes something very similar (and similarly named) here: speculative query execution. The driver needs some additional libraries to use this feature.
Q1: Am I right that this means it is implemented on the driver side and has no relation to the Rapid Read Protection implemented inside Cassandra?
If so, that means the driver will retry a query with another coordinator if the driver's retry condition is satisfied.
Q2: For read queries, a coordinator-side retry seems more effective, since even when you switch coordinators for a query, there is still a chance that the new one will query the same set of nodes (and will have the same response time as the previous one). But I didn't find a way to enable driver-side retry only for write queries. So if I want to use retries for all query types, should I disable Rapid Read Protection on the Cassandra server side, since double protection will put more pressure on the cluster? Or can I gain some benefit by enabling both of them?
Q1: Yes, speculative query execution in the driver is completely independent of the cluster's Rapid Read Protection.
Q2.1: For the first part, coordinator-side retry is not necessarily more effective, as the coordinator itself could be busy processing other requests, etc.
Q2.2: I think you can enable both mechanisms (cluster- and client-side) and experiment a bit with their configurations.
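For reference, the client-side mechanism looks roughly like this in the DataStax Node.js driver, which exposes a policy equivalent to the Java driver's (contact points, keyspace, and the query are placeholder values):

    // Sketch: client-side speculative executions with the DataStax Node.js driver.
    // Contact points, keyspace, and the query are placeholders.
    const cassandra = require('cassandra-driver');

    const client = new cassandra.Client({
      contactPoints: ['127.0.0.1'],
      localDataCenter: 'datacenter1',
      keyspace: 'my_ks',
      policies: {
        // Fire up to 2 extra executions if no response arrives within 200 ms.
        speculativeExecution:
          new cassandra.policies.speculativeExecution.ConstantSpeculativeExecutionPolicy(200, 2)
      }
    });

    // Speculative executions apply only to queries explicitly marked idempotent.
    const userId = 'some-user-id'; // placeholder
    client.execute('SELECT * FROM users WHERE id = ?', [userId],
                   { prepare: true, isIdempotent: true })
      .then(result => console.log(result.rows));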

MongoDB - how to make replica set step-down truly seamless

The problem is that when your replica set is forced to step down while your application is running, all mainstream Mongo clients will throw at least one exception per connection. This happens because their database connections are hardwired to the physical server which used to be the primary and no longer accepts queries. So, while MongoDB's architects might think the step-down process does not create any downtime, in reality, if you handle connections according to their documentation, each step-down will cause a full-blown crash for at least one user, and might even create a data integrity issue. I hope this can be avoided with a simple wrapper that catches the specific Mongo exceptions and handles them by automatically re-connecting to the replica set and re-running the failed query. If you already have a solution for this, please share! I am particularly interested in a solution that works with any major Mongo driver for Node.js.
You are correct -- this is the exact behavior I experienced with both mainstream ODMs as well as the official native MongoDB driver for Node.js.
Replica set step-downs would cause my outstanding queries to fail with "Could not locate any valid servers in initial seed list", "sockets closed", and "ECONNRESET" before additional queries would get buffered, even though bufferMaxEntries was correctly configured.
Therefore, I developed Monkster to provide seamless replica set step-down and overall high-availability for MongoDB clusters for Node.js developers using the popular Monk ODM.
Monkster is a Node.js package that provides high availability for Monk, the wise MongoDB API. It implements smart error handling and retry logic to handle temporary network connectivity issues and replica set step-downs seamlessly.
https://www.npmjs.com/package/monkster
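The core retry idea is small enough to sketch. The following is a simplified illustration, not Monkster's actual implementation; the error patterns and backoff values are chosen for the example:

    // Sketch: retry wrapper for transient MongoDB errors such as replica set
    // step-downs. The error patterns and timings are illustrative only.
    const TRANSIENT = /not master|ECONNRESET|sockets closed|valid servers/i;

    async function withRetry(operation, retries = 5, delayMs = 500) {
      for (let attempt = 1; ; attempt++) {
        try {
          return await operation();
        } catch (err) {
          if (attempt > retries || !TRANSIENT.test(String(err && err.message))) throw err;
          // Give the replica set time to elect a new primary, then retry.
          await new Promise(resolve => setTimeout(resolve, delayMs * attempt));
        }
      }
    }

    // Usage with any promise-returning driver call:
    // const user = await withRetry(() => users.findOne({ email: 'a@b.c' }));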

Is MongoDB's lack of transactions a deal breaker?

I've been doing some research but have reached the point where I think MongoDB/Mongoose (on Node.js) is not the right tool for the job. Here is the scenario...
Two documents: Account (money) information and Inventory information
Check if user's account has enough money
If so, check and deduct inventory
Deduct funds from Account Information
It seems like I really need a transaction system to prevent other events from altering the data in between steps.
Am I correct, or can this still be handled in MongoDB/Mongoose? If not, is there a NoSQL DB that I should check out, preferably with Node.js support?
Implementing transactional safety is usually tricky and requires more than just transactions on the database, e.g. if you need to communicate with external parties in a reliable fashion, or if the transaction runs over minutes, hours, or even days. But that's leading too far.
Anyhow, on the db side you can do transactions in MongoDB using two-phase commits, but it's not exactly trivial.
There are a ton of NoSQL databases with transaction support, e.g. Redis, Cassandra (lightweight transactions using the Paxos protocol), and FoundationDB.
However, this seems rather random to me because the idea of NoSQL databases is to use one that fits your particular problem. If you just need 'anything' with transactions, an SQL db might do the job, right?
You can always implement your own locking mechanism within your application to lock out other sections of the app while you are making your account and inventory checks and updates. That, combined with findAndModify() (http://docs.mongodb.org/manual/reference/command/findAndModify/#dbcmd.findAndModify), may be enough for your transaction needs while also maintaining the flexibility of a NoSQL solution.
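For the check-and-deduct step specifically, a single conditional update keeps the check and the write atomic on one document. A minimal sketch using findOneAndUpdate, the Node.js driver's equivalent of findAndModify (the collection and field names are assumed for illustration):

    // Sketch: atomic check-and-deduct via a conditional update.
    // The 'accounts' collection and 'balance' field are assumed names.
    async function deductFunds(db, userId, price) {
      // The filter matches only if the balance is sufficient, so the check
      // and the deduction happen as one atomic operation on the document.
      const result = await db.collection('accounts').findOneAndUpdate(
        { _id: userId, balance: { $gte: price } },
        { $inc: { balance: -price } }
      );
      // result.value is the driver <= 5.x shape; newer drivers return the doc directly.
      if (!result.value) throw new Error('Insufficient funds');
      return result.value;
    }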
For a distributed lock, I'd look at Warlock (https://www.npmjs.org/package/node-redis-warlock). I've not used it myself, but it's Node.js-based and built on top of Redis, although implementing your own via Redis is not that hard to begin with.
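Warlock's documented usage is roughly as follows (the lock key and TTL are placeholder values):

    // Sketch: a distributed lock around the account/inventory update using Warlock.
    // Assumes a redis v2/v3-style client, per Warlock's README.
    const Warlock = require('node-redis-warlock');
    const redis = require('redis');

    const warlock = Warlock(redis.createClient());

    warlock.lock('purchase:user42', 10000, (err, unlock) => {
      if (err) return console.error(err);      // could not attempt the lock
      if (typeof unlock !== 'function') return; // lock already held elsewhere
      // ... check the account, deduct inventory, deduct funds ...
      unlock(); // release the lock when done
    });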

CouchDB - safety of arbitrary design docs

I am creating a CouchDB database per user of my application, in which the application is granted database admin privileges. This is done so that the application can sync design docs -- but I do not want to expose my server to any risks.
There is no legitimate reason for a user to run a view on my server (they only use the server for two-way syncing), so it wouldn't be hard to filter out requests that attempt to query views, would it?
Are there other security risks or DoS attacks I'm missing?
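A minimal sketch of the request-filtering idea mentioned above (ports and the path check are placeholders):

    // Sketch: reverse proxy in front of CouchDB that rejects view queries
    // while passing sync/replication traffic through. Ports are placeholders.
    const http = require('http');

    http.createServer((req, res) => {
      // Block design-doc view queries and temporary views.
      if (/\/_design\/[^/]+\/_view\//.test(req.url) || req.url.indexOf('_temp_view') !== -1) {
        res.writeHead(403);
        return res.end('Views are not allowed');
      }
      // Forward everything else to CouchDB on its default port.
      const upstream = http.request({
        host: '127.0.0.1',
        port: 5984,
        method: req.method,
        path: req.url,
        headers: req.headers
      }, upRes => {
        res.writeHead(upRes.statusCode, upRes.headers);
        upRes.pipe(res);
      });
      req.pipe(upstream);
    }).listen(8080);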
Every user that has read access to your database is able to run views. That's not an issue, since a view index builds once and is updated incrementally.
But database admins can create whatever new views they like. Views can't consume much CPU time, since CouchDB limits their execution with a timeout (5 seconds by default), but they can consume a lot of disk space, especially if the full doc content is emitted from the view - this can make a single view index bigger than the whole database.
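For example (illustrative map functions), emitting the whole document copies every document into the view index, while emitting null keeps the index small:

    // Illustrative CouchDB map functions.
    // Emitting the full doc duplicates every document into the view index:
    function (doc) {
      emit(doc.type, doc); // the index can grow larger than the database itself
    }

    // Emitting null keeps the index small; query with include_docs=true instead:
    function (doc) {
      emit(doc.type, null);
    }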
Moreover, database admins can run database and view-index compactions - these operations are very heavy on disk I/O (and sometimes CPU too), especially for large databases (100+ GiB). These tasks may significantly degrade your server's performance if they run at the peak of your users' activity (a single compaction probably won't, but multiple ones easily will).
Things can get worse if you're using a custom view server without a sandbox feature (such as Python, Erlang, etc.). In effect, these allow your db admins to execute arbitrary code on your server through CouchDB. In that case, losing all your databases or finding a remote shell on your server is just the tip of the iceberg of possibilities.
To sum up: don't make people you cannot trust database admins, and you'll be safe.
