CRUD operations Cassandra mappers vs prepared statement - cassandra

I would like some help defining the difference between the use of the object mapper of datastax for Cassandra and the common solution of using prepared statements. Instead of the fact that the code will be cleaner with mapping the objects to POJO classes are there any other advantages regarding performance ect. . Thanks for the answers.

Object Mapper uses prepared statements under the hood (see source code), so performance-wise there shouldn't be very big performance difference. You need to pay attention to setting options, like saveNullFields (if you save nulls, then the thombstones could be generated that could affect read performance). Also, for high-performance writes you may need to look to async versions of Mapper opperations.
Also, you need to make sure that you're not creating MappingManager more than once - it's also thread-safe like Session object.

Related

Customize jOOQ dialect to alter the order in which LIMIT and OFFSET are rendered in a statement

I'm using jOOQ to generate queries to run against Athena (AKA PrestoDB/Trino)
To do this, I am using SQLDialects.DEFAULT, and it works because I use very basic query functionalities.
However, jOOQ renders queries like this:
select *
from "Artist"
limit 10
offset 10
God knows why, but the order of limit and offset seem to matter, and the query only works if written with the order swapped:
select *
from "Artist"
offset 10
limit 10
Is there a class I can subclass, to modify the statement render function so that the order of these are swapped? Or any other way of implementing this myself?
A generic solution in jOOQ
There isn't a simple way to change something as fundamental as the SELECT clause order (or any other SELECT clause syntax) so easily in jOOQ, simply, because this was never a requirement for core jOOQ usage, other than supporting fringe SQL dialects. Since the support of a SQL dialect is a lot of work in jOOQ (with all the integration tests, edge cases, etc.) and since market shares of those dialects are low, it has simply never been a priority to improve this in jOOQ.
You may be tempted to think that this is "just" about the order of keywords in this one case. "Only this one case." It never is. It never stops, and the subtle differences in dialects never end. Just look at the jOOQ code base to get an idea of how weirdly different vendors choose to make their dialects. In this particular case, one would think that it seems extremely obvious and simple to make this clause MySQL / PostgreSQL / SQLite compatible, so your best chance is to make a case with the vendor for a feature request. It should be in their own best interest to be more compatible with the market leaders, to facilitate migration.
Workarounds in jOOQ
You can, of course, patch your generated SQL on a low level, e.g. using an ExecuteListener and a simple regex. Whenever you encounter limit (\d+|\?) offset (\d+|\?), just swap the values (and bind values!). This might work reasonably well for top level selects. It's obviously harder if you're using LIMIT .. OFFSET in nested selects, but probably still doable.
Patching jOOQ is always an option. The class you're looking for in jOOQ 3.17 is org.jooq.impl.Limit. It contains all the rendering logic for this clause. If that's your only patch, then it might be possible to upgrade jOOQ. But obviously, patching is a slippery slope, as you may start patching all sorts of clauses, making upgrades impossible.
You can obviously use plain SQL templates for simple cases, e.g. resultQuery("{0} offset {1} limit {2}", actualSelect, val(10), val(10)). This doesn't scale well, but if it's only about 1-2 queries, it might suffice
Using the SQLDialect.DEFAULT
I must warn you, at this point, that the behaviour of SQLDialect.DEFAULT is unspecified. Its main purpose is to produce something when you call QueryPart.toString() on a QueryPart that is not an Attachable, where a better SQLDialect is unavailable. The DEFAULT dialect may change between minor releases (or even patch releases, if there's an important bug in some toString() method), so any implementation you base on this is at risk of breaking with every upgrade.
The most viable long term solution
... would be to have support for these dialects in jOOQ:
#5414 Presto
#11485 Trino

JOOQ vs SQL Queries

I am on jooq queries now...I feel the SQL queries looks more readable and maintainable and why we need to use JOOQ instead of using native SQL queries.
Can someone explains few reason for using the same?
Thanks.
Here are the top value propositions that you will never get with native (string based) SQL:
Dynamic SQL is what jOOQ is really really good at. You can compose the most complex queries dynamically based on user input, configuration, etc. and still be sure that the query will run correctly.
An often underestimated effect of dynamic SQL is the fact that you will be able to think of SQL as an algebra, because instead of writing difficult to compose native SQL syntax (with all the keywords, and weird parenthesis rules, etc.), you can think in terms of expression trees, because you're effectively building an expression tree for your queries. Not only will this allow you to implement more sophisticated features, such as SQL transformation for multi tenancy or row level security, but every day things like transforming a set of values into a SQL set operation
Vendor agnosticity. As soon as you have to support more than one SQL dialect, writing SQL manually is close to impossible because of the many subtle differences in dialects. The jOOQ documentation illustrates this e.g. with the LIMIT clause. Once this is a problem you have, you have to use either JPA (much restricted query language: JPQL) or jOOQ (almost no limitations with respect to SQL usage).
Type safety. Now, you will get type safety when you write views and stored procedures as well, but very often, you want to run ad-hoc queries from Java, and there is no guarantee about table names, column names, column data types, or syntax correctness when you do SQL in a string based fashion, e.g. using JDBC or JdbcTemplate, etc. By the way: jOOQ encourages you to use as many views and stored procedures as you want. They fit perfectly in the jOOQ paradigm.
Code generation. Which leads to more type safety. Your database schema becomes part of your client code. Your client code no longer compiles when your queries are incorrect. Imagine someone renaming a column and forgetting to refactor the 20 queries that use it. IDEs only provide some degree of safety when writing the query for the first time, they don't help you when you refactor your schema. With jOOQ, your build fails and you can fix the problem long before you go into production.
Documentation. The generated code also acts as documentation for your schema. Comments on your tables, columns turn into Javadoc, which you can introspect in your client language, without the need for looking them up in the server.
Data type bindings are very easy with jOOQ. Imagine using a library of 100s of stored procedures. Not only will you be able to access them type safely (through code generation), as if they were actual Java code, but you don't have to worry about the tedious and useless activity of binding each single in and out parameter to a type and value.
There are a ton of more advanced features derived from the above, such as:
The availability of a parser and by consequence the possibility of translating SQL.
Schema management tools, such as diffing two schema versions
Basic ActiveRecord support, including some nice things like optimistic locking.
Synthetic SQL features like type safe implicit JOIN
Query By Example.
A nice integration in Java streams or reactive streams.
Some more advanced SQL transformations (this is work in progress).
Export and import functionality
Simple JDBC mocking functionality, including a file based database mock.
Diagnostics
And, if you occasionally think something is much simpler to do with plain native SQL, then just:
Use plain native SQL, also in jOOQ
Disclaimer: As I work for the vendor, I'm obviously biased.

Does CouchDB share map functions used by multiple views?

CouchDB's documentation says it supports multiple views with the same map function but different reduce functions. If both views are in the same design document then the map function will only be computed once.
Is this correct? Does the database compare the text of JavaScript map functions to decide whether to share the map?
CouchDB itself does not do this. It's an implementation detail whether or not the query server does. In theory, the query server could cache functions for future sessions, giving a benefit similar to what you're describing.
In practice, I expect the performance gain would be minuscule for most interpreted languages, like javascript (since execution is already batched), so probably not worth it in the general case. It might be worth it for certain workloads, where you may want to write your own query server.
If using a query server for a compiled language (C, Java, Go, whatever), it probably would make sense to cache the compiled artifact for re-use.

Save or update in Groovy

Usually in Java I execute a SELECT statement and check the size of ResultSet. If it is zero I issue a INSERT and otherwise an UPDATE.
Since Groovy provides syntactic sugar on top of JDBC, I'm wondering if it provides a way to ease this process? Is there an easier way to save or update a record?
Note:
I know that Hibernate offers this, but I'd rather stick only with Groovy API.
There's a lightweight ORM called GStorm here which I've had on my list of things to investigate which has basically no dependencies, but doesn't handle related domain objects
And a library to leverage Grails GORM here (which obviously pulls GORM out of Grails so has quite a few dependencies including Hibernate)
Other than that (and probably some other examples I've missed), there's nothing I know of to do what you're trying to do. I guess you'd have to write your own (you could switch between the INSERT or UPDATE depending on whether you pass a Primary Key -- assuming primary keys are auto-generated by the DB)

In CouchDB, are there ways to improve performance of the View index process?

I have some basic views and some map/reduce views with logic. Nothing too complex. Not too many documents. I've tried with 250k, 75k, and 10k documents. Seems like I'm always waiting for view indexing.
Does better, more efficient code in the view help? I'm assuming it's basically processing the view at all levels of aggregation. So there must be some improvement there.
Does emit()-ing less data help? emit(doc.id, doc) vs specifying fewer fields?
Do more or less complex keys impact view indexing?
Or is it all about memory, CPU cores, and processor speed?
There must be some documentation out there, but I can't find anything referencing ways to improve performance.
I would take a deeper look into the reduce function. Try to use the built-in Erlang functions like _sum, _count, instead of writing Javascript.
Complex views can take hours and more, that's normal.
Maybe post such not too complex map/reduce.
And don't forget: indexing all docs is only done once after changing the view (or pushing a whole bunch of new docs). Subsequent new docs are indexed incrementally.
Use a view with &stale=ok to retrieve the "old" data instantly, so you don't have to wait. (But pay attention: you always have to call a view without stale=ok at least once to trigger the indexing process). Or better: use stale=update_after.
The code you write in views is more like CREATE INDEX than SELECT. It should be irrelevant how long it takes, as long as the view builds keep up with the document change rate. Building a view is a sunk (one-time) cost.
When you query the view, that is always a binary tree scan, which operates against a static data set in logarithmic time. That is usually the performance people care about more (in production.)
If you are not seeing behavior like I describe, perhaps we could discuss your view functions and your general approach to your problem. CouchDB is very different from relational databases. In the latter, you have highly structured data and free-form queries. In CouchDB, you have free-form data but highly structured index definitions (views). Except during development, changing and rebuilding views should be rare.
not emitting anything will help, but doing the view creation in smaller batches ( there are scripts that do this automagically ) helps more than anything other than not emitting anything at all, which can't be helped sometimes.

Resources