DataStax Java Driver loop repeating rows - cassandra

I'm developing a new product using Cassandra as the DB. Right now it is installed on a single Ubuntu 13.10 development laptop (Core i7). I have a column family and a query. This query, executed in cqlsh, gives 33267 rows. Executed from my Java program, using the DataStax Java driver 2.0, some runs return the correct rows, while others get into an infinite loop that repeats the same rows again and again:
while (!rs.isExhausted()) {
    Row row = rs.one();
    long hora = row.getDate(1).getTime();
    String clave = row.getString(0);
    List<Long> data = row.getList(2, Long.class);
    ordenados.put(hora, new Object[]{clave, data.get(0) / 100000000.0, data.get(1)});
    contador2 += 1;
    if (contador2 % 1000 == 0) {  // progress trace every 1000 rows
        System.out.println("sitio " + contador2 + " " + clave + " " + hora);
    }
}
When profiling the app, I see lock contention between the New I/O worker threads; 98% of the time is spent in the sun.nio.ch.EPollArrayWrapper.poll method.
Has anyone experienced this issue and knows a solution?
Can someone direct me to a link to download cassandra-driver-core-2.0.0.src.jar so I can debug the error with the sources and report it to DataStax?
This is an exciting technology, but it is the first time in my career that a production DB has given me such unreliable behaviour.
By the way: the original query had an ORDER BY that I removed. With the ORDER BY, I got this exception:
Exception in thread "main" com.datastax.driver.core.exceptions.InvalidQueryException: Cannot page queries with both ORDER BY and a IN restriction on the partition key; you must either remove the ORDER BY or the IN and sort client side, or disable paging for this query
Yet yesterday it worked on similar queries, and in cqlsh the query works without problems with the ORDER BY added. I only mention this because maybe both problems are related.
Regards

You can get the source from the datastax/java-driver repository on GitHub. It doesn't look like the source is included in either the Maven or tarball downloads.
I think you are encountering CASSANDRA-6722 when you use IN and ORDER BY in your query. The java-driver automatically does paging, with a default fetch size of 5000. You can disable automatic paging with Statement.setFetchSize(Integer.MAX_VALUE). There is more info about automatic paging in this blog post.
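For instance, disabling paging might look roughly like this (a minimal sketch against the 2.0 driver API; the CQL text and the session variable are placeholders, not your actual query):
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

// Build the statement explicitly so the fetch size can be set on it.
Statement stmt = new SimpleStatement(
        "SELECT clave, hora, data FROM mytable WHERE clave IN ('a', 'b') ORDER BY hora");
stmt.setFetchSize(Integer.MAX_VALUE);   // effectively disables automatic paging for this query
ResultSet rs = session.execute(stmt);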
What version of Cassandra is your application connecting to? If you could share more about your table definition and query, maybe it will be possible to reproduce the repeating-rows issue.

Related

Could my large number of tables (2k+) be causing my write timeout exceptions?

I'm running OSS Cassandra 3.11.9 with DataStax Java Driver 3.8.0. I have a Cassandra keyspace that has multiple tables functioning as lookup tables / search indices. Whenever I receive a new POST request to my endpoint, I parse the object and insert it into the corresponding Cassandra table. I also issue inserts to each corresponding lookup table (10-20 per object).
When ingesting a lot of data into the system, I've been running into WriteTimeoutExceptions in the driver.
I tried to serialize the insert requests into the lookup tables by introducing Apache Camel and putting all the Statements into a queue that the Session could work off of, but it did not help.
With Camel, since the exceptions now happen in the Camel thread, the test continues to run instead of failing on the first exception. Eventually, the test seems to crash Cassandra (nothing in the Cassandra logs, though).
I also tried to turn off my lookup tables and instead insert into the main table 15x per object (to simulate a similar number of writes as if I had the lookup tables on). This test passed with no exception, which makes me think the large number of tables is the problem.
Is a large number (2k+) of Cassandra tables a code smell? Should we rearchitect, or just throw more resources at it? (Nothing indicative has shown up in the logs - mostly just some status about the number of tables, etc. - no exceptions.)
Can the Datastax Java Driver be used multithreaded like this? It says it is threadsafe.
There is a direct effect of a high number of tables on performance - see this doc (the whole series is a good source of information) and this blog post for more details. Basically, with ~1000 tables you get ~20-25% performance degradation.
That could be a reason - not completely direct, but related. For each table, Cassandra needs to allocate memory, reserve a part of the memtable for it, keep information about it, etc. This specific problem could come from blocked memtable flushes or something similar. Check nodetool tpstats and nodetool tablestats for blocked or pending memtable flushes. It's better to set up a continuous monitoring solution, such as the metrics collector for Apache Cassandra, and watch the important metrics (which include this information) over a period of time.
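If you want to keep an eye on how many tables the cluster actually defines from the application side, a hedged sketch using the Java driver 3.x schema metadata (assuming cluster is your existing Cluster instance) could be:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.KeyspaceMetadata;

// Count every table the driver's schema metadata knows about.
int tableCount = 0;
for (KeyspaceMetadata ks : cluster.getMetadata().getKeyspaces()) {
    tableCount += ks.getTables().size();
}
System.out.println("Tables defined in the cluster: " + tableCount);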

Cassandra datastax OperationTimedOutException

I'm using a 3-node Cassandra 3.0.14 cluster deployed on 3 different VMs. I have lots of data (billions of rows) and I would like to make quick searches across my Cassandra architecture.
I've done a lot of research on Cassandra, but I'm still facing some issues that I cannot understand:
When I am using cqlsh I can run a query that scans my whole database:
SELECT DISTINCT val_1 FROM myTable; is working.
However, I cannot make the same request using my Java code and the DataStax driver. My script returns:
Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [/XX.XX.XX.XX:9042] Timed out waiting for server response
Some requests work using cqlsh, but making a more specific request will lead to a request timeout:
OperationTimedOut: errors={'127.0.0.1': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.1
For example:
SELECT val_1 FROM myTable WHERE time > '2018-09-16 09:00:00'; will work, but
SELECT val_1 FROM myTable WHERE time > '2018-09-16 09:00:00' AND time < '2018-09-17 09:00:00'; will lead to a timeout.
I changed my request_timeout_in_ms to 60s, but I know that is not good practice. I also increased my read_request_timeout_in_ms and range_request_timeout_in_ms, but I still have the previous issues.
Has anyone had the same problems?
-Nicolas
Try to adjust the client timeout in Java code, as follows:
// configure socket options
SocketOptions options = new SocketOptions();
options.setConnectTimeoutMillis(30000);
options.setReadTimeoutMillis(300000);
options.setTcpNoDelay(true);

// spin up a fresh connection (using the SocketOptions set up above)
cluster = Cluster.builder()
        .addContactPoint(Configuration.getCassandraHost())
        .withPort(Configuration.getCassandraPort())
        .withCredentials(Configuration.getCassandraUser(), Configuration.getCassandraPass())
        .withSocketOptions(options)
        .build();
This happens because you're using Cassandra the wrong way. Range operations, DISTINCT, etc. work well only if you have the partition key specified in your query. Otherwise, Cassandra needs to scan the whole cluster trying to find the data you want, and this will lead to timeouts even on a medium-sized database. Don't use ALLOW FILTERING to force execution of such queries.
In Cassandra, the database structure is modeled around the queries that you want to execute. I recommend taking the DS201 & DS220 courses from DataStax Academy.
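As an illustration only (a hedged sketch; the keyspace, table and column names are placeholders, not your real schema), a model that keeps the time range inside a single partition, executed through the Java driver, could look like:
// session is an existing com.datastax.driver.core.Session
// Bucket rows by day so that a time-range query is restricted to a single partition.
session.execute(
    "CREATE TABLE IF NOT EXISTS mykeyspace.mytable_by_day ("
  + "  day date, time timestamp, val_1 text,"
  + "  PRIMARY KEY (day, time))");

// The range on the clustering column now stays within the 2018-09-16 partition:
ResultSet rs = session.execute(
    "SELECT val_1 FROM mykeyspace.mytable_by_day "
  + "WHERE day = '2018-09-16' "
  + "AND time >= '2018-09-16 09:00:00' AND time < '2018-09-17 09:00:00'");
Queries spanning several days would then issue one such request per day, or use an IN on the day key.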

Azure SQL Database update performance

We're migrating some databases from an Azure VM running SQL Server to Azure SQL. The current VM is a Standard DS12 v2 with two 1TB SSDs attached.
We are using an elastic pool at the P1 performance level. We're early days in this, so nothing else is really running in the pool.
At any rate, we are doing an ETL process that involves a handful of ~20M row tables. We bulk load these tables and then update some attributes to help with the rest of the process.
For example, I am currently running the following update:
UPDATE A
SET A.CompanyId = B.Id
FROM etl.TRANSACTIONS AS A
LEFT OUTER JOIN dbo.Company AS B
ON A.CO_ID = B.ERPCode
TRANSACTIONS is ~ 20M rows; Company is fewer than 50.
I'm already 30 minutes into running this update, which is far beyond what will be acceptable. The usage meter on the pool is hovering around 40%.
For reference, our Azure VM runs this in about 2 minutes.
I load this table via bulk copy, and this update has already taken longer than loading the entire table did.
Any suggestions on speeding up this (and other) updates?
We are using an elastic pool at the P1 performance level.
Not sure how this translates to your VM's performance level, or what criteria you are using to compare the two.
I would recommend the steps below, since no execution plan was provided.
1. Check if there is any wait type while the update is running:
select
session_id,
start_time,
command,
db_name(ec.database_id) as dbname,
blocking_session_id,
wait_type,
last_wait_type,
wait_time,
cpu_time,
logical_reads,
reads,
writes,
((database_transaction_log_bytes_used +database_transaction_log_bytes_reserved)/1024)/1024 as logusageMB,
txt.text,
pln.query_plan
from sys.dm_exec_requests ec
cross apply
sys.dm_exec_sql_text(ec.sql_handle) txt
outer apply
sys.dm_exec_query_plan(ec.plan_handle) pln
left join
sys.dm_tran_database_transactions trn
on trn.transaction_id=ec.transaction_id
The wait type provides a lot of info that can be used to troubleshoot.
2. You can also use the query below to see, in parallel, what is happening with the query:
set statistics profile on
your update query
then run the query below in a separate window:
select
session_id,physical_operator_name,
row_count,actual_read_row_count,estimate_row_count,estimated_read_row_count,
rebind_count,
rewind_count,
scan_count,
logical_read_count,
physical_read_count
from
sys.dm_exec_query_profiles
where session_id=your sessionid;
As per your question, there doesn't seem to be an issue with DTU, so I don't see much of a problem on that front.
Slow performance solved in one case:
I have recently had severe problems with slow Azure updates that made it nearly unusable. It was updating only 1000 rows per second, so 1M rows took 1000 seconds. I believe this is due to logging in Azure, but I haven't done enough research to be certain. Opening an MS support incident went nowhere. I finally solved the issue using two techniques:
Copy the data to a temporary table and make the updates in the temp table. So in the above case, try copying the 50 rows to a temp table and then back again after the updates. There is no/minimal logging in this case.
Copying back was still slow (I had a few 100K rows), so I created a clustered index on that table. The update duration dropped by a factor of 4-5.
I am using a S1-20DTU database. It is still about 5 times slower than a dedicated instance, but that is fantastic performance for the price.
The real answer to this issue is that SQL Azure will spill to the tempdb much faster than you would expect if you are used to using a well provisioned VM or physical machine.
You can tell that this is happening by capturing the actual execution plan: look for the warning icon on the operator, whose tooltip will complain about the spill.
At any rate, if you see this, it is likely that you're trying to do too much in the statement.
The Microsoft support person suggested updating the statistics, but this did not change the situation for us.
What seems to be working is the traditional advice to break the inserts up into smaller batches.
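As a rough illustration of that batching idea applied to the UPDATE from the question (written as a JDBC loop purely for illustration; the connection string, the 50,000 batch size, the switch to an INNER JOIN and the CompanyId IS NULL predicate are all assumptions to adapt to your data):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchedCompanyUpdate {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://<server>.database.windows.net;databaseName=<db>;user=<user>;password=<password>";
        // Update at most 50,000 rows per statement; the WHERE clause makes the loop terminate.
        String batchUpdate =
              "UPDATE TOP (50000) A "
            + "SET A.CompanyId = B.Id "
            + "FROM etl.TRANSACTIONS AS A "
            + "INNER JOIN dbo.Company AS B ON A.CO_ID = B.ERPCode "
            + "WHERE A.CompanyId IS NULL";
        try (Connection con = DriverManager.getConnection(url);
             PreparedStatement ps = con.prepareStatement(batchUpdate)) {
            int updated;
            do {
                updated = ps.executeUpdate();   // auto-commit: each batch is its own small transaction
                System.out.println("updated " + updated + " rows");
            } while (updated > 0);
        }
    }
}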

Cassandra nodejs DataStax driver doesn't return newly added columns via prepared statement execution

After adding a pair of columns to the schema, I want to select them via select *. Instead, select * returns the old set of columns and none of the new ones.
Per the documentation's recommendation, I use {prepare: true} to smooth over the difference between JavaScript floats and Cassandra ints/bigints (I don't really need a prepared statement here; it is just to resolve the ResponseError: Expected 4 or 0 byte int issue, and I also don't want to bother with query hints).
So on first execution of select * I had 3 columns. After this, I added 2 columns to the schema. select * still returns 3 columns if used with {prepare: true}, and 5 columns if used without it.
I want a way to reliably refresh this cache, or to make the Cassandra driver re-prepare statements on each app start.
I don't consider restarting the database cluster a reliable way.
This is actually an issue in Cassandra that was fixed in 2.1.3 (CASSANDRA-7910). The problem is that on schema update, the prepared statements are not evicted from the cache on the Cassandra side. If you are running a version less than 2.1.3 (which is likely since 2.1.3 was released last week), there really isn't a way to work around this unless you create another separate prepared statement that is slightly different (like extra spaces or something to cause a separate unique statement).
When running with 2.1.3 and changing the table schema, C* will properly evict the relevant prepared statements from the cache, and when the driver sends another query using that statement, Cassandra will respond with an 'UNPREPARED' message, which should provoke the nodejs driver to reprepare the query and resend the request for you.
On the Node.js driver, you can programmatically clear the prepared statement metadata:
client.metadata.clearPrepared();

JDBC program running a long time - performance issue

My program has an issue with Oracle query performance. I believe the SQL itself performs well, because it returns quickly in SQL*Plus.
But when my program has been running for a long time, like 1 week, the SQL query (using JDBC) becomes slower (in my logs, the query time is much longer than when I originally started the program). When I restart my program, the query performance comes back to normal.
I think it could be something wrong with the way I use the PreparedStatement, because the SQL I'm using does not use "?" placeholders at all - it is just a complex select query.
The query process is done by a util class. Here is the pertinent code building the query:
public List<String[]> query(String sql, String[] args) {
    Connection conn = null;
    PreparedStatement preStatm = null;
    ResultSet rs = null;
    try {
        conn = openConnection();
        conn.setAutoCommit(true);
        preStatm = conn.prepareStatement(sql);
        // ... set PreparedStatement args here (skipped when args is null) ...
        rs = preStatm.executeQuery();
        // ... map the ResultSet into the List<String[]> result ...
    } finally {
        // close rs
        // close preStatm
        // close conn
    }
}
In my case, args is always null, so the caller just passes a SQL query string to this query method. Is it possible that this could slow down the DB query after the program has been running for a long time? Should I use Statement instead, or pass args with "?" placeholders in the SQL? How can I find the root cause of my issue? Thanks.
Maybe the problem is in the JDBC statement cache (see the Oracle spec). Try turning it off, or re-initialise the driver from time to time (e.g. once per day).
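For example, turning the implicit statement cache off could look roughly like this (a hedged sketch assuming the ojdbc OracleConnection API and that openConnection() hands back a plain JDBC connection; adapt it to your pool or driver):
import java.sql.Connection;
import oracle.jdbc.OracleConnection;

// Right after obtaining the connection in the util class:
Connection conn = openConnection();
if (conn.isWrapperFor(OracleConnection.class)) {
    OracleConnection oraConn = conn.unwrap(OracleConnection.class);
    oraConn.setImplicitCachingEnabled(false);      // disable the implicit statement cache
    // ...or keep the cache but bound it: oraConn.setStatementCacheSize(50);
}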
You first need to look at data that will help you see where you are spending most of your time; guessing is not an option when performance tuning.
So I would recommend getting solid data that pinpoints the layer presenting the issue (Java or the DB).
For this I would suggest looking at AWR and ASH reports when the problem is most noticeable. Also collect data on the JVM (you can use JConsole and/or JVisualVM).
When first diagnosing bad performance I always follow the "USE" method: Utilization, Saturation and Errors.
So first, look for Errors in the logs.
Then look for any resource becoming Saturated (CPUs, memory, etc.).
Finally, look at the Utilization of each resource; having a client-server layout will make this easier, and if that is not the case you will need to drill down to the process level to know whether it's Java or the DB.
Once you have collected this data you can direct your tuning efforts accordingly; guessing instead will only waste time and sometimes even mask problems or induce new ones.
You can come back later with this data and we can take a look!
