Is there a setting in the cassandra.yaml file that causes a server-side timeout when issuing a drop table command?
I'm using the following versions of software:
Cassandra database version: 3.11.2
Cassandra datastax java driver version: 3.4.0
I tried changing the cassandra.yaml settings write_request_timeout_in_ms, truncate_request_timeout_in_ms, and request_timeout_in_ms all to 10 ms and then issued a drop table statement via the DataStax Java driver. From my application logs I can see the statement takes about 2 seconds when measured from the client (the client and database are both on my local development machine, which is doing nothing else during this test) and finishes without a timeout.
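For reference, this is roughly what the changed section of my cassandra.yaml looked like for the test:
write_request_timeout_in_ms: 10
truncate_request_timeout_in_ms: 10
request_timeout_in_ms: 10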
I then executed the exact same test but replaced "drop table" in the statement with "truncate table", with no other changes, and saw the expected timeout: "com.datastax.driver.core.exceptions.TruncateException: Error during truncate: Truncate timed out - received only 0 responses".
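For context, the test is essentially the following (a trimmed-down sketch rather than my exact code; the keyspace and table names are placeholders):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class DropVsTruncateTimeoutTest {
    public static void main(String[] args) {
        // Connect to the local node (client and server are on the same dev machine).
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("test_ks")) {

            long start = System.nanoTime();
            // Takes ~2 seconds but never times out, even with the yaml timeouts at 10 ms.
            session.execute("DROP TABLE IF EXISTS test_table");
            // Swapping in the line below instead fails fast with TruncateException, as expected.
            // session.execute("TRUNCATE TABLE test_table");
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Statement took " + elapsedMs + " ms");
        }
    }
}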
I tried searching the Cassandra GitHub project but couldn't find where the server-side timeouts are applied in the code, so I'm hoping someone knows the answer to this question.
I have created a Cassandra database in DataStax Astra and am trying to load a CSV file using DSBulk on Windows. However, when I run the dsbulk load command, the operation never completes or fails. I receive no error message at all, and I have to manually terminate the operation after several minutes. I have tried to wait it out, letting the operation run for 30 minutes or more, with no success.
I know that a free tier of Astra might run slower, but wouldn't I see at least some indication that it is attempting to load data, even if slowly?
When I run the command, this is the output that is displayed and nothing further:
C:\Users\JT\Desktop\dsbulk-1.8.0\bin>dsbulk load -url test1.csv -k my_keyspace -t test_table -b "secure-connect-path.zip" -u my_user -p my_password -header true
Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact points.
A cloud secure connect bundle was provided and selected operation performs writes: changing default consistency level to LOCAL_QUORUM.
Operation directory: C:\Users\JT\Desktop\dsbulk-1.8.0\bin\logs\LOAD_20210407-143635-875000
I know that DataStax recently changed Astra so that you need credentials from a generated Token to connect DSBulk, but I have a classic DB instance that won't accept those token credentials when entered in the dsbulk load command. So, I use my regular user/password.
When I check the DSBulk logs, the only text is the same output displayed in the console, which I have shown in the code block above.
If it means anything, I have the exact same issue when trying to run the dsbulk count operation.
I have the most recent JDK and have set both the JAVA_HOME and PATH variables.
I have also tried adding dsbulk/bin directory to my PATH variable and had no success with that either.
Do I need to adjust any settings in my Astra instance?
Lastly, is it possible that my basic laptop is simply not powerful enough for this operation, or that it is just running the operation extremely slowly?
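If it helps narrow things down, I could also try a standalone connectivity check with the Java driver (4.x), completely outside of DSBulk, along these lines (untested sketch; the bundle, credentials, and keyspace are the same placeholders I pass to dsbulk above):
import com.datastax.oss.driver.api.core.CqlSession;
import java.nio.file.Paths;

public class AstraConnectionCheck {
    public static void main(String[] args) {
        // Connect to Astra using the same secure connect bundle and credentials as dsbulk.
        try (CqlSession session = CqlSession.builder()
                .withCloudSecureConnectBundle(Paths.get("secure-connect-path.zip"))
                .withAuthCredentials("my_user", "my_password")
                .withKeyspace("my_keyspace")
                .build()) {
            // A trivial query just to prove the connection works end to end.
            System.out.println(session.execute("SELECT release_version FROM system.local")
                    .one().getString("release_version"));
        }
    }
}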
Any ideas or help is much appreciated!
I installed JanusGraph Server (0.4) and Cassandra (3.11) on my machine. They both start correctly.
When I start the JanusGraph client to work from the console and run
:remote connect tinkerpop.server conf/remote.yaml
the connection is successful
then if I use this command
graph = JanusGraphFactory.open('conf/janusgraph-cassandra.properties')
I get the following warning and error:
WARN org.janusgraph.diskstorage.cassandra.thrift.CassandraThriftStoreManager - Cassandra Thrift protocol is deprecated and will be removed with JanusGraph 0.5.0. Please switch to the CQL backend.
Could not open global configuration
The warning is clear, but the error saying it could not open the global configuration is not.
Analyzing the configuration file in question, I noticed the following property:
storage.backend
This property sets the driver. By changing its value from:
cassandrathrift
to
cql
everything works fine.
The warning should really be an error if the cql backend is required.
Instead, the error message suggests that it is looking for a default configuration file.
It could be that, when cassandrathrift is used as the backend, some properties are not set and so their default values are looked up; at the moment I don't know where that default file is supposed to live or what it should contain. Considering that the cassandrathrift backend is deprecated, I think switching to cql is a good solution.
In the same conf/ dir as the Thrift-based config file, you should also see a janusgraph-cql.properties file.
This file should already have storage.backend=cql set, as well as a few other parameters allowing you to connect to a local Cassandra instance running on 127.0.0.1 (without security enabled).
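The important lines in that file look roughly like this (exact contents vary a bit between JanusGraph versions):
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=127.0.0.1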
I have set the server-side timeout in Cassandra to 60 seconds and the client-side timeout in the C++ driver to 120 seconds.
I execute a BATCH query containing 18K operations and get a "Request timed out" error in the C++ driver logs, but there is no trace of it in the Cassandra server logs, even though I enabled ALL-level logging in Cassandra's logback.xml.
So how can I confirm whether the timeout is thrown on the server side or the client side?
BATCH is not intended to work that way. It's designed to apply 6 or 7 mutations to different tables atomically. You're trying to use it like its RDBMS counterpart (Cassandra just doesn't work that way). The BATCH timeout is designed to protect the node/cluster from crashing due to how expensive that query is for the coordinator.
In the system.log, you should see warnings/failures about the sheer size of your BATCH. If you've raised those thresholds and don't see them, you should see a warning about a timeout threshold being exceeded (I think BATCH gets its own timeout in 3.0).
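If you want to check those thresholds, the relevant knobs in cassandra.yaml look like this (the values shown are the 3.x defaults, so verify them against your own config):
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50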
If all else fails, run your BATCH statement (or part of it) in cqlsh with tracing on, and you'll see precisely why this is a bad idea (on the server side).
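For example, something like this in cqlsh (keyspace, table, and columns are placeholders) will show you the coordinator-side work for each mutation:
TRACING ON;
BEGIN BATCH
  INSERT INTO my_ks.my_table (id, val) VALUES (1, 'a');
  INSERT INTO my_ks.my_table (id, val) VALUES (2, 'b');
APPLY BATCH;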
Also, the default query timeouts are there to protect your cluster. You really shouldn’t need to alter those. You should change your query/model or approach before looking at adjusting the timeout.
I want to know if there is a cqlsh query to check remote application connections in Cassandra, just like V$SESSION in Oracle or SHOW PROCESSLIST in MySQL.
I don't think there is a cqlsh query to do that, but you can use the Cassandra Java driver to poll the connection pool manually. This link: http://docs.datastax.com/en/developer/java-driver/3.0/manual/pooling/#monitoring-and-tuning-the-pool gives a simple example that prints the number of open connections, active requests, and maximum capacity for each host every 5 seconds.
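As a rough sketch of what that page describes (driver 3.x; session is assumed to be an already-connected Session):
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.Session;

public class PoolMonitor {
    // Prints open connections, in-flight requests, and max capacity per host every 5 seconds.
    public static void start(Session session) {
        ScheduledExecutorService scheduled = Executors.newScheduledThreadPool(1);
        scheduled.scheduleAtFixedRate(() -> {
            Session.State state = session.getState();
            for (Host host : state.getConnectedHosts()) {
                int connections = state.getOpenConnections(host);
                int inFlight = state.getInFlightQueries(host);
                int maxCapacity = connections * session.getCluster().getConfiguration()
                        .getPoolingOptions().getMaxRequestsPerConnection(HostDistance.LOCAL);
                System.out.printf("%s connections=%d, in-flight=%d, max capacity=%d%n",
                        host, connections, inFlight, maxCapacity);
            }
        }, 5, 5, TimeUnit.SECONDS);
    }
}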
I'm using Spark to create a table in the Hive metastore, then connecting from MSSQL to the Spark Thrift Server to query that table.
The table is created with:
df.write.mode("overwrite").saveAsTable("TableName")
The problem is that every time after I overwrite the table (it's a daily job), I get an error when I connect with MSSQL. If I restart the Thrift Server it works fine, but I want to automate this, and restarting the server every time seems a bit extreme.
The most likely culprit is the Thrift cached metadata which is no longer valid after the table overwrite. How can I force Thrift to refresh the metadata after I overwrite the table, before it's accessed by any of the clients?
I could settle for a solution just for MSSQL, but there are other "clients" of the table, not just MSSQL. I would prefer to force the metadata refresh from Spark (or a Linux terminal) after I finish the overwrite, rather than ask each client to run a refresh command before it requests the data.
Note:
spark.catalog.refreshTable("TableName")
Does not work for all clients, just for Spark
SQL REFRESH TABLE `TableName`;
Works for Qlik, but again, if I ask each client to refresh, it might mean extra work for the Thrift Server, and mistakes can happen (such as a dev forgetting to add the refresh).
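What I have in mind is something like the following, run from the job right after the overwrite finishes (untested sketch; host, port, and credentials are placeholders, and it assumes the Hive JDBC driver is on the classpath):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RefreshAfterOverwrite {
    public static void main(String[] args) throws Exception {
        // Connect to the Spark Thrift Server over the HiveServer2 JDBC protocol.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://thrift-server-host:10000/default", "user", "password");
             Statement stmt = conn.createStatement()) {
            // Ask the Thrift Server itself to drop its cached metadata for the table,
            // so every client that connects afterwards sees the new version.
            stmt.execute("REFRESH TABLE TableName");
        }
    }
}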