Why do I get the error "Could not retrieve endpoint ranges" when I run sstableloader? - cassandra

I have used sstableloader many times successfully, but this time I got the following error:
[root@localhost pengcz]# /usr/local/cassandra/bin/sstableloader -u user -pw password -v -d 172.21.0.131 ./currentdata/keyspace/table
Could not retrieve endpoint ranges:
java.lang.IllegalArgumentException
java.lang.RuntimeException: Could not retrieve endpoint ranges:
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:338)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106)
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:267)
at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:543)
at org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:124)
at org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:101)
at org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:30)
at org.apache.cassandra.serializers.CollectionSerializer.deserialize(CollectionSerializer.java:50)
at org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:68)
at org.apache.cassandra.cql3.UntypedResultSet$Row.getMap(UntypedResultSet.java:287)
at org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1833)
at org.apache.cassandra.config.CFMetaData.fromThriftCqlRow(CFMetaData.java:1126)
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:330)
... 2 more
I don't know whether this error is related to a Linux crash on one of the cluster nodes.
Any advice will be appreciated!

Are you running a different version of sstableloader than the version of your cluster? This looks like https://issues.apache.org/jira/browse/CASSANDRA-9324, which occurs when using the 2.1 loader against a 2.0 cluster.

In the command you ran, /usr/local/cassandra/bin/sstableloader -u user -pw password -v -d 172.21.0.131 ./currentdata/keyspace/table, the backup directory is ./currentdata/keyspace/table. sstableloader treats the parent of the backup directory as the keyspace name and the backup directory itself as the table name (here parent -> keyspace, backup dir -> table). So rename the parent directory of the table directory to the keyspace you are restoring into, and rename the table directory to match the target table; both must be the same as the keyspace and table into which you are restoring the data. Apart from this, please make sure your keyspace and table names are not Cassandra reserved words.
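For example, a minimal sketch of staging the snapshot under the names sstableloader expects (target_ks and target_table are hypothetical placeholders for your actual keyspace and table):
import shutil
import subprocess

# Hypothetical names: replace target_ks/target_table with the keyspace and
# table you are restoring into; sstableloader derives both from the path.
shutil.copytree("./currentdata/keyspace/table", "./staging/target_ks/target_table")
subprocess.run(
    ["/usr/local/cassandra/bin/sstableloader",
     "-u", "user", "-pw", "password",
     "-v", "-d", "172.21.0.131",
     "./staging/target_ks/target_table"],
    check=True,
)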

I realise this is an old question but I'm posting the answer here for posterity. It looks like you're hitting CASSANDRA-10700.
TL;DR - When sstableloader tries to read the schema, it fails when it comes across a dropped collections column.
The problem only exists in the sstableloader utility, and you can easily work around it by getting a copy from Cassandra 2.1.13+ as documented here. Cheers!

Related

Cassandra sstableloader failed on loading a table snapshot "Cannot connect"

Using sstableloader to load a new table from a keyspace snapshot on a different cluster results in an error
Steps to recreate:
1) Create this table.
2) Copy the snapshot files to a temp directory, temp_dir.
3) Run sstableloader (it errors out).
Anybody know what the problem is? How can I fix it? Thank you.
Details:
sstableloader --nodes vm_cdb01 -u dba -p xxx /xxx/temp_dir/snapshot_directory
WARN 21:21:42,124 Small cdc volume detected at /cdc_raw; setting cdc_total_space_in_mb to 1773. You can override this in cassandra.yaml
WARN 21:21:42,302 Only 45.202GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
All host(s) tried for query failed (tried: vm-cdb01/10.28.60.76:9042 (com.datastax.driver.core.exceptions.TransportException: [vm-cdb01/xx.xxx.76] Cannot connect))
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: vm-cdb01/xx.xxx.76:9042 (com.datastax.driver.core.exceptions.TransportException: [vm-cdb01/xx.xxx.76] Cannot connect))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:233)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1424)
at com.datastax.driver.core.Cluster.init(Cluster.java:163)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:334)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:309)
at com.datastax.driver.core.Cluster.connect(Cluster.java:251)
at org.apache.cassandra.utils.NativeSSTableLoaderClient.init(NativeSSTableLoaderClient.java:73)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:159)
at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:80)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:48)
Exception in thread "main" org.apache.cassandra.tools.BulkLoadException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: vm-cdb01/xx.xxx.76:9042 (com.datastax.driver.core.exceptions.TransportException: [vm-cdb01/xx.xxx.76] Cannot connect))
at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:93)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:48)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: vm-cdb01/xx.xxx.76:9042 (com.datastax.driver.core.exceptions.TransportException: [vm-cdb01/xx.xxx.76] Cannot connect))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:233)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1424)
at com.datastax.driver.core.Cluster.init(Cluster.java:163)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:334)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:309)
at com.datastax.driver.core.Cluster.connect(Cluster.java:251)
at org.apache.cassandra.utils.NativeSSTableLoaderClient.init(NativeSSTableLoaderClient.java:73)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:159)
at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:80)
... 1 more
How anyone thought sstableloader was easy enough to run seemed, to me, quite ridiculous. It makes assumptions that should be covered by program switches. It has been a while since I have run sstableloader, but because of how it works, I ended up creating a shell script to do the work - especially if you want to copy, say, multiple tables from multiple keyspaces to different locations. At a high level, here is how my script runs the command:
sstableloader -u ${targetUser} -pw ${cassandraCopyTargetPassword} -d ${targetHost} "$(pwd)"
Everything you supply is for the target. If you are running on a port other than the default, you'll need to specify "-p ####". I've noticed you have something "off" for your "-p" (port) value - it looks like a directory path.
Now, as for what it's actually loading, that's where I think the entire process falls apart - and someone should address it, as it's ridiculous what assumptions it makes (again, instead of using switches).
sstableloader looks at which directory you're in - that has to match up directly with the keyspace and table where your TARGET table will reside.
For example, on the source, let's assume I want to copy all of the sstables from the /opt/cassandra/data/sourceKeyspace/sourceTable directory, BUT they should be mapped to targetKeyspace/targetTable on the TARGET system. I would need to create a directory matching targetKeyspace/targetTable somewhere on the source host, for example /tmp/targetKeyspace/targetTable (it doesn't have to be in /tmp, but it's as good a place as any). I would then change directories to that location, create soft links from that directory to all of the sstables in /opt/cassandra/data/sourceKeyspace/sourceTable, and run sstableloader, supplying the name of the target directory created above (or $(pwd) if you're sitting in the target directory, as I do with my script). Confusing to say the least.
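A rough sketch of that flow under the assumptions above (sourceKeyspace/sourceTable, targetKeyspace/targetTable, and the credentials are all placeholders):
import os
import subprocess

# Placeholder paths from the description above; adjust to your cluster.
src = "/opt/cassandra/data/sourceKeyspace/sourceTable"
staging = "/tmp/targetKeyspace/targetTable"

os.makedirs(staging, exist_ok=True)
for name in os.listdir(src):
    # Soft-link each sstable into the staging directory so sstableloader sees
    # the target keyspace/table names without copying any data.
    os.symlink(os.path.join(src, name), os.path.join(staging, name))

# Everything supplied here is for the target cluster.
subprocess.run(
    ["sstableloader", "-u", "targetUser", "-pw", "targetPassword",
     "-d", "targetHost", staging],
    check=True,
)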
Again, how anyone thought this was a good way to make it work is beyond me. Anyway, hopefully this helps you get it working.

Azure Databricks - Can not create the managed table The associated location already exists

I have the following problem in Azure Databricks. Sometimes when I try to save a DataFrame as a managed table:
SomeData_df.write.mode('overwrite').saveAsTable("SomeData")
I get the following error:
"Can not create the managed table('SomeData'). The associated
location('dbfs:/user/hive/warehouse/somedata') already exists.;"
I used to fix this problem by running a %fs rm command to remove that location but now I'm using a cluster that is managed by a different user and I can no longer run rm on that location.
For now the only fix I can think of is using a different table name.
What makes things even more peculiar is the fact that the table does not exist. When I run:
%sql
SELECT * FROM SomeData
I get the error:
Error in SQL statement: AnalysisException: Table or view not found:
SomeData;
How can I fix it?
Seems there are a few others with the same issue.
A temporary workaround is to use
dbutils.fs.rm("dbfs:/user/hive/warehouse/SomeData/", true)
to remove the table before re-creating it.
This generally happens when a cluster is shut down while writing a table. The recommended solution from the Databricks documentation:
This flag deletes the _STARTED directory and returns the process to the original state. For example, you can set it in the notebook:
%py
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")
All of the other recommended solutions here are either workarounds or do not work. The mode is specified as overwrite, meaning you should not need to delete or remove the db or use legacy options.
Instead, try specifying the fully qualified path in the options when writing the table:
df.write \
.option("path", "hdfs://cluster_name/path/to/my_db") \
.mode("overwrite") \
.saveAsTable("my_db.my_table")
For a more context-free answer, run this in your notebook:
dbutils.fs.rm("dbfs:/user/hive/warehouse/SomeData", recurse=True)
Per Databricks's documentation, this will work in a Python or Scala notebook, but you'll have to use the magic command %python at the beginning of the cell if you're using an R or SQL notebook.
I have the same issue. I am using
create table if not exists USING delta
If I first delete the files like suggested, it creates the table once, but the second time the problem repeats. It seems CREATE TABLE IF NOT EXISTS does not recognize the existing table and tries to create it anyway.
I don't want to delete the table every time; I'm actually trying to use MERGE and keep the table.
Well, this happens because you're trying to write data to the default location (without specifying the 'path' option) with the mode 'overwrite'.
As Mike said, you can set "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation" to "true", but this option was removed in Spark 3.0.0.
If you try to set this option in Spark 3.0.0 you will get the following exception:
Caused by: org.apache.spark.sql.AnalysisException: The SQL config 'spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation' was removed in the version 3.0.0. It was removed to prevent loosing of users data for non-default value.;
To avoid this problem you can explicitly specify the path where you're going to save with the 'overwrite' mode.
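A minimal sketch of that, reusing the DataFrame from the question (the dbfs path is a hypothetical placeholder to point at your own storage location):
SomeData_df.write \
    .option("path", "dbfs:/mnt/mydata/somedata") \
    .mode("overwrite") \
    .saveAsTable("SomeData")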

How to get Cassandra database dump with data

I need to get a dump (with data) from a remote Cassandra database. I was able to get the database schema via the following command. How can I get all the data in the keyspace?
I'm using Cassandra 1.1.9
echo -e "connect localhost/9260;\r\n use PWC_Keyspace;\r\n show schema;\n" | bin/cassandra-cli -h localhost -port 9260 > dilshan.cdl
With Cassandra 1.1.9, I don't believe you have access to cqlsh with the copy-to command, so you'll be stuck with 2 options.
1) Export the data from the data files (sstables) on disk using sstable2json, or
2) Write a program to iterate over every row and copy/serialize it to a format you find easier to work with.
You MAY be able to use a more recent cqlsh (say, from 2.0, which still used Thrift instead of the native interface), point it at your 1.1.9 server, and use 'COPY TO' to export each table to a CSV. However, the COPY command in cqlsh for 2.0 doesn't use paging, and Cassandra 1.1.9 doesn't support paging, so there's a very good chance it will simply time out and fail.
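For option 2, a rough sketch using pycassa, the era-appropriate Thrift client (the column family name my_cf is a hypothetical placeholder; repeat for each column family in the keyspace):
import csv
import pycassa

# Connect over Thrift, matching the host/port from the question.
pool = pycassa.ConnectionPool('PWC_Keyspace', ['localhost:9260'])
cf = pycassa.ColumnFamily(pool, 'my_cf')  # hypothetical column family name

with open('my_cf.csv', 'w') as out:
    writer = csv.writer(out)
    # get_range() streams over every row in the column family.
    for key, columns in cf.get_range():
        for name, value in columns.items():
            writer.writerow([key, name, value])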

InvalidRequestException Keyspace keyspace1 does not exist

I'm trying to connect to a Datastax Community Edition server 2.1.2 via JDBC but I keep getting the following error no matter what I try to do, even when issuing a very basic command like select * from system_traces.events;
InvalidRequestException(why:Keyspace 'keyspace1' does not exist)
Issuing that same command via cqlsh works properly, so it seems to be a JDBC issue.
InvalidRequestException(why:Keyspace 'keyspace1' does not exist)
at org.apache.cassandra.cql.jdbc.CassandraConnection.<init>(CassandraConnection.java:229):229
at org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:92):92
at java.sql.DriverManager.getConnection(DriverManager.java:664):664
at java.sql.DriverManager.getConnection(DriverManager.java:270):270
at railo.commons.db.DBUtil.getConnection(DBUtil.java:109):109
at railo.runtime.db.DatasourceConnectionPool.loadDatasourceConnection(DatasourceConnectionPool.java:89):89
at railo.runtime.db.DatasourceConnectionPool.getDatasourceConnection(DatasourceConnectionPool.java:81):81
at railo.runtime.db.DatasourceManagerImpl.getConnection(DatasourceManagerImpl.java:65):65
at railo.runtime.tag.Query.executeDatasoure(Query.java:696):696 ...
Any ideas? TIA!
InvalidRequestException(why:Keyspace 'keyspace1' does not exist)
This exception means you are trying to query for a keyspace (in this case "Keyspace1") that hasn't yet been added to Cassandra. Try creating the keyspace before querying it.
You're probably either running a query (SELECT * FROM "Keyspace1"."Standard1") that you're not seeing, or passing initialisation parameters to JDBC telling it to connect to Keyspace1. Verify that your code isn't looking for the non-existent keyspace by searching through your queries, specifically looking for Keyspace1 (or "Keyspace1", since in this case the keyspace name is case-sensitive).
On a side note, "Keyspace1"."Standard1" tends to be the standard ks.cf pair used in Cassandra examples, so it would be good to scan your code for them and make sure they are created before they are queried.
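If the keyspace genuinely doesn't exist yet, here is a minimal sketch of creating it first with the DataStax Python driver (the contact point and replication settings are assumptions; adjust for your cluster):
from cassandra.cluster import Cluster

# Assumed contact point and single-node replication; adjust for your cluster.
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()
# The quoted name preserves the capital K, since the name is case-sensitive.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS "Keyspace1"
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")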

Unable to create keyspace in cassandra-cli

I have a simple single node cassandra setup (1.1.0) (default settings). Whenever I try to create a keyspace in cassandra-cli, I get the error:
[default#unknown] create keyspace tax;
org.apache.thrift.transport.TTransportException
In cassandra server log, the exception stacktrace:
ERROR 12:15:04,722 Exception in thread Thread[MigrationStage:1,5,main]
java.lang.AssertionError
at org.apache.cassandra.db.DefsTable.updateKeyspace(DefsTable.java:441)
at org.apache.cassandra.db.DefsTable.mergeKeyspaces(DefsTable.java:339)
at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:269)
at org.apache.cassandra.service.MigrationManager$1.call(MigrationManager.java:214)
I tried deleting the contents in ./var/lib/cassandra/data and restarting the server and my mac, but still ending up with same issue.
Looks like the system keyspace was corrupted. Removing the data files from
/var/lib/cassandra/data
/var/lib/cassandra/commitlog
/var/lib/cassandra/saved_caches
and restarting the cassandra server fixed the issue. (The above directories are defined in $CASSANDRA_HOME/conf/cassandra.yaml)
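A hedged sketch of that cleanup (stop the Cassandra process first; the paths below are the defaults and may differ in your cassandra.yaml):
import shutil

# WARNING: this wipes all local data; acceptable here only because this is a
# single-node setup being reset. Verify the paths against cassandra.yaml.
for d in ("/var/lib/cassandra/data",
          "/var/lib/cassandra/commitlog",
          "/var/lib/cassandra/saved_caches"):
    shutil.rmtree(d, ignore_errors=True)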
The following is the flow when adding a keyspace to Cassandra (as per comments in the Cassandra source code; correct me if I am getting it wrong):
1) At the first step it checks whether any new keyspaces were added.
2) At the second step it checks whether any keyspaces were re-created; in this context re-created means they were previously deleted but still exist in the low-level schema as empty keys.
3) At the final step it updates the modified keyspaces and saves the keyspaces to be dropped later.
While modifying a keyspace it calls the function "updateKeyspace", and it seems that if the keyspace metadata is corrupt it throws an AssertionError.
So in your case it might be that you deleted the same keyspace and were trying to recreate it, which caused this issue, or, as you mentioned, it was metadata corruption.