sstableloader remote bulk upload - cassandra

I'm trying to figure out how to upload data from a snapshot and why I'm getting this error on the bulk upload.
The local machine is trying to connect to cassandra.mydomain.com. The cassandra.yaml is the yaml from the remote server. I'm getting the same error with and without specifying --conf-path
Thanks for any advice.
cassandra version 3.11.2
~/deploy/cassandra/bin/sstableloader -d cassandra.mydomain.com --conf-path /tmp/cassandra.yaml /local/.data/cassandra/data/test/timeserie_time_daily-dd247b092e883bffbfce8621eff3cc3e/snapshots/1634621703263
10:10:50.138 [main] DEBUG o.a.c.config.YamlConfigurationLoader - Loading settings from file:/tmp/cassandra.yaml
Exception in thread "main" org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Found[cassandra.yaml]. Please prefix the file with [file:///] for local files and [file://<server>/] for remote file
s. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
at org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:80)
at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:100)
at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:262)
at org.apache.cassandra.config.DatabaseDescriptor.toolInitialization(DatabaseDescriptor.java:180)
at org.apache.cassandra.config.DatabaseDescriptor.toolInitialization(DatabaseDescriptor.java:151)
at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:53)
at o
rg.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:48)

As the exception states, you need to provide the correct URL to the cassandra.yaml.
If you're using a YAML that's on the local machine, you need to prefix it with file:///. For example:
$ sstableloader -f file:///path/to/cassandra.yaml
-d node1
ks_name/table_name
If you're specifying a YAML that's on a remote machine, you need to prefix it with file://host/. For example:
$ sstableloader -f file://hostname_or_ip/path/to/cassandra.yaml
-d node1
ks_name/table_name

Related

DSBulk CSV Load Failure to DataStax Astra Cassandra Database, missing file config.json

I am trying to load a csv into a database in DataStax Astra using the DSBulk tool.
Here is the command I ran minus the sensitive details:
dsbulk load -url D:\\App\\data.csv -k data -t data -b D:\\App\\secure-connect-myapp -u username -p password
Here is the error I get back:
Operation LOAD_20221206-004421-512000 failed: Invalid bundle: missing file config.json.
Here is the full log:
2022-12-06 00:44:21 INFO Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
2022-12-06 00:44:21 INFO A cloud secure connect bundle was provided: ignoring all explicit contact points.
2022-12-06 00:44:21 INFO A cloud secure connect bundle was provided and selected operation performs writes: changing default consistency level to LOCAL_QUORUM.
2022-12-06 00:44:21 INFO Operation directory: C:\Program Files\dsbulk-1.10.0\bin\logs\LOAD_20221206-004421-512000
2022-12-06 00:44:21 ERROR Operation LOAD_20221206-004421-512000 failed: Invalid bundle: missing file config.json.
java.lang.IllegalStateException: Invalid bundle: missing file config.json
at com.datastax.oss.driver.internal.core.config.cloud.CloudConfigFactory.createCloudConfig(CloudConfigFactory.java:114)
at com.datastax.oss.driver.api.core.session.SessionBuilder.buildDefaultSessionAsync(SessionBuilder.java:876)
at com.datastax.oss.driver.api.core.session.SessionBuilder.buildAsync(SessionBuilder.java:817)
at com.datastax.oss.driver.api.core.session.SessionBuilder.build(SessionBuilder.java:835)
at com.datastax.oss.dsbulk.workflow.commons.settings.DriverSettings.newSession(DriverSettings.java:560)
at com.datastax.oss.dsbulk.workflow.load.LoadWorkflow.init(LoadWorkflow.java:145)
at com.datastax.oss.dsbulk.runner.WorkflowThread.run(WorkflowThread.java:52)
The error says that config.json is missing, but it isn't. So I'm stuck. Unless it's looking somewhere other than in the bundle I specified, but the bundle definitely has the config.json file.
This error:
...
java.lang.IllegalStateException: Invalid bundle: missing file config.json
at com.datastax.oss.driver.internal.core.config.cloud.CloudConfigFactory.createCloudConfig(CloudConfigFactory.java:114)
...
indicates that the Java driver bundled with DSBulk is unable to connect to your Astra DB because it couldn't get the configuration details from the secure connect bundle.
Please make sure that the valid secure bundle ZIP is accessible to DSBulk. You need to provide the path to the ZIP file, not just the directory. For example:
$ dsbulk ... -b /path/to/secure-connect-db.zip ...
Please check the path in your command then try again. Cheers!
👉 Please support the Apache Cassandra community by hovering over the cassandra tag above and click on Watch tag. 🙏 Thanks!
In order for you to leverage DataStax Bulk Loader (aka DSBulk, in short), you would need to pass in the secure connect bundle (SCB) correctly. What I mean when I say is that you need either the fully qualified path or the relative path to the SCB file.
The correct command in your case would look like:
./dsbulk load -url 'D:\\App\\data.csv' -k data -t data -b 'D:\\App\\secure-connect-myapp.zip' -u username -p password
Note that -b option takes in the full SCB filename along with .zip file extension.
Other Resources:
Load data using DSBulk into DataStax Astra DB
-b command-line option reference
BONUS TIP: One could easily configure everything within a configuration file and leverage that. See documentation for additional details.

how do we copy file from hadoop to abfs remotely

how do we copy files from Hadoop to abfs (azure blob file system)
I want to copy from Hadoop fs to abfs file system but it throws an error
this is the command I ran
hdfs dfs -ls abfs://....
ls: No FileSystem for scheme "abfs"
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem not found
any idea how this can be done ?
In the core-site.xml you need to add a config property for fs.abfs.impl with value org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem, and then add any other related authentication configurations it may need.
More details on installation/configuration here - https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html
the abfs binding is already in core-default.xml for any release with the abfs client present. however, the hadoop-azure jar and dependency is not in the hadoop common/lib dir where it is needed (it is in HDI, CDH, but not the apache one)
you can tell the hadoop script to pick it and its dependencies up by setting the HADOOP_OPTIONAL_TOOLS env var; you can do this in ~/.hadoop-env; just try on your command line first
export HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-aws"
after doing that, download the latest cloudstore jar and use its storediag command to attempt to connect to an abfs URL; it's the place to start debugging classpath and config issues
https://github.com/steveloughran/cloudstore

Error while trying to cleanup Cassandra node

I am facing issue while launching cleanup command with nodetool.
Cleanup did work fine until now. I did'nt find any modification on my configuration. I have no clue on what could have change.
nodetool > cleanup
error: Expecting URI in variable: [cassandra.config]. Found[cassandra.yaml]. Please prefix the file with [file:///] for local files and [file://<server>/] for remote files. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
-- StackTrace --
org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Found[cassandra.yaml]. Please prefix the file with [file:///] for local files and [file://<server>/] for remote files. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
at org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:80)
at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:100)
at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:262)
at org.apache.cassandra.config.DatabaseDescriptor.toolInitialization(DatabaseDescriptor.java:180)
at org.apache.cassandra.config.DatabaseDescriptor.toolInitialization(DatabaseDescriptor.java:151)
at org.apache.cassandra.tools.NodeProbe.checkJobs(NodeProbe.java:281)
at org.apache.cassandra.tools.NodeProbe.forceKeyspaceCleanup(NodeProbe.java:288)
at org.apache.cassandra.tools.nodetool.Cleanup.execute(Cleanup.java:55)
at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:255)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:169)
Any idea ?
Regards,
Nicolas
Nodetool uses cassandra.yaml to to find the number of concurrent compactors. Since you have cassandra.config set its using that cassandra.yaml, but the cassandra.config has an invalid value so the nodetool is choking on it.
I did find a solution that works for me (but i think it's specific to my prod environment).
Somewhere, something should have used /etc/alternative/cassandra symlink, which was heading to my empty default cassandra configuration.
After i correct it :
[root#myhost ~]# ll /etc/alternatives/
total 116
lrwxrwxrwx 1 root root 30 Oct 2 12:40 cassandra -> /etc/cassandra/my-cassandra-instance
That done, i was able to cleanup my cassandra clusters.
Didn't find time to work this case much more.
Anyway thanks for the clues, it was a context issue.
Nicolas

Changing data file directories Cassandra

I'm trying to change the Cassandra data, commit log and saved caches directories by defining a custom shell script for CASANDRA_INCLUDE. I'm modifying the properties in the script as follows :
***
data_file_directories = "/usr/pic1/kearanky/cassandra/data"
commitlog_directory = "/usr/pic1/kearanky/cassandra/commitlog"
saved_caches_directory: "/usr/pic1/kearanky/cassandra/saved_caches"
***
When I run cassandra I get the error "data_file_directories: command not found". How can I modify the directories correctly?
PS: I don't have write access to cassandra.yaml and can't create the default directories it uses.
referrer to this answer Make your own cassandra.yaml with your custom directories and then run cassandra with with -d flag and cassandra.config=directory
or set $CASSANDRA_HOME variable in your .bashrc and then run cassandra

Cassandra dead but pid file exists

I have novice to cassandra and tried my hands to install cassandra-2.1.2 on centos 7.0.
After complete installation execute cqlsh command and created few keyspace(s) and column family.
Which seems to me in first glance its working perfectly.
But later onwards i realized below issues:
1- when i execute "service cassandra status" command, i got below error:
Output:Cassandra dead but pid file exists.
I googled the above issue and found some links
http://www.datastax.com/support-forums/topic/dse-dead-but-pid-file-exists
https://baioradba.wordpress.com/2014/06/13/how-to-install-cassandra-on-centos-6-5/
and found that I had same configuration mentioned in above links but the same error still persists.
Please tell me the root cause and how to resolve it.
2- Second issue is in the cassandra.log file.
When I analysed the cassandra.log file there was an expection as :
Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file://<server>/ for remote files. Aborting.
Below is the complete log:
12:01:40.816 [main] ERROR o.a.c.config.DatabaseDescriptor - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file://<server>/ for remote files. Aborting.
at org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:73) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:84) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:158) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:133) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:110) [apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:465) [apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:554) [apache-cassandra-2.1.3.jar:2.1.3]
Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file://<server>/ for remote files. Aborting.
Fatal configuration error; unable to start. See log for stacktrace.
I again searched the same issue in google and but the links were not that useful as they contained the java class code for cassandra.config .
Again please tell the root cause and how to resolve it?
Thanks in advance.
rm /var/run/cassandra.pid
Run ps -ef | grep cassandra
Kill the pid of the cassandra process.
Start cassandra
fix this issue, Edit the cassandra-env.sh:
sudo vi /etc/cassandra/conf/cassandra-env.sh
increase heap size for cassandra .. this should resolve your issue
Check if you have enough memory to start cassandra service with this command:
cat /proc/meminfo
I was running Hortonworks VM with Virtualbox, and I had a lot of Hadoop components started which needed a lot of memory, so for me the solution was to stop unnecessary Hadoop components and add some extra memory to the virtual machine.
From https://github.com/apache/cassandra/blob/cassandra-2.1/examples/client_only/README.txt#L43-L49 :
cassandra.yaml can be on the classpath as is done here, can be
specified (by modifying the script) in a location within the classpath
like this: java -Xmx1G
-Dcassandra.config=/path/in/classpath/to/cassandra.yaml ... or can be retrieved from a location outside the classpath like this: ...
-Dcassandra.config=file:///path/to/cassandra.yaml ... or ... -Dcassandra.config=http://awesomesauce.com/cassandra.yaml ...
So you probably had a misconfigured startup option.
Remove the pid file. Try
rm /var/run/cassandra.pid

Resources