DSBulk CSV Load Failure to DataStax Astra Cassandra Database, missing file config.json - cassandra

I am trying to load a csv into a database in DataStax Astra using the DSBulk tool.
Here is the command I ran minus the sensitive details:
dsbulk load -url D:\\App\\data.csv -k data -t data -b D:\\App\\secure-connect-myapp -u username -p password
Here is the error I get back:
Operation LOAD_20221206-004421-512000 failed: Invalid bundle: missing file config.json.
Here is the full log:
2022-12-06 00:44:21 INFO Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
2022-12-06 00:44:21 INFO A cloud secure connect bundle was provided: ignoring all explicit contact points.
2022-12-06 00:44:21 INFO A cloud secure connect bundle was provided and selected operation performs writes: changing default consistency level to LOCAL_QUORUM.
2022-12-06 00:44:21 INFO Operation directory: C:\Program Files\dsbulk-1.10.0\bin\logs\LOAD_20221206-004421-512000
2022-12-06 00:44:21 ERROR Operation LOAD_20221206-004421-512000 failed: Invalid bundle: missing file config.json.
java.lang.IllegalStateException: Invalid bundle: missing file config.json
at com.datastax.oss.driver.internal.core.config.cloud.CloudConfigFactory.createCloudConfig(CloudConfigFactory.java:114)
at com.datastax.oss.driver.api.core.session.SessionBuilder.buildDefaultSessionAsync(SessionBuilder.java:876)
at com.datastax.oss.driver.api.core.session.SessionBuilder.buildAsync(SessionBuilder.java:817)
at com.datastax.oss.driver.api.core.session.SessionBuilder.build(SessionBuilder.java:835)
at com.datastax.oss.dsbulk.workflow.commons.settings.DriverSettings.newSession(DriverSettings.java:560)
at com.datastax.oss.dsbulk.workflow.load.LoadWorkflow.init(LoadWorkflow.java:145)
at com.datastax.oss.dsbulk.runner.WorkflowThread.run(WorkflowThread.java:52)
The error says that config.json is missing, but it isn't. So I'm stuck. Unless it's looking somewhere other than in the bundle I specified, but the bundle definitely has the config.json file.

This error:
...
java.lang.IllegalStateException: Invalid bundle: missing file config.json
at com.datastax.oss.driver.internal.core.config.cloud.CloudConfigFactory.createCloudConfig(CloudConfigFactory.java:114)
...
indicates that the Java driver bundled with DSBulk is unable to connect to your Astra DB because it couldn't get the configuration details from the secure connect bundle.
Please make sure that a valid secure connect bundle ZIP is accessible to DSBulk. You need to provide the path to the ZIP file itself, not just the directory. For example:
$ dsbulk ... -b /path/to/secure-connect-db.zip ...
Please check the path in your command then try again. Cheers!
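If it is unclear what DSBulk will find at the path given to -b, a quick check is to confirm that the path is a ZIP archive with config.json at its top level. The following is a minimal sketch in Python. The helper name check_bundle and the example path are made up for illustration, and it only looks for the one file named in the error message, not for everything a valid bundle needs.

```python
import zipfile
from pathlib import Path


def check_bundle(bundle_path):
    """Return a short diagnosis for the path passed to dsbulk's -b option.

    check_bundle is a hypothetical helper, not part of DSBulk.
    """
    path = Path(bundle_path)
    if path.is_dir():
        return "directory: pass the .zip file itself, not the extracted folder"
    if not path.is_file():
        return "not found: check the path and the .zip extension"
    if not zipfile.is_zipfile(path):
        return "not a ZIP archive"
    with zipfile.ZipFile(path) as bundle:
        # The error message names config.json, so that is all we look for.
        if "config.json" not in bundle.namelist():
            return "ZIP archive without config.json at its top level"
    return "ok"


if __name__ == "__main__":
    # Example path only; substitute the location of your own bundle.
    print(check_bundle(r"D:\App\secure-connect-myapp.zip"))
```

Any result other than "ok" suggests the value passed to -b is what needs fixing, rather than the bundle contents.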

To use DataStax Bulk Loader (DSBulk), you need to pass in the secure connect bundle (SCB) correctly. That means giving either the fully qualified path or the relative path to the SCB file itself.
The correct command in your case would look like:
./dsbulk load -url 'D:\\App\\data.csv' -k data -t data -b 'D:\\App\\secure-connect-myapp.zip' -u username -p password
Note that the -b option takes the full SCB filename, including the .zip file extension.
Other Resources:
Load data using DSBulk into DataStax Astra DB
-b command-line option reference
BONUS TIP: You can also put all of these settings in a configuration file and pass that to DSBulk instead. See the documentation for additional details.

Related

Validate failed: invalid checksum for migration (Evolve)

I use Evolve to automate my database changes and keep them in sync across all my environments and development teams. It used to run fine, but now it fails with the error Validate failed: invalid checksum for migration. Below is the command I use.
C:\Users\HP\Desktop\MywamProject\evolve_2.4.0_Windows-64bit>evolve migrate mysql -c "User Id=root;password=root;Host=localhost;Port=3306;Database=saas_catalogdb;" -l "C:\\Users\\HP\\Desktop\\MywamProject\\mywam.saas.backend.api\\docker-database\\evolve\\catalogdb"
Executing Migrate...
Evolve initialized.
Validate failed: invalid checksum for migration: V120__Insert_into_sa_report_proforma_detail.sql.
Validate failed: invalid checksum for migration: V120__Insert_into_sa_report_proforma_detail.sql.
May I know which part I am getting wrong? Hope someone can guide me on how to solve this problem. Thanks.
You can fix this issue by repairing the checksums of the already applied migrations. Instead of running the migrate command, run repair.
Example:
evolve repair mysql -c ...the rest of the command you need
Should be like this:
evolve repair mysql -c "User Id=root;password=root;Host=localhost;Port=3306;Database=saas_catalogdb;" -l "C:\\Users\\HP\\Desktop\\MywamProject\\mywam.saas.backend.api\\docker-database\\evolve\\catalogdb"
You can go to this link for more options on the commands and options:
https://evolve-db.netlify.app/configuration/options/
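For intuition about what the validation step is comparing, here is a small illustrative sketch in Python. It assumes the tool records a checksum when a migration is first applied and later recomputes it from the file on disk. SHA-256 and the function names are stand-ins chosen for the example, not Evolve's actual algorithm or API.

```python
import hashlib


def checksum(script_text):
    """Hash a migration script's text (illustrative; not Evolve's exact algorithm)."""
    return hashlib.sha256(script_text.encode("utf-8")).hexdigest()


def validate(applied_checksum, current_script_text):
    """Return True when the script on disk still matches what was applied."""
    return applied_checksum == checksum(current_script_text)


if __name__ == "__main__":
    original = "INSERT INTO t VALUES (1);\n"  # hypothetical script contents
    stored = checksum(original)               # recorded when the migration first ran
    edited = "INSERT INTO t VALUES (2);\n"    # the file was changed afterwards
    print(validate(stored, original))  # True
    print(validate(stored, edited))    # False
```

Under that assumption, repair amounts to overwriting the stored value with the checksum of the current file, so it is only appropriate when the edit to the already applied script was intentional.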

sstableloader remote bulk upload

I'm trying to figure out how to upload data from a snapshot and why I'm getting this error on the bulk upload.
The local machine is trying to connect to cassandra.mydomain.com. The cassandra.yaml is a copy of the YAML from the remote server. I'm getting the same error with and without specifying --conf-path.
Thanks for any advice.
cassandra version 3.11.2
~/deploy/cassandra/bin/sstableloader -d cassandra.mydomain.com --conf-path /tmp/cassandra.yaml /local/.data/cassandra/data/test/timeserie_time_daily-dd247b092e883bffbfce8621eff3cc3e/snapshots/1634621703263
10:10:50.138 [main] DEBUG o.a.c.config.YamlConfigurationLoader - Loading settings from file:/tmp/cassandra.yaml
Exception in thread "main" org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Found[cassandra.yaml]. Please prefix the file with [file:///] for local files and [file://<server>/] for remote files. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
at org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:80)
at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:100)
at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:262)
at org.apache.cassandra.config.DatabaseDescriptor.toolInitialization(DatabaseDescriptor.java:180)
at org.apache.cassandra.config.DatabaseDescriptor.toolInitialization(DatabaseDescriptor.java:151)
at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:53)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:48)
As the exception states, you need to provide the correct URL to the cassandra.yaml.
If you're using a YAML that's on the local machine, you need to prefix it with file:///. For example:
$ sstableloader -f file:///path/to/cassandra.yaml -d node1 ks_name/table_name
If you're specifying a YAML that's on a remote machine, you need to prefix it with file://host/. For example:
$ sstableloader -f file://hostname_or_ip/path/to/cassandra.yaml -d node1 ks_name/table_name
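If the path comes from a script rather than being typed by hand, the file:/// form can be built programmatically. The following is a minimal Python sketch for the local-file case only. The helper name to_file_uri is an assumption made for this example, and it expects an absolute POSIX-style path.

```python
import posixpath
from urllib.parse import quote


def to_file_uri(yaml_path):
    """Turn an absolute POSIX path into a file:/// URI (hypothetical helper)."""
    if not posixpath.isabs(yaml_path):
        raise ValueError("expected an absolute path, got %r" % yaml_path)
    # quote() percent-encodes characters such as spaces but keeps the slashes.
    return "file://" + quote(yaml_path)


if __name__ == "__main__":
    print(to_file_uri("/tmp/cassandra.yaml"))  # file:///tmp/cassandra.yaml
```

The printed value is what would go after -f (or --conf-path) in the sstableloader command.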

Can I execute presto CLI without specifying --server or --catalog

I would like to know where, if it is possible, I can configure default catalog and server values to use when executing the presto CLI.
Presto CLI info:
ls -lthr /opt/presto-server-0.169/presto
/opt/presto-server-0.169/presto -> presto-cli-0.169-executable.jar
And instead of executing:
/opt/presto-server-0.169/presto --server localhost:6666 --schema abc --catalog catalog-1
I would like to execute:
/opt/presto-server-0.169/presto
with it picking up localhost:6666 as my server and catalog-1 as my catalog. I would like to specify the schema once I make the connection.
Any help will be appreciated!
Thanks.
There is no option to set the host lazily from within the console. The server must be defined up front; by default, localhost:8080 is used.
If you cannot pass proper arguments to the presto-cli and cannot use the default server host, you can change default values in presto-cli source code and compile your version.
Check out the project from GitHub.
Change default values in ClientOptions.
Package jar for presto cli: cd presto-cli && mvn package
You can find a jar in target/presto-cli-0.201-SNAPSHOT.jar
For the schema and catalog, you can set them in the console itself with the USE command. The syntax is as follows: USE [<catalog>.]<schema>.
Please note that you will also need to compile and maintain your own build of presto-cli for each version of Presto, which might become a burden quite soon.
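If passing arguments is acceptable as long as they do not have to be typed each time, a lighter option than recompiling is a small wrapper that fills in the defaults. This is not something the answer above proposes; it is a sketch under that assumption. The path, server, and catalog values are copied from the question, and build_command is a name invented for this example.

```python
# Values taken from the question; adjust for your installation.
PRESTO_CLI = "/opt/presto-server-0.169/presto"
DEFAULT_SERVER = "localhost:6666"
DEFAULT_CATALOG = "catalog-1"


def build_command(extra_args):
    """Prepend default --server/--catalog unless the caller supplied them.

    Only the space-separated form ("--server host:port") is recognised here.
    """
    command = [PRESTO_CLI]
    if "--server" not in extra_args:
        command += ["--server", DEFAULT_SERVER]
    if "--catalog" not in extra_args:
        command += ["--catalog", DEFAULT_CATALOG]
    return command + list(extra_args)
```

Passing the result of build_command(sys.argv[1:]) to subprocess.call would launch the CLI with those defaults, and the schema can then be chosen in the console with USE.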

OpenMPI: ORTE was unable to reliably start one or more daemons

I've been at it for days but could not solve my problem.
I am running:
mpiexec -hostfile ~/machines -nolocal -pernode mkdir -p $dstpath
where $dstpath points to the current directory and "machines" is a file containing:
node01
node02
node03
node04
This is the error output:
Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
[node01:06177] [[6421,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 891
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
[node01:06177] 1 more process has sent help message help-errmgr-base.txt / failed-daemon-launch
[node01:06177] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
[node01:06181] [[6417,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 891
I have 4 machines, node01 to node04. In order to log into these 4 nodes, I have to first log in to node00. I am trying to run some distributed graph functions. The graph software is installed in node01 and is supposed to be synchronised to the other nodes using mpiexec.
What I've done:
Made sure passwordless login is set up: every machine can ssh to any other machine with no issues.
Have a hostfile in the home directory.
echo $PATH gives /home/myhome/bin:/home/myhome/.local/bin:/usr/include/openmpi:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
echo $LD_LIBRARY_PATH gives
/usr/lib/openmpi/lib
This worked before, but it suddenly started giving these errors. I got my administrator to install fresh machines, but the errors persisted. I've tried doing it one node at a time, with the same result. I'm not very familiar with the command line, so please give me some suggestions. I've tried reinstalling OpenMPI both from source and with sudo apt-get install openmpi-bin. I'm on Ubuntu 16.04 LTS.
You should focus on fixing:
Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
[node01:06177] [[6421,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 891

Cassandra dead but pid file exists

I am new to Cassandra and tried installing cassandra-2.1.2 on CentOS 7.0.
After completing the installation, I ran cqlsh and created a few keyspaces and a column family.
At first glance, it seemed to be working perfectly.
But later I noticed the issues below:
1- When I execute "service cassandra status", I get the error below:
Output:Cassandra dead but pid file exists.
I googled the above issue and found some links
http://www.datastax.com/support-forums/topic/dse-dead-but-pid-file-exists
https://baioradba.wordpress.com/2014/06/13/how-to-install-cassandra-on-centos-6-5/
and found that I already had the configuration described in those links, but the same error still persists.
Please tell me the root cause and how to resolve it.
2- Second issue is in the cassandra.log file.
When I analysed the cassandra.log file, there was an exception:
Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file://<server>/ for remote files. Aborting.
Below is the complete log:
12:01:40.816 [main] ERROR o.a.c.config.DatabaseDescriptor - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file://<server>/ for remote files. Aborting.
at org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:73) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:84) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:158) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:133) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:110) [apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:465) [apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:554) [apache-cassandra-2.1.3.jar:2.1.3]
Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file://<server>/ for remote files. Aborting.
Fatal configuration error; unable to start. See log for stacktrace.
I searched Google for this issue as well, but the links were not that useful, as they only contained the Java class code that handles cassandra.config.
Again, please tell me the root cause and how to resolve it.
Thanks in advance.
rm /var/run/cassandra.pid
Run ps -ef | grep cassandra
Kill the pid of the cassandra process.
Start cassandra
To fix this issue, edit cassandra-env.sh:
sudo vi /etc/cassandra/conf/cassandra-env.sh
Increase the heap size for Cassandra; this should resolve your issue.
Check if you have enough memory to start cassandra service with this command:
cat /proc/meminfo
I was running Hortonworks VM with Virtualbox, and I had a lot of Hadoop components started which needed a lot of memory, so for me the solution was to stop unnecessary Hadoop components and add some extra memory to the virtual machine.
From https://github.com/apache/cassandra/blob/cassandra-2.1/examples/client_only/README.txt#L43-L49 :
cassandra.yaml can be on the classpath as is done here, can be specified (by modifying the script) in a location within the classpath like this:
java -Xmx1G -Dcassandra.config=/path/in/classpath/to/cassandra.yaml ...
or can be retrieved from a location outside the classpath like this:
... -Dcassandra.config=file:///path/to/cassandra.yaml ...
or
... -Dcassandra.config=http://awesomesauce.com/cassandra.yaml ...
So you probably had a misconfigured startup option.
Remove the pid file. Try
rm /var/run/cassandra.pid
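Before deleting the pid file, it can be worth confirming that it really is stale, meaning that the process it names is no longer running. That is the situation "dead but pid file exists" describes. The following is a minimal Python sketch for POSIX systems. The function name pid_file_status is an assumption made for this example, and the path is the one used in the answers above.

```python
import os


def pid_file_status(pid_file):
    """Classify a pid file as 'missing', 'invalid', 'running' or 'stale'."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "invalid"
    if pid <= 0:
        return "invalid"
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return "stale"
    except PermissionError:
        return "running"  # the process exists but belongs to another user
    return "running"


if __name__ == "__main__":
    print(pid_file_status("/var/run/cassandra.pid"))
```

A result of "stale" means the file can be removed safely. A result of "running" means the process should be stopped first, as in the ps and kill steps above.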
