How to load a schema from a file in Cassandra

I am trying to load a schema into a Cassandra server from a file. As suggested by someone, I tried sstable2json and json2sstable, but I gather those import and export data files, while I am trying to load only the schema of the database. Any suggestions on possible ways to do it?
I am using Cassandra 1.2.

To export the schema, go to the directory where Cassandra is installed (not the bin directory inside it) and run the following, replacing your_listen_address with the node's listen address (e.g. localhost):
echo -e "use your_keyspace;\r\n show schema;\n" | bin/cassandra-cli -h your_listen_address > mySchema.cdl
To load that file back:
bin/cassandra-cli -h localhost -f mySchema.cdl
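Putting the two together, a minimal sketch for copying a keyspace's schema from one cluster to another (source-host and target-host are hypothetical placeholders):
# dump the schema definition from the source cluster
echo -e "use your_keyspace;\r\n show schema;\n" | bin/cassandra-cli -h source-host > mySchema.cdl
# replay the schema statements against the target cluster
bin/cassandra-cli -h target-host -f mySchema.cdl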

Related

DSBulk CSV Load Failure to DataStax Astra Cassandra Database, missing file config.json

I am trying to load a CSV into a database in DataStax Astra using the DSBulk tool.
Here is the command I ran minus the sensitive details:
dsbulk load -url D:\\App\\data.csv -k data -t data -b D:\\App\\secure-connect-myapp -u username -p password
Here is the error I get back:
Operation LOAD_20221206-004421-512000 failed: Invalid bundle: missing file config.json.
Here is the full log:
2022-12-06 00:44:21 INFO Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
2022-12-06 00:44:21 INFO A cloud secure connect bundle was provided: ignoring all explicit contact points.
2022-12-06 00:44:21 INFO A cloud secure connect bundle was provided and selected operation performs writes: changing default consistency level to LOCAL_QUORUM.
2022-12-06 00:44:21 INFO Operation directory: C:\Program Files\dsbulk-1.10.0\bin\logs\LOAD_20221206-004421-512000
2022-12-06 00:44:21 ERROR Operation LOAD_20221206-004421-512000 failed: Invalid bundle: missing file config.json.
java.lang.IllegalStateException: Invalid bundle: missing file config.json
at com.datastax.oss.driver.internal.core.config.cloud.CloudConfigFactory.createCloudConfig(CloudConfigFactory.java:114)
at com.datastax.oss.driver.api.core.session.SessionBuilder.buildDefaultSessionAsync(SessionBuilder.java:876)
at com.datastax.oss.driver.api.core.session.SessionBuilder.buildAsync(SessionBuilder.java:817)
at com.datastax.oss.driver.api.core.session.SessionBuilder.build(SessionBuilder.java:835)
at com.datastax.oss.dsbulk.workflow.commons.settings.DriverSettings.newSession(DriverSettings.java:560)
at com.datastax.oss.dsbulk.workflow.load.LoadWorkflow.init(LoadWorkflow.java:145)
at com.datastax.oss.dsbulk.runner.WorkflowThread.run(WorkflowThread.java:52)
The error says that config.json is missing, but it isn't. So I'm stuck. Unless it's looking somewhere other than in the bundle I specified, but the bundle definitely has the config.json file.
This error:
...
java.lang.IllegalStateException: Invalid bundle: missing file config.json
at com.datastax.oss.driver.internal.core.config.cloud.CloudConfigFactory.createCloudConfig(CloudConfigFactory.java:114)
...
indicates that the Java driver bundled with DSBulk is unable to connect to your Astra DB because it couldn't get the configuration details from the secure connect bundle.
Please make sure that the valid secure bundle ZIP is accessible to DSBulk. You need to provide the path to the ZIP file itself, not just the directory containing it. For example:
$ dsbulk ... -b /path/to/secure-connect-db.zip ...
Please check the path in your command then try again. Cheers!
In order to leverage the DataStax Bulk Loader (aka DSBulk, in short), you need to pass in the secure connect bundle (SCB) correctly; that is, you need either the fully qualified path or the relative path to the SCB file.
The correct command in your case would look like:
./dsbulk load -url 'D:\\App\\data.csv' -k data -t data -b 'D:\\App\\secure-connect-myapp.zip' -u username -p password
Note that the -b option takes the full SCB filename, including the .zip extension.
Other Resources:
Load data using DSBulk into DataStax Astra DB
-b command-line option reference
BONUS TIP: You can also configure everything within a configuration file and have DSBulk load it, as sketched below. See the documentation for additional details.
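For illustration, here is a minimal sketch of that approach. The filename astra-load.conf is hypothetical, and the HOCON setting names assume DSBulk 1.x; verify them against the configuration-file documentation for your version:
# astra-load.conf (hypothetical filename; setting names assume DSBulk 1.x)
dsbulk {
  connector.csv.url = "D:\\App\\data.csv"
  schema.keyspace = "data"
  schema.table = "data"
}
datastax-java-driver {
  basic.cloud.secure-connect-bundle = "D:\\App\\secure-connect-myapp.zip"
  advanced.auth-provider {
    class = PlainTextAuthProvider
    username = "username"
    password = "password"
  }
}
Then run the load referencing only the config file:
dsbulk load -f astra-load.conf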

sstableloader remote bulk upload

I'm trying to figure out how to upload data from a snapshot, and why I'm getting this error on the bulk upload.
The local machine is trying to connect to cassandra.mydomain.com. The cassandra.yaml is the YAML from the remote server. I get the same error with and without specifying --conf-path.
Thanks for any advice.
Cassandra version: 3.11.2
~/deploy/cassandra/bin/sstableloader -d cassandra.mydomain.com --conf-path /tmp/cassandra.yaml /local/.data/cassandra/data/test/timeserie_time_daily-dd247b092e883bffbfce8621eff3cc3e/snapshots/1634621703263
10:10:50.138 [main] DEBUG o.a.c.config.YamlConfigurationLoader - Loading settings from file:/tmp/cassandra.yaml
Exception in thread "main" org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Found[cassandra.yaml]. Please prefix the file with [file:///] for local files and [file://<server>/] for remote files. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
at org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:80)
at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:100)
at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:262)
at org.apache.cassandra.config.DatabaseDescriptor.toolInitialization(DatabaseDescriptor.java:180)
at org.apache.cassandra.config.DatabaseDescriptor.toolInitialization(DatabaseDescriptor.java:151)
at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:53)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:48)
As the exception states, you need to provide the correct URL to the cassandra.yaml.
If you're using a YAML that's on the local machine, you need to prefix it with file:///. For example:
$ sstableloader -f file:///path/to/cassandra.yaml \
      -d node1 \
      ks_name/table_name
If you're specifying a YAML that's on a remote machine, you need to prefix it with file://host/. For example:
$ sstableloader -f file://hostname_or_ip/path/to/cassandra.yaml \
      -d node1 \
      ks_name/table_name
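Applied to the command in the question (hostname and paths taken from the question; --conf-path is the long form of -f), that would look something like:
~/deploy/cassandra/bin/sstableloader -d cassandra.mydomain.com \
      --conf-path file:///tmp/cassandra.yaml \
      /local/.data/cassandra/data/test/timeserie_time_daily-dd247b092e883bffbfce8621eff3cc3e/snapshots/1634621703263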

Changing data file directories Cassandra

I'm trying to change the Cassandra data, commit log and saved caches directories by defining a custom shell script for CASSANDRA_INCLUDE. I'm modifying the properties in the script as follows:
data_file_directories = "/usr/pic1/kearanky/cassandra/data"
commitlog_directory = "/usr/pic1/kearanky/cassandra/commitlog"
saved_caches_directory: "/usr/pic1/kearanky/cassandra/saved_caches"
When I run cassandra I get the error "data_file_directories: command not found". How can I modify the directories correctly?
PS: I don't have write access to cassandra.yaml and can't create the default directories it uses.
Refer to this answer: make your own cassandra.yaml with your custom directories, and then run Cassandra with the -Dcassandra.config=<URI to your yaml> flag (a sketch follows below),
or set the $CASSANDRA_HOME variable in your .bashrc and then run Cassandra.
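For context, the "data_file_directories: command not found" error happens because CASSANDRA_INCLUDE is sourced as a shell script, so the shell tries to execute data_file_directories as a command. Those settings are YAML, not shell, and belong in your own cassandra.yaml. A minimal sketch using the paths from the question (note that data_file_directories takes a YAML list):
# fragment of your own cassandra.yaml
data_file_directories:
    - /usr/pic1/kearanky/cassandra/data
commitlog_directory: /usr/pic1/kearanky/cassandra/commitlog
saved_caches_directory: /usr/pic1/kearanky/cassandra/saved_caches
Then start Cassandra pointing at that file:
bin/cassandra -Dcassandra.config=file:///usr/pic1/kearanky/cassandra/cassandra.yaml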

Hive tables are created from Spark but are not visible in Hive

From Spark, using:
DataFrame.write().mode(SaveMode.Ignore).format("orc").saveAsTable("myTableName")
The table is getting saved; I can see it using the command hadoop fs -ls /apps/hive/warehouse/test.db, where test is my database name:
drwxr-xr-x - psudhir hdfs 0 2016-01-04 05:02
/apps/hive/warehouse/test.db/myTableName
but when I try to check the tables in Hive, I cannot view them, not even with SHOW TABLES from hiveContext.
This worked for me in a Cloudera quickstart VirtualBox image.
You have to copy the hive-site.xml file (mine is located at /etc/hive/conf.dist/hive-site.xml) to the Spark conf folder (mine is located at /etc/spark/conf/). Without it, Spark falls back to its own local metastore, so tables created with saveAsTable are never registered in the Hive metastore:
sudo cp /etc/hive/conf.dist/hive-site.xml /etc/spark/conf/
Restart Spark and it should work.
I think you need to run INVALIDATE METADATA; (note: this is an Impala command, not Hive) to refresh the databases and view your new table.

Running presto-cli warning: SerDe org.apache.hadoop.hive.contrib.serde2.RegexSerDe does not exist

I deployed Presto on a single node. When running presto-cli, I got the following error:
presto:default> select * from test1;
Query 20131116_233859_00005_5a2yh failed: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe org.apache.hadoop.hive.contrib.serde2.RegexSerDe does not exist)
Hive is operating normally. Why does Presto fail?
My profile:
export JAVA_HOME=/usr/java
export JRE_HOME=/usr/java/jre
export HADOOP_HOME=/usr/hadoop
export HIVE_HOME=/usr/hive
export PRESTO_HOME=/usr/presto
export CLASSPATH=:.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME:`find /usr/hadoop -name '*.jar' | grep -v 'test' | grep -v 'example' | perl -e '@jars=<STDIN>;chomp @jars; print join(":",@jars);'`:$PRESTO_HOME/lib:$HADOOP/lib:$HIVE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PRESTO_HOME/bin:$HIVE_HOME/bin
export HADOOP_HEAPSIZE=4096
I noticed that presto-cli fails when the Hive table was created with a RegEx SerDe. I have no idea why; somebody help me, please!
On the master node:
Place your RegexSerDe.jar into the Hive connector plugin directory for your Hadoop distribution (e.g. for a hadoop2 distribution you may place the JAR file in ../presto/plugin/hive-hadoop2/).
Make sure the RegexSerDe.jar file has the same ownership as the other JAR files present in this directory.
Restart the presto-server process (sudo service presto-server restart). If this does not work, you may need to restart with the launcher: sudo /usr/lib/presto/bin/launcher restart.
Repeat this on all slave nodes! A sketch of these steps as commands follows.
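A minimal sketch of those steps as shell commands; the plugin path /usr/lib/presto/plugin/hive-hadoop2/ and the presto ownership are assumptions to adjust for your installation:
# copy the SerDe jar into Presto's Hive connector plugin directory
sudo cp RegexSerDe.jar /usr/lib/presto/plugin/hive-hadoop2/
# give it the same ownership as the other jars in that directory
sudo chown presto:presto /usr/lib/presto/plugin/hive-hadoop2/RegexSerDe.jar
# restart the Presto server
sudo service presto-server restart
# or, if that does not work, use the launcher
sudo /usr/lib/presto/bin/launcher restart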
I followed the same process as you mentioned above and am getting the same error. Any suggestions, please?
Query failed: org/apache/hadoop/hive/serde2/SerDe
Using the hive-hadoop2 version, I placed the SerDe jar at presto/plugin/hive-hadoop2/hive-serde-1.0.0.jar and restarted Presto.
You have to put the SerDe jar inside the plugin directory (plugin/hive-cdh4 in my case).
I didn't use RegexSerDe, but it worked for CSVSerDe.
