I'm new to Cassandra 3.11.4 and just installed it on an Ubuntu VM. Following the instructions, I tried to change the cluster name in the .yaml config file, but when I save the file and start Cassandra, it fails. This happens with anything I change in the .yaml file; it just doesn't work the way the documentation says it should. (I placed the Cassandra files in a location where my user has full permissions.)
If I make no changes to the file and start Cassandra, it starts successfully.
I found out that I can change the cluster name, the listen address, or any other parameter listed in the .yaml file after connecting to the database and running a query, for example:
UPDATE system.local SET cluster_name = 'Test Cluster' WHERE key = 'local';
but that's not the point of having the .yaml config file.
Does someone know why this happens?
I've had this issue even using other Cassandra versions, like 3.11.2
Thanks in advance.
In Cassandra you can't simply change some configuration parameters and expect them to work after a restart. cluster_name is not specific to the node; it applies to the entire cluster. Parameters like data_file_directories can be changed at the node level.
Changing the name of a cluster is a whole different process. Refer to the link below:
https://support.datastax.com/hc/en-us/articles/205289825-Change-Cluster-Name-
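The linked article's rename procedure boils down to updating the saved name before touching the yaml; a rough sketch (the new name here is illustrative):

```sql
-- Overwrite the cluster name saved in the system keyspace
UPDATE system.local SET cluster_name = 'MyNewCluster' WHERE key = 'local';
```

Then run `nodetool flush system` so the change reaches disk, set cluster_name in cassandra.yaml to the same value, and restart the node.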
I'm trying to learn Elassandra and am having an issue configuring it to my current Cassandra instance (I'm learning Cassandra as well).
I downloaded version 3.11.3 of Cassandra to my local computer. I didn't change anything except the cluster_name inside cassandra.yaml. It runs fine, and I used bin/cqlsh to create a keyspace and a "user" table with a couple of rows for testing.
I followed the steps on the Elassandra integration page. I downloaded version 6.2.3.10 of Elassandra. I replaced the cassandra.yaml, cassandra-rackdc.properties and cassandra-topology.properties in the Elassandra conf with the ones from the Cassandra conf (I am assuming those last two are the "snitch configuration file" mentioned in the instructions, but I'm not sure). I stopped my Cassandra instance and then ran bin/cassandra -e -f from my Elassandra directory.
When I run curl -X GET localhost:9200, the output seems to have my correct cluster name, etc.
However, if I run bin/cqlsh from my Elassandra directory and run describe keyspaces, the keyspace I created under Cassandra isn't there. I tried copying the data directory from Cassandra to Elassandra and that seemed to work, but I feel this can't possibly be the actual solution.
Can someone point me to what I'm missing with regard to this configuration? Given the steps listed on the website, I'm sure it must be something dumb I'm overlooking.
Thanks in advance.
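One thing that may be worth checking (this is an assumption, not a confirmed fix): Elassandra reads data_file_directories from its own cassandra.yaml, so the keyspace will only show up if that setting points at the directory where your Cassandra instance actually wrote its data, e.g.:

```yaml
# In Elassandra's conf/cassandra.yaml; the path below is illustrative
data_file_directories:
    - /home/me/apache-cassandra-3.11.3/data/data
```

If the setting is absent, each install defaults to its own data directory, which would explain why copying the data directory "worked".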
I am trying to change the location Spark writes temporary files to. Everything I've found online says to set the SPARK_LOCAL_DIRS parameter in the spark-env.sh file, but I'm not having any luck with the changes actually taking effect.
Here is what I've done:
Created a 2-worker test cluster using Amazon EC2 instances. I'm using spark 2.2.0 and the R sparklyr package as a front end. The worker nodes are spun up using an auto scaling group.
Created a directory to store temporary files in at /tmp/jaytest. There is one of these in each worker and one in the master.
SSHed (via PuTTY) into the Spark master machine and the two workers, opened /home/ubuntu/spark-2.2.0-bin-hadoop2.7/conf/spark-env.sh, and modified the file to contain this line: SPARK_LOCAL_DIRS="/tmp/jaytest"
Permissions for each of the spark-env.sh files are -rwxr-xr-x, and for the jaytest folders are drwxrwxr-x.
As far as I can tell this is in line with all the advice I've read online. However, when I load some data into the cluster it still ends up in /tmp, rather than /tmp/jaytest.
I have also tried setting the spark.local.dir parameter to the same directory, but also no luck.
Can someone please advise on what I might be missing here?
Edit: I'm running this as a standalone cluster (as the answer below indicates that the correct parameter to set depends on the cluster type).
As per the Spark documentation, if you have configured the YARN cluster manager, it will override the spark-env.sh setting. Check the yarn-env or yarn-site file for the local dir setting.
"this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager."
source - https://spark.apache.org/docs/2.3.1/configuration.html
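Under YARN the effective scratch location comes from yarn.nodemanager.local-dirs in yarn-site.xml rather than from spark-env.sh; a sketch of that setting (the value is illustrative):

```xml
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/tmp/jaytest</value>
</property>
```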
Mac env, spark-2.1.0, and spark-env.sh contains:
export SPARK_LOCAL_DIRS=/Users/kylin/Desktop/spark-tmp
Using spark-shell, it works.
Did you use the right format?
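For a standalone cluster, the equivalent property can also go in conf/spark-defaults.conf (note the documentation quote above: SPARK_LOCAL_DIRS, when set, overrides it):

```
spark.local.dir    /tmp/jaytest
```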
I downloaded cassandra datastax-ddc-64bit-3.4.0.msi and installed it on Windows 8. It runs OK. But when I edit cassandra.yaml from
cluster_name: 'Test Cluster'
to
cluster_name: 'MyCluster1'
the service does not start.
I checked the error log in C:\Program Files\DataStax-DDC\logs\datastax_ddc_server-stdout.2016-04-04.log. It shows:
ERROR 09:08:34 Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Saved cluster name Test Cluster != configured name MYCLUSTER
at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:915) ~[apache-cassandra-3.4.0.jar:3.4.0]
at org.apache.cassandra.service.StartupChecks$8.execute(StartupChecks.java:297) ~[apache-cassandra-3.4.0.jar:3.4.0]
at org.apache.cassandra.service.StartupChecks.verify(StartupChecks.java:106) ~[apache-cassandra-3.4.0.jar:3.4.0]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:169) [apache-cassandra-3.4.0.jar:3.4.0]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:551) [apache-cassandra-3.4.0.jar:3.4.0]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:680) [apache-cassandra-3.4.0.jar:3.4.0]
What is the problem with my cluster name? Am I missing something? And why are there 3 log files in the log directory?
The reason is that the saved cluster name is not equal to the configured cluster name. When you start Cassandra for the very first time, it takes the cluster name from the yaml configuration file and saves it in the local column family of the system keyspace. On every subsequent startup, it retrieves the cluster name from the saved data (system.local) and compares it with the yaml configuration. In your case you created the cluster with the name Test Cluster the first time, so Cassandra expects the cluster name to be Test Cluster on every startup. If you want to change the cluster name, this link explains the steps more clearly. You can retrieve the saved cluster name with the following CQL query:
select * from system.local;
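A minimal Python sketch of what that startup check does (this illustrates the logic only; it is not Cassandra's actual code):

```python
def check_cluster_name(saved_name: str, configured_name: str) -> None:
    """Mimic the cluster-name check in SystemKeyspace.checkHealth():
    compare the name saved in system.local against cassandra.yaml."""
    if saved_name != configured_name:
        # Cassandra raises a ConfigurationException with this message shape
        raise ValueError(
            f"Saved cluster name {saved_name} != configured name {configured_name}"
        )

# The first startup saved "Test Cluster", so only that name passes afterwards:
check_cluster_name("Test Cluster", "Test Cluster")  # no error
```

Editing the yaml alone changes only the configured side of the comparison, which is why the node refuses to start.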
After some trying I found an answer at this link:
Empty the /var/lib/cassandra/data, /var/lib/cassandra/commitlog, and /var/lib/cassandra/saved_caches directories and restart Cassandra after changing the cluster name. Note that this wipes all data on the node. This works very well in version 1.2.4; try it with your version.
It is recommended to use a stable release of an in-development product; if the above doesn't work, use 1.2.5 or 1.2.4 instead.
I have a single-node Cassandra cluster on EC2 (launched from a DataStax AMI), and I manually added a new node, also backed by the same DataStax AMI, after deleting the data directory and modifying cassandra.yaml. I can see two nodes in the Nodes section of OpsCenter, but the OpsCenter agent is not installed on the new node (1 of 2 agents are connected). It looks like the new node has its own OpsCenter installation, and that somehow conflicts with the OpsCenter installation on the first node? I guess I have to fix some configuration file of the OpsCenter agent on the new node so that it points to the OpsCenter installation of the first node, but I can't find what to modify.
Thanks!
It is the stomp_interface setting in /var/lib/datastax-agent/conf/address.yaml.
I had to manually put stomp_interface into the configuration file. Also, I noticed that the process was looking for /etc/datastax-agent/address.yaml and never looked for /var/lib/datastax-agent/conf/address.yaml
Also, local_interface was not necessary to get things to work for me. YMMV.
I'm not sure where this gets set, or if this changed between agent versions at some point in time. FWIW, I installed both opscenter and the agents via packages.
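Putting the two answers together, the agent config on the new node would contain something like this (the IP address is illustrative):

```yaml
# /var/lib/datastax-agent/conf/address.yaml (or /etc/datastax-agent/address.yaml,
# depending on which path your agent version reads)
# stomp_interface points at the machine running the OpsCenter server, not this node
stomp_interface: 10.0.0.1
```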
I'm trying to use sstableloader to load data into an existing Cassandra ring, but I can't figure out how to actually get it to work. I'm trying to run it on a machine that has a running Cassandra node on it, but when I run it I get an error saying that port 7000 is already in use, which is the port the running Cassandra node uses for gossip.
So does that mean I can only use sstableloader on a machine that is in the same network as the target Cassandra ring, but isn't actually running a Cassandra node?
Any details would be useful, thanks.
Played around with sstableloader, read the source code, and finally figured out how to run sstableloader on the same machine that hosts a running Cassandra node. There are two key points. First, you need to create a copy of the Cassandra install folder for sstableloader; this is because sstableloader reads the yaml file to figure out which IP address to use for gossip, and the existing yaml file is being used by Cassandra. Second, you'll need to create a new loopback IP address (something like 127.0.0.2) on your machine. Once this is done, change the yaml file in the copied Cassandra install folder to listen on this IP address.
I wrote a tutorial going more into detail about how to do this here: http://geekswithblogs.net/johnsPerfBlog/archive/2011/07/26/how-to-use-cassandrs-sstableloader.aspx
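On Linux, the two steps sketch out roughly like this (paths and addresses are illustrative):

```shell
# 1. Add a second loopback address for sstableloader's gossip traffic
sudo ip addr add 127.0.0.2/8 dev lo

# 2. Give sstableloader its own copy of the install, with its own yaml
cp -r /opt/apache-cassandra /opt/cassandra-loader
# then edit /opt/cassandra-loader/conf/cassandra.yaml:
#   listen_address: 127.0.0.2
#   rpc_address: 127.0.0.2
```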
The Austin Cassandra Users Group just had a presentation on this:
http://www.slideshare.net/alex_araujo/etl-with-cassandra-streaming-bulk-loading/
I have used the sstableloader utility provided in cassandra-0.8.4 to successfully load sstables into Cassandra. From the issues I faced, I have the following tips:
If you are running it on a single machine, you have to create a copy of the Cassandra installation folder and run sstableloader from that copy. Change the listen address and rpc address, and provide the IP address of the running Cassandra node as the seed in the copied cassandra.yaml. Check that the cluster name is the same in both cassandra.yaml files.
The sstables have to be in a directory whose name is the name of the keyspace.
sstableloader requires a directory containing a cassandra.yaml configuration file on the classpath.
Note that the schema for the column families to be loaded should be defined beforehand.
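With that layout, a minimal invocation for the 0.8-era tool described here would look like this (keyspace name and paths are illustrative):

```shell
# sstables sit in a directory named after the keyspace
ls /tmp/load/MyKeyspace
# e.g. MyColumnFamily-g-1-Data.db  MyColumnFamily-g-1-Index.db ...
bin/sstableloader /tmp/load/MyKeyspace
```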
For reference, see: Using Cassandra SStableloader for bulk loading data into Cassandra:
http://ramuprograms.blogspot.com/2014/07/bulk-loading-data-into-cassandra-using.html
If you are looking to do this in Java see below utility class:
BulkWriterLoader
import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.tools.BulkLoadException;
import org.apache.cassandra.tools.BulkLoader;
import org.apache.cassandra.tools.LoaderOptions;

// Build the argument list that sstableloader would normally receive on the CLI
List<String> argList = new ArrayList<>();
argList.add("-v");            // verbose output
argList.add("-d");            // initial hosts (contact points)
argList.add(params.hosts);
argList.add("-f");            // path to cassandra.yaml
argList.add(params.cassYaml);
argList.add(params.fullpath); // directory containing the sstables to stream

LoaderOptions options = LoaderOptions.builder()
        .parseArgs(argList.toArray(new String[0]))
        .build();
try
{
    BulkLoader.load(options);
}
catch (BulkLoadException e)
{
    e.printStackTrace();
}
...
The code will also generate the sstable files using the CQLSSTableWriter class.
Things have improved, and the whole procedure of using sstableloader is now much easier, including an easier way to generate sstables with CQLSSTableWriter.
For all the details:
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/tools/toolsBulkloader.html