Programmatic way of modifying cassandra.yaml - cassandra

We are creating a Cassandra cluster at runtime, and information like the "cluster-name" and the IP addresses of the "seeds" is available only at runtime. Is there a Java wrapper for cassandra.yaml that provides getters and setters and saves the file back to disk? I understand that I can always write such a wrapper myself, but I wanted to know if there's already one available.

Is there a java wrapper for cassandra.yaml that allows setters and getters for cassandra.yaml and saves it to disk?
Not that I am aware of. Although, that's a good idea for an open-source project!
I have done this a few different ways in the past. One is with a combination of Chef and consul-template. Essentially, your cassandra.yaml contains variable placeholders, which are filled by a combination of default attributes (Chef) and cluster-specific settings (consul-template) when your deployment recipe runs.
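As a rough sketch of that placeholder idea (using envsubst here purely as a stand-in for Chef/consul-template; the template file and variable names are made up for illustration):
#!/bin/bash
# Sketch only: render cassandra.yaml from a template whose placeholders are
# filled from environment variables known at deploy time.
# cassandra.yaml.tpl is a hypothetical template containing lines such as:
#   cluster_name: '${CLUSTER_NAME}'
#   - seeds: "${SEED_IPS}"
export CLUSTER_NAME="my-runtime-cluster"
export SEED_IPS="10.0.0.1,10.0.0.2"
envsubst < cassandra.yaml.tpl > /etc/cassandra/conf/cassandra.yaml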
I have also done this with a Bash script using sed (for a couple of our non-Chef environments). This is an excerpt from a script I wrote to migrate a DataStax Enterprise installation to an Apache Cassandra (open source) install:
#!/bin/bash
cp /etc/dse/cassandra/cassandra.yaml /etc/cassandra/conf
#set GossipingPropertyFileSnitch in cassandra.yaml
sed -i 's/endpoint_snitch: com.datastax.bdp.snitch.DseDelegateSnitch/endpoint_snitch: GossipingPropertyFileSnitch/' /etc/cassandra/conf/cassandra.yaml
#set truststore location
sed -i 's/truststore: \/etc\/dse\/cassandra\//truststore: \/etc\/cassandra\/conf\//g' /etc/cassandra/conf/cassandra.yaml
#set keystore location
sed -i 's/keystore: \/etc\/dse\/cassandra\//keystore: \/etc\/cassandra\/conf\//g' /etc/cassandra/conf/cassandra.yaml
Essentially, you're doing a regex replace on specific YAML properties. In my case I needed to update the snitch and the locations of the keystore/truststore. It's not pretty, but it works.
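Applied to the original question (cluster_name and seeds known only at runtime), the same sed technique might look like the sketch below; the variable values and config path are placeholders, assumed to be supplied by your provisioning environment:
#!/bin/bash
# Sketch: patch the two runtime-only settings in place with sed.
CLUSTER_NAME="my-runtime-cluster"   # assumed to come from your provisioning tool
SEED_IPS="10.0.0.1,10.0.0.2"        # assumed to come from your provisioning tool
CONF=/etc/cassandra/conf/cassandra.yaml
sed -i "s/^cluster_name:.*/cluster_name: '${CLUSTER_NAME}'/" "$CONF"
sed -i "s/- seeds:.*/- seeds: \"${SEED_IPS}\"/" "$CONF"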

Related

Where is yarn.nodemanager.log-dirs in spark?

I have looked into:
log4j2.properties in /etc/spark/conf
yarn-site.xml
yarn-env.sh (setting YARN_LOG_DIR there does not take effect; in fact, while a job is running there is no YARN_LOG_DIR environment variable in my executors)
log4j.properties in /etc/hadoop/conf
Where can I find and modify the yarn.nodemanager.log-dirs property?
To find this, we need to traverse some of Hadoop's source code:
yarn.nodemanager.log-dirs defaults to ${yarn.log.dir}/userlogs.
yarn.log.dir defaults to $HADOOP_LOG_DIR.
$HADOOP_LOG_DIR defaults to ${HADOOP_HOME}/logs.
So, have a look at $HADOOP_HOME/logs/userlogs to see whether you find something in there!
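A quick way to check that default location (assuming $HADOOP_HOME is set in your environment):
# List the per-application log directories under the default location.
ls -l "$HADOOP_HOME/logs/userlogs"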
If you want to change it, you can do any of the following (a spark-submit sketch for the last two options follows this list):
edit $HADOOP_HOME
edit $HADOOP_LOG_DIR
add -Dyarn.log.dir=<your_chosen_value> to your spark application
add -Dyarn.nodemanager.log-dirs=<your_chosen_value> to your spark application
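As a sketch of the last two options, the -D flag could be passed through spark-submit's extraJavaOptions settings; the path, class, and jar below are placeholders:
# Sketch: pass a -D override to the driver and executors when submitting the job.
spark-submit \
  --master yarn \
  --conf "spark.driver.extraJavaOptions=-Dyarn.nodemanager.log-dirs=/var/log/hadoop-yarn/userlogs" \
  --conf "spark.executor.extraJavaOptions=-Dyarn.nodemanager.log-dirs=/var/log/hadoop-yarn/userlogs" \
  --class com.example.MyApp \
  my-app.jar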

Cannot create a Dataproc cluster when setting the fs.defaultFS property?

This was already the subject of discussion in a previous post; however, I'm not convinced by the answers, as the Google docs state that it is possible to create a cluster while setting the fs.defaultFS property. Moreover, even if it is possible to set this property programmatically, it's sometimes more convenient to set it from the command line.
So I wanted to know why the following option, when passed to my cluster-creation command, does not work: --properties core:fs.defaultFS=gs://my-bucket. Note that I haven't included all the parameters, since running the command without this flag creates the cluster successfully. However, when passing it, I get: "failed: Cannot start master: Insufficient number of DataNodes reporting."
If anyone has managed to create a Dataproc cluster with fs.defaultFS set, it would be great to hear how. Thanks.
It's true that there are still known issues due to certain dependencies on actual HDFS. The docs were not intended to imply that setting fs.defaultFS to a GCS path at cluster-creation time would work, but simply to provide a convenient example of a property that appears in core-site.xml; in theory it would work to set fs.defaultFS to a different preexisting HDFS cluster, for example. I've filed a ticket to change the example in the documentation to avoid confusion.
Two options:
Just override fs.defaultFS at job-submission time using per-job properties
Workaround some of the known issues by setting fs.defaultFS explicitly using an initialization action instead of cluster properties.
Option 1 is better understood to work because cluster-level HDFS dependencies won't change. Option 2 works because most of the incompatibilities occur only during initial startup, and initialization actions run after the relevant daemons have already started. To override the setting in an init action, you'd use bdconfig (shown below); a per-job sketch for option 1 follows it:
bdconfig set_property \
--name 'fs.defaultFS' \
--value 'gs://my-bucket' \
--configuration_file /etc/hadoop/conf/core-site.xml \
--clobber
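For option 1, the override could be passed at job-submission time instead; the cluster name, region, and jar below are placeholders:
# Sketch: override fs.defaultFS for a single job rather than the whole cluster.
gcloud dataproc jobs submit hadoop \
  --cluster my-cluster \
  --region us-central1 \
  --properties fs.defaultFS=gs://my-bucket \
  --jar gs://my-bucket/jars/my-job.jar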

Load an additional config file aside from cassandra.yaml in Cassandra

Is it possible to have Cassandra load an additional config file with other properties, along with the default one? (/etc/cassandra/cassandra.yaml)
Thanks
No, you can change it to use a different file (by setting -Dcassandra.config=file://your/path), but only a single file. If you really want to, though, you can write a custom config loader (like the YamlConfigurationLoader) and set it with -Dcassandra.config.loader=your.Loader.
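For example, the first option could be wired up in cassandra-env.sh; the yaml path here is just a placeholder:
# Sketch: point Cassandra at an alternate config file via the cassandra.config property.
JVM_OPTS="$JVM_OPTS -Dcassandra.config=file:///etc/cassandra/custom/cassandra.yaml"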

Why do I get the error "Could not retrieve endpoint ranges" when I run sstableloader?

I have used sstableloader many times successfully, but this time I got the following error:
[root@localhost pengcz]# /usr/local/cassandra/bin/sstableloader -u user -pw password -v -d 172.21.0.131 ./currentdata/keyspace/table
Could not retrieve endpoint ranges:
java.lang.IllegalArgumentException
java.lang.RuntimeException: Could not retrieve endpoint ranges:
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:338)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106)
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:267)
at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:543)
at org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:124)
at org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:101)
at org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:30)
at org.apache.cassandra.serializers.CollectionSerializer.deserialize(CollectionSerializer.java:50)
at org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:68)
at org.apache.cassandra.cql3.UntypedResultSet$Row.getMap(UntypedResultSet.java:287)
at org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1833)
at org.apache.cassandra.config.CFMetaData.fromThriftCqlRow(CFMetaData.java:1126)
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:330)
... 2 more
I don't know whether this error is related to a Linux crash on one of the cluster nodes.
Any advice will be appreciated!
Are you running a different version of sstableloader than the version of your cluster? This looks like https://issues.apache.org/jira/browse/CASSANDRA-9324, which occurs when using the 2.1 loader against a 2.0 cluster.
In the command you ran, /usr/local/cassandra/bin/sstableloader -u user -pw password -v -d 172.21.0.131 ./currentdata/keyspace/table, the backup directory is ./currentdata/keyspace/table. sstableloader treats the parent of the backup directory as the keyspace name and the backup directory itself as the table name (here parent -> keyspace, backup dir -> table). So rename the parent directory to the keyspace you are restoring into, and rename the backup directory to the table name. Apart from this, make sure your keyspace and table names are not Cassandra reserved words.
I realise this is an old question but I'm posting the answer here for posterity. It looks like you're hitting CASSANDRA-10700.
TL;DR - When sstableloader tries to read the schema, it fails when it comes across a dropped collections column.
The problem only exists in the sstableloader utility, and you can easily work around it by using a copy from Cassandra 2.1.13+ as documented here. Cheers!
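A sketch of that workaround, assuming you download a newer Cassandra tarball just for its copy of sstableloader (the version and paths are illustrative):
# Run the loader from the newer tarball against the existing cluster.
tar xzf apache-cassandra-2.1.13-bin.tar.gz
apache-cassandra-2.1.13/bin/sstableloader -u user -pw password -v \
    -d 172.21.0.131 ./currentdata/keyspace/table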

How to add entries with slapadd in OpenLDAP replication (syncrepl)

We have OpenLDAP replication with syncrepl, and I don't know how to add entries to it with slapadd.
On a standalone server it works fine, but when I add entries on one of the replicated machines, slapd fails to start on the second machine.
Thanks
Unfortunately slapadd doesn't write to the accesslog and thus the modifications aren't replicable. This is especially bad, because some attributes can't be modified via ldapadd.
If you only need ordinary attributes, use ldapadd instead.
UPDATE:
It looks like you can use the -w switch:
Write syncrepl context information. After all entries are added, the
contextCSN will be updated with the greatest CSN in the database.
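A minimal sketch of using that switch, assuming slapd is stopped first and the config directory is in its usual location (the LDIF path is a placeholder):
# Stop slapd, add the entries offline while writing syncrepl context, restart.
service slapd stop
slapadd -w -l /tmp/new-entries.ldif -F /etc/openldap/slapd.d
service slapd start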
