Can Pulsar geo-replicate among different release versions, for example 2.8.x and 2.7.x? - apache-pulsar

Apache Pulsar supports geo-replication between clusters in different regions. I am wondering if there are any compatibility issues between clusters running different versions of Pulsar.
The question is for planning purposes, so that I know whether I need to upgrade one of the clusters or not.

Geo-replication is compatible between all clusters running the same major version of Pulsar, e.g., 2.x.
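For context, geo-replication is configured the same way regardless of patch level; a minimal sketch with pulsar-admin (the cluster names, URLs, and tenant/namespace below are placeholders, not from the question):

# On the us-west cluster, register the remote cluster (and vice versa on us-east)
bin/pulsar-admin clusters create us-east \
  --url http://pulsar-east.example.com:8080 \
  --broker-url pulsar://pulsar-east.example.com:6650

# Allow a tenant to use both clusters
bin/pulsar-admin tenants create my-tenant --allowed-clusters us-west,us-east

# Replicate a namespace across both clusters
bin/pulsar-admin namespaces set-clusters my-tenant/my-namespace --clusters us-west,us-east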

Related

Is it possible to upgrade the Pulsar broker without upgrading BookKeeper?

I want to upgrade the Pulsar brokers in a Pulsar cluster (from 2.6.3 to 2.10.1).
The question is: could I just upgrade the brokers to 2.10.1 and leave the other components (BookKeeper as well as ZooKeeper) at 2.6.3?
(Asking this because, according to https://pulsar.apache.org/docs/administration-upgrade, I am not sure whether I also need to upgrade BookKeeper or not.)
Thank you!
2.6.3 contains a very old version of ZooKeeper (3.5.x), and from Pulsar 2.8.x onwards we require ZooKeeper 3.6.x because Pulsar uses the persistent recursive watches feature.
I suggest upgrading ZooKeeper and BookKeeper as well, at least to Pulsar 2.8.x.
As a general rule of thumb in Pulsar, we support rolling upgrades from one major version to the next (so from 2.6 to 2.7, and so on).
Jumping from 2.6 to 2.10 is not supported officially, but it should work.
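If it helps, the upgrade guide linked above recommends upgrading one component type at a time, one node at a time, in a fixed order. A sketch for a bare-metal/tarball deployment (pulsar-daemon is the stock launcher; your deployment tooling may differ):

# Documented order: ZooKeeper first, then bookies, then brokers, then proxies,
# one node at a time so the cluster keeps serving traffic.

# On each bookie, in turn:
bin/pulsar-daemon stop bookie
# ...swap in the new Pulsar release (binaries and config) here...
bin/pulsar-daemon start bookie

# Then on each broker, in turn:
bin/pulsar-daemon stop broker
bin/pulsar-daemon start broker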

Migrate Data from one Riak cluster to another

I have a situation where we need to migrate data from one Riak cluster to another and then remove the old cluster. The ring size will be same, even the region will be the same. We need to do this to upgrade the instances to AL2. Is there a clean approach to do so on Prod, without realtime data loss?
The answer to this may be tied to your version of Riak KV. If you have the open source version of Riak KV 2.2.3 or earlier, this will require an in-situ upgrade to Riak KV 2.2.6 before progressing. See https://www.tiot.jp/riak-docs/riak/kv/2.2.6/setup/upgrading/version/ with packages at https://files.tiot.jp/riak/kv/2.2/2.2.6/
For the Enterprise Edition of Riak KV 2.2.3 and earlier, or the open source edition of Riak KV 2.2.6 or higher, you can use multi-datacenter replication (MDC).
Use both of these at the same time for proper replication and to prevent data loss:
Fullsync replication will copy across all stored data on its first run, and then any missing data on subsequent runs.
Realtime replication will replicate all transactions in near real time.
If you then set this up as bidirectional replication (get each cluster to replicate to the other for both fullsync and realtime), you will be able to seamlessly switch your production environment from one cluster to the other without any issues. Once you are happy everything is working as expected, you can kill the old cluster.
Please see the documentation for replication at https://www.tiot.jp/riak-docs/riak/kv/2.2.6/using/cluster-operations/v3-multi-datacenter/
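For orientation, wiring up v3 replication between the two clusters looks roughly like this (cluster names and the IP below are placeholders; the cluster manager listens on port 9080 by default):

# Name each cluster (run on a node in the respective cluster)
riak-repl clustername old_cluster
riak-repl clustername new_cluster

# On the old cluster, connect to the new cluster's cluster manager
riak-repl connect 10.0.1.10:9080

# Enable and start both replication modes toward the sink
riak-repl realtime enable new_cluster
riak-repl realtime start new_cluster
riak-repl fullsync enable new_cluster
riak-repl fullsync start new_cluster

# Repeat in the other direction for bidirectional replication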

Kafka using Docker for production clusters

We need to build a Kafka production cluster with 3-5 nodes.
We have the following options:
Kafka in Docker containers (the cluster includes ZooKeeper and Schema Registry on each node)
Kafka cluster not using Docker (the cluster includes ZooKeeper and Schema Registry on each node)
Since we are talking about a production cluster, we need good performance: we have heavy reads/writes to disk (disk size is 10 TB), need good I/O performance, etc.
So does Kafka in Docker meet the requirements for production clusters?
More info: https://www.infoq.com/articles/apache-kafka-best-practices-to-optimize-your-deployment/
It can be done, sure. I have no personal experience with it, but if you don't otherwise have experience managing other stateful containers, I'd suggest avoiding it.
As far as "getting started" with Kafka in containers, Kubernetes is the most documented way, and Strimzi (free, optional commercial support by Lightbend) or Confluent Operator (commercial support by Confluent) can make this easy when using Kubernetes or Openshift. Or DC/OS offers a Kafka service over Mesos/Marathon. If you don't already have any of these services, then I think it's apparent that you should favor not using containers.
Bare-metal or virtualized deployments would be much easier to maintain than hand-deployed containerized ones, from what I have experienced, particularly for logging, metric gathering, and statically assigned Kafka listener mappings over the network. Confluent provides Ansible scripts for doing deployments to such environments.
That isn't to say there aren't companies that have been successful at it, or at least tried; IBM, Red Hat, and Shopify immediately pop up in my searches, for example.
Here are a few talks about things to consider when running Kafka in containers:
https://www.confluent.io/kafka-summit-london18/kafka-in-containers-in-docker-in-kubernetes-in-the-cloud
https://kafka-summit.org/sessions/running-kafka-kubernetes-practical-guide/
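If you do run Kafka in containers without an orchestrator, the two details that matter most for the disk and listener concerns above are mounting the data directory from the host and advertising a listener reachable from outside the container. A minimal single-broker sketch, assuming the Confluent cp-kafka image and placeholder host names (not from the question):

# Persist the log dirs on the host's big disk, not in the container layer,
# and advertise a listener that other brokers and clients can actually reach.
docker run -d --name kafka \
  -p 9092:9092 \
  -v /data/kafka:/var/lib/kafka/data \
  -e KAFKA_BROKER_ID=1 \
  -e KAFKA_ZOOKEEPER_CONNECT=zk1.example.com:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka1.example.com:9092 \
  confluentinc/cp-kafka:7.4.0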

YCSB for Cassandra 3.0 Benchmarking

I have a Cassandra virtual cluster on Ubuntu and need to benchmark it.
I am trying to do it with Yahoo's YCSB (without using Maven, if possible).
I use Cassandra 3.0.1 but I can't find a suitable version of YCSB.
I don't want to change to an older version of Cassandra (YCSB's latest Cassandra binding is for Cassandra 2.x).
What should I do?
As suggested here, even though Cassandra 3.x is not officially supported, you can use the cassandra-cql binding.
For instance:
./bin/ycsb load cassandra-cql -threads 4 -P workloads/workloada
I just tested it on Cassandra 3.11.0 and it works for both load and run.
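One caveat not mentioned above (taken from the YCSB Cassandra binding's README, so verify against your YCSB version): the binding expects a ycsb keyspace containing a usertable table to exist beforehand, and the contact points are passed with the hosts property:

# Load, then run, pointing YCSB at the cluster's contact points (placeholder IPs)
./bin/ycsb load cassandra-cql -threads 4 -P workloads/workloada -p hosts=10.0.0.1,10.0.0.2
./bin/ycsb run cassandra-cql -threads 4 -P workloads/workloada -p hosts=10.0.0.1,10.0.0.2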
That said, the benchmark software to use depends on your test scenario. If you want to benchmark only Cassandra, then @gsteiner's solution might be the best. If you want to benchmark different databases using the same tool, to avoid variability, then YCSB is the right one.
I would recommend using Cassandra-stress to perform a load/performance test on your Cassandra cluster. It is very customizable, to the point that you can test distributions with different data models as well as specify how hard you want to push your cluster.
Here is a link to the DataStax documentation for it, which goes into how to use the tool in depth:
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html
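As a starting point, a simple write pass followed by a mixed read/write pass might look like this (node address, row counts, and thread counts are placeholders; cassandra-stress ships in the tools/bin directory of the Cassandra distribution):

# Insert one million rows with 50 client threads
tools/bin/cassandra-stress write n=1000000 -rate threads=50 -node 10.0.0.1

# Then exercise a 3:1 read/write mix against the same data
tools/bin/cassandra-stress mixed ratio\(write=1,read=3\) n=1000000 -rate threads=50 -node 10.0.0.1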

Mixing Datastax Enterprise with Cassandra community

I'm experimenting with Datastax Enterprise and I'm trying to have a cluster that mixes Enterprise nodes and standard Cassandra community nodes. I would only need a few nodes with advanced features like Solr and it would be nice to have all the nodes in the same cluster.
I tried to bootstrap a community node into a test Enterprise cluster, and it couldn't join the ring properly, throwing exceptions like this:
Unable to find compaction strategy class
'com.datastax.bdp.hadoop.cfs.compaction.CFSCompactionStrategy'
I assume that the Enterprise node tries to replicate CFs that have features from DSE, which are not recognized by the community node.
Is there a way to prevent that from happening? Am I trying to do something that's not possible/supported/allowed by DSE?
That is an unsupported configuration. The full cluster needs to be installed with DataStax Enterprise binaries on all nodes. You can choose which nodes run as vanilla Cassandra, Hadoop, or Solr via startup options on each node. DSE has a custom compaction strategy and snitch, so that error is expected.
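For completeness, on a tarball install the per-node workload is selected with flags to the dse launcher (package installs instead set HADOOP_ENABLED/SOLR_ENABLED in /etc/default/dse); a sketch:

dse cassandra         # start as a plain Cassandra node
dse cassandra -s      # start with Solr (search) enabled
dse cassandra -t      # start with Hadoop (analytics) enabled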
