We have two Kubernetes clusters where we are testing Hazelcast.
In the first we have 3 members using a replicated map with 200 entries in each map.
Reading from this map takes almost no time:
{"#timestamp":"2022-09-05T15:45:05.49+02:00","#version":"1","message":"Member: [x.xxx.xx.xxx]:5701","logger_name":"HazelcastService","thread_name":"http-nio-8080-exec-2","level":"INFO","level_value":20000,"SERVERNAME":"app-c655c54bf-zh4gj"}
{"#timestamp":"2022-09-05T15:45:05.499+02:00","#version":"1","message":"Member: [x.xxx.xx.xxx]:5701","logger_name":"HazelcastService","thread_name":"http-nio-8080-exec-2","level":"INFO","level_value":20000,"SERVERNAME":"app-c655c54bf-zh4gj"}
{"#timestamp":"2022-09-05T15:45:05.505+02:00","#version":"1","message":"Member: [x.xxx.xx.xxx]:5701","logger_name":"HazelcastService","thread_name":"http-nio-8080-exec-2","level":"INFO","level_value":20000,"SERVERNAME":"app-c655c54bf-zh4gj"}
In the second we have 1 member using an IMap (a single member, to rule out some performance issues),
and with just half the amount of data, reading all entries takes over 10 s.
Why does it take so long? Are there any options to make it faster?
Configuration:
user-code-deployment:
  enabled: true
  class-cache-mode: ETERNAL
  provider-mode: LOCAL_AND_CACHED_CLASSES
serialization:
  compact-serialization:
    enabled: true
    registered-classes:
      - class: com.my.CompactEdit
        type-name: edit
        serializer: com.my.CompactEditSerializer
I'm trying to emulate a stream by drawing from Sources.mapJournal on an IMap that receives data from an IoT device. The stream is processed too slowly: the output arrives as one big accumulated batch after 30-60 seconds.
When I started updating the IMap frequently with small values (12 KB per value), I got this exception:
com.hazelcast.ringbuffer.StaleSequenceException: sequence:123 is too small and data store is disabled.
I increased the default capacity of the IMap journal 10 times. It became stable after that, but very slow. A similar issue occurs when I update the IMap with big values (about 1.2 MB every 5 seconds). Additionally, I have several connected IoT devices, and each of them has its own Jet job with the same pipeline:
StreamStage<TagPosition> sourceStream =
    p.drawFrom(Sources.<TagPosition, String, TagPosition>mapJournal(
        Constants.IMAP_TAGS_POSITIONS_BUFFER,
        Util.mapPutEvents().and(entry -> ((String) entry.getKey()).startsWith(instanceNumber)),
        Util.mapEventNewValue(),
        JournalInitialPosition.START_FROM_OLDEST));
// Drain to SmartMap
sourceStream.drainTo(SmartMapSinks.newTagPositionSink(instanceNumber));
Thanks in advance!
UPD:
The journal size is EventJournalConfig.DEFAULT_CAPACITY * 10 = 100 000 (1 partition)
Jet version is 0.7.2
Serializable classes implement com.hazelcast.nio.serialization.IdentifiedDataSerializable
The issue was caused by multiple jobs using a single IMap (and its map journal). The map journal was delivering events in delayed batches rather than as a stream.
Solved. Thanks!
I am using the "node-rdkafka" npm module for our distributed service architecture written in Node.js. We have a metering use case where we allow only a certain number of messages to be consumed and processed every n seconds. For example, a "main" topic has 100 messages pushed by a producer, and a "worker" consumes from the main topic every 30 seconds. There is a lot more to the story of the use case.
The problem I am having is that I need to programmatically get the lag of a given topic (all partitions).
Is there a way for me to do that?
I know that I can use "bin/kafka-consumer-groups.sh" to access some of the data I need but is there another way?
Thank you in advance
You can retrieve that information directly from your node-rdkafka client via several methods:
Client metrics:
The client can emit metrics at a defined interval. They contain the current and committed offsets as well as the end offset, so you can easily calculate the lag.
You first need to enable the metrics events by setting for example 'statistics.interval.ms': 5000 in your client configuration. Then set a listener on the event.stats events:
consumer.on('event.stats', function(stats) {
  console.log(stats);
});
The full stats are documented on https://github.com/edenhill/librdkafka/wiki/Statistics but you probably are mostly interested in the partition stats: https://github.com/edenhill/librdkafka/wiki/Statistics#partitions
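As a rough sketch of how the lag could be pulled out of those partition stats: the function below sums the `consumer_lag` field over a topic's partitions. It assumes the stats payload is a JSON string shaped like the librdkafka statistics document (`topics` → topic name → `partitions`), and that node-rdkafka hands it over as `stats.message`; verify both against your client version. `totalLagFromStats` and the sample payload are made up for illustration.

```javascript
// Sums consumer lag across a topic's partitions from a librdkafka-style stats payload.
// Assumption: the payload is a JSON string (in node-rdkafka, likely stats.message).
function totalLagFromStats(statsJson, topicName) {
  const stats = JSON.parse(statsJson);
  const topic = stats.topics && stats.topics[topicName];
  if (!topic) return null; // topic not present in this stats snapshot

  let total = 0;
  for (const [id, p] of Object.entries(topic.partitions || {})) {
    // Skip the internal "-1" partition and partitions whose lag is unknown (negative).
    if (id === '-1' || typeof p.consumer_lag !== 'number' || p.consumer_lag < 0) continue;
    total += p.consumer_lag;
  }
  return total;
}

// Hand-written sample shaped like the documented stats (not real broker output).
const sample = JSON.stringify({
  topics: {
    main: {
      partitions: {
        '0': { consumer_lag: 40 },
        '1': { consumer_lag: 2 },
        '-1': { consumer_lag: -1 }
      }
    }
  }
});

console.log(totalLagFromStats(sample, 'main')); // 42
```

Inside the listener this would be called as something like `totalLagFromStats(stats.message, 'main')`.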
Query the cluster for offsets:
You can use queryWatermarkOffsets() to retrieve the first and last offsets for a partition.
consumer.queryWatermarkOffsets(topicName, partition, timeout, function(err, offsets) {
  if (err) {
    console.error(err);
    return;
  }
  var high = offsets.highOffset;
  var low = offsets.lowOffset;
});
Then use the consumer's current position (position()) or committed (committed()) offsets to calculate the lag.
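The arithmetic itself is small. Here is a minimal sketch, assuming you have already collected `highOffset` and `lowOffset` from queryWatermarkOffsets() and the consumer's offset from position() or committed() for each partition. The helper names `partitionLag` and `topicLag` are hypothetical, not part of node-rdkafka.

```javascript
// Lag for one partition: messages between the consumer's offset and the log end.
function partitionLag(highOffset, lowOffset, consumerOffset) {
  // A missing or negative offset means nothing was consumed or committed yet,
  // so everything still retained on the broker counts as lag.
  const hasOffset = typeof consumerOffset === 'number' && consumerOffset >= 0;
  const start = hasOffset ? Math.max(consumerOffset, lowOffset) : lowOffset;
  return Math.max(highOffset - start, 0);
}

// Lag for a whole topic: the sum over all of its partitions.
// Each entry is { highOffset, lowOffset, offset } gathered by the caller.
function topicLag(partitions) {
  return partitions.reduce(
    (sum, p) => sum + partitionLag(p.highOffset, p.lowOffset, p.offset),
    0
  );
}

console.log(topicLag([
  { highOffset: 100, lowOffset: 0, offset: 70 },        // 30 behind
  { highOffset: 50, lowOffset: 10, offset: undefined }  // nothing consumed: 40 behind
])); // 70
```

Clamping to the low watermark avoids counting messages that retention has already deleted.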
Kafka exposes the "records-lag-max" MBean via JMX, which reports the maximum lag in records for a partition, so you can get the lag by querying this MBean.
Refer to the doc below for details on the exposed JMX MBeans.
https://docs.confluent.io/current/kafka/monitoring.html#consumer-group-metrics
The debug.log files for one of our Cassandra 3.10 clusters have frequent messages similar to “FailureDetector.java:457 - Ignoring interval time of…”
The messages appear even if the cluster is idle. I see the messages at a rate of about 1 per second on each node of this 6-node cluster (3 nodes each in two data centers).
Can someone tell me what causes the messages and if they are something to be concerned about?
We have a couple of other small clusters supporting the same application (different environments) and I see this message much less often (days apart).
The FailureDetector is responsible for deciding whether a node is considered UP or DOWN.
The gossip process tracks state from other nodes both directly (nodes
gossiping directly to it) and indirectly (nodes communicated about
secondhand, third-hand, and so on). Rather than have a fixed threshold
for marking failing nodes, Cassandra uses an accrual detection
mechanism to calculate a per-node threshold that takes into account
network performance, workload, and historical conditions. During
gossip exchanges, every node maintains a sliding window of
inter-arrival times of gossip messages from other nodes in the
cluster.
The log message comes from the FailureDetector source code. These messages are logged at DEBUG level because they may be helpful in tracking down the actual issue causing the latency, but they don't indicate a problem on their own.
In other words: your node measures the interval between gossip messages arriving from each of the other nodes, e.g. X nanoseconds for IP address 1, Z nanoseconds for IP address 2, and so on. If either X or Z is above the expected 2-second threshold defined by MAX_INTERVAL_IN_NANO, it gets reported.
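To make the threshold check concrete, here is a simplified illustration of a sliding window of inter-arrival times that drops, and logs, any interval above the 2-second limit. It is not Cassandra's implementation; the class name, window size, and log text are assumptions chosen only to mirror the description above.

```javascript
// Simplified sketch of a per-node arrival window (not Cassandra's actual code).
const MAX_INTERVAL_NANOS = 2e9; // the 2-second threshold mentioned above

class ArrivalWindow {
  constructor(size) {
    this.size = size;      // how many intervals to keep (assumed value)
    this.intervals = [];   // sliding window of accepted inter-arrival times
    this.lastNanos = null; // arrival time of the previous gossip message
  }

  // Records a gossip arrival; returns false when the interval was ignored.
  add(nowNanos) {
    if (this.lastNanos === null) {
      this.lastNanos = nowNanos; // first message: nothing to compare against yet
      return true;
    }
    const interval = nowNanos - this.lastNanos;
    this.lastNanos = nowNanos;
    if (interval > MAX_INTERVAL_NANOS) {
      // Too long to be a useful sample: report it and leave the window untouched.
      console.log(`Ignoring interval time of ${interval}`);
      return false;
    }
    this.intervals.push(interval);
    if (this.intervals.length > this.size) this.intervals.shift();
    return true;
  }
}

const w = new ArrivalWindow(3);
w.add(0);   // first arrival
w.add(1e9); // 1 s later: accepted
w.add(4e9); // 3 s later: ignored and logged
```

Seen this way, the message only says that one sample was too long to feed into the detector's statistics, which is why it is harmless on its own.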
Problems that can cause this log message:
Huge load on the node(s): e.g. too many large partitions
High pressure: e.g. too many queries in a short period of time
Bad network connection
The extra FailureDetector logging was added with this:
Expose phi values from failure detector via JMX and tweak debug
and trace logging (CASSANDRA-9526)
I also found this open issue, which might be related to your problem:
The failure detector becomes more sensitive when the network is flakey (CASSANDRA-9536)
Also I find this article about Gossiping and Failure Detection very useful.
I'm currently exploring CouchDB replication and trying to figure out the difference between the max_replication_retry_count and retries_per_request configuration options in the [replicator] section of the configuration file.
Basically, I want to configure continuous replication from a local CouchDB to a remote instance that never stops its replication attempts, even across potentially long offline periods (days or even weeks). So I'd like to have infinite replication attempts with a maximum retry interval of 5 minutes or so. Can I do this? Do I need to change the default configuration to achieve it?
Here are the replies I got on the CouchDB mailing lists:
If we are talking Couch 1.6, the attribute retries_per_request
controls a number of attempts a current replication is going to do to
read _changes feed before giving up. The attribute
max_replication_retry_count controls a number of attempts the whole replication job is going to be retried by a replication manager.
Setting this attribute to “infinity” should make the replication
manager never give up.
I don’t think the interval between those attempts is configurable. As
far as I understand it’s going to start from 2.5 sec between the
retries and then double until reached 10 minutes, which is going to be
hard upper limit.
Extended answer:
The answer is slightly different depending if you're using 1.x/2.0
releases or current master.
If you're using 1.x or 2.0 release: Set "max_replication_retry_count =
infinity" so it will always retry failed replications. That setting
controls how the whole replication job restarts if there is any error.
Then "retries_per_request" can be used to handle errors for individual
replicator HTTP requests. Basically the case where a quick immediate
retry succeeds. The default value for "retries_per_request" is 10.
After the first failure, there is a 0.25 second wait. Then on next
failure it doubles to 0.5 and so on. Max wait interval is 5 minutes.
But if you expect to be offline routinely, maybe it's not worth
retrying individual requests for too long so reduce the
"retries_per_request" to 6 or 7. So individual requests would retry a
few times for about 10 - 20 seconds then the whole replication job
will crash and retry.
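The arithmetic behind that estimate can be sketched as follows. The function is only an illustration of the doubling schedule described above (0.25 s first wait, 5 minute cap). It assumes one wait follows every failed attempt, which may differ by one from how the replicator actually counts; `retryWaits` is a made-up helper, not a CouchDB API.

```javascript
// Wait times (in seconds) for a doubling backoff, capped at maxWaitSec.
// Assumption: one wait follows each of the `retries` failed attempts.
function retryWaits(retries, firstWaitSec = 0.25, maxWaitSec = 300) {
  const waits = [];
  let wait = firstWaitSec;
  for (let i = 0; i < retries; i++) {
    waits.push(Math.min(wait, maxWaitSec));
    wait *= 2;
  }
  return waits;
}

const total = (xs) => xs.reduce((a, b) => a + b, 0);

console.log(retryWaits(6));         // [ 0.25, 0.5, 1, 2, 4, 8 ]
console.log(total(retryWaits(6)));  // 15.75
console.log(total(retryWaits(7)));  // 31.75
console.log(total(retryWaits(10))); // 255.75
```

Under these assumptions, 6 retries add up to about 16 seconds of waiting, which matches the 10 - 20 second range quoted above, while the default of 10 keeps a single request retrying for over 4 minutes. Whether 7 retries lands nearer 16 or 32 seconds depends on whether the final failure is followed by a wait.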
If you're using current master, which has the new scheduling
replicator: No need to set "max_replication_retry_count", that setting
is gone and all replication jobs will always retry for as long as
replication document exists. But "retries_per_request" works the same
as above. Replication scheduler also does exponential backoffs when
replication jobs fail consecutively. First backoff is 30 seconds. Then
it doubles to 1 minute, 2 minutes, and so on. Max backoff wait is
about 8 hours. But if you don't want to wait 4 hours on average for
the replication to restart when network connectivity is restored, and
want it to be about 5 minutes or so, set "max_history = 8" in the
"replicator" config section. max_history controls how much history of
past events are retained for each replication job. If there is less
history of consecutive crashes, that backoff wait interval will also
be shorter.
So to summarize, for 1.x/2.0 releases:
[replicator]
max_replication_retry_count = infinity
retries_per_request = 6
For current master:
[replicator]
max_history = 8
retries_per_request = 6
I had been using a 6 GB dataset, with each source record ~1 KB in length, when I accidentally added an index on a column that I am pretty sure has 100% cardinality.
I tried dropping the index from cqlsh, but by that point the two-node cluster had gone into a runaway death spiral, with loadavg surpassing 20 on each node, and cqlsh hung on the drop command for 30 minutes. Since this was just a test setup, I shut down and destroyed the cluster and restarted.
This is a fairly disconcerting problem as it makes me fear a scenario where a junior developer is on a production cluster and they set an index on a similar high cardinality column. I scanned through the documentation and looked at the options in nodetool but there didn't seem to be anything along the lines of "abort job or abort building index".
Test environment:
2x m1.xlarge EC2 instances with 2 Raid 0 ephemeral disks
Dataset was 6GB, 1KB per record.
My question in summary: is it possible to abort the process of building a secondary index, and/or to stop or postpone running builds (indexing, compaction) until a later date?
nodetool -h node_address stop index_build
See: http://www.datastax.com/docs/1.2/references/nodetool#nodetool-stop