Hazelcast - Errors reading a cache of 2 million objects at approx. 500 requests/second

We have approx. 2 million distributed (not replicated) data objects in the cache of a 10-node cluster (approx. 500 MB of data). The backup count is one. We are seeing the errors/warnings below.
Do you know under what circumstances these errors can occur? I have sanitized the logs to avoid sharing anything sensitive. The majority of the traffic is cache reads (around 400 requests/second), and the whole cache gets reinitialized every 2 hours.
I know we could use a replicated cache to improve performance, but I'm wondering what is going wrong here. When I run a smaller cluster (e.g. 5 nodes), everything works fine.
Hazelcast version 3.6.3
Server size 8 core, 16 GB
Windows Server 2012 R2
IO input thread count is 30
IO output thread count is 50
2017-06-24 23:46:22.679 ERROR (hz._hzInstance_1_My-App.partition-operation.thread-5) [c.h.m.i.o.GetOperation] - [192.168.111.11]:5701 [My-App] [3.6.3] Cannot send response: HeapData{type=-2, hashCode=113248027, partitionHash=113248027, totalSize=722, dataSize=714, heapCost=742} to Address[192.168.111.13]:5701. Op: com.hazelcast.map.impl.operation.GetOperation{identityHash=1124265765, serviceName='hz:impl:mapService', partitionId=189, replicaIndex=0, callId=3490089, invocationTime=1498362385498 (Sat Jun 24 23:46:25 EDT 2017), waitTimeout=-1, callTimeout=8000, name=HKF/my-cache-id-3, name=HKF/my-cache-id-3}
com.hazelcast.spi.exception.ResponseNotSentException: Cannot send response: HeapData{type=-2, hashCode=113248027, partitionHash=113248027, totalSize=722, dataSize=714, heapCost=742} to Address[192.168.111.13]:5701. Op: com.hazelcast.map.impl.operation.GetOperation{identityHash=1124265765, serviceName='hz:impl:mapService', partitionId=189, replicaIndex=0, callId=3490089, invocationTime=1498362385498 (Sat Jun 24 23:46:25 EDT 2017), waitTimeout=-1, callTimeout=8000, name=HKF/my-cache-id-3, name=HKF/my-cache-id-3}
at com.hazelcast.spi.impl.operationservice.impl.RemoteInvocationResponseHandler.sendResponse(RemoteInvocationResponseHandler.java:54)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.sendResponse(OperationRunnerImpl.java:278)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.handleResponse(OperationRunnerImpl.java:251)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:173)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:393)
at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processPacket(OperationThread.java:184)

Why do you have such a huge number of input and output threads (30/50)? In most cases the default of 3+3 is more than sufficient. If you don't have 50+ connections, all these extra threads will be idle; and even with 50+ connections, you will not get good performance with so many IO threads.
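A minimal sketch of putting them back to the defaults, assuming programmatic configuration (the same properties can also be set in hazelcast.xml or as -D system properties):

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

Config config = new Config();
// defaults: 3 input threads + 3 output threads
config.setProperty("hazelcast.io.input.thread.count", "3");
config.setProperty("hazelcast.io.output.thread.count", "3");
HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);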
The error you are seeing seems to indicate a networking issue: the response can't be sent. The big question is why this is happening.
Can you enable diagnostics (a sketch is below):
http://docs.hazelcast.org/docs/latest-development/manual/html/Management/Diagnostics/Enabling_Diagnostics_Logging.html
and send the log files to peter at hazelcast dot com so I can have a look at it?
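For completeness, a sketch of enabling diagnostics programmatically, assuming the property names from the linked page (they can also be passed as -D system properties; the linked docs describe later versions, so check availability on 3.6.x, and the output directory below is hypothetical):

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;

Config config = new Config();
config.setProperty("hazelcast.diagnostics.enabled", "true");
// hypothetical output directory; by default files go to the working directory
config.setProperty("hazelcast.diagnostics.directory", "C:\\hazelcast-diag");
Hazelcast.newHazelcastInstance(config);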

Related

Hazelcast causing heavy traffic between nodes

NOTE: The root cause turned out to be application code using Hazelcast that started executing after 15 minutes and retrieved almost the entire data set, so the issue is NOT in Hazelcast. I'm leaving the question here in case anyone sees the same side effect from similar code.
What can cause heavy traffic between 2 Hazelcast nodes (v3.12.12, also tried 4.1.1)?
The cluster holds maps with a lot of data; no entries are added or removed within that time, only map values are updated.
Java 11; memory usage is 1.5 GB out of 12 GB; no full GCs identified.
According to JFR, the high IO comes from:
com.hazelcast.internal.networking.nio.NioThread.processTaskQueue()
In the network IO chart (not reproduced here), traffic jumps from 15 to 60 MB about 15 minutes after start. From the application's perspective, nothing changed after these 15 minutes.
This smells like garbage collection; you are most likely running into long GC pauses. Check your GC logs, which you can enable using verbose GC settings on all members (see the flag sketch after this list). If there are back-to-back GCs, you should do one or more of the following:
increase the heap size
tune your GC; I'd look into G1 (with -XX:MaxGCPauseMillis set to a reasonable number) and/or ZGC.
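A sketch of the relevant flags, assuming Java 11 (as in the question), G1, and a hypothetical my-app.jar; heap sizes and the pause target are illustrative:

java -Xms12g -Xmx12g \
     -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
     -Xlog:gc*:file=gc.log:time,uptime,level,tags \
     -jar my-app.jar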

Cassandra 2.2.3: Repeatedly facing "writing large partition" error even after multiple repairs

We have a production Cassandra cluster with 2 datacenters of 6 nodes each. We keep encountering the large-partition warning below. We ran 2 successful repairs, but this is still not resolved. How can I analyze and fix this?
BigTableWriter.java:184 - Writing large partition system_distributed/repair_history:rf_key_space:my_table (108140638 bytes)
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 1171896
Mismatch (Blocking): 808
Mismatch (Background): 131
Pool Name        Active  Pending  Completed
Large messages   n/a     11       0
Small messages   n/a     0        48881938
Gossip messages  n/a     0        113659
The system_distributed.repair_history table is not one that you really need to concern yourself with. Unfortunately, this can happen when a lot of repairs have been run. With 2.2, the only real solution is to TRUNCATE that table every now and then.
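For example, from cqlsh on any node (TRUNCATE is a cluster-wide operation, so running it once is enough; by default it requires all nodes to be up):

TRUNCATE system_distributed.repair_history;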

CouchDB views crashing for large documents

CouchDB keeps crashing whenever I try to build the index of the views of a design document that emits values for large documents. The total size of the database is 40 MB, and I guess the documents are about 5 MB each. We're talking about large JSON documents without any attachments.
What concerns me is that I have 2.5 GB of free RAM before trying to access the views, but as soon as I try to access them, CPU usage rises to 99% and all the free RAM gets eaten by erl.exe before the indexing fails with exit code 1.
Here is the log:
[info] 2016-11-22T22:07:52.263000Z couchdb@localhost <0.212.0> -------- couch_proc_manager <0.15603.334> died normal
[error] 2016-11-22T22:07:52.264000Z couchdb@localhost <0.15409.334> b9855eea74 rexi_server throw:{os_process_error,{exit_status,1}} [{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,56}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,244}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
Views that skip these documents can be accessed without issue. What general guidelines could you give me to help with this kind of situation? I am using CouchDB 2.0 on Windows.
Many thanks
Update: I tried limiting the number of view server instances to 1 and varying the max RAM allowed for couchjs, but it keeps crashing. I also noticed that even though CouchDB is supposed to pass only one document at a time to the view server, erl.exe keeps eating all the available RAM (3 GB used to update three 5 MB docs...). Initially I thought this was because of the multiple couchjs instances, but apparently that isn't the case.
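For reference, a sketch of how the view-server limit can be set in local.ini, assuming the [query_server_config] section applies to this CouchDB 2.0 setup (the couchjs RAM cap was varied separately via its -S command-line flag):

[query_server_config]
os_process_limit = 1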
Update: Made some progress; now the indexing runs well for just under 10 minutes, then erl.exe crashes. I have posted the dump here (just to clarify, "well" means 99% CPU usage and the computer screen completely frozen).

How to tune "spark.rpc.askTimeout"?

We have a Spark 1.6.1 application that takes input from two Kafka topics and writes the result to another Kafka topic. The application receives some large (approximately 1 MB) files on the first input topic and some simple conditions from the second input topic. If a condition is satisfied, the file is written to the output topic; otherwise it is held in state (we use mapWithState).
The logic works fine for a small number (a few hundred) of input files, but fails with org.apache.spark.rpc.RpcTimeoutException, whose message recommends increasing spark.rpc.askTimeout. After increasing it from the default (120s) to 300s, the job ran longer but crashed with the same error after 1 hour. After changing the value to 500s, the job ran fine for more than 2 hours.
Note: We are running the Spark job in local mode and Kafka is also running locally on the same machine. Also, I sometimes see the warning "[2016-09-06 17:36:05,491] [WARN] - [org.apache.spark.storage.MemoryStore] - Not enough space to cache rdd_2123_0 in memory! (computed 2.6 GB so far)"
Now, 300s seemed like a large enough timeout considering the all-local configuration. Any idea how to arrive at an ideal timeout value, instead of just using 500s or higher based on testing? I have seen cases that crashed with 800s and cases suggesting 60000s.
I was facing the same problem. I found this page saying that under heavy workloads it is wise to set spark.network.timeout (which controls all the timeouts, including the RPC one) to 800. For the moment it has solved my problem.
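A minimal sketch of setting it programmatically, assuming Java (it can equally go in spark-defaults.conf or be passed to spark-submit with --conf); the app name is hypothetical:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf()
        .setAppName("kafka-file-router")  // hypothetical name
        .setMaster("local[*]")
        // umbrella timeout: spark.rpc.askTimeout falls back to this value
        // when it is not set explicitly
        .set("spark.network.timeout", "800s");
JavaSparkContext sc = new JavaSparkContext(conf);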

Cassandra: secondary index building during the addition of a new node lasts forever

I'm trying to add a new node to our cluster (Cassandra 2.1.11, 16 nodes, 32 GB RAM, 2x3 TB HDD, 8-core CPU, 1 datacenter, 2 racks, about 700 GB of data on each node). After the new node starts, the data (approx. 600 GB total) from the 16 existing nodes is successfully transferred to it, and the building of secondary indexes starts. The index-building process looks normal; I see info about the successful completion of some secondary index builds and some stream tasks:
INFO [StreamReceiveTask:9] 2015-11-22 02:15:23,153 StreamResultFuture.java:180 - [Stream #856adc90-8ddd-11e5-a4be-69bddd44a709] Session with /192.168.21.66 is complete
INFO [StreamReceiveTask:9] 2015-11-22 02:15:23,152 SecondaryIndexManager.java:174 - Index build of [docs.docs_ex_pl_ph_idx, docs.docs_lo_pl_ph_idx, docs.docs_author_login_idx, docs.docs_author_extid_idx, docs.docs_url_idx] complete
Currently 9 out of 16 streams have finished successfully, according to the logs. Everything looks fine except for one issue: this process has already lasted 5 full days. There are no errors in the logs and nothing suspicious, apart from the extremely slow progress.
nodetool compactionstats -H
shows
Secondary index build ... docs 882,4 MB 1,69 GB bytes 51,14%
So the index build is running and making progress, but very slowly: about 1% every half hour.
The only significant difference between the new node and the existing nodes is that the Cassandra Java process has 21k open files, versus about 300 on any existing node, and there are 80k files in the new node's data dir, versus 300-500 in the data dir on any existing node.
Is this normal? At this speed it looks like I'll spend 16 weeks or so adding 16 more nodes.
I know this is an old question, but we ran into this exact issue on 2.1.13 using DTCS. We were able to fix it in our test environment by increasing the memtable flush threshold to 0.7, which didn't make any sense to us, but it may be worth trying.
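A sketch of that change, assuming the setting meant is memtable_cleanup_threshold in cassandra.yaml (the answer does not name the exact key; changing it requires a node restart):

# cassandra.yaml; default is 1 / (memtable_flush_writers + 1)
memtable_cleanup_threshold: 0.7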
