Getting "Error initializing cluster data" in OpsCenter

I am getting the following error in OpsCenter. After some time the issue resolved itself.
Error initializing cluster data: The request to
/APP_Live/keyspaces?ksfields=column_families%2Creplica_placement_strategy%2Cstrategy_options%2Cis_system%2Cdurable_writes%2Cskip_repair%2Cuser_types%2Cuser_functions%2Cuser_aggregates&cffields=solr_core%2Ccreate_query%2Cis_in_memory%2Ctiers timed out after 10 seconds..
If you continue to see this error message, you can workaround this timeout by setting [ui].default_api_timeout to a value larger than 10 in opscenterd.conf and restarting opscenterd.
Note that this is a workaround and you should also contact DataStax Support to follow up.

The workaround for this timeout is to set [ui].default_api_timeout to a value larger than 10 in opscenterd.conf and then restart opscenterd.
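As a rough illustration of that workaround, the sketch below uses Python's standard configparser to raise the value in an in-memory copy of the file. The helper name raise_api_timeout, the sample file contents and the value 30 are assumptions for illustration; only the [ui] section and the default_api_timeout key come from the error message. configparser drops comments when it writes a file back, so on a real opscenterd.conf it may be safer to make the change by hand.

```python
import configparser
import io


def raise_api_timeout(conf_text: str, seconds: int) -> str:
    """Return conf_text with [ui].default_api_timeout set to `seconds`."""
    if seconds <= 10:
        raise ValueError("the timeout must be larger than 10")
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("ui"):
        parser.add_section("ui")
    parser.set("ui", "default_api_timeout", str(seconds))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical minimal opscenterd.conf contents, for illustration only.
sample = "[webserver]\nport = 8888\n"
updated = raise_api_timeout(sample, 30)
print(updated)
```

opscenterd still has to be restarted afterwards for the new value to take effect.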

Related

advanced.session-leak after sometime of starting spark thrift server with datastax cassandra connector

Hi, I am getting the following error after some time of inactivity.
Error: Error running query: com.typesafe.config.ConfigException$Missing: withValue(advanced.reconnection-policy.base-delay): No configuration setting found for key 'advanced.session-leak' (state=,code=0)
Restarting the Thrift server seems to solve the issue for some time.

Databricks error:Internal error, sorry. Attach your notebook to a different cluster or restart the current cluster

I'm working in Databricks 8.1. If I run my PySpark code line by line using Shift + Enter, it runs fine, but if I run the entire notebook using the Run All button, I get an internal error message. The complete error message is:
Internal error, sorry. Attach your notebook to a different cluster or restart the current cluster.
com.databricks.rpc.RPCResponseTooLarge: rpc response (of 20984709 bytes) exceeds limit of 20971520 bytes
at com.databricks.rpc.Jetty9Client$$anon$1.onContent(Jetty9Client.scala:370)
at shaded.v9_4.org.eclipse.jetty.client.api.Response$Listener$Adapter.onContent(Response.java:248)
at shaded.v9_4.org.eclipse.jetty.client.ResponseNotifier.notifyContent(ResponseNotifier.java:135)
at shaded.v9_4.org.eclipse.jetty.client.ResponseNotifier.notifyContent(ResponseNotifier.java:126)
at shaded.v9_4.org.eclipse.jetty.client.HttpReceiver.responseContent(HttpReceiver.java:340)
at shaded.v9_4.org.eclipse.jetty.client.http.HttpReceiverOverHTTP.content(HttpReceiverOverHTTP.java:283)
at shaded.v9_4.org.eclipse.jetty.http.HttpParser.parseContent(HttpParser.java:1762)
at shaded.v9_4.org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1490)
at shaded.v9_4.org.eclipse.jetty.client.http.HttpReceiverOverHTTP.parse(HttpReceiverOverHTTP.java:172)
at shaded.v9_4.org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:135)
at shaded.v9_4.org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:73)
at shaded.v9_4.org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:133)
at shaded.v9_4.org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:151)
at shaded.v9_4.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at shaded.v9_4.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at shaded.v9_4.org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:426)
at shaded.v9_4.org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:320)
at shaded.v9_4.org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:158)
at shaded.v9_4.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at shaded.v9_4.org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at shaded.v9_4.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
at shaded.v9_4.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
at shaded.v9_4.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at shaded.v9_4.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at shaded.v9_4.org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)
at shaded.v9_4.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)
at shaded.v9_4.org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:914)
at java.base/java.lang.Thread.run(Thread.java:834)
I restarted the cluster several times, but I still get the same issue.
Can you suggest steps to resolve this issue?

Application failed 2 times due to AM container for appattempt_ exited with exitCode: 0

When I submit my Spark program, it fails at the end but with exitCode: 0, as shown in the picture.
The program should write a table to Hive and, despite the failure, the table was created successfully.
But I can't figure out the origin of the error. Can you help, please?
Running yarn logs -applicationId gives the following output here

I finally solved my problem. In fact, today I surprisingly got another error from the same program saying
ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Driver X:37478 disassociated! Shutting down.
I found many solutions talking about memory or timeouts.
What did the trick was simply closing my SparkSession, which I had forgotten to do (spark.close()).
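A minimal sketch of that fix, written in Python. The helper name stopping is hypothetical, and FakeSession is a stand-in so the snippet runs without Spark installed. With PySpark the session would come from SparkSession.builder.getOrCreate(), and the shutdown method there is stop(); in the Scala API, close() is a synonym for stop().

```python
from contextlib import contextmanager


@contextmanager
def stopping(session):
    """Yield the session and always call stop() on it, even if the job raises."""
    try:
        yield session
    finally:
        session.stop()


class FakeSession:
    """Stand-in for a SparkSession so the sketch runs without Spark installed."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


session = FakeSession()
with stopping(session) as spark:
    pass  # the real job (for example, writing the Hive table) would go here
print(session.stopped)
```

Wrapping the job this way means the session is shut down on both the success path and the failure path, so the application does not end with the driver still holding its executors.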

Can't backup to S3 with OpsCenter 5.2.1

I upgraded OpsCenter from 5.1.3 to 5.2.0 (and then to 5.2.1). I had a scheduled backup to the local server and an S3 location configured before the upgrade, which worked fine with OpsCenter 5.1.3. I made no changes to the scheduled backup during or after the upgrade.
The day after the upgrade, the S3 backup failed. In opscenterd.log, I see these errors:
2015-09-28 17:00:00+0000 [local] INFO: Instructing agents to start backups at Mon, 28 Sep 2015 17:00:00 +0000
2015-09-28 17:00:00+0000 [local] INFO: Scheduled job 458459d6-d038-41b4-9094-7d450e4bac6f finished
2015-09-28 17:00:00+0000 [local] INFO: Snapshots started on all nodes
2015-09-28 17:00:08+0000 [] WARN: Marking request d960ad7b-2ccd-40a4-be7e-8351ac038c53 as failed: {'sstables': {u'solr_admin': {u'solr_resources': {'total_size': 155313, 'total_files': 12, 'done_files': 0, 'errors': [u'{:type :opsagent.backups.destinations/destination-not-found, :message "Destination missing: 62f5a26abce7463bad9deb7380979c4a"}', u'{:type :opsagent.backups.destinations/destination-not-found, :message "Destination missing: 62f5a26abce7463bad9deb7380979c4a"}', u'{:type :opsagent.backups.destinations/destination-not-found, :message "Destination missing: 62f5a26abce7463bad9deb7380979c4a"}', shortened for brevity.
The S3 location no longer appears in OpsCenter when I edit the scheduled backup job. When I try to re-add the S3 location, using the same bucket and credentials as before, I get the following error:
Location validation error: Call to /local/backups/destination_validate timed out.
Also, I don't know if this is related, but for completeness, I see some of these errors in the opscenterd.log as well:
WARN: No http agent exists for definition file update. This is likely due to SSL import failure.
I get this behavior with either DataStax Enterprise 4.5.1 or 4.7.3.

I have been having the exact same problem since updating to OpsCenter 5.2.x and was just able to get it working properly.
I removed all the settings suggested in the previous answer and then created new buckets in us-west-1, us-west-2 and us-standard. After this I was able to add all of those as destinations quickly and easily.
It appears to me that the problem is that OpsCenter may be trying to list the objects in the bucket that you configure initially; in my case, the 2 existing buckets we were using had 11TB and 19GB of data in them, respectively.
This could explain why increasing the timeout worked for some and not for others.
Hope this helps.

Try adding the remote_backup_region property to the cluster configuration file under the [agents] heading in "cluster-name".conf. Valid values are: us-standard, us-west-1, us-west-2, eu-west-1, ap-northeast-1, ap-southeast-1
Does that help?
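For illustration, the fragment this answer describes can be rendered with a small Python helper. The helper name agents_fragment is hypothetical; the [agents] heading, the remote_backup_region key and the list of valid values are taken from the answer itself.

```python
# Region values listed in the answer as valid for remote_backup_region.
VALID_REGIONS = (
    "us-standard", "us-west-1", "us-west-2",
    "eu-west-1", "ap-northeast-1", "ap-southeast-1",
)


def agents_fragment(region: str) -> str:
    """Return the [agents] lines to add to the cluster configuration file."""
    if region not in VALID_REGIONS:
        raise ValueError(f"unsupported region {region!r}; expected one of {VALID_REGIONS}")
    return f"[agents]\nremote_backup_region = {region}\n"


print(agents_fragment("us-west-2"))
```

If the cluster configuration file already has an [agents] section, the key belongs under the existing heading rather than under a second one.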

The problem was resolved by a combination of 2 things.
Delete the entire contents of the existing S3 bucket (or create a new bucket as previously suggested by @kaveh-nowroozi).
Edit /etc/datastax-agent/datastax-agent-env.sh and increase the heap size to 512M as suggested by a DataStax engineer. The default was set at 128M and I kept doubling it until backups became successful.

Cassandra Streaming error - Unknown keyspace system_traces

In our dev cluster, which had been running smoothly before, when we replace a node (which we have been doing constantly), the following failure occurs and prevents the replacement node from joining.
The Cassandra version is 2.0.7.
What can be done about it?
ERROR [STREAM-IN-/10.128.---.---] 2014-11-19 12:35:58,007 StreamSession.java (line 420) [Stream #9cad81f0-6fe8-11e4-b575-4b49634010a9] Streaming error occurred
java.lang.AssertionError: Unknown keyspace system_traces
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:260)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at org.apache.cassandra.streaming.StreamSession.addTransferRanges(StreamSession.java:239)
at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:368)
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:289)
at java.lang.Thread.run(Thread.java:745)

I got the same error while I was trying to set up my cluster. As I was experimenting with different switches in cassandra.yaml, I restarted the service multiple times and removed the system dir under the data directory (/var/lib/cassandra/data as mentioned here).
I guess that for some reason Cassandra tries to load the system_traces keyspace (the other dir under /var/lib/cassandra/data) and fails, and nodetool throws this error. You can just remove both system and system_traces before starting the cassandra service, or even better, delete all content of the commitlog, data and saved_caches directories there.
Obviously, this only works if you don't have any data in the system yet.
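The cleanup described in this answer could be scripted roughly as follows. This is a sketch under assumptions: the function name clear_system_keyspaces is hypothetical, the default path is the /var/lib/cassandra/data directory mentioned in the answer, and it should only be run with Cassandra stopped on a node that holds no data worth keeping. It defaults to a dry run that only reports what would be removed.

```python
import shutil
from pathlib import Path


def clear_system_keyspaces(data_dir="/var/lib/cassandra/data", dry_run=True):
    """Remove the system and system_traces directories under data_dir.

    Returns the directories that were (or, in a dry run, would be) removed.
    """
    removed = []
    for name in ("system", "system_traces"):
        target = Path(data_dir) / name
        if target.is_dir():
            if not dry_run:
                shutil.rmtree(target)
            removed.append(target)
    return removed


if __name__ == "__main__":
    # Dry run by default: nothing is deleted, the candidates are only listed.
    for path in clear_system_keyspaces():
        print(f"would remove {path}")
```

Passing dry_run=False performs the deletion; any other keyspace directories under the data directory are left untouched.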
