memsql aggregator fail - how to recover the cluster - singlestore

I have a MemSQL cluster with 4 child aggregators, 30 leaves, and one master that failed. At this point I can't recover the master no matter what I do. That instance is gone. I have promoted one of the child aggregators to master.
Once I connect to MemSQL and run show databases; show leaves/aggregators ... everything is in place. However, how do I fully convert this child into a master? I mean, on the web UI the master appears to be running a freshly started cluster with zero leaves. Also, I can't see any master folder created on the child aggregator that was promoted.
So my question is: where do I go from here? For example, if I want to restart the entire cluster, how am I going to do it, given that on the promoted child node memsql-ops memsql-list returns
No MemSQL nodes were found?
How will I perform the typical operations - update, restart?

It sounds like you have successfully promoted a child aggregator to master in the MemSQL cluster, but MemSQL Ops has lost all the cluster information because the Ops primary agent - which by default was on the same host as the Master Aggregator - is gone.
I'm not sure about your situation - did you promote a new Ops primary agent? - but in general, if you have a functioning MemSQL cluster and MemSQL Ops on all the nodes of the cluster, but Ops is not monitoring MemSQL (i.e. memsql-ops memsql-list is empty), you would run memsql-ops memsql-monitor for each MemSQL node to add them into Ops monitoring.
EDIT: the answer was that you haven't promoted a new Ops primary agent yet. In that case, here is what you need to do.
Run memsql-ops unfollow on every node except the old primary
Choose a node to be the new primary - e.g. the new Master Aggregator.
Run memsql-ops follow -h NEW_PRIMARY_HOSTNAME on every node except the new primary
Run memsql-ops memsql-monitor -h NEW_MASTER_AGGREGATOR
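Putting those steps together, a minimal sketch of the sequence (the hostname is a placeholder for your new Master Aggregator host):
# on every node except the old, dead primary
memsql-ops unfollow
# on every node except the new primary
memsql-ops follow -h agg2.example.com
# bring the MemSQL nodes back under Ops monitoring, starting with the new Master Aggregator
memsql-ops memsql-monitor -h agg2.example.com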

Related

How to know if a machine in a Spark cluster 'participates' in a job

I wanted to know when it is safe to remove a machine from a cluster.
My assumption is that it could be safe to remove a machine if the machine does not have any containers, and it does not store any useful data.
Using the APIs at https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html, we can issue
GET http://<rm http address:port>/ws/v1/cluster/nodes
to get information about each node, like:
<node>
<rack>/default-rack</rack>
<state>RUNNING</state>
<id>host1.domain.com:54158</id>
<nodeHostName>host1.domain.com</nodeHostName>
<nodeHTTPAddress>host1.domain.com:8042</nodeHTTPAddress>
<lastHealthUpdate>1476995346399</lastHealthUpdate>
<version>3.0.0-SNAPSHOT</version>
<healthReport></healthReport>
<numContainers>0</numContainers>
<usedMemoryMB>0</usedMemoryMB>
<availMemoryMB>8192</availMemoryMB>
<usedVirtualCores>0</usedVirtualCores>
<availableVirtualCores>8</availableVirtualCores>
<resourceUtilization>
<nodePhysicalMemoryMB>1027</nodePhysicalMemoryMB>
<nodeVirtualMemoryMB>1027</nodeVirtualMemoryMB>
<nodeCPUUsage>0.006664445623755455</nodeCPUUsage>
<aggregatedContainersPhysicalMemoryMB>0</aggregatedContainersPhysicalMemoryMB>
<aggregatedContainersVirtualMemoryMB>0</aggregatedContainersVirtualMemoryMB>
<containersCPUUsage>0.0</containersCPUUsage>
</resourceUtilization>
</node>
If numContainers is 0, I assume it does not run containers. However, can it still store any data on disk that other downstream tasks can read?
I could not figure out whether Spark exposes this information. I assume that if a machine still stores data useful for the running job, it may maintain a heartbeat with the Spark driver or some central controller? Can we check this by scanning TCP or UDP connections?
Is there any other way to check if a machine in a Spark cluster participates in a job?
I am not sure whether you just want to know if a node is running any task (if that's what you mean by 'participate') or whether you want to know if it is safe to remove a node from the Spark cluster.
I will try to address the latter.
Spark has the ability to recover from failures, which also applies to any node being removed from the cluster.
The removed node can be running an executor or the application master.
If the application master is removed, the entire job fails. But if you are using YARN as the resource manager, the job is retried and YARN provides a new application master. The number of retries is configured in:
yarn.resourcemanager.am.max-attempts
By default, this value is 2
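For reference, that limit lives in yarn-site.xml on the ResourceManager; a minimal snippet raising it (the value 4 below is just an example):
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
</property>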
If a node on which a task is running is removed, the resource manager (YARN) will stop getting heartbeats from that node. The application master will know it is supposed to reschedule the failed task, since it will no longer receive progress updates from that node. It will then request resources from the resource manager and reschedule the task.
As far as data on these nodes is concerned, you need to understand how tasks and their output are handled. Every node has its own local storage to hold the output of the tasks running on it. After a task completes successfully, the OutputCommitter moves its output from local storage to the job's shared storage (HDFS), from where it is picked up for the next step of the job.
When a task fails (perhaps because the node running it failed or was removed), the task is rerun on another available node.
In fact, the application master will also rerun tasks that completed successfully on that node, because their output, stored on the node's local storage, will no longer be available.
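Coming back to the question of spotting nodes with no running containers: the ResourceManager REST endpoint quoted in the question can be filtered for that. A rough sketch with curl and jq, assuming the default RM web port 8088 and a hypothetical hostname:
curl -s -H "Accept: application/json" "http://rm-host:8088/ws/v1/cluster/nodes" \
  | jq -r '.nodes.node[] | select(.numContainers == 0) | .nodeHostName'
Keep in mind this only says that no containers are running right now; as described above, output kept on a node's local storage may still be needed, so an "idle" node can still matter to the running job.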

How long does a MemSQL upgrade take?

I have started an offline upgrade process to upgrade my MemSQL cluster from 5.8 to 6.5. The data size is around 300 GB. It's been 5 hours already, but I have lost all access to the cluster and there is no way to check the status.
memsql-ops memsql-list returns all leaves and aggregators as online.
But memsql> SHOW LEAVES; returns an empty set, and my master aggregator was automatically converted to a child aggregator, so now I don't have any master aggregator.
I can't execute any command (like AGGREGATOR SET AS MASTER) against the child aggregator; it says 'memsql is not running as an aggregator' or 'memsql node is not running', and a SQL query returns "The database 'xxx' is not available for queries, as it is waiting for the Master Aggregator to bring it online. Run SHOW DATABASES EXTENDED ..."
Also, any management command like memsql-ops restart returns "Job cannot run because there is a MemSql upgrade intention with ID xxx is in progress".
Any information about this would be helpful, as I am not able to find anything related online.
Thanks in advance...
We debugged the issue in the MemSQL public chat, and it turned out that the Master Aggregator was running an unsupported beta version of MemSQL (6.0.0), which prevented the upgrade and then corrupted the database post-upgrade.
For future readers: please audit that you are not running beta versions of MemSQL on production clusters. If you are, not only will the upgrade likely break, but it may not be possible to recover your data on a non-beta cluster.
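One quick way to audit that before an upgrade is to check the engine version reported by each node over the MySQL protocol (host, port, and credentials below are placeholders):
mysql -h agg1.example.com -P 3306 -u root -p -e "SELECT @@memsql_version;"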

Does Kubernetes reschedule pods to other minions if the minions join later?

I was running a test with initially one minion node and one master node. I created 5 pods on the cluster, and later 2 more minion nodes joined the cluster.
The problem I faced was that all the pods were scheduled only on the master and the original minion node. They were not rescheduled to the new nodes to spread the load, so my new minion nodes were just sitting idle and didn't do any processing.
Is there anything special that needs to be run to make this happen?
Not really. The scheduler is called whenever something needs to be scheduled, so unless you deploy new replicas of the pod, the scheduler won't be bothered again.
Whenever you want to schedule something, like creating a Deployment or a Pod, the scheduler looks at the available resources to place the Pods where it thinks best. The next time you schedule something, it will take into account the new minions added to the cluster. If your pods are created via a Deployment object, you could also try deleting one Pod; its ReplicaSet will create a replacement, and the scheduler may place it on one of the new minions.
The documentation also recommends creating a Service before creating a Deployment, so the scheduler will spread the pods better among the minions.
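For example, with pods managed by a Deployment you can nudge the scheduler along these lines (the deployment and pod names are hypothetical):
# see which node each pod landed on
kubectl get pods -o wide
# delete one pod; its ReplicaSet creates a replacement, which may land on a new minion
kubectl delete pod my-app-7d9f8c6b4-abcde
# or scale up so the extra replicas can be placed on the new nodes
kubectl scale deployment my-app --replicas=8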

Unable to start DSE using SPARK_ENABLED=1

We are running a 6-node cluster with:
HADOOP_ENABLED=0
SOLR_ENABLED=0
SPARK_ENABLED=0
CFS_ENABLED=0
Now, we would like to add Spark to all of them. It seems like "adding" is not the right term, because then this would not fail. Anyway, these are the steps we've done:
1. drained one of the nodes
2. changed /etc/default/dse to SPARK_ENABLED=1 and HADOOP_ENABLED=0
3. sudo service dse restart
And got the following in the log:
ERROR [main] 2016-05-17 11:51:12,739 CassandraDaemon.java:294 - Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Cannot start node if snitch's data center (Analytics) differs from previous data center (Cassandra). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
There are two related questions that have been already answered:
Unable to start solr aspect of DSE search
Two node DSE spark cluster error setting up second node. Why?
Unfortunately, clearing the data on the node is not an option - why would I do that? I need the data to be intact.
Using "-Dcassandra.ignore_rack=true -Dcassandra.ignore_dc=true" is a bit scary in production. I don't understand why DSE wants to create another DC and why it can't just use the existing one.
I know that according to DataStax's docs one should partition the load using different DCs for different workloads. In our case we just want to run Spark jobs on the same nodes that Cassandra is running on, using the same DC.
Is that possible?
Thanks!
The other answers are correct. The error here is trying to warn you that you have previously identified this node as being in another DC. This means that it probably doesn't have the right data for any keyspaces using NetworkTopologyStrategy. For example, if you had an NTS keyspace with replicas only in "Cassandra" and changed the DC to "Analytics", you could inadvertently lose all of the data.
This warning and the accompanying flag are telling you that you are doing something that you should not be doing in a production cluster.
The real solution to this is to explicitly name your DCs using GossipingPropertyFileSnitch and not rely on DseSimpleSnitch, which names the DC based on the DSE workload.
In this case, switch to GPFS and set the DC name to Cassandra.
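A sketch of the relevant settings under that approach (file locations depend on your DSE install, and the DC name must match the one your existing data was written under):
In cassandra.yaml on each node:
endpoint_snitch: GossipingPropertyFileSnitch
In cassandra-rackdc.properties on each node:
dc=Cassandra
rack=rack1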

Changing Snitch on live cluster in datastax 4.5

I have 8 nodes in one region and now I want to add a new node in another region. Presently I am using Ec2Snitch; after adding the node to the new region I need to change the snitch on all nodes to Ec2MultiRegionSnitch.
Now my question is: will this change impact my currently running cluster? And what would be the best practice for doing this?
Thanks
You should do a rolling restart, changing to Ec2MultiRegionSnitch, before adding the new node. It should not impact your running cluster, though I would suggest you briefly bring up a test cluster to test making the change.
To perform a rolling restart from OpsCenter:
Click Nodes in the left pane.
In the contextual menu, select Restart from the Cluster Actions dropdown.
Set the amount of time to wait after restarting each node, select whether each node should be drained before stopping, and then click Restart Cluster.
See more details here:
http://www.datastax.com/documentation/opscenter/5.0/opsc/online_help/opscRestartingCluster_t.html
The DataStax documentation for switching snitches is also worth following; I found it useful when I switched to GossipingPropertyFileSnitch. I also had to edit cassandra-rackdc.properties on all nodes before doing the rolling restart.
Even though my topology didn't change, I followed the instructions in the reference: stopped all the nodes, restarted them (starting with the seeds), then ran 'nodetool repair' and 'nodetool cleanup' on all nodes.
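Roughly, the per-node sequence described above looks like this (the dse service name is used since this is a DSE cluster; file locations vary by install):
# on each node, before restarting:
# 1. update endpoint_snitch in cassandra.yaml and, if using GossipingPropertyFileSnitch, cassandra-rackdc.properties
# 2. restart the node (seed nodes first)
sudo service dse restart
# 3. once the whole cluster is back up, on every node:
nodetool repair
nodetool cleanup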
