memsql-deploy leaf node consistently failed - singlestore

Running memsql-deploy for a leaf node on the same host as the master always failed with the same error. Repeating the operation on new machines produced the same failure.
Here are the steps used to deploy the master role:
# memsql-ops memsql-deploy -a Af53bfb -r master -P 3306 --community-edition
2017-03-24 16:15:54: Je5725b [INFO] Deploying MemSQL to 172.17.0.3:3306
2017-03-24 16:15:59: Je5725b [INFO] Installing MemSQL
2017-03-24 16:16:02: Je5725b [INFO] Finishing MemSQL Install
Waiting for MemSQL to start...
MemSQL successfully started
Here are the steps taken immediately afterwards to add a leaf node:
# memsql-ops memsql-deploy -a Af53bfb -r leaf -P 3308
2017-03-24 16:16:43: J32c71f [INFO] Deploying MemSQL to 172.17.0.3:3308
2017-03-24 16:16:43: J32c71f [INFO] Installing MemSQL
2017-03-24 16:16:46: J32c71f [INFO] Finishing MemSQL Install
Waiting for MemSQL to start...
MemSQL failed to start: Failed to start MemSQL:
set_mempolicy: Operation not permitted
setting membind: Operation not permitted
What are the possible reasons behind these error messages, and how can I find the root cause or a fix?

After a day of searching on Google, I believe I have finally located the root cause of this error. I am surprised no one has asked about this before, because it should happen to more people than just me.
The real cause is that I installed the numactl package, per MemSQL's best-practice suggestion, on a non-NUMA machine. This makes every MemSQL node other than the first one run the numactl sub-command set_mempolicy to bind individual MemSQL nodes to CPUs, and on a non-NUMA machine that call fails. As a result, starting the node via the memsql-ops sub-commands memsql-start or memsql-deploy fails as well.
The workaround is very simple: just remove the numactl package, and everything works. This applies in particular to some virtualization-based MemSQL deployments, such as Docker.
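Whether a host is actually NUMA-capable can be checked from sysfs before deciding to remove the package. A minimal sketch (the package-removal lines are commented out and assume a RHEL- or Debian-family host):

```shell
# Count the NUMA nodes the kernel exposes; a non-NUMA host has only node0.
node_count=$(ls -d /sys/devices/system/node/node* 2>/dev/null | wc -l)
if [ "$node_count" -le 1 ]; then
    echo "non-NUMA host: removing numactl is safe"
    # yum remove -y numactl      # RHEL/CentOS
    # apt-get remove -y numactl  # Debian/Ubuntu
else
    echo "NUMA host with $node_count nodes: keep numactl"
fi
```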

Can you try on the master:
memsql-ops start
memsql-ops memsql-deploy --role master -P 3306 --community-edition
On the agent:
memsql-ops start
memsql-ops follow -h <host of primary agent> -P <port of primary agent if configured to use one>
memsql-ops memsql-deploy --role leaf -P 3308 --community-edition

Related

Multi-node multi-datacenter CASSANDRA

I am trying to set up a multi-node, multi-datacenter cluster in Cassandra 3.11.
For data center 1 I have Cassandra running on 3 nodes (e.g. 10.90.22.11, 10.90.22.12 and 10.90.22.13), and for data center 2 I have Cassandra running on 2 nodes (e.g. 10.90.22.21 and 10.90.22.22).
The ring is up, but they are working separately. To make them work together I updated endpoint_snitch to GossipingPropertyFileSnitch and set dc and rack in cassandra-rackdc.properties to DC1 and DC2 on the respective nodes, following the steps mentioned in this link.
After these changes, when I restart Cassandra the service shows as running, but when I check the ring with nodetool status I receive an error:
nodetool: Failed to connect to '127.0.0.1:7199'
ConnectException: 'Connection refused (Connection refused)'
What am I missing?
The error you posted indicates that nodetool couldn't connect to JMX, which is supposed to be listening on port 7199:
Failed to connect to '127.0.0.1:7199'
Verify that Cassandra is running and check that the process is bound to the expected ports, including 7199, 9042 and 7000. You can try running one of these commands:
$ netstat -tnlp
$ sudo lsof -nPi | grep LISTEN | grep java
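To check all three ports in one go, here is a small sketch using ss (the modern replacement for netstat on most distributions):

```shell
# Report whether anything is listening on the ports Cassandra uses:
# 7199 (JMX), 9042 (CQL clients), 7000 (internode).
for port in 7199 9042 7000; do
    if ss -tln 2>/dev/null | grep -q ":$port "; then
        echo "port $port: listening"
    else
        echo "port $port: NOT listening"
    fi
done
```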
Cheers!
You should run nodetool against the host/IP you configured in cassandra.yaml. Also check that port 7199 (or your custom JMX port, if you set one) is open/allowed through the firewall:
nodetool -h hostname/ip status
You can pass a username and password if you have authentication enabled. Please refer to the link below for more details:
http://cassandra.apache.org/doc/latest/tools/nodetool/status.html

run hazelcast based on docker locally

I want to run Hazelcast in Docker on AWS instances, as a POC for future use.
I use the following configuration to run it on my laptop for some investigation:
docker run -e JAVA_OPTS="-Dhazelcast.local.publicAddress=192.168.1.227:5701" -itd -p 5701:5701 hazelcast/hazelcast
docker run -e JAVA_OPTS="-Dhazelcast.local.publicAddress=192.168.1.227:5702" -itd -p 5702:5701 hazelcast/hazelcast
It starts fine, but once I try to open it in the browser I get the following warnings:
docker logs -ft a91ed298117a
2020-02-02T16:30:41.846203500Z ########################################
2020-02-02T16:30:41.846284000Z # JAVA_OPTS=-Dhazelcast.mancenter.enabled=false -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -Dhazelcast.local.publicAddress=192.168.1.227:5702
2020-02-02T16:30:41.846346700Z # CLASSPATH=/opt/hazelcast/*:/opt/hazelcast/lib/*
2020-02-02T16:30:41.846374200Z # starting now....
2020-02-02T16:30:41.846424700Z ########################################
2020-02-02T16:30:41.846467100Z + exec java -server -Dhazelcast.mancenter.enabled=false -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -Dhazelcast.local.publicAddress=192.168.1.227:5702 com.hazelcast.core.server.StartServer
Members {size:2, ver:2} [
2020-02-02T16:30:52.360102700Z Member [192.168.1.227]:5701 - e152d11b-df3e-4c29-a363-188842fc624c
2020-02-02T16:30:52.360128200Z Member [192.168.1.227]:5702 - e7811c67-34ef-4ec5-9687-1945d7c36b69 this
2020-02-02T16:30:52.360159400Z ]
2020-02-02T16:30:52.360183200Z
2020-02-02T16:30:53.384531200Z Feb 02, 2020 4:30:53 PM com.hazelcast.core.LifecycleService
2020-02-02T16:30:53.384586000Z INFO: [192.168.1.227]:5702 [dev] [3.12.6] [192.168.1.227]:5702 is STARTED
2020-02-02T16:31:00.582731400Z Feb 02, 2020 4:31:00 PM com.hazelcast.nio.tcp.TcpIpConnection
2020-02-02T16:31:00.582871900Z WARNING: [192.168.1.227]:5702 [dev] [3.12.6] Connection[id=2, /172.17.0.3:5701->/172.17.0.1:60574, qualifier=null, endpoint=null, alive=false, type=NONE] closed. Reason: Exception in Connection[id=2, /172.17.0.3:5701->/172.17.0.1:60574, qualifier=null, endpoint=null, alive=true, type=NONE], thread=hz._hzInstance_1_dev.IO.thread-in-1
2020-02-02T16:31:00.582909200Z java.lang.IllegalStateException: REST API is not enabled.
2020-02-02T16:31:00.583013000Z at com.hazelcast.nio.tcp.UnifiedProtocolDecoder.onRead(UnifiedProtocolDecoder.java:96)
2020-02-02T16:31:00.583049600Z at com.hazelcast.internal.networking.nio.NioInboundPipeline.process(NioInboundPipeline.java:135)
2020-02-02T16:31:00.583077900Z at com.hazelcast.internal.networking.nio.NioThread.processSelectionKey(NioThread.java:369)
2020-02-02T16:31:00.583122400Z at com.hazelcast.internal.networking.nio.NioThread.processSelectionKeys(NioThread.java:354)
2020-02-02T16:31:00.583189100Z at com.hazelcast.internal.networking.nio.NioThread.selectLoop(NioThread.java:280)
2020-02-02T16:31:00.583220000Z at com.hazelcast.internal.networking.nio.NioThread.run(NioThread.java:235)
2020-02-02T16:31:00.583249400Z
2020-02-02T16:31:00.604505300Z Feb 02, 2020 4:31:00 PM com.hazelcast.nio.tcp.TcpIpConnection
Could you please help me understand where I went wrong?
The Hazelcast REST API is not enabled by default, which is why you see the exception in the logs. Also, keep in mind that it does not make much sense to open Hazelcast in the browser, since it does not serve any HTML page.
That said, you have successfully run a Hazelcast cluster in Docker. If you now want to play with it, the simplest way is either to enable the REST API or to use your language of choice and connect with a Hazelcast client.
1. REST API
To start Hazelcast with REST API enabled, you need to add -Dhazelcast.rest.enabled=true to your JAVA_OPTS. So in your case, you can run the following commands:
docker run -e JAVA_OPTS="-Dhazelcast.local.publicAddress=192.168.1.227:5701 -Dhazelcast.rest.enabled=true" -itd -p 5701:5701 hazelcast/hazelcast:3.12.6
docker run -e JAVA_OPTS="-Dhazelcast.local.publicAddress=192.168.1.227:5702 -Dhazelcast.rest.enabled=true" -itd -p 5702:5701 hazelcast/hazelcast:3.12.6
Then you can use the Hazelcast REST API, for example to add a value to a map and read it back:
$ curl -X POST 192.168.1.227:5701/hazelcast/rest/maps/mapName/foo -d "bar"
$ curl 192.168.1.227:5701/hazelcast/rest/maps/mapName/foo
bar
2. Hazelcast Client
There are Hazelcast clients for most programming languages. You only need to specify 192.168.1.227:5701 and 192.168.1.227:5702 as the addresses of your Hazelcast cluster. For example, in Python it would look like this:
import hazelcast

# hazelcast-python-client 3.x API, matching the 3.12.6 cluster above
config = hazelcast.ClientConfig()
config.network_config.addresses.append("192.168.1.227:5701")
config.network_config.addresses.append("192.168.1.227:5702")
client = hazelcast.HazelcastClient(config)

my_map = client.get_map("map")  # distributed map proxy
my_map.put("key", "value")
client.shutdown()
Then, you can run it with:
pip install hazelcast-python-client && python client.py

Memsql Master Node is not running

I have a MemSQL cluster with 1 master and 4 leaf nodes.
The problem is that my master node is not running, yet it shows as connected in the cluster, and I can still read and write data.
While trying to restart the master node, it shows this error:
2018-03-31 20:54:22: Jb2ae955f6 [ERROR] Failed to connect to MemSQL node BD60BED7C8082966F375CBF983A46A9E39FAA791: ProcessHandshakeResponsePacket() failed. Sending back 1045: Access denied for user 'root'@'xx.xx.xx.xx' (using password: NO)
ProcessHandshakeResponsePacket() failed. Sending back 1045: Access denied for user 'root'@'10.254.34.135' (using password: NO)
Cluster status
Index  ID       Agent Id  Process State  Cluster State  Role    Host           Port  Version
1      BD60BED  Afb08cd   NOT RUNNING    CONNECTED      MASTER  10.254.34.135  3306  5.8.10
2      D84101F  A10aad5   RUNNING        CONNECTED      LEAF    10.254.42.244  3306  5.8.10
3      3D2A2AF  Aa2ac03   RUNNING        CONNECTED      LEAF    10.254.38.76   3306  5.8.10
4      D054B1C  Ab6c885   RUNNING        CONNECTED      LEAF    10.254.46.99   3306  5.8.10
5      F8008F7  Afb08cd   RUNNING        CONNECTED      LEAF    10.254.34.135  3307  5.8.10
That error means that while the node is online, memsql-ops is unable to log in to it, most likely because the root user's password is misconfigured somewhere in the system: memsql-ops is configured with no password for that node, but the memsql node likely does have a root password set.
Did you set a root password in memsql? Are you able to connect to the master node directly via mysql client?
If yes, you can fix this by logging in to the memsql master node directly and changing the root password to blank:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '' WITH GRANT OPTION
Then, after ensuring that connectivity is restored, you can manage the root password going forward with memsql-ops: https://docs.memsql.com/memsql-ops-cli-reference/v6.0/memsql-update-root-password/
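If you can connect with the mysql client, the fix can be scripted in one line. A sketch assuming the master from the status output above, with 'oldpass' as a placeholder for whatever password is currently set on the node:

```shell
# Reset root's password to blank so memsql-ops (configured with no
# password for this node) can log in again. 'oldpass' is a placeholder.
mysql -h 10.254.34.135 -P 3306 -u root -p'oldpass' -e \
  "GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '' WITH GRANT OPTION; FLUSH PRIVILEGES;"
```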

dockerd: Error running deviceCreate (CreatePool) dm_task_run failed

I'm building some CentOS VMs with VMware, with no access to the internet, so I've downloaded and made local repositories, including this one.
I then installed docker-engine.x86_64, and when starting the Docker daemon I get the following errors:
[root]# dockerd
DEBU[0000] docker group found. gid: 993
...
...
DEBU[0001] Error retrieving the next available loopback: open /dev/loop-control: no such device
ERRO[0001] **There are no more loopback devices available.**
ERRO[0001] [graphdriver] prior storage driver "devicemapper" failed: loopback attach failed
DEBU[0001] Cleaning up old mountid : start.
FATA[0001] Error starting daemon: error initializing graphdriver: loopback attach failed
After manually loading the loop module, which controls loop devices, with this command:
insmod /lib/modules/3.10.0-327.36.2.el7.x86_64/kernel/drivers/block/loop.ko
The error changes to :
[graphdriver] prior storage driver "devicemapper" failed: devicemapper: Error running deviceCreate (CreatePool) dm_task_run failed
I've read that it could be caused by a lack of disk space, but I don't think that's the case here. Any idea?
[root]# df -k .
Filesystem               1K-blocks    Used Available Use% Mounted on
/dev/mapper/centos-root   51887356 2436256  49451100   5% /
I got the "There are no more loopback devices available" error, which stopped dockerd from running.
I fixed it by setting the storage driver to 'overlay':
# /usr/bin/dockerd -D --storage-driver=overlay
This was on Debian Jessie and docker running as a systemd service/unit.
To make it permanent, I created a systemd drop-in:
$ cat /etc/systemd/system/docker.service.d/docker.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --storage-driver=overlay
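After adding the drop-in, systemd needs to re-read it and Docker needs a restart; the active driver can then be confirmed. A sketch, assuming a systemd host with the docker CLI installed:

```shell
sudo systemctl daemon-reload        # pick up the new drop-in
sudo systemctl restart docker       # restart dockerd with the new ExecStart
docker info --format '{{.Driver}}'  # prints the storage driver in use, e.g. overlay
```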

Cassandra won't start in linux as a service

I have a Debian Linux image running on Google Compute. I can successfully get Cassandra working with "sudo cassandra" or "sudo cassandra -f", but as soon as I log off it stops working. When I try to run it as a service, it simply doesn't say anything and doesn't start either! I installed it using the apt-get package, v2.1.
I've tried sudo service cassandra start. It looks like it's doing something and then quits without any logs.
Please help me run this as a service. I can't even locate where the logs are stored when I run it as a service.
I ran into this issue recently, and as BrianC indicated, it can be an out-of-memory condition. In my case I could successfully start Cassandra with sudo cassandra -f but not with /etc/init.d/cassandra start.
For me, the last log entry in /var/log/cassandra/system.log when starting as a service was:
INFO [main] 2015-04-30 10:58:16,234 CassandraDaemon.java (line 248) Classpath: /etc/cassandra:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.0.14.jar:/usr/share/cassandra/apache-cassandra-thrift-2.0.14.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar::/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar
And nothing afterwards. If it is a memory problem, you should be able to verify this in your syslog. If it contains something like:
Apr 30 10:53:39 dev kernel: [1173246.957818] Out of memory: Kill process 8229 (java) score 132 or sacrifice child
Apr 30 10:53:39 dev kernel: [1173246.957831] Killed process 8229 (java) total-vm:634084kB, anon-rss:286772kB, file-rss:12676kB
Increase your RAM. In my case I increased it to 2 GB and it started fine.
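Before (or instead of) resizing the VM, it is worth confirming the OOM hypothesis and checking how much memory the host actually has. A sketch; the heap variables in the comment are the standard knobs in cassandra-env.sh:

```shell
free -m                                              # RAM totals in MB
dmesg 2>/dev/null | grep -i 'out of memory' || true  # kernel OOM-killer hits, if any
# Alternative to adding RAM: cap Cassandra's heap in cassandra-env.sh, e.g.
#   MAX_HEAP_SIZE="1G"
#   HEAP_NEWSIZE="256M"
```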
