Spring Data GemFire Server java.net.BindException in Linux

I have a Spring Boot app that I am using to start a Pivotal GemFire CacheServer.
When I jar up the file and run it locally:
java -jar gemfire-server-0.0.1-SNAPSHOT.jar
It runs fine without issue. The server is using the default properties
spring.data.gemfire.cache.log-level=info
spring.data.gemfire.locators=localhost[10334]
spring.data.gemfire.cache.server.port=40404
spring.data.gemfire.name=CacheServer
spring.data.gemfire.cache.server.bind-address=localhost
spring.data.gemfire.cache.server.host-name-for-clients=localhost
If I deploy this to a CentOS distribution and run it with the same script but passing the "test" profile:
java -jar gemfire-server-0.0.1-SNAPSHOT.jar -Dspring.profiles.active=test
with my test profile application-test.properties looking like this:
spring.data.gemfire.cache.server.host-name-for-clients=server.centralus.cloudapp.azure.com
I can see during startup that the server finds the Locator already running on the host (I start it through a separate process with Gfsh).
The server even joins the cluster for about a minute. But then it shuts down because of a bind exception.
I have checked to see if there is anything running on that port (40404) - and nothing shows up
EDIT
Apparently I DO get this exception locally - it just takes a lot longer.
It is almost instant when I start it up on the CentOS distribution. On my Mac it takes around 2 minutes before the process throws the exception:
Adding a few more images of this:
Two bash windows - left is monitoring GF locally and right I use to check the port and start the Java process:
The server is added to the cluster. Note the timestamp of 16:45:05.
Here is the server added and it appears to be running:
Finally, the exception after about two minutes - again look at the timestamp on the exception - 16:47:09. The server is stopped and dropped from the cluster.

Did you start other servers using Gfsh? That is, with a Gfsh command similar to...
gfsh>start server --name=ExampleGfshServer --log-level=config
Gfsh will start CacheServers listening on the default CacheServer port of 40404.
You have a few options.
1) First, you can disable the default CacheServer when starting a server with Gfsh like so...
gfsh>start server --name=ExampleGfshServer --log-level=config --disable-default-server
2) Alternatively, you can change the CacheServer port when starting other servers using Gfsh...
gfsh>start server --name=ExampleGfshServer --log-level=config --server-port=50505
3) If you are starting multiple instances of your Spring Boot, Pivotal GemFire CacheServer class, then you can vary the spring.data.gemfire.cache.server.port property by declaring the property as a System property at startup.
For instance, you can, in the Spring Boot application.properties, do...
#application.properties
...
spring.data.gemfire.cache.server.port=${gemfire.cache.server.port:40404}
And then when starting the application from the command-line...
java -Dgemfire.cache.server.port=48484 -jar ...
Of course, you could just set the SDG property from the command line too...
java -Dspring.data.gemfire.cache.server.port=48484 -jar ...
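(For reference, a minimal Spring Boot, Pivotal GemFire CacheServer class that these spring.data.gemfire.* properties would configure might look like the sketch below; the class name is hypothetical and it assumes SDG's @CacheServerApplication annotation-based configuration is on the classpath.)
// Hypothetical minimal Spring Boot GemFire CacheServer class (a sketch only;
// assumes Spring Boot plus SDG's annotation config model, so the
// spring.data.gemfire.* properties shown above are picked up at startup).
package example;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.gemfire.config.annotation.CacheServerApplication;

@SpringBootApplication
@CacheServerApplication(name = "CacheServer")
public class GemFireServerApplication {

    public static void main(String[] args) {
        SpringApplication.run(GemFireServerApplication.class, args);
    }
}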
Anyway, I guarantee you that you have another process (e.g. a Pivotal GemFire CacheServer) running with a ServerSocket listening on port 40404. netstat -a | grep 40404 should give you better results.
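If you want to double-check from code rather than netstat, here is a quick throwaway sketch (not part of your project) that simply tries to bind the port itself; a java.net.BindException here means some other process already holds it:
// Throwaway port check: attempt to bind 40404 on localhost and report the result.
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class PortCheck {

    public static void main(String[] args) {
        int port = 40404; // the default CacheServer port in question
        try (ServerSocket socket = new ServerSocket(port, 50, InetAddress.getByName("localhost"))) {
            System.out.println("Port " + port + " is currently free on localhost");
        } catch (IOException e) {
            System.out.println("Port " + port + " is already in use: " + e.getMessage());
        }
    }
}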
Hope this helps.
Regards,
John

Related

How to connect to Cassandra Database using Python code

I followed the steps given in https://docs.datastax.com/en/developer/python-driver/3.25/getting_started/ to connect to the Cassandra database using Python code, but after running the code snippet I am still getting
NoHostAvailable: ('Unable to connect to any servers', {'hosts"port': OperationTimedOut('errors=None, last_host=None'),
Python versions 2.7 and 3 (classpath is set for both Python versions)
Java 1.8 (classpath has been set)
Apache Cassandra 3.11.6 (Apache home classpath has been set)
I tend to use a very simple app to test connectivity to a Cassandra cluster:
from cassandra.cluster import Cluster
cluster = Cluster(['10.1.2.3'], port=45678)
session = cluster.connect()
row = session.execute("SELECT release_version FROM system.local").one()
if row:
    print(row[0])
Then run it:
$ python HelloCassandra.py
4.0.6
In your comment you mentioned that you're getting OperationTimedOut which indicates that the driver never got a response back from the node within the client timeout period. This usually means (a) you're connecting to the wrong IP, (b) you're connecting to the wrong CQL port, or (c) there's a network connectivity issue between your app and the cluster.
Make sure that you're using the IP address that you've set in rpc_address of cassandra.yaml. Also make sure that the node is listening for CQL clients on the right port. You can easily verify this with a Linux utility like netstat or lsof, for example:
$ sudo lsof -nPi -sTCP:LISTEN
Cheers!
So that error message suggests that the host/port combination either does not have Cassandra running on it or is under heavy load and unable to respond.
Can you edit your question to include the Cassandra connection portion of your code, as well as maybe how you're calling it? I have a test script which I use (and you're welcome to check it out), and here is the connection portion:
import sys

from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

protocol = 4
hostname = sys.argv[1]
username = sys.argv[2]
password = sys.argv[3]

nodes = []
nodes.append(hostname)

auth_provider = PlainTextAuthProvider(username=username, password=password)
cluster = Cluster(nodes, auth_provider=auth_provider, protocol_version=protocol)
session = cluster.connect()
I call it like this:
$ python3 testCassandra.py 127.0.0.1 aaron notReallyMyPassword
local
One thing you might also try would be to run nodetool status on the cluster, just to make sure it's running OK.
Edit
local variable 'session' referenced before assignment
So this sounds to me like you're attempting a session.execute before session = cluster.connect(). Have a look at my Git repo (linked above) to see the correct order for instantiating session.
I am not using default port
In that case, make sure the port is being set in the cluster definition. Ex:
port = 19099
cluster = Cluster(nodes,auth_provider=auth_provider, port=port)

How to debug presto server project?

I am trying to debug the Presto server in Eclipse with the following steps:
open the com.facebook.presto.PrestoServer class and set a debug breakpoint inside the main method.
But the following errors come up:
1) Explicit bindings are required and com.facebook.presto.execution.TaskManager is not explicitly bound.
while locating com.facebook.presto.execution.TaskManager
for the 1st parameter of com.facebook.presto.server.GracefulShutdownHandler.<init>(GracefulShutdownHandler.java:66)
at com.facebook.presto.server.GracefulShutdownModule.setup(GracefulShutdownModule.java:27)
The simplest approach is to follow https://github.com/prestosql/presto/blob/master/README.md#running-presto-in-your-ide
If you would like to attach a debugger to a server from an existing Presto installation, add the line below to the jvm.config file (see https://prestosql.io/docs/current/installation/deployment.html#jvm-config), restart the Presto server, and then attach the debugger:
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005

Hazelcast Eureka Cloud Discovery Plugin not working

We have implemented Hazelcast as an embedded cache in our Spring Boot app, and need a way for Hazelcast members within a "cluster group" to discover each other dynamically, so that we don't have to provide every possible IP address/port where Hazelcast might be running.
We came across this hazelcast plugin on github:
https://github.com/hazelcast/hazelcast-eureka which seems to provide the same feature using Eureka as discovery/registration tool.
As mentioned in the GitHub documentation, the hazelcast-eureka-one library is included in our Boot app's classpath. We also disabled TCP-IP and multicast discovery and added the discovery strategy below in hazelcast.xml:
<discovery-strategies>
    <discovery-strategy class="com.hazelcast.eureka.one.EurekaOneDiscoveryStrategy" enabled="true">
        <properties>
            <property name="self-registration">true</property>
            <property name="namespace">hazelcast</property>
        </properties>
    </discovery-strategy>
</discovery-strategies>
Our application also provides a configured EurekaClient, which is what we autowire and inject into this plugin implementation:
Config hazelcastConfig = new FileSystemXmlConfig(hazelcastConfigFilePath);
EurekaOneDiscoveryStrategyFactory.setEurekaClient(eurekaClient);
hazelcastInstance = Hazelcast.newHazelcastInstance(hazelcastConfig);
Problem:
We are able to start 2 instances of our Spring Boot app on the same machine, and we notice that each app starts an embedded Hazelcast instance on a separate port (5701, 5702). But they don't seem to recognize each other as running within a cluster; this is what we see in the app logs when the 2nd instance is starting:
Members [1] {
    Member [10.41.70.143]:5702 - 7c42eb24-3fa0-45cb-9394-17175cc92b9c this
}
17-12-13 12:22:44.480 WARN [main] c.h.i.Node.log(LoggingServiceImpl.java:168) - [10.41.70.143]:5702 [domain-services] [3.8.2] Config seed port is 5701 and cluster size is 1. Some of the ports seem occupied!
which seems to indicate that both Hazelcast instances are running independently and don't recognize the other running instance in the cluster/group.
Also, immediately after restart we see this exception thrown frequently on both nodes:
java.lang.ClassCastException: com.hazelcast.nio.tcp.MemberWriteHandler cannot be cast to com.hazelcast.nio.ascii.TextWriteHandler
    at com.hazelcast.nio.ascii.TextReadHandler.<init>(TextReadHandler.java:109) ~[hazelcast-3.8.2.jar:3.8.2]
    at com.hazelcast.nio.tcp.SocketReaderInitializerImpl.init(SocketReaderInitializerImpl.java:89) ~[hazelcast-3.8.2.jar:3.8.2]
which seems to indicate that there is an incompatibility between Hazelcast libraries on the classpath?
It seems like your Eureka service returns the wrong ports. Hazelcast tries to connect to 8080 and other ports in the same range, whereas Hazelcast itself listens on 5701. I'm not exactly sure why this happens, but it feels like you're requesting the wrong service name from Eureka, which ends up returning the HTTP (Tomcat?!) ports instead of the separate Hazelcast service that should be registered.
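One way to confirm what Eureka is actually handing back is to dump the registered instances and their ports. Below is a small diagnostic sketch; it assumes you can pass in the same EurekaClient you give the plugin, and "hazelcast" is only a guess at the registration name (substitute whatever name your instances register under):
// Diagnostic sketch: print every instance Eureka knows about for a given registration name.
// If the ports printed here are HTTP ports (8080/8081) rather than 5701/5702,
// the discovery plugin is being fed the wrong service.
import com.netflix.appinfo.InstanceInfo;
import com.netflix.discovery.EurekaClient;
import com.netflix.discovery.shared.Application;

public class EurekaPortCheck {

    public static void dumpInstances(EurekaClient eurekaClient, String appName) {
        Application app = eurekaClient.getApplication(appName);
        if (app == null) {
            System.out.println("No application named '" + appName + "' is registered in Eureka");
            return;
        }
        for (InstanceInfo info : app.getInstances()) {
            System.out.println(info.getInstanceId() + " -> " + info.getIPAddr()
                + ":" + info.getPort() + " metadata=" + info.getMetadata());
        }
    }
}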

Stopping a Running Spark Application

I'm running a Spark cluster in standalone mode.
I've submitted a Spark application in cluster mode using options:
--deploy-mode cluster --supervise
So that the job is fault tolerant.
Now I need to keep the cluster running but stop the application from running.
Things I have tried:
Stopping the cluster and restarting it. But the application resumes
execution when I do that.
Used kill -9 on a daemon named DriverWrapper, but the job resumes again after that.
I've also removed temporary files and directories and restarted the cluster but the job resumes again.
So the running application is really fault tolerant.
Question:
Based on the above scenario can someone suggest how I can stop the job from running or what else I can try to stop the application from running but keep the cluster running.
Something just occurred to me: if I call sparkContext.stop(), that should do it, but that requires a bit of work in the code, which is OK. Can you suggest any other way without a code change?
If you wish to kill an application that is failing repeatedly, you may do so through:
./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>
You can find the driver ID through the standalone Master web UI at http://<master url>:8080.
From Spark Doc
Revisiting this because I wasn't able to use the existing answer without debugging a few things.
My goal was to programmatically kill a driver that runs persistently once a day, deploy any updates to the code, then restart it. So I won't know ahead of time what my driver ID is. It took me some time to figure out that you can only kill the drivers if you submitted your driver with the --deploy-mode cluster option. It also took me some time to realize that there was a difference between application ID and driver ID, and while you can easily correlate an application name with an application ID, I have yet to find a way to divine the driver ID through their api endpoints and correlate that to either an application name or the class you are running. So while spark-class org.apache.spark.deploy.Client kill <master url> <driver ID> works, you need to make sure you are deploying your driver in cluster mode and are using the driver ID and not the application ID.
Additionally, there is a submission endpoint that spark provides by default at http://<spark master>:6066/v1/submissions and you can use http://<spark master>:6066/v1/submissions/kill/<driver ID> to kill your driver.
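If you already know the driver ID, that kill call is just an HTTP POST with an empty body. Here is a minimal Java sketch (the master host and driver ID below are placeholders, not values from a real cluster):
// Posts a kill request to the Spark standalone submission REST endpoint.
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class KillSparkDriver {

    public static void main(String[] args) throws IOException {
        // Placeholders: substitute your own Spark master host and driver ID.
        String sparkMaster = "spark-master.example.com";
        String driverId = "driver-20180101000000-0000";

        URL url = new URL("http://" + sparkMaster + ":6066/v1/submissions/kill/" + driverId);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.getOutputStream().close(); // empty request body

        // The master answers with a small JSON document describing the result.
        System.out.println("HTTP " + conn.getResponseCode());
    }
}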
Since I wasn't able to find the driver ID that correlated to a specific job from any api endpoint, I wrote a python web scraper to get the info from the basic spark master web page at port 8080 then kill it using the endpoint at port 6066. I'd prefer to get this data in a supported way, but this is the best solution I could find.
#!/usr/bin/python
import sys, re, requests, json
from selenium import webdriver

classes_to_kill = sys.argv
spark_master = 'masterurl'

driver = webdriver.PhantomJS()
driver.get("http://" + spark_master + ":8080/")

for running_driver in driver.find_elements_by_xpath("//*/div/h4[contains(text(), 'Running Drivers')]"):
    for driver_id in running_driver.find_elements_by_xpath("..//table/tbody/tr/td[contains(text(), 'driver-')]"):
        for class_to_kill in classes_to_kill:
            right_class = driver_id.find_elements_by_xpath("../td[text()='" + class_to_kill + "']")
            if len(right_class) > 0:
                driver_to_kill = re.search('^driver-\S+', driver_id.text).group(0)
                print "Killing " + driver_to_kill
                result = requests.post("http://" + spark_master + ":6066/v1/submissions/kill/" + driver_to_kill)
                print json.dumps(json.loads(result.text), indent=4)

driver.quit()
https://community.cloudera.com/t5/Support-Questions/What-is-the-correct-way-to-start-stop-spark-streaming-jobs/td-p/30183
According to this link, if your master uses YARN, you can stop the application with:
yarn application -list
yarn application -kill application_id

Using memcached failover servers in nodejs app

I'm trying to set up a robust memcached configuration for a nodejs app with the node-memcached driver, but it does not seem to use the specified failover servers when one server dies.
My local experiment goes as follows:
shell
memcached -p 11212
node
MC = require('memcached')
c = new MC('localhost:11211', //this process does not exist
{failOverServers: ['localhost:11212']})
c.get('foo', console.log) //this will eventually time out
c.get('foo', console.log) //repeat 5 or 6 times to exceed the retries number
//wait until all the connection errors appear in the console
//at this point, the failover server should be in use
c.get('foo', console.log) //this still times out :(
Any ideas of what we might be doing wrong?
It seems that the failover feature is somewhat buggy in node-memcached.
To enable failover, you must set the remove option:
c = new MC('localhost:11211', //this process does not exist
{failOverServers: ['localhost:11212'],
remove : true})
Unfortunately, this is not going to work because of the following error:
[depricated] HashRing#replaceServer is removed.
[depricated] the API has no replacement
That is, when trying to replace a dead server with a replacement from the failover list, node-memcached outputs a deprecation error from the HashRing library (which, in turn, is maintained by the same author of node-memcached). IMHO, feel free to open a bug :-)
This happens when your Node.js server is not getting any session ID from memcached.
Please check whether memcached is configured properly in your php.ini file:
session.save_handler = memcache
session.save_path = "tcp://localhost:11212"
