I am new to dask, and have what appears to be a bit of an odd use-case, where I would like to set up a dask scheduler on a "bridging" machine with two network interfaces, such that clients can connect to one of the interfaces (the "front"), and workers will live on multiple machines connected to another interface (the "back"). The interfaces have separate IP addresses and hostnames.
Essentially, I want the topology in this picture, where the brown and blue pieces have no route between them except through the machine running the scheduler. (The picture is from some old documentation for dask distributed, version 0.7 I think, when things were apparently less settled than they are now.)
Everything is 64-bit Linux (Debian 8 "jessie"), and I'm working with version 0.14.0 of dask and 1.16.0 of distributed, installed in an anaconda environment.
The dask-scheduler command-line tool does not seem to have a way to listen on more than one hostname, which I think is what I want.
I can get the effect I want by SSH port-forwarding.
For example, suppose the relevant hostnames are worker, scheduler-front, scheduler-back, and client. The two scheduler-* names are different NICs on the same machine; there is a TCP route from client to scheduler-front and one from scheduler-back to worker, but no route from client to worker, from scheduler-front to worker, or from scheduler-back to client.
Then, the following works (the leading bit below is meant to be a command-line prompt indicating which machine the command is run on, with '#' meaning the shell, and '>>>' meaning Python):
First, start a scheduler listening on the "back" of the bridge host:
scheduler# dask-scheduler --host scheduler-back
Second, start a worker and connect it to the scheduler in the ordinary way:
worker# dask-worker scheduler-back:8786
Third, forward localhost:8786 on the client to scheduler-back:8786 on the scheduler machine, ssh-ing in through the scheduler-front interface:
client# ssh -L 8786:scheduler-back:8786 scheduler-front
Finally, start up the client on the client machine, and connect to the near end of the forwarded port whose other end can see the scheduler.
client>>> from distributed import Client
client>>> cl = Client('127.0.0.1:8786')
client>>> ...
As I say, this works: I can do maps and gathers and get results.
But I can't help thinking that I'm overdoing it, and that maybe I missed something simple that allows multi-homed schedulers. Private subnets aren't all that strange; they come up in the context of containers and clusters.
Is there a smarter way to do this?
In case it's of interest, the reason for not using the cluster queuing system is that the target "worker" machine is the one with a GPU, and we are having some difficulty getting the queuing system to allocate it properly, so at the moment, that machine is working outside of the queuing system. We will eventually solve that problem, but for now, we're trying to do this.
Also, for completeness, the reason for not having the client be on the scheduler machine is that, in our scenario, the client needs to do visualizations, and the scheduler is a cluster head-node that's in rack in the machine room and is not physically accessible to users.
If you don't specify any --host to dask-scheduler, it will listen on all interfaces by default. For instance:
$ dask-scheduler
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Scheduler at: tcp://192.168.1.68:8786
distributed.scheduler - INFO - http at: 0.0.0.0:9786
distributed.scheduler - INFO - bokeh at: 0.0.0.0:8788
distributed.bokeh.application - INFO - Web UI: http://127.0.0.1:8787/status/
distributed.scheduler - INFO - -----------------------------------------------
and:
$ netstat -tnlp | \grep 8786
tcp 0 0 0.0.0.0:8786 0.0.0.0:* LISTEN 23969/python
tcp6 0 0 :::8786 :::* LISTEN 23969/python
So you can then connect from the subnetwork you want, using the right IP (v4 or v6) address to contact the scheduler. Your workers might use tcp://192.168.1.68:8786 and your clients tcp://10.1.2.3:8786, for instance.
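For instance, a minimal sketch in the same notation as the question, reusing the example addresses above (assume 192.168.1.68 is on the workers' subnetwork and 10.1.2.3 on the clients'):
worker# dask-worker tcp://192.168.1.68:8786
client>>> from distributed import Client
client>>> cl = Client('tcp://10.1.2.3:8786')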
If you want to listen on more than one interface, but not on all of them, however, that is not currently possible.
Related
Problem Description/Context
I have a Node.js application that uses Axios to make HTTP requests (outbound REST API calls) against a web service (say https://any.example.restapis.com). These HTTP requests occasionally showed more than a minute of latency. After some debugging, we tried the httpsAgent property to keep the HTTP connections alive (persistent), and it did the trick: the APIs now take less than a second and the application works fine. Basically, my understanding is that with this property the TCP connections used by the HTTP calls are persistent: the agent keeps connections alive based on its default configuration and opens additional socket connections against the web service as the load requires, essentially maintaining a pool of connections:
// 'https' here is Node's built-in module: const https = require('https')
httpsAgent: new https.Agent({ keepAlive: true }),
Question
We are not yet sending full traffic to the microservice (just 1%). So I would like to understand in detail what is happening underneath, to make sure the fix is indeed complete and my microservice will scale to full traffic.
So, can anyone please let me know how, after SSHing into the pod's container, I can check whether my Node.js application is indeed making a number of TCP (socket) connections against the web service, rather than just keeping a single TCP connection alive? (I tried the netstat -atp command as below, but I couldn't draw a conclusion from its output.) It would be great if anyone could help me with how to check the number of TCP connections made by my microservice.
# example command; looking at tools like netstat and lsof, as they may (hoping!) give me the details I want:
netstat -atp | grep <my process ID>
In a microservices architecture, the number of server-to-server connections increases dramatically compared to alternative setups. Interactions that would traditionally have been an in-memory process within one application now often rely on remote calls to other REST-based services over HTTP, meaning it is more important than ever to ensure these remote calls are both fast and efficient.
The netstat command is used to show network status.
# netstat -at : list all TCP ports.
# netstat -lt : list only the listening TCP ports.
It is used more for problem determination than for performance measurement. However, the netstat command can be used to determine the amount of traffic on the network to ascertain whether performance problems are due to network congestion.
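For example, to check whether the agent really keeps a pool of live sockets, you can count the established connections owned by the node process (a sketch; port 443 and the process name node are assumptions, adjust them for your service):
# established TCP connections from the node process to the HTTPS service
netstat -antp 2>/dev/null | grep node | grep ':443' | grep -c ESTABLISHED
# the same with the newer ss tool, filtering on the destination port
ss -tnp state established '( dport = :443 )' | grep -c node
If keepAlive is doing its job, this count should stay fairly stable under load instead of churning through new ports on every request.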
Let me be more specific about my question with an example: Let's say that I have a slew of little servers that all start up on different ports using TCPv4. These ports are going to be destination ports, of course. Let's further assume that these little servers don't just start up at boot time like a typical server, but rather they churn dynamically based on demand. They start up when needed, and may shut themselves down for a while, and then later start up again.
Now let's say that on this same computer, we also have lots of client processes making requests to server processes on other computers via TCPv4. When a client makes such a request, it is assigned a source port by the OS.
Let's say for the sake of this example that a client process makes a web request to a RESTful server running on a different computer. Let's also say that the source port assigned by the OS to this request is port 7777.
For this example let's also say that while the above request is still occurring, one of our little servers wants to start up, and it wants to start up on destination port 7777.
My question is will this cause a conflict? I.e., will the server get an error because port 7777 is already in use? Or will everything be fine because these two different kinds of ports live in different address spaces that cannot conflict with each other?
One reason I'm worried about the potential for conflict here is that I've seen web pages that say that "ephemeral source port selection" is typically done in a port number range that begins at a relatively high number. Here is such a web page:
https://www.cymru.com/jtk/misc/ephemeralports.html
A natural assumption is that source ports begin at high numbers, rather than starting at 1, in order to avoid conflicts with the destination ports used by server processes, though I haven't yet seen anything that explicitly comes out and says that this is the case.
P.S. There is, of course, a potential distinction between what the TCPv4 protocol spec has to say on this issue, and what OSes actually do. E.g., perhaps the protocol is agnostic, but OSes tend to only use a single address space? Or perhaps different OSes treat the issue differently?
Personally, I'm most interested at the moment in what Linux would do.
The TCP specification says that connections are identified by the tuple:
{local addr, local port, remote addr, remote port}
Based on this, there theoretically shouldn't be a conflict between a local port used in an existing connection and trying to bind that same port for a server to listen on, since the listening socket doesn't have a remote address/port (these are represented as wildcards in the spec).
However, most TCP implementations, including the Unix sockets API, are more strict than this. If the local port is already in use in any existing socket, you won't be able to bind it, you'll get the error EADDRINUSE. A special exception is made if the existing sockets are all in TIME_WAIT state and the new socket has the SO_REUSEADDR socket option; this is used to allow a server to restart while the sockets left over from a previous process are still waiting to time out.
For this reason, the port range is generally divided into ranges with different uses. When a socket doesn't bind a specific local port (either because it just called connect() without calling bind(), or because it specified port 0 in bind()), the port is chosen from the ephemeral range, which usually consists of very high-numbered ports. Servers, on the other hand, are expected to bind to low-numbered ports. If network applications follow this convention, you should not run into conflicts.
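On Linux you can inspect the ephemeral range the kernel draws client ports from (a sketch; the two numbers shown are the usual modern defaults, not guaranteed on your system):
$ cat /proc/sys/net/ipv4/ip_local_port_range
32768	60999
A server that binds a fixed port inside that range can therefore collide with an ephemeral port that happens to be in use at that moment, which is one practical reason to keep fixed server ports below it.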
arangod runs for some time without any problems, but at some point no more connections can be made.
arangosh then shows the following error message:
Error message 'Could not connect to 'tcp://127.0.0.1:8529' 'connect() failed with #99 - Cannot assign requested address''
In the log file arangod still writes more trace information.
After restarting arangod it runs without problems again, until the problem suddenly reoccurs.
Why is this happening?
Since this question was sort of answered by time, I'll use this answer to elaborate on how to dig into such a situation and which operating system parameters to look at. I'll base this on Linux targets.
First we need to find out what's currently going on, using the netstat tool as root (we care about TCP ports only):
netstat -alnpt
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
...
tcp 0 0 0.0.0.0:8529 0.0.0.0:* LISTEN 3478/arangod
tcp 0 0 127.0.0.1:45218 127.0.0.1:8529 ESTABLISHED 6902/arangosh
tcp 1 0 127.0.0.1:46985 127.0.0.1:8529 CLOSE_WAIT 485/arangosh
We see an overview of the three state groups present in this output:
LISTEN: These are daemon processes offering TCP services to remote ends, in this case the arangod process with its server socket. It binds port 8529 on all available IPv4 addresses of the system (0.0.0.0) and accepts connections from any remote location (0.0.0.0:*).
ESTABLISHED: This is an active TCP connection, in this case between arangosh and arangod; arangosh has its client port (45218) in the higher range and connects to arangod on port 8529.
CLOSE_WAIT: This is a connection in a termination state; it is normal to have some of them. (Strictly, CLOSE_WAIT means the remote end has closed and the local process has not yet closed its side; it is the related TIME_WAIT state that the operating system's TCP stack keeps around for a while so it knows where to sort stray TCP packets that were sent but did not arrive on time.)
As you can see, TCP ports are 16-bit unsigned integers, ranging from 0 to 65535. Server sockets start from the lower end, and most operating systems require processes to run as root in order to bind ports below 1024. Client sockets start from the upper end and range down to a configured limit. Since multiple clients can connect to one server, it is usually the client-side ports that wear out, even though the server port range looks narrower. If a client frequently closes and reopens connections, you may see many sockets pile up in CLOSE_WAIT or TIME_WAIT states; as many discussions across the net hint, these are the symptoms of a system eventually running out of resources. In general the solution to this problem is to re-use existing connections through the keepalive feature.
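To see whether a box is drifting into that situation, it helps to tally sockets per state; a small sketch using the newer ss tool:
# count TCP sockets by state; a large CLOSE_WAIT or TIME_WAIT pile-up is the warning sign
ss -tan | awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }'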
The Solaris ndd command documents thoroughly which parameters it can modify in the Solaris kernel and with what consequences; the terms explained there are rather generic to TCP sockets and can be found on many other operating systems in other forms, on Linux (which we focus on here) through the /proc/sys/net filesystem.
Some valuable switches there are:
ipv4/ip_local_port_range This is the range for local (client-side) ports. You can try to narrow it, and use arangob --keep-alive false to explore what happens when your system runs out of these.
time wait (often shortened to tw) is the section that controls what the TCP stack should do with already-closed sockets in TIME_WAIT state. The Linux kernel can do a trick here: it can instantly re-use connections in that state for new connections. Vincent Bernat explains very nicely which screws to turn and what the different parameters in the kernel mean.
So once you decide to change some of these values in /proc so your host scales better to the given situation, you need to make them persistent across reboots, since /proc is volatile and won't remember values after a reboot.
Most Linux systems therefore offer the /etc/sysctl.conf file (and drop-in files under /etc/sysctl.d/); the naming maps slashes in the proc filesystem to dots, so /proc/sys/net/ipv4/tcp_tw_reuse translates into net.ipv4.tcp_tw_reuse.
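A sketch of such a drop-in file (the file name and values are illustrative, not recommendations):
# /etc/sysctl.d/90-tcp-tuning.conf
net.ipv4.ip_local_port_range = 10000 65535
net.ipv4.tcp_tw_reuse = 1
Running sysctl --system (or rebooting) applies it without waiting for the next boot.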
I'm trying to deploy Cassandra on a small (test) Mesos cluster. I have one master node (say 10.10.10.1) and three worker nodes: 10.10.10.2-4.
On the official site of apache mesos there is a link to cassandra framework developed for mesos (it is here: https://github.com/mesosphere/cassandra-mesos).
I'm following the tutorial that they provide there. In step 3 they are saying I should edit the conf/mesos.yaml file, specifically that I should set mesos.master.url so that it points to the master node (on which I also have the conf file).
The first thing I tried was just to replace localhost by the master node ip, so I had
mesos.master.url: 'zk://10.10.10.1:2181/mesos'
but when I then start the deployment script (by running bin/cassandra-mesos, as they say I should in step 5), I get the following error:
2015-02-24 09:18:24,262:12041(0x7fad617fa700):ZOO_ERROR#handle_socket_error_msg#1697: Socket [10.10.10.1:2181] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
It keeps retrying and displays the same error until I terminate it.
I tried removing 'zk' or replacing it with 'mesos' in the URL, changing (or removing altogether) the port, and removing the 'mesos' word from the URL, but I keep getting the same error.
I also tried looking at how other frameworks do it (specifically spark, which I am hoping to deploy next) but didn't find anything helpful. Any ideas how to run it? Thanks!
The URL provided to mesos.master.url is passed directly to the underlying Mesos Native Java Library. The format listed in your example looks correct.
Next steps in debugging the connection issue would be to verify the IP address the ZooKeeper server has bound to. You can find out by running sudo netstat -ntplv | grep 2181 on the server that is running ZooKeeper.
I would expect to see something like the following:
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN 3957/java
Another possibility could be that ZooKeeper is binding specifically to localhost:
tcp 0 0 127.0.0.1:2181 0.0.0.0:* LISTEN 3957/java
If ZooKeeper has bound to localhost, a client will only be able to connect to it with the URL zk://127.0.0.1:2181/mesos.
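A quick way to double-check reachability from another machine is ZooKeeper's four-letter ruok command (a sketch; 10.10.10.1 is the master address from the question):
$ echo ruok | nc 10.10.10.1 2181
imok
If ZooKeeper is only bound to localhost, this will fail with connection refused from any other host, exactly like the error above.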
A note about the future of the Cassandra Mesos Framework.
I am one of the developers working on rewriting the cassandra-mesos project to be more robust, stable and easier to run. The code in current master (6aa82acfac) is end-of-life and will be replaced within the next couple of weeks with the code that is in the rewrite branch.
If you would like to try out the latest build of the rewrite branch a marathon.json for running the framework can be found here. After downloading the marathon.json update the values for MESOS_ZK and CASSANDRA_ZK (and any resource values you want to update) then POST the json to marathon at /v2/apps.
If you have one master and no zk, how about setting the mesos.master.url to 10.10.10.1:5050 (where 5050 is the default mesos-master port)?
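That is, in the same conf/mesos.yaml notation as above (a sketch, assuming no ZooKeeper is involved):
mesos.master.url: '10.10.10.1:5050'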
If Ben is right and the Cassandra framework otherwise requires ZK for its own persistence/HA, then try disabling that feature if possible. Otherwise you may have to rip the ZK code out yourself and recompile, if you really want a ZK-free setup (consequently without any HA features).
I am trying to run a waveform with components operating on distinct machines. That is, I want A->B where component A runs on the GPP on machine 1 and component B runs on the GPP on machine 2. The CORBA nameserver on system A is visible in REDHAWK on system B, but I cannot access remote devices or components when I run a waveform.
How can I make the devices on one machine available to REDHAWK running on another?
Thanks for your assistance!
-jerhill
The essential thing for spreading REDHAWK components and devices across multiple machines is making sure that your CORBA communication works correctly between the machines. This usually amounts to configuring /etc/omniORB.cfg correctly. First, on one machine, you should have omniNames and omniEvents running, and set up your config per section 2.6 of the documentation. For reference:
InitRef = NameService=corbaname::127.0.0.1
InitRef = EventService=corbaloc::127.0.0.1:11169/omniEvents
On the second machine, your InitRefs must point to the first machine. If the first machine is 192.168.1.100, then your second machine's config could contain:
InitRef = NameService=corbaname::192.168.1.100
InitRef = EventService=corbaloc::192.168.1.100:11169/omniEvents
You should be able to verify this is working correctly on the second machine with:
$ nameclt list
The next issue you need to tackle is making sure that CORBA objects are listening on the appropriate network interfaces, and are publishing information in their IORs that allows them to be reached. In each of your config files, I recommend you add a line to tell omniORB what endpoint CORBA objects created on that machine should listen on. For example, on your first machine:
endPoint = giop:tcp:192.168.1.100:
endPoint = giop:unix:
This tells omniORB that CORBA objects should listen on a TCP port of their choosing on 192.168.1.100. It also adds a Unix domain socket for fast access by objects on the same machine. omniORB will publish this information in the IOR for each object. What you choose here is important: if you use an IP that other machines can't reach, or a hostname that other machines can't resolve, then CORBA connections will fail.
After you've configured the endPoint setting on both machines, you may find it useful to inspect the information contained in your IORs. If you can access the naming service then you can retrieve IORs for your objects. For example, if you had a domain named 'REDHAWK_DEV' running, you can get the domain manager's IOR via:
$ nameclt resolve REDHAWK_DEV/REDHAWK_DEV
Then, feed the IOR to catior:
$ catior IOR:012345...
catior will decipher the IOR for you and show you what address and port a client would connect to.
Since programs on B can see the name service on A, I assume that the problem relates to Device/Device Manager configuration.
Make sure that the Device Manager on B meets these criteria:
the id attribute of the deviceconfiguration element in the dcd.xml file is unique
the id attribute of the GPP's componentinstantiation element in the dcd.xml file is unique
the name attribute of the namingservice element in the dcd.xml file is the Domain you are trying to connect to (of the form DomainName/DomainName)
you do not have a Domain Manager running on B that has a colliding name with the Domain Manager on A (an error should occur if you do that)
If these criteria are met and your system still does not work, please post the stdout from running nodeBooter on the command line (as sketched below) for both the Device Manager that's not registering and the Domain Manager you're trying to register with.
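For reference, those two processes are typically started along these lines (a sketch; the .dmd.xml and .dcd.xml paths are illustrative and depend on your SDRROOT layout):
$ nodeBooter -D /domain/DomainManager.dmd.xml
$ nodeBooter -d /nodes/MyNode/DeviceManager.dcd.xml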