Apache Commons FTP using a new socket every time? - ftp-client

The Apache Commons FTPClient calls openDataConnection every time, i.e. it uses a separate socket for every command.
Does this mean that many ports are used for data transfer? Because of this I sometimes get a socket read timeout exception, which seems to happen because ports left in the TIME_WAIT state are being used.
I don't understand why a single port isn't used for data transfer, which would consume less memory and put less stress on the system. Any advice?

If this aspect is important to you, you may want to look for another library. If your system allows secure file transfer (SFTP), have a look at JSch.
I did not check the code, but it might work differently from FTPClient and might not open a socket for each command.
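For background: in FTP's default stream mode, the end of a transfer is signalled by closing the data connection, so each listing, upload, or download gets its own data connection. Opening a socket per transfer is therefore expected from the protocol rather than something particular to FTPClient. In passive mode the server announces a host and port in its 227 reply before each transfer. A minimal sketch of decoding such a reply, in Python; the function name and the sample reply are made up for illustration:

```python
import re

def parse_pasv_reply(reply):
    """Return (host, port) from a 227 'Entering Passive Mode' reply."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError("not a valid PASV reply: %r" % reply)
    numbers = [int(n) for n in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    # The port is sent as two bytes: high byte first, then low byte.
    port = numbers[4] * 256 + numbers[5]
    return host, port

# Hypothetical reply; a real server picks its own address and port.
host, port = parse_pasv_reply("227 Entering Passive Mode (192,168,1,2,19,137)")
```

Each transfer repeats this exchange, which is why the number of short-lived sockets grows with the number of transfers.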

Related

Node.js GPS device tracking performance considerations

Using node.js as a TCP server, I am going to manage a relatively large number of GPS devices (~3000 devices). As a first step I am just going to store the incoming data in a database, but even in this phase I envision some performance issues that bother me, and I'd like to catch them before they bite me.
1 - Looking at similar servers written in languages like Java or Ruby, I see code like the following:
java
Thread serverThread = new Thread(() -> {
    System.out.println("Listening to server port 9000");
    while (true) {
        try {
            Socket socket = serverSocket.accept();
            ...
ruby
require 'socket'
server = TCPServer.new("127.0.0.1", 8080)
loop do
  Thread.start(server.accept) do |client|
    ...
It seems they give a separate thread to every device (socket) that connects to the TCP server. As node.js is single-threaded and works asynchronously, should I be concerned about incoming connections, or will something like the following simple approach handle a large number of simultaneous connections?
net.createServer(function(device) {
  device.on('data', function(data) {
    // parse data
    // store in database
  });
});
2 - Should I limit database connections using a connection pool? As the database is also queried from the other side for GIS and monitoring, how large should the pool be?
3 - How could I benefit from caching (for example using Redis) in such a system?
It would be great if someone could shed some light on these thoughts. I would also like to hear any other performance considerations you have experienced or are aware of in implementing such systems. Thanks.
Choosing among the options you have listed, I would say Node.js is actually a better option for your use case because it does not use one thread per connection like the other two options. Threads are normally a finite resource on a given machine. Java and Ruby do have 'evented' servers, though, and these are worth looking at if you want an apples-to-apples comparison.
I think you need to say more about the database you intend to use if you want advice on connection pooling. However, reusing connections if they are costly to set up would be a good thing to do. It is probably a good idea to be able to configure the minimum and maximum size of the pool. Ultimately the correct size to use is a matter of testing.
I think the benefit of caching in this system would be minimal as you are mostly writing data. If the data is valuable you will want to write it to disk rather than memory. On the other hand, if you have clients that are reading the data collected perhaps caching their reads in something like Redis might be a good idea.
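To make the pooling idea concrete, here is a minimal sketch of a fixed-size pool, written in Python rather than Node.js for brevity. The class name and the use of in-memory SQLite connections are assumptions for illustration only; a real deployment would normally use the pool that ships with the database driver.

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Hypothetical fixed-size pool: connections are created once and reused."""

    def __init__(self, factory, max_size):
        self._idle = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (or raises queue.Empty after `timeout`) when all
        # connections are in use, which caps load on the database.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse

# SQLite stands in for the real database here.
pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), max_size=2)
with pool.connection() as conn:
    row = conn.execute("SELECT 1").fetchone()
```

The maximum size is the knob to tune under load testing, as suggested above.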
I'm sure you're aware, but this sounds like you're trying to prematurely optimize your application here.
1- Node being event-driven and non-blocking makes it a perfect candidate for holding a large number of open socket connections; there is no need to fork per connection. As always though, make sure your application is properly clustered. I was able to hold ~100k open TCP sockets on a dirt cheap laptop. If the number of devices you need to support ever grows beyond that, just scale accordingly.
2- I saw you were planning on using Postgres. Pools are always a good thing.
3- Caching is useful for 'hot' data: stuff that gets queried a lot, so having it in memory or inside Redis (in-memory storage) makes these lookups faster and removes strain on the system. In your case, if you just need to get certain chunks of data, for analytics or for more casual use, I would recommend Spark or Solr as opposed to a plain caching layer. It's also going to be much cheaper and easier to maintain.
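The event-driven model described in point 1 can be sketched outside Node as well. The following uses Python's asyncio, where every connection is a coroutine on a single thread. The message format, the "OK" acknowledgement, and the in-memory list standing in for the database are all assumptions made for the example.

```python
import asyncio

def make_handler(store):
    """Build a per-connection coroutine that appends each line to `store`."""
    async def handle_device(reader, writer):
        try:
            while True:
                line = await reader.readline()
                if not line:  # empty bytes means the device disconnected
                    break
                store.append(line.decode().strip())  # stand-in for a DB insert
                writer.write(b"OK\n")                # acknowledge the report
                await writer.drain()
        finally:
            writer.close()
    return handle_device

async def simulate(num_devices):
    """Run a server and `num_devices` fake devices in one event loop."""
    store = []
    server = await asyncio.start_server(make_handler(store), "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def fake_device(i):
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"device-{i},59.91,10.75\n".encode())
        await writer.drain()
        await reader.readline()  # wait for the acknowledgement
        writer.close()

    await asyncio.gather(*(fake_device(i) for i in range(num_devices)))
    server.close()
    return store
```

Calling `asyncio.run(simulate(50))` handles all fifty connections concurrently without creating a thread per device.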

Weird Tomcat outage, possibly related to maxConnections

In my company we experienced a serious problem today: our production server went down. Most people accessing our software via a browser were unable to get a connection, however people who had already been using the software were able to continue using it. Even our hot standby server was unable to communicate with the production server, which it does using HTTP, not even going out to the broader internet. The whole time the server was accessible via ping and ssh, and in fact was quite underloaded - it's normally running at 5% CPU load and it was even lower at this time. We do almost no disk i/o.
A few days after the problem started we have a new variation: port 443 (HTTPS) is responding but port 80 stopped responding. The server load is very low. Immediately after restarting tomcat, port 80 started responding again.
We're using tomcat7, with maxThreads="200", and using maxConnections=10000. We serve all data out of main memory, so each HTTP request completes very quickly, but we have a large number of users doing very simple interactions (this is high school subject selection). But it seems very unlikely we would have 10,000 users all with their browser open on our page at the same time.
My question has several parts:
Is it likely that the "maxConnections" parameter is the cause of our woes?
Is there any reason not to set "maxConnections" to a ridiculously high value e.g. 100,000? (i.e. what's the cost of doing so?)
Does tomcat output a warning message anywhere once it hits the "maxConnections" limit? (We didn't notice anything).
Is it possible there's an OS limit we're hitting? We're using CentOS 6.4 (Linux) and "ulimit -f" says "unlimited". (Do firewalls understand the concept of TCP/IP connections? Could there be a limit elsewhere?)
What happens when tomcat hits the "maxConnections" limit? Does it try to close down some inactive connections? If not, why not? I don't like the idea that our server can be held to ransom by people having their browsers on it, sending keep-alives to keep the connection open.
But the main question is, "How do we fix our server?"
More info as requested by Stefan and Sharpy:
Our clients communicate directly with this server
TCP connections were in some cases immediately refused and in other cases timed out
The problem is evident even connecting my browser to the server within the network, or with the hot standby server - also in the same network - unable to do database replication messages which normally happens over HTTP
IPTables - yes, IPTables6 - I don't think so. Anyway, there's nothing between my browser and the server when I test after noticing the problem.
More info:
It really looked like we had solved the problem when we realised we were using the default Tomcat7 setting of BIO, which has one thread per connection, and we had maxThreads=200. In fact 'netstat -an' showed about 297 connections, which matches 200 + queue of 100. So we changed this to NIO and restarted tomcat. Unfortunately the same problem occurred the following day. It's possible we misconfigured the server.xml.
The server.xml and an extract from catalina.out are available here:
https://www.dropbox.com/sh/sxgd0fbzyvuldy7/AACZWoBKXNKfXjsSmkgkVgW_a?dl=0
More info:
I did a load test. I'm able to create 500 connections from my development laptop, and do an HTTP GET 3 times on each, without any problem. Unless my load test is invalid (the Java class is also in the above link).
It's hard to tell for sure without hands-on debugging but one of the first things I would check would be the file descriptor limit (that's ulimit -n). TCP connections consume file descriptors, and depending on which implementation is in use, nio connections that do polling using SelectableChannel may eat several file descriptors per open socket.
To check if this is the cause:
Find Tomcat PIDs using ps
Check the ulimit the process runs with: cat /proc/<PID>/limits | fgrep 'open files'
Check how many descriptors are actually in use: ls /proc/<PID>/fd | wc -l
If the number of used descriptors is significantly lower than the limit, something else is the cause of your problem. But if it is equal or very close to the limit, it's this limit which is causing issues. In this case you should increase the limit in /etc/security/limits.conf for the user with whose account Tomcat is running and restart the process from a newly opened shell, check using /proc/<PID>/limits if the new limit is actually used, and see if Tomcat's behavior is improved.
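The same check can be scripted. This is a rough Python sketch that assumes a Linux /proc filesystem and enough permissions to read the Tomcat process's entries; the function names are made up for the example.

```python
import os

def parse_open_files_limit(limits_text):
    """Return (soft, hard) 'Max open files' values from /proc/<PID>/limits text.

    Values are returned as strings because a limit may read 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return soft, hard
    raise ValueError("no 'Max open files' line found")

def descriptor_usage(pid):
    """Return (descriptors in use, soft limit) for a process on Linux."""
    with open(f"/proc/{pid}/limits") as f:
        soft, _hard = parse_open_files_limit(f.read())
    in_use = len(os.listdir(f"/proc/{pid}/fd"))
    return in_use, soft
```

Running `descriptor_usage(pid)` periodically for the Tomcat PID and logging the result would show whether usage creeps toward the limit before an outage.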
While I don't have a direct answer to solve your problem, I'd like to offer my methods to find what's wrong.
Intuitively there are 3 assumptions:
If your clients hold their connections and never release them, it is quite possible your server hits the max connection limit even if there is no communication.
The non-responding state can also be reached via various ways such as bugs in the server-side code.
The hardware conditions should not be ignored.
To locate the cause of this problem, you'd better try to replay the scenario in a testing environment. Perform more comprehensive tests and record more detailed logs, including but not limited to:
Unit tests, esp. logic blocks using transactions, threading and synchronization.
Stress-oriented tests. Try to simulate all the user behaviors you can come up with and their combinations and test them in a massive batch mode.
More specific logging. Trace client behaviors and analyze what happened exactly before the server stopped responding.
Replace the server machine and see if it still happens.
The short answer:
Use the NIO connector instead of the default BIO connector
Set "maxConnections" to something suitable e.g. 10,000
Encourage users to use HTTPS so that intermediate proxy servers can't turn 100 page requests into 100 tcp connections.
Check for threads hanging due to deadlock problems, e.g. with a stack dump (kill -3)
(If applicable and if you're not already doing this, write your client app to use the one connection for multiple page requests).
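To illustrate the last point, here is a small self-contained Python sketch that counts how many TCP connections a local test server sees when a client reuses one keep-alive connection versus opening a new one per request. The server, handler, and function name are all stand-ins invented for the demonstration, not part of the setup described above.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def count_connections(num_requests, reuse):
    """Return how many distinct TCP connections the server observed."""
    peers = set()

    class Handler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # needed for keep-alive

        def do_GET(self):
            peers.add(self.client_address)  # (ip, port) identifies a connection
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # keep the demo quiet
            pass

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    conn = None
    try:
        for _ in range(num_requests):
            if conn is None or not reuse:
                if conn is not None:
                    conn.close()
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
        if conn is not None:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(peers)
```

With `reuse=True` the server sees a single connection regardless of the number of requests; with `reuse=False` it sees roughly one per request, each of which lingers in TIME_WAIT after closing.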
The long answer:
We were using the BIO connector instead of NIO connector. The difference between the two is that BIO is "one thread per connection" and NIO is "one thread can service many connections". So increasing "maxConnections" was irrelevant if we didn't also increase "maxThreads", which we didn't, because we didn't understand the BIO/NIO difference.
To change it to NIO, put this in the <Connector> element in server.xml:
protocol="org.apache.coyote.http11.Http11NioProtocol"
From what I've read, there's no benefit to using BIO so I don't know why it's the default. We were only using it because it was the default and we assumed the default settings were reasonable and we didn't want to become experts in tomcat tuning to the extent that we now have.
HOWEVER: Even after making this change, we had a similar occurrence: on the same day, HTTPS became unresponsive even while HTTP was working, and then a little later the opposite occurred. Which was a bit depressing. We checked in 'catalina.out' that in fact the NIO connector was being used, and it was. So we began a long period of analysing 'netstat' and wireshark. We noticed some periods of high spikes in the number of connections - in one case up to 900 connections when the baseline was around 70. These spikes occurred when we synchronised our databases between the main production server and the "appliances" we install at each customer site (schools). The more we did the synchronisation, the more we caused outages, which caused us to do even more synchronisations in a downward spiral.
What seems to be happening is that the NSW Education Department proxy server splits our database synchronisation traffic into multiple connections so that 1000 page requests become 1000 connections, and furthermore they are not closed properly until the TCP 4 minute timeout. The proxy server was only able to do this because we were using HTTP. The reason they do this is presumably load balancing - they thought by splitting the page requests across their 4 servers, they'd get better load balancing. When we switched to HTTPS, they are unable to do this and are forced to use just one connection. So that particular problem is eliminated - we no longer see a burst in the number of connections.
People have suggested increasing "maxThreads". In fact this would have improved things but this is not the 'proper' solution - we had the default of 200, but at any given time, hardly any of these were doing anything, in fact hardly any of these were even allocated to page requests.
I think you need to test the application with Apache JMeter to check the number of connections, and use JConsole or Zabbix to look at heap space or take a thread dump of the Tomcat server.
The NIO connector of Apache Tomcat can have a maximum of 10000 connections, but I don't think it's a good idea to allow that many connections to one Tomcat instance; a better way is to run multiple instances of Tomcat.
In my view, the best setup for a production server is to run Apache HTTP Server in front and connect your Tomcat instance to it using the AJP connector.
Hope this helps.
Are you absolutely sure you're not hitting the maxThreads limit? Have you tried changing it?
These days browsers limit simultaneous connections to a max of 4 per hostname/ip, so if you have 50 simultaneous browsers, you could easily hit that limit. Although hopefully your webapp responds quickly enough to handle this. Long polling has become popular these days (until websockets are more prevalent), so you may have 200 long polls.
Another cause could be if you use HTTP[S] for app-to-app communication (that is, no browser involved). Sometimes app writers are sloppy and create new connections for performing multiple tasks in parallel, causing TCP and HTTP overhead. Double check that you are not getting a flood of requests. Log files can usually help you with this, or you can use Wireshark to count the number of HTTP requests or HTTP[S] connections. If possible, modify your API to handle multiple API calls in one HTTP request.
Related to the last one, if you have many HTTP/1.1 requests going across one connection, an intermediate proxy may be splitting them into multiple connections for load balancing purposes. Sounds crazy I know, but I've seen it happen.
Lastly, some crawl bots ignore the crawl delay set in robots.txt. Again, log files and/or wireshark can help you determine this.
Overall, run more experiments with more changes. maxThreads, https, etc. before jumping to conclusions with maxConnections.

Linux: need to design pre-fetcher to cache files from NAS into system memory

I am designing a server for the following scenario:
a series of single images is stored on a NAS, let's say 100 of them
a client connects to the server over TCP socket and requests image39
server reads image39 from NAS and sends back to client over socket
it is likely that the client will also request other images from the series, so:
I would like to launch a thread that iterates through the images, reads them, and does a cat image39 > /dev/null to force cache into memory on server
thread will fetch images as follows: image38, image40, image37, image41, etc.
already fetched images are ignored
if client now requests image77, I want to reset the fetch thread to fetch: image76, image78, etc.
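The fetch order described in the list can be expressed as a small generator. This is a sketch in Python that assumes images are numbered 0 to count-1; the function name is made up for the example.

```python
def prefetch_order(requested, count, already_fetched=()):
    """Yield image indices outward from `requested`: r-1, r+1, r-2, r+2, ...

    Indices outside 0..count-1 and those in `already_fetched` are skipped.
    """
    seen = set(already_fetched) | {requested}
    for distance in range(1, count):
        for index in (requested - distance, requested + distance):
            if 0 <= index < count and index not in seen:
                yield index

# The client asked for image39 out of a series of 100.
first_four = [i for _, i in zip(range(4), prefetch_order(39, 100))]
```

When the client jumps to image77, the prefetcher can drop the current generator and start a new one with `prefetch_order(77, 100, already_fetched)`, which implements the reset without any extra bookkeeping.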
This has to scale to many series and clients, probably on the order of 1000 concurrent prefetches. I understand that threads can cause a performance hit if there are too many. Would it be better to fork a new process instead? Is there a more efficient way than threads or processes?
Thanks!!!
This is premature optimization. Try implementing your system without tricks to "force" the cache, and see how it works. I bet it'll be fine--and you won't then need to worry about nasty surprises if it turns out your tricks don't play nice with other things on the system.

PUB/SUB with short-lived publisher and long-lived subscribers

Context: OS: Linux (Ubuntu), language: C (actually Lua, but this should not matter).
I would prefer a ZeroMQ-based solution, but will accept anything sane enough.
Note: For technical reasons I can not use POSIX signals here.
I have several identical long-living processes on a single machine ("workers").
From time to time I need to deliver a control message to each of processes via a command-line tool. Example:
$ command-and-control worker-type run-collect-garbage
Each of workers on this machine should receive a run-collect-garbage message. Note: it would be perfect if the solution would somehow work for all workers on all machines in the cluster, but I can write that part myself.
This is easily done if I store some information about running workers. For example, keep their PIDs in a known location and open a control Unix domain socket on a known path with a PID somewhere in it. Or open a TCP socket and store the host and port somewhere.
But this would require careful management of the stored information — e.g. what if worker process suddenly dies? (Nothing unmanageable, but, still, extra fuss.) Also, the information needs to be stored somewhere, thus adding an extra bit of complexity.
Is there a good way to do this in PUB/SUB style? That is, workers are subscribers, command-and-control tool is a publisher, and all they know is a single "channel url", so to say, on which to come for messages.
Additional requirements:
Messages to the control channel must wake up workers from the poll (select, whatever) loop.
Message delivery must be guaranteed, and it must reach each and every worker that is listening.
Worker should have a way to monitor for messages without blocking — ideally by the poll/select/whatever loop mentioned above.
Ideally, worker process should be "server" in a sense — he should not bother about keeping connections to the "channel server" (if any) persistent etc. — or this should be done transparently by the framework.
Usually such a pattern requires a proxy for the publisher, i.e. you send to the proxy, which immediately accepts delivery and then reliably forwards to the end subscriber workers. The ZeroMQ guide covers a few different methods of implementing this.
http://zguide.zeromq.org/page:all
Given your requirements, Steve's suggestion does seem the simplest: run a daemon which listens on two known sockets - the workers connect to that and the command tool pushes to it which redistributes to connected workers.
You could do something complicated that would probably work, by effectively nominating one of the workers. For example, on startup workers attempt to bind() a PUB ipc:// socket somewhere accessible, like tmp. The one that wins bind()s a second IPC as a PULL socket and acts as a forwarder device on top of its normal duties; the others connect() to the original IPC. The command line tool connect()s to the second IPC, and pushes its message. The risk there is that the winner dies, leaving a locked file. You could identify this in the command line tool, rebind, then sleep (to allow the connections to be established). Still, that's all a little bit complex; I think I'd go with a proxy!
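The two-socket daemon idea can be sketched without ZeroMQ to show its shape. The following Python asyncio example is only an illustration under several assumptions: plain TCP on localhost instead of ipc://, newline-terminated messages, a "READY" handshake invented for the demo, and no delivery guarantee beyond what TCP gives to currently connected workers.

```python
import asyncio

class Broker:
    """Hypothetical forwarder: workers connect on one port, commands on another."""

    def __init__(self):
        self.workers = set()

    async def on_worker(self, reader, writer):
        self.workers.add(writer)
        writer.write(b"READY\n")  # tell the worker it is registered
        await writer.drain()
        try:
            await reader.read()   # returns at EOF, i.e. when the worker leaves
        finally:
            self.workers.discard(writer)
            writer.close()

    async def on_command(self, reader, writer):
        message = await reader.readline()
        targets = list(self.workers)
        for worker in targets:
            worker.write(message)  # fan the command out to every worker
        await asyncio.gather(*(w.drain() for w in targets), return_exceptions=True)
        writer.write(b"%d\n" % len(targets))  # report how many were addressed
        await writer.drain()
        writer.close()

async def demo(num_workers):
    broker = Broker()
    worker_srv = await asyncio.start_server(broker.on_worker, "127.0.0.1", 0)
    command_srv = await asyncio.start_server(broker.on_command, "127.0.0.1", 0)
    worker_port = worker_srv.sockets[0].getsockname()[1]
    command_port = command_srv.sockets[0].getsockname()[1]

    conns = [await asyncio.open_connection("127.0.0.1", worker_port)
             for _ in range(num_workers)]
    for reader, _ in conns:
        await reader.readline()  # wait for READY

    cmd_reader, cmd_writer = await asyncio.open_connection("127.0.0.1", command_port)
    cmd_writer.write(b"run-collect-garbage\n")
    await cmd_writer.drain()
    delivered = int(await cmd_reader.readline())
    received = [await reader.readline() for reader, _ in conns]

    for _, w in conns + [(cmd_reader, cmd_writer)]:
        w.close()
    worker_srv.close()
    command_srv.close()
    return delivered, received
```

Because the daemon owns both endpoints, neither the workers nor the command tool need to know about each other; they only need the two well-known addresses.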
I think what you're describing would fit well with a gearmand/supervisord implementation.
Gearman is a great task queue manager and supervisord would allow you to make sure that the process(es) are all running. It's TCP based too so you could have clients/workers on different machines.
http://gearman.org/
http://supervisord.org/
I recently set something up with multiple gearmand nodes, linked to multiple workers so that there's no single point of failure
edit: Sorry - my bad, I just re-read and saw that this might not be ideal.
Redis has some nice and simple looking pub/sub functionality that I've not used yet but sounds promising.
Use a multicast PUB/SUB. You'll have to make sure the pgm option is compiled into your ZeroMQ distribution (man 7 zmq_pgm).

Keeping FTP control connection alive

A while back I asked a question regarding keeping the control connection on an FTP session alive during a large transfer. Although I thought I had success after implementing a solution for a question I'd already asked, it appears as though the ISP is the problem, i.e. they are causing my control connections to die during large transfers.
Interestingly, the old-school FTP client program "Leap-FTP" gets around this issue by just sending 'NOOP' commands to the server on the control connection during a download. While other popular clients die during transfers (Filezilla, my Python FTP script), LeapFTP runs strong due to this workaround.
I've done some research into threading and Queue, but am having trouble coming up with the code to make this happen.
The solution seems simple enough (in my head, at least): initiate a download, while that download function runs, send a NOOP command every n seconds. Stop sending the NOOP command after the download function completes.
I'm hoping that someone can give me a suggestion as to how this might be done. Will it involve the use of threading, Queue, or is there a simpler solution?
Bottom line is, after a lot of testing, the 'NOOP' command is going to have to be sent during the large downloads (which take place on high-numbered TCP ports).
Thanks!
In order to handle multiple sockets at one time in a single program, you can use the select function instead of threads. This is either simpler or more complicated, depending on your programming experience.
I find threads are usually simple but when something does go wrong debugging it is a real pain, while writing the code for socket multiplexing using select is more complex but less difficult to debug than threads.
The basics of using select are that you set up your sockets and call the select function. It will tell you which sockets are ready to read or write. Then you check the time. If it's been X seconds since your last NOOP, send one on the control socket. If the transfer socket is ready to read or write, handle it. If the control socket is ready to read, read it and check for NOOP response, error messages, control channel being closed, etc.
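A rough sketch of that loop in Python follows. It works on two already-connected sockets and leaves out everything a real FTP client needs (login, PASV/RETR, parsing the replies); the function name is hypothetical.

```python
import select
import socket
import time

def download_with_noop(data_sock, control_sock, noop_interval):
    """Read data_sock until EOF, sending NOOP on control_sock periodically."""
    chunks = []
    last_noop = time.monotonic()
    while True:
        # Wake up no later than the moment the next NOOP is due.
        timeout = max(0.0, last_noop + noop_interval - time.monotonic())
        readable, _, _ = select.select([data_sock, control_sock], [], [], timeout)
        if data_sock in readable:
            chunk = data_sock.recv(65536)
            if not chunk:  # the server closed the data connection: transfer done
                break
            chunks.append(chunk)
        if control_sock in readable:
            reply = control_sock.recv(4096)  # NOOP replies, errors, etc.
            if not reply:
                raise ConnectionError("control connection closed")
        if time.monotonic() - last_noop >= noop_interval:
            control_sock.sendall(b"NOOP\r\n")
            last_noop = time.monotonic()
    return b"".join(chunks)
```

Everything runs in one thread, so there is no shared state to protect; the cost is that the transfer and the keep-alive logic are interleaved in the same loop.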
Since you don't care (much, anyway) about performance in this case, it's probably easiest to use a separate thread for it that sits in a loop that simply sleeps for N seconds, checks to see if it's been cancelled, and if not sends a NOOP and sleeps again.
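A minimal sketch of such a thread in Python, using threading.Event so that the sleep can be interrupted as soon as the download finishes. The class name is made up, and `send_noop` is any callable supplied by the caller; how NOOP replies interleave with the transfer's own reply on the control connection is not handled here.

```python
import threading

class KeepAlive:
    """Call `send_noop` every `interval` seconds until the block exits."""

    def __init__(self, send_noop, interval):
        self._send_noop = send_noop
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout and True once stop is set,
        # so this sleeps, checks for cancellation, then sends.
        while not self._stop.wait(self._interval):
            self._send_noop()

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
```

Usage would look like `with KeepAlive(send_noop, 30): run_download()`, where both names are placeholders for the script's own functions.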
If you are running on a Unix, it would be just as efficient to have the control connection program open the sockets for a transfer and then spawn a new process to do the transfer. That would leave the control program ready to wait for completion, send NOOP commands, or even start new transfers if the FTP server can support it.
That is sort of how the original FTP model was supposed to work and the reason it uses a control connection and separate data connections instead of the HTTP model with control and data mixed together.
