Peers reported but do not connect on a self-hosted BitTorrent tracker

I'm at a loss. I am trying to set up a private tracker for a friend (to distribute his content). He has a VPS, so I thought I would just install opentracker and be done with it. However, I've encountered a problem, or rather a multitude of problems:
NB: in all that follows, the trackers are open and do not use whitelists or the like. In all cases the clients get the correct number of peers. In all cases, one of the peers has the file and the other tries to download it.
#############################################
[2 Peers on same LAN]
Situation 1: Tracker is on server external to LAN
Public tracker (first google search) => hours of wait and nothing
Self hosted tracker on VPS (tried peertracker, bittornado/bttrack, opentracker) => hours of wait and nothing except one time with opentracker when it spontaneously transmitted the file after some wait time.
Situation 2: Tracker is on server internal to LAN on a third computer:
opentracker on a third PC on LAN => hours of wait and nothing
Situation 3: Tracker is on the seeding computer:
bttrack (bittornado) on seeding computer => half an hour of wait and then spontaneously transmitted.
opentracker on seeding computer => hours of wait and nothing
Situation 4: Tracker is on receiving computer:
opentracker on receiving computer => hours of wait and nothing.
[2 Peers on different LANs]
Situation 1: Tracker is on server external to both LANs
Public tracker (first google search) => hours of wait and nothing
Self hosted tracker on VPS (tried peertracker, bittornado/bttrack, opentracker) => hours of wait and nothing
#############################################
The clients used are Transmission and Ktorrent.
I tried dissecting the communication using Wireshark. The response to the GET request seems to vary randomly at different times of the day for the same setup. Sometimes the peers12 contains nothing. Sometimes it contains something like \177\000\000\001 which is obviously not my IP. Sometimes it transforms into peers18 and contains something weird. One time it just spontaneously started responding with peers6.
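For reference, the peers12, peers18 and peers6 strings are consistent with a bencoded "peers" key whose value is a compact peer list of 12, 18 or 6 bytes, i.e. 2, 3 or 1 peers at 6 bytes each (4-byte IPv4 address plus 2-byte big-endian port). Under that assumption, \177\000\000\001 is simply 127.0.0.1 in octal escapes. The following is a minimal sketch of decoding such a value; the sample bytes and the function name are made up for illustration and are not taken from the capture above.

```python
import ipaddress
import struct


def decode_compact_peers(blob: bytes):
    """Decode a compact IPv4 peer list: 6 bytes per peer (4-byte IP, 2-byte big-endian port)."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for offset in range(0, len(blob), 6):
        ip = ipaddress.IPv4Address(blob[offset:offset + 4])
        (port,) = struct.unpack("!H", blob[offset + 4:offset + 6])
        peers.append((str(ip), port))
    return peers


# Hypothetical 12-byte value, as it would follow "5:peers12:" in a tracker response.
sample = b"\x7f\x00\x00\x01\x1a\xe1" + b"\xc0\xa8\x01\x0a\xc8\xd5"
for host, port in decode_compact_peers(sample):
    print(host, port)
```

A loopback address in this list would suggest that the tracker recorded the announce as coming from 127.0.0.1, which is what one would expect when it sits behind a local reverse proxy.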
I tried placing the servers behind a reverse proxy and not. I've tried adding the IP to the request both in-client (activating the option to use it in opentracker) and in-nginx by rewriting the request. But when I tried it on the same LAN and it didn't work I realised that was not the problem.
To summarize, from the clients' perspective, the number of peers is right, but the clients connect to none of them and do not show them in their peer lists. Sometimes, though, it starts to work spontaneously (only 2 times, on the same LAN, in more than 50 hours of testing). I think I'm missing something trivial here.
If anyone has any idea... please go ahead. I can setup the VPS as a playground for a bit so that I can try out any solutions anyone has.

The problem appears to be with the main computer I used to run tests. Turning on uTP allowed local connections. However it did not allow me to seed to an outside computer. Another computer on the same LAN however had no problem doing that.
So, the problem is client-side and non-reproducible on another computer it would seem.

Related

MVC5 controller (POST) being called twice (once a week)

I have a C# WebApp MVC5. Everything usually works perfectly, users create invoices every minute, there are 10 users making invoices concurrently in different locations and different machines.
The issue happens once a week.
In the logs, I can see the post is called twice at the same time by the same user, I see some network lag on the client-side when this happens, but I'm not able to reproduce it, even using the network utility of chrome DevTools to simulate network lag.
Of course, I can add some business validation before persisting the data into the database in order to avoid duplicate data, but that's not the real issue.
I've read on the internet it would be because IIS Http2 is enabled and should be disabled, so I've done that a couple of weeks ago, but the error is still occurring.
This is not an "unintentional double click on a button" issue. I'm pretty sure of that because I disable the button once it is clicked and enable it again once the server returns a response.
See the logs: the first request takes 9002 ms to complete while the second one takes 444 ms. That's the network lag I've identified so far, because this post usually takes less than a second to complete.
2021-09-22 16:21:41 167.86.95.177 POST /Sales/Invoices/Save - 443 jnamicela 45.225.105.89 Mozilla/5.0+(Windows+NT+6.3;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/93.0.4577.82+Safari/537.36 https://xpertdynamics.com/Home/Index 200 0 1236 9002
2021-09-22 16:21:41 167.86.95.177 POST /Sales/Invoices/Save - 443 jnamicela 45.225.105.89 Mozilla/5.0+(Windows+NT+6.3;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/93.0.4577.82+Safari/537.36 https://xpertdynamics.com/Home/Index 200 0 0 444
It's solved. It was an issue on the client side. Basically, they have an unstable internet connection. When they click the 'save' button and unexpectedly lose the connection in the middle of the process, jquery.post goes directly to post.fail, but the request was successfully sent to the server; it is just the browser that doesn't know it, because the connection was lost. So the user clicks the 'save' button once again.
I just included a validation step before calling jquery.post: check for an internet connection using navigator.onLine and, if that passes, check that the user session is still alive. Only if both checks pass do I call jquery.post.
I've been monitoring for 3 weeks now, and the error has not happened again.
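As a complement to the client-side check, the server-side validation mentioned in the question could take the form of an idempotency token: the page generates one token per invoice form, sends it with every save attempt, and the server persists at most once per token. The sketch below is written in Python rather than C# purely for illustration; the class name, the token format and the in-memory dictionary are all assumptions, and a real application would presumably keep the tokens in the database.

```python
import threading


class IdempotentSaver:
    """Run save_func at most once per token and return the cached result on retries."""

    def __init__(self, save_func):
        self._save = save_func      # hypothetical function that persists the invoice
        self._results = {}          # token -> result of the first successful save
        self._lock = threading.Lock()

    def save(self, token, payload):
        # Holding the lock during the save keeps the sketch simple: two requests
        # with the same token can never both reach save_func.
        with self._lock:
            if token in self._results:
                return self._results[token]
            result = self._save(payload)
            self._results[token] = result
            return result


saved = []
saver = IdempotentSaver(lambda invoice: saved.append(invoice) or len(saved))
saver.save("form-7f3a", {"total": 10})   # first click
saver.save("form-7f3a", {"total": 10})   # retry after the connection dropped
print(len(saved))
```

Holding a single lock serializes all saves, which is acceptable for a sketch but would likely be replaced by a unique constraint on the token column in production.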

What should my expectations be regarding IPFS response times?

IPFS-SERVER
I have a go-ipfs daemon, configured with the standard ipfs "server" profile, running on a Linux server hosted by a large cloud provider.
IPFS-CLIENT
I have a go-ipfs daemon, configured with the ipfs "default" profile, running on a Windows 10 laptop at my SOHO, behind NAT.
Observation #1
When I "publish" via CLI or API (ipfs name publish...) the multihash of a small text file from "IPFS-SERVER" the command takes about 120 seconds to 150 seconds to complete.
When I "cat" via CLI or API (ipfs cat /ipns/multihash) from "IPFS-CLIENT" the command takes about 60 seconds to 120 seconds to complete.
Questions
Are these the typical or expected response times for these commands?
Are there tweaks that can be made to the ipfs config on the client and/or server to reduce these response times?
Observation #2
When I use the same setup but with a "private swarm" the response times are almost instantaneous.
Things I've Tried
I've tried adding "IPFS-SERVER" to "IPFS-CLIENT" bootstrap list with no improvement
I've tried a "swarm connect" from "IPFS-CLIENT" to "IPFS-SERVER" with no improvement
I suspect being part of the "public swarm" comes with this performance hit as the DHT is larger and so takes longer to parse? Or is there some other mechanism at play here? - Thank You!!!
First thing up is that you're measuring IPNS response time and not IPFS response times. There are some tradeoffs regarding the mutability property of IPNS that cause it to be slower than immutable IPFS.
I suspect being part of the "public swarm" comes with this performance hit as the DHT is larger and so takes longer to parse? Or is there some other mechanism at play here?
Yes, the public swarm search takes longer because of the DHT's performance. As of go-ipfs v0.5.0 the DHT algorithm is much more performant; however, the properties of the DHT depend on its members, and many of them are still pre-v0.5.0. As more people upgrade (or if there is a version bump to the DHT protocol that effectively forks away from the old one), things should improve.
Are these the typical or expected response times for these commands?
Your measurements seem on the high end (I average about 30 seconds for IPNS Publish/Resolve, and 2 minutes in the outlier cases), but I'm not surprised by them. Note: the time to do ipfs cat /ipfs/Hash should be much faster than ipfs cat /ipns/Hash (unless you are running IPNS over PubSub and the publisher of /ipns/Hash has the data it references, e.g. /ipfs/Hash)
Are there tweaks that can be made to the ipfs config on the client and/or server to reduce these response times?
If you enable IPNS over PubSub (--enable-namesys-pubsub) on both the SERVER and CLIENT, your search times should DRASTICALLY improve. As a bonus, IPNS over PubSub (as of go-ipfs v0.5.0) will get even faster if you happen to already be connected to someone else who has the IPNS record (e.g. the publisher or another IPNS over PubSub subscriber who has previously fetched that record).
If you don't want to enable IPNS over PubSub you can also modify the settings for ipfs name resolve, such as setting --dht-record-count to a low number (e.g. 1 if you're not so picky about finding the latest version, or if the data updates infrequently) or setting --stream if you're ok getting the latest records as you discover them.

What are some good settings for seeding a ton of torrents? (>10000)

I'm running into a lot of trouble when trying to seed a lot of torrents (> 10k) with libtorrent.
They include:
Choking my network connection
Tracker requests timing out (libtorrent tracker error)
When using auto-manage, they go from checking to seeding very slowly, even when my active_seeding is set to unlimited.
I used to let them be automanaged, but I'd find that it makes nearly all of them unavailable.
Here are my current settings:
sessionSettings.setActiveDownloads(5);
sessionSettings.setActiveLimit(-1);
sessionSettings.setActiveSeeds(-1);
sessionSettings.setActiveDHTLimit(5);
sessionSettings.setPeerConnectTimeout(25);
sessionSettings.announceDoubleNAT(true);
sessionSettings.setUploadRateLimit(0);
sessionSettings.setDownloadRateLimit(0);
sessionSettings.setHalfOpenLimit(5);
sessionSettings.useReadCache(false);
sessionSettings.setMaxPeerlistSize(500);
My current method is to loop over all my 10k+ torrents, and run torrent.resume(). When using automanage, this basically only starts ~ 50 of the torrents, and the others start about at a rate of 1 torrent per 10 minutes, which wouldn't work. When not using automanage, it chokes my connection.
BUT, when I do only 30 of them, they all seem to seed correctly, so my next plan is try to resume() them in groupings either with a time delay, or after they've received a tracker_reply.
I tried to garner what I could from this, but don't know what my settings should be specifically:
http://blog.libtorrent.org/2012/01/seeding-a-million-torrents/
I'd really appreciate someone sharing their settings for seeding thousands of torrents,
When not using automanage, it chokes my connection.
Since you say it can run either on a hosted server or on a domestic internet connection, you will not have much choice but to throttle torrent startups. Domestic internet connections are generally behind consumer-grade routers and possibly CGNAT, both of which have fairly small NAT tables that will eventually choke on concurrently established TCP connections (peer-to-peer connections, tracker announces) or UDP pseudo-connections (UDP trackers, µTP, DHT).
So to run many torrents at once you will have to limit all active maintenance traffic of that kind so that the torrents are only started to listen passively for incoming connections.
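The throttled startup discussed here (resuming in groups of about 30 with a pause in between, as proposed in the question) can be sketched independently of any torrent library. This is only an illustration in Python: resume stands in for whatever call resumes one torrent (torrent.resume() in the question), and the batch size and delay are assumptions to be tuned against the router's NAT table, not recommended values.

```python
import time
from itertools import islice


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def resume_in_batches(torrents, resume, batch_size=30, delay_seconds=60.0, sleep=time.sleep):
    """Call resume(torrent) for every torrent, pausing between batches."""
    started = 0
    for index, batch in enumerate(batched(torrents, batch_size)):
        if index > 0:
            sleep(delay_seconds)  # let announces and connection attempts settle
        for torrent in batch:
            resume(torrent)
            started += 1
    return started


# Usage with stand-in objects: 70 "torrents", no real waiting.
resumed = []
resume_in_batches(range(70), resumed.append, batch_size=30, delay_seconds=0)
print(len(resumed))
```

Instead of a fixed delay, the pause could wait until the previous batch has received its tracker replies, which is the other option the question mentions.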

Site scraping: why am I getting DNS issues after multiple hits?

I am scraping a site for data every 50-90 seconds randomly using a C# console application running on .NET 4.5. There are a couple of values I am posting to the site, and based on the returned value I kick off some other process. The problem is that after about a thousand hits or so I get what looks like a DNS error. I am trying to sort out what the source of the problem is first, before trying to fix it. Below are some of the errors I see in my logs:
The remote name could not be resolved
Unable to connect to the remote server
Unexpected character encountered while parsing value <. Path '', line 0, position 0.
Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
Unable to read data from the transport connection: An established connection was aborted by the software in your host machine.
About 60% of the time I get the first error. The remaining 40% is divided between the rest of the errors listed above. Are these issues caused by the website I am scraping, by the DNS servers on my end, or by something else? For all practical purposes the website I am scraping is OK with it as long as I keep the interval between automated hits above 45 seconds, which I am doing. The data I am downloading is on average about 30KB per hit. Please help me understand what could be going wrong and what things I could try to fix this.
I'd say you're running against an automated system designed to protect the site against a DDoS attack http://en.wikipedia.org/wiki/Denial-of-service_attack.
It's seeing that your same IP address is hitting it repeatedly in a short space of time and is simply blocking your resolution of the eventual server.

Weird Tomcat outage, possibly related to maxConnections

In my company we experienced a serious problem today: our production server went down. Most people accessing our software via a browser were unable to get a connection, however people who had already been using the software were able to continue using it. Even our hot standby server was unable to communicate with the production server, which it does using HTTP, not even going out to the broader internet. The whole time the server was accessible via ping and ssh, and in fact was quite underloaded - it's normally running at 5% CPU load and it was even lower at this time. We do almost no disk i/o.
A few days after the problem started we have a new variation: port 443 (HTTPS) is responding but port 80 stopped responding. The server load is very low. Immediately after restarting tomcat, port 80 started responding again.
We're using tomcat7, with maxThreads="200", and using maxConnections=10000. We serve all data out of main memory, so each HTTP request completes very quickly, but we have a large number of users doing very simple interactions (this is high school subject selection). But it seems very unlikely we would have 10,000 users all with their browser open on our page at the same time.
My question has several parts:
Is it likely that the "maxConnections" parameter is the cause of our woes?
Is there any reason not to set "maxConnections" to a ridiculously high value e.g. 100,000? (i.e. what's the cost of doing so?)
Does tomcat output a warning message anywhere once it hits the "maxConnections" limit? (We didn't notice anything).
Is it possible there's an OS limit we're hitting? We're using CentOS 6.4 (Linux) and "ulimit -f" says "unlimited". (Do firewalls understand the concept of Tcp/Ip connections? Could there be a limit elsewhere?)
What happens when tomcat hits the "maxConnections" limit? Does it try to close down some inactive connections? If not, why not? I don't like the idea that our server can be held to ransom by people having their browsers on it, sending the keep-alive's to keep the connection open.
But the main question is, "How do we fix our server?"
More info as requested by Stefan and Sharpy:
Our clients communicate directly with this server
TCP connections were in some cases immediately refused and in other cases timed out
The problem is evident even connecting my browser to the server within the network, or with the hot standby server - also in the same network - unable to do database replication messages which normally happens over HTTP
IPTables - yes, IPTables6 - I don't think so. Anyway, there's nothing between my browser and the server when I test after noticing the problem.
More info:
It really looked like we had solved the problem when we realised we were using the default Tomcat7 setting of BIO, which has one thread per connection, and we had maxThreads=200. In fact 'netstat -an' showed about 297 connections, which matches 200 + queue of 100. So we changed this to NIO and restarted tomcat. Unfortunately the same problem occurred the following day. It's possible we misconfigured the server.xml.
The server.xml and extract from catalina.out is available here:
https://www.dropbox.com/sh/sxgd0fbzyvuldy7/AACZWoBKXNKfXjsSmkgkVgW_a?dl=0
More info:
I did a load test. I'm able to create 500 connections from my development laptop, and do an HTTP GET 3 times on each, without any problem. Unless my load test is invalid (the Java class is also in the above link).
It's hard to tell for sure without hands-on debugging but one of the first things I would check would be the file descriptor limit (that's ulimit -n). TCP connections consume file descriptors, and depending on which implementation is in use, nio connections that do polling using SelectableChannel may eat several file descriptors per open socket.
To check if this is the cause:
Find Tomcat PIDs using ps
Check the ulimit the process runs with: cat /proc/<PID>/limits | fgrep 'open files'
Check how many descriptors are actually in use: ls /proc/<PID>/fd | wc -l
If the number of used descriptors is significantly lower than the limit, something else is the cause of your problem. But if it is equal or very close to the limit, it's this limit which is causing issues. In this case you should increase the limit in /etc/security/limits.conf for the user with whose account Tomcat is running and restart the process from a newly opened shell, check using /proc/<PID>/limits if the new limit is actually used, and see if Tomcat's behavior is improved.
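The same check can be scripted. The following is a rough Python sketch of the steps above, assuming a Linux /proc filesystem and enough privileges to read the Tomcat process's entries; the function names are made up for illustration.

```python
import os


def parse_open_files_limit(limits_text):
    """Return (soft, hard) 'Max open files' limits from /proc/<PID>/limits text.

    A limit reported as 'unlimited' is returned as None.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            soft, hard = line[len(prefix):].split()[:2]
            return tuple(None if value == "unlimited" else int(value) for value in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (descriptors in use, soft limit) for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        soft, _hard = parse_open_files_limit(handle.read())
    used = len(os.listdir(f"/proc/{pid}/fd"))
    return used, soft


# Example call, with the Tomcat PID found via ps:
#   used, soft = fd_usage(12345)
#   print(f"{used} of {soft} descriptors in use")
```

If the first number stays close to the second while the outage is happening, the descriptor limit is a likely culprit.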
While I don't have a direct answer to solve your problem, I'd like to offer my methods to find what's wrong.
Intuitively there are 3 assumptions:
If your clients hold their connections and never release them, it is quite possible that your server hits the max connection limit even if there is no communication.
The non-responding state can also be reached in various other ways, such as bugs in the server-side code.
The hardware conditions should not be ignored.
To locate the cause of this problem, you'd better try to replay the scenario in a testing environment. Perform more comprehensive tests and record more detailed logs, including but not limited to:
Unit tests, esp. logic blocks using transactions, threading and synchronization.
Stress-oriented tests. Try to simulate all the user behaviors you can come up with and their combinations and test them in a massive batch mode.
More specific logging. Trace client behaviors and analyze what exactly happened before the server stopped responding.
Replace a server machine and see if it will still happen.
The short answer:
Use the NIO connector instead of the default BIO connector
Set "maxConnections" to something suitable e.g. 10,000
Encourage users to use HTTPS so that intermediate proxy servers can't turn 100 page requests into 100 tcp connections.
Check for threads hanging due to deadlock problems, e.g. with a stack dump (kill -3)
(If applicable and if you're not already doing this, write your client app to use the one connection for multiple page requests).
The long answer:
We were using the BIO connector instead of NIO connector. The difference between the two is that BIO is "one thread per connection" and NIO is "one thread can service many connections". So increasing "maxConnections" was irrelevant if we didn't also increase "maxThreads", which we didn't, because we didn't understand the BIO/NIO difference.
To change it to NIO, put this in the <Connector> element in server.xml:
protocol="org.apache.coyote.http11.Http11NioProtocol"
From what I've read, there's no benefit to using BIO so I don't know why it's the default. We were only using it because it was the default and we assumed the default settings were reasonable and we didn't want to become experts in tomcat tuning to the extent that we now have.
HOWEVER: Even after making this change, we had a similar occurrence: on the same day, HTTPS became unresponsive even while HTTP was working, and then a little later the opposite occurred. Which was a bit depressing. We checked in 'catalina.out' that in fact the NIO connector was being used, and it was. So we began a long period of analysing 'netstat' and wireshark. We noticed some periods of high spikes in the number of connections - in one case up to 900 connections when the baseline was around 70. These spikes occurred when we synchronised our databases between the main production server and the "appliances" we install at each customer site (schools). The more we did the synchronisation, the more we caused outages, which caused us to do even more synchronisations in a downward spiral.
What seems to be happening is that the NSW Education Department proxy server splits our database synchronisation traffic into multiple connections so that 1000 page requests become 1000 connections, and furthermore they are not closed properly until the TCP 4 minute timeout. The proxy server was only able to do this because we were using HTTP. The reason they do this is presumably load balancing - they thought by splitting the page requests across their 4 servers, they'd get better load balancing. When we switched to HTTPS, they are unable to do this and are forced to use just one connection. So that particular problem is eliminated - we no longer see a burst in the number of connections.
People have suggested increasing "maxThreads". In fact this would have improved things but this is not the 'proper' solution - we had the default of 200, but at any given time, hardly any of these were doing anything, in fact hardly any of these were even allocated to page requests.
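For anyone repeating the netstat analysis, a small script that tallies connections per TCP state makes spikes like the 70-to-900 jump easier to spot. This is a sketch in Python that assumes the usual Linux `netstat -an` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the sample output and addresses are invented.

```python
from collections import Counter


def count_tcp_states(netstat_output, local_port):
    """Count TCP connections per state for one local port in `netstat -an` output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP and UNIX socket lines.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[5]
        if local_address.rsplit(":", 1)[-1] == str(local_port):
            states[state] += 1
    return states


SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:80             203.0.113.7:51234       ESTABLISHED
tcp        0      0 10.0.0.5:80             203.0.113.8:51235       ESTABLISHED
tcp        0      0 10.0.0.5:80             203.0.113.9:51236       TIME_WAIT
tcp        0      0 10.0.0.5:443            203.0.113.7:51240       ESTABLISHED
"""

for state, count in sorted(count_tcp_states(SAMPLE, 80).items()):
    print(state, count)
```

Run periodically against live output, a sudden growth in ESTABLISHED or TIME_WAIT entries on one port would line up with the synchronisation bursts described above.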
I think you need to load-test the application with Apache JMeter to check the number of connections, and use JConsole or Zabbix to inspect the heap space and thread dumps of the Tomcat server.
The NIO connector of Apache Tomcat can have a maximum of 10000 connections, but I don't think it's a good idea to give that many connections to one instance of Tomcat; a better way is to run multiple instances of Tomcat.
In my view, the best setup for a production server is to run Apache HTTP Server in front and connect it to your Tomcat instance using the AJP connector.
Hope this helps.
Are you absolutely sure you're not hitting the maxThreads limit? Have you tried changing it?
These days browsers limit simultaneous connections to a max of 4 per hostname/ip, so if you have 50 simultaneous browsers, you could easily hit that limit. Although hopefully your webapp responds quickly enough to handle this. Long polling has become popular these days (until websockets are more prevalent), so you may have 200 long polls.
Another cause could be if you use HTTP[S] for app-to-app communication (that is, no browser involved). Sometimes app writers are sloppy and create new connections for performing multiple tasks in parallel, causing TCP and HTTP overhead. Double check that you are not getting a flood of requests. Log files can usually help you with this, or you can use wireshark to count the number of HTTP requests or HTTP[S] connections. If possible, modify your API to handle multiple API calls in one HTTP request.
Related to the last one, if you have many HTTP/1.1 requests going across one connection, an intermediate proxy may be splitting them into multiple connections for load balancing purposes. Sounds crazy I know, but I've seen it happen.
Lastly, some crawl bots ignore the crawl delay set in robots.txt. Again, log files and/or wireshark can help you determine this.
Overall, run more experiments with more changes (maxThreads, HTTPS, etc.) before jumping to conclusions about maxConnections.

Resources