Concerning Tor relays, what is the difference between ORPort and DirPort? - tor

I'm setting up a relay. My understanding is that the ORPort has to be reachable from the outside world to relay traffic, but do I need to enable the DirPort too?

No, you don't need to enable DirPort. If you set the DirPort, your relay will also serve as a directory mirror.
As a directory mirror, Tor clients and other relays can query you for information about relays on the network (for example, to get a list of relays for building circuits).
Note that enabling this can significantly increase your bandwidth usage (full directory listings can be fairly large) and you'll have a lot more incoming connections. For example, one of my relays serving as a mirror has over 1200 incoming connections for dir requests and pushes an extra 10-20 Mbps as a result.
I believe the BandwidthRate option also limits directory traffic (RelayBandwidthRate is a separate option that applies to relayed traffic).
If you just want to run a relay, it's fine to leave DirPort at 0 so you can dedicate as much bandwidth as possible to relaying. There are already plenty of relays acting as mirrors, so capacity there is in good shape, but running one when you can is still encouraged.
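For reference, a minimal torrc along those lines might look like the sketch below. The option names are real torrc options, but the nickname and rate values are placeholders, and 9001 is just the conventional ORPort.

# placeholder nickname; ORPort is required, DirPort 0 means no directory mirror
Nickname MyRelay
ORPort 9001
DirPort 0
ExitRelay 0
# optional caps on relayed traffic
RelayBandwidthRate 10 MBytes
RelayBandwidthBurst 20 MBytes

Setting DirPort to a port such as 9030 instead would turn the relay into a directory mirror as described above.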

Related

Load balancing sockets on a horizontally scaling WebSocket server?

Every few months, when thinking through a personal project that involves sockets, I find myself asking: how would you properly load balance sockets on a dynamically, horizontally scaling WebSocket server?
I understand the theory behind horizontally scaling WebSockets and using pub/sub models to get data to the right server that holds the socket connection for a specific user. I think I understand ways to effectively identify the server with the fewest current socket connections, the one I would want to route a new socket connection to. What I don't understand is how to effectively route new socket connections to the server you've picked with the low socket count.
I don't imagine this answer would be tied to a specific server implementation, but rather could be applied to most servers. I could easily see myself implementing this with Vert.x, Node.js, or even Perfect.
First off, you need to define the bounds of the problem you're asking about. If you're truly talking about dynamic horizontal scaling, where you spin servers up and down based on total load, then that's an even more involved problem than just figuring out where to route the newest incoming socket connection.
To solve that problem, you have to have a way of "moving" a socket from one host to another so you can clear connections off a host that you want to spin down (I'm assuming here that true dynamic scaling goes both up and down). The usual way I've seen that done is with a cooperating client: you tell the client to reconnect, and when it reconnects it is load balanced onto a different server, so you can clear off the one you wanted to spin down. If your client already has auto-reconnect logic (like socket.io does), you can just have the server close the connection and the client will automatically reconnect.
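For instance, with socket.io (v4 assumed; this is only a sketch of the drain step, and you would stop routing new connections to this host first), the retiring server can simply kick everyone off and rely on the clients' auto-reconnect:

// hypothetical drain step on the server being retired (socket.io v4 assumed)
const { Server } = require('socket.io');
const io = new Server(3000);
// ... normal connection handling lives here ...
// when this host should be drained:
io.disconnectSockets(true); // close underlying connections; clients auto-reconnect via the balancer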
As for load balancing the incoming client connections, you have to decide what load metric you want to use. Ultimately, you need a score for each server process that tells you how "busy" you think it is, so you can put new connections on the least busy server. A rudimentary score is just the number of current connections. If you have large numbers of connections per server process (tens of thousands) and there's no particular reason in your app that some might be a lot busier than others, then the law of large numbers probably averages out the load, and you can get away with just using each server's connection count. If the use of connections is not that fair or even, then you may also have to factor in some sort of moving average of CPU load along with the total number of connections.
If you're going to load balance across multiple physical servers, then you will need a load balancer or proxy service that everyone connects to initially. That proxy can look at the metrics for all currently running servers in the pool and assign the connection to the one with the lowest current score. That can be done either with a proxy scheme or (more scalably) via a redirect, so the proxy gets out of the way after the initial assignment.
You could then also have a process that regularly examines the load score (however you decide to calculate it) on all the servers in the cluster and decides when to spin a new server up, when to spin one down, or when things are too far out of balance on a given server, in which case that server can be told to kick several connections off, forcing them to rebalance.
What I don't understand is how to effectively route new socket connections to the server you've picked with low socket count.
As described above, you either use a proxy scheme or a redirect scheme. At a slightly higher cost at connection time, I favor the redirect scheme because it's more scalable when running and creates fewer points of failure for an existing connection. All clients connect to your incoming-connection gateway server, which is responsible for knowing the current load score for each of the servers in the farm; based on that, it assigns each incoming connection to the host with the lowest score and redirects the client to reconnect to that specific server in your farm.
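As a rough illustration of that redirect scheme (not tied to any particular framework; the backend URLs, the /assign route, and the naive score tracking are all made up for the sketch), a small Node.js gateway could hand each client the address of the least-loaded WebSocket server:

// hypothetical gateway: clients GET /assign, then open their WebSocket
// against the URL returned, so the gateway stays out of the data path
const http = require('http');

const backends = [
  { url: 'wss://ws1.example.com', score: 0 },
  { url: 'wss://ws2.example.com', score: 0 },
];

http.createServer((req, res) => {
  if (req.url === '/assign') {
    // pick the backend with the lowest current score (e.g. connection count);
    // in practice the backends would report their real scores to the gateway
    const target = backends.reduce((a, b) => (a.score <= b.score ? a : b));
    target.score += 1;
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ connectTo: target.url }));
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8080);

The client then connects directly to the returned connectTo address, so after the initial assignment the gateway is no longer involved in that connection.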
I have also seen load balancing done purely with a custom DNS implementation. The client requests the IP address for farm.somedomain.com, and that custom DNS server gives it the IP address of the host it wants that client assigned to. Each client that looks up the IP address for farm.somedomain.com may get a different IP address. You spin hosts up or down by adding or removing them from the custom DNS server, and it is that DNS server that has to contain the load-balancing logic and know the current load scores of all the running hosts.
Route the websocket requests to a load balancer that makes the decision about where to send the connections.
As an example, HAProxy has a leastconn balancing method, recommended for long-lived connections, which picks the server with the lowest current connection count.
The HAProxy backend server weights can also be modified at runtime by external inputs; @jfriend00 detailed the technicalities of weighting in their answer.
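For illustration, a backend section along those lines might look like the fragment below (server names, addresses, and timeout values are placeholders; the long tunnel timeout keeps idle WebSocket tunnels from being cut):

backend ws_farm
    balance leastconn
    timeout tunnel 1h
    server ws1 10.0.0.11:8080 check weight 100
    server ws2 10.0.0.12:8080 check weight 100

With an admin-level stats socket enabled, the weights can then be adjusted on the fly over the runtime API, e.g. set server ws_farm/ws1 weight 50.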
I found this project that might be useful:
https://github.com/apundir/wsbalancer
A snippet from the description:
Websocket balancer is a stateful reverse proxy for websockets. It distributes incoming websockets across multiple available backends. In addition to load balancing, the balancer also takes care of transparently switching from one backend to another in case of mid session abnormal failure.
During this failover, the remote client connection is retained as-is, so the remote client does not even see the failover. Every attempt is made to ensure that no messages are dropped during the switch.
Regarding your question: yes, the new connection will be routed by the load balancer, if it is configured to do so.
As @Matt mentioned, this can be done with HAProxy using the leastconn option, for example.

Sniffing the traffic from a TOR exit node

A question regarding the traffic coming out of a Tor exit node:
I have been reading a forum where people argue about the capabilities and risks of using the Tor network. I have never used Tor before, nor do I have any need to, but I still want to know more about it.
I understand Tor routes traffic through randomly selected relays, but the traffic eventually comes out of an exit node somewhere. I have read that such traffic can be used to trace the user.
What I don't understand is: if this traffic can be analysed, wouldn't it just show the requests coming from the last relay instead of the original IP? Or does it show the entire trail, including all the relay nodes the traffic has passed through?
Say this traffic can indeed be traced; does using encryption make any difference? If I were running an exit node (I'm not, I know the risks) and analysed the exiting traffic that is encrypted, could I still trace the original IP?
What if the user:
*is on open Wi-Fi > connects to it with a laptop with dual NICs > is running a live USB OS with, say, a Squid box as a proxy > connects to it with another laptop > connects to a VPN > uses Tor with encryption
Is there a way for a normal user or a researcher, without the ample resources that a government or law enforcement has, to still analyse the exiting traffic and trace the original IP?
Thanks in advance.
Since an exit relay is responsible for relaying traffic out to the internet, it can see the contents of that traffic if it uses an unencrypted protocol (e.g. HTTP).
For that reason, you should avoid sending sensitive data over Tor unencrypted whenever possible. The guard (entry) and middle (relay) nodes can't see the actual traffic; only the exit can. Only the guard node can see your true IP address.
The exit, while it can see the actual traffic and the destination, has no way of knowing your IP. If it could, you'd be far less anonymous when using Tor.
The threats here are if an adversary controls many relays. One of the worst case scenarios for being tracked through Tor would be if you selected a circuit where the guard node and exit node were controlled by the same adversary.
In this scenario, they could see your actual source IP address and your exit traffic (if unencrypted) or at the very least the destination for your exit traffic.
The other tricky part is correlating your entry traffic with your exit traffic. Whether entry traffic to a relay they control can be linked to exit traffic from another relay they control comes down to timing and traffic analysis.
To understand more, you first need to understand how Tor works on a basic level, for that see the documentation page and the overview. Then, search for things like "Tor traffic analysis", "Tor traffic fingerprinting", "Tor timing attacks", and "Tor traffic correlation" to understand more and the research being done to defend against it.
More recent versions of Tor have started padding traffic to make smaller and larger flows harder to distinguish from each other, and much past research has gone into relay selection to reduce the chances of randomly selecting malicious exits or guards.
Hope that helps.

Linux webserver load balance howto

I need to deliver a lot of HTTP content (let's keep it simple: a big storage service with HTTP access, similar to AWS S3).
The bandwidth needed for this exceeds the bandwidth of one server (we get 200 Mbit/s per server, and changing that is not part of the question).
For our program we need 1 Gbit/s, which would mean 5 servers.
When I connect them together with mod_proxy, I have one server in front which only has 200 Mbit/s, so that's not the right way.
But these servers must be accessible from the web under one domain name. Is there a way to do that? For example: one server gets the HTTP request, but the response comes from a different server?
DNS round robin?
A different idea?
Thanks
If the outbound network traffic is not CPU limited, you can use this open source Linux Network Balancer:
http://lnlb.sourceforge.net/
The inbound network speed will remain at 200 Mbit/s, but with five nodes the maximum outbound limit is 5 × 200 Mbit/s = 1 Gbit/s.
A lot of people condemn round-robin DNS, perhaps assuming that it will take a full TCP timeout to detect a failed node, which is simply not the case. It's a simple way to solve the performance problem and it improves availability a lot. It also helps to avoid a potential bottleneck on your LAN without having to go to 10 Gbit Ethernet, which would be required between a router and a load balancer for the rate of traffic you describe.
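As a sketch, round-robin DNS here is just one name with five A records in the zone (the name, TTL, and addresses below are placeholders):

storage  300  IN  A  203.0.113.11
storage  300  IN  A  203.0.113.12
storage  300  IN  A  203.0.113.13
storage  300  IN  A  203.0.113.14
storage  300  IN  A  203.0.113.15

Resolvers and authoritative servers typically rotate the order of the returned records, so clients spread across the hosts; keeping the TTL low lets a removed host drop out of caches quickly.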
There may be scope for getting more throughput from your servers and hence only needing 3 or 4 servers rather than 5. But that's a very different question.

Using DNS for failover using multiple A records

It has recently come to my attention that setting up multiple A records for a hostname can be used not only for round-robin load-balancing but also for automatic failover.
So I tried testing it:
I loaded a page from our domain
Noted which of our servers had served the page
Turned off the web server on that host
Reloaded the page
And indeed the browser automatically tried a different server to load the page. This worked in Opera, Safari, IE, and Firefox. Only Chrome failed to try a different server.
But after leaving that server offline for a few minutes and looking at the access logs, I found that the number of requests to the other servers had not significantly increased. With 1 out of 3 servers offline, I had expected accesses to each of the remaining 2 servers to roughly increase by 50%, but instead I only saw 7-10%. That can only mean DNS-based failover does not work for the majority of browsers/visitors, which directly contradicts what I had just tested.
Does anyone have an idea what is up with DNS-based web browser failover? What possible reason could there be why automatic failover works for me but not the majority of our visitors?
What's happening is that the browsers are not doing automatic DNS failover.
If you have multiple A records on a domain, then when your nameserver looks up the IP for the name you typed into your browser, it requests the records from the zone's authoritative server (the one named in the SOA). The answer it passes along could be any of those A records.
Some nameservers are 'smart' enough to request a new A record if the one they hand out doesn't work, and some aren't. So if you set multiple A records, you will have set up a pseudo-redundancy failover, but only for those people with 'smart' nameservers. The rest get a toss of the dice on which IP they get: if it works, good; if not, the page will fail to load, as it did for you in Chrome.
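You can check which A records (and in what order) your resolver actually hands back with a quick lookup (the domain is a placeholder); running it a few times will often show the order rotating:

dig +short www.example.com A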
If you want to test this specifically, you can use your hosts file (C:\Windows\system32\drivers\etc\hosts on Windows, /etc/hosts on Linux) to pin a domain to a particular IP and see whether you get a true failover - because what you'll run into in practice is that DNS servers across the net will cache your domain name resolution based on its TTL. So if and when you get a real failure, clients will keep using the cached IP until the TTL expires and the name is resolved again.
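For example, a single hosts-file line pins your browser to one server while you test (the IP and hostname are placeholders); swap the address to simulate moving to another host without waiting on DNS caches:

203.0.113.12    www.example.com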
Another possible explanation is that, for most public websites, the bulk of traffic comes from bots rather than browsers. Depending on the bot, it is possible that they aren't quite as smart as browsers when it comes to handling multiple A records for a domain.
Also, some bots use keep-alives to keep TCP connections open and make multiple HTTP requests over the same connection. Given that the DNS lookup is only done when a connection is made, they will continue to make requests to the old IP address for at least as long as the connection is kept open.
If the above explanation has any weight, you should be able to see it in your logs by examining the user-agent strings.
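Assuming the common combined log format (where the user agent is the sixth double-quote-delimited field), a quick tally of the user agents still hitting a server might look like:

awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn | head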

How can I download a file over multiple interfaces in OS X or Linux?

I have a large file I want to download from a server I have root access to. I also have several different, concurrent internet connections from my machine to the server at my disposal.
Do you know of any protocol, (S)FTP client, HTTP client, AFP client, or any other file transfer protocol server and client combination that supports multithreaded downloads over different connections?
One option would be the "old fashioned" multi-part file approach:
split -b 50m hugefile multiparthugefile_
That will create multiparthugefile_aa, multiparthugefile_ab, and so on. To rejoin them, use the cat command:
cat multiparthugefile_* > hugefile_rejoined
To actually transfer the files using different interfaces, the wget --bind-address=ADDRESS flag should work:
--bind-address=ADDRESS bind to ADDRESS (hostname or IP) on local host.
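Putting those pieces together, a sketch might look like the following (the local source addresses and the server URL are placeholders; each address must route out over a different interface, and you run one wget per part):

wget --bind-address=192.0.2.10 http://server.example.com/multiparthugefile_aa &
wget --bind-address=192.0.2.11 http://server.example.com/multiparthugefile_ab &
wait
cat multiparthugefile_* > hugefile_rejoined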
This problem seems like something Bittorrent is positioned to do well, but I'm not sure exactly how you would do this..
Perhaps create a temporary tracker (or use something like OpenBitTorrent.com) and run multiple clients locally - as long as the clients support the LAN transfer feature, each client would grab different parts from the server and share them with the (local) clients. You'd end up with multiple copies of the file locally, but it would only be transferred over the internet once.
Any of these? You'll need a webserver hosting the same file on all the interfaces though.
In the case of HTTP or HTTPS, as long as the server supports range requests, you can fetch the ranges separately and stitch them together. I started working on the use case you describe; if you are still interested, here is a link to my repository: https://github.com/m0hithreddy/MID.
The program (MID) uses the SO_BINDTODEVICE socket option to bind to a specific interface, so in most cases you need superuser permissions or the CAP_NET_RAW capability (which the root user has).
MID determines the network interfaces to use for the download and adopts a two-step split for downloading the content.
First step: the file is divided among the network interfaces (in real time).
Second step: the portion assigned to each interface is further divided among several HTTP range requests issued from that interface (note: the server has to support range requests in the first place for any of this to work).
MID supports the HTTP and HTTPS protocols.
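To try the same idea by hand without MID, curl can bind to an interface and fetch a byte range (the interface names, URL, and split point below are placeholders, and the server still has to honor Range requests):

curl --interface eth0 --range 0-52428799 -o part.0 http://server.example.com/hugefile &
curl --interface eth1 --range 52428800- -o part.1 http://server.example.com/hugefile &
wait
cat part.0 part.1 > hugefile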
Cheers :)
HTTP - check out one of the various download managers (e.g. Firefox with the http://www.downthemall.net/ extension).
There are also FTP downloaders that support multiple streams.
