flow-based traffic classification for traffic shaping

flow-based traffic classification for traffic shaping - linux

I’m wondering if there are ways to achieve flow-based traffic shaping with linux.
Traditional traffic shaping approaches seem be based on creating classes for specific protocols or types of packets (such as ssh, http, SYN or ACK) that need high troughput.
Here I want to see every TCP connection as a flow characterized by a certain data-rate.
There’ll be
quick flows such as interactive ssh or IRC chat and
slow flows (bulk data) such as scp or http file transfers
Now I’m looking for a way to characterize / classify an incoming packet to one of these classes, so I can run a tc based traffic shaper on it. Any hints?

Since you mention a dedicated machine I'll assume that you are managing from a network bridge and, as such, have access to the entirety of the packet for the lifetime it is in your system.
First and foremost: throttling at the receiving side of a connection is meaningless when you are speaking of link saturation. By the time you see the packet it has already consumed resources. This is true even if you are a bridge; you can only realistically do anything intelligent on the egress interface.
I don't think you will find an off-the-shelf product that is going to do exactly what you want. You are going to have to modify something like dummynet to be dynamic according to rules you derive during execution or you are going to have to program a dynamic software router using some existing infrastructure. One I am familiar with is Click modular router, but there are others. I really dont know how things like tc and ipfw will react to being configured/reconfigured with high frequency - I suspect poorly.
There are things that you should address ahead of time, however. Things that are going to make this task difficult regardless of the implementation. For instance,
How do you plan on differentiating between scp bulk and ssh interactive behavior? Will you monitor initial behavior and apply a rule based on that?
You mention HTTP-specific throttling; this implies DPI. Will you be able to support that on this bridge/router? How many classes of application traffic will you support?
How do you plan on handling contention? (you allot for 'bulk' flows to each get 30% of the capacity but get 10 'bulk' flows trying to consume)
Will you hard-code the link capacity or measure it? Is it fixed or will it vary?
In general, you can get a fairly rough idea of 'flow' by just hashing the networking 5-tuple. Once you start dealing with applications semantics, however, all bets are off and you need to plow through packet contents to get what you want.
If you had a more specific purpose it might render some of these points moot.

Related

How to protect a website from DoS attacks

What is the best methods for protecting a site form DoS attack. Any idea how popular sites/services handles this issue?.
what are the tools/services in application, operating system, networking, hosting levels?.
it would be nice if some one could share their real experience they deal with.
Thanks

Sure you mean DoS not injections? There's not much you can do on a web programming end to prevent them as it's more about tying up connection ports and blocking them at the physical layer than at the application layer (web programming).
In regards to how most companies prevent them is a lot of companies use load balancing and server farms to displace the bandwidth coming in. Also, a lot of smart routers are monitoring activity from IPs and IP ranges to make sure there aren't too many inquiries coming in (and if so performs a block before it hits the server).
Biggest intentional DoS I can think of is woot.com during a woot-off though. I suggest trying wikipedia ( http://en.wikipedia.org/wiki/Denial-of-service_attack#Prevention_and_response ) and see what they have to say about prevention methods.

I've never had to deal with this yet, but a common method involves writing a small piece of code to track IP addresses that are making a large amount of requests in a short amount of time and denying them before processing actually happens.
Many hosting services provide this along with hosting, check with them to see if they do.

I implemented this once in the application layer. We recorded all requests served to our server farms through a service which each machine in the farm could send request information to. We then processed these requests, aggregated by IP address, and automatically flagged any IP address exceeding a threshold of a certain number of requests per time interval. Any request coming from a flagged IP got a standard Captcha response, if they failed too many times, they were banned forever (dangerous if you get a DoS from behind a proxy.) If they proved they were a human the statistics related to their IP were "zeroed."

Well, this is an old one, but people looking to do this might want to look at fail2ban.
http://go2linux.garron.me/linux/2011/05/fail2ban-protect-web-server-http-dos-attack-1084.html
That's more of a serverfault sort of answer, as opposed to building this into your application, but I think it's the sort of problem which is most likely better tackled that way. If the logic for what you want to block is complex, consider having your application just log enough info to base the banning policy action on, rather than trying to put the policy into effect.
Consider also that depending on the web server you use, you might be vulnerable to things like a slow loris attack, and there's nothing you can do about that at a web application level.

Fully Decentralized P2P?

I’m looking at creating a P2P system. During initial research, I’m reading from Peer-to-Peer – Harnessing the Power of Disruptive Technologies. That book states “a fully decentralized approach to instant messaging would not work on today's Internet.” Mostly blaming firewalls and NATs. The copyright is 2001. Is this information old or still correct?

It's still largely correct. Most users still are behind firewalls or home routers that block incoming connections. Those can be opened easier today than in 2001 (using uPnP for example, requiring little user interaction and knowledge) but most commercial end-user-targeting applications - phone (Skype, VoIP), chat (the various Messengers), remote control - are centralized solutions to circumvent firewall problems.

I would say that it is just plain wrong, both now and then. Yes, you will have many nodes that will be firewalled, however, you will also have a significant number who are not. So, if end-to-end encryption is used to protect the traffic from snooping, then you can use non-firewalled clients to act as intermediaries between two firewalled clients that want to chat.
You will need to take care, however, to spread the load around, so that a few unfirewalled clients aren't given too much load.
Skype uses a similar idea. They even allow file transfers through intermediaries, though they limit the through-put so as not to over load the middle-men.
That being said, now in 2010, it is a lot easier to punch holes in firewalls than it was in 2001, as most routers will allow you to automate the opening of ports via UPNP, so you are likely to have a larger pool of unfirewalled clients to work with.

Firewalls and NATs still commonly disrupt direct peer-to-peer communication between home-based PCs (and also between home-based PCs and corporate desktops).
They can be configured to allow particular peer-to-peer protocols, but that remains a stumbling block for most unsavvy users.

I think the original statement is no longer correct. But the field of Decentralized Computing is still in its infancy, with little serious contenders.
Read this interesting post on ZeroTier (thanks to #joehand): The State of NAT Traversal:
NAT is Traversable
In reading the Internet chatter on this subject I've been shocked by how many people don't really understand this, hence the reason this post was written. Lots of people think NAT is a show-stopper for peer to peer communication, but it isn't. More than 90% of NATs can be traversed, with most being traversable in reliable and deterministic ways.
At the end of the day anywhere from 4% (our numbers) to 8% (an older number from Google) of all traffic over a peer to peer network must be relayed to provide reliable service. Providing relaying for that small a number is fairly inexpensive, making reliable and scalable P2P networking that always works quite achievable.
I personally know of Dat Project, a decentralized data sharing toolkit (based on their hypercore protocol for P2P streaming).
From their Dat - Distributed Dataset Synchronization And Versioning paper:
Peer Connections
After the discovery phase, Dat should have a list of
potential data sources to try and contact. Dat uses
either TCP, UTP, or HTTP. UTP is designed to not
take up all available bandwidth on a network (e.g. so
that other people sharing wifi can still use the Inter-
net), and is still based on UDP so works with NAT
traversal techniques like UDP hole punching.
HTTP is supported for compatibility with static file servers and
web browser clients. Note that these are the protocols
we support in the reference Dat implementation, but
the Dat protocol itself is transport agnostic.
Furthermore you can use it with Bittorrent DHT. The paper also contains some references to other technologies that inspired Dat.
For implementation of peer discovery, see: discovery-channel
Then there is also IPFS, or 'The Interplanetary File System' which is currently best positioned to become a standard.
They have extensive documentation on their use of DHT and NAT traversal to achieve decentralized P2P.

The session messenger seem to have solved the issue with a truly decentralized p2p messenger by using a incentivized mixnet to relay and store messages. Its a fork of the Signal messenger with a mixnet added in. https://getsession.org -- whitepaper: https://getsession.org/wp-content/uploads/2020/02/Session-Whitepaper.pdf

It's very old and not correct. I believe there is a product out called Tribler (news article) which enables BitTorrent to function in a fully decentralized way.
If you want to go back a few years (even before that document) you could look at Windows. Windows networking used to function in a fully decentralized way. In some cases it still does.
UPNP is also decentralized in how it determines available devices on your local network.
In order to be decentralized you need to have a way to locate other peers. This can be done proactively by scanning the network (time consuming) or by having some means of the clients announcing that they are available.
The announcements can be simple UDP packets that get broadcast every so often to the subnet which other peers listen for. Another mechanism is broadcasting to IIRC channels (most common for command and control of botnets), etc. You might even use twitter or similar services. Use your imagination here.
Firewalls don't really play a part because they almost always leave open a few ports, such as 80 (http). Obviously you couldn't browse the network if that was closed. Now if the firewall is configured to only allow connections that originated from internal clients, then you'd have a little more work to do. But not much.
NATs are also not a concern for similiar issues.

How many open udp or tcp/ip connections can a linux machine have?

There are limits imposed by available memory, bandwidth, CPU, and of course, the network connectivity. But those can often be scaled vertically. Are there any other limiting factors on linux? Can they be overcome without kernel modifications? I suspect that, if nothing else, the limiting factor would become the gigabit ethernet. But for efficient protocols it could take 50K concurrent connections to swamp that. Would something else break before I could get that high?
I'm thinking that I want a software udp and/or tcp/ip load balancer. Unfortunately nothing like that in the open-source community seems to exist, except for the http protocol. But it is not beyond my abilities to write one using epoll. I expect it would go through a lot of tweaking to get it to scale, but that's work that can be done incrementally, and I would be a better programmer for it.

The one parameter you will probably have some difficulty with is jitter. Has you scale the number of connections per box, you will undoubtedly put strain on all the resources of the said system. As a result, the jitter characteristics of the forwarding function will likely suffer.
Depending on your target requirements, that might or not be an issue: if you plan to support mainly elastic traffic (traffic which does not suffer much from jitter and latency) then it's ok. If the proportion of inelastic traffic is high (e.g. interactive voice/video), then this might be more of an issue.
Of course you can always over engineer in this case ;-)

If you intend to have a server which holds one socket open per client, then it needs to be designed carefully so that it can efficiently check for incoming data from 10k+ clients. This is known as the 10k problem.
Modern Linux kernels can handle a lot more than 10k connections, generally at least 100k. You may need some tuning, particularly the many TCP timeouts (if using TCP) to avoid closing / stale sockets using up lots of resource if a lot of clients connect and disconnect frequently.
If you are using netfilter's conntrack module, that may also need tuning to track that many connections (this is independent of tcp/udp sockets).
There are lots of technologies for load balancing, the most well-known is LVS (Linux Virtual Server) which can act as the front end to a cluster of a real servers. I don't know how many connections it can handle, but I think we use it with at least 50k in production.

To your question, you are only restrained by hardware limitations. This was the design philosophy for linux systems. You are describe exactly what would be your limiting factors.

Try HAProxy software load balancer:
http://haproxy.1wt.eu/

What protocol should I use for fast command/response interactions?

I need to set up a protocol for fast command/response interactions. My instinct tells me to just knock together a simple protocol with CRLF separated ascii strings like how SMTP or POP3 works, and tunnel it through SSH/SSL if I need it to be secured.
While I could just do this, I'd prefer to build on an existing technology so people could use a friendly library rather than the socket library interface the OS gives them.
I need...
Commands and responses passing structured data back and forth. (XML, S expressions, don't care.)
The ability for the server to make unscheduled notifications to the client without being polled.
Any ideas please?

If you just want request/reply, HTTP is very simple. It's already a request/response protocol. The client and server side are widely implemented in most languages. Scaling it up is well understood.
The easiest way to use it is to send commands to the server as POST requests and for the server to send back the reply in the body of the response. You could also extend HTTP with your own verbs, but that would make it more work to take advantage of caching proxies and other infrastructure that understands HTTP.
If you want async notifications, then look at pub/sub protocols (Spread, XMPP, AMQP, JMS implementations or commercial pub/sub message brokers like TibcoRV, Tibco EMS or Websphere MQ). The protocol or implementation to pick depends on the reliability, latency and throughput needs of the system you're building. For example, is it ok for notifications to be dropped when the network is congested? What happens to notifications when a client is off-line -- do they get discarded or queued up for when the client reconnects.

AMQP sounds promising. Alternatively, I think XMPP supports much of what you want, though with quite a bit of overhead.
That said, depending on what you're trying to accomplish, a simple ad hoc protocol might be easier.

How about something like SNMP? I'm not sure if it fits exactly with the model your app uses, but it supports both async notify and pull (i.e., TRAP and GET).

That's a great question with a huge number of variables to consider, and the question only mentioned a few them: packet format, asynchronous vs. synchronized messaging, and security. There are many, many others one could think about. I suggest going through a description of the 7-layer protocol stack (OSI/ISO) and asking yourself what you need at those layers, and whether you want to build that layer or get it from somewhere else. (You seem mostly interested in layer 6 and 7, but also mentioned bits of lower layers.)
Think also about whether this is in a safety-critical application or part of a system with formal V&V. Really good, trustworthy communication systems are not easy to design; also an "underpowered" protocol can put a lot of coding burden on application to do error-recovery.
Finally, I would suggest looking at how other applications similar to yours do the job (check open source, read books, etc.) Also useful is the U.S. Patent Office database, etc; one can get great ideas just from reading the description of the communication problem they were trying to solve.

Dynamic IP-based blacklisting

Folks, we all know that IP blacklisting doesn't work - spammers can come in through a proxy, plus, legitimate users might get affected... That said, blacklisting seems to me to be an efficient mechanism to stop a persistent attacker, given that the actual list of IP's is determined dynamically, based on application's feedback and user behavior.
For example:
- someone trying to brute-force your login screen
- a poorly written bot issues very strange HTTP requests to your site
- a script-kiddie uses a scanner to look for vulnerabilities in your app
I'm wondering if the following mechanism would work, and if so, do you know if there are any tools that do it:
In a web application, developer has a hook to report an "offense". An offense can be minor (invalid password) and it would take dozens of such offenses to get blacklisted; or it can be major, and a couple of such offenses in a 24-hour period kicks you out.
Some form of a web-server-level block kicks in on before every page is loaded, and determines if the user comes from a "bad" IP.
There's a "forgiveness" mechanism built-in: offenses no longer count against an IP after a while.
Thanks!
Extra note: it'd be awesome if the solution worked in PHP, but I'd love to hear your thoughts about the approach in general, for any language/platform

Take a look at fail2ban. A python framework that allows you to raise IP tables blocks from tailing log files for patterns of errant behaviour.

are you on a *nix machine? this sort of thing is probably better left to the OS level, using something like iptables
edit:
in response to the comment, yes (sort of). however, the idea is that iptables can work independently. you can set a certain threshold to throttle (for example, block requests on port 80 TCP that exceed x requests/minute), and that is all handled transparently (ie, your application really doesn't need to know anything about it, to have dynamic blocking take place).
i would suggest the iptables method if you have full control of the box, and would prefer to let your firewall handle throttling (advantages are, you don't need to build this logic into your web app, and it can save resources as requests are dropped before they hit your webserver)
otherwise, if you expect blocking won't be a huge component, (or your app is portable and can't guarantee access to iptables), then it would make more sense to build that logic into your app.

I think it should be a combination of user-name plus IP block. Not just IP.

you're looking at custom lockout code. There are applications in the open source world that contain various flavors of such code. Perhaps you should look at some of those, although your requirements are pretty trivial, so mark an IP/username combo, and utilize that for blocking an IP for x amount of time. (Note I said block the IP, not the user. The user may try to get online via a valid IP/username/pw combo.)
Matter of fact, you could even keep traces of user logins, and when logging in from an unknown IP with a 3 strikes bad username/pw combo, lock that IP out for however long you like for that username. (Do note that a lot of ISPs share IPs, thus....)
You might also want to place a delay in authentication, so that an IP cannot attempt a login more than once every 'y' seconds or so.

I have developed a system for a client which kept track of hits against the web server and dynamically banned IP addresses at the operating system/firewall level for variable periods of time for certain offenses, so, yes, this is definitely possible. As Owen said, firewall rules are a much better place to do this sort of thing than in the web server. (Unfortunately, the client chose to hold a tight copyright on this code, so I am not at liberty to share it.)
I generally work in Perl rather than PHP, but, so long as you have a command-line interface to your firewall rules engine (like, say, /sbin/iptables), you should be able to do this fairly easily from any language which has the ability to execute system commands.

err this sort of system is easy and common, i can give you mine easily enough
its simply and briefly explained here http://www.alandoherty.net/info/webservers/
the scripts as written arn't downloadable {as no commentry currently added} but drop me an e-mail, from the site above, and i'll fling the code at you and gladly help with debugging/taloring it to your server

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string