Can one intercept data outside of their local network?

Can one intercept data outside of their local network? - security

Is it possible to access and intercept data transmissions between two hosts which are on separate subnets but still on the internet.
example say intercepting two hosts located in say japanese ISP subnet by an attacker located in a US ISP subnet without the use of malware or physical access or ISP intervention?
or is it just movie stuff?

There are things you can do, to gain access to specific users data-streams with sufficient time, effort and energy. It does not require malware or viruses, nor does it require physical access to the target networks.
What is does require is persistence and an awful lot of late nights and long days, filled with, quite often not a great deal beyond probes and tests for vulnerable network or user kit. Once you find a way through perimeter firewalls and networks, there is still a fair amount of work to go, however, there are things you can do on perimeter routers and switches, that allows you to dump copious amounts of data, flowing across the ports to a disk or elsewhere for inspection and fact-finding on your theoretical targets..
This is just one, very quickly drafted way, of achieving your specified goal - but I can assure you, it is almost never like you see it at the movies..
HTH

Related

Is basic HTTP auth in CouchDB safe enough for replication across EC2 regions?

I can appreciate that seeing "basic auth" and "safe enough" in the same sentence is a lot like reading "Is parachuting without a parachute still safe?", so I'll do my best to clarify what I am getting at.
From what I've seen online, people typically describe basic HTTP auth as being unsecured due to the credentials being passed in plain text from the client to the server; this leaves you open to having your credentials sniffed by a nefarious person or man-in-the-middle in a network configuration where your traffic may be passing through an untrusted point of access (e.g. an open AP at a coffee shop).
To keep the conversation between you and the server secure, the solution is to typically use an SSL-based connection, where your credentials might be sent in plain text, but the communication channel between you and the server is itself secured.
So, onto my question...
In the situation of replicating one CouchDB instance from an EC2 instance in one region (e.g. us-west) to another CouchDB instance in another region (e.g. singapore) the network traffic will be traveling across a path of what I would consider "trusted" backbone servers.
Given that (assuming I am not replicating highly sensitive data) would anyone/everyone consider basic HTTP auth for CouchDB replication sufficiently secure?
If not, please clarify what scenarios I am missing here that would make this setup unacceptable. I do understand for sensitive data this is not appropriate, I just want to better understand the ins and outs for non-sensitive data replicated over a relatively-trusted network.

Bob is right, it is better to err on the side of caution, but I disagree. Bob could be right in this case (see details below), but the problem with his general approach is that it ignores the cost of paranoia. It leaves "peace dividend" money on the table. I prefer Bruce Schneier's assessment that it is a trade-off.
Short answer
Start replicating now! Do not worry about HTTPS.
The greatest risk is not wire sniffing, but your own human error, followed by software bugs, which could destroy or corrupt your data. Make a replica!. If you will replicate regularly, plan to move to HTTPS or something equivalent (SSH tunnel, stunnel, VPN).
Rationale
Is HTTPS is easy with CouchDB 1.1? It is as easy as HTTPS can possibly be, or in other words, no, it is not easy.
You have to make an SSL key pair, purchase a certificate or run your own certificate authority—you're not foolish enough to self-sign, of course! The user's hashed password is plainly visible from your remote couch! To protect against cracking, will you implement bi-directional SSL authentication? Does CouchDB support that? Maybe you need a VPN instead? What about the security of your key files? Don't check them into Subversion! And don't bundle them into your EC2 AMI! That defeats the purpose. You have to keep them separate and safe. When you deploy or restore from backup, copy them manually. Also, password-protect them so if somebody gets the files, they can't steal (or worse, modify!) your data. When you start CouchDB or replicate, you must manually input the password before replication will work.
In a nutshell, every security decision has a cost.
A similar question is, "should I lock my house at night? It depends. Your profile says you are in Tuscon, so you know that some neighborhoods are safe, while others are not. Yes, it is always safer to always lock all of your doors all of the time. But what is the cost to your time and mental health? The analogy breaks down a bit because time invested in worst-case security preparedness is much greater than twisting a bolt lock.
Amazon EC2 is a moderately safe neighborhood. The major risks are opportunistic, broad-spectrum scans for common errors. Basically, organized crime is scanning for common SSH accounts and web apps like Wordpress, so they can a credit card or other database.
You are a small fish in a gigantic ocean. Nobody cares about you specifically. Unless you are specifically targeted by a government or organized crime, or somebody with resources and motivation (hey, it's CouchDB—that happens!), then it's inefficient to worry about the boogeyman. Your adversaries are casting broad nets to get the biggest catch. Nobody is trying to spear-fish you.
I look at it like high-school integral calculus: measuring the area under the curve. Time goes to the right (x-axis). Risky behavior goes up (y-axis). When you do something risky you saved time and effort, but the the graph spikes upward. When you do something the safe way, it costs time and effort, but the graph moves down. Your goal is to minimize the long-term area under the curve, but each decision is case-by-case. Every day, most Americans ride in automobiles: the single most risky behavior in American life. We intuitively understand the risk-benefit trade-off. Activity on the Internet is the same.

As you imply, basic authentication without transport layer security is 100% insecure. Anyone on EC2 that can sniff your packets can see your password. Assuming that no one can is a mistake.
In CouchDB 1.1, you can enable native SSL. In earlier version, use stunnel. Adding SSL/TLS protection is so simple that there's really no excuse not to.

I just found this statement from Amazon which may help anyone trying to understand the risk of packet sniffing on EC2.
Packet sniffing by other tenants: It is not possible for a virtual instance running in promiscuous mode to receive or "sniff" traffic that is intended for a different virtual instance. While customers can place their interfaces into promiscuous mode, the hypervisor will not deliver any traffic to them that is not addressed to them. This includes two virtual instances that are owned by the same customer, even if they are located on the same physical host. Attacks such as ARP cache poisoning do not work within EC2. While Amazon EC2 does provide ample protection against one customer inadvertently or maliciously attempting to view another's data, as a standard practice customers should encrypt sensitive traffic.
http://aws.amazon.com/articles/1697

flow-based traffic classification for traffic shaping

I’m wondering if there are ways to achieve flow-based traffic shaping with linux.
Traditional traffic shaping approaches seem be based on creating classes for specific protocols or types of packets (such as ssh, http, SYN or ACK) that need high troughput.
Here I want to see every TCP connection as a flow characterized by a certain data-rate.
There’ll be
quick flows such as interactive ssh or IRC chat and
slow flows (bulk data) such as scp or http file transfers
Now I’m looking for a way to characterize / classify an incoming packet to one of these classes, so I can run a tc based traffic shaper on it. Any hints?

Since you mention a dedicated machine I'll assume that you are managing from a network bridge and, as such, have access to the entirety of the packet for the lifetime it is in your system.
First and foremost: throttling at the receiving side of a connection is meaningless when you are speaking of link saturation. By the time you see the packet it has already consumed resources. This is true even if you are a bridge; you can only realistically do anything intelligent on the egress interface.
I don't think you will find an off-the-shelf product that is going to do exactly what you want. You are going to have to modify something like dummynet to be dynamic according to rules you derive during execution or you are going to have to program a dynamic software router using some existing infrastructure. One I am familiar with is Click modular router, but there are others. I really dont know how things like tc and ipfw will react to being configured/reconfigured with high frequency - I suspect poorly.
There are things that you should address ahead of time, however. Things that are going to make this task difficult regardless of the implementation. For instance,
How do you plan on differentiating between scp bulk and ssh interactive behavior? Will you monitor initial behavior and apply a rule based on that?
You mention HTTP-specific throttling; this implies DPI. Will you be able to support that on this bridge/router? How many classes of application traffic will you support?
How do you plan on handling contention? (you allot for 'bulk' flows to each get 30% of the capacity but get 10 'bulk' flows trying to consume)
Will you hard-code the link capacity or measure it? Is it fixed or will it vary?
In general, you can get a fairly rough idea of 'flow' by just hashing the networking 5-tuple. Once you start dealing with applications semantics, however, all bets are off and you need to plow through packet contents to get what you want.
If you had a more specific purpose it might render some of these points moot.

How to protect a website from DoS attacks

What is the best methods for protecting a site form DoS attack. Any idea how popular sites/services handles this issue?.
what are the tools/services in application, operating system, networking, hosting levels?.
it would be nice if some one could share their real experience they deal with.
Thanks

Sure you mean DoS not injections? There's not much you can do on a web programming end to prevent them as it's more about tying up connection ports and blocking them at the physical layer than at the application layer (web programming).
In regards to how most companies prevent them is a lot of companies use load balancing and server farms to displace the bandwidth coming in. Also, a lot of smart routers are monitoring activity from IPs and IP ranges to make sure there aren't too many inquiries coming in (and if so performs a block before it hits the server).
Biggest intentional DoS I can think of is woot.com during a woot-off though. I suggest trying wikipedia ( http://en.wikipedia.org/wiki/Denial-of-service_attack#Prevention_and_response ) and see what they have to say about prevention methods.

I've never had to deal with this yet, but a common method involves writing a small piece of code to track IP addresses that are making a large amount of requests in a short amount of time and denying them before processing actually happens.
Many hosting services provide this along with hosting, check with them to see if they do.

I implemented this once in the application layer. We recorded all requests served to our server farms through a service which each machine in the farm could send request information to. We then processed these requests, aggregated by IP address, and automatically flagged any IP address exceeding a threshold of a certain number of requests per time interval. Any request coming from a flagged IP got a standard Captcha response, if they failed too many times, they were banned forever (dangerous if you get a DoS from behind a proxy.) If they proved they were a human the statistics related to their IP were "zeroed."

Well, this is an old one, but people looking to do this might want to look at fail2ban.
http://go2linux.garron.me/linux/2011/05/fail2ban-protect-web-server-http-dos-attack-1084.html
That's more of a serverfault sort of answer, as opposed to building this into your application, but I think it's the sort of problem which is most likely better tackled that way. If the logic for what you want to block is complex, consider having your application just log enough info to base the banning policy action on, rather than trying to put the policy into effect.
Consider also that depending on the web server you use, you might be vulnerable to things like a slow loris attack, and there's nothing you can do about that at a web application level.

Cross-colo fail-over design, DNS level fail-over?

I'm interested in cross-colo fail-over strategies for web applications, such that if the main site fails users seamlessly land at the fail-over site in another colo.
The application side of things looks to be mostly figured out with a master-slave database setup between the colos and services designed to recover and be able to pick up mid-stream. I'm trying to figure out the strategy for moving traffic from the main site to the fail-over site. DNS failover, even with low TTLs, seems to carry a fair bit of latency.
What strategies would you recommend for quickly moving traffic between colos, assuming the servers at the main colo are unreachable?
If you have other interesting experience / words of wisdom about cross-colo failover I'd love to hear those as well.

DNS based mechanisms are troublesome, even if you put low TTLs in your zone files.
The reason for this is that many applications (e.g. MSIE) maintain their own caches which ignore the TTL. Other software will do a single gethostbyname() or equivalent call and store the result until the program is restarted.
Worse still, many ISPs' recursive DNS servers are known to ignore TTLs below their own preferred minimum and impose their own higher TTLs.
Ultimately if the site is to run from both data centers without changing its IP address then you need to look at arrangements for "Multihoming" via global BGP4 route announcements.
With multihoming you need to get at least a /24 netblock of "provider independent" (aka "PI") IP address space, and then have that only be announced to the global routing table from the backup site if the main site goes offline.

As for DNS, I like to reference, "Why DNS Based Global Server Load Balancing Doesn't Work". For everything else -- use BGP.
Designing networks in order to load balance using BGP is still not an easy task and I myself certainly am not an expert on this. It's also more complex than Wikipedia can tell you but there are a couple interesting articles on the web that detail how it can be done:
Load Balancing In BGP Networks
Load Sharing in Single and Multi homed environments
There is always more if you search for BGP and load balancing. There are also a couple whitepapers on the net which describe how Akamai does their global loadbalancing (I believe it's BGP too.), which is always interesting to read and learn about.
Beyond the obvious concepts you can use software and hardware to achieve, you might also want to check with your ISP/provider/colo if they can set you up.
Also, no offense in regard to your choice of colo (Who's the provider?), but most places should be setup to deal with downtimes and so on, they should not require you to take actions. Of course floods or aliens can always strike, but in that case I guess there are more important issues. :-)

If you can, Multicast - http://en.wikipedia.org/wiki/Multicast or AnyCast - http://en.wikipedia.org/wiki/Anycast

How does geographic lookup by IP work?

Is which IPs are assigned to which ISPs public information? How do geo IP services obtain this information and maintain this information?
How can I personally figure out where a certain IP belongs without using one of these services?

For what it's worth, I worked at a senior level in the ISP industry for more than a decade so I have quite some experience with this.
Large IP ranges are allocated as needed by IANA to each of the Regional Internet Registries.
The regions are generally continental in size - IP addresses are not assigned on a per-country basis.
The RIRs in turn then allocate IP addresses to ISPs, who in turn assign them to end-users.
Each of the RIRs maintain a whois server which can be queried to find out not only which ISP has been assigned any netblock, but to a certain extent which end-user, and that end-user's address.
Note that many ISPs do not fill out this information for every single customer. Hence if you're a residential subscriber of a DSL service, it's likely that the Geo records will give the address of your ISP, and not your own address.
The various GeoLocation providers mostly work by mining these whois records. Note that the legality of doing so is something of a gray area - RIPE's database copyright statement is here.
IANA also maintains the root zone for the DNS, but that is completely separate from any IP allocation functions. It is very important to maintain the distinction between domain name operations and IP addresses.

To answer the specific question about "how it works": there's alot of manual labor involved, and the databases are to a large extent maintained manually. Just as other answers point out, there's no real correlation between IP ranges and countries, much less specific regions. Recently the system of IP address space distribution has been even more decentralized which means small private vendors can acquire IPv4 address ranges regardless of geographic region. This is why Google acquried Urchin so they could use their services for Google Analytics, which provides very accurate IP-to-geographic-region information.
If you don't want to use a service like MaxMind (free for personal use, and the database is open to some extent) or Google Analytics (free for personal use), there's free (and hence always slightly outdated) databases floating around, sometimes as flat files.

There are a variety of libraries that have mapping tables as well as services you can incorporate into your code.
The most important thing to understand is that there is no direct relationship between an IP address and any part of the world. The addresses are allocated in large blocks to organizations that are roughly geographical, which in turn allocate smaller blocks, this may happen at several levels for any given IP address (Alnitak explains the process well).
The fact is: WHOIS data does not have to be accurate. If I have an address block, I can say it is on Mars. And even if you narrow down the location of the final organization (say a very small ISP in Alaska), the user might be using dialup from Hawaii, or the server might be hosting a company from Guam.
So, there is always an element of risk/estimation in mapping an IP address (or a domain name) to a physical location. This is not to say you should never do it, there are many applications where rough or imperfect information is very useful.

Beware, the data is often slow to be updated, and even slower to replicate. My work place changed ISPs a number of years ago, and we were assigned a block of formerly Canadian IP addresses (we're based in the US), for months Google continued to give us google.ca as our default search engine. About 1/2 the time my home IP address comes up as being from my town, the other 1/2 from a town in another state.
Jason is right that the process is the same, but the updates are even slower and the data less accurate.

Alnitak's answer is pretty much on the mark.
As a side note, if you want to use a .dll to determine the user's location, then you can try this IPAddressExtension found on CodePlex. It has an internal database of ISP's to locations. As mentioned above by Alnitak, each ISP have IP blocks .. so this information is all buried inside the .dll :)
It's really easy to use. Just reference the .dll and then create an instance of a System.Net.IpAddress object! the extensions are listed on it.
I also need to declare that i'm the author of that codeplex project/product.
Please check it out :)
EDIT: added information about me being the author of that product.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string