What identifying information can a website capture?

If the owner of a website wants to track who their users are as much as possible, what can they capture (and how)? You might want to know this in order to gather information on a site you create or, as a user, to prevent a site from capturing data on you.
Here is a starting list, but I'm sure I have missed some important ones:
Referrer (what web page had the link you followed to get here). This is an HTTP header; the sketch after this list shows reading it and the next few items server-side.
IP address of the machine you are browsing from. This is not a header itself; it comes from the connection that carries the HTTP request (though proxies sometimes add it as one).
User Agent (what browser you are using). This is an HTTP header.
Cookie placed on a previous visit. This is a header, available only if a cookie was placed earlier and was not deleted by the user.
Flash Cookie placed on a previous visit. Some users turn off cookies, but very few know how to turn off Flash cookies. Works like a normal cookie although it depends on Flash.
Web Bugs. Place something small (like a transparent single-pixel GIF) on the page that's served up from a 3rd party. Some third parties (such as DoubleClick) will have their own cookies and can correlate with other visits the user makes (for a fee!).
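To make the first few concrete, here is a minimal server-side sketch (Node.js is an assumption; the question isn't tied to any stack) that logs the referrer, user agent, cookies, and connecting IP for each request:

    // Minimal sketch (Node.js assumed): log the identifying bits that arrive with every request.
    const http = require('http');

    http.createServer((req, res) => {
      const info = {
        referer: req.headers['referer'],       // page that linked here, if the browser sends it
        userAgent: req.headers['user-agent'],  // browser identification string
        cookies: req.headers['cookie'],        // cookies previously set for this site, if any
        ip: req.socket.remoteAddress           // from the TCP connection, not an HTTP header
      };
      console.log(info);
      res.end('ok');
    }).listen(8080);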
Those are the common ones I think of, but there have to be LOTS of unusual ones. For instance, this:
Time on the user's clock. Use JavaScript to transmit it.
... which I had never heard of before reading it here.
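A minimal sketch of that trick (plain JavaScript; the /collect endpoint is a placeholder):

    // Minimal sketch: read the visitor's clock and timezone and send them home.
    var payload = {
      localTime: new Date().toString(),                       // includes the local UTC offset
      timezoneOffsetMinutes: new Date().getTimezoneOffset()
    };
    navigator.sendBeacon('/collect', JSON.stringify(payload)); // fires without any user interaction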
ADDED LATER (after reading this):
Please try to put just ONE item per answer, then we can use voting up to sort out the better/more-interesting ones. The list below is probably less effective.
Ah well... NEXT time I ask a question like this I'll set it up better.
And here are some of the best answers I got:
James points out that IE transmits the .NET framework version.
AviewAnew points out that one can find what sites you have visited.
Mecki points out that Screen Resolution can be determined.
Mecki also points out that any auto-fill information your browser has cached can be determined, by creating a hidden field, then reading it with JavaScript.
jjrv points out that Flash can list the fonts on the user's machine.
Kent points out that you can find out what websites a person has visited.
Silver Dragon points out you can determine the location of the mouse within the browsing window using Flash and AJAX.
Jim points out that you can tell what language the user has configured in their browser from an HTTP header.
Jim also mentions that you can detect whether people are using Greasemonkey or something similar to modify the page.

Modifications to your original:
Referrer: can be suppressed (I think it's an option in some browsers).
IP address: only avoidable with a proxy (JavaScript can contravene this, however, with smart lookarounds).
User Agent: is unreliable and easily forged.
Cookie: and assuming it was not wiped by browser closure (session cookie) and the cookie is in the same domain/path.
The real nasty ones are:
Using JavaScript to probe your network/LAN
Using JavaScript to access your firewall from behind the firewall and adjust its settings (no joke)
Using the behaviour of "visited" links to determine which of a list of URLs have been visited (deep history probing! a sketch follows this list)
Goodness knows what if the user has Windows/IE/ActiveX
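A minimal sketch of the "visited link" probe (plain JavaScript; the URL list is arbitrary). Note that modern browsers deliberately lie about :visited styling, so treat this as the historical technique rather than something that still works reliably:

    // Historical "visited link" probe: style :visited links distinctly, then read the computed style.
    function probeVisited(urls) {
      var style = document.createElement('style');
      style.textContent = 'a.probe:visited { color: rgb(1, 2, 3); }';
      document.head.appendChild(style);
      return urls.filter(function (url) {
        var a = document.createElement('a');
        a.className = 'probe';
        a.href = url;
        document.body.appendChild(a);
        var visited = getComputedStyle(a).color === 'rgb(1, 2, 3)';  // mitigated in current browsers
        document.body.removeChild(a);
        return visited;
      });
    }
    // Example: probeVisited(['https://example.com/', 'https://en.wikipedia.org/']);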

There are headers (Via, X-Forwarded-For) that can include information about a proxy server the user is going through, and X-Forwarded-For can also include the user's own IP address (in which case the IP the connection arrives from is the proxy's).
Screen resolution, operating system, color depth, the size of your taskbar (compare maximum and available resolution), whether Java is enabled, font anti-aliasing, and installed plugins, all via JavaScript.
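A minimal sketch of what JavaScript exposes along these lines (run in a browser; nothing here is specific to any site):

    // Minimal sketch: environment details readable from script alone.
    var env = {
      screenWidth: screen.width,
      screenHeight: screen.height,
      availableHeight: screen.availHeight,   // compare with screen.height to estimate taskbar size
      colorDepth: screen.colorDepth,
      platform: navigator.platform,          // rough operating-system hint
      javaEnabled: navigator.javaEnabled(),  // legacy API; modern browsers simply return false
      plugins: Array.prototype.map.call(navigator.plugins, function (p) { return p.name; })
    };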
A Java applet can give you a bunch of information as well, but I don't know what.
Sites you've visited
Details of your local network, such as active hosts and web servers. The paper also outlines drive-by printing and drive-by router modification.
And this is all assuming the attacker doesn't pull off arbitrary code execution

JavaScript can get more information than just the time, screen resolution (plus color depth) being one of them.
See Getting Screen Resolution with JS
Everything JS can capture can be transmitted using AJAX without the user performing any interaction. Other examples are (not all will work in every browser):
It can look into your browser history, e.g. what URL your browser would go to if you hit back or forward.
The language of your browser (note: usually the HTTP request will also contain a list of preferred languages for the page you request; however, this list is user-editable in the preferences of many browsers, while JS can actually find out which language translation the browser interface itself is using).
If your browser auto-fills form fields (e.g. e-mail, username, etc.), JS can read what your browser entered into the fields before you submit the form (so it can even read what your browser pre-filled there, even if you never submit the form at all); see the sketch after this list.
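A minimal sketch of the pre-fill read combined with silent transmission (plain JavaScript; the field id and the /collect endpoint are placeholders, and polling is used because autofill fires no reliable event):

    // Minimal sketch: read a pre-filled field and transmit it without any submit.
    var field = document.getElementById('email');    // placeholder id, assumed to exist on the page
    var timer = setInterval(function () {
      if (field && field.value) {                     // the browser filled it in; the user never submitted
        navigator.sendBeacon('/collect', 'email=' + encodeURIComponent(field.value));
        clearInterval(timer);
      }
    }, 500);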
A Java applet could also gather some information and transmit it, though there is not much information you wouldn't already get elsewhere. Since it's easy to get a visitor's IP, it's possible to find out which online service they are using (by looking up the IP at regional registries such as ARIN for North America or RIPE for Europe, and so on), and there are services that translate IPs to countries, so it's possible to find out where the user is most likely located.

Some additional info, that might be of interest:
Using the IP address, one can resolve the hostname, the network provider / organization the IP belongs to, and a rough geographic location.
Using the referrer, the list of requests a given client makes, and a reliable cookie mechanism, one can reconstruct the path the visitor takes (even clickthroughs to other sites, with AJAX and/or a forwarder page).
Using Flash in combination with AJAX, the mouse location within the browsing window can be captured (a Flash-free sketch follows this list).
The User Agent might contain information regarding the operating system, installed .NET frameworks, and other curiosities.
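The same mouse capture no longer needs Flash; a minimal plain-JavaScript sketch (the /collect endpoint is a placeholder):

    // Minimal sketch: report the mouse position, throttled so the server isn't flooded.
    var lastSent = 0;
    document.addEventListener('mousemove', function (e) {
      var now = Date.now();
      if (now - lastSent > 250) {
        lastSent = now;
        navigator.sendBeacon('/collect', 'x=' + e.clientX + '&y=' + e.clientY);
      }
    });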

.NET framework versions are transmitted in IE, in the User Agent.

Flash can give you a list of fonts on the user's machine, among other things. JavaScript can send information when the mouse stops over an ad without clicking it. You can also get the window size, whether the site is open in a frame, whether popups or specific plugins have been blocked, and probing for JavaScript features can tell whether the user agent header is correct or faked...
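A minimal sketch of the window-size and frame checks mentioned above (plain JavaScript):

    // Minimal sketch: window size and whether the page is embedded in a frame.
    var pageInfo = {
      windowWidth: window.innerWidth,
      windowHeight: window.innerHeight,
      framed: window.top !== window.self   // true when the page is inside a frame/iframe
    };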

If you're concerned about your personal security (I'm not sure if that's what you're really getting at, so my apologies if this is misguided), you can always use the Tor network. If you use Firefox, you can use Torbutton for one-click enabling. It has the benefit (a drawback, to some) of disabling Flash, because it's otherwise impossible to protect against Flash information leaks.

You can usually determine which language the user speaks through the Accept-Language HTTP header.
You can determine whether certain applications and browser plugins are installed by looking at the Accept HTTP header.
Browser version/patchlevel and .NET framework version through the User-Agent HTTP header.
Your ISP/Employer and geographical location through IP address.
Whether or not you have visited particular URLs through CSS and/or timing load events. If a particular website has user-specific URIs, this could disclose whether you are a certain user on that site or not.
Which fonts are available, through measuring ems and/or Flash (a measurement-based sketch follows this list).
Screen resolution, window size, timezone through JavaScript.
Where you move your mouse and keystrokes through JavaScript. For instance, you can see what people type into text boxes even if they don't hit submit.
Many UserJS/Greasemonkey scripts leak information (e.g. if you filter out certain people, the sites the script is configured for may be able to find out who).
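A minimal sketch of measurement-based font detection (plain JavaScript; the answer describes measuring ems on a span, while this version uses canvas text metrics, which applies the same idea, and the font name is an arbitrary example):

    // Minimal sketch: a font the machine actually has renders at a different width than the fallback.
    function isFontAvailable(fontName) {
      var ctx = document.createElement('canvas').getContext('2d');
      var text = 'mmmmmmmmmmlli';                        // widths vary a lot between fonts
      ctx.font = '72px monospace';
      var baseline = ctx.measureText(text).width;
      ctx.font = '72px "' + fontName + '", monospace';   // falls back to monospace if missing
      return ctx.measureText(text).width !== baseline;
    }
    // Example: isFontAvailable('Comic Sans MS');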

Whether the browser supports JS
Whether the browser supports Flash
Operating system platform
Screen resolution
Whether CSS is supported
Whether tables are supported

I need to dig up the link, but if the user is using IE with common software titles installed, it is possible to determine which ones are installed.

As far as I know, it's possible to get clipboard data via JavaScript. I'm not sure how possible it is by default these days, but it was all the rage not long ago. I do believe IE still allows it.
People have a habit of leaving very important data in their clipboard, so this is pretty bad.
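A minimal sketch of the legacy IE-only behaviour being described (modern browsers do not expose the clipboard this way; the asynchronous Clipboard API requires a permission prompt):

    // Legacy IE-only sketch: old versions exposed the clipboard to script with no prompt.
    if (window.clipboardData && window.clipboardData.getData) {
      var leaked = window.clipboardData.getData('Text');
      // ...the value could then be transmitted like any other captured data
    }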

Late to the party here, but the website can also scan your ports to find out what software you are running!
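A crude sketch of the timing idea behind in-browser port probing (plain JavaScript; host and port are placeholders). The page cannot read the response cross-origin, but how quickly the request fails still leaks something, and modern browsers block a list of well-known ports outright:

    // Crude timing probe: a fast error means the connection was answered or actively refused,
    // while hitting the timeout suggests a filtered port or an absent host.
    function probePort(host, port, timeoutMs, callback) {
      var img = new Image();
      var done = false;
      function finish(result) { if (!done) { done = true; callback(port, result); } }
      img.onload = img.onerror = function () { finish('answered-or-refused'); };
      img.src = 'http://' + host + ':' + port + '/?' + Math.random();  // cache-buster
      setTimeout(function () { finish('no-response'); img.src = ''; }, timeoutMs);
    }
    // Example: probePort('192.168.1.1', 80, 2000, function (port, result) { console.log(port, result); });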

Related

Is there a way to get a client's browser and os name such that client cannot modify it?

So I have to get a client's browser and OS name, but we don't want the user to be able to manipulate information about the OS or browser. Some websites suggest that there is only one way to do it, which is by using the userAgent request header.
Below are the links I've been through:
Retrieving Browser, OS and Device Type By Parsing User Agent
How to prevent user-agent to be changed by user
How do I prevent websites from detecting my OS? Which browser should I use?
According to these, we can only do it with the help of the userAgent, and it is not difficult for a client to change it; there is also no way for us to detect that a client has modified it. It turns out that even large companies like Amazon and Facebook rely on the userAgent.
While learning about device fingerprinting I came across a JavaScript library called FingerprintJS, and it seems they don't rely on the userAgent for finding out the client's OS name: I tried it, and after manipulating the userAgent I still got the original result. I am still trying to figure out how exactly they work out the OS and browser name. And even if the client can manipulate that too, is there still a way that we can at least make it difficult for a client to fake their browser and OS?
You are not able to restrict the values that are sent with a request to your server. A user will always be able to use e.g. curl to send arbitrary headers, cookies, etc. You can make it more difficult to tamper with the values through some obscurity, but that does not make such a solution secure.
Device fingerprinting might help, but you will most probably get blocked by ad blockers as they target fingerprinting as well. Still, even if you do implement device fingerprinting and get more accurate data about the user's browser, the user still can tamper with requests and change that data.
I don't know what your requirements are, but normally you shouldn't be that concerned with the user's browser or OS.
As there's no guaranteed way of knowing the user's OS/browser (since the user is able to send anything with their request), the more important question to ask may be:
Why do you want to know the user's OS/browser?
This can help us find a better answer for your actual requirements.
For example, this might help: https://developer.mozilla.org/en-US/docs/Web/HTTP/Browser_detection_using_the_user_agent#considerations_before_using_browser_detection
One method I can think of, is through a custom browser extension/plugin. You may even be able to use a browser API, depending on the target browser.
You would then craft a payload which would compute the "client signature" out-of-band, not within the browser's standard request cycles, and produce a signed, self-validating hash, stored as a cookie.
This would require some knowledge of the related layers involved.
You are essentially talking about device fingerprinting.
While there are a vast number of approaches, you may not really want to maintain the overhead required, as it is generally done using multiple approaches at once, many of which rely on exploiting bugs in browsers and HTTP protocols, network routing analysis, and even the clever targeting of numerous OS bugs and/or quirks.
A much simpler approach is to feed your user a hashed cookie, with a scheme to detect if it's been modified. That cookie, along with other authentication and verification mechanisms would be far simpler and may be enough for your purposes.
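A minimal sketch of that hashed-cookie scheme (Node.js is an assumption; the secret and cookie value are placeholders). It detects tampering with the cookie, though it still tells you nothing reliable about the real OS or browser:

    // Minimal sketch (Node.js): sign a value with a server-side secret and verify it later.
    const crypto = require('crypto');
    const SECRET = 'server-side-secret';   // placeholder; never sent to the client

    function signValue(value) {
      const mac = crypto.createHmac('sha256', SECRET).update(value).digest('hex');
      return value + '.' + mac;            // stored as a cookie, e.g. id=<value>.<mac>
    }

    function verifyValue(signed) {
      const dot = signed.lastIndexOf('.');
      if (dot < 0) return null;
      const value = signed.slice(0, dot);
      const given = signed.slice(dot + 1);
      const mac = crypto.createHmac('sha256', SECRET).update(value).digest('hex');
      if (given.length !== mac.length) return null;
      // timingSafeEqual avoids leaking how many characters matched
      return crypto.timingSafeEqual(Buffer.from(mac), Buffer.from(given)) ? value : null;
    }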
There are 3rd party APIs which provide such a service, if it's really mission critical.
Of course, philosophically speaking, whether or not you should be fingerprinting your users is really up to you and the expectations of your users.
But there you go, I hope that provides a broader view of what's involved.

Launching Custom Applications from the browser

I have been looking around SO and other online resources but can't seem to locate how this is done. I was wondering how things like magnet links work on torrent websites: they automatically open up an application and pass it the appropriate params. I was wondering how I could create one to send a custom program params from the net?
Thanks
s654m
I wouldn't say this is an answer, but it is actually too long for a comment to fit.
Apps tend to register as authorities that can open a specific scheme. I don't know how it's done in desktop apps (especially because it varies with each OS), but on Android you can catch schemes or base URLs with Intent Filters.
The way it works (and I'm pretty sure the functionality is cross-OS) is:
Your app tells the system it can "read" a specific scheme or base url (it could be magnet:// or even http://www.twitter.com/).
When you try to open a URI (Uniform resource identifier, a supergroup that can contain URLs), the system searches for any application that was registered for that kind of URI. I guess it runs from more specific and complete formats to the base. So for instance, this tweet: https://twitter.com/korcholis/status/491724155176222720 may be traced in this order:
https://twitter.com/korcholis/status/491724155176222720 Oh, no registrar? Moving on
https://twitter.com/korcholis/status Nothing yet? Ok
https://twitter.com/korcholis Nnnnnnope?
https://twitter.com Anybody? Ah, you, Totally random name for a Twitter Client know how to handle these links? Then it's yours
This random twitter client gets the full URI and does something accordingly.
As you can see, nobody else had a chance to track https://, since another application caught the URI before them; in this case, that "nobody" would be your browsers.
It also defines, somehow, a default value. This is the true key why browsers tend to battle to be your default browser of choice. This just tells you they want to be the default applications that catch http://, https:// and probably some more.
The true wonder here is that, as long as there's an app that catches a scheme, you can set the one you want. For instance, it's a common practice that apps from the same developer contain the same schemes, in case the developer wants to share tasks between them. This ensures the user will have to use a group of apps. So, one app can just offer data such as:
my-own-scheme://user/12
While another app is registered to get links that start with
my-own-scheme://
So, if you want to make your own schemes, that's OK, as long as they don't collide with others'. And if you want to read others' schemes, well, it's up to you to search for those. See? This is not a real answer, but I hope it removes almost all doubt.
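For completeness, the web side of this is just a link or navigation to the scheme; a minimal sketch (assuming some installed app has registered my-own-scheme:// as described above, otherwise nothing opens):

    // Minimal sketch: a page handing parameters to whatever app registered the scheme.
    var link = document.createElement('a');
    link.href = 'my-own-scheme://user/12?source=web';   // params travel as part of the URI
    link.textContent = 'Open in the app';
    document.body.appendChild(link);
    // Navigating directly also works: window.location.href = 'my-own-scheme://user/12';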

Identifying requests made by Chrome extensions?

I have a web application that has some pretty intuitive URLs, so people have written some Chrome extensions that use these URLs to make requests to our servers. Unfortunately, these extensions cause problems for us, hammering our servers, issuing malformed requests, etc., so we are trying to figure out how to block them, or at least make it difficult to craft requests to our servers, to dissuade these extensions from being used (we provide an API they should use instead).
We've tried adding some custom headers to requests and junk-json-preamble to responses, but the extension authors have updated their code to match.
I'm not familiar with chrome extensions, so what sort of access to the host page do they have? Can they call JavaScript functions on the host page? Is there a special header the browser includes to distinguish between host-page requests and extension requests? Can the host page inspect the list of extensions and deny certain ones?
Some options we've considered are:
Rate-limiting QPS by user, but the problem is that not all queries are equal, and extensions typically kick off several expensive queries that look like user-entered queries.
Restricting the amount of server time a user can use, but the problem is that users might hit this limit by just navigating around or running expensive queries several times.
Adding static custom headers/response text, but they've updated their code to mimic our code.
Figuring out some sort of token (probably cryptographic in some way) we include in our requests that the extension can't easily guess. We minify/obfuscate our JS, so we are OK with embedding it in the JS source code (since the variable name it would have would be hard to guess). A sketch of this idea follows the postscript below.
I realize this may not be a 100% solvable problem, but we hope to either give us an upper hand in combatting it, or make it sufficiently hard to scrape our UI that fewer people do it.
Welp, guess nobody knows. In the end we just sent a custom header and started tracking who wasn't sending it.
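A minimal sketch of that custom-header/token idea (the header name, the page token, the expectedTokenFor helper, and the Express server are all assumptions for illustration, not anything from the question):

    // Client side, shipped inside the minified bundle (token and header name are placeholders):
    fetch('/api/search?q=example', {
      headers: { 'X-App-Request': window.__pageToken }   // token embedded in the served page
    });

    // Server side (Express assumed): flag requests that lack a valid header.
    app.use(function (req, res, next) {
      if (req.path.startsWith('/api/') && req.get('X-App-Request') !== expectedTokenFor(req)) {
        req.suspicious = true;   // hypothetical helper; log or throttle these requests
      }
      next();
    });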

Possible solutions for keeping track of anonymous users

I'm currently developing a web application that has one feature which allows input from anonymous users (no authorization required). I realize that this may prove to have security risks, such as repeated arbitrary inputs (e.g. spam) or users posting malicious content. So to remedy this I'm trying to create a system that keeps track of what each anonymous user has posted.
So far all I can think of is tracking by IP, but it seems as though that may not be viable due to dynamic IPs. Are there any other solutions for anonymous user tracking?
I would recommend requiring them to answer a CAPTCHA before posting, or after an unusual number of posts from a single IP address.
"A CAPTCHA is a program that protects websites against bots by generating and grading tests that humans can pass but current computer programs cannot. For example, humans can read distorted text as the one shown below, but current computer programs can't."
That way the spammers are actual humans. That will slow the firehose to a level where you can weed out any that do get through.
http://www.captcha.net/
There are two main ways: client-side and server-side. Tracking the IP is all I can think of server-side; client-side there are more accurate options, but they are all under the user's control, and he can re-anonymise himself (it's his machine, after all): cookies and storage come to mind.
Drop a cookie with an ID on it. Sure, cookies can be deleted, but this at least gives you something.
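A minimal sketch of the cookie-ID idea (plain JavaScript in the page; the cookie name is arbitrary):

    // Minimal sketch: assign a random ID once and reuse it until the user clears cookies.
    function getAnonId() {
      var match = document.cookie.match(/(?:^|; )anon_id=([^;]+)/);
      if (match) return match[1];
      var id = Math.random().toString(36).slice(2) + Date.now().toString(36); // not cryptographically strong
      document.cookie = 'anon_id=' + id + '; path=/; max-age=' + 60 * 60 * 24 * 365;
      return id;
    }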
My suggestion is:
Use cookies for tracking of user identity. As you yourself have said, due to dynamic IP addresses, you can't reliably use them for tracking user identity.
To detect and curb spam, use IP + user browser agent combination.

How to defend excessive login requests?

Our team has built a web application using Ruby on Rails. It currently doesn't restrict users from making excessive login requests. We want to ignore a user's login requests for a while after she has made several failed attempts, mainly to defend against automated robots.
Here are my questions:
How to write a program or script that can make excessive requests to our website? I need it because it will help me to test our web application.
How to restrict a user who made some unsuccessful login attempts within a period? Does Ruby on Rails have built-in solutions for identifying a requester and tracking whether she made any recent requests? If not, is there a general way to identify a requester (not specific to Ruby on Rails) and keep track of the requester's activities? Can I identify a user by IP address, cookies, or some other information I can gather from her machine? We also hope that we can distinguish normal users (who make infrequent requests) from automated robots (who make requests frequently).
Thanks!
One trick I've seen is including form fields on the login form that, through CSS hacks, are invisible to the user.
Automated systems/bots will still see these fields and may attempt to fill them with data. If you see any data in that field, you immediately know it's not a legit user and can ignore the request.
This is not a complete security solution but one trick that you can add to the arsenal.
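A minimal sketch of that honeypot check on the server (Node/Express is an assumption, as is body-parsing middleware; the hidden field name "website" is an arbitrary placeholder):

    // Minimal sketch (Express with body parsing assumed): reject anything that fills the hidden field.
    app.post('/login', function (req, res, next) {
      if (req.body.website) {           // real users never see or fill this field
        return res.status(400).end();   // silently drop the bot
      }
      next();                           // continue with normal authentication
    });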
In regards to #1, there are many automation tools out there that can simulate large-volume posting to a given URL. Depending on your platform, something as simple as wget might suffice, or something as complex (relatively speaking) as a script that asks a UserAgent to post a given request multiple times in succession (again, depending on the platform, this can be simple; it also depends on the language of choice for task 1).
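For question #1, a minimal sketch of such a script (Node.js 18+ for the built-in fetch; the URL and credentials are placeholders):

    // Minimal sketch (Node.js 18+): fire repeated login attempts and watch when the app starts blocking.
    async function hammerLogin(times) {
      for (let i = 0; i < times; i++) {
        const res = await fetch('https://example.test/login', {
          method: 'POST',
          headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
          body: 'username=testuser&password=wrong-' + i
        });
        console.log(i, res.status);
      }
    }
    hammerLogin(20);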
In regards to #2, consider first the lesser issue of someone just firing multiple attempts manually. Such instances usually share a session (that being the actual webserver session); you should be able to track failed logins based on these session IDs and force an early failure if the volume of failed attempts breaks some threshold. I don't know of any plugins or gems that do this specifically, but even if there is not one, it should be simple enough to create a solution.
If session ID does not work, then a combination of IP and UserAgent is also a pretty safe means, although individuals who use a proxy may find themselves blocked unfairly by such a practice (whether that is an issue or not depends largely on your business needs).
If the attacker is malicious, you may need to look at using firewall rules to block their access, as they are likely going to: a) use a proxy (so IP rotation occurs), b) not use cookies during probing, and c) not play nice with UserAgent strings.
RoR provides means for testing your applications, as described in A Guide to Testing Rails Applications. A simple solution is to write a test containing a loop that sends 10 (or whatever value you define as excessive) login requests. The framework provides means for sending HTTP requests or faking them.
Not many people will abuse your login system, so just remembering the IP addresses of failed logins (for an hour, or whatever period you think is sufficient) would be enough and not too much data to store. Unless some hacker has access to a great many IP addresses... but in such situations you'd need more serious security measures, I guess.
