Launching Custom Applications from the browser - web

I have been looking around SO and other online resources but can't seem to find how this is done. I was wondering how things like magnet links work on torrent websites: they automatically open an application and pass it the appropriate parameters. How could I create something like that to send parameters to a custom program from the web?

I wouldn't say this is an answer, but it's too long to fit in a comment.
Apps tend to register as authorities that can open a specific scheme. I don't know exactly how it's done in desktop apps (it varies by OS), but on Android you can catch schemes or base URLs with Intent Filters.
The way it works (and I'm pretty sure the functionality is cross-OS) is:
Your app tells the system it can "read" a specific scheme or base URL (it could be magnet:// or even http://www.twitter.com/).
When you try to open a URI (Uniform Resource Identifier, a superset that includes URLs), the system searches for any application that was registered for that kind of URI. I guess it runs from the more specific and complete forms down to the base. So for instance, this tweet: https://twitter.com/korcholis/status/491724155176222720 may be traced in this order:
https://twitter.com/korcholis/status/491724155176222720 Oh, no registrar? Moving on
https://twitter.com/korcholis/status Nothing yet? Ok
https://twitter.com/korcholis Nnnnnnope?
https://twitter.com Anybody? Ah, you, Totally random name for a Twitter Client know how to handle these links? Then it's yours
This random twitter client gets the full URI and does something accordingly.
As you can see, nothing got the chance to handle plain https://, since another application caught the URI before it got that far. In this case, that fallback handler would normally be your browser.
The system also defines a default handler. This is the real reason browsers battle to be your default browser of choice: they want to be the default applications that catch http://, https:// and probably a few more.
The true wonder here is that, as long as there's an app that catches a scheme, you can define whichever scheme you want. For instance, it's common practice for apps from the same developer to register the same schemes, in case the developer wants to share tasks between them. This ensures the user ends up using a group of apps together. So, one app can just offer data such as:
my-own-scheme://user/12
While another app is registered to get links that start with
my-own-scheme://
So, if you want to make your own schemes, that's fine, as long as they don't collide with others'. And if you want to read someone else's schemes, well, that's up to you to research. See? This is not a real answer, but I hope it removes almost all doubt.
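On the web side there is an API for this: navigator.registerProtocolHandler lets a page ask the browser to route links with a given scheme to a handler URL on the same site. A minimal sketch, where the scheme web+myscheme and the /open handler page are made up for illustration (custom schemes must carry the web+ prefix, while a short safelist such as magnet: and mailto: can be registered directly):

// Ask the browser to send "web+myscheme:" links to our handler page.
// "%s" is replaced with the full URI the user clicked.
if ('registerProtocolHandler' in navigator) {
  navigator.registerProtocolHandler('web+myscheme', 'https://example.com/open?uri=%s');
}

// A link like <a href="web+myscheme://user/12">Open in my app</a> would then
// be routed to https://example.com/open?uri=web%2Bmyscheme%3A%2F%2Fuser%2F12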

Related

Is there a way to get a client's browser and OS name such that the client cannot modify it?

So I have to get a client's browser and OS name, but we don't want the user to be able to manipulate that information. Some websites suggest there is only one way to do it: the User-Agent request header.
Below are the links I've been through:
Retrieving Browser, OS and Device Type By Parsing User Agent
How to prevent user-agent to be changed by user
How do I prevent websites from detecting my OS? Which browser should I use?
So according to these, we can only do it with the help of the User-Agent header. It is not difficult for a client to change it, and there is no way for us to detect that a client has modified it. It also turns out that even large companies like Amazon and Facebook rely on the User-Agent.
While learning about device fingerprinting I came across a JavaScript library called FingerprintJS, and it seems it doesn't rely on the User-Agent to find the client's OS name: when I tried it and manipulated the User-Agent, I still got the original result. I am still trying to figure out how exactly it determines the OS and browser name. And even if the client can manipulate that too, is there still a way we can at least make it difficult for a client to fake their browser and OS?
You are not able to restrict values that are sent with a request to your server. A user will always be able to use e.g. curl to send arbitrary headers, cookies, etc. You can make it more difficult to tamper with the values through some obscurity, but that does not make such a solution secure.
Device fingerprinting might help, but you will most probably get blocked by ad blockers as they target fingerprinting as well. Still, even if you do implement device fingerprinting and get more accurate data about the user's browser, the user still can tamper with requests and change that data.
I don't know what your requirements are, but normally you shouldn't be that concerned with the user's browser or OS.
As there's no guaranteed way of knowing the user's OS/browser (since the user is able to send anything with their request), the more important question to ask may be:
Why do you want to know the user's OS/browser?
This can help us find a better answer for your actual requirements.
For example, this might help: https://developer.mozilla.org/en-US/docs/Web/HTTP/Browser_detection_using_the_user_agent#considerations_before_using_browser_detection
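To illustrate why a library like FingerprintJS does not seem to be fooled by a changed User-Agent string: plenty of signals are exposed to JavaScript independently of that header. A minimal illustration (this is not FingerprintJS's actual internals, and /api/client-signals is a made-up endpoint) that collects a few such signals so the server can cross-check them against the User-Agent it received; note that a determined client can fake these too:

// Illustration only: signals that do not come from the User-Agent string.
const signals = {
  platform: navigator.platform,                                // e.g. "Win32", "MacIntel"
  timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,  // e.g. "Europe/Berlin"
  screen: screen.width + 'x' + screen.height + 'x' + screen.colorDepth,
  touchPoints: navigator.maxTouchPoints,
  languages: navigator.languages
};

// POST them to a (hypothetical) endpoint so the server can compare them
// with what the User-Agent header claims.
fetch('/api/client-signals', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(signals)
});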
One method I can think of is a custom browser extension/plugin. You may even be able to use a browser API, depending on the target browser.
You would then craft a payload that computes the "client signature" out-of-band, outside the browser's standard request cycle, and produces a signed, self-validating hash stored as a cookie.
This would require some knowledge of the related layers involved.
You are essentially talking about device fingerprinting.
While there are a vast number of approaches, you may not really want to maintain the overhead required: fingerprinting is generally done with multiple techniques in combination, many of which rely on exploiting browser bugs, HTTP protocol quirks, network routing analysis, and even clever targeting of numerous OS bugs and quirks.
A much simpler approach is to feed your user a hashed cookie, with a scheme to detect if it's been modified. That cookie, along with other authentication and verification mechanisms, would be far simpler and may be enough for your purposes.
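A minimal sketch of such a self-validating cookie value, using Node's built-in crypto module (the secret and the "value.mac" format are placeholders; a real deployment would also set HttpOnly, Secure, an expiry, and so on):

const crypto = require('crypto');

const SECRET = process.env.COOKIE_SECRET || 'change-me';  // server-side secret, never sent to the client

// Sign a value so that any later modification can be detected.
function sign(value) {
  const mac = crypto.createHmac('sha256', SECRET).update(value).digest('hex');
  return value + '.' + mac;
}

// Verify a cookie of the form "<value>.<mac>"; returns the value, or null if it was tampered with.
function verify(cookie) {
  const dot = cookie.lastIndexOf('.');
  if (dot === -1) return null;
  const value = cookie.slice(0, dot);
  const mac = cookie.slice(dot + 1);
  const expected = crypto.createHmac('sha256', SECRET).update(value).digest('hex');
  const ok = mac.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(mac), Buffer.from(expected));
  return ok ? value : null;
}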
There are 3rd party APIs which provide such a service, if it's really mission critical.
Of course, philosophically speaking, whether or not you should be fingerprinting your users at all is really up to you and the expectations of your users.
But there you go, I hope that provides a broader view of what's involved.

Is it safe to use a UUID in a URL for semi-private data?

I run a landscaping company and have multiple crews. I want to provide each one with a custom URL (like mysite.com/xxxx-xxxx-xxxx) that shows their daily schedule. Going to the page will list the name, address and phone number of 5-10 customers for the day.
Is it safe/wise to use a UUID in a URL for semi-private data?
Depends on how safe you want it to be.
Are the UUIDs used for anything else? If not, they are fine for creating random URLs.
But browser history would allow anyone using the same machine to find the URLs. Also, unless you use HTTPS, a network sniffer could easily see the requested URLs and go to the same page.
Another concern is spider bots. Make sure nothing links to those pages and use a robots.txt to keep the site from being indexed, but you still might find that some of the pages show up in search engines. It might be better to set the UUID in a cookie and check that to determine which employee it is, lest your semi-private pages start showing up on Google.
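For the robots.txt part of that, the blanket "stay away" rule is only two lines. Keep in mind it is a request that well-behaved crawlers honor, not an access control:

User-agent: *
Disallow: /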
Whether or not that scheme would work for you depends on your threat model (as well as some implementation details). Without a concrete threat model, it is not possible to give a definitive answer to your question.
I can, however, give you some ideas about potential issues with the solution, so you can determine if they are relevant for your application. This is not a complete list.
On the implementation side of things:
Not all UUID generators are created equal. Ideally, you want a generator based on a cryptographically secure RNG, producing a UUID where every byte is chosen at random (see the sketch at the end of this answer).
Using the UUID for a database lookup or a similar operation is not necessarily a constant-time operation (and thus there might be side-channel attacks unless you implement the lookup yourself).
Make sure your URI does not leak via the Referer header.
Some tools attempt to detect 'secret' URLs to protect them from history synchronization or other automatic features. Your scheme will most likely not be detected as 'secret'. It might be better to artificially lengthen your URI and to move your UUID into a query parameter.
You can further reduce attack surface with the usual methods (rate limiting, server hardening, etc.)
On the conceptual side of things:
A single identifier for both identification and authentication is not necessarily a bad thing. However, in most cases there is a need for an identification-only identifier – you must not use the 'secret' UUID in those scenarios
If a 'crew' consists of multiple people: you cannot revoke access for a single crew member
Some software (antivirus, browser, etc.) treats information in URLs as public information, and might upload them without user interaction
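As promised above, a short Node.js sketch of the first two implementation points: crypto.randomUUID() draws every byte from a cryptographically secure RNG, and comparing a presented token against a stored one can be done in constant time (a real application would more likely look the value up in a database, which is where the side-channel caution applies):

const crypto = require('crypto');

// Every byte of this UUID comes from a CSPRNG, unlike time-based or sequential UUIDs.
const scheduleToken = crypto.randomUUID();
console.log('https://mysite.com/' + scheduleToken);

// Compare a presented token against the stored one in constant time, so the
// comparison itself does not leak how many leading characters matched.
function tokenMatches(presented, stored) {
  const a = Buffer.from(presented);
  const b = Buffer.from(stored);
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}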

Possible solutions for keeping track of anonymous users

I'm currently developing a web application that has one feature which allows input from anonymous users (no authorization required). I realize that this may pose security risks, such as repeated arbitrary inputs (e.g. spam) or users posting malicious content. To remedy this I'm trying to create a system that keeps track of what each anonymous user has posted.
So far all I can think of is tracking by IP, but that may not be viable due to dynamic IPs. Are there any other solutions for anonymous user tracking?
I would recommend requiring them to answer a CAPTCHA before posting, or after an unusual number of posts from a single IP address.
"A CAPTCHA is a program that protects websites against bots by generating and grading tests >that humans can pass but current computer programs cannot. For example, humans can read >distorted text as the one shown below, but current computer programs can't"
That way the spammers have to be actual humans, which will slow the firehose to a level where you can weed out anything that does get through.
http://www.captcha.net/
There are two main ways: client-side and server-side. Tracking the IP is all I can think of server-side; client-side there are more accurate options, but they are all under the user's control, and they can re-anonymise themselves (it's their machine, after all): cookies and local storage come to mind.
Drop a cookie with an ID on it. Sure, cookies can be deleted, but this at least gives you something.
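A minimal client-side sketch of that (in practice you would usually set the cookie server-side with the HttpOnly flag; anything set like this can be deleted or edited by the user at will):

// Give the visitor a random anonymous ID the first time we see them.
if (!document.cookie.split('; ').some(c => c.startsWith('anon_id='))) {
  const id = crypto.randomUUID();   // requires a secure (HTTPS) context
  document.cookie = 'anon_id=' + id + '; max-age=' + 60 * 60 * 24 * 365 + '; path=/; SameSite=Lax';
}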
My suggestion is:
Use cookies for tracking user identity. As you yourself have said, dynamic IP addresses mean you can't reliably use IPs for that.
To detect and curb spam, use the IP + user agent combination.

Programmatic Bot Detection

I need to write some code to analyze whether or not a given user on our site is a bot. If it's a bot, we'll take some specific action. Looking at the User-Agent is only successful for friendly bots, since a bot can specify any user agent it wants. I'm after the behaviors of unfriendly bots. Various ideas I've had so far are:
If you don't have a browser ID
If you don't have a session ID
Unable to write a cookie
Obviously, there are some cases where a legitimate user will look like a bot, but that's OK. Are there other programmatic ways to detect a bot, or at least to detect something that looks like one?
User agents can be faked. CAPTCHAs have been cracked. Valid cookies can be sent back to your server with page requests. Legitimate programs, such as Adobe Acrobat Pro, can go in and download your web site in one session. Users can disable JavaScript. Since there is no standard measure of "normal" user behaviour, it cannot be differentiated from a bot.
In other words: it can't be done short of pulling the user into some form of interactive chat and hoping they pass the Turing Test; then again, they could be a really good bot too.
Clarify why you want to exclude bots, and how tolerant you are of mis-classification.
That is, do you have to exclude every single bot at the expense of treating real users like bots? Or is it okay if bots crawl your site as long as they don't have a performance impact?
The only way to exclude all bots is to shut down your web site. A malicious user can distribute their bot to enough machines that you would not be able to distinguish their traffic from real users. Tricks like JavaScript and CSS will not stop a determined attacker.
If a "happy medium" is satisfactory, one trick that might be helpful is to hide links with CSS so that they are not visible to users in a browser, but are still in the HTML. Any agent that follows one of these "poison" links is a bot.
A simple test is JavaScript:
<script type="text/javascript">
// Only clients that actually execute JavaScript will request the hidden image.
document.write('<img src="/not-a-bot.' + 'php" style="display: none;">');
</script>
The not-a-bot.php can add something into the session to flag that the user is not a bot, then return a single pixel gif.
The URL is broken up to disguise it from the bot.
Here's an idea:
Most bots don't download CSS, JavaScript, and images; they just parse the HTML.
If you could keep track in a user's session whether or not they download all of the above, e.g. by routing all of the download requests through a script that logs the attempts, then you could quickly identify users that only download the raw html (very few normal users will do this).
You say that it is okay that some users appear as bots, therefore,
Most bots don't run JavaScript. Use JavaScript to make an Ajax-like call to the server that identifies this IP address as "not a bot". Store that for a set period of time to identify future connections from this IP as good clients and to prevent further wasteful JavaScript calls.
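A sketch of that call; /mark-not-a-bot is a hypothetical endpoint that would record the session/IP as a good client for a while:

// Only clients that actually execute JavaScript will ever run this.
// The sessionStorage flag avoids repeating the call on every page view.
if (!sessionStorage.getItem('notBotReported')) {
  fetch('/mark-not-a-bot', { method: 'POST', credentials: 'same-origin' })
    .then(function () { sessionStorage.setItem('notBotReported', '1'); })
    .catch(function () { /* ignore; we can retry on the next page */ });
}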
For each session on the server you can determine if the user was at any point clicking or typing too fast. After a given number of repeats, set the "isRobot" flag to true and conserve resources within that session. Normally you don't tell the user that he's been robot-detected, since he'd just start a new session in that case.
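A rough sketch of that check as Express-style middleware; the thresholds and the session fields (lastHit, strikes, isRobot) are invented for illustration and assume a session middleware is already in place:

const MIN_INTERVAL_MS = 500;  // anything faster than this, repeatedly, looks automated
const MAX_STRIKES = 5;

function robotTimingCheck(req, res, next) {
  const now = Date.now();
  const s = req.session;
  if (s.lastHit && now - s.lastHit < MIN_INTERVAL_MS) {
    s.strikes = (s.strikes || 0) + 1;
    if (s.strikes >= MAX_STRIKES) {
      s.isRobot = true;   // flag silently; don't tell the client
    }
  } else {
    s.strikes = 0;
  }
  s.lastHit = now;
  next();
}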
Well, this is really for a particular page of the site. We don't want a bot submitting the form because it messes up tracking. Honestly, the friendly bots (Google, Yahoo, etc.) aren't a problem, as they don't typically fill out the form to begin with. If we suspected someone of being a bot, we might show them a CAPTCHA image or something like that... If they passed, they're not a bot and the form submits...
I've heard of things like putting the form in Flash, or making the submit require JavaScript, but I'd prefer not to get in the way of real users unless I suspect they're a bot...
I think your idea of checking the session ID will already be quite useful.
Another idea: You could check whether embedded resources are downloaded as well.
A bot which does not load images (e.g. to save time and bandwidth) should be distinguishable from a browser which typically will load images embedded into a page.
Such a check, however, might not be suitable as a real-time check, because you would have to analyze some sort of server log, which might be time-consuming.
Hey, thanks for all the responses. I think a combination of a few suggestions will work well: mainly the hidden form element that times how fast the form was filled out, and possibly the "poison link" idea. I think that will cover most bases. When you're talking about bots, you're not going to find them all, so there's no point thinking that you will... Silly bots.

What identifying information can a website capture?

If the owner of a web site wants to track who their users are as much as possible, what things can they capture (and how)? You might want to know about this in order to capture information on a site you create or, as a user, to prevent a site from capturing data on you.
Here is a starting list, but I'm sure I have missed some important ones:
Referrer (what web page had the link you followed to get here). This is an HTTP header.
IP address of the machine you are browsing from. This is available to the server along with the HTTP request.
User Agent (what browser you are using). This is an HTTP header.
Cookie placed on a previous visit. This is a header, available only if a cookie was placed earlier and was not deleted by the user.
Flash Cookie placed on a previous visit. Some users turn off cookies, but very few know how to turn off Flash cookies. Works like a normal cookie although it depends on Flash.
Web Bugs. Place something small (like a transparent single-pixel GIF) on the page that's served up from a 3rd party. Some third parties (such as DoubleClick) will have their own cookies and can correlate with other visits the user makes (for a fee!).
Those are the common ones I think of, but there have to be LOTS of unusual ones. For instance, this:
Time on the user's clock. Use JavaScript to transmit it.
... which I had never heard of before reading it here.
ADDED LATER (after reading this):
Please try to put just ONE item per answer, then we can use voting up to sort out the better/more-interesting ones. The list below is probably less effective.
Ah well... NEXT time I ask a question like this I'll set it up better.
And here are some of the best answers I got:
James points out that IE transmits the .NET framework version.
AviewAnew points out that one can find what sites you have visited.
Mecki points out that Screen Resolution can be determined.
Mecki also points out that any auto-fill information your browser has cached can be determined, by creating a hidden field, then reading it with JavaScript.
jjrv points out that Flash can list the fonts on the user's machine.
Kent points out that you can find out what websites a person has visited.
Silver Dragon points out you can determine the location of the mouse within the browsing window using Flash and AJAX.
Jim points out that you can tell what language the user has configured in their browser from a HTTP header.
Jim also mentions that you can detect whether people are using Greasemonkey or something similar to modify the page.
Modifications to your original:
Referrer: can be escaped (I think it's an option in some browsers).
IP address: only avoidable with a proxy (JavaScript can work around this, however, with smart lookarounds).
User Agent: is unreliable, easily forged.
Cookie: only works assuming it was not wiped by browser closure (session cookie) and the cookie is in the same domain/path.
The real nasty ones are:
Using JavaScript to probe your network/LAN.
Using JavaScript to access your firewall from behind the firewall and adjust its settings (no joke).
Using the "visited link" feature to determine which of a list of URLs have been visited (deep history probing!).
Goodness knows what if the user has Windows/IE/ActiveX.
There's a header that can include information about a proxy server the user is using (such as Via or X-Forwarded-For), and that can also include the user's IP address (in which case the other IP is the proxy's).
Screen resolution, operating system, color depth, size of your taskbar (compare max and current resolution), whether Java is enabled, anti-aliased fonts, installed plugins: all via JavaScript.
A Java applet can give you a bunch of information as well, but I don't know what.
Sites you've visited
Details of your local network, such as active hosts and web servers. The paper also outlines drive-by printing and drive-by router modification.
And this is all assuming the attacker doesn't pull off arbitrary code execution
JavaScript can get more information than just the time; screen resolution (plus color depth) is one example.
See Getting Screen Resolution with JS
Everything JS can capture can be transmitted using AJAX without the user performing any interaction (a short sketch follows this list). Other examples are (not all will work in every browser):
It can look into your browser history, e.g. what URL your browser would go to if you hit back or forward.
The language of your browser (Note: usually the HTTP request will also contain a list of preferred languages for the page you request. However, this list is user-editable in the prefs of many browsers, while JS can actually find out which language your browser's interface is using.)
If your browser auto-fills form fields (e.g. e-mail, username), JS can read what your browser entered into the fields before you submit the form; it can even read the pre-filled values if you never submit the form at all.
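A short sketch of the above (illustrative only; /collect is a made-up endpoint, and whether a given browser actually pre-fills a field varies):

// Gather a few of the values mentioned above with plain JavaScript.
const info = {
  resolution: screen.width + 'x' + screen.height,   // screen resolution
  colorDepth: screen.colorDepth,
  uiLanguage: navigator.language,                    // the browser's preferred/interface language
  historyLength: history.length                      // number of entries in this tab's history
};

// Read what the browser pre-filled into a form field, even if the form is never submitted.
const emailField = document.querySelector('input[name="email"]');
if (emailField) {
  info.prefilledEmail = emailField.value;
}

// Ship it off in the background, with no user interaction required.
navigator.sendBeacon('/collect', JSON.stringify(info));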
A Java applet could also gather some information and transmit it, though there is not much information you wouldn't already get elsewhere. Since it's easy to get the IP of a visitor, it's possible to find out which provider they're using (by looking up the IP at a regional registry such as ARIN for the USA or RIPE for Europe), and there are services that translate IPs to countries, so it's possible to find out where the user is most likely located.
Some additional info, that might be of interest:
Using the IP address, one can resolve the hostname, the network provider/organization the IP belongs to, and a rough geographic location.
Using the referrer, the list of requests a given client makes, and a reliable cookie mechanism, one can reconstruct the path the visitor takes (even clickthroughs to other sites, with AJAX and/or a forwarder page).
Using Flash in combination with AJAX, the mouse location within the browsing window can be captured.
The User Agent might contain information regarding the operating system, installed .NET framework versions, and other curiosities.
.NET framework versions are transmitted in IE, in the User Agent.
Flash can give you a list of fonts on the user's machine, among other things. JavaScript can send information when the mouse stops over an ad without clicking it. You can also get the window size, whether the site is open in a frame, and whether popups or specific plugins have been blocked, and checking for JavaScript features can tell whether the user agent header is correct or faked...
If you're concerned about your personal security (I'm not sure if that's what you're really after, so my apologies if this is misguided), you can always use the Tor network. If you use Firefox, you can use Torbutton for one-click enabling. It has the benefit (drawback, to some) of disabling Flash, because it's otherwise impossible to protect against Flash information leaks.
You can usually determine which language the user speaks through the Accept-Language HTTP header (a short sketch follows this list).
You can determine whether certain applications and browser plugins are installed by looking at the Accept HTTP header.
Browser version/patchlevel and .NET framework version through the User-Agent HTTP header.
Your ISP/Employer and geographical location through IP address.
Whether or not you have visited particular URLs through CSS and/or timing load events. If a particular website has user-specific URIs, this could disclose whether you are a certain user on that site or not.
Which fonts are available through measuring ems and/or Flash.
Screen resolution, window size, timezone through JavaScript.
Where you move your mouse and keystrokes through JavaScript. For instance, you can see what people type into text boxes even if they don't hit submit.
Many UserJS/Greasemonkey scripts leak information (e.g. if you filter out certain people, the sites it is configured for may be able to find out who).
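As referenced above, a minimal Node.js sketch of reading the Accept-Language header; like every request header, the client can of course send whatever it likes here:

const http = require('http');

http.createServer(function (req, res) {
  // e.g. "en-US,en;q=0.9,de;q=0.8" -> take the first (most preferred) tag
  const accept = req.headers['accept-language'] || '';
  const primary = accept.split(',')[0].trim();
  res.end('Your browser claims to prefer: ' + (primary || 'unknown'));
}).listen(3000);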
Whether the browser supports JS
Whether the browser supports Flash
Operating system platform
Screen resolution
Whether CSS is supported
Whether tables are supported
I need to dig up the link, but if the user is using IE with common software titles installed, it is possible to determine which ones are installed.
As far as I know, it's possible to get clipboard data via JavaScript. I'm not sure how possible it is by default these days, but it was all the rage not long ago. I do believe IE still allows it.
People have a habit of leaving very important data in their clipboard, so this is pretty bad.
Late to the party here, but a website can also scan your ports to find out what software you are running!
