Legally parsing browser history

I want to create a service that suggests things based on the sites one has visited. It would be a user-controllable process. In other words, the service would start "recording / suggesting" from the user's browsing history only when the user tells the service to start (and stop).
I'm not looking for hacks or potentially illegal methods. Technically, would this be possible with JavaScript as (say) something like a bookmarklet? Or would it need something with more fundamental browser access, like an extension?
Thanks in advance for any guidance.

You would need an extension to get access to this much data, and it would have to be custom for each browser.
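For example, in Chrome an extension granted the "history" permission can query past visits through the chrome.history API (Firefox has an equivalent browser.history API). The sketch below is only illustrative: the time window, result limit, and what you do with the results are placeholder choices for the "suggesting" service described above.

    // Hedged sketch: assumes a Chrome extension whose manifest declares
    // "permissions": ["history"], running in the extension's background script.
    // The 24-hour window and maxResults are arbitrary placeholders.
    chrome.history.search(
      { text: '', startTime: Date.now() - 24 * 60 * 60 * 1000, maxResults: 100 },
      (items) => {
        for (const item of items) {
          // item.url, item.title and item.visitCount would feed the suggestion logic.
          console.log(item.url, item.visitCount);
        }
      }
    );

The user-controlled start/stop would simply gate whether this query (or a chrome.history.onVisited listener) runs at all.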

Related

How secure is data passed via Custom Protocol Handler?

Let's say you set up a custom protocol handler to run an application with some startup data, right?
myapp:\\somedata
How secure is that data? I'm having trouble finding any resources talking about:
Do browsers cache this data?
Can other applications see this information being passed, and if so, how?
I've found resources talking about the obvious problems, like malicious sites abusing your protocol if they find out you have one registered.
Otherwise, for developers looking to use their website to launch an app in this way, what do we need to be concerned about if we don't want anyone else seeing "somedata"? More specifically, how is the data accessible to attackers?
Any MDN or other official references would be much appreciated!

Script for Online Tasks

I was wondering if there is a way to write a script that could perform online tasks. There are actions a user does that are so repetitive/planned that they make me wonder if I could write a script to do them for me. What I mean is, for example, a script that could go online, let's say to Facebook, and write/read a post. It seems such a straightforward action that it has to be possible to do with a script.
The thing is, I have no idea how to do it. What I'm asking for here is some guidance, if possible; all I need to know is a good language for this and a good approach. I can't seem to find anything on this, probably because I'm not searching for the right terms.
Thanks for your time. :)
If you're looking to mock user behavior in the browser (such as filling out a form), you could use Python, a web driver, and the Selenium module.
Selenium will open the web driver and then let you mock user actions, such as selecting a text box, typing data, and then clicking submit. This allows you to automate actions such as searching on a website, verifying that a site works the way you expect when a user takes certain actions, and filling out the input elements on a page and submitting a form.
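Selenium also ships JavaScript bindings (the selenium-webdriver npm package), so the same idea can be sketched in Node. This is only a sketch under assumptions: a local ChromeDriver is available, and the URL and element selector below are placeholders.

    // Hedged sketch using Selenium's JavaScript bindings (npm: selenium-webdriver).
    // Assumes ChromeDriver is installed; the URL and selector are placeholders.
    const { Builder, By, Key, until } = require('selenium-webdriver');

    (async () => {
      const driver = await new Builder().forBrowser('chrome').build();
      try {
        await driver.get('https://www.example.com');
        // "Select a text box, type data, and submit", as described above.
        await driver.findElement(By.name('q')).sendKeys('hello world', Key.RETURN);
        await driver.wait(until.titleContains('hello world'), 5000);
      } finally {
        await driver.quit();
      }
    })();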
You also need to understand concepts like JavaScript, AJAX and servlets, plus more advanced Java concepts, since these are the event-driven pieces that bring dynamism to a web page. Beyond that, you need session-level concepts to handle whether a session is active or expired; all of this matters when automating based on session attributes.
Finally, at the database level you can use triggers to fire changes when needed.

"Sandbox" Google Analytics for security

By including Google Analytics in a website (specifically the JavaScript version), aren't you giving Google complete access to all your cookies and site information? (i.e. it could be a security hole).
Can this be mitigated by putting Google in a sandboxed iframe? Or maybe by only passing Google the necessary information (i.e. browser type, screen resolution, etc.)?
How can someone get the most out of Google Analytics without leaving the entire site open?
Or perhaps passing the data through my own server and then uploading it to Google?
You can create a scriptless implementation via the Measurement Protocol (for Universal Analytics enabled properties). This not only avoids any security issues with the script (although I'd rather trust Google on that), it also means you have more control over what data is submitted to the Google servers.
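As a rough illustration of what "scriptless" means here: a server-side hit is just an HTTP request you build yourself, so only the fields you add ever reach Google. The sketch below assumes a Universal Analytics property and a runtime with a global fetch (e.g. Node 18+); the property ID, client ID, and page path are placeholders.

    // Hedged sketch of a Universal Analytics Measurement Protocol hit, sent server-side.
    // UA-XXXXX-Y, the client id, and the page path are placeholders.
    const params = new URLSearchParams({
      v: '1',             // protocol version
      tid: 'UA-XXXXX-Y',  // your property id (placeholder)
      cid: '555',         // an anonymous client id you generate and manage yourself
      t: 'pageview',      // hit type
      dp: '/home',        // the page path you choose to report
    });

    fetch('https://www.google-analytics.com/collect', {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: params.toString(),
    });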
A script run on your site can read cookies on your site, yes. And that data can be sent back to Google, yes. That is why you shouldn't store sensitive information in cookies. You shouldn't do this even if you don't use Google Analytics, even if you don't use ANY other code except your own. Browsers and browser addons can also read that stuff, and you definitely cannot control that. Again, never store sensitive information in cookies.
As far as access to "site information" goes: JavaScript can be used to read the content on your pages, know the URLs of pages, etc. In other words, anything you serve up on a web page. Anything that is not behind a wall (e.g. a login barrier) is surely up for grabs, but crawlers will look at that stuff anyway. Stuff behind walls can still be grabbed automatically, depending on what a bot actually has to do to get past those walls (e.g. simple registration/login barriers are pretty easy to get past).
This is also why you should never display sensitive information even in the content of your site, e.g. credit card numbers, passwords, etc. That's why virtually every site that holds even remotely sensitive information shows a mask (e.g. ****) instead of the actual values.
Google Analytics does not actively do these things, but you're right: there's nothing stopping them from doing it, and you've already given them the right to do it by using their script.
And you are right: the safest way to control what Google can actually see is to send server-side requests to them. And also put all your content behind barriers that cannot be easily crawled or scraped, the strongest barrier being one that involves having to pay for access. People are ingenious about making crawlers and bots to get past all sorts of forms and "human" checks, and you're fighting a losing battle on that count, but nothing stops a bot faster than requiring someone to give you money to access your stuff. Of course, this also means you'd have to make everybody pay for access...
Anyway, if you're that paranoid about this stuff, why use GA at all? Use something you host yourself (e.g. Piwik). This won't solve for crawlers/bots, obviously, but it will solve for worries about GA grabbing more than you want it to.

I want to use security through obscurity for the admin interface of a simple website. Can it be a problem?

For the sake of simplicity I want to use admin links like this for a site:
http://sitename.com/somegibberish.php?othergibberish=...
So the actual URL and the parameter would be some completely random string which only I would know.
I know security through obscurity is generally a bad idea, but is it a realistic threat someone can find out the URL? Don't take the employees of the hosting company and eavesdroppers on the line into account, because it is a toy site, not something important and the hosting company doesn't give me secure FTP anyway, so I'm only concerned about normal visitors.
Is there a way of someone finding this URL? It wouldn't be linked anywhere on the web, so Google won't know about it either. I hope, at least. :)
Any other hole in my scheme which I don't see?
Well, if you could guarantee only you would ever know it, it would work. Unfortunately, even ignoring malicious men in the middle, there are many ways it can leak out...
It will appear in the access logs of your provider, which might end up on Google (and are certainly read by the hosting admins)
It's in your browsing history. Plugins, extensions etc. have access to this, and often upload it elsewhere (e.g. StumbleUpon).
Any proxy servers along the line see it clearly
It could turn up as a Referer to another site
some completely random string
which only I would know.
Sounds like a password to me. :-)
If you're going to have to remember a secret string, I would suggest doing usernames and passwords "properly", as HTTP servers have been written not to leak password information; the same is not true of URLs.
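As a concrete illustration of "properly" under loose assumptions: even a bare-bones HTTP Basic Auth check keeps the secret in a request header (which servers are built not to log) rather than in the URL. The credentials and port below are placeholders, and in practice this only makes sense over HTTPS.

    // Hedged sketch: HTTP Basic Auth in plain Node instead of a secret URL.
    // "admin:change-me" and port 8080 are placeholders; serve this over HTTPS.
    // Not a full implementation - just the shape of a header-based check.
    const http = require('http');

    const EXPECTED = 'Basic ' + Buffer.from('admin:change-me').toString('base64');

    http.createServer((req, res) => {
      if (req.headers.authorization !== EXPECTED) {
        res.writeHead(401, { 'WWW-Authenticate': 'Basic realm="admin"' });
        res.end('Authentication required');
        return;
      }
      res.end('Admin interface'); // the secret never shows up in the URL or access logs
    }).listen(8080);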
This may only be a toy site, but why not practice setting up security properly here, where it won't matter if you get it wrong? Then, hopefully, if you do have a site you need to secure in the future, you'll have already made all your mistakes.
I know security through obscurity is
generally a very bad idea,
Fixed it for you.
The danger here is that you might get in the habit of "oh, it worked for Toy such-and-such site, so I won't bother implementing real security on this other site."
You would do a disservice to yourself (and any clients/users of your system) if you ignore Kerckhoffs's principle.
That being said, rolling your own security system is a bad idea. Smarter people have already created security libraries in all the major languages, and even smarter people have reviewed and tweaked those libraries. Use them.
It could appear on the web via a "Referer leak". Say your page links to my page at http://entrian.com/, and I publish my web server referer logs on the web. There'll be an entry saying that http://entrian.com/ was accessed from http://sitename.com/somegibberish.php?othergibberish=...
As long as the "login URL" is never posted anywhere, there shouldn't be any way for search engines to find it. And if it's just a small, personal toy site with no personal or really important content, I see this as a fast and decently working solution, security-wise, compared to implementing some form of proper login/authorization system.
If the site gets a large number of users and lots of content, or simply becomes more than a "toy site", I'd advise you to do it the proper way.
I don't know what your toy admin page would display, but keep in mind that when it loads external images or links to somewhere else, the Referer header is going to publicize your URL.
If you change http into https, then at least the URL will not be visible to anyone sniffing on the network.
(The caveat here is that a very obscure login system can still leave interesting traces: in network captures (MITM), somewhere on the site/target that enables privilege escalation, or on the system you log in from if that system is no longer secure. Some people prefer an admin login that looks no different from a standard user login to avoid exactly that.)
You could require that some action be taken a number of times, with some number of seconds of delay between the times. After this action, delay, action, delay, action pattern was noticed, the admin interface would become available for login. And the URLs used in the interface could be randomized each time, with a single-use URL generated after that pattern. Further, you could expose this interface only through some tunnel, and only for a minute, on a port encoded by the delays.
If you could do all that in a manner that didn't stand out in the logs, that'd be "clever", but you could also open up new holes by writing all that code, and it goes against "keep it simple, stupid".

Where does the browser fail as a client

Where should the browser be improved upon to help improve application experiences?
For instance some of my main gripes are
A) Different browsers need different configurations / plugins (I don't want to download different JREs, or RIA platforms such as Flash, Silverlight, Gears, and so forth)
B) I want to always be able to drag data from my desktop to a "web app". I don't like clicking Browse for a file and then uploading it. I think this is something that should be handled easily.
Additionally, building on the point above, I'd like it to be very easy to drag information from a web page to my computer, to be used in whatever shape or form is needed. For instance, I'd like to be able to drag my user ID from Stack Overflow into my mail / CRM client, which would take the relevant information and maybe even build up a picture of my knowledge.
What else am I missing ?
As I see it, the current problem is the ever-growing pile of technologies. Just connect straight to the user's browser with API calls via RPC. RPC is the way forward, and it would put an end to this technology-piling-up trend. Currently, security is too weak for this kind of technology; maybe some sort of virtual sandbox would fix that.
See the RPyc theory of operation and screencast; it explains the idea pretty well.
