Google Docs as Content Management System - google-docs

I'm thinking of using Google Docs as a content management system, and to integrate it with my java/j2ee web application.
I only need to upload, view, search meta-data, and organize docs.
Would anybody have a reason to believe I should not try this?

One good reason not to do that is that then you have no control over your system's uptime. Google does occasionally have outages, which would take your system down as well.
In addition, by storing them on Google's servers, you are giving up any control over privacy. There is nothing you can do to ensure Google's security of both their live systems and their backup systems will never be broken, and if they get broken in to, your documents' privacy is lost. In addition, you'll need to keep an eye on Google's terms of use. They may very well update it to read "We reserve the right to sell your documents to whomever we please." which may include your competitors.
That being said, if downtime won't break you, and privacy isn't a huge concern, it doesn't sound like a bad idea. Just make sure they're not the ONLY place you're storing your documents.

Related

How could I protect the users from my webpage from being tracked?

This is maybe quite a broad question and I tried to look for other stack exchanges where addressing my question would suit better – but in the end I decided that it might be still a question of a technical nature, and so I am posting it here:
I recently started to think more about privacy and security and I realized that I as a web user can only do so much about staying untracked. VPN, (slow) Tor, privacy helpers, add-blockers, Firefox are just a few tools to name, but still I realize that the information that I normally share (like installed add-ons, browser size, IP location etc.) can still very well be fingerprinted.
Normally as a web-developer I am told that we should add analytics, that we should find out more about the users to «make a better service», but I think I would like to do the opposite.
So:
Are there steps I could take, when building a website, that help the visitors to stay untracked? And I don't mean «not installing google analytics», I mean things like somehow actively messing with the statistics, so that my hosters server is incapable of tracking things correctly or similar things...
Right now I can't really think of anything, but I somehow believe that I as a person who builds bricks of the internet could and should be able to influence these kind of things directly...
For now I see the obvious things:
- not using statistic services
- use https
- not using any third party tools that might include tracking or open doors for other trackers
But still this seems to just omit the bad things, but I can't actually do active stuff...
So I would be very glad to hear your thoughts about this. (Or guide me to a place, where this discussion fits a better..)
Cheers
merc
As a web developer, you can only control your website.
Assuming you aren't caching any data or using cookies, then users shouldn't be tracked while using your website by tools like 3rd party cookies.
Here is a good article about online tracking and how it works.
As far as I know, there isn't an effective way to actively mess with tracking statistics. Your best bet is to avoid installing libraries or tools that track your users.

"Sandbox" Google Analytics for security

By including Google Analytics in a website (specifically the Javascript version) isn't it true that you are giving Google complete access to all your cookies and site information? (ie. it could be a security hole).
Can this be mitigated by putting Google in an iFrame that is sandboxed? Or maybe only passing Google the necessary information (ie. browser type, screen resolution, etc)?
How can someone get the most out of Google Analytics without leaving the entire site open?
Or perhaps passing the data through my own server and then uploading it to Google?
You can create a scriptless implementation via the measurement protocol (for Universal Analytics enabled properties). This not only avoids any security issues with the script (although I'd rather trust Google on that), it also means you have more control what data is submitted to the Google Server.
A script run on your site can read cookies on your site, yes. And that data can be sent back to google, yes. That is why you shouldn't store sensitive information in cookies. You shouldn't do this even if you don't use google analytics. Even if you don't use ANY other code except your own. Browsers and browser addons can also read that stuff and you definitely cannot control that. Again, never store sensitive information in cookies.
As far as access to "site information".. javascript can be used to read the content on your pages, know urls of pages, etc.. IOW anything you serve up on a web page. Anything that is not behind a wall (e.g. login barrier) is surely up for grabs. But crawlers will look at that stuff anyway. Stuff behind walls can still be grabbed automatically, depending on what they have to actually do to get past those walls (e.g. simple registration/login barriers are pretty easy to get past).
This is also why you should never display sensitive information even in content of your site. E.g. credit card numbers, passwords, etc.. that's why virtually every site you go to that has even remotely sensitive information always shows a mask (e.g. ** ) instead of actual values.
Google Analytics does not actively do these things, but you're right: there's nothing stopping them from doing it, and you've already given them the right to do it by using their script.
And you are right: the safest way to control what Google can actually see is to send server-side requests to them. And also put all your content behind barriers that cannot be easily crawled or scraped. The strongest barrier being one that involves having to pay for access. People are ingenious about making bots about making crawlers and bots to get past all sorts of forms and "human" checks etc.. and you're fighting a losing battle on that count, but nothing stops a bot faster than requiring someone to give you money to access your stuff. Of course, this also means you'd have to make everybody pay for access...
Anyways.. if you're that paranoid about this stuff, why use GA at all? Use something you host yourself (e.g. Piwik). This won't solve for crawlers/bots, obviously, but it will solve for worries about GA grabbing more than you want it to.

Corporate Espionage of Website Source Code

This may not be the most technical question, but I was just interested, nonetheless...
How does a giant company like Google keep from having their code stolen by employees? Maybe I'm wrong, but I would assume that their source code to their search algorithms (amongst other things) would be valuable to their competitors (i.e. Microsoft).
I guess I can best phrase it like this:
What's keeping an unscrupulous
employee who has sufficient clearance from
accessing Google's code repository for
a specific project and copying significant amounts of code
to a flash drive and taking it to their
competitors?
Fear of being sued?
Things within a company like Google are also compartmentalized. So not everybody has access to all code. If someone has access to code, you can bet that Google knows when they access it. I'm sure they have some kind of algorithm that looks and sees if somebody just downloads a lot of files very fast. The search algorithm isn't a small file obviously, it is a gigantic application.
All this would allow them to track who has stolen the code from within. There is also the fact that any self-respecting company or company with something to lose (i.e. Microsoft) would not take anything like this from somebody. They would probably even tell Google about it.
It is called protocol. The idea that only a few people get to know the code. In which then those few have to tell a major very embarrassing secret to the others. So then nobody can tell or else they get outed in the public. Which can be very simple like they like something, compared to as bashful as they are all the way to they killed somebody.
Many employers, including one that I've worked for, completely block flash drives.
In many cases, though, this is to protect non-technical confidential information.
Companies that are serious about protecting their assets will have access logging on their core systems and active scanning to detect suspicious patterns. Similar security is implemented for employees of government agencies (e.g. tax, social security) holding sensitive personal information. Users who access data outside of their assigned cases can be flagged and investigated.
I suspect (but don't know) that similar scanning could be implemented in high value source code repositories.
Some organizations block the use of removable media (It has been reported that some agencies have reacted to Wikileaks with such policies), in some cases by physically gluing up the USB/media ports. This restricts potential thiefs to network transfers of material which can be scanned.
I think companies such as Google will implement access control on their source code repository / version control system. So their employee would only be able to access source code in which they were involved. And their access could be revoked from previous repository if they're being assigned to different project. Its the same thing with normal internal documents, would a security-conscious company let documents be downloaded by any employee freely ?
I think codethis hit the nail on the head. Some fly-by-night operation may be interested, but Microsoft, Yahoo, etc - wouldn't touch stolen code with a ten foot pole. And the fly-by-night wouldn't have the infrastructure. If you didn't tell anybody it was stolen - it's not like you could get away with walking in to a company with an entire spider/searching algorithm on your thumbdrive and declare you wrote it last week.
The bigger threat is details of the search algorithm getting out. SEOers, as a whole, are rather shady - and many would kill for solid facts about how the algorithm ranked or downranked pages. Even then, Google has demonstrated the ability to change their ranking algorithms so quickly that it wouldn't much matter.
On the other hand, Google doesn't have that much super-secret code. Most of their cool stuff (MapReduce et.al) is publicly available (see Hadoop). This question is probably more applicable to a company like Adobe. Some of their Photoshop algorithms are really cool, and would probably hurt them if they got out - but again, no legit company would touch it.

Are services like AWS secure enough for an organization that is highly responsible for it's clients privacy?

Okay, so we have to store our clients` private medical records online and also the web site will have a lot of requests, so we have to use some scaling solutions.
We can have our own share of a datacenter and run something like Zend Server Cluster Manager on it, but services like Amazon EC2 look a lot easier to manage, and they are incredibly cheaper too. We just don't know if they are secure enough!
Are they?
Any better solutions?
More info: I know that there is a reference server and it's highly secured and without it, even the decrypted data on the cloud server would be useless. It would be a bunch of meaningless numbers that aren't even linked to each other.
Making the question more clear: Are there any secure storage and process service providers that guarantee there won't be leaks from their side?
First off, you should contact AWS and explain what you're trying to build and the kind of data you deal with. As far as I remember, they have regulations in place to accommodate most if not all the privacy concerns.
E.g., in Germany such thing is a called a "Auftragsdatenvereinbarung". I have no idea how this relates and translates to other countries. AWS offers this.
But no matter if you go with AWS or another cloud computing service, the issue stays the same. And therefor, whatever is possible is probably best answered by a lawyer and based on the hopefully well educated (and expensive) recommendation, I'd go cloud shopping, or maybe not. If you're in the EU, there are a ton of regulations especially in regards to medical records -- some countries add more to it.
From what I remember it's basically required to have end to end encryption when you deal with these things.
Last but not least security also depends on the setup and the application, etc..
For complete and full security, I'd recommend a system that is not connected to the Internet. All others can fail.
You should never outsource highly sensitive data. Your company and only your company should have access to it - in both software and hardware terms. Even if your hoster is generally trusted someone there might just steal hardware.
Depending on the size of your company you should have your custom servers - preferable even unaccessible for the technicans in your datacenter (supposing you don't own the datacenter ;).
So the more important the data is, the less foreign people should have access to it in any means. In the best case you can name all people that have access to them in any way.
(Update: This might not apply to anonymous data, but as you're speaking of customers I don't think that applies here?)
(On a third thought: There're are probably laws to take into consideration of how you have to handle that kind of information ;)

network drive file sharing

For the better part of 10 years + we have relied on various network mapped drives to allow file sharing. One drive letter for sharing files between teams, a seperate file share for the entire organization, a third for personal use etc. I would like to move away from this and am trying to decide if an ECM/Sharepoint type solution, or home grown app, is worth the cost and the way to go? Or if we should simply remain relying on login scripts/mapped drives for file sharing due to its relative simplicity? Does anyone have any exeperience within their own organization or thoughts on this?
Thanks.
SharePoint is very good at document sharing.
Documents generally follow a process for approval, have permissions, live in clusters... and these things lend themselves well to SharePoints document libraries.
However there are somethings that don't lend themselves well to living inside SharePoint... do you have a virtual hard drive (.vhd) file that you want to share with a workmate? Not such a good idea to try and put a 20GB file into SharePoint.
SharePoint can handle large files, and so can SQL Server behind it... but do you want your SQL Server bandwidth being saturated by such large files? Do you want your backup of SQL Server to hold copies of such large files multiple times?
I believe that there are a few Microsoft partners who offer the ability to disassociate file blobs from the SharePoint database, so that SharePoint can hold the metadata and a file system holds the actual files, and SharePoint simply becomes the gateway to manage access, permissions, and offer a centralised interface to files throughout an organisation. This would offer you the best of both worlds.
Right now though, I consider SharePoint ideal for documents, and I keep large files (that are not document centric) on Windows file shares.
Definetely, use a tool.
The main benefit here is version control. Being able to jump easily to a previous version, diff'ing and seeing who modified what (see most VCS' blame/annotate tool- it prints out a text file showing when/who modified each line in the text file).
Second, you can probably benefit from issue tracking/task tracking.
Other benefits include web access from the internet, having a wiki (which can be great in some situations), etc.
I use Subversion + Redmine at work, and I find it highly useful- test a few solutions and you will surely find out further advantages for you.
One thing that can be overlooked in the change to an document management tool is the planning required around how much is going to be stored and information architecture issues like where different content is going to end up.
SharePoint particularly is easy to setup without a good plan going forward and is particularly vulnerable to difficulties later on when things get to busy.
I would not recommend a home grown app for something like this. The problem has been solved by off the shelf tools and growing one from scratch is going to cost a huge amount and not get you any way near the features for the money.
Did I mention how important planning your security groups and document areas (IA) was?
If you need just document storage then sharepoint can do very well. WSS is ewen free and it provides very good document storage capabilities.
But you have to plan carefully as updating existing applications is painfull. If you decide to go with Sharepoint then I can give you few advices from top of my head
Pay attention to security configuration (user groups, privilegies,..)
Plan your document libraries well as it is not easy to just move documents betveen them
Also consider limiting number of versions that one document can have, because sharepoint stores full backups betveen verions, not just changes
Don't use infopath:) we have very bad experience with it (just don't tell this to managers)
If you don't really need to change graphical look of Sharepoint than don't bother with it as it brings many problems (I'm talking about custom masterpages and custom site templates)
Try to use as much OOB stuff as possible, because developing your own webparts not only cost more, but it can be quite complicated.
Make sure to turn-on search indexing. This is quite tricky, because it is by default turned off and then you will be as surprised that search is not working as I was :)
If you try to just deploy it and load 10.000 documents into it then you will surely have problems with it later. If you give a little thought about structure then you will end up with really good document storage.
Migrating is very probably worth the cost in the long term. You will gain reliability, versioning, traceability, and extensibility.
Be sure to first identify the groups/rights, and to identify which links need to be fixed (maybe you have applications that use links to the shares).
An open source alternative to SharePoint is Alfresco, it is very good for CIFS (Windows shares) too.

Resources