Considerations regarding a p2p social network

Considerations regarding a p2p social network - security

While the are many social networks in the wild, most rely on data stored on a central site owned by a third party.
I'd like to build a solution, where data remains local on member's systems. Think of the project as an address book, which automagically updates contact's data as soon a a contact changes its coordinates. This base idea might get extended later on...
Updates will be transferred using public/private key cryptography using a central host. The sole role of the host is to be a store and forward intermediate. Private keys remain private on each member's system.
If two client are both online and a p2p connection could be established, the clients could transfer data telegrams without the central host.
Thus, sender and receiver will be the only parties which are able create authentic messages.
Questions:
Do exist certain protocols which I should adopt?
Are there any security concerns I should keep in mind?
Do exist certain services which should be integrated or used somehow?
More technically:
Use e.g. Amazon or Google provided services?
Or better use a raw web-server? If yes: Why?
Which algorithm and key length should be used?
UPDATE-1
I googled my own question title and found this academic project developed 2008/09: http://www.lifesocial.org/.

The solution you are describing sounds remarkably like email, with encrypted messages as the payload, and an application rather than a human being creating the messages.
It doesn't really sound like "p2p" - in most P2P protocols, the only requirement for central servers is discovery - you're using store & forward.
As a quick proof of concept, I'd set up an email server, and build an application that sends emails to addresses registered on that server, encrypted using PGP - the tooling and libraries are available, so you should be able to get that up and running in days, rather than weeks. In my experience, building a throw-away PoC for this kind of question is a great way of sifting out the nugget of my idea.
The second issue is that the nature of a social network is that it's a network. Your design may require you to store more than the data of the two direct contacts - you may also have to store their friends, or at least the public interactions those friends have had.
This may not be part of your plan, but if it is, you need to think it through early on - you may end up having to transmit the entire social graph to each participant for local storage, which creates a scalability problem....

The paper about Safebook might be interesting for you.
Also you could take a look at other distributed OSN and see what they are doing.

None of the federated networks mentioned on http://en.wikipedia.org/wiki/Distributed_social_network is actually distributed. What Stefan intends to do is indeed new and was only explored by some proprietary folks.

I've been thinking about the same concept for the last two years. I've finally decided to give it a try using Python.
I've spent the better part of last night and this morning writing a sockets communication script & server. I also plan to remove the central server from the equation as it's just plain cumbersome and there's no point to it when all the members could keep copies of their friend's keys.
Each profile could be accessed via a hashed string of someone's public key. My social network relies on nodes and pods. Pods are computers which have their ports open to the network. They help with relaying traffic as most firewalls block incoming socket requests. Nodes store information and share it with other nodes. Each node will get a directory of active pods which may be used to relay their traffic.

The PeerSoN project looks like something you might be interested in: http://www.peerson.net/index.shtml
They have done a lot of research and the papers are available on their site.

Some thoughts about it:
protocols to use: you could think exactly on P2P programs and their design
security concerns: privacy. Take a great care to not open doors: a whole system can get compromised 'cause you have opened some door.
services: you could integrate with the regular social networks through their APIs
People will have to install a program in their computers and remeber to open it everytime, like any P2P client. Leaving everything on a web-server has a smaller footprint / necessity of user action.
Somehow you'll need a centralized server to manage the searches. You can't just broadcast the internet to find friends. Or you'll have to rely uppon email requests to add somenone, and to do that you'll need to know the email in advance.
The fewer friends /contacts use your program, the fewer ones will want to use it, since it won't have contact information available.
I see that your server will be a store and forward, so the update problem is solved.

Related

what is the simplest protocol to securely tether a hardware device to a network?

After the Sony PSN debacle, I am trying to find examples of secure hardware tethering to a network. There are two use cases in particular:
1- computer downloads a piece of software that then uniquely and securely labels it to a cloud service
2- a hardware manufacturer uniquely labels a hardware device that then negotiates membership on the network.
Given the fact that the hardware device might have to change (revoke or service enhancements) it feels like #2 becomes #1.
The broad outline is this:
- connect to the service via HTTPS to protect against man in the middle
- device generates a GUID and presents it via HTTPS to service
- service records GUID against account
- on success, service 'enables' device
But how do you protect the GUID so that it cannot be stolen?

I just wanted to comment here:
Sony's PSN issues started with horrible practices with regards to their QA environment.
First, they defaulted to trusting anything that was sent to those servers using their developers toolkit. The reason they did this was that the dev kit used to cost upwards of $10k US and therefore they thought anyone who paid that amount would be on the up and up. However, when they radically lowered the price things changed externally and they didn't account for it.
The second issue with PSN was that the security between QA and live was, well, weak at best and easily circumvented. My understanding is that you could send commands to live using QA credentials. Because QA credentials were used, all chargeable actions were approved without money changing hands and the actions were applied to live accounts. When several people told Sony about this they did nothing.
A third issue was a reliance on hardware based encryption keys. Even hardware encryption keys installed on the devices can be figured out.
Point is, Sony dug their own grave on it so I wouldn't use anything they did as a template for how to do things. Heck, a lot of their websites were open to SQL injection which in today's day and age should get you fired.
Another example here is the iPhone. Each iPhone has a unique identifier that installed apps can grab and send back across the network; similar to a serial number. Some apps use this ID to try and tie a particular device to a person. However, it's trivial to create ID's and broadcast them, so this hasn't worked out so well for the partners. Also Apple does not expose a way to ensure a given ID (UUID) is valid to app producers.
A third example is mobile phone carriers. They use a particular ID baked into your SIM card to identify your account in order to know who to bill when a call is made. This ID is verified whenever the phone checks in with the network. However, we're dealing with radio signals and any device that can broadcast a correct ID can gain access. Point is, honest people think that only AT&T approved devices can get on an AT&T network. Reality is, anything can but they are going to bill the owner of the particular ID...
That said, any software you have running on a remote device that is not under your direct control is likely to be hacked. The popularity of the device will increase the likelihood of it happening sooner rather than later.
Where do we go from here?
On a basic level you associate an ID with an account in your service. PSN, Apple and others have done this. When an ID is broadcast, you need to verify that it exists AND that it's tied to an active account. If both pass then you have two options: either perform the action requested OR request additional verification.
For any actions that require money to be spent, do the additional verification (usually some form of username/password), capture the funds, then perform the action. Go one step further and every time a bad login is entered, send an email to the user on file. Further, automatically send a receipt. These are typically done so that your honest users can tell when something is going on.
Anything else just let through.
Bearing in mind, of course, that QA credentials should NOT work in your Live environment. Those systems should not be tied to each other under any condition and, quite frankly, should even live on separate hardware. In other words, QA and Live should NOT share a login database.
The thing here is that you shouldn't care about the device itself; just the account. You can't control the device as it's out of your hands; heck you can't even be sure it hasn't been physically tampered with. (XBox has been fighting this one with people adding resistors or burning out certain components to get past physical security features).
So, IMHO, do a bit to keep honest people honest but overall don't worry about it. Now, you should transfer everything via SSL or someother encrypted connection between the device and your cloud so that you don't leak ID's to anyone that wants to grab them. This will help protect those honest people.
Further, you shouldn't have a direct way to query whether an ID is valid or not from the outside. This will make it a bit more difficult for a hacker to find existing valid IDs and take over accounts. If you want to get fancy you could honey pot those and track the hackers down in order to sue them into oblivion, but that takes time and resources companies don't normally have. Also you could log all of the requests that contained bad IDs and use that to track hackers down.
Note that even after the device has been "enabled" I still suggest you have two levels of authentication. The first is for simple actions like downloading free content; the second kicks in anytime there is a fee associated. Again, we're trying to protect your honest subscribers.
For the dishonest ones you will have to apply some statistical analysis on the transactions coming across. Things like the transaction rate can help identify bots that are running and allow you to kill their IDs. There are others but they'll be unique to your application.
This was long winded. But my point is:
You can't secure the ID or anything else you pass out.
You can't ensure the requests are coming from your devices or your own approved devices.
You better take actions to keep QA and production separate for those building software for these devices using your services.
You better take actions to protect your normal honest users.
Trust NOTHING.
Due to the above you should evaluate your business model so that you don't care what device was used and instead focus on the individual accounts themselves; which you do have control over.

I am not sure I entirely understand the question, but I think you want some sort of device to hold on to a GUID assigned to it by a web service, and you don't want someone finding out what that GUID is, correct?
If so, there isn't a lot you can do. You have already mentioned one option... using HTTPS during the assigning of the ID. That is a good start, but remember that anyone who has physical access to the device can do a lot of things to look up this ID.
In short, it is impossible to completely hide. Someone can always reverse engineer it. There are folks out there reading data right out of memory with hardware.

Is basic HTTP auth in CouchDB safe enough for replication across EC2 regions?

I can appreciate that seeing "basic auth" and "safe enough" in the same sentence is a lot like reading "Is parachuting without a parachute still safe?", so I'll do my best to clarify what I am getting at.
From what I've seen online, people typically describe basic HTTP auth as being unsecured due to the credentials being passed in plain text from the client to the server; this leaves you open to having your credentials sniffed by a nefarious person or man-in-the-middle in a network configuration where your traffic may be passing through an untrusted point of access (e.g. an open AP at a coffee shop).
To keep the conversation between you and the server secure, the solution is to typically use an SSL-based connection, where your credentials might be sent in plain text, but the communication channel between you and the server is itself secured.
So, onto my question...
In the situation of replicating one CouchDB instance from an EC2 instance in one region (e.g. us-west) to another CouchDB instance in another region (e.g. singapore) the network traffic will be traveling across a path of what I would consider "trusted" backbone servers.
Given that (assuming I am not replicating highly sensitive data) would anyone/everyone consider basic HTTP auth for CouchDB replication sufficiently secure?
If not, please clarify what scenarios I am missing here that would make this setup unacceptable. I do understand for sensitive data this is not appropriate, I just want to better understand the ins and outs for non-sensitive data replicated over a relatively-trusted network.

Bob is right, it is better to err on the side of caution, but I disagree. Bob could be right in this case (see details below), but the problem with his general approach is that it ignores the cost of paranoia. It leaves "peace dividend" money on the table. I prefer Bruce Schneier's assessment that it is a trade-off.
Short answer
Start replicating now! Do not worry about HTTPS.
The greatest risk is not wire sniffing, but your own human error, followed by software bugs, which could destroy or corrupt your data. Make a replica!. If you will replicate regularly, plan to move to HTTPS or something equivalent (SSH tunnel, stunnel, VPN).
Rationale
Is HTTPS is easy with CouchDB 1.1? It is as easy as HTTPS can possibly be, or in other words, no, it is not easy.
You have to make an SSL key pair, purchase a certificate or run your own certificate authority—you're not foolish enough to self-sign, of course! The user's hashed password is plainly visible from your remote couch! To protect against cracking, will you implement bi-directional SSL authentication? Does CouchDB support that? Maybe you need a VPN instead? What about the security of your key files? Don't check them into Subversion! And don't bundle them into your EC2 AMI! That defeats the purpose. You have to keep them separate and safe. When you deploy or restore from backup, copy them manually. Also, password-protect them so if somebody gets the files, they can't steal (or worse, modify!) your data. When you start CouchDB or replicate, you must manually input the password before replication will work.
In a nutshell, every security decision has a cost.
A similar question is, "should I lock my house at night? It depends. Your profile says you are in Tuscon, so you know that some neighborhoods are safe, while others are not. Yes, it is always safer to always lock all of your doors all of the time. But what is the cost to your time and mental health? The analogy breaks down a bit because time invested in worst-case security preparedness is much greater than twisting a bolt lock.
Amazon EC2 is a moderately safe neighborhood. The major risks are opportunistic, broad-spectrum scans for common errors. Basically, organized crime is scanning for common SSH accounts and web apps like Wordpress, so they can a credit card or other database.
You are a small fish in a gigantic ocean. Nobody cares about you specifically. Unless you are specifically targeted by a government or organized crime, or somebody with resources and motivation (hey, it's CouchDB—that happens!), then it's inefficient to worry about the boogeyman. Your adversaries are casting broad nets to get the biggest catch. Nobody is trying to spear-fish you.
I look at it like high-school integral calculus: measuring the area under the curve. Time goes to the right (x-axis). Risky behavior goes up (y-axis). When you do something risky you saved time and effort, but the the graph spikes upward. When you do something the safe way, it costs time and effort, but the graph moves down. Your goal is to minimize the long-term area under the curve, but each decision is case-by-case. Every day, most Americans ride in automobiles: the single most risky behavior in American life. We intuitively understand the risk-benefit trade-off. Activity on the Internet is the same.

As you imply, basic authentication without transport layer security is 100% insecure. Anyone on EC2 that can sniff your packets can see your password. Assuming that no one can is a mistake.
In CouchDB 1.1, you can enable native SSL. In earlier version, use stunnel. Adding SSL/TLS protection is so simple that there's really no excuse not to.

I just found this statement from Amazon which may help anyone trying to understand the risk of packet sniffing on EC2.
Packet sniffing by other tenants: It is not possible for a virtual instance running in promiscuous mode to receive or "sniff" traffic that is intended for a different virtual instance. While customers can place their interfaces into promiscuous mode, the hypervisor will not deliver any traffic to them that is not addressed to them. This includes two virtual instances that are owned by the same customer, even if they are located on the same physical host. Attacks such as ARP cache poisoning do not work within EC2. While Amazon EC2 does provide ample protection against one customer inadvertently or maliciously attempting to view another's data, as a standard practice customers should encrypt sensitive traffic.
http://aws.amazon.com/articles/1697

Chat program without a central server

I'm developing a chat application (in VB.Net). It will be a "secure" chat program. All traffic will be encrypted (I also need to find the best approach for this, but that's not the question for now).
Currently the program works. I have a server application and a client application. However I want to setup the application so that it doesn't need a central server for it to work.
What approach can I take to decentralize the network?
I think I need to develop the clients in a way so that they do also act as a server.
How would the clients know what server it needs to connect with / what happens if a server is down? How would the clients / servers now what other nodes there are in the network without having a central server?
At best I don't want the clients to know what the IP addresses are of the different nodes, however I don't think this would be possible without having a central server.
As stated the application will be written in VB.Net, but I think the language doesn't really matter at this point.
Just want to know the different approaches I can follow.

Look for example at the paper of the Kademlia protocol (you can find it here). If you just want a quick overview, look at the Wikipedia page http://en.wikipedia.org/wiki/Kademlia. The Kademlia protocol defines a way of node lookups in a network in a decentral way. It has been successfully applied in the eMule software - so it is tested to really work.
It should cause no serious problems to apply it to your chat software.

You need some known IP address for clients to initially get into a network. Once a client is part of a network, things can be more decentralized, but that first step needs something.
There are basically only two options - either the user provides one (for an existing node of the network - essentially how BitTorrent trackers work), or you hard-code in a gateway node (which is effectively a central server).

Maybe you can see uChat program. It's a program from uTorrent creator with chat without server in mind.
The idea is connect to a swarm from a magnetlink and use it to send an receive messages. This is as Amber answer, you need an access point, may it be a server, a know swarm, manual ip, etc.
Here is uChat presentation: http://blog.bittorrent.com/2011/06/30/uchat-we-just-need-each-other/

Accessing data in internal production databases from a web server in DMZ

I'm working on an external web site (in DMZ) that needs to get data from our internal production database.
All of the designs that I have come up with are rejected because the network department will not allow a connection of any sort (WCF, Oracle, etc.) to come inside from the DMZ.
The suggestions that have come from the networking side generally fall under two categories -
1) Export the required data to a server in the DMZ and export modified/inserted records eventually somehow, or
2) Poll from inside, continually asking a service in the DMZ whether it has any requests that need serviced.
I'm averse to suggestion 1 because I don't like the idea of a database sitting in the DMZ. Option 2 seems like a ridiculous amount of extra complication for the nature of what's being done.
Are these the only legitimate solutions? Is there an obvious solution I'm missing? Is the "No connections in from DMZ" decree practical?
Edit: One line I'm constantly hearing is that "no large company allows a web site to connect inside to get live production data. That's why they send confirmation emails". Is that really how it works?

I'm sorry, but your networking department are on crack or something like that - they clearly do not understand what the purpose of a DMZ is. To summarize - there are three "areas" - the big, bad outside world, your pure and virginal inside world, and the well known, trusted, safe DMZ.
The rules are:
Connections from outside can only get to hosts in the DMZ, and on specific ports (80, 443, etc);
Connections from the outside to the inside are blocked absolutely;
Connections from the inside to either the DMZ or the outside are fine and dandy;
Only hosts in the DMZ may establish connections to the inside, and again, only on well known and permitted ports.
Point four is the one they haven't grasped - the "no connections from the DMZ" policy is misguided.
Ask them "How does our email system work then?" I assume you have a corporate mail server, maybe exchange, and individuals have clients that connect to it. Ask them to explain how your corporate email, with access to internet email, works and is compliant with their policy.
Sorry, it doesn't really give you an answer.

I am a security architect at a fortune 50 financial firm. We had these same conversations. I don't agree with your network group. I understand their angst, and I understand that they would like a better solution but most places don't opt with the better choices (due to ignorance on their part [ie the network guys not you]).
Two options if they are hard set on this:
You can use a SQL proxy solution like greensql (I don't work for them, just know of them) they are just greensql dot com.
The approach they refer to that most "Large orgs" use is a tiered web model. Where you have a front end web server (accessed by the public at large), a mid-tier (application or services layer where the actual processes occurs), and a database tier. The mid-tier is the only thing that can talk to the database tier. In my opinion this model is optimal for most large orgs. BUT that being said, most large orgs will run into either a vendor provided product that does not support a middle tier, they developed without a middle tier and the transition requires development resources they dont have to spare to develop the mid-tier web services, or plain outright there is no priorty at some companies to go that route.
Its a gray area, no solid right or wrong in that regard, so if they are speaking in finality terms then they are clearly wrong. I applaud their zeal, as a security professional I understand where they are coming from. BUT, we have to enable to business to function securely. Thats the challange and the gauntlet I always try and throw down to myself. how can I deliver what my customer (my developers, my admins, my dbas, business users) what they want (within reason, and if I tell someone no I always try to offer an alternative that meets most of their needs).
Honestly it should be an open conversation. Here's where I think you can get some room, ask them to threat model the risk they are looking to mitigate. Ask them to offer alternative solutions that enable your web apps to function. If they are saying they cant talk, then put the onus on them to provide a solution. If they can't then you default to it working. Site that you open connections from the dmz to the db ONLY for the approved ports. Let them know that DMZ is for offering external services. External services are not good without internal data for anything more than potentially file transfer solutions.
Just my two cents, hope this comment helps. And try to be easy on my security brethren. We have some less experienced misguided in our flock that cling to some old ways of doing things. As the world evolves the threat evolves and so does our approach to mitigation.

Why don't you replicate your database servers? You can ensure that the connection is from the internal servers to the external servers and not the other way.
One way is to use the ms sync framework - you can build a simple windows service that can synchronize changes from internal database to your external database (which can reside on a separate db server) and then use that in your public facing website. Advantage is, your sync logic can filter out sensitive data and keep only things that are really necessary. And since the entire control of data will be in your internal servers (PUSH data out instead of pull) I dont think IT will have an issue with that.
The connection formed is never in - it is out - which means no ports need to be opened.

I'm mostly with Ken Ray on this; however, there appears to be some missing information. Let's see if I get this right:
You have a web application.
Part of that web application needs to display data from a different production server (not the one that normally backs your site).
The data you want/need is handled by a completely different application internally.
This data is critical to the normal flow of your business and only a limited set needs to be available to the outside world.
If I'm on track, then I would have to say that I agree with your IT department and I wouldn't let you directly access that server either.
Just take option 1. Have the production server export the data you need to a commonly accessible drop location. Have the other db server (one in the DMZ) pick up the data and import it on a regular basis. Finally, have your web app ONLY talk to the db server in the dmz.
Given how a lot of people build sites these days I would also be loath to just open a sql port from the dmz to the web server in question. Quite frankly I could be convinced to open the connection if I was assured that 1. you only used stored procs to access the data you need; 2. the account information used to access the database was encrypted and completely restricted to only running those procs; 3. those procs had zero dynamic sql and were limited to selects; 4. your code was built right.
A regular IT person would probably not be qualified to answer all of those questions. And if this database was from a third party, I would bet you might loose support if you were to start accessing it from outside it's normal application.

Before talking about your particular problem I want to deal with the Update that you provided.
I haven't worked for a "large" company - though large is hard to judge without a context, but I have built my share of web applications for the non profit and university department that I used to work for. In both situations I have always connected to the production DB that is on the internal Network from the Web server on the DMZ. I am pretty sure many large companies do this too; think for example of how Sharepoint's architecture is setup - back-end indexing, database, etc. servers, which are connected to by front facing web servers located in the DMZ.
Also the practice of sending confirmation e-mails, which I believe you are referring to confirmations when you register for a site don't usually deal with security. They are more a method to verify that a user has entered a valid e-mail address.
Now with that out of the way, let us look at your problem. Unfortunately, other than the two solutions you presented, I can't think of any other way to do this. Though some things that you might want to think about:
Solutions 1:
Depending on the sensitivity of the data that you need to work with, extracting it onto a server on the DMZ - whether using a service or some sort of automatic synchronization software - goes against basic security common sense. What you have done is move the data from a server behind a firewall to one that is in front of it. They might as well just let you get to the internal db server from the DMZ.
Solution 2:
I am no networking expert, so please correct me if I am wrong, but a polling mechanism still requires some sort of communication back from the web server to inform the database server that it needs some data back, which means a port needs to be open, and again you might as well tell them to let you get to the internal database without the hassle, because you haven't really added any additional security with this method.
So, I hope that this helps in at least providing you with some arguments to allow you to access the data directly. To me it seems like there are many misconceptions in your network department over how a secure database backed web application architecture should look like.

Here's what you could do... it's a bit of a stretch, but it should work...
Write a service that sits on the server in the DMZ. It will listen on three ports, A, B, and C (pick whatever port numbers make sense). I'll call this the DMZ tunnel app.
Write another service that lives anywhere on the internal network. It will connect to the DMZ tunnel app on port B. Once this connection is established, the DMZ tunnel app no longer needs to listen on port B. This is the "control connection".
When something connects to port A of the DMZ tunnel app, it will send a request over the control connection for a new DB/whatever connection. The internal tunnel app will respond by connecting to the internal resource. Once this connection is established, it will connect back to the DMZ tunnel app on port C.
After possibly verifying some tokens (this part is up to you) the DMZ tunnel app will then forward data back and forth between the connections it received on port A and C. You will effectively have a transparent TCP proxy created from two services running in the DMZ and on the internal network.
And, for the best part, once this is done you can explain what you did to your IT department and watch their faces as they realize that you did not violate the letter of their security policy, but you are still being productive. I tell you, they will hate that.

If all development solutions cannot be applied because of system engineering restriction in DMZ then give them the ball.
Put your website in intranet, and tell them 'Now I need inbound HTTP:80 or HTTPS:443 connections to that applications. Set up what you want : reverse proxies, ISA Server, protocols break, SSL... I will adapt my application if necessary.'
About ISA, I guess they got one if you have exchange with external connections.
Lot of companies are choosing this solution when a resource need to be shared between intranet and public.
Setting up a specific and intranet network with high security rules is the best way to make the administration, integration and deployment easier. What is easier is well known, what is known is masterized : less security breach.
More and more system enginers (like mines) prefer to maintain an intranet network with small 'security breach' like HTTP than to open other protocols and ports.
By the way, if they knew WCF services, they would have accepted this solution. This is the most secure solution if well designed.
Personnaly, I use this two methods : TCP(HTTP or not) Services and ISA Server.

How safe is code hosted elsewhere

I was at a meeting recently for our startup. For half an hour, I was listening to one of the key people on the team talk about timelines, the market, confidentiality, being there first and so on. But I couldn't help ask myself the question: all that talk about confidentiality is nice, but there isn't much talk about physical security. This thing we're working on is web-hosted. What if after uploading it to the webhost, someone walks into the server room (don't even know where that is) and grabs a copy of the code and the database. The database is encrypted, but with access to the machine, you'd have the key.
What do the big boys do to guard the code from being stolen off? Is it common for startups to host it themselves in some private data center or what? Does anyone have facts about what known startups have done, like digg, etc.? Anyone has firsthand experience on this issue?

Very few people are interested in seeing your source code. The sysadmins working at your host are most likely in this group. It's probably not the case that they can copy your code, paste it on another host and be up and running, stealing your customers in 42 minutes.
People might be interested in seeing the contents of your DB if you're storing things like user contact information (or even more extreme, financial information). How do you protect against this? Do the easy, host independent things (like storing passwords as hashes, offloading financial data to financial service providers, HTTPS/SSL, etc.) and make sure you use a host with a good reputation. Places like Amazon (with AWS) and RackSpace would fail quickly if it got out that they regularly let employees walk off with customer (your) data.
How do the big boys do it? They have their own infrastructure (places like Google, Yahoo, etc.) or they use one of the major players (Amazon AWS, Rackspace, etc.).
How do other startups do it? I remember hearing that Stack Overflow hosts their own infrastructure (details, anyone?). This old piece on Digg indicates that they run themselves too. These two instances do not mean that all (or even most) startups have an internal infrastructure.

Most big players in the hosting biz have a solid security policy on their servers. Some very advanced technology goes into securing most high end data centers.
Check out the security at the host that I use
http://www.liquidweb.com/datacenter/

What if after uploading it to the webhost, someone walks into the server room (don't even know where that is) and grabs a copy of the code and the database. The database is encrypted, but with access to the machine, you'd have the key.
Then you're screwed :-) Even colo or rented servers should be under an authorized-access only policy, that is physically enforced at the site. Of course that doesn't prevent anyone from obtaining the "super secret" code otherwise. For that, hire expensive lawyers and get insurance.

By sharing user accounts on the same system you have more to worry about. It can be done without ever having a problem, but you are less secure than if you controlled the entire system.
Make sure you code is chmod 500, or even chmod 700, as long as the last 2 are zeros then your better off. If you do a chmod 777, then everyone on the system will be able to access your files.
However there are still problems. A vulnerability in the Linux kernel would give the attacker access to all accounts. A vulnerability in MySQL would give the attacker access to all databases. By having your own system, then you don't have to worry about these attacks.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string