I have read many articles about Configuration Management, but I don't really understand what the configuration is actually applied to.
Is it the software itself? For example, changing hosts in a conf file, etc.?
Or the host the app runs on? In that case, what is the point of using this kind of software, given that we generally use Docker containers that are "ready to use"?
You spent hours setting up that server, configuring every variable, installing every package, updating config files. You love that server so much that you named it 'Lucy'.
Tomorrow you get run over by a bus. Will your coworkers know every single tiny change you made to that server? Unlikely. They will have to spend hours digging into that server trying to figure out what you've done and why you've done it.
Now let's multiply this by hundreds or even thousands of servers. Doing this manually is infeasible.
That's where config management systems come in.
They give you documentation of your systems' configuration by their very nature. Playbooks/manifests/recipes/whatever term they use become the authoritative description of your servers. Unlike a readme.txt, which might not always match the real world, these systems ensure that what you see there is what you actually have on your servers.
It also becomes relatively simple to duplicate this server configuration over and over, to potentially limitless scale (Google, Facebook, Microsoft and every other large company work that way).
You might think of a "golden image" approach, where you configure everything, take a snapshot, and keep replicating it over and over. The problem is that it's difficult to compare the difference between two such images; you just have binary blobs. Whereas with most config management systems you can use a traditional VCS and easily diff the various versions.
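To make the declarative idea concrete, here is a minimal, tool-agnostic sketch in Python (the package names and file contents are made up); real systems express the same thing in their own playbook/manifest DSLs, and it's that text file you keep in version control and diff:

import subprocess
from pathlib import Path

# Desired state, declared up front (hypothetical examples).
DESIRED_PACKAGES = ["nginx", "postgresql"]
DESIRED_FILES = {Path("/etc/motd"): "This host is managed by config management.\n"}

def converge():
    # Only act when the current state differs from the desired state,
    # so running this repeatedly is safe (idempotent).
    for pkg in DESIRED_PACKAGES:
        if subprocess.run(["dpkg", "-s", pkg], capture_output=True).returncode != 0:
            subprocess.run(["apt-get", "install", "-y", pkg], check=True)
    for path, content in DESIRED_FILES.items():
        if not path.exists() or path.read_text() != content:
            path.write_text(content)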
The same principle applies to containers.
Don't treat your servers as pets, treat them as cattle.
I like how Facebook releases features incrementally and not all at once to their entire user base. I get that this can be replicated with a bunch of if statements scattered throughout your code, but there has to be a better way to do this. Perhaps that really is all they are doing, but that seems rather inelegant. Does anyone know if there is an industry-standard architecture that can incrementally release features to portions of a user base?
On that same note, I have a feeling that all of their employees see an entirely different, beta view of the site. So it seems that they are able to mark certain portions of their website as beta and others as production, and have some sort of access control list to guide what people see? That seems like it would be slow.
Thanks!
Facebook has a lot of servers, so they can apply new features on only some of them. They also have servers where they test new features before committing them to production.
A more elegant solution is if statements combined with feature flags, using a system like gargoyle (in Python).
Using a system like this you could do something like:
if feature_flag.is_active(MY_FEATURE_NAME, request, user, other_key_objects):
    # do some stuff
In a web interface you would be able to describe users, requests, or any other key object your system has, and deliver your feature to just them. In fact, via requests you could do things like direct X% of traffic to the new feature, and thus run A/B tests and gather analytics.
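For illustration, here is a minimal, library-free sketch of percentage-based rollout (the feature and user names are made up): hashing a stable key keeps each user's assignment consistent while you ramp the percentage up.

import hashlib

def in_rollout(user_id, feature, percentage):
    # Deterministically bucket a user into [0, 1) based on a hash,
    # so the same user always gets the same decision for this feature.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < percentage / 100.0

# Example: show the hypothetical new checkout flow to roughly 10% of users.
if in_rollout("user-42", "new_checkout", 10):
    pass  # render the new feature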
An approach to this is to have a tiered architecture where the authentication tier hands off to the product tier.
A user enters the product URL and that is resolved to direct them to a cluster of authentication servers. These servers handle authentication and then hand off the session to a cluster of product servers.
Using this approach you can:
Separate out your product servers into 'zones' that run different versions of your application
Run logic on your authentication servers that decides which zone to route the session to
As an example, you could have Zone A running the latest production code and Zone B running beta code. At the point of login the authentication server sends every user with a user name starting with a-m to Zone A and n-z to Zone B. That way roughly half the users are running on the beta product.
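A rough sketch of that routing rule on the authentication tier (the zone URLs are assumptions):

# Hypothetical zone endpoints; Zone A runs production, Zone B runs beta.
ZONE_A = "https://zone-a.example.com"
ZONE_B = "https://zone-b.example.com"

def zone_for(username):
    # Usernames starting a-m go to production, n-z go to beta,
    # splitting the user base roughly in half.
    return ZONE_A if username[:1].lower() <= "m" else ZONE_B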
Depending on the information you have available at the point of login you could even do something more sophisticated than this. For example you could target a particular user demographic (e.g. age ranges, gender, location, etc).
Background: I am working on a proposal for a PHP/web-based P2P replication layer for PDO databases. My vision is that someone with a need to crowd-source data sets up this software on a web server, hooks it up to their preferred db platform, and then writes a web app around it to add/edit/delete data locally. Other parties, if they wish, may set up a similar thing - with their own web apps written around it - and set up data-sharing agreements with one or more peers. In the general case, changes made to one database are written to another on a versioned basis, such that they eventually flow around the whole network.
Someone has asked me why I'm not using CouchDB, since it offers bi-directional replication and record versioning as standard. I wasn't aware of these capabilities, so this turns out to be an excellent question! It occurs to me that, if this facility is already available, are there any existing examples of server-to-server replication between separate groups? I've done a great deal of hunting and not found anything.
(I suppose what I am looking for is examples of "group-sourcing": give groups a means to access a shared dataset locally, plus the benefits of critical mass they would be unable to build individually, whilst avoiding the political ownership/control problems associated with the traditional centralised model.)
You might want to check out http://refuge.io/
It is built around CouchDB, but aimed more specifically at forming peer groups.
Also, here is a Couchbase-sponsored case study of replication between various groups:
http://site.couchio.couchone.com/case-study-assay-depot
This can be achieved on standard CouchDB installs.
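As a starting point, here is a minimal sketch of kicking off continuous, two-way replication between two standard CouchDB installs via the built-in _replicate endpoint (the host names and database name are assumptions, and authentication is omitted):

import requests

PEER_A = "http://couch-a.example.org:5984"
PEER_B = "http://couch-b.example.org:5984"
DB = "shared_dataset"

def replicate(source, target):
    # Ask the source server to continuously push changes to the target.
    resp = requests.post(
        f"{source}/_replicate",
        json={"source": f"{source}/{DB}", "target": f"{target}/{DB}", "continuous": True},
    )
    resp.raise_for_status()

# Bi-directional replication is just two one-way replications.
replicate(PEER_A, PEER_B)
replicate(PEER_B, PEER_A)

Newer CouchDB releases also offer the _replicator database, which persists replication jobs across server restarts.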
Hope that gives you a start.
My goal is to create a chatting website. Not so much for the sake of the website, but for the experience so I know how; just something to work towards gradually. I tried long polling, but that always ends up pissing off the webhosts whose servers I'm using. I was told to use nodejs instead. I have some idea of what it is, but no idea how to use it.
I'm guessing that the reason I can't find the answer to this question anywhere is because of how obvious it is... to everyone else.
I've been looking around and all I see are tutorials on installing it on your server when you own the server. I know you can install forums on webhost's servers, so can you also install nodejs?
Yes. You can check the full listing at https://github.com/joyent/node/wiki/Node-Hosting, though it doesn't categorize the sites by free hosting.
Some that I know of (I personally use Heroku):
Heroku
Nodester
Most standard LAMP hosting companies don't let you run node.js.
I currently recommend you use the Cloud9 IDE to get up and running with not only your tests and development, but also potential deployment. Cloud9 allows you to run your app from their IDE and will provide you with a URL to see your app running and get familiar with Node.js development.
A more manual way is to find a Node.js PaaS (Platform as a Service) such as Joyent or Nodester.
Another one is OpenShift. I use them a lot, and they allow you to use your own domain on the free plan. I use Heroku as well and have tried AppFog and Modulus.
But what it comes down to is whether I can use my own domain and how much they throttle my traffic. AppFog and Modulus don't allow custom domains on their free plans and seriously throttle traffic; they will cut your website off if you have one visitor an hour.
Another issue I was concerned about was uploading files. In particular, on my website content is added via Markdown files. Most Node web hosts use a variation on git deploys to update websites, with content supplied by databases. However, if you are trying to run a website without a database, using flat files, then each update must be done by a git deploy. This takes the whole website down and recreates a new website altogether (it just happens to look like the previous one). This will normally take a few minutes. Probably not a problem for a low-volume website, but imagine you are making a blog entry, you deploy it, and then notice you've made a spelling mistake: you need to do a deploy all over again.
So, one of the things that attracted me to OpenShift was that they have a reserved area for flat files within your project. You can upload your files there, and when your project is restarted these files will be preserved.
AppFog provides a free plan where you can host Node.js and many other technologies.
However, free plans don't allow a custom domain name anymore.
There is also the Node.js Smart Machine service from Joyent.
I have been working on designing a file server that could take load off the primary website and serve images/files over the web to clients.
Primary goals of the file server:
- Take load off the primary server hosting the site
- Reuse the existing web server code base and avoid duplication of code/logic for better maintainability
- Being scalable for increasing downloads
- Hide real download url path from user
Keeping the above in mind, I came up with two approaches. Sequence diagram representations of the two approaches are included for ease of understanding [apologies for the skewed use of sequence diagrams]. Neither approach satisfies all of my goals.
Which of these approaches would you recommend considering my goals?
Is there a better third approach?
Some of the differences I could think of:
- Approach #1 would result in duplicating BL code, causing maintainability issues
- Approach #2 would reuse code and centralize the BL, reducing maintainability issues
- Approach #1 would reduce network calls while #2 increases them
The concept of file servers, scalability of downloads, bandwidth distribution have all been there for a while now. Please share your thoughts!
UPDATED:
Approach #1 looks very attractive as it takes the load off the primary server completely. The only issue to address in #1 is the code duplication and the maintainability issues it brings. This could be overcome by having just one project for the BL/DAC, comprising the functionality required by both the web service and the file server, and referencing that assembly/library from both the web service and file server projects. Now there is only one BL/DAC codebase to maintain, and it also avoids the network calls of approach #2.
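A rough sketch of that shared-library idea (module, function, and path names are hypothetical), shown in Python for brevity even though the original stack is .NET: both servers import the same module, so the rules exist in exactly one place.

# shared_bl.py - the single BL/DAC module referenced by both the
# web service project and the file server project.
HIDDEN_ROOT = "/srv/files"  # real storage path, never exposed to clients

def authorize(user_id, file_id):
    # Access rules live here once, for both servers.
    return bool(user_id) and not file_id.startswith(".")

def storage_path(file_id):
    # Map the public file id to the hidden on-disk path.
    return f"{HIDDEN_ROOT}/{file_id}"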
By serving images/files to the client, I assume you mean static files: CSS, JS, etc.
Most of the time, a simple solution is the best solution. Just host them on a different server under a different subdomain, e.g. http://content.mydomain.com/img/xyz.jpg. You could host them at a data centre on a dedicated server, giving you performance (close to the backbone), and you could load balance the URL by having 2+ servers at 2+ different data centres, giving you resilience and scalability.
Your maintenance task is then having to do a find-and-replace when promoting your site to live, to replace dev/UAT content paths with the live content path (though you'd only need to do this in CSS files, as you could store the paths for content used within ASPX files as config data).
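As an illustration of keeping that content path as config data (shown as a Python sketch, though the same idea applies in ASP.NET's web.config; the variable and URL are made up):

import os

# The content host differs per environment (dev/UAT/live) and is read
# from configuration, so no find-and-replace is needed at promotion time.
CONTENT_BASE = os.environ.get("CONTENT_BASE_URL", "http://content.mydomain.com")

def asset_url(path):
    return f"{CONTENT_BASE}/{path.lstrip('/')}"

# asset_url("img/xyz.jpg") -> "http://content.mydomain.com/img/xyz.jpg"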
I have a lot of spare Intel Linux servers lying around (hundreds) and want to use them for a distributed file system in a web hosting and file sharing environment. This isn't for an HPC application, so high performance isn't critical. The main requirement is high availability; if one server goes offline, the data stored on its hard drives must still be available from other nodes. It must run over TCP/IP and provide standard POSIX file permissions.
I've looked at the following:
Lustre (http://wiki.lustre.org/index.php?title=Main_Page): Comes really close, but it doesn't provide redundancy for data on a node. You must make the data HA using RAID or DRBD. Supported by Sun and Open Source, so it should be around for a while
gfarm (http://datafarm.apgrid.org/): Looks like it provides the redundancy but at the cost of complexity and maintainability. Not as well supported as Lustre.
Does anyone have any experience with these or any other systems that might work?
Check also GlusterFS.
Edit (Aug-2012): Ceph is finally getting ready. Recently the authors formed Inktank, an independent company to sell commercial support for it. According to some presentations, the mountable POSIX-compliant filesystem is the uppermost layer and not really tested yet, but the lower layers have been used in production for some time now.
The interesting part is the RADOS layer, which presents object-based storage with both 'native' access via the librados library (available for several languages) and an Amazon S3-compatible REST API. Either one makes it more than adequate for adding massive storage to a web service.
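For a sense of what the librados route looks like, here is a minimal sketch using the Python binding (the pool name and config path are assumptions about your deployment):

import rados  # python-rados, the Python binding for librados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("web-assets")  # hypothetical pool name
try:
    # Store and fetch an object directly in RADOS, no filesystem layer involved.
    ioctx.write_full("logo.png", b"...binary data...")
    data = ioctx.read("logo.png")
finally:
    ioctx.close()
    cluster.shutdown()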
This video is a good description of the philosophy, architecture, capabilities and current status.
In my opinion, the best file system for Linux is MooseFS. It's quite new, but I had an opportunity to compare it with Ceph and Lustre, and I can say for sure that MooseFS is the best one.
Gluster is getting quite a lot of press at the moment:
http://www.gluster.org/
Lustre has been working for us. It's not perfect, but it's the only thing we have tried that has not broken down under load. We still get LBUGs from time to time, and dealing with 100 TB+ file systems is never easy, but the Lustre system has worked and increased both performance and availability.
Unless someone forces you to use it, I would highly recommend using anything other than Lustre. From what I hear from others, and from what gave me nightmares for quite some time, Lustre quite easily breaks down in all kinds of situations. And if only a single client in the system breaks down, it typically puts itself into an endless do_nothing_loop mode while holding some important global lock, so the next time another client tries to access the same information, it will also hang. Thus, you often end up rebooting the whole cluster, which I guess is something you would normally try to avoid ;)
Modern parallel file systems like FhGFS (http://www.fhgfs.com) are way more robust here and also allow you to do nice things like running server and client components on the same machines (built-in HA features are still under development, as someone from their team told me, but their implementation is going to be pretty awesome from what I've heard).
Ceph looks to be a promising new-ish entry into the arena. The site claims it's not ready for production use yet though.
I read a lot about distributed filesystems and I think FhGFS is the best.
http://www.fhgfs.com/
It's worth a try. See more about it at:
http://www.fhgfs.com/wiki/