(*nix) Cloud/Cluster solutions for building fast & scalable web-services [closed] - linux

I'm going to build a high-performance web service. It should use a database (or any other storage system), some processing language (scripting or otherwise), and a web-server daemon. The system should be distributed across a large number of servers so the service runs fast and reliably.
It should replicate data to achieve reliability, and at the same time it must provide distributed computing features in order to process large amounts of data (primarily, queries on large databases that can't be executed on a single server with an acceptable level of responsiveness). Caching techniques are out of scope here.
Which cluster/cloud solutions should I take into consideration?
There are plenty of Single-System-Image (SSI) solutions, clustered file systems (which can be part of the design), projects like Hadoop, BigTable clones, and many others. Each has its pros and cons, and the "about" page always says the solution is great :) If you've tried to deploy something that addresses the subject - share your experience!
Update: it's not file hosting and not a game, but something rather interactive. You can take Stack Overflow as an example of such a web service: small pieces of data, semi-static content, intensive database operations.
Cross-Post on ServerFault

You really need a better definition of "big". Is "big" an aspiration, or do you have hard numbers that your marketing department reckons they'll have on board?
If you can do it using simple components, do so. The likes of Cassandra and Hadoop are neither easy to set up (especially the latter) nor easy to develop for; developers who can build such an application effectively will be very expensive and difficult to hire.
So I'd say, start off using your favourite "traditional" database with an appropriate high-availability solution, then wait until you get close to the limit (you can always measure where the limit is on your real application, once it's built and you have a performance test system).
Remember that Stack Overflow uses pretty conventional components, simply well tuned, with a small amount of commodity hardware. This is fine for its scale but would never work for, say, Facebook; the developers knew that SO's audience was never going to reach Facebook levels.
EDIT:
When "traditional" techniques start failing, e.g. you reach the limit of what can be done on a single database instance, then you can consider sharding or doing functional partitioning into more instances (again with your choice of HA system).
The only time you're going to need one of these (e.g. Cassandra) "nosql" systems is if you have a homogeneous data store with very high write requirement and availability requirement; even then you could probably still solve it by sharding conventional systems - as others (even Facebook) have done at times.
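To make the sharding idea concrete, here is a minimal sketch (in Python, purely for illustration) of hash-based routing across conventional database instances. The DSNs, shard count, and routing key are hypothetical; a real deployment also needs a plan for resharding and for queries that span shards.

```python
# Minimal sketch of hash-based sharding across conventional database
# instances. DSNs and shard count are hypothetical placeholders.
import hashlib

SHARD_DSNS = [
    "postgresql://db-shard-0/app",
    "postgresql://db-shard-1/app",
    "postgresql://db-shard-2/app",
]

def shard_for(user_id: str) -> str:
    """Route a record to a shard by hashing a stable key."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return SHARD_DSNS[int(digest, 16) % len(SHARD_DSNS)]

if __name__ == "__main__":
    for uid in ("alice", "bob", "carol"):
        print(uid, "->", shard_for(uid))
```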

It's hard to make specific recommendations since you've been a bit vague, but I would recommend Google App Engine for basically any web service. It's easy to use and is built on Google's architecture, so it is fast and reliable.

I'd like to recommend Stratoscale Symphony. It's a private cloud service that does it all; everything you just mentioned, this service provides. Their Symphony products deliver the public cloud experience in your enterprise data center. If that's what you're looking for, I suggest you give it a shot.

Related

If you had to redo a site that has 150 tables and 250,000 visitors/day in any web platform, what would it be? [closed]

If you had to redo a site that has around 150 tables and 250,000+ visitors/day in any web platform, what would your choice be and why?
Some points
The team has experienced developers
The old application is written in unrefactored PHP. It's unusable.
Much of the database is not normalized, and there are columns in the wrong spots. Many new features are planned, and the database can't support them as it is.
Desired goals:
Excellent and fast testing (Grails is bad for this)
Good separation of concerns (domain, controllers, views) with the ability to avoid duplicating anything
Concise code & Elegant design - no code bloat
Flexible - we don't want to run into a leaky abstraction problem
Coding and testing are fast - it shouldn't take an hour to write a controller test, and we shouldn't have to spend more than a minute or so writing a reusable tag, for example.
Scala is on our minds, but we are having a hard time seeing how that can work as the tooling is not mature yet. We actually don't like Grails. A lot of us are used to Java/Spring/Hibernate, but are sick of the low-level nature of it and want something more expressive.
I would put together a detailed study group to analyze our choices, see what we can use, and how it scales up to the load and tasks it must stand up to. After that, pick out the top five choices for closer inspection and see what resonates with the team. Personally, I've come to like RoR over PHP.
Depending on the status of the old/current project, make sure everything is backed up and version controlled before it gets touched. Some people leave half their project out of version control, or use none at all!
This is a pretty unanswerable question, because there are a large number of factors that need to be taken into consideration which you haven't mentioned. For example, what are the skill sets of the developers who will be rewriting it? How is it currently implemented? Can existing code be reused? What are the performance requirements?
If it was my decision, I would choose Groovy/Grails because:
I like Groovy/Grails and know these technologies well
Offers good performance as it's built mostly on Java and mature Java libraries like Spring and Hibernate
Update
Excellent and fast testing (Grails is bad for this)
I am not aware of any web framework that puts more effort into testability than Grails. It makes testing all types of artifacts (controllers, domains, services, tag libraries) very straightforward.
We actually don't like Grails
If you already know Java, Spring, and Hibernate, I find it very hard to understand why you don't like Grails.
Good old PHP / MySQL / Apache on a Linux environment is the most stable stack I've seen. I've been working for three years on an ASP.NET / SQL Server / IIS / Windows stack, and only SQL Server is stable, but it's really expensive; so if you don't really know where you're going budget-wise, you'd better take that parameter into account.
And in an open environment you'll find more help, I think.

What to learn first, Front-end or Back-end development? [closed]

To most of you web development gurus my question will sound stupid, but as a newbie I would like to ask: is it OK to have the frontend developed first and only afterwards the backend?
Also, if I need a database, should I design it first?
I would also like to know about the analysis part of the project. A friend briefly told me that requirements analysis (internal, technical, and design) is a must before starting a project. Let's say I want to build a social e-commerce site with the ability for users to register. Can you give a numbered list of what you would do to prepare the analysis for such a project (e.g. 1. Database design, a) prepare data models, ...)?
I would be very happy if somebody could provide a thorough answer.
Thank you.
Regards,
Donny
I usually decide what fields I need in the front end first.
Then I start working on the database backend, then the middle tiers with unit tests, then finally the front end.
Of course, once I start work on the front end, I think of more fields or changes for the database... such is the nature of development.
I think this question is really a variation on the question of whether bottom-up or top-down design is better.
I find that it helps to do rough drafts of the front end to simulate typical usage of the site. This helps one see required backend features one would otherwise have missed (in terms of the data that will be needed).
Especially when new people are working on a project, I'd suggest an incremental approach.
Pick some functionality you know you're going to need. Start with the database (SQL), then the backend code (PHP, maybe), then the web frontend (HTML). Make it as simple as possible to accomplish that one block of functionality. The order of things doesn't matter as much as just taking a small chunk at a time to work on.
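As a rough illustration of one such small, end-to-end slice (sketched in Python with SQLite purely for brevity, rather than the SQL/PHP/HTML mentioned above), it can be as little as one table, a pair of data-access functions, and a trivial rendering of the result:

```python
# Rough illustration of one tiny end-to-end slice: a table, two
# data-access functions, and a minimal HTML rendering of the result.
# SQLite and Python are used here purely for brevity.
import html
import sqlite3

def init_db(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def add_user(conn, name):
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()

def list_users(conn):
    return [row[0] for row in conn.execute("SELECT name FROM users ORDER BY name")]

def render_page(names):
    items = "".join(f"<li>{html.escape(name)}</li>" for name in names)
    return f"<html><body><ul>{items}</ul></body></html>"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    add_user(conn, "Donny")
    print(render_page(list_users(conn)))
```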
Once that small part works, save a copy. Version control, even. That way, you can always return to something that worked if you screw something up tomorrow.
Then pick the next small function and add it in. I always find this very motivating; you get to see consistent improvement.
I'd probably plan ahead on the database level, because while any change to the HTML only really affects the HTML... database changes often require backend code changes which often require HTML changes, and having to redo everything is painful.
You should architect the tiers that you expect to exist in the whole system. Each tier can be architected and implemented in parallel by different people; however, integration points will require collaboration to decide on the contract.
There are two general interface/contract patterns:
1) Consumer/Application Needs -> the interface/contract is dictated by the application, and the next tier is written to conform/adapt to those needs. All of the tiers are then essentially driven by their downstream consumers. The pro is that you will likely have the most efficient and limited set of methods required; the downside is that there is more work to adapt the system to other consumers.
OR
2) Service Provider -> the interface/contract is dictated by a service which is designed to support a core set of common functionality that may serve many apps, even some not yet known. The application that consumes the provider must then adapt the contract's capabilities to its internal needs. The pro is that the service is more reusable without modification; however, those generalized methods will likely be a less efficient fit for any particular app's needs.
Neither of these is the perfect answer; it depends on the situation. The decision between 1 and 2 above may also differ depending on which tier connection you are working on. You could have a service with a service contract (#2), an app with its own needs contract (#1), and then an adapter tier that maps the app's needs to the service's functionality.
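As a minimal sketch of that last arrangement (all names invented for illustration): a generalized provider contract, a narrow consumer-driven contract, and an adapter tier translating between them.

```python
# Sketch of pattern 2 wrapped for pattern 1: a general-purpose service
# contract, an application-specific needs contract, and an adapter tier
# mapping the app's needs onto the service. All names are invented.
from typing import Dict, List, Protocol

class UserService(Protocol):           # generalized provider contract (#2)
    def find_users(self, **filters) -> List[Dict]: ...

class SignupScreenNeeds(Protocol):     # consumer-driven contract (#1)
    def recently_registered(self, limit: int) -> List[str]: ...

class InMemoryUserService:
    """Stand-in provider; a real one might wrap a database or remote API."""
    def __init__(self, users: List[Dict]):
        self._users = users

    def find_users(self, **filters) -> List[Dict]:
        return [u for u in self._users
                if all(u.get(k) == v for k, v in filters.items())]

class SignupScreenAdapter:
    """Adapter tier: translates the app's narrow needs into service calls."""
    def __init__(self, service: UserService):
        self._service = service

    def recently_registered(self, limit: int) -> List[str]:
        users = self._service.find_users(status="new")
        return [u["name"] for u in users][:limit]

if __name__ == "__main__":
    service = InMemoryUserService([{"name": "Donny", "status": "new"}])
    screen = SignupScreenAdapter(service)
    print(screen.recently_registered(limit=5))
```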
Regardless of which pattern you use, the architecture of your tiers, their contracts, and how they interact with each other is more important than when you start working on any particular tier.
In general, once the tier design is in place, you work on the tiers where the contracts are defined and follow up on the tiers where contracts are consumed.
The question is highly subjective. In the practical reality in which we live, one is limited by the customer's ability to communicate their requirements in such a way that they can be translated into code (and, of course, by ever-expanding requirements). Medium-to-large companies have business analysts to perform most of these duties. As far as which tier to start designing first: a DB guy will say the database, a web guy will say the frontend, and so on - to each according to their abilities.
There's no silver bullet. I recommend you read up on a few major paradigms like Agile and waterfall.

When to start performance tuning a website [closed]

I have an ASP.NET MVC website and the volume of traffic is increasing. The site points to a backend SQL Server 2008 database.
At what point do I need to figure out what the bottleneck of the system is, and review whether I need to load-balance across machines or change the way I am doing database connection management?
Are there specific tools and thresholds that indicate the current model isn't scalable or is hitting a breaking point (besides just observing a slow site)?
When you start noticing performance issues.
There are some very easy things you can do to increase performance with so little work, it's easier to just do them than to check whether you need them yet ;)
First and foremost is putting all static images and other media on a separate server. That eliminates a whole lot of queries on the boxen running the dynamic parts of the web server.
Next in line is making sure you are using as many hard drive spindles as possible. Of course you want your database on a separate machine, let alone a separate hard drive, but you also want your web server logs written to a separate hard drive. That prevents a lot of jumping around of the hard drive heads.
As far as "how do you know when you need to performance tune", I will give a different answer than George Stocker: When there is a cost associated with your performance that outstrips the cost of looking into it. I say it this way because your customers may be a little unhappy if your website is a little sluggish, but if it doesn't prevent anyone from using it, or recommending it to others, then it may not be worth looking into. People put up with sub-optimal performance all the time.
There are a plethora of tools available to address the plethora of possible bottlenecks. A decent performance tuning strategy starts with measurement and consistent instrumentation of the given system.
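As one concrete (and hedged) illustration of what consistent instrumentation can look like, here is a small Python sketch of a timing decorator that logs slow calls; the threshold and logger name are arbitrary choices, and in an ASP.NET MVC application the equivalent would live in your logging/profiling layer rather than in Python.

```python
# Sketch of lightweight instrumentation: time each call and log slow ones.
# The threshold and logger name are arbitrary illustration choices.
import functools
import logging
import time

logger = logging.getLogger("perf")

def timed(threshold_ms=50.0):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                if elapsed_ms > threshold_ms:
                    logger.warning("%s took %.1f ms", func.__name__, elapsed_ms)
        return wrapper
    return decorator

@timed(threshold_ms=50.0)
def load_dashboard():
    time.sleep(0.08)  # stand-in for a slow database query

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    load_dashboard()
```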
But performance tuning requires precious time and resources, and should only be pursued when it gives you the most bang for the buck, i.e. it provides the greatest improvement to achieving your website's objectives given the work required. If your website supports (or is) a business or organization, you must continuously evaluate the business landscape and plan the next allocation of resources. This is entirely dependent on the particular industry.
An engineer might focus on continual refinement of an existing system, but the project commissioners (be they an external client, or your company's management) must weigh the costs and benefits of all types of development, from improving an existing featureset, to adding new features, to addressing technical limitations affecting product usability (including performance issues). That's not to say engineers have no say in resource allocation, but their perspective is just one of many contributing to success.
When you have doubts that the website would survive a doubling in max usage. One common line of thought where I am from is that you should have the performance capacity to support at least 2x the number of users you expect.
Determining whether or not you can support 2x is something better left to load testing, though, rather than speculation. One note on your other comment: chances are a website performance problem is going to affect everyone using the web site, including you on a local machine... unless it's a bandwidth problem and you're connected to a local network. Barring cable cuts, it's not going to be 'just the people in Asia'.
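For a rough idea of what such a check might look like, here is a crude Python load-test sketch; the URL, worker count, and request count are placeholders, and dedicated tools such as ab or JMeter are usually a better fit than a hand-rolled script.

```python
# Crude load-test sketch: fire concurrent requests and report latency
# percentiles. URL, worker count, and request count are placeholders;
# dedicated tools (ab, JMeter, etc.) are usually a better fit.
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/"   # placeholder target
WORKERS = 20
REQUESTS = 200

def one_request(_):
    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=10) as resp:
        resp.read()
    return time.perf_counter() - start

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        latencies = sorted(pool.map(one_request, range(REQUESTS)))
    print(f"median: {statistics.median(latencies) * 1000:.1f} ms")
    print(f"p95:    {latencies[int(len(latencies) * 0.95)] * 1000:.1f} ms")
```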

Best distributed filesystem for commodity linux storage farm [closed]

I have a lot of spare Intel Linux servers lying around (hundreds) and want to use them for a distributed file system in a web hosting and file sharing environment. This isn't for an HPC application, so high performance isn't critical. The main requirement is high availability: if one server goes offline, the data stored on its hard drives is still available from other nodes. It must run over TCP/IP and provide standard POSIX file permissions.
I've looked at the following:
Lustre (http://wiki.lustre.org/index.php?title=Main_Page): Comes really close, but it doesn't provide redundancy for data on a node. You must make the data HA using RAID or DRBD. Supported by Sun and Open Source, so it should be around for a while
gfarm (http://datafarm.apgrid.org/): Looks like it provides the redundancy but at the cost of complexity and maintainability. Not as well supported as Lustre.
Does anyone have any experience with these or any other systems that might work?
Check out GlusterFS as well.
Edit (Aug 2012): Ceph is finally getting ready. Recently the authors formed Inktank, an independent company to sell commercial support for it. According to some presentations, the mountable POSIX-compliant filesystem is the uppermost layer and not really tested yet, but the lower layers have been in production use for some time now.
The interesting part is the RADOS layer, which presents object-based storage with both 'native' access via the librados library (available for several languages) and an Amazon S3-compatible REST API. Either one makes it more than adequate for adding massive storage to a web service.
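For a rough idea of what the S3-compatible path looks like, here is a sketch using Python's boto3 client pointed at a RADOS gateway; the endpoint, credentials, and bucket name are placeholders and assume a gateway is already deployed.

```python
# Sketch of talking to Ceph's S3-compatible RADOS gateway via boto3.
# Endpoint, credentials, and bucket name are placeholders; this assumes
# a deployed gateway and the boto3 package installed.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://radosgw.example.internal:7480",  # placeholder endpoint
    aws_access_key_id="CHANGE_ME",
    aws_secret_access_key="CHANGE_ME",
)

s3.create_bucket(Bucket="web-assets")
s3.put_object(Bucket="web-assets", Key="hello.txt", Body=b"hello from ceph")
print(s3.get_object(Bucket="web-assets", Key="hello.txt")["Body"].read())
```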
This video is a good description of the philosophy, architecture, capabilities and current status.
In my opinion, the best file system for Linux is MooseFS; it's quite new, but I had an opportunity to compare it with Ceph and Lustre, and I can say for sure that MooseFS is the best one.
Gluster is getting quite a lot of press at the moment:
http://www.gluster.org/
Lustre has been working for us. It's not perfect, but it's the only thing we have tried that has not broken down under load. We still get LBUGs from time to time, and dealing with 100 TB+ file systems is never easy, but the Lustre system has worked and increased both performance and availability.
Unless someone forces you to use it, I would highly recommend using anything other than Lustre. From what I hear from others, and what also gave me nightmares for quite some time, Lustre quite easily breaks down in all kinds of situations. And if even a single client in the system breaks down, it typically puts itself into an endless do_nothing_loop mode while holding some important global lock - so the next time another client tries to access the same information, it will also hang. Thus, you often end up rebooting the whole cluster, which I guess is something you would normally try to avoid ;)
Modern parallel file systems like FhGFS (http://www.fhgfs.com) are way more robust here and also allow you to do nice things like running server and client components on the same machines (though built-in HA features are still under development, as someone from their team told me, but their implementation is going to be pretty awesome from what I've heard).
Ceph looks to be a promising new-ish entry into the arena. The site claims it's not ready for production use yet though.
I read a lot about distributed filesystems and I think FhGFS is the best.
http://www.fhgfs.com/
It's worth a try. See more about it at:
http://www.fhgfs.com/wiki/

Is there a good service for checking website/server vulnerability [closed]

I have been asked to provide information on available techniques for assessing our current, and any future, websites for security problems. The request is in the form of:
Do you know of any good free one that examines for security holes?
I think our data security is probably worth a small amount of upfront spend so any non-free methods would be appreciated too.
Our systems are a mishmash of MySQL, Oracle, SQL Server, PHP, ASP.NET, etc., though I guess that does not matter too much. All the systems are secured inasmuch as they are patched and the firewalls are set sensibly, so outside people cannot get directly to the database boxes.
It is XSS and similar attacks that we wish to prevent.
What do YOU use to give you confidence in your systems? ');DROP TABLE answer;
OWASP would be a good place to start. There's too much to cover to include here.
If the security of your site is worth nothing to your company then that's what you should pay. For my company the security of our data and the brand image has quite a high value.
We pay a whole bunch of money for regular scans, we've trained the developers in basic hacking/security of applications, our code reviews include a security review and now we're looking at AppScan from IBM (which is expensive but in the long run probably cheaper than all the pen' testing we pay for).
You get what you pay for. Making sure you understand the OWASP issues would be a good start though.
Personally, I choose not to be confident in the security of our systems. I am convinced there is always something that I am missing and thus I keep looking for it.
What you seem to be looking for is something to make others feel confident (even if that confidence is an illusion). Penetration testing is probably the right choice for that. Depending upon the tool, it shows potential vulnerabilities in a nice report, and then you can report how you mitigated them.
We use IBM AppScan and it is a good tool for this. As with any tester of this type, you will find yourself following a lot of bad leads. Most of them are not false positives per se, more just things that might be an issue, or appear to be, and you will have to investigate and determine whether they actually are.
I would not put a lot of faith in this kind of testing. If your app scans clean, it does not really mean your app is clean. That does not make the testing worthless, but don't make it out to be more than it is.
The next thing I would look into is static analysis tools in your various languages. A lot of these are free. Hand in hand with that is developer education. That is usually a pretty cheap solution to the issue, just making sure they understand what the risks are.
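On the education side, the two habits worth drilling into first are parameterized queries and output escaping. Here is a minimal sketch in Python; the stacks in the question are PHP/ASP.NET, so treat this purely as an illustration of the ideas.

```python
# Illustrative only: the two habits that stop most injection and XSS bugs,
# shown in Python even though the systems in the question are PHP/ASP.NET.
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answer (body TEXT)")

user_input = "'); DROP TABLE answer;--"

# SQL injection: never interpolate user input into SQL; bind parameters instead.
conn.execute("INSERT INTO answer (body) VALUES (?)", (user_input,))

# XSS: escape user-supplied text before writing it into HTML.
safe = html.escape("<script>alert('xss')</script>")
print(f"<p>{safe}</p>")
```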
There is no silver bullet, no simple answer, you need to define security as an EVERYONE problem and make sure it is given both priority and commitment.
Check out dotDefender - they've got versions for IIS/Apache/ISA. I use this app to protect against SQL Injection/XSS/DDOS/probing/encoding attacks. No piece of software will ever be perfect but in my case I run systems with sites being developed in .NET, PHP, and classic ASP with some of our sites being new and others being 5+ years old.
http://www.applicure.com/?page=dotDefender
I also have a company do penetration testing / social engineering every year or so, but with dotDefender I'm at least happy that I've got a baseline security blanket protecting my sites.
Of particular interest to me was that their app is fully x64 compatible - necessary since I'm using x64 web servers.

Resources