pictures on a sub-domain or separate domain - dns

I was given a suggestion to put our vast picture gallery into separate subdomain, or even separate domain. I dont see how it would help, but I do know that meny other sites does it.
Anyone could point me to the right direction concerning this question ?

The two biggest reasons for this are
to use a CDN (Content Delivery Network) to maximize your user's experience through delivering static files faster
To better utilize specific hardware. For example, your web server may be super beefy, running on SSDs (relatively small capacity), 16GB RAM etc. that you don't want to have to bog down with serving millions of extra files. Your image server would only need a very weak computer but could very inexpensively be backed by 10 TB of storage.

Related

What server should I use if I am expecting over 20k visits/day?

I recently launched a fantasy football online game for the English premier league called Myfpl11.com and I want to know what server should I choose if I am expecting 20k visits a day. My visits are going up and I want the site to keep performing smoothly. Please help.
This is probably the wrong area of StackExchange to ask this sort of question. However ...
The first thing you should do is get prepared to scale horizontally instead of vertically. If you keep growing you will soon grow out of any single server that you purchase.
Instead, what you need to do is start looking at ways to modify your website to be able to work over multiple systems. If you're experiencing load issues on the server you currently have, spin up another one of the exact same instance and move the database to that server, so you will then have two -- one dedicated to the database (which will really help it do its job) and one dedicated to serving traffic.
From there look at how you can scale in to multiple web processes, databases and add caching layers.
You can add cloudflare.com as your DNS service which will help you out by better caching your assets, but most importantly they will deliver a technical issues page to your users if your site does fall over at any stage. This is really helpful for saving face, because they will get an actual page with a message instead of a continually loading white-page.
Look at using services like digitalocean.com or linode.com (both very affordable and great staff) where you can easily add/remove resources as you need them.

I need to speed up my site and reduce the number of files calls

My webhost is aking me to speed up my site and reduce the number of files calls.
Ok let me explain a little, my website is use in 95% as a bridge between my database (in the same hosting) and my Android applications (I have around 30 that need information from my db), the information only goes one way (as now) the app calls a json string like this the one in the site:
http://www.guiasitio.com/mantenimiento/applinks/prlinks.php
and this webpage to show in a web view as welcome message:
http://www.guiasitio.com/movilapp/test.php
this page has some images and jquery so I think this are the ones having a lot of memory usage, they have told me to use some code to create a cache of those files in the person browser to save memory (that is a little Chinese to me since I don't understand it) can some one give me an idea and send me to a tutorial on how to get this done?. Can the webview in a Android app keep caches of this files?
All your help his highly appreciated. Thanks
Using a CDN or content delivery network would be an easy solution if it worked well for you. Essentially you are off-loading the work or storing and serving static files (mainly images and CSS files) to another server. In addition to reducing the load on your your current server, it will speed up your site because files will be served from a location closest to each site visitor.
There are many good CDN choices. Amazon CloudFront is one popular option, though in my optinion the prize for the easiest service to setup is CloudFlare ... they offer a free plan, simply fill in the details, change the DNS settings on your domain to point to CloudFlare and you will be up and running.
With some fine-tuning, you can expect to reduce the requests on your server by up to 80%
I use both Amazon and CloudFlare, with good results. I have found that the main thing to be cautious of is to carefully check all the scripts on your site and make sure they are working as expected. CloudFlare has a simple setting where you can specify the cache settings as well, so there's another detail on your list covered.
Good luck!

Things to keep in mind while architecting high volume traffic centered web site

I have this question. If I am designing a web site which is expected to have high-traffic, then what are the things I should keep in mind?
Thanks
Be careful about your database management.
Build your database tables, and links between tables keeping in mind that you do not want to search / load useless things.
Once the site is working, I would optimise it as much as possible. Tools like YSlow or WebPageTest make it easy to analyse a page and pinpoint bottlenecks and places for improvements.
Also for a high volume site, I think that you definitely want to use a content delivery network. There are lots of options, including Amazon CloudFront and CloudFlare. Using a CDN will reduce the load on your server by 60-80%, it will make the site faster and it will cost you hardly anything.
Unless there is a specific reason why it's not a good fit for your site, you can't go wrong!
Good luck!

Good distributed general purpose filesystem in my case?

I've been researching the idea of using distributed file system along with my dedicated servers instead of going with Amazon S3 and the results are nothing but massive headaches!
My project have the following characteristics/requirements:
User files are stored in dedicated servers. Each file is stored in 2 separate machines, located in different data centers (150-200 miles away from each other)
I'm using Amazon RDS to host the associated mysql database (*). It's fairly compact (only hold IDs/files metadata)
Files/data is around 50TB. Naturally, data does change and will definitely grow with time
My question is: is there a good general-purpose, distributed parallel fault-tolerant file system that have the following characteristics:
Stable & reasonably fast (upload/download)
Fairly easy to setup & maintain
Handle data storage so that I only have to care about removing/adding new servers if the need arise (ie. add new servers to the filesystem's server pool by editing a simple config, or something like that)
I've read about OpenStack, GlusterFS, MogileFS, XtreemFS, etc...but the more I read, the more I get confused!
(*) Yes, I realize the contradiction. Cost-wise it does make sense to host the database on RDS. But storing (up to) 50TB of users files on amazon is way too expensive compared to using dedicated servers (provided it's good enough).
PS. my app isn't live yet, so I'm open to suggestion if someone have a good idea that fits well in my case.
EDIT I'm not trying to make a S3 clone, I just need to use an existing hosting infrastructure to build small-scale cloud solution, my question is about finding the right distributed file system to handle/automate this.
We recently switched from an expensive storage solution to the opensource Lizardfs for our Distributed storage solution. It is quite simple to set up and scale once your understand the basic concept.
Check out https://docs.lizardfs.com/introduction.html#architecture for a quick overview. But forget about shadow master en meta loggers for now. What you need to know is that there are
a master: that regulates the traffic (make sure that has enough cpu)
chunkservers: which actually store the data. Use any kind of off the shelf hardware with a bunch of harddisks attached.
Clients: which are just simple mount points. So you can get a giant 50TB mount if you want. The master will tell the client where to find/store the files. The actual data is being transfered straight from the client->chunkserver and back.
You can add as many chunkservers as you want, the master will automatically try to balance your storage usage across them. Adding storage is a matter of adding harddrives, or adding servers. They don't have to be actual bare metal machines, but that is probably the cheapest.
There are 2 amazing features in lizardfs that allow georeplication.
Goals (see https://docs.lizardfs.com/adminguide/replication.html#standard-goals): How important are files to you. You can define, on a file level/folder level how many times a file needs to be replicated. Do you want 2 copies 3? 10? You could define a goal of 2 copies for old files that are simply there for archiving purposes. And define a goal of 4 copies on SSD drives for all new files.
Those same goals can also be used to do georeplication. You define that your data has to be stored it least two different locations by labeling your chunkservers accordingly. (e.g. DC1 and DC2)
Rack awareness (see https://docs.lizardfs.com/adminguide/advanced_configuration.html#configuring-rack-awareness-network-topology): you basically define IP ranges to teach the system how your network looks like. This way, clients will try to serve files from the closest server.
The ease of setting it up is what sold lizardfs for me. I've heard very good things about Ceph, but setting it up is another matter...
What worried me at first was how proven the technology is/was. So I spent quite a lot of research on figuring out who uses it.
Orange Poland (A large telecom provider) is one of the users.
And Cloudweavers/opennebula actualy built a business around it selling complete solutions.
Won't it take more than one person a few months a year to manage these servers? That will cost some $, then you have the cost of hosting the data yourself, then you have the added huge cost that the business / system you are building is not obviously scalable? In addition any likely investor will be turned away by a complex home grown data hosting system. How will you ensure integrity/security on par with Amazon? Your max savings per year look like $30,000 or so.
You could save money by doing a de-duplicated storage system where you just store all the unique chunks of data - also see rsync. Don't know how redundant your data is though.
I recommend LizardFS and GfarmFS.
IMHO Ceph is a major disappointment and so is XtreemFS.

Why ww2 sub domains?

I have seen on the web some domain names having prefix of ww2 or ww3 or so (ww2.somedomain.example, ww3.yourdomain.example). And these happen mostly when traveling from a page to page. What would be the reason of having such subdomains? Is there anything special about them or are they just another sub domain? I mean, are they useful in any particular context?
People running large(-ish) sites used to do this when they needed to break up the load between more than one server. One machine would be called www then the next one would be called www2, etc.
Today, much better load balancing solutions are available that don't require you to expose your internal machine naming conventions to the browser clients.
Technically, the initials before the primary domain name (e.g. the "mail" in mail.yahoo.com) can be best though of as a machine name, identifying the web server/mail server, whatever. They can also identify a group of machines (a web farm).
So the person building up that machine can call it anything they want. The initials www are a (somewhat arbitrary) convention.
Oftentimes, ww{x} is used to indicate a particular server of a set of mirrored servers. If properly configured, I could have www.mydomain.example point to my web site on a load balancer, while I could use ww1, ww2, ww3, etc to access the site guaranteed from a specific LBed server.
I can see 3 possibilities
make the browser load resources more faster. the browser would open a fixed number of connection to same domain not to load the server
they are using more then one server so they can share the load between servers
separate some content to a separate virtual host or server. some kind of organization ...
As various answers have pointed out, modern day load-balancers can balance load without having to resort to using different sub-domains for each machine. However, there is still one benefit of dividing your site into various sub-domains: maximize browser connections.
All browsers limit the number of concurrent connections to a particular host (6 for most modern browsers). If a page contains lots of assets, page-load would be slow as the browser queue those requests because of connection limit. By loading different assets from different subdomain, you get around the connection limit, speeding up page-load.
Typically it's a partitioning strategy. When sites get sufficiently large that they can't run (or run well) on a single server you then have to look at solutions for scaling the application out horizontally (ie more servers) rather than vertically (ie bigger servers).
Some example partitioning strategies are:
Certain users always use certain servers. This can be arbitrary or based on some criteria (user type, geographic location, etc);
When a user gets a session that session is assigned to a particular server (sometimes called "sticky sessions" although this can also be used where such different machines are transparent); and
Certain activities are always on certain machines.
Another common case is organizational reasons. In an extremely large company, www might be for their main marketing website. And, ww2 might be, say, for product documentation pages.
In an ideal world, all departments would share perfectly. In practise, a big company might have their (www) marketing pages managed by an external agency. Their internal (ww2) pages done by their internal team. Often, the marketing agency just doesn't update pages quickly or refuses to run certain stacks, may be too limiting in terms of bureaucratic needs.
The marketing agency may insist on controlling the www and not sharing due to past situations where a company website went down due to internal reasons and yet the agency got blamed, or vice versa.
So, theoretically, there's no need to do this with modern load balancing and such. But, in practise, it can be a lot cheaper, straightforward and allow better business productivity.

Resources