How much time does a CDN take with new DNS? - dns

I'm using Amazon CloudFront as a CDN, and we may have to switch between two systems depending on the situation.
Here's what I need to be prepared for:
How long will the CDN take to resolve the new address each time we switch (new system, old system)? Is it the same as the time it takes the domain to propagate?
What about the cache it collected before changing?
I'm reading this article http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/HowToUpdateDistribution.html
It would be much better if someone shared his/her experience.

When you set up a new container on any CDN, you are using their domains, which are already propagated across the world. If you plan to use your own domain instead of theirs, for example cdn.yourdomain.com or static01.yourdomain.com, then the standard DNS propagation time comes into play.
What you'll find with CDNs is that once you upload your files to the server, it takes some time for them to "spread" (replicate) across the whole network. For example, say their main server is in America: those accessing the files from the UK will download from America rather than from a local server until the files have been cloned across the network. This takes from a few minutes to a few hours, depending on how many files there are and how big they are. The good thing is that everyone can use those files with no delay, but getting the full speed advantage takes a little time.
As far as caching goes, it's pretty straightforward: you set a TTL (expiry) on the containers, which determines how long the files stay cached. Personally, I use a 72-hour TTL on mine, which is favored by Google and other search engines.
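To make the TTL point concrete: a TTL-based cache keeps serving its stored copy until the TTL runs out, which is also why content cached before a switch can linger after it. Here is a toy model of an origin-pull edge cache in Python; `TTLCache` and `fetch_from_origin` are hypothetical names for illustration, not a CloudFront API.

```python
import time


class TTLCache:
    """Toy edge cache: an object fetched from the origin is served from the
    cache until its TTL expires, and only then fetched again."""

    def __init__(self, fetch_from_origin, ttl_s, clock=time.monotonic):
        self.fetch = fetch_from_origin   # hypothetical callable: path -> bytes
        self.ttl_s = ttl_s
        self.clock = clock               # injectable so the behaviour can be tested
        self.entries = {}                # path -> (expires_at, body)

    def get(self, path):
        now = self.clock()
        hit = self.entries.get(path)
        if hit and hit[0] > now:
            return hit[1]                # still fresh: the origin is not contacted
        body = self.fetch(path)          # miss or expired: go back to the origin
        self.entries[path] = (now + self.ttl_s, body)
        return body
```

With a 72-hour TTL, an object cached just before you point the distribution at the new system keeps coming from the old copy for up to 72 hours unless it is invalidated.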
Hope this helps.

Related

Simulate user load on a hardware router

I am trying to simulate user load on a hardware router. I am specifically trying to emulate the average load of a home router.
What I need to do is load it over a week-long period at different times and perform the following:
Data Transfer
Torrent Downloads
HTTP/HTTPS page requests to different pages: static content, dynamic content, etc.
I need to repeat this at intervals I specify and be able to test multiple routers at once.
Does anyone know of any software or scripts that will achieve this?
Cheers
Sure. You might be surprised to learn that the load on an average home router is probably pretty low most of the time. Do the math: even downloading at the maximum speed of a DSL or cable connection (even with small packets, which is usually not the case under heavy load) is just not a significant load for a modern CPU these days.
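To put numbers on "do the math", this sketch computes how many packets per second a saturated link carries, assuming a hypothetical 100 Mbit/s connection and ignoring Ethernet framing overhead (preamble, inter-frame gap). Full-size 1500-byte packets work out to roughly 8,300 packets/s, and even minimum-size 64-byte packets to about 195,000 packets/s.

```python
def packets_per_second(link_bits_per_s, packet_bytes):
    """Packets per second needed to saturate a link with packets of one size.
    Ignores framing overhead, so real figures are slightly lower."""
    return link_bits_per_s / (packet_bytes * 8)


# Hypothetical 100 Mbit/s home connection, large vs. tiny packets.
for size in (1500, 64):
    print(size, "bytes:", round(packets_per_second(100e6, size)), "packets/s")
```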
Scripting loads is easy. I have a script that I bang against Comcast sometimes when I doubt their last-mile link to my home. It simply uses wget (or try curl) to download a reasonably sized file repeatedly and records the download statistics (time and/or data rate) of each transfer. Just find a .pdf or other file of the size you need somewhere on the net, or use a busy website with lots of content. Just avoid the little guys, who might have to pay for the bandwidth you consume in your test. Better yet, Amazon S3 storage (and transfer bandwidth) is very cheap these days and easy to use. You could put some files of your own choosing up there and download those repeatedly for your test environment instead of stealing bandwidth from someone else! ;)
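The script described above uses wget or curl; a rough Python equivalent of the same loop (download a file repeatedly, record the time and rate of each transfer) using only the standard library might look like this. The URL, repeat count and pause are placeholders, not values from the original script.

```python
import time
import urllib.request


def time_download(url, timeout=60):
    """Download url once; return (bytes_received, seconds, bytes_per_second)."""
    start = time.perf_counter()
    total = 0
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        while True:
            chunk = resp.read(64 * 1024)   # read in 64 KiB pieces
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


def run_load(url, repeats, pause_s=0.0):
    """Repeat the download, sleeping pause_s between runs; return all results."""
    results = []
    for i in range(repeats):
        results.append(time_download(url))
        if pause_s and i < repeats - 1:
            time.sleep(pause_s)
    return results
```

For example, `run_load(url, repeats=24, pause_s=3600)` would download roughly once an hour for a day; running one copy per router under test covers the "multiple routers at once" requirement.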
Never played with any torrent clients, so I can't help you there, but I bet there are some you can script.
Also, you might check out netperf. I don't know the status of that project, but I've used it in the past to generate very high network loads. Google for it.
Have fun and good luck!
-Chris

I need to speed up my site and reduce the number of file calls

My webhost is asking me to speed up my site and reduce the number of file calls.
OK, let me explain a little. About 95% of my website's use is as a bridge between my database (on the same hosting) and my Android applications (I have around 30 that need information from my DB). The information only goes one way (for now): the app calls a JSON string like the one on this page:
http://www.guiasitio.com/mantenimiento/applinks/prlinks.php
and this webpage, which is shown in a WebView as a welcome message:
http://www.guiasitio.com/movilapp/test.php
This page has some images and jQuery, so I think these are the ones using a lot of memory. They have told me to use some code to create a cache of those files in the user's browser to save memory (that is a little Chinese to me, since I don't understand it). Can someone give me an idea and point me to a tutorial on how to get this done? Can the WebView in an Android app keep caches of these files?
All your help is highly appreciated. Thanks
Using a CDN (content delivery network) would be an easy solution if it works well for you. Essentially, you are off-loading the work of storing and serving static files (mainly images and CSS files) to another server. In addition to reducing the load on your current server, it will speed up your site because files will be served from a location closest to each site visitor.
There are many good CDN choices. Amazon CloudFront is one popular option, though in my opinion the prize for the easiest service to set up goes to CloudFlare. They offer a free plan: simply fill in the details, change the DNS settings on your domain to point to CloudFlare, and you will be up and running.
With some fine-tuning, you can expect to reduce the requests on your server by up to 80%.
I use both Amazon and CloudFlare, with good results. I have found that the main thing to be cautious about is carefully checking all the scripts on your site to make sure they are working as expected. CloudFlare has a simple setting where you can specify the cache settings as well, so that's another item on your list covered.
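What the host calls "a cache of those files in the person's browser" usually means HTTP caching headers: with `Cache-Control: max-age` (and the older `Expires` header) on images, CSS and jQuery, a browser or a CDN can reuse its stored copy instead of requesting the file again. Here is a minimal sketch of building those headers in Python, assuming a one-week lifetime; the function name is hypothetical. The site itself is PHP, where the same values would be sent with `header()` before any output.

```python
import time
from email.utils import formatdate


def static_cache_headers(max_age_s, now=None):
    """Return HTTP headers telling browsers (and CDNs) they may reuse a
    static file for max_age_s seconds without asking the server again."""
    if now is None:
        now = time.time()
    return {
        "Cache-Control": f"public, max-age={int(max_age_s)}",
        # Expires is the HTTP/1.0 equivalent: an absolute HTTP-date in GMT.
        "Expires": formatdate(now + max_age_s, usegmt=True),
    }


# Hypothetical one-week lifetime for images, CSS and jQuery.
print(static_cache_headers(7 * 24 * 3600))
```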
Good luck!

Good distributed general-purpose filesystem for my case?

I've been researching the idea of using a distributed file system on my dedicated servers instead of going with Amazon S3, and the results are nothing but massive headaches!
My project has the following characteristics/requirements:
User files are stored on dedicated servers. Each file is stored on 2 separate machines located in different data centers (150-200 miles away from each other).
I'm using Amazon RDS to host the associated MySQL database (*). It's fairly compact (it only holds IDs/file metadata).
Files/data total around 50TB. Naturally, the data changes and will definitely grow with time.
My question is: is there a good general-purpose, distributed, parallel, fault-tolerant file system that has the following characteristics:
Stable & reasonably fast (upload/download)
Fairly easy to set up & maintain
Handles data storage so that I only have to care about removing/adding servers if the need arises (i.e., add new servers to the filesystem's server pool by editing a simple config, or something like that)
I've read about OpenStack, GlusterFS, MogileFS, XtreemFS, etc...but the more I read, the more I get confused!
(*) Yes, I realize the contradiction. Cost-wise it does make sense to host the database on RDS, but storing (up to) 50TB of user files on Amazon is way too expensive compared to using dedicated servers (provided they're good enough).
PS. My app isn't live yet, so I'm open to suggestions if someone has a good idea that fits my case.
EDIT: I'm not trying to make an S3 clone. I just need to use existing hosting infrastructure to build a small-scale cloud solution; my question is about finding the right distributed file system to handle/automate this.
We recently switched from an expensive storage solution to the open-source LizardFS for our distributed storage. It is quite simple to set up and scale once you understand the basic concept.
Check out https://docs.lizardfs.com/introduction.html#architecture for a quick overview, but forget about shadow masters and metaloggers for now. What you need to know is that there are:
a master: regulates the traffic (make sure it has enough CPU)
chunkservers: actually store the data. Use any kind of off-the-shelf hardware with a bunch of hard disks attached.
clients: just simple mount points, so you can get a giant 50TB mount if you want. The master tells the client where to find/store the files, and the actual data is transferred straight from client to chunkserver and back.
You can add as many chunkservers as you want; the master will automatically try to balance storage usage across them. Adding storage is a matter of adding hard drives or adding servers. They don't have to be bare-metal machines, but that is probably the cheapest option.
There are 2 amazing features in LizardFS that allow georeplication:
Goals (see https://docs.lizardfs.com/adminguide/replication.html#standard-goals): how important are your files to you? You can define, at the file or folder level, how many times a file needs to be replicated. Do you want 2 copies? 3? 10? You could define a goal of 2 copies for old files that are simply there for archiving purposes, and a goal of 4 copies on SSD drives for all new files.
Those same goals can also be used for georeplication: by labeling your chunkservers accordingly (e.g. DC1 and DC2), you define that your data has to be stored in at least two different locations.
Rack awareness (see https://docs.lizardfs.com/adminguide/advanced_configuration.html#configuring-rack-awareness-network-topology): you basically define IP ranges to teach the system what your network looks like. This way, clients will try to fetch files from the closest server.
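To illustrate the two ideas (label-based goals and IP-range topology), here is a toy model in Python. It is not how LizardFS implements them, and in LizardFS you configure goals and topology rather than writing code; `Chunkserver`, `place_copies`, `closest_replica` and the DC1/DC2 labels are hypothetical names for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Chunkserver:
    name: str
    label: str    # location label, e.g. "DC1" or "DC2" (hypothetical)
    used: float   # fraction of disk space in use, 0.0-1.0


def place_copies(goal, servers):
    """Choose one server per goal entry. An entry is either a label the server
    must carry or "_" for any server. Prefers the least-used eligible server
    and never puts two copies of a chunk on the same server."""
    chosen, taken = [], set()
    # Satisfy specific labels before wildcards so a wildcard cannot take
    # the only server that carries a required label.
    for want in sorted(goal, key=lambda w: w == "_"):
        candidates = [s for s in servers
                      if s.name not in taken and (want == "_" or s.label == want)]
        if not candidates:
            raise ValueError(f"cannot satisfy goal entry {want!r}")
        best = min(candidates, key=lambda s: s.used)
        chosen.append(best)
        taken.add(best.name)
    return chosen


def closest_replica(client_ip, replicas, topology):
    """Rack-awareness idea: topology maps CIDR ranges to location labels.
    Return a replica in the client's own location if there is one,
    otherwise fall back to the first replica."""
    addr = ipaddress.ip_address(client_ip)
    client_loc = next((loc for net, loc in topology.items()
                       if addr in ipaddress.ip_network(net)), None)
    for replica in replicas:
        if replica.label == client_loc:
            return replica
    return replicas[0]
```

A goal of `["DC1", "DC2"]` then guarantees one copy per data center, which matches the two-data-center requirement in the question.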
The ease of setting it up is what sold LizardFS to me. I've heard very good things about Ceph, but setting it up is another matter...
What worried me at first was how proven the technology is/was, so I spent quite a lot of time researching who uses it.
Orange Poland (A large telecom provider) is one of the users.
And Cloudweavers/OpenNebula actually built a business around it, selling complete solutions.
Won't it take more than one person a few months a year to manage these servers? That will cost money; then you have the cost of hosting the data yourself; and then there's the added huge cost that the business/system you are building is not obviously scalable. In addition, any likely investor will be turned away by a complex home-grown data hosting system. How will you ensure integrity/security on par with Amazon? Your max savings per year look like $30,000 or so.
You could save money by building a de-duplicated storage system where you store only the unique chunks of data (also see rsync). I don't know how redundant your data is, though.
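Here is a minimal sketch of the de-duplication idea, assuming fixed-size chunks identified by their SHA-256 digest; `DedupStore` is a hypothetical name. This is a simpler technique than rsync's: with fixed-size chunks, inserting a single byte shifts every later chunk boundary, whereas rsync's rolling checksum lets it match blocks at any offset.

```python
import hashlib


class DedupStore:
    """Content-addressed store: files are split into fixed-size chunks and
    each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self, chunk_size=4 * 1024 * 1024):
        self.chunk_size = chunk_size
        self.chunks = {}   # digest -> chunk bytes
        self.files = {}    # file name -> ordered list of digests

    def put(self, name, data):
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # stored only if not seen before
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())
```

How much this saves depends entirely on how redundant the user files are, which is the open question in the answer.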
I recommend LizardFS and GfarmFS.
IMHO Ceph is a major disappointment and so is XtreemFS.

Future proofing client-server code?

We have a web-based client-server product. The client is expected to be used by upwards of 1M users (a famous company is going to use it).
Our server is set up in the cloud. One of the major questions while designing is how to make the whole program future-proof. For example:
If the cloud provider goes down, automatically move to a backup in another cloud
Move to a different server altogether, etc.
The options we have considered so far are:
DNS: running a DNS name server in the cloud ourselves
Directory server: the directory server also lives in the cloud
Have our server return future moves, future URLs, etc. to the client, with the client specifically designed to handle those scenarios
Since this must be a common problem, what is the best solution? Our company is very small, so we are looking for the least technically and financially expensive solution (option 3, say).
Could someone provide some pointers?
K
I would go for the directory server option. It's the most flexible and gives you the most control over what happens in a given situation.
To avoid the directory itself becoming a single point of failure, I would have three or four of them running in different locations with different providers. Have the client app randomly choose one of the directory URLs at startup and work its way through them all until it finds one that works.
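Here is a minimal client-side sketch of that idea in Python, assuming the directory servers answer plain HTTP(S) requests and the list of URLs ships with the client. The function name and the 5-second timeout are hypothetical choices.

```python
import random
import urllib.request


def fetch_from_any(urls, timeout=5, rng=random):
    """Try the directory URLs in random order and return (url, body) from the
    first one that answers. Raises RuntimeError if none is reachable."""
    order = list(urls)
    rng.shuffle(order)   # random start spreads clients across the directories
    errors = []
    for url in order:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return url, resp.read()
        except OSError as exc:   # urllib.error.URLError is a subclass of OSError
            errors.append((url, exc))
    raise RuntimeError(f"no directory server reachable: {errors}")
```

Shuffling, rather than always starting with the first entry, keeps one directory from taking the whole client population's startup traffic.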
To make it really future-proof, you would probably need a simple protocol to dynamically update the list of directory servers, but be careful: if this is badly implemented, you will leave your clients open to all sorts of malicious spoofing attacks.
Re. DNS: requests can be cached, and it might take a while for the changes to propagate themselves (hours to days).
I'd go for a list of prioritized IPs that can be updated on the client. If one IP fails, the client would retry with the 2nd, 3rd, and so on.
I'm not sure I 100% understood your question, but if I did it boils down to: if my server moves, how can my clients find it?
That's exactly what DNS has done for nearly three decades.
Every possible system you could choose would need to be bootstrapped with initial working data: the address of a directory server, the address of a working server to get an updated list of addresses, etc. That's what the root DNS servers are for, and OS vendors will do the bootstrapping part for you.
Sure, DNS queries can be cached; that's how it is supposed to work and how it scales to internet size. You control the caching (read about the TTL), and you can usually keep it at sane values (it doesn't make sense to keep it shorter than the absolute minimum time needed to redeploy the server somewhere else).

Is there such a thing as a reverse CDN? (content 'retrieval' network)

Our clients upload a serious amount of data from all over the world, and we'd like to do our best to make that as painless as possible. Our clients upload 2GB worth of files over their sometimes very 'retail' broadband packages (with capped upload speeds), which draws upload times out to 24-48 hours. At any given time we have 10 or more concurrent uploads, and in peak periods we can have 100. So we decided to consider ways to reduce latency and keep our clients' traffic local: just as a CDN has download servers in various locations, we'd like upload servers.
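The figures in the question imply a very low average upload rate. Taking 2GB as 2×10^9 bytes (an assumption), moving it in 24 hours averages about 185 kbit/s, and in 48 hours about 93 kbit/s. That suggests the capped uplinks themselves are a large part of the problem:

```python
def implied_upload_kbit_s(size_bytes, hours):
    """Average upload rate in kbit/s implied by moving size_bytes in `hours`."""
    return size_bytes * 8 / (hours * 3600) / 1000


# 2 GB (taken as 10**9 bytes per GB) over 24 and 48 hours.
for h in (24, 48):
    print(h, "hours:", round(implied_upload_kbit_s(2 * 10**9, h)), "kbit/s")
```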
Any experience or thoughts?
We're not a huge company but this is a problem worth solving so we'll consider all options.
What about putting some servers physically closer to your clients?
Same ISP, or at the very least the same countries. Then you just collect the data on schedule. I don't imagine they're getting top speeds when there are 100 of them uploading to you either, so the sooner you can get them completed the better.
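If you do put upload servers in several regions, each client still has to pick one. One simple approach is to probe every endpoint and upload to the fastest responder. The sketch below makes some assumptions: the host names are hypothetical, and TCP connect time is only a rough proxy for network distance.

```python
import socket
import time


def tcp_connect_time(host, port=443, timeout=3.0):
    """Seconds taken to open a TCP connection to host:port."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.perf_counter() - start


def pick_upload_endpoint(hosts, probe=tcp_connect_time):
    """Return the host with the lowest probe time, skipping unreachable ones."""
    timings = {}
    for host in hosts:
        try:
            timings[host] = probe(host)
        except OSError:   # refused, unreachable or timed out (socket.timeout is an OSError)
            continue
    if not timings:
        raise RuntimeError("no upload endpoint reachable")
    return min(timings, key=timings.get)
```

For example, `pick_upload_endpoint(["upload-eu.example.com", "upload-us.example.com"])` would return whichever hypothetical endpoint accepted a connection fastest.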
Also, do they need to upload this stuff immediately? Could some of them post DVDs for whatever isn't time-sensitive? I know dealing with media in the post sucks, so it's hardly ideal.
A reverse CDN sort of situation would only really happen if you had multiple clients using torrents and seeding their uploads (somehow) to one of your servers.
You haven't really said if this is a problem for you, or your clients. So, some more info is going to get you a better answer here.
2GB per what time period? Hour? Day?
If your operation is huge, I wouldn't be too surprised if Akamai or one of the other usual CDN suspects can provide this service to you for the right price. You might get your bizdev folks (or purchasing) in touch with them.
