Is there such a thing as a reverse CDN? (content 'retrieval' network) - content-management

Our clients upload a serious amount of data from all over the world and we'd like to do our best to make that as painless as possible. Our clients upload 2GB worth of files over their sometimes very 'retail' broadband packages (with capped upload speeds) that draw out upload times to 24-48 hours. At any given time we have 10 or more concurrent uploads and peek periods we can have 100 concurrent uploads. So we decided to consider ways to reduce latency and keep our clients traffic local... so just as a CDN has download servers in various locations, we'd like upload servers.
Any experience or thoughts?
We're not a huge company but this is a problem worth solving so we'll consider all options.

What about putting some servers physically closer to your clients ?
Same ISP, or at the very least in the same countries. Then you just collect it on schedule. I don't imagine that they're getting top speeds when there's 100 of them uploading to you either, so the sooner you can get them completed the better.
Also, do they need to upload this stuff immediately ?? Can some of them post DVD for whatever isn't time sensitive ? I know it sux dealing with media in the post.... so it's hardly ideal.
A reverse CDN sort of situation would only really happen if you had multiple clients using torrents and seeding their uploads (somehow) to one of your servers.
You haven't really said if this is a problem for you, or your clients. So, some more info is going to get you a better answer here.

2GB per what time period? Hour? Day?
If your operation is huge, I wouldn't be too surprised if Akamai or one of the other usual CDN suspects can provide this service to you for the right price. You might get your bizdev folks (or purchasing) in touch with them.

Related

Prevent bottleneck on bandwidth for mobile internet

I am sure that this question has already been answered, but unfortunately I do not know the keywords. Therefore my search remained unsuccessful until now.
Scenario: I want to transmit a lifestream via Mobile Internet using RaspberryPi, and depending on the bandwidth, downscale the streams and upscale them again when available.
My two questions for the network specialists among you:
i know i can actively check the bandwidth, but how would you do this without interfering with the existing processes transmitting? Should I commit a bandwidth to the processes and then slowly determine the remaining bandwidth using a test tool? Or are there already practical solutions?
Can I determine in the mobile Internet, or in the network interface, when a bottelneck is reached?
Passive methods would be my preference. where I wouldn't have to load the bandwidth. e.g. I could know how much bandwidth the stream uses, and how much arrives. But how do I make sure there is enough capacity before I go up with the bitrate?
Thanks for your wisdom ;)

Best way to manage 1000+ users registering for conference events at once

We are currently in the process of organising a student conference.
The issue is that we offer several different events at the same time over the course of a week. The conference runs the whole day.
It's currently been operating on a first come, first served basis, however this has led to dramatic problems in the past, namely the server crashing almost immediately, as 1000+ students all try to get the best events as quickly as they can.
Is anyone aware of a way to best handle this so that each user has the best chance of enrolling in the events they wish to attend, firstly without the server crashing and secondly with people registering for events which have a maximum capacity, all within a few minutes? Perhaps somehow staggering the registration process or something similar?
I'm aware this is a very broad question, however I'm not sure where to look to when trying to solve this problem...
Broad questions have equally broad answers. There are broadly two ways to handle it
Write more performant code so that a single server can handle the load.
Optimize backend code; cache data; minimize DB queries; optimize DB queries; optimize third party calls; consider storing intermediate things in memory; make judicious use of transactions trading off consistency with performance if possible; partition DB.
Horizontally scale - deploy multiple servers. Put a load balancer in front of your multiple front end servers. Horizontally scale DB by introducing multiple read slaves.
There are no quick fixes. It all starts with analysis first - which parts of your code are taking most time and most resources and then systematically attacking them.
Some quick fixes are possible; e.g. search results may be cached. The cache might be stale; so there would be situations where the search page shows that there are seats available, when in reality the event is full. You handle such cases on registration time. For caching web pages use a caching proxy.

simulate user load on hardware router

I am trying to simulate user load on a hardware router. I am specifically trying to emulate the average load of a home router.
What i need to to do is load it up over a week long period at different times and perform the following:
Data Transfer
Torrent Downloads
HTTP/HTTPS Pages requests to different pages. Static content, dynamic content. etc.
I would need to this repeat at my specific intervals and be able to test multiple routers at once.
Anyone know of any software or scripts that will achieve this.
Cheers
Sure. You might be surprised to learn that the load on an average home router is probably pretty low most of the time. Do the math: even downloading at maximum DSL or cable router speed (even if it were small packet sizes, which in higher loads is not usually the case) is just not a significant load on a modern CPU these days.
Scripting loads is easy. I have a script that I bang against Comcast sometimes when I doubt their last mile link to my home. It simply uses wget (or try curl) to download a file of reasonable size repeatedly and records the download statistics (time and/or data rate) of the transfers. Just find a .pdf or other file of the size you need from around the net somewhere, or use a busy website with lots of content. Just avoid the little guys who might have to pay for that bandwidth you are consuming in your test. Better yet, Amazon S3 storage (and transfer bandwidth) is very cheap these days and easy to use. You could put some files of your own choosing up there, and download those repeatedly for your test environment instead of stealing bandwidth from someone else! ;)
Never played with any torrent clients, so I can't help you there, but I bet there are some you can script.
Also, you might check out netperf. I don't know the status of that project, but I've used it in the past to generate very high network loads. Google for it.
Have fun and good luck!
-Chris

How much time CDN takes with new DNS

I'm using Amazom CloudFront as CDN and we may have to change between two systems according to situation.
Here's what I need to be prepared for -
How much time will CDN take to resolve the new address every time (new system, old system)? The same time it takes the domain to propagate?
What about the cache it collected before changing?
I'm reading this article http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/HowToUpdateDistribution.html
It would be much better if someone shared his/her experience.
When you setup a new container on any CDN you are using their domains which are already propagated across the world. If you plan to use your own domain instead of theirs for example cdn.yourdomain.com or static01.yourdomain.com etc then standard propagate time comes into play.
What your find with CDNs is once you upload your files to the server it takes some times to "Spread" replicate your files on all there networks so for example say their main server is in america those accessing the files from UK will download from America and not a local server until all files have been cloned across their network, it takes from a few mins to a few hours depending on how many files and the sizes, the good thing is everyone can use those files with no delay but for full speed advantage it takes a little time.
As far as Cache its pretty straight forward you set a TTL expire on the containers which means they get cached and so on. Personally I use a 72 hour TTL on mine and is favored by Google and other search engines.
Hope this helps.

Logging requests on high traffic websites

I wonder how high traffic websites handle traffic logging, for example a website like myspace.com receives a lot of hits, I can imagine it would take a lot of space to log all those requests, so, do they log every single request or how do they handle this?
If you view source on a MySpace page, you get the answer:
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-6293770-1");
pageTracker._setDomainName(".myspace.com");
pageTracker._setSampleRate("1"); //sets sampling rate to 1 percent
pageTracker._trackPageview();
</script>
That script means they're using Google Analytics.
They can't just gauge traffic using IIS logs because they may sell ads to third parties, and third parties won't take your word for how much traffic you get. They want independent numbers from a separate company, and that's where Google Analytics comes in.
Just for future reference - whenever you've got a question about how a web site is doing something, try viewing the source. You'd be amazed at what you can find there in plain view.
We had a similar issue with out Intranet which is used by hundreds of people. The disk activity was huge and performance was being hurt.
The short answer is Asynchronous non-blocking logging.
probably like google analytics.
Use Javascript to load a page on a difference server, etc.
Don't how they track it since I don't work there. I am pretty sure that they have enough storage to record every little thing about their user if they wanted.
If I were them, I would use AwStats if I just wanted to know basic stuff about my users.
It is more likely that they have developed their own scripts for tracking their users. Stuff they would log
-ip_address
-referrer
-time
-browser
-OS
and so on. Then a script to see different data about the user varying by day, weeks, or months. As brulak said, something along the line of Analytics, but since they have access to actual database, they can learn much more about their users.
ZXTM traffic shaping and logging, speaking from experience here
I'd be extremely surprised if they didn't log every single request, yes, and operations with particularly high traffic volumes usually roll their own log-management solutions against the raw server logs, in some form or other -- sometimes as simple batch-type processes, sometimes as complete subsystems.
One company I worked for, back in the dot-com heyday, got upwards of twenty million pageviews a day; for that site (actually a set of them, running across a few dozen machines in all, as I recall), our ops team wrote a quite sophisticated, clustered solution in C that parsed, translated (into relational storage), compressed and distributed the logs daily. Log files, especially verbose ones, pile up fast, and the commercial solutions available at the time just couldn't cut it.
If by logging you mean for collecting server related information (request and response times, db and cpu usage per request etc) I think they sample only the 10% or 1% of the traffic. That gives the same results (provide developers with auditing information) without filling in the disks or slowing the site down.

Resources