How to handle lots of file downloads from my linux server - linux

I have a 50 MB file hosted on my dedicated Linux server. Almost 50K users download this file each day (about 2.5 TB a day).
The server crashes frequently, and users report that sometimes the file can't be downloaded at all because the server is overloaded.
Can someone help me work out what server, bandwidth, or other resources I need to handle this load?
Is there a solution where I can host the file and pay per download?
Is there any setting I can change on my server that would help fix this issue?
My current server specification is:
2 x Intel Xeon E5 2620V2
2 x (6 x 2.10 GHz)
128 GB REG ECC
256GB SSD HD
1 IP Address
1 Gbit/s port Shared Bandwidth
I'll appreciate any help from you guys.
Thank you very much.

Your hardware configuration should probably be fine, at least if the downloads are more or less evenly distributed over the day.
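As a rough back-of-the-envelope check of that claim: 50 MB times 50,000 downloads is about 2.5 TB per day, which averages out to roughly 231 Mbit/s if spread evenly over 24 hours. The sketch below does that arithmetic. The function name and the peak_factor parameter are made up for illustration, and the factor of 3 for busy hours is only an assumption, not a measured value.

```python
def required_bandwidth_mbit(file_size_mb, downloads_per_day, peak_factor=1.0):
    """Return the sustained bandwidth in Mbit/s needed to serve the downloads.

    peak_factor is a hypothetical multiplier for how much busier the peak
    hour is than the daily average (1.0 means perfectly even traffic).
    """
    total_bytes = file_size_mb * 1_000_000 * downloads_per_day  # decimal MB
    average_bits_per_sec = total_bytes * 8 / 86_400              # seconds per day
    return average_bits_per_sec * peak_factor / 1_000_000


if __name__ == "__main__":
    average = required_bandwidth_mbit(50, 50_000)
    peak = required_bandwidth_mbit(50, 50_000, peak_factor=3)  # assumed peak
    print(f"average: {average:.0f} Mbit/s, assumed peak: {peak:.0f} Mbit/s")
```

With even traffic the average sits well below a 1 Gbit/s port, but the port in the question is shared, and a peak a few times the average would already use most of it.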
One of the most effective http servers for serving static content is nginx. Take a look at this guide: Serving static content.
If that doesn't help, you should consider Amazon S3, which is probably the most popular file hosting solution with a reasonable price tag.

This is how not to make the file available for download:
data = read_file(filename)
echo data
You want to be using sendfile(2) to have the kernel stream the file directly into the socket instead of doing it in userspace.
Each server has its own mechanism for invoking sendfile(2); with httpd this is mod_xsendfile and its associated response header (X-SENDFILE). You'll find that moving to this will allow you not only to handle your current userbase but also to add many more users without worry.
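To make the contrast with the read-then-echo approach concrete, here is a minimal Python sketch. It is not how httpd or mod_xsendfile work internally. It relies on the standard library's socket.sendfile(), which uses os.sendfile() where the platform supports it and otherwise falls back to ordinary send() calls. The function name send_file_zero_copy is hypothetical.

```python
import socket


def send_file_zero_copy(sock, path):
    """Stream the file at `path` into a connected, blocking stream socket.

    The file contents are never loaded into a Python bytes object; where
    os.sendfile() is available the kernel copies the data to the socket.
    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:  # socket.sendfile() requires binary mode
        return sock.sendfile(f)
```

In a real deployment you would normally let the web server do this for you rather than writing socket code yourself.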

Related

Delays when uploading archives to Amazon Glacier using boto3 from a NAS box

I'm trying to back up local files to Amazon Glacier using the boto3 Python library on my NAS box (Zyxel NAS326 running Python 3.7 on entware-ng). While this generally works, I found that the upload speed is extremely slow. To rule out general problems with the NAS box or my internet connection, I did the following:
Running my backup program on my desktop computer: maximum upload rate used
Uploading a file to the internet from my NAS box using FTP: maximum upload rate used
On my router I could see that there are only short peaks of outgoing data followed by long delays.
To narrow down the problem, I logged file access during the upload. This showed that the delay occurs not when reading from disk but while sending the data over the HTTPS connection. A chunk of data is read from the file (usually about 9 MB), then there is a short burst of activity on the internet connection, then a delay of at least 10 seconds before more data is read from the file. So it seems that the connection is somehow blocking the upload, but I have no idea why, or what I could do about it.
Has anyone seen this behaviour before or has ideas what else I could try?

Processing speed over mounted path

I have two scenarios.
Scenario 1: Machine A contains 1000 documents stored as folders. This folder on machine A is mounted on machine B. I process the documents within these folders on machine B and store the output in the mounted path on machine B.
Scenario 2: The documents on machine A are copied directly to machine B and processed there.
Scenario 2 is much faster than Scenario 1. I guess it's because no data is transferred over the network between the two machines. Is there a way I can use mounting and still achieve better performance?
Did you try enabling a cache?
- For NFS: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/fscachenfs.html
- CIFS should have caching enabled by default (unless you disabled it)
The other option would be to use something like Windows’ offline files, which copies files and folders between client and server in the background, so you don’t need to deal with it. The only thing I’ve found for linux is OFS.
But the performance depends on the size of the files and if you read them randomly or sequentially. For instance when I am encoding videos, I access the file right away via the network from my NFS, because it takes as much time as it would take to read and write the file. This way no additional time is “wasted” on the encoding, as the application can encode the stream which is coming from the network.
So for large files you might want to change the algorithm to read sequentially. Small files, which are copied within seconds, can instead be synced between server and client using rsync, BitTorrent Sync, Dropbox, or one of the many other tools. This is actually quite commonly done.
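The sync-then-process idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name process_via_local_copy and the process callback (which takes a local file path and returns the output text) are hypothetical, and a real setup would more likely use rsync to avoid re-copying unchanged files.

```python
import shutil
import tempfile
from pathlib import Path


def process_via_local_copy(mounted_dir, output_dir, process):
    """Copy a mounted tree to local scratch space, process it there,
    then copy the results back in one go.

    `process` is a caller-supplied function: local file path -> output text.
    """
    mounted_dir, output_dir = Path(mounted_dir), Path(output_dir)
    with tempfile.TemporaryDirectory() as scratch:
        local_in = Path(scratch) / "in"
        local_out = Path(scratch) / "out"
        shutil.copytree(mounted_dir, local_in)  # one bulk transfer over the mount
        local_out.mkdir()
        for doc in sorted(p for p in local_in.rglob("*") if p.is_file()):
            target = local_out / doc.relative_to(local_in)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(process(doc))     # all reads and writes are local
        # Requires Python 3.8+ for dirs_exist_ok.
        shutil.copytree(local_out, output_dir, dirs_exist_ok=True)
```

The network is then touched only twice, once for the bulk copy in and once for the bulk copy out, instead of on every small read and write during processing.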

File Server vs NAS for hosting media files

I have a web portal where my users can login and submit artworks (image, documents, etc.). This web portal is hosted in 2 load-balanced web servers.
Because of this load balancing, I'm thinking of using NAS to serve as a centralized media file storage for my web portal. I'm considering NAS because it's cheaper than a file server and it's easier to maintain.
Now the questions are:
File hosting - Is there any NAS device that can act as a file hosting server? Or do I need to create a virtual path on my web server to the NAS? This is easy to achieve with a file server: I can just bind a separate domain to it, something like media.mydomain.com, so all media files are served through this domain. I don't mind serving the media files through a virtual path from my web servers, something like mydomain.com/media. I would like to know if a NAS can do either of the approaches above, and whether it's secure, easy to set up, etc.
Performance - This is more important because reads and writes are quite intensive. I have never used a NAS before. I'm thinking of getting 2 hard drives (2TB, 15000RPM) configured for RAID-1. Would this be able to match the performance of a common file server? I know the answer to this question is relative, but I just want to see how a NAS can be used for file hosting, not just as a file sharing device.
My web servers are running Windows Server 2008R2 with IIS 7.5. I would appreciate it if anyone could also share best practices for integrating a NAS with Win Server/IIS.
Thanks.
A NAS provides a shared location for information on a private network (at least, you shouldn't expose NAS technologies such as NFS and CIFS over the internet) and is not really designed as a web file host. That is not to say you can't configure a NAS as a web file host using IIS/Apache/nginx, but then you don't need your web servers. NAS setup is well documented for both Windows Server and most Unix/Linux distros, and both are relatively easy. A NAS is as secure as it is designed to be; you can use a variety of access control methods to secure it, depending on your implementation.
This really depends on your number of concurrent users and what kind of load you expect them to put on the system. For the most part, a 1Gb LAN connection and a 15,000 RPM hard drive should give a NAS ample performance for a decent number of concurrent users, but I can't say for certain, because a user downloading hundreds of files at a time can cause issues. As with any web technology, wrap limits around user usage to prevent one user from bringing down your entire system. I'm not sure what you are defining as a file server (a NAS is a file server). If you think of a file server as a website that hosts files, a NAS will provide the same if not better performance, based on where the device is in relation to your web servers (again, depending on utilization). If you are worried about performance, you can always build a larger RAID array using RAID 5, RAID 6, or RAID 10, or use SSDs to increase storage performance. In any NAS the hardware constraints are usually storage speed, network speed, RAM, and CPU. Again, this really depends on utilization, so test well, benchmark, and monitor performance.
Microsoft provides a tuning document for server 2008 r2 that is useful: http://msdn.microsoft.com/en-us/library/windows/hardware/gg463392.aspx
In my opinion your architecture should be your 2 web servers referencing the NAS as a shared location, either using a virtual directory pointed at the NAS for your files or handling the NAS location in code (using code provides a whole range of options around security and usage).
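The advice above about wrapping limits around user usage could, if you handle the NAS location in code, look something like the following token-bucket sketch. It is written in Python purely for illustration (the question's stack is IIS on Windows). The class name, its parameters, and the in-memory storage are all assumptions; with two load-balanced web servers, the counters would need to live in shared storage to be enforced globally.

```python
import time


class PerUserDownloadLimiter:
    """Token bucket per user: each download costs one token, and tokens
    refill at `rate_per_sec` up to a maximum of `burst`."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock      # injectable so the limiter can be tested
        self._tokens = {}       # user_id -> tokens currently available
        self._last_seen = {}    # user_id -> time of the previous request

    def allow(self, user_id):
        """Return True if this user may start another download now."""
        now = self.clock()
        elapsed = now - self._last_seen.get(user_id, now)
        self._last_seen[user_id] = now
        tokens = self._tokens.get(user_id, float(self.burst))
        tokens = min(self.burst, tokens + elapsed * self.rate)
        allowed = tokens >= 1.0
        self._tokens[user_id] = tokens - 1.0 if allowed else tokens
        return allowed
```

A request handler would call allow(user_id) before serving a file and return an error (for example HTTP 429) when it returns False.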

using torrents to back up vhd's

Hi, this may be a redundant question, but I have a hunch there is a tool for this, or there should be, and if there isn't I might just make it. Or maybe I am barking up the wrong tree, in which case please correct my thinking.
My problem is this: I am looking for a way to migrate large virtual disk drives off a server once a week over an internet connection of only moderate speed. The solution must allow bandwidth throttling because the internet connection is always in use.
I thought about it and the problem is familiar: moving large files in a way that can be throttled and can easily survive disconnection and reconnection. The only solution I am familiar with that does this perfectly is torrents.
Is there a way to automatically create torrents and automatically "send" them to a client download list remotely? I am working on a Windows Hyper-V host, but I use only Linux for the guests and could easily cook up a guest to do the copying, so consider it a Windows or Linux problem.
PS: the VHDs are "offline" copies of guest servers by the time I am moving them; consider them merely 20-30 GB dumb files.
PPS: I'd rather avoid spending money
Bittorrent is an excellent choice, as it handles both incremental updates and automatic resume after connection loss very well.
To create a .torrent file automatically, use the btmakemetainfo script found in the original bittorrent package, or one from the numerous rewrites (bittornado, ...) -- all that matters is that it's scriptable. You should take care to set the "disable DHT" flag in the .torrent file.
You will need to find a tracker that allows you to track files with arbitrary hashes (because you do not know these in advance); you can either use an existing open tracker, or set up your own, but you should take care to limit the client IP ranges appropriately.
This reduces the problem to transferring the .torrent files -- I usually use rsync via ssh from a cronjob for that.
For point-to-point transfers, torrent is an expensive use of bandwidth. For 1:n transfers it is great, as the distribution of load allows each client's upload bandwidth to be shared with other clients, so the bandwidth cost is amortised and everyone gains.
It sounds like you have only one client, in which case I would look at a different solution.
wget allows for throttling and can resume transfers where it left off if the FTP/HTTP server supports resuming transfers. That is what I would use.
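The two properties that matter here, resuming from where a transfer stopped and capping the transfer rate, can be illustrated with a small Python sketch. This is not how wget is implemented, and it copies between local paths (for example onto a mounted share) rather than over HTTP or FTP. The function name is hypothetical, and it assumes that any existing partial destination file is a valid prefix of the source.

```python
import os
import time


def resumable_throttled_copy(src, dst, max_bytes_per_sec, chunk_size=64 * 1024):
    """Append the missing tail of `src` to `dst`, never exceeding
    `max_bytes_per_sec` on average. Returns the number of bytes copied."""
    # Resume: skip the bytes the destination already has.
    offset = os.path.getsize(dst) if os.path.exists(dst) else 0
    copied = 0
    start = time.monotonic()
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(offset)
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            # Throttle: sleep whenever we are ahead of the allowed rate.
            earliest_allowed = copied / max_bytes_per_sec
            delay = earliest_allowed - (time.monotonic() - start)
            if delay > 0:
                time.sleep(delay)
    return copied
```

If the copy is interrupted, calling the function again with the same arguments continues from the current size of the destination file.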
You can use rsync for that (http://linux.die.net/man/1/rsync). Search for the --partial option in the man page; that should do the trick. When a transfer is interrupted, the unfinished result (file or directory) is kept. It works with the rsync daemon on the remote side, and also over the ssh transport when you send from local to a remote location.
You can also use it to sync two local storage locations.
rsync --partial [-r for directories] source destination
Edit: I have confirmed that --partial works over ssh.

Uploading large files in JSF

I want to upload a file that is >16GB. How can I do this in JSF?
When using HTTP, you'll face two limitations: one on the client side (web browser) and one on the server side (web server). The average web browser (IE/FF/Chrome/etc.) has a limit of 2~4GB, depending on the make/version/platform. You cannot control this from the server side. The end user has to change the browser settings themselves (sometimes this isn't possible at all). The average web server (Tomcat/JBoss/Glassfish/etc.) in turn has a limit of 2GB. You can configure this, but that still won't and can't remove the limitation in the web browser.
Your best bet is FTP. If you want to do this from a web page, consider an applet which uses the Apache Commons Net FTPClient. There are several ready-to-use open source/commercial ones, by the way.
You still need to make sure, however, that the file system on the FTP server supports files that large. FAT32, for example, has a limit of 4GB per file. NTFS and several *nix file systems, however, can go up to 16EB.
