Why script downloads prevent parallelism in browsers

According to this link:
"If you serve your images from multiple hostnames, you can get more than two downloads to occur in parallel. (I've gotten Internet Explorer to download over 100 images in parallel.) While a script is downloading, however, the browser won’t start any other downloads, even on different hostnames."
What I wonder is why downloading a script prevents parallelism. I guess the browser executes the script right after the download finishes. So why not download scripts in parallel as well, and defer only the execution? Is it an implementation choice? Is there any other reason for it?

The browser executes the script as soon as it has downloaded it, and it stops parsing the page while the script is downloading and running (the script might change the page, for example with document.write), so it doesn't get any further on the page to see what else it needs to download. You can use something like LABjs to help with this, though: http://labjs.com/

Related

Are downloads from the Spark distribution archive often slow?

I was trying to download the Spark-Hadoop distribution from https://archive.apache.org/dist/spark/spark-3.1.2/. I often find that downloads from this site are slow. Is this due to some general issue with the site itself?
I have verified that the download is slow in two ways:
In Colab I ran the command !wget -q https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz, which often keeps running for more than 10 minutes, while at other times it finishes within 1 minute.
I also tried downloading it from the website directly, and even then the download speed is occasionally extremely slow.
It may be because:
You download multiple times.
You download from a non-browser client, for example curl/wget.
Your location is physically far from the file server, or the network is unstable.
Or something else, for example the file server itself is slow.
I think most public servers have some kind of safeguard to prevent DDoS, so that safeguard throttles download traffic per second.
I faced a similar issue: when I downloaded from a browser it took 1 minute, but it took 10 minutes when I used curl.
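To check whether the client type (rather than your location) is what matters, you could time a partial download with two different User-Agent headers. The sketch below is only an illustration, not a definitive benchmark: the 5 MB sample size and the browser-like User-Agent string are arbitrary choices, and the URL is the one from the question.

import time
import urllib.request

URL = "https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz"
SAMPLE_BYTES = 5 * 1024 * 1024  # only time the first 5 MB

def timed_fetch(user_agent):
    req = urllib.request.Request(URL, headers={"User-Agent": user_agent})
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = resp.read(SAMPLE_BYTES)
    elapsed = time.monotonic() - start
    return len(data) / elapsed / (1024 * 1024)  # MB/s

print("default client :", round(timed_fetch("Python-urllib"), 2), "MB/s")
print("browser-like UA:", round(timed_fetch("Mozilla/5.0 (X11; Linux x86_64)"), 2), "MB/s")

Running it a few times at different hours also gives you numbers to compare against the "it depends on the time of day" theory.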

Creating a customized sandbox in node.js (Can only read in a certain directory, and cannot write anywhere)

I am trying to make an application that runs submitted scripts, and I would like to sandbox the submitted scripts. The scripts need to be able to read in a certain directory (and in all of its subdirectories), but shouldn't be able to write at all, and, other than being able to read, should not be able to do anything that could not be done in a browser (i.e. download files over HTTP). How would I go about doing this?
I don't think Node has this capability built in, but you should be able to run an "unsandboxed" Node on a *nix operating system as a severely restricted user (might be possible in other OSes too, I'm not sure). You might also want to look at Node's VM module.
Eventually, I decided on using the vm node module. I basically just made a namespace that the script running in the sandbox could use, which would filter out malicious requests / requests that ought to be out of the bounds of the sandbox. The namespace included the fs methods that would be necessary, but refused to execute any of the ones that would modify a directory other than the one I wanted the script to be able to modify.

Measure speed between two specific sites

I would really like to measure the connection speed between two specific sites. Naturally, one of the sites is ours. Somehow I need to prove that it is not our internet connection that is flaky, but that the site at the other end is overloaded.
At our end I have Windows and Linux machines available for this.
I imagine I would run a script at certain times of day which, for example, tries to download an image from that site and measures the download time, then puts the download time into a database and creates a graph from the records in the database. (I know this is really simple and not sophisticated enough, but hence my question.)
I need help with the time measurement.
The perceived speed differences are big: sometimes the application works flawlessly, but sometimes we get timed-out errors.
Right now I use speedtest to check that our internet connection is OK, but this does not show that the other site is slow, and so I can't provide hard numbers to support my case.
It may be worth mentioning that the application we are trying to use at the other end is Java-based.
Here's how I would do it in Linux:
Use wget to download whatever URL you think best represents your site. Parse the output into a file (sed, awk), and use crontab to trigger the download at regular intervals.
wget www.google.com
...
2014-02-24 22:03:09 (1.26 MB/s) - 'index.html' saved [11251]
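If you'd rather log straight into a database, as described in the question, here is a minimal Python sketch of the same idea. The image URL and the SQLite file name are placeholders; run it from cron (or Task Scheduler on Windows) at the times of day you care about, and graph the samples table later.

import sqlite3
import time
import urllib.request

URL = "https://example.com/some-image.jpg"  # placeholder: any static file on the remote site
DB = "speed_log.sqlite"                     # placeholder: local results database

def measure(url):
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=120) as resp:
        size = len(resp.read())
    return size, time.monotonic() - start

conn = sqlite3.connect(DB)
conn.execute("CREATE TABLE IF NOT EXISTS samples (ts TEXT, bytes INTEGER, seconds REAL)")

try:
    size, seconds = measure(URL)
except Exception as exc:
    # timeouts and connection errors are data points too; record them as -1
    size, seconds = 0, -1.0
    print("download failed:", exc)

conn.execute("INSERT INTO samples VALUES (datetime('now'), ?, ?)", (size, seconds))
conn.commit()
conn.close()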

Sending files to a remote server in the quickest manner

I have a separate server that processes the media uploaded to my main, web-facing server. For now I upload files to it using FTP, but the problem with this is that to ensure the files are done uploading I have a timeout running, which adds a delay to the overall processing time. I can't seem to get it to wait less than 5 seconds and still reliably pick up the media, and this delay is no longer acceptable. So:
Is there a better way to implement this cleanly? I've considered sticking with FTP and sending another file after the initial upload to indicate it's done, but then there are two uploads for every upload = expensive. Another option I've considered is implementing a custom server that will just read a Content-Length header, do some authentication, and then receive the file and kick off the processing as soon as it's ready. Socket programming doesn't seem too intimidating, but I have some worries about sending binary files and different formats; is this a valid concern? Also, are there any other protocols out there I could use to do this rather than reinvent the wheel? Something like FTP but with a little verification.
I'd be glad for any pointers or tips you can share, thanks!
I suggest you use rsync. It runs over ssh, will move entire directories/hierarchies of files, and will do incremental copies; in short, everything you could possibly want.
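As a rough sketch of how that could look from the web server's side (the paths and hostname are made up, and the exact flags are a matter of taste):

import subprocess

cmd = [
    "rsync",
    "-az",                           # archive mode, compress in transit
    "--partial-dir=.rsync-partial",  # keep interrupted transfers in a hidden dir so retries can resume
    "-e", "ssh",                     # tunnel over ssh
    "/var/www/uploads/",                    # placeholder: local media directory
    "media@processor.internal:/incoming/",  # placeholder: processing server
]
subprocess.run(cmd, check=True)

A useful property for your "is it done yet?" problem: by default rsync writes each incoming file to a temporary dot-file and renames it into place only when the transfer is complete (interrupted transfers stay in the hidden partial directory), so a watcher on the destination never sees a half-uploaded file under its final name.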

Process text files FTP'd into a set of directories on a hosted server

The situation is as follows:
A series of remote workstations collect field data and upload it to a server via FTP. The data is sent as a CSV file, which is stored in a unique directory for each workstation on the FTP server.
Each workstation sends a new update every 10 minutes, causing the previous data to be overwritten. We would like to somehow concatenate or store this data automatically. The workstations' processing is limited and cannot be extended, as they are embedded systems.
One suggestion was to run a cron job on the FTP server; however, the terms of service only allow cron jobs at 30-minute intervals, since it is shared hosting. Given the number of workstations uploading and the 10-minute interval between uploads, the 30-minute limit between cron runs looks like it could be a problem.
Is there any other approach that might be suggested? The available server-side scripting languages are Perl, PHP and Python.
Upgrading to a dedicated server might be necessary, but I'd still like to get input on how to solve this problem in the most elegant manner.
Most modern Linux systems support inotify to let your process know when the contents of a directory have changed, so you don't even need to poll.
Edit: With regard to the comment below from Mark Baker:
"Be careful though, as you'll be notified as soon as the file is created, not when it's closed. So you'll need some way to make sure you don't pick up partial files."
That will happen with the inotify watch you set on the directory level - the way to make sure you then don't pick up the partial file is to set a further inotify watch on the new file and look for the IN_CLOSE event so that you know the file has been written to completely.
Once your process has seen this, you can delete the inotify watch on this new file, and process it at your leisure.
You might consider a persistent daemon that keeps polling the target directories:
grab_lockfile() or exit();    # ensure only one copy of the daemon runs
while (1) {
    if (new_files()) {        # placeholder: check the upload directories
        process_new_files();  # placeholder: concatenate / archive the CSVs
    }
    sleep(60);
}
Then your cron job can just try to start the daemon every 30 minutes. If the daemon can't grab the lockfile, it just dies, so there's no worry about multiple daemons running.
Another approach to consider would be to submit the files via HTTP POST and then process them via a CGI. This way, you guarantee that they've been dealt with properly at the time of submission.
The 30-minute limitation is pretty silly really. Starting processes in Linux is not an expensive operation, so if all you're doing is checking for new files there's no good reason not to do it more often than that. We have cron jobs that run every minute and they don't have any noticeable effect on performance. However, I realise it's not your rule, and if you're going to stick with that hosting provider you don't have a choice.
You'll need a long-running daemon of some kind. The easy way is to just poll regularly, and that's probably what I'd do. Inotify, which notifies you as soon as a file is created, is a better option.
You can use inotify from Perl with Linux::Inotify, or from Python with pyinotify.
Be careful though, as you'll be notified as soon as the file is created, not when it's closed. So you'll need some way to make sure you don't pick up partial files.
With polling it's less likely you'll see partial files, but it will happen eventually and will be a nasty hard-to-reproduce bug when it does happen, so better to deal with the problem now.
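Here is a minimal pyinotify sketch of that approach. It reacts only to IN_CLOSE_WRITE, the "closed after being written" variant of the close event, so partial uploads are never picked up; the watch directory is a made-up path.

import pyinotify

WATCH_DIR = "/home/site/ftp"  # placeholder: root of the per-workstation directories

class Handler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # event.pathname is a file that has been fully written and closed;
        # concatenate or archive it here
        print("finished upload:", event.pathname)

wm = pyinotify.WatchManager()
wm.add_watch(WATCH_DIR, pyinotify.IN_CLOSE_WRITE, rec=True, auto_add=True)
pyinotify.Notifier(wm, Handler()).loop()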
If you're looking to stay with your existing FTP server setup, then I'd advise using something like inotify or a daemonized process to watch the upload directories. If you're OK with moving to a different FTP server, you might take a look at pyftpdlib, which is a Python FTP server library.
I've been part of the pyftpdlib dev team for a while, and one of the more common requests was for a way to "process" files once they've finished uploading. Because of that we created an on_file_received() callback method that's triggered on completion of an upload (see issue #79 on our issue tracker for details).
If you're comfortable in Python, then it might work out well for you to run pyftpdlib as your FTP server and run your processing code from the callback method. Note that pyftpdlib is asynchronous and not multi-threaded, so your callback method can't block. If you need to run long-running tasks, I would recommend using a separate Python process or thread for the actual processing work.
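A minimal sketch of that setup, with a made-up user, password, home directory and port; on_file_received() is the real pyftpdlib hook, and the callback hands the work to a thread so the server's event loop isn't blocked.

import threading

from pyftpdlib.authorizers import DummyAuthorizer
from pyftpdlib.handlers import FTPHandler
from pyftpdlib.servers import FTPServer

def process(path):
    # long-running work goes here, off the server's event loop
    print("processing", path)

class WorkstationHandler(FTPHandler):
    def on_file_received(self, file):
        # called once an upload has completed; hand off instead of blocking
        threading.Thread(target=process, args=(file,), daemon=True).start()

authorizer = DummyAuthorizer()
authorizer.add_user("station", "secret", "/home/site/ftp", perm="elradfmw")  # placeholders

WorkstationHandler.authorizer = authorizer
FTPServer(("0.0.0.0", 2121), WorkstationHandler).serve_forever()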
