I am facing a problem of slow execution of an exe on the Azure platform.
Following are the steps:
1. Read data from SQL Azure Server & CSV files and display it on HTML5 pages.
2. Write data to CSV files.
3. Execute an external Fortran exe, which reads the CSV files generated in step 2.
4. The Fortran exe, after its calculations, writes its output to a .txt file.
5. Read the text file generated in step 4 and display it on HTML5 pages.
Issue:
In step 3, when we invoke the Fortran exe using the Process.Start method:
On local machines it usually takes 17~18 secs.
On the cloud server it takes 34~35 secs.
All other activities take the same time on the local machines and the cloud server.
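To narrow down where the extra time goes, it can help to time the exe invocation separately from the CSV write and the text-file read. A minimal sketch (Python used here purely for illustration; the original app presumably uses .NET's Process.Start, and the command name would be your Fortran exe):

```python
import subprocess
import time

def run_and_time(cmd):
    """Launch an external command and measure wall-clock time around it,
    separating the process's own runtime from the surrounding file I/O."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, capture_output=True)
    return time.perf_counter() - start, completed.returncode
```

Timing the step-2 CSV write, the exe itself, and the step-5 read with the same inputs on both machines should show whether the extra ~17 s is in the exe's computation or in its file access.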
Regarding step 3: What size local machine are you using (e.g. number of cores), since you're running an exe that may be doing some number-crunching? Now compare that to the machine size allocated in Windows Azure. Are you using an Extra Small (shared core) or Small (single core)? And what CPU does your local machine have? If you're not comparing like-kind configurations, you'll certainly see performance differences. The same goes for RAM (an Extra Small offers 768MB, with Small through XL offering 1.75GB per core) and bandwidth (XS has 5Mbps, Small through XL have 100Mbps per core).
The Azure systems have slower IO than a local server, which would explain the performance impact you're seeing. You are also on a shared system, so your IO may vary depending on your neighbours and the server load. If your task is IO-intensive, the best bet is to run a VM; if you need to persist the data, attach multiple disks to the VM and then use striping across the disks.
http://www.windowsazure.com/en-us/manage/windows/how-to-guides/attach-a-disk/
Striped IO disk performance stats:
http://blinditandnetworkadmin.blogspot.co.uk/2012/08/vm-io-performance-on-windows-azure.html
You will need a warm set of disks to get true performance figures.
Also, I found the temp storage on a VM (normally the D: drive) to have very good IO, so if you are going to use a VM it may be worth trying there first.
Related
I am trying to copy millions of small csv files from my storage account to a physical machine using the azcopy command, and I noticed that the speed has been very slow.
The azcopy command has the form:
azcopy copy <Storage_account_source> <local_destination> --recursive --overwrite=true
and the command is run from the physical machine.
Is there a way to make azcopy download multiple blobs concurrently, instead of checking the blobs one by one? I believe that's why the speed drops to as low as 1 MB/s: it's doing checks on these really small blobs one at a time. Or is there another way to increase the speed of this kind of blob transfer?
azcopy is highly optimized for throughput using parallel processing etc. I haven't come across any tool that provides a faster overall download speed. In my experience the main limiting factors are (obviously) network bandwidth, but also CPU power; it uses a lot of compute resources. So can you increase those two on your machine, at least for the duration of the download?
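If you're on AzCopy v10, the number of parallel requests can also be tuned with the AZCOPY_CONCURRENCY_VALUE environment variable before launching the copy. A minimal sketch (Python used here for illustration; the value 512 and the source/destination placeholders are assumptions to experiment with for many small blobs):

```python
import os
import subprocess

def build_azcopy_env(concurrency=512):
    """Return a copy of the environment with AzCopy's request
    concurrency raised via AZCOPY_CONCURRENCY_VALUE."""
    env = os.environ.copy()
    env["AZCOPY_CONCURRENCY_VALUE"] = str(concurrency)
    return env

# Hypothetical invocation (placeholders, not real paths):
# subprocess.run(
#     ["azcopy", "copy", "<Storage_account_source>", ".", "--recursive"],
#     env=build_azcopy_env(512),
# )
```

Whether raising concurrency helps depends on local CPU and bandwidth, per the answer above, so it's worth benchmarking a few values.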
I am looking for some suggestions or resources on how to size servers for a Spark cluster. We have enterprise requirements that force us to use on-prem servers only, so I can't try the task on a public cloud (and even if I used fake data for a PoC, I would still need to buy the physical hardware later). The org also doesn't have a shared distributed compute environment that I could use, and I wasn't able to get good internal guidance on what to buy. I'd like to have some idea of what we need before I talk to a vendor who would try to up-sell me.
Our workload
We currently have a data preparation task that is very parallel. We implement it in python/pandas/sklearn + the multiprocessing package on a set of servers with 40 Skylake cores/80 threads and ~500GB RAM. We're able to complete the task in about 5 days by manually running it over 3 servers (each one working on a separate part of the dataset). The tasks are CPU-bound (100% utilization on all threads) and memory usage is usually low-ish (in the 100-200GB range). Everything is scalable to a few thousand parallel processes, and some subtasks are even more parallelizable. A single chunk of data is in the 10-60GB range (different keys can have very different sizes, and a single chunk of data has multiple things that can be done to it in parallel). All of this parallelism is currently very manual and clearly should be done using a real distributed approach. Ideally we would like to complete this task in under 12 hours.
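As a rough illustration of the current setup, the manual parallelism can be expressed as a per-machine worker pool; `process_chunk` below is only a placeholder for the actual pandas/sklearn work:

```python
from multiprocessing import Pool

def process_chunk(chunk_id):
    # Placeholder for the real per-chunk pandas/sklearn pipeline:
    # load the chunk, transform it, write the result back out.
    return chunk_id, "done"

def run_all(chunk_ids, workers=80):
    # Chunks are independent, so a simple process pool maps cleanly;
    # one worker per hardware thread is a common starting point.
    with Pool(processes=workers) as pool:
        return dict(pool.map(process_chunk, chunk_ids))
```

A cluster scheduler (Spark, Dask, etc.) essentially generalizes this pool across machines, which is what the question is sizing for.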
Potential of using existing servers
The servers we use for this processing workload are often used on individual basis. They each have dual V100 and do (single node, multigpu) GPU accelerated training for a big portion of their workload. They are operated bare metal/no vm. We don't want to lose this ability to use the servers on individual basis.
Looking at typical Spark requirements, they also have the issues of (1) only 1Gb ethernet connections/switches between them and (2) SSDs configured into a giant 11TB RAID 10; we probably don't want to change what the file system looks like when the servers are used on an individual basis.
Is there a software solution that could transform our servers into a cluster and back on demand or do we need to reformat everything into some underlying hadoop cluster (or something else)?
Potential of buying new servers
With the target of completing the workload in 12 hours, how do we go about selecting the correct number of nodes/node size?
For compute nodes
How do we choose number of nodes
CPU/RAM/storage?
Networking between nodes (our DC provides 1Gb switches, but we can buy custom)?
Other considerations?
For storage nodes
Are they the same as compute nodes?
If not, how do we choose what is appropriate (our raw dataset is actually small, <1TB)?
We extensively utilize a NAS as a shared storage between the servers, are there special consideration on how this needs to work with a cluster?
I'd also like to understand how I can scale up/down these numbers while still being able to viably complete the parallel processing workload. This way I can get a range of quotes => generate a budget proposal for 2021 => buy servers ~Q1.
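As a starting point for the node count, a linear core-hours estimate can be sketched. This ignores scheduling overhead, shuffle, and memory pressure, so treat the result as a lower bound rather than a quote-ready number:

```python
import math

def required_nodes(current_nodes, cores_per_node, current_hours, target_hours):
    """Estimate node count for a target runtime, assuming the workload
    scales linearly with cores (a lower bound; real clusters add overhead)."""
    core_hours = current_nodes * cores_per_node * current_hours
    cores_needed = core_hours / target_hours
    return math.ceil(cores_needed / cores_per_node)

# 3 servers x 40 cores running ~5 days (120 h), target 12 h:
# required_nodes(3, 40, 120, 12) -> 30 nodes of the same spec
```

Running the same formula with smaller or larger per-node core counts gives the range of quotes mentioned above.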
I want to download/upload files to Azure in parallel.
AzCopy, by default, does not allow multiple simultaneous runs of the same copy because of locks on its journal files. I am running multiple AzCopy instances on the same machine by pointing each of these instances to a different journal file (using /Z).
But what is the bottleneck in doing this? Bandwidth is obvious, but what is the bottleneck on Azure's side?
There is no real bottleneck on Azure's side. Keep in mind, though, that the transfers are done with spare bandwidth and that there is no SLA as to whether they'll be fast or slow. That's all.
The other bottleneck you may need to check is your local CPU: when running over 4 AzCopy instances in parallel with 4 parallel uploads each, my i7 starts to sweat a bit.
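One way to script several instances with separate journals is to build one command per job, each with its own /Z folder. A sketch (Python for illustration; the /Source, /Dest, and journal paths are placeholders, and /Z and /Y follow the old v8-style AzCopy syntax the question refers to):

```python
def azcopy_commands(jobs, journal_root="/tmp/azcopy-journals"):
    """Build one v8-style AzCopy command per (source, dest) job, each
    pointed at its own journal folder via /Z so instances don't lock
    each other out; /Y suppresses confirmation prompts."""
    cmds = []
    for i, (src, dst) in enumerate(jobs):
        journal = f"{journal_root}/job{i}"
        cmds.append(
            ["AzCopy", f"/Source:{src}", f"/Dest:{dst}", f"/Z:{journal}", "/Y"]
        )
    return cmds

# Each command could then be started concurrently, e.g.:
# procs = [subprocess.Popen(c) for c in azcopy_commands(jobs)]
# for p in procs: p.wait()
```

Keeping the instance count below the point where the local CPU saturates (per the answer above) is the practical limit.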
I'm trying to set up uploading to Google Cloud Storage, and typically I will have about 200 concurrent uploads of files that are 5 to 10KB in size. When I'm using the same code with a local ceph S3-compatible storage, upload time is barely more than 2-3ms (which is expected), and when uploading to Google's S3-like storage, with 3 to 5 threads the upload time is usually within 200ms per file. However, as soon as I reach decent concurrency, I get linear increases in the upload times.
First 10 files are uploaded within 200ms, next 10 within 5s, next 10 within 10s and so on till it get to a 60s.
If I use multiple processes, the result is the same. I'm using nodejs to perform the uploads with the https://github.com/Automattic/knox module; the pool is turned off, so it's not an issue of sockets being queued up. I've tested enabling the pool with maxSockets set to 500 or so; it doesn't help much. When checking with sockstat, I only have up to 40 connections opened to Google's servers concurrently, even though I initiate more than 500 to 1000 uploads at the same time using 16 processes. This is extremely weird.
Can anybody help me to diagnose the problem? Is there a limit of connections that google would allow to be opened from a single ip address?
I'm sure it's not a problem with my code, because beforehand I was using it with a local S3 storage (by "local" I mean a cluster of 20 machines with disks in the same data center). If there were a problem with blocking operations, a lack of sockets, or anything similar, I would have seen the same increase in upload times with ceph, but there is no such thing. The reason I'm trying to migrate to Google is that managing dying hard drives is pretty annoying, and that happens often.
According to Google's quotas page,
Sockets
Daily Data and Per-Minute (Burst) Data Limits
Applications using sockets are rate limited on a per minute and a per day basis. Per minute limits are set to handle burst behavior from applications.
The page also shows the limits. You might be running into them, or it could be a limitation of the hardware Google has your app running on.
Maybe it's a disk throughput problem; check your server's iostat first. Maybe your disk is too slow to handle the file traffic, or you're stuck because of open socket limits or file descriptor limits.
If that's the case, some of these problems can be fixed via OS fine-tuning. If it's a disk latency problem, you can switch from HDD to SSD or increase your cluster size.
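To rule out the file-descriptor side quickly, the process limits can be read directly. A small sketch (uses the Unix-only `resource` module, so this assumes a Linux/BSD host):

```python
import resource

def fd_limits():
    """Return the (soft, hard) open-file-descriptor limits for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

soft, hard = fd_limits()
# If `soft` is low (often 1024), a few hundred concurrent uploads can
# stall on socket creation; raising it typically means `ulimit -n` or
# an entry in /etc/security/limits.conf.
```

If the soft limit is already well above the upload concurrency, the bottleneck is more likely the provider-side rate limits quoted above.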
I need to concurrently process a large amount of files (thousands of different files, with avg. size of 2MB per file).
All the information is stored on one (1.5TB) network hard drive, and will be processed by about 30 different machines. For efficiency, each machine will be reading (and processing) different files (there are thousands of files that need to be processed).
Every machine -- after reading a file from the 'incoming' folder on the 1.5TB hard drive -- will process the information and output the processed result back to the 'processed' folder on the 1.5TB drive. The processed information for every file is of roughly the same average size as the input files (about ~2MB per file).
What is the better thing to do:
(1) For every processing machine M, Copy all files that will be processed by M into its local hard drive, and then read & process the files locally on machine M.
(2) Instead of copying the files to every machine, every machine will access the 'incoming' folder directly (using NFS), and will read the files from there, and then process them locally.
Which idea is better? Are there any 'do' and 'donts' when one is doing such a thing?
I am mostly curious if it is a problem to have 30 machines or so read (or write) information to the same network drive, at the same time?
(note: existing files will only be read, not appended/written; new files will be created from scratch, so there are no issues of multiple access to the same file...). Are there any bottlenecks that I should expect?
(I am using Linux, Ubuntu 10.04 LTS, on all machines, if it matters at all.)
I would definitely do #2 - and I would do it as follows:
Run Apache on your main server with all the files (or some other HTTP server, if you really want). There are several reasons I'd do it this way:
HTTP is basically pure TCP (with some headers on it). Once the request is sent, it's a very "one-way" protocol: low overhead, not chatty, high performance and efficiency.
If you (for whatever reason) decided you needed to move or scale it out (using a cloud service, for example), HTTP would be a much better way to move the data around over the open Internet than NFS. You could use SSL (if needed). You could get through firewalls (if needed). Etc.
Depending on the access pattern of your files, and assuming the whole file needs to be read, it's easier/faster just to do one network operation and pull the whole file in one whack, rather than constantly requesting I/Os over the network every time you read a smaller piece of the file.
It would be easy to distribute and run an application that does all this, one that doesn't rely on the existence of network mounts or specific file paths. If you have the URL to the files, the client can do its job. It doesn't need established mounts or hard-coded directories, and it doesn't need to become root to set up such mounts.
If you have NFS connectivity problems, the whole system can get wonky when you try to access the mounts and they hang. With HTTP running in a user-space context, you just get a timeout error, and your application can take whatever action it chooses (page you, log errors, etc.).
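The whole-file-in-one-request pattern from option #2 can be sketched in a few lines (Python for illustration; the server URL layout is an assumption):

```python
import urllib.request

def fetch_whole_file(url, timeout=30):
    """Pull the entire file in a single HTTP request, instead of issuing
    many small reads over an NFS mount. On timeout or connection failure
    this raises, so the caller can log/retry rather than hang."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()

# Hypothetical worker loop:
# data = fetch_whole_file("http://fileserver/incoming/chunk0001.dat")
# result = process(data)  # the ~2MB-per-file local processing step
# ... then POST/PUT `result` back to the 'processed' endpoint ...
```

Because each worker fetches over plain HTTP, a hung server surfaces as a catchable timeout rather than a wedged mount, which is exactly the failure-mode advantage described above.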