I'm trying to get a handle on Power Query's parallel-processing characteristics, but I haven't found any documentation.
Queries normally seem to run in the "Microsoft.Mashup.Container.NetFX40.exe" process that appears in Task Manager's process list. I've (unwittingly) built some queries that appear to run multi-threaded, as multiple processes with that same name run in parallel for a while.
However, I don't understand what caused one large and complex query I created to run almost entirely within the EXCEL.EXE process instead, using a single CPU core, when most of the query is about accessing a list of server sources, processing each independently, and only combining them at the end.
Is any reading available?
I am working on a program where I am required to download a large amount of JSON files from different URLs.
Currently, my program creates multiple threads, and in each thread it calls the libcurl curl_easy_perform() function, but the program occasionally fails with a "double free" error. It seems to be some sort of Heisenbug, but I have been able to catch it in GDB, and the backtrace confirms the error originates inside libcurl.
While I would love suggestions on the issue I am having, my actual question is this: would it be better to change the structure of my code to use the libcurl multi interface on one thread, instead of calling the easy interface across multiple threads? What are the trade-offs of using one over the other?
Note: by "better", I mean: is it faster and less taxing on my CPU? Is it more reliable, given that the multi interface was designed for this?
EDIT:
The three options I have as I understand it are these:
1) Reuse the same easy_handle in a single thread. The connections won't need to be re-established, making it faster.
2) Call curl_easy_perform() in each individual thread. They all run in parallel, again making it faster.
3) Call curl_multi_perform() in a single thread. This is non-blocking, so I imagine all of the files are downloaded in parallel, making it faster?
Which of these options is the most time efficient?
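To make the trade-off concrete, here is a small sketch — in Python rather than C, with the transfers simulated by sleeps — of why option 2 (one blocking transfer per thread) beats option 1 (sequential transfers in one thread) for I/O-bound work. The URLs and the fetch function are placeholders, not real libcurl calls:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Stand-in for one blocking transfer (like curl_easy_perform)."""
    time.sleep(0.2)          # simulate network latency
    return f"payload-from-{url}"

urls = [f"https://example.com/{i}.json" for i in range(8)]

# Option 1: one thread, sequential transfers. Handle reuse saves
# reconnects, but total time is roughly the SUM of the latencies.
start = time.monotonic()
sequential = [fetch(u) for u in urls]
t_seq = time.monotonic() - start

# Option 2: one blocking transfer per worker thread. The waits
# overlap, so total time is roughly the SLOWEST single transfer.
start = time.monotonic()
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(fetch, urls))
t_par = time.monotonic() - start

print(t_seq > t_par)  # overlapping the waits wins for I/O-bound work
```

Because the work is waiting, not computing, the thread count matters far more than CPU speed here.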
curl_easy_perform() is a blocking operation. That means if you run it in one thread, you have to download the files sequentially. In a multithreaded application you can run many operations in parallel, which usually means a faster download time (if the speed is not limited by the network or the destination server).
But there is a non-blocking variant that may work better for you if you want to go the single-threaded way: curl_multi_perform().
From the curl documentation:
You can do any amount of calls to curl_easy_perform while using the same easy_handle. If you intend to transfer more than one file, you are even encouraged to do so. libcurl will then attempt to re-use the same connection for the following transfers, thus making the operations faster, less CPU intense and using less network resources. Just note that you will have to use curl_easy_setopt between the invokes to set options for the following curl_easy_perform.
In short, it will give you a few of the benefits you want compared with naive repeated curl_easy_perform() calls.
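curl_multi_perform drives many transfers from a single thread by multiplexing non-blocking sockets. A rough single-threaded analogue of that idea, sketched in Python's asyncio with simulated transfers (no real libcurl involved):

```python
import asyncio
import time

async def transfer(url):
    """Stand-in for one non-blocking transfer driven by the multi interface."""
    await asyncio.sleep(0.2)   # simulate waiting on the socket
    return f"payload-from-{url}"

async def main():
    urls = [f"https://example.com/{i}.json" for i in range(8)]
    # One thread, all transfers in flight at once: the event loop services
    # whichever transfer has data ready, much like curl_multi_perform.
    return await asyncio.gather(*(transfer(u) for u in urls))

start = time.monotonic()
results = asyncio.run(main())
elapsed = time.monotonic() - start
# Eight 0.2 s waits overlap, so the total is ~0.2 s rather than ~1.6 s.
```

The point is that "parallel" here comes from overlapping waits in one thread, with no locking and no per-thread stacks — which is also why the multi interface sidesteps the thread-safety issues you are hitting.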
I am using Node.js for a CPU-intensive task, which basically generates a large amount of data and stores it in a file. For a single type of data, I am streaming the data to an output file as it is generated.
Aim: I want to generate this data for multiple types of data in parallel (utilizing my multi-core CPU to its best), without each process having its own heap memory, thus providing a larger shared process memory and an increased speed of execution.
I was planning to use node-fibers, which is also used by Meteor JS for its own callback handling. But I am not sure this will achieve what I want: in one of the videos on Meteor and fibers, Chris Mather mentions at the end that eventually everything is single-threaded, and that node-fibers somehow manages the same single-threaded event loop to provide its functionality.
So:
1) Does this mean that if I use node-fibers I won't be running my task in parallel, and thus won't be utilizing my CPU cores?
2) Will node webworker-threads help me achieve the functionality I desire? The module's home page says that webworker threads run as separate/parallel CPU processes, thus providing multi-threading in the real sense.
3) As a closing question: does this mean that Node.js is not advisable for such CPU-intensive tasks?
Note: I don't want to use asynchronous code-structuring libs which are presented as threads but in fact just add syntactic sugar over the same async code, as the tasks are largely CPU-intensive. I have already used the async capabilities to the max.
// Update 1 (based on answer for clusters )
Sorry, I forgot to mention this, but the problem I faced with clusters is:
It is complex to load-balance the amount of work I have in a way that makes sure a particular set of parallel tasks executes before certain other tasks.
I'm not sure clusters really do what I want, referring to these lines on the webworker-threads npm home page:
The "can't block the event loop" problem is inherent to Node's evented model. No matter how many Node processes you have running as a Node-cluster, it won't solve its issues with CPU-bound tasks.
Any light on how to handle this would be helpful.
Rather than trying to implement multiple threads, you should find it much easier to use multiple processes with Node.js.
See, for example, the cluster module. This allows you to easily run the same js code in multiple processes, e.g. one per core, and collect their results / be notified once they're completed.
If cluster does more than you need, then you can also just call fork directly.
If you must have thread-parallelism rather than process-parallelism, then you may want to look at writing an async native module. Then you have access to the libuv thread pool (though starving it may reduce I/O performance), or you can fork your own threads as you wish (but then you're on your own for synchronising with the rest of Node).
After update 1
For load balancing, if what cluster does isn't working for you, then you can just do it yourself with fork, as I mentioned. The source for cluster is available.
For the other point, it means if the task is truly CPU-bound then there's no advantage Node will give you over other technologies, other than being simpler if everything else is using Node. The only option you have is to make sure you're using all the available CPU resources, which a worker pool will give you. If you're already using Node then the easiest options are using the ones it's already got (cluster or libuv). If they're not sufficient then yeah, you'll have to find something else.
Regardless of technology, it remains true that multi-process parallelism is a lot easier than multi-thread parallelism.
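The process-per-core pattern described above has the same shape in any runtime. A minimal sketch — here using Python's subprocess for brevity rather than Node's cluster/fork, with a placeholder worker computation — launching one OS process per chunk of CPU-bound work and collecting the children's results:

```python
import subprocess
import sys

# The worker's job, passed as an inline script; a stand-in for the
# real CPU-bound data-generation code.
worker = "import sys; n = int(sys.argv[1]); print(sum(i * i for i in range(n)))"

chunks = [10_000, 20_000, 30_000, 40_000]

# Launch one OS process per chunk (what Node's fork/cluster does);
# all children run concurrently on separate cores.
procs = [
    subprocess.Popen([sys.executable, "-c", worker, str(n)],
                     stdout=subprocess.PIPE, text=True)
    for n in chunks
]

# Collect each child's stdout — the "be notified once they're
# completed" step that cluster gives you via messages.
results = [int(p.communicate()[0]) for p in procs]
```

Each child has its own heap, which is exactly the trade the question wants to avoid — but for CPU-bound work it is the trade that actually buys core-level parallelism.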
Note: despite what you say, you definitely do want to use async code precisely because your work is CPU-intensive; otherwise your tasks will block all I/O. You do not want this to happen.
I have a MATLAB processing script located in the middle of a long processing pipeline running on Linux.
The MATLAB script applies the same operation to a number N of datasets D_i (i = 1, 2, ..., N) in parallel (on 8 cores) via parfor.
Usually, processing the whole dataset takes about 2 hours (on 8 cores).
Unfortunately, from time to time, it looks like one of the MATLAB subprocesses crashes randomly. This makes the job impossible to complete (and the pipeline can't finish).
I am sure this does not depend on the data: if I reprocess specifically the D_i on which the process crashed, it is executed without problems. Moreover, up to now I've already processed thousands of the mentioned datasets.
How I deal with the problem now (...manually...):
After I start the MATLAB job, I periodically check the process list on the machine (via a simple top); whenever one MATLAB process is still alive after two hours of work, I know for sure that it has crashed. Then I simply kill it and process the part of the dataset which has not been analyzed.
Question:
I am looking for suggestions on how to time out ALL the running MATLAB processes and kill them whenever they have been alive for more than, e.g., 2 hours of CPU time.
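One Linux-side approach that doesn't depend on MATLAB at all: launch each worker under a hard CPU-time limit, so the kernel terminates it with SIGXCPU once the budget is spent. A hedged sketch using Python's stdlib — the busy-loop child is a stand-in for the real matlab invocation, and the 1-second cap used for the demo would be 2 * 60 * 60 in the actual pipeline:

```python
import resource
import subprocess
import sys

def run_with_cpu_limit(cmd, cpu_seconds):
    """Run cmd with a hard CPU-time cap (Linux/POSIX only)."""
    def limit():
        # Runs in the child just before exec: once cpu_seconds of CPU
        # time is consumed, the kernel delivers SIGXCPU and the process dies.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
    proc = subprocess.Popen(cmd, preexec_fn=limit)
    proc.wait()
    return proc.returncode

# Demo: a busy loop standing in for a hung MATLAB worker, capped at 1 s.
rc = run_with_cpu_limit([sys.executable, "-c", "while True: pass"], 1)
print(rc < 0)   # negative return code = killed by a signal, not a clean exit
```

Because the limit is on CPU time rather than wall-clock time, a healthy worker that is merely waiting on I/O is never killed — only one that is actually spinning.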
You should be able to do this by restructuring your code to use PARFEVAL instead of PARFOR. There's a simple example in this entry on Loren's blog: http://blogs.mathworks.com/loren/2013/12/09/getting-data-from-a-web-api-in-parallel/ which shows how you can stop waiting for work after a given amount of time.
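The parfeval idea — get back futures and stop waiting after a deadline, instead of blocking until every worker returns as parfor does — can be sketched outside MATLAB too. Here is the same shape in Python's concurrent.futures, with one deliberately slow task standing in for the hung worker:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def process(i):
    """Stand-in for processing one dataset; dataset 3 plays the hung worker."""
    time.sleep(5 if i == 3 else 0.1)
    return i

pool = ThreadPoolExecutor(max_workers=4)
futures = [pool.submit(process, i) for i in range(6)]

# Like parfeval: wait on the futures with a deadline instead of
# blocking until every last worker returns.
done, not_done = wait(futures, timeout=1.0)
results = sorted(f.result() for f in done)

# Abandon the straggler(s); queued-but-unstarted work is cancelled too.
pool.shutdown(wait=False, cancel_futures=True)
```

The finished datasets' results survive; only the stuck one is given up on, which is exactly what the manual kill-and-reprocess routine achieves today.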
I've got a service that runs scans of various servers. The networks in question can be huge (hundreds of thousands of network nodes).
The current version of the software uses a queueing/threading architecture we designed ourselves, which works but isn't as efficient as it could be (not least because jobs can spawn children, which isn't handled well).
V2 is coming up and I'm considering using the TPL. It seems like it should be ideally suited.
I've seen this question, the answer to which implies there's no limit to the number of tasks the TPL can handle. In my simple tests (spin up 100,000 tasks and hand them to the TPL), it barfed fairly early on with an out-of-memory exception (fair enough, especially on my dev box).
The Scans take a variable length of time but 5 mins/task is a good average.
As you can imagine, scans for huge networks can take a considerable length of time, even on beefy servers.
I've already got a framework in place which allows the scan jobs (stored in a Db) to be split between multiple scan servers, but the question is how exactly I should pass work to the TPL on a specific server.
Can I monitor the size of TPL's queue and (say) top it up if it falls below a couple of hundred entries? Is there a downside to doing this?
I also need to handle the situation where a scan needs to be paused. This seems easier to do by not giving the work to the TPL than by cancelling/resetting tasks which may already be partially processed.
All of the initial tasks can be run in any order. Children must be run after the parent has started executing but since the parent spawns them, this shouldn't ever be a problem. Children can be run in any order. Because of this, I'm currently envisioning that child tasks be written back to the Db not spawned directly into TPL. This would allow other servers to "work steal" if required.
Has anyone had any experience with using the TPL in this way? Are there any considerations I need to be aware of?
TPL is about starting small units of work and running them in parallel. It is not about monitoring, pausing, or throttling this work.
You should see TPL as a low-level tool to start "work" and to synchronize threads.
Key point: TPL tasks != logical tasks. Logical tasks are in your case scan-tasks ("scan an ip-range from x to y"). Such a task should not correspond to a physical task "System.Threading.Task" because the two are different concepts.
You need to schedule, orchestrate, monitor and pause the logical tasks yourself because TPL does not understand them and cannot be made to.
Now the more practical concerns:
TPL can certainly start 100k tasks without OOM. The OOM happened because your tasks' code exhausted memory.
Scanning networks sounds like a great case for asynchronous code because while you are scanning you are likely to wait on results while having a great degree of parallelism. You probably don't want to have 500 threads in your process all waiting for a network packet to arrive. Asynchronous tasks fit well with the TPL because every task you run becomes purely CPU-bound and small. That is the sweet spot for TPL.
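One way to "top up" the pool without handing it 100,000 physical tasks at once: keep the logical scan jobs in your own queue and use a semaphore so only a bounded window is ever outstanding. A sketch in Python rather than C#; the window size and the scan body are placeholders:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

WINDOW = 4                        # at most 4 scan jobs outstanding at once
in_flight = threading.Semaphore(WINDOW)
lock = threading.Lock()
running = 0
peak = 0

def scan(job):
    """Stand-in for one scan job; records the observed concurrency."""
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.005)             # placeholder for the real scan work
    with lock:
        running -= 1
    in_flight.release()           # free a slot so the feeder can top up

jobs = list(range(100))           # the logical task queue (e.g. rows in the Db)
with ThreadPoolExecutor(max_workers=8) as pool:
    for job in jobs:
        in_flight.acquire()       # blocks once WINDOW jobs are outstanding
        pool.submit(scan, job)
```

The feeder thread blocks on the semaphore instead of polling a queue length, so the pool never holds more than the window's worth of memory — and pausing a scan is just a matter of the feeder stopping its loop.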
There are three types of control-flow model:
single-threaded, virtual process, and multithreaded process.
Here's what is written in the PowerPoint slides I am studying from:
Virtual processes. This is based on a single-threaded model but gives the appearance of concurrent execution. A controller component schedules the execution of the other components and gives them control. The scheduling can be performed periodically or based on events. This model is based on a logical decomposition of activities into simple steps whose execution requires only short intervals of time.
I couldn't understand it, and I couldn't understand the difference between a multithreaded process and a virtual process.
Can someone help?
EDIT: here is the chapter of the book from which I took the section above:
http://www.mediafire.com/?ru82i0nvp12qw6t
This term "virtual process" is unusual, but based on your description I can give a real-world example of each.
For multithreading, imagine you have a lot of data in memory and want to perform some calculations on it. You can split that data up and have separate threads (one per CPU core, ideally) simultaneously working on different chunks of the data. This way, the calculations will be done faster in proportion to how many threads you create.
For a 'virtual process', imagine you need to retrieve 20 files from remote servers. Most of the CPU 'work' involved is just sitting around waiting for bytes to arrive over the network. Creating separate threads to download each of these files would not make the files arrive any faster. If anything, it adds overhead, because the OS needs to constantly switch between the extra threads (and it will switch a LOT, since most of the time each thread will just say 'I'm still waiting' and then cede control). So, in this case it's better to have a single thread doing all of the downloading, cycling internally between each of the download tasks to read incoming data off of their buffers.
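The "controller component schedules the other components" model from the slide can be shown with a toy: each component does one short step and hands control back, and a single-threaded controller rotates between them round-robin. A minimal illustration (not any particular framework):

```python
def component(name, steps):
    """One 'virtual process': does a short step, then yields control back."""
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield                      # give control back to the controller

trace = []

# The controller: a single thread that schedules the components
# round-robin, giving the *appearance* of concurrent execution.
ready = [component("A", 2), component("B", 2)]
while ready:
    task = ready.pop(0)
    try:
        next(task)                 # run one short step of this component
        ready.append(task)         # still has work: back of the queue
    except StopIteration:
        pass                       # component finished; drop it
print(trace)                       # ['A0', 'B0', 'A1', 'B1']
```

The interleaved trace is the key point: the steps of A and B alternate even though only one thread ever exists — which is exactly how this differs from a multithreaded process, where the OS does the switching preemptively.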
Your virtual process looks to me like event-driven programming. Google for, e.g., 'threads vs events'; the first link you get is quite a fine comparison.
EDIT: Here's another comparison I found in my bookmarks.