Thread pool with Apps Script on Spreadsheet - multithreading

I have a Google Spreadsheet with internal AppsScript code which process each row of the sheet and perform an urlfetch with the row data. The url will provide a value which will be added to the values returned by each row processing..
For now the code is processing 1 row at a time with a simple for:
var spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
var sheet = spreadsheet.getActiveSheet();
var range = sheet.getDataRange();
for(var i=1 ; i<range.getValues().length ; i++) {
var payload = {
// retrieve data from the row and make payload object
};
var options = {
"method":"POST",
"payload" : payload
};
var result = UrlFetchApp.fetch("http://.......", options);
var text = result.getContentText();
// Save result for final processing
// (with multi-thread function this value will be the return of the function)
}
Please note that this is only a simple example, in the real case the working function will be more complex (like 5-6 http calls, where the output of some of them are used as input to the next one, ...).
For the example let's say that there is a generic "function" which executes some sort of processing and provides a result as output.
In order to speed up the process, I'd like to try to implement some sort of "multi-thread" processing, so I can process multiple rows in the same time.
I already know that javascript does not offer a multi-thread handling, but I read about WebWorker which seems to create an async processing of a function.
My goal is to obtain some sort of ThreadPool (like 5 threads at a time) and send every row that need to be processed to the pool, obtaining as output the result of each function.
When all the rows finished the processing, a final action will be performed gathering all the results of each function.
So the capabilities I'm looking for are:
managed "ThreadPool" where I can submit an N amount of tasks to be performed
possibility to obtain a resulting value from each task processed by the pool
possibility to determine that all the tasks has been processed, so a final "event" can be executed
I already see that there are some ready-to-use libraries like:
https://www.hamsters.io/wiki#thread-pool
http://threadsjs.readthedocs.io/en/latest/
https://github.com/andywer/threadpool-js
but they work with NodeJS. Due to AppsScript nature, I need a more simplier approach, which is provided by native JS. Also, it seems that minified JS are not accepted by AppsScript editor, so I also need the "expanded" version.
Do you know a simple ThreadPool in JS where I can submit a function to be execute and I get back a Promise for the result?

Related

How to limit .once('value) in firebase-admin node.js

How do I limit .once('value') in firebase-admin?
Code:
async function GetStuff(limit, page){
const data = await ref.limitToFirst(parseInt(limit)).once('value')
return data.val();
}
I wanted to create a page system, where a it sends request for a limited amount of data, and the user can change the page to get different data, but for some reason, I can't get it to work.
The code above only gets the first 20(when limit is 20), but how can I make it start at 20, so I can make this page feature.
I thought:
Code:
async function GetStuff(limit, page){
const data = await ref.startAt(limit*page).limitToFirst(parseInt(limit)).once('value')
return data.val();
}
You might want to review the relevant documentation. It looks like you're trying to pass the offset of a child to startAt, but that's not how startAt works. It accepts the actual value of the child to start at. Pagination by offset index is not supported.
The way you use startAt is typically passing the last sorted value retrieved by the prior query (or, if you don't want to retrieve that value again, 1 + that value, or a string that is lexically greater than the last string received. As such, some data sets might actually be difficult to paginate if they have the same sorted value repeated many times.

How to save data using multiple threads in grails-2.4.4 application using thread pool

I have a multithreaded program running some logic to come up with rows of data that I need to save in my grails (2.4.4) application. I am using a fixedthreadpool with 30 threads. The skeleton of my program is below. My expectation is that each thread calculates all the attributes and saves on a row in the table. However, the end result I am seeing is that there are some random rows that are not saved. Upon repeating this exercise, it is seen that a different set of rows are not saved in the table. So, overall, each time this is attempted a certain set of rows are NOT saved in table at all. GORMInstance.errors did not reveal any errors. So, I have no clue what is incorrect in this program.
ExecutorService exeSvc = Executors.newFixedThreadPool(30)
for (obj in list){
exeSvc.execute({-> finRunnable obj} as Callable)
}
Also, here's the runnable program that the above snippet invokes.
def finRunnable = {obj ->
for (item in LIST-1){
for (it in LIST-2){
for (i in LIST-3){
rowdata = calculateValues(item, it, i);
GORMInstance instance = new GORMInstance();
instance.withTransaction{
instance.attribute1=rowdata[0];
instance.attribute2=rowdata[1];
......so on..
instance.save(flush:true)/*without flush:true, I am
running into HeuristicCompletion exception. So I need it
here. */
}//endTransaction
}//forloop 3
}//forloop 2
}//forloop 1
}//runnable closure

What is the most efficient way to track browser memory consumed by the execution of a Protractor test?

The idea is to:
Measure usedJSHeapSize before starting the test.
Measure usedJSHeapSize after completing the test.
Comparing values from 1 and 2 and if the size increases above a defined threshold, then fail the scenario.
So far I have tried:
SG Protractor Tools (https://github.com/SunGard-Labs/sg-protractor-tools) which allow to repeat the same scenario several times and find the memory growth. I have discarded it since it does not allow checking memory usage for a single scenario (https://github.com/SunGard-Labs/sg-protractor-tools/issues/3).
Extracting the memory values from the browser object, which does not seem to work (or I could not get to work) to integrate with the specs -> Assign a value returned from a promise to a global variable
Any other ideas?
This can be done by invoking browser.executeScript()
Use window.performance.memory for Chrome to fetch the performance parameters
The below code worked all good for me.
https://docs.webplatform.org/wiki/apis/timing/properties/memory
it('Dummy Test', function(){
//Fetch the browser memory parameters before execution
browser.executeScript('return window.performance.memory').then(function(memoryInfo){
console.log(memoryInfo)
var beforejsHeapSizeLimit = memoryInfo.jsHeapSizeLimit;
var beforeusedJSHeapSize = memoryInfo.usedJSHeapSize;
var beforetotalJSHeapSize = memoryInfo.totalJSHeapSize;
// Have all your code to open browser .. navigate pages etc
browser.driver.get("https://wordpress.com/");
browser.driver.get("http://www.adobe.com/software/flash/about/");
// Once you are done compare before and after values
//Fetch the browser memory parameters after execution and compare
browser.executeScript('return window.performance.memory').then(function(aftermemoryInfo) {
console.log(aftermemoryInfo)
var afterjsHeapSizeLimit = aftermemoryInfo.jsHeapSizeLimit;
var afterusedJSHeapSize = aftermemoryInfo.usedJSHeapSize;
var aftertotalJSHeapSize = aftermemoryInfo.totalJSHeapSize;
expect((parseInt(afterusedJSHeapSize)-parseInt(beforeusedJSHeapSize))<10000000).toBe.true;
});
});
});

CouchDB - Filtered Replication - Can the speed be improved?

I have a single database (300MB & 42,924 documents) consisting of about 20 different kinds of documents from about 200 users. The documents range in size from a few bytes to many KiloBytes (150KB or so).
When the server is unloaded, the following replication filter function takes about 2.5 minutes to complete.
When the server is loaded, it takes >10 minutes.
Can anyone comment on whether these times are expected, and if not, suggest how I might optimize things in order to
get better performance?
function(doc, req) {
acceptedDate = true;
if(doc.date) {
var docDate = new Date();
var dateKey = doc.date;
docDate.setFullYear(dateKey[0], dateKey[1], dateKey[2]);
var reqYear = req.query.year;
var reqMonth = req.query.month;
var reqDay = req.query.day;
var reqDate = new Date();
reqDate.setFullYear(reqYear, reqMonth, reqDay);
acceptedDate = docDate.getTime() >= reqDate.getTime();
}
return doc.user_id && doc.user_id == req.query.userid && doc._id.indexOf("_design") != 0 && acceptedDate;
}
Filtered replications works slow because for each fetched document runs complex logic to decide whether to replicate it or not:
CouchDB fetches next document;
Because filter function has to be applied the document gets converted to JSON;
JSONifyed document passes through stdio to query server;
Query server handles document and decodes it from JSON;
Now, query server lookups and runs your filter function which returns true or false value to CouchDB;
If result is true document goes to be replicated;
Go to p.1 and loop for all documents;
For non-filtered replications take this list, throw away p.2-5 and let p.6 has always true result. This overhead slows down whole replication process.
To significantly improve filtered replication speed, you may use Erlang filters via Erlang native server. They runs inside CouchDB, doesn't pass through any stdio interface and there is no JSON decode/encode overhead applied.
NOTE, that Erlang query server runs not inside sandbox like JavaScript one, so you need to really trust code that you run with it.
Another option is to optimize your filter function e.g. reduce object creation, method calls, but actually you wouldn't win much with this.

Parallel.ForEach Ordered Execution

I am trying to execute parallel functions on a list of objects using the new C# 4.0 Parallel.ForEach function. This is a very long maintenance process. I would like to make it execute in the order of the list so that I can stop and continue execution at the previous point. How do I do this?
Here is an example. I have a list of objects: a1 to a100. This is the current order:
a1, a51, a2, a52, a3, a53...
I want this order:
a1, a2, a3, a4...
I am OK with some objects being run out of order, but as long as I can find a point in the list where I can say that all objects before this point were run. I read the parallel programming csharp whitepaper and didn't see anything about it. There isn't a setting for this in the ParallelOptions class.
Do something like this:
int current = 0;
object lockCurrent = new object();
Parallel.For(0, list.Count,
new ParallelOptions { MaxDegreeOfParallelism = MaxThreads },
(ii, loopState) => {
// So the way Parallel.For works is that it chunks the task list up with each thread getting a chunk to work on...
// e.g. [1-1,000], [1,001- 2,000], [2,001-3,000] etc...
// We have prioritized our job queue such that more important tasks come first. So we don't want the task list to be
// broken up, we want the task list to be run in roughly the same order we started with. So we ignore tha past in
// loop variable and just increment our own counter.
int thisCurrent = 0;
lock (lockCurrent) {
thisCurrent = current;
current++;
}
dothework(list[thisCurrent]);
});
You can see how when you break out of the parallel for loop you will know the last list item to be executed, assuming you let all threads finish prior to breaking. I'm not a big fan of PLINQ or LINQ. I honestly don't see how writing LINQ/PLINQ leads to maintainable source code or readability.... Parallel.For is a much better solution.
If you use Parallel.Break to terminate the loop then you are guarenteed that all indices below the returned value will have been executed. This is about as close as you can get. The example here uses For but ForEach has similar overloads.
int n = ...
var result = new double[n];
var loopResult = Parallel.For(0, n, (i, loopState) =>
{
if (/* break condition is true */)
{
loopState.Break();
return;
}
result[i] = DoWork(i);
});
if (!loopResult.IsCompleted &&
loopResult.LowestBreakIteration.HasValue)
{
Console.WriteLine("Loop encountered a break at {0}",
loopResult.LowestBreakIteration.Value);
}
In a ForEach loop, an iteration index is generated internally for each element in each partition. Execution takes place out of order but after break you know that all the iterations lower than LowestBreakIteration will have been completed.
Taken from "Parallel Programming with Microsoft .NET" http://parallelpatterns.codeplex.com/
Available on MSDN. See http://msdn.microsoft.com/en-us/library/ff963552.aspx. The section "Breaking out of loops early" covers this scenario.
See also: http://msdn.microsoft.com/en-us/library/dd460721.aspx
For anyone else who comes across this question - if you're looping over an array or list (rather than an IEnumberable ), you can use the overload of Parallel.Foreach that gives the element index to maintain original order too.
string[] MyArray; // array of stuff to do parallel tasks on
string[] ProcessedArray = new string[MyArray.Length];
Parallel.ForEach(MyArray, (ArrayItem,loopstate,ArrayElementIndex) =>
{
string ProcessedArrayItem = TaskToDo(ArrayItem);
ProcessedArray[ArrayElementIndex] = ProcessedArrayItem;
});
As an alternate suggestion, you could record which object have been run and then filter the list when you resume exection to exclude the objects which have already run.
If this needs to be persistent across application restarts, you can store the ID's of the already executed objects (I assume here the objects have some unique identifier).
For anybody looking for a simple solution, I have posted 2 extension methods (one using PLINQ and one using Parallel.ForEach) as part of an answer to the following question:
Ordered PLINQ ForAll
Not sure if question was altered as my comment seems wrong.
Here improved, basically remind that parallel jobs run in out of your control order.
ea printing 10 numbers might result in 1,4,6,7,2,3,9,0.
If you like to stop your program and continue later.
Problems alike this usually endup in batching workloads.
And have some logging of what was done.
Say if you had to check 10.000 numbers for prime or so.
You could loop in batches of size 100, and have a prime log1, log2, log3
log1= 0..99
log2=100..199
Be sure to set some marker to know if a batch job was finished.
Its a general aprouch since the question isnt that exact either.

Resources