Counting the number of values emitted before the Observable completes? - node.js

Attempting to verify that an observable emits a certain number of events before it completes. This is pseudo code:
o.pipe(count).subscribe(count=>
expect(count).toEqual(4));
Thoughts?

The count operator works as follows:
Counts the number of emissions on the source and emits that number when the source completes (source)
So you can use it like so:
obs.pipe(count()).subscribe(totalEmissions => expect(totalEmissions).toEqual(4))
Note that you can't really measure how many events occured before the original observable completed, because if it didn't complete then you didn't finish counting!
You can, however, take note of the "index" of each emission using tap:
let count = 0
obs.pipe(tap(() => console.log("emitted! Index: " + count++))).subscribe(obsValue => {/*...*/})
I'm not sure which is your use case, but that's how you can do it.

Related

Thread pool with Apps Script on Spreadsheet

I have a Google Spreadsheet with internal AppsScript code which process each row of the sheet and perform an urlfetch with the row data. The url will provide a value which will be added to the values returned by each row processing..
For now the code is processing 1 row at a time with a simple for:
var spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
var sheet = spreadsheet.getActiveSheet();
var range = sheet.getDataRange();
for(var i=1 ; i<range.getValues().length ; i++) {
var payload = {
// retrieve data from the row and make payload object
};
var options = {
"method":"POST",
"payload" : payload
};
var result = UrlFetchApp.fetch("http://.......", options);
var text = result.getContentText();
// Save result for final processing
// (with multi-thread function this value will be the return of the function)
}
Please note that this is only a simple example, in the real case the working function will be more complex (like 5-6 http calls, where the output of some of them are used as input to the next one, ...).
For the example let's say that there is a generic "function" which executes some sort of processing and provides a result as output.
In order to speed up the process, I'd like to try to implement some sort of "multi-thread" processing, so I can process multiple rows in the same time.
I already know that javascript does not offer a multi-thread handling, but I read about WebWorker which seems to create an async processing of a function.
My goal is to obtain some sort of ThreadPool (like 5 threads at a time) and send every row that need to be processed to the pool, obtaining as output the result of each function.
When all the rows finished the processing, a final action will be performed gathering all the results of each function.
So the capabilities I'm looking for are:
managed "ThreadPool" where I can submit an N amount of tasks to be performed
possibility to obtain a resulting value from each task processed by the pool
possibility to determine that all the tasks has been processed, so a final "event" can be executed
I already see that there are some ready-to-use libraries like:
https://www.hamsters.io/wiki#thread-pool
http://threadsjs.readthedocs.io/en/latest/
https://github.com/andywer/threadpool-js
but they work with NodeJS. Due to AppsScript nature, I need a more simplier approach, which is provided by native JS. Also, it seems that minified JS are not accepted by AppsScript editor, so I also need the "expanded" version.
Do you know a simple ThreadPool in JS where I can submit a function to be execute and I get back a Promise for the result?

sqs.recieveMessage not receiving even when messages in queue

So i have 3 lambdas, one with an API event that triggers a lambda that pulls down around 50,000 objects and pushes them all to a queue.
The second lambda reads from the queue, 10 at a time, in a loop 30 times - meaning it reads, does stuff, invokes the third lambda, returns promise, then reads again - 30 times for a total of 300 reads in the time the lambda executes
The 3rd lambda takes the information from the queue and hits another endpoint with it.
The issue is in that second lambda...First i call a function that returns the number of messages in the queue and if it's more than zero i read them. However, even if there's 20,000 messages in the queue it often comes back with nothing. I'm not sure why.
I have WaitTimeSeconds set to 20 for long polling. Any help would be greatly appreciated, the docs claim i can read up to 3,000/second with a FIFO queue and i'm having trouble getting anywhere near that performance.
Here's the code:
exports.handler = (event, context, callback) => {
const sqs = new AWS.SQS({ region: process.env.AWS_REGION });
getMessageCount(sqs)
.then((messageCount) => {
if (messageCount > 0) {
mapSeries(range(0, 30), getMessages(sqs))
.then((messageRes) => {
callback(null, messageRes);
})
.catch(e => Promise.reject(e));
}
callback(null, 'No more messages');
})
.catch((e) => {
callback(e);
});
};
getMessageCount makes a call to sqs.getQueueAttributes and returns a promise that receives the number of messages.
mapSeries allows the loop to wait for the previous promise to be resolved/rejected before iterating and on each iteration it calls getMessages which calls sqs.receiveMessage and invokes the 3rd lambda with the data.
Any perspective on this is appreciated, thank you!
As i understand your questions, the problem lies with getting the number of messages in the queue. If you had also given the getMessageCount(sqs) as well, we could have determined the types of attributes you are trying to retrieve from SQS.
There are three types of attributes relevant, to get the message count in SQS. These attributes are given below.
ApproximateNumberOfMessages - Returns the approximate number of visible
messages in a queue
ApproximateNumberOfMessagesNotVisible - Returns the approximate number of messages that have not timed-out and aren't deleted.
If you want to include the messages that are waiting to be added, you can consider the following property as well.
ApproximateNumberOfMessagesDelayed - Returns the approximate number of
messages that are waiting to be added to the queue.
By considering these attributes, you can get a much more accurate count from SQS.
Also if I may suggest, I implemented a similar system, but without looking for the count.I retrieve 10 messages at a time via polling, process them and delete them from the queue. As per your example, you can repeat this for 30 times. But if the getMessages(sqs) function returns an empty set, we could assume that the list is empty. (This depends on whether you are using short polling or long polling). Nevertheless, checking for the number of messages at every step seems to be redundant. This is according to this example, but it might defer according to the use case.
Read through the API documentation: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/SQS.html#receiveMessage-property
Parameters:
MaxNumberOfMessages — (Integer)
The maximum number of messages to return. Amazon SQS never returns
more messages than this value (however, fewer messages might be
returned). Valid values are 1 to 10. Default is 1.
Wrap your code in a while loop and anticipate a frequent case of 0 messages since 0 is fewer than 1 to 10.
Something like...
var messages = [];
while(messages.length < NUMBER_OF_MSGS_YOU_REALLY_WANT) {
var new_messages = await getSQSMessages(NUMBER_OF_MSGS_YOU_REALLY_WANT - messages.length);
if(new_messages.Data.Messages.length > 0) {
messages.push(new_messages.Data.Messages);
}
}

Use First() and Repeat() without restarting whole stream RxJS

I am building a trading bot using RxJS. For that i have to convert ticker data from a socket connection to candles that is getting emitted every x seconds.
I created the socketObservable like this
const subscribeObservable = Observable.fromEventPattern(h => bittrex.websockets.subscribe(['USDT-BTC'], h))
const clientCallBackObservable = Observable.fromEventPattern(h => bittrex.websockets.client(h))
const socketObservable = clientCallBackObservable
.flatMap(() => subscribeObservable)
.filter(subscribtionData => subscribtionData && subscribtionData.M === 'updateExchangeState')
.flatMap(exchangeState => Observable.from(exchangeState.A))
.filter(marketData => marketData.Fills.length > 0)
.map(marketData => marketData && marketData.Fills)
Which works fine - when i connect to the client i flatMap to the subscription connection.
Then i have the candleObservable that is causing problems
export const candleObservable = (promise, timeFrame = TIME_FRAME) =>
promise
.scan((acc, curr) => [...acc, ...curr])
.skipWhile(exchangeData => dateDifferenceInSeconds(exchangeData) < timeFrame)
// take first after skipping
.first()
// first will complete the stream, so we repeat it
.repeat()
// we create candle data from the timeFrame array
.map(fillsData => createCandle(fillsData))
// accumulate candles
.scan((acc, curr) => [...[acc], curr])
What i am trying to achieve is to accumulate data until i have for a full candle that can be x seconds. Then i would like to take that emit and reset the scan function so i start for a new candle. Then i create the candle and accumulate it in another scan.
My problem is that when i call repeat() my socketObservable also gets called again. I do not know if this causes any overhead with the node-bittrex-api but i would like to avoid it.
I have tried putting the accumulating candle part in a flatMap or similar but couldn't get anyt of that to work.
Do you know how i can avoid to repeat() the whole stream or another way of make candles where i can accumulate and then reset the accumulator after first emit?
From what you've described it sounds like you have an observable you want to cut up into buckets of some kind based on some condition. In general, the reduction of a stream to another stream with fewer elements (without filtering) is referred to as "backpressure". In your specific case, it sounds like the backpressure operator you'd be interested in is buffer. The buffer operator can accept an observable as an argument that functions as a "closing selector", i.e. emissions in this observable can be used to regulate when you tie off one buffer and start a new one.
I'd suggest replacing your scan, skipWhile, first, and repeat with a buffer call, passing in a closing selector that will yield a value when your "TIME_FRAME" expires. This should be easy to express as an observable either using timer (in the case of a fixed amount) or a debounced version of the driving stream (if you want to stop when there's a pause in the data). If your buffer is strictly time-based, there's even a specialization of buffer called bufferTime that handles this. Because you'll wind up with an observable of arrays (rather than raw values), you'll likely want to replace your final scan with a regular array reduce.
It's hard to give concrete code without a simpler example to work with. I'd urge you to consult the sample code for the various backpressure operators to see if you can find something similar to what you're attempting to achieve.

Map three different functions to Observable in Node.js

I am new to Rxjs. I want to follow best practices if possible.
I am trying to perform three distinct functions on the same data that is returned in an observable. Following the 'streams of data' concept, I keep on thinking I need to split this Observable into three streams and carry on.
Here is my code, so I can stop talking abstractly:
// NotEmptyResponse splits the stream in 2 to account based on whether I get an empty observable back.
let base_subscription = RxNode.fromStream(siteStream).partition(NotEmptyResponse);
// Success Stream to perform further actions upon.
let successStream = base_subscription[0];
// The Empty stream for error reporting
let failureStream = base_subscription[1];
//Code works up until this point. I don't know how to split to 3 different streams.
successStream.filter(isSite)
.map(grabData)// Async action that returns data
/*** Perform 3 separate actions upon data that .map(grabData) returned **/
.subscribe();
How can I split this data stream into three, and map each instance of the data to a different function?
In fact partition() operator internally just calls filter() operator twice. First to create an Observable from values matching the predicate and then for values not matching the predicate.
So you can do the exact same thing with filter() operator:
let obs1 = base_subscription.filter(val => predicate1);
let obs2 = base_subscription.filter(val => predicate2);
let obs3 = base_subscription.filter(val => predicate3);
Now you have three Observables, each of them emitting only some specific values. Then you can carry on with your existing code:
obs2.filter(isSite)
.map(grabData)
.subscribe();
Just be aware that calling subscribe() triggers the generating values from the source Observable. This doesn't have to be always like this depending on what Observable you use. See “Hot” and “Cold” Observables in the documentation. Operator connect() might be useful for you depending on your usecase.

GroupBy then ObserveOn loses items

Try this in LinqPad:
Observable
.Range(0, 10)
.GroupBy(x => x % 3)
.ObserveOn(Scheduler.NewThread)
.SelectMany(g => g.Select(x => g.Key + " " + x))
.Dump()
The results are clearly non-deterministic, but in every case I fail to receive all 10 items. My current theory is that the items are going through the grouped observable unobserved as the pipeline marshals to the new thread.
Linqpad doesn't know that you're running all of these threads - it gets to the end of the code immediately (remember, Rx statements don't always act synchronously, that's the idea!), waits a few milliseconds, then ends by blowing away the AppDomain and all of its threads (that haven't caught up yet). Try adding a Thread.Sleep to the end to give the new threads time to catch up.
As an aside, Scheduler.NewThread is a very inefficient scheduler, EventLoopScheduler (create exactly one thread), or Scheduler.TaskPool (use the TPL pool, as if you created a Task for each item) are much more efficient (of course in this case since you only have 10 items, Scheduler.Immediate is the best!)
It appears here that the problem is in timing between starting the subscription to the new group in the GroupBy operation and the delay of implementing the new subscription. If you increase the number of iterations from 10 to 100, you should start seeing some results after a period of time.
Also, if you change the GroupBy to .Where(x => x % 3 == 0), you will likely notice that no values are lost because the dynamic subscription to the IObservable groups doesn't need to initialize new observers.

Resources