How to code two-way duplex streams in NodeJS

In the latest few versions of NodeJS (v0.10.X as of writing), the Streams API has undergone a welcome redesign and I would like to start using it now.
I want to wrap both the input and output of a socket with an object which implements a protocol.
The so-called Duplex interface seems to just be any stream that is both readable and writable (like a socket).
It is not clear whether Duplexes should be like A or B below, or whether it doesn't matter.
    +---+          +---+
 -->| A |-->       |   |-->
    +---+          | B |
                   |   |<--
                   +---+
What is the correct code structure/interface for an object which has two writables and two readables?
+--------+   +----------+   +----
|       r|-->|w        r|-->|w
| socket |   | protocol |   | rest of app
|       w|<--|r        w|<--|r
+--------+   +----------+   +----
The problem with the diagram above is that the protocol object needs two separate read methods and two write methods.
Off the top of my head, I could make the protocol produce 'left' and 'right' duplex objects, or 'in' and 'out' duplex objects (to slice it a different way).
Are either of these the preferred way, or is there a better solution?

            +---------------+
            |      app      |
            +---------------+
                ^       |
                |       V
             +-----+ +-----+
             |     | |     |
 +-----------|     |-|     |-+
 | protocol  | .up | |.down| |
 +-----------|     |-|     |-+
             |     | |     |
             +-----+ +-----+
                ^       |
                |       V
            +---------------+
            |    socket     |
            +---------------+
My solution was to make a Protocol class, which created an Up Transform and a Down Transform.
The Protocol constructor passes a reference (to itself) when constructing the Up and Down transforms. The _transform method in each of the up and down transforms can then call push on itself, on the other Transform, or both as required. Common state can be kept in the Protocol object.
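In outline, the shape is something like this (the stream names and the codec methods are placeholder illustrations for Node v0.10-style streams, not the real protocol logic):

var Transform = require('stream').Transform;
var util = require('util');

function ProtocolStream(protocol, direction, options) {
  Transform.call(this, options);
  this.protocol = protocol;     // shared protocol state lives on the Protocol object
  this.direction = direction;   // 'up' (socket -> app) or 'down' (app -> socket)
}
util.inherits(ProtocolStream, Transform);

ProtocolStream.prototype._transform = function (chunk, encoding, done) {
  if (this.direction === 'up') {
    // decode bytes arriving from the socket and emit them towards the app
    this.push(this.protocol.decode(chunk));
    // a protocol-level reply could also be pushed on the *other* stream, e.g.
    // this.protocol.down.push(someAck);
  } else {
    // encode app data heading for the socket
    this.push(this.protocol.encode(chunk));
  }
  done();
};

function Protocol() {
  this.up = new ProtocolStream(this, 'up');
  this.down = new ProtocolStream(this, 'down');
}
// trivial placeholder codec; a real protocol would frame/parse here
Protocol.prototype.decode = function (chunk) { return chunk; };
Protocol.prototype.encode = function (chunk) { return chunk; };

// wiring:  socket -> protocol.up -> app   and   app -> protocol.down -> socket
// var protocol = new Protocol();
// socket.pipe(protocol.up).pipe(app);
// app.pipe(protocol.down).pipe(socket);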

A duplex stream is like your diagram B, at least from the user's point of view. A more complete view of a stream would include the producer (source) as well as the consumer (user); see my previous answer. Try not to think of both read and write purely from the consumer's point of view.
What you are doing is building a thin protocol layer over the socket, so your design is correct:
-------+     +----------+     +------
      r|---->|r        r|---->|r
socket |     | protocol |     | rest of app
      w|<----|w        w|<----|w
-------+     +----------+     +------
You can use a Duplex or a Transform stream for the protocol part.
+---------+---------+---------+     +------------------+
| _write->| process |         |r    |   Transform ->   |r
|------------Duplex-----------|     +------------------+
|         | process | <-_read |w    |   <- Transform   |w
+---------+---------+---------+     +------------------+
process being your protocol-related processing of incoming/outgoing data, using the internal _read and _write methods. Or you can use Transform streams. You would pipe protocol to socket and socket to protocol.
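For the Duplex option, a rough sketch could look like this (the framing methods are placeholders): the wrapper's _write encodes application data onto the socket, while decoded socket data is pushed so the application can read it from the same object.

var Duplex = require('stream').Duplex;
var util = require('util');

function ProtocolWrapper(socket, options) {
  Duplex.call(this, options);
  this.socket = socket;
  var self = this;
  // decode incoming socket data and make it readable on this stream
  socket.on('data', function (chunk) {
    self.push(self.decode(chunk));
  });
  socket.on('end', function () { self.push(null); });
}
util.inherits(ProtocolWrapper, Duplex);

// app -> _write -> encode -> socket
ProtocolWrapper.prototype._write = function (chunk, encoding, done) {
  this.socket.write(this.encode(chunk), done);
};

// data is pushed from the socket's 'data' handler, so nothing extra to do here
ProtocolWrapper.prototype._read = function (size) {};

ProtocolWrapper.prototype.decode = function (chunk) { return chunk; }; // placeholder
ProtocolWrapper.prototype.encode = function (chunk) { return chunk; }; // placeholder

// usage: var wire = new ProtocolWrapper(socket);
//        wire.on('data', handleDecodedMessage);
//        wire.write(outgoingMessage);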

Related

How to perform a series of steps in a single thread, with an async flow in spring-integration?

I currently have a spring-integration (v4.3.24) flow that looks like the following:
               |
               | list of
               | filepaths
          +----v---+
          |splitter|
          +----+---+
               | filepath
               |
    +----------v----------+
    |sftp-outbound-gateway|
    |        "get"        |
    +----------+----------+
               | file
    +---------------------+
    |     +----v----+     |
    |     |decryptor|     |
    |     +----+----+     |
    |          |          |
    |    +-----v------+   |   set of transformers
    |    |decompressor|   |   (with routers before them
    |    +-----+------+   |   because some steps are optional)
    |          |          |   that process the file;
    |       +--v--+       |   call this "FileProcessor"
    |       | ... |       |
    |       +--+--+       |
    +---------------------+
               |
          +----v----+
          |save file|
          | to disk |
          +----+----+
               |
All of the channels above are DirectChannels (yes, I know this is a poor structure). This was working fine for small numbers of files, but now I have to deal with thousands of files which need to go through the same flow; benchmarks reveal that this takes about a day to finish processing. So I'm planning to introduce some parallel processing to this flow. I want to modify my flow to achieve something like this:
                    |
                    |
         +----------v----------+
         |sftp-outbound-gateway|
         |       "mget"        |
         +----------+----------+
                    | list of files
                    |
               +----v---+
               |splitter|
               +----+---+
        one thread  |  one thread          ...
           +--------+---------------+--+--+--+--+
           | file                   | file
+---------------------+  +---------------------+
|     +----v----+     |  |     +----v----+     |
|     |decryptor|     |  |     |decryptor|     |
|     +----+----+     |  |     +----+----+     |
|          |          |  |          |          |
|    +-----v------+   |  |    +-----v------+   |   ...
|    |decompressor|   |  |    |decompressor|   |
|    +-----+------+   |  |    +-----+------+   |
|          |          |  |          |          |
|       +--v--+       |  |       +--v--+       |
|       | ... |       |  |       | ... |       |
|       +--+--+       |  |       +--+--+       |
+----------+----------+  +----------+----------+
           |                        |
      +----v----+              +----v----+
      |save file|              |save file|
      | to disk |              | to disk |
      +----+----+              +----+----+
           |                        |
           |                        |
For parallel processing, I output the files from the splitter onto an ExecutorChannel backed by a ThreadPoolTaskExecutor.
Some of the questions that I have:
1. I want all of the "FileProcessor" steps for one file to happen on the same thread, while multiple files are processed in parallel. How can I achieve this? I saw from this answer that an ExecutorChannel-to-MessageHandlerChain flow would offer such functionality, but some of the steps inside "FileProcessor" are optional (using selector-expression with routers to skip some of the steps), which rules out using a MessageHandlerChain. I could rig up a couple of MessageHandlerChains with Filters inside, but that more or less becomes the approach mentioned in #2.
2. If #1 cannot be achieved, will changing all of the channel types starting from the splitter from DirectChannel to ExecutorChannel help in introducing some parallelism? If yes, should I create a new TaskExecutor for each channel, or can I reuse one TaskExecutor bean for all channels (I cannot set scope="prototype" on a TaskExecutor bean)?
3. In your opinion, which approach (#1 or #2) is better? Why?
4. If I perform global error handling, like the approach mentioned here, will the other files continue to process even if one file errors out?
It will work as you need by using an ExecutorChannel as the input to the decryptor and leaving all the rest as direct channels; the remaining flow does not have to be a chain, and each component will run on one of the executor's threads.
You will need to be sure all your downstream components are thread-safe.
Error handling should remain as is; each sub flow is independent.
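A minimal XML sketch of that wiring (the channel, bean and executor names here are made up for illustration; the int/task prefixes are the usual Spring Integration and Spring task namespaces):

<!-- thread pool shared by all files; the size is an assumption -->
<task:executor id="fileProcessorExecutor" pool-size="8"/>

<!-- ExecutorChannel: each message sent here is handed to a pool thread -->
<int:channel id="decryptorChannel">
    <int:dispatcher task-executor="fileProcessorExecutor"/>
</int:channel>

<!-- the splitter emits one message per file onto the executor channel -->
<int:splitter input-channel="fileListChannel" output-channel="decryptorChannel"/>

<!-- downstream components keep their DirectChannels -->
<int:service-activator input-channel="decryptorChannel"
                       output-channel="decompressorChannel"
                       ref="decryptor"/>

Because everything after decryptorChannel stays on DirectChannels, all of the "FileProcessor" steps for a given file run on the executor thread that picked that file up, while different files are processed in parallel on different pool threads.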

Perform NGram on Spark DataFrame

I'm using Spark 2.3.1 and I have a Spark DataFrame like this:
+----------+
|    values|
+----------+
|embodiment|
|   present|
| invention|
|   include|
|   pairing|
|       two|
|  wireless|
|    device|
|   placing|
|     least|
|       one|
|       two|
+----------+
I want to apply the Spark ML NGram feature like this:
bigram = NGram(n=2, inputCol="values", outputCol="bigrams")
bigramDataFrame = bigram.transform(tokenized_df)
The following error occurred on the line bigramDataFrame = bigram.transform(tokenized_df):
pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Input type must be ArrayType(StringType) but got StringType.'
So I changed my code:
df_new = tokenized_df.withColumn("testing", array(tokenized_df["values"]))
bigram = NGram(n=2, inputCol="testing", outputCol="bigrams")
bigramDataFrame = bigram.transform(df_new)
bigramDataFrame.show()
So I got my final DataFrame as follows:
+----------+------------+-------+
|    values|     testing|bigrams|
+----------+------------+-------+
|embodiment|[embodiment]|     []|
|   present|   [present]|     []|
| invention| [invention]|     []|
|   include|   [include]|     []|
|   pairing|   [pairing]|     []|
|       two|       [two]|     []|
|  wireless|  [wireless]|     []|
|    device|    [device]|     []|
|   placing|   [placing]|     []|
|     least|     [least]|     []|
|       one|       [one]|     []|
|       two|       [two]|     []|
+----------+------------+-------+
Why is my bigram column value empty?
I want my output for the bigram column to be as follows:
+--------------------+
|bigrams             |
+--------------------+
|embodiment present  |
|present invention   |
|invention include   |
|include pairing     |
|pairing two         |
|two wireless        |
|wireless device     |
|device placing      |
|placing least       |
|least one           |
|one two             |
+--------------------+
Your bi-gram column value is empty because there are no bi-grams in each row of your 'values' column.
If the values in your input data frame look like this:
+--------------------------------------------+
|values                                      |
+--------------------------------------------+
|embodiment present invention include pairing|
|two wireless device placing                 |
|least one two                               |
+--------------------------------------------+
Then you can get the output in bi-grams as below:
+--------------------------------------------+--------------------------------------------------+---------------------------------------------------------------------------+
|values                                      |testing                                           |ngrams                                                                     |
+--------------------------------------------+--------------------------------------------------+---------------------------------------------------------------------------+
|embodiment present invention include pairing|[embodiment, present, invention, include, pairing]|[embodiment present, present invention, invention include, include pairing]|
|two wireless device placing                 |[two, wireless, device, placing]                  |[two wireless, wireless device, device placing]                            |
|least one two                               |[least, one, two]                                 |[least one, one two]                                                       |
+--------------------------------------------+--------------------------------------------------+---------------------------------------------------------------------------+
The Spark Scala code to do this is:
import org.apache.spark.ml.feature.NGram
import org.apache.spark.sql.functions.split

val df_new = df.withColumn("testing", split(df("values"), " "))
val ngram = new NGram().setN(2).setInputCol("testing").setOutputCol("ngrams")
val ngramDataFrame = ngram.transform(df_new)
A bi-gram is a sequence of two adjacent elements from a string of
tokens, which are typically letters, syllables, or words.
But in your input data frame, you have only one token in each row, hence you are not getting any bi-grams out of it.
So, for your question, you can do something like this:
Input: df1
+----------+
|values    |
+----------+
|embodiment|
|present   |
|invention |
|include   |
|pairing   |
|two       |
|wireless  |
|device    |
|placing   |
|least     |
|one       |
|two       |
+----------+
Output: ngramDataFrameInRows
+------------------+
|ngrams            |
+------------------+
|embodiment present|
|present invention |
|invention include |
|include pairing   |
|pairing two       |
|two wireless      |
|wireless device   |
|device placing    |
|placing least     |
|least one         |
|one two           |
+------------------+
Spark Scala code:
import org.apache.spark.ml.feature.NGram
import org.apache.spark.sql.functions.{collect_list, explode, col}

val df_new = df1.agg(collect_list("values").alias("testing"))
val ngram = new NGram().setN(2).setInputCol("testing").setOutputCol("ngrams")
val ngramDataFrame = ngram.transform(df_new)
val ngramDataFrameInRows = ngramDataFrame.select(explode(col("ngrams")).alias("ngrams"))
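Since the question uses PySpark, here is a rough equivalent of the Scala code above (assuming the input DataFrame is df1 with a single string column "values", as in the question):

from pyspark.sql.functions import collect_list, explode, col
from pyspark.ml.feature import NGram

# collapse all rows into one row holding an array of all the tokens
df_new = df1.agg(collect_list("values").alias("testing"))
bigram = NGram(n=2, inputCol="testing", outputCol="ngrams")
ngramDataFrame = bigram.transform(df_new)
# explode the single array of bi-grams back into one bi-gram per row
ngramDataFrameInRows = ngramDataFrame.select(explode(col("ngrams")).alias("ngrams"))
ngramDataFrameInRows.show(truncate=False)

Note that collect_list does not guarantee row order, so for real data you may need an explicit ordering column before aggregating.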

How to conditionally query tables based on oData request in an Azure Logic App

I am a total newbie when it comes to both oData and Logic Apps.
My scenario is as follows:
I have an Azure SQL database with two tables (daily_stats, weekly_stats) for users
I have a Logic App that I managed to test successfully, but it targets one table. It is triggered by an HTTP request and initialises a variable using the following expression to get the query:
if(equals(coalesce(trigger()['outputs']?['queries']?['filter'],''),''),'1 eq 1',trigger()['outputs']?['queries']?['filter'])
The problem is how to query a different table based on what the user passes in the OData GET request.
I imagine I need a condition and the pseudo code of this would be something like:
For daily stats the ODATA query URL would be
https://myproject.logic.azure.com/workflows/some-guid-here/triggers/manual/paths/invoke/daily_stats/api-version=2016-10-01/&sp=%2Ftriggers%2Fmanual%2Frun&sv=1.0&sig=my-key-here&filter=userid eq 'richard'
For weekly stats the ODATA query URL would be
https://myproject.logic.azure.com/workflows/some-guid-here/triggers/manual/paths/invoke/weekly_stats/api-version=2016-10-01/&sp=%2Ftriggers%2Fmanual%2Frun&sv=1.0&sig=my-sig-here&filter=userid eq 'richard'
If it is daily_stats, it queries the daily_stats stored procedure/table for the user = richard
If it is weekly_stats, it queries the weekly_stats stored procedure/table for the user = richard
Edit: Added an ASCII flow diagram
           +----------------------+
           |    HTTP ODATA GET    |
           |       Request        |
           |                      |
           +----------+-----------+
                      |
                      |
                      |
                      |
                      v
              +-------+---------+
              |                 |
              |                 |
              |                 |
              |   filter has    |
              |   daily_stats   |
              |                 |
              +-------+---------+
                      |
                      |
                      |
                      |
+-------------+       |              +--------------+
|             |       |              |              |
|             |  YES  |      NO      |              |
|   query     +<------+--------------+   query      |
|   daily     |                      |   monthly    |
|   stats     |                      |   stats      |
|   table     |                      |   table      |
|             |                      |              |
+-------------+                      +--------------+
There is a Switch action; for more information you could refer to Create switch statements that run workflow actions based on specific values in Azure Logic Apps.
Below is my sample. Switch statements support only equality operators; if you need other relational operators, such as "greater than", use a conditional statement.
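The sample itself was originally shown as a screenshot. Purely as a rough, hypothetical illustration (the relative path parameter named table and the case/action names are assumptions, not taken from the original workflow), a Switch in the workflow definition could look like this:

"Switch_on_table": {
    "type": "Switch",
    "expression": "@triggerOutputs()['relativePathParameters']['table']",
    "cases": {
        "Daily": {
            "case": "daily_stats",
            "actions": {}
        },
        "Weekly": {
            "case": "weekly_stats",
            "actions": {}
        }
    },
    "default": {
        "actions": {}
    },
    "runAfter": {}
}

Each case's actions block would then hold the corresponding SQL action (for example an "Execute stored procedure" step against daily_stats or weekly_stats), and the default branch can return an error response for unknown table names.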

boost MSM how to define transition between two sub-states?

I'm using the Boost 1.64.0 MSM library to produce a hierarchical state machine. To test the transition mechanism, I implemented a state machine like this:
+------------------------------------------------+
| S                                              |
|   +-------------+          +-------------+     |
|   | S1          |          | S2          |     |
|   |  +-------+  |          |  +-------+  |     |
|   |  |  S11  |  |          |  |  S21  |  |     |
|   |  +-------+  |          |  +-------+  |     |
|   +-------------+          +-------------+     |
|                                                |
+------------------------------------------------+
So how do I define a transition from S11 to S21? According to the same situation described in the wiki, the transition execution sequence should be 'exit S11' -> 'exit S1' -> 'enter S2' -> 'enter S21'.
According to the document https://www.boost.org/doc/libs/1_66_0/libs/msm/doc/HTML/ch03s02.html#d0e875,
it is only possible to explicitly enter a sub-state of the target but not a sub-sub-state.
it is not possible to explicitly exit. Exit points must be used.
So you cannot do an explicit exit from S11.
You can use an exit point pseudo state instead of the explicit exit, and I recommend using an entry point pseudo state instead of the explicit entry.
Here is example code for the entry point pseudo state:
http://redboltz.wikidot.com/entry-point-pseudo-state
and for the exit point pseudo state:
http://redboltz.wikidot.com/exit-point-pseudo-state
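For reference, a minimal sketch combining both ideas (this is not the code from the linked pages; it uses the functor front-end, and names such as go_s2, S1_exit and S2_entry are purely illustrative):

#include <iostream>
#include <boost/mpl/vector.hpp>
#include <boost/msm/back/state_machine.hpp>
#include <boost/msm/front/state_machine_def.hpp>
#include <boost/msm/front/functor_row.hpp>

namespace msm  = boost::msm;
namespace msmf = boost::msm::front;
namespace mpl  = boost::mpl;

struct go_s2 {};  // the event that should take S11 to S21

struct S_ : msmf::state_machine_def<S_> {
    // sub-machine S1 with an exit pseudo state
    struct S1_ : msmf::state_machine_def<S1_> {
        struct S11 : msmf::state<> {
            template <class E, class F> void on_exit(E const&, F&) { std::cout << "exit S11\n"; }
        };
        struct S1_exit : msmf::exit_pseudo_state<go_s2> {};
        typedef S11 initial_state;
        template <class E, class F> void on_exit(E const&, F&) { std::cout << "exit S1\n"; }
        struct transition_table : mpl::vector<
            msmf::Row<S11, go_s2, S1_exit, msmf::none, msmf::none>
        > {};
    };
    typedef msm::back::state_machine<S1_> S1;

    // sub-machine S2 with an entry pseudo state
    struct S2_ : msmf::state_machine_def<S2_> {
        struct S21 : msmf::state<> {
            template <class E, class F> void on_entry(E const&, F&) { std::cout << "enter S21\n"; }
        };
        struct S2_entry : msmf::entry_pseudo_state<0> {};  // 0 = region index
        typedef S21 initial_state;
        template <class E, class F> void on_entry(E const&, F&) { std::cout << "enter S2\n"; }
        struct transition_table : mpl::vector<
            msmf::Row<S2_entry, go_s2, S21, msmf::none, msmf::none>
        > {};
    };
    typedef msm::back::state_machine<S2_> S2;

    typedef S1 initial_state;
    // the exit point of S1 forwards go_s2 to the entry point of S2
    struct transition_table : mpl::vector<
        msmf::Row<S1::exit_pt<S1_::S1_exit>, go_s2,
                  S2::entry_pt<S2_::S2_entry>, msmf::none, msmf::none>
    > {};
};
typedef msm::back::state_machine<S_> S;

int main() {
    S sm;
    sm.start();                  // enters S1, then its initial state S11
    sm.process_event(go_s2());   // exit S11 -> exit S1 -> enter S2 -> enter S21
}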

Manage multiple signal speed in a Gnu-Radio flow graph

I am currently working on the Z-Wave protocol.
With my HackRF One and scapy-radio I am trying to sniff the communications between two devices.
However, devices can transmit at different speeds:
9.6 kbps
40 kbps
100 kbps
As I can only decode communications at 40 kbps, I imagine my graph is unable to manage the other speeds.
Some information about Z-Wave communications:
Frequency (EU): 868.4 MHz
Modulation: GFSK
And my GRC graph:
So my question is: how can I modify the graph to decode and sniff the 9.6 and 100 kbps signals too?
As an easy workaround, I would suggest taking the input stream from the HackRF and connecting it to 3 different decoders, each one with the desired parameters. Then each Packet sink block will publish messages to the same Socket PDU block.
I am not familiar with Z-Wave, but if the 3 different data rates share the same spectrum bandwidth, then there is no more work for you and you are done.
But if they do not, which I believe is true in your case, you need some extra steps.
First of all you have to sample the time-domain signal with the maximum sampling rate required by Z-Wave. For example, if the spectrum bandwidths for the 3 different data rates are 4, 2 and 1 MHz, you have to sample at 4e6 samples/s. Then you perform sample rate conversion (also known as re-sampling) for each of the different streams. So for the second rate you may want to re-sample your input stream of 4e6 samples/s down to 2e6 samples/s.
Then you connect the re-sampled streams to the corresponding decoding procedures:
                                          +---------------+
                  +---------------------->| Rest blocks 0 |
                  |                       |               |
                  |                       +---------------+
                  |
+------------+    |    +--------------+   +---------------+
|            |    |    |              |   | Rest blocks 1 |
|   Source   +----+----> Resampler 1  +-->|               |
|            |    |    |              |   |               |
+------------+    |    +--------------+   +---------------+
                  |
                  |    +--------------+   +---------------+
                  |    |              |   | Rest blocks 2 |
                  +----> Resampler 2  +-->|               |
                       |              |   |               |
                       +--------------+   +---------------+
GNU Radio already ships with some resamplers; you can start with the Rational Resampler block.
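A rough flow-graph sketch in Python (assuming GNU Radio 3.7 with gr-osmosdr; the sample rate and resampling ratios are placeholders, and the null sinks stand in for the existing per-rate GFSK demodulator / packet sink chains):

from gnuradio import gr, blocks, filter
import osmosdr

class zwave_multirate(gr.top_block):
    def __init__(self):
        gr.top_block.__init__(self, "Z-Wave multi-rate sniffer")

        samp_rate = 4e6                      # assumed: high enough for the fastest rate
        self.src = osmosdr.source(args="numchan=1 hackrf=0")
        self.src.set_sample_rate(samp_rate)
        self.src.set_center_freq(868.4e6)    # EU Z-Wave frequency

        # one resampler per lower data rate, following the 4/2/1 MHz example above;
        # the exact ratios depend on what each decoder chain expects
        self.resamp_40k = filter.rational_resampler_ccc(interpolation=1, decimation=2)  # 4e6 -> 2e6
        self.resamp_9k6 = filter.rational_resampler_ccc(interpolation=1, decimation=4)  # 4e6 -> 1e6

        # placeholders for the three decoder chains (GFSK demod -> packet sink);
        # reuse the blocks from the existing 40 kbps graph here
        self.sink_100k = blocks.null_sink(gr.sizeof_gr_complex)
        self.sink_40k  = blocks.null_sink(gr.sizeof_gr_complex)
        self.sink_9k6  = blocks.null_sink(gr.sizeof_gr_complex)

        self.connect(self.src, self.sink_100k)                    # fastest rate, no resampling
        self.connect(self.src, self.resamp_40k, self.sink_40k)    # 40 kbps branch
        self.connect(self.src, self.resamp_9k6, self.sink_9k6)    # 9.6 kbps branch

if __name__ == '__main__':
    tb = zwave_multirate()
    tb.run()

Each Packet sink at the end of a branch can then publish its messages to the same Socket PDU block, as described above.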
