How does one send blobs or arraybuffers through Pusher?

The Pusher documentation states that it does not support binary websocket frames. Why is this the case and how would one send non-text data, e.g. blobs, through Pusher?

You need to find a way to serialize your data in order to send it over. For binary data transmitted over text formats like JSON/XML, the most common way is to use base64.
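As a rough illustration of the base64 approach (assuming the official pusher npm package on the server; the channel and event names are made up, and keep in mind that Pusher enforces a per-message size limit, so large blobs still need to be chunked or sent out of band):

    // Server side (Node.js): encode the raw bytes as base64 so they survive
    // Pusher's text/JSON frames.
    const Pusher = require("pusher");

    const pusher = new Pusher({
      appId: "APP_ID",
      key: "APP_KEY",
      secret: "APP_SECRET",
      cluster: "APP_CLUSTER",
    });

    function sendBinary(arrayBuffer) {
      const payload = { data: Buffer.from(arrayBuffer).toString("base64") };
      return pusher.trigger("binary-channel", "blob-event", payload);
    }

    // Client side: decode the base64 string back into bytes / a Blob.
    // channel.bind("blob-event", ({ data }) => {
    //   const bytes = Uint8Array.from(atob(data), (c) => c.charCodeAt(0));
    //   const blob = new Blob([bytes]);
    // });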

Related

Bulk Data Transfer through REST API

I have been informed that "REST API is not made / good for Bulk Data Transfer. Its a proven fact". I tried searching Google for this, but was unable to find any fruitful answer. Can anyone let me know whether this statement is actually true or not? If it is true, then why?
Note: I am not exposing bulk data (50 million rows from a database) over the web. I am saving it to the server in JSON format (approx. 3 GB file size) and transferring it to another system. I am using Node.js for this purpose. Network bandwidth is not an issue for transferring the file.
There is nothing wrong with exposing an endpoint that returns huge data.
The concern might be how you are sending that data, since memory could become an issue.
Why not consider streaming the data? That way the memory needed at any moment is only the chunk of data currently being streamed.
Node.js has many ways to pipe data into the response object; you can also consider the JSONStream module from npmjs.org, as in the sketch below.
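A minimal sketch of that streaming idea, assuming the JSONStream package and Node's built-in http module; the object-mode Readable below is just a stand-in for whatever database cursor or row stream you actually have:

    const http = require("http");
    const { Readable } = require("stream");
    const JSONStream = require("JSONStream");

    // Stand-in for a DB cursor: produces rows lazily, one at a time.
    function rowStream(count) {
      let i = 0;
      return new Readable({
        objectMode: true,
        read() {
          if (i < count) {
            this.push({ id: i, value: "row-" + i });
            i += 1;
          } else {
            this.push(null); // end of data
          }
        },
      });
    }

    http
      .createServer((req, res) => {
        res.setHeader("Content-Type", "application/json");
        // JSONStream.stringify() emits a valid JSON array incrementally,
        // so only one row at a time is held in memory.
        rowStream(1000000).pipe(JSONStream.stringify()).pipe(res);
      })
      .listen(3000);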

Using ConvertRecord on compressed input

In Apache NiFi I can have an input with compressed data that's unpacked using the UnpackContent processor and then connect the output to further record processing or otherwise.
Is it possible to operate directly on the compressed input? In a normal programming environment, one might easily wrap the record processor in a container that more or less transparently unpacks the data in a stream-processing fashion.
If this is not supported out of the box, would it be reasonable to implement a processor that extends for example ConvertRecord to accept a compressed input?
The motivation for this is to work efficiently with large CSV data files, converting them into a binary record format without having to spill the uncompressed CSV data to disk.
Compressed input for record processing is not supported currently, but is a great idea for improvement.
Instead of implementing it in a particular processor (e.g. ConvertRecord), I'd suggest the following two approaches:
Create CompressedRecordReaderFactory implementing RecordReaderFactory
Like a Java compressed stream such as GZIPInputStream, CompressedRecordReaderFactory would wrap another RecordReaderFactory; the user specifies the compression type (or the reader factory may be able to implement auto-detect capability by looking at FlowFile attributes ... etc.)
The benefit of this approach is that once we add it, we can support reading compressed input streams with any existing RecordReader and any processor using the Record API, not only CSV but also XML, JSON ... etc.
Wrap InputStream at each RecordReaderFactory (e.g. CSVReader)
We could implement the same thing in each RecordReaderFactory and add support for compressed input gradually
This may provide a better UX because no additional ControllerService has to be configured
What do you think? For further discussion, I suggest creating a NiFi JIRA ticket. If you're willing to contribute, that would be even better.
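For reference, outside NiFi the question's "transparently unpack in a stream-processing fashion" idea looks roughly like this in Node.js terms (the actual NiFi change would of course live in Java inside a RecordReaderFactory; the file name and the naive CSV split are just for illustration):

    const fs = require("fs");
    const zlib = require("zlib");
    const readline = require("readline");

    // The gunzip transform wraps the raw input the same way the proposed
    // CompressedRecordReaderFactory would wrap another RecordReaderFactory,
    // so the uncompressed CSV never has to be spilled to disk.
    const lines = readline.createInterface({
      input: fs.createReadStream("large-data.csv.gz").pipe(zlib.createGunzip()),
      crlfDelay: Infinity,
    });

    lines.on("line", (line) => {
      const fields = line.split(","); // naive CSV split, illustration only
      // ... hand the record to whatever performs the binary conversion
    });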

How is the data chunked when using UploadFromStreamAsync and DownloadToStreamAsync when uploading to block blob

I just started learning about Azure Blob storage. I have come across various ways to upload and download the data. One thing that puzzles me is when to use what.
I am mainly interested in PutBlockAsync in conjunction with PutBlockListAsync and UploadFromStreamAsync.
As far as I understand, when using PutBlockAsync it is up to the user to break the data into chunks and make sure each chunk is within the Azure block blob size limits. There is an ID associated with each chunk that is uploaded. At the end, all the IDs are committed.
When using UploadFromStreamAsync, how does this work? Who handles chunking the data and uploading it?
Why not convert the data into a Stream and use UploadFromStreamAsync all the time, avoiding the separate commit step?
You can use Fiddler and observe what happens when you use UploadFromStreamAsync.
If the file is larger than 256 MB, such as 500 MB, the Put Block and Put Block List APIs are called in the background (they are also the APIs called when you use the PutBlockAsync and PutBlockListAsync methods).
If the file is smaller than 256 MB, then UploadFromStreamAsync will call the Put Blob API in the background.
I used UploadFromStreamAsync to upload a file whose size is 600 MB, then opened Fiddler.
Here are some findings from Fiddler:
1. The large file is broken into small (4 MB) blocks, one by one, and each calls the Put Block API in the background.
2. At the end, the Put Block List API is called.
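To make the two styles concrete, here is a rough sketch of the block-by-block pattern that the SDK performs for you, written against the raw Blob REST API (Put Block followed by Put Block List). It assumes Node 18+ (built-in fetch) and a block blob URL that already carries a SAS token with write permission; the 4 MB block size matches what shows up in Fiddler, and the exact size thresholds depend on the SDK version and settings:

    const fsp = require("fs/promises");

    const BLOCK_SIZE = 4 * 1024 * 1024; // 4 MB per block

    // blobUrl is assumed to look like
    // "https://account.blob.core.windows.net/container/big.bin?sv=...&sig=..."
    async function uploadInBlocks(blobUrl, filePath) {
      const file = await fsp.open(filePath);
      const blockIds = [];
      try {
        const buf = Buffer.alloc(BLOCK_SIZE);
        for (let n = 0; ; n++) {
          const { bytesRead } = await file.read(buf, 0, BLOCK_SIZE, n * BLOCK_SIZE);
          if (bytesRead === 0) break;

          // All block IDs of a blob must share the same length before base64 encoding.
          const blockId = Buffer.from(String(n).padStart(6, "0")).toString("base64");
          blockIds.push(blockId);

          // Put Block: stage one chunk under its block ID.
          await fetch(`${blobUrl}&comp=block&blockid=${encodeURIComponent(blockId)}`, {
            method: "PUT",
            body: Buffer.from(buf.subarray(0, bytesRead)),
          });
        }
      } finally {
        await file.close();
      }

      // Put Block List: commit the staged blocks in order.
      const blockListXml =
        '<?xml version="1.0" encoding="utf-8"?><BlockList>' +
        blockIds.map((id) => `<Latest>${id}</Latest>`).join("") +
        "</BlockList>";
      await fetch(`${blobUrl}&comp=blocklist`, { method: "PUT", body: blockListXml });
    }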

Allowing users to upload csv data into parse app

I need to let my users upload CSV data, such as contacts or products, into my app. There are a number of web-based client libraries that can handle the client-side logic. What I am looking for is a fast, reliable way to get the data into a Parse class.
I have not written any code yet. Right now I am trying to work out the best process for this. I have played with Parse batch save and know it is not reliable for thousands of inserts. My thought is to upload the CSV, store it in a Parse class "uploads", and then have a background job lift out, say, 100 or 1,000 records at a time and insert them, then send a notification when it is done.
Is this the best option, or has anybody found a simpler, faster solution?
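To sketch the background-job idea described above (assuming Parse Server Cloud Code, where the Parse global is provided by the runtime; the "uploads" class with a parsed "rows" array field, the target "Contact" class, and the batch size are all placeholders):

    const BATCH_SIZE = 100;

    // Hypothetical Cloud Code job: drain one pending upload in small batches
    // instead of relying on a single huge batch save.
    Parse.Cloud.job("importUploadedCsv", async () => {
      const pending = new Parse.Query("uploads");
      pending.equalTo("processed", false);
      const upload = await pending.first({ useMasterKey: true });
      if (!upload) return "nothing to import";

      const rows = upload.get("rows") || []; // CSV rows stored on the upload object
      for (let i = 0; i < rows.length; i += BATCH_SIZE) {
        const batch = rows.slice(i, i + BATCH_SIZE).map((row) => {
          const contact = new Parse.Object("Contact");
          contact.set(row); // assumes CSV headers match the class's field names
          return contact;
        });
        await Parse.Object.saveAll(batch, { useMasterKey: true });
      }

      upload.set("processed", true);
      await upload.save(null, { useMasterKey: true });
      // A push or email notification could be sent here when the import finishes.
      return `imported ${rows.length} rows`;
    });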

How can I buffer the twitter stream with redis before inserting into rethinkdb?

Where I'm At
I have a simple Node.js Twitter stream consumer that tracks various hashtags. Oftentimes these are trending hashtags, which means a high volume of Twitter JSON is streaming into my consumer. I don't do any processing of the Twitter JSON in the consumer.
What I Want
I want to store the tweet json objects in rethinkdb.
Assumptions
Due to the volume (and unpredictability of said volume) of tweets, I should avoid inserting the tweet json objects into rethinkdb as they are consumed (since the rate at which the tweets enter the consumer might be faster than the rate at which rethinkdb can write those tweets).
Since Redis is definitely fast enough to handle the writes of the tweet json objects as they are consumed, I can push the tweet json objects directly to redis and have another process pull those tweets out and insert them into rethinkdb.
What I Hope To Learn
Are my assumptions correct?
Does this architecture make sense? If not, can you suggest a better alternative?
If my assumptions are correct and this architecture makes sense,
a. What is the best way of using redis as a buffer for the tweets?
b. What is the best way of reading from (and updating/clearing) the redis buffer in order to perform the inserts into rethinkdb?
We use this kind of architecture in production. If the amount of data you are going to handle doesn't exceed the max memory limit of Redis, you can proceed this way. You also need to take care of downtime.
What is the best way of using redis as a buffer for the tweets?
You can use a Redis list as a queue, where your producer keeps pushing onto the head.
Your consumer then consumes from the tail and writes the items into your DB.
http://redis.io/commands#list
You can use the approach from "Redis Pop list item By numbers of items", since you have a similar requirement (the producer is heavy and the consumer needs to consume a little faster than popping one item at a time); see the sketch below.
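A rough sketch of that producer/consumer split, assuming the ioredis and rethinkdb npm packages; the key name, table name, batch size, and one-second drain interval are all made up. The consumer drains the tail in batches with LRANGE + LTRIM inside a MULTI, which is the "pop by number of items" trick linked above:

    const Redis = require("ioredis");
    const r = require("rethinkdb");

    const redis = new Redis(); // localhost:6379
    const QUEUE_KEY = "tweets:buffer";
    const BATCH_SIZE = 200;

    // Producer: call this from the twitter stream consumer for every tweet.
    function bufferTweet(tweetJson) {
      return redis.lpush(QUEUE_KEY, JSON.stringify(tweetJson));
    }

    // Consumer: pop up to BATCH_SIZE items off the tail and bulk-insert them.
    async function drainOnce(conn) {
      const [[, items]] = await redis
        .multi()
        .lrange(QUEUE_KEY, -BATCH_SIZE, -1)   // read the oldest items (tail)
        .ltrim(QUEUE_KEY, 0, -BATCH_SIZE - 1) // and remove them atomically
        .exec();

      if (items.length === 0) return 0;
      const tweets = items.map((s) => JSON.parse(s));
      await r.table("tweets").insert(tweets).run(conn);
      return tweets.length;
    }

    async function main() {
      const conn = await r.connect({ host: "localhost", port: 28015, db: "twitter" });
      setInterval(() => drainOnce(conn).catch(console.error), 1000);
    }

    main().catch(console.error);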
