Streaming audio from S3 via Express server takes too long - node.js

I am trying to stream audio files from S3 to my React client by making a request to my Node/Express server. I managed to implement something that works, but I am not sure if I am actually streaming the file here, or simply downloading it. I suspect I might be downloading the file, because my requests to the server take a long time to come back:
Established database connection.
Server listening on port 9000!
::1 - - [18/Apr/2017:21:13:43 +0000] "GET / HTTP/1.1" 200 424 6.933 ms
::1 - - [18/Apr/2017:21:13:43 +0000] "GET /static/js/main.700ba5c4.js HTTP/1.1" 200 217574 1.730 ms
::1 - - [18/Apr/2017:21:13:43 +0000] "GET /index.css HTTP/1.1" 200 - 8.722 ms
Server received a request: GET /tracks
::1 - - [18/Apr/2017:21:13:43 +0000] "GET /tracks HTTP/1.1" 304 - 41.468 ms
Server received a request: GET /tracks/1/stream
::1 - - [18/Apr/2017:21:14:13 +0000] "GET /tracks/1/stream HTTP/1.1" 200 - 636.249 ms
Server received a request: GET /tracks/2/stream
Database query threw an error: ETIMEDOUT
Note the 636.249 ms!
Can you guys tell if I am doing anything wrong here? I pasted the code from my current approach below; it does the following:
Client makes a fetch call to /tracks/id/stream
Server queries database to get the track
Server uses downloadStream (from the s3 package) to get the file
Server pipes the data to the client
Client receives the data as an ArrayBuffer
Client decodes the buffer and passes it to an AudioBufferSourceNode
AudioBufferSourceNode plays the audio
The server-side:
app.get('/tracks/:id/stream', (req, res) => {
  const id = req.params.id
  // Query the database for the requested track
  database.query(`SELECT * FROM Tracks WHERE id = ${id}`, (error, results, fields) => {
    // Upon failure...
    if (error) {
      res.sendStatus(500)
    }
    // Upon success...
    const params = {
      Bucket: bucketName, // use bucketName defined elsewhere
      Key: results[0].key // use the key from the track object
    };
    // Download stream and pipe to client
    const stream = client.downloadStream(params)
    stream.pipe(res)
  })
});
The client-side fetch call:
const URL = `/tracks/${id}/stream`
const options = { method: 'GET' }

fetch(URL, options)
  .then(response => response.arrayBuffer())
  .then(data => {
    // do some stuff
    AudioPlayer.play(data)
  })
The client-side AudioPlayer, responsible for handling the actual AudioBufferSourceNode:
const AudioPlayer = {
  play: function(data) {
    // Decode the audio data
    context.decodeAudioData(data, buffer => {
      // Create a new buffer source
      source = context.createBufferSource()
      source.buffer = buffer
      source.connect(context.destination)
      // Start playing immediately
      source.start(context.currentTime)
    })
  },
  ...

There's a lot wrong here, so let's just go through it piece by piece, whether it was related to your original question or not.
database.query(`SELECT * FROM Tracks WHERE id = ${id}`, (error, results, fields) => {
In this line, you open yourself up to SQL injection attacks. Never concatenate arbitrary data into the context of a query (or any other context for that matter) without proper escaping. Whatever database library you're using will have a parameterized method that you should be utilizing.
I suspect I might be downloading the file, because my requests to the server take a long time to come back
Who knows... you didn't show us where you're doing the logging, so it's hard to say whether the logged line is written before or after the request is complete. One thing that is clear, however, is that the response has to at least begin, otherwise the response status code wouldn't be known. A 600 ms time to the first byte of a resource from S3 isn't unheard of anyway.
Server uses downloadStream (from the s3 package) to get the file
Server pipes the data to the client
You're wasting a lot of bandwidth with this. Rather than fetching the file and relaying it to the client, what you should do is sign a temporary URL with a 15-minute expiration or so, and redirect the client to it. The client will follow the redirect and now S3 is responsible for handling your clients. It will cost you half as much bandwidth, less CPU resource, and will be delivered from a location likely closer to your users. You can create this signed URL with the AWS JS SDK.
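A rough sketch with the v2 aws-sdk (bucketName and the database lookup are assumed to be the ones from your existing route, so treat the details as placeholders):
const AWS = require('aws-sdk')
const s3 = new AWS.S3()

app.get('/tracks/:id/stream', (req, res) => {
  // ...look up the track row for req.params.id here, exactly as in your current handler...
  const params = {
    Bucket: bucketName,   // assumed to be defined elsewhere, as in your code
    Key: results[0].key,  // the key from the track row returned by the query
    Expires: 60 * 15      // the signed URL stays valid for 15 minutes
  }
  s3.getSignedUrl('getObject', params, (err, url) => {
    if (err) return res.sendStatus(500)
    res.redirect(url) // the client follows this redirect straight to S3
  })
})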
Client receives the data as an ArrayBuffer
There's no streaming happening here. Your client is downloading the entire resource before it's playing anything.
What you should do is create a normal Audio instance. It will automagically follow your redirect from your Node.js app to your signed S3 URL and handle all the buffering and streaming for you.
let a = new Audio('/tracks/' + encodeURIComponent(id) + '/stream');
a.play();

Related

busboy wait for field data before accepting file upload

Is something like this even possible, or are there better ways to do this? Is what I'm doing even a good idea, or is this a bad approach?
What I want to do is upload a file to my nodejs server. Along with the file I want to send some meta data. The meta data will determine if the file can be saved and the upload accepted, or if it should be rejected with a 403 response.
I am using busboy and I am sending FormData from my client side.
The example below is very much simplified:
Here is a snippet of the client side code.
I am appending the file as well as the meta data to the form
const formData = new FormData();
formData.append('name', JSON.stringify({name: "John Doe"}));
formData.append('file', this.selectedFile, this.selectedFile.name);
Here is the nodejs side:
exports.Upload = async (req, res) => {
  try {
    var acceptUpload = false;
    const bb = busboy({ headers: req.headers });
    bb.on('field', (fieldname, val) => {
      // Verify data here before accepting file upload
      var data = JSON.parse(val);
      if (data.name === 'John Doe') {
        acceptUpload = true;
      } else {
        acceptUpload = false;
      }
    });
    bb.on('file', (fieldname, file, filename, encoding, mimetype) => {
      if (acceptUpload) {
        const saveTo = '/upload/file.txt'
        file.pipe(fs.createWriteStream(saveTo));
      } else {
        response = {
          message: 'Not Authorized'
        }
        res.status(403).json(response);
      }
    });
    bb.on('finish', () => {
      response = {
        message: 'Upload Successful'
      }
      res.status(200).json(response);
    });
    req.pipe(bb);
  } catch (error) {
    console.log(error)
    response = {
      message: error.message
    }
    res.status(500).json(response);
  }
}
So basically, is it even possible for the 'field' event-handler to wait for the 'file' event handler? How could one verify some meta data before accepting a file upload?
How can I do validation of all data in the form data object, before accepting the file upload? Is this even possible, or are there other ways of uploading files with this kind of behaviour? I am considering even adding data to the request header, but this does not seem like the ideal solution.
Update
As I suspected, nothing is waiting. Whichever way I try, the upload first has to complete; only then is it rejected with a 403.
Another Update
I've tried the same thing with multer and have similar results. Even when I can do the validation, the file is completely uploaded from the client side. Only once the upload is complete is the request rejected. The file, however, never gets stored, even though it is uploaded in its entirety.
With busboy, nothing is written to the server if you do not execute the statement file.pipe(fs.createWriteStream(saveTo));
You can prevent more data from even being uploaded to the server by executing the statement req.destroy() in the .on("field", ...) or the .on("file", ...) event handler, even after you have already evaluated some of the fields. Note however, that req.destroy() destroys not only the current HTTP request but the entire TCP connection, which might otherwise have been reused for subsequent HTTP requests. (This applies to HTTP/1.1, in HTTP/2 the relationship between connections and requests is different.)
At any rate, it has no effect on the current HTTP request if everything has already been uploaded. Therefore, whether this saves any network traffic depends on the size of the file. And if the decision whether to req.destroy() involves an asynchronous operation, such as a database lookup, then it may also come too late.
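A minimal sketch of that idea, reusing the field check from the question (whether it actually saves traffic depends on the timing described above):
bb.on('field', (fieldname, val) => {
  const data = JSON.parse(val);
  if (data.name !== 'John Doe') {
    // Tear down the connection so the client stops sending the file body
    req.destroy();
  }
});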
Compare
> curl -v -F name=XXX -F file=@<small file> http://your.server
* We are completely uploaded and fine
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server
with
> curl -v -F name=XXX -F file=@<large file> http://your.server
> Expect: 100-continue
< HTTP/1.1 100 Continue
* Send failure: Connection was reset
* Closing connection 0
curl: (55) Send failure: Connection was reset
Note that the client sets the Expect header before uploading a large file. You can use that fact in connection with a special request header name in order to block the upload completely:
http.createServer(app)
  .on("checkContinue", function(req, res) {
    if (req.headers["name"] === "John Doe") {
      res.writeContinue(); // sends HTTP/1.1 100 Continue
      app(req, res);
    } else {
      res.statusCode = 403;
      res.end("Not authorized");
    }
  })
  .listen(...);
But for small files, which are uploaded without the Expect request header, you still need to check the name header in the app itself.

With a node HTTP2 push stream, what is the purpose of the request header vs. response header?

When a node HTTP2 server creates a new push stream, what is the purpose of the request header vs. response header?
Server code:
http2Server.on('stream', (stream) => {
  stream.pushStream({ ':path': '/' }, (err, pushStream, headers) => { // send request headers
    pushStream.respond({ ':status': 200 }); // send response headers
    pushStream.end('some pushed data');
  });
  stream.end('some data');
});
Client code:
clientHttp2Session.on('stream', (pushedStream, requestHeaders) => { // receive request headers
  pushedStream.on('push', (responseHeaders) => { // receive response headers
    /* Process response headers */
  });
  pushedStream.on('data', (chunk) => { /* handle pushed data */ });
});
Both of these must be sent before any data is sent, so it seems one of them is redundant?
MDN states:
Request headers contain more information about the resource to be fetched, or about the client requesting the resource.
Response headers hold additional information about the response, like its location or about the server providing it.
However, that seems to be slanted towards a more client request, server response model - which doesn't apply to push.
The "request header" as you call it, maps to the PUSH_PROMISE frame in HTTP/2 (you can see this in the NodeJS source code).
A PUSH_PROMISE frame is defined in the HTTP/2 spec, and is used to tell the client "hey I'm going to pretend you sent this request, and send you a response to that 'fake request' next."
It is used to inform the client that this response is on its way, so that, if it needs the resource, it doesn't bother making another request for it.
It also allows the client to cancel this push request with a RST_STREAM frame to say "No thanks, I don't want that." This may be because the server is pushing a resource that the client already has in its cache, or for some other reason.
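For example, with Node's http2 client a pushed stream can be declined by closing it, which sends an RST_STREAM frame (a rough sketch; the URL is made up):
const http2 = require('http2');

const session = http2.connect('https://localhost:8443');

session.on('stream', (pushedStream, requestHeaders) => {
  // Decline the push, e.g. because the resource is already cached locally.
  // close() sends an RST_STREAM frame with the given error code.
  pushedStream.close(http2.constants.NGHTTP2_CANCEL);
});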

Subscribe to long-processing request feedback on NodeJS server from client

I have created a Node JS server which does the following:
Uploads media files (videos and images) to the server using multer
If the media is an image, then resize it using sharp
If the media is a video, then resize and compress it using fluent-ffmpeg
Upload files to Firebase storage for backup
All this is now working fine. The problem is that when an uploaded file is big, processing the request takes a long time. So I want to show some progress on the client side as below:
State 1. The media is uploading -> n%
State 2. The media is compressing
State 3. The media is uploading to cloud -> n%
State 4. Result -> JSON = {status: "ok", uri: .., cloudURI: .., ..}
The Firebase storage API has functionality like this when creating an upload task, as shown below:
let uploadTask = imageRef.put(blob, { contentType: mime });

uploadTask.on('state_changed', (snapshot) => {
  if (typeof snapshot.bytesTransferred == "number") {
    let progress = (snapshot.bytesTransferred / snapshot.totalBytes) * 100;
    console.log('Upload is ' + progress + '% done');
  }
});
I have found that it is possible to realize this using websockets, but I am interested in whether there are other methods to do it.
The problem is described also here: http://www.tugberkugurlu.com/archive/long-running-asynchronous-operations-displaying-their-events-and-progress-on-clients
And one of the methods is described in Accessing partial response using AJAX or WebSockets?, but I am looking for a more flexible and professional solution.
I have solved this problem using GraphQL Subscriptions. The same approach can be realized using WebSockets. The steps to solve this problem are as below:
Post files to upload server
Generate operation unique ID and send it as response to the client
Ex: response = {op: "A78HNDGS89NSNBDV7826HDJ"}
Create a subscription by opID
Ex: subscription { uploadStatus(op: "A78HNDGS89NSNBDV7826HDJ") { status }}
Every time the status changes, send a request to the GraphQL endpoint, which publishes the data to the pubsub. To send a GraphQL request from a nodejs server you can use https://github.com/prisma-labs/graphql-request
Ex:
const { request } = require('graphql-request');

const GQL_URL = "YOUR_GQL_ENDPOINT";
// The argument name here depends on your schema
const query = `query {
  notify(status: "Status text goes here")
}`;

request(GQL_URL, query).then(data =>
  console.log(data)
);
The notify resolver function publishes the data to the pubsub:
context.pubsub.publish('uploadStatus', {
  status: "Status text"
});
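For completeness, a rough sketch of the matching subscription resolver with the graphql-subscriptions package (the channel name and schema are assumptions, not the exact ones from my project):
const { PubSub } = require('graphql-subscriptions');
const pubsub = new PubSub();

const resolvers = {
  Subscription: {
    uploadStatus: {
      // Every publish to the 'uploadStatus' channel is pushed to subscribed clients
      subscribe: () => pubsub.asyncIterator('uploadStatus')
    }
  }
};
In practice you would filter by the op argument (for example with withFilter from the same package) so each client only receives updates for its own upload.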
If you have more complicated architecture, you can use message brokers like RabbitMQ, Kafka etc.
If someone knows other solutions, please let us know )

How to cancel previous ES request before making a new one

I am using NodeJS with the https://www.npmjs.com/package/elasticsearch package
The use case is like this: when a link is clicked on the page, I make a request to the NodeJS server, which in turn uses the ES node package to fetch the data from the ES server and sends the data back to the client.
The issue is that when two requests are made in quick succession (two links clicked in a short span), the response of the first request and then the response of the second request reach the client. The UI depends on this response, and I would like to show only the second request's response.
So, the question is: is there any way to cancel the previous request made to the ES server before starting a new one?
Code:
ES Client:
var elasticsearch = require('elasticsearch');

var client = new elasticsearch.Client({
  host: 'HostName',
  log: 'trace'
});
Route:
app.get('/data/:reportName', dataController.getReportData);
DataController:
function getReportData(req, res) {
  const query = getQueryForReport(req.params.reportName)
  client.search(query)
    .then(function(response) {
      res.json(parseResponse(response))
    })
}
So, the same API /data/reportName is called twice in succession with different reportNames. I would like to send only the second report's data back and cancel out the first request.
If you're only concerned about the UX, rather than stressing your ES, then aborting the ajax request is what you want.
Since you didn't post your client side code, I'll give you a generic example:
var xhr = $.ajax({
  type: "GET",
  url: "searching_route",
  data: "name=John&location=Boston",
  success: function(msg){
    alert( "Data Saved: " + msg );
  }
});

//kill the request
xhr.abort()
Remember that aborting the request may not prevent the elasticsearch query from being processed, but will prevent the client from receiving the data.
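If the client uses fetch instead of jQuery, the same idea looks roughly like this (a sketch against your /data/:reportName route):
let controller = null;

function fetchReport(reportName) {
  // Abort the previous in-flight request, if any
  if (controller) controller.abort();
  controller = new AbortController();

  return fetch(`/data/${encodeURIComponent(reportName)}`, { signal: controller.signal })
    .then(response => response.json())
    .catch(err => {
      if (err.name !== 'AbortError') throw err; // aborted requests are expected here
    });
}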

Node.js - Stream Binary Data Straight from Request to Remote server

I've been trying to stream binary data (PDF, images, other resources) directly from a request to a remote server but have had no luck so far. To be clear, I don't want to write the document to any filesystem. The client (browser) will make a request to my node process which will subsequently make a GET request to a remote server and directly stream that data back to the client.
var request = require('request');

app.get('/message/:id', function(req, res) {
  // db call for specific id, etc.
  var options = {
    url: 'https://example.com/document.pdf',
    encoding: null
  };

  // First try - unsuccessful
  request(options).pipe(res);

  // Second try - unsuccessful
  request(options, function (err, response, body) {
    var binaryData = body.toString('binary');
    res.header('content-type', 'application/pdf');
    res.send(binaryData);
  });
});
Putting both data and binaryData in a console.log shows that the proper data is there, but the PDF that is subsequently downloaded is corrupt. I can't figure out why.
Wow, never mind. Found out Postman (Chrome App) was hijacking the request and response somehow. The // First Try example in my code excerpt works properly in browser.
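For anyone else piping like the // First try example, it may also be worth forwarding the upstream content type instead of hard-coding it; a small sketch using the request stream's 'response' event (this header handling is an addition, not part of the original fix):
// First try, but forwarding the remote server's content-type to the client
request(options)
  .on('response', function (upstream) {
    res.set('content-type', upstream.headers['content-type']);
  })
  .pipe(res);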
