NodeJs: pipe tar file from http response to tar.Extract? - node.js

Following task:
We have a nodejs client (daemon) downloading audio data which is contained in a tar file (NOT tar.gz - it really is uncompressed!) from Amazon S3.
At the moment, we glue the chunks together in the 'data' handler of the response, save the whole buffer to disc as a file and then call tar.Extract(inPath, outPath) on the newly created file.
I'd like to skip the process of writing the data to disc and instead pass the data from the response directly to tar.Extract().
This is my handler code:
var readResponseData = function (response) {
response.setEncoding('binary');
response.pipe(tar.Extract( { path: '/tmp/testyeah' }));
....
....
I always get "Error: Invalid tar file"
I also tried without success the suggestions from this page (https://groups.google.com/forum/#!topic/nodejs/A7jz6b9daZc) although that should apply to compressed tar files and not the uncompressed ones that we use.
Any suggestions?

Related

Download multiple file using wget by looping through a text file of IDs

I am trying to download multiple files using wget. I have a text file containing the ID of the files that I want to download (mannifest.tsv, one line for one ID).
Currently, I am using the below command:
while read id; do wget https://target-data.nci.nih.gov/Public/AML/miRNA-seq/L3/expression/BCCA/TARGET-FHCRC/$id.txt; done < manifest.tsv
However, I got the following error:
--2022-08-12 23:43:28-- https://target-data.nci.nih.gov/Public/AML/miRNA-seq/L3/expression/BCCA/TARGET-FHCRC/TARGET-00-BM3897-14A-01R.isoform.quantification%0D.txt
Resolving target-data.nci.nih.gov... 129.43.254.217, 2607:f220:41d:21c1::812b:fed9
Connecting to target-data.nci.nih.gov|129.43.254.217|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-08-12 23:43:30 ERROR 404: Not Found.
Probably because when I loop through manifest.tsv file, the new line character was also read, therefore, the file ID is not correct anymore.
Could someone help me? I really appreciate!

FFmpeg.wasm - retrieve video metadata with node js

I want to retrieve metadata from a converted video using FFmpeg web assembly package.
I tried to extract a metadata.txt file to my current directory but nothing was return from my command
in my index.js file:
const { createFFmpeg, fetchFile } = require('#ffmpeg/ffmpeg')
const ffmpeg = createFFmpeg({ log: true })
await ffmpeg.load()
await ffmpeg.run('-i', "myvideo.avi'", '-f', 'ffmetadata', 'metadata.txt')
I got error :
Output file is empty, nothing was encoded.
Is this kind of command available on this wasm package? Found nothing relevant on their documentation and I want to avoid the global FFmpeg installation
Wasm doesn't read from the file system. You need to copy files into wasm virtual file system for it to be able to reference files.

AWS Lambda function - convert PDF to Image

I am developing application where user can upload some drawings in pdf format. Uploaded files are stored on S3. After uploading, files has to be converted to images. For this purpose I have created lambda function which downloads file from S3 to /tmp folder in lambda execution environment and then I call ‘convert’ command from imagemagick.
convert sourceFile.pdf targetFile.png
Lambda runtime environment is nodejs 4.3. Memory is set to 128MB, timeout 30 sec.
Now the problem is that some files are converted successfully while others are failing with the following error:
{ [Error: Command failed: /bin/sh -c convert /tmp/sourceFile.pdf
/tmp/targetFile.png convert: %s' (%d) "gs" -q -dQUIET -dSAFER -dBATCH
-dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" "-sOutputFile=/tmp/magick-QRH6nVLV--0000001" "-f/tmp/magick-B610L5uo"
"-f/tmp/magick-tIe1MjeR" # error/utility.c/SystemCommand/1890.
convert: Postscript delegate failed/tmp/sourceFile.pdf': No such
file or directory # error/pdf.c/ReadPDFImage/678. convert: no images
defined `/tmp/targetFile.png' #
error/convert.c/ConvertImageCommand/3046. ] killed: false, code: 1,
signal: null, cmd: '/bin/sh -c convert /tmp/sourceFile.pdf
/tmp/targetFile.png' }
At first I did not understand why this happens, then I tried to convert problematic files on my local Ubuntu machine with the same command. This is the output from terminal:
**** Warning: considering '0000000000 XXXXX n' as a free entry.
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Mac OS X 10.10.5 Quartz PDFContext <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
So the message was very clear, but the file gets converted to png anyway. If I try to do convert source.pdf target.pdf and after that convert target.pdf image.png, file is repaired and converted without any errors. This doesn’t work with lambda.
Since the same thing works on one environment but not on the other, my best guess is that the version of Ghostscript is the problem. Installed version on AMI is 8.70. On my local machine Ghostsript version is 9.18.
My questions are:
Is the version of ghostscript problem? Is this a bug with older
version of ghostscript? If not, how can I tell ghostscript (with or
without using imagemagick) to repair or ignore errors like it does on
my local environment?
If the old version is a problem, is it possible to build ghostscript
from source, create nodejs module and then use that version of
ghostscript instead the one that is installed?
Is there an easier way to convert pdf to image without using
imagemagick and ghostscript?
UPDATE
Relevant part of lambda code:
var exec = require('child_process').exec;
var AWS = require('aws-sdk');
var fs = require('fs');
...
var localSourceFile = '/tmp/sourceFile.pdf';
var localTargetFile = '/tmp/targetFile.png';
var writeStream = fs.createWriteStream(localSourceFile);
writeStream.write(body);
writeStream.end();
writeStream.on('error', function (err) {
console.log("Error writing data from s3 to tmp folder.");
context.fail(err);
});
writeStream.on('finish', function () {
var cmd = 'convert ' + localSourceFile + ' ' + localTargetFile;
exec(cmd, function (err, stdout, stderr ) {
if (err) {
console.log("Error executing convert command.");
context.fail(err);
}
if (stderr) {
console.log("Command executed successfully but returned error.");
context.fail(stderr);
}else{
//file converted successfully - do something...
}
});
});
You can find a compiled version of Ghostscript for Lambda in the following repository.
You should add the files to the zip file that you are uploading as the source code to AWS Lambda.
https://github.com/sina-masnadi/lambda-ghostscript
This is an npm package to call Ghostscript functions:
https://github.com/sina-masnadi/node-gs
After copying the compiled Ghostscript files to your project and adding the npm package, you can use the executablePath('path to ghostscript') function to point the package to the compiled Ghostscript files that you added earlier.
Its almost certainly a bug, or perhaps limitation, with the older version of Ghostscript.
Many PDF producers create PDF files which do not conform to the specification, and yet will open without complain in Adobe Acrobat. Ghostscript endeavours to do the same, but obviously we can't know what Acrobat is going to allow, so we are continually chasing this nebulous target. (FWIW that warning is a legitimate out-of-spec PDF file).
There's nothing you can do with the old version other than replace it.
Yes you can build Ghostscript from source, I have no idea about a nodejs module, not sure why that's relevant.
There are numerous other applications which will render a PDF file, MuPDF is another one I know of. And, of course, you can use Ghostscript directly without using ImageMagick. Of course, if you can load another application, then you should simply be able to replace your Ghostscript installation too.
The version of GS on aws is an old version with known bugs. We can get around this by uploading an x64 GS file, compiled specifically for Linux. Then upload that using new AWS lambda layers. I have written a node function that does just this here:
https://github.com/rcastoro/PDFImagine
Make sure you have that GS layer for your lambda, however!

Adm zip write zip buffer to ExpressJS response

Hi I'm trying to send a zip buffer made by Adm Zip npm module to my response for client download.
I manage to download the zip file but unable to expand it. OSX says "error 2 No such file or directory"...
The downlaoded zip file has got the right size I believe and is sent over this way:
var zip = new AdmZip();
// added files with zip.addFile(...)
var zipFile = zip.toBuffer();
res.contentType('zip');
res.write(zipFile);
res.end();
Any idea what could be wrong?
Thanks
Apparently it comes from the Adm-zip code base and hasn't been merged yet:
https://github.com/cthackers/adm-zip/compare/master...mygoare:unzipErr

Create tar file from files in a particular directory

I need to use nodejs to create a tar file that isn't encompassed in a parent directory.
For example, here is the file system:
/tmp/mydir
/tmp/mydir/Dockerfile
/tmp/mydir/anotherfile
What I'm looking to do is the equivalent to this:
cd /tmp/mydir
tar -cvf archive.tar *
So, when I extract archive.tar, Dockerfile will end up in the same directory I execute the command.
I've tried tar.gz and a few others, but all the examples are compressing an entire directory, and not just files.
I'm doing this so I can utilize the Docker REST API to send builds.
With a modern module node-tar you can create a .tar file like this:
tar.create(
{ file: 'archive.tar' },
['/tmp/mydir']
).then(_ => { .. tarball has been created .. })
The tar.gz module referenced in other answers is deprecated.
Use tar.gz module. Here is a sample code
var targz = require('tar.gz');
var compress = new targz().compress('/path/to/compress', '/path/to/store.tar.gz',
function(err){
if(err)
console.log(err);
console.log('The compression has ended!');
});
For more options, visit the documentation page.
This package is now deprecated. Check the answer provided by #Kelin.
Second argument to the constructor is passed on as properties to the tar module.
var TarGz = require('tar.gz');
var compressor = new TarGz({}, {fromBase: true});
This will use create the archive without top level directory.
Edit: This was undocumented in node-tar.gz but pull request has now been merged: https://github.com/alanhoff/node-tar.gz#tar-options

Resources