Streaming pdf file from node server randomly just shows binary data on browser

I have a Node app (specifically a Sails app) that serves a PDF file. My code for serving the file looks like this:
request.get(pdfUrl).pipe(res)
When I open the URL, the PDF usually renders fine. But sometimes the browser just shows the raw binary data of the PDF, as below:
%PDF-1.4 1 0 obj << /Title (��) /Creator (��wkhtmltopdf
I cannot figure out why it fails to serve the PDF correctly at random. Is it a Chrome thing, or am I missing something?

Leaving this here in the hope that it helps somebody. I have had similar issues multiple times, and it is usually one of these things:
1. You're using an HTTP connection to an HTTPS delivery (this is typical with websockets, where you must specify :443 in addition to the wss:// scheme).
2. request's encoding parameter is making it return the body as a string instead of a Buffer. This is fixed by setting encoding to null as follows: request({url: myUrl, encoding: null}).
3. Content types in headers. Steering clear of this since it's obvious and others have covered it substantially enough already :)
I am pretty sure you're facing this due to (2). Have a look at https://github.com/request/request
encoding - Encoding to be used on setEncoding of response data. If
null, the body is returned as a Buffer. Anything else (including the
default value of undefined) will be passed as the encoding parameter
to toString() (meaning this is effectively utf8 by default). (Note: if
you expect binary data, you should set encoding: null.)
Since the aforementioned suggestions didn't work for you, I would like to see forensics on the following:
Are the files that fail over a particular size? Is this a buffer issue at some level?
Does the presence of a certain character in the file cause this because it breaks some part of your script?
Are the metadata sections and file endings the same across a failed and a successful file? How a media file is signed at the top, and how it is terminated at the bottom, can greatly affect how it is interpreted.
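To illustrate the encoding point (2) above, here is a minimal sketch using plain Node.js Buffers (the request library is not involved). It shows what happens when binary bytes are decoded as UTF-8 text and then written back out. The sample bytes are an assumption, modeled on the binary comment line that many PDF writers emit right after the header.

```javascript
// Hypothetical sample: "%PDF" followed by four high-bit bytes.
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xe2, 0xe3, 0xcf, 0xd3]);

// Decoding as UTF-8 replaces every invalid byte sequence with U+FFFD.
const asText = original.toString('utf8');

// Re-encoding the string therefore does not give the original bytes back.
const roundTripped = Buffer.from(asText, 'utf8');

console.log(original.equals(roundTripped)); // false
console.log(asText.includes('\uFFFD'));     // true
```

The U+FFFD characters are the same replacement glyphs visible in the output pasted in the question, which is why keeping the body as a Buffer matters for binary files.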

You may need to include the content type header application/pdf in the node response to tell the recipient that what they're receiving is a PDF. Some browsers are smart enough to determine the content type from the data stream, but you can't assume that's always the case.

When Chrome shows the PDF as text, I would check the very end of the file. A PDF file contains the obligatory xref table at the end, so every valid PDF file should end with the sequence %%EOF. If it does not, then the request was interrupted or something else went wrong.
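A rough way to run that check on a saved response is sketched below. The function name and the 1024-byte tail window are my own choices, not part of any library. It only looks for the %PDF- header and the end-of-file marker (written %%EOF in the PDF format), so it can detect truncation but does not validate the PDF.

```javascript
// Returns true when the buffer starts with "%PDF-" and has "%%EOF"
// at its end (trailing whitespace after the marker is tolerated).
function looksLikeCompletePdf(buffer) {
  const startsOk = buffer.subarray(0, 5).toString('latin1') === '%PDF-';
  const tailStart = Math.max(0, buffer.length - 1024);
  const tail = buffer.subarray(tailStart).toString('latin1');
  return startsOk && tail.trimEnd().endsWith('%%EOF');
}

// Usage (hypothetical path):
// const bytes = require('fs').readFileSync('saved-response.pdf');
// console.log(looksLikeCompletePdf(bytes));
```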

You also need the HTTP headers:
Content-Disposition: inline; filename=sample.pdf
and
Content-Length: 200
(200 is only a placeholder; the value should be the actual size of the file in bytes.)
Did you try saving whatever binary data you get to disk and opening it manually in a PDF reader? It could be corrupt.

I would suggest trying both of these:
Content-Type: application/pdf
Content-Disposition: attachment; filename="somefilename.pdf"
(or controlling the MIME type in other ways: https://www.npmjs.com/package/mime-types)
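Putting the header suggestions from these answers together, a sketch in plain Node.js could look like the following. The helper names pdfHeaders and servePdf are hypothetical, and servePdf assumes the PDF is a local file rather than a proxied URL as in the question.

```javascript
const fs = require('fs');

// Builds the response headers discussed above.
// disposition is 'inline' (render in the browser) or 'attachment' (download).
function pdfHeaders(filename, byteLength, disposition = 'inline') {
  return {
    'Content-Type': 'application/pdf',
    'Content-Disposition': `${disposition}; filename="${filename}"`,
    'Content-Length': byteLength,
  };
}

// Hypothetical usage: send a local file with explicit headers.
// res is a Node http.ServerResponse.
function servePdf(path, res) {
  const { size } = fs.statSync(path); // real size in bytes, not a guess
  res.writeHead(200, pdfHeaders('sample.pdf', size));
  fs.createReadStream(path).pipe(res);
}
```

When the PDF is piped from another server instead, the length is not known up front unless the upstream response reports it, so in that case it is probably safer to leave Content-Length out than to send a wrong value.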

Related

Is it possible to download a file nested in a zip file, without downloading the entire zip file?

Is it possible to download a file nested in a zip file, without downloading the entire zip archive?
For example from a url that could look like:
https://www.any.com/zipfile.zip?dir1\dir2\ZippedFileName.txt

The answer depends on whether you are asking for a simple way to implement this on the server side, or for a way to do it from the client side using standard protocols:
Doing it with the server's intentional support
Optimally, you implement a handler on the server that accepts a query string on any file download, similar to your suggestion (I would, however, include a variable name, for example ?download_partial=dir1/dir2/file). Then the server can extract that file from the ZIP archive and serve just that (maybe via a compressed stream if the file is large).
If this is the path you are taking and you update the question with the technology used on the server, someone may be able to answer with suggested code.
But on with the slightly more fun way...
Doing it opportunistically if the server cooperates a little
There are two things that conspire to make this feasible, though it is only worth it if the ZIP file is massive in comparison to the file you want from it:
1. ZIP files have a directory that says where in the archive each file is. This directory is present at the end of the archive.
2. HTTP servers optionally allow download of only a range of a response.
So, if we issue a HEAD request for the URL of the ZIP file (HEAD /path/file.zip), we may get back a header Accept-Ranges: bytes and a header Content-Length that tells us the length of the ZIP file. If we have those, we can issue a GET request with, for example, the header Range: bytes=1000000-1024000, which would give us that part of the file.
The directory of files is towards the end of the archive, so if we request a reasonable block from the end of the file then we will likely get the central directory included. We then look up the file we want, and know where it is located in the large ZIP file.
We can then request just that range from the server, and decompress the result...
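As a sketch of the lookup step, the function below scans a block fetched from the end of the archive for the End of Central Directory record and reads where the central directory starts. The names findCentralDirectory and rangeHeader are hypothetical, the network part is left out, and ZIP64 archives are not handled.

```javascript
// Locates the End of Central Directory (EOCD) record in the tail of a ZIP
// archive and reads where the central directory lives.
function findCentralDirectory(tail) {
  const EOCD_SIGNATURE = 0x06054b50; // bytes "PK\x05\x06" read little-endian
  const EOCD_MIN_SIZE = 22;          // record size without a trailing comment
  // Scan backwards: the record sits at the end, before an optional comment.
  for (let i = tail.length - EOCD_MIN_SIZE; i >= 0; i--) {
    if (tail.readUInt32LE(i) === EOCD_SIGNATURE) {
      return {
        entryCount: tail.readUInt16LE(i + 10),
        directorySize: tail.readUInt32LE(i + 12),
        directoryOffset: tail.readUInt32LE(i + 16), // from start of archive
      };
    }
  }
  return null; // not found: fetch a larger block from the end
}

// Builds the Range header value for `length` bytes starting at `offset`.
function rangeHeader(offset, length) {
  return `bytes=${offset}-${offset + length - 1}`;
}
```

With the result, rangeHeader(directoryOffset, directorySize) gives the range for the directory itself. The same idea applies once more to the entry you want, after reading its offset and compressed size from the directory.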

How do I decompress the diagram data in a .drawio file with node.js and zlib?

Diagrams.net, previously and still more widely known as draw.io, is a popular tool for drawing diagrams of various kinds. It stores diagrams in an XML-based format that uses the file ending .drawio. The file content has the structure:
<mxfile {...}>
<diagram {...}>
{the-actual-diagram-content}
</diagram>
</mxfile>
According to the documentation page Extracting the XML from mxfiles, the string {the-actual-diagram-content} contains the actual diagram data in compressed format, "compressed with the standard deflate process". I'd like to decompress this data in my node.js app to parse and modify it.
I have found an older, similar question on StackOverflow, which wants the same, but uses the libraries "atob", and later "pako". I'd like to achieve the same with the more standard "zlib" node.js module, which - if this is really "the standard deflate process" - should be possible.
However, all my attempts to "inflate" the compressed string fail. I have mostly tried variations of the following code, with different encodings ('base64', 'utf8') and methods ('inflateSync', 'unzipSync', 'gunzipSync'):
zlib.inflateSync(Buffer.from(string, 'base64')).toString();
All attempts fail with the error "Error: incorrect header check". I read this as "dude, seriously, you're using the wrong unzip algorithm for this". However, I cannot figure out what the right algorithm or settings are.
The sample string I'd like to decode is the following. Using the jgraph inflate/deflate tool, this uncompresses perfectly fine. However, the settings done there, "URL Encode", "Deflate", "Base64" sound to me exactly like what I am trying.
zVdbk6I6EP41Vp3z4BYXL/Ao3nV0VEYZfQsQITOBIEQu/voNAgrqrHtOzVbti5X+0t0kX/eXxJrYdeKhDzx7RkyIawJnxjWxVxOEZktmvymQZIAoNzLA8pGZQfwVUNEJ5iCXo0dkwqDiSAnBFHlV0CCuCw1awYDvk6jqtie4+lUPWPAOUA2A71ENmdTOUKnJXfERRJZdfJnn8hkHFM45ENjAJFEJEvs1sesTQrORE3chTrkreMniBl/MXhbmQ5f+TkCXT0gX48NHW1CSsVHXjta0nmcJAT7mG+7kq6VJQYFPjq4J0yxcTVQiG1GoesBIZyNWc4bZ1MHM4tkwoD75vFDFNqnkX4A+hfGXS+cvhLBGgsSB1E+YSxHQyjlMbuzoWhK+4NkulaPwA3kXWJfUV6LYIOfqP/Am3PGm/IW8ia3mX8abeMebSdIYesceNJkOcxNinUT9K6CcATaRsoOYWM8EAp92UsUz3CXu2c01b5AS42xygNLVn8sDMLJcNjYYtdBnAAY6xAowPq1zHbsEE/+ap1pbFuMn72Vjmxo/moXZi8uTvaSwYkTfi+WwcSmKWdeg1ChiMqJSdn7dFIxMcvQN+Fz8jDgL0mfN/mWT1bkfXFNqVxqtXjSVDzGgKKyu9VFX5ekXBLFdXHIL0o3wpZvGzPaYR5UPv5tEolhNJIg3iTISfpGocCT7fQArPmchXHj5/9po3GnjXhQYs4sPPj9OQOBlt+EexWmbPjhffEIBBfo5ddpY+Q1aQthVDsv2b51IX8v+voNKx9CjU+ibmqjOV2t/sf98SZvPS8qeBV46DCh0DYT/CTfzlR7MrF5PmC8SIz5MdfnN4UVrzlmdlql4q46i66/m3uq7mtdTvZ3jrFQ0Zp9S4EjXoqUgK+uX5cTtbS3TmDV36a4E5bhgxA0mW3u1w5PpSRuph+6SIW9HNIC0cewe9e0cxsJyA4VOe8v2qyznChq33wP8Ee3DE97iYWvIqXY74k/4oIKDGQta/LnmY9WBRjsaAaN90jSwWrNYHIZr5vGxZt8eeMTHLyCQArwGfZ2OY3Uh0tZouLKlYLya0FVfjjZyM3ZM5VVDn6d4ISvcECB5rYSOOEpCJA90I54dtp0X7Mubo77bACKoZiNgz0vFOxmKLr3tk7QLlNZorbYnQ6HhBtPe20Hy2QJUcR8vwlktvaUHvPc+VDYn1yFm0kY4lJfSXBxN9d3rsKmpO3VuyWa4kLr/PpfZQ9kAjEnUKd6e3I3U+MJGJL1v6vL5hDdROQOMPeAWl8udlP+kDMUHMhS/S4bVx0i98Q0yZOb1BZ25X/+GiP2f
What am I doing wrong?

Use zlib.inflateRawSync(). What you have there is a raw deflate stream, not a zlib stream.
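A minimal sketch of the full decode, assuming the pipeline is the reverse of the "URL Encode", "Deflate", "Base64" settings mentioned in the question. The function names are hypothetical, and encodeDiagram exists only to produce a self-contained sample, so nothing here is taken from the draw.io source.

```javascript
const zlib = require('zlib');

// Decodes the text inside <diagram>: Base64 -> raw inflate -> URL-decode.
function decodeDiagram(compressed) {
  const inflated = zlib.inflateRawSync(Buffer.from(compressed, 'base64'));
  return decodeURIComponent(inflated.toString('utf8'));
}

// Inverse direction, used here only to build a sample string.
function encodeDiagram(xml) {
  const urlEncoded = Buffer.from(encodeURIComponent(xml), 'utf8');
  return zlib.deflateRawSync(urlEncoded).toString('base64');
}

const sample = encodeDiagram('<mxGraphModel><root/></mxGraphModel>');
console.log(decodeDiagram(sample)); // <mxGraphModel><root/></mxGraphModel>
```

inflateSync and gunzipSync expect a zlib or gzip header in front of the data, which a raw deflate stream does not have. That matches the "incorrect header check" error from the question.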

ZF2 - Download xls show two empty lines in the downloaded sheet

I am using streams to download an Excel file. Please find the code below:
$response = new \Zend\Http\Response\Stream();
$response->setStream(fopen($myfile, 'r'));
$response->setStatusCode(200);
$headers = new \Zend\Http\Headers();
$headers->addHeaderLine("Content-Type: application/vnd.ms-excel; charset=UTF-8")
->addHeaderLine('Content-Disposition', 'attachment; filename=my.xls')
->addHeaderLine( "Content-Transfer-Encoding: binary" )
->addHeaderLine('Content-Length', filesize($myfile));
$response->setHeaders($headers);
The generated file is correct, but when I force the user to download that same file, two empty lines appear in the downloaded Excel file. I did some research and thought it might be the HTTP version or the header lines, but I don't think so, because whatever else I try, I still get those two empty lines at the start of the Excel report.
Please note there is no empty space at the start of the content.
Any idea why this happens?

Mind that the streamed response does not in fact provide a stream context to the client. It just buffers the stream internally and sends out the response in one go.
That being said, I have created a controller plugin to send attachments from a file path or directly from binary data held in a variable. It's in my Soflomo\Common library. I haven't had the issues you describe, and I use some more headers than you do.
Tell me if that piece of code works for you. One of the differences is that you use the size of the original file as the size of the response. I am not sure, but this might cause a mismatch with the cached streamed response. Try to just grab the contents and do a strlen() on this content.

Wrong text encoding when parsing json data

I am curling a website and writing the result to a .json file; this file is the input to my Java code, which parses it using a JSON library and writes the necessary data to a CSV file that I later load into a database.
As you know, data coming from a website can be in different encodings, so I make sure that I read and write in UTF-8, but I still get wrong output.
For example, Østerriksk becomes ï¿½sterriksk.
I am doing all this on Linux. I think there is some encoding problem, because this same code runs fine on Windows but not on Unix/Linux.
I am quite sure my Java code is correct, but I am not able to find out what I'm doing wrong.

You're reading the data as ISO 8859-1 but the file is actually UTF-8. I think there's an argument (or setting) to the file reader that should solve that.
Also: curl isn't going to care about the encodings. It's really something in your Java code that's wrong.
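The garbled prefix in the question can be reproduced byte for byte, which helps narrow down where things go wrong. The sketch below uses Node.js rather than Java, purely to show the bytes. It suggests the Ø had already been turned into U+FFFD (the replacement character) at an earlier step, and that the UTF-8 bytes of that character were then read as ISO 8859-1. This is an inference from the sample string, not something the question confirms.

```javascript
// U+FFFD followed by the rest of the word, encoded as UTF-8.
// The first three bytes are EF BF BD.
const utf8Bytes = Buffer.from('\uFFFDsterriksk', 'utf8');

// Reading those bytes as ISO 8859-1 (latin1) yields three characters
// in place of the single replacement character.
const misread = utf8Bytes.toString('latin1');
console.log(misread === '\u00ef\u00bf\u00bdsterriksk'); // true

// Reading with the matching encoding recovers the same string.
console.log(utf8Bytes.toString('utf8') === '\uFFFDsterriksk'); // true
```

If that reading is right, there are two places to check: the step that first decodes the downloaded bytes, and the step that reads the resulting file back.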

What kind of IDE are you using? For example, this can happen if you are using the Eclipse IDE and have not set your default encoding to UTF-8 in its properties.

Twitter Widget behind proxy

I'm trying to use the Twitter widget on a site whose server is inside my corporation and hence behind its proxy.
I can't directly use the code they provide, since I can't reach the source address.
<script charset="utf-8" src="http://widgets.twimg.com/j/2/widget.js"></script>
I was wondering if I could make a local copy of the JS to avoid this problem, but when I did so I got:
ActionView::WrongEncodingError in Home#index
Your template was not saved as valid UTF-8. Please either specify UTF-8 as the encoding for your template in your text editor, or mark the template with its encoding by inserting the following as the first line of the template:
# encoding: <name of correct encoding>.
But the encoding is already set.
I'm really new to this stuff.
Please help.

The error you get is because Ruby needs an explicit encoding to correctly parse a non-Latin-1 file.
In each Ruby file that has UTF-8 characters, you need a first line like this example:
# encoding: UTF-8
As for the main problem in your question, you can try, but communication with Twitter is probably blocked.
You should talk to your system administrator to try to get access to Twitter for your app.
