Re-encoding uploaded files for security reasons in Node.js... but how?

I'm using Node.js and building a file upload system for a personal project.
On one blog (it's not in English, so I won't link it here) I read that I have to re-encode uploaded files. If I don't, a skilled attacker can embed malicious shell code in a seemingly normal file (the author used JPEG as an example), even if I limit the extensions of uploaded files, change the original filename, and hide the file path.
My question is: how do I re-encode files in Node.js?
There are several types of files and I don't know how to do it.
Thanks in advance.

While handling a file upload, there are several hardening steps you can take to increase security:
1. Rename the file, including its extension.
2. Use a whitelist to check the file's extension.
3. Use a whitelist to check the file's MIME type.
4. Use a whitelist to check the file's magic bytes.
5. Set Content-Type and no-sniff headers on the response if you serve the files over HTTP (a sketch follows this list).
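For the fifth step, here is a minimal sketch using Express (Express, the route, and the upload directory are my assumptions, not part of the checklist):

const express = require('express');
const path = require('path');

const app = express();
const UPLOAD_DIR = path.join(__dirname, 'uploads'); // hypothetical upload dir

app.get('/uploads/:name', (req, res) => {
  res.sendFile(req.params.name, {
    root: UPLOAD_DIR,
    headers: {
      'Content-Type': 'image/png',          // the type you verified at upload time
      'X-Content-Type-Options': 'nosniff',  // stop browsers from sniffing the content
    },
  });
});

app.listen(3000);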
What the blog calls "re-encoding" here actually corresponds to the fourth step above.
Determine the file type by reading its magic bytes:
const FileType = require('file-type');
const readChunk = require('read-chunk');

(async () => {
  // file-type needs at most the first 4100 bytes to identify a file
  const buffer = readChunk.sync('Unicorn.png', 0, 4100);
  console.log(await FileType.fromBuffer(buffer));
  //=> {ext: 'png', mime: 'image/png'}
})();
There are several packages you can use for this; for example:
https://github.com/sindresorhus/file-type
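As for the actual re-encoding, the answer above doesn't name a package, but one common approach (my assumption) is to decode the image and write it back out with something like sharp, so only the pixel data survives:

// Sketch: re-encode an uploaded image with sharp (npm install sharp).
// sharp and the file paths are assumptions for illustration.
const sharp = require('sharp');

async function reencodePng(inputPath, outputPath) {
  // Decoding and re-encoding strips metadata and any appended payload.
  await sharp(inputPath).png().toFile(outputPath);
}

reencodePng('upload.tmp', 'safe.png').catch(console.error);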

Related

Is it possible to download a file nested in a zip file, without downloading the entire zip file?

Is it possible to download a file nested in a zip file, without downloading the entire zip archive?
For example, from a URL that could look like:
https://www.any.com/zipfile.zip?dir1\dir2\ZippedFileName.txt
Depending on whether you are asking for a simple way of implementing this on the server side, or a way of using standard protocols so you can do it from the client side, there are different answers:
Doing it with the server's intentional support
Optimally, you implement a handler on the server that accepts a query string on any file download, similar to your suggestion (I would, however, include a variable name, for example: ?download_partial=dir1/dir2/file). Then the server can just extract the file from the ZIP archive and serve only that (maybe via a compressed stream if the file is large).
If this is the path you are going down and you update the question with the technology used on the server, someone may be able to answer with suggested code.
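For illustration, a minimal sketch of such a handler, assuming a Node.js/Express server and the unzipper package (both assumptions, since the question doesn't specify a stack):

// Sketch: serve a single entry from a ZIP without sending the whole archive.
// Express, unzipper, and the paths are assumptions for illustration.
const express = require('express');
const unzipper = require('unzipper');

const app = express();

app.get('/zipfile.zip', async (req, res) => {
  const wanted = req.query.download_partial; // e.g. dir1/dir2/file
  if (!wanted) return res.sendFile('/data/zipfile.zip'); // normal full download

  const archive = await unzipper.Open.file('/data/zipfile.zip');
  const entry = archive.files.find((f) => f.path === wanted);
  if (!entry) return res.sendStatus(404);

  entry.stream().pipe(res); // stream just that one file
});

app.listen(3000);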
But on with the slightly more fun way...
Doing it opportunistically if the server cooperates a little
There are two things that conspire to make this somewhat feasible, but it is only worth it if the ZIP file is massive compared to the file you want from it.
ZIP files contain a directory that records where in the archive each file is located. This directory is stored at the end of the archive.
HTTP servers optionally allow downloading only a range of a resource.
So, if we issue a HEAD request for the URL of the ZIP file (HEAD /path/file.zip), we may get back an Accept-Ranges: bytes header and a Content-Length header telling us the length of the ZIP file. If we have those, we can issue a GET request with a header such as Range: bytes=1000000-1024000, which gives us just that part of the file.
The directory of files is toward the end of the archive, so if we request a reasonably sized block from the end of the file, we will likely get the central directory included. We then look up the file we want and know where it is located in the large ZIP file.
We can then request just that range from the server, and decompress the result...
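A rough sketch of this opportunistic approach in Node.js 18+ (the URL is hypothetical, and parsing the central directory is left to a ZIP library):

const url = 'https://www.example.com/zipfile.zip';

(async () => {
  // 1. HEAD request: learn the total size and whether ranges are supported.
  const head = await fetch(url, { method: 'HEAD' });
  if (head.headers.get('accept-ranges') !== 'bytes') {
    throw new Error('server does not support byte ranges');
  }
  const length = Number(head.headers.get('content-length'));

  // 2. GET the last 64 KB, which usually contains the central directory.
  const start = Math.max(0, length - 64 * 1024);
  const res = await fetch(url, {
    headers: { Range: `bytes=${start}-${length - 1}` },
  });
  const tail = Buffer.from(await res.arrayBuffer());
  console.log(`fetched ${tail.length} of ${length} bytes`);

  // 3. Parse the central directory out of `tail` (e.g. with a ZIP library),
  //    find the entry's offset and size, then issue one more Range request
  //    for just that entry and decompress it.
})();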

The priority of require()'s parameter to determine the file extension in Node.js

I am learning Node.js.
For example, I try
var server = require("./myserver.js");
and
var server = require("./myserver");
Both of these cases work.
What if I have another file with the same name,
e.g. myserver.json or myserver.node?
Will it always look for .js first?
One of the answerers of my previous question
mentions:
only load the .json file if you explicitly add the .json extension to the require call. So if you leave off the extension, it always loads the .js file.
Will this rule also apply to .node files?
If the exact filename is not found, Node.js will attempt to load the required filename with these extensions added, in order: .js, .json, and finally .node. You can check the Node.js docs for a detailed explanation: https://nodejs.org/api/modules.html#modules_file_modules
Yes. If you do not provide the file extension, it will look for the .js file first, since .js is the default.
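A quick throwaway sketch to see the order for yourself (the file names are just for the demo):

// Sketch: demonstrate require()'s extension resolution order.
const fs = require('fs');

fs.writeFileSync('myserver.js', 'module.exports = "from .js";');
fs.writeFileSync('myserver.json', '"from .json"');

console.log(require('./myserver'));      // => from .js   (.js wins)
console.log(require('./myserver.json')); // => from .json (explicit extension)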

How to check whether a file path in Cloud Storage is valid using a Cloud Function?

So I am trying to download an image from Firebase Storage in a Cloud Function, like this:
const bucket = storage.bucket("myApp.appspot.com")
const filePath = `eventPoster/${creatorID}/${eventID}.png`
await bucket.file(filePath).download({
  destination: tmpFilePath
})
But the problem is, I can't ensure the image is always in PNG format; it can also be JPEG or JPG.
I will get an error like this if I hard-code it that way:
Error: No such object: myApp.appspot.com/eventPoster/user1/ev123.png
So how do I check whether a path is valid, or how do I make the code work dynamically with other image extensions?
Posting as Community Wiki, because I based part of it on a comment and so other members of the community can feel free to edit it.
As mentioned in the comments, you can use the exists() method to check whether a specific file exists in your bucket. This way you will not run into an issue when the file doesn't exist at the moment you try to return it. This is the best way to check only for existing files when, as in your case, the names are built from ids.
Besides that, as clarified in this other post from the community here, you can restrict the file types in your bucket if you upload files with the POST method (otherwise you won't be able to), which can also be an alternative. In case you are not using POST, as mentioned in this other post as well, you can try to construct a Cloud Function that checks the file type before it uploads the file to the bucket, so that your application only uploads specific file types.
In case you would like to check how to upload files to Cloud Storage using Cloud Functions, you can check this tutorial: How to upload using firebase cloud functions
Unfortunately, these seem to be the only available options right now, but I think they are a good start for you.
Let me know if the information helped you!
You can use the exists() method on a file to check whether your PNG is present or not. You can also change your code and use the getFiles() method. You can pass it options, one of which is named prefix. In your case you can do this:
bucket.getFiles({
  prefix: `eventPoster/${creatorID}/${eventID}`
}, (err, files) => {
  // files contains every object whose name starts with the prefix
});
This lists every file with that prefix: PNG, JPEG, and others. Make your code smarter to take advantage of this dynamic answer.
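Putting the exists() suggestion into code, a small sketch (assuming the same bucket, creatorID, and eventID as in the question):

// Sketch: probe a few known extensions with exists().
async function findPoster(creatorID, eventID) {
  for (const ext of ['png', 'jpg', 'jpeg']) {
    const file = bucket.file(`eventPoster/${creatorID}/${eventID}.${ext}`);
    const [exists] = await file.exists(); // resolves to [boolean]
    if (exists) return file;
  }
  return null; // no poster found under any known extension
}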

Using GitHub for Excel files

As a kind of follow-up to this question, I would like to ask about binary files (such as Excel files) and versioning.
Let's say I want to use GitHub to store a programming project. No problem there, since the majority of files are text (no matter the language).
But I also have documentation. What if I put it in a folder of the GitHub project? (I have seen projects that do this.)
I have read that Git is no good at this, so how can I handle versioning for, say, Excel files?
You could save your Excel file as .fods, which is a regular .ods file saved as flat XML. This format is probably not supported by MS Office, so you may need to install LibreOffice for this (it is free).
Since .fods is regular XML, it can be versioned like a regular text file, with diffs and (with some luck) even support for merges between branches.
You could also save other OpenDocument formats as flat XML:
.fodt for word processing (text) documents
.fods for spreadsheets
.fodp for presentations
.fodg for graphics
So if migration to Libre Office is not a problem, this is probably the best solution.
If this is not an option, you may consider using Git LFS for storing binaries. But if the files are small and you don't change them often, you can just ignore the whole problem: a few small binary files will not hurt your repository. Just estimate: if you start versioning a 1 MB binary file and save 100 versions of it, the repository grows by about 100 MB (less if the file compresses well). You need a really large codebase to reach 100 MB in a repository containing only text source files, so in that case your repository would be filled mainly with binary files.
BTW: GitHub released a tool for measuring the size of a Git repository: git-sizer. It may give you some hints about potential problems with your repository.
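If you do go the Git LFS route, the setup is only a few commands (assuming git-lfs is installed):

git lfs install
git lfs track "*.xlsx"
git add .gitattributes
git commit -m "Track Excel files with Git LFS"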
// First, run this command:
// npm install xlsx jsonfile
// Change the input file name to sample.xlsx; the output file is data.json
var XLSX = require('xlsx');
var fs = require('fs');
var jsonfile = require('jsonfile');

var file = 'data.json';
var buf = fs.readFileSync('sample.xlsx');
// Parse the workbook from the buffer
var wb = XLSX.read(buf, { type: 'buffer' });
console.log(wb.Sheets);

// Dump all sheets to JSON
jsonfile.writeFile(file, wb.Sheets, function (err) {
  if (err) console.error(err);
});
Interesting question. The simple answer to it is: write some code to convert your Excel file (.xls or .xlsx) to a JSON file and commit that content to Git.
This idea is valid only for a simple Excel sheet, not for complex ones involving a lot of math and charts.

Stream text file from client to server?

So I have a use case where the client uploads a small TSV file, the file is opened and parsed on the server, and results are written to a new file on the server.
Since the TSV file will be tiny (under 1 MB), I am wondering whether it is even necessary to upload the file to the server (writing it to disk) before parsing it. Instead, could the file contents be captured when the user clicks "upload file"? I could then store the file contents in an array, with each item representing a line in the file.
Thoughts?
You don't need to stream the file to disk, but be aware that you should set clear and concise limits so that a person could not, say, upload a 5 GB file and crash your service through memory exhaustion. Just be aware that you're limited to your available amount of memory (likely less) when you process something completely in memory. It's also possible to stream-parse it, so that you don't need to save it to disk before parsing. In your case it sounds easiest to just upload it into memory and put a limit (maybe 5 MB) on the uploaded file size.
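For example, a minimal sketch with Express and multer's in-memory storage (both are assumptions; the field name 'file' is hypothetical):

const express = require('express');
const multer = require('multer');

const app = express();
const upload = multer({
  storage: multer.memoryStorage(),        // keep the upload in memory, not on disk
  limits: { fileSize: 5 * 1024 * 1024 },  // reject anything over 5 MB
});

app.post('/upload', upload.single('file'), (req, res) => {
  // Each array item is one line of the TSV file.
  const lines = req.file.buffer.toString('utf8').split('\n');
  res.json({ lineCount: lines.length });
});

app.listen(3000);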
Are you asking whether this option is feasible or whether it's a good idea?
Regarding feasibility, it is entirely possible: use the FileReader API to read the content and then a simple Meteor.call to whatever method appends to the file on disk. The code would look like the following:
function onSubmit(event, template) {
  var file = template.$('.your-file-input-elemt').files[0];
  var filereader = new FileReader();
  filereader.onload = function (fileevent) {
    // readAsText is asynchronous; the text arrives on the load event
    Meteor.call('processTSV', fileevent.target.result);
  };
  filereader.readAsText(file);
}
If you're asking whether it's a good idea, then it comes down to browser support. Are you okay with users without the FileReader API not being supported by your application? If so, it's considerably easier to deal with than handling uploads with something like CollectionFS.
