Can we limit upload file size without uploading it fully? - node.js

I'm using multer v1.3.0 with express v4.15.4.
I have set fileSize limits as below:
multer({
    storage: storage,
    limits: { fileSize: 1 * 1000 * 50, files: 1 },
    fileFilter: function (req, file, cb) {
        if (_.indexOf(['image/jpeg', 'image/png'], file.mimetype) == -1)
            cb(null, false);
        else
            cb(null, true);
    }
});
In this case I don't think the limits are working, because I'm not getting any
LIMIT_FILE_SIZE error.
If I remove fileFilter, then I do get the LIMIT_FILE_SIZE error.
But in both cases the whole file gets uploaded first and only then is the file size checked.
It's not good at all to upload a 1 GB file just to check whether it is under 1 MB.
So I want to know whether there is any way to stop the upload midway once the file size limit is exceeded. I don't want to rely on Content-Length.

From looking through the multer and busboy source, it looks like busboy keeps a running total of the bytes received and stops reading the file stream as soon as fileSize is reached. See the code here:
https://github.com/mscdex/busboy/blob/8f6752507491c0c9b01198fca626a9fe6f578092/lib/types/multipart.js#L216
So while 1 GB might be sent over the wire, no more than the fileSize limit should be saved anywhere.
If you want to stop the upload from continuing, I think you'd have to close the socket, though I haven't experimented with this myself. I'm not aware of any way to respond cleanly while the request is still trying to upload data.
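As a rough, untested sketch of that idea (route and field names here are hypothetical, and the limit value is arbitrary): catch multer's LIMIT_FILE_SIZE error and destroy the request so the client stops sending data. As noted above, the client probably won't see a clean response.
const express = require('express');
const multer = require('multer');

const app = express();
// No storage configured: files are kept in memory for this sketch.
const upload = multer({ limits: { fileSize: 1 * 1024 * 1024, files: 1 } }).single('file');

app.post('/upload', function (req, res) {
    upload(req, res, function (err) {
        if (err && err.code === 'LIMIT_FILE_SIZE') {
            // Stop consuming the rest of the body and drop the connection.
            // The client will most likely see a socket error rather than a response.
            req.unpipe();
            req.destroy();
            return;
        }
        res.end('ok');
    });
});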
Related:
YES or NO: Can a server send an HTTP response, while still uploading the file from the correlative HTTP request?

You can use client-side JavaScript to prevent users from uploading the full 1 GB only to then get a file-size-exceeded error. However, all client-side checks can be bypassed, so you should still enforce the file limit on the backend.
Your code is correct and should work as intended. I am guessing you are worried about the file getting uploaded; there is no workaround for that, since Multer checks the size after the upload.
Here is the JavaScript you can put into your client-side code to prevent someone from uploading an oversized file.
function ValidateSize(file) {
    const FileSize = file.files[0].size / 1024 / 1024;
    if (FileSize > 20) {
        alert('File size exceeds 20 MB');
        document.getElementById('formFile').value = null;
    }
}
<label for="formFile" class="form-label">Select file to upload (Max File Size: 20MB):</label>
<input class="form-control" type="file" onchange="ValidateSize(this)" id="formFile" name="file">
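On the server side, a minimal sketch of enforcing the same limit with Multer (assuming a 20 MB cap to match the client-side check above; the route and field names are made up) could look like this:
const express = require('express');
const multer = require('multer');

const app = express();
const upload = multer({ limits: { fileSize: 20 * 1024 * 1024 } }).single('file');

app.post('/upload', function (req, res) {
    upload(req, res, function (err) {
        if (err && err.code === 'LIMIT_FILE_SIZE') {
            // Reject anything over 20 MB even if the client-side check was bypassed.
            return res.status(413).send('File size exceeds 20 MB');
        }
        if (err) return res.status(500).send('Upload failed');
        res.send('File uploaded');
    });
});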

Related

Can't upload large files to Python + Flask in GCP App Engine

UPDATE: (5/18/2020) Solution at the end of this post!
I'm attempting to upload big CSV files (30 MB - 2 GB) from a browser to GCP App Engine running Python 3.7 + Flask, and then push those files to GCP Storage. This works fine in local testing with large files, but errors out immediately on GCP with "413 - Your client issued a request that was too large" if the file is larger than roughly 20 MB. The error happens instantly on upload, before the request even reaches my Python logic (I suspect App Engine is checking the Content-Length header). I tried many solutions after lots of SO/blog research, to no avail. Note that I am using the basic/free App Engine setup with an F1 instance running the Gunicorn server.
First, I tried setting app.config['MAX_CONTENT_LENGTH'] = 2147483648, but that didn't change anything (SO post). My app still threw the error before it even reached my Python code:
# main.py
app.config['MAX_CONTENT_LENGTH'] = 2147483648  # 2GB limit

@app.route('/', methods=['POST', 'GET'])
def upload():
    # COULDN'T GET THIS FAR WITH A LARGE UPLOAD!!!
    if flask.request.method == 'POST':
        uploaded_file = flask.request.files.get('file')
        storage_client = storage.Client()
        storage_bucket = storage_client.get_bucket('my_uploads')
        blob = storage_bucket.blob(uploaded_file.filename)
        blob.upload_from_string(uploaded_file.read())
<!-- index.html -->
<form method="POST" action='/upload' enctype="multipart/form-data">
<input type="file" name="file">
</form>
After further research, I switched to chunked uploads with Flask-Dropzone, hoping I could upload the data in batches and then append to / build up the CSV file as a Storage blob:
# main.py
app = flask.Flask(__name__)
app.config['MAX_CONTENT_LENGTH'] = 2147483648  # 2GB limit
dropzone = Dropzone(app)

@app.route('/', methods=['POST', 'GET'])
def upload():
    if flask.request.method == 'POST':
        uploaded_file = flask.request.files.get('file')
        storage_client = storage.Client()
        storage_bucket = storage_client.get_bucket('my_uploads')

        CHUNK_SIZE = 10485760  # 10MB
        blob = storage_bucket.blob(uploaded_file.filename, chunk_size=CHUNK_SIZE)

        # hoping for a create-if-not-exists then append thereafter
        blob.upload_from_string(uploaded_file.read())
And the JS/HTML is straight from a few samples I found online:
<script>
    Dropzone.options.myDropzone = {
        timeout: 300000,
        chunking: true,
        chunkSize: 10485760
    };
</script>
....
<form method="POST" action='/upload' class="dropzone dz-clickable"
      id="dropper" enctype="multipart/form-data">
</form>
The above does upload in chunks (I can see repeated calls to POST /upload), but the call to blob.upload_from_string(uploaded_file.read()) just keeps replacing the blob contents with the last chunk uploaded instead of appending. This doesn't work even if I strip out the chunk_size=CHUNK_SIZE parameter.
Next I looked at writing to /tmp and then to Storage, but the docs say writing to /tmp uses up the little memory I have, and the filesystem elsewhere is read-only, so neither of those will work.
Is there an append API or approved methodology to upload big files to GCP App Engine and push/stream them to Storage? Given the code works on my local server (and happily uploads to GCP Storage), I'm assuming this is a built-in limitation of App Engine that needs to be worked around.
SOLUTION (5/18/2020)
I was able to use Flask-Dropzone to have JavaScript split the upload into many 10 MB chunks and send those chunks one at a time to the Python server. On the Python side we keep appending to a file in /tmp to "build up" the contents until all the chunks have come in. Finally, on the last chunk we upload the file to GCP Storage and then delete the /tmp file.
import os
import flask
from google.cloud import storage

storage_client = storage.Client()

@app.route('/upload', methods=['POST'])
def upload():
    uploaded_file = flask.request.files.get('file')

    # Append this chunk to a temp file to "build up" the full upload.
    tmp_file_path = '/tmp/' + uploaded_file.filename
    with open(tmp_file_path, 'a') as f:
        f.write(uploaded_file.read().decode("UTF8"))

    chunk_index = int(flask.request.form.get('dzchunkindex')) if (flask.request.form.get('dzchunkindex') is not None) else 0
    chunk_count = int(flask.request.form.get('dztotalchunkcount')) if (flask.request.form.get('dztotalchunkcount') is not None) else 1

    # On the last chunk, push the assembled file to Storage and clean up.
    if (chunk_index == (chunk_count - 1)):
        print('Saving file to storage')
        storage_bucket = storage_client.get_bucket('prairi_uploads')
        blob = storage_bucket.blob(uploaded_file.filename)  # CHUNK??
        blob.upload_from_filename(tmp_file_path, client=storage_client)
        print('Saved to Storage')

        print('Deleting temp file')
        os.remove(tmp_file_path)
<!-- index.html -->
<script>
    Dropzone.options.myDropzone = {
        ... // configs
        timeout: 300000,
        chunking: true,
        chunkSize: 1000000
    };
</script>
Note that /tmp shares resources with RAM, so you need at least as much RAM as the uploaded file size, plus more for Python itself (I had to use an F4 instance). I would imagine there's a better solution writing to block storage instead of /tmp, but I haven't gotten that far yet.
The answer is that you cannot upload or download files larger than 32 MB in a single HTTP request. Source
You either need to redesign your service to transfer data in multiple HTTP requests, transfer data directly to Cloud Storage using presigned URLs, or select a different service that does NOT sit behind the Google Front End (GFE), such as Compute Engine. This rules out services such as Cloud Functions, Cloud Run, and App Engine Flexible.
If you use multiple HTTP requests, you will need to manage memory, as all temporary files are stored in memory. This means you will run into issues as you approach the maximum instance size of 2 GB.
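As a rough sketch of the presigned-URL route (shown here with the Node.js Cloud Storage client since that's what the rest of this page uses; the Python client has an equivalent generate_signed_url method, and the bucket/object names below are placeholders): the server hands the browser a short-lived URL and the browser PUTs the file straight to Cloud Storage, bypassing the 32 MB front-end limit.
const { Storage } = require('@google-cloud/storage');
const storage = new Storage();

async function getUploadUrl(bucketName, objectName) {
    // V4 signed URL that allows a single PUT for the next 15 minutes.
    const [url] = await storage.bucket(bucketName).file(objectName).getSignedUrl({
        version: 'v4',
        action: 'write',
        expires: Date.now() + 15 * 60 * 1000,
        contentType: 'text/csv',
    });
    // The browser then uploads directly:
    // fetch(url, { method: 'PUT', body: file, headers: { 'Content-Type': 'text/csv' } })
    return url;
}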

Running node js export in google cloud function

We need to export a zip file containing a lot of data (a couple of GB). The zip archive needs to contain about 50-100 InDesign files (each about 100 MB) and some other smaller files. We are trying to use Google Cloud Functions to do this (lower costs etc.). The function is triggered by a config file, which is uploaded into a bucket. The config file contains all the information about which files need to be put into the zip. Unfortunately the 2 GB memory limit is always reached, so the function never succeeds.
We tried different things:
The first solution was to loop over the files, create promises to download them, and after the loop resolve all promises at once (the files are downloaded via streaming directly into a file).
The second try was to await every download inside the for loop, but again, the memory limit was reached.
So my question is:
Why does Node.js not release the streams? It seems like Node keeps every streamed file in memory and finally crashes. I already tried setting the readStream and writeStream to null as suggested here:
How to prevent memory leaks in node.js?
But no change.
Note: we never reached the point where all files were downloaded so the zip could be created. It always failed after the first few files.
See below the code snippets:
// First try: create all download promises, then resolve them together.
const promises = []
for (const file of files) {
    promises.push(downloadIndesignToExternal(file, 'xxx', dir));
}
await Promise.all(promises)

// Second try: await every download inside the loop (not performant in terms of
// execution time, but we wanted to know if the memory limit is also reached):
for (const file of files) {
    await downloadIndesignToExternal(file, 'xxx', dir);
}
// Code to download an InDesign file
function downloadIndesignToExternal(activeId, externalId, dir) {
    return new Promise((resolve, reject) => {
        let readStream = storage.bucket(INDESIGN_BUCKET).file(`${activeId}.indd`).createReadStream();
        let writeStream = fs.createWriteStream(`${dir}/${externalId}.indd`);
        readStream.pipe(writeStream);
        writeStream.on('finish', () => {
            resolve();
        });
        writeStream.on('error', (err) => {
            reject('Could not write file');
        });
    });
}
It's important to know that /tmp (os.tmpdir()) is a memory-based filesystem in Cloud Functions. When you download a file to /tmp, it is taking up memory just as if you had saved it to memory in a buffer.
If your function needs more memory than can be configured for a function, then Cloud Functions might not be the best solution to this problem.
If you still want to use Cloud Functions, you will have to find a way to stream the input files directly to the output file, but without saving any intermediate state in the function. I'm sure this is possible, but you will probably need to write a fair amount of extra code for this.
For anyone interested:
We got it working by streaming the files into the zip and streaming that zip directly into Google Cloud Storage. Memory usage is now around 150-300 MB, so this works perfectly for us.
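A minimal sketch of that streaming approach (assuming the archiver package; bucket and file names are made up, this is not the poster's exact code) might look like:
const archiver = require('archiver');
const { Storage } = require('@google-cloud/storage');

const storage = new Storage();

async function exportZip(files) {
    const archive = archiver('zip');
    // Pipe the zip straight into a Cloud Storage object instead of /tmp.
    const output = storage.bucket('export-bucket').file('export.zip').createWriteStream();
    const done = new Promise((resolve, reject) => {
        output.on('finish', resolve);
        output.on('error', reject);
        archive.on('error', reject);
    });
    archive.pipe(output);
    for (const file of files) {
        // Each source file is appended as a read stream, so only small buffers live in memory.
        archive.append(storage.bucket('indesign-bucket').file(`${file}.indd`).createReadStream(), { name: `${file}.indd` });
    }
    await archive.finalize();
    await done;
}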

What's the best way to save files on the backend with MongoDB?

I'm in the middle of writing a MEAN stack application and I'm a beginner programmer.
The user must be able to submit some strings from form fields together with a file, which would be an image or a PDF.
I've looked at Multer; with it I could store the path to the file together with the form data in MongoDB.
I also found something on GridFS.
What would be the best and easiest way to make this work?
As usual: it depends...
In one of the projects I am working on, we deliberately chose to use the database to store the files. One of the reasons was to avoid adding yet another security and configuration artefact to the solution by introducing file-management requirements.
Now, mind you, storing a file on disk is simply faster than storing it in your database. But if performance is not a really big issue, storing it in MongoDB is a perfectly valid option as well (after all, it is a NoSQL solution that uses binary serialization as one of its core components). You can choose to store files either in one of the normal collections, or in GridFS.
As a rule of thumb: if the files in your solution do not exceed 16 MB, you can save them as (part of) a document in a normal collection. If they are larger than 16 MB, it is best to save them in GridFS. When saving in GridFS, though, you have to play around with chunk sizes to find optimal performance.
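To give an idea of the GridFS option in Node.js, here is a minimal sketch using the official mongodb driver's GridFSBucket (the connection string, database name, bucket name and file path are placeholders):
const fs = require('fs');
const { MongoClient, GridFSBucket } = require('mongodb');

async function saveFile(localPath, filename) {
    const client = await MongoClient.connect('mongodb://localhost:27017');
    const bucket = new GridFSBucket(client.db('mydb'), { bucketName: 'uploads' });
    await new Promise((resolve, reject) => {
        fs.createReadStream(localPath)
            .pipe(bucket.openUploadStream(filename))  // the driver splits the file into chunks for you
            .on('error', reject)
            .on('finish', resolve);
    });
    await client.close();
}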
A blog post on GridFS can be found here.
In the past I have posted something in another answer on binary- vs. string-based serialization and the use of GridFS here. Note: that post is in C#, but that should not really pose a problem.
Hope this helps, and good luck honing your programmer skills. The best way to learn is to ask and experiment (a lot).
You would never want to save the file as binary data in the DB for the obvious reasons, so if you want to store files on the local storage where your app runs, Multer is the package you need.
var path = require('path')
var multer = require('multer')

var storage = multer.diskStorage({
    destination: function(req, file, callback) {
        callback(null, './uploads')
    },
    filename: function(req, file, callback) {
        console.log(file)
        callback(null, file.fieldname + '-' + Date.now() + path.extname(file.originalname))
    }
})

app.post('/api/file', function(req, res) {
    var upload = multer({
        storage: storage
    }).single('userFile')
    upload(req, res, function(err) {
        if (err) return res.status(500).end('File upload failed')
        res.end('File is uploaded')
    })
})
You can change the code a bit as per your requirements and the naming conventions you want. I would suggest keeping a predictable filename so you won't even need to store the path in the DB, e.g. for user ID 5 the file is stored at /img/user/profile/5.jpg or something similar.
You can also try storing files on cloud storage like AWS S3 or Google Cloud Storage if you are dealing with a lot of files.
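For example, a minimal sketch of pushing an uploaded file to S3 with the AWS SDK (v2 style; the bucket name and key are placeholders) instead of keeping it on local disk:
const fs = require('fs');
const AWS = require('aws-sdk');

const s3 = new AWS.S3();

function uploadToS3(localPath, key, callback) {
    // s3.upload accepts a stream and handles multipart uploads under the hood.
    s3.upload({
        Bucket: 'my-app-uploads',
        Key: key,
        Body: fs.createReadStream(localPath)
    }, callback);
}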

NodeJS + AWS SDK + S3 - how do I successfully upload a zip file?

I've been trying to upload gzipped log files to S3 using the AWS Node.js SDK, and occasionally find that the uploaded file in S3 is corrupted/truncated. When I download the file and decompress it with gunzip in a bash terminal, I get:
01-log_2014-09-22.tsv.gz: unexpected end of file.
When I compare file sizes, the downloaded file comes up just a tiny bit short of the original file size (which unzips fine).
This doesn't happen consistently; one out of every three files or so is truncated. Re-uploading can fix the problem. Uploading through the S3 web UI also works fine.
Here's the code I'm using...
var stream = fs.createReadStream(localFilePath);
this.s3 = new AWS.S3();
this.s3.putObject({
    Bucket: bucketName,
    Key: folderName + filename,
    ACL: "bucket-owner-full-control",
    Body: stream,
}, function(err) {
    // stream.close();
    callback(err);
});
I shouldn't have to close the stream since it defaults to autoClose, but the problem seems to occur either way.
The fact that it's intermittent suggests it's some sort of timing or buffering issue, but I can't find any controls to fiddle with that might affect that. Any suggestions?
Thanks.

Restricting file upload size in JSF

I am using the Tomahawk inputFileUpload component to allow users to upload files to a server. I have implemented a "soft" file size limit by checking the size of the file after it has been uploaded and displaying an error if it is too large. However I would also like a larger "hard" limit, where uploading immediately stops once it has passed the limit. For example if the hard limit is 500MB and the user attempts to upload a 2GB file, uploading will immediately stop once 500MB has been uploaded and an error is displayed.
I had hoped that using the MyFaces ExtensionsFilter and setting uploadMaxFileSize would fix the problem, but the file is completely uploaded before the SizeLimitExceededException is thrown.
Is it possible to do this? Ideally I'd still be able to use Tomahawk but any other solution would be good.
The web server can't abort an HTTP request halfway through and then return an HTTP response. The entire HTTP request has to be consumed fully, up to the last bit, before an HTTP response can ever be returned. That's the nature of HTTP and TCP/IP. There's nothing you can do against it with a server-side programming language.
Note that the Tomahawk file upload size limit already takes care that the server's memory/disk space won't be polluted with the entire uploaded file whenever the size limit has been hit.
Your best bet is to validate the file length in JavaScript before the upload takes place. This is supported in browsers that support the HTML5 File API. The current versions of Firefox, Chrome, Safari, Opera and Android support it. IE9 doesn't support it yet; it will be supported in the upcoming IE10.
<t:inputFileUpload ... onchange="checkFileSize(this)" />
with something like this
function checkFileSize(inputFile) {
    var max = 500 * 1024 * 1024; // 500MB
    if (inputFile.files && inputFile.files[0].size > max) {
        alert("File too large."); // Do your thing to handle the error.
        inputFile.value = null; // Clears the field.
    }
}
Try this:
<div>
    <p:fileUpload id="fileUpload" name="fileUpload" value="#{controller.file}" mode="simple" rendered="true"/>
    <input type="button" value="Try it" onclick="checkFileSize('fileUpload')" />
</div>
When the user clicks the "Try it" button, the checkFileSize() function is called and the fileUpload PrimeFaces component is validated. If the file size is greater than 500 MB, the file is not uploaded.
<script>
// <![CDATA[
function checkFileSize(name) {
    var max = 500 * 1024 * 1024; // 500MB
    var inputFile = document.getElementsByName(name)[0];
    var inputFiles = inputFile.files;
    if (inputFiles.length > 0 && inputFiles[0].size > max) {
        alert("File too large."); // Do your thing to handle the error.
        inputFile.value = null; // Clears the field.
    }
}
// ]]>
</script>
The checkFileSize() logic is based on BalusC's answer above.
Versions tested:
primefaces 3.5
jsf 2.1
