Node.js memory fills up too quickly when uploading ~10MB images - node.js

Summary
Uploading an image from a Node.js backend to AWS S3 via JIMP consumes a surprisingly large amount of heap memory.
Workflow
Frontend (React) sends the image to the API via a form submission
The server parses the form data
JIMP rotates the image
JIMP resizes the image if it is wider than 1980px
JIMP creates a Buffer
The Buffer is uploaded to S3
Promise resolves -> image metadata (URL, bucket name, index, etc.) is saved in the database (MongoDB). (A rough sketch of this pipeline is shown below.)
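For context, here is a minimal sketch of what that pipeline looks like in code, assuming an aws-sdk v2 S3 client and a Mongoose model passed in as ImageModel; all identifiers are illustrative and not taken from the actual app:

    // Rough sketch of the described pipeline: Jimp rotate/resize -> Buffer -> S3 -> MongoDB.
    // Assumes aws-sdk v2, Jimp, and a Mongoose model passed in as ImageModel (illustrative names only).
    const Jimp = require('jimp');
    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    async function processAndUpload(fileBuffer, key, ImageModel) {
      const image = await Jimp.read(fileBuffer);           // this read step turned out to be the memory hog
      image.rotate(90);                                     // rotation step from the workflow (angle is illustrative)
      if (image.bitmap.width > 1980) {
        image.resize(1980, Jimp.AUTO);                      // cap width at 1980px, keep aspect ratio
      }
      const buffer = await image.getBufferAsync(Jimp.MIME_JPEG);

      const result = await s3
        .upload({ Bucket: process.env.S3_BUCKET, Key: key, Body: buffer })
        .promise();

      // Once the upload promise resolves, persist the metadata in MongoDB.
      return ImageModel.create({ url: result.Location, bucket: result.Bucket, key: result.Key });
    }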
Background
The server is hosted on Heroku with only 512MB of RAM. Uploading smaller images, and all other requests, work fine. However, the app crashes when a single image larger than ~8MB is uploaded, even with only a single user online.
Investigation so far
I've tried to replicate this in my local environment. Since there is no memory restriction locally, the app doesn't crash, but memory usage climbs to ~870MB when uploading a 10MB image, while a 6MB image stays around 60MB of RAM. I've updated all the npm packages and have also tried disabling any processing of the image.
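For reference, memory usage can be watched from inside the process with something like the following (a simple sketch, not code from the app):

    // Log V8 heap and RSS every few seconds while reproducing the upload, to see where the spike happens.
    const toMB = (n) => Math.round(n / 1024 / 1024) + 'MB';
    setInterval(() => {
      const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
      console.log(`rss=${toMB(rss)} heapTotal=${toMB(heapTotal)} heapUsed=${toMB(heapUsed)} external=${toMB(external)}`);
    }, 5000);

The external figure is worth watching here, because Buffer contents are allocated outside the V8 heap; that external memory is typically what heap snapshots report as "system / JSArrayBufferData".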
I've looked for memory leaks, as seen in the screenshots below; however, following the same workflow as above for the same 6MB image and taking 3 heap snapshots still shows around 60MB of RAM usage.
First, I thought the problem was that the image processing (resizing) takes too much memory, but that would not explain the big gap between 60MB (for a 6MB image) and around 800MB (for a 10MB image).
Then I thought it was related to the item "system / JSArrayBufferData" (seen in Ref2), which takes around 30% of the memory. However, this item is always there, even when I do not upload an image, and it only appears just before I stop recording in the Memory tab of the Chrome dev tools. I'm still not 100% sure what exactly it is.
Now I believe this is related to the "TimeList" entries (seen in Ref3). I think they come from timeouts while waiting for the file to be uploaded to S3, but here as well I'm not at all sure why this is happening.
The following screenshots show what I consider the important parts of the snapshots, taken with the Chrome inspector attached to the Node.js server running with the --inspect flag.
Ref1: Shows the full item list of the 3rd snapshot. All 3 snapshots were taken after uploading the same 6MB image; garbage seems to be collected properly, as the memory size did not increase.
Ref2: Shows the end of the 3rd snapshot, just before I stopped recording. I'm unsure what "system / JSArrayBufferData" is.
Ref3: Shows the end of the 5th snapshot, the one with a 10MB image. The small, continuous spikes are "TimeList" items, which seem to be related to a timeout. They appear while the server is waiting for a response from AWS, and this also seems to be what fills up the memory, as the item is not present when uploading anything smaller than 10MB.
Ref4: Shows the very end of the 5th snapshot, just before stopping the recording. "system / JSArrayBufferData" appears again, but only at the end.
Question
Unfortunately, I'm not sure how to articulate my question, as I don't know what the problem is or what exactly to look out for. I would appreciate any tips or shared experiences.

The high memory consumption was caused by the package "Jimp", which was used to read the file, rotate it, resize it, and create a buffer to upload to the file storage system.
Reading the file, i.e. Jimp.read('filename'), caused the memory problem. It's a known bug, as seen here: https://github.com/oliver-moran/jimp/issues/153
I've since switched to the "sharp" image processing package and am now able to upload images and videos larger than 10MB without any problems.
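For anyone making the same switch, the equivalent rotate/resize/buffer step with sharp looks roughly like this (a sketch only; the 1980px cap mirrors the workflow above, not the actual code):

    // Same rotate -> resize -> Buffer step as above, using sharp instead of Jimp.
    const sharp = require('sharp');

    function toUploadBuffer(fileBuffer) {
      return sharp(fileBuffer)
        .rotate()                                           // auto-rotate based on EXIF orientation
        .resize({ width: 1980, withoutEnlargement: true })  // only shrink images wider than 1980px
        .toBuffer();
    }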
I hope this helps anyone else running into the same issue.
Cheers

Related

How do you reliably serve images from a server?

This has got to be a simple question, I know. But this problem is driving me insane.
I have an app that allows users to upload both images & files that can be up to 10 MB in size.
I'm using Node.js on the backend to handle them and am currently saving the base64 into a blob in the database (MySQL). I figured this would be the issue, but I checked the database and it only had 3% average usage, so this wasn't the bottleneck. The EC2 instance doesn't go over 10% CPU, so this seems fine as well. NetworkIn averages around 200 MB per & NetworkOut averages around 100 MB.
I moved things over to an S3 bucket; however, fetching the images still goes through the server, which is causing super slow loading times (around 10 seconds).
Should I just throw in the towel and move things over to imgur or does anybody know of something I can do or check?
Edit: Here's the monitor
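One commonly suggested way to avoid proxying the image bytes through the server, sketched here as an assumption rather than something from the original setup, is to hand the browser a short-lived presigned S3 URL and let it fetch the object directly (aws-sdk v2; the bucket name is a placeholder):

    // Generate a short-lived URL so the browser downloads the image straight from S3,
    // instead of the bytes being piped through the Node/EC2 server.
    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    function getImageUrl(key) {
      return s3.getSignedUrl('getObject', {
        Bucket: 'my-upload-bucket',   // placeholder bucket name
        Key: key,
        Expires: 60                   // URL is valid for 60 seconds
      });
    }

The server then only returns the URL (or stores it alongside the record), and the image bytes never pass through it.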

How to reduce used storage in an Azure Function running a container

A little bit of background first. Maybe, hopefully, this can save someone else some trouble and frustration. Skip down to the TL;DR for the actual question.
We currently have a couple of genetics workflows related to gene sequencing running in Azure Batch. Some of them are quite light, and I'd like to move them to an Azure Function running a Docker container. For this purpose I have created a Docker image, based on the azure-functions image, containing Anaconda with the necessary packages to run our most common, lighter workflows. My initial attempt produced a huge image of ~8GB. Moving to Miniconda and a couple of other adjustments reduced the image size to just shy of 3.5GB. Still quite large, but it should be manageable.
For this function I created an Azure Function running on an App Service Plan on the P1V2 tier, in the belief that I would have 250GB of storage to work with, as stated in the tier description.
After a couple of fixes, I encountered some issues with loading my first image (the large one): the log indicated that there was no more space left on the device. This puzzled me, since the quota stated that I'd used some 1.5MB of the 250GB total. At this point I reduced the image size and could at least successfully deploy the image again. After enabling SSH support, I logged in to the container via SSH and ran df -h.
Okay, so the function does not have the advertised 250GB of storage available at runtime; it only has about 34GB. I spent some time searching the documentation but could not find anything indicating that this should be the case. I did find this related SO question, which clarified things a bit, and also this still-open issue on the Azure Functions GitHub repo. It seems more people are hitting the same issue and are not aware of the local storage limitation of the SKU. I might have overlooked something, so if this is in fact documented, I'd be happy if someone could point me there.
Now, the reason I need some storage is that I need to fetch the raw data file, which can be anything from a handful of MBs to several GBs. The workflow then produces multiple files varying between a few bytes and several GBs. The intention was, however, not to store these on the function instance but to complete the workflow and then store the resulting files in blob storage.
TL;DR
You do not get the advertised storage capacity for functions running on an App Service Plan on the local instance. You get around 20/60/80GB depending on the SKU.
I need 10-30GB of local storage temporarily until the workflow has finished and the resulting files can be stored elsewhere.
How can I reduce the spent storage on the local instance?
Finally, the actual question. You might have noticed in the screenshot of the df -h output that, of the available 34GB, a whopping 25GB is already used, which leaves 7.6GB to work with. I already mentioned that my image is ~3.5GB in size. So how come a total of 25GB is used, and is there any chance at all to reduce this, aside from shrinking my image? That said, even if I removed my image completely (freeing 3.5GB of storage), it would still not be quite enough. Maybe the function simply needs over 20GB of storage to run?
Note: It is not a result of cached Docker layers or the like, since I have tried scaling the App Service Plan, which clears the cached layers/images and re-downloads the image.
Moving up a tier gives me 60GB of total available storage on the instance, which is enough, but it feels like overkill when I don't need anything else that tier offers.
Attempted solution 1
One thing I have tried, which might help others, is mounting a file share on the function instance. This can be done with very little effort, as shown in the MS docs. Great, now I could write directly to a file share, saving myself some headache, and finally move on. Or so I thought. While this mostly worked great, it still threw an exception at some point indicating that it had run out of space on the device, leading me to believe that it may be using local storage as temporary storage, a buffer, or something similar. I will continue looking into it and see if I can figure that part out.
Any suggestions or alternative solutions will be greatly appreciated. I might just decide to move away from Azure Functions for this specific workflow. But I'd still like to clear things up for future reference.
Thanks in advance
niknoe

Why does w3wp memory keep increasing?

I am on a medium instance which has 3GB of RAM. When I start my web app, the w3wp process starts at around 80MB, and I notice that as time passes this goes up and up. I took a memory dump of the process when it had reached 570MB and the site had been running for 5 days, to see whether any .NET objects were consuming a lot of memory, but found that the largest object was only 18MB, which was a set of string objects.
I am not using any cache objects since I'm using redis for my session storage, and in actual fact the dump showed that there was nothing in the cache.
Now my question is the following: I am thinking that since I have 3GB of memory, IIS retains some pages in memory (cached) so the website is faster whenever there are requests, and that this is the reason why the memory keeps increasing. What concerns me is that I might have a memory leak of some kind, even though I dispose of all Entity Framework objects after use, as well as any other streams that need to be disposed. I am assuming that when some specific threshold is reached, old cached data that was in memory gets removed and new pages are cached instead. Am I right in saying this?
I want to point out that in the past I was on a small instance and the memory usage never went above 70%, and now on a medium instance the memory is already at 60%... very strange with the same code.
I can send a memory dump if anyone would like to help me out.
There is an issue that is affecting a small number of Web Apps, and that we're working on patching.
There is a workaround if you are hitting this particular issue:
Go to Kudu Console for your app (e.g. https://{yourapp}.scm.azurewebsites.net/DebugConsole)
Go into the LogFiles folder. If you are running into this issue, you will have a very large eventlog.xml file
Make that file readonly, by running attrib +r eventlog.xml
Optionally, restart your Web App so you have a clean w3wp
Monitor whether the usage still goes up
The one downside is that you'll no longer get those events generated, but in most cases they are not needed (and this is temporary).
The problem has been identified, but we don't have an ETA for the deployment yet.

What is consuming memory in my Node JS application?

Background
I have a relatively simple Node.js application (essentially just expressjs + mongoose). It is currently running in production on an Ubuntu server and serves about 20,000 page views per day.
Initially the application was running on a machine with 512 MB memory. Upon noticing that the server would essentially crash every so often I suspected that the application might be running out of memory, which was the case.
I have since moved the application to a server with 1 GB of memory. I have been monitoring the application and within a few minutes the application tends to reach about 200-250 MB of memory usage. Over longer periods of time (say 10+ hours) it seems that the amount keeps growing very slowly (I'm still investigating that).
I have since been trying to figure out what is consuming the memory. I have been going through my code and have not found any obvious memory leaks (for example, unclosed DB connections and such).
Tests
I have implemented a handy heapdump function using node-heapdump, and I have now enabled --expose-gc to be able to manually trigger garbage collection. From time to time I try triggering a manual GC to see what happens with the memory usage, but it seems to have no effect whatsoever.
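The heapdump helper described above is typically something along these lines (a sketch assuming the node-heapdump package and a process started with --expose-gc, not the original code):

    // Force a GC (only available with --expose-gc) and then write a heap snapshot to disk.
    const heapdump = require('heapdump');

    function dumpHeap() {
      if (global.gc) global.gc();                           // manual GC trigger
      const file = '/tmp/' + Date.now() + '.heapsnapshot';
      heapdump.writeSnapshot(file, (err, filename) => {
        if (err) console.error('heapdump failed:', err);
        else console.log('heap snapshot written to', filename);
      });
    }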
I have also tried analysing heapdumps from time to time, but I'm not sure whether what I'm seeing is normal or not. I do find it slightly suspicious that there is one entry with 93% of the retained size, but it just points to "builtins" (I'm not really sure what that signifies).
Upon inspecting the 2nd highest retained size (Buffer), I can see that it links back to the same "builtins" via a setTimeout function in some native code. I suspect it is cache- or HTTPS-related (_cache, slabBuffer, tls).
Questions
Does this look normal for a Node JS application?
Is anyone able to draw any sort of conclusion from this?
What exactly is "builtins" (does it refer to builtin js types)?

Nodejs: Streaming images to a separate server without saving

I've been looking for a decent example to answer my question, but I'm not sure if it's possible at this point.
I'm curious if it's possible to upload an image (or any file) and stream it to a separate server. In my case I would like to stream it to imgur.
I ask this because I want to avoid the bandwidth hit of having all the files come to the actual Node.js server and then be uploaded from there. Again, I'm not sure if this is possible or if I'm reaching, but some insight or an example would help a lot.
I took a look at Binary.js, which may do what I'm looking for, but it's IE10+ only, so no dice there...
EDIT: based on comment on being vague
When the file is uploaded to the Node server, it takes a bandwidth hit. When Node takes the file and uploads it to the remote server, it takes another hit (if I'm not mistaken). I want to know if it's possible to pipe it to the remote service (imgur in this case) and just use the Node server as a liaison. Again, I'm not sure if this is possible, which is why I'm attempting to articulate the question. I'm trying to reduce the amount of bandwidth and storage space used.
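Piping the upload straight through to the remote service, without buffering the whole file in memory or saving it to disk, is possible. Below is a rough sketch assuming busboy (1.x API) and the form-data package; the imgur v3 endpoint and Client-ID header should be double-checked against imgur's API docs:

    // Sketch: accept a multipart upload and pipe the file stream straight to imgur,
    // so the Node process acts only as a liaison and never holds the whole file.
    const http = require('http');
    const busboy = require('busboy');
    const FormData = require('form-data');

    http.createServer((req, res) => {
      const bb = busboy({ headers: req.headers });

      bb.on('file', (name, fileStream, info) => {
        const form = new FormData();
        form.append('image', fileStream, { filename: info.filename });

        form.submit({
          protocol: 'https:',
          host: 'api.imgur.com',
          path: '/3/image',
          headers: { Authorization: 'Client-ID YOUR_CLIENT_ID' } // placeholder client id
        }, (err, imgurRes) => {
          if (err) { res.statusCode = 502; return res.end('upload failed'); }
          res.statusCode = imgurRes.statusCode;
          imgurRes.pipe(res);              // relay imgur's JSON response back to the client
        });
      });

      req.pipe(bb);                        // stream the incoming request into busboy
    }).listen(3000);

Note that this only avoids buffering and local storage; the bytes still flow through the Node server, so it still takes the bandwidth hit unless the browser uploads to imgur directly.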
