Image storage performance on file system with Node.js and Mongo

My Node.js application currently stores uploaded images on the file system, with their paths saved in a MongoDB database. Each document (perhaps 2,000 at most in the future) has between 4 and 10 images. I don't believe I need to store the images in the database directly for my usage (I do not need to track versions, etc.); I am only concerned with performance.
Currently I store all images in one folder, with the associated paths stored in the database. However, as the number of documents, and hence the number of images, increases, will having so many files in a single folder slow performance?
Alternatively, I could have a folder for each document. Does this extra level of folder complexity affect performance? Also, since I'm using MongoDB, the obvious folder-naming scheme would be to use the ObjectID, but do folder names of that length (24 characters) affect performance? Should I be using a custom ObjectID?
Are there more efficient ways? Thanks in advance.

For simply accessing files, the number of items in a directory does not really affect performance. However, it is common to split files across directories, as getting the directory index can certainly be slow when you have thousands of files. In addition, file systems have limits on the number of files per directory. (What that limit is depends on your file system.)
If I were you, I'd just have a separate directory for each document, and load the images in there. If you are going to have more than 10,000 documents, you might split those a bit. Suppose your hash is 7813258ef8c6b632dde8cc80f6bda62f. It's pretty common to have a directory structure like /7/8/1/3/2/5/7813258ef8c6b632dde8cc80f6bda62f.
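A minimal sketch of that layout in Node.js, assuming each image is named by its hash (or ObjectID string); the shardedPath helper and the depth of 6 are illustrative choices, not something the answer prescribes:

    const path = require('path');
    const fs = require('fs/promises');

    // Build a sharded path like <root>/7/8/1/3/2/5/7813258ef8c6b632dde8cc80f6bda62f
    // from the file's name (assumed here to be a hash or ObjectID string).
    function shardedPath(rootDir, name, depth = 6) {
      const shards = name.slice(0, depth).split('');
      return path.join(rootDir, ...shards, name);
    }

    // Save an image buffer under its sharded path; the returned path is
    // what you would store in the MongoDB document.
    async function saveImage(rootDir, name, buffer) {
      const filePath = shardedPath(rootDir, name);
      await fs.mkdir(path.dirname(filePath), { recursive: true });
      await fs.writeFile(filePath, buffer);
      return filePath;
    }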

Related

Disk read performance - Does splitting 100k+ files into subdirectories help read them faster?

I have 100K+ small JSON data files in one directory (not by choice). When accessing each of them, does a flat vs. pyramid directory structure make any difference? Does it help Node.js/Nginx/the filesystem retrieve them faster if the files are grouped, e.g. by first letter, into corresponding directories?
In other words, is it faster to get baaaa.json from /json/b/ (only b*.json there) than from /json/ (all files), when it is safe to assume that each subdirectory contains 33 times fewer files? Does it make finding each file 33x faster? Or is there any disk-read difference at all?
EDIT (in response to jfriend00's comment): I am not sure what the underlying filesystem will be yet, but let's assume an S3 bucket.
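No answer is quoted here, but the first-letter grouping the question describes is simple to sketch in Node.js; jsonRoot and the directory layout are assumptions for illustration, and whether it is actually faster depends on the underlying filesystem:

    const path = require('path');
    const fs = require('fs/promises');

    // baaaa.json -> <jsonRoot>/b/baaaa.json
    function bucketedPath(jsonRoot, fileName) {
      return path.join(jsonRoot, fileName[0], fileName);
    }

    // Read and parse one JSON file from its single-letter bucket.
    async function readJson(jsonRoot, fileName) {
      const raw = await fs.readFile(bucketedPath(jsonRoot, fileName), 'utf8');
      return JSON.parse(raw);
    }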

How to overcome the size limits of MongoDB and, in general, store, send and retrieve large documents?

Currently I work on an application that can send and retrieve arbitrarily large files. In the beginning we decided to use JSON for this because it is quite easy to handle and store. This works until images, videos or larger content in general comes in.
The current way we do this gives us at least the following problems:
1 MB file size limit of Express (solved; see the config sketch below this list)
10 MB file size limit of Axios (solved)
16 MB document size limit of MongoDB (no solution currently)
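The first two limits are ordinarily raised through configuration; a hedged sketch (the option names are Express's and Axios's documented ones, but the 50 MB values are placeholders, not necessarily what the poster's solutions used):

    const express = require('express');
    const axios = require('axios');

    const app = express();
    // Express body parser: raise its default JSON payload limit.
    app.use(express.json({ limit: '50mb' }));

    // Axios: raise the client-side size caps.
    const client = axios.create({
      maxContentLength: 50 * 1024 * 1024, // max response body, in bytes
      maxBodyLength: 50 * 1024 * 1024,    // max request body, in bytes
    });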
So currently we are trying to overcome the limit of MongoDB, but in general this feels like the wrong path. As files get bigger there will be more and more limits that are harder to overcome, and maybe MongoDB's limit is not solvable at all. So is there a more efficient way to do this than what we currently do?
One thing is left to say: in general we need to reassemble the whole object on the server side to verify that the structure is the one we expect and to hash the whole object. So we have not considered splitting it up so far, but maybe that is the only option left. Even then, how would you send videos or similarly big chunks?
If you need to store files bigger than 16 MB in MongoDB, you can use GridFS.
GridFS works by splitting your file into smaller chunks of data and storing them separately. When the file is needed, it gets reassembled and becomes available.
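A minimal GridFS sketch with the official Node.js driver; the connection string, database name, and file names are placeholders:

    const { MongoClient, GridFSBucket } = require('mongodb');
    const fs = require('fs');

    async function main() {
      const client = await MongoClient.connect('mongodb://localhost:27017');
      const bucket = new GridFSBucket(client.db('mydb'), { bucketName: 'uploads' });

      // Upload: the file is streamed and stored in small chunks
      // (255 KB each by default), so the 16 MB document limit no longer applies.
      await new Promise((resolve, reject) => {
        fs.createReadStream('./video.mp4')
          .pipe(bucket.openUploadStream('video.mp4'))
          .on('finish', resolve)
          .on('error', reject);
      });

      // Download: the chunks are reassembled into a readable stream.
      await new Promise((resolve, reject) => {
        bucket.openDownloadStreamByName('video.mp4')
          .pipe(fs.createWriteStream('./video-copy.mp4'))
          .on('finish', resolve)
          .on('error', reject);
      });

      await client.close();
    }

    main().catch(console.error);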

What's the best method to fetch huge files from the web server using C#?

Hi, I have a requirement to fetch files from a server and identify the unused files in a directory. In this situation, the server returns huge files when I fetch them, and the problem is that CPU usage climbs while I am fetching large files, so I would like to avoid that. If anyone knows how to avoid this situation, please share; it would be very helpful for me.
Thanks
You can split your large file on the server into several smaller pieces, fetch some metadata about the number of pieces, their sizes, etc., then fetch the pieces one by one from your client C# code and join them in binary mode back into the larger file.
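The question is about C#, but the same piece-by-piece idea can be sketched in Node.js (the language used elsewhere on this page); the URL and chunk size are placeholders, and the server is assumed to support HTTP Range requests:

    const fs = require('fs/promises');

    // Fetch a large file in fixed-size pieces and join them in binary mode.
    // Uses the built-in fetch, so this assumes Node 18+.
    async function fetchInChunks(url, destPath, chunkSize = 5 * 1024 * 1024) {
      // Metadata first: the total size of the remote file.
      const head = await fetch(url, { method: 'HEAD' });
      const total = Number(head.headers.get('content-length'));

      const parts = [];
      for (let start = 0; start < total; start += chunkSize) {
        const end = Math.min(start + chunkSize, total) - 1;
        const res = await fetch(url, { headers: { Range: `bytes=${start}-${end}` } });
        parts.push(Buffer.from(await res.arrayBuffer()));
      }

      // Join the pieces back into the full file.
      await fs.writeFile(destPath, Buffer.concat(parts));
    }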

Windows Azure Cloud Storage - Impact of huge number of files in root

Sorry if I get any of the terminology wrong here, but hopefully you will get what I mean.
I am using Windows Azure Cloud Storage to store a vast quantity of small files (images, 20Kb each).
At the minute, these files are all stored in the root directory. I understand it's not a normal file system, so maybe root isn't the correct term.
I've tried to find information on the long-term effects of this plan, but with no luck, so if anyone can give me some information I'd be grateful.
Basically, am I going to run into problems if the number of files stored in this root ends up in the hundreds of thousands or millions?
Thanks,
Steven
I've been in a similar situation where we were storing ~10M small files in one blob container. Accessing individual files through code was fine and there weren't any performance problems.
Where we did have problems was with managing that many files outside of code. If you're using a storage explorer (either the one that comes with VS2010 or any one of the others), the ones I've encountered don't support the list-files-by-prefix API; you can only list the first 5K, then the next 5K, and so on. You can see how this might be a problem when you want to look at the 125,000th file in the container.
The other problem is that there is no easy way of finding out how many files are in your container (which can be important for knowing exactly how much all of that blob storage is costing you) without writing something that simply iterates over all the blobs and counts them.
This was an easy problem to solve for us, as our blobs had sequential numeric names, so we simply partitioned them into folders of 1K items each. Depending on how many items you've got, you can group 1K of these folders into subfolders.
http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/d569a5bb-c4d4-4495-9e77-00bd100beaef
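With the modern @azure/storage-blob SDK (which postdates this answer), both the prefix listing and the counting described above are straightforward; a hedged sketch in which the connection string and container name are placeholders:

    const { BlobServiceClient } = require('@azure/storage-blob');

    // Count blobs under a prefix; results still come back page by page
    // (up to 5,000 per page), matching the listing behaviour described above.
    async function countBlobs(connectionString, containerName, prefix = '') {
      const container = BlobServiceClient
        .fromConnectionString(connectionString)
        .getContainerClient(containerName);

      let count = 0;
      for await (const page of container
          .listBlobsFlat({ prefix })
          .byPage({ maxPageSize: 5000 })) {
        count += page.segment.blobItems.length;
      }
      return count;
    }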
Short Answer: No
Medium Answer: Kind of?
Long Answer: No, but if you query for a file list it will only return 5,000 at a time. You'll need to re-query every 5K to get a full listing, according to that MSDN page.
Edit: Root works fine for describing it. 99.99% of people will grok what you're trying to say.

How do we get around the Lotus Notes 60 Gb database barrier

Are there ways to get around the upper database size limit on Notes databases? We are compacting a database that is still approaching 60 gigs in size. Thank you very much if you can offer a suggestion.
Even if you could find a way to get over the 64 GB limit, it would not be the recommended solution. Splitting up the application into multiple databases is far better if you wish to improve performance and retain the stability of your Domino server. If you think you have to have everything in the same database in order to be able to search, please look up domain search and multi-database search in the Domino Administrator help.
Maybe some parts of the data is "old" and could be put into one or more archive databases instead?
Maybe you have a lot of large attachments and can store them in a series of attachment databases?
Maybe you have a lot of complicated views that can be streamlined or eliminated, thereby saving a lot of space and keeping everything in the same database for the time being? (Remove sorting on columns where it's not needed; enabling "click on column header to sort" is a sure way to increase the size of the view index.)
I'm assuming your database is large because of file attachments as well. In that case look into DAOS - it will store all file attachments on the filesystem (server functionality - transparent to clients and existing applications).
As a bonus it finds duplicates and stores them only once.
More here: http://www.ibm.com/developerworks/lotus/library/domino-green/
Just a stab in the dark:
Use the DB2 storage method instead of storing to a Domino server?
I'm guessing that 80-90% of that space is taken up by file attachments. My suggestion is to move all the attachments to a file share, provided everyone can access that share, or to an FTP server that everyone can connect to.
It's not ideal because security becomes an issue - now you need to manage credentials to the Notes database AND to the external file share - however it'll be worth the effort from a Notes administrator's perspective.
In the Notes documents, just provide a link to the file. If users are adding these files via a Notes form, perhaps you can add some background code to extract the file from the document after it has been saved, and replace it with a link to that file.
The 64 GB is not actually an absolute limit; you can go above that. I've seen 80 GB and even close to 100 GB, although once you're past 64 GB you can run into problems at any time. The limit is not actually Notes; it's the underlying file system. I've seen this on AS400. The great thing about Notes is that if you do get a huge crash, you can still access all the documents and pull everything out to new copies using scheduled agents, even if you can no longer get views to open in the client.
Your best bet is regular archiving. If it is file attachments, then anything over two years old doesn't need to be in the main system, just a brief synopsis and a link; you could even have a 5-year archive, a 2-year archive, a 1-year archive, etc. Data will continue to accumulate and has to be managed, irrespective of what platform you use to store it.
If the issue really is large file attachments, I would certainly recommend looking into implementing DAOS on your server / database. It is only available with Domino Server 8.5 and later. On the other hand, if your database contains 100,000+ documents, you may want to look seriously at dividing the data into multiple NSFs; at that number of documents, you need to be very careful about your view design, your lookup code, etc.
Some documented successes with DAOS:
http://www.edbrill.com/ebrill/edbrill.nsf/dx/yet-another-daos-success-story-from-darren-duke?opendocument&comments
If your database is getting to 60 GB, don't use a Domino solution; you need to switch to a relational database. You need to archive or move documents across several databases. Although you can get to 60 GB, you shouldn't do it. The performance hit for active databases is significant; it's not so much a problem for static databases.
I would also look at removing any unnecessary views and their indexes. View indexes can occupy 80-90% of your disk space. If you can't remove them, simplify their sorting arrangements/formulas and remove any unnecessary column sorting options. I halved a 50 GB database down to 25 GB with a few simple changes like this, and virtually no users noticed.
One path could be, for once, to start with the user. Do all the users need to access all that data all the time? If not, it's time to split or archive. If yes, there is probably a flaw in the design of the application.
Technically, I would add to the previous comments a suggestion to check the many options for compaction. Quick and dirty: discard all view indices, but be sure to rebuild at least the one for the default view if you don't want your users to riot. See updall.
One more thing to check: make sure you have checked
[x] Use LZ1 compression for attachments
in db properties.
