GridFS + express

I'm building an application that requires users to be able to upload audio files and then request those same files at a later time. Since I'm very new to all this, creating a file directory seems really confusing to me; GridFS (storing the audio files in the database) seems easier to understand at this point.
What I'm confused about is this: if I go the GridFS route, does every user need their own GridFS collection? Or would I set up one main GridFS collection that holds all users' audio files? In that case, my Mongoose user model would save the names of the audio files that belong to a given user, and when the user requests their audio files, I would get the list of files that belong to that user and search the one main GridFS collection for those files. Is that right?
I know I might be better off using the file system (for performance reasons). I looked into nginx, but I found myself getting more and more confused.

Treat GridFS as one collection of audio files (each with a fileId and its content).
Then you need to store the fileId somewhere alongside the owner's record/document.
One possible issue: if you store the pointers to a user's files in a single document, that document can grow beyond the maximum document size (16MB). If that could be the case, use a simple userId-fileId collection instead.
Have fun!
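Here is a minimal in-memory sketch of the single-bucket layout this answer describes. It is plain Node.js with no database. A Map stands in for the shared GridFS bucket and an array stands in for the userId-fileId collection. The names audioFiles, userFiles, saveAudio and listAudioFor, and the counter-based string ids, are all hypothetical. With the real mongodb driver, the file id would be the ObjectId that GridFS assigns on upload.

```javascript
// Hypothetical in-memory stand-ins for two MongoDB collections:
// one shared store for every user's audio, and a small link collection.
const audioFiles = new Map(); // fileId -> { filename, content }  (the one shared bucket)
const userFiles = [];         // { userId, fileId } link documents, one per file
let nextId = 1;               // a real GridFS bucket would assign an ObjectId instead

function saveAudio(userId, filename, content) {
  const fileId = String(nextId++);
  audioFiles.set(fileId, { filename, content }); // all users share this store
  userFiles.push({ userId, fileId });            // ownership lives in a separate small document
  return fileId;
}

function listAudioFor(userId) {
  // Same idea as querying userFiles by userId, then fetching each file by its id.
  return userFiles
    .filter((link) => link.userId === userId)
    .map((link) => ({ fileId: link.fileId, ...audioFiles.get(link.fileId) }));
}
```

Because each link is its own small document, a user with many files never pushes a single document toward the 16MB limit.

As far as I know, the real driver's GridFSBucket works like this:

- openUploadStream(filename, { metadata }) returns a stream whose id property is the new file's ObjectId.
- openDownloadStream(id) reads the file back.

Another option is to put the owner's id in each file's metadata, under a hypothetical field such as metadata.owner, and query bucket.find({ "metadata.owner": userId }). That avoids the link collection entirely.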

Related

When I modify the MongoDB document, how can I upload the image? (node.js, multer)

When a MongoDB document is updated: if the image has changed, I want to update the image path in the DB, upload the new image, and remove the previous one; if it hasn't, just upload the image.
router.put('/test/update/:first_idx/:second_idx', isAuthenticated, upload.array('images', 5), TestController.updateSomthing)
With upload.array('images') here, can I save or remove images depending on req.body?

Something very important to consider before saving images to MongoDB is the document size limit: a document can be at most 16MB. If you allow an array of images in particular, you could run into that limit once the array gets large.
You can find more about this limit here.
To get around this size limit, store the file in a folder on the server and save only its relative path in the document.
Here is a tutorial on how to use Multer/MongoDB to create a file store like the one you want. If you want to keep storing the files in the DB and comparing them, this tutorial will lay the groundwork for you. You can store the images in Base64 and then, on upload, check whether another image with the same Base64 string already exists; if it does, don't save it. This assumes that by comparing images you mean checking whether the EXACT same image already exists.
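Comparing full Base64 strings means the duplicate check has to store and scan entire images. A common alternative is to store a fixed-size SHA-256 digest of each image and compare digests instead. This technique is swapped in here; it is not from the answer.

The sketch below assumes the upload is available as a Buffer (multer's memory storage exposes it as file.buffer). The names storedHashes, sha256Hex and saveIfNew are hypothetical, and the Set stands in for a uniquely indexed field on the image documents.

```javascript
const crypto = require("crypto");

// Fixed-size fingerprint of the image bytes (64 hex characters).
function sha256Hex(buffer) {
  return crypto.createHash("sha256").update(buffer).digest("hex");
}

// Hypothetical stand-in for a unique-indexed `sha256` field in MongoDB.
const storedHashes = new Set();

// Returns true if the image was new and recorded, false if identical bytes exist.
function saveIfNew(buffer) {
  const digest = sha256Hex(buffer);
  if (storedHashes.has(digest)) return false; // exact duplicate: skip saving
  storedHashes.add(digest);
  return true;
}
```

Like the Base64 check, this only catches byte-identical files. The same picture re-encoded or resized produces a different digest.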
I do not think it is going to be efficient to compare images to make sure that they are equal, because this sounds very computationally expensive. I would just do a full write-over with every PUT (unless you have to worry about concurrency).
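Here is a minimal sketch of the full write-over on PUT. It assumes two things:

- multer's disk storage is used, so each uploaded file object carries a path.
- The document's images field holds paths relative to an upload directory, as suggested earlier in this answer.

The names overwriteImages, doc.images and uploadDir are hypothetical. It uses synchronous fs calls for brevity, and saving the document is left to the caller.

```javascript
const fs = require("fs");
const path = require("path");

// Replace the document's image list with the newly uploaded files and delete
// any old file on disk that is no longer referenced.
// `uploaded` mimics multer disk-storage file objects, which have a `path` field.
function overwriteImages(doc, uploaded, uploadDir) {
  const newPaths = uploaded.map((f) => path.relative(uploadDir, f.path));
  const keep = new Set(newPaths);
  for (const oldPath of doc.images || []) {
    if (!keep.has(oldPath)) {
      // force: true makes this a no-op if the file is already gone
      fs.rmSync(path.join(uploadDir, oldPath), { force: true });
    }
  }
  doc.images = newPaths; // caller persists the document afterwards
  return doc;
}
```

Overwriting everything avoids any comparison logic, at the cost of re-uploading unchanged images on each PUT.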

Image storage performance on file system with Nodejs and Mongo

My Node.js application currently stores uploaded images on the file system, with their paths saved in a MongoDB database. Each document (maybe 2,000 documents at most in the future) has between 4 and 10 images. I don't believe I need to store the images in the database directly for my use case (I don't need to track versions, etc.); I am only concerned with performance.
Currently, I store all images in one folder, with the associated paths stored in the database. However, as the number of documents (and hence images) increases, will having so many files in a single folder slow performance?
Alternatively, I could have a folder for each document. Does this extra level of folder nesting affect performance? Also, with MongoDB the obvious folder-naming scheme would be to use the ObjectID, but do folder names of that length (24 characters) affect performance? Should I be using a custom ObjectID?
Are there more efficient ways? Thanks in advance.

For simply accessing files, the number of items in a directory does not really affect performance. However, it is common to split out directories for this as getting the directory index can certainly be slow when you have thousands of files. In addition, file systems have limits to the number of files per directory. (What that limit is depends on your file system.)
If I were you, I'd just have a separate directory for each document, and load the images in there. If you are going to have more than 10,000 documents, you might split those a bit. Suppose your hash is 7813258ef8c6b632dde8cc80f6bda62f. It's pretty common to have a directory structure like /7/8/1/3/2/5/7813258ef8c6b632dde8cc80f6bda62f.
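Here is a small sketch of the nested layout from this answer. It assumes the directory names come from a hex hash of the document's ObjectId.

The hashing step is an addition. An ObjectId starts with a 4-byte creation timestamp, so its leading hex digits change slowly. Sharding on them directly would put most documents under the same few directories.

The helpers shardedPath and dirForDocument are hypothetical. MD5 is used only to spread names evenly, not for security.

```javascript
const crypto = require("crypto");
const path = require("path");

// The first `depth` hex characters each become one directory level,
// followed by a directory named after the full hash.
function shardedPath(hexHash, depth = 6) {
  const levels = hexHash.slice(0, depth).split("");
  return path.posix.join(...levels, hexHash);
}

// Hash the ObjectId string first so documents spread evenly across directories.
function dirForDocument(objectIdHex, depth = 6) {
  const digest = crypto.createHash("md5").update(objectIdHex).digest("hex");
  return shardedPath(digest, depth);
}
```

The returned path is relative to whatever upload root you choose. With around 2,000 documents, a depth of 1 or 2 (16 or 256 directories) is likely enough; depth 6 is shown only to match the example path.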

MongoDB: How can I store files (Word, Excel, etc.)?

I've yet to look into storing files such as Word, Excel, etc. into MongoDB seriously and I want to know - am I able to store whole docx or excel files in MongoDB and then RETRIEVE them via querying?

Yes, using GridFS.
GridFS is a storage specification. It is not built into the database itself; instead, it is implemented by the drivers.
You can find out more here: http://www.mongodb.org/display/DOCS/GridFS.
Its normal implementation breaks your big documents into smaller parts, stores those parts in a chunks collection, and tracks them through an fs.files collection, which you query to find your files.

MongoDB is a document database that stores JSON-like documents (in a binary format called BSON). The maximum size of a BSON document is 16 megabytes, which may be too small for some use cases.
If you want to store binary data of arbitrary size, you can use GridFS (http://www.mongodb.org/display/DOCS/GridFS+Specification). GridFS automatically splits your documents (or any binary data) into several BSON objects (chunks of 255 KB by default in current drivers; older versions used 256 KB). That way you only need to worry about storing and retrieving complete documents, whatever their size.
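To make the splitting concrete, here is a sketch that builds the documents GridFS would store for one file:

- one fs.files entry (_id, filename, length, chunkSize, uploadDate)
- numbered fs.chunks entries (files_id, n, data)

It only creates plain objects in memory. The helpers toGridFsDocuments and reassemble are hypothetical, not driver functions. The 255 KB default matches current drivers.

```javascript
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261120 bytes

// Build one fs.files-style document plus numbered fs.chunks-style documents.
function toGridFsDocuments(fileId, filename, data, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let n = 0, offset = 0; offset < data.length; n++, offset += chunkSize) {
    // The last chunk is simply shorter; subarray clamps at the buffer's end.
    chunks.push({ files_id: fileId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  const file = { _id: fileId, filename, length: data.length, chunkSize, uploadDate: new Date() };
  return { file, chunks };
}

// Reading a file back: order its chunks by n and concatenate the data.
function reassemble(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}
```

For example, a 600,000-byte file becomes three chunks. The first two hold 261,120 bytes each and the last holds 77,760.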
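As far as I know, Mongoose doesn't support GridFS directly. However, you can use GridFS through the native driver: just run npm install mongodb and start hacking! Note that the driver's old GridStore API has since been removed; current versions of the driver expose GridFS through GridFSBucket.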

Enhance my Core Data design. Experts only!

In AcaniUsers, I'm downloading the 20 users closest to me and displaying their profile pictures as thumbnails in a table view. User and Photo are both Resources because they each have an id (MongoDB BSON ObjectId) on the server. Each user has a unique_id. Each Photo has four different sizes (images) on the server: square: 75x75, square#2x: 150x150, large: 320x480, large#2x: 640x960. However, each device will only have two of these sizes, depending on whether it's an iPhone 3 or 4 (retina display). Each of these sizes has its own MongoDB collection, and all four images for each Photo have the same BSON ObjectId across these four collections.
In the future, I may give User a relationship called photos to allow a user to have more than one photo. Also, although I don't foresee this, I may add more Image sizes (types).
The fresh attribute on Image tells me whether I've downloaded the latest Image. I set it to NO whenever the Photo's ID changes, and back to YES once I've finished downloading the Image.
Should I store the four different images in Core Data or on the file system and just store their URLs in Core Data? I read somewhere that over 1 or 2MB, you should store in file system, not Core Data. So, I was thinking of storing the square images in Core Data and the large images in the file system, but I'd rather store them all the same way to make things easier. So, maybe I'll just store them all in the file system? What do you think?
Do you think I should discard the 75x75 and 320x480 sizes, since iPhone 3s will soon be gone?
How can I improve the design of my entities and their attributes and relationships? For example, is the Resource entity beneficial at all?
I'm displaying the Users with an NSFetchedResultsController. However, it doesn't know when the User's image gets updated, so the images don't show up until I scroll aggressively the first time. How do I let the NSFetchedResultsController know that a user's thumbnail has finished downloading? Do I have to use KVO?

To answer your questions:
1. I'd store them all in the file system and record the URL in the database. I've never been a big fan of storing image data in the DB. Plus, keeping all of the image storage uniform simplifies things a little: your image-loading code doesn't have to check whether a given type is stored in the DB or on the file system.
2. No, I wouldn't do that yet. The iPhone 3 is going to be around a bit longer. AT&T is still selling it as the cheap entry-level iPhone; I just saw a commercial the other night advertising it for $49.
3. Remove the Resource entity and add the id attribute to each of the classes. The way you did it is actually bad: abstract entities should only be used when you have a couple of entities that are almost identical, with only a few differences between them. Under the hood, Core Data creates only one table for an abstract entity and all of its children. So right now you'll end up with a single table containing both your user and photo entries, which can be bad when you're trying to query just one type of entity.
You should also delete the Image entity and move its attributes into the Photo entity. A Photo will always have those values associated with it, and the same values won't be shared between photos. Keeping them in a separate entity will slow things down in one of two ways:
- You load them along with the photos, which requires a join (slow).
- They're loaded one at a time when you access the data or fresh attributes, which is also slow. Each fault that fires triggers a separate query and round trip to disk. So when you loop through your pictures to display them in the table, you fire n queries instead of one, which can make a big difference in performance.
4. You can use KVO for this. Have your table cell observe the User or Picture. Which one depends on whether the Picture is already attached to the user and you're changing its data, or you're adding a new picture to the user when loading completes. When the observer fires, update the displayed image.

Audio metadata storage

I checked the questions on SO about audio metadata but could not find one that answers my question. Where exactly is the metadata of audio files stored, and in what form? Is it stored in files or in a database? And where is this database stored?

Thank you, Michelle. My basic confusion was whether the metadata is stored as part of the file or in a separate file stored somewhere else in the file system, like an inode on Unix-like systems. ID3 shows that it is stored with the file as a block of bytes. ID3v1 appends a fixed 128-byte block after the audio content, while ID3v2 tags usually sit at the start of the file.
Is this the way of metadata storage for most of the other file types?

As far as I know, audio file formats:
May support metadata standards (e.g. ID3v1, ID3v2, APEtag, iXML)
May also have their own native metadata format (e.g. MP4 boxes / Quicktime atoms, OGG/FLAC/OPUS/Speex/Theora VorbisComment, WMA native metadata, AIFF / AIFC native metadata...)
=> In these two cases, metadata is stored directly into the audio file itself.
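As an example of metadata living inside the audio file itself, the sketch below reads an ID3v1 tag. An ID3v1 tag is a fixed 128-byte block at the very end of an MP3 file.

The helper readId3v1 is hypothetical. It ignores the ID3v1.1 track number and does not handle ID3v2, which is variable-length and usually placed at the start of the file.

```javascript
// ID3v1 layout (128 bytes at the end of the file):
// "TAG"(3) title(30) artist(30) album(30) year(4) comment(30) genre(1)
function readId3v1(fileBuffer) {
  if (fileBuffer.length < 128) return null;
  const tag = fileBuffer.subarray(fileBuffer.length - 128);
  if (tag.toString("latin1", 0, 3) !== "TAG") return null; // no ID3v1 tag present
  // Text fields are NUL-padded Latin-1 (some writers pad with spaces instead).
  const field = (start, len) =>
    tag.toString("latin1", start, start + len).split("\0")[0].trim();
  return {
    title: field(3, 30),
    artist: field(33, 30),
    album: field(63, 30),
    year: field(93, 4),
    comment: field(97, 30),
    genre: tag[127], // numeric index into the standard ID3v1 genre list
  };
}
```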
HydrogenAudio maintains a field mapping table between the most common formats: http://wiki.hydrogenaud.io/index.php?title=Tag_Mapping
That being said, many audio players (e.g. iTunes, foobar2000) allow their users to edit any metadata field in any file, regardless of whether said fields are supported or not by the underlying tagging standards (e.g. adding an "Album Artist" field in an S3M file).
In order to do that, these audio players store metadata in their internal database, giving the illusion that the audio file has been "enriched" while its actual content remains unchanged.
Another classic use of audio player databases is to store the following fields:
Rating
Number of times played
Last time played
=> In that case, you'll find metadata in the audio player's internal database
