I haven't yet seriously looked into storing files such as Word, Excel, etc. in MongoDB, and I want to know: am I able to store whole .docx or Excel files in MongoDB and then RETRIEVE them via querying?
Using GridFS, yes.
GridFS is a storage specification. It is not built into the DB but instead into the drivers.
You can find out more here: http://www.mongodb.org/display/DOCS/GridFS.
Its normal implementation is to break down your big documents into smaller ones and store those parts in a chunks collection, mastered by an fs.files collection which you query for your files.
MongoDB is a document database that stores JSON-like documents (called BSON). The maximum size of a BSON object is 16 megabytes, which may be too small for some use cases.
If you want to store binary data of arbitrary size, you can use GridFS (http://www.mongodb.org/display/DOCS/GridFS+Specification). GridFS automatically splits your documents (or any binary data) into several BSON objects (usually 256k in size), so you only need to worry about storing and retrieving complete documents (whatever their sizes are).
As far as I know, Mongoose doesn't support GridFS. However, you can use GridFS via its native driver's GridStore. Just run npm install mongodb and start hacking!
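For illustration, here's a minimal sketch using the native driver from Node. Newer versions of the mongodb package expose GridFSBucket rather than the older GridStore interface; the file name and connection string here are just placeholders:

```js
const fs = require('fs');
const { MongoClient, GridFSBucket } = require('mongodb');

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const bucket = new GridFSBucket(client.db('files'));

  // Store: the .docx is split into chunks (fs.chunks) with a master
  // record in fs.files.
  fs.createReadStream('report.docx')
    .pipe(bucket.openUploadStream('report.docx'))
    .on('finish', () => {
      // Retrieve: query by filename and stream the chunks back out.
      bucket.openDownloadStreamByName('report.docx')
        .pipe(fs.createWriteStream('report-copy.docx'))
        .on('finish', () => client.close());
    });
}

main().catch(console.error);
```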
I am trying to save data from the Coinbase Pro API. The loop runs until all data is fetched and saved to a MongoDB collection. But the main issue is that when we reach 16MB, the script fails.
I need a viable solution to save unlimited data to a MongoDB collection and make use of it.
MongoDB documents have a maximum size of 16MB, according to the docs:
"The maximum BSON document size is 16 megabytes.
The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API..."
(https://docs.mongodb.com/manual/reference/limits/)
It might be worth checking out that GridFS API (but I haven't yet).
Are you trying to insert ONE document that is 16MB+? Or are you trying to insert MULTIPLE documents that add up to 16MB+?
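If it's the latter, one fix is to stop accumulating everything in a single document and insert each fetched record on its own. A rough sketch (collection and field names are illustrative, not from your script):

```js
const { MongoClient } = require('mongodb');

// Each record becomes its own document, so no single document ever
// approaches the 16MB BSON limit; the collection itself can grow freely.
async function savePage(records) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    await client.db('coinbase').collection('trades').insertMany(records);
  } finally {
    await client.close();
  }
}
```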
I am using the Sequelize ORM with Node/Express.js and would like to save images or documents into an SQLite database.
Have you checked out the BLOB column datatype? It allows you to save binary data. This way, you could have files of any type stored as blobs in the db.
By default you can store an image/document of up to ~1 GB in an SQLite BLOB field. However, that limit can be raised by setting SQLITE_MAX_LENGTH.
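A minimal sketch of what that looks like with Sequelize on SQLite (model and file names are made up for the example):

```js
const fs = require('fs');
const { Sequelize, DataTypes } = require('sequelize');

const sequelize = new Sequelize({ dialect: 'sqlite', storage: 'app.sqlite' });

const Attachment = sequelize.define('Attachment', {
  name: DataTypes.STRING,
  mimeType: DataTypes.STRING,
  data: DataTypes.BLOB, // raw bytes of the image/document
});

async function main() {
  await sequelize.sync();
  // Write a file's bytes straight into the BLOB column...
  await Attachment.create({
    name: 'photo.png',
    mimeType: 'image/png',
    data: fs.readFileSync('photo.png'),
  });
  // ...and read them back as a Buffer.
  const row = await Attachment.findOne({ where: { name: 'photo.png' } });
  fs.writeFileSync('photo-copy.png', row.data);
}

main().catch(console.error);
```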
Does NeDB use the disk to do find queries?
Or are find queries 100% using RAM based data structures.
I need to do intensive work using find queries and I don't want to put that load on my hard disk.
(As a bonus question, do SQLite find queries also work 100% in memory?)
It depends: if you create indexes on fields, those fields will be held in memory for faster lookup and access. Unindexed field lookups will hit disk. This is true for SQLite, as well as most other persistent databases (e.g., PostgreSQL, MySQL, MongoDB, and many more).
NeDB is an in-memory database, which means that all data will be held in memory (similar to Redis). That being said, you still have to index fields for faster lookup.
If you want to create indexes on fields in NeDB, they have documentation for that here.
For NeDB, the _id field is automatically indexed, so you don't have to create an index for that field, and querying by _id will be substantially faster than querying an unindexed field.
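For example (field names are illustrative), creating an index in NeDB looks like this; the indexed lookup is then served from the in-memory structures:

```js
const Datastore = require('nedb');
const db = new Datastore({ filename: 'users.db', autoload: true });

// Build an in-memory index so find() on this field avoids a full scan.
db.ensureIndex({ fieldName: 'username', unique: true }, (err) => {
  if (err) throw err;
  db.insert({ username: 'alice', score: 42 }, () => {
    db.find({ username: 'alice' }, (err, docs) => console.log(docs));
  });
});
```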
I need to optimize disk usage and the amount of data transferred during replication with my CouchDB instance. Does storing numerical data as ints/floats instead of as strings make a difference to file storage and/or during HTTP requests? I've read that JSON treats everything as strings, but newer JSON specs make use of different datatypes (float/int/boolean). What about PouchDB?
CouchDB stores JSON data in native JSON types, so ints and floats are actual number types when serialised to disk. But I doubt you'd save much disk space compared with storing them as strings. The replication protocol uses JSON, and the internal encoding has no effect on it.
PouchDB on WebSQL and SQLite stores your documents as strings (I don't know about IndexedDB).
So to optimize disk usage, just keep less data. :)
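A quick way to convince yourself how small the difference is: the JSON encodings of a number and of the same value as a string differ only by the two quote characters.

```js
const asNumber = JSON.stringify({ price: 1234.56 });
const asString = JSON.stringify({ price: '1234.56' });
console.log(asNumber, asNumber.length); // {"price":1234.56} 17
console.log(asString, asString.length); // {"price":"1234.56"} 19
```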
I have a CouchDB database which stores mostly document attachments.
The files are stored in the db with URLs following this structure:
/db-name/numeric-file-id/official-human-readable-file-name.ext
There is always only one attachment to one document.
Today I have computed the md5 sums of all of the files and it seems that many of them are duplicates.
I am wondering if CouchDB is aware of duplicate attachments and internally stores only some kind of pointer to the file and keeps a reference count, or whether it simply stores each attachment as-is.
I mean, if I put 5 identical 100MB files as attachments, will the database use 100MB or 500MB?
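For reference, here is a small Node sketch of the duplicate check described above: hash each file and group identical digests (the directory path is just a placeholder).

```js
const crypto = require('crypto');
const fs = require('fs');

const byDigest = {};
for (const file of fs.readdirSync('./attachments')) {
  const data = fs.readFileSync(`./attachments/${file}`);
  const digest = crypto.createHash('md5').update(data).digest('hex');
  (byDigest[digest] = byDigest[digest] || []).push(file);
}

// Any group with more than one file is a set of byte-identical attachments.
const dupes = Object.values(byDigest).filter((group) => group.length > 1);
console.log('duplicate groups:', dupes);
```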
I also couldn't find a direct answer to this question in the CouchDB docs, so I devised a simple empirical test (using CouchDB 1.4):
Experiment:
I incrementally added 3 documents, each with several large (multi MB) attachments that were identical between documents. I then examined the size on-disk of the resulting db.couch file after each document insert.
Results:
The db.couch file increased from 8MB to 16MB and then 24MB for the 1st, 2nd and 3rd document inserts, respectively. So, CouchDB does not appear to be deduplicating identical attachments on different documents. Manually compacting the database after the three documents were added made no difference in the file size, so it's also unlikely that some background maintenance process would get around to noticing/fixing this.
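For anyone who wants to reproduce this, here is a rough Node sketch of the same experiment. It assumes an unauthenticated CouchDB on localhost:5984 and Node 18+ for the built-in fetch; CouchDB 1.x reports disk_size, while newer versions report a sizes object.

```js
const fs = require('fs');

const BASE = 'http://localhost:5984/dedup_test';
const blob = fs.readFileSync('big-file.bin'); // identical multi-MB payload

async function run() {
  await fetch(BASE, { method: 'PUT' }); // create the test database
  for (let i = 1; i <= 3; i++) {
    // Create a fresh document, then attach the identical blob to it.
    const res = await fetch(`${BASE}/doc${i}`, {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: '{}',
    });
    const { rev } = await res.json();
    await fetch(`${BASE}/doc${i}/payload.bin?rev=${rev}`, {
      method: 'PUT',
      headers: { 'Content-Type': 'application/octet-stream' },
      body: blob,
    });
    // If attachments were deduplicated, this size would stay flat.
    const info = await (await fetch(BASE)).json();
    console.log(`after doc${i}:`, info.sizes || info.disk_size);
  }
}

run().catch(console.error);
```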
This lack of attachment deduplication is a curious omission given the following three observations:
The authors were concerned enough about efficiently handling large attachments that they added automatic gzip compression of stored attachments (for those with MIME types that indicate some kind of text content.)
Adding an attachment causes an MD5 digest to be calculated and stored with the metadata for the attachment.
CouchDB does seem to deduplicate identical attachments shared among multiple revs of the same document that are still being held in the DB (probably one use of the MD5 digest).
Given these factors, it is surprising that CouchDB isn't more intelligent in this regard, as it would be a valuable and (likely) straightforward optimization.