I have a very large document store: about 50 million JSON docs, with 50 million more added per year. Each is about 10KB. I would like to store them in cloud storage and retrieve them via a couple of structured metadata indices that I would update as I add documents to the store.
It looks like AWS S3, Google Cloud Storage and Azure allow custom metadata to be returned with an object, but not used as part of a GET request to filter a collection of objects.
Is there a good solution "out-of-the-box" for this? I can't find any, but it seems like my use case shouldn't really be unusual. I don't need to query by document attributes or to return partial documents; I just need to GET a collection of documents by filtering on a handful of metadata fields.
The AWS SimpleDB page mentions "Indexing Amazon S3 Object Metadata" as a use case, and links to a library that hasn't been updated since 2009.
They are simply saying that you can store and query the metadata in Amazon SimpleDB, which is a NoSQL database provided by Amazon. Depending on the kind of metadata you have, you could also store it in an RDBMS. A few hundred million rows isn't too much if you create the proper indices, and you can store URLs or file names to access the files stored on S3, Azure, etc. afterwards.
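To make that concrete, here is a rough Node.js sketch of the index-table idea: the metadata lives in a relational table, the documents themselves in S3. The table, columns, bucket name, and connection details are all illustrative assumptions, not an out-of-the-box product.

const { Pool } = require('pg');   // npm install pg
const AWS = require('aws-sdk');   // npm install aws-sdk

const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const s3 = new AWS.S3();

// Look up S3 keys via the indexed metadata columns, then fetch the documents.
async function getDocumentsByMetadata(docType, createdAfter) {
  const { rows } = await pool.query(
    'SELECT s3_key FROM document_index WHERE doc_type = $1 AND created_at > $2',
    [docType, createdAfter]
  );
  return Promise.all(
    rows.map(({ s3_key }) =>
      s3.getObject({ Bucket: 'my-doc-store', Key: s3_key }).promise()
        .then((obj) => JSON.parse(obj.Body.toString()))
    )
  );
}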
I'm looking for help achieving the following in GCS:
I want to be able to upload one object that is composed of two or more images to GCS.
All images grouped in one object are uploaded at the same time; in other words, as a bulk upload.
The object, composed of several files/images, has its own Id or a property that can be used to query the object.
Reading the GCS API docs, I found the method to upload a single image at a time. Also, reading similar questions on Stack Overflow, I have found how to upload several images at the same time, but individually, not as a bulk. However, I cannot find a method provided by the API to group several images into an object and upload that object to GCS.
You can create a composite object in order to merge multiple images together into a single object; here is the documentation for this in Node.js.
Nonetheless, composing objects requires that the source files are already stored in the same storage bucket and have the same storage class; see the documentation for more details.
Because of this, the composite object can only be created once all the files you want it to be composed of are stored in the same bucket.
For this reason, if you would like to have this done prior to uploading to GCS, you would have to implement the merging logic on the Node.js side, i.e., merge the objects before uploading them to GCS.
You can have a look at this document: Node.js — How to Merge Objects.
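For illustration, here is a minimal sketch of the compose call with the official @google-cloud/storage Node.js client; the bucket and object names are assumptions.

const { Storage } = require('@google-cloud/storage'); // npm install @google-cloud/storage

async function composeImages() {
  const bucket = new Storage().bucket('my-images-bucket');
  // All sources must already exist in this bucket and share a storage class.
  await bucket.combine(
    ['group-1/photo-a.png', 'group-1/photo-b.png'],
    'group-1/combined-object'
  );
}

Keep in mind that composing concatenates the source objects' bytes: the result is one object containing the images back-to-back, not a merged image.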
I'm new to MongoDB.
I want to fill out a form for a Post model, and one of the fields is a file (image or video). Can I store the attached media in the same Mongo document? If not, how should I go about doing this, and is there a helpful guide I can follow?
This is what my backend looks like:
This is my Post model that I'm going to fill in via the form:
NB: I'm using Angular on my front end.
If the media is relatively small, like an 8KB thumbnail, then you can store it as a binary-type field as a peer to the rest of your ints, strings, and dates.
However, an individual document cannot be larger than 16MB, so this approach is generally not feasible, especially for video content.
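As a quick illustration of the small-media case, here is a sketch using the official Node.js driver; the database, collection, and field names are assumptions.

const { MongoClient } = require('mongodb'); // npm install mongodb
const fs = require('fs');

async function createPostWithThumbnail() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const posts = client.db('myapp').collection('posts');
  // A Node.js Buffer is stored as a BSON binary (BinData) field; keep it well under 16MB.
  await posts.insertOne({
    title: 'My post',
    createdAt: new Date(),
    thumbnail: fs.readFileSync('thumb.jpg'),
  });
  await client.close();
}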
You can use the GridFS utilities, which are bundled with the client-side drivers. A comprehensive example is posted here: How to save images from url into mongodb using python?
If you do not require the media to be managed by the database storage subsystem, a practical alternative is to store the media in an AWS S3 bucket (or the Azure or GCP equivalent) and store just the path to the content in MongoDB.
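And here is a loose sketch of the GridFS route mentioned above, again with assumed names; GridFS splits the file into chunks, so the 16MB document limit no longer applies.

const { MongoClient, GridFSBucket } = require('mongodb');
const fs = require('fs');

async function uploadVideo(path) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const bucket = new GridFSBucket(client.db('myapp'), { bucketName: 'media' });
  // Stream the file into GridFS, attaching metadata to the file record.
  await new Promise((resolve, reject) => {
    fs.createReadStream(path)
      .pipe(bucket.openUploadStream('video.mp4', { metadata: { postId: 'abc123' } }))
      .on('finish', resolve)
      .on('error', reject);
  });
  await client.close();
}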
I have a Firebase Storage bucket set up for the primary purpose of storing users' profile pictures. Fetching the profile picture of the currentUser is simple, as I know the .uid. However, fetching the profile pictures of other users is not so straightforward, as that first requires a query to my actual database (in this case a graph database) before I can even begin fetching their images. This process is aggravated by my backend having a three-tier architecture.
So my current process is this:
get request to Node.js backend
Node.js queries graph database
Node.js sends data to frontend
frontend iteratively fetches profile pictures using the other users' uids
What seems slow is the fact that my frontend has to wait for the other uids before it can even begin fetching the images. Is this unavoidable? Ideally, the images would be fetched concurrently with the info about the users.
The title here is "Firebase fetching other user's Images efficiently", but you're using a non-Firebase database, which makes it a little difficult.
The way I believe you could handle this in Firebase/Firestore would be to have duplicate data (pretty common with NoSQL databases).
Example:
Say you have a timeline feed: you probably wouldn't query the list of posts and then query user info for each of those posts. Instead, I would keep a list of timeline posts for a given UID (the customer accessing the system right now), and that list would include all the details needed to display the feed without another query: user names, the post description, and a link to each picture derived from a known bucket and directory structure plus the UIDs, something like gs://<my-bucket>/user-images/<a-uid>.jpg. Again, I don't have much exposure to graph databases, so I'm not sure how applicable the technique is there, but I believe it could work the same.
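A sketch of what that denormalization might look like; all field names and the bucket layout are assumptions.

// Each timeline entry duplicates the author details it needs for display.
const timelineEntry = {
  postId: 'post_123',
  authorUid: 'uid_456',
  authorName: 'Jane Doe',     // duplicated from the user record
  description: 'Hello world',
  createdAt: Date.now(),
};

// Derive the picture location from the convention gs://<bucket>/user-images/<uid>.jpg,
// assuming the Firebase app has already been initialized.
const { getStorage, ref, getDownloadURL } = require('firebase/storage');

function profilePictureUrl(uid) {
  return getDownloadURL(ref(getStorage(), `user-images/${uid}.jpg`));
}

With this layout the frontend can start fetching images as soon as the feed arrives, with no second round trip for the uids.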
Does anybody know a solution? Maybe MinIO isn't the best product for this, but I need to upload many objects with metadata, and then I need to be able to find those objects by their metadata and get a URI. I have installed a MinIO server on Ubuntu 18.04 and I can't understand how to implement this. My requirements:
I need to upload many files with metadata such as position number, size, weight, and other object metadata;
I need good performance, on the order of 1,000 objects per hour;
Upload via REST with the data in the URI, or something similar.
Maybe some system already has these functions, like Hadoop or MinIO or another product?
Has anybody solved this? Thank you so much.
@AlexPebody You can use the MinIO client mc or any MinIO SDK to upload an object with metadata.
You can run the command below, which uploads objects with the specified metadata:
Copy a list of objects from local file system to MinIO cloud storage with specified metadata, separated by ";"
$ mc cp --attr "key1=value1;key2=value2" Music/*.mp4 play/mybucket/
As far as searching against metadata is concerned, S3 does not have an API for searching the metadata or tags of objects in a bucket. To achieve this, you could turn on event notifications to a configured webhook endpoint, which can then be used to build an index for finding objects with certain metadata/tags.
A head object request returns all the metadata associated with the object.
You can also use the object tagging feature and retrieve an object's tags using Get Object Tags.
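Putting the upload side together, here is a minimal sketch with the MinIO JavaScript SDK (npm install minio); the endpoint, credentials, bucket, and metadata keys are placeholders.

const Minio = require('minio');

const client = new Minio.Client({
  endPoint: 'localhost',
  port: 9000,
  useSSL: false,
  accessKey: 'minioadmin',
  secretKey: 'minioadmin',
});

async function uploadWithMetadata() {
  // Custom metadata travels with the object as x-amz-meta-* headers.
  const metaData = { position: '42', size: '10k', weight: '1.5kg' };
  await client.fPutObject('mybucket', 'part-42.bin', './part-42.bin', metaData);
  // statObject (the HEAD call) returns the stored metadata.
  const stat = await client.statObject('mybucket', 'part-42.bin');
  console.log(stat.metaData);
}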
I am using listObjectsV2 to list all the objects in an AWS S3 bucket, but that list does not contain tags or metadata. I have gone through the documentation and learned that metadata details can only be fetched separately, one object at a time. Is there any way to get the tags and metadata of the files in an S3 bucket in one call?
Note: I am using AWS-SDK(Node.js) version 2.x
The underlying S3 service API has no method for fetching listings along with object metadata and/or tags, so none of the SDKs implement such functionality, either.
Amazon S3 Inventory provides comma-separated values (CSV) or Apache optimized row columnar (ORC) output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).
It can be configured to run on a daily or weekly basis.
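Short of Inventory, the usual workaround is to enrich the listing with one head-object and one get-object-tagging call per key. Here is a sketch for the AWS SDK for JavaScript v2 (which the question uses); the bucket name is a placeholder, this is N+1 requests so it gets expensive on large buckets, and a real version would also loop on ContinuationToken since listObjectsV2 pages at 1,000 keys.

const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2
const s3 = new AWS.S3();

async function listWithTagsAndMetadata(bucket) {
  const { Contents = [] } = await s3.listObjectsV2({ Bucket: bucket }).promise();
  return Promise.all(
    Contents.map(async ({ Key }) => {
      const head = await s3.headObject({ Bucket: bucket, Key }).promise();
      const tags = await s3.getObjectTagging({ Bucket: bucket, Key }).promise();
      return { Key, metadata: head.Metadata, tags: tags.TagSet };
    })
  );
}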