Minio + Upload Metadata + Object?

Does anybody know? Maybe MinIO isn't the best product for this, but I need to upload many objects with metadata, and then I need to be able to find those objects by their metadata and get a URI. I have installed a MinIO server on Ubuntu 18.04, but I can't figure out how to do this. My requirements:
I need to upload many files with metadata such as position number, size, weight and other attributes of the object;
I need good performance, around 1000 objects per hour;
Upload via REST with the data in the URI, or something similar.
Maybe some system already has these functions, like Hadoop or MinIO or something else?
Has anybody solved this? Thank you so much.

#AlexPebody You can use the MinIO client mc or any MinIO SDK to upload an object with metadata.
You can run the command below, which uploads objects with metadata.
It copies a list of objects from the local file system to MinIO cloud storage with the specified metadata, with key/value pairs separated by ";":
$ mc cp --attr "key1=value1;key2=value2" Music/*.mp4 play/mybucket/
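If you'd rather use an SDK than mc, here is a minimal sketch with the MinIO JavaScript SDK (the minio npm package). The endpoint, credentials, bucket/object names and metadata keys below are placeholders, not something MinIO prescribes:

// Upload a file with custom metadata using the MinIO Node.js SDK.
const Minio = require('minio');

const minioClient = new Minio.Client({
  endPoint: 'minio.example.com',   // assumption: your MinIO host
  port: 9000,
  useSSL: false,
  accessKey: 'YOUR-ACCESSKEY',
  secretKey: 'YOUR-SECRETKEY',
});

// Custom metadata is passed as the metaData argument of fPutObject
// and is stored/returned as x-amz-meta-* headers.
const metaData = {
  'position-num': '42',
  'size': '10K',
  'weight': '1.5',
};

minioClient.fPutObject('mybucket', 'object-001.bin', '/tmp/object-001.bin', metaData)
  .then(() => console.log('uploaded with metadata'))
  .catch((err) => console.error(err));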
As far as searching against metadata is concerned, S3 does not have an API for searching the metadata or tags of objects in a bucket. To achieve this, you could turn on event notifications to a configured webhook endpoint, which can then be used to build an index of objects with certain metadata/tags.
A HEAD Object request returns all the metadata associated with an object.
You can also use the object tagging feature and retrieve an object's tags using Get Object Tags.
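For example, here is a rough sketch of reading the metadata and tags back with the AWS SDK for JavaScript v2 (used elsewhere in this thread) pointed at a MinIO endpoint; the endpoint, credentials and names are placeholders:

// HEAD Object for user metadata and GetObjectTagging for tags,
// against MinIO's S3-compatible API.
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
  endpoint: 'http://minio.example.com:9000', // assumption: your MinIO endpoint
  accessKeyId: 'YOUR-ACCESSKEY',
  secretAccessKey: 'YOUR-SECRETKEY',
  s3ForcePathStyle: true,   // path-style addressing for MinIO
  signatureVersion: 'v4',
});

const params = { Bucket: 'mybucket', Key: 'object-001.bin' };

// headObject returns user metadata in the Metadata map (the x-amz-meta-* headers).
s3.headObject(params).promise()
  .then((head) => console.log('metadata:', head.Metadata));

// getObjectTagging returns the object's tag set.
s3.getObjectTagging(params).promise()
  .then((tagging) => console.log('tags:', tagging.TagSet));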

Related

How to upload several images to Google Cloud Storage using Node.js

Looking for help to achieve the following points in GCS.
I want to be able to upload one object that is composed of two or more images to GCS.
All images, grouped in one object are uploaded at the same time. In other words as a bulk.
The object, composed of several files/images, has its own Id or a property that can be used to query the object.
Reading the GCS API docs, I found the method to upload a single image at a time. Also, reading similar questions on Stack Overflow, I found how to upload several images at the same time, but individually, not as a bulk. However, I cannot find a method provided by the API to group several images into an object and upload that object to GCS.
You can create a composite object in order to merge multiple images together into a single object; here is the documentation for this in Node.js.
Nonetheless, composing objects requires that the source files are already stored in the same storage bucket and have the same storage class; see the documentation for more details.
Because of this, the composite object can only be created once all the files that you want it to be composed of are stored in the same bucket.
For this reason, if you would like to have this done prior to uploading to GCS, you should implement logic on the Node.js side, such as merging the objects before uploading them to GCS.
You can have a look at this document Node.js — How to Merge Objects.
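As a rough sketch (bucket and file names are placeholders), the Node.js client exposes composition through bucket.combine():

// Compose several already-uploaded images into one object with @google-cloud/storage.
// Note: compose concatenates the source objects' bytes; it does not merge images visually.
const { Storage } = require('@google-cloud/storage');

const storage = new Storage();
const bucket = storage.bucket('my-bucket'); // assumption: all sources live in this bucket

async function composeImages() {
  // Sources must already exist in the same bucket and share a storage class.
  const sources = ['images/img-1.png', 'images/img-2.png'];
  const destination = bucket.file('composites/group-123.png');

  await bucket.combine(sources, destination);
  console.log('composite object created:', destination.name);
}

composeImages().catch(console.error);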

How to organize s3 uploads client/server with AWS SDK

I have a bucket that has multiple users, and I would like to pre-sign URLs for the client to upload to S3 (some files can be large, so I'd rather they not pass through the Node server). My question is this: until the Mongo database is hit, there is no Mongo ObjectId to tag as a prefix for the file. (I'm separating the files in this structure: UserID/PostID/resource, so you can check all of a user's pictures by looking under /UserID, and you can target a specific post by also adding the PostID.) Conversely, there is no object URL until the client uploads the file, so I'm at a bit of an impasse.
Is it bad practice to rename files after they touch the bucket? I just can't know the ObjectId in advance (the post has to be created in Mongo first), but the user has to select which files they want to upload before the object is created. I was thinking the best flow could be one of two options:
Client selects files -> Mongo creates the document -> respond to the client with the ObjectId and pre-signed URLs for each file, with the key set to /UserID/PostID/name. After a successful upload, the client re-triggers an update function on the server to edit the URLs of the post. After the update, send success to the client.
Client uploads files to the root of the bucket -> Mongo doc is created where the URLs of the uploaded S3 files are stored -> iterate over the list and prepend the UserID and newly created PostID, updating the Mongo document -> success response to the client.
Is there another approach that I don't know about?
Answering your question:
Is it bad practice to rename files after they touch the server?
If you are planning to use S3 to store your files, there is no server involved, so there is no problem with changing these files after you upload them.
The only thing you need to understand is that renaming an object takes two requests:
copy the object with a new name
delete the old object with the old name
This means it could become a problem in costs/latency if you have a huge number of changes (but I can say that for most cases this will not be a problem).
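For illustration, a minimal sketch of such a "rename" with the AWS SDK v2 for Node.js (bucket and key names are placeholders):

// S3 has no rename operation, so a "rename" is CopyObject followed by DeleteObject.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function renameObject(bucket, oldKey, newKey) {
  // 1. Copy the object to the new key.
  await s3.copyObject({
    Bucket: bucket,
    CopySource: `${bucket}/${oldKey}`, // source is "bucket/key"
    Key: newKey,
  }).promise();

  // 2. Delete the object under the old key.
  await s3.deleteObject({ Bucket: bucket, Key: oldKey }).promise();
}

renameObject('my-bucket', 'uploads/tmp-name.jpg', 'UserID/PostID/photo.jpg')
  .catch(console.error);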
I can say that the first option would be a good choice for you, and the only thing I would change is adding serverless processing for your objects/files; the AWS Lambda service would be a good option for this.
In this case, instead of updating the files on the server, you update them from a Lambda function. You only need to add a trigger for your bucket on the S3 PutObject event; this way you can change the names of your files in the best processing time for your client and at low cost.
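A sketch of what that trigger could look like; the event shape is the standard S3 notification payload, while the renameObject helper and key layout are assumptions carried over from above:

// Lambda handler fired by an s3:ObjectCreated:Put event.
// It reads the bucket/key from the event record; from there you could run the
// copy+delete "rename" shown above and update the Mongo document with the final URL.
exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // Keys in S3 events are URL-encoded (spaces arrive as '+').
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    console.log(`new object: s3://${bucket}/${key}`);
    // e.g. await renameObject(bucket, key, `${userId}/${postId}/${fileName}`);
    // e.g. update the post document in Mongo with the final object URL
  }
};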

AWS Lambda Function - Image Upload - Process Review

I'm trying to better understand how the overall flow should work with AWS Lambda and my Web App.
I would like to have the client upload a file to a public bucket (completely bypassing my API resources), with the client UI putting it into a folder for their account based on a GUID. From there, I've got a Lambda function that runs when it detects a change to the public bucket, resizes the file, and places it into the processed bucket.
However, I need to update a row in my RDS Database.
Issue
I'm struggling to understand the best practice for identifying the row to update. Should I be uploading another file with the necessary details (where every image upload really consists of two files: an image and a JSON config)? Should the image be processed, and then the client receives some data and makes an API request to update the row in the database? What is the right flow for this step?
Thanks.
You should use a pre-signed URL for the upload. This allows your application to put restrictions on the upload, such as file type, directory and size. It means that, when the file is uploaded, you already know who did the upload. It also prevents people from uploading randomly to the bucket, since it does not need to be public.
The upload can then use an Amazon S3 Event to trigger the Lambda function. The filename/location can be used to identify the user, so the database can be updated at the time that the file is processed.
See: Uploading Objects Using Presigned URLs - Amazon Simple Storage Service
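A minimal sketch of generating such a pre-signed upload URL with the AWS SDK v2 for Node.js; the key layout (a per-user prefix), bucket name and expiry are assumptions, not requirements:

// Server-side generation of a pre-signed PUT URL so the client uploads
// directly to a private S3 bucket without the file passing through the API.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

function getUploadUrl(userId, fileName, contentType) {
  // The key encodes who uploaded the file, so the Lambda that processes
  // the object can identify the user (and the database row) from the key alone.
  const key = `uploads/${userId}/${Date.now()}-${fileName}`;

  const url = s3.getSignedUrl('putObject', {
    Bucket: 'my-upload-bucket',   // assumption: your private upload bucket
    Key: key,
    ContentType: contentType,     // the client must send the same Content-Type
    Expires: 300,                 // URL valid for 5 minutes
  });

  return { key, url };
}

console.log(getUploadUrl('user-123', 'avatar.png', 'image/png'));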
I'd avoid uploading a file directly to S3, bypassing the API. Uploading files through your API allows you to control the file type, size, etc., and you will know exactly who is uploading the file (API auth id or user id in the API body). Opening a bucket to the public for writes is also a security risk.
Your API clients can then upload files via the API, which can store the file on S3 (and trigger another Lambda for processing) and then update your RDS with the appropriate metadata for that user.
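If you go this route, a rough sketch of such an API upload endpoint, assuming Express and multer on the API side (the bucket name, route and auth middleware are placeholders):

// Accept the upload at the API, store it on S3, then record the metadata.
const express = require('express');
const multer = require('multer');
const AWS = require('aws-sdk');

const app = express();
const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 10 * 1024 * 1024 },   // reject files over 10 MB
});
const s3 = new AWS.S3();

app.post('/images', upload.single('image'), async (req, res) => {
  // Assumes earlier auth middleware populated req.user from the API auth id.
  const key = `images/${req.user.id}/${Date.now()}-${req.file.originalname}`;

  await s3.upload({
    Bucket: 'my-image-bucket',
    Key: key,
    Body: req.file.buffer,
    ContentType: req.file.mimetype,
  }).promise();

  // ...update the RDS row for this user with the new key/metadata here...
  res.json({ key });
});

app.listen(3000);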

How to get all list of objects in buckets including tags in single request from S3 bucket

I am using listObjectsV2 to list all the objects in an AWS S3 bucket, but that list does not contain tags or metadata. I have gone through the documentation and learned that metadata details can only be fetched separately, object by object. Is there any way to get the tags and metadata of the files in an S3 bucket in one call?
Note: I am using AWS-SDK(Node.js) version 2.x
The underlying S3 service API has no method for fetching listings along with object metadata and/or tags, so none of the SDKs implement such functionality, either.
Amazon S3 Inventory provides comma-separated values (CSV) or Apache optimized row columnar (ORC) output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).
It can be configured to run on a daily or weekly basis.
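Inventory is normally set up in the console, but it can also be configured from the same SDK. A rough sketch (the bucket names, report prefix and field list are placeholders; note that the report contains system metadata fields, not custom user metadata):

// Enable a daily CSV inventory report for a bucket with the AWS SDK v2.
// The destination bucket also needs a policy allowing S3 to write the reports.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

s3.putBucketInventoryConfiguration({
  Bucket: 'my-source-bucket',
  Id: 'daily-inventory',
  InventoryConfiguration: {
    Id: 'daily-inventory',
    IsEnabled: true,
    IncludedObjectVersions: 'Current',
    Schedule: { Frequency: 'Daily' },
    Destination: {
      S3BucketDestination: {
        Bucket: 'arn:aws:s3:::my-inventory-reports',
        Format: 'CSV',
        Prefix: 'inventory',
      },
    },
    // System metadata columns to include in the report.
    OptionalFields: ['Size', 'LastModifiedDate', 'StorageClass', 'ETag'],
  },
}).promise()
  .then(() => console.log('inventory configuration created'))
  .catch(console.error);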

Does any cloud object stores support object metadata indices?

I have a very large document store - about 50 million JSON docs, with 50m more added per year. Each is about 10K. I would like to store them in a cloud storage and retrieve them via a couple structured metadata indices that I would update as I add documents to the store.
It looks like AWS S3, Google Cloud Storage and Azure allow custom metadata to be returned with an object, but not used as part of a GET request to filter a collection of objects.
Is there a good solution "out-of-the-box" for this? I can't find any, but it seems like my use case shouldn't be really unusual. I don't need to query by document attributes or to return partial documents, I just need to GET a collection of documents by filtering on a handful of metadata fields.
The AWS SimpleDB page mentions "Indexing Amazon S3 Object Metadata" as a use case, and links to a library that hasn't been updated since 2009.
They are simply saying that you can store and query the metadata in Amazon SimpleDB, which is a NoSQL database provided by Amazon. Depending on the kind of metadata you have, you could also store it in an RDBMS. A few hundred million rows isn't too much if you create the proper indices, and you can store URLs or file names to access the files stored on S3, Azure, etc. afterwards.
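As a rough illustration of that pattern (the domain, attribute and key names are placeholders, and the same idea works with DynamoDB or an RDBMS): the index row holds the metadata fields plus the object's key, you query the index, and only then fetch the matching objects from S3.

// Keep a queryable metadata index next to the object store (SimpleDB here, AWS SDK v2).
const AWS = require('aws-sdk');
const sdb = new AWS.SimpleDB();

// Write an index entry whenever a document is stored in the bucket.
async function indexDocument(docId, s3Key, metadata) {
  await sdb.putAttributes({
    DomainName: 'documents',          // assumption: a pre-created SimpleDB domain
    ItemName: docId,
    Attributes: [
      { Name: 's3Key', Value: s3Key, Replace: true },
      { Name: 'category', Value: metadata.category, Replace: true },
      { Name: 'year', Value: String(metadata.year), Replace: true },
    ],
  }).promise();
}

// Query the index by metadata, then fetch the matching objects from S3 by key.
async function findKeysByCategory(category) {
  const result = await sdb.select({
    SelectExpression: `select s3Key from documents where category = '${category}'`,
  }).promise();
  return (result.Items || []).map(
    (item) => item.Attributes.find((a) => a.Name === 's3Key').Value
  );
}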
