Cloud Functions for Firebase download from google storage - node.js

I have a collection in firebase which looks something like this:
people:
-KuM2GgA5JdH0Inem6lG
appliedTo: "-KuM1IB5TisBtc34y2Bb"
document: "docs/837c2500-9cbe-11e7-8ac1-17a6c37e2057"
name: "Test Testerson"
the document node contains a path to a file in a storage bucket. Is it possible to download this file to the client using an HTTP Firebase function? According to Stream files in node/express to client, I should be able to stream to the response in Express. Will the Google Cloud Storage read stream work for this?
Thanks,
Ben

The Firebase Admin SDK has a Storage object you can use. It gives you an entry point into the Google Cloud Storage SDK which can interact with storage buckets.
const bucket = admin.storage().bucket();
Use this Bucket object to upload and download files. You should be able to use a stream to send contents to the client.
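For example, a minimal HTTP function along these lines should work (an untested sketch, assuming the people node lives in the Realtime Database and the document field holds the object path inside the default bucket, as in your data):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Sketch: look up the stored object path for a person record, then stream
// the file from the default bucket straight into the HTTP response.
exports.downloadDocument = functions.https.onRequest(async (req, res) => {
  // e.g. ?id=-KuM2GgA5JdH0Inem6lG
  const snapshot = await admin.database().ref(`people/${req.query.id}/document`).once('value');
  const filePath = snapshot.val();
  if (!filePath) {
    res.status(404).send('No document found');
    return;
  }
  const file = admin.storage().bucket().file(filePath);
  res.setHeader('Content-Disposition', `attachment; filename="${filePath.split('/').pop()}"`);
  file.createReadStream()
    .on('error', (err) => res.status(500).end(err.message))
    .pipe(res);
});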

Related

Utility for copying bucket data from one project to another in NodeJs

I need to copy data from one bucket in Project A in gCloud to another bucket in Project B. Any utility if present in NodeJs to do this?
You might be tempted to get a list of blobs inside the bucket, download them, and upload them to the other storage bucket (this is what gsutil -m cp gs://origin_bucket/** gs://destination_bucket/ does).
The problem with this approach is that it consumes CPU and bandwidth on your side and can take a long time.
If you want to move all data from one bucket to another one, the best way to do this is using the Storage Transfer Service.
With the Storage Transfer Service you just specify the source and destination buckets, and optionally a schedule, and Google performs the operation much faster than you could do it yourself.
Also remember that the source can be a GCS bucket, an Amazon S3 bucket, or an Azure Storage container, not just GCS.
Take a look at the Google-provided Node.js sample code for the Storage Transfer Service.
If you just want to transfer some of the files, the Storage Transfer Service has a feature (in beta as of February 2022) that lets you specify a manifest file (a CSV file stored in a GCS bucket). See: Transfer specific files or objects using a manifest.
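For illustration, creating a one-off GCS-to-GCS transfer job with the @google-cloud/storage-transfer client looks roughly like this (a sketch only; the project ID and bucket names are placeholders, and the linked sample shows the full set of scheduling options):

// Rough sketch: a one-time transfer job that copies everything from
// origin_bucket to destination_bucket. Names are placeholders.
const {StorageTransferServiceClient} = require('@google-cloud/storage-transfer');

const client = new StorageTransferServiceClient();

async function createTransferJob() {
  const [transferJob] = await client.createTransferJob({
    transferJob: {
      projectId: 'project-b-id',
      description: 'Copy origin_bucket to destination_bucket',
      status: 'ENABLED',
      transferSpec: {
        gcsDataSource: {bucketName: 'origin_bucket'},
        gcsDataSink: {bucketName: 'destination_bucket'},
      },
    },
  });
  console.log(`Created transfer job: ${transferJob.name}`);
}

createTransferJob().catch(console.error);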
You can use the Cloud Storage Client Libraries. To copy an object in one of your Cloud Storage buckets, see this sample code:
const srcBucketName = 'your-source-bucket';
const srcFilename = 'your-file-name';
const destBucketName = 'target-file-bucket';
const destFileName = 'target-file-name';

// Imports the Google Cloud client library
const {Storage} = require('@google-cloud/storage');

// Creates a client
const storage = new Storage();

async function copyFile() {
  // Copies the file to the other bucket
  await storage
    .bucket(srcBucketName)
    .file(srcFilename)
    .copy(storage.bucket(destBucketName).file(destFileName));

  console.log(
    `gs://${srcBucketName}/${srcFilename} copied to gs://${destBucketName}/${destFileName}`
  );
}

copyFile().catch(console.error);
Additionally, you need to ensure that you have been assigned a role with the necessary permissions from the other project.

Cannot read .json from a google cloud bucket

I have a folder structure within a bucket of google cloud storage
bucket_name = 'logs'
json_location = '/logs/files/2018/file.json'
I try to read this JSON file in a Jupyter notebook using this code:
import os
from google.cloud import storage

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "logs/files/2018/file.json"

def download_blob(source_blob_name, bucket_name, destination_file_name):
    """Downloads a blob from the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    print('Blob {} downloaded to {}.'.format(
        source_blob_name,
        destination_file_name))
Then calling the function
download_blob('file.json', 'logs', 'file.json')
And I get this error
DefaultCredentialsError: File /logs/files/2018/file.json was not found.
I have looked at all the similar questions asked on Stack Overflow and cannot find a solution.
The JSON file is present and can be opened or downloaded at the json_location on Google Cloud Storage.
There are two different perspectives regarding the JSON file you refer to:
1) The json file used for authenticating to GCP.
2) The json you want to download from a bucket to your local machine.
For the first one, if you are accessing your Jupyter server remotely, most probably the JSON file doesn't exist on that remote machine but on your local machine. If this is your scenario, try uploading the JSON file to the Jupyter server. Executing ls -l /logs/files/2018/file.json on the remote machine can help verify it is there. Then, os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "JSON_PATH_ON_JUPYTER_SERVER" should work.
On the other hand, I executed your code and got:
>>> download_blob('static/upload_files_CS.png', 'bucketrsantiago', 'file2.json')
Blob static/upload_files_CS.png downloaded to file2.json.
The file gs://bucketrsantiago/static/upload_files_CS.png was downloaded to my local machine with the name file2.json. This helps to clarify that the only problem is regarding the authentication json file.
GOOGLE_APPLICATION_CREDENTIALS is supposed to point to a file on the local disk where you are running Jupyter. You need the credentials in order to call GCS, so you can't fetch them from GCS.
In fact, you are best off not messing around with credentials at all in your program, and leaving that to the client library. Don't touch GOOGLE_APPLICATION_CREDENTIALS in your application. Instead:
If you are running on GCE, just make sure your GCE instance has a service account with the right scopes and permissions. Applications running on that instance will automatically have the permissions of that service account.
If you are running locally, install the Google Cloud SDK and run gcloud auth application-default login. Your program will then automatically use whichever account you log in as.
Complete instructions here

Cloud Functions: Delete file on Cloud Storage when Firestore document is deleted

I have a cloud function which listens for onDelete events. When a document is deleted, I also want an associated file on the storage to be deleted.
Currently I have only the download-url (https link) stored as a field in the document.
How can I select the file within the function? Is this possible or should I store the storage location (path) of the file inside the document and use that to do:
storage.bucket(<my-bucket>).file(<path>).delete()
The Cloud Storage SDK doesn't have a way to convert an HTTPS download URL into a file path in your storage bucket. If you need to know the path to a file in Cloud Storage, you should store that path as another field in your database. This will make it easy to reach back into your storage bucket to delete the file when needed.
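For example, a rough sketch of such an onDelete trigger (the people collection and the filePath field are placeholder names for wherever you store the path):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Sketch: when a document is deleted, delete the file whose bucket path
// was stored in it. Collection and field names are placeholders.
exports.cleanupFile = functions.firestore
  .document('people/{docId}')
  .onDelete((snapshot) => {
    const filePath = snapshot.get('filePath');
    if (!filePath) {
      return null; // no path stored, nothing to clean up
    }
    return admin.storage().bucket().file(filePath).delete();
  });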

GCP App Engine Access to GCloud Storage without 'sharing publicly'

I would like to know how to grant a Google Cloud Platform App Engine project permission to serve content from Google Cloud Storage without setting the Google Cloud Storage bucket permissions to 'share publicly'.
My App Engine project is running Node.js. It uses Passport-SAML authentication to authenticate users before allowing them to view content, hence I do not want to set access on an individual user level via IAM. Images and videos are currently served from within a private folder of my app, which is only accessible once users are authenticated. I wish to move these assets to Google Cloud Storage and allow the app to read the files, whilst not providing global access. How should I go about doing this? I failed to find any documentation on it.
I think this might work for you https://cloud.google.com/storage/docs/access-control/create-signed-urls-program
I can't seem to find the API doc for nodejs (google is really messing around with their doc urls). Here's some sample code:
bucket.upload(filename, options, function(err, file, apiResponse) {
  var mil = Date.now() + 60000; // signed URL valid for one minute
  var config = {
    action: 'read',
    expires: mil
  };

  file.getSignedUrl(config, function(err, url) {
    if (err) {
      return;
    }
    console.log(url);
  });
});
As stated in the official documentation:
By default, when you create a bucket for your project, your app has
all the permissions required to read and write to it.
Whenever you create an App Engine application, there is a default bucket that comes with the following perks:
5GB of free storage.
Free quota for Cloud Storage I/O operations
By default it is created automatically with your application, but in any case you can follow the same link I shared previously in order to create the bucket. Should you need more than those 5 GB of free storage, you can make it a paid bucket and you will only be charged for the storage that exceeds the first 5 GB.
Then, you can make use of the Cloud Storage Client Libraries for Node.js and have a look at some nice samples (general samples here or even specific operations over files here) for working with the files inside your bucket.
UPDATE:
Here is a small working example of how to use the Cloud Storage client libraries to retrieve images from your private bucket without making them public, by means of authenticated requests. It runs in a Cloud Function, so you should have no issues reproducing the same behavior in App Engine. It does not do exactly what you need, as it only serves the image from the bucket on its own, without embedding it in an HTML page, but you should be able to build on it (I am not too used to working with Node.js, unfortunately).
I hope this can be of some help too.
'use strict';

const {Storage} = require('@google-cloud/storage');
const storage = new Storage();

exports.imageServer = function imageServer(req, res) {
  // Stream an object from a private bucket back to the client.
  let file = storage.bucket('<YOUR_BUCKET>').file('<YOUR_IMAGE>');
  let readStream = file.createReadStream();
  res.setHeader('content-type', 'image/jpeg');
  readStream.pipe(res);
};

How to do Azure Blob storage and Azure SQL Db atomic transaction

We have a Blob storage container in Azure for uploading application-specific documents, and we have an Azure SQL Db where metadata for particular files is saved during the file upload process. This upload process needs to be consistent so that we do not end up with files in storage that have no metadata record in SQL Db, and vice versa.
We are uploading a list of files which we get from the front-end as multi-part HttpContent. From the Web API controller we call the upload service, passing the HttpContent, file names, and a folder path where the files will be uploaded. The Web API controller, service method, and repository are all async.
var files = await this.uploadService.UploadFiles(httpContent, fileNames, pathName);
Here is the service method:
public async Task<List<FileUploadModel>> UploadFiles(HttpContent httpContent, List<string> fileNames, string folderPath)
{
    var blobUploadProvider = this.Container.Resolve<UploadProvider>(
        new DependencyOverride<UploadProviderModel>(new UploadProviderModel(fileNames, folderPath)));

    var list = await httpContent.ReadAsMultipartAsync(blobUploadProvider).ContinueWith(
        task =>
        {
            if (task.IsFaulted || task.IsCanceled)
            {
                throw task.Exception;
            }

            var provider = task.Result;
            return provider.Uploads.ToList();
        });

    return list;
}
The service method uses a customized upload provider which is derived from System.Net.Http.MultipartFileStreamProvider and we resolve this using a dependency resolver.
After this, we create the metadata models for each of those files and then save them in the Db using Entity Framework. The full process works fine under normal conditions.
The problem is that if the upload succeeds but the Db operation somehow fails, we end up with files in Blob storage that have no corresponding entry in SQL Db, and thus the data is inconsistent.
Following are the different technologies used in the system:
Azure Api App
Azure Blob Storage
Web Api
.Net 4.6.1
Entity framework 6.1.3
Azure MSSql Database (we are not using any VM)
I have tried using TransactionScope for consistency, which does not seem to work across Blob storage and the Db (it works for the Db only).
How do we solve this issue?
Is there any built in or supported feature for this?
What are the best practices in this case?
Is there any built in or supported feature for this?
As of today, no. Blob Service and SQL Database are two separate services, hence it is not possible to implement the "atomic transaction" functionality you're expecting.
How do we solve this issue?
I can think of two ways to solve this issue (I am sure there are others as well):
Implement your own transaction functionality: basically, check for a database transaction failure and, if that happens, delete the blob manually.
Use some background process: here you would continue to save the data in blob storage, and then periodically find orphaned blobs through a background process and delete those blobs.
