How to get the file path in AWS Lambda? - node.js

I would like to send a file to Google Cloud Platform using their client library, as in this example (Node.js code sample): https://cloud.google.com/storage/docs/uploading-objects
My current code looks like this:
const s3Bucket = 'bucket_name';
const s3Key = 'folder/filename.extension';
const filePath = s3Bucket + "/" + s3Key;
await storage.bucket(s3Bucket).upload(filePath, {
  gzip: true,
  metadata: {
    cacheControl: 'public, max-age=31536000',
  },
});
But when I do this, there is an error:
"ENOENT: no such file or directory, stat 'ch.ebu.mcma.google.eu-west-1.ibc.websiteExtract/AudioJobResults/audioGoogle.flac'"
I also tried to send the path I got in the AWS Console (Copy path button), "s3://s3-eu-west-1.amazonaws.com/ch.ebu.mcma.google.eu-west-1.ibc.website/ExtractAudioJobResults/audioGoogle.flac", but it did not work.

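You seem to be trying to copy data from S3 to Google Cloud Storage directly. This is not what your example/tutorial shows. The sample code assumes that you upload a local copy of the data to Google Cloud Storage: upload() treats filePath as a path on the Lambda's local file system, which is why the bucket/key string fails with ENOENT. S3 is not local storage.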
How you could do it:
1. Download the data to /tmp in your Lambda function.
2. Use the sample code above to upload the data from /tmp.
3. (Optionally) Remove the uploaded data from /tmp.
A word of caution: the storage available under /tmp is limited (512 MB by default; it can be raised to 10,240 MB through the function's ephemeral storage setting). If you want to copy files larger than the configured size, this won't work. Also beware that the Lambda execution environment may be reused, so cleaning up after yourself (i.e. step 3) is a good idea if you plan to copy lots of files.
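The three steps above can be sketched as follows. This is a minimal sketch under assumptions, not a drop-in handler: downloadToFile and uploadFromFile are hypothetical callbacks that you would implement with the AWS SDK (writing the S3 object to a local file) and with storage.bucket(name).upload(localPath, options) from @google-cloud/storage. Passing them in keeps the staging and cleanup logic independent of either SDK.

```javascript
const fs = require('fs/promises');
const os = require('os');
const path = require('path');

// Copy one object from S3 to Cloud Storage by staging it on local disk.
// downloadToFile(key, localPath) and uploadFromFile(localPath, destName) are
// hypothetical callbacks wrapping the AWS SDK and @google-cloud/storage.
async function copyViaTmp(s3Key, { downloadToFile, uploadFromFile, tmpDir = os.tmpdir() }) {
  // Step 1: download to the temp dir. os.tmpdir() is /tmp on Linux unless
  // TMPDIR/TMP/TEMP is set. basename drops the key's "folder" prefix locally.
  const localPath = path.join(tmpDir, path.basename(s3Key));
  await downloadToFile(s3Key, localPath);
  try {
    // Step 2: upload the local copy, reusing the S3 key as the object name.
    await uploadFromFile(localPath, s3Key);
  } finally {
    // Step 3: always remove the local copy, since the environment may be reused.
    await fs.rm(localPath, { force: true });
  }
  return s3Key;
}
```

In the handler you would call something like copyViaTmp('folder/filename.extension', { downloadToFile, uploadFromFile }); the try/finally makes the cleanup run even when the upload fails.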

Related

How can I check if a file has finished uploading before moving it with the Google Drive API v3?

I'm writing a small archiving script (in Node.js) to move files on my Google Drive to a predetermined folder if they contain .archive.7z in the filename. The script runs periodically as a cron job, and moving the files has not caused any issues, but files still being uploaded by my desktop client are moved before they have finished. This terminates the upload and leaves corrupted files in the destination folder.
Files still being uploaded from my desktop to Google Drive are returned by the following function anyway:
async function getArchivedFiles (drive) {
  const res = await drive.files.list({
    q: "name contains '.archive.7z'",
    fields: 'files(id, name, parents)',
  })
  return res.data.files
}
Once the files are moved and renamed with the following code, the upload terminates from my client (Insync) and the destination files are ruined.
drive.files.update({
  fileId: file.id,
  addParents: folderId,
  removeParents: previousParents,
  fields: 'id, parents',
  requestBody: {
    name: renameFile(file.name)
  }
})
Is there any way to check if a file is still being uploaded before moving it?
It turns out that a tiny placeholder-type file is being created on uploads. I'm not sure if this is a Google Drive API behaviour or something unique to the Insync desktop client. This file seems to upload separately and thus can be freely renamed once it's complete.
I worked around this problem by including the file's md5 hash in the filename, and updating my script to only move files when the hash in their filename matches the md5Checksum retrieved from the Google Drive API.
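A minimal sketch of that md5 check, assuming a hypothetical naming convention of <name>.<32 hex md5 digits>.archive.7z. For it to work, the list call also has to request the checksum, e.g. fields: 'files(id, name, parents, md5Checksum)'. Drive only reports md5Checksum for files with binary content, not for native Google Docs.

```javascript
const crypto = require('crypto');

// Hypothetical naming convention: <base>.<md5 hex>.archive.7z
const MD5_IN_NAME = /\.([0-9a-f]{32})\.archive\.7z$/i;

// MD5 hex digest of the archive's content, to embed in the name before uploading.
function md5Hex(data) {
  return crypto.createHash('md5').update(data).digest('hex');
}

// Hash embedded in the file name, or null if the name doesn't follow the convention.
function hashFromName(name) {
  const match = MD5_IN_NAME.exec(name);
  return match ? match[1].toLowerCase() : null;
}

// Treat a file as fully uploaded only when Drive's md5Checksum matches the name.
function isUploadComplete(file) {
  const expected = hashFromName(file.name);
  return expected !== null &&
    typeof file.md5Checksum === 'string' &&
    file.md5Checksum.toLowerCase() === expected;
}

// Filter the result of getArchivedFiles() down to files that are safe to move.
function filesReadyToMove(files) {
  return files.filter(isUploadComplete);
}
```

Files that are still uploading (or the placeholder) either have no md5Checksum yet or one that does not match, so they are simply skipped until a later cron run.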

Writing a new file to a Google Cloud Storage bucket from a Google Cloud Function (Python)

I am trying to write a new file (not upload an existing file) to a Google Cloud Storage bucket from inside a Python Google Cloud Function.
I tried using google-cloud-storage, but it does not have an "open" attribute for the bucket.
I tried to use the App Engine library GoogleAppEngineCloudStorageClient, but the function fails to deploy with these dependencies.
I tried to use gcs-client but I cannot pass the credentials inside the function as it requires a JSON file.
Any ideas would be much appreciated.
Thanks.
from google.cloud import storage
import io
# bucket name
bucket = "my_bucket_name"
# Get the bucket that the file will be uploaded to.
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket)
# Create a new blob and upload the file's content.
my_file = bucket.blob('media/teste_file01.txt')
# create in memory file
output = io.StringIO("This is a test \n")
# upload from string
my_file.upload_from_string(output.read(), content_type="text/plain")
output.close()
# list created files
blobs = storage_client.list_blobs(bucket)
for blob in blobs:
    print(blob.name)
# Make the blob publicly viewable.
my_file.make_public()
You can now write files directly to Google Cloud Storage. It is no longer necessary to create a file locally and then upload it.
You can use blob.open() as follows:
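from google.cloud import storage

def write_file(lines):
    # `lines` is any iterable of strings to write (the original looped over `object`, a builtin)
    client = storage.Client()
    bucket = client.get_bucket('bucket-name')
    blob = bucket.blob('path/to/new-blob.txt')
    with blob.open(mode='w') as f:  # text-mode writer that streams into the blob
        for line in lines:
            f.write(line)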
You can find more examples and snippets here:
https://github.com/googleapis/python-storage/tree/main/samples/snippets
You have to create your file locally and then push it to GCS. You can't create a file dynamically in GCS by using open.
For this, you can write to the /tmp directory, which is an in-memory file system. As a consequence, you can never create a file bigger than the memory allocated to your function minus the memory footprint of your code: with a 2 GB function, you can expect a maximum file size of about 1.5 GB.
Note: GCS is not a file system, and you shouldn't use it like one.
EDIT 1
Things have changed since my answer:
It's now possible to write in any directory in the container (not only the /tmp)
You can stream-write a file to GCS, just as you can receive it in streaming mode on Cloud Run. Here is a sample that stream-writes to GCS.
Note: stream writing disables checksum validation, so you won't get an integrity check at the end of the stream write.

Downloading folders from Google Cloud Storage Bucket with NodeJS

I need to download folders from my Google Cloud Storage bucket with Node.js. I read all the documentation, but I only found a way to download files, not folders. I need to download the folder so that users can download its files.
Could someone help me?
As Doug said, Google Cloud Storage may show you what looks like a directory structure, but there are actually no folders within buckets.
However, you can work around this in your code and recreate that same folder structure yourself. For the workaround I came up with, you need a library such as shelljs, which lets you create folders on your system.
In this GCP tutorial on Cloud Storage, you will find examples of, for instance, how to list or download files from your bucket.
Now, putting all this together, you can get the full path of the file you are going to download, parse it to separate the folders from the actual file, then create the folder structure using the method mkdir from shelljs.
For me, modifying the tutorial's method for downloading files looked something like this:
const shell = require('shelljs');
[...]
async function downloadFile(bucketName, srcFilename, destFilename) {
  // [START storage_download_file]
  // Imports the Google Cloud client library (the package scope is "@", not "#")
  const {Storage} = require('@google-cloud/storage');
  // Creates a client
  const storage = new Storage();
  // Find the index of the last separator
  const index = srcFilename.lastIndexOf('/');
  // Get the folder route as a string (empty if the object sits at the bucket root)
  const str = index === -1 ? '' : srcFilename.slice(0, index);
  // Recursively create the folder structure in the current directory
  if (str) shell.mkdir('-p', './' + str);
  // Path of the downloaded file
  const destPath = str ? str + '/' + destFilename : destFilename;
  const options = {
    destination: destPath,
  };
  // Downloads the file
  await storage
    .bucket(bucketName)
    .file(srcFilename)
    .download(options);
  console.log(
    `gs://${bucketName}/${srcFilename} downloaded to ${destPath}.`
  );
  // [END storage_download_file]
}
You will want to use the getFiles method of Bucket to query for the files you want to download, then download each one of them individually. Read more about how to use the underlying list API. There are no folder operations in Cloud Storage (there are not actually any folders, just file paths that look like they're organized as folders).
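Combining getFiles with a prefix and per-file downloads might look like the sketch below. It assumes bucket behaves like a @google-cloud/storage Bucket (getFiles({ prefix }) resolving to [files], each file having a name and download({ destination })), and it uses Node's built-in fs.mkdir with recursive: true in place of shelljs to recreate the "folders" locally. The function names are hypothetical.

```javascript
const fs = require('fs/promises');
const path = require('path');

// Map an object name such as "reports/2020/jan.csv" to a path under localRoot,
// turning the "folder" part of the name into real directories.
function localPathFor(objectName, localRoot) {
  const destination = path.join(localRoot, ...objectName.split('/'));
  // Refuse names like "../x" that would resolve outside localRoot.
  const rel = path.relative(localRoot, destination);
  if (rel === '' || rel.split(path.sep)[0] === '..' || path.isAbsolute(rel)) {
    throw new Error(`Refusing to write outside ${localRoot}: ${objectName}`);
  }
  return destination;
}

// Download every object whose name starts with `prefix` ("the folder").
async function downloadFolder(bucket, prefix, localRoot) {
  const [files] = await bucket.getFiles({ prefix });
  const written = [];
  for (const file of files) {
    // Zero-byte names ending in "/" are folder placeholders some tools create; skip them.
    if (file.name.endsWith('/')) continue;
    const destination = localPathFor(file.name, localRoot);
    await fs.mkdir(path.dirname(destination), { recursive: true });
    await file.download({ destination });
    written.push(destination);
  }
  return written;
}
```

Usage would be along the lines of downloadFolder(storage.bucket('my-bucket'), 'reports/', './downloads'), where the bucket name and prefix are placeholders. Files are downloaded one at a time for simplicity; for large folders you could run several downloads concurrently.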

How to store files in firebase using node.js

I have a small assignment where I will have a URL to a document or file, such as a Google Drive or Dropbox link.
I have to use this link to store that file or document in Firebase using Node.js. How should I start?
A little heads-up might help. What should I use? Please help, I'm stuck here.
The documentation for using the admin SDK is mostly covered in GCP documentation.
Here's a snippet of code that shows how you could upload an image directly to Cloud Storage if you have a URL for it. Any public link works, whether it's shared from Dropbox or somewhere else on the internet.
Edit 2020-06-01: The option to upload directly from a URL was dropped in v2.0 of the SDK (4 September 2018), so the snippet below only works with earlier versions: https://github.com/googleapis/nodejs-storage/releases/tag/v2.0.0
const fileUrl = 'https://www.dropbox.com/some/file/download/link.jpg';
const opts = {
destination: 'path/to/file.jpg',
metadata: {
contentType: 'image/jpeg'
}
};
firebase.storage().bucket().upload(fileUrl, opts);
This example is using the default bucket in your application and the opts object provides file upload options for the API call.
destination is the path that your file will be uploaded to in Google Cloud Storage
metadata should describe the file that you're uploading (see more examples here)
contentType is the file MIME type that you are uploading
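Since uploading straight from a URL is no longer supported, one way to get the same result with current versions is to download the bytes yourself and write them with File#save. A minimal sketch, assuming Node 18+ (for the global fetch) and a @google-cloud/storage bucket object, e.g. admin.storage().bucket() with firebase-admin; uploadFromUrl and its fetchImpl parameter are hypothetical, the latter only there so the download can be swapped out in tests. The link must return the file itself rather than an HTML preview page, which share links often do.

```javascript
// Download fileUrl and store it at `destination` in the given bucket.
// Assumes a @google-cloud/storage Bucket and Node 18+ (global fetch).
async function uploadFromUrl(bucket, fileUrl, destination, { fetchImpl = globalThis.fetch } = {}) {
  const response = await fetchImpl(fileUrl);
  if (!response.ok) {
    throw new Error(`Download failed: ${response.status} ${fileUrl}`);
  }
  // Use the server's content type if present; fall back to a generic binary type.
  const contentType = response.headers.get('content-type') || 'application/octet-stream';
  const data = Buffer.from(await response.arrayBuffer());
  // Same metadata shape as the opts object in the original snippet.
  await bucket.file(destination).save(data, { metadata: { contentType } });
  return { destination, contentType, size: data.length };
}
```

This buffers the whole file in memory, which is fine for images and documents; for very large files you would pipe the response into bucket.file(destination).createWriteStream() instead.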

Node.js: multi-part file upload via REST API

I would like to upload a file by invoking a REST endpoint with a multipart request.
In particular, I am looking at this API: Google Cloud Storage: Objects: insert
I read about using multer, but I did not find any complete example showing how to perform this operation.
Could someone help me with that?
https://cloud.google.com/nodejs/getting-started/using-cloud-storage#uploading_to_cloud_storage
The link above is a good example of how to use multer to upload a single image to Google Cloud Storage. Use multer to keep each file in memory (storage: multer.memoryStorage()), and in your callback send each file to your GCS bucket through a write stream.
However, the link only shows an example for one image. If you want to upload an array of images, create a for loop that creates a stream for each file in your request, but call next() only once, after all the uploads have finished. If you call next() in every loop iteration, you will get the error: Error: Can't set headers after they are sent.
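A sketch of that "one stream per file, one next()" pattern follows. It assumes multer is configured with storage: multer.memoryStorage() and used as upload.array(...), so each entry in req.files has buffer, originalname and mimetype, and that bucket is a @google-cloud/storage Bucket. The object naming scheme and the cloudStorageObject property are hypothetical choices. Promise.all waits for every write stream's 'finish' event before next() runs.

```javascript
// Upload one in-memory multer file through a Cloud Storage write stream and
// resolve with the object name once the stream emits 'finish'.
function uploadOne(bucket, file) {
  return new Promise((resolve, reject) => {
    // Hypothetical naming scheme: timestamp prefix plus the original file name.
    const name = `${Date.now()}-${file.originalname}`;
    const stream = bucket.file(name).createWriteStream({
      metadata: { contentType: file.mimetype },
      resumable: false,
    });
    stream.on('error', reject);
    stream.on('finish', () => resolve(name));
    stream.end(file.buffer);
  });
}

// Express middleware to place after upload.array(...): one stream per file,
// and next() called exactly once, after every upload has finished.
function sendUploadsToGcs(bucket) {
  return async (req, res, next) => {
    if (!req.files || req.files.length === 0) return next();
    let names;
    try {
      names = await Promise.all(req.files.map((file) => uploadOne(bucket, file)));
    } catch (err) {
      // Report the first failed upload once, via Express error handling.
      return next(err);
    }
    req.files.forEach((file, i) => {
      file.cloudStorageObject = names[i]; // hypothetical property name
    });
    return next();
  };
}
```

Calling next() outside the try block matters: if a later handler throws synchronously, the catch does not turn that into a second next(err) call.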
There is an example of uploading files with the Node.js client library and multer. You can modify this example and set the multipart option:
Download the sample code and cd into the folder:
git clone https://github.com/GoogleCloudPlatform/nodejs-docs-samples/
cd nodejs-docs-samples/appengine/storage
Edit the app.yaml file and include your bucket name:
GCLOUD_STORAGE_BUCKET: YOUR_BUCKET_NAME
Then in the source code, you can modify the publicUrl variable according to Objects: insert example:
const publicUrl = format(`https://www.googleapis.com/upload/storage/v1/b/${bucket.name}/o?uploadType=multipart`);
Download a key file for your service account and set the environment variable:
1. Go to the Create service account key page in the GCP Console.
2. From the Service account drop-down list, select New service account.
3. Input a name into the Service account name field.
4. From the Role drop-down list, select Project > Owner.
5. Click Create. A JSON file that contains your key downloads to your computer.
Finally, export the environment variable:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/key/file
After that, you're ready to run npm start, open the app's frontend, and upload your file.
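If you want to call Objects: insert yourself instead of going through the client library, a multipart upload is a multipart/related request with two parts: the JSON object metadata first, then the file bytes. A minimal sketch of building that request with Node built-ins only; the function names are hypothetical, and actually sending it still requires an OAuth 2.0 access token in an Authorization: Bearer header, which is not shown here.

```javascript
const crypto = require('crypto');

// Build a multipart/related body for Objects: insert with uploadType=multipart.
// Part 1: JSON metadata (e.g. { name: 'folder/a.txt' }); part 2: the raw bytes.
function buildMultipartUpload(metadata, data, contentType, boundary = crypto.randomBytes(16).toString('hex')) {
  const CRLF = '\r\n';
  const head = Buffer.from(
    `--${boundary}${CRLF}` +
    `Content-Type: application/json; charset=UTF-8${CRLF}${CRLF}` +
    `${JSON.stringify(metadata)}${CRLF}` +
    `--${boundary}${CRLF}` +
    `Content-Type: ${contentType}${CRLF}${CRLF}`,
    'utf8',
  );
  const tail = Buffer.from(`${CRLF}--${boundary}--${CRLF}`, 'utf8');
  return {
    body: Buffer.concat([head, Buffer.from(data), tail]),
    headers: { 'Content-Type': `multipart/related; boundary=${boundary}` },
  };
}

// Upload endpoint: bucket name in the path, object name in the JSON metadata.
function multipartUploadUrl(bucketName) {
  return `https://www.googleapis.com/upload/storage/v1/b/${encodeURIComponent(bucketName)}/o?uploadType=multipart`;
}
```

You would then POST body to multipartUploadUrl('YOUR_BUCKET_NAME') with the returned headers plus the Authorization header, for example with fetch in Node 18+. A random boundary makes it unlikely to collide with bytes inside the file.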
