AWS S3 get object using URL - node.js

I have a collection of URLs that may or may not belong to a particular bucket. These are not public.
I'm using the nodejs aws-sdk to get them.
However, the getObject function needs params Bucket and Key separately, which are already in my URL.
Is there any way I can use the URL?
I tried extracting the key by splitting the URL on / and the bucket by splitting on ., but the problem is that the bucket name can also contain a . and the key name can contain / as well.

The amazon-s3-uri library can parse out the Amazon S3 URI:
const AmazonS3URI = require('amazon-s3-uri')
const uri = 'https://bucket.s3-aws-region.amazonaws.com/key'
try {
  const { region, bucket, key } = AmazonS3URI(uri)
} catch (err) {
  console.warn(`${uri} is not a valid S3 uri`) // should not happen because `uri` is valid in this example
}
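Once parsed, the pieces can go straight into getObject. A minimal sketch, assuming the v2 aws-sdk with credentials already configured:
const AWS = require('aws-sdk')
const s3 = new AWS.S3({ region })

s3.getObject({ Bucket: bucket, Key: key }, (err, data) => {
  if (err) return console.error(err)
  console.log(`Loaded ${data.ContentLength} bytes`)
})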

Use the parse-s3-url module to build the params object for getObject.
const parseS3Url = require('parse-s3-url')

s3.getObject(parseS3Url('https://s3.amazonaws.com/mybucket/mykey'), (err, data) => {
  if (err) {
    // alert("Failed to retrieve an object: " + err);
  } else {
    console.log("Loaded " + data.ContentLength + " bytes");
    // do something with data.Body
  }
});

To avoid installing a package:
const objectUrl = 'https://s3.us-east-2.amazonaws.com/my-s3-bucket/some-prefix/file.json'
const { host, pathname } = new URL(objectUrl);
const [, region] = /s3[.-]([^.]+)\.amazonaws/.exec(host)
const [, bucket, ...keyParts] = pathname.split('/')
const key = decodeURIComponent(keyParts.join('/'))

Related

Authorization Error from beginCopyFromURL API from Javascript library (#azure/storage-blob) when executed from minikube

I have an application registered in Azure, and it has the Storage Account Contributor role. I am trying to copy content from one account to another in the same subscription by using a SAS token. Below is a code snippet for testing purposes. This code works perfectly fine as a standalone Node.js script, but it fails with Authorization error code 403 when deployed in a minikube pod. Any suggestions/thoughts will be appreciated.
I have verified the start and end dates of the signature.
The permissions are broad, but they seem to be correct.
For testing, the expiry is kept at 24 hours.
If I copy the SAS URL generated by the failing code, I can download the file from my host machine using the azcopy command line. It looks like the code fails only when executed from the minikube pod.
const { ClientSecretCredential } = require("@azure/identity");
const { BlobServiceClient, UserDelegationKey, ContainerSASPermissions, generateBlobSASQueryParameters } = require("@azure/storage-blob");

module.exports = function () {
  /*
    This function will receive an input that conforms to the schema specified in
    activity.json. The output is a callback function that follows node's error-first
    convention. The first parameter is either null or an Error object. The second parameter
    of the output callback should be a JSON object that conforms to the schema specified
    in activity.json
  */
  this.execute = async function (input, output) {
    try {
      if (input.connection) {
        const containerName = input.sourcecontainer.trim()
        const credential = new ClientSecretCredential(input.connection.tenantId, input.connection.clientid, input.connection.clientsecret);
        const { BlobServiceClient } = require("@azure/storage-blob");
        // Enter your storage account name
        const account = input.sourceaccount.trim();
        const accounturl = 'https://'.concat(account).concat('.blob.core.windows.net')
        const blobServiceClient = new BlobServiceClient(accounturl, credential);
        const keyStart = new Date()
        const keyExpiry = new Date(new Date().valueOf() + 86400 * 1000)
        const userDelegationKey = await blobServiceClient.getUserDelegationKey(keyStart, keyExpiry);
        console.log(userDelegationKey)
        const containerSAS = generateBlobSASQueryParameters({
          containerName,
          permissions: ContainerSASPermissions.parse("racwdl"),
          startsOn: new Date(),
          expiresOn: new Date(new Date().valueOf() + 86400 * 1000),
        }, userDelegationKey, account).toString();
        const target = '/' + containerName + '/' + input.sourcefolder.trim() + '/' + input.sourcefilename.trim()
        const sastoken = accounturl + target + '?' + containerSAS
        console.log(sastoken)
        let outputData = {
          "sourcesas": sastoken
        }
        // Testing second action execution from same action for testing purpose.
        const containerName2 = 'targettestcontainer'
        const credential2 = new ClientSecretCredential(input.connection.tenantId, input.connection.clientid, input.connection.clientsecret);
        // Enter your storage account name
        const blobServiceClient2 = new BlobServiceClient(accounturl, credential2);
        const destContainer = blobServiceClient2.getContainerClient(containerName2);
        const destBlob = destContainer.getBlobClient('testfolder01' + '/' + 'test-code.pdf');
        const copyPoller = await destBlob.beginCopyFromURL(outputData.sourcesas);
        const result = await copyPoller.pollUntilDone();
        return output(null, outputData)
      }
    } catch (e) {
      console.log(e)
      return output(e, null)
    }
  }
}
Thank you EmmaZhu-MSFT for providing the solution. A similar issue was also raised on GitHub. Posting this as an answer to help other community members.
From the service-side log, it seems there's a time skew between the Azure Storage service and the client: the start time used in the source SAS token was later than the server time.
We'd suggest not using a start time in the SAS token to avoid this kind of failure caused by time skew.
Reference: https://github.com/Azure/azure-sdk-for-js/issues/21977
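Following that suggestion, the SAS generation from the question can simply omit startsOn (a sketch using the same variables as above):
// Same call as in the question, but without startsOn, so clock skew
// between the client and the service cannot invalidate the token.
const containerSAS = generateBlobSASQueryParameters({
  containerName,
  permissions: ContainerSASPermissions.parse("racwdl"),
  expiresOn: new Date(Date.now() + 86400 * 1000), // 24 hours
}, userDelegationKey, account).toString();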

Google Cloud KMS: The checksum in field ciphertext_crc32c did not match the data in field ciphertext

I am having issues setting up a system to encrypt and decrypt data in my Node.js backend. I am following this guide in the process.
I wrote a helper class KMSEncryption to abstract the logic from the example. Here's the code where I call it:
const kms = new KMSEncryption();
const textToEncrypt = 'hello world!';
const base64string = await kms.encrypt(textToEncrypt);
const decrypted = await kms.decrypt(base64string);
The issue I am having is that the decryption fails with the following error:
UnhandledPromiseRejectionWarning: Error: 3 INVALID_ARGUMENT: The checksum in field ciphertext_crc32c did not match the data in field ciphertext.
I tried to compare it side by side with the guide from the Google docs but I cannot see where I went wrong.
Some of the things I have tried include:
Converting the base64string into a Buffer
Calculating the checksum on a Buffer of the base64string rather than on the string itself
Any help is appreciated. Thank you
I believe you are base64 encoding the ciphertext when you do:
if (typeof ciphertext !== 'string') {
  return this.toBase64(ciphertext);
}
but you are not reversing the encoding before calculating the crc32c.
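In other words, something along these lines (a rough sketch, reusing the base64string name from your snippet and the fast-crc32c module used in the sample below):
// Decode the base64 string back into the raw ciphertext bytes first,
// then compute the CRC32C over those bytes, not over the base64 text.
const crc32c = require('fast-crc32c');
const ciphertextBuf = Buffer.from(base64string, 'base64');
const ciphertextCrc32c = crc32c.calculate(ciphertextBuf);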
I pulled this example together from sample code; it works correctly for me from Cloud Shell. (Sorry it's messy):
// On Cloud Shell, install ts first with:
// npm install -g typescript
// npm i @types/node
// npm i @google-cloud/kms
// npm i fast-crc32c
// Then to compile and run:
// tsc testcrc.ts && node testcrc.js
// Code adapted from https://cloud.google.com/kms/docs/encrypt-decrypt#kms-decrypt-symmetric-nodejs

const projectId = 'kms-test-1367';
const locationId = 'global';
const keyRingId = 'so-67778448';
const keyId = 'example';
const plaintextBuffer = Buffer.from('squeamish ossifrage');

// Imports the Cloud KMS library
const {KeyManagementServiceClient} = require('@google-cloud/kms');
const crc32c = require('fast-crc32c');

// Instantiates a client
const client = new KeyManagementServiceClient();

// Build the key name
const keyName = client.cryptoKeyPath(projectId, locationId, keyRingId, keyId);

async function encryptSymmetric() {
  // Optional, but recommended: compute plaintext's CRC32C.
  const plaintextCrc32c = crc32c.calculate(plaintextBuffer);
  console.log(`Plaintext crc32c: ${plaintextCrc32c}`);

  const [encryptResponse] = await client.encrypt({
    name: keyName,
    plaintext: plaintextBuffer,
    plaintextCrc32c: {
      value: plaintextCrc32c,
    },
  });

  const ciphertext = encryptResponse.ciphertext;

  // Optional, but recommended: perform integrity verification on encryptResponse.
  // For more details on ensuring E2E in-transit integrity to and from Cloud KMS visit:
  // https://cloud.google.com/kms/docs/data-integrity-guidelines
  if (!encryptResponse.verifiedPlaintextCrc32c) {
    throw new Error('Encrypt: request corrupted in-transit');
  }
  if (
    crc32c.calculate(ciphertext) !==
    Number(encryptResponse.ciphertextCrc32c.value)
  ) {
    throw new Error('Encrypt: response corrupted in-transit');
  }

  console.log(`Ciphertext: ${ciphertext.toString('base64')}`);
  console.log(`Ciphertext crc32c: ${encryptResponse.ciphertextCrc32c.value}`)
  return ciphertext;
}

async function decryptSymmetric(ciphertext) {
  const cipherTextBuf = Buffer.from(await ciphertext);
  const ciphertextCrc32c = crc32c.calculate(cipherTextBuf);
  console.log(`Ciphertext crc32c: ${ciphertextCrc32c}`);

  const [decryptResponse] = await client.decrypt({
    name: keyName,
    ciphertext: cipherTextBuf,
    ciphertextCrc32c: {
      value: ciphertextCrc32c,
    },
  });

  // Optional, but recommended: perform integrity verification on decryptResponse.
  // For more details on ensuring E2E in-transit integrity to and from Cloud KMS visit:
  // https://cloud.google.com/kms/docs/data-integrity-guidelines
  if (
    crc32c.calculate(decryptResponse.plaintext) !==
    Number(decryptResponse.plaintextCrc32c.value)
  ) {
    throw new Error('Decrypt: response corrupted in-transit');
  }

  const plaintext = decryptResponse.plaintext.toString();
  console.log(`Plaintext: ${plaintext}`);
  console.log(`Plaintext crc32c: ${decryptResponse.plaintextCrc32c.value}`)
  return plaintext;
}

decryptSymmetric(encryptSymmetric());
You can see that it logs the crc32c several times. The correct crc32c for the example string, "squeamish ossifrage", is 870328919. The crc32c for the ciphertext will vary on every run.
To run this code yourself, you just need to point it at your project, region, key ring, and key (which should be a symmetric encryption key); hopefully comparing this code with your code's results will help you find the issue.
Thanks for using Google Cloud and Cloud KMS!

Google Cloud Storage - getting Not Found when trying to delete object that exists

This is likely a duh mistake but I can't figure this out.
I'm successfully uploading images to a bucket with a signed URL. When trying to delete the object from my Express backend, using the below code from Google's example, I get Not Found, yet the object is there with the correct name. Thoughts?
async function deleteFile(filename) {
  console.log(filename); // correct file name as exists in bucket
  try {
    await storage
      .bucket(bucketName) // correct bucket name and subfolder 'my-image-bucket/posts'
      .file(filename)
      .delete();
  } catch (e) {
    console.log('Error message = ', e.message); // Not Found
  }
}
The only red flag I'm seeing is, "correct bucket name and subfolder 'my-image-bucket/posts'" next to .bucket(). You should only be passing the bucket name to .bucket() and then the full path to .file().
const bucketName = 'my-image-bucket';
const filename = 'posts/image.jpg';

await storage
  .bucket(bucketName)
  .file(filename)
  .delete();

AWS Lambda: Get Latest Modified n number of Files from a s3 bucket folder

We need to get the latest 4 files uploaded to an S3 bucket folder, but my code is not returning the expected result.
Below is my code:
let params = { Bucket: bucket };
let keys = [];
const response = await s3.listObjectsV2(params).promise();
response.Contents.forEach(item => {
  if (item.Key !== prefix) {
    keys.push(item.Key);
  }
});
Then I sort the keys and take items 0 to 3.
Is there a good way to do this?
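One possible approach (a rough sketch, not from the original post: it reuses the same s3, bucket, and prefix variables and sorts by the LastModified field that listObjectsV2 returns):
// Sketch: list objects under the prefix, sort newest first, keep the first four keys.
// Note: listObjectsV2 returns at most 1000 keys per call; paginate for larger folders.
const response = await s3.listObjectsV2({ Bucket: bucket, Prefix: prefix }).promise();

const latestKeys = response.Contents
  .filter(item => item.Key !== prefix)              // skip the folder placeholder itself
  .sort((a, b) => b.LastModified - a.LastModified)  // newest first
  .slice(0, 4)
  .map(item => item.Key);

console.log(latestKeys);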

Google Cloud Functions bucket.upload()

I'm trying to archive pdf files from remote websites to Google Cloud Storage using a google function triggered by a firebase write.
The code below works. However, this function copies the remote file to the bucket root.
I'd like to copy the pdf to this path in the bucket: library-xxxx.appspot.com/Orgs/${params.ukey}.
How can I do this?
exports.copyFiles = functions.database.ref('Orgs/{orgkey}/resources/{restypekey}/{ukey}/linkDesc/en').onWrite(event => {
  const snapshot = event.data;
  const params = event.params;
  const filetocopy = snapshot.val();
  if (validFileType(filetocopy)) {
    const pth = 'Orgs/' + params.orgkey;
    const bucket = gcs.bucket('library-xxxx.appspot.com')
    return bucket.upload(filetocopy)
      .then(res => {
        console.log('res', res);
      }).catch(err => {
        console.log('err', err);
      });
  }
});
Let me begin with a brief explanation of how GCS file system works: as explained in the documentation of Google Cloud Storage, GCS is a flat name space where the concept of directories does not exist. If you have an object like gs://my-bucket/folder/file.txt, this means that there is an object called folder/file.txt stored in the root directory of gs://my-bucket, i.e. the object name includes / characters. It is true that the GCS UI in the Console and the gsutil CLI tool make the illusion of having a hierarchical file structure, but this is only to provide more clarity for the user, even though those directories do not exist, and everything is stored in a "flat" name space.
That being said, as described in the reference for the storage.bucket.upload() method, you can specify an options parameter containing the destination field, where you can specify a string with the complete filename to use.
Just as an example (note the options parameter difference between the two calls):
var bucket = storage.bucket('my-sample-bucket');

var options = {
  destination: 'somewhere/here.txt'
};

bucket.upload('sample.txt', function(err, file) {
  console.log("Created object gs://my-sample-bucket/sample.txt");
});

bucket.upload('sample.txt', options, function(err, file) {
  console.log("Created object gs://my-sample-bucket/somewhere/here.txt");
});
So in your case you can build a string containing the complete name that you want to use (containing also the "directory" structure you have in mind).
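Applied to the code in the question, that could look roughly like this (a sketch only; path.basename is used here just to keep the remote file's name, so adjust the naming to whatever you have in mind):
// Sketch: upload into the 'Orgs/<orgkey>/' "directory" by setting destination.
const path = require('path');
const destination = pth + '/' + path.basename(filetocopy);

return bucket.upload(filetocopy, { destination })
  .then(res => {
    console.log('res', res);
  }).catch(err => {
    console.log('err', err);
  });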
filepath --> local machine file storage path
await bucket.upload(filepath, {
  public: true,
  gzip: true,
  metadata: {
    cacheControl: "public, max-age=31536000",
  },
});
