GitLab S3 object storage 5 GB limitation

A little background: GitLab allows its storage to be on S3 (not the runner).
You can see the documentation here: https://docs.gitlab.com/ee/administration/object_storage.html#consolidated-object-storage-configuration.
I configured it like this:
gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['proxy_download'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'region' => 'eu-west-1',
  'regionendpoint' => 'https://s3.eu-west-1.amazonaws.com',
  'aws_access_key_id' => 'REDACTED',
  'aws_secret_access_key' => 'REDACTED'
}
gitlab_rails['object_store']['objects']['packages']['bucket'] = '<name>-gitlab-packages'
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = '<name>-gitlab-artifacts'
gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = '<name>-gitlab-external-diffs'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = '<name>-gitlab-lfs'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = '<name>-gitlab-uploads'
gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = '<name>-gitlab-dependency-proxy'
gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = '<name>-gitlab-terraform-state'
gitlab_rails['object_store']['objects']['pages']['bucket'] = '<name>-gitlab-pages2'
So far so good.
Now, among my packages I have one that is 5.1 GB,
and apparently there's a 5 GB limitation.
I could split the tarball, of course, but I'd like a better solution.
I saw that there's something called multipart upload, but I couldn't find how to activate it on GitLab.
If possible I do not want to change my upload code, only raise GitLab's 5 GB limitation.
Thank you
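(For context: 5 GB is the maximum size of a single S3 PUT, and anything larger has to go through S3 multipart upload. Whether and how GitLab enables that is exactly what's being asked here, but purely as an illustration of the client-side mechanism, here is a minimal sketch using the AWS SDK for JavaScript v3, whose @aws-sdk/lib-storage helper splits a large body into parts automatically. The bucket and file names below are made up.)
import { createReadStream } from "fs";
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

// Hypothetical bucket/key, for illustration only.
const client = new S3Client({ region: "eu-west-1" });

const upload = new Upload({
  client,
  params: {
    Bucket: "example-gitlab-packages",
    Key: "my-package.tar.gz",
    Body: createReadStream("./my-package.tar.gz"), // e.g. a 5.1 GB file
  },
  partSize: 100 * 1024 * 1024, // 100 MiB parts (minimum part size is 5 MiB)
  queueSize: 4,                // number of parts uploaded in parallel
});

upload.done().then(() => console.log("multipart upload finished"));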

Related

Is it safe to store public keys/policies in a node.js constant in Lambda

I am writing an AWS Lambda authorizer in Node.js. We are required to call the Azure AD API to fetch the public keys/security policies to validate the incoming access token.
However, to optimize performance, I decided to store the public keys/security policies in Node.js as a constant (this stays cached for as long as the Lambda instance is alive or until the keys' TTL expires).
Question: is it safe from a security perspective? I want to avoid "caching" it in DynamoDB, as calls to DynamoDB would also incur additional milliseconds. Ours is a very high traffic application and we would like to save every millisecond possible for optimal performance. Any best practices are also highly appreciated.
Typically, you should not hard-code things like that in your code. Even though it is not a security problem, it makes maintenance harder.
For example: when the key is "rotated" or the policy changes and you had it hard-coded in your Lambda, you would need to update your code and do another deployment. This often causes issues, because the developer forgets about it and suddenly the authorizer no longer works. If the Lambda loads the information from an external service like S3, SSM or Azure AD directly, you don't need another deployment. In theory, it should sort itself out, depending on which service you use and how you manage your keys etc.
I think the best way is to load the key from an external service during the initialisation phase of the Lambda, that is, when it is "booted" for the first time, and then cache that value for the duration of the Lambda's lifetime (a few minutes to a few hours).
You could, for example, load the public keys and policies directly from Azure, from S3, or from SSM Parameter Store.
The following code uses v3 of the AWS SDK for Node.js, which is not bundled with the Lambda runtime. You can use v2 of the SDK as well.
const { SSMClient, GetParameterCommand } = require("@aws-sdk/client-ssm");

// This only happens once, when the Lambda is started for the first time:
const init = async () => {
  const config = {};
  // use whatever 'paramName' you defined when you created the SSM parameter
  const paramName = "/azure/publickey";
  try {
    const command = new GetParameterCommand({ Name: paramName });
    const ssm = new SSMClient();
    const data = await ssm.send(command);
    config["publickey"] = data.Parameter.Value;
  } catch (error) {
    throw new Error("unable to read SSM parameter '" + paramName + "'.");
  }
  return config;
};

const initPromise = init();

exports.handler = async (event) => {
  const config = await initPromise;
  console.log("My public key '%s'", config.publickey);
  return "Hello World";
};
The most important point of this code is the init "function", which runs only once and creates a "config" that should contain your AWS SDK clients and all the configuration your code needs. This way, you don't have to fetch the policy for every request the Lambda processes.

Is it safe to export Firestore data multiple times to the same storage folder? Could the overwrite break the exported data?

Here is how I'm exporting my Firestore data:
import { today } from "@utils/dates/today";
import * as admin from "firebase-admin";

admin.initializeApp({
  credential: admin.credential.cert(
    MY_SERVICE_ACCOUNT as admin.ServiceAccount
  ),
});

const client = new admin.firestore.v1.FirestoreAdminClient();

const BUCKET = "gs://MY_PROJECT_ID.appspot.com/firestore-backup";
const PROJECT_ID = "MY_PROJECT_ID";
const DB_NAME = client.databasePath(PROJECT_ID, "(default)");

export const backupData = async (): Promise<void> => {
  const todayDate = today(); // THIS IS A YYYY-MM-DD STRING
  // const hashId = generateId().slice(0, 5);
  const responses = await client.exportDocuments({
    name: DB_NAME,
    outputUriPrefix: `${BUCKET}/${todayDate}`,
    collectionIds: []
  });
  const response = responses[0];
  console.log(`Operation Name: ${response['name']}`);
  return;
};
You see I'm exporting to the following path:
/firestore-backup/YYYY-MM-DD/
If I'm going to back up multiple times over the same day, can I use the same date folder? Is it safe to do so? Or should I add a hash to the folder name to avoid overwriting the previous export?
PS: Overwriting within a single day is not a problem. I just don't want to break the exported data.
If you go to the bucket and check the exports, you'll see that the exported files seem to follow the same naming pattern every time. If we rely only on the write/update semantics of Cloud Storage, whenever there's a write to a location where a file already exists, it is overwritten. Therefore, at first glance it doesn't seem like it would cause data corruption.
However, this assumption relies on the internal behavior of the export operations, which may change in the future (leaving aside that I can't even guarantee it as of now). Therefore, the best practice would be to append a hash to the folder name to prevent any unexpected behavior.
As an additional side note, it's worth mentioning that exports can incur significant costs depending on the size of your Firestore data.
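To make that concrete, here is a minimal sketch of the question's backupData that appends a per-run suffix to the folder, so repeated exports on the same day never share a prefix. It reuses the client, BUCKET, DB_NAME and today definitions from the snippet above; the suffix format is just an example, any hash or timestamp works.
export const backupData = async (): Promise<void> => {
  const todayDate = today(); // YYYY-MM-DD string
  // Example unique per-run suffix, e.g. "12-47-04-123"
  const runId = new Date().toISOString().slice(11, 23).replace(/[:.]/g, "-");

  const responses = await client.exportDocuments({
    name: DB_NAME,
    outputUriPrefix: `${BUCKET}/${todayDate}/${runId}`,
    collectionIds: [], // empty array = all collections
  });

  console.log(`Operation Name: ${responses[0]["name"]}`);
};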

Is there a way to get location (public url) of S3 object using AWS CLI?

<premise>
I'm new to cloud computing in general, AWS specifically, and REST APIs, and am trying to cobble together a "big-picture" understanding.
I am working with LocalStack - which, by my understanding, simulates the real AWS by responding identically to (a subset of) the AWS API if you specify the endpoint address/port that LocalStack listens at.
Lastly, I've been working from this tutorial: https://dev.to/goodidea/how-to-fake-aws-locally-with-localstack-27me
</premise>
Using the noted tutorial, and per its guidance, I successfully created an S3 bucket using the AWS CLI.
To demonstrate uploading a local file to the S3 bucket, though, the tutorial switches to node.js, which I think demonstrates the AWS node.js SDK:
// aws.js
// This code segment comes from https://dev.to/goodidea/how-to-fake-aws-locally-with-localstack-27me
//
const AWS = require('aws-sdk')
require('dotenv').config()

const credentials = {
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.AWS_SECRET_KEY,
}

const useLocal = process.env.NODE_ENV !== 'production'
const bucketName = process.env.AWS_BUCKET_NAME

const s3client = new AWS.S3({
  credentials,
  /**
   * When working locally, we'll use the Localstack endpoints. This is the one for S3.
   * A full list of endpoints for each service can be found in the Localstack docs.
   */
  endpoint: useLocal ? 'http://localhost:4572' : undefined,
  /**
   * Including this option gets localstack to more closely match the defaults for
   * live S3. If you omit this, you will need to add the bucketName to the `Key`
   * property in the upload function below.
   *
   * see: https://github.com/localstack/localstack/issues/1180
   */
  s3ForcePathStyle: true,
})

const uploadFile = async (data, fileName) =>
  new Promise((resolve) => {
    s3client.upload(
      {
        Bucket: bucketName,
        Key: fileName,
        Body: data,
      },
      (err, response) => {
        if (err) throw err
        resolve(response)
      },
    )
  })

module.exports = uploadFile
// test-upload.js
// This code segment comes from https://dev.to/goodidea/how-to-fake-aws-locally-with-localstack-27me
//
const fs = require('fs')
const path = require('path')
const uploadFile = require('./aws')

const testUpload = () => {
  const filePath = path.resolve(__dirname, 'test-image.jpg')
  const fileStream = fs.createReadStream(filePath)
  const now = new Date()
  const fileName = `test-image-${now.toISOString()}.jpg`
  uploadFile(fileStream, fileName).then((response) => {
    console.log(":)")
    console.log(response)
  }).catch((err) => {
    console.log(":|")
    console.log(err)
  })
}

testUpload()
Invocation:
$ node test-upload.js
:)
{ ETag: '"c6b9e5b1863cd01d3962c9385a9281d"',
Location: 'http://demo-bucket.localhost:4572/demo-bucket/test-image-2019-03-11T21%3A22%3A43.511Z.jpg',
key: 'demo-bucket/test-image-2019-03-11T21:22:43.511Z.jpg',
Key: 'demo-bucket/test-image-2019-03-11T21:22:43.511Z.jpg',
Bucket: 'demo-bucket' }
I do not have prior experience with Node.js, but my understanding of the above code is that it uses the AWS.S3.upload() method of the AWS Node.js SDK to copy a local file to an S3 bucket, and prints the HTTP response (is that correct?).
Question: I observe that the HTTP response includes a "Location" key whose value looks like a URL I can copy/paste into a browser to view the image directly from the S3 bucket; is there a way to get this location using the AWS CLI?
Am I correct to assume that AWS CLI commands are analogues of the AWS SDK?
I tried uploading a file to my S3 bucket using the aws s3 cp CLI command, which I thought would be analogous to the AWS.S3.upload() method above, but it didn't generate any output, and I'm not sure what I should have done - or should do - to get a Location the way the HTTP response to the AWS.S3.upload() AWS node SDK method did.
$ aws --endpoint-url=http://localhost:4572 s3 cp ./myFile.json s3://myBucket/myFile.json
upload: ./myFile.json to s3://myBucket/myFile.json
Update: continued study makes me now wonder whether it is implicit that a file uploaded to an S3 bucket by any means - whether by the CLI command aws s3 cp or the node.js SDK method AWS.S3.upload(), etc. - can be accessed at http://<bucket_name>.<endpoint_without_http_prefix>/<bucket_name>/<key>? E.g. http://myBucket.localhost:4572/myBucket/myFile.json?
If this is implicit, I suppose you could argue it's unnecessary to ever be given the "Location" as in that example node.js HTTP response.
Grateful for guidance - I hope it's obvious how painfully under-educated I am on all the involved technologies.
Update 2: It looks like the correct url is <endpoint>/<bucket_name>/<key>, e.g. http://localhost:4572/myBucket/myFile.json.
The AWS CLI and the different SDKs offer similar functionality, but some add extra features and some format the data differently. It's safe to assume that you can do what the CLI does with the SDK and vice versa. You might just have to work for it a little bit sometimes.
As you said in your update, not every file that is uploaded to S3 is publicly available. Buckets have policies and files have permissions. Files are only publicly available if the policies and permissions allow it.
If the file is public, then you can just construct the URL as you described. If you have the bucket set up for website hosting, you can also use the domain you set up.
But if the file is not public, or you just want a temporary URL, you can use aws s3 presign s3://myBucket/myFile.json. This will give you a URL that anyone can use to download the file, with the permissions of whoever executed the command. The URL will be valid for one hour unless you choose a different duration with --expires-in. The SDK has similar functionality as well, but you have to work a tiny bit harder to use it.
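For example, a rough sketch of the same thing with the v2 SDK already used in aws.js above; the endpoint mirrors the LocalStack setup from the question, and the bucket/key are placeholders:
import AWS from 'aws-sdk';

// Assumes the same LocalStack-style configuration as aws.js in the question.
const s3client = new AWS.S3({ endpoint: 'http://localhost:4572', s3ForcePathStyle: true });

const url = s3client.getSignedUrl('getObject', {
  Bucket: 'myBucket',
  Key: 'myFile.json', // any key you have already uploaded
  Expires: 3600,      // seconds, same idea as --expires-in
});
console.log(url);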
Note: Starting with version 0.11.0, all APIs are exposed via a single edge service, which is accessible on http://localhost:4566 by default.
Considering that you've added some files to your bucket
aws --endpoint-url http://localhost:4566 s3api list-objects-v2 --bucket mybucket
{
    "Contents": [
        {
            "Key": "blog-logo.png",
            "LastModified": "2020-12-28T12:47:04.000Z",
            "ETag": "\"136f0e6acf81d2d836043930827d1cc0\"",
            "Size": 37774,
            "StorageClass": "STANDARD"
        }
    ]
}
you should be able to access your file with
http://localhost:4566/mybucket/blog-logo.png
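And if you want to build that path-style URL in code rather than by hand, a tiny sketch (percent-encoding each key segment, the way the Location value shown earlier does):
const endpoint = 'http://localhost:4566';

const publicUrl = (bucket: string, key: string) =>
  `${endpoint}/${bucket}/${key.split('/').map(encodeURIComponent).join('/')}`;

console.log(publicUrl('mybucket', 'blog-logo.png'));
// -> http://localhost:4566/mybucket/blog-logo.png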

Is it possible to download only a certain part of an append blob?

I upload audio files to my Azure blob storage, and I would like to know if it is possible to download only the parts of the audio that I want.
By the way, I'm using Node.js.
Thank you!
I upload audio files to my Azure blob storage, and I would like to know if it is possible to download only the parts of the audio that I want?
Yes, it is certainly possible to download a portion of a blob. Azure Blobs support reading a range of bytes. For example, let's say you want to download only the first 1 KB of data from a file. This is how you would download that data:
const azure = require('azure-storage');
const ms = require('memory-streams');

const chunkStart = 0;
const chunkEnd = 1023;
const connectionString = 'your-azure-storage-connection-string';

const blobService = azure.createBlobService(connectionString);

const writableStream = new ms.WritableStream({
  highWaterMark: (chunkEnd - chunkStart) * 2,
  writableHighWaterMark: (chunkEnd - chunkStart) * 2,
});

const requestOptions = {
  rangeStart: chunkStart,
  rangeEnd: chunkEnd
};

blobService.getBlobToStream('container-name', 'blob-name', writableStream, requestOptions, (error, result, response) => {
  if (error) {
    console.log('Error occurred while downloading chunk!');
  } else {
    // dataBuffer now contains bytes chunkStart..chunkEnd of the blob
    const dataBuffer = writableStream.toBuffer();
    console.log('Blob chunk downloaded!');
  }
});
Considering you mentioned that you're storing audio files, please note that you can't instruct Azure Storage to download "x" duration of audio (e.g. the first 30 seconds), because Azure Storage treats every blob as a collection of bytes and has no idea whether the file is an audio file or something else.
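For reference only, the same byte-range read with the newer @azure/storage-blob package (the azure-storage SDK above is the legacy one) would look roughly like this; the container, blob and connection-string values are placeholders:
import { BlobServiceClient } from "@azure/storage-blob";

const service = BlobServiceClient.fromConnectionString("your-azure-storage-connection-string");
const blobClient = service.getContainerClient("container-name").getBlobClient("blob-name");

// download(offset, count): here the first 1024 bytes only.
blobClient.download(0, 1024).then(async (response) => {
  const chunks: Buffer[] = [];
  for await (const chunk of response.readableStreamBody!) {
    chunks.push(Buffer.from(chunk));
  }
  console.log("Downloaded", Buffer.concat(chunks).length, "bytes");
});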

PNP-JS Create File from File Lib template

I need to create/add a new file in the Files Lib document library, then I need to set the name of the file to a unique ID and update other fields. I can update the fields without any problems, but I couldn't find a way to create the file.
How can I achieve the New => Create from template behavior shown in the image below?
I've tried many ways, but nothing fulfills my request.
1. this.web.lists.getByTitle('myFilesLib').items.add() => I get an error about the need to use SPFileCollection.Add().
2. this.web.getFolderByServerRelativeUrl(url).files.addTemplateFile => there is nothing for a custom template.
3. this.web.getFolderByServerRelativeUrl(url).files.add => I need to provide a file.
For JSOM, you can try this:
newFile = parentList.get_rootFolder().get_files().add(fileCreateInfo);
For CSOM:
var creationInformation = new ListItemCreationInformation();
Microsoft.SharePoint.Client.ListItem listItem = list.AddItem(creationInformation);
listItem.FieldValues["Foo"] = "Bar";
listItem.Update();
clientContext.ExecuteQuery();
A wiki article for this approach: https://social.technet.microsoft.com/wiki/contents/articles/37575.sharepoint-online-working-with-files-inside-document-library-using-jsom.aspx
this.web.getFolderByServerRelativeUrl(url).files.addTemplateFile is used to add pages, so it is not suitable for your scenario.
http://www.ktskumar.com/2016/08/pnp-js-core-create-new-page-sharepoint-library/
The PnP JS method requires a DOM File or Blob object for uploading a file to SharePoint. Based on that, there are two options for uploading a file to a SharePoint library, or a folder in a library: a Blob or the File API.
http://www.ktskumar.com/2016/09/pnp-js-core-upload-file-sharepoint/
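To make that concrete, here is a rough sketch of that call, assuming PnPjs v1's @pnp/sp package; the site URL, folder, file name, content and field name are all placeholders. It adds a file from a Blob and then updates fields on the underlying list item, which is the other part of what you need:
import { Web } from "@pnp/sp";

const web = new Web("https://tenant.sharepoint.com/sites/App"); // placeholder site URL
const folderUrl = "/sites/App/myFilesLib";
const content = new Blob(["placeholder file content"]); // your template bytes here

web.getFolderByServerRelativeUrl(folderUrl)
  .files.add("my-unique-id.docx", content, true) // true = overwrite if it exists
  .then(async (result) => {
    const item = await result.file.getItem();
    await item.update({ Title: "Generated from template" }); // update other fields here
  });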
More references from Microsoft:
https://msdn.microsoft.com/en-us/library/office/dn450841.aspx
This is sample code to use when you want to upload an existing ContentType template to your content library on SharePoint.
import { Web } from '@pnp/sp';

const templateUrl = '/sites/App/Template/Forms/Template/test.docx';
const name = 'test.docx';
const depositUrl = '/sites/App/Template';

const web = new Web('http://localhost:8080/sites/App'); // Proxy URL for Dev

web.getFileByServerRelativeUrl(templateUrl)
  .getBuffer()
  .then((templateData: ArrayBuffer) => {
    web.getFolderByServerRelativeUrl(depositUrl).files.add(name, templateData);
  });
There is an issue at the moment if you use SP-Rest-Proxy (the file ends up corrupted on SharePoint), but it should be fixed soon.
If you deploy your app on SharePoint, it works as expected.
Related links:
https://github.com/pnp/pnpjs/issues/196#issuecomment-410908170
https://github.com/koltyakov/sp-rest-proxy/issues/61
