I couldn't remove a custom metadata key from a file in Firebase Storage.
This is what I've tried so far:
blob = bucket.get_blob("dir/file")
metadata = blob.metadata
metadata.pop('custom_key', None) # or del metadata['custom_key']
blob.metadata = metadata
blob.patch()
I also tried to set its value to None but it didn't help.
There seem to be a couple of reasons why you can't delete the custom metadata. I will address them individually, for clarity.
First, when you read the metadata with blob.metadata, it is returned as a read-only copy - as clarified here. So updating that dictionary in place will not work the way you expect. Second, it seems that saving the metadata back to the blob requires a different order of operations than the one you are using - as shown here.
You can give it a try using the below code:
blob = bucket.get_blob("dir/file")
metadata = blob.metadata
metadata.pop('custom_key', None)
blob.patch()
blob.metadata = metadata
While this code is untested, I believe that changing the order and avoiding the read-only blob.metadata situation might help.
If this doesn't help, I would recommend raising an issue in the official GitHub repository for the Cloud Storage Python library, for further clarification from the developers.
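One way to picture why popping the key from the local dictionary has no effect is the merge semantics of a PATCH request: keys absent from the patch are left untouched on the server, while keys explicitly set to None are removed. The helper below is only a simplified sketch of that behavior, not the library's actual code:

```python
def patch_merge(server_metadata, patch):
    """Simplified sketch of JSON-merge-patch semantics: absent keys are
    kept, keys set to None are deleted, other keys are overwritten."""
    merged = dict(server_metadata)
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)  # an explicit None signals deletion
        else:
            merged[key] = value
    return merged

server = {"custom_key": "a", "other": "b"}

# Popping locally sends a patch without the key, so the server keeps it:
print(patch_merge(server, {"other": "b"}))
# Sending the key with an explicit None value removes it server-side:
print(patch_merge(server, {"custom_key": None}))
```

Under this model, assigning None to the key in blob.metadata before patch() would be the way to signal deletion, rather than removing the key from the local dict.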
I saw a few similar posts, but unfortunately none helped me.
I have an s3 bucket (on Scaleway), and I'm trying to simply list all objects contained in that bucket, using the boto3 s3 client as follows:
s3 = boto3.client('s3',
                  region_name=AWS_S3_REGION_NAME,
                  endpoint_url=AWS_S3_ENDPOINT_URL,
                  aws_access_key_id=AWS_ACCESS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET_ACCESS_KEY
                  )
all_objects = s3.list_objects_v2(Bucket=AWS_STORAGE_BUCKET_NAME)
This simple piece of code responds with an error:
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the ListObjects operation: The specified key does not exist.
First, the error seems inappropriate to me since I'm not specifying any key to search for. I also tried to pass a Prefix argument to this method to narrow the search down to a specific subdirectory, with the same error.
Second, I tried to achieve the same thing using a boto3 Resource rather than a Client, as follows:
session = boto3.Session(
    region_name=AWS_S3_REGION_NAME,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY
)
resource = session.resource(
    's3',
    endpoint_url=AWS_S3_ENDPOINT_URL,
)
for bucket in resource.buckets.all():
    print(bucket.name)
That code produces absolutely nothing. One thing that strikes me as odd is that I don't pass the bucket_name anywhere here, which seems to be normal according to the AWS documentation.
There's no chance that I misconfigured the client, since I'm able to use the put_object method perfectly with that same client. One strange thing, though: when I want to put a file, I pass the whole path to put_object as Key (as I found to be the way to go), but the object is inserted with the bucket name prepended to it. So if I call put_object(Key='/path/to/myfile.ext'), the object ends up as /bucket-name/path/to/myfile.ext.
Is this strange behavior the key to my problem? How can I investigate what's happening, or is there another way I could try to list the bucket's files?
Thank you
EDIT: So, after logging the request that the boto3 client is sending, I noticed that the bucket name is appended to the URL, so instead of requesting https://<bucket_name>.s3.<region>.<provider>/, it requests https://<bucket_name>.s3.<region>.<provider>/<bucket-name>/, which leads to the NoSuchKey error.
I took a look into the botocore library, and I found this:
url = _urljoin(endpoint_url, r['url_path'], host_prefix)
in botocore.awsrequest line 252, where r['url_path'] contains /skichic-bucket?list-type=2. So from here, I should be able to easily patch the library core to make it work for me.
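The doubled bucket name can be reproduced without botocore at all. This sketch (the bucket name and region are made up) mimics how the modeled request path /{Bucket}?list-type=2 is appended to a configured endpoint_url that already contains the bucket name:

```python
# Illustrative only: mimics joining an endpoint_url that already contains
# the bucket name (Scaleway-style) with the service model's request path.
endpoint_with_bucket = "https://my-bucket.s3.fr-par.scw.cloud"
request_path = "/my-bucket?list-type=2"  # rendered from "/{Bucket}?list-type=2"

full_url = endpoint_with_bucket.rstrip("/") + request_path
print(full_url)
# The bucket name now appears twice: once in the host and once in the path,
# which is why the server answers NoSuchKey.
```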
Plus, the Prefix argument is not working; whatever I pass into it, I always receive the whole bucket's content, but I guess I can easily patch this too.
Now this isn't satisfying, since there's no issue related to this on GitHub; I can't believe the library contains such a bug and that I'm the first one to encounter it.
Can anyone explain this whole mess? >.<
For those facing the same issue, try changing the endpoint_url parameter in your boto3 client or resource instantiation from https://<bucket_name>.s3.<region>.<provider> to https://s3.<region>.<provider>; i.e. for Scaleway: https://s3.<region>.scw.cloud.
You can then set the Bucket parameter to select the bucket you want.
list_objects_v2(Bucket=<bucket_name>)
You can try this. You'll have to use your resource instead of my s3sr.
s3sr = resource('s3')
bucket = 'your-bucket'
prefix = 'your-prefix/'  # if no prefix, pass ''

def get_keys_from_prefix(bucket, prefix):
    '''gets list of keys for given bucket and prefix'''
    keys_list = []
    paginator = s3sr.meta.client.get_paginator('list_objects_v2')
    # use Delimiter to limit search to that level of hierarchy
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
        # default to [] so pages without a 'Contents' key don't raise
        keys = [content['Key'] for content in page.get('Contents', [])]
        print('keys in page: ', len(keys))
        keys_list.extend(keys)
    return keys_list

keys_list = get_keys_from_prefix(bucket, prefix)
After looking more closely into things, I've found out that a lot of botocore service endpoint patterns start with the bucket name. For example, here's the definition of the ListObjectsV2 operation:
"ListObjectsV2":{
"name":"ListObjectsV2",
"http":{
"method":"GET",
"requestUri":"/{Bucket}?list-type=2"
},
My guess is that in the standard implementation of AWS S3, there's a generic endpoint_url (which explains #jordanm's comment) and the targeted bucket is addressed through the request path.
Now, in the case of Scaleway, there's an endpoint_url for each bucket, with the bucket name contained in that URL (e.g. https://<bucket_name>.s3.<region>.<provider>), so any request path should start directly with the object Key.
I made a fork of botocore where I rewrote every endpoint to remove the bucket name, in case that helps someone in the future.
Thanks again to all contributors!
I'm writing an application based on GCP services and I need to access an external project. I stored the authentication file's contents for the other project in my Firestore database. I read this documentation and tried to apply it, but my code does not work. As the documentation says, what I pass to the authentication method is a dictionary[str, str].
This is my code:
from googleapiclient import discovery
from google.oauth2 import service_account
from google.cloud import firestore
project_id = body['project_id']
user = body['user']
snap_id = body['snapshot_id']
debuggee_id = body['debuggee_id']

db = firestore.Client()
ref = db.collection(u'users').document(user).collection(u'projects').document(project_id)
if ref.get().exists:
    service_account_info = ref.get().to_dict()
else:
    return None, 411

credentials = service_account.Credentials.from_service_account_info(
    service_account_info,
    scopes=['https://www.googleapis.com/auth/cloud-platform'])

service = discovery.build('clouddebugger', 'v2', credentials=credentials)
body is just a dictionary containing all the information about the other project. What I can't understand is why this doesn't work, while the method from_service_account_file does.
The following code gives that method the same information as the previous code, but from a JSON file instead of a dictionary. Maybe the order of the elements is different, but I think that doesn't matter at all.
credentials = service_account.Credentials.from_service_account_file(
[PATH_TO_PROJECT_KEY_FILE],
scopes=['https://www.googleapis.com/auth/cloud-platform'])
Can you tell me what I'm doing wrong with the method from_service_account_info?
Problem solved. When I posted the question, I had manually inserted all the info about the other project through the GCP Firestore console. Then I wrote code to do it automatically, and it worked. Honestly, I don't know why it didn't work before; the information put into Firestore was the same, and the format as well.
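For anyone debugging a similar failure, it can help to verify that the dictionary read back from Firestore actually contains the fields a service-account key normally carries. The field list below reflects the standard service-account JSON format, and the helper is only an illustrative check:

```python
# Fields found in a standard service-account key file; illustrative check only.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")

def missing_fields(info):
    """Return the standard service-account fields absent from the dict."""
    return [field for field in REQUIRED_FIELDS if field not in info]

incomplete = {"type": "service_account", "project_id": "demo"}
print(missing_fields(incomplete))  # fields the stored dict is missing
```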
We have a lot of docs in Couchbase with expiration = 0, which means that documents stay in Couchbase forever. I am aware that INSERT/UPDATE/DELETE isn't supported by N1QL.
We have 500,000,000 such docs and I would like to do this in parallel using chunks/bulks. How can I update the expiration field using Python 3?
I am trying this:
bucket.touch_multi(('000c4894abc23031eed1e8dda9e3b120', '000f311ea801638b5aba8c8405faea47'), ttl=10)
However I am getting an error like:
_NotFoundError_0xD (generated, catch NotFoundError): <Key=u'000c4894abc23031eed1e8dda9e3b120'
I just tried this:
from couchbase.cluster import Cluster
from couchbase.cluster import PasswordAuthenticator
cluster = Cluster('couchbase://localhost')
authenticator = PasswordAuthenticator('Administrator', 'password')
cluster.authenticate(authenticator)
cb = cluster.open_bucket('default')
keys = []
for i in range(10):
    keys.append("key_{}".format(i))

for key in keys:
    cb.upsert(key, {"some": "thing"})

print(cb.touch_multi(keys, ttl=5))
and I get no errors, just a dictionary of keys and OperationResults. And they do in fact expire soon thereafter. I'd guess some of your keys are not there.
However, maybe you'd really rather set a bucket expiry? That will make all the documents expire at that time, regardless of the expiry on the individual documents. In addition to the above answer that mentions that, check out this for more details.
You can use the Couchbase Python (or any) SDK Bucket.touch() method, described here: https://docs.couchbase.com/python-sdk/current/document-operations.html#modifying-expiraton
If you don't know the document keys, you can use a N1QL covered index to get the document keys asynchronously inside your Python SDK, and then use the above bucket touch API to set the expiration from your Python SDK.
CREATE INDEX ix1 ON bucket(META().id) WHERE META().expiration = 0;
SELECT RAW META().id
FROM bucket WHERE META().expiration = 0 AND META().id LIKE "a%";
You can issue different SELECTs for different ranges and run them in parallel.
As for the update operation, you need to write one: as you get each key, do (instead of an update) bucket.touch(), which only updates the document expiration without modifying the actual document. That saves a get/put of the whole document (https://docs.couchbase.com/python-sdk/current/core-operations.html#setting-document-expiration).
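Since touching 500,000,000 keys in one call isn't practical, the keys returned by the covered-index query can be processed in fixed-size batches. A minimal chunking helper (the batch size and the commented touch_multi call are illustrative):

```python
def chunks(seq, size):
    """Yield successive fixed-size slices of seq."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

keys = ["key_{}".format(i) for i in range(10)]
for batch in chunks(keys, 4):
    print(batch)
    # bucket.touch_multi(batch, ttl=86400)  # touch each batch in turn
```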
According to this answer one can retrieve immediate "subdirectories" by querying by prefix and then obtaining CommonPrefix of the result of Client.list_objects() method.
Unfortunately, Client is part of the so-called "low level" API.
I am using a different API:
session = Session(aws_access_key_id=access_key,
                  aws_secret_access_key=secret_key)
s3 = session.resource('s3')
my_bucket = s3.Bucket(bucket_name)
result = my_bucket.objects.filter(Prefix=prefix)
and this method does not return a dictionary.
Is it possible to obtain common prefixes with higher level API in boto3?
As noted in this answer, it seems that the Resource doesn't handle Delimiter well. It is often annoying, when your entire stack relies on Resource, to be told that, ah, you should have instantiated a Client instead...
Fortunately, a Resource object, such as your Bucket above, contains a client as well.
So, instead of the last line in your code sample, do:
paginator = my_bucket.meta.client.get_paginator('list_objects')
for resp in paginator.paginate(Bucket=my_bucket.name, Prefix=prefix, Delimiter='/', ...):
    for x in resp.get('CommonPrefixes', []):
        print(x['Prefix'])
You can access the client from the session.
session.client('s3').list_objects(Bucket=bucket_name, Prefix= prefix)
I am able to upload a file to an Azure blob using the REST API provided by Azure.
I want to set metadata at the time of the Put Blob request, but when I set it in a header as shown here, I am unable to upload the file and get the following exception: org.apache.http.client.ClientProtocolException,
from the last line of the code below.
HttpPut req = new HttpPut(uri);
req.setHeader("x-ms-blob-type", blobType);
req.setHeader("x-ms-date", date);
req.setHeader("x-ms-version", storageServiceVersion);
req.setHeader("x-ms-meta-Cat", user);
req.setHeader("Authorization", authorizationHeader);
HttpEntity entity = new InputStreamEntity(is,blobLength);
req.setEntity(entity);
HttpResponse response = httpClient.execute(req);
Regarding the same, I have two questions:
Can setting different metadata avoid overwriting of the file? See my question about that here.
If yes to the first question, how do I set metadata in the REST request to put a blob into Azure?
Please help.
So a few things are going on here.
Regarding the error you're getting: it is because you're not including your metadata header when calculating the authorization header. Please read the Constructing the Canonicalized Headers String section here: http://msdn.microsoft.com/en-us/library/windowsazure/dd179428.aspx.
Based on this, you would need to change the following line of code (from your blog post)
String canonicalizedHeaders = "x-ms-blob-type:"+blobType+"\nx-ms-date:"+date+"\nx-ms-version:"+storageServiceVersion;
to
String canonicalizedHeaders = "x-ms-blob-type:"+blobType+"\nx-ms-date:"+date+"\nx-ms-meta-cat:"+user+"\nx-ms-version:"+storageServiceVersion;
(Note: I have just made these changes in Notepad, so they may not work. Please go to the link mentioned above for correctly creating the canonicalized headers string.)
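The key requirement is ordering: the canonicalized headers are the x-ms-* headers sorted lexicographically by name and joined with newlines, which is why x-ms-meta-cat lands between x-ms-date and x-ms-version. A small sketch of that ordering logic (header values are placeholders):

```python
# Placeholder header values; only the lexicographic ordering matters here.
headers = {
    "x-ms-blob-type": "BlockBlob",
    "x-ms-version": "2009-09-19",
    "x-ms-meta-cat": "user1",
    "x-ms-date": "Mon, 27 Oct 2014 08:49:37 GMT",
}
canonicalized = "\n".join(
    "{0}:{1}".format(name, value) for name, value in sorted(headers.items())
)
print(canonicalized)
```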
can setting different metadata, avoid overwriting of file?
Not sure what you mean by this. You can update the metadata of a blob by performing the Set Blob Metadata operation on the blob.