Cloud Storage python client fails to retrieve bucket - python-3.x

I am trying to use the Python client library to write blobs to Cloud Storage. The VM I'm using has read/write permissions for storage and I can access the bucket via gsutil, but Python gives me the following error:
>>> from google.cloud import storage
>>> storage_client = storage.Client()
>>> bucket = storage_client.get_bucket("gs://<bucket name>")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/client.py", line 225, in get_bucket
    bucket.reload(client=self)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/_helpers.py", line 108, in reload
    _target_object=self)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/_http.py", line 293, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.NotFound: 404 GET https://www.googleapis.com/storage/v1/b/gs://<bucket name>?projection=noAcl: Not Found

Phix is right: pass only the bucket name, without the 'gs://' prefix. The client places the name directly into the request URL (for example, Buckets: update uses https://www.googleapis.com/storage/v1/b/bucket), which is why the 404 above shows 'gs://' inside the path. The documentation for Python's Google Cloud Storage API client library has more details and a usage example.
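As a minimal sketch of that fix, a small helper can strip an optional 'gs://' prefix before the name reaches get_bucket. The helper name normalize_bucket_name and the bucket name "my-bucket" are hypothetical and not part of google-cloud-storage; only the get_bucket call in the trailing comment comes from the question.

```python
def normalize_bucket_name(name: str) -> str:
    """Return a bare bucket name, dropping an optional 'gs://' prefix and any object path."""
    prefix = "gs://"
    if name.startswith(prefix):
        name = name[len(prefix):]
    # Keep only the bucket part if an object path was appended.
    return name.split("/", 1)[0]


# Usage with the client from the question (needs google-cloud-storage and credentials):
# bucket = storage_client.get_bucket(normalize_bucket_name("gs://my-bucket"))
```

With this in place, both "gs://my-bucket" and "my-bucket" resolve to the same bucket name.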

Related

How to connect to GCP BigTable using Python

I am connecting to my GCP BigTable instance using Python (the google-cloud-bigtable library) after setting "GOOGLE_APPLICATION_CREDENTIALS" in my environment variables, and this works.
However, my requirement is to pass the credentials at runtime when creating the BigTable Client object, as shown below:
client = bigtable.Client(credentials='82309281204023049', project='xjkejfkx')
I have followed the GCP BigTable Client Documentation to connect to GCP BigTable, but I am getting this error:
Traceback (most recent call last):
  File "D:/testingonlyinformix/bigtable.py", line 14, in <module>
    client = bigtable.Client(credentials="9876543467898765", project="xjksjkdn", admin=True)
  File "D:\testingonlyinformix\new_venv\lib\site-packages\google\cloud\bigtable\client.py", line 196, in __init__
    project=project, credentials=credentials, client_options=client_options,
  File "D:\testingonlyinformix\new_venv\lib\site-packages\google\cloud\client\__init__.py", line 320, in __init__
    self, credentials=credentials, client_options=client_options, _http=_http
  File "D:\testingonlyinformix\new_venv\lib\site-packages\google\cloud\client\__init__.py", line 167, in __init__
    raise ValueError(_GOOGLE_AUTH_CREDENTIALS_HELP)
ValueError: This library only supports credentials from google-auth-library-python. See https://google-auth.readthedocs.io/en/latest/ for help on authentication with this library.
Can someone please suggest which fields/attributes the Client object expects at runtime when connecting to GCP BigTable?
Thanks
After two hours of searching I finally landed on these pages. Please check them out in order:
BigTable Authentication
Using end-user authentication
OAuth Scopes for BigTable
from google_auth_oauthlib import flow
appflow = flow.InstalledAppFlow.from_client_secrets_file(
    "client_secrets.json", scopes=["https://www.googleapis.com/auth/bigtable.admin"])
appflow.run_console()
credentials = appflow.credentials
The credentials in the previous step will need to be provided to the BigTable client object:
client = bigtable.Client(credentials=credentials, project='xjkejfkx')
This solution worked for me. If anyone has other suggestions, please pitch in.

Accessing Azure webhook data with python

How do you access the WebhookData object of an Azure Automation webhook from Python? I read the documentation on this, but it covers PowerShell and does not help in my case. My Azure webhook URL endpoint successfully receives data from a custom external application, and I can see the data arriving in Azure. I would like to read the received data and run logic driven by it.
This is the error message I get when I attempt to access the WEBHOOKDATA input parameter:
Traceback (most recent call last):
  File "C:\Temp\rh0xijl1.ayb\3b9ba51c-73e7-44ba-af36-3c910e659c71", line 7, in <module>
    received_data = WEBHOOKDATA
NameError: name 'WEBHOOKDATA' is not defined
This is the code producing the error message:
#!/usr/bin/env python3
import json
# Here is where my question is. How do I get this in Python?
# Surely, I should be able to access this easily. But how.
# Powershell does have a concept of param in the documentation - but I want to do this in Python.
received_data = WEBHOOKDATA
#convert JSON to string
received_as_text = json.dumps(received_data)
print(received_as_text)
You access runbook input parameters with sys.argv. See Tutorial: Create a Python 3 runbook
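A minimal sketch of that approach, assuming the webhook payload arrives as the first command-line argument (sys.argv[1]). The function name read_webhook_data is hypothetical, and the exact format Azure passes is not confirmed by the source, so the code falls back to the raw string when the argument is not valid JSON.

```python
import json
import sys


def read_webhook_data(argv):
    """Return the first runbook argument parsed as JSON, the raw string if parsing fails, or None if absent."""
    if len(argv) < 2:
        return None  # the runbook was started without input parameters
    raw = argv[1]
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: the payload is not always strict JSON, so keep the raw text.
        return raw


if __name__ == "__main__":
    received_data = read_webhook_data(sys.argv)
    print(received_data)
```

Printing the raw value first is a cheap way to see what the runbook actually receives before writing logic against it.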

How to provide serializer, deserializer, and config arguments when instantiating Databricks in Azure Python SDK? [duplicate]

[Previously in this post I asked how to provision a Databricks service without any workspace. Now I'm asking how to provision a service with a workspace, as the first scenario seems unfeasible.]
As a cloud admin I'm asked to write a script using the Azure Python SDK which will provision a Databricks service for one of our big data dev teams.
I can't find much online about Databricks within the Azure Python SDK other than https://azuresdkdocs.blob.core.windows.net/$web/python/azure-mgmt-databricks/0.1.0/azure.mgmt.databricks.operations.html
and
https://azuresdkdocs.blob.core.windows.net/$web/python/azure-mgmt-databricks/0.1.0/azure.mgmt.databricks.html
These appear to offer some help provisioning a workspace, but I am not quite there yet.
What am I missing?
EDITS:
Thanks to @Laurent Mazuel and @Jim Xu for their help.
Here's the code I'm running now, and the error I'm receiving:
client = DatabricksClient(credentials, subscription_id)
workspace_obj = client.workspaces.get("example_rg_name", "example_databricks_workspace_name")
WorkspacesOperations.create_or_update(
    workspace_obj,
    "example_rg_name",
    "example_databricks_workspace_name",
    custom_headers=None,
    raw=False,
    polling=True
)
error:
TypeError: create_or_update() missing 1 required positional argument: 'workspace_name'
I'm a bit puzzled by that error as I've provided the workspace name as the third parameter, and according to this documentation, that's just what this method requires.
I also tried the following code:
client = DatabricksClient(credentials, subscription_id)
workspace_obj = client.workspaces.get("example_rg_name", "example_databricks_workspace_name")
client.workspaces.create_or_update(
    workspace_obj,
    "example_rg_name",
    "example_databricks_workspace_name"
)
Which results in:
Traceback (most recent call last):
  File "./build_azure_visibility_core.py", line 112, in <module>
    ca_databricks.create_or_update_databricks(SUB_PREFIX)
  File "/home/gitlab-runner/builds/XrbbggWj/0/SA-Cloud/azure-visibility-core/expd_az_databricks.py", line 34, in create_or_update_databricks
    self.databricks_workspace_name
  File "/home/gitlab-runner/builds/XrbbggWj/0/SA-Cloud/azure-visibility-core/azure-visibility-core/lib64/python3.6/site-packages/azure/mgmt/databricks/operations/workspaces_operations.py", line 264, in create_or_update
    **operation_config
  File "/home/gitlab-runner/builds/XrbbggWj/0/SA-Cloud/azure-visibility-core/azure-visibility-core/lib64/python3.6/site-packages/azure/mgmt/databricks/operations/workspaces_operations.py", line 210, in _create_or_update_initial
    body_content = self._serialize.body(parameters, 'Workspace')
  File "/home/gitlab-runner/builds/XrbbggWj/0/SA-Cloud/azure-visibility-core/azure-visibility-core/lib64/python3.6/site-packages/msrest/serialization.py", line 589, in body
    raise ValidationError("required", "body", True)
msrest.exceptions.ValidationError: Parameter 'body' can not be None.
ERROR: Job failed: exit status 1
So the error is raised at line 589 of serialization.py, but I don't see what in my code causes it. Thanks to all who have generously assisted!
You need to create a Databricks client, and workspaces will be attached to it:
client = DatabricksClient(credentials, subscription_id)
workspace = client.workspaces.get(resource_group_name, workspace_name)
I don't think creating a service without a workspace is even possible: if you try to create a Databricks service in the portal, you will see that a workspace name is required as well.
So, using the SDK, I would look at the documentation for client.workspaces.create_or_update.
(I work at MS in the SDK team)
With help from @Laurent Mazuel and support engineers at Microsoft, I have a solution:
managed_resource_group_ID = ("/subscriptions/"+sub_id+"/resourceGroups/"+managed_rg_name)
client = DatabricksClient(credentials, subscription_id)
workspace_obj = client.workspaces.get(rg_name, databricks_workspace_name)
client.workspaces.create_or_update(
    {
        "managedResourceGroupId": managed_resource_group_ID,
        "sku": {"name":"premium"},
        "location":location
    },
    rg_name,
    databricks_workspace_name
).wait()
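The solution above passes a plain dictionary as the first argument to create_or_update. As a rough sketch, that dictionary can be built by a small helper so the managed resource group ID is assembled in one place. The function name build_workspace_parameters and the sample values are hypothetical; the keys are the ones used in the solution.

```python
def build_workspace_parameters(subscription_id, managed_rg_name, location, sku_name="premium"):
    """Build the dictionary passed as the first argument to create_or_update."""
    managed_rg_id = "/subscriptions/{}/resourceGroups/{}".format(
        subscription_id, managed_rg_name
    )
    return {
        "managedResourceGroupId": managed_rg_id,
        "sku": {"name": sku_name},
        "location": location,
    }


# Usage with the client from the solution (needs azure-mgmt-databricks and credentials):
# params = build_workspace_parameters(subscription_id, managed_rg_name, location)
# client.workspaces.create_or_update(params, rg_name, databricks_workspace_name).wait()
```

Keeping the body separate from the call also makes it easy to print and inspect before sending it.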

Error: 503 DNS resolution failed in Google Translate API but ONLY when executing the Python file via terminal

I am running into a strange issue when using Google Translate API with a JSON authorization key. I can run the file without any issue directly in my editor of choice (VSCode). No errors.
But when I run the same file via the terminal, it executes up to the point just after loading the Google Translate credentials from disk. The call then throws the error message below.
Since this only ever happens when running the file from the terminal, I find this bug hard to tackle.
The goal is for this file to collect data, translate some of the fields using Google services, and then store the data in a database.
Below is the error message:
Error: 503 DNS resolution failed
Traceback (most recent call last):
  File "/home/MY-USER-NAME/anaconda3/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
  File "/home/MY-USER-NAME/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 690, in __call__
  File "/home/MY-USER-NAME/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 592, in _end_unary_response_blocking
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "DNS resolution failed"
    debug_error_string = "{"created":"@1584629013.091398712","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3934,"referenced_errors":[{"created":"@1584629013.091395769","description":"Resolver transient failure","file":"src/core/ext/filters/client_channel/resolving_lb_policy.cc","file_line":262,"referenced_errors":[{"created":"@1584629013.091394954","description":"DNS resolution failed","file":"src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc","file_line":370,"grpc_status":14,"referenced_errors":[{"created":"@1584629013.091389655","description":"C-ares status is not ARES_SUCCESS: Could not contact DNS servers","file":"src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc","file_line":244,"referenced_errors":[{"created":"@1584629013.091380513","description":"C-ares status is not ARES_SUCCESS: Could not contact DNS servers","file":"src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc","file_line":244}]}]}]}]}"
>
This is the relevant part of my code:
Dedicated CRON file (I use this two-step setup in many other projects without any issue)
#! anaconda3/bin/python
import os
os.chdir(r'/home/MY-USER-NAME/path-to-project')
import code-file
Code file (simple .py script)
[...]
from google.oauth2 import service_account
from google.cloud import translate
key_path = 'path-to-file/credentials.json'
credentials = service_account.Credentials.from_service_account_file(
    key_path, scopes=['https://www.googleapis.com/auth/cloud-platform'])
def translate_to_en(contents, credentials=None, verbose=False, use_pinyin=False):
    [...]
    client = translate.TranslationServiceClient(credentials=credentials)
    parent = client.location_path(credentials.project_id, 'global')
    [...]
    response = client.translate_text(
        parent=parent,
        contents=[contents],
        mime_type='text/plain', # mime types: text/plain, text/html
        target_language_code='en-US')
    [...]
[...]
for c in trans_cols:
    df[f'{c}__trans'] = df[c].apply(lambda x: translate_to_en(contents=x, credentials=credentials, verbose=False))
[...]
If there is anyone with a good idea to solve this, your help is greatly appreciated.
The issue seems related to a gRPC bug reported on GitHub. For details on gRPC name resolution, see the gRPC name resolution document on GitHub.
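One workaround sometimes tried for this class of error is to switch gRPC from its c-ares resolver to the system resolver through the GRPC_DNS_RESOLVER environment variable, which the gRPC environment variable documentation describes. The sketch below assumes, without confirmation from the source, that the cron environment differs from the editor's environment and that the resolver is the cause. The function name is hypothetical, and setting the variable before importing any Google client is the cautious choice.

```python
import os


def prefer_native_dns_resolver(env=os.environ):
    """Ask gRPC to use the OS resolver instead of c-ares, unless a value is already set."""
    env.setdefault("GRPC_DNS_RESOLVER", "native")
    return env["GRPC_DNS_RESOLVER"]


# Call this at the top of the cron entry script, before importing google.cloud.translate.
prefer_native_dns_resolver()
```

Because setdefault is used, a value already exported in the shell or crontab is left untouched.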

upload files to tmp directory lambda

I have a Lambda function that triggers when an S3 upload happens. It downloads the file to /tmp and then sends it to GCP Storage. The issue is that the log files can be up to 900 MB, so there is not enough space in the Lambda function's /tmp storage. Is there a way around this?
I tried downloading to memory, but I believe the memory is read-only.
There is also talk about mounting EFS, but I'm not sure this will work.
# retrieve bucket name and file_key from the S3 event
logger.info(event)
s3_bucket_name = event['Records'][0]['s3']['bucket']['name']
file_key = event['Records'][0]['s3']['object']['key']
logger.info('Reading {} from {}'.format(file_key, s3_bucket_name))
logger.info(s3_bucket_name)
logger.info(file_key)
# s3 download file
s3.download_file(s3_bucket_name, file_key, '/tmp/{}'.format(file_key))
# upload to google bucket
bucket = google_storage.get_bucket(google_bucket_name)
blob = bucket.blob(file_key)
blob.upload_from_filename('/tmp/{}'.format(file_key))
This is the error from the CloudWatch logs for the Lambda function:
[ERROR] OSError: [Errno 28] No space left on device
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 30, in lambda_handler
s3.download_file(s3_bucket_name, file_key, '/tmp/
storage_client = storage.Client()
bucket = storage_client.get_bucket("YOUR_BUCKET_NAME")
blob = bucket.blob("file/path.csv")  # object path in your GCS bucket
blob.upload_from_filename("/tmp/path.csv")  # local temporary file
I hope that helps.
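Since the question is about files too large for /tmp, one alternative is to stream the object in chunks rather than writing it to disk first. The sketch below shows only the chunked copy loop, demonstrated with in-memory streams. The function name stream_copy is hypothetical, and the wiring to boto3 and google-cloud-storage in the comment is an assumption that was not tested against the original setup (blob.open requires a reasonably recent google-cloud-storage release).

```python
import io


def stream_copy(src, dst, chunk_size=8 * 1024 * 1024):
    """Copy src to dst in fixed-size chunks and return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break  # end of stream
        dst.write(chunk)
        total += len(chunk)
    return total


# Assumed wiring (not from the original post, untested here):
#   body = s3.get_object(Bucket=s3_bucket_name, Key=file_key)["Body"]
#   with bucket.blob(file_key).open("wb") as gcs_file:
#       stream_copy(body, gcs_file)

# Local demonstration with in-memory streams standing in for S3 and GCS.
source = io.BytesIO(b"x" * 100)
target = io.BytesIO()
copied = stream_copy(source, target, chunk_size=32)
```

Only one chunk is held in memory at a time, so the file size no longer depends on the space available in /tmp.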
