Writing data to Google Cloud Storage using Python - python-3.x

I cannot find a way to write a data set from my local machine into Google Cloud Storage using Python. I have researched a lot but didn't find any clue regarding this. Need help, thanks.

Quick example, using the google-cloud Python library:
from google.cloud import storage

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print('File {} uploaded to {}.'.format(
        source_file_name,
        destination_blob_name))
More examples are in this GitHub repo: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/storage/cloud-client
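As a minimal usage sketch, a call might look like the following; the bucket name, local path, and destination object name are placeholders you would replace with your own:

# Hypothetical usage of upload_blob() defined above; all names are placeholders.
upload_blob('my-bucket', '/path/to/local/dataset.csv', 'datasets/dataset.csv')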

When we want to write a string to a GCS bucket blob, the only change necessary is using blob.upload_from_string(your_string) rather than blob.upload_from_filename(source_file_name):
from google.cloud import storage

def write_to_cloud(your_string):
    client = storage.Client()
    bucket = client.get_bucket('bucket123456789')
    blob = bucket.blob('PIM.txt')
    blob.upload_from_string(your_string)
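The same pattern works for any string you can build in memory. As a small sketch (bucket and object names are placeholders), you could serialize a dict to JSON and pass the optional content_type argument that upload_from_string accepts:

import json
from google.cloud import storage

def write_json_to_cloud(record):
    # Placeholder bucket and object names; adjust to your environment.
    client = storage.Client()
    bucket = client.get_bucket('bucket123456789')
    blob = bucket.blob('record.json')
    # Serialize the dict and upload it as a string with an explicit content type.
    blob.upload_from_string(json.dumps(record), content_type='application/json')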

The earlier answers still miss the easiest way: using the open() method.
You can use the blob.open() as follows:
from google.cloud import storage

def write_file(lines):
    client = storage.Client()
    bucket = client.get_bucket('bucket-name')
    blob = bucket.blob('path/to/new-blob-name.txt')
    ## Use bucket.get_blob('path/to/existing-blob-name.txt') to write to existing blobs
    with blob.open(mode='w') as f:
        for line in lines:
            f.write(line)
You can find more examples and snippets here:
https://github.com/googleapis/python-storage/tree/main/samples/snippets
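The same file-like interface works for reading back from a blob. A minimal sketch, assuming the same placeholder bucket and blob names as above:

from google.cloud import storage

def read_file():
    client = storage.Client()
    bucket = client.get_bucket('bucket-name')
    blob = bucket.blob('path/to/new-blob-name.txt')
    # Open the blob in text read mode and stream its lines.
    with blob.open(mode='r') as f:
        for line in f:
            print(line.rstrip())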

from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

# Build a client for the Cloud Storage JSON API using Application Default Credentials.
credentials = GoogleCredentials.get_application_default()
service = discovery.build('storage', 'v1', credentials=credentials)

filename = 'file.csv'
bucket = 'Your bucket name here'
body = {'name': 'file.csv'}

# Insert (upload) the local file into the bucket.
req = service.objects().insert(bucket=bucket, body=body, media_body=filename)
resp = req.execute()

from google.cloud import storage

def write_to_cloud(buffer):
    client = storage.Client()
    bucket = client.get_bucket('bucket123456789')
    blob = bucket.blob('PIM.txt')
    blob.upload_from_file(buffer)
While Brandon's answer indeed gets the file to Google Cloud, it does so by uploading the file, as opposed to writing it. This means the file needs to exist on your disk before you upload it to the cloud.
My proposed solution uses an "in-memory" payload (the buffer parameter) which is then written to the cloud. To write the content you need to use upload_from_file instead of upload_from_filename, everything else being the same.
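For example, the buffer can be an in-memory io.BytesIO object. A short sketch of how you might build one and hand it to write_to_cloud() defined above (the content is just an illustration):

import io

# Build an in-memory binary buffer and pass it to write_to_cloud().
buffer = io.BytesIO('some content generated in memory\n'.encode('utf-8'))
buffer.seek(0)  # make sure upload_from_file reads from the start
write_to_cloud(buffer)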

Related

loading data from GCS bucket to SharePoint folder

I am working on a POC where I have to load data from a GCS bucket to a SharePoint location.
I am using the code below but am not able to get the desired result.
# Import the storage library
from google.cloud import storage

client = storage.Client()
bucket_name = 'my-bucket'
file_name = 'my-file.csv'

# Download the file from GCS
bucket = client.bucket(bucket_name)
blob = bucket.blob(file_name)
blob.download_to_filename(file_name)

# Import the office365-rest-python-client library
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

# Set the SharePoint site URL
site_url = 'https://7rhjkshshgvavvd.sharepoint.com/sites/MyDemo/testing/'

# Authenticate with SharePoint
context = AuthenticationContext(url=site_url)
if context.acquire_token_for_user(username="XXXXXX", password="XXXXXX"):
    print("Authenticated with SharePoint")
else:
    print("Failed to authenticate with SharePoint")

# Construct a ClientContext object
client_context = ClientContext(site_url, context)

# Set the path to the file you want to upload
# Upload the file to SharePoint
file_creation_info = File.from_local_file(client_context)
sp_file = file_creation_info.upload()
client_context.execute_query()
print(f'File uploaded to SharePoint: {sp_file.server_relative_url}')

cloud function read storage object data without using the storage client

I have created a simple Cloud Function with the trigger google.cloud.storage.object.v1.finalized.
When a file (.xlsx) is uploaded to my bucket I want to read its content.
I am using the following method for this:
import functions_framework

@functions_framework.cloud_event
def process_data(cloud_event):
    print(f"Data: {cloud_event.data}")
I am able to print cloud_event.data, but how do I get the actual file which was uploaded?
One way I can do this is by using the storage client in the following manner:
from google.cloud import storage
import functions_framework

def get_file(object_name, bucket_name, download_path):
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(object_name)
    blob.download_to_filename(download_path)

@functions_framework.cloud_event
def process_data(cloud_event):
    print(f"Data: {cloud_event.data}")
    object_name = cloud_event.data['name']
    bucket_name = cloud_event.data['bucket']
    download_path = "/tmp/" + object_name  # download to a file path under /tmp
    get_file(object_name, bucket_name, download_path)
But is there a way through which I can get the actual contents of the file without using the Cloud Storage client?

How can I scrape PDFs within a Lambda function and save them to an S3 bucket?

I'm trying to develop a simple Lambda function that will scrape a PDF and save it to an S3 bucket, given the URL and the desired filename as input data. I keep receiving the error "Read-only file system", and I'm not sure if I have to change the bucket permissions or if there is something else I am missing. I am new to S3 and Lambda and would appreciate any help.
This is my code:
import urllib.request
import json
import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    url = event['url']
    filename = event['filename'] + ".pdf"
    response = urllib.request.urlopen(url)
    file = open(filename, 'w')
    file.write(response.read())
    s3.upload_fileobj(response.read(), 'sasbreports', filename)
    file.close()
This was my event file:
{
    "url": "https://purpose-cms-preprod01.s3.amazonaws.com/wp-content/uploads/2022/03/09205150/FY21-NIKE-Impact-Report_SASB-Summary.pdf",
    "filename": "nike"
}
When I tested the function, I received this error:
{
    "errorMessage": "[Errno 30] Read-only file system: 'nike.pdf.pdf'",
    "errorType": "OSError",
    "requestId": "de0b23d3-1e62-482c-bdf8-e27e82251941",
    "stackTrace": [
        "  File \"/var/task/lambda_function.py\", line 15, in lambda_handler\n    file = open(filename + \".pdf\", 'w')\n"
    ]
}
AWS Lambda has limited space in /tmp, the only writable location.
Writing into this space can be dangerous without proper disk management, since this storage is kept alive across multiple executions. It can lead to saturation or unexpected file sharing with previous requests.
Instead of saving the PDF locally, write it directly to S3 without involving the file system, this way:
import urllib.request
import json
import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    url = event['url']
    filename = event['filename']
    # urlopen returns a file-like object, so it can be streamed straight to S3.
    response = urllib.request.urlopen(url)
    s3.upload_fileobj(response, 'sasbreports', filename)
BTW: the .pdf appending should be removed or kept according to your use case.
AWS Lambda functions can only write to the /tmp/ directory. All other directories are read-only.
Also, there is a default limit of 512 MB of storage in /tmp/, so make sure you delete the files after uploading them to S3, for situations where the Lambda environment is re-used for future executions.
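If you do need a local copy (for example to post-process the PDF before uploading), a sketch that stays within those constraints might look like this; the bucket name is taken from the question and the paths are placeholders:

import os
import urllib.request
import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    filename = event['filename'] + ".pdf"
    local_path = os.path.join('/tmp', filename)  # /tmp is the only writable path

    # Download to /tmp, upload to S3, then delete to avoid filling the 512 MB limit.
    urllib.request.urlretrieve(event['url'], local_path)
    s3.upload_file(local_path, 'sasbreports', filename)
    os.remove(local_path)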

How to change storage class of object in s3 bucket using boto3?

I am trying to change the storage class of an object in S3 from Standard to Standard-IA.
This is similar to this thread, but I would like to do it using boto3 and a Lambda trigger.
Thanks.
You can use the copy_object method:
You can use the CopyObject action to change the storage class of an object that is already stored in Amazon S3 using the StorageClass parameter.
For example:
import boto3

s3 = boto3.client('s3')

bucket_name = '<your bucket-name>'
object_key = '<your-object-key>'

r = s3.copy_object(
    CopySource=f"{bucket_name}/{object_key}",
    Bucket=bucket_name,
    Key=object_key,
    StorageClass='STANDARD_IA')
print(r)
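Since the question mentions a Lambda trigger, here is a hedged sketch of wiring the same call into an S3-triggered handler. It assumes the standard S3 object-created notification payload; the guard against re-processing is my own addition, because the copy itself emits a new event:

import boto3
import urllib.parse

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Standard S3 put-notification payload: one record per created object.
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        object_key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Skip objects already in STANDARD_IA, otherwise the copy
        # re-triggers this function and loops forever.
        head = s3.head_object(Bucket=bucket_name, Key=object_key)
        if head.get('StorageClass') == 'STANDARD_IA':
            continue

        # Copy the object onto itself with the new storage class.
        s3.copy_object(
            CopySource={'Bucket': bucket_name, 'Key': object_key},
            Bucket=bucket_name,
            Key=object_key,
            StorageClass='STANDARD_IA')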

Download file from AWS S3 using Python

I am trying to download a file from an Amazon S3 bucket to my local machine using the code below, but I get an error saying "Unable to locate credentials".
Given below is the code I have written:
from boto3.session import Session
import boto3

ACCESS_KEY = 'ABC'
SECRET_KEY = 'XYZ'

session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
your_bucket = s3.Bucket('bucket_name')

for s3_file in your_bucket.objects.all():
    print(s3_file.key)  # prints the contents of bucket

s3 = boto3.client('s3')
s3.download_file('your_bucket', 'k.png', '/Users/username/Desktop/k.png')
Could anyone help me on this?
You are not using the session you created to download the file; you're using the s3 client you created. If you want to use the client, you need to specify credentials.
your_bucket.download_file('k.png', '/Users/username/Desktop/k.png')
or
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('your_bucket','k.png','/Users/username/Desktop/k.png')
From an example in the official documentation, the correct format is:
import boto3
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME')
You can also use a file-like object opened in binary mode.
s3 = boto3.client('s3', aws_access_key_id=..., aws_secret_access_key=...)
with open('FILE_NAME', 'wb') as f:
    s3.download_fileobj('BUCKET_NAME', 'OBJECT_NAME', f)
    f.seek(0)
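If you want to keep the object in memory rather than on disk, a sketch using an io.BytesIO buffer could look like this; the `...` credentials and the bucket/object names are placeholders, as in the snippets above:

import io
import boto3

s3 = boto3.client('s3', aws_access_key_id=..., aws_secret_access_key=...)

# Download the object into an in-memory buffer instead of a local file.
buffer = io.BytesIO()
s3.download_fileobj('BUCKET_NAME', 'OBJECT_NAME', buffer)
buffer.seek(0)  # rewind before reading the downloaded bytes
data = buffer.read()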
The code in question uses s3 = boto3.client ('s3'), which does not provide any credentials.
The format for authenticating a client is shown here:
import boto3

client = boto3.client(
    's3',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN,
)

# Or via the Session
session = boto3.Session(
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN,
)
And lastly, you can also re-use the authenticated session you created to get the bucket, and then download the file from the bucket.
from boto3.session import Session
import boto3

ACCESS_KEY = 'ABC'
SECRET_KEY = 'XYZ'

session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)

# session is authenticated and can access the resource in question
session.resource('s3') \
    .Bucket('bucket_name') \
    .download_file('k.png', '/Users/username/Desktop/k.png')
For others trying to download files from AWS S3 and looking for a more user-friendly solution with other industrial-strength features, check out https://github.com/d6t/d6tpipe. It abstracts the S3 functions into a simpler interface. It also supports directory sync, uploading files, permissions and many other things you need to sync files from S3 (and FTP).
import d6tpipe

api = d6tpipe.api.APILocal()  # keep permissions locally for security
settings = {
    'name': 'my-files',
    'protocol': 's3',
    'location': 'bucket-name',
    'readCredentials': {
        'aws_access_key_id': 'AAA',
        'aws_secret_access_key': 'BBB'
    }
}
d6tpipe.api.create_pipe_with_remote(api, settings)

pipe = d6tpipe.Pipe(api, 'my-files')
pipe.scan_remote()    # show all files
pipe.pull_preview()   # preview
pipe.pull(['k.png'])  # download single file
pipe.pull()           # download all files
pipe.files()          # show files

file = open(pipe.dirpath/'k.png')  # access file
You can set up your AWS profile with the awscli to avoid putting your credentials in the file. First add your profile:
aws configure --profile account1
Then in your code add:
aws_session = boto3.Session(profile_name="account1")
s3_client = aws_session.client('s3')
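The client built from the profile can then be used exactly like before, e.g. (bucket, key, and local path are placeholders from the question):

# Download using the profile-based client; names below are placeholders.
s3_client.download_file('your_bucket', 'k.png', '/Users/username/Desktop/k.png')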
Filename:
Can be any name; the file will be downloaded with that name.
It can be placed in any existing local directory.
Key:
Is the S3 file path, with the file name at the end.
It does not start with a leading slash.
Session():
Automatically picks up the credentials from ~/.aws/config or ~/.aws/credentials.
If not, you need to pass them explicitly.
from boto3.session import Session
import boto3

# Let's use Amazon S3
s3 = boto3.resource("s3")

# Print out bucket names to check you have accessibility
# for bucket in s3.buckets.all():
#     print(bucket.name)

session = Session()

OR

session = Session(aws_access_key_id="AKIAYJN2LNOU",
                  aws_secret_access_key="wMyT0SxEOsoeiHYVO3v9Gc",
                  region_name="eu-west-1")

session.resource('s3').Bucket('bucket-logs').download_file(
    Key="logs/20221122_0_5ee03da676ac566336e2279decfc77b3.gz",
    Filename="/tmp/Local_file_name.gz")
