Upload a file directly into S3 using Python

I want to download a file from an HTTP URL directly into an Amazon S3 bucket instead of onto the local system.
I run Python on a 64-bit Windows OS.
I tried passing the Amazon S3 bucket URL as the second argument of Python's urlretrieve function when fetching the file:
urllib.request.urlretrieve(url, amazon_s3_bucket_url)
I expected it to upload the file directly to S3. Instead it fails with a FileNotFoundError, which, after some thought, makes sense.

It appears that you want to run a command on a Windows computer (either local or running on Amazon EC2) that will copy the contents of a page identified by a URL directly onto Amazon S3.
This is not possible. There is no API call for Amazon S3 that retrieves content from a different location.
You will need to download the file from the Internet and then upload it to Amazon S3. The code would look something like:
import boto3
import urllib.request
urllib.request.urlretrieve('http://example.com/hello.txt', '/tmp/hello.txt')
s3 = boto3.client('s3')
s3.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt')
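If the goal is only to avoid writing to the local disk, the download and upload can be chained in memory. This is still a download followed by an upload (the bytes pass through your Python process), not S3 fetching the URL itself. A minimal sketch, assuming s3_client is a boto3 S3 client such as boto3.client('s3'); the function name and the opener parameter are hypothetical and exist only for illustration and testing:

```python
import urllib.request


def stream_url_to_s3(url, bucket, key, s3_client, opener=urllib.request.urlopen):
    """Copy the body of `url` to s3://bucket/key without a temporary file.

    `s3_client` is assumed to be a boto3 S3 client.
    `opener` is a hypothetical hook so the function can run without network access.
    """
    with opener(url) as response:
        # upload_fileobj reads from any binary file-like object in chunks,
        # so the whole file does not have to fit in memory at once.
        s3_client.upload_fileobj(response, bucket, key)
```

Usage would look like stream_url_to_s3('http://example.com/hello.txt', 'mybucket', 'hello.txt', boto3.client('s3')).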

Related

I am not able to read dat file from S3 bucket using lambda function

I have been trying to read a .dat file from one S3 bucket, convert it to CSV, compress it, and put it into another bucket.
To open and read the file I am using the code below, but it throws the error "No such file or directory":
with open(f's3://{my_bucket}/{filenames}', 'rb') as dat_file:
    print(dat_file)
The Python language does not natively know how to access Amazon S3.
Instead, you can use the boto3 AWS SDK for Python. See: S3 — Boto 3 documentation
You also have two choices about how to access the content of the file:
Download the file to your local disk using download_file(), then use open() to access the local file, or
Use get_object() to obtain a StreamingBody of the file contents
See also: Amazon S3 Examples — Boto 3 documentation
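The get_object() route could look roughly like the sketch below. It reads the object into memory, gzip-compresses it, and writes it to a second bucket with put_object(). The .dat to CSV conversion is left out because the file layout is not given in the question. The function name is hypothetical, and s3_client is assumed to be a boto3 S3 client:

```python
import gzip


def copy_compressed(s3_client, src_bucket, src_key, dst_bucket, dst_key):
    """Read one S3 object, gzip it in memory, and write it to another bucket."""
    response = s3_client.get_object(Bucket=src_bucket, Key=src_key)
    # response["Body"] is a StreamingBody; read() loads the whole object into memory.
    raw = response["Body"].read()

    # Any .dat -> CSV conversion would happen here, on `raw`.
    compressed = gzip.compress(raw)

    s3_client.put_object(Bucket=dst_bucket, Key=dst_key, Body=compressed)
    return len(compressed)
```

Because everything stays in memory, this only suits files that fit comfortably within the Lambda function's memory setting.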

Writing a new file to a Google Cloud Storage bucket from a Google Cloud Function (Python)

I am trying to write a new file (not upload an existing file) to a Google Cloud Storage bucket from inside a Python Google Cloud Function.
I tried using google-cloud-storage, but the bucket does not have an "open" attribute.
I tried to use the App Engine library GoogleAppEngineCloudStorageClient, but the function cannot deploy with this dependency.
I tried to use gcs-client but I cannot pass the credentials inside the function as it requires a JSON file.
Any ideas would be much appreciated.
Thanks.
from google.cloud import storage
import io
# bucket name
bucket = "my_bucket_name"
# Get the bucket that the file will be uploaded to.
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket)
# Create a new blob and upload the file's content.
my_file = bucket.blob('media/teste_file01.txt')
# create in memory file
output = io.StringIO("This is a test \n")
# upload from string
my_file.upload_from_string(output.read(), content_type="text/plain")
output.close()
# list created files
blobs = storage_client.list_blobs(bucket)
for blob in blobs:
    print(blob.name)
# Make the blob publicly viewable.
my_file.make_public()
You can now write files directly to Google Cloud Storage. It is no longer necessary to create a file locally and then upload it.
You can use the blob.open() as follows:
from google.cloud import storage
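def write_file(lines):
    # `lines` is any iterable of strings to write to the new blob
    client = storage.Client()
    bucket = client.get_bucket('bucket-name')
    blob = bucket.blob('path/to/new-blob.txt')
    # blob.open() returns a file-like object that writes straight to GCS
    with blob.open(mode='w') as f:
        for line in lines:
            f.write(line)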
You can find more examples and snippets here:
https://github.com/googleapis/python-storage/tree/main/samples/snippets
You have to create your file locally and then push it to GCS. You can't create a file dynamically in GCS by using open.
For this, you can write to the /tmp directory, which is an in-memory file system. Note that you will never be able to create a file bigger than the memory allocated to your function minus the memory footprint of your code. With a 2 GB function, you can expect a maximum file size of about 1.5 GB.
Note: GCS is not a file system, and you should not use it like one.
EDIT 1
Things have changed since my answer:
It's now possible to write to any directory in the container (not only /tmp).
You can stream-write a file to GCS, just as you can receive it in streaming mode on Cloud Run. Here is a sample of a stream write to GCS.
Note: a stream write deactivates checksum validation, so there are no integrity checks at the end of the stream write.

How to upload downloaded file to s3 bucket using Lambda function

I saw different questions and answers but could not find one that worked for me. Since I am really new to AWS, I need your help. I am trying to download a gzip file, convert its contents to a JSON file, and then upload that file to an S3 bucket using a Lambda function. I wrote the code to download the file and convert it to JSON, but I am having a problem uploading it to the S3 bucket. Assume the file is ready as x.json. What should I do next?
I know it is a really basic question, but I still need help :)
This code will upload to Amazon S3:
import boto3
s3_client = boto3.client('s3', region_name='us-west-2')  # Change as appropriate
# Arguments are: local file path, bucket name, object key
s3_client.upload_file('/tmp/foo.json', 'my-bucket', 'folder/foo.json')
Some tips:
In Lambda functions you can only write to /tmp/
By default, /tmp/ is limited to 512 MB
At the end of your function, delete the files (zip, json, etc.) because the container can be reused and you don't want to run out of disk space
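The last tip can be folded into the upload step. A minimal sketch, assuming s3_client is a boto3 S3 client and the file has already been written under /tmp/; the helper name is hypothetical:

```python
import os


def upload_and_cleanup(s3_client, local_path, bucket, key):
    """Upload local_path to s3://bucket/key, then delete the local copy."""
    try:
        s3_client.upload_file(local_path, bucket, key)
    finally:
        # Remove the file even if the upload raised, so /tmp/ does not fill up
        # when the Lambda container is reused for a later invocation.
        if os.path.exists(local_path):
            os.remove(local_path)
```

Inside the handler this would be called as upload_and_cleanup(s3_client, '/tmp/foo.json', 'my-bucket', 'folder/foo.json').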
If your Lambda function has permission to write to S3, simply use the boto3 package, which is the AWS SDK for Python.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
Be aware that if the Lambda function runs inside a VPC, it cannot reach the public internet, and that includes the S3 API endpoint that boto3 calls. You may therefore need a NAT gateway (or a VPC endpoint for S3) to give the function a route out.

How to pass video file from S3 bucket to opencv VideoCapture?

I'm working on an AWS Lambda function in Python that reads videos uploaded to an S3 bucket and extracts a few frames from them. I already have the script for extracting the frames with OpenCV, but I don't know what parameter I should pass to cv2.VideoCapture, since the file is only accessible through the S3 bucket.
I've tried passing the video as an S3 object from s3.get_object() as well as from s3.download_fileobj, but neither worked.
I've also tried passing just the key of the video file in S3, but that didn't work either (I didn't expect it to, but I was out of ideas).
Code i have now:
import boto3
import cv2
import io
def lambda_handler(event, context):
    s3 = boto3.client("s3")
    bucket_name = "my_bucket"
    video_key = "videos/video.mp4"
    vidcap = cv2.VideoCapture(s3.get_object(Bucket=bucket_name, Key=video_key))
    success, image = vidcap.read()
I've also tried with:
vidcap = cv2.VideoCapture(s3.download_fileobj(Bucket=bucket_name, Key=video_key, Fileobj=io.BytesIO()))
But with no luck either
I'm getting success = False and image=None. I expect the output of success to be True and the image to be a numpy array to be able to read it.
You can use a presigned URL for the S3 object:
url = s3_client.generate_presigned_url( ClientMethod='get_object', Params={ 'Bucket': bucket, 'Key': key } )
vidcap = cv2.VideoCapture(url)
OpenCV expects to access a file on the local disk.
You would need to download the file from Amazon S3 and then reference that local file.
Please note that AWS Lambda provides only 512 MB of disk space by default, and only in the /tmp/ directory.
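The download step could be sketched as follows. The helper name and the directory parameter are hypothetical, and s3_client is assumed to be a boto3 S3 client; the OpenCV call itself is left to the caller so the sketch has no cv2 dependency:

```python
import os
import tempfile


def download_video(s3_client, bucket, key, directory=None):
    """Download s3://bucket/key to a local file and return its path."""
    # tempfile.gettempdir() normally resolves to /tmp on Lambda.
    directory = directory or tempfile.gettempdir()
    local_path = os.path.join(directory, os.path.basename(key))
    s3_client.download_file(bucket, key, local_path)
    return local_path
```

The returned path is what cv2.VideoCapture should receive, for example vidcap = cv2.VideoCapture(download_video(s3, bucket_name, video_key)). Delete the file once the frames are extracted, since the container may be reused.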
You can try creating an AWS CloudFront distribution for the S3 bucket. Here is the tutorial: Use CloudFront to serve HTTPS requests for S3

How TensorFlow read file from s3 bytestream

I have built a deep learning model in TensorFlow for image recognition. It works when reading an image file from a local directory with the tf.read_file() method, but I now need TensorFlow to read the image from a variable holding a byte stream fetched from an Amazon S3 bucket, without storing the stream in a local directory.
You should be able to pass in the fully formed s3 path to tf.read_file(), like:
s3://bucket-name/path/to/file.jpeg where bucket-name is the name of your s3 bucket, and path/to/file.jpeg is where it's stored in your bucket. It seems possible you might be running into some access permissions issue, depending on if your bucket is private. You can follow https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/s3.md to set up your credentials
Is there an error you ran into when doing this?
