Can't write XML to S3 from python lambda - python-3.x

I have a Python Lambda that takes a JSON file from my bucket and converts it to an XML file. I'm trying to write the XML file back to S3 and I seem to be doing it incorrectly. I've tried converting the element tree and the root to a string, and every approach I take seems to produce some error in CloudWatch.

I would save the XML file in the following way instead of using tree.write() (note that a Lambda function can only write to the /tmp directory):
with open('/tmp/data.xml', 'w') as file:
    file.write(ET.tostring(root).decode('utf-8'))
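If you want to skip the local file entirely, a minimal sketch using boto3's put_object could look like the following; the bucket and key names here are placeholders:
import boto3
import xml.etree.ElementTree as ET

s3 = boto3.client('s3')

# Serialize the ElementTree root to bytes and upload it straight from memory.
# `root` is assumed to be the Element you built from the JSON.
xml_bytes = ET.tostring(root, encoding='utf-8')
s3.put_object(Bucket='my-bucket', Key='output/data.xml', Body=xml_bytes)
Alternatively, after writing /tmp/data.xml you can upload the file with s3.upload_file('/tmp/data.xml', 'my-bucket', 'output/data.xml').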

Related

Uploading a file from memory to S3 with Boto3

This question has been asked many times, but my case is ever so slightly different. I'm trying to create a Lambda that makes an .html file and uploads it to S3. It works when the file is created on disk first; then I can upload it like so:
boto3.client('s3').upload_file('index.html', bucket_name, 'folder/index.html')
So now I have to create the file in memory. For this I first tried StringIO(), but then .upload_file throws an error.
boto3.client('s3').upload_file(temp_file, bucket_name, 'folder/index.html')
ValueError: Filename must be a string.
So I tried using .upload_fileobj(), but then I got the error TypeError: a bytes-like object is required, not 'str'.
So I tried using BytesIO(), which wants me to convert the str to bytes first, so I did:
temp_file = BytesIO()
temp_file.write(index_top.encode('utf-8'))
print(temp_file.getvalue())
boto3.client('s3').upload_fileobj(temp_file, bucket_name, 'folder/index.html')
But now it just uploads an empty file, despite the .getvalue() clearly showing that it does have content in there.
What am I doing wrong?
If you wish to create an object in Amazon S3 from memory, use put_object():
import boto3
s3_client = boto3.client('s3')
html = "<h2>Hello World</h2>"
s3_client.put_object(Body=html, Bucket='my-bucket', Key='foo.html', ContentType='text/html')
But now it just uploads an empty file, despite the .getvalue() clearly showing that it does have content in there.
When you finish writing to a file buffer, the position stays at the end. When you upload a buffer, it starts from the position it is currently in. Since you're at the end, you get no data. To fix this, you just need to add a seek(0) to reset the buffer back to the beginning after you finish writing to it. Your code would look like this:
temp_file = BytesIO()
temp_file.write(index_top.encode('utf-8'))
temp_file.seek(0)  # rewind to the start so the upload reads the whole buffer
print(temp_file.getvalue())
boto3.client('s3').upload_fileobj(temp_file, bucket_name, 'folder/index.html')

How to append files in GCS with the same schema?

Is there any way to append two files in GCS? Suppose the first file is a full load and the second file is an incremental load; what's the way to append the two?
Secondly, gsutil compose will append the two files including the attribute names (header row) as well, whereas in the final file I only want the data of the two files.
You can append two separate files using compose in the Google Cloud Shell, naming the output object the same as the first file, like this:
gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/obj1
This command is meant for parallel uploads, where you split a large file into smaller objects, upload them to Google Cloud Storage, and then compose them back into the original file. You can find more information on Composite Objects and Parallel Uploads.
I've come up with two possible solutions:
Google Cloud Function solution
The option I would go for is using a Cloud Function, doing something like the following (a rough sketch of the function follows the steps):
Create an empty bucket like append_bucket.
Upload the first file.
Create a Cloud Function to be triggered by new uploaded files on the bucket.
Upload the second file.
Read the first and the second file (you will have to download them as strings first).
Make the append operation.
Upload the result to the bucket.
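A minimal sketch of such a function, assuming the google-cloud-storage client library and placeholder bucket and object names (append_bucket, full_load.csv, incremental_load.csv):
from google.cloud import storage

def append_files(event, context):
    client = storage.Client()
    bucket = client.bucket('append_bucket')

    # Download both objects as text (assumes they fit in memory).
    full = bucket.blob('full_load.csv').download_as_text()
    incremental = bucket.blob('incremental_load.csv').download_as_text()

    # Drop the header row of the incremental file so the attribute names
    # are not repeated in the combined output.
    incremental_body = incremental.split('\n', 1)[1]

    combined = full.rstrip('\n') + '\n' + incremental_body

    # Upload the appended result back to the bucket.
    bucket.blob('combined.csv').upload_from_string(combined)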
Google Dataflow solution
You can also do it with Dataflow for BigQuery (keep in mind it’s still in beta).
Create a BigQuery dataset and table.
Create a Dataflow instance, from the template Cloud Storage Text to BigQuery.
Create a JavaScript file with the logic to transform the text.
Upload your files in JSON format to the bucket.
Dataflow will read the JSON file, execute the JavaScript code and append the new data to the BigQuery dataset.
At last, export the BigQuery query result to Cloud Storage.

Can AWS Lambda write CSV to response?

Like the question says, I would like to know if it is possible to return the response of a Lambda function in CSV format. I already know that it is possible to return JSON objects as such, but for my current project, CSV format is necessary. I have only seen discussion of writing CSV files to S3, but that is not what we need for this project.
This is an example of what I would like to have displayed in a response:
year,month,day,hour
2017,10,11,00
2017,10,11,01
2017,10,11,02
2017,10,11,03
2017,10,11,04
2017,10,11,05
2017,10,11,06
2017,10,11,07
2017,10,11,08
2017,10,11,09
Thanks!
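A minimal sketch of a handler that returns CSV, assuming the Lambda sits behind an API Gateway proxy integration; the handler name and rows below are placeholders:
import csv
import io

def lambda_handler(event, context):
    rows = [
        ("year", "month", "day", "hour"),
        ("2017", "10", "11", "00"),
        ("2017", "10", "11", "01"),
    ]
    # Build the CSV document in memory.
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    # With an API Gateway proxy integration, the Content-Type header tells
    # the client to treat the body as CSV rather than JSON.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/csv"},
        "body": buf.getvalue(),
    }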

Appending to a text file in S3

I know how to write and read from a file in S3 using boto. I'm wondering if there is a way to append to a file without having to download the file and re-upload an edited version?
There is no way to append data to an existing object in S3. You would have to grab the data locally, add the extra data, and then write it back to S3.
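A minimal sketch of that read-modify-write approach with boto3; the bucket and key names are placeholders:
import boto3

s3 = boto3.client('s3')
bucket, key = 'my-bucket', 'logs/data.txt'

# Download the existing object, append the new data, and overwrite the object.
existing = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')
updated = existing + 'new line of text\n'
s3.put_object(Bucket=bucket, Key=key, Body=updated.encode('utf-8'))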

Upload and Save an excel file with BottlePy

I am creating an application using Bottle framework. I need a feature to upload an Excel file.
I am using the following for file upload.
http://bottlepy.org/docs/dev/tutorial.html#post-form-data-and-file-uploads
On the server side I am getting the file data as binary content. I want to save it in a temporary folder as an Excel file.
I am new to Python and Bottle. Any help will be much appreciated.
Thanks
Chirdeep
Your request.files.data object contains the data of your Excel file, so you only need to create a temporary file and save the data into it. This can be done using the tempfile module:
import tempfile

f = tempfile.NamedTemporaryFile(delete=False, suffix=".xlsx")
f.write(request.files.data.file.read())
f.close()
I was not able to get simple file-writing code like yours to work, so I used the tempfile module. Looking at your code, I would have expected it to write to the directory where the Python file is, if the code were working. Try the code below; if you don't pass the dir argument, it will create the file in the system's default temporary directory.
def save_as_temp_file(data):
    with tempfile.NamedTemporaryFile(dir=settings.TEMP_PATH,
                                     delete=False,
                                     suffix=".xlsx") as f:
        f.write(data.file.read())
        return f.name
