Can AWS Lambda write CSV to response? - python-3.x

Like the title says, I would like to know if it is possible to return the response of a Lambda function in CSV format. I already know that it is possible to return JSON objects that way, but for my current project, CSV format is necessary. I have only seen discussion of writing CSV files to S3, and that is not what we need for this project.
This is an example of what I would like to have displayed in a response:
year,month,day,hour
2017,10,11,00
2017,10,11,01
2017,10,11,02
2017,10,11,03
2017,10,11,04
2017,10,11,05
2017,10,11,06
2017,10,11,07
2017,10,11,08
2017,10,11,09
Thanks!
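A minimal sketch of how this can work, assuming the function sits behind API Gateway with proxy integration (the handler builds the CSV in memory with the csv module):

import csv
import io

def lambda_handler(event, context):
    # Build the CSV in memory; no file or S3 involved.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(['year', 'month', 'day', 'hour'])
    for hour in range(10):
        writer.writerow([2017, 10, 11, '%02d' % hour])
    # With API Gateway proxy integration, statusCode/headers/body
    # are passed straight through to the HTTP response.
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'text/csv'},
        'body': buf.getvalue(),
    }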

Related

Can't write XML to S3 from python lambda

I have a Python Lambda that takes a JSON from my bucket and converts it to an XML file. I'm trying to then write the XML file back to S3, and I seem to be doing it incorrectly. I've tried converting the element tree and the root to a string, and every approach I take seems to produce some error in CloudWatch.
I would save the XML file in the following way instead of tree.write():

import xml.etree.ElementTree as ET

# /tmp is the only writable path inside a Lambda environment
with open('/tmp/data.xml', 'w') as file:
    file.write(ET.tostring(root).decode('utf-8'))
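To then get the file back into S3, a sketch along these lines should work, assuming boto3 (available in the Lambda runtime); the bucket and key arguments are placeholders to fill in:

import boto3
import xml.etree.ElementTree as ET

s3 = boto3.client('s3')

def write_xml_to_s3(root, bucket, key):
    # Serialize the element tree to a string and upload it directly,
    # skipping the /tmp file entirely.
    body = ET.tostring(root).decode('utf-8')
    s3.put_object(Bucket=bucket, Key=key, Body=body,
                  ContentType='application/xml')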

How can I convert a Pyspark dataframe to a CSV without sending it to a file?

I have a dataframe which I need to convert to a CSV file, and then I need to send this CSV to an API. As I'm sending it to an API, I do not want to save it to the local filesystem and need to keep it in memory. How can I do this?
Easy way: convert your dataframe to a Pandas dataframe with toPandas(), then save it to a string. To get a string back instead of writing a file, call to_csv with path_or_buf=None. Then send the string in an API call.
From the to_csv() documentation:

path_or_buf : str or file handle, default None
    File path or object; if None is provided the result is returned as a string.
So your code would likely look like this:
csv_string = df.toPandas().to_csv(path_or_buf=None)
Alternatives: use tempfile.SpooledTemporaryFile with a large buffer to create an in-memory file. Or you can even use a regular file, just make your buffer large enough and don't flush or close the file. Take a look at Corey Goldberg's explanation of why this works.
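A rough sketch of the SpooledTemporaryFile variant; the endpoint URL and the 'file' field name are placeholders, and requests is assumed to be available:

import tempfile
import requests

def send_dataframe(spark_df, url):
    # max_size is the in-memory threshold in bytes; keep it larger
    # than the expected CSV so nothing spills to disk.
    with tempfile.SpooledTemporaryFile(max_size=100 * 1024 * 1024) as f:
        f.write(spark_df.toPandas().to_csv(index=False).encode('utf-8'))
        f.seek(0)
        # Field name 'file' is a placeholder; use whatever the API expects.
        requests.post(url, files={'file': ('data.csv', f, 'text/csv')})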

How to get parquet file schema in Node JS AWS Lambda?

Is there any way to read a parquet file schema from Node.JS?
If yes, how?
I saw that there is a lib, parquetjs, but from its documentation it looks like it can only read and write the contents of the file.
After some investigation, I've found that parquetjs-lite can do that. It does not read the whole file, just the footer, and then it extracts the schema from it.
It works with a cursor, and from what I saw there are two s3.getObject calls: one for the size and one for the actual data.

Why output from google video intelligence not in JSON format

I have been trying to use the Google Video Intelligence API from https://cloud.google.com/video-intelligence/docs/libraries and I tried the exact same code. The response was supposed to be in JSON format; however, the output was a google.cloud.videointelligence_v1.types.AnnotateVideoResponse or something similar to that.
I have tried the code from many resources, most recently from https://cloud.google.com/video-intelligence/docs/libraries, but still no JSON output was given. This is what I got when I checked the type of the result:
type(result)
google.cloud.videointelligence_v1.types.AnnotateVideoResponse
So, how do I get a JSON response from this?
If you specify an outputUri, the results will be stored in your GCS bucket in JSON format: https://cloud.google.com/video-intelligence/docs/reference/rest/v1/videos/annotate
It seems like you aren't storing the result in GCS. Instead you are getting it via the GetOperation call, which returns the result in AnnotateVideoResponse format.
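A sketch of that approach, assuming the same pre-2.0 Python client used in the answer below (where output_uri is a keyword argument; the output bucket path is a placeholder):

from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
# With output_uri set, the service writes the full AnnotateVideoResponse
# to that GCS location as JSON; 'gs://my-bucket/result.json' is a placeholder.
job = client.annotate_video(
    input_uri='gs://xxxx.mp4',
    features=['OBJECT_TRACKING'],
    output_uri='gs://my-bucket/result.json')
job.result()  # blocks until the operation finishes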
I have found a solution for this. What I had to do was add these imports:

from google.protobuf.json_format import MessageToJson
import json

and run:

job = client.annotate_video(
    input_uri='gs://xxxx.mp4',
    features=['OBJECT_TRACKING'])
result = job.result()
serialized = MessageToJson(result)
a = json.loads(serialized)
type(a)  # dict

What this does is turn the result into a plain dictionary.
Or for more info, try going to this link: google forums thread

Nodejs best way to read xlsx as utf8 text

I need to read an xlsx file in Node.js. The xlsx contains text with accents, apostrophes, and so on. I then have to save the text in a JSON file.
What are the best practices for this task?
Stage 1 - take a look at the node-xlsx module, or the more robust and possibly better suited for your needs xlsx.
Stage 2 - writing the file to JSON: if the module can return a JSON format, great. If you use xlsx, it has an option to convert to JSON --> take a look here.
Since you may need to actually strip and/or protect special accents etc., you may want to validate the returned data before producing the JSON file.
As for actually writing the JSON file, there are a huge number of npm modules for the task.