Lambda which reads jpg/vector files from S3 and processes them using graphicsmagick - node.js

We have a lambda which reads jpg/vector files from S3 and processes them using graphicsmagick.
This lambda was working fine until today, but since this morning we have been getting errors while processing vector images using graphicsmagick.
"Error: Command failed: identify: unable to load module /usr/lib64/ImageMagick-6.7.8/modules-Q16/coders/ps.la': file not found # error/module.c/OpenModule/1278.
identify: no decode delegate for this image format/tmp/magick-E-IdkwuE' # error/constitute.c/ReadImage/544."
The above error is occurring for certain .eps files (vector) while using the identify function of the gm module.
Could you please share your insights on this?
Please let us know whether any backend changes to the ImageMagick module have recently gone through on the AWS end which might have had an effect on this lambda.

Related

KeyError: 'PYSPARK_GATEWAY_SECRET' when creating spark context inside aws lambda code

I have deployed a Lambda function, which uses sparknlp, as a Docker container. To work with sparknlp I need a Spark context, so in my sparknlp code I start with
sc = pyspark.SparkContext().getOrCreate()
I tested my lambda on local and it worked fine.
On AWS I got this error:
Java gateway process exited before sending its port number
even though JAVA_HOME was properly set.
I found out in the source code:
https://github.com/apache/spark/blob/master/python/pyspark/java_gateway.py
that the launch_gateway method tries to create a temporary file, and if that file is not created it raises the above error (line 105).
Lambda won't allow write access to the file system, so the file cannot be created.
So I am trying to pass gateway_port and gateway_secret as environment variables.
I have set PYSPARK_GATEWAY_PORT=25333, which is the default value.
I am not able to figure out how to get the PYSPARK_GATEWAY_SECRET, which is why I am getting the error:
KeyError: 'PYSPARK_GATEWAY_SECRET'
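For reference, the relevant fallback in pyspark's java_gateway.py looks roughly like the sketch below (paraphrased, not the exact source): when PYSPARK_GATEWAY_PORT is set, pyspark connects to an already running gateway instead of launching one, and it reads the secret straight from the environment, which is why setting only the port produces the KeyError.
import os
# Paraphrased sketch of the env-var fallback in pyspark/java_gateway.py
if "PYSPARK_GATEWAY_PORT" in os.environ:
    gateway_port = int(os.environ["PYSPARK_GATEWAY_PORT"])
    # Raises KeyError: 'PYSPARK_GATEWAY_SECRET' if the secret is not exported
    gateway_secret = os.environ["PYSPARK_GATEWAY_SECRET"]
The secret is normally generated by Spark when it launches the gateway JVM itself, so there is no fixed default value to export the way there is for the port.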

How to abort uploading a stream to google storage in Node.js

Interaction with Cloud Storage is performed using the official Node.js Client library.
Output of an external executable (ffmpeg) through fluent-ffmpeg is piped to a writable stream of a Google Cloud Storage object using createWriteStream (https://googleapis.dev/nodejs/storage/latest/File.html#createWriteStream).
The executable (ffmpeg) can end with an error; in this case a zero-length file is created on Cloud Storage.
I want to abort uploading on the command error to avoid finalizing an empty storage object.
What is the proper way of aborting the upload stream?
Current code (just an excerpt):
ffmpeg()
  .input(sourceFile.createReadStream())
  .output(destinationFile.createWriteStream())
  .run();
The files are instances of File (https://cloud.google.com/nodejs/docs/reference/storage/latest/storage/file).

How to start an ec2 instance using sqs and trigger a python script inside the instance

I have a Python script which takes a video and converts it to a series of small panoramas. Now, there's an S3 bucket where a video (mp4) will be uploaded. I need this file to be sent to the EC2 instance whenever it is uploaded.
This is the flow:
Upload video file to S3.
This should trigger the EC2 instance to start.
Once it is running, I want the file to be copied to a particular directory inside the instance.
After this, I want the Python script (panorama.py) to start running, read the video file from that directory, process it, and generate output images.
These output images need to be uploaded to a new bucket or the same bucket which was initially used.
Instance should terminate after this.
What I have done so far is, I have created a lambda function that is triggered whenever an object is added to that bucket. It stores the name of the file and the path. I had read that I now need to use an SQS queue and pass this name and path metadata to the queue and use the SQS to trigger the instance. And then, I need to run a script in the instance which pulls the metadata from the SQS queue and then use that to copy the file(mp4) from bucket to the instance.
How do I do this?
I am new to AWS and hence do not know much about SQS or how to transfer metadata and automatically trigger instance, etc.
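For clarity, the SQS hand-off described above would look roughly like the following boto3 sketch; the queue URL, bucket name, object key, and local path are placeholders, not values from the question.
import json
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/video-jobs'  # placeholder

# In the Lambda: push the uploaded object's location onto the queue.
sqs.send_message(QueueUrl=QUEUE_URL,
                 MessageBody=json.dumps({'bucket': 'my-video-bucket', 'key': 'uploads/input.mp4'}))

# On the instance: pull the metadata back off the queue and copy the file locally.
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
for msg in resp.get('Messages', []):
    body = json.loads(msg['Body'])
    boto3.client('s3').download_file(body['bucket'], body['key'], '/home/ec2-user/input.mp4')
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])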
Your wording is a bit confusing. It says that you want to "start" an instance (which suggests that the instance already exists), but then it says that you want to "terminate" the instance (which would permanently remove it). I am going to assume that you actually intend to "stop" the instance so that it can be used again.
You can put a shell script in the /var/lib/cloud/scripts/per-boot/ directory. This script will then be executed every time the instance starts.
When the instance has finished processing, it can call sudo shutdown -h now to turn off the instance. (Alternatively, it can tell EC2 to stop the instance, but using shutdown is easier.)
For details, see: Auto-Stop EC2 instances when they finish a task - DEV Community
I tried to answer in the most minimalist way; there are many points below that can be further improved. Even so, I think it is still quite a lot to take in, since you mentioned you are new to AWS.
Using AWS Lambda with Amazon S3
Amazon S3 can send an event to a Lambda function when an object is created or deleted. You configure notification settings on a bucket, and grant Amazon S3 permission to invoke a function on the function's resource-based permissions policy.
When the object is uploaded, it will trigger the Lambda function, which creates the instance with EC2 user data (see Run commands on your Linux instance at launch).
For the EC2 instance, make sure you provide the necessary permissions (see Using instance profiles) for downloading and uploading the objects.
The user data has a script that does the rest of the work needed for your workflow:
1. Download the S3 object; you can pass the object key and the S3 bucket name in the same script.
2. Once #1 is finished, start panorama.py, which processes the video.
3. Next, upload the output images to the S3 bucket.
Finally, terminating the instance is a bit tricky; you can achieve it by changing the instance-initiated shutdown behavior (see Change the instance initiated shutdown behavior)
OR
you can use the method below for terminating the instance, but in that case your EC2 instance profile must have permission to terminate the instance.
ec2-terminate-instances $(curl -s http://169.254.169.254/latest/meta-data/instance-id)
You can wrap the above steps into a shell script inside the user data.
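A rough Python equivalent of that user-data worker is sketched below; it is only a sketch, and the local paths and panorama.py's command-line interface are assumptions, not taken from the question.
import os
import subprocess
import sys
import urllib.request

import boto3

# The bucket name and object key are assumed to be passed in as arguments by the user data.
BUCKET, KEY = sys.argv[1], sys.argv[2]
LOCAL_PATH = '/home/ec2-user/input.mp4'
OUTPUT_DIR = '/home/ec2-user/output'

s3 = boto3.client('s3')

# 1. Download the uploaded video from S3.
s3.download_file(BUCKET, KEY, LOCAL_PATH)

# 2. Run the processing script (assumed CLI: input path, output directory).
os.makedirs(OUTPUT_DIR, exist_ok=True)
subprocess.run(['python3', '/home/ec2-user/panorama.py', LOCAL_PATH, OUTPUT_DIR], check=True)

# 3. Upload the generated images back to the bucket.
for name in os.listdir(OUTPUT_DIR):
    s3.upload_file(os.path.join(OUTPUT_DIR, name), BUCKET, f'panoramas/{name}')

# 4. Terminate this instance (mirrors the curl/metadata command above;
#    requires ec2:TerminateInstances on the instance profile).
instance_id = urllib.request.urlopen(
    'http://169.254.169.254/latest/meta-data/instance-id').read().decode()
boto3.client('ec2').terminate_instances(InstanceIds=[instance_id])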
Lambda ec2 start instance:
def launch_instance(EC2, config, user_data):
    # Tags are optional; read them from the config (defaults to no tags).
    tag_specs = config.get('tag_specs', [])
    ec2_response = EC2.run_instances(
        ImageId=config['ami'],  # ami-0123b531fc646552f
        InstanceType=config['instance_type'],
        KeyName=config['ssh_key_name'],
        MinCount=1,
        MaxCount=1,
        SecurityGroupIds=config['security_group_ids'],
        TagSpecifications=tag_specs,
        # UserData=base64.b64encode(user_data).decode("ascii")
        UserData=user_data,  # boto3 base64-encodes the user data for you
    )
    new_instance_resp = ec2_response['Instances'][0]
    instance_id = new_instance_resp['InstanceId']
    print(f"[DEBUG] Full ec2 instance response data for '{instance_id}': {new_instance_resp}")
    return (instance_id, new_instance_resp)
Upload file to S3 -> Launch EC2 instance
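To tie the pieces together, here is a minimal sketch of the Lambda handler that would call launch_instance above; the event parsing follows the standard S3 notification format, while the config values and the worker-script path baked into the user data are placeholders.
import boto3

def lambda_handler(event, context):
    # Pull the bucket name and object key out of the S3 notification event.
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    config = {  # placeholder values
        'ami': 'ami-0123b531fc646552f',
        'instance_type': 't3.medium',
        'ssh_key_name': 'my-key',
        'security_group_ids': ['sg-0123456789abcdef0'],
    }

    # Bake the object location into the user data so the instance knows what to process.
    user_data = (
        '#!/bin/bash\n'
        f'python3 /home/ec2-user/worker.py {bucket} {key}\n'  # assumes the worker script is already on the AMI
    )

    instance_id, _ = launch_instance(boto3.client('ec2'), config, user_data)
    return {'instance_id': instance_id}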

AWS Lambda Python lots of "could not create '/var/task/__pycache__/FILENAMEpyc'" messages

In the configuration for my Python 3.6 AWS Lambda function I set the environment variable "PYTHONVERBOSE" to 1.
Then in the Cloudwatch logs for my function it shows lots of messages similar to:
could not create '/var/task/__pycache__/auth.cpython-36.pyc': OSError(30, 'Read-only file system')
Is this important? Do I need to fix it?
I don't think you can write in the /var/task/ folder. If you want to write something to disk inside the Lambda runtime, try the /tmp folder.
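For illustration, a minimal sketch of a handler that writes to /tmp instead (the file name is arbitrary):
import json

def lambda_handler(event, context):
    # /var/task (the deployed code) is mounted read-only in Lambda;
    # /tmp is the writable scratch space available to the function.
    path = '/tmp/scratch.json'
    with open(path, 'w') as f:
        json.dump({'ok': True}, f)
    with open(path) as f:
        return json.load(f)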

Azure: importing not already existing packages in 'src'

I have an experiment in which an R script module uses functions defined in a zip source (Data Exploration). Here it is described how to deal with packages that do not already exist in the Azure environment.
The DataExploration module has been imported from a file Azure.zip containing all the packages and functions I need (as shown in the next picture).
When I run the experiment, nothing goes wrong. On the contrary, watching the log, it seems clear that Azure is able to manage the source.
The problem is that, when I deploy the web service (classic), if I run the experiment I get the following error:
FailedToEvaluateRScript: The following error occurred during
evaluation of R script: R_tryEval: return error: Error in
.zip.unpack(pkg, tmpDir) : zip file 'src/scales_0.4.0.zip' not found ,
Error code: LibraryExecutionError, Http status code: 400, Timestamp:
Thu, 21 Jul 2016 09:05:25 GMT
It's as if it cannot see scales_0.4.0.zip in the 'src' folder.
The strange thing is that everything used to work until a few days ago. Then I copied the experiment to a second workspace, and it gives me the above error.
I have also tried to upload the DataExploration module again to the new workspace, but the result is the same.
I have "solved" thanks to the help of the AzureML support: it is a bug they are trying to solve right now.
The bug shows up when you have more R script modules, and the first has no a zip input module while the following have.
Workaround: connect the zip input module to the first R script module too.
