Download file to AWS Lambda using SSH client - node.js

I am trying to download a file from an EC2 instance and store it temporarily in the tmp folder inside AWS Lambda. This is what I have tried:
const fs = require('fs');
let Client = require('ssh2-sftp-client');
let sftp = new Client();

sftp.connect({
    host: host,
    username: user,
    privateKey: fs.readFileSync(pemfile)
}).then(() => {
    return sftp.get('/config/test.txt', fs.createWriteStream('/tmp/test.txt'));
}).then(() => {
    sftp.end();
}).catch(err => {
    console.error(err.message);
});
The function runs without generating an error, but nothing is written to the destination file. What am I doing wrong here, and how could I debug this? Also, is there a better way of doing this altogether?

This is not the cloud way to do it, IMO. Create an S3 bucket, and create a proper Lambda execution role so the function can read from the bucket. Also, create a role for the EC2 instance so it can write to the same S3 bucket. Using the S3 API from both sides, the Lambda function and the EC2 instance, should be enough to share the file.
Think about this approach: you decouple your solution from a VPC and region perspective. Also, since the Lambda only needs to access S3, you save ENI (elastic network interface) resources, so you are not consuming your VPC's private IPs. These advantages may not matter in your case, but it is good to be aware of them.
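As an illustration of this approach, here is a minimal sketch of the Lambda side, assuming aws-sdk v2; the bucket and key names are placeholders, and the EC2 instance would upload the same object with s3.putObject or s3.upload:
const AWS = require('aws-sdk');
const fs = require('fs');
const s3 = new AWS.S3();

exports.handler = async () => {
    // Hypothetical bucket/key shared between the EC2 instance and the Lambda.
    const { Body } = await s3.getObject({
        Bucket: 'my-shared-bucket',   // placeholder
        Key: 'config/test.txt'        // placeholder
    }).promise();
    fs.writeFileSync('/tmp/test.txt', Body);
    // ... work with /tmp/test.txt here ...
};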

Related

What is the best way to upload larger files to S3 with the Node.js aws-sdk? MultipartUpload vs ManagedUpload vs getSignedURL, etc.

I'm trying to look over the ways AWS offers to upload files to S3. When I looked into their docs it confused the hell out of me. Looking through the various resources I learned a bit more about options like s3.upload vs s3.putObject, and realised there are physical limitations in API Gateway when using a Lambda function to upload a file.
Particularly for uploading large files, say 1-100 GB, AWS suggests multiple methods to upload a file to S3. Among them are createMultipartUpload, ManagedUpload, getSignedURL and tons of others.
So my question is:
What is the best and easiest way to upload large files to S3 where I can also cancel the upload process? Multipart upload seems too tedious.
There's no single best way to upload files to S3.
It depends on what you want, especially the size of the objects you want to upload:
putObject - ideal for objects under 20 MB.
Presigned URL - allows you to bypass API Gateway and put objects of up to 5 GB into the S3 bucket (see the sketch after this list).
Multipart upload - allows you to upload files in chunks, which means you can continue your upload even if the connection drops temporarily. The maximum file size you can upload via this method is 5 TB.
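For illustration, a presigned PUT URL could be generated roughly like this with aws-sdk v2; the bucket, key and expiry below are placeholder values, not part of the original answer:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Generate a URL the client can PUT the file to directly, bypassing API Gateway.
const url = s3.getSignedUrl('putObject', {
    Bucket: 'my-upload-bucket',       // placeholder
    Key: 'uploads/large-file.dat',    // placeholder
    Expires: 3600                     // URL validity in seconds
});
// The client then uploads with an HTTP PUT to that URL, body = file contents.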
Use streams to upload to S3 so the Node.js server doesn't take up too many resources:
const AWS = require('aws-sdk');
const fs = require('fs');
const stream = require('stream');

const S3 = new AWS.S3();

function upload(S3) {
    // S3.upload consumes the PassThrough stream as the object body.
    let pass = new stream.PassThrough();
    let params = {
        Bucket: BUCKET,   // your target bucket
        Key: KEY,         // your target object key
        Body: pass
    };
    S3.upload(params, function (error, data) {
        console.error(error);
        console.info(data);
    });
    return pass;
}

const readStream = fs.createReadStream('/path/to/your/file');
readStream.pipe(upload(S3));
This example streams a local file, but the stream can come from a request as well.
If you want to listen to the upload progress, you can use ManagedUpload:
const manager = S3.upload(params);
manager.on('httpUploadProgress', (progress) => {
    console.log('progress', progress);
    // { loaded: 6472, total: 345486, part: 3, key: 'large-file.dat' }
});
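Since the question also asks about cancelling an upload: the ManagedUpload object returned by S3.upload exposes an abort() method, so a cancel could look roughly like this (a sketch; the timer here is just a placeholder for whatever triggers the cancellation):
const manager = S3.upload(params);
// Abort the in-flight upload, e.g. when the user clicks "cancel".
setTimeout(() => {
    manager.abort();   // the upload callback then receives an abort error
}, 5000);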

Failed attempts to write to DynamoDB Local

I recently discovered DynamoDB Local and started building it into my project for local development. I decided to go the Docker image route (as opposed to the downloadable .jar file).
That being said, I've gotten the image up and running, have created a table, and can successfully interact with the Docker container via the AWS CLI. aws dynamodb list-tables --endpoint-url http://localhost:8042 successfully returns the table I created previously.
However, when I run my Lambda function and set my AWS config like so:
const axios = require('axios')
const cheerio = require('cheerio')
const randstring = require('randomstring')
const aws = require('aws-sdk')

const dynamodb = new aws.DynamoDB.DocumentClient()

exports.lambdaHandler = async (event, context) => {
    let isLocal = process.env.AWS_SAM_LOCAL
    if (isLocal) {
        aws.config.update({
            endpoint: new aws.Endpoint("http://localhost:8042")
        })
    }
(which I have confirmed is getting set), it actually writes to the table (with the same name as the one in the local DynamoDB instance) in the live AWS service, as opposed to the local container and table.
It's also worth mentioning I'm unable to connect to the local instance of DynamoDB with the AWS NoSQL Workbench tool even though it's configured to point to http://localhost:8042 as well...
Am I missing something? Any help would be greatly appreciated. I can provide any more information if I haven't already done so as well :D
Thanks.
SDK configuration changes, such as region or endpoint, do not retroactively apply to existing clients (whether a regular DynamoDB client or a DocumentClient).
So, change the configuration first and then create your client object, or simply pass the configuration options into the client constructor.
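A minimal sketch of the constructor option, assuming the same local endpoint as in the question (the options shown are passed through to the underlying DynamoDB client in aws-sdk v2):
const aws = require('aws-sdk')

// Decide on the endpoint first, then create the client.
const isLocal = process.env.AWS_SAM_LOCAL
const dynamodb = new aws.DynamoDB.DocumentClient(
    isLocal ? { endpoint: 'http://localhost:8042' } : {}
)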

Fetch requests go towards the local base URL

I'm trying to fetch data with my web client (an Express server) from my backend service (also an Express server). Locally it works fine, using environment variables to set the backend service URL. But deployed on AWS, it won't let me fetch from my web-client EC2 instance to my backend EC2 instance.
I log my environment variable for the backend service (it comes from the AWS SSM Parameter Store) and it shows the correct service URL for my backend EC2 instance.
But then it fails, because it calls 'GET host-url/service-url/endpoint' instead of 'GET service-url/endpoint'. I don't know if this is an AWS or Node.js/Express problem.
That's how I call my backend:
async function callEndpoint(endpointUrl) {
    console.log("Fetching to: " + endpointUrl)
    const response = await fetch(endpointUrl, {
        method: 'GET',
    });
    let data = await response.json();
    return data;
}
The console.log prints out the correct value, but fetch (I guess, though I don't understand why) turns it into a relative path, prefixing it with the host URL from my frontend EC2 instance's IP/DNS.
I don't know how relevant this is, but my servers are running in Docker containers in an ECS cluster (each container on its own EC2 instance).
If you don't specify the scheme in the URL, fetch treats it as a relative path and resolves it against the current origin.
fetch("external-service.domain.com/endpoint")
translates into
fetch("https://hostname/external-service.domain.com/endpoint")
Try adding https:// or the appropriate scheme to your URL.
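A simple guard, in case the stored value might come without a scheme (the environment variable name here is a placeholder):
// Inside an async function:
const base = process.env.BACKEND_SERVICE_URL;   // placeholder env var name
// Prefix a scheme if the stored value lacks one, so fetch treats the URL as absolute.
const serviceUrl = /^https?:\/\//.test(base) ? base : `https://${base}`;
const data = await callEndpoint(`${serviceUrl}/endpoint`);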
Read more:
https://url.spec.whatwg.org/#url-writing

Preferred method of downloading large files from AWS S3 to EC2 server

I'm having some intermittent problems downloading a largeish (3.5 GB) file from S3 to an EC2 instance. About 95% of the time it works great, and fast - maybe 30 seconds. However, the other 5% of the time it stalls out and can take > 2 hours to download. Restarting the job normally solves the problem, indicating that it is transient. This makes me think there is a problem with how I'm downloading files. Below is my implementation - I pipe the read stream into a write stream to disk and return a promise which resolves when it is done (or rejects on error).
Is this the preferred method of downloading large files from S3 with node.js? Are there any "gotchas" I should know about?
const AWS = require('aws-sdk');
const fs = require('fs');
const s3 = new AWS.S3();

function getDownloadStream(Bucket, Key) {
    return s3
        .getObject({
            Bucket,
            Key
        })
        .on('error', (error) => {
            console.error(error);
            return Promise.reject(`S3 Download Error: ${error}`);
        })
        .createReadStream();
}

function downloadFile(inputBucket, key, destination) {
    return new Promise(function(resolve, reject) {
        getDownloadStream(inputBucket, key)
            .on('end', () => {
                resolve(destination);
            })
            .on('error', reject)
            .pipe(fs.createWriteStream(destination));
    });
}
By default, traffic to S3 goes over the public internet, so download speed can be unpredictable. To increase the download speed, and for security reasons, you can configure a VPC endpoint for S3, a virtual device that routes traffic between your instance and S3 over AWS's internal network (much faster than going over the internet).
While creating the endpoint for S3, you need to select the route tables of the instances where the app is hosted. After creating it, you will see an entry in those route tables like destination (com.amazonaws.us-east-1.s3) -> target vpce-xxxxxx, so whenever traffic goes to S3 it is routed through the endpoint instead of over the internet.
Alternatively, you can also try parallelising the download, i.e. fetching ranges of bytes in parallel and combining them (a sketch follows below), but for 3.5 GB the endpoint approach above should be fine.
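For completeness, the ranged-download idea could look roughly like this with aws-sdk v2; the part size and the unbounded concurrency are simplifications I've assumed, not part of the original answer:
const AWS = require('aws-sdk');
const fs = require('fs');
const s3 = new AWS.S3();

async function downloadInRanges(Bucket, Key, destination, partSize = 64 * 1024 * 1024) {
    // Find the object size, then split it into byte ranges of partSize each.
    const { ContentLength } = await s3.headObject({ Bucket, Key }).promise();
    const ranges = [];
    for (let start = 0; start < ContentLength; start += partSize) {
        const end = Math.min(start + partSize - 1, ContentLength - 1);
        ranges.push(`bytes=${start}-${end}`);
    }
    // Fetch all ranges in parallel; a real implementation would cap concurrency.
    const parts = await Promise.all(
        ranges.map(Range => s3.getObject({ Bucket, Key, Range }).promise())
    );
    fs.writeFileSync(destination, Buffer.concat(parts.map(p => p.Body)));
    return destination;
}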

Lambda Timeout while communicating with S3

I'm trying to simply list all the files in an S3 bucket using Lambda.
The code looks as follows:
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

exports.handler = (event, context, callback) => {
    s3.listObjectsV2({
        Bucket: "bucketname",
    }, function(err, data) {
        console.log("DONE : " + err + " : " + data);
        callback(null, 'Hello from Lambda');
    });
};
Using the above, I never see "DONE" printed at all. The log doesn't show any information except for the fact that the function timed out.
Is there any troubleshooting I could do here? I would've thought that at least the error would've been shown in the "DONE" output.
Thanks to Michael above. The problem was that the function was running inside a VPC. If I change it to "No VPC", it works correctly. Your solution may be different if you require it to run in a VPC.
If you are running your code inside a VPC, make sure to create a VPC endpoint for S3.
Here is the tutorial: https://aws.amazon.com/blogs/aws/new-vpc-endpoint-for-amazon-s3/
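The endpoint is usually created in the console or with infrastructure tooling, but for illustration it can also be created programmatically; a rough sketch with aws-sdk v2, where the VPC ID and route table ID are placeholders:
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2({ region: 'us-east-1' });

// Create a gateway endpoint for S3 and attach it to the route table used by the Lambda's subnets.
ec2.createVpcEndpoint({
    VpcId: 'vpc-xxxxxxxx',                       // placeholder
    ServiceName: 'com.amazonaws.us-east-1.s3',
    RouteTableIds: ['rtb-xxxxxxxx']              // placeholder
}, (err, data) => {
    if (err) console.error(err);
    else console.log(data.VpcEndpoint.VpcEndpointId);
});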
If you are running your code inside the VPC, make sure the VPC subnet and its route table entries are correct (for a public subnet: destination = 0.0.0.0/0, target = igw-xxxx). The VPC endpoint route must also be added in order to reach S3 via the endpoint.
In my case I had selected 2 different subnets, one private and one public, so it worked sometimes and sometimes not. I changed both subnets to private (with a NAT gateway in the route) and now it works without the timeout error.
