Preferred method of downloading large files from AWS S3 to EC2 server - node.js

I'm having some intermittent problems downloading a largish (3.5 GB) file from S3 to an EC2 instance. About 95% of the time it works great, and fast - maybe 30 seconds. However, the other 5% of the time it stalls out and can take more than 2 hours to download. Restarting the job normally solves the problem, which indicates that it is transient. This makes me think there is a problem with how I'm downloading files. Below is my implementation - I pipe the read stream into a write stream to disk and return a promise which resolves when it is done (or rejects on error).
Is this the preferred method of downloading large files from S3 with node.js? Are there any "gotchas" I should know about?
function getDownloadStream(Bucket, Key) {
  return s3
    .getObject({
      Bucket,
      Key
    })
    .on('error', (error) => {
      console.error(error);
      return Promise.reject(`S3 Download Error: ${error}`);
    })
    .createReadStream();
}
function downloadFile(inputBucket, key, destination) {
  return new Promise(function(resolve, reject) {
    getDownloadStream(inputBucket, key)
      .on('end', () => {
        resolve(destination);
      })
      .on('error', reject)
      .pipe(fs.createWriteStream(destination));
  });
}

By default, traffic to S3 goes over the public internet, so download speed can be unpredictable. To increase download speed and for security reasons you can configure a VPC endpoint for S3, a virtual device that routes traffic between your instance and S3 over AWS's internal network, which is much faster than going over the internet.
While creating the endpoint for S3, you need to select the route tables of the instances where the app is hosted. After it is created, you will see an entry in those route tables like destination (com.amazonaws.us-east-1.s3) -> target vpce-xxxxxx, so whenever traffic goes to S3 it is routed through the endpoint instead of over the internet.
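For reference, an endpoint like that can also be created programmatically. Below is a minimal sketch using the AWS SDK v2; the VPC and route table IDs are placeholders, and in practice this is usually done in the console or with infrastructure-as-code.

// Hypothetical sketch: creating a gateway endpoint for S3 with the AWS SDK v2.
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2({ region: 'us-east-1' });

ec2.createVpcEndpoint({
  VpcId: 'vpc-0123456789abcdef0',                  // placeholder VPC
  ServiceName: 'com.amazonaws.us-east-1.s3',
  VpcEndpointType: 'Gateway',
  RouteTableIds: ['rtb-0123456789abcdef0']         // route tables of the app's subnets
}).promise()
  .then(({ VpcEndpoint }) => console.log('created', VpcEndpoint.VpcEndpointId))
  .catch(console.error);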
Alternatively, you can also try parallelising the download by fetching ranges of bytes in parallel and combining them (see the sketch below), but for 3.5 GB the approach above should be fine.
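If you do experiment with ranged downloads, a minimal sketch might look like the following. It assumes the AWS SDK v2 (aws-sdk); the part size is arbitrary, and a real implementation would cap how many ranges are in flight at once so a 3.5 GB object isn't buffered entirely in memory.

// Hypothetical sketch: parallel ranged download with the AWS SDK v2.
const AWS = require('aws-sdk');
const fs = require('fs');
const s3 = new AWS.S3();

async function downloadInParts(Bucket, Key, destination, partSize = 64 * 1024 * 1024) {
  const { ContentLength } = await s3.headObject({ Bucket, Key }).promise();
  // Pre-size the file so each part can be written at its own offset.
  const fd = fs.openSync(destination, 'w');
  fs.ftruncateSync(fd, ContentLength);

  const parts = [];
  for (let start = 0; start < ContentLength; start += partSize) {
    const end = Math.min(start + partSize, ContentLength) - 1;
    parts.push(
      s3.getObject({ Bucket, Key, Range: `bytes=${start}-${end}` }).promise()
        .then(({ Body }) => fs.writeSync(fd, Body, 0, Body.length, start))
    );
  }
  await Promise.all(parts);
  fs.closeSync(fd);
  return destination;
}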

Related

Download file to AWS Lambda using SSH client

I am trying to download a file from an EC2 instance and store it temporarily in the tmp folder inside AWS Lambda. This is what I have tried:
let Client = require('ssh2-sftp-client');
let fs = require('fs');
let sftp = new Client();
sftp.connect({
  host: host,
  username: user,
  privateKey: fs.readFileSync(pemfile)
}).then(() => {
  return sftp.get('/config/test.txt', fs.createWriteStream('/tmp/test.txt'));
}).then(() => {
  sftp.end();
}).catch(err => {
  console.error(err.message);
});
The function runs without generating an error but nothing is written to the destination file. What am I doing wrong here and how could I debug this? Also is there a better way of doing this altogether?
This is not the cloud way to do it, IMO. Create an S3 bucket, and create a proper Lambda execution role so the Lambda function can read from the bucket. Also, create a role for the EC2 instance so it can write to the same S3 bucket. Using the S3 API from both sides, the Lambda function and the EC2 instance, should be enough to share the file.
Think about this approach: you decouple your solution from a VPC and region perspective. Also, since the Lambda only needs to access S3, you save ENI (elastic network interface) resources and don't consume your VPC's private IPs. These advantages may not matter in your case, but it is good to be aware of them.
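As a rough illustration of that flow (the bucket name and keys are placeholders, and it assumes the AWS SDK v2 with appropriate IAM roles attached to the instance and the function), the EC2 side uploads the file and the Lambda side pulls it into /tmp:

// On the EC2 instance: upload the file to a shared bucket (names are hypothetical).
const AWS = require('aws-sdk');
const fs = require('fs');
const s3 = new AWS.S3();

s3.upload({
  Bucket: 'my-shared-bucket',
  Key: 'config/test.txt',
  Body: fs.createReadStream('/config/test.txt')
}).promise()
  .then(() => console.log('uploaded'))
  .catch(console.error);

// In the Lambda function: download the same object into /tmp.
exports.handler = async () => {
  const { Body } = await s3.getObject({
    Bucket: 'my-shared-bucket',
    Key: 'config/test.txt'
  }).promise();
  fs.writeFileSync('/tmp/test.txt', Body);
  return 'done';
};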

Rackspace cloud taking too long to upload?

I'm following the Rackspace example for file upload to cloud storage from the docs. It works, but the upload takes too long. Really long! No matter which region I use, a 17 KB file takes more than 3 seconds. Is this the actual behaviour of Rackspace cloud? Are they really that slow?
I'm using Rackspace with Node.js, with the help of a package named pkgcloud.
// taken from pkgcloud docs
var readStream = fs.createReadStream('a-file.txt');
var writeStream = client.upload({
  container: 'a-container',
  remote: 'remote-file-name.txt'
});
writeStream.on('error', function(err) {
  // handle your error case
});
writeStream.on('success', function(file) {
  // success, file will be a File model
});
readStream.pipe(writeStream);
The purpose here is that I do image processing on the backend and then send a CDN URL back to the user, but a user cannot wait that long. A 2 MB file took forever to upload, timed out, and held my server until it crashed because the stream hadn't finished yet.

Files don't get written by fs.writeFile() on Heroku

JSON files aren't written by fs.writeFile on Heroku.
The console shows no errors.
fs.writeFile(`${__dirname}/config.json`, JSON.stringify(config), (err) => {
  if (err) console.log(err);
});
You can't persistently write files to Heroku's filesystem, which is ephemeral. Any changes you make will be lost the next time your dyno restarts, which happens frequently (at least once per day).
Use a client-server database like PostgreSQL (or another hosted data store), or store files on a third-party object storage service like Amazon S3 instead.
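For the config.json case specifically, a minimal sketch of keeping the JSON in S3 instead of the local filesystem might look like this (the bucket name is a placeholder, and it assumes the AWS SDK v2 with credentials supplied via environment variables):

// Hypothetical replacement for fs.writeFile: persist config.json in S3.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function saveConfig(config) {
  await s3.putObject({
    Bucket: 'my-app-config',            // placeholder bucket name
    Key: 'config.json',
    Body: JSON.stringify(config),
    ContentType: 'application/json'
  }).promise();
}

async function loadConfig() {
  const { Body } = await s3.getObject({
    Bucket: 'my-app-config',
    Key: 'config.json'
  }).promise();
  return JSON.parse(Body.toString('utf8'));
}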
Two suggestions:
Try using the writeFileSync function instead of writeFile.
Or wrap the call in a function: add await in front of it, mark the function async, and then call the function. Note that awaiting fs.writeFile only works with the promise-based API (require('fs').promises). For example:
const fs = require('fs').promises;

const myWriteFunction = async (filename) => {
  await fs.writeFile(filename, 'Hello World');
};

Heroku doesn't allow temporary downloads. How can I work around this limitation?

I am trying to download images from MongoDB every time my app starts so they load fast, as if the images were bundled with the app, but Heroku crashes. How can I solve this?
Here is the code I'm trying to use:
dir = "./public/media/"
function getAllImages() {
  Image.find({}, function (err, allImages) {
    if (err) {
      console.log(err);
    } else {
      allImages.forEach(file => {
        fs.writeFile(dir + file.name, file.img.data, function (err) {
          if (err) throw err;
          console.log('Successfully saved!');
        });
      });
    }
  });
}
I currently have 24 images which add up to approximately 10 MB. I will use them as static images in my application. I would like to access them via example.com/media/foo.jpg, etc.
User uploads can't be stored on Heroku's ephemeral filesystem. Any changes made to it will be lost whenever your dyno restarts, which happens frequently (at least once per day). Heroku recommends storing uploaded files on a service like Amazon S3.
You can have your users upload files directly from their browsers to S3 or you could use the AWS SDK to save files from your back-end. A higher-level library like multer-s3 might be helpful too.
It's not usually a good idea to store files in your database, but you can store a pointer to files in your database. For example, you might store https://domain.tld/path/to/some/image.jpg in your database if that's where the file actually lives.
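A rough sketch of the multer-s3 route (the bucket name, route, and field name are placeholders; it assumes express, multer, multer-s3 v2, and the AWS SDK v2 are installed):

// Hypothetical Express upload route that streams files straight to S3 via multer-s3.
const express = require('express');
const multer = require('multer');
const multerS3 = require('multer-s3');
const AWS = require('aws-sdk');

const app = express();
const s3 = new AWS.S3();

const upload = multer({
  storage: multerS3({
    s3: s3,
    bucket: 'my-media-bucket',                      // placeholder bucket
    key: (req, file, cb) => cb(null, 'media/' + file.originalname)
  })
});

// The uploaded file's S3 URL is available as req.file.location.
app.post('/upload', upload.single('image'), (req, res) => {
  res.json({ url: req.file.location });
});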
I just learned that the problem was that the folder I use (./public/media) was empty, so Heroku did not create it (even though it is in the Git repository). Because of this, the fs.writeFile(dir + file.name, file.img.data, function (err) call didn't work. Thanks for the answers.

Upload to S3 bucket

I am new to Elastic Beanstalk and just uploaded a Node.js app.
I'm wondering if it's possible to "link" (like Unix symlinks) a folder to an S3 bucket?
Make "/recordings" point to S3:
var filename = 'recordings/' + match[1] + '.wav';
var file = fs.createWriteStream(filename);
var request = https.get(url, function(response) {
  response.pipe(file);
  file.on('finish', function() {
    file.close();
  }).on('error', function(err) {
    fs.unlink(file);
    console.log('error downloading recording');
  });
});
You can use tools like s3fs-fuse to mount S3 buckets to your filesystem. However, this is generally not recommended as S3 is not designed to be used as a block storage device.
As the s3fs readme documents:
Generally S3 cannot offer the same performance or semantics as a local file system. More specifically:
- random writes or appends to files require rewriting the entire file
- metadata operations such as listing directories have poor performance due to network latency
- eventual consistency can temporarily yield stale data
- no atomic renames of files or directories
- no coordination between multiple clients mounting the same bucket
- no hard links
The best way to use S3 with your Node application is to use the AWS SDK for JavaScript in Node.js.
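For example, instead of writing the .wav into a local recordings/ folder, the HTTPS response stream could be piped straight to S3. A minimal sketch, with the bucket name as a placeholder and the AWS SDK v2 assumed:

// Hypothetical sketch: stream a downloaded recording directly into S3 instead of a local folder.
const AWS = require('aws-sdk');
const https = require('https');
const { PassThrough } = require('stream');

const s3 = new AWS.S3();

function saveRecording(url, name) {
  return new Promise((resolve, reject) => {
    https.get(url, (response) => {
      const pass = new PassThrough();
      response.pipe(pass);
      s3.upload({
        Bucket: 'my-recordings-bucket',   // placeholder bucket
        Key: 'recordings/' + name + '.wav',
        Body: pass
      }, (err, data) => (err ? reject(err) : resolve(data.Location)));
    }).on('error', reject);
  });
}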
