Image optimization in Rails 5 ActiveStorage

I'm planning to upgrade Rails to 5.2 on one of my websites and introduce ActiveStorage; right now I use Paperclip with paperclip_optimizer. One downside is that I will lose the optimizer when replacing Paperclip with ActiveStorage. How can I implement automatic image optimization on user uploads in ActiveStorage?

It's possible by creating a custom Variation. There is a good example here:
https://prograils.com/posts/rails-5-2-active-storage-new-approach-to-file-uploads

If you are on AWS, you can create a Lambda function that listens to an S3 bucket for uploads and runs image optimization on the newly uploaded files.
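For illustration, here is a rough TypeScript sketch of such a function, assuming the AWS SDK for JavaScript v3 and the sharp library are bundled with it. The OPTIMIZED_BUCKET environment variable, the JPEG quality setting, and the idea of writing to a second bucket (so the trigger doesn't fire on its own output) are all assumptions, not part of the answer above.

import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import type { S3Event } from "aws-lambda";
import sharp from "sharp";

const s3 = new S3Client({});
const OPTIMIZED_BUCKET = process.env.OPTIMIZED_BUCKET!; // hypothetical target bucket

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    // Download the newly uploaded original.
    const original = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    const body = Buffer.from(await original.Body!.transformToByteArray());

    // Re-encode at a lower quality; tune the quality/format to taste.
    const optimized = await sharp(body).jpeg({ quality: 80, mozjpeg: true }).toBuffer();

    // Write the optimized copy to a second bucket so this trigger
    // doesn't fire again on its own output.
    await s3.send(new PutObjectCommand({
      Bucket: OPTIMIZED_BUCKET,
      Key: key,
      Body: optimized,
      ContentType: "image/jpeg",
    }));
  }
};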

Active Storage doesn’t have built-in support for optimizing images on upload.

Related

VMSS Custom image pros/cons

I need to install .NET Framework 4.7 on my VMSS. I tried using the script extension, but since I need to reboot the machine after the installation, it was a bit complex.
I decided to go with a custom image. I created a VM, installed the .NET Framework and then captured it to an image. It was a painless process.
My question is, it seems that if my VMSS is using custom image, I cannot update it to use a marketplace image. Are there any other things I lose by using custom images?
George! Long time no see :).
This is a great question, but I don't think it's documented anywhere. I threw together a quick blog post describing the pros and cons: https://negatblog.wordpress.com/2018/06/28/custom-vs-platform-images-for-scale-sets/. Here's the summary:
Platform images tend to allow for greater scale
Some features only support platform images
When using custom images, you control the image lifecycle, so you don't need to worry about the image being removed unexpectedly
Deployment speed can differ between the two (either way can be faster depending on the scenario)
With custom images, you can actually capture data disks along with the OS disk, allowing you to easily initialize each VM in the scale set with data
Hope this helps! :)
-Neil

Best way to save images on Amazon S3 and distribute them using CloudFront

The application I'm working on (nodejs) has user profiles and each profile can have multiple images. I'm using S3 as my main storage and CloudFront to distribute them.
The thing is, sometimes users upload large images, and what I want to do is scale the image when it is downloaded (viewed in an HTML img tag, or on a mobile phone), mainly for performance.
I don't know if I should scale the image BEFORE uploading it to S3 (maybe using lwip https://github.com/EyalAr/lwip), or whether there is a way of scaling the image or getting a lower-quality version when downloading it through CloudFront. I've read that CloudFront can compress files using gzip, but that this isn't recommended for images.
I also don't want to upload both a scaled and an original image to S3, because of the storage cost.
Should this be done on the client, on the server, or in S3? What is the best way of doing it?
is there a way of scaling the image or getting a lower-quality version when downloading it through CloudFront?
There is no feature like this. If you want the image resized, resampled, scaled, compressed, etc., you need to do it before it is saved to its final location in S3.
Note that I say its final location in S3.
One solution is to upload the image to an intermediate location in S3, perhaps in a different bucket, and then resize it with code that modifies the image and stores it in the final S3 location, whence CloudFront will fetch it on behalf of the downloading user.
I've read that CloudFront can compress files using gzip, but that this isn't recommended for images.
Images benefit very little from gzip compression, but the CloudFront documentation also indicates that CloudFront doesn't compress anything that isn't in some way formatted as text; text formats tend to benefit much more from gzip compression.
I also don't want to upload both a scaled and an original image to S3, because of the storage cost.
I believe this is a mistake on your part.
"Compressing" images is not like compressing a zip file. Compressing images is lossy. You cannot reconstruct the original image from the compressed version because image compression as discussed here -- by definition -- is the deliberate discarding information from the image to the point that the size is within the desired range and while the quality is in an acceptable range. Image compression is both a science and an art. If you don't retain the original image, and you later decide that you want to modify your image compression algorithm (either because you later decide the sizes are still too large or because you decide the original algorithm was too aggressive and resulted in unacceptably low quality), you can't run your already-compressed images through the compression algorithm a second time without further loss of quality.
Use S3's STANDARD_IA ("infrequent access") storage class to cut the storage cost of the original images in half, in exchange for more expensive downloads -- because these images will rarely ever be downloaded again, since only you will know their URLs in the bucket where they are stored.
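As a minimal sketch of that suggestion (assuming the AWS SDK for JavaScript v3; the bucket and key names are placeholders), the storage class is just one extra parameter on the upload:

import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { readFile } from "node:fs/promises";

const s3 = new S3Client({});

async function archiveOriginal(localPath: string, key: string): Promise<void> {
  await s3.send(new PutObjectCommand({
    Bucket: "my-originals-bucket",   // hypothetical bucket for the untouched originals
    Key: key,
    Body: await readFile(localPath),
    ContentType: "image/jpeg",
    StorageClass: "STANDARD_IA",     // infrequent access: cheaper storage, pricier retrieval
  }));
}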
Should this be done on the client, on the server, or in S3?
It can't be done "in" S3 because S3 only stores objects. It doesn't manipulate them.
That leaves two options, but doing it on the server has multiple choices.
When you say "server," you're probably thinking of your web server. That's one option, but this process can be potentially resource-intensive, so you need to account for it in your plans for scalability.
There are projects on GitHub, like this one, designed to do this using AWS Lambda, which provides "serverless" code execution on demand. The code runs on a server, but it's not a server you have to configure or maintain, or pay for when it's not active -- Lambda is billed in 100 millisecond increments. That's the second option.
Doing it on the client is of course an option, but seems potentially more problematic and error-prone, not to mention that some solutions would be platform-specific.
There isn't a "best" way to accomplish this task.
If you aren't familiar with EXIF metadata, you need to familiarize yourself with that, as well. In addition to resampling/resizing, you probably also need to strip some of the metadata from user-contributed images, to avoid revealing sensitive data that your users may not realize is attached to their images -- such as the GPS coordinates where the photo was taken. Some sites also watermark their user-submitted images; that would also be something you'd probably do at the same time.
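If you end up using sharp for the resizing, stripping metadata comes more or less for free. A hedged sketch (the size limit and quality value are arbitrary):

import sharp from "sharp";

// sharp drops metadata on output unless .withMetadata() is called, so simply
// re-encoding removes EXIF/GPS data; .rotate() with no argument applies the
// EXIF orientation first so the stripped image doesn't end up sideways.
async function sanitizeImage(input: Buffer, maxWidth = 1200): Promise<Buffer> {
  return sharp(input)
    .rotate()
    .resize({ width: maxWidth, withoutEnlargement: true })
    .jpeg({ quality: 82 })
    .toBuffer();
}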
I would store the images in S3 in STANDARD_IA and then resize them on the fly with Lambda, running Node.js and sharp to do the heavy lifting. Google does something similar, I believe, since you can request your profile image in any dimensions.
The AWS Networking & Content Delivery blog has a post that may give you a lot of what you need. Check it out here.
The basic idea is this:
Upload the image to S3 like normal (you can do STANDARD_IA to save on costs if you want)
Send requests to CloudFront with query parameters that include the size of image you want (e.g. https://static.mydomain.com/images/image.jpg?d=100x100)
Using Lambda@Edge functions, you can build the resized images and store them in S3 as needed before they're served up via the CDN. Once a resized version is created, it's always available in S3.
CloudFront returns the newly resized image, which was just created.
It's a bit more work, but it gets you resizing to whatever size you want/need, as you need it. It also gives you the flexibility to change the image size you serve to the client from the UI at any time. Here are a couple of similar posts (a rough sketch of the resize handler follows the links below); some don't even use CloudFront, but just serve through API Gateway as the intermediary.
https://aws.amazon.com/blogs/compute/resize-images-on-the-fly-with-amazon-s3-aws-lambda-and-amazon-api-gateway/
https://github.com/awslabs/serverless-image-resizing
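For a rough idea of what the origin-response piece of that setup might look like, here is a TypeScript sketch, not the blog's actual code. It assumes a companion viewer-request function has already rewritten /images/image.jpg?d=100x100 into /resized/100x100/images/image.jpg, and the bucket name and key layout are made up.

import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import type { CloudFrontResponseEvent } from "aws-lambda";
import sharp from "sharp";

const s3 = new S3Client({ region: "us-east-1" });  // Lambda@Edge functions are deployed from us-east-1
const BUCKET = "my-image-bucket";                  // hypothetical origin bucket

export const handler = async (event: CloudFrontResponseEvent) => {
  const { request, response } = event.Records[0].cf;

  // A 403/404 from S3 for a /resized/WxH/... URI means the resized copy
  // doesn't exist yet; anything else passes through untouched.
  const match = /^\/resized\/(\d+)x(\d+)\/(.+)$/.exec(request.uri);
  if ((response.status !== "403" && response.status !== "404") || !match) {
    return response;
  }

  const [, width, height, originalKey] = match;
  const resizedKey = request.uri.replace(/^\//, "");

  // Fetch the original, resize it, and persist the resized copy for next time.
  const original = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key: originalKey }));
  const resized = await sharp(Buffer.from(await original.Body!.transformToByteArray()))
    .resize(Number(width), Number(height))
    .jpeg()
    .toBuffer();
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: resizedKey,
    Body: resized,
    ContentType: "image/jpeg",
  }));

  // Return the freshly resized image to the viewer.
  return {
    status: "200",
    statusDescription: "OK",
    headers: { "content-type": [{ key: "Content-Type", value: "image/jpeg" }] },
    bodyEncoding: "base64",
    body: resized.toString("base64"),
  };
};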

Where to store configuration of an elastic beanstalk application?

I created a small Node.js application which runs on AWS Elastic Beanstalk. At the moment the application configuration is stored in a JSON file. I want to create a frontend to manipulate some parts of this configuration, and I read about the MEAN stack, but Amazon has no MongoDB support. So what is the best practice in AWS Elastic Beanstalk for handling configuration for an application? Storing it in an S3 bucket is very easy, but I think the performance is not very good.
Best regards
How much configuration data are you talking about? If it is a typical small amount, and it only changes once in a while, but you need it available each time the application restarts, S3 is probably the easiest and cheapest option. Spinning up a MongoDB instance just to store a small amount of mostly read-only data is probably overkill. What makes you think the performance is not very good?
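As a sketch of how simple the S3 approach can be (assuming the AWS SDK for JavaScript v3; the bucket, key, and refresh interval are placeholders):

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});
let config: Record<string, unknown> = {};

async function loadConfig(): Promise<void> {
  const res = await s3.send(new GetObjectCommand({
    Bucket: "my-app-config",   // hypothetical config bucket
    Key: "config.json",
  }));
  config = JSON.parse(await res.Body!.transformToString());
}

// Load once at startup, then refresh every five minutes so edits made through
// an admin frontend are picked up without a redeploy.
loadConfig().catch(console.error);
setInterval(() => loadConfig().catch(console.error), 5 * 60 * 1000);

export const getConfig = () => config;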
AWS usually recommends DynamoDB for such cases, but then you are getting vendor lock-in. The choice of configuration storage also depends on how quickly new changes need to be applied to the instances.
A good option is to use MySQL as the configuration database: you avoid vendor lock-in, you can deliver configuration changes as soon as they are applied, and the application can use MySQL's memcached interface.

Should I store an image in MongoDB or in local File System (by Node.js)

I use Node.js for my project.
Should I store an image in local file system, or should I store it in MongoDB?
Which way is more scalable?
The most scalable solution is to use a shared storage service such as Amazon's S3 (or craft your own).
This allows you to scale horizontally a lot more easily when you decide to add machines to your application layer, as you won't have to worry about any migration nightmares.
The basic idea behind this is to keep the storage layer decoupled from the application layer. So using this idea you could create a node.js process on a separate machine that accepts file uploads then writes them to disk.
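A minimal sketch of such a dedicated upload process, assuming Express and multer (both are my choice of libraries here, not something the answer prescribes):

import express from "express";
import multer from "multer";

// multer writes each upload to local disk under the given directory.
const upload = multer({ dest: "/var/data/uploads" });  // hypothetical storage path
const app = express();

app.post("/images", upload.single("image"), (req, res) => {
  if (!req.file) {
    res.status(400).json({ error: "no file uploaded" });
    return;
  }
  // Hand back the generated filename so the application layer can store the reference.
  res.json({ id: req.file.filename });
});

app.listen(3000, () => console.log("upload service listening on :3000"));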
If you're designing a performance-sensitive system, use the file system to store your images, no doubt.
You can find a performance comparison in this blog post:
http://blog.thisisfeifan.com/2013/12/mongodb-gridfs-performance-test.html
Actually, you can find the recommended MongoDB GridFS use cases here:
https://docs.mongodb.com/manual/core/gridfs/#when-to-use-gridfs
I would use GridFS to take advantage of sharding, but for the best performance I would use the filesystem with nginx.
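For reference, a hedged sketch of the GridFS route with the official mongodb driver (the connection string, database, file names, and bucket name are placeholders):

import { MongoClient, GridFSBucket } from "mongodb";
import { createReadStream } from "node:fs";

async function main(): Promise<void> {
  const client = new MongoClient("mongodb://localhost:27017");  // hypothetical connection string
  await client.connect();
  const bucket = new GridFSBucket(client.db("app"), { bucketName: "images" });

  // Upload: pipe a local file into GridFS; the stream's id becomes the file's _id.
  const uploadStream = bucket.openUploadStream("avatar.jpg", { contentType: "image/jpeg" });
  createReadStream("./avatar.jpg").pipe(uploadStream);
  uploadStream.on("finish", () => console.log("stored with id", uploadStream.id));

  // Download: later, stream the stored file straight into an HTTP response, e.g.
  //   bucket.openDownloadStream(fileId).pipe(res);
}

main().catch(console.error);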

Automatically sync two Amazon S3 buckets, besides s3cmd?

Is there another automated way of syncing two Amazon S3 buckets besides using s3cmd? Maybe Amazon has this as an option? The environment is Linux, and every day I would like to sync new & deleted files to another bucket. I hate the thought of keeping all my eggs in one basket.
You could use the standard AWS CLI to do the sync.
You just have to run something like the following (add the --delete flag if you also want files removed from the source to be deleted from the destination):
aws s3 sync s3://bucket1/folder1 s3://bucket2/folder2
http://aws.amazon.com/cli/
S3 buckets != baskets
From their site:
Data Durability and Reliability
Amazon S3 provides a highly durable storage infrastructure designed for mission-critical and primary data storage. Objects are redundantly stored on multiple devices across multiple facilities in an Amazon S3 Region. To help ensure durability, Amazon S3 PUT and COPY operations synchronously store your data across multiple facilities before returning SUCCESS. Once stored, Amazon S3 maintains the durability of your objects by quickly detecting and repairing any lost redundancy. Amazon S3 also regularly verifies the integrity of data stored using checksums. If corruption is detected, it is repaired using redundant data. In addition, Amazon S3 calculates checksums on all network traffic to detect corruption of data packets when storing or retrieving data.
Amazon S3’s standard storage is:
Backed with the Amazon S3 Service Level Agreement.
Designed to provide 99.999999999% durability and 99.99% availability of objects over a given year.
Designed to sustain the concurrent loss of data in two facilities.
Amazon S3 provides further protection via Versioning. You can use Versioning to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket. This allows you to easily recover from both unintended user actions and application failures. By default, requests will retrieve the most recently written version. Older versions of an object can be retrieved by specifying a version in the request. Storage rates apply for every version stored.
That's very reliable.
I'm looking for something similar and there are a few options:
Commercial applications like s3RSync. Additionally, CloudBerry for S3 provides PowerShell extensions for Windows that you can use for scripting, but I know you're using *nix.
AWS API + (fav language) + cron (hear me out). It would take a decently savvy person with no experience of AWS's libraries only a short time to build something to copy and compare files (using the ETag feature of the S3 keys). Just provide a source/target bucket and creds, then iterate through the keys and issue the native "Copy" command in AWS. I used Java. If you use Python and cron, you could make short work of a useful tool.
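A rough TypeScript sketch of that option with the AWS SDK v3 (bucket names are placeholders; deletions and multipart-upload ETag quirks are ignored):

import { S3Client, ListObjectsV2Command, HeadObjectCommand, CopyObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

async function syncBuckets(source: string, dest: string): Promise<void> {
  let token: string | undefined;
  do {
    const page = await s3.send(new ListObjectsV2Command({ Bucket: source, ContinuationToken: token }));
    for (const obj of page.Contents ?? []) {
      if (!obj.Key) continue;

      // Copy only when the destination is missing the key or its ETag differs.
      const destEtag = await s3
        .send(new HeadObjectCommand({ Bucket: dest, Key: obj.Key }))
        .then(r => r.ETag)
        .catch(() => undefined);
      if (destEtag !== obj.ETag) {
        await s3.send(new CopyObjectCommand({
          Bucket: dest,
          Key: obj.Key,
          CopySource: `${source}/${obj.Key}`,  // keys with special characters may need URL-encoding
        }));
      }
    }
    token = page.NextContinuationToken;
  } while (token);
}

// Run it from cron, e.g. once a day: node sync.js
syncBuckets("bucket1", "bucket2").catch(console.error);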
I'm still looking for something already built that's open source or free. But #2 is really not a terribly difficult task.
EDIT: I came back to this post and realized that nowadays Attunity CloudBeam is also a commercial solution used by many folks.
It is now possible to replicate objects between buckets in two different regions via the AWS Console.
The official announcement on the AWS blog explains the feature.
