Storing assets within the AWS ecosystem - node.js

I am a newbie to AWS development (but have extensive experience with traditional development).
I need to build a web app with a ReactJS frontend, a Node.js/Express backend, and MySQL. It's a SaaS app, possibly with thousands of clients. There will be a use case where a Parent client has hundreds of Child clients.
So, a parent-child relationship among the clients themselves. A child's settings supersede the parent's. Each client (child or parent) will have its own logo and style. A child may or may not override logos and styles; if it doesn't, it inherits them from its parent client, and so on.
I can handle logos/styles/settings at onboarding time using some configuration tool, i.e. I will upload/change the logos/styles/settings for parent and/or child clients during each client's implementation. I also need the ability to change these logos/styles/settings later, whenever clients demand it.
What are my options for designing the app (again, I am new to AWS)?
Storage-wise, what's the best place to store logos/styles/settings? If AWS S3, will it give me a folder layout that can handle parent-child, or should I dump all images/styles (CSS) in a single folder with the client's prefix on each item?
The other option is pulling images/styles/settings at runtime when the site renders. That means I would have to determine the parent-child relationship on every click in the web app and work out where to grab the resources from. That adds a little overhead at runtime, since I am pushing the parent-child logic to runtime instead of doing it once at configuration time.
Any thoughts/alternative designs/suggestions/pros & cons with respect to the AWS environment?

Assets are definitely best placed in Amazon S3; each asset is referred to as an object within Amazon S3. You give the object a key such as client/main.css. By doing this you can separate each client into its own prefix (which looks like a subfolder within the GUI).
With settings it depends on how sensitive they are. If they are simply for your frontend, you could store a JSON file in S3 within the same prefix as your assets. Otherwise, if the settings need some security, you can use DynamoDB, which promises consistent single-digit-millisecond latency.
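As an illustration only (not code from this answer), here is a minimal sketch of a per-client prefix layout and a child-to-parent settings fallback using the AWS SDK for JavaScript v3; the bucket name, key layout, and region are assumptions:

```js
// Hypothetical key layout:
//   clients/<clientId>/logo.png
//   clients/<clientId>/styles.css
//   clients/<clientId>/settings.json
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");

const s3 = new S3Client({ region: "us-east-1" }); // region is an assumption

// Try the child's settings first; fall back to the parent's if the child
// has not overridden them (S3 returns NoSuchKey for missing objects).
async function loadSettings(clientId, parentId) {
  const bucket = "my-saas-assets"; // hypothetical bucket name
  for (const id of [clientId, parentId]) {
    if (!id) continue;
    try {
      const res = await s3.send(
        new GetObjectCommand({ Bucket: bucket, Key: `clients/${id}/settings.json` })
      );
      return JSON.parse(await res.Body.transformToString());
    } catch (err) {
      if (err.name !== "NoSuchKey") throw err; // only swallow "not found"
    }
  }
  return {}; // neither child nor parent has settings
}
```

Resolving the fallback once at configuration time (and caching the merged result) avoids paying this lookup on every request, which addresses the runtime-overhead concern in the question.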

As Chris Williams has already mentioned, use S3 as your raw data store for images, JS, CSS, HTML, and other assets. Additionally, you can set up a CloudFront distribution in front of these assets to serve them quickly to your customers. CloudFront has edge locations as well, so your website will be performant globally.
There are a lot of great resources on S3 + CloudFront for serving website content available online.
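For what it's worth, here is a rough sketch (not from this answer) of what that S3-plus-CloudFront setup could look like using the AWS CDK v2 in JavaScript; the construct and bucket names are made up:

```js
// Minimal CDK v2 stack: an S3 bucket for assets with a CloudFront distribution in front of it.
const { Stack } = require("aws-cdk-lib");
const s3 = require("aws-cdk-lib/aws-s3");
const cloudfront = require("aws-cdk-lib/aws-cloudfront");
const origins = require("aws-cdk-lib/aws-cloudfront-origins");

class AssetsStack extends Stack {
  constructor(scope, id, props) {
    super(scope, id, props);
    const bucket = new s3.Bucket(this, "ClientAssets"); // hypothetical bucket
    new cloudfront.Distribution(this, "ClientAssetsCdn", {
      defaultBehavior: { origin: new origins.S3Origin(bucket) },
    });
  }
}

module.exports = { AssetsStack };
```

The same thing can of course be set up by hand in the AWS console; the code is only meant to show the shape of the setup.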

Related

Serve images and static content from nodejs

I'm building an app in Node.js on Ubuntu 20.
Mainly, the app has to handle a series of cars that each user submits to the server through a form. Through this form the user specifies the name of the car, the model, an image of the car, and other information...
I manage the submitted images using GridFS and store them in MongoDB with all the other data.
Each time a user loads the site, a 30-row table with the uploaded cars (like the one above) is displayed, so on each route request the server must handle:
reading the cars from the DB
rendering the HTML and its minified CSS/JS
35-40 requests for rendering the images of the cars in the table and other images
I'm thinking that these 35-40 requests for reading/rendering the images from MongoDB could be handled by a secondary Node.js server instead of the main one that manages the app.
I need to make the main app lighter to allow more users on the site, and to do so my idea was to have two Node applications:
The main app, which serves the HTML pages and reads/serves all the main information, like the car names etc.
The images app, which handles all the requests to upload/render the images
But my concern is: does this solution make sense, or will having two Node applications instead of one only make the server busier?
Node really is designed for this type of architecture, and breaking your services up into microservices is actually a good idea when it makes sense (some will argue this point, but personally I feel the separation of concerns works well when you can scale horizontally for increased load).
Just a few comments (take them with a grain of salt), but:
I would not "store" the image in the DB (I take it that you are storing a base64 object for the image?)
Use an object store like S3 to store your images
Use a CDN for static resources
If you architect your stack this way and offload specific requests to other services, you can greatly reduce load AND increase scalability by not tethering the Node server to one file system, etc.
As an example, you could have a stack like:
A front-end server that uses, say, EJS to render things to the client
A processing server to handle image uploads
An object store to store/serve images (and static resources too: JS, CSS, etc.)
A DB server (Postgres, Mongo, etc.)
A CDN to cache static resources
Redis to cache DB queries / API responses (if needed)
The front-end server should be able to run stateless, meaning it has no dependencies on the local server (e.g. images, CSS, etc.).
The processing server just handles processing images, sending them to the object store, and updating the DB.
The object store houses all images.
The DB server stores data.
The CDN caches all requests to further reduce load on your servers.
The Redis cache caches API calls; again, this is as needed and really depends on how you have things configured to pull your data, so it may or may not be needed.
The idea here is that you are creating an application that can scale out horizontally and is now a great candidate for clustering, containers, etc. Because each server has no local dependencies, it can scale as needed to handle more load.
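To make the processing-server piece concrete, here is a minimal sketch assuming Express, multer, and the AWS SDK v3; the bucket name, the route, and the saveCarImageUrl() DB helper are hypothetical:

```js
const express = require("express");
const multer = require("multer");
const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");

const app = express();
const upload = multer({ storage: multer.memoryStorage() }); // keep uploads in memory, not on disk
const s3 = new S3Client({ region: "us-east-1" }); // region is an assumption
const BUCKET = "car-images"; // hypothetical bucket

// Receive an image, push it to the object store, and record its URL in the DB.
app.post("/cars/:id/image", upload.single("image"), async (req, res) => {
  const key = `cars/${req.params.id}/${Date.now()}-${req.file.originalname}`;
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: key,
    Body: req.file.buffer,
    ContentType: req.file.mimetype,
  }));
  const url = `https://${BUCKET}.s3.amazonaws.com/${key}`;
  await saveCarImageUrl(req.params.id, url); // hypothetical DB helper
  res.json({ url });
});

app.listen(4000); // runs separately from the main front-end server
```

The front-end server then only stores and renders the returned URL (ideally behind the CDN), so it stays stateless.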

Upload to AWS S3 Best Practice

I'm creating a Node backend application, and I have an entity which can have files assigned to it.
I have the following options:
Make a request and upload the files as soon as the user selects them in the frontend form, then assign them to the entity when the user makes the request to create/update it
Upload the files in the same request that creates/updates the entity
I was wondering if there is a best practice for this scenario. I can't really decide what's better.
This is one of those "depends" answers, and it depends on how you are doing uploads and whether you plan to clean up your S3 buckets.
I'd suggest creating the entity first (option #2), because then you can record which S3 files belong to that entity. If you tried option #1, you might end up with untracked files (or need some kind of staging area), which could require cleanup at some point in the future. (If your files are small, it may never matter, and you just eat that $0.03/GB fee each month : )
I've been seeing some websites that look like option #1, where files are included in my form/document as I'm "editing". Pasting an image from my clipboard is particularly sweet, and sometimes I see a text placeholder while it is uploading, with the picture shown when complete. I think these "documents" are actually saved on their servers in some kind of draft status, so it might be your option #2 anyway. You could do the same: create a draft entity and finalize it later (and then have a way to clean out drafts and their attachments at some point).
Also, depending on the bucket privacy you need to achieve, have a look at AWS Cognito to upload directly from the browser. You can save server bandwidth and reduce request time by not using your server as a pass-through.
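Another common way to upload directly from the browser without passing the bytes through your server is an S3 pre-signed URL. This sketch shows that alternative (not the Cognito approach); it assumes the AWS SDK v3, and the bucket and route names are made up:

```js
const express = require("express");
const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");
const { getSignedUrl } = require("@aws-sdk/s3-request-presigner");

const app = express();
const s3 = new S3Client({ region: "us-east-1" }); // region is an assumption

// The backend only hands out a short-lived URL; the browser PUTs the file to S3 itself.
app.get("/entities/:id/upload-url", async (req, res) => {
  const key = `entities/${req.params.id}/${Date.now()}`; // hypothetical key scheme
  const url = await getSignedUrl(
    s3,
    new PutObjectCommand({ Bucket: "my-uploads", Key: key }), // hypothetical bucket
    { expiresIn: 300 } // URL valid for 5 minutes
  );
  res.json({ url, key });
});

app.listen(3000);
```

The browser then PUTs the file to the returned URL, and you save the returned key on the entity when it is created or updated.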

For a web app that allows simple image uploads, how should I store the images? Confused about file system vs. cdn

Every search result says something about storing the images in the file system but storing the paths in the database, but I'm not sure exactly what "file system" means. Would that mean you have something like:
/public (assets)
/js
/css
/img
/app (frontend)
/server (backend)
and you'd upload directly to that /public/img directory?
I remember trying something like that in the past with a Node.js app hosted on Heroku, and it wouldn't let me. I had to set up Amazon S3 and upload the images THERE, which leads to my confusion.
Is using something like Amazon S3 the usual practice or do people upload directly to the /img directory (assuming this is the "file system"?) and it just happened to be the case that Heroku doesn't allow this but other hosts do?
I'd characterize the pattern as "store the data in a blob storage service, store a pointer in your database". The uploaded file is the "blob" - once it has left the user's computer and filesystem, is it really a file anymore? :) On the server, a file system can store that "blob". S3 can store that blob. In the first case, you are storing a path. In the second case, you are storing the URL to the S3 object. A database could even store that blob (not at all recommended, though...)
In any case, the question to ask is: "what happens when I need two app servers to support my traffic?". Wherever that blob goes, both app servers need access to it.
In a data center under your control, there are many ways to share a filesystem across servers - network attached storage (NFS- or SMB-mounted volumes), or storage area networks (iSCSI, Fibre Channel). With more limited network/hardware configuration options in cloud-based Infrastructure/Platform-as-a-Service providers, the de facto standard is S3 because it is inexpensive, reliable, easy to use, and can completely offload serving the file from your servers.
For Heroku, though, you don't have much control over the file system. And, know that the file system for each of your dynos is "ephemeral" - it goes away when the dyno restarts. Which will happen when your app goes idle, or every 24 hours, whichever comes first. So that forces the choice a little.
Final point - S3 comes with the ancillary benefit of taking the burden of serving the blob off of your servers. You can also store files directly to S3 from the browser, without routing them through your app (see https://devcenter.heroku.com/articles/s3-upload-node). The benefit in both cases is that you offload downloads/uploads that would otherwise take up a lot of your application's precious time on pretty rote work.
Uploading directly to a host file system is generally not a best practice. This is one reason services like S3 are so popular.
If you're using the host file system and ever need more than one instance of a server, the file systems will grow out of sync. Imagine one user uploads 'foo.jpg' to server A (A/app/uploads) and another uploads 'bar.jpg' to server B (B/app/uploads). When either of these images is later requested, the request has a 50% chance of failing, depending on whether the load balancer routes the request to server A or server B.
There are several ancillary benefits to avoiding the host filesystem. For instance, you can set the filesystem serving your app to read-only for increased security. Files are a form of state, and stateless web servers allow you to do things like blow away one instance and deploy another instance to take over its work.
You might find this of help:
https://codeforgeek.com/2014/11/file-uploads-using-node-js/
I used multer in my Node.js server file to handle uploading from the front end. Basically I had an HTML form that would submit the image to the server, where it would be handled by multer. This led to it being saved in the file system (to answer your question concretely: yes, to something like the /img directory right in your project's file structure). My application is running on Heroku, and this feature works there as well. However, I would not recommend using the file system to store your images like this (I doubt you will have enough space for a large number of images/files); using AWS storage or a DB would be better.
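If you go the S3 route instead of the local /img directory, one option is to point multer at S3 via the multer-s3 storage engine. A minimal sketch under those assumptions (the bucket name and form field are hypothetical):

```js
const express = require("express");
const multer = require("multer");
const multerS3 = require("multer-s3");
const { S3Client } = require("@aws-sdk/client-s3");

const app = express();
const s3 = new S3Client({ region: "us-east-1" }); // region is an assumption

// Files never touch the local file system; multer streams them straight to S3.
const upload = multer({
  storage: multerS3({
    s3,
    bucket: "my-image-bucket", // hypothetical bucket
    key: (req, file, cb) => cb(null, `img/${Date.now()}-${file.originalname}`),
  }),
});

app.post("/upload", upload.single("image"), (req, res) => {
  // multer-s3 puts the object's URL on req.file.location; store that in your DB.
  res.json({ location: req.file.location });
});

app.listen(3000);
```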

Serve custom javascript to browser via Node.js app

I developed a small Node.js app in which I can configure conditions for a custom JavaScript file, which can be embedded in a webpage and which modifies the DOM of that page in the browser on load. The configuration values are stored in MongoDB. (For the sake of argument: add class "A" to the DOM element with ID "B".)
I'm having difficulty figuring out the best way to serve the requests / the JavaScript file.
Option 1, my current implementation:
I save a configuration in the Node app, and a distinct JavaScript file is created for that configuration.
The page references that file, which is hosted and served by the server.
Option 2, where I think I want and should go:
I save a configuration (MongoDB); NO JavaScript file is created.
Pages reference a generic JavaScript link (for instance: api.service.com/javascript.js).
The Node.js/Express app processes the request and returns custom JavaScript (a file?) with the correct values as saved in MongoDB for that configuration.
Now, while I believe this is the right way to go about it, I am unsure HOW to go about it. Any ideas and advice are very welcome!
PS: For instance, I wonder how best to authenticate or identify the origin, user, and requested configuration. Should I do it like api.service.com/javascript.js&id="userID"? Is that good practice?
Why not serve up a generic JavaScript file which can take a customized JSON object (directly from MongoDB) and apply the necessary actions? You can include the JSON data on the page if you really need to have everything embedded, but separating configuration and code is the most maintainable approach.
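A minimal sketch of that suggestion, assuming Express and the MongoDB Node driver; the /config route, the data-config-id attribute on the script tag, the collection name, and the rules shape are all assumptions:

```js
const express = require("express");
const { MongoClient } = require("mongodb");

const app = express();
const mongo = new MongoClient("mongodb://localhost:27017"); // connection string is an assumption
mongo.connect();
const configs = mongo.db("service").collection("configs"); // hypothetical DB/collection

// One generic, completely static script; every customer's page embeds the same file,
// e.g. <script src="https://api.service.com/javascript.js" data-config-id="abc123"></script>
app.get("/javascript.js", (req, res) => {
  res.type("application/javascript").send(`
    fetch("https://api.service.com/config?id=" + document.currentScript.dataset.configId)
      .then(r => r.json())
      .then(cfg => {
        // Apply the configured actions, e.g. add class "A" to the element with ID "B".
        cfg.rules.forEach(rule => {
          const el = document.getElementById(rule.elementId);
          if (el) el.classList.add(rule.className);
        });
      });
  `);
});

// The per-customer configuration is plain JSON, looked up by id.
app.get("/config", async (req, res) => {
  const cfg = await configs.findOne({ _id: req.query.id }); // hypothetical lookup key
  res.json(cfg || { rules: [] });
});

app.listen(3000);
```

If the script is embedded on other domains, the /config endpoint would also need CORS headers, and you would still want some check on the id, as the question's PS notes.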

How do I run server-side code from couchdb?

CouchDB is great at storing and serving data, but I'm having some trouble getting to grips with how to do back-end processing with it. GWT, for example, has out-of-the-box support for synchronous and asynchronous callbacks, which allow you to run arbitrary Java code on the server. Is there any way to do something like this with CouchDB?
For example, I'd like to generate and serve a PDF file when the user clicks a button in a web app. Ideally the workflow would look something like this:
User enters some data
User clicks a generate button
A call is made to the server, and the PDF is generated server side. The server code can be written in any language, but preferably Java.
When PDF generation is finished, the user is prompted to download and save the document.
Is there a way to do this with out-of-the-box CouchDB, or is some additional third-party software required to communicate between the web client and the backend data-processing code?
EDIT: Looks like I did a pretty poor job of explaining my question. What I'm interested in is essentially serving servlets from CouchDB, similarly to the way you can serve Java servlets alongside web pages from a WAR file. I used GWT as an example because it has support for developing the servlets and client-side code together and compiling everything into a single WAR file. I'd be very interested in something like this because it would make deploying fully functional websites a breeze through CouchDB replication.
By the looks of it, however, the answer to my question is no, you can't serve servlets from CouchDB. The database is set up for CRUD-style interactions, and any servlet-style components need to either be served separately, or be implemented by polling the DB for changes and acting accordingly.
Here's what I would propose as the general workflow:
When user clicks Generate: serialize the data they've entered and any other relevant metadata (e.g. priority, username) and POST it to couchdb as a new document. Keep track of the _id of the document.
Code up a background process that monitors couchdb for documents that need processing.
When it sees such a document, have it generate the PDF and attach it to that same couch doc.
Now back to the client side. You could use Ajax polling to repeatedly GET the couch doc and test whether it has an attachment or not. If it does, then you can show the user the download link.
Of course the devil is in the details...
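For example, the client-side polling could look like this minimal sketch (the database name, attachment name, and polling interval are assumptions; it also assumes the browser can reach CouchDB, e.g. through a proxy path):

```js
// Poll the CouchDB document until the PDF attachment shows up, then return a download link.
// "mydb" and the attachment name "report.pdf" are assumptions for illustration.
async function waitForPdf(docId) {
  while (true) {
    const doc = await fetch(`/couchdb/mydb/${docId}`).then(r => r.json());
    if (doc._attachments && doc._attachments["report.pdf"]) {
      return `/couchdb/mydb/${docId}/report.pdf`; // attachment URL to offer to the user
    }
    await new Promise(resolve => setTimeout(resolve, 2000)); // wait 2s before polling again
  }
}
```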
Two ways your background process(es) can identify pending documents:
Use the _changes API to monitor for new documents with _rev beginning with "1-"
Make requests on a couchdb view that only returns docs that do not have an "_attachments" property. When there are no documents to process it will return nothing.
Optionally: if you have multiple PDF-making processes working on the queue in parallel, you will want to update the couch doc with a property like {"being-processed": true} and filter these out of the view as well.
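Here is a rough sketch of such a background worker, using the _changes feed with longpolling; it assumes Node 18+ for global fetch, and generatePdf() is a hypothetical helper that returns the PDF as a Buffer:

```js
// Background worker: watch CouchDB's _changes feed, generate a PDF for each new
// doc, and attach it back to the same document.
// COUCH, DB, and generatePdf() are assumptions for illustration.
const COUCH = "http://localhost:5984";
const DB = "jobs";

async function run(since = 0) {
  while (true) {
    const res = await fetch(
      `${COUCH}/${DB}/_changes?feed=longpoll&include_docs=true&since=${since}`
    ).then(r => r.json());
    for (const change of res.results) {
      const doc = change.doc;
      if (!doc || doc._deleted || doc._attachments) continue; // skip already-processed docs
      const pdf = await generatePdf(doc); // hypothetical PDF generation
      // PUT the PDF as an attachment on the same document (requires the current _rev).
      await fetch(`${COUCH}/${DB}/${doc._id}/report.pdf?rev=${doc._rev}`, {
        method: "PUT",
        headers: { "Content-Type": "application/pdf" },
        body: pdf,
      });
    }
    since = res.last_seq; // resume from where we left off
  }
}

run();
```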
Some other thoughts:
I do not recommend using the couchdb externals API for this use case because it (basically) means couchdb and your PDF-generating code must be on the same machine. But it's something to be aware of.
I don't know a thing about GWT, but it doesn't seem necessary to accomplish your goals. Certainly CouchDB can serve any static files (js or other) you want either as attachments to docs in a db or from the filesystem. You could even eval() JSON properties you put into couch docs. So you can use GWT to make ajax calls or whatever but GWT can be completely decoupled from couchdb. Might be simpler that way.
GWT has two parts to it. One is the client side, which you write in Java and the GWT compiler translates to JavaScript; the other is a servlet, if you do any RPC. Typically you run your client code in a browser, and when you make any RPC calls you contact a Java servlet engine (such as Tomcat or Jetty), which in turn calls your persistence layer.
GWT does have the ability to do JSON requests over HTTP and coincidentally, this is what CouchDB uses. So in theory it should be possible. (I do not know if anybody has tried it). There would be a couple of issues.
CouchDB would need to serve up the .js files that have the compiled GWT client code.
The main issue I see in your case is that CouchDB would need to generate your PDF files, while CouchDB is just a storage engine and does not typically do any processing. I guess you could extend it if you are any good with the Erlang programming language.
