How do I ensure static web site pages are fresh?

I have a static web site hosted on Amazon S3. I regularly update it. However, I am finding that many of the users accessing it are looking at a stale copy.
By the way, the site is http://cosi165a-f2016.s3-website-us-west-2.amazonaws.com and it's generated by a Ruby static site generator called nanoc (very nice, by the way). It compiles the source material for the site (https://github.com/Coursegen/cosi165a-f2016) into the HTML, CSS, JS and other files.
I assume that this has to do with the page freshness, and the fact that the browser is caching pages.
How do I ensure that my users see a fresh page?

One common technique is to keep track of the last timestamp when you updated static assets to S3, then use that timestamp as a querystring parameter in your html.
Like this:
<script src="//assets.com/app.min.js?1474399850"></script>
The browser will still cache that result, but if the timestamp changes, the browser will have to get a new copy.
The technique is called "cachebusting".
There's a grunt module if you use grunt: https://www.npmjs.com/package/grunt-cachebuster. It will calculate the hash of your asset's contents and use that as the filename.
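As a rough illustration of the two variants mentioned above (a version in the query string, or a content hash in the filename), here is a minimal Node.js sketch. The function names and the 10-character hash length are assumptions made for this example; this is not how grunt-cachebuster itself is implemented.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');

// Short content hash: it changes only when the file contents change.
function contentHash(contents) {
  return crypto.createHash('sha256').update(contents).digest('hex').slice(0, 10);
}

// Variant 1: keep the filename and append the hash as a query string.
function bustWithQuery(url, contents) {
  return `${url}?v=${contentHash(contents)}`;
}

// Variant 2: embed the hash in the filename (app.min.js -> app.min.<hash>.js).
function bustWithFilename(filename, contents) {
  const ext = path.extname(filename);
  const base = filename.slice(0, filename.length - ext.length);
  return `${base}.${contentHash(contents)}${ext}`;
}
```

A content hash has one advantage over a timestamp: an asset that did not change between deployments keeps its URL, so browsers can keep using their cached copy.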

Related

How do you manage repositories for production/deployment of Node-React app?

Not long ago, we used to have server-rendered pages, and then React came along for client-side rendering and single-page applications. It introduced the virtual DOM and changed the way we write our code.
We require all these React libraries and install them as dependencies before writing our code. Now we can break the app into many components and have many CSS and SCSS files, including images. But at the end we build the files, make a compact bundle and serve it from the build folder.
Express GET route:
// Serve the built index.html for every path (requires the path module)
app.get('*', (req, res) => {
  res.sendFile(path.join(__dirname, 'client', 'build', 'index.html'));
});
Here's what I have understood:
The build folder is the place where webpack combines all the files and creates a minified bundle ready for deployment. That bundle is basically simple HTML and JS files which every browser can understand. As not all browsers understand ES6 and newer features, we have to convert all these files into plain JavaScript that every browser can understand.
Also, webpack-dev-server is only for development purposes and we won't be running it in production.
Is the virtual DOM/real DOM just for development purposes? Or
are those React libraries also transpiled while building the minified files? If the latter is the case, does React run in a background mode in the client's browser? I want to know how React takes care of client-side routing after the app is built.
How do you manage GitHub repositories for a Node-React app? Do you keep two different repositories, one for the front end and the other for the back end? What's the industry standard?
If you keep two repositories, how do you deploy the front-end code? You can't run webpack-dev-server in production, nor can you specify the public static directory (the build folder) in your back end (Express server), as they are separated into two repos. How does either the integration of these two repositories take place (let's say we have two AWS EC2 instances, one for each), or the front end get served from the front-end repo? Can you actually use something like npm serve in production?
What am I trying to do?
I want to deploy my Node-React app on AWS. I have only one repository on GitHub. I have one folder, "client", inside my repo where all the React code sits with its package.json file. All the other files for the server are inside the root folder (the server doesn't have its own folder and its files are scattered inside the root folder). So there are two package.json files, one inside the root folder for the server and one inside the client folder. I am planning to run my Node app in a Docker container.
Please help me understand the core concepts and standard practices for code hosting and deployment, keeping a large-scale enterprise application in mind.
I will not go into explaining all the points in your question here because @Arnav Yagnik and @PrivateOmega have both done a brilliant job of explaining most of them. I would definitely recommend that you read their answers properly, and the links provided for more information, before reading this answer.
I would like to address your question of deploying a Node-React application. In production, generally, we have different deployments (or "repositories" as you mention in your question) for the front end (React) and the back end (Node). This allows your back end to sit in an EC2 instance, for example, with auto-scaling to make sure that it can cope with all the requests coming in.
As mentioned in the previous answers, and in your question as well, webpack compiles and minifies the React files into simple HTML and JS files, which most browsers can run (I'm not going to explain the virtual DOM here because it has already been perfectly explained in other answers). You would then take these minified files and serve them from an S3 bucket, for example, because again, it is a single-page application (also discussed in the other answers): the business logic is already in the minified JS files, and the application simply sends all its requests to your back-end server.
Now for front-end, you can use TravisCI for example to deploy the build folder (the one you talk about in your question) to an EC2 instance and serve your files using NGINX or if you can configure a CDN deployment properly, you can serve the files from an S3 bucket for the most optimal performance.
You can think of serving the React application as sending a cryptic block of code to your user's browser. You can deploy this cryptic block of code to a publicly available S3 bucket and serve it from there. Because of webpack and minification/uglification, no one would be able to make much sense of what your original code was, but remember that all of the shipped code can still be accessed, in Chrome's Sources tab for example.
I would like to address this with different approach.
Server-rendered pages: The concept has not changed. When the server receives a document request, it has to respond with HTML. That HTML may or may not contain scripts (either inline or referenced from an external address). In the context of the question, you can still ship HTML that downloads the scripts you have written (which may or may not include React). In most cases you can ship nearly empty HTML with script tags, which download the scripts over the network and execute them; those scripts contain all the rendering logic.
To Answer your questions :
1st: There is no background mode in single-threaded JS (unless we want to talk about workers, but we can leave them out of this discussion). When you write React code you are not interacting with any DOM directly. You are instructing your components (which extend React's component class) when to change their state and re-render (setState). React internally computes a new virtual DOM and compares it with the previous one to calculate the actual changes to be made on the real DOM. (This is a very abstract answer; for more understanding please read the React docs. The bottom line is that you are not interacting with any DOM, just telling the React core library when to update and what the updated state is.)
2nd: If you want to support SSR (server-rendered pages), I would suggest making two folders, client (which would include all client components and logic) and server (which would include all server-side logic), with different package.json files, as the packages differ for the two applications. There is no industry standard here; whatever floats your boat should work, but generally making directories based on logical entities gives you separation and maintainability. If in the future you want to fork the server and client out into separate repos, it would definitely make the process easier.
3rd: You should avoid running webpack-dev-server in production. Its files are generally not obfuscated, hence the payload is heavy (not to forget that your written code is out there). Even if you want to make different repos, the server can spit out HTML and the HTML can request scripts from your client server.
How to deploy : Deploy your code and run :
node server/app.js
and in app.js you can write the catch-all route that you mentioned.
P.S.: If you just need a server with that catch-all route, do you really need an Express server? You can upload the client build to a CDN and route your domain to serve index.html from the CDN (an S3 bucket can also be used here).
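To make the catch-all idea concrete, here is a minimal sketch of the decision any static host makes for a single-page application: serve the requested file if it exists in the build folder, otherwise fall back to index.html. The function name and the injected fileExists callback are assumptions made for this example so that it can run without Express or a real disk.

```javascript
const path = require('node:path');

// Decide which file in the build folder answers a request path.
// `fileExists` is injected so the logic can be tried without a real disk.
function resolveSpaFile(buildDir, urlPath, fileExists) {
  // Normalizing an absolute path drops any ".." segments above the root,
  // so the result can never point outside buildDir.
  const normalized = path.posix.normalize('/' + urlPath);
  const candidate = path.posix.join(buildDir, normalized);
  if (normalized !== '/' && fileExists(candidate)) {
    return candidate; // a real asset such as /static/js/main.js
  }
  // Anything else is a client-side route: let the SPA handle it.
  return path.posix.join(buildDir, 'index.html');
}
```

In a real server, fileExists could be backed by fs.existsSync, and the returned path would be streamed back as the response body.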
I would like to start off with clearing up the terminologies as much as I can.
Like you said, server-rendered pages were a more prominent standard in the past, but that hasn't changed at all with the introduction of React, because React also supports server rendering (SSR), which means HTML pages are generated on the server side and then served to clients using a browser.
Client-side rendering means that an HTML page is loaded into the browser, and then JavaScript code renders things on top of that HTML page and makes it interactive.
The single-page application concept is that we have only a single HTML file, or base HTML page, which is rewritten continuously based on user interactions and data from the server.
The virtual DOM is an amazing concept popularized by React. The React library code recreates the structure of all elements (called DOM elements) of an HTML page in memory, in the form of a tree. This enables React's reconciliation algorithm (implemented by the Fiber reconciler) to work out the appropriate changes for a route update or any other change on this tree-like structure first, before translating them onto the real elements in the HTML page.
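The idea can be sketched in a few lines. The following is a deliberately simplified illustration, not React's actual algorithm (it ignores keys, attributes, component state and scheduling); the node shape and the patch names are assumptions made for this example.

```javascript
// Build a plain-object description of an element (a "virtual" node).
function h(tag, text, children = []) {
  return { tag, text, children };
}

// Compare two virtual trees and list the changes a real DOM would need.
function diff(oldNode, newNode, path = 'root') {
  if (!oldNode) return [{ op: 'create', path, node: newNode }];
  if (!newNode) return [{ op: 'remove', path }];
  if (oldNode.tag !== newNode.tag) return [{ op: 'replace', path, node: newNode }];

  const patches = [];
  if (oldNode.text !== newNode.text) {
    patches.push({ op: 'setText', path, text: newNode.text });
  }
  // Walk the children position by position.
  const length = Math.max(oldNode.children.length, newNode.children.length);
  for (let i = 0; i < length; i++) {
    patches.push(...diff(oldNode.children[i], newNode.children[i], `${path}/${i}`));
  }
  return patches;
}
```

Only the resulting patch list would be applied to the real page, which is why unchanged parts of the tree cost nothing to "re-render".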
Babel is a transpiler that converts the latest language features, which browser engines may not support yet, into code that they can understand, usually ES6+ code into pre-ES6 code because all browsers support that. In a React application written using JSX syntax, Babel also handles transforming JSX into normal JavaScript.
Yes, breaking up of pages into many components is possible due to compositional nature of components by React which means we can build complex things by combining small and more focussed things.
At the end before serving it to end users, we can't have web application lag due to the huge size of code, so during the build process, things like minifying(removing whitespace etc) and other optimization like combining multiple javascript files into one etc are done, and then compact bundle is served from build folder like you said.
Yes, the build folder is where webpack puts the result of minification and combination, creating a bundle that is as small as possible. It consists of basic HTML and JS files that are understood by every browser, and if the code contains something that a particular browser doesn't support, appropriate support code, called a polyfill, is also bundled with it. Technically you can't say browsers only understand pre-ES6 code, because a lot of browser engines have implemented plenty of ES6 features already.
Webpack dev server is just used to serve a webpack application over a port like a node.js server and gives us features like live-reloading which is needed when you constantly make changes to your application codebase and it isn't needed at production because like we said previously, at production time it's just HTML and JS and nobody ever makes any changes on these files.
The virtual DOM is an in-memory representation used by React code, just like stacks and queues are, and it is not used only at development time. As for whether the React libraries are transpiled: yes and no. I think the parts of the React source code that are required to run the application are also bundled when generating the production bundle.
I would say there is no preset way of doing things, because it is totally up to the developer and the team. I have seen people using two separate repos because frontend people work on frontend things whereas backend people work on backend things. But there's also the case where everyone is a full-stack developer, and you can technically have it all in a single repo with a single package.json and use the backend to serve the frontend files; in that case you have to manually install each React dependency and cannot directly use a generator like CRA (create-react-app).
What do two repositories have to do with front-end deployment in production? You don't need to run webpack-dev-server to serve files in production. You can create a production bundle and then set up any HTTP server to serve the generated bundle.
Regarding your current scenario, I would say that instead of having two package.json files, you can go with a single package.json and install all dependencies together, or go with a monorepo approach using something like lerna or yarn workspaces.
But for a total beginner I would suggest 2 separate repositories to encounter less problems.
And a bonus point if you are not aware, you can write React in pre-ES6 code and also without JSX as well.
1) The virtual DOM basically means that you are calling a React function rather than the function that actually manipulates the real DOM,
like this one
document.getElementById("demo").innerHTML ="Helloworld"
modifies the actual dom
but this
ReactDOM.render(
<HelloMessage name="Taylor" />,
document.getElementById('demo')
);
If you look at this closely, you aren't doing anything directly on the DOM; you are just giving React control to do things. Internally, React takes care of modifying the DOM element demo whenever it wants to re-render it based on its own logic, which is what they claim is optimized and why people use it in the first place. Yes, when you build your code with webpack it does include React as part of the minified code, so if you look at any error stack trace in development you will see that React is the starting point for it.
2) I think it's a choice to be made, as there are no restrictions on this.
3) Coming to deployment: in general, if you want to use Node.js you might choose an Express server type of deployment, but otherwise it's generally better to use a high-performance server like Nginx or Apache. If you just don't want to get into this whole drama, people generally use a Heroku-based deployment or special platforms like Netlify or surge.sh these days (it's super easy to deploy on these platforms).
I believe others have done a pretty good job explaining the React Virtual DOM. In a simple and practical way, I’ll attempt to explain how I (would) manage the deployment of a dynamic website (including medium-sized enterprise systems) using NodeJS and React. I’ll also attempt not to bore you.
Imagine for once that React never existed and that you were building a traditional Server-Side Rendered application. Each time the user hits a route, the controller works with the model to perform some business logic and returns a view. In NodeJS, this view is usually compiled using a template engine such as handlebars. If you reflect for a second, it becomes obvious that the view could be any html content which is then sent back to the browser as a response.
This is a typical response that could be sent back:
<html>
<head>
<title>Our Website</title>
<style></style>
<script src="/link/to/any/JS/script"></script>
</head>
<body>
<h1>Hello World </h1>
</body>
</html>
If this response hits the browser, obviously “Hello World” is displayed on the screen.
Now, with this simple approach, we can do powerful things!
OPTION 1:
We can designate one controller to handle all incoming routes app.get("*", controllerFunc) and render one view for our entire server.
OPTION 2:
We could ask multiple controllers to handle different routes and render route-specific views from our server.
OPTION 3:
We could ask multiple controllers to handle different routes and generate pages on-the-fly (i.e. dynamically) from our server.
If we were building a traditional web application, option 3 would be the only reasonable standard. Here, pages are generated dynamically for different routes. However, with option 1, we can produce a quality Single-Page Application where the response sent to the browser is a nearly empty HTML page, but with a built JS script that has the ability to manipulate the DOM. Yes, React! Here's what such a response might look like:
<html>
<head>
<title>Our Website</title>
<style></style>
<script src="/link/to/any/JS/script"></script>
</head>
<body>
<h1>Hello World </h1>
<div id="root"> </div>
<script async type="text/javascript" src="/link/to/our/transpiled/ReactSPA.js"></script>
<!-- Placing the script after the root div ensures the element already exists in the DOM when the script runs; the async attribute makes the download non-blocking. -->
</body>
</html>
Clearly, we’re giving all the responsibility to the generated SPA and all routing logic is handled on the client-side (See, react-router-dom). On the server side, we can introduce the concept in option 2 and tweak NodeJS route handlers to listen to another specific route for any REST API communication. If you’re familiar with NodeJS, the order in which routes are registered either by app.get() or app.post() matters.
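To see why registration order matters, here is a tiny first-match-wins router. It is only a simplified model written for this example (the createRouter helper and the handler names are hypothetical); Express's real matching supports path patterns, HTTP methods and next(), but the ordering behaviour it illustrates is the same.

```javascript
// A tiny first-match-wins router that mimics registration order.
function createRouter() {
  const routes = [];
  return {
    // A prefix of '*' matches every path.
    register(prefix, handlerName) {
      routes.push({ prefix, handlerName });
    },
    // Return the name of the first registered handler that matches.
    match(urlPath) {
      const hit = routes.find((r) => r.prefix === '*' || urlPath.startsWith(r.prefix));
      return hit ? hit.handlerName : null;
    },
  };
}

// API routes first, catch-all last: both kinds of request are reachable.
const goodOrder = createRouter();
goodOrder.register('/api/', 'restApi');
goodOrder.register('*', 'spaShell');

// Catch-all first: it swallows the API requests as well.
const badOrder = createRouter();
badOrder.register('*', 'spaShell');
badOrder.register('/api/', 'restApi');
```

With goodOrder, a request for /api/users reaches the REST handler and everything else gets the SPA shell; with badOrder, every request, including /api/users, is answered with the SPA shell.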
However, using option 1, we can quickly become limited and only able to serve one Single-Page application from that server. Why? Because we have asked one controller to handle all non-API incoming routes and render one view. We also risk serving an unnecessarily bloated JS file. Users are served the complete website when all they probably wanted was just the landing page.
If we look to the option 2 though, we can tweak things a lot more and serve multiple Single-Page Applications for different routes, all from our server. This approach helps to reduce the sizes of the JS build being sent to the browser. A typical example would be a website that has a welcome page (or an introduction directory), a login page and a dashboard.
By assigning controllers to different routes, we can build SPAs uniquely for those routes: one SPA for the intro page, another for the login page, and another for the dashboard. Yes, the browser would have to load a new page while transitioning between the three, but at least we greatly reduce the initial render time for our website. We can also use the more secure option of cookies for authorization rather than the less secure option of storing session tokens in localStorage.
In a more advanced setting, we could have dynamic websites with different React components rendered as widgets within the dynamically generated page. Actually, this is what Facebook does.
The way to build such SPAs or components is pretty simple. Start up a react project and configure webpack to render the production-ready JS file into your preferred public static directory within the server-side repo. The <script> specified in the view can then easily load these built react components since they exist within the scope of the server-side’s public directory.
In essence, this means one repo with several client directories and one server directory, where the destination of the production build files generated by webpack for each client project is set to the server's public static directory. So, each client-side directory is a project (either a full SPA or a simple React component used as a widget) with its own webpack.config and package.json file. In fact, you can have two separate config files, one for production and one for development. Then, to build, you use npm ~relevant command~ for either a production or a development build.
You could then go ahead to host it the way you would host any NodeJS application. Because, the main application is the NodeJS - that's where the server is. Replace NodeJS with PHP and Apache/NGINX, the concept still remains the same.

How to generate sitemap on user-generated content site in express js?

I'm creating a user-generated content site using Express.js. How can I add the URLs of this generated content to the sitemap and have that done automatically?
These URLs also need to be removed from the sitemap when the user deletes the account or deletes the content.
I tried the sitemap builder npm packages created for express js, but none of them worked as I wanted, or the intended use was not the same as my intended use.
I am unsure if I understood your question, so I assume the following:
Your users can generate new URLs that you want to publish in a sitemap.xml that is returned from a specific endpoint, right?
If so I'd suggest to use the sitemap.js package. However this package still needs a list of URLs and the metadata you want to deliver.
You could just save the URLs and the metadata to a database table, the filesystem, or whatever data storage you use. Every time content is generated or deleted you also update your URLs list there.
Now, if someone accesses the sitemap endpoint, URLs are read from storage and sitemap.js generates an XML. Goal achieved!
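The flow described above can be sketched without any package at all. The example below is a hand-rolled illustration and does not use the sitemap.js API; the in-memory Map stands in for your database table, and the function names are assumptions made for this example.

```javascript
// Escape the characters that are special in XML text.
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// In-memory stand-in for the database table of published URLs.
const sitemapEntries = new Map();

// Call when content is created or updated.
function addEntry(url, lastmod) {
  sitemapEntries.set(url, { url, lastmod });
}

// Call when the content or the owning account is deleted.
function removeEntry(url) {
  sitemapEntries.delete(url);
}

// Build the XML that the sitemap endpoint returns.
function renderSitemap() {
  const urls = [...sitemapEntries.values()].map(
    (e) => `  <url><loc>${escapeXml(e.url)}</loc><lastmod>${e.lastmod}</lastmod></url>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}
```

The endpoint handler would call renderSitemap() and send the result with an XML content type; because the list is read at request time, deleted content disappears from the sitemap immediately.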

Pre-render a static website from REST-api and templates?

I have a REST API that I will use to render HTML using some basic templating language. I wonder if there is any good platform or service for pre-rendering HTML files and serving them statically, for performance and scalability.
I need to pre-render the pages continuously, for example every 24 hours, and it should also be possible to tell the system to re-render a specific page somehow. I'm comfortable in most open-source languages; Node is a favourite.
It seems to me that the most straightforward way to accomplish this is to use two tiers: a rendering server and a cache server. When the cache server starts up, it would crawl through every URL on the rendering server and store the pre-rendered HTML files in its local directory. For simplicity you can mirror the "directory structure" and make the resource paths identical. In other words, for every URL on the rendering server that looks like this:
http://render.xyz/path/to/resource
You create a directory structure /path/to on the cache server and put a file resource in it.
Your end-users don't need to be aware of this architecture. They make requests to the cache server like this:
http://cache.xyz/path/to/resource
The cache server gives them the result they are looking for.
There are many ways to tell the cache server to refresh (re-generate) a page. You could add a "hidden" directory, let's call it .cache-command, and use it to handle refresh requests. For example, to tell the cache server to refresh a resource, you would use a URL like this:
http://cache.xyz/.cache-command/refresh/path/to/resource
When the cache server received that request, it would refresh the resource.
One of the advantages of this approach is that your cache server can be completely independent of the render server. They could be written in different languages, running on different hardware, or they could be part of the same nodejs application. Whatever works best for you.
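The URL conventions described above boil down to a few small mappings, sketched here as pure functions. The helper names and the cache directory argument are assumptions made for this example; fetching from the render server and writing the file are left out.

```javascript
const path = require('node:path');

const COMMAND_PREFIX = '/.cache-command/refresh';

// Classify an incoming request path as a refresh command or a normal read.
function parseCacheRequest(urlPath) {
  if (urlPath === COMMAND_PREFIX || urlPath.startsWith(COMMAND_PREFIX + '/')) {
    return { action: 'refresh', resource: urlPath.slice(COMMAND_PREFIX.length) || '/' };
  }
  return { action: 'serve', resource: urlPath };
}

// Mirror the render server's URL path under the cache directory.
function cacheFileFor(cacheDir, resource) {
  return path.posix.join(cacheDir, path.posix.normalize('/' + resource));
}

// URL to fetch from the render server when (re)generating a resource.
function renderUrlFor(renderOrigin, resource) {
  return new URL(resource, renderOrigin).href;
}
```

On a refresh request, the cache server would fetch renderUrlFor(...) and overwrite cacheFileFor(...); the 24-hour re-render could be a timer that issues the same refresh for every known resource.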

Serving images based on :foo in URL

I'm trying to limit data usage when serving images to ensure the user isn't loading bloated pages on mobile while still maintaining the ability to serve larger images on desktop.
I was looking at Twitter and noticed they append :large to the end of the url
e.g. https://pbs.twimg.com/media/CDX2lmOWMAIZPw9.jpg:large
I'm really just curious how this request is being handled. If you go directly to that link there are no scripts on the page, so I'm assuming it's done server-side.
Can this be done using something like Restify/Express on a Node instance? More than anything I'm really just curious how it is done.
Yes, it can be done using Express in Node. However, it can't be done using express.static() since it is not a static request. Rather, the receiving function must handle the request by parsing the querystring (or whatever :large is) in order to dynamically respond with the appropriate image.
Generally the images will have already been pre-generated during the user-upload phase for a set of varying sizes (e.g. small, medium, large, original), and the function checks the querystring to determine which static request to respond with.
That is a much higher-performing solution than generating the appropriately-sized image server-side on every request from the original image, though sometimes that dynamic approach is necessary if the server is required to generate a non-finite set of image sizes.
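As a sketch of the pre-generated approach, the handler only has to split the size suffix off the path and map it to a file that already exists. The /media/ prefix, the file naming scheme and the default size below are assumptions made for this example; how Twitter actually implements this is not known from the question.

```javascript
const ALLOWED_SIZES = new Set(['small', 'medium', 'large', 'original']);
const DEFAULT_SIZE = 'medium';

// Split "/media/abc.jpg:large" into the image name and the requested size.
function parseImageRequest(urlPath) {
  const match = /^\/media\/([\w-]+)\.(jpg|png)(?::(\w+))?$/.exec(urlPath);
  if (!match) return null;
  // An absent ":size" suffix leaves the group undefined, so the default applies.
  const [, id, ext, size = DEFAULT_SIZE] = match;
  if (!ALLOWED_SIZES.has(size)) return null;
  // Hypothetical layout: one pre-generated file per size, e.g. abc_large.jpg.
  return { id, size, file: `${id}_${size}.${ext}` };
}
```

A request handler would respond with the returned file, or with a 404 when the function returns null. Restricting sizes to a fixed set is what makes pre-generation at upload time possible.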

Serve custom javascript to browser via Node.js app

I developed a small Node.js app in which I can configure conditions for a custom JavaScript file, which can be embedded in a webpage and which modifies the DOM of that page in the browser on load. The configuration values are stored in MongoDB. (For the sake of argument: add class "A" to the DOM element with ID "B".)
I am having difficulty figuring out the best way to serve requests / the JavaScript file.
Option 1 and my current implementation is:
I save a configuration in the node app and a distinct JavaScript
file is created for that configuration.
The page references that file which is hosted and served by the server.
Option 2 and where I think I want and should go is:
I save a configuration (MongoDB).
No JavaScript file is created.
Pages reference a generic JavaScript link (for instance: api.service.com/javascript.js).
The Node.js / Express app processes the request and
returns a custom JavaScript (file?) with the correct values as saved in MongoDB for that configuration.
Now, while I believe this is the right way to go about it, I am unsure HOW to go about it. Any ideas and advice are very welcome!
P.S.: For instance, I wonder how best to authenticate or identify the origin, user and requested configuration. Shall I do it like this: api.service.com/javascript.js?id=userID - is that good practice?
Why not serve up a generic Javascript file which can take a customized json object (directly from mongodb) and apply the necessary actions? You can include the json data on the page if you really need to have everything embedded, but breaking up configuration and code is the most maintainable approach.
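A minimal sketch of that split might look like the following: one generic function that never changes, driven by a JSON configuration. The config shape (an actions array with type, elementId and className) is an assumption made for this example, modelled on the "add class A to element B" case from the question.

```javascript
// Apply a list of configured actions to a document-like object.
// `doc` only needs getElementById, so a stub works outside the browser.
function applyConfig(config, doc) {
  let applied = 0;
  for (const action of config.actions) {
    const el = doc.getElementById(action.elementId);
    if (!el) continue; // Skip elements that are not on this page.
    if (action.type === 'addClass') {
      el.classList.add(action.className);
      applied += 1;
    }
  }
  return applied; // Number of actions that took effect.
}
```

In the browser this would be called as applyConfig(config, document), with config either embedded in the page or fetched as JSON from the Node.js app; only the JSON differs per user, so the script itself can be cached aggressively.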
