Race condition on website deployment

Say I have a website with 2 files, which are statically hosted on S3:
index.html
script.js
I have a scheme where these files are updated via a git pull from a remote origin every time I push. This got me thinking, though, that there's the possibility for a request to be made to the server as a git pull is happening to update the files. This seems like it could create two problems that would cause page errors:
A partially-updated index.html or script.js is returned.
An old index.html is returned to the client. The files are then updated on the server. The client then makes a request for script.js, which returns the new version.
What is a good practice for mitigating these two issues?
I imagine the first issue won't be a problem, assuming filesystem operations are atomic and the files are updated in a single go. However, I haven't seen anything that addresses the second issue, which seems more difficult to address.

One way of addressing #2 is to use the cache-busting method of programmatically adding a hash to the script.js filename on compile, e.g.
<script src="script.js"> becomes <script src="script-79b1264ad3bc303fd.js">
Now, when script-[whatever].js is requested via a non-matching index.html, the client gets a 404 rather than the wrong script.
As for how you'd go about deploying this, it depends on your build pipeline. Are you using Grunt/Gulp/etc?
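Whatever the build tool, the renaming step itself is small. Here is a minimal sketch in plain Node, assuming a SHA-256 content hash truncated to 16 hex characters; the helper names hashedName and rewriteScriptTag are made up for illustration and are not part of any build tool:

```javascript
const crypto = require("crypto");
const path = require("path");

// Build a cache-busted filename from the file's contents.
// The 16-character digest length is an arbitrary choice for this sketch.
function hashedName(filename, contents) {
  const digest = crypto
    .createHash("sha256")
    .update(contents)
    .digest("hex")
    .slice(0, 16);
  const { name, ext } = path.parse(filename);
  return `${name}-${digest}${ext}`;
}

// Point index.html at the hashed script instead of the plain name.
function rewriteScriptTag(html, originalName, newName) {
  return html.split(`src="${originalName}"`).join(`src="${newName}"`);
}

const js = 'console.log("hello");';
const newName = hashedName("script.js", js);
const html = rewriteScriptTag(
  '<script src="script.js"></script>',
  "script.js",
  newName
);
```

If the deploy uploads the hashed script first, swaps index.html last, and leaves older hashed scripts in place for a while, a client holding an old index.html should still get its matching script rather than a 404.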

Related

How can I set webpack public path on each server request?

I have a webpack config for client and server, the server can access the bundled code such that I can call react.renderToString(<App>) to produce server-side-rendered content (and then content gets hydrated client side).
The server is mounted in a sub-path, so, to use it locally you'd visit http://localhost:8080/some/path.
Everything works flawlessly except for asset paths! Let's say <App> creates an image such as <img src="/static/logo.svg"/> (that is, App.tsx imports ./logo.svg and uses it in an image tag). Because the server is mounted in a sub-path, the HTML URL for logo.svg should be prefixed with the base path, so the real final src should be /some/path/static/logo.svg.
I can make this work on the client by setting __webpack_public_path__ before other imports, and it works great! But how can I use that on the server? I only know where it's mounted per request, and __webpack_public_path__ can only be set during import (as far as I can tell), so it feels like I'm boned. Certainly if I change __webpack_public_path__ after startup the new value is not used.
Are there any solutions? I'm happy to include any snippets of code to clarify any context. Currently the base path comes to the server via a request header, and I thought it'd be easy to "just" use it to prefix assets.
For reference, the server is an Express server and the SVG file is loaded via the standard file-loader. Happy to update my question with additional information or code. I've hammered my head against this for hours, so any help, insights, thoughts, or guesses are much appreciated (I'm still fairly new to webpack et al.).

I like stupid plans. A good stupid plan to execute now is better than searching endlessly for the smart one.
So here's my stupid plan. Be warned, it's a real humdinger:
On the server only, I set __webpack_public_path__ to some arbitrary sentinel value, I render the HTML output, and then I search and replace the sentinel value, swapping in my actual public path.
I'm not going to claim this is elegant, but what I can say is: It works.
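For what it's worth, the replace step can be sketched like this. It assumes the server bundle sets __webpack_public_path__ to the sentinel once at startup; the sentinel string and the applyPublicPath helper are hypothetical names chosen for this example:

```javascript
// Hypothetical sentinel; any string that cannot occur in real markup works.
const SENTINEL = "/__PUBLIC_PATH_SENTINEL__/";

// Swap every occurrence of the sentinel for the per-request public path.
function applyPublicPath(html, basePath) {
  // Ensure exactly one trailing slash so joined URLs stay well-formed.
  const prefix = basePath.endsWith("/") ? basePath : basePath + "/";
  return html.split(SENTINEL).join(prefix);
}

// Stand-in for the output of the server-side render.
const rendered = `<img src="${SENTINEL}static/logo.svg"/>`;
const out = applyPublicPath(rendered, "/some/path");
```

In an Express handler this would run on the rendered string just before res.send, with the base path read from the request header mentioned in the question.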

How do you manage repositories for production/deployment of Node-React app?

Not long ago, we used to have server-rendered pages, and then React came along for client-side rendering and single-page applications. It introduced the virtual DOM and changed the way we write our code.
We require all these React libraries and install them as dependencies before writing our code. Now we can break things into many components and have many CSS and SCSS files, including images. But at the end we build the files, make a compact bundle, and serve it from the build folder.
Express GET route:
app.get('*', (req, res) => {
  // Always return the built SPA entry point.
  res.sendFile(path.join(__dirname, 'client', 'build', 'index.html'));
});
Here's what I have understood:
The build folder is where webpack combines all the files and creates a minified bundle ready for deployment. That output is basically plain HTML and JS files that every browser can understand. Since not every browser understands ES6 and newer features, we have to convert all these files into plain code that every browser can understand.
Also, webpack-dev-server is only for development purposes, and we won't be running it in production.
Is the virtual DOM/real DOM just for development purposes? Or
are those React libraries also transpiled while building the minified files? If the latter is the case, does React run in some background mode in the client's browser? I want to know how React takes care of client-side routing after the app is built.
How do you manage GitHub repositories for a Node-React app? Do you keep two different repositories, one for the front end and the other for the back end? What's the industry standard?
If you keep two repositories, how do you deploy the front-end code? You can't run webpack-dev-server in production, nor can you specify the public static directory (the build folder) in your back end (the Express server), as they are separated into two repos. How does the integration of these two repositories take place (let's say we have two AWS EC2 instances, one for each), or how does the front end get served from the front-end repo? Can you actually use something like npm serve in production?
What am I trying to do?
I want to deploy my Node-React app on AWS. I have only one repository on GitHub. I have one folder, "client", inside my repo, where all the React code sits with its package.json file. All the other files, for the server, are inside the root folder (the server doesn't have its own folder, and its files are scattered inside the root folder). So there are two package.json files, one inside the root folder for the server and one inside the client folder. I am planning to run my Node app in a Docker container.
Please help me understand the core concepts and standard practices for code hosting and deployment, keeping large-scale enterprise applications in the picture.

I will not go into explaining all the points in your question here because @Arnav Yagnik and @PrivateOmega have both done a brilliant job of explaining most of them. I definitely recommend reading their answers properly, along with the links they provide, before reading this answer.
I would like to address your question about deploying a Node-React application. In production we generally have separate deployments (or "repositories", as you put it in your question) for the front end (React) and the back end (Node). This allows your back end to sit on an EC2 instance, for example, with auto-scaling to make sure that it can cope with all the incoming requests.
As mentioned in the previous answers, and in your question as well, webpack compiles and minifies the React files into plain HTML and JS files, which most browsers can run (I'm not going to explain the virtual DOM here because it has already been perfectly explained in other answers). You would then take these minified files and serve them from an S3 bucket, for example, because, again, it is a single-page application (also discussed in the other answers): the front-end logic is already in the minified JS files, and it simply sends all requests to your back-end server.
For the front end, you can use TravisCI, for example, to deploy the build folder (the one you mention in your question) to an EC2 instance and serve your files using NGINX. Alternatively, if you can configure a CDN deployment properly, you can serve the files from an S3 bucket for the best performance.
You can think of serving the React application as sending a cryptic block of code to your user's browser. You can deploy this cryptic block of code to a publicly available S3 bucket and serve it from there. Because of webpack and minification/uglification, no one would easily make sense of what your original code was, but remember that anyone can still access all of the shipped code in Chrome's Sources tab, for example.

I would like to address this with a different approach.
Server-rendered pages: The concept has not changed. When the server receives a document request, it has to respond with HTML. That HTML may or may not contain scripts (either inline or referencing an external address). In the context of the question, you can still ship HTML that downloads the scripts you have written (which may or may not include React). In most cases you can ship an almost empty HTML page with script tags, which download the scripts over the network and execute them; those scripts contain all the rendering logic.
To answer your questions:
1st: There is no background mode in single-threaded JS (unless we want to talk about workers, but we can leave them out of this discussion). When you write React code you are not interacting with the DOM directly. You are telling your components (which extend React's) when to change their state and when to re-render (setState). React internally computes a new virtual DOM and compares it with the previous one to work out the actual changes to make to the real DOM. (This is a very abstract answer; for a deeper understanding, please read the React docs. The bottom line is that you are not interacting with the DOM yourself, just telling the React core library when to update and what the updated state is.)
2nd: If you want to support SSR (server-rendered pages), I would suggest making 2 folders, client (which would include all client components and logic) and server (which would include all server-side logic), each with its own package.json, as the packages differ for the two applications. There is no industry standard here; whatever floats your boat should work. Generally, though, organizing directories by logical entity gives you separation and maintainability, and if in future you want to split the server and client into separate repos, it will make the process easy.
3rd: You should avoid running webpack-dev-server in production. Its files are generally not minified, so the payload is heavy (not to mention that your code as written is out there). Even if you want separate repos, the server can send out HTML, and that HTML can request scripts from wherever your client build is served.
How to deploy: Deploy your code and run:
node server/app.js
and in app.js you can write the route handler ("location block") that you mentioned.
P.S.: If you just need a server with that route handler, do you really need an Express server? You can upload the client build to a CDN and route your domain to serve index.html from the CDN (an S3 bucket can also be used here).
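To illustrate how little the server has to do here, this is a rough sketch that uses Node's built-in http module in place of Express (a deliberate swap so the example has no dependencies). The resolveFile and createServer names and the "has a file extension" rule are assumptions made for this example only:

```javascript
const http = require("http");
const fs = require("fs");
const path = require("path");

// Map a request URL to a file inside the build directory.
// Paths with a file extension are treated as assets; everything else
// falls back to index.html so client-side routing can take over.
// (A route such as /user/john.doe would be misread as an asset.)
function resolveFile(urlPath, buildDir) {
  const withoutQuery = urlPath.split("?")[0];
  const clean = path.posix.normalize(decodeURIComponent(withoutQuery));
  const isAsset = path.posix.extname(clean) !== "";
  // Strip leading "/" and "../" so the result stays inside buildDir.
  const relative = isAsset ? clean.replace(/^(\.\.\/|\/)+/, "") : "index.html";
  return path.join(buildDir, relative);
}

function createServer(buildDir) {
  return http.createServer((req, res) => {
    let file;
    try {
      file = resolveFile(req.url, buildDir);
    } catch (err) {
      // decodeURIComponent throws on malformed escapes.
      res.writeHead(400);
      res.end("Bad request");
      return;
    }
    fs.readFile(file, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      // A real server would also set Content-Type based on the extension.
      res.end(data);
    });
  });
}

// To run: createServer(path.join(__dirname, "client", "build")).listen(8080);
```

A CDN or S3 bucket configured to return index.html for unknown paths does the same job without any Node process at all, which is the point of the P.S. above.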

I would like to start off by clearing up the terminology as much as I can.
Like you said, server-rendered pages were a more prominent standard in the past, but that hasn't gone away with the introduction of React, because React itself supports server-side rendering (SSR), which means HTML pages are generated on the server and then served to clients' browsers.
Client-side rendering means that an HTML page is loaded into the browser and then JavaScript code renders things on top of that HTML page and makes it interactive.
The single-page application concept is that we have only a single HTML file, or base HTML page, which is rewritten continuously based on user interactions and data from the server.
The virtual DOM is an amazing concept introduced by React. The React library recreates the structure of all the elements (called DOM elements) of an HTML page in memory, in tree form. This enables React's reconciliation algorithm, called Fiber, to work out the appropriate changes for a route update or any other change on this tree-like structure first, before translating them onto the real elements in the HTML page.
Babel is a transpiler that converts the latest language features, which browser engines may not yet support, into code they can understand, usually ES6+ code into pre-ES6 code, because all browsers support that. In a React application written with JSX syntax, Babel also transforms JSX into plain JavaScript.
Yes, breaking pages up into many components is possible because of the compositional nature of React components, which means we can build complex things by combining small, more focused things.
Before serving the application to end users, we don't want it to lag because of a huge code size, so during the build process, steps like minifying (removing whitespace, etc.) and other optimizations, such as combining multiple JavaScript files into one, are carried out, and then a compact bundle is served from the build folder, like you said.
Yes, the build folder is where webpack writes the result of minification and combination, creating a bundle as small as possible. It consists of basic HTML and JS files that every browser understands, and if the code contains something that a particular browser doesn't support, appropriate support code, called a polyfill, is also bundled with it. Technically you can't say browsers only understand pre-ES6 code, because a lot of browser engines have already implemented plenty of ES6 features.
Webpack dev server just serves a webpack application over a port, like a Node.js server, and gives us features like live reloading, which you need when you are constantly making changes to your application codebase. It isn't needed in production because, as we said previously, in production it's just HTML and JS, and nobody changes those files.
The virtual DOM is an in-memory representation, a concept used by the React code in the same way we use stacks and queues, and it is not used only at development time. As for whether the React libraries are transpiled: yes and no, because I think the parts of the React source code that are required to run the application are also bundled when the production bundle is generated.
I would say don't have a preset way of doing things, because it is totally up to the developer and the team. I have seen people use 2 separate repos because front-end people work on front-end things whereas back-end people work on back-end things. But there's also the case where everyone is a full-stack developer, and you can technically have it in a single repo with a single package.json and use the back end to serve the front-end files; then you have to install each React dependency manually and cannot directly use a generator like CRA (create-react-app).
What do 2 repositories have to do with front-end deployment in production? You don't need to run webpack-dev-server to serve files in production. You can create a production bundle and then set up any HTTP server to serve the generated bundle.
Regarding your current scenario, I would say that instead of having 2 package.json files, you can go with a single package.json and install all dependencies together, or go with a monorepo approach using something like Lerna or Yarn workspaces.
But for a total beginner I would suggest 2 separate repositories, to run into fewer problems.
As a bonus point, in case you are not aware: you can write React in pre-ES6 code, and without JSX as well.

1) The virtual DOM basically means that you call a React function rather than the actual function that manipulates the real DOM.
Something like this:
document.getElementById("demo").innerHTML = "Helloworld";
modifies the actual DOM,
but this:
ReactDOM.render(
  <HelloMessage name="Taylor" />,
  document.getElementById('demo')
);
does nothing to the DOM directly. You are just giving React control to do things; internally, React takes care of modifying the DOM element demo whenever it decides to re-render it based on its own logic, which they claim is optimized and which is why people use it in the first place. And yes, when you build your code with webpack, React is included as part of the minified code, so if you look at an error stack trace in development you will see React as its starting point.
2) I think it's a choice to be made, as there are no restrictions on this.
3) Coming to deployment: in general, if you want to use Node.js, you might choose an Express-style server deployment, but otherwise it's generally better to use a high-performance server like Nginx or Apache. Or, if you don't want to get into all of that, people generally use Heroku-based deployment, or specialized platforms like Netlify or surge.sh these days (it's super easy to deploy on these platforms).

I believe others have done a pretty good job explaining the React Virtual DOM. In a simple and practical way, I’ll attempt to explain how I (would) manage the deployment of a dynamic website (including medium-sized enterprise systems) using NodeJS and React. I’ll also attempt not to bore you.
Imagine for once that React never existed and that you were building a traditional Server-Side Rendered application. Each time the user hits a route, the controller works with the model to perform some business logic and returns a view. In NodeJS, this view is usually compiled using a template engine such as handlebars. If you reflect for a second, it becomes obvious that the view could be any html content which is then sent back to the browser as a response.
This is a typical response that could be sent back:
<html>
<head>
<title>Our Website</title>
<style></style>
<script src="/link/to/any/JS/script"></script>
</head>
<body>
<h1>Hello World </h1>
</body>
</html>
If this response hits the browser, obviously “Hello World” is displayed on the screen.
Now, with this simple approach, we can do powerful things!
OPTION 1:
We can designate one controller to handle all incoming routes app.get("*", controllerFunc) and render one view for our entire server.
OPTION 2:
We could ask multiple controllers to handle different routes and render route-specific views from our server.
OPTION 3:
We could ask multiple controllers to handle different routes and generate pages on-the-fly (i.e. dynamically) from our server.
If we were building a traditional web application, option 3 would be the only reasonable standard. Here, pages are generated dynamically for different routes. However, with option 1, we can produce a quality Single-Page Application where the response sent to the browser is an almost empty HTML page, but with the built JS script that has the ability to manipulate the DOM. Yes, React! Here’s what such a response might look like:
<html>
<head>
<title>Our Website</title>
<style></style>
<script src="/link/to/any/JS/script"></script>
</head>
<body>
<h1>Hello World </h1>
<div id="root"> </div>
<script async type="text/javascript" src="/link/to/our/transpiled/ReactSPA.js"></script>
<!-- The script is placed after the root element so that the element already exists when the script runs. The async attribute makes the download non-blocking. -->
</body>
</html>
Clearly, we’re giving all the responsibility to the generated SPA, and all routing logic is handled on the client side (see react-router-dom). On the server side, we can introduce the concept in option 2 and tweak NodeJS route handlers to listen on another specific route for any REST API communication. If you’re familiar with NodeJS, you’ll know that the order in which routes are registered, whether by app.get() or app.post(), matters.
However, using option 1, we can quickly become limited and only able to serve one Single-Page Application from that server. Why? Because we have asked one controller to handle all non-API incoming routes and render one view. We also risk serving an unnecessarily bloated JS file. Users are served the complete website when all they probably wanted was just the landing page.
If we look at option 2, though, we can tweak things a lot more and serve multiple Single-Page Applications for different routes, all from our server. This approach helps to reduce the size of the JS build being sent to the browser. A typical example would be a website that has a welcome page (or an introduction directory), a login page and a dashboard.
By assigning controllers to different routes, we can build SPAs specifically for those routes: one SPA for the intro page, another for the login page, and another for the dashboard. Yes, the browser would have to reload while transitioning between the three, but at least we greatly reduce the initial render time for our website. We can also use the more secure option of cookies for authorization rather than the less secure option of storing session tokens in localStorage.
In a more advanced setting, we could have dynamic websites with different React components rendered as widgets within the dynamically generated page. Actually, this is what Facebook does.
The way to build such SPAs or components is pretty simple. Start up a react project and configure webpack to render the production-ready JS file into your preferred public static directory within the server-side repo. The <script> specified in the view can then easily load these built react components since they exist within the scope of the server-side’s public directory.
In essence, this means one repo with several client directories and one server directory, where the destination of the production build files generated by webpack for each client project is set to the server’s public static directory. So, each client-side directory is a project (either a full SPA or a simple React component as a widget) with its own webpack.config and package.json file. In fact, you can have two separate config files, one for production and one for development. Then, to build, you run the relevant npm command for either a production or a development build.
You could then go ahead and host it the way you would host any NodeJS application, because the main application is the NodeJS one; that's where the server is. Replace NodeJS with PHP and Apache/NGINX, and the concept still remains the same.

Pre-render a static website from REST-api and templates?

I have a REST API that I will use to render HTML using some basic templating language. I wonder if there is any good platform or service for pre-rendering HTML files and serving them statically, for performance and scalability.
I need to pre-render the pages continuously, say every 24 hours, and it should also be possible to tell the system to re-render a specific page somehow. I'm comfortable in most open-source languages; Node is a favourite.

It seems to me that the most straightforward way to accomplish this is to use two tiers: a rendering server and a cache server. When the cache server starts up, it would crawl through every URL on the rendering server and store the pre-rendered HTML files in its local directory. For simplicity you can mirror the "directory structure" and make the resource paths identical. In other words, for every URL on the rendering server that looks like this:
http://render.xyz/path/to/resource
You create a directory structure /path/to on the cache server and put a file resource in it.
Your end-users don't need to be aware of this architecture. They make requests to the cache server like this:
http://cache.xyz/path/to/resource
The cache server gives them the result they are looking for.
There are many ways to tell the cache server to refresh (re-generate) a page. You could add a "hidden" directory, let's call it .cache-command, and use it to handle refresh requests. For example, to tell the cache server to refresh a resource, you would use a URL like this:
http://cache.xyz/.cache-command/refresh/path/to/resource
When the cache server received that request, it would refresh the resource.
One of the advantages of this approach is that your cache server can be completely independent of the render server. They could be written in different languages, running on different hardware, or they could be part of the same nodejs application. Whatever works best for you.
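If the cache server were written in Node, the URL handling described above could be sketched roughly as follows. The function names are invented for this example, and the actual fetching from the render server and writing to disk are left out:

```javascript
const path = require("path");

// Prefix taken from the example refresh URL above.
const COMMAND_PREFIX = "/.cache-command/refresh";

// Decide what a request to the cache server means.
function classifyRequest(urlPath) {
  if (urlPath === COMMAND_PREFIX || urlPath.startsWith(COMMAND_PREFIX + "/")) {
    const resource = urlPath.slice(COMMAND_PREFIX.length) || "/";
    return { action: "refresh", resource };
  }
  return { action: "serve", resource: urlPath };
}

// Mirror the render server's URL structure under a local cache directory.
// Empty, "." and ".." segments are dropped so the path stays inside cacheDir.
function cacheFileFor(cacheDir, resource) {
  const segments = resource
    .split("/")
    .filter((s) => s !== "" && s !== "." && s !== "..");
  return path.join(cacheDir, ...segments);
}
```

A refresh handler would then fetch the same resource path from the render server and overwrite the file returned by cacheFileFor, and a scheduled job could walk every known path once every 24 hours.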

Construct GitLab URL without slug

Is there any way to build a GitLab URL for a milestone or project based on its id property instead of the slug?
Context:
I have an app that I use as a GitLab web hook, and from its front end would like to link back to GitLab. I'm keeping the project and milestone ids, as they are unique, but can't find a way to link back to them. Ideally something like: http://gitlab.example.com/project/83/milestone/113 or even http://gitlab.example.com/milestone/113 would work for me (even if they do a redirect).

Examining rake routes and config/routes.rb tells me that such routes do not exist.
The only options I can see are:
store just the slugs, which are also unique. Your requests and memory usage will be slightly larger, but it's worth it.
make an extra API request to get the slugs. This requires 2 requests, so it is worse than having a larger request.
For new routes of form /something to be created in gitlab, something needs to be blacklisted at https://github.com/gitlabhq/gitlabhq/blob/199029b842de7c9d52b5c95bdc1cc897da8e5560/lib/gitlab/blacklist.rb, and interestingly projects is already blacklisted, but it is currently only used for project creation.
milestones, however, is not blacklisted, so a user could be called milestones, and that would generate ambiguity.
I would also take a close look at how GitHub organizes its API and paths, as it is likely to be optimal: is ID web access possible in GitHub?

What's the easiest way to request a list of web pages from a web server one by one?

Given a list of URLs, how does one implement the following automated task (assuming windows and ubuntu are the available O/Ses)? Are there existing types of tools that can make implementing this easier or do this out of the box?
log in with already-known credentials
for each specified url
    request page from server
    wait for page to be returned (no specific max time limit)
    if request times out, try again (try x times)
    if server replies, or x attempts failed, request next url
end for each
// Note: this is intentionally *not* asynchronous to be nice to the web-server.
Background: I'm implementing a worker tool that will request pages from a web server so the data those pages need to crunch through will be cached for later. The worker doesn't care about the resulting pages' contents, although it might care about HTTP status codes. I've considered a phantom/casper/node setup, but am not very familiar with this technology and don't want to reinvent the wheel (even though it would be fun).

You can request pages easily with Node's built-in http module.
Some people prefer the request module available on npm.
If you need more than that, you can use PhantomJS; there is a bridge for driving Phantom from Node.
However, you could also look for simple cli commands for making requests such as wget or curl.
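If you go the Node route, the loop from the question might look something like this sketch. The login step is left out, and fetchPage is an injected function (an assumption of this example) so that any client can be plugged in; on Node 18+ the global fetch could be passed directly:

```javascript
// Request each URL in turn, retrying failed attempts up to maxAttempts times.
// Requests are awaited one at a time on purpose, to be nice to the server.
async function warmCache(urls, fetchPage, maxAttempts = 3) {
  const results = [];
  for (const url of urls) {
    let status = null;
    let attempts = 0;
    while (attempts < maxAttempts && status === null) {
      attempts += 1;
      try {
        const response = await fetchPage(url);
        status = response.status;
      } catch (err) {
        // Any thrown error (timeout, reset connection) counts as a failed attempt.
      }
    }
    // status stays null if every attempt failed.
    results.push({ url, status, attempts });
  }
  return results;
}
```

A call such as warmCache(listOfUrls, fetch, 5) resolves to one entry per URL with the final HTTP status and the number of attempts used.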
