Understanding complex website architecture (React, Node.js, CDN, AWS S3, nginx)

Can somebody explain to me the architecture of this website (link to a picture)? I am struggling to understand the different elements in the front-end section as well as the fields on top, which seem to be related to AWS S3 and CDNs. The backend section seems clear enough, although I don't understand the memcache. I also don't get why an nginx proxy is needed in the front-end section, or why it is there.
I am an absolute beginner, so it would be really helpful if somebody could just once talk me through how these things are connected.

Memcache is probably used to cache the results of frequent database queries. It can also be used as a session store so that authenticated users' sessions work consistently across multiple servers, eliminating the need for server affinity (memcache is one of several ways of doing this).
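For illustration, here is a minimal sketch of that query-caching pattern in Node.js, assuming the memcached npm client and a hypothetical queryDatabase helper standing in for the real database layer:

    const Memcached = require('memcached');
    const memcached = new Memcached('127.0.0.1:11211');

    // Check memcache first; on a miss, hit the database and cache the result.
    function getProduct(id, callback) {
      const key = 'product:' + id;
      memcached.get(key, (err, cached) => {
        if (!err && cached) return callback(null, cached);           // cache hit
        // queryDatabase is a hypothetical helper, not part of the original answer
        queryDatabase('SELECT * FROM products WHERE id = ?', [id], (dbErr, row) => {
          if (dbErr) return callback(dbErr);
          memcached.set(key, row, 300, () => callback(null, row));   // cache for 5 minutes
        });
      });
    }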
The CDN on the left caches images in its edge locations as they are fetched from S3, which is where they are pushed by the WordPress part of the application. The CDN isn't strictly necessary, but it improves download performance by caching frequently-requested objects closer to where the viewers are, and lowers transport costs somewhat.
The nginx proxy is an HTTP router that selectively routes certain path patterns to one group of servers and other paths to other groups of servers -- it appears that part of the site is powered by WordPress, part of it by Node.js, and part of it is static React code that the browsers need to fetch, and this is one way of separating the paths behind a single hostname and routing them to different server clusters. Other ways to do this (in AWS) are Application Load Balancer and CloudFront, either of which can route to a specific server based on the request path, e.g. /assets/* or /css/*.
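The same path-based routing idea can be sketched in Node.js with the http-proxy module (in the diagram this job is done by nginx; the paths and backend ports below are made up for illustration):

    const http = require('http');
    const httpProxy = require('http-proxy');
    const proxy = httpProxy.createProxyServer({});

    http.createServer((req, res) => {
      if (req.url.startsWith('/api/')) {
        proxy.web(req, res, { target: 'http://localhost:3000' }); // Node.js backend
      } else if (req.url.startsWith('/blog/')) {
        proxy.web(req, res, { target: 'http://localhost:8080' }); // WordPress
      } else {
        proxy.web(req, res, { target: 'http://localhost:8081' }); // static React assets
      }
    }).listen(80);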

Related

Is it possible to use the same node.js server for two/three different domains (aliases)?

Is it possible to use the same nodeJS server for two/three different domains (aliases)? (I don't want to redirect my users. I want them to see the exact URL they typed in the address bar. However, all three domains are exactly the same!)
I want my users to be logged in on all three domains at the same time, in order to avoid any confusion.
What is the simplest way to do this and avoid cross-domain issues?
Thanks!
If you mean that all domains will serve the same Node.js app, then yes, you can do that.
But if each domain should open a different application, then you must have a reverse proxy running on the server to handle and manage the sites/vhosts.
You may install nginx and use it as a reverse proxy server, or look at http-proxy, a library for Node.js.
If you would like to manage the vhosts in your app, you can look at the vhost middleware for Node.js and use it.
Choose one of:
Use some other server (like nginx) as a reverse proxy.
Use node-http-proxy as a reverse proxy.
Use the vhost middleware if each domain can be served from the same Connect/Express codebase and node.js instance.
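A rough sketch of the third option with the vhost middleware (the hostnames are placeholders; in the asker's case all domains can even point at the same Express app):

    const express = require('express');
    const vhost = require('vhost');

    const site = express();
    site.get('/', (req, res) => res.send('same app, whatever the domain'));

    const app = express();
    // Dispatch on the Host header; here every domain maps to the same handler.
    app.use(vhost('mydomain.com', site));
    app.use(vhost('example.com', site));
    app.use(vhost('yourdomain.com', site));
    app.listen(80);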
This is a very broad question. Moreover, it is generally a pretty bad idea, SEO-wise, to have multiple independent domains that each serve the same content.
Logging in is generally done either through cookies or through extra parameters in the URL. Cookies are always domain-specific, for obvious security reasons. If you want to ensure folks are logged in to all the domains at once, you can create an internal, purpose-driven domain to handle authentication (it never shows in the URL bar and is effectively only used for HTTP redirects); that domain stores the login state for all the rest, and the other domains pick up the login state from it through HTTP redirects.
In general, however, this sounds like too much trouble. Consider that some users may specifically want to use different domains for different accounts, so you'll effectively break their usage if you mandate that a single login be used for all of them. And, back to the original point, doing this is pretty bad for SEO, so just don't do it.

Reverse proxy in Azure with Web Apps

I'm moving from Apache on Linux to Azure Web Apps and I have a specific url (mysite.com/blog and everything under it) that is configured with a reverse proxy so the end user doesn't know that the content is actually coming from another service.
I'm sure I can do this within Web Apps (which runs on IIS) but I can't find any documentation on how to do this. As a backup I'm open to putting another service in front of my Web App.
How can I accomplish this in Azure?
Update: I did try using another service - Functions. My architecture looks like this:
This works in production but I'm hitting snags in development. /blog may or may not work depending on the entry point. In prod, our DNS will be configured so mysite.com points to mysite-proxy.azurewebsites.net and, therefore, any URI the user hits will work. In dev, however, we may want to hit /blog through the Traffic Manager, which will route us to /blog on the web app, where it doesn't exist. Same problem, of course, if we go to /blog directly on the web app. I tried to show these examples on the right side of the diagram.
I would like to find a solution so the webapp itself can handle the /blog proxying and then we can determine whether it's worth the speed and cost tradeoff compared to the existing solution.
You might want to check out Azure Functions Proxies: https://learn.microsoft.com/en-us/azure/azure-functions/functions-proxies
Sounds like you want an Application Gateway (caution: it costs around $15/day).
The AGW can have multiple listeners against multiple hostnames, including path-based routing.
You will want to add two backends, one for the /blog endpoint and one for the non-/blog traffic. The backends just take the IP or FQDN of the target resource; in this case you will have:
blogBackend: myblog.com
defaultBackend: myWebapp.azurewebsites.net
Then you need to add listeners for your public-facing domain, it would be like:
myHttpListener: port 80, hostname=mywebsite.net
myHttpsListener: port 443, hostname=mywebsite.net
Then you need an HTTP setting:
myHttpSetting: protocol=HTTPS, port=443, useWellKnownCACert=Yes, HostnameOverride=Yes, pick from backend target
Then you need rules, one for the HTTP=>HTTPS redirect, and the other for handling the pathing:
myRedirectRule: type=basic, listener=myHttpListener, backendtargettype=redirection, targettype=listener, target=myHttpsListener
myRoutingRule: type=path-based, listener=myHttpsListener, targettype=backendpool, target=defaultBackend, httpSetting=myHttpSetting, pathRules=
path=/* name=root backendpool=defaultBackend
path=/blog name=blog backendpool=blogBackend
You can create additional http settings and assign them to the path rules to change the behaviour of the reverse proxy. For example, you can have the user's URL be https://mywebsite.net/blog, but have the path stripped on the request to the blog so the request looks like myblog.com instead of myblog.com/blog
There are a lot of tiny parts, but app gateways can handle multiple applications at scale. The biggest thing is to watch out for the cost, since this is more of an enterprise solution.

Is there a proxy webserver that routes dynamically requests based on URLs?

I am looking for a way to dynamically route requests through a proxy web server. I will explain exactly what I need and what I have found so far.
I would like to have some lightweight web server (thinking about node.js or nginx) set up as a proxy web server with a public IP. It would route requests to different local web servers based on URLs -- not only on the hostname but on the full URL.
My idea is that this proxying web server would use either a local memory cache, memcached, or redis to look up key-value information mapping a URL to a local web server.
I have found these projects:
https://github.com/nodejitsu/node-http-proxy
https://www.steve.org.uk/Software/node-reverse-proxy/
https://github.com/hipache/hipache
They all seem to do similar things, but not exactly what I am looking for, that is:
URL-based proxying (absolute URLs routing to different local web servers)
use of memory-based configuration storage / cache
dynamically changing the configuration via an API without reloading the proxy web server
Is there any better-suited project, or is there a way to configure one of the three projects above to fit my requirements?
Thank you for your time and effort in advance.
I think this does exactly what you want: https://openresty.org/en/dynamic-routing-based-on-redis.html
It's basically nginx with precompiled modules. You can set up the same thing yourself with nginx + the Lua module + redis (plus, of course, the necessary Lua rocks). OpenResty just makes it easier.
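If you prefer to stay in Node.js, here is a very rough sketch of the same idea using http-proxy and the redis client (the route key naming is an assumption; routes can then be changed at runtime with a plain redis SET, without reloading the proxy):

    const http = require('http');
    const httpProxy = require('http-proxy');
    const { createClient } = require('redis');

    const proxy = httpProxy.createProxyServer({});
    const redis = createClient();

    redis.connect().then(() => {
      http.createServer(async (req, res) => {
        // Look up the backend by host + first path segment, e.g. "example.com/app1"
        const key = 'route:' + req.headers.host + '/' + (req.url.split('/')[1] || '');
        const target = await redis.get(key);           // e.g. "http://10.0.0.5:3000"
        if (!target) {
          res.writeHead(502);
          return res.end('no backend configured for ' + key);
        }
        proxy.web(req, res, { target });
      }).listen(8080);
    });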

Host multiple site with node.js

I'm currently learning node.js and loving it. I'm noticing, however, that it seems to be really only fit for one site. So it's great for hosting mydomain.com, but what if I want to build an actual full web server with it? In other words, I would like to host mydomain.com, example.com, yourdomain.com and so on. What solutions (modules) are available for this? I was thinking of simply parsing the URL from the request object and reading from the appropriate directory. For example, if I get a request for example.com, read from the example_com directory, or if I get a request for mydomain.com, read from the mydomain_com directory. The issue here is that I don't know how this will affect performance and scalability.
I've looked into Multi-node but I don't fully follow the idea of processes yet (I'm a node beginner).
Any suggestions are welcome.
You can do this a few different ways. One way is to write it directly into your web application by checking which domain the request was made to and then routing within your application, but unless your application is very basic this can make it fairly bloated and messy. A good time to do something like this might be if you're writing a blogging platform where everything is pretty much the same across all your domains. The key difference might be how you query your data to display the right data.
In this case you'd probably use the request to see which blog is being accessed.
If you want to just host a few different domains on the same server all using port 80 (like most websites do), you will want to proxy each request off to a different process. You can do this with nginx or even with node itself. It all comes down to what best fits your needs. bouncy is a quick way to get set up doing this, as it's a node.js module and has some pretty impressive benchmarks (see the sketch after the links below). nginx (proxy with nginx) is probably the most widely used method though, as a lot of node.js servers use nginx to serve static content anyway.
http://blog.noort.be/2011/03/07/node-js-on-nginx.html
https://github.com/substack/bouncy/
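A rough sketch of the bouncy approach (hostnames and ports are placeholders; each site runs as its own node process on a local port):

    const bouncy = require('bouncy');

    const server = bouncy((req, res, bounce) => {
      if (req.headers.host === 'mydomain.com') {
        bounce(8001);                 // node process serving mydomain.com
      } else if (req.headers.host === 'example.com') {
        bounce(8002);                 // node process serving example.com
      } else {
        res.statusCode = 404;
        res.end('no such host');
      }
    });
    server.listen(80);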
You can use connect's vhost middleware (which is also available in express) to dispatch requests to separate request handlers based on the Host: header. This assumes that everything is being handled by the same node process on the same port; if you really need separate processes, then the suggestion about using nginx as a reverse proxy is probably the way to go.

What are the pros and cons of a 100% HTTPS site?

First, let me admit that what I know about HTTPS is pretty rudimentary. I don't know much about session security, encryption, or how either of those things is supposed to be done.
What I do know is that web security is important; that horror stories of XSS, CSRF, and database injections pop up over and over again. I know that a preventative stance against such exploits is better than a reactive one.
But the motivation for this question comes from a different point of view. I work at a site that regularly accepts payment from users. Obviously, the payments are sent over a secure channel (HTTPS). I mainly work on the CSS, HTML, and JavaScript of the site. What I've been told is that it is necessary to duplicate CSS, JavaScript, and image files before they can be called over HTTPS. So assume I have the following files:
css/global.css
js/global.js
images/
logo.png
bg.png
The way I understand it, these files need to be duplicated before they can be "added" to the HTTPS. So a file can either be under security (HTTPS) or not.
If this is true, then this is a major hindrance. Even in the smallest site, it would be a major pain to duplicate files and then have to maintain them every time you make a CSS or JS change. Obviously this could be alleviated by moving everything under HTTPS.
So what I want to know is, what are the pros and cons of a site that is completely behind HTTPS? Does it cause noticeable overhead? Is it just foolish to place the entire site under encryption? Would users feel safer seeing the "secure" notifications in their browser during their entire visit? And last but not least, does it truly make for a more secure site? What can HTTPS not protect against?
You can serve the same content via HTTPS as you do via HTTP (just point it to the same document root).
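In Node.js terms (the original site may well be on a different stack, and the certificate paths are placeholders), serving the same app and static files over both protocols looks roughly like this:

    const fs = require('fs');
    const http = require('http');
    const https = require('https');
    const express = require('express');

    const app = express();
    app.use(express.static('public'));   // the same css/, js/, images/ for both protocols

    http.createServer(app).listen(80);
    https.createServer({
      key: fs.readFileSync('/etc/ssl/private/mysite.key'),
      cert: fs.readFileSync('/etc/ssl/certs/mysite.crt')
    }, app).listen(443);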
Cons that may be major or minor, depending:
serving content over HTTPS is slower than serving it via HTTP.
certificates signed by well-known authorities can be expensive
if you don't have a certificate signed by a trusted authority (e.g., you sign it yourself), visitors will get a warning
Those are pretty basic, but just a few things to note. Also, personally, I feel much better seeing that the entire site is HTTPS if it's anything related to financial stuff, obviously, but as far as general browsing, no, I don't care.
Noticeable overhead? Yes, but that matters less and less these days as clients and servers are much faster.
You don't need to make a copy of everything, but you do need to make those files accessible via HTTPS. Your HTTPS and HTTP services can use the same doc root.
Is it foolish to put the whole site under encryption? Typically no.
Would users feel safer? Probably.
Does it truly make for a more secure site? Only when dealing with the communication channel between the client and the server. Everything else is still up for grabs.
You've been misinformed. The css, js, and image files need not be duplicated assuming you've set up the http and https mapping to point to the same physical website on the server. The only important thing is that these files are referenced with https when the page you're looking at is also under https. This will prevent the dreaded security message that says that some objects on the page are not secured.
For every other page where you're running the site under http (unsecured) you can reference those same files in the same locations, but with an http address.
To answer your other question, there would indeed be a performance penalty to put the entire site under https. The server has to work hard to encrypt everything it sends over the wire. And then some not-so-old browsers won't cache https content to disk by default, which of course will result in an even heavier load on the server.
Because I like my sites to be as responsive as possible, I'm always selective about which sections of a site I choose to be SSL-encrypted. In most typical e-commerce sites, the only pages that need SSL encryption are the login, registration, and checkout pages.
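If you do go the selective route in a Node.js/Express app, the pattern is roughly a middleware that forces HTTPS only on the sensitive paths (the path list here is just an example, and behind a proxy you would also need to trust X-Forwarded-Proto):

    // Force HTTPS only for sensitive paths; everything else stays on plain HTTP.
    const securePaths = ['/login', '/register', '/checkout'];

    app.use((req, res, next) => {
      const needsTls = securePaths.some((p) => req.path.startsWith(p));
      if (needsTls && !req.secure) {
        return res.redirect(301, 'https://' + req.headers.host + req.originalUrl);
      }
      next();
    });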
The traditional reason for not having the entire site behind SSL is processing time. It does take more work for both the client and the server to use SSL. However, this overhead is fairly small compared to modern processors.
If you are running a very large site, you may need to scale slightly faster if you are encrypting everything.
You also need to buy a certificate, or use a self-signed one, which may not be trusted by your users.
You also need a dedicated IP address. If you are on a shared hosting system, you need to have an IP that you can dedicate to only having SSL on your site.
But if you can afford a certificate and a dedicated IP, and don't mind needing a slightly faster server, using SSL on your entire site is a great idea.
With the number of attacks that SSL mitigates, I would say do it.
You do not need multiple copies of these files for them to work with HTTPS. You may need two copies of these files if the hosting setup has been configured such that you have a separate HTTPS directory. So to answer your question: no, duplicate files are not required for HTTPS, but depending on the web hosting configuration they may be.
In regards to the pros and cons of https vs http there are already a few posts addressing that.
HTTP vs HTTPS performance
HTTPS vs HTTP speed comparison
HTTPS only encrypts the data between the client computer and the server. It does not fix software holes or issues such as remote JavaScript includes. HTTPS doesn't make your application better - it only helps secure the data between the user and your app. You still need to make sure your app has no security holes, filter all incoming data and SQL, and review security logs frequently.
However, if you're only responsible for the front-end part of the site, I wouldn't worry about it, but I would bring up security concerns with the main developer responsible for the backend.
One of the concerns is that HTTPS traffic could be blocked. For example, on Apple computers, if you turn on parental controls, HTTPS traffic is blocked because the filter can't read the encrypted content; you can read about it here:
http://support.apple.com/kb/ht2900
https note: For websites that use SSL encryption (the URL will usually begin with https), the Internet content filter is unable to examine the encrypted content of the page. For this reason, encrypted websites must be explicitly allowed using the Always Allow list. Encrypted websites that are not on the Always Allow list will be blocked by the automatic Internet content filter.
An important "pro" for more https at your site is the following:
a user connecting thru an unencrypted WiFi, like at an airport, can give their password in https, but if the site then switches back to http after the password page, the session cookie becomes exposed and can be immediately used by an eavesdropper.
See this article http://steve.grc.com/2010/10/28/why-firesheeps-time-has-come/#comment-2666
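If the whole site stays on HTTPS, the usual complement in an Express app is to mark the session cookie as Secure so it is never sent over plain HTTP (a sketch with express-session; the secret is a placeholder):

    const session = require('express-session');

    app.use(session({
      secret: 'replace-me',            // placeholder, use a real secret
      resave: false,
      saveUninitialized: false,
      cookie: {
        secure: true,    // only send the session cookie over HTTPS
        httpOnly: true   // keep it out of reach of client-side scripts
      }
    }));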
