I am using the Flickr API to retrieve a list of photos with associated sizes, returned as links to the raw images. These links are all HTTPS, as is the domain they are served from.
However, despite the document and all local resources being served over HTTP/2, all images from Flickr are served over HTTP/1.1. What steps are necessary to serve all resources over HTTP/2 where it is available?
I might be misunderstanding how HTTP/2 interacts with cross-origin requests - or Flickr's API implementation - but as far as I can tell, cross-domain resources should be able to use the HTTP/2 protocol where it is available server-side. Any reasons to the contrary, or explanations of why not, would be greatly appreciated: the newer protocol significantly speeds up the site in question, and serving the images locally would take up too much space.
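As far as I understand, whether a third-party host speaks HTTP/2 is negotiated per connection via TLS ALPN between the browser and that host's server. A quick way to check what a given host offers is something like this sketch (Node/TypeScript; the Flickr hostname is only an example):

```ts
// Check which protocol a given host will negotiate, using Node's TLS ALPN support.
import tls from "node:tls";

function negotiatedProtocol(host: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(
      { host, port: 443, servername: host, ALPNProtocols: ["h2", "http/1.1"] },
      () => {
        // alpnProtocol is "h2" if the server agreed to HTTP/2,
        // "http/1.1" otherwise (or false if no ALPN at all).
        resolve(socket.alpnProtocol || "no ALPN");
        socket.end();
      }
    );
    socket.on("error", reject);
  });
}

// Example host; substitute whichever image host the API actually returns.
negotiatedProtocol("live.staticflickr.com").then(console.log);
```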
Related
What functionalities should a web server have, at a minimum?
And in order to implement a web server, which protocols should be read and understood first?
A web server almost by definition has to serve a website.
Websites are served by the HTTP protocol.
This is explained in detail in RFC 2616 (the original HTTP/1.1 specification, since superseded by newer RFCs).
Note: This is HTTP/1.1 rather than 1.0, but literally no one uses HTTP/1.0.
HTTP is a reasonably simple request-response protocol, and has been expanded by a huge number of extensions over the years.
It is also being succeeded by HTTP/2, which is significantly more complex.
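To make the "reasonably simple request-response protocol" point concrete, here is a minimal sketch of a server built on Node's http module, which handles the HTTP/1.1 parsing and lets you focus on the request/response exchange (the port and responses are arbitrary):

```ts
// Minimal HTTP/1.1 server: node:http parses the request line and headers,
// and we write back a status line, headers, and a body.
import http from "node:http";

const server = http.createServer((req, res) => {
  // req.method, req.url and req.headers are the parsed request.
  if (req.method === "GET" && req.url === "/") {
    res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
    res.end("<h1>Hello over HTTP/1.1</h1>");
  } else {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("Not found");
  }
});

server.listen(8080, () => console.log("listening on http://localhost:8080"));
```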
Can somebody explain to me the architecture of this website (link to a picture)? I am struggling to understand the different elements in the front-end section, as well as the fields on top, which seem to be related to AWS S3 and CDNs. The backend section seems clear enough, although I don't understand the memcache. I also don't get why an nginx proxy is needed in the front-end section, or why it is there.
I am an absolute beginner, so it would be really helpful if somebody could just once talk me through how these things are connected.
Memcache is probably used to cache the results of frequent database queries. It can also be used as a session store so that authenticated users' sessions work consistently across multiple servers, eliminating the need for server affinity (memcache is one of several ways of doing this).
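As a rough illustration of how memcache is typically used here, this is the cache-aside pattern in sketch form; the cache client interface and the database query are stand-ins, not any specific library:

```ts
// Cache-aside: look in memcache first, fall back to the database on a miss,
// then populate the cache with a TTL. CacheClient and queryDatabase are
// placeholders for whatever client library and driver the site actually uses.
interface CacheClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

async function getPopularPosts(
  cache: CacheClient,
  queryDatabase: () => Promise<object[]>
): Promise<object[]> {
  const key = "popular-posts";
  const cached = await cache.get(key);
  if (cached !== null) {
    return JSON.parse(cached);          // cache hit: skip the database
  }
  const rows = await queryDatabase();   // cache miss: run the expensive query
  await cache.set(key, JSON.stringify(rows), 300); // keep for 5 minutes
  return rows;
}
```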
The CDN on the left caches images in its edge locations as they are fetched from S3, which is where they are pushed by the WordPress part of the application. The CDN isn't strictly necessary, but it improves download performance by caching frequently-requested objects closer to where the viewers are, and it lowers transport costs somewhat.
The nginx proxy is an HTTP router that selectively routes certain path patterns to one group of servers and other paths to other groups of servers -- it appears that part of the site is powered by WordPress, part of it by Node.js, and part of it is static React code that the browsers need to fetch, and this is one way of separating the paths behind a single hostname and routing them to different server clusters. Other ways to do this (in AWS) are Application Load Balancer and CloudFront, either of which can route to a specific group of servers based on the request path, e.g. /assets/* or /css/*.
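A toy version of what that nginx proxy does, written as a small Node reverse proxy that forwards by path prefix (the path prefixes and upstream ports are made up for illustration):

```ts
// Tiny path-based HTTP router: requests under /blog go to the WordPress
// upstream, /api to the Node.js app, everything else to the static server.
// Upstream hosts and ports are assumptions for illustration only.
import http from "node:http";

const upstreams: { prefix: string; port: number }[] = [
  { prefix: "/blog", port: 8081 }, // e.g. WordPress
  { prefix: "/api", port: 8082 },  // e.g. the Node.js app
  { prefix: "/", port: 8083 },     // e.g. static React assets
];

http
  .createServer((req, res) => {
    const target = upstreams.find((u) => (req.url ?? "/").startsWith(u.prefix))!;
    const proxied = http.request(
      {
        host: "127.0.0.1",
        port: target.port,
        path: req.url,
        method: req.method,
        headers: req.headers,
      },
      (upstreamRes) => {
        res.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
        upstreamRes.pipe(res); // stream the upstream response back to the client
      }
    );
    proxied.on("error", () => {
      res.writeHead(502);
      res.end("Bad gateway");
    });
    req.pipe(proxied); // forward the request body, if any
  })
  .listen(8080);
```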
Say I have a server that serves an HTML file at the url https://example.com/ and this refers to a css file at the url https://test.com/mystyles.css. Is it possible to push the mystyles.css file alongside the html content as part of an HTTP2 connection, so that a browser will use this css content?
I have tried to create such a request using a self-signed certificate on my localhost (and I have pre-created a security exception for both hosts in my browser) by sending the html file when a request arrives at http://localhost/, and pushing the css with a differing hostname/port in the :authority or Host header. However, on a full-page refresh, the CSS file is fetched in a separate request from the server, rather than using the pushed css file.
See this gist for a file that I have been using to test this. If I visit http://localhost:8080/ then the text is red, but if I visit http://test:8080/ it is green, implying that the pushed content is only used when the origin is the same.
Is there a combination of headers that needs to be used for this to work? Possibly invoking CORS?
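For context, a minimal same-origin push with Node's built-in http2 module looks roughly like the sketch below (this is not the exact gist; the certificate paths and the CSS are placeholders). The cross-origin variant I'm asking about would add a different :authority to the pushed stream's request headers:

```ts
// Minimal HTTP/2 server push: respond to "/" with HTML and push /mystyles.css
// on the same connection. Certificate/key paths are placeholders.
import http2 from "node:http2";
import fs from "node:fs";

const server = http2.createSecureServer({
  key: fs.readFileSync("localhost-key.pem"),
  cert: fs.readFileSync("localhost-cert.pem"),
});

server.on("stream", (stream, headers) => {
  if (headers[":path"] !== "/") {
    stream.respond({ ":status": 404 });
    stream.end();
    return;
  }

  // Push the stylesheet alongside the HTML response.
  stream.pushStream({ ":path": "/mystyles.css" }, (err, pushStream) => {
    if (err) return;
    pushStream.respond({ ":status": 200, "content-type": "text/css" });
    pushStream.end("body { color: red; }");
  });

  stream.respond({ ":status": 200, "content-type": "text/html" });
  stream.end('<link rel="stylesheet" href="/mystyles.css"><p>Hello</p>');
});

server.listen(8080);
```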
Yes, it is theoretically possible, according to this 2017 blog post from a Chrome developer advocate.
As the owners of developers.google.com/web, we could get our server to push a response containing whatever we wanted for android.com, and set it to cache for a year.
...
You can't push assets for any origin, but you can push assets for origins which your connection is "authoritative" for.
If you look at the certificate for developers.google.com, you can see it's authoritative for all sorts of Google origins, including android.com.
[Image: Viewing certificate information in Chrome]
Now, I lied a little, because when we fetch android.com it'll perform a DNS lookup and see that it terminates at a different IP to developers.google.com, so it'll set up a new connection and miss our item in the push cache.
We could work around this using an ORIGIN frame. This lets the connection say "Hey, if you need anything from android.com, just ask me. No need to do any of that DNS stuff", as long as it's authoritative. This is useful for general connection coalescing, but it's pretty new and only supported in Firefox Nightly.
If you're using a CDN or some kind of shared host, take a look at the certificate, see which origins could start pushing content for your site. It's kinda terrifying. Thankfully, no host (that I'm aware of) offers full control over HTTP/2 push, and is unlikely to thanks to this little note in the spec: ...
In practice, it sounds like it's possible if your certificate has authority over the other domains and they're hosted at the same IP address, but it also depends on browser support. I was personally trying to do this with Cloudflare and found that they don't support cross-origin push (similar to the blog post author's observations about CDNs in 2017).
I use Amazon CloudFront to host all my site's images and videos, to serve them faster to my users, who are pretty scattered across the globe. I also apply pretty aggressive forward caching to the elements hosted on CloudFront, setting Cache-Control to public, max-age=7776000.
I've recently discovered to my annoyance that third party sites are hotlinking to my Cloudfront server to display images on their own pages, without authorization.
I've configured .htaccess to prevent hotlinking on my own server, but haven't found a way of doing this on CloudFront, which doesn't seem to support the feature natively. And, annoyingly, Amazon's bucket policies, which could be used to prevent hotlinking, only have effect on S3; they have no effect on CloudFront distributions [link]. If you want to take advantage of the policies, you have to serve your content from S3 directly.
Scouring my server logs for hotlinkers and manually changing the file names isn't really a realistic option, although I've been doing this to end the most blatant offenses.
You can forward the Referer header to your origin
Go to CloudFront settings
Edit the settings for the distribution in question
Go to the Behaviors tab and edit or create a behavior
Set Forward Headers to Whitelist
Add Referer as a whitelisted header
Save the settings in the bottom right corner
Make sure to handle the Referer header on your origin as well.
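Once CloudFront forwards the Referer header, the origin-side check can be as simple as this sketch (example.com is a placeholder for your own domain):

```ts
// Origin-side hotlink check: CloudFront now forwards Referer, so reject
// requests whose Referer is present but not from our own site.
// "example.com" is a placeholder for your domain.
import http from "node:http";

const ALLOWED = /^https?:\/\/([a-z0-9-]+\.)*example\.com\//i;

http
  .createServer((req, res) => {
    const referer = req.headers.referer;
    if (referer && !ALLOWED.test(referer)) {
      res.writeHead(403, { "Content-Type": "text/plain" });
      res.end("Hotlinking not allowed");
      return;
    }
    // ... serve the image as usual ...
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("image bytes would go here");
  })
  .listen(8080);
```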
We had numerous hotlinking issues. In the end we created CSS sprites for many of our images, either adding white space to the bottom/sides or combining images together.
We displayed them correctly on our pages using CSS, but any hotlinks would show the images incorrectly unless they copied the CSS/HTML as well.
We've found that they don't bother (or don't know how).
The official approach is to use signed URLs for your media. For each piece of media that you want to distribute, you can generate a specially crafted URL that is only valid within a given constraint of time and source IPs.
One approach for static pages is to generate temporary URLs for the media included in that page that are valid for 2x the page's caching time. Let's say your page's caching time is 1 day. Every 2 days, the links would be invalidated, which forces the hotlinkers to update their URLs. It's not foolproof, as they can build tools to get the new URLs automatically, but it should deter most people.
If your page is dynamic, you don't need to worry about trashing your page's cache, so you can simply generate URLs that only work for the requester's IP.
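As a sketch of what generating such a URL can look like, here is an example using the @aws-sdk/cloudfront-signer helper; the key pair ID, key file and URL are placeholders, and the exact option names should be checked against the SDK version you use:

```ts
// Sketch: generate a time-limited CloudFront signed URL.
// Key pair ID, private key path and URL are placeholders.
import { getSignedUrl } from "@aws-sdk/cloudfront-signer";
import fs from "node:fs";

const privateKey = fs.readFileSync("cloudfront-private-key.pem", "utf8");

// Valid for two days, i.e. 2x a one-day page cache as described above.
const expires = new Date(Date.now() + 2 * 24 * 60 * 60 * 1000);

const url = getSignedUrl({
  url: "https://cdn.example.com/images/photo.jpg",
  keyPairId: "K2JCJMDEHXQW5F", // placeholder key pair ID
  privateKey,
  dateLessThan: expires.toISOString(),
});

console.log(url);
```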
As of Oct. 2015, you can use AWS WAF to restrict access to CloudFront files. Here's an article from AWS that announces WAF and explains what you can do with it. Here's an article that helped me set up my first ACL to restrict access based on the referrer.
Basically, I created a new ACL with a default action of DENY. I added a rule that checks the end of the referer header string for my domain name (lowercase). If it passes that rule, it ALLOWS access.
After assigning my ACL to my Cloudfront distribution, I tried to load one of my data files directly in Chrome and I got this error:
As far as I know, there is currently no solution, but I have a few possibly relevant, possibly irrelevant suggestions...
First: Numerous people have asked this on the Cloudfront support forums. See here and here, for example.
Clearly AWS benefits from hotlinking: the more hits, the more they charge us for! I think we (Cloudfront users) need to start some sort of heavily orchestrated campaign to get them to offer referer checking as a feature.
Another temporary solution I've thought of is changing the CNAME I use to send traffic to cloudfront/s3. So let's say you currently send all your images to:
cdn.blahblahblah.com (a CNAME that points to some CloudFront/S3 bucket)
You could change it to cdn2.blahblahblah.com and delete the DNS entry for cdn.blahblahblah.com
As a DNS change, that would knock out all the people currently hotlinking before their traffic got anywhere near your server: the DNS entry would simply fail to look up. You'd have to keep changing the cdn CNAME to make this effective (say once a month?), but it would work.
It's actually a bigger problem than it seems because it means people can scrape entire copies of your website's pages (including the images) much more easily - so it's not just the images you lose and not just that you're paying to serve those images. Search engines sometimes conclude your pages are the copies and the copies are the originals... and bang goes your traffic.
I am thinking of abandoning Cloudfront in favor of a strategically positioned, super-fast dedicated server (serving all content to the entire world from one place) to give me much more control over such things.
Anyway, I hope someone else has a better answer!
This question mentioned image and video files.
Referer checking cannot be used to protect multimedia resources from hotlinking, because some mobile browsers do not send a Referer header when requesting an audio or video file played using HTML5.
I have confirmed this for Safari and Chrome on iPhone, and for Safari on Android.
Too bad! Thank you, Apple and Google.
How about using signed cookies? You can create a signed cookie using a custom policy, which supports the various kinds of restrictions you want to set and also allows wildcards.
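For illustration, a signed cookie with a wildcard custom policy might be generated like this (again with @aws-sdk/cloudfront-signer; the domain, IP range and key details are placeholders, and the option names should be verified against your SDK version):

```ts
// Sketch: CloudFront signed cookies from a custom (wildcard) policy.
// Domain, path, IP range and key details are placeholders.
import { getSignedCookies } from "@aws-sdk/cloudfront-signer";
import fs from "node:fs";

const policy = JSON.stringify({
  Statement: [
    {
      Resource: "https://cdn.example.com/media/*", // wildcard covers all media
      Condition: {
        DateLessThan: { "AWS:EpochTime": Math.floor(Date.now() / 1000) + 86400 },
        IpAddress: { "AWS:SourceIp": "203.0.113.0/24" }, // optional IP restriction
      },
    },
  ],
});

const cookies = getSignedCookies({
  keyPairId: "K2JCJMDEHXQW5F", // placeholder
  privateKey: fs.readFileSync("cloudfront-private-key.pem", "utf8"),
  policy,
});

// Set the three returned cookies (CloudFront-Policy, CloudFront-Signature,
// CloudFront-Key-Pair-Id) on your domain; CloudFront then authorizes any
// request that matches the wildcard resource until the policy expires.
console.log(cookies);
```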
APIs with terrible security are commonplace. Case in point: this story on TechCrunch.
This raises the question: how do you balance security with performance when it comes to SSL? Obviously, sensitive information such as usernames and passwords should be sent over SSL. What about subsequent calls that perhaps use an API key? At what point is it okay to use an unencrypted connection for API calls that require proof of identity?
If you allow mixed content, then a man-in-the-middle can rewrite the mixed content to inject JS that steals sensitive information already in the page.
With cafés and the like providing free wireless access, man-in-the-middle attacks are not all that difficult.
https://www.eff.org/pages/how-deploy-https-correctly gives a good explanation:
When hosting an application over HTTPS, there can be no mixed content; that is, all content in the page must be fetched via HTTPS. It is common to see partial HTTPS support on sites, in which the main pages are fetched via HTTPS but some or all of the media elements, stylesheets, and JavaScript in the page are fetched via HTTP. This is unsafe because although the main page load is protected against active and passive network attack, none of the other resources are. If a page loads some JavaScript or CSS code via HTTP, an attacker can provide a false, malicious code file and take over the page's DOM once it loads. Then, the user would be back to a situation of having no security. This is why all mainstream browsers warn users about pages that load mixed content. Nor is it safe to reference images via HTTP: What if the attacker swapped the Save Message and Delete Message icons in a webmail app?
You must serve the entire application domain over HTTPS. Redirect HTTP requests with HTTP 301 or 302 responses to the equivalent HTTPS resource.
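A minimal version of that last recommendation, redirecting every plain-HTTP request to its HTTPS equivalent, could look like this sketch (Node/TypeScript; the fallback hostname is a placeholder):

```ts
// Redirect every plain-HTTP request to its HTTPS equivalent with a 301.
import http from "node:http";

http
  .createServer((req, res) => {
    const host = req.headers.host ?? "example.com"; // fallback is a placeholder
    res.writeHead(301, { Location: `https://${host}${req.url ?? "/"}` });
    res.end();
  })
  .listen(80); // binding to port 80 typically requires elevated privileges
```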
The problem is that trying to optimize an application without understanding its performance, without metrics, is just wrong. That is what leads devs to leave an API unencrypted, thinking they're eking out another 10 ms of performance. Simply put, the best way to balance security concerns against performance is to worry about security first, get some load from real customers (not whiteboard stick figures being obsessed over by some architect), and gather real metrics from your code when you suspect performance might be an issue. I have a weird feeling that it won't be security-related.
You need to gather some evidence about the alleged performance issues of SSL before you leap. You might get quite a surprise.
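One low-effort way to gather that evidence is to time the phases of a single HTTPS request and see what the TLS handshake actually costs on your network (the URL is just an example):

```ts
// Rough timing of DNS lookup, TCP connect and TLS handshake for one request.
import https from "node:https";

const start = process.hrtime.bigint();
const mark = (label: string) =>
  console.log(label, Number(process.hrtime.bigint() - start) / 1e6, "ms");

const req = https.get("https://example.com/", (res) => {
  res.on("data", () => {}); // drain the body
  res.on("end", () => mark("response complete"));
});

req.on("socket", (socket) => {
  socket.on("lookup", () => mark("dns lookup"));
  socket.on("connect", () => mark("tcp connect"));
  socket.on("secureConnect", () => mark("tls handshake"));
});
```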