How does the browser know which headers to add to requests?

When I type the URL of a site into the browser's address bar, the browser sends a request to fetch the resource at that URL. But when I go to different web sites (google.com, amazon.com, etc.), the requests that initialize the page have different headers for different sites.
Where does the browser get the set of request headers for loading the page, if at the first load it only has the URL of the resource?
For example, when I go to google.com the browser sends these request headers:
:authority: www.google.com
:method: GET
:path: /
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9,ru-RU;q=0.8,ru;q=0.7
cache-control: max-age=0
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: same-origin
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
For amazon.com, the request headers are different:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9,ru-RU;q=0.8,ru;q=0.7
Connection: keep-alive
Host: amazon.com
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36

When you type a URL into the address bar, it needs to be translated into an HTTP request.
So typing www.google.com means you need to GET the default page (/) from that server. That's basically what the first four lines of the first request cover.
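As a rough illustration (my sketch, not part of the original answer), the URL the user typed already contains everything needed to build that request line:
import { URL } from "url";

// Break the typed address into the pieces that end up in the request.
const typed = new URL("https://www.google.com/");

console.log(typed.protocol); // "https:"         -> :scheme (HTTP/2) / the https connection
console.log(typed.host);     // "www.google.com" -> :authority (HTTP/2) / Host (HTTP/1.1)
console.log(typed.pathname); // "/"              -> :path (HTTP/2) / the path in the request line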
The browser also knows which formats it can accept. Mostly we deliver HTML back, so text/html is certainly in there, but it also accepts other formats - including the completely generic */* btw! :-)
Responses are often compressed (with either gzip, deflate or the newer Brotli (br) format), so the browser tells the server which of those it supports in the accept-encoding header.
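To illustrate the server side of that negotiation, here is a minimal sketch (mine, not the answerer's; the port and body are made up) of an origin that only compresses when the client said it could cope with gzip:
import * as http from "http";
import * as zlib from "zlib";

// Toy origin: gzip the body only when Accept-Encoding advertises gzip support,
// and label the result with Content-Encoding so the browser can decode it.
http.createServer((req, res) => {
  const body = "<html><body>Hello</body></html>";
  const acceptEncoding = String(req.headers["accept-encoding"] ?? "");
  if (/\bgzip\b/.test(acceptEncoding)) {
    res.writeHead(200, { "Content-Type": "text/html", "Content-Encoding": "gzip" });
    res.end(zlib.gzipSync(body));
  } else {
    res.writeHead(200, { "Content-Type": "text/html" });
    res.end(body);
  }
}).listen(8080);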
When you installed your browser you also set a default language so we can tell the server that. Some servers will return different content based on this.
Then there are some security headers (I won't go into these as they are quite complicated).
Finally we have the user-agent header. This is basically where the browser tells the server whether it's Chrome, Firefox, or something else. But for historical reasons it's much longer than just "Chrome".
So basically the request headers are things the browser sends to the server to give it more information about the browser and its capabilities. For a request that's just typed into the browser, the request headers will be basically the same no matter what the URL is. Additional requests made by the page - e.g. by JavaScript code - may have different headers if that code adds more.
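For example (a sketch of mine with a made-up endpoint), script-initiated requests can add headers on top of the browser's defaults:
// Hypothetical endpoint: the browser still sends its usual defaults
// (accept-language, user-agent, sec-fetch-*, ...) alongside whatever we add here.
async function search(query: string): Promise<unknown> {
  const response = await fetch("/api/search?q=" + encodeURIComponent(query), {
    headers: {
      "Accept": "application/json",          // narrower than the navigation default
      "X-Requested-With": "XMLHttpRequest",  // a custom header only script can add
    },
  });
  return response.json();
}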
As to the differences between the two example requests you gave:
Google uses HTTP/2 (or QUIC if you're using Chrome, but for now that's basically HTTP/2 as far as this question is concerned). You can see this if you add the optional Protocol column in developer tools.
HTTP/2 makes a few changes compared to HTTP/1.1, namely:
HTTP header names are lower-cased. Technically in HTTP/1 they are case-insensitive, but by convention many tools, like browsers, used title case (capitalising the first letter of each word).
The request line (e.g. GET / HTTP/1.1) is converted into pseudo-headers beginning with a colon (:method: GET, :path: /, etc.).
Host is basically :authority in HTTP/2.
:scheme is basically new in HTTP/2; previously it wasn't explicitly part of the HTTP request and was handled at the connection level.
Connection is defunct in HTTP/2. Even in HTTP/1.1 it defaulted to keep-alive, so the above header was not necessary, but lots of browsers and other clients sent it for historical reasons.
I think that explains all the differences.
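If you want to poke at those pseudo-headers yourself, here is a minimal sketch (mine, using Node's http2 module, not something from the original answer):
import * as http2 from "http2";

// Open an HTTP/2 session and issue a request using pseudo-headers
// instead of an HTTP/1.1 request line plus Host header.
const session = http2.connect("https://www.google.com");
const req = session.request({
  ":method": "GET", // was the method in the request line
  ":path": "/",     // was the path in the request line
  // ":authority" and ":scheme" are filled in from the connect URL
});
req.on("response", (headers) => {
  console.log("status:", headers[":status"]);
});
req.setEncoding("utf8");
req.on("data", () => { /* discard the body */ });
req.on("end", () => session.close());
req.end();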
So how does the browser know whether to use HTTP/2 or HTTP/1.1? That question already has an answer on Stack Overflow, but basically it's decided when the HTTPS session is established: the server advertises that it can support HTTP/2 and the browser chooses to use it.
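That negotiation happens via the TLS ALPN extension. A small sketch of what a client offering both protocols looks like (my example, using Node's tls module):
import * as tls from "tls";

// Offer both protocols during the TLS handshake; the server picks one.
const socket = tls.connect(
  {
    host: "www.google.com",
    port: 443,
    servername: "www.google.com",
    ALPNProtocols: ["h2", "http/1.1"],
  },
  () => {
    // "h2" means the server agreed to HTTP/2; "http/1.1" (or false) means it did not.
    console.log("negotiated:", socket.alpnProtocol);
    socket.end();
  }
);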

Related

Chrome stalls when making multiple requests to the same resource

I'm trying to implement long polling for the first time, and I'm using XMLHttpRequest objects to do it. So far, I've been successful at getting events in Firefox and Internet Explorer 11, but Chrome strangely is the odd one out this time.
I can load one page and it runs just fine. It makes the request right away and starts processing and displaying events. If I open the page in a second tab, one of the pages starts seeing delays in receiving events. In the dev tools window, I see multiple requests whose "Stalled" phase ranges up to 20 seconds. It doesn't happen on every request, but it will usually happen on several requests in a row, and in one tab.
At first I thought this was an issue with my server, but then I opened two IE tabs and two Firefox tabs, and they all connect and receive the same events without stalling. Only Chrome is having this kind of trouble.
I figure this is likely an issue with the way in which I'm making or serving up the request. For reference, the request headers look like this:
Connection: keep-alive
Last-Event-Id: 530
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36
Accept: */*
DNT: 1
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
The response looks like this:
HTTP/1.1 200 OK
Cache-Control: no-cache
Transfer-Encoding: chunked
Content-Type: text/event-stream
Expires: Tue, 16 Dec 2014 21:00:40 GMT
Server: Microsoft-HTTPAPI/2.0
Date: Tue, 16 Dec 2014 21:00:40 GMT
Connection: close
In spite of the headers involved, I'm not using the browser's native EventSource, but rather a polyfill that lets me set additional headers. The polyfill is using XMLHttpRequest under the covers, but it seems to me that no matter how the request is being made, it shouldn't stall for 20 seconds.
What might be causing Chrome to stall like this?
Edit: Chrome's chrome://net-internals/#events page shows that there's a timeout error involved:
t=33627 [st= 5] HTTP_CACHE_ADD_TO_ENTRY [dt=20001]
--> net_error = -409 (ERR_CACHE_LOCK_TIMEOUT)
The error message refers to a patch added to Chrome six months ago (https://codereview.chromium.org/345643003), which implements a 20-second timeout when the same resource is requested multiple times. In fact, one of the bugs the patch tries to fix (bug number 46104) refers to a similar situation, and the patch is meant to reduce the time spent waiting.
It's possible the answer (or workaround) here is just to make the requests look different, although perhaps Chrome could respect the "no-cache" header I'm setting.
Yes, this behavior is due to Chrome locking the cache and waiting to see the result of one request before requesting the same resource again. The answer is to find a way to make the requests unique. I added a random number to the query string, and everything is working now.
For future reference, this was Chrome 39.0.2171.95.
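A sketch of that workaround (the /events URL and the event handling are placeholders, just to show the shape):
// Append a unique value to the query string so Chrome never sees two
// outstanding requests for exactly the same resource (which is what
// triggers the 20-second cache lock).
function poll(lastEventId: number): void {
  const url = "/events?cacheBust=" + Date.now() + "-" + Math.random();
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.setRequestHeader("Last-Event-Id", String(lastEventId));
  xhr.onload = () => {
    // ...process the received events, update lastEventId, then poll again...
    poll(lastEventId);
  };
  xhr.send();
}
poll(0);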
Edit: Since this answer, I've come to understand that "Cache-Control: no-cache" doesn't do what I thought it does. Despite its name, responses with this header can be cached. I haven't tried, but I wonder if using "Cache-Control: no-store", which does prevent caching, would fix the issue.
Adding Cache-Control: no-cache, no-transform worked for me.
I decided to keep it simple: I checked the response headers of a website that did not have this issue and changed my response headers to match theirs:
Cache-Control: max-age=3, must-revalidate
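For reference, here is a sketch of how such response headers could be set on a small Node endpoint (the port and body are made up; pick whichever Cache-Control value matches the approach you chose above):
import * as http from "http";

// Long-polling endpoint with an explicit caching policy.
http.createServer((_req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    // "Cache-Control": "no-store",                  // never cache at all
    // "Cache-Control": "no-cache, no-transform",    // cache, but always revalidate
    "Cache-Control": "max-age=3, must-revalidate",   // cache briefly, then revalidate
  });
  res.end("data: hello\n\n");
}).listen(8080);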

Browser doesn't add max-age=0 to some requests on browser refresh

I am using Firefox and I am requesting several URLs from a server. When I reload the page with F5 or Ctrl+R, the browser re-sends the requests to the server to revalidate the cached responses, setting max-age=0 on the request. This is the desired way for the browser to handle a refresh.
But for some URLs it does not re-send the request; instead it serves the response from its own cache. I want those requests to be revalidated with the origin.
Response headers for this request:
Access-Control-Allow-Origin: *
Cache-Control: public, s-maxage=0, max-age=21600
Content-Encoding: gzip
Content-Length: 167
Content-Type: application/json
Date: Wed, 23 Jul 2014 06:51:35 GMT
Expires: Wed, 23 Jul 2014 12:51:36 GMT
Proxy-Connection: close
Server: lighttpd/1.4.32
Vary: Accept-Encoding
Via: 1.0 roswell:3128 (squid/2.6.STABLE21)
X-Cache: MISS from roswell
X-Cache-Lookup: MISS from roswell:3128
Request headers:
Accept: application/json, text/javascript, */*; q=0.01
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.5
Connection: keep-alive
Host: some-host
Origin: origin
Referer: referer
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0
What could be the reason the browser is not re-sending this request? Please help.
This behaviour has to do with headers, but not those sent by the browser. Browsers aggressively cache content, except content whose max-age has expired and which the server has marked "must-revalidate" in the Cache-Control header. What you are seeing is normal browser behaviour for content the server marked with max-age=21600.
But now, on with your question:
I want those requests to get revalidated by the origin.
If you are using keyboard hotkeys, you have two options to reload the current page (see the "Navigation" section of https://support.mozilla.org/en-US/kb/keyboard-shortcuts-perform-firefox-tasks-quickly): you can reload normally (and make use of all cacheable items from the cache), or reload overriding the cache.
The hotkeys are F5 / CTRL+R for reload using cacheable items, and CTRL+F5 / CTRL+SHIFT+R for reload overriding any cache.
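The server-side alternative implied above is to stop telling the browser the content is fresh for 21600 seconds and instead force revalidation once it is stale. A sketch (mine, not from the original answer; the port and payload are made up):
import * as http from "http";

// Instead of "Cache-Control: public, s-maxage=0, max-age=21600",
// tell the browser it must check back with the origin before reusing the response.
http.createServer((_req, res) => {
  res.writeHead(200, {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache, must-revalidate",
  });
  res.end(JSON.stringify({ ok: true }));
}).listen(8080);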

How does SSL work on redirects?

I am interested in trying to figure out exactly what is going on when a user types in, for example
https://www.bing.com
which lands them on
http://www.bing.com
If you'll notice, www.bing.com apparently doesn't support https, so the page returned has no cert associated with it. Shouldn't the browser complain about this? What's more, when looking at the HTTP headers, I never actually see a redirect or anything else that indicates the page returned is not the https version (I guess I was expecting some indication that this happened).
For another example, gmail does something similar -
I go to https://gmail.com
and I end up on mail.google.com or accounts.google.com depending on whether I'm logged in or not. At least these sites give me a cert, unlike bing, but how come the browser doesn't complain that the URLs are mismatched? It seems like I should also get a cert for gmail.com in that case, right? (The cert on the gmail redirect is good for mail.google.com, but makes no mention, wildcard or otherwise, of gmail.com.)
There's nothing special going on. It's a simple HTTP redirect, but you'll only see it if you ignore the SSL certificate error. (https://www.bing.com currently serves a certificate issued to Akamai.) Remember, once you tell your browser to ignore the cert error, it will generally remember that choice for the rest of the session.
If you instruct your browser to ignore the SSL certificate error, the following happens inside an SSL-encrypted connection:
GET https://www.bing.com/ HTTP/1.1
Host: www.bing.com
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.73 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8,es;q=0.6
HTTP/1.1 302 Moved Temporarily
Server: AkamaiGHost
Content-Length: 0
Location: http://www.bing.com/
Date: Thu, 02 May 2013 22:02:28 GMT
Connection: keep-alive
There's no rule against an HTTPS site redirecting to plain HTTP [1], so the browser just does a normal request for http://www.bing.com. Since we're now on a plain HTTP page, there's nothing to display (warning or otherwise) regarding certificates.
[1] Except in certain situations involving POST requests, where some browsers issue warnings.
The other sites you mention work similarly, except the redirect from gmail.com is to https://mail.google.com. mail.google.com has its own certificate, distinct from https://www.gmail.com's certificate.
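You can watch the redirect yourself with a small sketch like this (mine; note that bing.com's behaviour has changed since this answer was written, so the status and Location you see today will differ):
import * as https from "https";

// Issue the HTTPS request ourselves and inspect the redirect instead of
// letting a browser follow it silently.
https.get("https://www.bing.com/", (res) => {
  console.log("status:", res.statusCode);          // e.g. 301 or 302
  console.log("location:", res.headers.location);  // where we are being sent
  res.resume(); // drain the body
});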

Is passing a username/password in the URL safe for this API?

Reading over
http://getpocket.com/api/docs/
Is it safe to pass a password in the URL query string like this? My understanding is that this is not safe, even though it's HTTPS. Correct?
The API documentation states that you're passing the data over HTTPS. All of the information - whether GET or POST, including the request line and headers - is carried inside the SSL transport, so the URL parameters are encrypted as well. What can't be guaranteed is what your client will retain, or whether some other process exposes information, such as your machine doing a DNS lookup for the host name. Another example: if your browser keeps a history of everything you type into it, including your HTTPS URLs, you may compromise your security.
Below is an example HTTP request; your client will initiate a TCP connection and send something like the following:
GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1
Host: net.tutsplus.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120
Pragma: no-cache
Cache-Control: no-cache
SSL ensures that all of that information is encrypted, along with anything that is sent back. I would say you're safe using this API. The only difference between the GET and POST methods is that with POST the parameters are in the body, whereas with GET they are in the URL (the request line). In both cases all the sensitive information is encrypted.
I agree in principle that it sounds unsafe. URLs can end up in all kinds of funny places in plain text (even over HTTPS), like logs. It would be best to avoid having the password in plain text anywhere.
You should probably talk to the API authors about whether there is an alternative strategy. For example, it looks like some of those methods support both POST and GET, in which case you could possibly POST password details, which is a relatively safe thing to do over an HTTPS connection.
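A sketch of what POSTing the credentials could look like (the endpoint and field names here are made up, not taken from the Pocket API docs):
// The password travels in the request body, which is still encrypted by TLS
// but never appears in the URL, so it cannot leak via logs or browser history.
async function login(username: string, password: string): Promise<void> {
  const response = await fetch("https://example.com/api/auth", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ username, password }).toString(),
  });
  console.log("status:", response.status);
}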

Using the Mobile Tools module in Drupal 6 with Varnish?

Can we use the Mobile Tools module in Drupal 6 with Varnish?
I suspect Varnish will cache the index page and will not allow redirection to the mobile version of the page.
Is there any workaround?
You want your server to return different responses based on the device/browser in use. This means your pages 'vary' based on the User-Agent HTTP request header, and in theory you should instruct any HTTP proxy/cache in between to only use a cached version if the User-Agent string is the same, by adding an HTTP response header:
Vary: User-Agent
However, because browsers like Internet Explorer (unlike Chrome) use many slightly different User-Agent headers, this will completely kill your cache hit ratio. You need a smarter cache to understand that Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0) for your purposes is equal to Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.0; Trident/4.0; InfoPath.1; SV1; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 3.0.04506.30), or any other user-agent string used by a desktop browser.
There are two options for you to solve this with Varnish:
1: Do mobile user-agent detection yourself in Varnish (VCL) logic, the same way Mobile Tools does it. E.g.:
sub vcl_recv {
    if (req.http.User-Agent ~ "(?i)ipad|ipod|iphone|android|mini opera|blackberry|up.browser|up.link|mmp|symbian|smartphone|midp|wap|vodafone|o2|pocket|kindle|mobile|pda|psp|treo") {
        set req.http.X-Device = "mobile";
    }
}
sub vcl_hash {
    # Add the device class to the cache key (Varnish 3+ syntax; Varnish 2 used: set req.hash += req.http.X-Device;)
    hash_data(req.http.X-Device);
}
2: Or, always set a session cookie mobile=true or mobile=false after you've seen the first request, and only serve cached pages for requests with this cookie.
And after googling a bit, you should read: http://fangel.github.com/mobile-detection-varnish-drupal/
