Consider a request for...
http://www.foo.com/bar?x=1&y=2
... and a subsequent request for...
http://www.foo.com/bar?y=2&x=1
Will a web browser consider them the same for caching purposes?
Yep a browser never keeps track of the order of get parameters to handle cache
Related
I am writing a backend for my application that will accept query parameters from the front end, and then query my DB based on these parameters. This sounds to me like it should be a GET request, but since I have a lot of params that I'm passing with some of them being optional I think it would be easiest to do a POST request and send the search params in a request body. I know I can convert my params to a query string and append it to my GET request, but there has to be a better way because I will be passing different data types and will end up having to parse the params on the backend anyways if I do it this way.
This depends heavily on the context, but I would prefer using GET request in your scenario.
What Request Method should I use
According to the widely accepted convention, one uses:
GET to read existing data
POST to create something new
More details can be found here: https://www.restapitutorial.com/lessons/httpmethods.html
How do I pass the parameters
Regarding the way to pass parameters, it is a less obvious thing. Unless there's something sensitive in the request parameters, it is perfectly fine to send them as part of URL.
Parameters may be either part of path:
myapi/customers/123
or a query string:
myapi?customer=123
Both options are feasible, and I'd say a choice depends heavily on the application domain model. One popular rule of thumb is:
use "parameters as a part of a path" for mandatory parameters
use "parameters as a query string" for optional parameters.
I'd recommend using POST in the case where there are a lot of parameters/options. There are a few of reasons why I think it's better than GET:
Your url will be cleaner looking
You hide internal structure from the user (it's still visible if they use the Developer Tools of the browser though)
People can't easily change the options to adjust your query. Having it in the url is simple to just modify and reload with other values. It's more work to do this as a POST.
However, if it's of any use that the URL you end up with can be bookmarked or shared, then you'd want all parameters encoded as part of the query, so using GET would be best in that case.
Another answer stated that POST should be used for creating something new, but I disagree. That might apply to PUT, but it's perfectly fine to use POST to allow more complex structures to be passed even when retrieving existing data.
For example, with POST you can send a JSON body object that has nested structure. This can be very handy and would be difficult to explode into a traditional GET query. You also have to worry about URL-encoding your data then decoding it when receiving it, which is a hassle.
For simple frontend to backend communication you don't really need REST to start with as it targets cases where the server is accessed by a plethora of clients not under your control or a client has to access plenty of different servers and should work with all of them. REST should be aimed for if you see benefit in a server that can evolve freely in future without having to fear breaking clients as they will adept to changes quite easily. Such strong properties however come at its price in terms of development overhead and careful designing. Don't get me wrong, you can still aim for a REST architecture, but for such a simple application-2-backend scenario this sounds like an overkill.
In a REST architecture usually a server will tell clients how it wants to receive input data. Think of HTML forms where the method and enctype attributes specify which HTTP method to use and to which representation format the input to convert to. Which HTTP method to use depends on the use case actually. If a server constantly receives the same request for the same input parameters and calculating the result may be costly, then caching the response once and serving further requests from that cache might take away a lot of unnecessary computation overhead from the server. I.e. the BBC claims that the cache is the single most important technology in keeping sites scalable and fast. I once read that they cache most articles for only a minute but this is sufficient enough to spare them form retrieving the same content thousands and thousands of times again and again, freeing up the resources for other requests or tasks. It is no miracle that caching also belongs to one of the few constraints REST has.
HTTP by default will allow caches to store response representations for requested URIs (including any query, path or matrix parameters) if requested via safe operations, such as HEAD or GET requests. Any unsafe operation invoked, however, will lead to a cache invalidation and therefore the removal of any stored representations for that target URI. Hence, any followup requests of that URI will reach the server in order to process a response for the requesting client.
Unfortunately caching isn't the only factor to consider when to decide between using GET or POST as also the current representation format the client currently processes has an influence on the decision. Think of a client processing the previous HTML response received from a server. The HTML response contains a form that teaches a client what fields the server expects as input as well as the choices a client can make for certain input parameters. HTML is a perfect example where the media-type restricts which HTTP methods are available (GET as default method and POST are supported) and which not (all of the other HTTP methods). Other representation formats might only support POST (i.e. while application/soap+xml would allow for either GET or POST (at least in SOAP 1.2), I have never seen GET requests in reality and so everything is exchanged with POST).
A further point that may prevent you from using GET requests is a de facto limitation on the URI length most HTTP implementations have. If you exceed this limitations some HTTP frameworks might not be able to process the message exchanged. On looking at the Web, however, one might find a little workaround to such a limitation. In most Web shops the checkout area is usually split into different pages where each page consists of a form that collects some input like address information, bank or payment data and further input that as a whole act as kind of wizard to guide the user through the payment process. Such a wizard style could be implemented in this case as well. Parts of the request are sent via POST to a dedicated endpoint that takes care of collecting the data and on the final "page" of the wizard the server will ask for a final confirmation on the collected data and uses that resource as GET target. This way the response remains cacheable even though the input data exceeded the typical URL limitation imposed by some HTTP frameworks.
While the arguments listed by Always Learning aren't wrong, I wouldn't rely on those from a security standpoint. While it may filter out people with little knowledge, it won't hinder the ones for long with knowledge (and there are plenty out there) to modify the request before sending it to your server. So simply recommending using PUT as a way to making user edits harder feels odd to me.
So, in summary, I'd base the decision whether to use POST or GET for sending data to the server mainly on the factor whether the response should be cacheable, as it is often requested, or not. In cases where the URI might get so large that certain HTTP frameworks may fail processing the request you are basically forced to use POST anyway unless you can split the actual request into multiple tinier requests which act as wizard for the data collection until a final confirmation request triggers the actual final HTTP call.
I want to push to Varnish specific url to cache it after processing. Flow:
processing image
when finished "push" url reference to the image to varnish to cache it
I do not want to wait for requests by clients to cache it then. It should be ready after processing to return it with high performance. Is it possible?
I can send an internal request GET like a standard client and make it cached, but I would prefer defining i.e PUT request in varnish config, and make it cached without returning it in that process.
Your only option is an internal HEAD (better than GET; it will be internally converted to a GET by Varnish when submitting the request to the backend side). The PUT approach is not possible, at least not without implementing a VMOD for it, and it probably won't be a simple one.
I'm wondering if my (possibly strange) use case is possible to implement in Varnish with VCL. My application depends on receiving responses from a cacheable API server with very low latencies (i.e. sub-millisecond if possible). The application is written in such a way that an "empty" response is handled appropriately (and is a valid response in some cases), and the API is designed in such a way that non-empty responses are valid for a long time (i.e. days).
So, what I would like to do is configure varnish so that it:
Attempts to look up (and return) a cached response for the given API call
On a cache miss, immediately return an "empty" response, and queue the request for the backend
On a future call to a URL which was a cache miss in #2, return the now-cached response
Is it possible to make Varnish act in this way using VCL alone? If not, is it possible to write a VMOD to do this (and if so, pointers, tips, etc, would be greatly appreciated!)
I don't think you can do it with VCL alone, but with VCL and some client logic you could manage it quite easily, I think.
In vcl_miss, return an empty document using error 200 and set a response header called X-Try-Again in the default case.
In the client app, when receiving an empty response with X-Try-Again set, request the same resource asynchronously but add a header called X-Always-Fetch to the request. Your app does not wait for the response or do anything with it once it arrives.
Also in vcl_miss, check for the presence of the same X-Always-Fetch header. If present, return (fetch) instead of the empty document. This will request the content from the back end and cache it for future requests.
I also found this article which may provide some help though the implementation is a bit clunky to me compared to just using your client code: http://lassekarstensen.wordpress.com/2012/10/11/varnish-trick-serve-stale-content-while-refetching/
I'm developing a web application in which all dynamic content is retrieved as JSON with Ajax requests. I'm considering whether I should protect GET API calls from being invoked from different origins?
GET requests do not modify state and a common wisdom is that they do not require CSRF protection. But I wonder if there are no corner cases in which browser leaks the result of such requests to a different origin site?
For example, if a different origin site GETs /users/emails as script, css or img, is it possible that a browser would leak resulting json to the calling site (for example via javascript onerror handler)?
Do Browsers give strong enough guarantees that a content of a cross origin JSON response won't be leaked? Do you think protecting GET request against cross origin calls makes sense or is it overkill?
You have nailed a corner case and yet highly relevant issue. Indeed, there is this possibility, and it's called JSON Inclusion or Cross Site Scripting Inclusion or Javascript Inclusion, depending on who you refer to. The attack is, basically, doing a on an evil site, and then accessing the results via javascript once the js engine has parsed it.
The short story is that ALL your JSON responses have to be contained in an Object, not an Array or JSONP (so: {...}) and for better measure you should start all responses with a prefix (while(1), for(;;) or a parser breaker). Look at facebook's or google's JSON responses to have a live example.
Or, you can make your URLs unguessable by using a CSRF protection - both approach works.
No:
This is not a CSRF issue, as long as you're returning pure JSON and your GET's are side affect free, it DOES NOT have to be csrf protected.
what Paradoxengine mentioned is another vulnerabilty: if you are using JSONP it is possible for an attacker to read the JSON sent to an authenticated user. Users of very old browsers (IE 5.5) can also be attacked in this way even using regular JSON.
You can send requests to a different domain (which is what CSRF attacks do), but you can't read the responses.
I learn this in another stack overflow question from here It seems like I understand CSRF incorrectly?
hope this help you understand the question.
I'm afraid I am fairly new to varnish but I have a problem whch I cannot find a solution to anywhere (yet): Varnish is set up to cache GET requests. We have some requests which have so many parameters that we decided to pass them in the body of the request. This works fine when we bypass Varnish but when we go through Varnish (for caching), the request is passed on without the body, so the service behind Varnish fails.
I know we could use POST, but we want to GET data. I also know that Varnish CAN pass the request body on if we use pass mode but as far as I can see, requests made in pass mode aren't cached. I've already put a hash into the url so that when things work, we will actually get the correct data from cache (as far as the url goes the calls would otherwise all look to be the same).
The problem now is "just" how to rewrite vcl_fetch to pass on the request body to the webserver? Any hints and tips welcome!
Thanks in advance
Jon
I don't think you can, but, even if you can, it's very dangerous : Varnish won't store request body into cache or hash table, so it won't be able to see any difference between 2 requests with same URI and different body.
I haven't heard about a VCL key to read request body but, if it exists, you can pass it to req.hash to differenciate requests.
Anyway, request body should only be used with POST or PUT...and POST/PUT requests should not be cached.
Request body is supposed to send data to the server. A cache is used to get data...
I don't know the details, but I think there's a design issue in your process...
I am not sure I got your question right but if you try to interact with the request body in some way this is not possible with VCL. You do not have any VCL variable/subroutine to do this.
You can find the list of variables available in VCL here (or in man vcl) :
https://github.com/varnish/Varnish-Cache/blob/master/lib/libvcl/generate.py#L105
I agree with Gauthier, you seem to have a design issue in your system.
'Hope that helps.