Varnish: rewrite/reroute requests after an API call possible? - varnish

How can I make Varnish work like a switch?
I need to consult an authentication service, passing it details of the original client request. The authentication service checks whether access is permitted and replies simply with a status code and probably some more information in the headers. Based on that status code and header information from the auth service, I would like Varnish to serve content from different backends. Depending on the status code the backend can vary, and I would like to add some additional headers before Varnish fetches the content.
Finally, Varnish should cache the response and reply to the client.

Yes, that's doable using some VCL and VMODs. For example, you could use the cURL VMOD during vcl_recv to trigger the HTTP request against the authentication service, check the response, and then use that information for backend selection and other caching decisions (that part would be just simple VCL). A much better alternative would be the http VMOD, but that one is only available in Varnish Enterprise. In fact, an example similar to what you want to achieve is available in the linked documentation; see the 'HTTP Request' section.
In any case, it would be a good idea to minimise interactions with the authentication service using some high performance caching mechanism. For example, you could use the redis VMOD for that (or even Varnish itself!).
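As a rough sketch of the open-source approach, vcl_recv could call the auth service via the cURL VMOD and pick a backend from the result. The backend addresses, auth-service URL, header names, and status-code mapping below are assumptions for illustration, not part of the question:

```
vcl 4.0;

import curl;

backend content_a { .host = "192.0.2.10"; .port = "8080"; }
backend content_b { .host = "192.0.2.20"; .port = "8080"; }

sub vcl_recv {
    # Ask the auth service about the original request
    # (the service address is a placeholder).
    curl.header_add("X-Original-Url: " + req.url);
    curl.get("http://auth.example.internal/check");

    if (curl.status() == 200) {
        # Access granted: pick a backend and forward extra
        # info returned by the auth service to the fetch.
        set req.backend_hint = content_a;
        set req.http.X-Auth-Info = curl.header("X-Auth-Info");
    } else {
        # Any other status: route to the fallback backend.
        set req.backend_hint = content_b;
    }
    curl.free();
}
```

From here on, normal VCL caching applies: the fetched response is cached and delivered to the client as usual.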

Related

Varnish: make an API request to fetch data and embed it into the response header to the client

I want to send a cached page back to the user. But the problem is that I need to generate a unique VISITOR_ID for every new user and
send it back to the user through headers, so I need to make an API call from the Varnish proxy server to my backend servers to fetch the VISITOR_ID and then append it to the response.
We were previously using Akamai and were able to implement this using the edge workers available there.
I want to know whether such a thing is possible to do in Varnish or not.
Thanks in Advance
Open source solution for HTTP calls
You can use https://github.com/varnish/libvmod-curl and add this VMOD to perform HTTP calls from within VCL. The API for this module can be found here: https://github.com/varnish/libvmod-curl/blob/master/src/vmod_curl.vcc
Commercial solution for HTTP calls
Despite there being an open source solution, I want to mention that there might be a more stable solution that is actually supported and receives frequent updates. See https://docs.varnish-software.com/varnish-cache-plus/vmods/http/ for the HTTP VMOD that is part of Varnish Enterprise.
Generate the VISITOR_ID in VCL
Your solution implies that an HTTP call is needed for every single response. While that is possible through various VMODs, it will result in a lot of extra HTTP calls, unless you cache every variation that includes the VISITOR_ID.
You could also consider generating the unique ID yourself in VCL.
See https://github.com/otto-de/libvmod-uuid for a VMOD that generates UUIDs or https://github.com/varnish/libvmod-digest for a VMOD that generates hashes.
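A minimal sketch with libvmod-uuid, assuming the ID travels in a cookie named visitor_id (the cookie name and attributes are made up for the example):

```
vcl 4.0;

import uuid;

sub vcl_deliver {
    # Only mint a new ID when the client does not already
    # present one; uuid_v4() returns a random UUID string.
    if (req.http.Cookie !~ "visitor_id=") {
        set resp.http.Set-Cookie =
            "visitor_id=" + uuid.uuid_v4() + "; Path=/; HttpOnly";
    }
}
```

Because the ID is generated at delivery time, the cached object itself stays shared between visitors and no backend call is needed.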
Fetch VISITOR_ID from Redis
If you prefer to generate the VISITOR_ID in your origin application, you could use a Key/Value store like Redis to store or generate values.
You can generate the ID in your application and store it in Redis. You could also generate and store it using Lua scripting in Redis.
Varnish can then fetch the key from Redis and inject it in the response.
While this is an approach similar to the HTTP calls, at least we know Redis is capable of keeping up with Varnish in terms of high performance.
See https://github.com/carlosabalde/libvmod-redis to learn how to interface with Redis from Varnish.
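A sketch of the Redis lookup with libvmod-redis, following the README's connection setup; the key naming scheme (keyed on client IP) and the response header name are assumptions:

```
vcl 4.0;

import redis;

sub vcl_init {
    # Connection details are placeholders.
    new db = redis.db(
        location="127.0.0.1:6379",
        type=master,
        connection_timeout=500,
        shared_connections=false,
        max_connections=2);
}

sub vcl_deliver {
    # Fetch the visitor ID from Redis and inject it into
    # the response on the way out.
    db.command("GET");
    db.push("visitor:" + client.ip);
    db.execute();
    if (db.reply_is_string()) {
        set resp.http.X-Visitor-Id = db.get_string_reply();
    }
}
```

The Lua-scripting variant would replace the plain GET with an EVAL that creates the ID atomically when the key is missing.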

RESTful APIs: am I not following the RESTful API definition? One of its constraints is that it should be stateless

Out of the six most important constraints of RESTful APIs, one is that they should be stateless: we should not save any state or variable on the server.
As you can see, I am storing the id in a constant variable. So am I not making a real RESTful API? Please help me.
'Stateless', as it pertains to HTTP, means in a nutshell that an HTTP request should not be interpreted differently depending on what HTTP requests came before it. All the information about the request should be contained in the request itself.
For example, if I open an HTTP connection and log in, and then do another request without closing the TCP connection, the server should not assume I'm still the same user/person. It can only figure that out based on headers such as Authorization or Cookie.
Your const is not even a global constant. It will be re-created for every request.
But even if it were, this probably doesn't matter. If you do a PUT request and it results in something being stored in a database, that is 'state', but it's unrelated to the statelessness of HTTP.

Block http/https requests executed from scripts or any other external sources in node server

In Node, if I use a library like axios and a simple async script, I can send unlimited POST requests to any web server. If I know all the parameters, headers and cookies needed for that URL, I'll get a success response.
Also, anyone can easily make those requests using Postman.
I already use CORS in my Node servers to block requests coming from different origins, but that only works for other websites triggering requests in browsers.
I'd like to know if it's possible to completely block requests from external sources (manually created scripts, Postman, software like LOIC, etc.) in a Node server using Express.
thanks!
I'd like to know if it's possible to completely block requests from external sources (manually created scripts, Postman, software like LOIC, etc.) in a Node server using Express.
No, it is not possible. A well-formed request from Postman or coded with axios in Node.js can be made to look exactly like a request coming from a browser. Your server would not know the difference.
The usual scheme for an API is that you require some sort of developer credential in order to use your API. You apply terms of service to that credential that describe what developers are allowed or are not allowed to do with your API.
Then, you monitor usage programmatically and you slow down or ban any credentials that are misusing the APIs according to your terms (this is how Google does things with its APIs). You may also implement rate limiting and other server protections so that a run-away developer account can't harm your service. You may even black list IP addresses that repeatedly abuse your service.
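As an illustration of the rate-limiting part (independent of Express; the limits and the per-credential key scheme are arbitrary choices for the example), a tiny in-memory sliding-window limiter might look like:

```javascript
// Minimal sliding-window rate limiter (in-memory sketch, no framework).
class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.hits = new Map(); // key -> array of request timestamps
  }

  // Returns true if the request is allowed, false if the key is over limit.
  allow(key, now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Keep only timestamps still inside the window.
    const recent = (this.hits.get(key) || []).filter((t) => t > cutoff);
    if (recent.length >= this.maxRequests) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}

// Example: at most 3 requests per 1000 ms per API credential.
const limiter = new RateLimiter(3, 1000);
console.log(limiter.allow("key-1", 0));    // true
console.log(limiter.allow("key-1", 10));   // true
console.log(limiter.allow("key-1", 20));   // true
console.log(limiter.allow("key-1", 30));   // false (over limit)
console.log(limiter.allow("key-1", 1500)); // true (window slid past old hits)
```

In a real deployment you would key this on the developer credential rather than the IP address, and back it with shared storage so all server instances see the same counts.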
For APIs that you wish for your own web pages to use (to make Ajax calls to), there is no real way to keep others from using those same APIs programmatically. You can monitor their usage and attempt to detect usage that is out-of-line of what your own web pages would do. There are also some schemes where you place a unique, short-use token in your web page and require your web pages to include the token with each request of the API. With some effort, that can be worked around by smart developers by regularly scraping the token out of your web page and then using it programmatically until it expires. But, it is an extra obstacle for the API thief to get around.
Once you have identified an abuser, you can block their IP address. If they happen to be on a larger network (like say a university), their public IP address may be shared by many via NAT and you may end up blocking many more users than you want to - that's just a consequence of blocking an IP address that might be shared by many users.

How to distinguish between HTTP requests sent by my client application and other requests from the Internet

Suppose I have an client/server application working over HTTP. The server provides a RESTy API and client calls the server over HTTP using regular HTTP GET requests.
The server requires no authentication. Anyone on the Internet can send a GET HTTP request to my server. It's Ok. I just wonder how I can distinguish between the requests from my client and other requests from the Internet.
Suppose my client sent a request X. A user recorded this request (including the agent, headers, cookies, etc.) and send it again with wget for example. I would like to distinguish between these two requests in the server-side.
There is no exact solution other than authentication. On the other hand, you do not need to implement username & password authentication for this basic requirement. You could simply generate a random string for your "client" and send it to the API in a custom HTTP header, like:
GET /api/ HTTP/1.1
Host: www.backend.com
My-Custom-Token-Dude: a717sfa618e89a7a7d17dgasad
...
You could distinguish the requests by this custom header and the existence and validity of its value. But I'm saying that 'security through obscurity' is not a real solution.
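On the server side, the check can be as simple as comparing the incoming header value against the shared secret. This is a framework-free sketch; the header name and token value come from the example request above, and would be generated and rotated properly in practice:

```javascript
// Shared secret known only to your client and your server (illustrative value).
const EXPECTED_TOKEN = "a717sfa618e89a7a7d17dgasad";

// Decide whether a request carries the expected custom header.
// Node.js lower-cases incoming HTTP header names.
function isTrustedClient(headers) {
  return headers["my-custom-token-dude"] === EXPECTED_TOKEN;
}

console.log(isTrustedClient({ "my-custom-token-dude": EXPECTED_TOKEN })); // true
console.log(isTrustedClient({}));                                         // false
```

Note that a plain `===` leaks timing information; for anything beyond obscurity, a constant-time comparison and a server-issued, expiring token are preferable.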
You cannot know for sure if it is your application or not. Anything in the request can be made up.
But you can make sure that nobody is using your application inadvertently. For example, somebody may create a JavaScript application and point it to your REST API. The browser sends the Origin header (draft) indicating the application in which the request was generated. You can use this header to filter out calls from applications that are not yours.
However, that somebody may use their own web server as a proxy to your application, allowing them to craft HTTP requests in more detail. In that case, at some point you would be able to pinpoint their IP address and block it.
But the best solution would be to add some degree of authorization. For example, the UI part can ask for authentication via login/password, or just a CAPTCHA to ensure the caller is a person, then generate a token and associate that token with the user session. From that point on, calls to the API have to provide that token; otherwise you must reject them.

Return a synthetic response then fetch and cache object in Varnish?

I'm wondering if my (possibly strange) use case is possible to implement in Varnish with VCL. My application depends on receiving responses from a cacheable API server with very low latencies (i.e. sub-millisecond if possible). The application is written in such a way that an "empty" response is handled appropriately (and is a valid response in some cases), and the API is designed in such a way that non-empty responses are valid for a long time (i.e. days).
So, what I would like to do is configure varnish so that it:
Attempts to look up (and return) a cached response for the given API call
On a cache miss, immediately return an "empty" response, and queue the request for the backend
On a future call to a URL which was a cache miss in #2, return the now-cached response
Is it possible to make Varnish act in this way using VCL alone? If not, is it possible to write a VMOD to do this (and if so, pointers, tips, etc. would be greatly appreciated!)?
I don't think you can do it with VCL alone, but with VCL and some client logic you could manage it quite easily.
In vcl_miss, return an empty document using error 200 (or return (synth(200)) in Varnish 4 and later) and set a response header called X-Try-Again in the default case.
In the client app, when receiving an empty response with X-Try-Again set, request the same resource asynchronously but add a header called X-Always-Fetch to the request. Your app does not wait for the response or do anything with it once it arrives.
Also in vcl_miss, check for the presence of the same X-Always-Fetch header. If present, return (fetch) instead of the empty document. This will request the content from the back end and cache it for future requests.
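In Varnish 4+ VCL, the three steps above could be sketched like this. The header names follow the answer; the reason string used to route the synthetic response and the empty body are assumptions:

```
vcl 4.0;

sub vcl_miss {
    # Background retries from the client set this header themselves,
    # so let them go to the backend and populate the cache.
    if (req.http.X-Always-Fetch) {
        return (fetch);
    }
    # Default case on a miss: reply immediately without
    # waiting on the backend.
    return (synth(200, "Empty"));
}

sub vcl_synth {
    if (resp.status == 200 && resp.reason == "Empty") {
        # Tell the client it should re-request asynchronously.
        set resp.http.X-Try-Again = "1";
        synthetic("");
        return (deliver);
    }
}
```

The missing piece relative to the question is the "queue the request" step: here the client, not Varnish, is responsible for firing the follow-up X-Always-Fetch request that fills the cache.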
I also found this article which may provide some help though the implementation is a bit clunky to me compared to just using your client code: http://lassekarstensen.wordpress.com/2012/10/11/varnish-trick-serve-stale-content-while-refetching/
