Accessing the body of the backend response in Varnish VCL

Accessing the body of the backend response in Varnish VCL - varnish

My Varnish VCL code needs to make a simple GET request into a backend system and decide if the requested resource is accessible or not. Unfortunately, the backend system always returns 200, and I must examine the actual body of the response to decide.
Is there a way to access the response body (which is under 1KB) in VCL and do a substring search on it?
I am trying to avoid implementing a proxy service just for this feature.
P.S. For now I have to stick with Varnish 2.1 (Fastly)

Quick answer: no. Fastly's Varnish has diverged a lot from mainline, so you are basically stuck with what they provide and response body analysis isn't part of it.
Varnish 6.0 and 6.1 internals make that easier to build, but at the moment the vmod (xbody) you need is only available as a commercial product.

Related

Serving a HTTP request response as a dialog response - Composer Framework

We are developing a chatbot to handle internal and external processes for a local authority. We are trying to display contact information for a particular service from our api endpoint. The HTTP request is successful and delivers, in part, exactly what we want but there's still some unnecessary noise we can't exclude.
We specifically just want the text out of the response ("Response").
Logically, it was thought all we need to do is drill down into ${dialog.api_response.content.Response} but that fails the HTTP request and ${x.content} returns successful but includes Tags, response and the fields within 1.
Is there something simple we've missed using composer to access what we're after or do we need to change the way our endpoint is responding 2? Unfortunately the MS documentation for FrwrkComp is lacking to say the very least.
n.b. The response is currently set up as a (syntactically) SSML response, this is just a test case using an existing resource.
Response in the Emulator
Snippet from FwrkComp

Turns out it was the first thing I tried just syntactically correct. For the case of the code given it was as simple as:
${dialog.api_response.content[0].Response}

Is it possible to identify which client sent a HTTP request?

Is it possible to identify the client / library which sent a HTTP request?
I am trying to fetch some data via an API and it is possible to query the API via cURL and python, but when I try to use node (doesn't matter which library, axios requests, unirest, native, ...) or wget I get a proprietary error back from the backend.
Now I am wondering, if the backend is able to identify, which library I am using?
More information:
The requests are exactly the same, so no way to distinguish them
The user-agent header field is set and overwritten for all requests
I already tried to monitor the traffic in wireshark, but couldn't find any differences with the packets on HTTP layer (only the order of the header fields is different, that according to the standard this shouldn't make a difference)

It turns out that the problem was TLS fingerprinting.
See: https://httptoolkit.tech/blog/tls-fingerprinting-node-js/

Nodejs uses google V8 JS engine, V8 based http request clients will not allow you to override headers that would compromise 'web safety', so for example if you are setting "Origin, Host, Referrer" headers, node might refuse to do so. I had the same issue previously.
Un-opinionated http clients, such as the ones written in C++(curl) and python won't 'web safety' check your requests, so that is what is causing the difference in behavior.
In my case I used a C++ library that I called from javascript to make my 'unsafe' requests and the problem was solved.

How to prevent ServiceStack from leaking private server information during 403 Forbidden Response

Servicestack Version: 3.9.71.0
Target Framework: .NET 3.5
Program background: has been in production use for over 3.5 years
Recently due to a customer security audit items were brought to our attention. All but one have been eliminated as IIS configuration changes.
The last item identified describes a situation in which the probing software accessed an endpoint without the proper authentication. This was fine and the expected result was the 403 Forbidden. The unexpected result was that the response body is displaying certain internal information of the server.
Based on quite a few articles I have searched it seams the the response body information being returned is a result of how Servicestack my be configured.
I realize this is a fairly older version of Service Stack. My preference would be to identify an IIS setting to override a forbidden response. Aside from that an option to just return a status code of 403 without the additional information. The third would be to create and use a custom 403 response object to control what is revealed.
Any guidance or help would surely be appreciated.. Thank you in advance.

ServiceStack v3 is a very old version of ServiceStack last updated in 2013. If you need to make any changes you'll need to create a custom build from its Sources
Looking at the v3 sources for how it resolves the ForbiddenHttpHandler:
ForbiddenHttpHandler = config.GetCustomErrorHttpHandler(HttpStatusCode.Forbidden);
It looks like you'll be able to override what HttpHandler is used by overriding the CustomHttpHandlers, e.g:
EndpointHostConfig.Instance.CustomHttpHandlers[HttpStatusCode.Forbidden] = MyHandler {...}

How HTTP response is generated

I'm fairly new to programming and this question is about making sure I get the HTTP protocol correctly. My issue is that when I read about HTTP request/response, it looks like it needs to be in a very specific format with a status code, HTTP version number, headers, a blank line followed by the body.
However, after creating a web app with nodejs/express, I never once had to actually write code that made an HTTP response in this format (I'm assuming, although I don't know for sure that other frameworks like ruby on rails or python/Django are the same). In the express app, I just set up the route handlers to render the appropriate pages, when a request was made to that route.
Is this because express is actually putting the response in the correct HTTP format behind the scenes? In other words, if I looked at the expressJS code, would there be something in that code that actually makes an HTTP response in the HTTP format?
My confusion is that, it seems like the HTTP request/response format is so important but somehow I never had to write any code dealing with it for a node/express application. Maybe this is the entire point of a framework like express... to take out the details so that developers can deal with business logic. And if that is correct, does anyone ever write web apps without a framework to do this. Would you then be responsible for writing code that puts the server's response into the exact HTTP format?

I'm fairly new to programming and this question is about making sure I get the HTTP protocol correctly. My issue is that when I read about HTTP request/response, it looks like it needs to be in a very specific format with a status code, HTTP version number, headers, a blank line followed by the body.
Just to give you an idea, there are probably hundreds of specifications that have something to do with the HTTP protocol. They deal with not only the protocol itself, but also with the data format/encoding for everything you send including headers and all the various content types you can send, authentication schemes, caching, status codes, URL decoding, etc.... You can see some of the specifications involved just by looking here: https://www.w3.org/Protocols/.
Now a simple request and a simple text response could get away with only knowing a few of these specifications, but life is not always that simple.
Is this because express is actually putting the response in the correct HTTP format behind the scenes? In other words, if I looked at the expressJS code, would there be something in that code that actually makes an HTTP response in the HTTP format?
Yes, there would. A combination of Express and the HTTP library that is built into node.js handle all the details of the specification for you. That's the advantage of using a library/framework. They even handle different versions of the protocol and feedback from thousands of other developers have helped them to clean up edge case bugs. A good library/framework allows you to still control any detail about the response (headers, content types, status codes, etc..) without making you have to go through the detail work of actually creating the exact response. This is a good thing. It lets you write code faster and lets you ride on the shoulders of others who have already figured out minutiae details that have nothing to do with the logic of your app.
In fact, one could say the same about the TCP protocol below the HTTP protocol. No regular app developer wants to write their own TCP stack. Instead, you just want a working TCP stack that you can use that's already been tuned and debugged for you.
However, after creating a web app with nodejs/express, I never once had to actually write code that made an HTTP response in this format (I'm assuming, although I don't know for sure that other frameworks like ruby on rails or python/Django are the same). In the express app, I just set up the route handlers to render the appropriate pages, when a request was made to that route.
Yes, this is a good thing. The framework did the detail work for you. You just call res.setHeader(), res.status(), res.cookie(), res.send(), res.json(), etc... and Express makes the entire response for you.
And if that is correct, does anyone ever write web apps without a framework to do this. Would you then be responsible for writing code that puts the server's response into the exact HTTP format?
If you didn't use a framework or library of any kind and were programming at the raw TCP level, then yes you would be responsible for all the details of the HTTP protocol. But, hardly anybody other than library developers ever does this because frankly it's just a waste of time. Every single platform has at least one open source library that does this already and even if you were working on a brand new platform, you could go get an open source body of code and port it to your platform much quicker than you could write all this yourself.
Keep in mind that one of the HUGE advantages of node.js is that there's an enormous body of open source code (mostly in NPM and Github) already prepackaged to work with node.js. And, because node.js is server-side where code memory isn't usually tight and where code just comes from the local hard disk at server init time, there's little downside to grabbing a working and tested package that does what you already need, even if you're only going to use 5% of the functionality in the package. Or, worst case, clone an existing repository and modify it to perfectly suit your needs.

Is this because express is actually putting the response in the
correct HTTP format behind the scenes?
Yes, exactly, HTTP is so ubiquitous that almost all programming languages / frameworks handle the actual writing and parsing of HTTP behind the scenes.
Does anyone ever write web apps without a framework to do this. Would
you then be responsible for writing code that puts the server's
response into the exact HTTP format?
Never (unless you're writing code that needs very low level tweaking of HTTP code or something)

Is it possible for Varnish to examine the content of a request (not just headers) in vcl_fetch and react?

I know that the default Varnish vcl_fetch looks at beresp.ttl and beresp.http.* to reference the HTTP headers returned from the backend, but is it possible to examine the content of the response also? Our backend sometimes fails with junk HTML but with a status of 200 OK. We'd like to be able to run a regex on the result and retry if possible.
I understand that versions of Varnish <= 3.0 don't stream anyway and download the entire object before passing to the client, but I can't find the appropriate field in beresp in the documentation - I'm looking for something like beresp.http.content

Yes and no. It's accessible, but only through inline C, not VCL configuration (to the best of my knowledge). However, it's not easy to do and not really recommended due to the additional overhead of parsing body text. That said, you can see an attempt at something like what you're looking for here: rewrite vmod for varnish 3
If your junk HTML responses are of a specific length, you can retry the request based on the response's Content-Length header. Alternatively, you might consider adding client-side JS to evaluate the HTML and make an AJAX request to a URL to clear the cache of any junk pages. Lastly, if you know that only a specific subset of your site that returns invalid results, you can try proxying those URLs through something like OpenResty with LuaJIT or nginx with the subs module enabled, and do the body parsing there.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string