Multiple URL purging issue in Varnish

I have an issue with Varnish purging.
Our application is very dynamic, so an event on Object A will generate 10,000 purges, because Object A's info is present on all those pages. (Object A is a seller's stats; the pages are ad pages.)
We are managing this with asynchronous HTTP PURGE calls to Varnish from the PHP code, using cURL. So we end up with 10,000 HTTP calls.
The URLs cannot be calculated, so a regex is not an option.
I want to ask you guys: is there any possibility in Varnish to do some batch purging (over the HTTP interface)?
If not, what options have you tested that work in a very dynamic application, where models and events affect a lot of your pages?
Thanks in advance
Nabil

Running the purges through varnishadm would be your best bet. You could either tunnel commands through SSH (assuming you are dealing with a remote Varnish server) or allow remote access from your Web server to the Varnish server.
You can easily write your own shell script to run a batch purge using varnishadm or you could take a look at Thinner, which is a Ruby based purger written to do exactly what you're looking for.
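For a sense of what such a script can look like, here is a minimal batch-purge sketch in Python; the admin address, secret file and path list are placeholders, and it shells out to the varnishadm binary, which must be able to reach your Varnish instance:

import subprocess

# Paths to invalidate; in practice your PHP code would emit this list
# when an event on Object A fires.
paths = ["/ads/1", "/ads/2", "/ads/3"]

for path in paths:
    # One ban per path. "ban" is a standard varnishadm/CLI command;
    # adjust -T and -S to match your admin interface and secret file.
    subprocess.run(
        ["varnishadm", "-T", "127.0.0.1:6082", "-S", "/etc/varnish/secret",
         "ban", "req.url", "==", path],
        check=True,
    )

Bans are evaluated lazily against cached objects, so queueing thousands of them is considerably cheaper than thousands of synchronous HTTP PURGE round trips.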
The obvious alternative, which you have most likely considered already, is to rewrite your app to include Object A in the URL or in a custom header (for example X-Object: A), so you can do the ban based on that header:
sub vcl_recv {
    # req.request is Varnish 3 syntax; in Varnish 4+ this is req.method.
    if (req.request == "BAN") {
        # Invalidate every cached object whose X-Object response header
        # matches the X-Object header of the incoming BAN request.
        ban("obj.http.x-object == " + req.http.x-object);
    }
}
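With that VCL in place, the application only needs one request per object tag instead of one per URL; a hedged Python equivalent of the cURL call (the Varnish host is a placeholder):

import requests

# Ban everything tagged with X-Object: A in a single request.
resp = requests.request(
    "BAN",
    "http://varnish.example.com/",
    headers={"X-Object": "A"},
)
resp.raise_for_status()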

Related

API Platform with alternative Runtime, Caddy, Vulcain, Cache ecosystem

Currently I'm investigating a setup backed by api-platform with the following goals:
the PHP backend MUST yield minimal resource payloads, thus I do not want to embed relations at all
the PHP backend SHOULD be able to run in alternative runtimes, e.g. Swoole
the webserver should push related resources via HTTP/2 Push, leveraging the built-in Vulcain support of the api-platform distribution
I cannot find many resources about those setups - at least not in a form that answers the subsequent questions sufficiently.
My starting setup was simply based on the api-platform distribution 2.6.8
So, until now I've learned the following things:
out of the box, the Caddy + HTTP/2 Push setup works with the PHP container being based on php:8.1-fpm-alpine - Caddy is obviously talking to it directly via php_fastcgi
when I was fooling around with the currently available cache-handler, I was able to get the HTTP cache working, but I struggled to find any information about how cache invalidation works. The api-platform docs mostly focus on Varnish; there is also only a VarnishPurger shipped in the api-platform core. Writing a custom one should not be that hard if the Caddy cache-handler allows BAN requests or something similar - where can I find info about that? I see that the handler is based on Souin, but as unfamiliar as I am, I have no clue how (and if) Souin supports cache invalidation at all.
when changing the PHP container (in my current testing scenario) to be based on Swoole, php_fastcgi cannot be used in Caddy - instead I ended up using reverse_proxy (as described in the Vulcain docs), which basically works and serves proper HTTP responses, but does not push any resources requested with Preload headers (as I said, it worked when the PHP backend was based on PHP-FPM). How can I debug what happens here? Caddy does not yield any info about the push handling - nor does the Vulcain Caddy module.
Long story short(er), to sum up my questions:
how can I figure out why Caddy + Vulcain is not working in a reverse_proxy setup?
is the current state of the Caddy cache handler functional / supported by the api-platform distribution?
how can I implement/support BAN requests (or other fine-grained cache invalidation) for the Caddy cache handler?
Souin supports invalidation using the PURGE HTTP method. I already wrote a PR to add Souin support to the api-platform/core project, but they are busy with the v3.0 release. Maybe in the near future they'll review and probably merge it, I dunno. But if you use a decorator on the Varnish purger and use the code I wrote in the PR, you'll be able to automatically purge the endpoints associated with the base route.
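For illustration, issuing such a PURGE from Python looks like this; the host and path are placeholders, and how Souin maps the request to cached entries is configuration-dependent, so check its docs:

import requests

# Ask the Souin-backed Caddy to drop the cached representation of one endpoint.
resp = requests.request("PURGE", "http://caddy.example.com/api/books/1")
print(resp.status_code)  # expect a 2xx if the purge was accepted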

How to extract value from URL and check cache to load data in varnish

I have a scenario where my URL will either contain a delimited list of values or a single value,
i.e. /api/parameters/XXXXXXXXXX?tables=x0 or tables=x0;x1;x2.
Now, based on this URL, I want Varnish to check whether the URL contains multiple table values; if it does, split them out and pass each table name in a separate URL (/api/parameters/XXXXXXXXXX?tables=x0, /api/parameters/XXXXXXXXXX?tables=x1, /api/parameters/XXXXXXXXXX?tables=x2), either to the cache or, on a miss, to the backend server.
Then, based on those responses, the results need to be combined and returned to the client.
My questions here are:
How do I segregate the values from the URL and pass a modified URL to the Varnish cache or backend?
After the results come back, how do I return them as a combined JSON object in the sequence in which they were originally requested with the delimiter (i.e. x0 result;x1 result;x2 result)?
It is possible to turn a single request into multiple subrequests in Varnish. Unfortunately this cannot be done with the open source version, only with the Enterprise version.
vmod_http
https://docs.varnish-software.com/varnish-cache-plus/vmods/http/ describes how you can perform HTTP calls from within Varnish using vmod_http.
By sending HTTP requests to other URLs through Varnish, you can get multiple objects out of the cache and aggregate them into a single response.
No looping
The fact that Varnish doesn't have loops makes matters a bit more complicated. You'll have to set an upper limit on the number of values the tables querystring parameter can hold, and you'll have to check the values using individual if-statements.
Returning the combined JSON output
Once you have fetched the results from the various URLs, you can create a JSON string and return it via return(synth(200, req.http.json)), where req.http.json contains the JSON string.
This will create a synthetic response.
In Varnish Enterprise it is also possible to cache synthetic output. See https://docs.varnish-software.com/varnish-cache-plus/vmods/synthbackend/ to learn more about vmod_synthbackend.
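If Enterprise is not an option, the same fan-out can be done at the application layer in front of open source Varnish; a hedged Python sketch (host is a placeholder, URL shape taken from the question) that keeps each single-table URL individually cacheable and preserves the requested order:

import requests

BASE = "http://varnish.example.com/api/parameters/XXXXXXXXXX"  # placeholder host

def fetch_tables(tables_param):
    # Split e.g. "x0;x1;x2" and fetch each table through the cache.
    results = []
    for table in tables_param.split(";"):
        # Each single-table URL is a separate, cacheable object in Varnish.
        resp = requests.get(BASE, params={"tables": table})
        resp.raise_for_status()
        results.append(resp.json())
    return results

# Combined output, in the order originally requested:
combined = fetch_tables("x0;x1;x2")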
Varnish Enterprise disclaimer
The solution I suggested in my answer uses Varnish Enterprise, the commercial version of Varnish. It extends Varnish capabilities with additional VMODs and features, which you can read about here. One easy way to try it out without upfront licensing payments, if you’re interested, is to spin up an instance on cloud infrastructure:
Varnish Enterprise on AWS
Varnish Enterprise on Azure
Varnish Enterprise on GCP

How to set SSL versions in script when there are multiple URLs in a concurrent group of requests in a Single script?

There are 2 different URLs in a script that I have recorded, and each uses a different version of SSL. There is a concurrent group inside the script which has requests to both URLs. How do I set the SSL version for them without removing the concurrency part?
I have tried using WinInet mode for replay, which solved the issue. But I need to measure the response time for each URL, and I cannot achieve that in WinInet mode as it doesn't generate the Web Page Diagnostics graph.
I've also tried creating automatic transactions, but I couldn't see any of them in the results summary.
If you have access to the servers involved, then enable the time-taken HTTP log field. If you are running IIS, which there is a good chance of with WinInet, then the default log format for IIS will give you what you need.
At the conclusion of your tests, pull the logs. Use Microsoft logparser (staying with the Microsoft theme), to pull the min, max, avg time-taken values, grouped by request and filtered on the IP addresses of your load generators.
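As an illustration of that step, a hedged Python sketch that shells out to Log Parser; the log path and generator IP are placeholders, and the SQL-like query uses standard IIS W3C field names:

import subprocess

# Min/max/avg time-taken per request, filtered to one load generator's IP.
query = (
    "SELECT cs-uri-stem, MIN(time-taken), MAX(time-taken), AVG(time-taken) "
    "FROM C:\\logs\\u_ex*.log "
    "WHERE c-ip = '10.0.0.15' "
    "GROUP BY cs-uri-stem"
)
subprocess.run(["LogParser.exe", query, "-i:IISW3C", "-o:CSV"], check=True)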
It would be interesting to know which SSL versions you have.
The following function lets you set the SSL version before the URL call:
web_set_sockets_option("SSL_VERSION", "put your tls version here");
Accepted values are TLS1, TLS1.1, TLS1.2 and more.
See the help by hitting F1 on the function to get more information.

How to prevent 3rd party services from using my API?

I have developed a front-end interface using Ajax (AngularJS) and HTML5. Right now, I send an HTTP GET request to my backend server, which returns some data based on the GET parameters.
Since the URL is exposed in the JavaScript file, I believe anyone could just use the URL to build their own API client and fetch the data. How can I prevent that?
One way I can think of: instead of sending the request directly to the backend server, an application server could be used (hosting the HTML as well). The Ajax request would then be sent to this server (a PHP script?), which would in turn forward the request to the backend server and return the result to the UI. To keep 3rd-party services out, I could disable cross-origin requests on my application server.
Is this the correct way to solve my problem or are there better ways to do this? I am concerned that this would unnecessarily create another hop (internal though) for requests.
Note: The backend is running Apache Tomcat
In APIs that are not open to the world, the user has to authenticate first in order to use them; see for example https://stripe.com/docs/api#authentication or http://dev.maxmind.com/geoip/geoip2/web-services/ -> Authorization.
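Stripe's scheme, for instance, is plain HTTP Basic auth with the API key as the username and an empty password. A minimal Python sketch of a client authenticating that way (key and URL are placeholders):

import requests

API_KEY = "sk_test_placeholder"  # secret issued per client, never shipped in browser JS

# HTTP Basic auth with the key as username and an empty password,
# as Stripe's API docs describe.
resp = requests.get("https://api.example.com/v1/data", auth=(API_KEY, ""))
resp.raise_for_status()

The server can then reject any request that doesn't carry a key it issued, no matter who discovered the URL.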

Pitfalls of accessing a webserver on 127.0.0.1 from js with a public site

I'm thinking about exploring the idea of having our client software run as a service on a high port and listen for simple http GET requests from 127.0.0.1. The theory is that I would be able to access this service via js from a web page that is served from my site.
1) User installs client software that installs itself as a service and waits for authenticated requests on 127.0.0.1:8080
2) When the user hits my home page, JS on the page makes an XMLHttpRequest to 127.0.0.1:8080 and asks for the status
3) The home page then makes another JS request back to my web server, sending the status that it received.
This would allow my users to upload/download and edit files on a USB attached device in real-time from a browser. Polling could be the fallback method which is close to what we do today.
Has anyone done this and what potential pitfalls are there? Will this even work?
I can't see any potential pitfalls. I do have a couple of points, however.
1/ You probably want to make sure your service only accepts incoming connections from the local machine (127.0.0.1); there is a minimal sketch of this after these points. Otherwise, anyone could look at your JavaScript and figure out that it's talking to [your-ip]:8080. They could then try that themselves from a remote site (security hole).
2/ I wouldn't use port 8080 as it's commonly used for other things (alternate HTTP servers, etc.). Make it configurable and choose a nice high random-type value.
3/ I'm not sure what you're trying to do with point 3, but I think you're trying to send the status back to the user. In which case, why wouldn't the JavaScript on your home page just get the status in a single session and output/update the HTML presented to the user? Your "another js request back to my web server" doesn't make sense to me.
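As promised, a minimal Python sketch of point 1: binding the service to 127.0.0.1 rather than 0.0.0.0 makes the OS itself refuse remote connections, and the port (49501 here, purely illustrative) is the configurable high value point 2 argues for:

from http.server import BaseHTTPRequestHandler, HTTPServer

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer the status poll from the page's JavaScript.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

# Binding to 127.0.0.1 (not 0.0.0.0) rejects remote connections at the OS level.
HTTPServer(("127.0.0.1", 49501), StatusHandler).serve_forever()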
You may not be able to make an XMLHttpRequest to 127.0.0.1, as XMLHttpRequest is usually limited to the same domain the main content is served from. I'm not sure if this restriction applies when the server is on the client's machine. That being said, you could still create a <script> tag with its src pointing to 127.0.0.1, and have the web server return some JavaScript to run. If you only need a simple response, this could work well.
I think it is much better for you to avoid implementing application logic in JavaScript and HTML. Once the user clicks a button on the web page, JavaScript should send a request to your service and let it do the rest of the work.
You could have problems with step 1 (client installs itself) depending on your target user base.
You will need a customised install for each supported environment (Win2K, Vista, Linux, Mac OS 9.0/10.0, etc.).
If your user is on a locked-down work PC, this simply won't be allowed.
To some users this might look distressingly similar to a trojan, unless you explicitly point out you will be installing software that runs as a service.
You didn't mention an uninstall procedure. Users resent "Adobe"-like software which installs itself and provides no sensible uninstall options.
Otherwise the approach is sound, and there are a couple of commercial products out there that use exactly this approach!
