Why are Varnish ban requests not removing/invalidating cached data?

I am trying to debug inconsistent behavior with Varnish.
I have an application in which, when a piece of content is updated, a ban request is issued to Varnish in order to remove that content from the cache and invalidate it. The problem is that this works fine only a few times; in the majority of cases it does not, although I can see the bans in the Varnish log. To rephrase: when I save a piece of content, a ban is issued to Varnish of the form
1374003254.031996 75 req.http.host ~ www.example.com && req.url ~ ^(.*)(?<!\d{1})539250(?!\d{1})
where 539250 is the unique content id present in the URL.
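For reference, an equivalent ban can be issued by hand through the CLI, e.g. (a sketch, assuming the admin port -T :8100 shown below and no CLI secret):
varnishadm -T localhost:8100 'ban req.http.host ~ www.example.com && req.url ~ 539250'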
I logged into the Varnish host and checked the Varnish processes. Executing ps -ef | grep varn gives:
root 8889 1 0 15:19 ? 00:00:00 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -T :8100 -f /etc/varnish/qa.vcl -u varnish -g varnish -h critbit -p http_max_hdr 256 -p thread_pool_min 200 -p thread_pool_max 4000 -p thread_pools 2 -p thread_pool_stack 262144 -p thread_pool_add_delay 2 -p session_linger 100 -p sess_timeout 60 -p listen_depth 4096 -p lru_interval 20 -p ban_lurker_sleep 0.2 -s malloc,1G
varnish 8897 8889 0 15:19 ? 00:00:00 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -T :8100 -f /etc/varnish/qa.vcl -u varnish -g varnish -h critbit -p http_max_hdr 256 -p thread_pool_min 200 -p thread_pool_max 4000 -p thread_pools 2 -p thread_pool_stack 262144 -p thread_pool_add_delay 2 -p session_linger 100 -p sess_timeout 60 -p listen_depth 4096 -p lru_interval 20 -p ban_lurker_sleep 0.2 -s malloc,1G
Is it normal to have two processes?
Then I ran ban.list in the Varnish CLI:
1374003254.031996 75 req.http.host ~ example.com && req.url ~ ^(.*)(?<!\d{1})539250(?!\d{1})
1374003202.365076 224G req.http.host ~ example.com && req.url ~ ^(.*)(?<!\d{1})539250(?!\d{1})
1374003116.772315 83G req.http.host ~ example.com && req.url ~ ^(.*)(?<!\d{1})539250(?!\d{1})
1374002967.450431 267G req.http.host ~ example.com && req.url ~ ^(.*)(?<!\d{1})539250(?!\d{1})
1374002756.701640 187G req.http.host ~ example.com && req.url ~ ^(.*)(?<!\d{1})539250(?!\d{1})
All I want to know is whether there is something wrong that causes the bans not to remove the cached data.

Your varnish process listing looks fine.
Varnish has a management process which starts (and watches over) a child where all the request handling is done. These are the two processes you are seeing.
If you do a lot of bans, you should consider reading the "Smart bans" chapter in the Varnish book. It will help you keep the list of bans shorter.
https://www.varnish-software.com/static/book/Cache_invalidation.html#smart-bans
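The key point is that the ban lurker (enabled here via -p ban_lurker_sleep 0.2) only evaluates bans that reference obj.* variables; bans on req.*, like the ones above, are only tested when a client request actually hits a cached object. A lurker-friendly variant could look like this (a sketch in Varnish 3 syntax; the x-url/x-host helper headers follow the book's example):
# In the VCL, copy request data onto the cached object:
sub vcl_fetch {
    set beresp.http.x-url  = req.url;
    set beresp.http.x-host = req.http.host;
}
# Optionally unset the helper headers in vcl_deliver before responding.
# Then ban on object headers, so the lurker can clean up in the background:
varnishadm -T localhost:8100 'ban obj.http.x-host ~ www.example.com && obj.http.x-url ~ 539250'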

Related

Varnish 6 reload

I've upgraded my Varnish from 6.2.x to 6.6.x. Almost everything works OK, but reload does not.
After "start", ps shows:
root 10919 0.0 0.0 18960 5288 ? Ss 22:38 0:00 /usr/sbin/varnishd -j unix,user=vcache -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -p thread_pools=8 -p thread_pool_min=100 -p thread_pool_max=4000 -p workspace_client=128k -p workspace_backend=128k -l 200m -S /etc/varnish/secret -s malloc,256m -s static=file,/data/varnish_storage.bin,80g
Now I try to reload:
Apr 8 22:42:10 xxx varnishd[10919]: CLI telnet 127.0.0.1 5282 127.0.0.1 6082 Rd auth 0124ef9602b9e6aad2766e52755d02a0d17cd6cfe766304761d21ea058bd8b3b
Apr 8 22:42:10 xxx varnishd[10919]: CLI telnet 127.0.0.1 5282 127.0.0.1 6082 Wr 200 -----------------------------#012Varnish Cache CLI 1.0#012-----------------------------#012Linux,5.4.0-107-generic,x86_64,-junix,-smalloc,-sfile,-sdefault,-hcritbit#012varnish-6.6.1 revision e6a8c860944c4f6a7e1af9f40674ea78bbdcdc66#012#012Type 'help' for command list.#012Type 'quit' to close CLI session.
Apr 8 22:42:10 xxx varnishd[10919]: CLI telnet 127.0.0.1 5282 127.0.0.1 6082 Rd ping
Apr 8 22:42:10 xxx varnishd[10919]: CLI telnet 127.0.0.1 5282 127.0.0.1 6082 Wr 200 PONG 1649450530 1.0
Apr 8 22:42:10 xxx varnishd[10919]: CLI telnet 127.0.0.1 5282 127.0.0.1 6082 Rd vcl.load reload_20220408_204210_11818 /etc/varnish/default.vcl
Apr 8 22:42:15 xxx varnishreload[11818]: VCL 'reload_20220408_204210_11818' compiled
Apr 8 22:42:20 xxx varnishreload[11818]: Command: varnishadm -n '' -- vcl.use reload_20220408_204210_11818
Apr 8 22:42:20 xxx varnishreload[11818]: Rejected 400
Apr 8 22:42:20 xxx varnishreload[11818]: CLI communication error (hdr)
Apr 8 22:42:20 xxx systemd[1]: varnish.service: Control process exited, code=exited, status=1/FAILURE
Apr 8 22:42:20 xxx systemd[1]: Reload failed for Varnish Cache, a high-performance HTTP accelerator.
and now ps shows:
vcache 10919 0.0 0.0 19048 5880 ? SLs 22:38 0:00 /usr/sbin/varnishd -j unix,user=vcache -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -p thread_pools=8 -p thread_pool_min=100 -p thread_pool_max=4000 -p workspace_client=128k -p workspace_backend=128k -l 200m -S /etc/varnish/secret -s malloc,256m -s static=file,/data/varnish_storage.bin,80g
vcache 10959 0.4 0.2 84585576 23088 ? SLl 22:39 0:01 /usr/sbin/varnishd -j unix,user=vcache -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -p thread_pools=8 -p thread_pool_min=100 -p thread_pool_max=4000 -p workspace_client=128k -p workspace_backend=128k -l 200m -S /etc/varnish/secret -s malloc,256m -s static=file,/data/varnish_storage.bin,80g
I see the process owner has changed to vcache. What is wrong here? Another reload fails too, with the same reject code.
Can you try removing -j unix,user=vcache from your varnishd runtime command? If I remember correctly, Varnish automatically drops privileges on the worker process without needing explicit jailing settings.
If that doesn't work, please also explain which commands you used to start Varnish and reload Varnish.
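If Varnish runs under the stock systemd unit, one way to drop the flag (a sketch; unit name and paths assumed from standard packaging):
# Edit the unit and delete "-j unix,user=vcache" from the ExecStart line:
sudo systemctl edit --full varnish.service
sudo systemctl daemon-reload
sudo systemctl restart varnish
# The reload path should now work again:
sudo systemctl reload varnish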

Cannot get traffic when setting up transparent proxy mode for a specific port

I'd like to capture an HTTP service call from HostA -> HostB in order to test the client on HostA. Both hosts run Linux. I tried the following but failed.
What's the recommended way to do this?
I would like to use transparent proxy mode because I cannot modify the client, and I cannot redirect all traffic from HostA to HostB since other services are also running on HostA. I'd like to redirect only the client's connections from HostA to HostB.
The client on HostA calls a service on HostB on a certain port, 10001, over HTTP.
I tried setting up HostC with mitmproxy (HostA and HostC are in the same subnet):
HostA (ip_A) -> HostC (ip_C, with mitmproxy) -> HostB (ip_B). I set up iptables to build transparent mode.
The following is what I set up on HostA:
sudo iptables -t mangle -I OUTPUT -p tcp --dport 10001 -j MARK --set-mark 1
sudo ip route add default via ip_C table 100
sudo ip rule add fwmark 0x1 table 100
On HostC:
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A PREROUTING -o eth0 -p tcp --dport 10001 -j REDIRECT --to-port 8080
mitmproxy -T --host
This doesn't work; the client on HostA gets a connection timeout.
If I run traceroute on HostA:
traceroute ip_B -p 10000 -T
It shows that ip_B is unreachable over TCP from HostA.
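A few sanity checks on this first setup may help narrow things down (hedged; names as used above). Note also that -o (output interface) cannot be matched in the PREROUTING chain, so -i eth0 is presumably what was intended on HostC:
# On HostA: confirm the mark rule and the policy route exist
ip rule show                # expect a line like: from all fwmark 0x1 lookup 100
ip route show table 100     # expect: default via ip_C
# On HostC: confirm forwarding and watch the NAT rule's packet counters
sysctl net.ipv4.ip_forward  # expect: net.ipv4.ip_forward = 1
sudo iptables -t nat -L PREROUTING -v -n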
I also tried setting up mitmproxy on HostA itself. When I redirect traffic for port 10001 on HostA:
sudo iptables -t nat -A OUTPUT -p tcp --dport 10001 -j REDIRECT --to-port 8080
mitmproxy -T --host
The service call is captured by mitmproxy on HostA, but the client cannot get a response.
Thanks a lot for your help.
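A likely cause of the second symptom (call captured but no response): the OUTPUT redirect also matches mitmproxy's own outgoing connections to HostB:10001, looping them back into mitmproxy. The usual workaround is to run mitmproxy as a dedicated user and exclude that user from the redirect (a sketch; the user name mitmproxyuser is hypothetical):
sudo useradd --no-create-home mitmproxyuser
sudo iptables -t nat -A OUTPUT -p tcp --dport 10001 -m owner ! --uid-owner mitmproxyuser -j REDIRECT --to-port 8080
sudo -u mitmproxyuser mitmproxy -T --host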

NRPE not pulling data: NRPE: Unable to read output

I'm trying to get a memory metric from a client machine. I installed NRPE on the client machine and it works well for default checks like load, users and so on.
Manual output from the client machine:
root@Nginx:~# /usr/lib/nagios/plugins/check_mem -w 50 -c 40
OK - 7199 MB (96%) Free Memory
But when I try from the server, other metrics work but the memory metric does not:
[ec2-user@ip-10-0-2-179 ~]$ /usr/lib64/nagios/plugins/check_nrpe -H 107.XX.XX.XX -c check_mem
NRPE: Unable to read output
Other metrics work well:
[ec2-user@ip-10-0-2-179 ~]$ /usr/lib64/nagios/plugins/check_nrpe -H 107.XX.XX.XX -c check_load
OK - load average: 0.00, 0.01, 0.05|load1=0.000;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.050;5.000;20.000;0;
I ensured that the check_mem command has execute permission for all:
root@Nginx:~# ll /usr/lib/nagios/plugins/check_mem
-rwxr-xr-x 1 root root 2394 Sep 6 00:00 /usr/lib/nagios/plugins/check_mem*
Also, here are my client-side NRPE command definitions:
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_procs]=/usr/lib/nagios/plugins/check_procs -w 200 -c 250
command[check_http]=/usr/lib/nagios/plugins/check_http -I 127.0.0.1
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 30 -c 20
command[check_mem]=/usr/lib/nagios/plugins/check_mem -w 30 -c 20
Can anyone help me to fix the issue?
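One way to narrow this down (a sketch; it assumes NRPE runs as the default nagios user): NRPE reports "Unable to read output" when the command it runs produces nothing on stdout, so try running the plugin exactly as the daemon would:
sudo -u nagios /usr/lib/nagios/plugins/check_mem -w 50 -c 40
# If this prints nothing (for example because of a missing interpreter or a
# permission error on something the script reads), that is what NRPE sees.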

Varnish processes not closing and taking huge memory

We are using Varnish Cache 4.1 on a CentOS server. When we start the Varnish server, lots of varnish processes start and do not close; because of this we seem to be facing a memory leak. Please let us know how we can resolve it.
My configuration (/etc/sysconfig/varnish) is:
#DAEMON_OPTS="-a :80 \
# -T localhost:6082 \
# -f /etc/varnish/default.vcl \
# -S /etc/varnish/secret \
# -p thread_pools=8 \
# -p thread_pool_max=4000 \
# -p thread_pool_add_delay=1 \
# -p send_timeout=30 \
# -p listen_depth=4096 \
# -s malloc,2G"
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}
34514 varnish 20 0 345M 89208 83360 S 0.0 4.3 0:00.00 /usr/sbin/varnishd -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -p thread_pool_min=50 -p ...
1678 varnish 20 0 345M 89208 83360 S 0.0 4.3 0:00.03 /usr/sbin/varnishd -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -p thread_pool_min=50 -p ...
1679 varnish 20 0 ...
You are not limiting the space for transient objects. By default an unlimited malloc is used (see the official docs: https://www.varnish-cache.org/docs/4.0/users-guide/storage-backends.html#transient-storage).
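Transient storage can be capped with a second -s argument, e.g. (a sketch; the 100M value is illustrative, not a recommendation):
DAEMON_OPTS="-a :80 -f /etc/varnish/default.vcl -s malloc,2G -s Transient=malloc,100M"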
From what I see in your message, you are not using the DAEMON_OPTS parameter.
What are the contents of your varnishd.service file and /etc/varnish/varnish.params?
EDIT
Nothing's wrong with your init.d script. It should use the settings found in /etc/sysconfig/varnish.
How much RAM is consumed by Varnish?
All the Varnish threads share the same storage (malloc 2G + Transient malloc 100M), so it should take up to 2.1G for storage. You need to add an average overhead of about 1KB per object stored in cache to get the total memory used.
I don't think you are suffering from a memory leak; the processes are normal. You told Varnish to spawn 50 thread pools (with the thread_pools parameter), so they are expected.
I'd recommend decreasing the number of thread_pools; you are setting it to 50. You should be able to lower it to something between 2 and 8. At the same time, it will help to increase thread_pool_max to 5000 and set thread_pool_min to 1000.
We are running very large servers with 2 pools * 1000-5000 threads and have no issues.
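Applied to the command line above, that recommendation would look something like this (a sketch; only the thread parameters change):
/usr/sbin/varnishd -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 \
    -p thread_pools=2 -p thread_pool_min=1000 -p thread_pool_max=5000 -s malloc,2G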

iptables, apache, Linux Mint (Ubuntu), forward from port 80 to 1080, no listening sockets available, shutting down

I have rewritten the question, using the answer from Pedro.
I am getting the error “no listening sockets available, shutting down” when I try to execute a simple command:
$ apache2ctl -f `pwd`/conf/httpd.conf -d `pwd`
on Linux Mint (based on Ubuntu).
I am in the directory /usr/www/apache3/site.toddle.
The contents of /usr/www/apache3/site.toddle/conf/httpd.conf are:
User www-data
Group www-data
# added to get rid of apache2: Configuration error: No MPM loaded
Include /etc/apache2/mods-enabled/*.load
Include /etc/apache2/mods-enabled/*.conf
#copied from 000-default.conf from /etc/apache2/sites-available
<VirtualHost *:1081>
    ServerName my586
    ServerAdmin webmaster@localhost
    DocumentRoot /usr/www/apache3/site.toddle/htdocs/
    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
I have followed Pedro's answer and links.
1) Using sudo gedit /etc/apache2/ports.conf, I added the following line to /etc/apache2/ports.conf and saved the file:
#original: Listen 80
Listen 1081
2) Restarted Apache using $ sudo /etc/init.d/apache2 restart
3) Configured iptables using Pedro's example for port 1081:
sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 1081
sudo iptables-save
sudo iptables -t nat -I OUTPUT -p tcp -d 127.0.0.1 --dport 80 -j REDIRECT --to-ports 1081
sudo iptables-save
But running the command apache2ctl -f `pwd`/conf/httpd.conf -d `pwd` gives the error:
no listening sockets available, shutting down
AH00015: Unable to open logs
Action '-f /usr/www/apache3/site.toddle/conf/httpd.conf -d /usr/www/apache3/site.toddle' failed.
Checking with netstat shows that Apache listens on port 1081:
$ sudo netstat -ltnp | grep ':1081'
tcp6 0 0 :::1081 :::* LISTEN 3160/apache2
The rules that have always worked for me for redirecting incoming traffic on port 80 to an apache server on port 1080 are:
sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 1080
sudo iptables -t nat -I OUTPUT -p tcp -d 127.0.0.1 --dport 80 -j REDIRECT --to-ports 1080
You could test these rules by listening with netcat on port 1080 on your server, and trying to connect to your server on port 80 using netcat from a different machine.
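For example (a sketch; netcat flag syntax differs between variants, e.g. traditional vs. OpenBSD netcat):
# On the server:
nc -l -p 1080
# From a different machine (your_server_ip is a placeholder):
nc your_server_ip 80
# Anything typed on the client side should appear in the server-side session.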
Then make sure that your Apache server's config is set to use port 1080.
See this post for setting Apache to run on different port:
Configure apache to listen on port other than 80
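Note that apache2ctl -f reads only the file you pass (plus whatever it Includes), so a Listen directive added to /etc/apache2/ports.conf is not seen by the custom httpd.conf above unless that file is included. A minimal sketch of the fix is to add the directive to the custom config directly:
# In /usr/www/apache3/site.toddle/conf/httpd.conf:
Listen 1081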
