Varnish ESI include only works when parent page is fetched from backend - varnish

I'm running Varnish 3.0.2, Apache2, Pressflow. I am trying to get ESI up and working for the first time, and it does work, but only the first time the parent page is requested. After that it just pulls the parent page and the replaced content from the cache. The only thing I can think of is that the included content is being cached permanently, even though I am telling it to not cache the included file at all, here is the object for the included file being stored...
11 ObjProtocol c HTTP/1.1
11 ObjResponse c OK
11 ObjHeader c Date: Wed, 18 Jul 2012 23:25:56 GMT
11 ObjHeader c X-Powered-By: PHP/5.3.3-1ubuntu9.10
11 ObjHeader c Last-Modified: Wed, 18 Jul 2012 23:25:56 +0000
11 ObjHeader c Expires: Sun, 11 Mar 1984 12:00:00 GMT
11 ObjHeader c Vary: Cookie,Accept-Encoding
11 ObjHeader c ETag: "1342653956"
11 ObjHeader c Content-Encoding: gzip
11 ObjHeader c Content-Length: 656
11 ObjHeader c Content-Type: text/html
11 ObjHeader c Server: Apache/2.2.11
11 ObjHeader c Cache-Control: no-store
I've spent a full day on this, searched, read every article I can find, tried a whole heap of config tweaking, both in the VCL and in the HTTP headers. I can't see anything I'm doing wrong.
This is a snippet from my VCL, trying to force it to not store in the cache
sub vcl_fetch {
set beresp.do_esi = true;
if (req.url ~ "^/esi_") {
set beresp.http.Cache-Control = "no-store";
set beresp.ttl = 0s;
}
}
I would add that I am seeing nothing to inidicate errors in the varnishlog. I've tried using just the path, and host + path in the include src, but no difference. It simply won't ask the backend for fresh content. If you were looking at the logs for the second and subsequent requests, you wouldn't realise it was an ESI page.

provide in the sub vcl_recv {} something that tells varnish not to lookup for the request in the cache and define an additional http response element from your backend server, which is handled by a condition in vcl. e.g. "pragma: no-cache" ..
you might extend this condition in the vcl_recv with ~ "^/esi_" ..
sub vcl_recv(
# ...
# the rest goes here ..
# ...
if ((req.url ~ "^/esi_") && (req.http.pragma ~ "no-cache")) {
return (pass);
}
# ...
}

Related

is maxTotalHeaderLength working as expected?

Warp has a settingsMaxTotalHeaderLength field which by default is 50*1024 : https://hackage.haskell.org/package/warp-3.3.10/docs/src/Network.Wai.Handler.Warp.Settings.html#defaultSettings
I suppose this means 50KB? But, when I try to send a header with ~33KB, server throws bad request:
curl -v -w '%{size_request} %{size_upload}' -H #temp.log localhost:8080/v1/version
Result:
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /v1/version HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.58.0
> Accept: */*
> myheader2: <big header snipped>
>
* HTTP 1.0, assume close after body
< HTTP/1.0 400 Bad Request
< Date: Wed, 22 Jul 2020 13:15:19 GMT
< Server: Warp/3.3.10
< Content-Type: text/plain; charset=utf-8
<
* Closing connection 0
Bad Request33098 0
(note that the request size is 33098)
Same thing works with 32.5KB header file.
My real problem is actually that I need to set settingsMaxTotalHeaderLength = 160000 to send a header of size ~55KB. Not sure if this is a bug or I am misreading something?
Congratulations, it looks like you found a bug in warp. In the definition of push, there's some double-counting going on. Near the top, bsLen is calculated as the complete length of the header so far, but further down in the push' Nothing case that adds newline-free chunks, the length is updated as:
len' = len + bsLen
when bsLen already accounts for len. There are similar problems in the other push' cases, where start and end are offsets into the complete header and so shouldn't be added to len.

How can I view raw content with HTTP request?

I cannot seem to make the script print out JUST the content viewed by the page
I would like this to be using sockets module. No other libraries like requests or urllib
I cannot really try much. So I instantly committed a sin and came here first ^^'
My code:
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("pastebin.com", 80))
sock.sendall(b"GET /raw/yWmuKZyb HTTP/1.1\r\nHost: pastebin.com\r\n\r\n")
r = sock.recv(4096).decode("utf-8")
print(r)
sock.close()
I want the printed result to be:
test
test1
test2
test3
but what I get is
HTTP/1.1 200 OK
Date: Tue, 09 Apr 2019 14:20:45 GMT
Content-Type: text/plain; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=xxx; expires=Wed, 08-Apr-20 14:20:45 GMT; path=/; domain=.pastebin.com; HttpOnly
Cache-Control: no-cache, must-revalidate
Pragma: no-cache
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Vary: Accept-Encoding
X-XSS-Protection: 1; mode=block
CF-Cache-Status: MISS
Server: cloudflare
CF-RAY: 4c4d1f9f685ece41-LHR
19
test
test1
test2
test3
Just extract out the content after \r\r\n\n by using string.split and print it
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("pastebin.com", 80))
sock.sendall(b"GET /raw/yWmuKZyb HTTP/1.1\r\nHost: pastebin.com\r\n\r\n")
r = sock.recv(4096).decode("utf-8")
#Extract the content after splitting the string on \r\n\r\n
content_list = r.split('\r\n\r\n')[1].split('\r\n')
content = '\r\n'.join(content_list)
print(content)
#19
#test
#test1
#test2
#test3
sock.close()
You are doing a HTTP/1.1 request and therefore the web server might reply with a response body in chunked transfer encoding. In this mode each chunk is prefixed by the size in hexadecimal. You either need to implement this mode or you could simply do a HTTP/1.0 request in which case the server will not use chunked transfer encoding since this was only introduced with HTTP/1.1.
Anyway, if you don't want to use any existing libraries but do your own HTTP it is expected that you actually understand HTTP. Understanding means that you have read the relevant standards, because that's what standards are for. For HTTP/1.1 this is originally RFC 2616 which was later slightly reworked into RFC 7230-7235. Once you started reading these standards you likely appreciate that there are existing libraries which deal with these protocols, since these are far from trivial.

Cloudfront to use different Content-Type for compressed and uncompressed files

I am serving a website generated by a static site generator via S3 and Cloudfront. The files are being uploaded to S3 with correct Content-Types. The DNS points to Cloudfront which uses the S3 bucket as its origin. Cloudfront takes care about encryption and compression. I told Cloudfront to compress objects automatically. That worked fine until I decided to change some of the used images from PNG to SVG.
Whenever a file is requested as uncompressed it is delivered as is with the set Content-Type (image/svg+xml) and the site is rendered correctly. However, if the file is requested as compressed it is delivered with the default Content-Type (application/octet-stream) and the image is missing in the rendering. If I then right-click on the image and choose to open the image in a new tab, it will be shown correctly (without the rest of the page).
The result is the same independent of the used browser. In Firefox I know how to set it to force requesting compressed or uncompressed pages. I also tried curl to check the headers. These are the results:
λ curl --compressed -v -o /dev/null http://dev.example.com/img/logo-6998bdf68c.svg
* STATE: INIT => CONNECT handle 0x20049798; line 1090 (connection #-5000)
* Added connection 0. The cache now contains 1 members
* Trying 52.222.157.200...
* STATE: CONNECT => WAITCONNECT handle 0x20049798; line 1143 (connection #0)
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to dev.example.com (52.222.157.200) port 80 (#0)
* STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x20049798; line 1240 (connection #0)
* STATE: SENDPROTOCONNECT => DO handle 0x20049798; line 1258 (connection #0)
> GET /img/logo-6998bdf68c.svg HTTP/1.1
> Host: dev.example.com
> User-Agent: curl/7.44.0
> Accept: */*
> Accept-Encoding: deflate, gzip
>
* STATE: DO => DO_DONE handle 0x20049798; line 1337 (connection #0)
* STATE: DO_DONE => WAITPERFORM handle 0x20049798; line 1464 (connection #0)
* STATE: WAITPERFORM => PERFORM handle 0x20049798; line 1474 (connection #0)
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 200 OK
< Content-Type: application/octet-stream
< Content-Length: 7468
< Connection: keep-alive
< Date: Wed, 01 Mar 2017 13:31:33 GMT
< x-amz-meta-cb-modifiedtime: Wed, 01 Mar 2017 13:28:26 GMT
< Last-Modified: Wed, 01 Mar 2017 13:30:24 GMT
< ETag: "6998bdf68c8812d193dd799c644abfb6"
* Server AmazonS3 is not blacklisted
< Server: AmazonS3
< X-Cache: RefreshHit from cloudfront
< Via: 1.1 36c13eeffcddf77ad33d7874b28e6168.cloudfront.net (CloudFront)
< X-Amz-Cf-Id: jT86EeNn2vFYAU2Jagj_aDx6qQUBXFqiDhlcdfxLKrj5bCdAKBIbXQ==
<
{ [7468 bytes data]
* STATE: PERFORM => DONE handle 0x20049798; line 1632 (connection #0)
* Curl_done
100 7468 100 7468 0 0 44526 0 --:--:-- --:--:-- --:--:-- 48493
* Connection #0 to host dev.example.com left intact
* Expire cleared
and for uncompressed it looks better:
λ curl -v -o /dev/null http://dev.example.com/img/logo-6998bdf68c.svg
* STATE: INIT => CONNECT handle 0x20049798; line 1090 (connection #-5000)
* Added connection 0. The cache now contains 1 members
* Trying 52.222.157.203...
* STATE: CONNECT => WAITCONNECT handle 0x20049798; line 1143 (connection #0)
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to dev.example.com (52.222.157.203) port 80 (#0)
* STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x20049798; line 1240 (connection #0)
* STATE: SENDPROTOCONNECT => DO handle 0x20049798; line 1258 (connection #0)
> GET /img/logo-6998bdf68c.svg HTTP/1.1
> Host: dev.example.com
> User-Agent: curl/7.44.0
> Accept: */*
>
* STATE: DO => DO_DONE handle 0x20049798; line 1337 (connection #0)
* STATE: DO_DONE => WAITPERFORM handle 0x20049798; line 1464 (connection #0)
* STATE: WAITPERFORM => PERFORM handle 0x20049798; line 1474 (connection #0)
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 200 OK
< Content-Type: image/svg+xml
< Content-Length: 7468
< Connection: keep-alive
< Date: Wed, 01 Mar 2017 20:56:11 GMT
< x-amz-meta-cb-modifiedtime: Wed, 01 Mar 2017 20:39:17 GMT
< Last-Modified: Wed, 01 Mar 2017 20:41:13 GMT
< ETag: "6998bdf68c8812d193dd799c644abfb6"
* Server AmazonS3 is not blacklisted
< Server: AmazonS3
< Vary: Accept-Encoding
< X-Cache: RefreshHit from cloudfront
< Via: 1.1 ac27d939fa02703c4b28926f53f95083.cloudfront.net (CloudFront)
< X-Amz-Cf-Id: AlodMvGOKIoNb8zm5OuS7x_8TquQXzAAXg05efSMdIKgrPhwEPv4kA==
<
{ [2422 bytes data]
* STATE: PERFORM => DONE handle 0x20049798; line 1632 (connection #0)
* Curl_done
100 7468 100 7468 0 0 27667 0 --:--:-- --:--:-- --:--:-- 33639
* Connection #0 to host dev.example.com left intact
I don't want to switch off the compression for performance reasons. And it looks that this only happens for SVG file types. All other types have the correct, ie. the same Content-Type. I already tried to invalidate the cache and even switched it off completely by setting the cache time to 0 seconds. I cannot upload a compressed version when uploading to S3 because the upload process is automated and cannot easily be changed for a single file.
I hope that I did something wrong because that would be easiest to be fixed. But I have no clue what could be wrong with the setting. I already used Google to find someone having a similar issue, but it looks like it's only me. Anyone, who has an idea?
You're misdiagnosing the problem. CloudFront doesn't change the Content-Type.
CloudFront, however, does cache different versions of the same object, based on variations in the request.
If you notice, your Last-Modified times on these objects are different. You originally had the content-type set wrong in S3. You subsequently fixed that, but CloudFront doesn't realize the metadata has changed, since the ETag didn't change, so you're getting erroneous RefreshHit responses. It's serving the older version on requests that advertise gzip encoding support. If the actual payload of the object had changed, CloudFront would have likely already updated its cache.
Do an invalidation to purge the cache and within a few minutes, this issue should go away.
I was able to solve it by forcing the MIME type to "image/svg+xml" instead of "binary/octet-stream" which was selected after syncing the files with python boto3.
When you right click on an svg in a S3 bucket you can check the mimetype by viewing the metadata:
I'm not sure if this was caused by the python sync, or by some weirdness in S3/Cloudfront. I have to add that just a cache invalidation didn't work after that. I had to re-upload my files with the correct mimetype to get cloudfront access to the svg working ok.

WinHTTPRequest Returning Empty Response Text and Body

I'm having trouble getting response text and a response body returned when I run the code below. The "HTTP/1.1 200 OK" message comes back along with response headers, but no response body. I've confirmed this result using Fiddler2 and also looking a netsh trace log.
Other URLs (http://real-chart.finance.yahoo.com/table.csv?s=CELG&d=6&e=26&f=2014&g=d&a=2&b=26&c=1990&ignore=.csv) for example, do return response text as well as a response body.
Why is there a problem with this URL and how can I get it to return a response body?
Sub testlogin()
fileUrl = "http://financials.morningstar.com/ajax/ReportProcess4CSV.html?t=XNYS:HFC&region=USA&culture=en-US&productCode=COM&reportType=is&period=&dataType=A&order=desc&columnYear=5&rounding=3&view=raw"
Set WHTTP = CreateObject("WinHTTP.WinHTTPrequest.5.1")
WHTTP.Open "GET", fileUrl, False
WHTTP.Send
MsgBox WHTTP.Status
MsgBox WHTTP.ResponseText
MsgBox WHTTP.ResponseBody
MsgBox WHTTP.GetAllResponseHeaders
Set WHTTP = Nothing
End Sub
Have you studied those response headers that are returned by the GET calls to both URLs?
Morningstar is like this:
Cache-Control: max-age=0
Connection: keep-alive
Date: Sat, 26 Jul 2014 22:07:33 GMT
Pragma: no-cache
Content-Length: 0
===>> Content-Type: text/html;charset=UTF-8 <<===
===>> Content-Encoding: gzip <<===
Server: Apache
Set-Cookie: JSESSIONID=6FAF41A612ABB32B0C670AB07BF0D8A5; HttpOnly
Vary: User-Agent
Vary: Accept-Encoding
com.coradiant.appvis: vid=ad&sid=CONTROLLER_1&tid=da615c36-2a18-4129-bcd7-1cbb139ab52b
Content-Disposition: attachment;filename=""HFC Income Statement.csv""
ExpiresDefault: access plus 2 hours
Yahoo Finance is like this:
Cache-Control: private
Connection: close
Date: Sat, 26 Jul 2014 22:10:00 GMT
Transfer-Encoding: chunked
===>> Content-Type: text/csv <<===
P3P: policyref=""http://info.yahoo.com/w3c/p3p.xml"", CP=""CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV""
Set-Cookie: B=d3svnbl9t89po&b=3&s=4i; expires=Tue, 26-Jul-2016 22:10:00 GMT; path=/; domain=.yahoo.com
Vary: Accept-Encoding
I've sort-of highlighted the Content-Type and Content-Encoding headers (where available).
Basically, the content returned is different for the two calls. Clearly, Excel can interpret the second case where the content type is "text/csv" but the first one is a strange gzipped html page that I guess Excel can't understand.
I can't possibly give you a solution to this issue, but the content of the headers could certainly explain the difference in behaviour you're seeing.

mod_sec trigger on CSR rule _23

I'm using mod_security with the latest core rules.
It triggers on all my pages whenever I use a querystring.. ie.
www.mypage.com/index.php?querystring=1
I get a warning that it exceeds maximum allowed number of arguements, however the base config defines max_numb_args to = 255 which of course it doesn't exceed.
Any ideas why?
Base conf:
SecRuleEngine On
SecAuditEngine RelevantOnly
SecAuditLog /var/log/apache2/modsec_audit.log
SecDebugLog /var/log/apache2/modsec_debug_log
SecDebugLogLevel 3
SecDefaultAction "phase:2,pass,log,status:500"
SecRule REMOTE_ADDR "^127.0.0.1$" nolog,allow
SecRequestBodyAccess On
SecResponseBodyAccess On
SecResponseBodyMimeType (null) text/html text/plain text/xml
SecResponseBodyLimit 2621440
SecServerSignature Apache
SecUploadDir /tmp
SecUploadKeepFiles Off
SecAuditLogParts ABIFHZ
SecArgumentSeparator "&"
SecCookieFormat 0
SecRequestBodyInMemoryLimit 131072
SecDataDir /tmp
SecTmpDir /tmp
SecAuditLogStorageDir /var/log/apache2/audit
SecResponseBodyLimitAction ProcessPartial
SecAction "phase:1,t:none,nolog,pass,setvar:tx.max_num_args=255"
Rule that triggers:
# Maximum number of arguments in request limited
SecRule &TX:MAX_NUM_ARGS "#eq 1" "chain,phase:2,t:none,pass,nolog,auditlog,msg:'Maximum number of arguments in request reached',id:'960335',severity:'4',rev:'2.0.7'"
SecRule &ARGS "#gt %{tx.max_num_args}" "t:none,setvar:'tx.msg=%{rule.msg}',setvar:tx.anomaly_score=+%{tx.notice_anomaly_score},setvar:tx.policy_score=+%{tx.notice_anomaly_score},setvar:tx.%{rule.id}-POLICY/SIZE_LIMIT-%{matched_var_name}=%{matched_var}"
And the log ouput:
--ad5dc005-C--
queryString=2
--ad5dc005-F--
HTTP/1.1 200 OK
X-Powered-By: PHP/5.3
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: SESSION=ak19oq36gpi94rco2qbi6j2k20; path=/
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 1272
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8
--ad5dc005-H--
Message: Operator GT matched 0 at ARGS. [file "/etc/apache2/conf/modsecurity_crs/base_rules/modsecurity_crs_23_request_limits.conf"] [line "30"] [id "960335"] [rev "2.0.7"] [msg "Maximum number of arguments in request reached"] [severity "WARNING"]
Message: Operator GE matched 0 at TX:anomaly_score. [file "/etc/apache2/conf/modsecurity_crs/base_rules/modsecurity_crs_49_inbound_blocking.conf"] [line "18"] [msg "Inbound Anomaly Score Exceeded (Total Score: 5, SQLi=, XSS=): Maximum number of arguments in request reached"]
Message: Warning. Operator GE matched 0 at TX:inbound_anomaly_score. [file "/etc/apache2/conf/modsecurity_crs/base_rules/modsecurity_crs_60_correlation.conf"] [line "35"] [msg "Inbound Anomaly Score Exceeded (Total Inbound Score: 5, SQLi=, XSS=): Maximum number of arguments in request reached"]
Apache-Handler: application/x-httpd-php
Stopwatch: 1279667800315092 76979 (1546* 7522 72931)
Producer: ModSeurity for Apache/2.5.11 (http://www.modsecurity.org/); core ruleset/2.0.7.
Server: Apache
I was using the lib from Ubuntu.. which had the .11 version. I uninstalled it, compiled from source .12 version and now it's alive, kicking and screaming!
Latest CSR rules needs the .12 version. Cheers.

Resources