.htaccess rule for specific PDF filenames - .htaccess

I'm trying to assign a canonical URL to specific PDFs (based on their filenames), and at the same time I need to assign a generic canonical to all other PDFs.
I'm currently using these rules:
<Files The-life-of-RINGO.pdf>
Header add Link '<https://website.com/01/canonical/>; rel="canonical"'
</Files>
<Files Product-new.pdf>
Header add Link '<https://website.com/02/canonical/>; rel="canonical"'
</Files>
<Files Presentation.pdf>
Header add Link '<https://website.com/03/canonical/>; rel="canonical"'
</Files>
<Files The-rulebook.pdf>
Header add Link '<https://website.com/04/canonical/>; rel="canonical"'
</Files>
<Files Adventure-trial.pdf>
Header add Link '<https://website.com/05/canonical/>; rel="canonical"'
</Files>
# PDF Canonical
RewriteRule ([^/]+)\.pdf$ - [E=FILENAME:$1]
<FilesMatch "\.pdf$">
Header add Link '<https://www.website.com/uploads/%{FILENAME}e>; rel="canonical"'
</FilesMatch>
The problem is that .htaccess now adds two canonicals to those specific PDFs, because both rules apply to them.
For example to The-life-of-RINGO.pdf it adds:
https://website.com/01/canonical/
https://www.website.com/uploads/The-life-of-RINGO.pdf
Is there a way to add conditional logic so that the generic PDF RewriteRule is ignored for the first 5 files?
Thank you.

Do it the other way around: set the canonical for all PDFs first, and then use Header set instead of Header add for the individual ones.
The mod_headers documentation (https://httpd.apache.org/docs/2.4/mod/mod_headers.html#header) says:
add: The response header is added to the existing set of headers, even if this header already exists. This can result in two (or more) headers having the same name. This can lead to unforeseen consequences, and in general set, append or merge should be used instead.
set: The response header is set, replacing any previous header with this name. The value may be a format string.
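A sketch of the reordered configuration, reusing the filenames and URLs from the question (only two of the five specific files are written out; the rest follow the same pattern). It relies on <Files> and <FilesMatch> sections being merged in the order they appear, so the generic rule runs first and each later Header set replaces the Link header it added. Note that set replaces every Link header on the response, which should be harmless for static PDFs.

```apache
# Enable mod_rewrite (skip this if RewriteEngine On already appears earlier in the file)
RewriteEngine On

# 1) Generic canonical for every PDF; FILENAME holds the base name without ".pdf"
RewriteRule ([^/]+)\.pdf$ - [E=FILENAME:$1]
<FilesMatch "\.pdf$">
Header add Link '<https://www.website.com/uploads/%{FILENAME}e>; rel="canonical"'
</FilesMatch>

# 2) Specific PDFs, placed AFTER the generic block:
# "set" replaces the Link header added above instead of adding a second one
<Files The-life-of-RINGO.pdf>
Header set Link '<https://website.com/01/canonical/>; rel="canonical"'
</Files>
<Files Product-new.pdf>
Header set Link '<https://website.com/02/canonical/>; rel="canonical"'
</Files>
# ...same pattern for Presentation.pdf, The-rulebook.pdf and Adventure-trial.pdf
```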

Related

.htaccess rewrite rule to append the date to the filename, using Content-Disposition header

I am trying to download different file types like PDF, XLS, .jpg, .png, etc. directly from Joomla CMS via a web browser. Documents and files like Word, Excel, PowerPoint, PDFs, .jpg, .png and text are stored in Joomla DOCman, and when users want a file they click a link in the DOCman menu on the website and the file is downloaded. I would like the date to be appended to the file name when users download files like PDF, XLS, .jpg, .png, etc. from that menu. It looks like this is possible at server level using an .htaccess rewrite rule, as the filename the browser uses to save a file is included in the Content-Disposition header:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
Is it possible to rewrite that header to append the date to the downloaded filename? I tried the code below in the .htaccess file for the PDF file type, but it doesn't work as expected.
RewriteEngine On
RewriteRule [^/]+\.pdf$ - [E=FILENAME:$0]
<FilesMatch "\.(?i:pdf)$">
Header set Content-Type application/octet-stream
Header set Content-Disposition "attachment; filename=%{FILENAME-.date('Y-m-d').}e"
</FilesMatch>
Any help getting the correct .htaccess rewrite rule to accomplish the above would be highly appreciated.
Yes, it is possible. This was tested and works.
RewriteEngine On
RewriteRule ([^/]+)\.pdf$ - [E=FILENAME:$1]
<FilesMatch "\.(?i:pdf)$">
Header set Content-Type application/octet-stream
Header onsuccess set Content-Disposition "expr=attachment; filename=\"%{ENV:FILENAME}-%{TIME_YEAR}-%{TIME_MON}-%{TIME_DAY}\.pdf\"" env=FILENAME
</FilesMatch>
The Apache documentation for the Header directive shows that the %{VARNAME}e format specifier should be used for placing environment variables in the Header value. However, since the value is provided via an ap_expr expression in my code sample, the ap_expr function env (or maybe reqenv) should be used instead. The Header directive documentation says that within ap_expr expressions:
Function calls use the %{funcname:arg} syntax rather than funcname(arg).
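The question also mentions XLS, JPG and PNG downloads. A sketch that generalises the tested PDF rule by capturing the extension in a second environment variable; FILEEXT is a name introduced here for illustration, and the extension list is an assumption to adjust. It only applies when the browser requests the file by a URL ending in that extension; if DOCman serves downloads through a PHP script instead, these rules will not match.

```apache
RewriteEngine On
# Capture base name ($1) and extension ($2, original case kept) for the listed types
# FILEEXT is a hypothetical variable name used for illustration
RewriteRule ([^/]+)\.(pdf|xlsx?|jpe?g|png)$ - [NC,E=FILENAME:$1,E=FILEEXT:$2]
<FilesMatch "\.(?i:pdf|xlsx?|jpe?g|png)$">
Header set Content-Type application/octet-stream
# Build <base>-YYYY-MM-DD.<ext>, only when the RewriteRule above set FILENAME
Header onsuccess set Content-Disposition "expr=attachment; filename=\"%{ENV:FILENAME}-%{TIME_YEAR}-%{TIME_MON}-%{TIME_DAY}.%{ENV:FILEEXT}\"" env=FILENAME
</FilesMatch>
```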

Why `mod_headers` cannot match based on the content-type?

I can't wrap my head around the comments and the coding approach used to set a header only for .html files in, for example, the .htaccess file of HTML5 Boilerplate.
The reason for the big code block lies in the fact that 'mod_headers can't match on the content type' (as the comment says). So my first question: why is there a Header unset inside a <FilesMatch>, when matching has just been declared impossible?
<IfModule mod_headers.c>
Header set Content-Security-Policy "script-src 'self'; object-src 'self'"
# `mod_headers` cannot match based on the content-type, however,
# the `Content-Security-Policy` response header should be sent
# only for HTML documents and not for the other resources.
<FilesMatch "\.(appcache|atom|bbaw|bmp|crx|css|cur|eot|f4[abpv]|flv|geojson|gif|htc|ico|jpe?g|js|json(ld)?|m4[av]|manifest|map|mp4|oex|og[agv]|opus|otf|pdf|png|rdf|rss|safariextz|svgz?|swf|topojson|tt[cf]|txt|vcard|vcf|vtt|webapp|web[mp]|woff2?|xloc|xml|xpi)$">
Header unset Content-Security-Policy
</FilesMatch>
</IfModule>
I looked everywhere, but I only land on pages that have the same, almost ritually copied code block, without further explanation. So, question 2: why is a simple declaration like this not possible?
<IfModule mod_headers.c>
# Content-Security-Policy for .html files
<FilesMatch "\.(html)$">
Header set Content-Security-Policy "script-src 'self'; object-src 'self'"
</FilesMatch>
# OR like this
<Files ~ "\.(html)$">
Header set Cache-Control "public, must-revalidate"
</Files>
</IfModule>
It is possible to do that as per your second example. But .html files are not the only files that can be sent as HTML documents. You could also use .php, .htm or any number of other extensions. Some of these (like .php) execute and ultimately return HTML, but the server doesn't know that, as all it knows at this stage is the file extension.
CSP should be set on all documents but, to save header bandwidth, does not need to be set on the assets those documents use (e.g. images, style sheets, JavaScript, etc.). Other than wasted bandwidth, there is no real harm in setting it on other files.
Ideally you would set it based on the MIME type returned (which in the PHP example above would be HTML), so that whenever an HTML document is returned the header is set, and otherwise it isn't. However, as this is not possible, you're left with two choices:
Set it on everything by default, then explicitly unset it for known media types. This pretty much guarantees it will be set on the document, but also risks it being set on a few other file types (e.g. if you don't explicitly unset it for that type), which, as I say, isn't really that bad.
Explicitly list your HTML document types (like your second example). The risk here is that you miss a file type (e.g. .php), either now or in the future when someone starts using PHP on your site.
The first option is the safer one, particularly for HTML5 Boilerplate, whose authors have no idea what technology will be used on the sites that adopt it. A site might use .php, .cgi, .asp or any number of technologies to generate the HTML document. It could even proxy the request to a back-end server, so there would be no file extension at all.
Make sense?
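For comparison, a sketch of the second choice extended to the other extensions mentioned above. The list (.html, .htm, .php) is an assumption for illustration; any document type left out, including proxied requests with no extension, gets no CSP at all.

```apache
<IfModule mod_headers.c>
# Option 2: name every extension that may produce an HTML document.
# Anything not listed (e.g. .cgi, or a proxied back end) is silently missed.
<FilesMatch "\.(html?|php)$">
Header set Content-Security-Policy "script-src 'self'; object-src 'self'"
</FilesMatch>
</IfModule>
```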

Does a .txt file get indexed? How to prevent this?

I've got a .txt file that I'm using to store chat info. What I'm trying to figure out is how to prevent this file from getting indexed, as I'm creating a friendlier version using that info. I have chat.txt, where the chat is recorded, and pretty-chat-history.php, which echoes that file within my actual page. Is there a way to prevent chat.txt from being picked up?
Add this to the .htaccess file:
# Escape the dot and anchor the name so that only chat.txt matches
<FilesMatch "^chat\.txt$">
Order allow,deny
deny from all
</FilesMatch>
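Order and Deny are Apache 2.2 directives; on Apache 2.4 they only work while mod_access_compat is loaded. A sketch of the 2.4-native equivalent using mod_authz_core. It assumes pretty-chat-history.php reads chat.txt from disk rather than over HTTP, in which case blocking web access does not affect it.

```apache
# Apache 2.4: refuse every HTTP request for chat.txt (403 Forbidden).
# Scripts on the server can still read the file from the filesystem.
<Files "chat.txt">
Require all denied
</Files>
```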

RegEx to find images with a specific word in the file name in .htaccess with FilesMatch

I am trying to find the right RegEx. I want to protect all JPG images with a specific word in the file name using FilesMatch. The word should be "Preview". I need this, and I also need the same to find all images without "Preview" in the file name.
<FilesMatch "!=preview\jpg">
Order Deny,Allow
Deny from all
Allow from development.url.com
Allow from 85.13.139.234
</FilesMatch>
<FilesMatch "==preview\jpg">
Order Deny,Allow
Deny from all
Allow from development.url.com
Allow from 85.13.139.234
</FilesMatch>
Like this: one is != "Preview" and one is == "Preview" ;) A third one to select all JPG images would also be great :)
Thanks :)
You can use a Files directive to do the regular expression matches:
<Files ~ "preview">...</Files>
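The pattern above is case-sensitive, so it would not match "Preview". A sketch of all three cases the question asks for, using PCRE's inline (?i) flag for case-insensitivity and a negative lookahead for "does not contain"; the access rules are copied from the question, and \.jpe?g$ also accepts .jpeg.

```apache
# 1) JPGs whose name contains "preview" (any case)
<FilesMatch "(?i)preview.*\.jpe?g$">
Order Deny,Allow
Deny from all
Allow from development.url.com
Allow from 85.13.139.234
</FilesMatch>

# 2) JPGs whose name does NOT contain "preview":
# (?!.*preview) fails as soon as "preview" appears anywhere in the name
<FilesMatch "(?i)^(?!.*preview).*\.jpe?g$">
Order Deny,Allow
Deny from all
Allow from development.url.com
Allow from 85.13.139.234
</FilesMatch>

# 3) All JPGs, with or without "preview"
<FilesMatch "(?i)\.jpe?g$">
Order Deny,Allow
Deny from all
Allow from development.url.com
Allow from 85.13.139.234
</FilesMatch>
```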

File security using .htaccess blocking a regular PDF file

I read this article on file upload security, but now a valid PDF I uploaded is refused with "access forbidden" after implementing this .htaccess on top of the other security methods mentioned:
deny from all
<Files ~ "^\w+\.(gif|jpe?g|png|pdf|doc|docx|txt|rtf|ppt|pptx|xls|mp4|mov|mp3|mpg|mpeg)$">
order deny,allow
allow from all
</Files>
The file name looks like this:
Company-apv-A4-Solarpanels_ABC-RH.pdf
That should be fine, because the .htaccess is meant to prevent the double-extension attack, if I understand correctly. I hope someone can help!
I came across this while researching a solution for something else. Since you basically want to prevent all double extensions, a simpler solution is:
Order Allow,Deny
<FilesMatch "^[^.]+\.(gif|jpe?g|png|pdf|doc|docx|txt|rtf|ppt|pptx|xls|mp4|mov|mp3|mpg|mpeg)$">
Allow from all
</FilesMatch>
This is more to the point and simpler. It uses FilesMatch, the preferred directive for regular-expression matching (it is equivalent to <Files ~ ...>), together with Order Allow,Deny, which means: evaluate the Allow directives first, then the Deny directives, and deny any request that matches neither. So this denies everything except what's explicitly allowed.
[^.] means any character that is not a literal period, so unlike \w, [^.]+ also accepts the hyphens in your file name, and anything else up to the first dot. That covers pretty much everything you wanted to achieve. Just remember that these rules do not allow for uppercase file extensions. Some people use older apps that create uppercase extensions, so you may want to include those as well.
I'm not sure how well the '/i' case-insensitivity flag works with Files or FilesMatch, so you may want to use character classes like this:
([Jj][Pp][Ee]?[Gg]|[Pp][Nn][Gg]|[Gg][Ii][Ff]|[Pp][Dd][Ff])
and so on.
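On Apache 2.4 (mod_authz_core), a sketch of the same allow-list: Require replaces Order/Allow/Deny, and Apache's PCRE regexes accept the inline (?i) flag (the same mechanism as the (?i:pdf) pattern in the Content-Disposition question), so the character classes become unnecessary. The extension list is copied from the question.

```apache
# Deny every file in this directory by default
Require all denied

# Allow names with no dot before a single permitted extension (any case).
# With the default AuthMerging Off, this Require replaces the one above for matching files.
<FilesMatch "(?i)^[^.]+\.(gif|jpe?g|png|pdf|doc|docx|txt|rtf|ppt|pptx|xls|mp4|mov|mp3|mpg|mpeg)$">
Require all granted
</FilesMatch>
```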
Why not:
SetEnvIf Request_URI "(^|/)[-\w]+\.(gif|jpe?g|png|pdf|doc|docx|txt|rtf|ppt|pptx|xls|mp4|mov|mp3|mpg|mpeg)$" allowed
<Files *>
Order deny,allow
Deny from all
Allow from env=allowed
</Files>
Also note that I replaced the leading ^ with (^|/), since Request_URI contains the full path and you surely want to allow access to these extensions in subdirectories, and that I used [-\w]+ because - is not included in \w.
I would just start the regexp with \.(gif... as you really only need to check the extension for what you want. Up to you.
