.htaccess Header command produces error 500 when using escaped URLs - .htaccess

I have lots of web pages that also exist separately as PDF files. I was asked to tell Google via .htaccess which HTML page is the original version of each PDF file.
All web pages are available via an encoded URL using PHP's urlencode() function. The URLs contain company names.
Working example, company name "Very good company":
<Files company123.pdf>
Header append Link '<https://www.example.com/company/123/Very+good+company>; rel="canonical"'
</Files>
Once the company name contains a character that needs to be encoded (e.g. German umlauts), the web server delivers an error 500 for the whole directory:
Non-working example, company name "Very bäd cömpäny":
<Files company456.pdf>
Header append Link '<https://www.example.com/company/456/Very+b%C3%A4d+c%C3%B6mp%C3%A4ny>; rel="canonical"'
</Files>
What do I have to change to solve this problem? Is it wrong for the file to contain encoded URLs? Do I have to use unencoded URLs such as https://www.example.com/company/123/Very good company and https://www.example.com/company/456/Very bäd cömpäny instead?

From the Apache documentation for the Header directive (https://httpd.apache.org/docs/2.4/mod/mod_headers.html#header):
value may be a character string, a string containing mod_headers specific format specifiers (and character literals), or an ap_expr expression prefixed with expr=
The percent character is part of this format specifier syntax, so you need to escape it:
The following format specifiers are supported in value:
Format Description
%% The percent sign
… …
You will need to double every % character in the header value, so that %C3%A4 becomes %%C3%%A4, %C3%B6 becomes %%C3%%B6, and so on.
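If the <Files> blocks are generated by a script, the doubling can be applied automatically. The following is a minimal Python sketch, not the asker's actual setup: the function name canonical_link_block and its parameters are hypothetical, and quote_plus is assumed to give the same result as PHP's urlencode() for names like the ones in the question.

```python
from urllib.parse import quote_plus


def canonical_link_block(pdf_name: str, company_id: int, company_name: str) -> str:
    """Build a <Files> block whose Link header is safe for mod_headers."""
    # Spaces become '+', non-ASCII characters become UTF-8 percent escapes.
    encoded = quote_plus(company_name)
    # mod_headers reads '%' as the start of a format specifier,
    # so every literal percent sign has to be written as '%%'.
    escaped = encoded.replace("%", "%%")
    url = f"https://www.example.com/company/{company_id}/{escaped}"
    return (
        f"<Files {pdf_name}>\n"
        f"Header append Link '<{url}>; rel=\"canonical\"'\n"
        f"</Files>"
    )


print(canonical_link_block("company456.pdf", 456, "Very bäd cömpäny"))
```

Names without special characters pass through unchanged, so the same function can be used for every company.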

Related

Hugo: Escape special Characters in taxonomy (tag) paths

I have post tags like C# and F#. Similar cases also show up in titles.
The problem is that Hugo does not appear to support # in a URL. Such pages return a 404 and fail my Netlify build.
With content pages, I can set an explicit url to get around the special character.
The same can't be done for the auto-generated taxonomy pages. How can I get Hugo to escape or strip special characters (like #) from URLs?
Note: removePathAccents doesn't apply for non-accent special chars like # or %
You can set up a specific URL for almost anything in Hugo. To set a different URL for those tags:
Create a file: ./content/tags/C#/_index.md
Set the following in its front matter (YAML assumed):
url: "/tags/c-sharp"

.htaccess rewrite rule to append the date to the filename, using Content-Disposition header

I am trying to download different file types (PDF, XLS, .jpg, .png, etc.) directly from a Joomla CMS via the web browser. Documents and files such as Word, Excel, PowerPoint, PDF, .jpg, .png, and text files are stored in Joomla DOCman. When a user wants a file, they click a link in the DOCman menu on the website and the file is downloaded. I would like the date to be appended to the file name when users download these files from that menu. It looks like this is possible at the server level using an .htaccess rewrite rule, as the filename the browser uses to save a file is included in the Content-Disposition header:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
Is it possible to rewrite that header to append the date to the downloaded filename? I tried the code below in the .htaccess file for the PDF file type, but it doesn't work as expected.
RewriteEngine On
RewriteRule [^/]+\.pdf$ - [E=FILENAME:$0]
<FilesMatch "\.(?i:pdf)$">
Header set Content-Type application/octet-stream
Header set Content-Disposition "attachment; filename=%{FILENAME-.date('Y-m-d').}e"
</FilesMatch>
Any help getting the correct .htaccess rewrite rule to accomplish the above would be highly appreciated.
Yes it is possible. This was tested and works.
RewriteEngine On
RewriteRule ([^/]+)\.pdf$ - [E=FILENAME:$1]
<FilesMatch "\.(?i:pdf)$">
Header set Content-Type application/octet-stream
Header onsuccess set Content-Disposition "expr=attachment; filename=\"%{ENV:FILENAME}-%{TIME_YEAR}-%{TIME_MON}-%{TIME_DAY}\.pdf\"" env=FILENAME
</FilesMatch>
The Apache documentation for the Header directive shows that the %{VARNAME}e format specifier should be used for placing environment variables in the Header value. However, since the value is provided via an ap_expr expression in my code sample, the ap_expr function env (or maybe reqenv) should be used instead. The Header directive documentation says that within ap_expr expressions:
Function calls use the %{funcname:arg} syntax rather than funcname(arg).
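To make the effect of that configuration concrete, here is a small Python sketch of the header value it should produce for a given request path. It only imitates the logic and is not part of the Apache setup. The function name expected_disposition is hypothetical, and it assumes that TIME_MON and TIME_DAY expand to zero-padded two-digit values.

```python
import re
from datetime import date
from typing import Optional


def expected_disposition(path: str, today: date) -> Optional[str]:
    """Return the Content-Disposition value the rule above should yield."""
    # Same capture as the RewriteRule pattern ([^/]+)\.pdf$ (case-sensitive).
    match = re.search(r"([^/]+)\.pdf$", path)
    if match is None:
        # FILENAME is never set, so the env=FILENAME condition fails
        # and the Content-Disposition header is not added.
        return None
    stamp = today.strftime("%Y-%m-%d")
    return f'attachment; filename="{match.group(1)}-{stamp}.pdf"'


print(expected_disposition("docs/report.pdf", date(2024, 5, 7)))
```

Note that the RewriteRule pattern is case-sensitive while the FilesMatch pattern is not, so under these assumptions a request for report.PDF would get the Content-Type header but no Content-Disposition header.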

URL Rewrite keep %20 in Query String

I am trying to write a rewrite rule that sends every PDF request on my site to a specific page, passing the PDF's current file path as a query string. That page looks the path up in a dictionary and, if the URL is in the dictionary, redirects the user to the correct page. The catch is that the URLs in my dictionary contain %20, and when I read the query string the %20 has been turned into a space. Thanks for any and all help.
Can you rewrite it to keep the %20 in the query string?
Example URL: /example/example/Big%20Small%20Something%20Pad.pdf
My Rewrite:
RewriteRule ([^/]*)\.pdf$ /redirectPDF.aspx?pdf=$1.pdf [NE,QSA]
Current Query String Output: Big Small Something Pad.pdf
What I want it to look like is Big%20Small%20Something%20Pad.pdf
Try removing the NE flag. According to the ISAPI_Rewrite docs (http://www.helicontech.com/isapi_rewrite/doc/RewriteRule.htm):
noescape|NE
Don't escape output. By default ISAPI_Rewrite will encode all non-ANSI characters as %xx hex codes in output.
So it looks like you simply want to omit the NE so that it'll encode the output like it does by default.
RewriteRule ([^/]*)\.pdf$ /redirectPDF.aspx?pdf=$1.pdf [QSA]
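The mismatch between the stored key and the received value can be reproduced outside the server. The sketch below is in Python rather than ASP.NET and is only an illustration. The helper restore_percent_encoding is hypothetical, and it assumes the dictionary keys were produced by standard percent-encoding.

```python
from urllib.parse import quote, unquote


def restore_percent_encoding(decoded_value: str) -> str:
    """Re-encode a decoded query value so it matches keys stored with %20."""
    # quote() leaves letters, digits and "_.-~" untouched and
    # encodes a space as %20 (not as '+').
    return quote(decoded_value, safe="")


stored_key = "Big%20Small%20Something%20Pad.pdf"
received = unquote(stored_key)  # what the page sees after decoding
print(received)
print(restore_percent_encoding(received) == stored_key)
```

If changing the rewrite flags is not an option, re-encoding the received value in this way before the dictionary lookup is one possible workaround.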

Is it possible to add more than one '?' in a URL?

I need to rewrite a URL.
My actual URL:
http://www.domain.com/page.php?catName/ArticleName....?/&ca=7&prod=44&artId=446
I need to rewrite it like this:
http://www.domain.com/catID-catName/proID-prodName/artID-ArticleName....?/page.html
Yes it is possible. By the way, your modified URL only has one '?'.
According to the [RFC][1] specifying the syntax of URIs and URLs, the query is the part of the URL that follows the scheme, authority, and optional path (for example http://www.example.com/path or just http://www.example.com). Note that the first "?" character marks where the query section begins.
The crucial sentence in section 3.4 of the RFC is:
The characters slash ("/") and question mark ("?") may represent data within the query component.
Here is the pertinent section of the RFC governing URI syntax.
3.4 Query
The query component contains non-hierarchical data that, along with
data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.
query = *( pchar / "/" / "?" )
The characters slash ("/") and question mark ("?") may represent data
within the query component. Beware that some older, erroneous
implementations may not handle such data correctly when it is used as
the base URI for relative references (Section 5.1), apparently
because they fail to distinguish query data from path data when
looking for hierarchical separators. However, as query components
are often used to carry identifying information in the form of
"key=value" pairs and one frequently used value is a reference to
another URI, it is sometimes better for usability to avoid percent-
encoding those characters.
[1]: http://tools.ietf.org/html/rfc3986#section-3
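A quick way to see this rule in action is to split a URL with Python's standard library. The URL below is made up for illustration. Only the first "?" starts the query, and later "?" and "/" characters are kept as query data.

```python
from urllib.parse import urlsplit

# Hypothetical URL with a second '?' and a '/' inside the query.
parts = urlsplit("http://www.example.com/path?first=1?second=2/x#frag")

print(parts.path)      # the path ends at the first '?'
print(parts.query)     # everything up to '#', including the second '?'
print(parts.fragment)  # the part after '#'
```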

& Ampersand in URL

I am trying to figure out how to use the ampersand symbol in a URL.
Having seen it here: http://www.indeed.co.uk/B&Q-jobs I wish to do something similar.
I am not exactly sure what the server will call when the URL is accessed.
Is there a way to grab a request like this with .htaccess and rewrite to a specific file?
Thanks for your help!
Ampersands are commonly used in a query string. Query strings are one or more variables at the end of the URL that the page uses to render content, track information, etc. Query strings typically look something like this:
http://www.website.com/index.php?variable=1&variable=2
Notice how the first special character in the URL after the file extension is a ?. This designates the start of the query string.
In your example, there is no ?, so no query string is started. RFC 1738 lists the ampersand among the characters that may be reserved for special meaning within a scheme, and its conventional role is to link variables in a query string together. RFC 3986 does allow a literal & in a path segment (it is one of the sub-delims), so the link you provided is valid, but it is unusual.
What is likely happening behind that URL is a rewrite. A rewrite tells the server to serve a specific file based on a pattern or match. For example, an .htaccess rewrite rule that may work with your example could be:
RewriteEngine on
RewriteRule ^/?B&Q-(.*)$ /scripts/b-q.php?variable=$1 [NC,L]
This rule would match any URL starting with http://www.indeed.co.uk/B&Q- and show the content of http://www.indeed.co.uk/scripts/b-q.php?variable=jobs instead.
For more information about Apache rewrite rules, check out their official documentation.
Lastly, I would recommend against using ampersands in URL paths, even when doing rewrites. Their conventional purpose is to string variables together in a query string; using them elsewhere is unconventional and may cause confusion in the future.
A URI like /B&Q-jobs may be sent to the server percent-encoded as /B%26Q-jobs. However, by the time it goes through the rewrite engine, the URI has already been decoded, so you want to match against the literal & character:
RewriteRule ^/?B&Q-jobs$ /a/specific/file.html [L]
This makes it so when someone requests /B&Q-jobs, they actually get served the content at /a/specific/file.html.
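The difference between the encoded and decoded forms can be checked with a short Python sketch. It only imitates the matching step and is not how mod_rewrite is implemented. The regular expression is the same one used in the rule above.

```python
import re
from urllib.parse import quote, unquote

encoded = quote("/B&Q-jobs")   # percent-encoded form of the path
decoded = unquote(encoded)     # the form the rewrite pattern is tested against

pattern = re.compile(r"^/?B&Q-jobs$")

print(encoded)
print(bool(pattern.match(decoded)))  # the literal '&' matches the decoded path
print(bool(pattern.match(encoded)))  # the %26 form does not match
```

This is why the pattern contains a plain & rather than %26.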
