I had many URLs containing square brackets. I have since changed these URLs and removed all the square brackets, but I still get soft 404 errors because the old URLs are still indexed. The URLs vary. I could redirect each URL manually, but it would be better to use a universal rule in .htaccess that removes all square brackets from the URLs.
http://www.example.com/page-[first]
http://www.example.com/page-[second]
etc. would be replaced with:
http://www.example.com/page-first
http://www.example.com/page-second
Can I do it with .htaccess?
Thank you
Try this:
RewriteEngine On
RewriteBase /
RewriteRule ^(.*)\[(.*)$ $1$2 [N,R=301]
RewriteRule ^(.*)\](.*)$ $1$2 [N,R=301]
If your .htaccess is in a subfolder, change RewriteBase / to:
RewriteBase /sub-folder-name
Here's my access_log - which shows the redirections:
::1 - - [04/Jun/2016:12:29:23 +0800] "GET /test/hello-[world].html HTTP/1.1" 301 246
::1 - - [04/Jun/2016:12:29:23 +0800] "GET /test/hello-world.html HTTP/1.1" 200 3
::1 - - [04/Jun/2016:12:37:45 +0800] "GET /test/hello-%5bworld%5d.html HTTP/1.1" 301 246
::1 - - [04/Jun/2016:12:37:45 +0800] "GET /test/hello-world.html HTTP/1.1" 304 -
You can do this redirection using a single rule:
RewriteEngine on
RewriteCond %{THE_REQUEST} /page-(?:%5B|\[)(.*?)(?:%5D|\]) [NC]
RewriteRule ^ /page-%1 [R,L]
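The rule above only handles URLs that start with /page-. A more general sketch, assuming the .htaccess file sits in the document root and that each old URL contains matched pairs of brackets, could look like the following. RewriteRule matches against the URL-decoded path, so the encoded forms %5B and %5D are covered as well.

```apache
RewriteEngine On
# Remove one [...] pair from the path and redirect permanently.
# $1 = text before "[", $2 = text inside the brackets, $3 = text after "]".
RewriteRule ^(.*)\[(.*)\](.*)$ /$1$2$3 [R=301,L]
```

A URL with several bracket pairs would be cleaned one pair per redirect, so it takes as many redirects as there are pairs before the final URL is reached.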
My site got hacked recently and has over 3 million pages now when it only has 30 pages (see screenshot).
How do I implement the correct 410 header in .htaccess?
I think the best tactic is to 410 all pages that contain a number OR .htm OR .html as none of the real pages have these in the URL. For example -
https://example.com/cixc-20050gsakuramar/-b00006.htm
https://example.com/sfumato.php?nzlw-21833vetidm4
https://example.com/bzmt-5694ceti.html
https://example.com/pfks-14602sjp/ucqksti.htm
https://example.com/admv-15974mitem/318
Would this code work?
Redirect 410 /*0*
Redirect 410 /*1*
Redirect 410 /*2*
Redirect 410 /*3*
Redirect 410 /*4*
Redirect 410 /*5*
Redirect 410 /*6*
Redirect 410 /*7*
Redirect 410 /*8*
Redirect 410 /*9*
Redirect 410 /*.html*
Redirect 410 /*.htm*
I've also pieced together a rewrite rule which might also work?
RewriteRule ^([0-9]+)$ - [G,L]
I am also thinking of adding Disallow to robots.txt like this -
Disallow: /*0*
Disallow: /*1*
Disallow: /*2*
Disallow: /*3*
Disallow: /*4*
Disallow: /*5*
Disallow: /*6*
Disallow: /*7*
Disallow: /*8*
Disallow: /*9*
Disallow: /*.htm
Disallow: /*.html
The Redirect directive of mod_alias doesn't support wildcards, so rules such as Redirect 410 /*0* would not do what you expect. You could turn them into RedirectMatch directives, which support regular expressions. I'd combine all the numbers into one rule and the html suffixes into another:
RedirectMatch Gone ".*[0-9].*"
RedirectMatch Gone ".*\.html?$"
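RedirectMatch patterns are unanchored regular expressions, so the leading and trailing .* are not strictly needed. A minimal equivalent sketch, using the numeric status in place of the Gone keyword:

```apache
# 410 for any URL path that contains a digit
RedirectMatch 410 "[0-9]"
# 410 for any URL path that ends in .htm or .html
RedirectMatch 410 "\.html?$"
```

Like the original pair, these look only at the URL path, and they would also catch legitimate asset URLs that happen to contain digits.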
From your Google Search Console screenshot, it looks like some of the URLs have query strings in them (the part after the ?). mod_alias doesn't consult the query string at all when matching the URL. If the .html appears in the query string and not in the URL path, RedirectMatch won't be able to match it.
I'd recommend going with mod_rewrite rules, which can match the query string. Another reason to prefer mod_rewrite would be if you already have other rewrite rules in your .htaccess. Additional rewrite rules would be less likely to conflict with them than mod_alias rules.
I've added a condition to skip wp-content URLs because in the comments, you say you actually have some CSS files with numbers in them.
RewriteEngine on
RewriteCond %{REQUEST_URI} !^/?wp-content/
RewriteCond %{REQUEST_URI} !pagespeed
RewriteCond %{REQUEST_URI} !fontawesome
RewriteCond %{REQUEST_URI} !webfont
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule [0-9] - [G,L]
RewriteRule \.html?$ - [G,L]
RewriteCond %{QUERY_STRING} !v(er)?=
RewriteCond %{QUERY_STRING} [0-9]
RewriteRule . - [G,L]
RewriteCond %{QUERY_STRING} \.html?$
RewriteRule . - [G,L]
I wouldn't recommend using a Disallow in robots.txt because Google sometimes indexes disallowed URLs anyway even if it can't crawl them.
I have an SPA running on Heroku and I want to force HTTPS on it. I'm using the PHP stack to add some basic authentication, nothing special, only one index.php. I thought it would be easy, but I'm getting a strange redirect:
When I access /appStart for example, I get this in logs:
10.8.149.25 - - [13/Feb/2021:19:23:57 +0000] "GET /appStart HTTP/1.1" 301 248 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36
This is my .htaccess:
RewriteEngine On
#Force SSL
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=302]
# Handle Authorization Header
RewriteCond %{HTTP:Authorization} .
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
# Send Requests To Front Controller...
RewriteCond %{REQUEST_URI} !\.(png|css|js|json|txt|ico)$
RewriteCond %{REQUEST_URI} !_nuxt
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]
I switched the https rewrite to 302 in my .htaccess to be sure that the redirect isn't coming from my rules, so ...
Where is this 301 coming from???
If appStart is a physical directory and you request /appStart (no trailing slash) then mod_dir (Apache) will "fix" the URL with a 301 redirect that appends the trailing slash, i.e. /appStart/.
You should be requesting /appStart/ (with a trailing slash) if appStart is a physical directory.
UPDATE: Just to be sure: DirectorySlash Off will solve my issue, right?
(I'm assuming your .htaccess file is located at /appStart/.htaccess?)
It's actually a bit more complicated than that. Without the trailing slash on the directory then your mod_rewrite directives in .htaccess won't be processed and the front-controller (internal rewrite to index.php) will fail. Other issues like DirectoryIndex (mod_dir) will also fail - although you don't seem to be dependent on that here (you are using mod_rewrite instead).
The net result is that you'll likely get a 403 Forbidden response (unless mod_autoindex is enabled, in which case you'll get an auto-generated directory listing, despite the DirectoryIndex document being present! See the security warning for the DirectorySlash directive in the Apache docs.)
In short, Apache needs the trailing slash.
If you disable the auto-appending of the trailing slash with DirectorySlash Off then you'll need to manually append this trailing slash yourself to avoid these issues. And you'll need to do this in the parent/root .htaccess file instead, not /appStart/.htaccess.
For example... move your existing /appStart/.htaccess file to the root /.htaccess file, include the DirectorySlash Off directive and change the last RewriteRule directive to read:
RewriteRule ^appStart appStart/index.php [L]
However, that doesn't necessarily solve the issue completely (depending on how your application is structured), since any relative URL-paths in your application (to static resources etc.) are now relative to the document root, not the /appStart/ subdirectory, as they would have been before. This is a client-side URL issue and can only be resolved by "fixing" the client-side URLs.
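Putting the steps described above together, the root /.htaccess file might look roughly like this. It is only a sketch that reuses the directives from the question, with DirectorySlash Off added and the last RewriteRule changed; it assumes the application lives in a physical /appStart directory, as the answer does.

```apache
# /.htaccess in the document root (moved from /appStart/.htaccess)
# Stop mod_dir from issuing the 301 that appends the trailing slash
DirectorySlash Off

RewriteEngine On

# Force SSL
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=302]

# Handle Authorization Header
RewriteCond %{HTTP:Authorization} .
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]

# Send requests under /appStart to the front controller
RewriteCond %{REQUEST_URI} !\.(png|css|js|json|txt|ico)$
RewriteCond %{REQUEST_URI} !_nuxt
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^appStart appStart/index.php [L]
```

The !-f condition is what stops the last rule from rewriting appStart/index.php to itself on the second pass.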
I have a site with the following structure:
/
|
folder1
|
index.php
index.html
Default is to open index.php but if user is not logged in I do:
header('Location: '.$domain.'/folder1/index.html');
exit;
This works fine for any folder, but for folder1 I also have an .htaccess (located at the root of the site) to handle some logic in index.php.
The .htaccess is the following:
RewriteEngine on
RewriteBase /
RewriteRule \.(php)$ - [L]
RewriteRule ^folder1/([^/]+)/?$ folder1/index.php?source=$1 [L]
I need help adding a rule so that a request for index.html is served without problems.
Currently the access.log shows:
10.211.55.2 - - [14/Jan/2021:20:15:50 +0100] "GET /folder1/index.html HTTP/1.1" 302 589 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:84.0) Gecko/20100101 Firefox/84.0"
and it keeps looping until Firefox says "this site is not redirecting properly".
For any other folder where the .htaccess is not applied access.log shows the correct behaviour:
10.211.55.2 - - [14/Jan/2021:21:30:52 +0100] "GET /folder2/ HTTP/1.1" 302 4230 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:84.0) Gecko/20100101 Firefox/84.0"
and the browser correctly displays the index.html
How do I fix the .htaccess to handle the redirect from index.php to index.html?
RewriteRule \.(php)$ - [L]
You could just add another exception, just as you have done for .php files. For example, after the rule above, but before your rewrite to index.php:
RewriteRule ^folder1/index\.html$ - [L]
Any request for /folder1/index.html goes no further, so is not rewritten to index.php by the directive that follows (so no redirect occurs).
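For reference, the complete rule set with the exception in place would look like this. It is simply the file from the question with the one new line added. Order matters: the exception has to come before the rewrite to index.php.

```apache
RewriteEngine on
RewriteBase /

# Leave requests for .php files alone
RewriteRule \.(php)$ - [L]

# Leave /folder1/index.html alone so it is served as a static file
RewriteRule ^folder1/index\.html$ - [L]

# Everything else directly under /folder1/ goes to index.php
RewriteRule ^folder1/([^/]+)/?$ folder1/index.php?source=$1 [L]
```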
I have a Drupal 8 site and I want a certain path to return a non-Drupal 404 (I don't want the request to hit Drupal at all, so that nothing is logged). For example: autodiscover/autodiscover.xml.
I tried the following 3 things in .htaccess, but they did not work:
SecRule REQUEST_URI ".*autodiscover/autodiscover.xml" "id:9990001,nolog,status:404,chain"
SecRule SERVER_NAME "mysite\.com" "t:lowercase"
RewriteRule ^autodiscover/autodiscover.xml?$ - [R=404,NC,L]
ErrorDocument 404 autodiscover/autodiscover.xml
This should work (note that the F flag returns 403 Forbidden rather than 404):
RewriteCond %{REQUEST_URI} /autodiscover/autodiscover\.xml [NC]
RewriteRule ^ - [F,L]
https://support.acquia.com/hc/en-us/articles/360004553253-Avoiding-404-error-messages-in-your-logs
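If a 404 status is specifically wanted rather than the 403 that the F flag sends, mod_rewrite also accepts a non-redirect status code in the R flag. A hedged sketch, assuming the rule is placed above Drupal's own front-controller rewrite in the root .htaccess:

```apache
RewriteEngine On
# With a status outside the 3xx range, the substitution is dropped
# and rewriting stops, so Apache answers with 404 directly.
RewriteRule ^autodiscover/autodiscover\.xml$ - [R=404,NC,L]
```

Which response body Apache sends depends on the ErrorDocument 404 setting in effect. If that points at Drupal's index.php, the request may still reach Drupal. The asker reported that a similar rule did not work for them, so rule placement and the ErrorDocument setting are the first things to check.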
I am using WAMP (Apache 2.2.9 and PHP 5.4.17). I have a few simple mod_rewrite rules that work on the production server (CentOS), but on my Windows machine one rule is being ignored. I spent 3 days looking for a solution, but nothing worked.
Here are the mod_rewrite rules:
Options -Indexes FollowSymLinks MultiViews
RewriteEngine on
RewriteBase /mysite/
RewriteRule ^reviews/([a-zA-Z0-9_-]+)/([0-9]+)/(.*).html$ page-review.php?t=$1&m=$2&s=$3 [L] #This is being ignored by Apache
RewriteRule ^reviews/?$ reviews.php [L]
RewriteRule ^reviews/([a-zA-Z0-9_-]+)-reviews/?$ reviews.php?r=$1 [L]
RewriteRule ^file/([a-zA-Z0-9_-]+)/s([0-9]+)-([a-zA-Z0-9_-]+)/f([0-9]+)-(.*).html$ file.php?c=$1&s=$2&st=$3&&m=$4&t=$5 [L]
The first rewrite rule is being ignored by Apache. Surprisingly, when I remove all contents from htaccess it still reads my rules. I have turned on mod rewrite log and here is the log detail
log for http://localhost/mysite/reviews/product/11/samsung.html
127.0.0.1 - - [23/Aug/2013:14:37:34 +0100] [localhost/sid#453140][rid#1821000/subreq] (1) [perdir /mydir/] pass through /mydir/reviews.php
127.0.0.1 - - [23/Aug/2013:14:37:34 +0100] [localhost/sid#453140][rid#17cc4f0/initial] (1) [perdir /mydir/] pass through /mydir/reviews.php
127.0.0.1 - - [23/Aug/2013:14:37:34 +0100] [localhost/sid#453140][rid#17d0500/initial] (1) [perdir /mydir/] pass through /mydir/reviews.php
127.0.0.1 - - [23/Aug/2013:14:37:34 +0100] [localhost/sid#453140][rid#1835050/initial] (1) [perdir /mydir/] pass through /mydir/reviews.php
As you can see above, instead of rewriting the request to page-review.php, it serves the reviews.php file.
Here is another request which works fine
log for http://localhost/mysite/file/downloads/software/f5983-docs.html
127.0.0.1 - - [23/Aug/2013:14:39:42 +0100] [localhost/sid#453140][rid#47f4188/initial] (2) [perdir /mydir/] rewrite 'file/downloads/software/f5983-docs.html' -> 'file.php?c=downloads&s=45&st=software&&m=5983&t=docs'
127.0.0.1 - - [23/Aug/2013:14:39:42 +0100] [localhost/sid#453140][rid#47f4188/initial] (2) [perdir /mydir/] trying to replace prefix /mydir/ with /mysite/
127.0.0.1 - - [23/Aug/2013:14:39:42 +0100] [localhost/sid#453140][rid#47f4188/initial] (1) [perdir /mydir/] internal redirect with /mysite/file.php [INTERNAL REDIRECT]
127.0.0.1 - - [23/Aug/2013:14:39:42 +0100] [localhost/sid#453140][rid#47ed3e0/initial/redir#1] (1) [perdir /mydir/] pass through /mydir/file.php
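Rules mostly look alright. The likely culprit is MultiViews: with it enabled, content negotiation can map a request for /reviews/... to reviews.php before your rules get a chance to match, which would explain why the log shows a pass through to reviews.php instead of a rewrite to page-review.php. Try this slightly modified code, which turns MultiViews off: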
Options -Indexes +FollowSymLinks -MultiViews
RewriteEngine on
RewriteBase /mysite/
RewriteRule ^reviews/(\w+)/([0-9]+)/([^.]+)\.html$ page-review.php?t=$1&m=$2&s=$3 [L,QSA,NC]
RewriteRule ^reviews/?$ reviews.php [L,NC]
RewriteRule ^reviews/(\w+)-reviews/?$ reviews.php?r=$1 [L,QSA,NC]
RewriteRule ^file/(\w+)/s([0-9]+)-(\w+)/f([0-9]+)-([^.]+)\.html$ file.php?c=$1&s=$2&st=$3&&m=$4&t=$5 [L,QSA,NC]
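Btw, the regex in your last rule won't match file/downloads/software/f5983-docs.html, because the segment after downloads must have the form s<digits>-<name> (for example s45-software).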