How to remove duplicate URLs that are affecting my SEO - .htaccess

Hoping someone can help me. I've tried everything I can think of and have spent almost 2 weeks now trying to solve this issue. I'm using SERanking for a site audit and it is indicating I need to fix the duplicate URL issue.
Pages With Issues:
https://www.droneworxphotography.com/ (0 referring pages)
https://www.droneworxphotography.com/index.html (10 referring pages)
https://www.droneworxphotography.com (10 referring pages)
My .htaccess:
RewriteEngine On
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.droneworxphotography.com/$1 [R,L]
# For testing use [L,R]; for production use [L,R=301]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [L,R]
I've tried adding a 301 redirect, but it is not resolving my issue.
Redirect 301 /droneworxphotography.com/index.html /droneworxphotography.com
I'm hoping someone can help me with fixing this.
Thanks

As mentioned in comments, you need to actually fix your internal links so you are consistently linking to the canonical URL throughout your site, before attempting to implement redirects.
The "reported issues" you are seeing are presumably discovered by crawling your site.
As far as I can tell, my internal links are consistent.
Unfortunately not...
<!--begin footer_details -->
<ul class="footer_details">
<li>
<li><a href="index.html">Redacted Photography</a>
(Ignoring the incorrectly nested li elements - this is not valid HTML.)
You have a relative link to index.html in the footer of every page. This should be either a root-relative URL (a single slash):
<a href="/">Redacted Photography</a>
Or, an absolute URL (include the scheme + canonical hostname). For example:
<a href="https://www.example.com/">Redacted Photography</a>
Note the trailing slash after the hostname.
These two are equivalent (provided the hostname is correctly canonicalised before the request reaches your site, which appears to be the case: you have HTTP to HTTPS and non-www to www redirects in place).
<!--begin logo -->
<a href="https://www.example.com"><img src="lib/images/cropped-DroneworxLogo_small.png" alt="Logo"></a>
<!--end logo -->
<!--begin nav -->
<ul id="nav">
<li>
<a href="https://www.example.com"><i class="icon-home"></i><br>Home</a>
You have 3 links on every page (the two above and the copyright link in the footer) that link to the scheme + hostname but without the trailing slash.
Strictly speaking, the "correct" URL includes the trailing slash after the hostname.
User-agents (i.e. browsers) will correct this and always append the trailing slash, but you should be consistent and link to href="https://www.example.com/", with a trailing slash (or use href="/" as mentioned above).
Your XML sitemap already correctly links to the canonical absolute URL (with a trailing slash): <loc>https://www.example.com/</loc>
It is this inconsistency that is causing the SEO tool to report on both https://www.example.com/ (slash) and https://www.example.com (no slash). This is not really an SEO issue, though, since user-agents will always append the trailing slash to form a valid URL, as mentioned above. See my answer to the following question on the Webmasters Stack for more information about the trailing slash after the hostname: Is trailing slash automagically added on click of home page URL in browser?
Note that the trailing slash immediately after the hostname (at the start of the URL-path) is not the same thing as the trailing slash at the end of the URL-path.
Redirect to canonical
Once you have corrected the above then you can implement a redirect to correct any indexed URLs (or backlinks from external third parties).
The only redirect you could implement here is from /index.html to / (to remove index.html). You cannot redirect to append a trailing slash after the hostname (this is the same URL).
(Your mod_rewrite directives remove the trailing slash at the end of the URL-path - which is something different entirely. Your site does not have an issue with this, but this redirect causes no harm.)
To remove index.html you could add the following rule at the top of the .htaccess file, before your existing redirects (and after the RewriteEngine directive):
RewriteRule ^index\.html$ https://www.example.com/ [R=301,L]
NB: Test first with a 302 (temporary) redirect to avoid potential caching issues.
This does assume you aren't using a front-controller pattern (and rewriting requests to /index.html). It doesn't look like you are - your site looks like an entirely static HTML site?
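If the rule above causes a redirect loop on your server (for example, depending on how mod_dir serves index.html as the directory index for /), a common variation is to also check THE_REQUEST. This holds the original request line sent by the client (e.g. GET /index.html HTTP/1.1) and is not changed by internal rewrites, so it also keeps the rule safe if you later route requests to /index.html internally. A sketch of the whole file, assuming www.example.com is the canonical hostname (substitute your own) and leaving your existing rules as they are:

```apache
RewriteEngine On

# Remove "index.html" only when the client actually requested it.
# THE_REQUEST is the original request line, e.g. "GET /index.html HTTP/1.1"
# Test with R=302 first, then change to R=301 once it works as expected
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.html[?\s]
RewriteRule ^index\.html$ https://www.example.com/ [R=302,L]

# Existing HTTP to HTTPS redirect
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.example.com/$1 [R,L]

# Existing rule that removes the trailing slash at the end of the URL-path
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [L,R]
```

Any query string on the original request is passed through to the redirect target by default.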
A quick check in Google using a site: search does not show that index.html has been indexed. So, this was unlikely causing you an SEO issue at the current time.

Related

Content Security Policy causing CORS errors

Weird one, but the referrer policy is currently creating issues on my website if the domain has a . on the end, for example:
domain.uk - works fine
domain.uk. - has CORS errors
It seems the . on the end is being treated as part of the domain and so is considered a different origin. Seems to only be a problem in Chrome. Possibly a Chrome bug?
Thought perhaps I could fix this in my .htaccess by setting up a redirect, but the .htaccess cannot do it as it can only match after the domain, and the . is being treated as part of the domain.
Any suggestions?
It seems the . on the end is being treated as part of the domain and so is considered a different origin. Seems to only be a problem in Chrome. Possibly a Chrome bug?
It is part of the domain. The trailing dot indicates a fully qualified domain name. If you are only seeing different behaviour in Chrome then maybe Chrome is just being more strict - it's not a bug.
Try https://stackoverflow.com./ (for instance) - you'll probably appear logged out (as the cookies are not passed).
Thought perhaps I could fix this in my .htaccess by setting up a redirect, but the .htaccess cannot do it as it can only match after the domain, and the . is being treated as part of the domain.
You can do it in .htaccess. The dot is sent as part of the Host header (since it is part of the domain) - which is available in the HTTP_HOST server variable. Ordinarily, you'd do this as part of your canonical (www vs non-www / HTTP to HTTPS) redirect, but you could do something like the following using mod_rewrite to remove the trailing dot on the requested hostname:
RewriteEngine On
RewriteCond %{HTTP_HOST} (.+)\.$
RewriteRule ^ https://%1%{REQUEST_URI} [R=301,L]
The %1 backreference contains the hostname, less the trailing dot, that is captured in the preceding condition.
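As mentioned, this would ordinarily be folded into the canonical redirect. A minimal sketch, assuming https://www.example.com is the canonical origin (a placeholder, substitute your own) and that HTTPS is handled by Apache itself rather than by a proxy in front of it (otherwise %{HTTPS} may not reflect the client's connection):

```apache
RewriteEngine On

# Redirect to the canonical host if the Host header ends in a dot,
# does not start with "www.", or the request was not made over HTTPS
# (Test with R=302 first to avoid caching a wrong 301)
RewriteCond %{HTTP_HOST} \.$ [OR]
RewriteCond %{HTTP_HOST} !^www\. [OR]
RewriteCond %{HTTPS} off
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]
```

Because the target hostname is hard-coded, the trailing dot (and any non-www hostname) is dropped in the same single redirect.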
UPDATE:
I am using Wordpress so it already has some rewrite rules in the htaccess, ... can you advise on how to add your rewrites
You need to place this redirect before the existing WordPress directives (ie. before the # BEGIN WordPress section), near the top of the file.
You do not need to repeat the RewriteEngine On directive since that already occurs later in the WordPress section. (If there are multiple RewriteEngine directives then the last instance wins and controls the entire file.)
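For example, a sketch of the top of the file, assuming the standard WordPress block (yours may contain a few extra lines). Keeping custom rules above the markers also matters because WordPress can rewrite anything between # BEGIN WordPress and # END WordPress:

```apache
# Remove the trailing dot from the requested hostname.
# No "RewriteEngine On" here: it already appears in the WordPress block below
RewriteCond %{HTTP_HOST} (.+)\.$
RewriteRule ^ https://%1%{REQUEST_URI} [R=301,L]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```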

HTACCESS 301 redirect keeps sending to the wrong page

I am trying to redirect an old page from a website I have redesigned, to the new one, but it's not working.
Here are my 2 lines of code in the .htaccess file for that domain:
Redirect 301 /deaneco http://solutionsgtr.ca/fr/deaneco/accueil.html
RewriteRule ^/deaneco/contact http://solutionsgtr.ca/fr/deaneco/contact.html [R=301,L,QSA]
If I go to the solutionsgtr.ca/deaneco/contact URL, it gives me the following page:
http://solutionsgtr.ca/fr/deaneco/accueil.html/contact
The first rule works though (deaneco/ to solutionsgtr.ca/fr/deaneco/accueil.html).
I feel like both lines are being mixed together and sending me to the wrong page, one that doesn't exist, so I get a 404 error.
There are a couple of issues here:
The Redirect directive (part of mod_alias) is prefix-matching and everything after the match is appended on the end of the target URL. This explains the redirect you are seeing.
The RewriteRule (mod_rewrite) pattern ^/deaneco/contact will never match in a .htaccess context since the URL-path that is matched does not start with a slash. So, this rule is not doing anything currently.
You should avoid mixing redirects from both modules since they execute independently and at different times during the request (mod_rewrite executes first, despite the apparent order of the directives).
Either use mod_alias, ordering the directives most specific first:
Redirect 301 /deaneco/contact http://solutionsgtr.ca/fr/deaneco/contact.html
Redirect 301 /deaneco http://solutionsgtr.ca/fr/deaneco/accueil.html
NB: You will need to clear your browser cache, since the erroneous 301 (permanent) redirect will have been cached by the browser. Test with 302 (temporary) redirects to avoid potential caching issues.
OR, if you are already using mod_rewrite for other redirects/rewrites then consider using mod_rewrite instead (to avoid potential conflicts as mentioned above):
RewriteEngine On
RewriteRule ^deaneco/contact$ http://solutionsgtr.ca/fr/deaneco/contact.html [R=301,L]
RewriteRule ^deaneco/?$ http://solutionsgtr.ca/fr/deaneco/accueil.html [R=301,L]
The QSA flag is not required, since the query string is passed through to the substitution by default.
The order of the RewriteRule directives is not important in this instance, since each one matches only that specific URL.
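If you would rather stay with mod_alias, RedirectMatch (from the same module) takes a regex instead of a prefix, so anchoring the pattern with ^ and $ stops the rest of the URL-path from being appended to the target. A sketch, with the trailing slash made optional since the question mentions requesting deaneco/ (use this instead of, not alongside, the mod_rewrite rules):

```apache
# mod_alias only: anchored patterns must match the whole URL-path,
# so nothing after the match is appended to the target URL.
# RedirectMatch always matches the full URL-path (leading slash included)
RedirectMatch 301 ^/deaneco/contact/?$ http://solutionsgtr.ca/fr/deaneco/contact.html
RedirectMatch 301 ^/deaneco/?$ http://solutionsgtr.ca/fr/deaneco/accueil.html
```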
If I go to the solutionsgtr.ca/deaneco/contact URL
If you are redirecting to the same host, then you don't need to explicitly include the scheme + hostname in the target URL, since it defaults to the current host.
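For example, a sketch of the mod_rewrite version with root-relative targets, assuming these rules live in the .htaccess file of solutionsgtr.ca itself:

```apache
RewriteEngine On
# Root-relative targets: with the R flag, Apache builds the absolute
# redirect URL from the current scheme and host
RewriteRule ^deaneco/contact$ /fr/deaneco/contact.html [R=301,L]
RewriteRule ^deaneco/?$ /fr/deaneco/accueil.html [R=301,L]
```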

My htaccess passthrough rule redirects to the url instead

I'm trying to pass through (not redirect!) an empty old page to its new location using an .htaccess RewriteRule.
I essentially want the user to browse to mysite.com/page-old and to see that url in their browser but be delivered the content from mysite.com/page-new. The user should not be aware that the location changed.
RewriteEngine On
RewriteRule ^page-old/?$ /page-new [PT]
The actual result is that they are redirected to page-new instead.
I found the following on apache.org, which seems to somewhat validate my code, but this is giving me a 404 error.
Description:
Assume we have recently renamed the page foo.html to bar.html and now want to provide the old URL for backward compatibility. However, we want that users of the old URL even not recognize that the pages was renamed - that is, we don't want the address to change in their browser
https://httpd.apache.org/docs/trunk/rewrite/remapping.html
RewriteRule "^/foo\.html$" "/bar.html" [PT]
RewriteEngine On
RewriteRule ^example/my-stuff/$ /example/home/ [L,R=301]
Check this answer as well:
How to redirect a specific page using htaccess
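Note that the R=301 flag in the rule above makes it an external redirect, so the address bar will change. For the behaviour described in the question, the rewrite has to stay internal (no R flag). One common reason an internal rewrite still ends up exposing the new URL is that the target is a physical directory requested without a trailing slash: mod_dir (DirectorySlash) then issues its own 301 redirect to add the slash. A minimal sketch, assuming page-new is such a directory with an index document (an assumption, since the question doesn't say):

```apache
RewriteEngine On
# Internal rewrite only (no R flag), so /page-old stays in the address bar.
# Rewriting to "/page-new/" with the trailing slash avoids mod_dir's
# DirectorySlash redirect from /page-new to /page-new/
RewriteRule ^page-old/?$ /page-new/ [L]
```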

Links starting with double slashes cause invalid requests

All links on my website are protocol-less and start with double slashes:
href="//site.com/page.html".
And in the log I see many requests like: 404 - site.com/site.com/page.html
Which means some browsers are interpreting these absolute links as relative. By looking at the user agents I assume those are mostly bots.
Can I fix requests such as site.com/site.com/page.html with .htaccess by directing them to the proper URI? (site.com/site.com/page.html => site.com/page.html)
Try adding this to the .htaccess file in your document root (of the site that's hosting these protocol-relative links):
RedirectMatch 301 ^/site\.com/(.*)$ http://site.com/$1
or:
RewriteEngine On
RewriteRule ^site\.com/(.*)$ http://site.com/$1 [L,R=301]
If the site that is hosting these links is already site.com, you can remove the http://site.com bit from the targets.
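For example, if the links are hosted on site.com itself, the mod_rewrite version could be reduced to something like this (a sketch; site.com is the placeholder hostname used in the question):

```apache
RewriteEngine On
# Strip the duplicated "site.com/" prefix and redirect on the same host
RewriteRule ^site\.com/(.*)$ /$1 [L,R=301]
```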

Simple and neat .htaccess redirect help required

This is a strange one...
A while back I managed to write a .htaccess redirect so that the URL read like www.website.com/mt?page=index, while the real URL of the page was www.website.com/PageParser.php?file=index.php.
The problem is that my web host's FTP system hides .htaccess files (even though they are allowed and do work), so I checked the local copies I have of my .htaccess files, and none of them contain the code for how this works. I've forgotten how I did it!
Essentially, I am using wildcards so that anything after mt?page= actually shows PageParser.php?file=, but without PageParser.php appearing in the URL. This is the important bit, because the index.php in my site root is actually sent through PageParser.php first, so that anything which shouldn't be there is wiped out before the end user sees it. So how can .htaccess rewrite the URL so that any link to /mt?page= shows the file located at /PageParser.php?file= without changing the URL the user sees?
RewriteEngine On
RewriteRule ^(.*)mt?page=(.*)$ $1PageParser.php?file=$2
RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} ^page=([^&]+)
RewriteRule ^mt$ /PageParser.php?file=%1.php [NC,L]
This rule will rewrite (internal redirect) request for /mt?page=hello to /PageParser.php?file=hello.php without changing URL in browser.
Your source URL example (www.website.com/mt?page=index) has index while target URL (www.website.com/PageParser.php?file=index.php) has index.php. The above rule will add .php to the page name value, so if you request /mt?page=hello.php it will be rewritten to /PageParser.php?file=hello.php.php.
If there is a typo in your URL example and the page value should be passed as is, then remove the .php bit from the rewrite rule.
The rule will work fine even if other parameters are present (e.g. /mt?page=hello&name=Pinky), but those extra parameters will not be passed to the rewritten URL. If needed, add the QSA flag to the rewrite rule.
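A sketch of that variation, which also allows page to appear anywhere in the query string rather than only first. Note that QSA appends the whole original query string, including page=..., after the new file=... parameter:

```apache
RewriteEngine On
RewriteBase /
# Capture the value of "page" wherever it appears in the query string
RewriteCond %{QUERY_STRING} (?:^|&)page=([^&]+)
# QSA: /mt?page=hello&name=Pinky -> /PageParser.php?file=hello.php&page=hello&name=Pinky
RewriteRule ^mt$ /PageParser.php?file=%1.php [NC,QSA,L]
```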
This rule is to be placed in .htaccess in website root folder. If placed elsewhere some small tweaking may be required.
P.S.
It's better to write no explanation at all ("I knew it/I did it before... but now I forgot how I did it") than to include these "excuses". While they may be 100% true, they just don't sound that great.
