We are seeing a strange thing where bots are sending odd URLs. They are appending an Alexa URL to the URLs we have. We want to remove that part so that only the portion before the odd addition remains.
So we want to go from
www.example.com/search/Linux/page/6/”http:/www.alexa.com/siteinfo/www.example.com“/page/900
to
www.example.com/search/Linux/page/6/
removing the: ”http:/www.alexa.com/siteinfo/www.example.com“/page/900
Because it contains the quotes, I am unsure what .htaccess rule would work to rewrite the URL, but I am open to suggestions.
Not sure where the requests are coming from; we only see them in our 404 monitor.
If these requests are triggering a 404 (as they should be) then you are essentially already "blocking" such requests - they won't get inadvertently indexed by search engines.
However, if a third-party site is mistakenly linking to you with these erroneous links then you might be losing traffic. You can redirect to remove the erroneous portion of the URL.
Because it contains the quotes, I am unsure what .htaccess rule would work to rewrite the URL, but I am open to suggestions.
There's nothing particularly special about matching quotes in the URL. However, the quotes used in your question are not the "standard" double quotes. The opening quote is "U+201D: RIGHT DOUBLE QUOTATION MARK" and the closing quote is "U+201C: LEFT DOUBLE QUOTATION MARK". This is not a problem; we can check for all three.
For example, using mod_rewrite at the top of the .htaccess file to remove the part of the URL from the first quote character onwards:
RewriteEngine On
# Remove everything from the first double quote onwards
RewriteRule ^([^"”“]+)["”“] /$1 [R=301,L]
The $1 backreference contains the part of the URL-path before the first double quote character.
The original query string (if any) is preserved.
Test first with a 302 (temporary) redirect to avoid potential caching issues.
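To sanity-check the pattern before touching the live site, here is a rough Python stand-in for the rule above. It is only a sketch: Python's re module is not the regex engine Apache uses, it matches Unicode code points rather than the raw bytes Apache may see, and the function name strip_from_first_quote is made up for illustration.

```python
import re

# Python stand-in for the RewriteRule pattern; in .htaccess the URL-path
# is matched without its leading slash, so the test input omits it too.
QUOTE_RULE = re.compile(r'^([^"”“]+)["”“]')

def strip_from_first_quote(url_path):
    """Return the redirect target, or None when the rule would not match."""
    match = QUOTE_RULE.match(url_path)
    if match is None:
        return None
    # Mirrors the "/$1" substitution in the rule.
    return "/" + match.group(1)

bad_path = "search/Linux/page/6/”http:/www.alexa.com/siteinfo/www.example.com“/page/900"
print(strip_from_first_quote(bad_path))                # /search/Linux/page/6/
print(strip_from_first_quote("search/Linux/page/6/"))  # None (no quote, no redirect)
```

A clean URL returns None here, which corresponds to the rule not firing, so normal requests should not end up in a redirect loop.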
Alternatively, if your URLs are limited to a known subset of characters, e.g. a-z, A-Z, 0-9, _ (underscore), - (hyphen) and / (slash, the path separator), then check for valid characters instead. For example:
# Remove everything from the first "invalid character"
RewriteRule ^([\w-/]+)[^\w-/] /$1 [R=301,L]
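The same idea can be tried out in Python as a rough sketch. Two things differ from the rule above and are assumptions on my part: Python's re rejects a hyphen placed directly after \w inside a character class, so the hyphen is moved to the end, and re.ASCII is used so that \w only covers a-z, A-Z, 0-9 and underscore. The function name strip_from_first_invalid is hypothetical.

```python
import re

# Hyphen sits last in the class because Python's re raises an error
# on "[\w-/]"; re.ASCII restricts \w to ASCII word characters.
VALID_PREFIX = re.compile(r'^([\w/-]+)[^\w/-]', re.ASCII)

def strip_from_first_invalid(url_path):
    """Return the redirect target, or None when every character is valid."""
    match = VALID_PREFIX.match(url_path)
    return "/" + match.group(1) if match else None

bad_path = "search/Linux/page/6/”http:/www.alexa.com/siteinfo/www.example.com“/page/900"
print(strip_from_first_invalid(bad_path))                # /search/Linux/page/6/
print(strip_from_first_invalid("search/Linux/page/6/"))  # None
```

Note that this variant is stricter: a dot or a space also counts as invalid, so it only suits sites whose URLs really are limited to that character set.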
Related
I want to change URL format from :
https://example.com/modules/news/article.php?storyid=224039
to :
https://example.com/news/224039
Can anyone help me write the correct .htaccess rules?
Thanks
Untested:
RewriteEngine On
RewriteBase /
RewriteRule ^news/([0-9]+) /modules/news/article.php?storyid=$1 [NC,L]
The NC flag means "no case": keep it if you want case-insensitive matching, otherwise remove it. The L flag means "last": no further rules are processed in the current pass over the rule set. This is a bit counterintuitive, because in .htaccess Apache runs the rules again from the top after a rewrite to check whether the new URL needs rewriting too, which is a gotcha for many people and a common source of infinite rewrite loops. You can probably omit the L flag here, but including it is more expressive.
RewriteEngine On can be omitted when the server configuration (httpd.conf) already enables the engine, but it is best practice to state it again rather than assume it is on. RewriteBase / can probably be omitted as well, depending on how you write your RewriteRule.
The RewriteRule has a regular expression on the left: the parentheses capture the match, the brackets define a character class, 0-9 lists the valid characters (\d would also work), and + means "match one or more times". The expression on the right is the replacement, and $1 inserts the first group captured by the parentheses on the left. The leading slash can probably be omitted.
Also note that because the right side contains a query string, any query string on the original request is discarded. To merge them, add the QSA (query string append) flag, which appends the original query string after your storyid parameter.
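Since the rule is untested, a small Python sketch may help to see what the pattern captures and what QSA changes. This only imitates the regex and the query string handling described above; the function rewrite_news and its qsa parameter are invented for illustration and are not part of Apache.

```python
import re

# re.IGNORECASE plays the role of the NC flag.
NEWS_RULE = re.compile(r'^news/([0-9]+)', re.IGNORECASE)

def rewrite_news(url_path, query="", qsa=False):
    """Return the rewritten target, or None when the rule does not match."""
    match = NEWS_RULE.match(url_path)
    if match is None:
        return None
    target = "/modules/news/article.php?storyid=" + match.group(1)
    # Without QSA the original query string is dropped; with it, the
    # original query string is appended after the new one.
    if qsa and query:
        target += "&" + query
    return target

print(rewrite_news("news/224039"))                      # /modules/news/article.php?storyid=224039
print(rewrite_news("news/224039", "page=2"))            # original query string discarded
print(rewrite_news("news/224039", "page=2", qsa=True))  # ...storyid=224039&page=2
```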
Currently I have my .htaccess configured so that anything I type after the domain is treated as a GET parameter:
RewriteRule ^(.*)$ /index.php?id=$1
so
www.example.com/test
will redirect to
www.example.com/index.php?id=test
Now what I would like is for the page to detect whether there is a # character after the first / and do something different
for example,
www.example.com/#test
goes to
www.example.com/index.php?abc=test
whilst still retaining the first rule, can this be done?
And as a bonus, if you know how to use the # symbol instead of the #, please do let me know, I tried putting NE flag in my rule but I had no luck.
I have a page that is normally like this:
http://www.url.com/folder/content.php?name=this-is-the-page-title&item_id=129
As you can see, the title of the page is included in the URL, separated by dashes.
So, I'd like to convert this to the following with mod_rewrite:
http://www.url.com/this-is-the-page-title-129.html
For this, I use a mod_rewrite rule like:
RewriteRule ^([^-]*)-([^-]*)\.html$ /folder/content.php?name=$1&item_id=$2 [L]
Unfortunately, using that rule, I get a 404 error. I think the problem is because the title is separated by dashes (-) and the separator itself is a dash as well, so it likely can't tell the variables from each other or something like that.
When I change the rule from dash (-) to slash (/) like this it works fine:
RewriteRule ^([^-]*)/([^-]*)\.html$ /folder/content.php?name=$1&item_id=$2 [L]
But then the URL becomes:
http://www.url.com/this-is-the-page-title/129.html
...which I don't want as I'd have to rewrite the entire structure of the page.
Is there any way to get it working as
http://www.url.com/this-is-the-page-title-129.html
even with the page title being separated by dashes?
Thank you :)
The problem is, you specify ([^-]*) which means that the matching sub-pattern should not contain any dashes... but your page title might contain them.
So, instead, let the first part be anything but a slash, and require the second part to be digits:
RewriteRule ^([^/]*)-([0-9]+)\.html$ /folder/content.php?name=$1&item_id=$2 [L]
This way, everything before the last dash will go into the first sub-pattern, and digits after the last dash - into the second one.
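If you want to convince yourself of the greedy matching, here is a quick Python sketch comparing the two patterns on your example URL. Python's re behaves the same way as the rule for this simple case as far as I know; the helper name split_title_and_id is just for illustration.

```python
import re

ORIGINAL_RULE = re.compile(r'^([^-]*)-([^-]*)\.html$')  # pattern from the question
FIXED_RULE = re.compile(r'^([^/]*)-([0-9]+)\.html$')    # pattern from this answer

def split_title_and_id(rule, url_path):
    """Return (name, item_id) as captured by the rule, or None on no match."""
    match = rule.match(url_path)
    return match.groups() if match else None

path = "this-is-the-page-title-129.html"
print(split_title_and_id(ORIGINAL_RULE, path))  # None, hence the 404
print(split_title_and_id(FIXED_RULE, path))     # ('this-is-the-page-title', '129')
```

The first group is greedy, so it grabs as much as it can and only gives back enough for "-digits.html" to match at the end.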
I could not find exactly the same question on SO. I hope someone can help me out with this.
Say a user entered http://www.example.com/abc#!def, and what I want to do is remove all symbols in the %{REQUEST_URI} portion, then do a redirect to http://www.example.com/abcdef. The problem is that these symbols can occur anywhere in the string, e.g. #ab!cdef and abcdef#! should both redirect to abcdef.
If I'm correct, there is no string replace function for mod_rewrite, so this seems impossible to do, but am I correct?
You can capture specific parts of a URL with regular expressions in a RewriteCond or RewriteRule, but not remove arbitrary characters.
Furthermore, you will never see the hash character '#' and everything after it in a URL on the server. That part is the fragment: the client uses it to navigate to a specific part of the document and does not send it with the request.
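To illustrate the point about the fragment, here is a small Python sketch using the standard urllib.parse module to split a URL the way a client does before sending the request. The function name request_target is hypothetical; it only shows which part would reach the server.

```python
from urllib.parse import urlsplit

def request_target(url):
    """Return (what is sent to the server, what stays in the client)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment is kept by the client and never put on the request line.
    return target, parts.fragment

print(request_target("http://www.example.com/abc#!def"))  # ('/abc', '!def')
print(request_target("http://www.example.com/#test"))     # ('/', 'test')
```

So for /abc#!def the server only ever sees /abc, and no rewrite rule can react to the "#!def" part.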
Update using the next flag:
RewriteRule (.*)[^a-zA-Z](.*) $1$2 [N]
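Each time this rule matches, it removes one character that is not alphabetic (the ^ inside the brackets negates the character class). The N flag then restarts the rewriting with the new URL, so the rule repeats until only a-z and A-Z remain.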
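The looping behaviour of the N flag can be imitated in Python with a plain loop. This is only a sketch of the idea, not how Apache implements it: strip_non_alpha and the max_rounds guard are names I made up, and the guard simply stands in for whatever loop protection your server has.

```python
import re

NON_ALPHA_RULE = re.compile(r'(.*)[^a-zA-Z](.*)')

def strip_non_alpha(url_path, max_rounds=100):
    """Reapply the rule until it no longer matches, like the N flag does."""
    for _ in range(max_rounds):
        match = NON_ALPHA_RULE.match(url_path)
        if match is None:
            break
        # Equivalent of the "$1$2" substitution: drops one character per round.
        # The first group is greedy, so the last offending character goes first.
        url_path = match.group(1) + match.group(2)
    return url_path

print(strip_non_alpha("ab!cd%ef"))  # abcdef
print(strip_non_alpha("a-b_c1"))    # abc
```

Keep in mind that digits, hyphens and slashes are removed as well, since the class only allows letters.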
I'm into SEO and friendly URLs, and I'm trying to create a rule in my .htaccess file and I need help...
Basically, I have a list of alphabet letters. If the user selects one letter, the DB will show all the lyrics that start with that letter...
So if I click C, there will be a list of lyrics, and the first is 'Car and blues'
So, from this
http://www.website.com/lyrics.php?letter=C
I want to do this:
http://www.website.com/lyrics/C/
So far, this is what I have
RewriteRule ^lyrics/$ /lyrics.php?letter=$1 [L]
the rule should be smart enough to pick everything that comes after 'lyrics', in between the 2 slashes, and not what comes after...
Thanks
the rule should be smart enough to pick everything that comes after 'lyrics', in between the 2 slashes, and not what comes after...
Your rule as it stands is looking for exactly lyrics/ with no possibility of anything before or after it (as defined by the ^ and $).
Assuming you're using letters A-Z in only capitals, you can use this:
RewriteRule ^lyrics/([A-Z])/?$ /lyrics.php?letter=$1 [L]
This will look for a single capital letter after the lyrics/ and send that value to the rewrite URL and also match both cases of having a trailing / or not.
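A quick way to check which URLs the pattern accepts is to try it in Python. This is just a sketch that mirrors the regex; rewrite_lyrics is a made-up helper name.

```python
import re

LYRICS_RULE = re.compile(r'^lyrics/([A-Z])/?$')

def rewrite_lyrics(url_path):
    """Return the rewritten target, or None when the rule does not match."""
    match = LYRICS_RULE.match(url_path)
    return "/lyrics.php?letter=" + match.group(1) if match else None

print(rewrite_lyrics("lyrics/C/"))   # /lyrics.php?letter=C
print(rewrite_lyrics("lyrics/C"))    # /lyrics.php?letter=C
print(rewrite_lyrics("lyrics/c/"))   # None (lowercase is not in A-Z)
print(rewrite_lyrics("lyrics/CD/"))  # None (only a single letter is allowed)
```

If lowercase letters should work too, adding the NC flag to the rule (or a-z to the class) would be one way to go.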
the rule should be smart enough to pick everything that comes after 'lyrics', in between the 2 slashes, and not what comes after...
I'd suggest you look into using regular expressions to format your url. See this link