My site has links that contain commas, e.g. http://example.com/productName,product2123123.html
I submitted a sitemap with these links, and Google Webmaster Tools reports that the URLs are not found.
It looks like Google ignores everything after the comma and tries to index http://example.com/productName, which is a broken URL on my site and returns a 404.
Is this a bug in Google? Or must I change my site's routing, or change the comma to "%2C"? Could that remove my current pages from Google's index?
I'm not sure if this will solve it outright, but the following links may help you understand the problem better:
Using commas in URL's can break the URL sometimes?
Are you using commas in your URLs? Here’s what you need to know.
Google SEO News and Discussion Forum
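If you do try the %2C route, the sitemap entry would simply percent-encode the comma (a sketch using the example URL from the question; note that some servers treat an encoded comma and a literal comma as different URLs, so test one page before changing the whole sitemap):
<url>
  <loc>http://example.com/productName%2Cproduct2123123.html</loc>
</url>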
Our old website has a search URL structure like this:
example.com/Country/United States/Region/California/Area/Southern California/City/San Diego/Suburb/South Park/Type/House/Bedrooms/4/Bathrooms/3/
This is currently rewritten to point to the physical page:
/search/index.aspx
The parameters in the URL can be mixed up in different orders, and the URL can include one or more parameters.
We want to 301 redirect these old URLs to a new structure that is ordered in a logical way and more concise:
example.com/united-states/california/southern-california/san-diego/south-park/?type=house&bedrooms=4&bathrooms=3
example.com/united-states/california/?type=house&bedrooms=4&bathrooms=3
Is there a way with URL rewriting to interrogate the old URL, work out what parameters are existing and then write out the new URL structure?
Even if we can limit it to just the Country, Region, Area, City and Suburb, that may be good enough to at least return some results even if it's not perfect.
Also, spaces should be turned into hyphens and all text made lowercase.
I already have the RewriteRule to turn the new URL structure into a URL that points to a physical page. It's just transforming the old URL into the new URL that I need help with. I've googled endlessly and it's just beyond me!
Can anyone help? Thanks.
Since you already have the old search page with rewrite rules set up for it, and it is already capable of parsing all the parameters you need, the easiest and most appropriate solution I see here is to issue the redirect from that old search page's code. Put code there that composes the new URL from the parsed parameters and sends a 301 redirect; that should be a lot easier than trying to parse all these parameters in .htaccess and recombine them into the new format.
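A minimal sketch of that idea, shown here in PHP for brevity (your index.aspx would do the same in its own language; $params and the key names are hypothetical stand-ins for whatever your page already parses):
// Assume $params is the array your search page already extracts, e.g.
// ['Country' => 'United States', 'Region' => 'California', 'Type' => 'House', ...]
$pathKeys  = array('Country', 'Region', 'Area', 'City', 'Suburb');
$queryKeys = array('Type', 'Bedrooms', 'Bathrooms');

$segments = array();
foreach ($pathKeys as $key) {
    if (!empty($params[$key])) {
        // lowercase and turn spaces into hyphens, as required
        $segments[] = str_replace(' ', '-', strtolower($params[$key]));
    }
}

$query = array();
foreach ($queryKeys as $key) {
    if (!empty($params[$key])) {
        $query[strtolower($key)] = strtolower($params[$key]);
    }
}

$newUrl = '/' . implode('/', $segments) . '/';
if ($query) {
    $newUrl .= '?' . http_build_query($query);
}

// 301 so search engines transfer the old URLs to the new structure
header('HTTP/1.1 301 Moved Permanently');
header('Location: ' . $newUrl);
exit;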
I have the following kind of URL:
http://example.com/controller/method/VBGFrt543ERik4523/text1-text2
I want this to be shown in the browser as:
http://example.com/text1-text2
I searched a lot but couldn't find a specific solution for this requirement.
Can anyone help me out, please?
Use URL routes with a bit of regex. This will reroute any URL made of letters and numbers, a hyphen, then more letters and numbers to controller1/method/abc123/$1, where $1 is the full slug:
$route['([a-zA-Z0-9]+-[a-zA-Z0-9]+)'] = "controller1/method/abc123/$1";
(NB: you can only have one controller in your URL; it goes controller/method/variable1/variable2...)
You set routes in application/config/routes.php
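On the receiving end, the controller then gets the full slug as its last argument (a sketch; the class and method names are the placeholders from above):
// application/controllers/Controller1.php
class Controller1 extends CI_Controller {
    public function method($id, $slug) {
        // For /text1-text2 the route above yields $id = 'abc123', $slug = 'text1-text2'
    }
}
Note that a route only maps incoming short URLs to the controller; for the browser to actually display http://example.com/text1-text2, your pages must also output links in that short form.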
http://ellislab.com/codeigniter/user-guide/general/routing.html
Good luck!
Does anyone know if there is an .htaccess rule to redirect from something like this...
XXX/dodge/604-jeep-hard-start-cold
to...
XXX/dodge/jeep-hard-start-cold
Basically, I'm wondering if there is some sort of pattern that will remove the number from the URLs.
All the URLs have different numbers, so it can't just be a one-off redirect for 604.
It is a Joomla site, and the numbers refer to article IDs.
Thanks for any help!
Sounds like you need to configure Joomla to not add the article number to the article's URL. You can turn on Search Engine Friendly URLs in the Global Configuration.
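If the old numbered URLs are already indexed and you still want an .htaccess redirect as a fallback, a hedged sketch (it assumes the article number is always a run of digits followed by a hyphen at the start of the last segment):
RewriteEngine On
# 301 /section/123-some-slug to /section/some-slug
RewriteRule ^([^/]+)/[0-9]+-(.+)$ /$1/$2 [R=301,L]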
I'm new to Nutch and not really sure what is going on here. I run Nutch and it crawls my website, but it seems to ignore URLs that contain query strings. I've commented out the filters in crawl-urlfilter.txt so it looks like this now:
# skip urls with these characters
#-[]
#skip urls with slash delimited segment that repeats 3+ times
#-.*(/[^/]+)/[^/]+\1/[^/]+\1/
So I think I've effectively removed every filter, which should tell Nutch to accept all URLs it finds on my website.
Does anyone have any suggestions? Is this a bug in Nutch 1.2? Should I upgrade to 1.3, and would that fix the issue, or am I doing something wrong?
See my previous question here: Adding URL parameter to Nutch/Solr index and search results
The first 'Edit' there should answer your question.
# skip URLs containing certain characters as probable queries, etc.
-[?*!#=]
You have to comment that line out, or modify it like this:
# skip URLs containing certain characters as probable queries, etc.
-[*!#]
By default, crawlers skip links with query strings to avoid spam and crawler traps.
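Keep in mind that the filter file is evaluated top to bottom and the first matching pattern wins, so the file must still end with an accept rule. A sketch of the relevant part of crawl-urlfilter.txt after the change ('+.' accepts everything; in your file this may instead be a domain-specific '+^http://...' line):
# skip URLs containing certain characters as probable queries, etc.
# ('?' and '=' removed so query strings are crawled)
-[*!#]

# accept anything else
+.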
I have an htaccess rule that goes:
RewriteRule ^Commercial-Units/For-Sale/(([a-zA-Z]+)*/([0-9]+)*/([a-zA-Z]+)*/([0-9]+)*/([a-zA-Z]+)*/([0-9]+)*)*$ pages/index.php?f=quicksearch&cust_wants=1&want_type=2&at=$3&start=$5&limit=$7 [R=302,L]
This is specifically designed for when a page requires paging records.
I have been trying to find a solution everywhere on Google and Stack Overflow.
The problem is that every time someone clicks on, say, page 2, the address bar keeps appending my query strings like so:
http://mysite.com/Commerial-Units/For-Sale/page/2/at/10/limit/7/page/2/at/10/limit/7
Notice that the URL above contains multiple duplicated key-value combinations, and this goes on and on every time someone clicks on the next page...
Hope someone can point me to the right solution to this...
Thank you very much!
That's not a problem with your rewrite but with your site code: the links are appending /page/2/at/10/limit/7 to the current URL. You need to drop the previous params, e.g. by using something like ../../../../../../page/2/at/10/limit/7 (or, better, root-relative links).
And if that is for SEO, please use plain query parameters for pagination and only use SEO-friendly URLs for categories and items; there is no need to index every single pagination option, as that would just be duplicate content.
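For the first point, a sketch of building the pagination links root-relative so the segments never stack ($page, $at and $limit are hypothetical variables holding the current values):
// Always build the link from the site root instead of appending to the current URL
$next = sprintf('/Commercial-Units/For-Sale/page/%d/at/%d/limit/%d',
                $page + 1, $at, $limit);
echo '<a href="' . htmlspecialchars($next) . '">Next page</a>';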