Google crawler not seeing prerendered Angular generated page - .htaccess

I'm trying to get the pages generated by my Angular app crawled correctly by Google without using hashbang (#!) URLs. I generate pushState URLs with:
$locationProvider.html5Mode(true);
$locationProvider.hashPrefix('!');
added to app's config, and
<base href="/hockey-att/">
<meta name="fragment" content="!">
to the html header.
And I have this in the .htaccess:
RewriteEngine on
RewriteBase /1/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ #/$1 [L]
<IfModule mod_headers.c>
RequestHeader set X-Prerender-Token "prerendertoken"
</IfModule>
<IfModule mod_rewrite.c>
<IfModule mod_proxy_http.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} baiduspider|googlebot|googlebot-mobile|bingbot|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_=(\%2F|/)*(.*)
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://%{HTTP_HOST}/$2 [P,L]
</IfModule>
</IfModule>
I have the app in the subfolder 1, and the RequestHeader set X-Prerender-Token line in .htaccess is set to my prerender.io token.
The app works perfectly with the new URLs, and prerender.io renders the pages. The prerendered pages are accessible without problems via http://service.prerender.io/{{http://myurl.com/page}}
My issue is that the Google crawler is not getting the prerendered pages. I tried Search Console's Fetch as Google with and without ?_escaped_fragment_= appended to the fetched URL, but it still gets the raw page with Angular directives in its source.
I think I'm missing something in the .htaccess, but I don't know what.
I'm using ui-router for Angular states.

It looks like you are redirecting all requests to a # URL:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ #/$1 [L]
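Is that correct? If so, that code is preventing the Prerender config from being run on each request: the rule comes first, matches every request that is not a file or directory, and ends rule processing with [L].
If you are using HTML5 pushState, you can remove that section, since you don't need a redirect to a # URL.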
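As a rough sketch of what the .htaccess could look like after that change: the Prerender rules run first, and a plain pushState fallback replaces the # rewrite. The entry point index.html, the shortened user-agent and extension lists, and the use of %{REQUEST_URI} in the proxy target are assumptions for illustration, not taken from the question. %{REQUEST_URI} is used here because, in a subfolder .htaccess, the rule pattern only sees the path relative to that folder, so $2 would drop the /1/ prefix.

```apache
<IfModule mod_headers.c>
    # Placeholder token, as in the question
    RequestHeader set X-Prerender-Token "prerendertoken"
</IfModule>

<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /1/

    <IfModule mod_proxy_http.c>
        # Crawler requests are proxied to Prerender before any other rule runs
        # (user-agent list shortened for readability)
        RewriteCond %{HTTP_USER_AGENT} googlebot|bingbot|baiduspider|facebookexternalhit|twitterbot [NC,OR]
        RewriteCond %{QUERY_STRING} _escaped_fragment_
        # Only proxy requests that do not look like static assets
        RewriteRule ^(?!.*?\.(js|css|xml|png|jpg|jpeg|gif|ico|ttf|woff))(.*) http://service.prerender.io/http://%{HTTP_HOST}%{REQUEST_URI} [P,L]
    </IfModule>

    # pushState fallback: anything that is not a real file or directory
    # is served by the app's entry point (assumed to be index.html)
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^ index.html [L]
</IfModule>
```

Requesting a page with ?_escaped_fragment_= appended should then return the prerendered HTML rather than the raw template, which is a quick way to check the rule order before retrying Fetch as Google.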

Related

.htaccess issue for 301 redirect url

I want to use 301 redirects for my website URLs, but the URL redirects to the wrong path and the website gives me a 404 error when I use this redirect:
Redirect 301 /couponstore/evitamins-120 https://website.com/codes/evitamins-coupon-codes
it gives me a result like this
https://website.com/codes/evitamins-coupon-codes?lcp=couponstore/evitamins-120
The lcp=couponstore/evitamins-120 part is extra; it is injected into the URL by the .htaccess rewrite. When I remove the last rewrite rule, ^((.*?)(\-(\d+))?)([.]\w+)?$ index.php?lcp=$1&lcp_id=$4&ext=$5 [QSA,L], my website stops working and I get errors on all pages.
Below is the .htaccess code for my website:
<IfModule mod_rewrite.c>
## Turn on rewrite engine
RewriteEngine on
## Coupons CMS v7
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^((.*?)(\-(\d+))?)([.]\w+)?$ index.php?lcp=$1&lcp_id=$4&ext=$5 [QSA,L]
</IfModule>
What should I do to use redirects for my website URLs? I lost all the SEO for my website because of this one issue, which I am not able to resolve. I need suggestions or help to redirect the old URLs to the new ones.
Thank you.
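Based on your samples, please try the following. The redirect is handled by mod_rewrite and placed before the catch-all rule, so it fires before the request is rewritten to index.php. Make sure to clear your browser cache before testing your URLs.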
<IfModule mod_rewrite.c>
## Turn on rewrite engine
RewriteEngine ON
## Coupons CMS v7
RewriteRule ^(couponstore/evitamins-120)/?$ /codes/evitamins-coupon-codes [R=301,NC,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^((.*?)(\-(\d+))?)([.]\w+)?$ index.php?lcp=$1&lcp_id=$4&ext=$5 [QSA,L]
</IfModule>
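If the redirect should always land on the https host shown in the question, a variant with an absolute target is one option. This is only a sketch, assuming the .htaccess sits in the site root and that https://website.com is the canonical host:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Explicit redirects go first, with a full target URL so the
    # scheme and host never depend on how the request came in
    RewriteRule ^couponstore/evitamins-120/?$ https://website.com/codes/evitamins-coupon-codes [R=301,NC,L]

    # Catch-all front controller from the question, unchanged
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^((.*?)(\-(\d+))?)([.]\w+)?$ index.php?lcp=$1&lcp_id=$4&ext=$5 [QSA,L]
</IfModule>
```

Further old-to-new redirects would be added as extra RewriteRule lines above the catch-all, one per URL.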

timthumb.php and htaccess rules

My PHP site (not WordPress) uses timthumb.php, and I get image-related errors because it can't access /team/assets/images/upload-img.jpg?h=50&w=50, although it works without the query parameters:
/team/assets/images/upload-img.jpg.
Which rewrite rules should I add in order to support those query parameters for image resizing? At the moment I only have a default htaccess:
<IfModule mod_rewrite.c>
RewriteEngine On
#RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) index.php/$1
</IfModule>
Here is a full (bad request) url: https://example.com/team/includes/timthumb.php?src=https://example.com/team/assets/images/upload-img.jpg?h=50&w=50
It’s not a .htaccess problem. You’re using ? instead of & before the h parameter, which puts a second ? into the query string. The URL should be:
https://example.com/team/includes/timthumb.php?src=https://example.com/team/assets/images/upload-img.jpg&h=50&w=50

I'm using htaccess to redirect mobile users from the desktop to the mobile website, but I keep getting the $_GET variables in the URL

This question may look like it has been asked already. I have seen those questions and have tried a lot of the answers, including ones that weren't accepted. I have successfully made it so that a user who visits the desktop version is sent to the mobile site, even for pages such as:
www.domain.com/aboutus
it would take them to
m.domain.com/?page=aboutus
So here is where the problem lies: it works, but I've been trying to remove the $_GET variable, the "?page=" part, from the redirected URL.
My .htaccess looks something like this:
<IfModule mod_rewrite.c>
RewriteEngine on
# if it is not a file or folder, rewrite to index.php?page=<value>
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?page=$1 [L,QSA]
</IfModule>
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} "android|blackberry|googlebot-mobile|iemobile|ipad|iphone|ipod|opera mobile|palmos|webos" [NC]
RewriteRule ^index.php(.*)$ http://m.domain.com/ [QSA,L]
</IfModule>
I've tried adding the request filename to the mobile redirect, but to no avail. Some websites, such as 9gag, have achieved this. I test the redirection with Google Chrome's built-in inspect tool, which changes the user agent to that of the selected device (mobile phones). If I enter 9gag.com/hot, it takes me to m.9gag.com/hot, not m.9gag.com/?page=hot or anywhere else.
Thanks in advance; this has really been bothering me.
You need to check the mobile redirect first, before the request is rewritten to index.php?page=..., and you need to include the requested path in the redirect target.
<IfModule mod_rewrite.c>
RewriteEngine on
# Redirect mobile requests to the mobile site
# (but don't redirect when accessing certain directories)
RewriteCond %{REQUEST_URI} !^/images [NC]
RewriteCond %{HTTP_USER_AGENT} "android|blackberry|googlebot-mobile|iemobile|ipad|iphone|ipod|opera mobile|palmos|webos" [NC]
RewriteRule ^(.*)$ http://m.domain.com/$1 [R=302,L]
# If it is not a file or folder, rewrite to index.php?page=<value>
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?page=$1 [L,QSA]
</IfModule>
You can make the redirect permanent by changing 302 to 301.
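One case the rules above do not cover is m.domain.com being served from the same directory, and therefore the same .htaccess, as the desktop site. That is an assumption; the question does not say how the mobile host is set up. In that case the redirect would fire again on the mobile host and loop. A minimal sketch of a guard, with the user-agent list shortened for readability:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Skip the redirect when the request is already on the mobile host
    RewriteCond %{HTTP_HOST} !^m\. [NC]
    RewriteCond %{REQUEST_URI} !^/images [NC]
    RewriteCond %{HTTP_USER_AGENT} "android|blackberry|iphone|ipad|ipod" [NC]
    RewriteRule ^(.*)$ http://m.domain.com/$1 [R=302,L]

    # If it is not a file or folder, rewrite to index.php?page=<value>
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?page=$1 [L,QSA]
</IfModule>
```

If the mobile site has its own document root and its own .htaccess, the extra HTTP_HOST condition is unnecessary.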

Correctly redirect bot requests to static version of a website

I'm having problems getting my website to index correctly by Google.
My folder structure looks like this:
root
- cms
- www
example.com points to the root where a .htaccess routes all requests to /www:
RewriteEngine on
RewriteRule ^(.*)$ /www/$1 [L]
Front end
The Angular front end inside /www gets data from /cms via REST api. So far so good.
What I want to achieve is that bots don't crawl my ajaxified /www page but instead crawl /cms, where I print out static content corresponding to the URL structure in /www.
URL for static content:
/www/test1 -> Outputs nice content via REST
/cms/test1 -> Outputs text-only content for the crawler
Bot redirect
I'm redirecting the bots coming to example.com/www to /cms like this:
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteRule ^(.*)$ http://www.example.com/cms/$1 [R=301,L]
Site map
I also registered a sitemap with Google with the following contents:
http://www.example/test1
http://www.example/test2
and so on...
The problem
This all works fine, BUT Google is also crawling the static content inside /cms on its own, without being redirected there by me. I only want this static /cms area to be reached through the redirect, not when Google's bot goes looking for it itself. I want to kind of "disallow" the bot from crawling here, but on the other hand I NEED it to crawl it. A catch-22, in my opinion.
Edit: complete .htaccess file
RewriteEngine On
# Sitemap
RewriteRule ^sitemap(-+([a-zA-Z0-9_-]+))?\.xml(\.gz)?$ /cms/sitemap$1.xml$2 [L]
RewriteRule ^sitemap(-+([a-zA-Z0-9_-]+))?\.html(\.gz)?$ /cms/sitemap$1.xml$2 [L]
# Redirect bots to static pages
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteRule ^(.*)$ http://www.example.com/cms/$1 [R=301,L]
# Angular HTML5 mode: Don't rewrite files or directories
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !index
# Angular HTML5 mode: Rewrite everything else to index.html to allow html5 state links
RewriteRule (.*) /www/index.html [L]
Edit 2
I have added this tag to the www page
<meta name="fragment" content="!">
to let the crawler know that AJAX is being used on the page. I'm also using the rewrite suggested by @Croises, but only in reaction to Google's _escaped_fragment_ re-request. Let's wait a few days...
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteCond %{REQUEST_URI} !^/cms/
RewriteRule ^(.*)$ cms/$1 [L]
You can't redirect bots to a static page and expect them to index or reference the final page without crawling the "real" content.
You can rewrite the request internally instead:
# Rewrite bots to static pages
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteCond %{REQUEST_URI} !^/cms/
RewriteRule ^(.*)$ cms/$1 [L]
It is the same rule, just without R=301, so the static page is served without a redirect.
But beware of cloaking (Google and Cloaking).

Symfony htaccess hiding app.php

The following .htaccess allowed my Symfony project's domain to be served without app.php in the URL, as intended. The only issue is that it breaks all other (non-Symfony) URLs and causes an internal server error.
Right now, after deleting the .htaccess file from the server, all my domains work, but the Symfony project must be accessed with app.php in the URL.
Is it possible to modify the .htaccess below so that the Symfony URL does not require app.php, while all other URLs not tied to Symfony can still be accessed successfully?
I'm not sure this is needed, but assume the domain tied to Symfony is www.apples.com.
Thanks in advance!
<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine On
# Explicitly disable rewriting for front controllers
RewriteRule ^app.php - [L]
RewriteCond %{REQUEST_FILENAME} !-f
# Change below before deploying to production
RewriteRule ^(.*)$ /app.php [QSA,L]
</IfModule>
Edit
I also tried the following:
<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine On
# Explicitly disable rewriting for front controllers
RewriteRule ^app.php - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Change below before deploying to production
RewriteRule ^(.*)$ app.php [QSA,L]
DirectoryIndex index.php
</IfModule>
Right after
RewriteCond %{REQUEST_FILENAME} !-f
add
RewriteCond %{REQUEST_FILENAME} !-d
and set DirectoryIndex to index.php
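If the other domains are served through the same .htaccess (an assumption; the question does not describe the hosting layout), another option is to restrict the front-controller rule to the Symfony domain. A minimal sketch using the www.apples.com name from the question:

```apache
DirectoryIndex index.php

<IfModule mod_rewrite.c>
    RewriteEngine On

    # Leave direct requests for the front controller alone
    RewriteRule ^app\.php - [L]

    # Route to Symfony only on its own domain, and only for
    # paths that are not existing files or directories
    RewriteCond %{HTTP_HOST} ^(www\.)?apples\.com$ [NC]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^ app.php [QSA,L]
</IfModule>
```

Requests for any other host skip the last rule and are handled as if no rewrite were configured.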
A bit late to this game, but I just found an Apache directive that saves a lot of time and makes your .htaccess cleaner. The only caveat is that you must have Apache 2.2.16+. To have all URLs except valid files (images, etc.) use app.php as a front controller, use the FallbackResource directive.
<Directory "/web/symfony-project">
FallbackResource /app.php
</Directory>
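Note that a <Directory> section belongs in the server or virtual-host configuration and is not allowed inside .htaccess. Where only .htaccess is available, the directive can be placed there on its own. A minimal sketch, assuming the file sits next to app.php in the site's web root and that the server's AllowOverride setting permits the directive:

```apache
# .htaccess in the Symfony web root (same directory as app.php)
# Any request that does not map to an existing file is handed to app.php
FallbackResource /app.php
```

With this in place, the mod_rewrite block is not needed for the front-controller routing.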

Resources