How do I translate a .htaccess file to Firebase hosting config? - .htaccess

I've built an SPA in Angular 2, and I'm hosting it on Firebase Hosting.
I have built some extra static HTML pages specifically for crawl bots (since they do not read updated dynamic HTML, only the initial index.html), and now I need to rewrite the URL for HTTP requests from bots to these static pages.
I know how to do this in a .htaccess file, but I can't figure out how to translate the rewrite conditions in my firebase.json file.
This is my .htaccess:
RewriteEngine On
# Remove trailing /
RewriteRule ^(.*)/$ /$1 [L,R=301]
# Rewrite spiders to static html
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|msnbot|yahoo|Baidu|aolbuild|facebookexternalhit|iaskspider|DuckDuckBot|Applebot|Almaden|iarchive|archive.org_bot) [NC]
RewriteCond %{DOCUMENT_ROOT}/static%{REQUEST_URI}.html -f
RewriteRule ^(.*)$ /static/$1.html [L]
# Rewrite spiders to static index.html
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|msnbot|yahoo|Baidu|aolbuild|facebookexternalhit|iaskspider|DuckDuckBot|Applebot|Almaden|iarchive|archive.org_bot) [NC]
RewriteCond %{REQUEST_URI} "^/$"
RewriteRule ^ /static/index.html [L]
# If an existing asset or directory is requested, serve it
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]
# If the requested resource doesn't exist, use the Angular app entry point
RewriteRule ^ /index.html
I've read Firebase's docs on Configuring Rewrites, but can't figure out how to target a specific User Agent.
Any ideas?

Firebase Hosting doesn't support configuring rewrites based on the user-agent header. It can support rewrites based on the path, and rewrites based on the language of the user/browser.
The only option I know of for rewriting based on other headers is to connect Firebase Hosting to Cloud Functions or Cloud Run and do the rewrite in code. That is a significantly different exercise from configuring rewrites in the firebase.json file, so I recommend reading up on it before choosing this path.
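To give an idea of what "doing the rewrite in code" might involve, here is a minimal sketch of the decision logic from the .htaccess in the question, written as a plain Python function. It is not Firebase API code: `resolve` and `is_file` are hypothetical names, and wiring the function into a Cloud Functions or Cloud Run request handler is left out. It also simplifies the trailing-slash rule by normalizing the path instead of issuing a 301.

```python
import re
from typing import Callable

# Same crawler list as the .htaccess in the question; IGNORECASE mirrors [NC].
BOT_PATTERN = re.compile(
    r"googlebot|bingbot|msnbot|yahoo|Baidu|aolbuild|facebookexternalhit|"
    r"iaskspider|DuckDuckBot|Applebot|Almaden|iarchive|archive\.org_bot",
    re.IGNORECASE,
)


def resolve(path: str, user_agent: str, is_file: Callable[[str], bool]) -> str:
    """Return the file (relative to the public directory) to serve.

    is_file is a caller-supplied check (hypothetical) that reports whether
    a relative path exists as a file in the deployed site.
    """
    clean = path.strip("/")
    # Refuse path traversal; fall back to the app entry point.
    if ".." in clean.split("/"):
        return "index.html"
    if BOT_PATTERN.search(user_agent or ""):
        # Bots get /static/<path>.html, or /static/index.html for the root.
        candidate = f"static/{clean}.html" if clean else "static/index.html"
        if is_file(candidate):
            return candidate
    # An existing asset is served as-is.
    if clean and is_file(clean):
        return clean
    # Everything else goes to the Angular entry point.
    return "index.html"
```

A request handler would call `resolve(request_path, user_agent_header, is_file)` and send back the contents of the returned file.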

Related

.htaccess rewriterule does not redirect

My website is located in a subfolder (gng2) of my public_html folder. I used the second part (##2) of the below .htaccess file to rewrite the url to add the gng2 subfolder to the URI. It works fine, the app loads when I enter my url to the browser.
Now I added the first part (##1) to redirect any requests where the URL contains the subfolder as well. I want www.staging.gonativeguide.com/gng2/en to be redirected to www.staging.gonativeguide.com/en. However, this does not work: the first URL does not get redirected to the second. When I check the code below on htaccessTester, it says it is correct and should redirect.
My website is hosted by a shared hosting service, and I think the web server is nginx. Any idea why the redirect does not work?
RewriteEngine On
## 1. redirect request when it contains the gng2 subfolder.
RewriteCond %{HTTP_HOST} ^staging.gonativeguide.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^www.staging.gonativeguide.com$
RewriteCond %{REQUEST_URI} gng2/
RewriteRule gng2/(.*) www.staging.gonativeguide.com/$1 [R=301,L]
## 2. rewriting url to add the gng2 subfolder containing the app.
RewriteCond %{HTTP_HOST} ^staging.gonativeguide.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^www.staging.gonativeguide.com$
RewriteCond %{REQUEST_URI} !gng2/
RewriteRule (.*) /gng2/$1 [L]
There are a number of issues with your current attempt, so I sketched an alternative version that simplifies the rules and makes them more robust:
RewriteEngine On
## 1. redirect request when it contains the gng2 subfolder.
RewriteCond %{HTTP_HOST} ^(www\.)?staging\.gonativeguide\.com$
RewriteRule ^/?gng2/(.*)$ https://www.staging.gonativeguide.com/$1 [R=301,L]
## 2. redirect requests that do not contain the "www" prefix in the host name.
RewriteCond %{HTTP_HOST} ^staging\.gonativeguide\.com$
RewriteRule ^ https://www.staging.gonativeguide.com%{REQUEST_URI} [R=301,L]
## 3. rewriting url to add the gng2 subfolder containing the app.
RewriteCond %{HTTP_HOST} ^(www\.)?staging\.gonativeguide\.com$
RewriteCond %{REQUEST_URI} !^/gng2/
RewriteRule ^ /gng2%{REQUEST_URI} [END]
You need to make absolutely sure that you are not looking at cached results when testing. So always test using a fresh anonymous browser window and use "deep reloads" (CTRL-F5 typically) instead of just reloading.
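The three rules above can be read as a small decision procedure. The following Python sketch is only a simplified model of it, assuming the first matching rule decides the outcome; `apply_rules` is a hypothetical helper for reasoning about the rules, not something Apache runs.

```python
import re

CANONICAL = "https://www.staging.gonativeguide.com"
HOST_RE = re.compile(r"^(www\.)?staging\.gonativeguide\.com$")


def apply_rules(host: str, uri: str):
    """Model the outcome for one request.

    Returns ("redirect", url), ("rewrite", internal_path) or ("pass", uri).
    """
    if not HOST_RE.match(host):
        return ("pass", uri)
    # Rule 1: external redirect that strips the gng2 prefix.
    match = re.match(r"^/gng2/(.*)$", uri)
    if match:
        return ("redirect", f"{CANONICAL}/{match.group(1)}")
    # Rule 2: external redirect to the "www" host name.
    if host == "staging.gonativeguide.com":
        return ("redirect", CANONICAL + uri)
    # Rule 3: internal rewrite into the gng2 subfolder (no visible URL change).
    return ("rewrite", "/gng2" + uri)
```

For example, `apply_rules("www.staging.gonativeguide.com", "/gng2/en")` models the redirect to `https://www.staging.gonativeguide.com/en`, while the follow-up request for `/en` is rewritten internally to `/gng2/en`.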

How to rewrite url in htaccess to hide subfolder?

My website is located in a subfolder (gng2) of my public_html folder. The below code from the public_html/.htaccess file works fine in the sense that typing my domain loads the website properly from the subdirectory:
## redirect to subfolder containing the app.
RewriteCond %{HTTP_HOST} ^example.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^www.example.com$
RewriteCond %{REQUEST_URI} !gng2/
RewriteRule (.*) /gng2/$1 [L]
My problem is that www.example.com/gng2/whateverpage also works. I want to rewrite/reroute these urls to www.example.com/whateverpage so that the "gng2" subfolder does not show up for example in Google Analytics results.
How should I modify the above code to achieve this?
Thanks,
W.
You can use this redirect rule in gng2/.htaccess to remove /gng2/ from all URLs:
RewriteEngine On
RewriteCond %{THE_REQUEST} \s/+gng2/(\S*) [NC]
RewriteRule ^ /%1 [L,R=301,NE]
# other rules appear below this line
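The condition tests THE_REQUEST, which holds the raw request line as the browser sent it (for example "GET /gng2/page HTTP/1.1") and is not changed by internal rewrites. That is why the rule fires only when the visitor actually requested a /gng2/ URL, and not after the parent .htaccess has silently added the prefix. A small Python sketch of the same regular expression shows what gets captured; `redirect_target` is a hypothetical helper used only for illustration.

```python
import re

# Same pattern as the RewriteCond above; IGNORECASE mirrors the [NC] flag.
PATTERN = re.compile(r"\s/+gng2/(\S*)", re.IGNORECASE)


def redirect_target(request_line: str):
    """Return the redirect path for a raw request line, or None if the rule does not apply."""
    match = PATTERN.search(request_line)
    return "/" + match.group(1) if match else None


print(redirect_target("GET /gng2/en/about HTTP/1.1"))  # /en/about
print(redirect_target("GET /en/about HTTP/1.1"))       # None
```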
Keep your .htaccess file inside the gng2 folder and try the following rules in it. Make sure these rules are at the top of the file (for applying http/HTTPS to URLs, in case you have more rules), and clear your browser cache before testing your URLs.
RewriteEngine ON
## redirect to subfolder containing the app.
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
##Place your RewriteRule here.....

Is there a way to properly redirect or rewrite visitors in my Drupal 8 website to a subfolder?

I have the following code in my .htaccess file:
RewriteEngine on
RewriteCond %{HTTP_HOST} mysitedomain\.com [NC]
RewriteCond %{REQUEST_URI} ^/$
RewriteRule ^(.*)$ /web/$1 [L]
I'm currently using cPanel, and cPanel does not allow you to set a new document root. So I have my Drupal files in my cPanel "public_html" directory, with the composer.json files etc. and a web directory that has all the Drupal-related files. I am trying to rewrite or redirect site visitors to www.mysitedomain.com/web for subsequent pages. I tried the code above, but it does not seem to work. Am I missing something?
To be specific, I need the site to: 1. load www.mysitedomain.com/web when www.mysitedomain.com is requested; 2. ensure /web is in front of every subsequent page request within the site (i.e. www.mysitedomain.com/web/products should load instead of www.mysitedomain.com/products).
I am not sure what your question is, but I think you need something like this:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L]
What it does: If the URL does not resolve to a file or directory in the present directory, put index.php in front of it.
I'd assume your index.php would then include the Drupal framework.

Correctly redirect bot requests to static version of a website

I'm having problems getting my website indexed correctly by Google.
My folder structure looks like this:
root
- cms
- www
example.com points to the root where a .htaccess routes all requests to /www:
RewriteEngine on
RewriteRule ^(.*)$ /www/$1 [L]
Front end
The Angular front end inside /www gets data from /cms via REST api. So far so good.
What I want to achieve is that bots don't crawl inside my ajaxified /www page but instead inside /cms where I print out static contents corresponding to the URL structure in /www.
URL for static content:
/www/test1 -> Outputs nice content via REST
/cms/test1 -> Outputs text-only content for the crawler
Bot redirect
I'm redirecting the bots coming to example.com/www to /cms like this:
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteRule ^(.*)$ http://www.example.com/cms/$1 [R=301,L]
Site map
I also registered a sitemap with Google with the following contents:
http://www.example/test1
http://www.example/test2
and so on...
The problem
This all works fine, BUT: Google is also crawling the static contents inside /cms without being redirected there by me. I only want this static content to be served through the redirect, not found when Google's bot goes looking for it by itself. It's like "disallowing" the bot from crawling there, but on the other hand I NEED it to crawl it. A catch-22, in my opinion.
Edit: complete .htaccess file
RewriteEngine On
# Sitemap
RewriteRule ^sitemap(-+([a-zA-Z0-9_-]+))?\.xml(\.gz)?$ /cms/sitemap$1.xml$2 [L]
RewriteRule ^sitemap(-+([a-zA-Z0-9_-]+))?\.html(\.gz)?$ /cms/sitemap$1.xml$2 [L]
# Redirect bots to static pages
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteRule ^(.*)$ http://www.example.com/cms/$1 [R=301,L]
# Angular HTML5 mode: Don't rewrite files or directories
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !index
# Angular HTML5 mode: Rewrite everything else to index.html to allow html5 state links
RewriteRule (.*) /www/index.html [L]
Edit 2
I have added this tag to the www page
<meta name="fragment" content="!">
to let the crawler know there's AJAX being used on the page. And I'm using the rewrite suggested by @Croises, but in reaction to Google's _escaped_fragment_ re-request. Let's wait a few days...
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteCond %{REQUEST_URI} !^/cms/
RewriteRule ^(.*)$ cms/$1 [L]
You can't redirect bots to a static page and then ask them to index or reference the final page without crawling the "real" content.
Instead, you can rewrite the request internally:
# Rewrite bots to static pages
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteCond %{REQUEST_URI} !^/cms/
RewriteRule ^(.*)$ cms/$1 [L]
It is the same rule, just without R=301. That way the static page is served without a redirect.
But beware of cloaking (Google and Cloaking).

How to write a htaccess rule specific for a given subdomain? - Avoiding indexing some files

I have the following on my .htaccess file:
Options +FollowSymlinks
#+FollowSymLinks must be enabled for any rules to work, this is a security
#requirement of the rewrite engine. Normally it's enabled in the root and we
#shouldn't have to add it, but it doesn't hurt to do so.
RewriteEngine on
#Apache scans all incoming URL requests, checks for matches in our #.htaccess file
#and rewrites those matching URLs to whatever we specify.
#allow blank referrers.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?site.com [NC]
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?site.dev [NC]
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?dev.site.com [NC]
RewriteRule \.(jpg|jpeg|png|gif)$ - [NC,F,L]
# if a directory or a file exists, use it directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# otherwise forward it to index.php
RewriteRule . index.php
site.com is the production site.
site.dev is a localhost dev environment.
dev.site.com is a subdomain where we test live.
I'm aware that this will prevent the site from being indexed:
Header set X-Robots-Tag "noindex, nofollow"
cf. http://yoast.com/prevent-site-being-indexed/
My question is however, fairly simple perhaps:
Is there a way to apply this line ONLY on dev.site.com, so that it doesn't get indexed ?
Yes, the most direct way is to put the Header line in the vhost config for dev.site.com. A plain Header set directive in an .htaccess file cannot check the host by itself.
Alternatively, if you want to block bots by user agent, you can remove the Header set line and add some rules:
# request is for http://dev.site.com
RewriteCond %{HTTP_HOST} ^dev\.site\.com$ [NC]
# user-agent is a search engine bot
RewriteCond %{HTTP_USER_AGENT} (Googlebot|yahoo|msnbot) [NC]
# return forbidden
RewriteRule ^ - [L,F]
Note that the list of user agents isn't complete. You can try to go through the massive list of User-Agents and look for all of the index robots, or at least the more popular ones.
