Hiding some GET parameters from URL - .htaccess

I am redirecting page using PHP header location:
Current URL in browser
https://mywebsite/open/firstpage/php/start.php?&cnt=us&language=en&url=http://secureURL.com
but want to show
https://mywebsite/open/firstpage/php/start.php?&cnt=us&language=en
I am using the GET method on the other side to collect all the variables. I have to hide &url in the query string, but I still want to receive it on the other side as $_GET['url'].
How can I pass my &url value without showing it in the URL's query string? How should I write the .htaccess?

Redirect for all URLs
RewriteEngine on
# skip once url= is already present, so the internal rewrite doesn't loop
RewriteCond %{QUERY_STRING} !(^|&)url=
RewriteRule ^(.*) $1?%{QUERY_STRING}&url=http://secureURL.com [L]
Redirect for only /open/firstpage/php/start.php
RewriteEngine on
RewriteCond %{REQUEST_URI} ^/open/firstpage/php/start\.php
# skip once url= is already present, so the internal rewrite doesn't loop
RewriteCond %{QUERY_STRING} !(^|&)url=
RewriteRule ^(.*) $1?%{QUERY_STRING}&url=http://secureURL.com [L]
I think this is what you want.

You can't do that. If a parameter is not present in the query string, it won't be available anywhere, it's just not there. There's no such thing as "hiding" the query string.
You could, however, use some form of session mechanism to pass a piece of data from one page to another. You could put it in the $_SESSION, or use cookies. There may also be a way to achieve this through really arcane mod_rewrite magic, but you shouldn't go down that route. Really.
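For example, a minimal sketch of the session approach, using the paths and values from the question (the split into a "sending" page and a "receiving" page is just for illustration):

<?php
// Sending side: store the sensitive value in the session instead of the query string.
session_start();
$_SESSION['url'] = 'http://secureURL.com';
header('Location: https://mywebsite/open/firstpage/php/start.php?cnt=us&language=en');
exit;

and on the receiving side:

<?php
// start.php: read the value back from the session instead of $_GET['url'].
session_start();
$url = isset($_SESSION['url']) ? $_SESSION['url'] : null;

The value never appears in the browser's address bar, but it is only available within the same PHP session, so both pages must run on the same site.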
More importantly: what are you trying to achieve? Why are you trying to do this?
Aesthetic reasons? Then be aware that modern browsers tend to hide the query string part of the URI from the user.
Security reasons? Then you're doing it horribly wrong, you shouldn't use something so easily manipulated by the client.
User tracking? There are established solutions out there for that (say, Google Analytics).

Related

I need help setting up .htaccess

I need to set up .htaccess. If the user clicks the "site.com/profile" link, I need to check if there is a "token" field in the user's cookies. If this field is not empty, I must let the user through, otherwise I redirect them to the "site.com/login" link.
You are probably looking for something like this:
RewriteEngine on
RewriteCond %{HTTP_COOKIE} !(^|;)\s*token=[^;]+(;|$)
RewriteRule ^ /login [R=302,L]
Obviously the rewriting module needs to be loaded into the http server. It generally is a good idea to implement such a rule in the http server's host configuration. If you do not have access to that, you can also use a distributed configuration file (often called ".htaccess"), but the interpretation of such files needs to be enabled first and their usage comes with a number of disadvantages.

How to redirect a URL in .htaccess with query string to the same URL but different query string

Because of removing several languages from a multi-language website, I need to 301 redirect pages that end with ?lang=da, ?lang=de, and ?lang=nl to the same URLs but ending in ?lang=en. It sounds like a common scenario, but I haven't found the right code yet: the examples I tried either redirect to one new URL, or replace the URL but not the query string, whereas I need to replace the query string and keep the URL.
This probably is what you are looking for:
RewriteEngine on
RewriteCond %{QUERY_STRING} ^lang=(da|de|nl)$
RewriteRule ^ %{REQUEST_URI}?lang=en [QSD,R=301,END]
For this to work the rewriting module obviously needs to be loaded into your http server and it has to be activated for your http host too.
It is a good idea to start out with a 302 redirection and to only change that into a 301 once you are convinced everything is set up as required. That way you prevent ugly caching effects...
You can implement such rules in the http server's host configuration. Or, if you do not have access to that, you can use a distributed configuration file (".htaccess"), but that has performance disadvantages and you also need to enable the interpretation of such files first. Please see the documentation of the tool you are using to learn how to do that.

Too many Rewrite Rules in .htaccess

I had to redesign a site last week. The problem is that the old URLs weren't SEO friendly, so, in order to avoid Google penalizing my site because of too many 404 errors, I have to create a lot of RewriteRules, because all the content had awful URLs (and that content ranked well on SERPs).
For example:
RewriteRule ^documents/documents_for_subject/22-ecuaciones-exponenciales-y-logaritmicas http://%{HTTP_HOST}/1o-bachillerato/matematicas-cc.ss/aritmetica-y-algebra/ecuaciones-exponenciales-y-logaritmicas [R=301,L]
Is this a problem for my site's performance? Is there another solution to my situation?
Thanks
They are in the same domain.
Then an internal redirect is much better. A header redirect sends the new URL to the browser and causes it to make a new request; an internal one is handled, as the name says, internally.
This should work:
RewriteRule ^documents/documents_for_subject/22-ecuaciones-exponenciales-y-logaritmicas /1o-bachillerato/matematicas-cc.ss/aritmetica-y-algebra/ecuaciones-exponenciales-y-logaritmicas [L]
Any performance issues are going to be negligible with this - except maybe if you have many thousands or tens of thousands of individual rules, those may slow down Apache. In that case, if you have access to the central server configuration, put the rules there instead of a .htaccess file, because instructions in the server config get stored in memory and are faster.
A. Yes, using 301 is the right way to notify search bots about changed URLs, and eventually your old URLs will be removed from search results.
B. You don't need to use %{HTTP_HOST} in your rewrite rule; just use it like this:
RewriteRule ^documents/documents_for_subject/22-ecuaciones-exponenciales-y-logaritmicas /1o-bachillerato/matematicas-cc.ss/aritmetica-y-algebra/ecuaciones-exponenciales-y-logaritmicas [R=301,L]
C. If you have lots of RewriteRules like the one above, I recommend using RewriteMap, or else use some scripting support (like PHP) to redirect from the old to the new URLs with a 301.
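As a rough sketch of the PHP variant (the file name, the idea of funnelling all legacy URLs through one script, and the array contents are assumptions for illustration): keep the old-to-new mapping in a single array and issue the 301 from there, instead of maintaining hundreds of individual rules.

<?php
// redirect.php: hypothetical catch-all handler for the legacy URLs.
// One array entry per old URL; anything unmatched falls through to a 404.
$map = [
    '/documents/documents_for_subject/22-ecuaciones-exponenciales-y-logaritmicas'
        => '/1o-bachillerato/matematicas-cc.ss/aritmetica-y-algebra/ecuaciones-exponenciales-y-logaritmicas',
    // ... add the remaining legacy URLs here
];

$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

if (isset($map[$path])) {
    header('Location: ' . $map[$path], true, 301);  // permanent redirect
    exit;
}

http_response_code(404);

A single rewrite rule (or an ErrorDocument handler) can then route every old /documents/... request to that script, so the rule count in .htaccess stays constant no matter how many URLs changed.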

Does mod_rewrite only translate external requests to internal files and not vice versa?

I think this is a very stupid question, so I apologise; I think I may completely misunderstand mod_rewrite.
Say you have a URL
www.domain.com/products/item.php?id=1234
mod_rewrite can rewrite that to a friendly URL
www.domain.com/products/item/1234
(for example)
So, if I type in www.domain.com/products/item/1234, this will be rewritten to www.domain.com/products/item.php?id=1234 and that page is served. Fine.
But what if you type in www.domain.com/products/item.php?id=1234 - that page will be served but not rewritten to the friendly URL.
So my question is: can you rewrite internal file names automatically? For example, all URLs on my site are currently in the www.domain.com/products/item.php?id=1234 format. When a user clicks such a link, can it be rewritten to the friendly URL? Or should you always hard-code the friendly URL?
I'm sorry if that made little sense! I'm getting confused because I want to rewrite the non-friendly URL to the friendly one, but then serve the non-friendly URL - so won't that cause an infinite redirect loop?
Mod_rewrite can't really internally rewrite URLs across domains, though it could proxy them (using P option in RewriteRule). Assuming that the domain is the same, you could do something to redirect the client's browser to a friendly URL if the old one is used while internally rewriting the friendly URL back to the old one, but they have to be both the same domain. You do this by looking at the actual request (%{THE_REQUEST}) variable instead of looking at the URI, which changes as they get rewritten internally.
This redirects the browser to the friendly URLs when the old URLs are used:
RewriteCond %{THE_REQUEST} ^([A-Z]{3,9})\ /products/item\.php
RewriteCond %{QUERY_STRING} id=([0-9]+)
RewriteRule ^products/item\.php$ /products/item/%1? [R=301,L]
This rewrites internally when a friendly URL is used:
RewriteCond %{THE_REQUEST} ^([A-Z]{3,9})\ /products/item/[0-9]+
RewriteRule ^products/item/([0-9]+) /products/item.php?id=$1 [QSA,L]
Mod_rewrite does not automatically rewrite the "non-friendly" URLs to the friendly URLs. You have to add some rules yourself to do this.
Also Mod_rewrite does not modify the links inside your html, css, or whatever you use. You need to change those yourself.
If a user uses the friendly URL, they will never know that it is rewritten. Mod_rewrite is transparent from the user's point of view. You can add an [R] flag to your rules, which makes Apache send a redirect to the client. This way the client does see the rewritten URL.
Redirecting the unfriendly to the friendly URL should only be done to help search engines (and to prevent link rot, but that's rarer). This can be done without a redirect loop, contrary to what Sergey says.
Try looking around here on SO to find a script that does the redirect from the unfriendly to the friendly URL. Let me know if you can't find it, and I'll help.

Block user access to internals of a site using HTTP_REFERER

I have control over the HttpServer but not over the ApplicationServer or the Java Applications sitting there but I need to block direct access to certain pages on those applications. Precisely, I don't want users automating access to forms issuing direct GET/POST HTTP requests to the appropriate servlet.
So, I decided to block users based on the value of HTTP_REFERER. After all, if the user is navigating inside the site, they will have an appropriate HTTP_REFERER. Well, that was what I thought.
I implemented a rewrite rule in the .htaccess file that says:
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} !^http://mywebaddress(.cl)?/.* [NC]
RewriteRule (servlet1|servlet2)/.+\?.+ - [F]
I expected to forbid access to users that didn't navigate the site but issued direct GET requests to the "servlet1" or "servlet2" servlets using query strings. But my expectations ended abruptly because the regular expression (servlet1|servlet2)/.+\?.+ didn't work at all.
I was really disappointed when I changed that expression to (servlet1|servlet2)/.+ and it worked so well that my users were blocked whether they navigated the site or not.
So, my question is: how can I accomplish this thing of not allowing "robots" direct access to certain pages if I have no access/privileges/time to modify the application?
I'm not sure if I can solve this in one go, but we can go back and forth as necessary.
First, I want to repeat what I think you are saying and make sure I'm clear. You want to disallow requests to servlet1 and servlet2 if the request doesn't have the proper referer and it does have a query string? I'm not sure I understand (servlet1|servlet2)/.+\?.+ because it looks like you are requiring a file under servlet1 and 2. I think maybe you are combining PATH_INFO (before the "?") with a GET query string (after the "?"). It appears that the PATH_INFO part will work but the GET query test will not. I made a quick test on my server using script1.cgi and script2.cgi, and the following rules worked to accomplish what you are asking for. They are obviously edited a little to match my environment:
RewriteCond %{HTTP_REFERER} !^http://(www.)?example.(com|org) [NC]
RewriteCond %{QUERY_STRING} ^.+$
RewriteRule ^(script1|script2)\.cgi - [F]
The above caught all wrong-referer requests to script1.cgi and script2.cgi that tried to submit data using a query string. However, you can also submit data using a path_info and by posting data. I used this form to protect against any of the three methods being used with incorrect referer:
RewriteCond %{HTTP_REFERER} !^http://(www.)?example.(com|org) [NC]
RewriteCond %{QUERY_STRING} ^.+$ [OR]
RewriteCond %{REQUEST_METHOD} ^POST$ [OR]
RewriteCond %{PATH_INFO} ^.+$
RewriteRule ^(script1|script2)\.cgi - [F]
Based on the example you were trying to get working, I think this is what you want:
RewriteCond %{HTTP_REFERER} !^http://mywebaddress(.cl)?/.* [NC]
RewriteCond %{QUERY_STRING} ^.+$ [OR]
RewriteCond %{REQUEST_METHOD} ^POST$ [OR]
RewriteCond %{PATH_INFO} ^.+$
RewriteRule (servlet1|servlet2)\b - [F]
Hopefully this at least gets you closer to your goal. Please let us know how it works, I'm interested in your problem.
(BTW, I agree that referer blocking is poor security, but I also understand that reality forces imperfect and partial solutions sometimes, which you seem to already acknowledge.)
I don't have a solution, but I'm betting that relying on the referrer will never work because user-agents are free to not send it at all or spoof it to something that will let them in.
You can't tell apart users and malicious scripts by their http request. But you can analyze which users are requesting too many pages in too short a time, and block their ip-addresses.
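A very rough sketch of that idea (the temp-file storage, the 60-second window and the 30-request threshold are made-up numbers, not from the question): count recent requests per IP and refuse service once a client gets too noisy.

<?php
// Hypothetical per-IP rate limiter: remember the timestamps of recent requests
// for each client and reject the request once the window is full.
$ip   = $_SERVER['REMOTE_ADDR'];
$file = sys_get_temp_dir() . '/rate_' . md5($ip);
$now  = time();
$hits = [];

if (is_readable($file)) {
    // keep only the hits from the last 60 seconds
    $hits = array_filter(
        explode("\n", (string) file_get_contents($file)),
        function ($t) use ($now) { return (int) $t > $now - 60; }
    );
}
$hits[] = $now;
file_put_contents($file, implode("\n", $hits), LOCK_EX);

if (count($hits) > 30) {          // more than 30 requests per minute
    http_response_code(429);
    exit('Too many requests');
}

Since the application here can't be modified, something in front of the servlets would have to do this counting; the sketch only shows the idea.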
Using a referrer is very unreliable as a method of verification. As other people have mentioned, it is easily spoofed. Your best solution is to modify the application (if you can).
You could use a CAPTCHA, or set some sort of cookie or session cookie that keeps track of what page the user last visited (a session would be harder to spoof), record the page-view history, and only allow users who have browsed the pages required to reach the page you want to block.
This obviously requires you to have access to the application in question; however, it is the most foolproof way (not completely, but "good enough" in my opinion).
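A minimal sketch of that cookie/session idea (the page names and the 10-minute window are invented for illustration): the page the user is supposed to visit first marks the session, and the protected page checks for that mark.

<?php
// On the page the user must browse first (e.g. the form page):
session_start();
$_SESSION['visited_form'] = time();

and in front of the page you want to protect:

<?php
session_start();
$cameFromForm = isset($_SESSION['visited_form'])
    && (time() - $_SESSION['visited_form']) < 600;   // saw the form within the last 10 minutes

if (!$cameFromForm) {
    header('Location: /form.php');   // hypothetical path: send them back through the normal flow
    exit;
}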
JavaScript is another helpful tool to prevent (or at least delay) screen scraping. Most automated scraping tools don't have a JavaScript interpreter, so you can do things like setting hidden fields, etc.
Edit: Something along the lines of this Phil Haack article.
I'm guessing you're trying to prevent screen scraping?
In my honest opinion it's a tough one to solve, and trying to fix it by checking the value of HTTP_REFERER is just a sticking plaster. Anyone going to the bother of automating submissions is going to be savvy enough to send the correct referer from their 'automaton'.
You could try rate limiting, but without actually modifying the app to force some kind of is-this-a-human validation (a CAPTCHA) at some point, you're going to find this hard to prevent.
If you're trying to prevent search engine bots from accessing certain pages, make sure you're using a properly formatted robots.txt file.
Using HTTP_REFERER is unreliable because it is easily faked.
Another option is to check the user agent string for known bots (this may require code modification).
To make things a little clearer:
Yes, I know that using HTTP_REFERER is completely unreliable and somewhat childish, but I'm pretty sure that the people who learned (from me, maybe?) to make automations with Excel VBA will not know how to subvert an HTTP_REFERER in the time it takes for the final solution to arrive.
I don't have access/privilege to modify the application code. Politics. Do you believe that? So, I must wait until the rights holder makes the changes I requested.
From previous experience, I know that the requested changes will take two months to get into production. No, tossing Agile methodology books at their heads didn't improve anything.
This is an intranet app, so I don't have a lot of youngsters trying to undermine my prestige. But I'm young enough to try to undermine the prestige of "a very fancy global consultancy services firm that comes from India" where, curiously, there is not a single Indian working.
So far, the best answer comes from "Michel de Mare": block users based on their IPs. Well, I did that yesterday. Today I wanted to make something more generic, because I have a lot of kangaroo users (jumping from one IP address to another) because they use VPN or DHCP.
You might be able to use an anti-CSRF token to achieve what you're after.
This article explains it in more detail: Cross-Site Request Forgeries
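A minimal sketch of the token mechanism (shown here in PHP for brevity; the field name and token size are arbitrary, and in this question's setup the check would have to live in whatever renders and receives the form):

<?php
// When rendering the form: create a per-session token and embed it as a hidden field.
session_start();
if (empty($_SESSION['form_token'])) {
    $_SESSION['form_token'] = bin2hex(random_bytes(32));
}
echo '<input type="hidden" name="form_token" value="'
   . htmlspecialchars($_SESSION['form_token']) . '">';

and when handling the submission:

<?php
// Reject requests whose token does not match; this also stops hand-crafted
// GET/POST requests that never loaded the form in the first place.
session_start();
$sent = isset($_POST['form_token']) ? $_POST['form_token'] : '';
if (!isset($_SESSION['form_token']) || !hash_equals($_SESSION['form_token'], $sent)) {
    http_response_code(403);
    exit('Forbidden');
}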
