How to batch test htaccess redirects

I have a large set of URLs (over 1000) which need to be tested against a complex .htaccess file of redirect rules.
I have used the online htaccess tester to test individual redirects and fine-tune my set of rules; however, it is too laborious to test each URL individually.
I also found a number of answers here, but none of them really address batch testing.

To solve the batch testing I have written a bash script that leverages the API exposed by the online htaccess tester. The script uses curl and jq to test a file of URLs and outputs a CSV-formatted line for each URL with the line number and rule that was matched in the .htaccess file, along with the redirected URL. This makes it easy to build a spreadsheet of URLs and their redirects and to pinpoint which ones are failing.
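A minimal sketch of such a script is shown below. It is not the exact script described above: the endpoint and the JSON field names (url, htaccess, output_url, lines, isMet) are assumptions based on the tester at htaccess.madewithlove.com and should be verified against the tester you actually use.
#!/usr/bin/env bash
# Usage: ./batch-test.sh urls.txt   (one URL per line; the file name is a placeholder)
# Endpoint and JSON schema are assumptions based on htaccess.madewithlove.com.
API="https://htaccess.madewithlove.com/api"
HTACCESS=$(cat .htaccess)
echo "url,matched_line,matched_rule,redirected_url"
while IFS= read -r url; do
  [ -z "$url" ] && continue
  jq -n --arg url "$url" --arg htaccess "$HTACCESS" \
      '{url: $url, htaccess: $htaccess}' |
    curl -s -X POST -H "Content-Type: application/json" -d @- "$API" |
    jq -r --arg url "$url" '
      # Last rule line that matched (assumed response schema: lines[].isMet).
      (.lines // [] | to_entries | map(select(.value.isMet)) | last) as $hit |
      [$url,
       (if $hit then $hit.key + 1 else "" end),
       ($hit.value.value // ""),
       (.output_url // "")] | @csv'
done < "${1:-urls.txt}"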

Related

redirect numerous dynamic urls to home page via .htaccess

I am trying to clean up a previously hacked WordPress site and repair the domain name's reputation. The site has new hosting and is now on a different CMS, but there are hundreds of spam links in Google that I need to get rid of. They look like example.com/votes.php?10054nzwzm75042pw205039
Domain name, then votes.php?**** etc. Numbers, letters, all sorts.
So how do I redirect ANYTHING that starts with the domain name followed by /votes.php?***
Any help greatly appreciated
Unless you have multiple domains, you don't need to explicitly check the domain name.
To send a "410 Gone" for anything that contains /votes.php in the URL-path (and any query string), you can do something like the following at the top of your root .htaccess file using mod_rewrite:
RewriteEngine On
# Serve a 410 Gone for any requests to "/votes.php"
RewriteRule ^votes\.php$ - [G]
A 410 is preferable to a "redirect" if you want to get these URLs removed from the search engines as quickly as possible.
To expedite the process of URL removal from Google, use Google's Removal Tool as well.
If you redirect these pages to the homepage then it will likely be seen as a soft-404 by Google and these URLs are likely to remain in the search results for a lot longer.
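As a quick sanity check once the rule is in place, you can request one of the spam URLs with curl (the URL below is the example from the question) and confirm the status line:
curl -sI "https://example.com/votes.php?10054nzwzm75042pw205039"
# Expected first line of the response if the [G] rule matched: HTTP/1.1 410 Gone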

Redirecting using .htaccess

I've got an odd redirect request, and I'm not sure if it is possible.
On the server we have a series of PHP files with names like /title-of-the-page-####.php, where #### is a unique integer ID of the page.
I want to be able to redirect users from /#### to the full title of the page (as listed above). Is this possible from within an .htaccess file, and if so, how? (I would like to avoid listing all of the pages from within the .htaccess file)
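No answer is recorded here, but one common pattern is sketched below: .htaccess alone cannot discover the title half of the file name, so route the bare numeric ID to a small lookup script that knows the ID-to-title mapping and issues the 301 itself. The helper name redirect.php is hypothetical.
RewriteEngine On
# Send bare numeric paths like /1234 to a lookup script (hypothetical helper)
# which 301-redirects to /title-of-the-page-1234.php.
RewriteRule ^([0-9]+)$ redirect.php?id=$1 [L]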

single-page application with clean URLs without .htaccess file?

My question pertains specifically to the two pages below, but also relates more generally to methods for using clean URLs without an .htaccess file.
http://www.decitectural.com/
and
http://www.decitectural.com/about/
The pages above are hosted on Amazon's S3, which does not allow the use of .htaccess files. As a result, I have found no easy way to create a clean URL rewrite scheme that sends all requests to an index file which, in turn, interprets the URL using JavaScript and loads the correct page (with AJAX or, as is the case with decitectural, with simple div visibility toggling).
In order to circumvent this problem, I usually edit the Amazon S3 bucket properties and set both the index page and the error page to the index.html file. In this case, the index.html file is served even when an invalid path (such as /about/) is requested. This has, for the most part, been a functioning solution... That is, until I realized that the index.html page was also being served with a 404 status code, which would stop Google from indexing it.
This has led me to seek out an alternative solution to this problem. Currently, as a temporary fix, I am actually creating the /about/ directory on the server with a duplicate of the index.html file in it. This works, but obviously is not a real solution to the problem.
I would appreciate any advice on how to set up a clean URL routing scheme on S3 or in any instance where an .htaccess file can't be used.
Here are a few solutions: Pretty URLs without mod_rewrite, without .htaccess
Also, I guess you can run a script to create the files dynamically from an array or database so it generates all your URLs:
/index.html
/about/index.html
/contact/index.html
...
And hook the script into every edit, run it from cron, or run it manually. Not the best in terms of performance, but hey, it should work.
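A minimal sketch of such a generator (pages.txt and the site/ layout are assumptions, as is the bucket name), with a final sync to S3 via the AWS CLI:
#!/usr/bin/env bash
# Duplicate the single-page app's entry point into each clean-URL directory.
# Assumes pages.txt lists one path per line, e.g. "about", "contact".
while IFS= read -r page; do
  mkdir -p "site/$page"
  cp site/index.html "site/$page/index.html"
done < pages.txt
# Push the generated tree to the bucket (bucket name is a placeholder).
aws s3 sync site/ s3://your-bucket/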
I think you are going about it the wrong way. S3 gives you complete control of the page structure of your site. If you want your link to be "/about", just upload a file called "about", and you're done. (Set the headers so that the browser knows it's HTML.)
Yes, it will break if someone links to "/about/" or "/about.html". But pretty much any site will break if you mess with their links in odd ways. You will have to be vigilant when linking to your own site, because you won't have any rewrite rules to clean up for you. But you should have automation doing that.
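With the AWS CLI, that upload could look like the following (the bucket name is a placeholder); the --content-type option sets the header S3 will serve with the object:
aws s3 cp about.html s3://your-bucket/about --content-type "text/html"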

Dynamically creating URLs for other websites

I'd like to know how websites create URLs containing other domain names, like these on trafficestimate.com.
I'm guessing it's some .htaccess stuff to redirect domain names to a dynamic page?
Thanks
Your URL contains a GET request. So when someone calls the page http://google.com/search with the parameters hl=en, safe=off, etc., the page can process those parameters. For instance, safe=off means that you want to get back any search result. The q=site:... is your search string. In this case Google will look it up in its database and give you the results. So when you call this URL there is probably no .htaccess processing done. However, you can process the URL and GET request with .htaccess and, for example, redirect the user to another page.
Maybe you can describe a bit further what exactly you are trying to do or want to know. That would make explaining easier.
EDIT: After reading Gumbo's comment I looked at the Google result page. So maybe your question is about the trafficestimate URLs. They look like http://trafficestimate.com/example.org. This is really a good case for .htaccess. Using .htaccess, they take the URL and redirect it to http://www.trafficestimate.com/websites/?domain=example.org. Here you again have a GET request, and an application builds the page.
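A rule along those lines might look like the sketch below. This is an assumed scheme, not the site's actual configuration:
RewriteEngine On
# Skip real files and directories, then map /example.org to the handler page.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z0-9.-]+)$ websites/?domain=$1 [L]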
Some URL rewriting is probably involved. Otherwise they would have to create an existing file for every possible request.
Using Apache’s mod_rewrite in a .htaccess file is one option. But since the server identifies itself with “Microsoft-IIS/7.5”, they are probably rather using ISAPI_Rewrite, a mod_rewrite derivative for Microsoft’s IIS.

.htaccess and sessions for security?

In my application users have their own "websites" which can be reached if they are signed in.
However, since these websites are just directories containing HTML and other documents, everyone in the world can reach them if they know the address. I can't have that :) A user should be able to decide whether or not the world may see their files.
Can I use .htaccess to activate a PHP-script every time a request is made to that directory?
I.e. if the requested site is "/websites/{identifier}", run is-user-allowed-to-view.php?website={identifier}
The identifier is a numeric value which refers to both a physical folder and a post in the database... and the script would then return true or false.
Or is there perhaps another way of solving the same issue?
Cheers!
You can use mod_rewrite to rewrite requests with such a URL internally to your script:
RewriteEngine on
RewriteRule ^website/([0-9]+)$ is-user-allowed-to-view.php?website=$1
But this rule is only for the URL path /website/12345 and nothing else.
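If the per-user sites also contain nested files, a broader variant of the rule (a sketch; the extra file parameter is a hypothetical addition) could pass the inner path to the script as well:
RewriteEngine on
# Route /website/12345 and /website/12345/any/inner/page.html through the check
# (the "file" query parameter is a hypothetical addition).
RewriteRule ^website/([0-9]+)(?:/(.*))?$ is-user-allowed-to-view.php?website=$1&file=$2 [L,QSA]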
Or have every page as a PHP page and just put at the top a single line to redirect if the session / cookie is incorrect or not set. Obviously wouldn't work for non-PHP content such as images.
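For that PHP-page approach, the guard at the top of each page could look like this sketch (the session key and login URL are assumptions):
<?php
session_start();
// Hypothetical session key; send anonymous visitors to a login page.
if (empty($_SESSION['user_id'])) {
    header('Location: /login.php');
    exit;
}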
What you need is a proper front-end (written in whatever language). You need to have your web-server (Apache in your case it seems) pass the requests to the said front-end.
You cannot do what you are asking for with just .htaccess files.
