Mod_Rewrite Reverse Url Structure - .htaccess

I wonder whether it is possible to discover the underlying URL structure that has been rewritten with mod_rewrite. For example, the URL I have is http://somesite.com/topic-header, and behind it there must be something like somesite.com?id=123 or somesite.com?title=topic-header. Is there any way to detect that?

Only if you have access to the folder in which the .htaccess file resides (or to the Apache configuration files, although rules are less commonly placed there).
It's also common practice to route any request that doesn't match a file on the filesystem to index.php; WordPress does this, for example.
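For illustration, the kind of rule set this refers to usually looks something like the sketch below. It is modelled on the block WordPress typically writes to .htaccess, and it assumes the site lives at the document root. Because these rules are evaluated on the server, a visitor only ever sees the clean URL.

```apache
# Sketch of a front-controller rule set, assuming the site is at the document root.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# Leave direct requests for index.php untouched.
RewriteRule ^index\.php$ - [L]
# Only rewrite when the request does not match an existing file or directory.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Hand everything else to index.php, which decides what to show.
RewriteRule . /index.php [L]
</IfModule>
```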

Related

Hide directory where css, js, image folder are placed

I want to secure my assets folder and hide it from public view. I've heard this can be done with the .htaccess file by exposing a random name in place of the real directory name, so that users can't learn the real name of the directory inside my public_html. Can someone help me with examples or point me to documentation? I haven't tried anything because my knowledge of .htaccess is very limited. Any help would be appreciated.
Well, you can do this, but I'm curious as to the purpose if it's just to casually "hide" the underlying file directory? This doesn't really offer any additional "security" and can also cause issues if you have a front-end proxy that is intended to serve static content. It can also be problematic if you are using a CMS like WordPress as you may need to modify the default behaviour. (Although there may be other developmental issues for which you would choose to do this.)
Ideally, you would do something like this with the Alias directive in the main server config (or vHost container).
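As a rough sketch of the Alias approach, assuming Apache 2.4 and a hypothetical filesystem path (/var/www/example/secret is made up for this example), the server config might contain something like:

```apache
# Main server config or inside a <VirtualHost> block; Alias is not permitted in .htaccess.
# The filesystem path below is hypothetical.
Alias "/assets" "/var/www/example/secret"

<Directory "/var/www/example/secret">
    # Apache 2.4 syntax: allow the aliased directory to be served.
    Require all granted
</Directory>
```

With this in place, a request for /assets/style.css would be served from the aliased directory without mod_rewrite being involved.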
In .htaccess you can internally rewrite the request using mod_rewrite. Let's say you are referencing /assets in the URL, but the actual filesystem directory that this should map to is /secret; then you could do the following to simply forward all requests to /secret:
RewriteEngine on
RewriteRule ^assets/(.+) secret/$1 [L]
Only requests for /assets/<something> will be forwarded. A request for /assets or /assets/ will simply result in a 404 (assuming this directory does not actually exist).
To be more selective and only forward requests for specific file types, based on the file extension, then you could do something like the following:
RewriteRule ^assets/(.+\.(?:jpg|webp|gif|css|js))$ secret/$1 [L]
You could also check to see whether the target file actually exists before rewriting, but this is generally unnecessary and best avoided since filesystem checks are relatively expensive.
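If you did want the existence check anyway, a minimal sketch could look like the following. It assumes the .htaccess file sits in the document root, so that %{DOCUMENT_ROOT}/secret/ is where the real files live.

```apache
RewriteEngine on
# $1 is the capture from the RewriteRule pattern below; the rule pattern is
# matched before its conditions are evaluated, so the backreference is available here.
RewriteCond %{DOCUMENT_ROOT}/secret/$1 -f
RewriteRule ^assets/(.+) secret/$1 [L]
```

Requests for files that do not exist under /secret are then left alone rather than rewritten, at the cost of an extra filesystem lookup per request.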

Nginx URL rewrite to remove folder from URL when its followed by certain subfolders

After upgrading my site, I see that once I go live with the new version, some of the website's URLs for the gallery, blogs and files will not be redirected because of the new structure. There is no way to fix this within the CMS, so my goal is to use NGINX redirects.
Does anyone know any NGINX rewrite tricks that make such redirects possible?
website.com/forums/blogs/ into website.com/blogs/
website.com/forums/gallery/ into website.com/gallery/
website.com/forums/files/ into website.com/files/
I need the forums part dropped from the URL only when the address is forums followed by blogs, gallery or files. I don't want to lose that Google traffic.
So for example
website.com/forums/blogs/entry123/my-dog/ is redirected to
website.com/blogs/entry123/my-dog/
BUT
website.com/forums/topic/my-dog/
is left alone and works just as before, because the following subfolder is not blogs, gallery or files.
I needed this once on Apache, where the following rule worked, but I have no idea how to do it on Nginx.
RewriteRule ^forums/(blogs|gallery|files)/(.*)$ /$1/$2 [L,R=301]
You can try something like
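rewrite ^/forums/(blogs|gallery|files)/(.*)$ /$1/$2 permanent;
The permanent flag makes nginx answer with a 301 redirect, matching R=301 in the Apache rule; without a flag, the rewrite would be applied internally and the URL in the browser would not change. Please note that the rewrite directive accepts flags whose meaning depends on where it is placed (inside a server block or a location block). The nginx rewrite module documentation covers this in detail.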

single-page application with clean URLs without .htaccess file?

My question pertains specifically to the two pages below, but is also more generally relating to methods for using clean URLs without an .htaccess file.
http://www.decitectural.com/
and
http://www.decitectural.com/about/
The pages above are hosted on Amazon's S3, which does not allow for the use of htaccess files. As a result, I have found no easy way to create a clean url rewrite scheme that sends all requests to an index file which, in turn, interprets the URL using javascript and loads up the correct page (with AJAX, or, as is the case with decitectural, with simple div visibility toggling).
In order to circumvent this problem, I usually edit the amazon S3 bucket properties and set both the index page and the error page to the index.html file. In this case, the index.html file is served even when an invalid path (such as /about/) is requested. This has, for the most part, been a functioning solution... That is, until I realized that I was also getting a 404 with the index.html page which would stop Google from indexing it.
This has led me to seek out an alternative solution to this problem. Currently, as a temporary fix, I am actually creating the /about/ directory on the server with a duplicate of the index.html file in it. This works, but obviously is not a real solution to the problem.
I would appreciate any advice on how to set up a clean URL routing scheme on S3 or in any instance where an .htaccess file can't be used.
Here are a few solutions: Pretty URLs without mod_rewrite, without .htaccess
Also, I guess you can run a script to create the files dynamically from an array or database so it generates all your URLs:
/index.html
/about/index.html
/contact/index.html
...
And hook the script on every edit, in a cron or run manually. Not the best in terms of performance but hey, it should work.
I think you are going about it the wrong way. S3 gives you complete control of the page structure of your site. If you want your link to be "/about", just upload a file called "about", and you're done. (Set the headers so that the browser knows it's HTML.)
Yes, it will break if someone links to "/about/" or "/about.html". But pretty much any site will break if you mess with their links in odd ways. You will have to be vigilant when linking to your own site, because you won't have any rewrite rules to clean up for you. But you should have automation doing that.

How does Concrete5 arrange its absolute paths?

I've been asked to figure out how the Concrete5 system works for an employer, and I can't figure something out.
I have Concrete5 installed to a directory on the server called /realprofessionals. When the Concrete5 system makes new pages, it gives them their own absolute paths, for instance:
http://www.wmcpartners.com/realprofessionals/footer
However, it hasn't actually made a folder in the /realprofessionals directory called footer. So how does that work? How can http://www.wmcpartners.com/realprofessionals/footer be a working link?
Short answer: All page requests are actually going through the one and only index.php file. Page content is stored in the database, not in files on the server.
Long answer:
Concrete5 (and most PHP-based CMSs, for that matter) works like this: all requests are routed through the index.php file. This routing is enforced with some mod_rewrite rules in the .htaccess file. The rules say "for any request, don't actually go to that page, but instead go to index.php and pass the rest of the requested path as $_GET parameters". Then in the index.php code (or some other code that is included by the index.php file), the requested page is determined based on the path that Apache put into the $_GET parameters (as per the mod_rewrite rule in .htaccess), and the appropriate content is retrieved from the database.
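To make that concrete, here is a generic sketch of such a rule set for an install in /realprofessionals. It is not Concrete5's actual .htaccess; in particular, the query parameter name path is a hypothetical choice for this example.

```apache
# Generic sketch, placed in /realprofessionals/.htaccess. Not Concrete5's real rules.
RewriteEngine On
RewriteBase /realprofessionals/
# Skip requests that match a real file or directory (images, CSS, index.php itself).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Pass the requested path to index.php; "path" is a hypothetical parameter name.
# QSA keeps any query string that was on the original request.
RewriteRule ^(.*)$ index.php?path=$1 [L,QSA]
```

Under these assumptions, a request for /realprofessionals/footer reaches index.php with $_GET['path'] set to footer, even though no footer directory exists on disk.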
Storing content in the database as opposed to files on the server has several advantages. For example, you can re-use the same html template -- header, footer, sidebar -- on every page, and if you change the template it will automatically be reflected on all pages it's used on. Also, it makes it easier to shuffle pages around and to give them whatever URL you want (e.g. no ".php" extension at the end, or /2010/11/date/based/paths/for/blog/posts).
The disadvantage of course is that every request requires many database queries, but for most sites (those without zillions of page views), the trade-off is well worth it (and various types of caching can help reduce the performance hit).
Jordan's answer is excellent, I would add that you probably don't see index.php in the url because you've enabled pretty URLs (type 'pretty' on concrete5's searchbox to check that).
Anyhow, the best way to programmatically add link to internal pages is:
<a href="<?=$this->url('page-name');?>">
page name
</a>
It works both on localhost and online, with or without pretty URLs.
(For the page-name go to dashboard/full sitemap/page-name/properties/page paths and location.)

Fully securing a directory

What are the different approaches to securing a directory?
Including an index page so the contents can't be listed.
The problem with this is that people can still access the files if they know the name of the file they're after.
Including an .htaccess file to deny all.
This seems to be the best approach, but is there any case in which an .htaccess file can be bypassed? Are there also cases where .htaccess is not available?
Restricting folder access.
This is also a nice solution, but the problem is that the folder I'm trying to secure must be readable and writable by the program.
Are there any other ways that folder security can be done?
Best practice for Apache is to use .htaccess to restrict access. This only restricts access through the web server, but that should be what you need. You can add authentication to this, but for most needs you can just deny all access, which hides the directory completely.
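A minimal sketch of such a deny-all .htaccess, placed inside the directory to protect, is shown below. It assumes Apache 2.4; the older 2.2 form is given in the comments.

```apache
# .htaccess inside the protected directory (Apache 2.4+).
# Every HTTP request for anything in this directory is refused with 403 Forbidden.
Require all denied

# On Apache 2.2 the equivalent would be:
#   Order deny,allow
#   Deny from all
```

This only affects access over HTTP. Scripts on the server can still read and write the files through the filesystem.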
Another method, which also works well alongside using .htaccess to deny direct access, is to use .htaccess in your root directory to rewrite URLs. This means that a request such as /example/listItems/username/ted can be rewritten as a call to a PHP or other file such as:
/application/index.php?module=listItems&username=ted
The advantage of doing this is that the webserver does not give out paths to any directories so it is much more difficult for people to hack around looking for directories.
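The mapping described above could be sketched roughly as follows, assuming the .htaccess file is in the document root. The character classes are an assumption about what module names and usernames look like, and should be tightened or loosened to fit the application.

```apache
RewriteEngine On
# /example/<module>/username/<name>  ->  /application/index.php?module=<module>&username=<name>
# [A-Za-z]+ and [^/]+ are assumptions about valid module and user names.
RewriteRule ^example/([A-Za-z]+)/username/([^/]+)/?$ /application/index.php?module=$1&username=$2 [L,QSA]
```

With this rule, /example/listItems/username/ted is handled internally by /application/index.php?module=listItems&username=ted, and the visitor never sees the /application path.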
If you want to protect a directory of images, you could also use .htaccess to map requests to a different directory, so that /images/image5.png is actually a call to:
/application/images/image5.png
You could also try placing your protected directory not under your www directory but in another location that is not visible from the web. If your app needs to read or write data, tell it to do so in that other location. Modify the directory's permissions so that only the app has the rights to do so.
