Splitting url by nth characters - .htaccess

I have a site that is getting too much traffic, and I want to cache some of the pages to take load off the server.
I already have a caching system in place, but with the current URL structure it would save about 11.365 million pages in a single directory, e.g.
dir/* <- 11+ million pages saved in this directory.
That would make the directory very difficult to delete.
For a predictive search I already use JavaScript to split the cache path into 3-character chunks, like:
people/joh/n-j/one/s.json
which is more manageable to delete.
Is there any way I can use mod_rewrite to split URLs in the same way? For example:
User loads /people/john-jones
mod_rewrite checks whether caches/html/people/joh/n-j/one/s.html exists and, if so, serves it
Otherwise the request goes to PHP, which generates the page
I already have a rule for this, but without the splitting:
RewriteCond %{SCRIPT_FILENAME} ^(.+)\/cache [NC]
RewriteRule .* - [E=PATH:%1]
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^.+ %{ENV:PATH}/index.php?request=a&c=search&m=people&p=$0 [L]

Give the following rules a try:
RewriteEngine On
RewriteRule ^(people/.*?)([^/]{3})([^/]+)$ /$1$2/$3 [R=302,L]
RewriteCond %{DOCUMENT_ROOT}/caches/html%{REQUEST_URI}.html -f
RewriteRule ^ %{DOCUMENT_ROOT}/caches/html%{REQUEST_URI}.html [L]
The suggested edit by OP was rejected in peer review. Here's the solution OP went with:
# Set an environmental var for the root directory, so it works on local dev and live servers
RewriteCond %{SCRIPT_FILENAME} ^(.+)\/index.php$ [NC]
RewriteRule .* - [E=PATH:%1]
# Pick up the actual request from query string and set it as an environmental var
RewriteCond %{QUERY_STRING} ^request=names\/(.*?)([^/]{3})([^/]+) [NC]
RewriteRule .* - [E=SN:%1%2/%3]
# If a cached page exists, internally rewrite to it
RewriteCond %{ENV:PATH}/cache/html/names/%{ENV:SN}.html -f
RewriteRule .* cache/html/names/%{ENV:SN}.html [L]
# Send requests that are not cached to php
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^.+$ index.php?request=$0 [QSA,L]
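
If external redirects are not wanted, a fixed number of levels can also be split in one internal rewrite. The following is only a sketch under assumptions that are not in the question: the .htaccess file sits in the document root, cached files are stored under caches/html/people/ in exactly three 3-character levels plus a remainder, and slugs shorter than ten characters are simply left to PHP.

```apache
RewriteEngine On

# people/john-jones -> caches/html/people/joh/n-j/one/s.html
# $1..$3 are the three 3-character chunks, $4 is the rest of the slug.
# The condition may use $1..$4 because the RewriteRule pattern is matched
# before its conditions are evaluated.
RewriteCond %{DOCUMENT_ROOT}/caches/html/people/$1/$2/$3/$4.html -f
RewriteRule ^people/([^/]{3})([^/]{3})([^/]{3})([^/]+)$ /caches/html/people/$1/$2/$3/$4.html [L]
```

Requests that do not match the pattern, or whose cache file does not exist, fall through to whatever rule hands the request to PHP. The code that writes the cache would have to use the same fixed layout.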

Related

How to let TYPO3 do the error handling and redirecting for missing files

By default, .htaccess stops rewrite processing for known file storages such as the fileadmin directory:
RewriteRule (?:typo3/|fileadmin/|typo3conf/|typo3temp/|uploads/|favicon\.ico) - [L]
But this also prevents the TYPO3 error handling and the redirects module from taking action if a file does not exist.
Our editors would like to set up redirects for some deleted files, and I wonder whether there are any negative effects if I remove fileadmin/ from this RewriteRule.
Why does this rule even exist by default?
Here is the complete TYPO3 default rewriting, to make the context easier to understand:
<IfModule mod_rewrite.c>
# Enable URL rewriting
RewriteEngine On
# Store the current location in an environment variable CWD to use
# mod_rewrite in .htaccess files without knowing the RewriteBase
RewriteCond $0#%{REQUEST_URI} ([^#]*)#(.*)\1$
RewriteRule ^.*$ - [E=CWD:%2]
# Rules to set ApplicationContext based on hostname
#RewriteCond %{HTTP_HOST} ^dev\.example\.com$
#RewriteRule .? - [E=TYPO3_CONTEXT:Development]
#RewriteCond %{HTTP_HOST} ^staging\.example\.com$
#RewriteRule .? - [E=TYPO3_CONTEXT:Production/Staging]
#RewriteCond %{HTTP_HOST} ^www\.example\.com$
#RewriteRule .? - [E=TYPO3_CONTEXT:Production]
# Rule for versioned static files, configured through:
# - $GLOBALS['TYPO3_CONF_VARS']['BE']['versionNumberInFilename']
# - $GLOBALS['TYPO3_CONF_VARS']['FE']['versionNumberInFilename']
# IMPORTANT: This rule has to be the very first RewriteCond in order to work!
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)\.(\d+)\.(php|js|css|png|jpg|gif|gzip)$ %{ENV:CWD}$1.$3 [L]
# Access block for folders
RewriteRule _(?:recycler|temp)_/ - [F]
RewriteRule fileadmin/templates/.*\.(?:txt|ts)$ - [F]
RewriteRule ^(?:vendor|typo3_src|typo3temp/var) - [F]
RewriteRule (?:typo3conf/ext|typo3/sysext|typo3/ext)/[^/]+/(?:Configuration|Resources/Private|Tests?|Documentation|docs?)/ - [F]
# Block access to all hidden files and directories with the exception of
# the visible content from within the `/.well-known/` hidden directory (RFC 5785).
RewriteCond %{REQUEST_URI} "!(^|/)\.well-known/([^./]+./?)+$" [NC]
RewriteCond %{SCRIPT_FILENAME} -d [OR]
RewriteCond %{SCRIPT_FILENAME} -f
RewriteRule (?:^|/)\. - [F]
# Stop rewrite processing, if we are in the typo3/ directory or any other known directory
# NOTE: Add your additional local storages here
RewriteRule ^(?:typo3/|fileadmin/|typo3conf/|typo3temp/|uploads/|favicon\.ico) - [L]
# If the file/symlink/directory does not exist => Redirect to index.php.
# For httpd.conf, you need to prefix each '%{REQUEST_FILENAME}' with '%{DOCUMENT_ROOT}'.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule ^.*$ %{ENV:CWD}index.php [QSA,L]
</IfModule>
RewriteRule (?:typo3/|fileadmin/|typo3conf/|typo3temp/|uploads/|favicon\.ico) - [L]
But this also prevents the TYPO3 error handling and redirects module from taking action if a file does not exist.
[…]
Why does this rule even exist by default?
It looks like this rule might not be necessary here. You are right: the "not a file" and "not a directory" checks before the last rule already prevent anything that exists in these folders from being rewritten.
I guess it is just used as a sort of “performance shortcut”: a basic regex match on the requested URL is cheaper than actually querying the file system to check whether the path exists.
So they apparently assume that no rewriting will ever be necessary inside those “special” folders, because they are data storage only.
So, yes, if you want TYPO3 to handle 404s for anything in these folders as well, you should be able to remove or comment out this rule.
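
Applied to the default rules quoted in the question, that change might look like the sketch below. It assumes fileadmin/ is the only storage whose missing files should reach TYPO3; everything else is copied from the default configuration.

```apache
# Stop rewrite processing for the known directories, but no longer for
# fileadmin/, so that requests for missing files there can reach index.php
RewriteRule ^(?:typo3/|typo3conf/|typo3temp/|uploads/|favicon\.ico) - [L]

# Unchanged default: only paths that do not exist are handed to TYPO3,
# so existing files in fileadmin/ are still served directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule ^.*$ %{ENV:CWD}index.php [QSA,L]
```

The cost is the one the shortcut was avoiding: every request into fileadmin/ now goes through the file system checks.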

.htaccess rewrite blank spaces with - ("%20" to "-")

I want to replace %20 in my links with - (dash).
My .htaccess currently looks like this:
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ icerik.php?name=$1 [QSA,L]
For example,
site.com/vision-and-mision,
site.com/do-re-mi-fa-so-la-si.
I searched for this, but the information I found was very specific to other cases and I'm confused.
Thank you.
You can use the following in your /.htaccess file:
RewriteEngine On
# Replace whitespace with hyphens, set the environment variable,
# and restart the rewriting process. This essentially loops
# until all whitespace has been converted.
RewriteRule ^([^\s]*)\s(.*)$ $1-$2 [E=whitespace:yes,N]
# Then, once that is done, check if the whitespace variable has
# been set and, if so, redirect to the new URI. This process ensures
# that the URI is rewritten in a loop *internally* so as to avoid
# multiple browser redirects.
RewriteCond %{ENV:whitespace} yes
RewriteRule (.*) /$1 [R=302,L]
Then add your rules afterwards:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /icerik.php?name=$1 [QSA,L]
If this is working for you, and you would like to make the redirects cached by browsers and search engines, change 302 to 301.

Struggling with "clean urls"

After many hours of researching this site (and google) I've decided I need help with this problem I'm having. I'm using a snippet of code in my htaccess file that allows for a url to be accessed by either including the .php extension (like this www.mysite.com/about.php ), leaving the extension off completely with no slash (like this www.mysite.com/about ), or adding a slash at the end in place of the extension (like this www.mysite.com/about/ ).
So that part works beautifully. However, the address bar still shows the .php extension after the page loads, whether the user typed it or not. I'm pretty happy with what it does as is, but I'd really like to hide the extension and even put a slash at the end, and for some reason nothing I've tried works. Hopefully some of this makes sense.
I currently have this in my htaccess file.
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^([^/]+)/$ http://mysite.com/test-server/$1.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule (.*)$ http://mysite.com/test-server/$1/ [R=301,L]
This is actually a bad approach SEO-wise because your content is accessible via multiple URLs. Either enforce extensions or don’t.
I prefer extension-less URLs: if for some bizarre reason I want to switch technology stack (e.g. to Rails), I’m not stuck with “.php” on the end of my URLs.
To achieve this, you can rewrite extension-less requests to a script with “.php” on the end. In your .htaccess file, place the following:
RewriteEngine on
# redirect to extension-less URL if requested
RewriteCond %{THE_REQUEST} ^[A-Z]+\s.+\.php\sHTTP/.+
RewriteRule ^(.+)\.php $1 [R=301,L]
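
The redirect above only cleans up the address bar; something still has to map the extension-less URL back to the script. A minimal sketch of that second half, assuming the scripts sit in the same directory as the .htaccess file and the clean URLs carry no trailing slash:

```apache
# internally map /about to /about.php when that script exists
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ $1.php [L]
```

Because %{THE_REQUEST} holds the request line as the browser sent it, this internal rewrite does not trigger the redirect rule again: the original request never contained ".php".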
I also found this bit of code that works for me quite well at removing the extension, but I've only got it working at the root level so far. I'd like to be able to mod it for different directories within my test site since url structure is really important for this particular project. Nothing but errors when I do that though.
AddType text/x-component .htc
RewriteEngine On
RewriteBase /
# remove .php; use THE_REQUEST to prevent infinite loops
RewriteCond %{THE_REQUEST} ^GET\ (.*)\.php\ HTTP
RewriteRule (.*)\.php$ $1 [R=301]
# remove index
RewriteRule (.*)/index$ $1/ [R=301]
# remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]
# add .php to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1\.php [L]

htaccess url rewrite with search

I am trying to create an .htaccess file that will achieve the following:
Create a clean URL like www.mydomain.com/About-Us that is served by page.php?title=About Us
Query a database with the parameters passed in the URL, so that www.mydomain.com/?search=abc is handled by index.php?search=abc
This is my code so far:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^?search=([0-9]*)$ index.php?id=$1 ## e.g www.mydomain.com/?search=try
RewriteRule ^([A-Z]*)$ page.php?name=$1
## www.mydomain.com/About-Us
##ErrorDocument 404 PageNotavailabale
####Protect the system from machines with worms
RewriteRule (cmd|root)\.exe - [F,E=dontlog:1]
#### To hold and redirect css/images/js files
RewriteRule images/(.+)?$ images/$1 [NC,L]
RewriteRule css/(.+)?$ css/$1 [NC,L]
RewriteRule js/(.+)?$ js/$1 [NC,L]
Why not use this kind of URL for your search engine: www.domain.com/search/abc?
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^search/([a-zA-Z0-9]+)$ index.php?search=$1
RewriteRule ^([a-zA-Z0-9]+)$ index.php?page=$1
Then you access your pages with www.domain.com/<myPage>
and your search engine with www.domain.com/search/<mySearch>.
EDIT:
Please note that your rules don't allow many values:
^?search=([0-9]*)$ allows only numbers (even an empty parameter)
^([A-Z]*)$ allows only uppercase letters (and also empty parameter)
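
Two further details are worth keeping in mind: RewriteRule patterns are matched against the URL path only and never see the query string, and a RewriteCond applies only to the RewriteRule that directly follows it. A sketch that takes both into account is shown here; the hyphen in the character classes and the page.php?title= target are assumptions based on the About-Us example, and turning "About-Us" back into "About Us" is left to the PHP script.

```apache
RewriteEngine On

# www.domain.com/search/abc -> index.php?search=abc
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^search/([A-Za-z0-9-]+)$ index.php?search=$1 [L]

# www.domain.com/About-Us -> page.php?title=About-Us
# The conditions are repeated because each RewriteCond block only
# guards the single RewriteRule that follows it.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([A-Za-z0-9-]+)$ page.php?title=$1 [L]
```

If the original www.mydomain.com/?search=abc form has to keep working, it can be matched with a RewriteCond on %{QUERY_STRING} rather than in the RewriteRule pattern.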

Rewrite rule to hide folder, doesn't work right without trailing slash

I have a strange Apache mod_rewrite problem. I need to hide a sub-directory from the user, but still send every request to that sub-directory. I found several quite similar issues on Stack Overflow, but nothing really fits, so I decided to post a new question.
My .htaccess looks like this:
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule ^(.*)?$ foo/$1 [QSA,L]
The document-root only contains the following folder/files:
/foo/bar/index.html
I would now expect example.com/bar and example.com/bar/ to both show me the contents of index.html.
Instead, example.com/bar/ shows me the content as expected, but example.com/bar redirects me with a 301 to example.com/bar/foo/ and then shows the contents. I really don't get why there is a 301 redirect in this case.
When I put something like this
RewriteCond %{REQUEST_URI} !^[^.]*/$
RewriteCond %{REQUEST_URI} !^[^.]*\.html$
RewriteCond %{REQUEST_URI} !^[^.]*\.php$
RewriteRule ^(.*)$ $1/ [QSA,L]
on top of that rule it seems to work, but that would require me to list every file extension in use...
Is there any other way I can avoid the redirect? The folder "bar" should never be seen by an outside user.
Thanks in advance!
The first rewrite rule is a redirect from /foo/(.*) to $1, and the second is from (.*) to $1.
Just an idea; this has not been tested.
Better late than never...
Got it working with a simple RewriteRule which appends a / to every URL that doesn't have one.
# skip requests for existing files
RewriteCond %{REQUEST_FILENAME} !-f
# exclude these directories
RewriteCond %{REQUEST_URI} !^/excluded-dirs
# exclude these extensions
RewriteCond %{REQUEST_URI} !\.excluded-extension$
# exclude requests that already end with a /
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ /$1/ [R=301,L]
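
A variant that avoids listing extensions is to add the slash only when the requested path is a directory inside the hidden folder. This is a sketch, assuming the .htaccess file sits in the document root and the hidden folder is foo/ as in the question; it would go before the rule that rewrites everything to foo/.

```apache
RewriteEngine On
RewriteBase /

# /bar has no trailing slash, but foo/bar is a directory:
# redirect to /bar/ so the visible URL never includes foo/
RewriteCond %{DOCUMENT_ROOT}/foo/$1 -d
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]
```

With the slash already in place, the later internal rewrite to foo/bar/ gives Apache no reason to issue its own trailing-slash redirect.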
