Regex Replace that works with and without trailing slash - string

I am trying to match a URL string and move a word within it when that word is present (in this case, reviews). The problem is that when the URL ends in a forward slash, my substitution produces a double slash. Is there a way to add a slash only when there is no slash at the end?
https://regex101.com/r/cjMEiW/1

Is there a way to add a slash only when there is no slash at the end?
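The 'trick' is to add an extra 'dummy' group (\/?) that is not used in the substitution and can be empty or contain a slash, and to prevent group 3 from ending in a slash with (.*[^\/$]). Note that inside a character class $ is a literal dollar sign rather than an end-of-line anchor, so [^\/] alone would also do.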
See:
https://regex101.com/r/nysBYY/2
Regex:
(<a href='https:\/\/www\.example\.com)\/(reviews)\/(.*[^\/$])(\/?)('>)
Test string:
<a href='https://www.example.com/reviews/citroen/c4/'>here</a>
<a href='https://www.example.com/reviews/citroen/c4'>here</a>
Substitution:
$1/$3/$2/$5
Output:
<a href='https://www.example.com/citroen/c4/reviews/'>here</a>
<a href='https://www.example.com/citroen/c4/reviews/'>here</a>
For a more detailed explanation of the regex and its groups, see the answer https://stackoverflow.com/a/72516646/7711283 to the question "Regex, substitute part of a string always at the end".
P.S. The OP's regex was:
(<a href='https:\/\/www\.example\.com)\/(reviews)\/(.*)('>)
with substitution $1/$3/$2/$4 which resulted in:
<a href='https://www.example.com/citroen/c4//reviews/'>here</a>
<a href='https://www.example.com/citroen/c4/reviews/'>here</a>
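As a rough illustration, the same regex and substitution can be tried outside regex101. The sketch below uses Python's re module, which is an assumption (the question does not name a language); the function name move_reviews_to_end is hypothetical, and Python writes the backreferences as \1, \3 instead of $1, $3.

```python
import re

# The answer's regex, copied verbatim. Group 4 (\/?) absorbs an optional
# trailing slash and is deliberately left out of the replacement.
PATTERN = re.compile(
    r"(<a href='https:\/\/www\.example\.com)\/(reviews)\/(.*[^\/$])(\/?)('>)"
)

# Python equivalent of the substitution $1/$3/$2/$5
REPLACEMENT = r"\1/\3/\2/\5"


def move_reviews_to_end(html: str) -> str:
    """Move the 'reviews' path segment to the end, adding exactly one slash."""
    return PATTERN.sub(REPLACEMENT, html)


if __name__ == "__main__":
    for line in (
        "<a href='https://www.example.com/reviews/citroen/c4/'>here</a>",
        "<a href='https://www.example.com/reviews/citroen/c4'>here</a>",
    ):
        print(move_reviews_to_end(line))
```

Both test strings should come out with a single slash before and after reviews, because the trailing slash, when present, is captured by the unused group.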

Related

Is it possible to put a line break into a 'mailto:' rewrite in htaccess?

For complex reasons I've had to remove an enquiry form from a web site and use a 'mailto:' instead. For simplicity I've changed the htaccess file so that the former 'contact' link to the form now becomes a 'mailto:' as follows:
RewriteRule ^contact$ mailto:myname#mydomain.com?subject=BusinessName\ BandB\ Enquiry&body=You\ can\ find\ our\ availability\ on\ line.\ Delete\ this\ content\ if\ inapplicable
That does work, my local e-mail client (Thunderbird) opens with the information correctly shown in subject and body. (My TB is set to compose in plain text, I've yet to test with HTML)
I would like to introduce a new line in the body so that 'Delete this content if inapplicable' is on a separate line. Is there any way to do this? Given mod_rewrite's intended purpose I could understand if there isn't but I thought I'd ask before giving up.
I would like to introduce a new line in the body so that 'Delete this content if inapplicable' is on a separate line.
New lines in the body are represented by two characters: carriage return (char 13) + line feed (char 10) (see RFC2368). This would need to be URL encoded in the resulting URL as %0D%0A.
When used in the RewriteRule substitution string, the literal % characters need to be backslash-escaped to negate their special meaning as a backreference to the preceding CondPattern (of which there is none here), i.e. \%0D\%0A. Otherwise, you will end up with the string DA, because there is no %0 backreference in this example.
You can also avoid having to backslash-escape all the literal spaces by enclosing the entire argument (substitution string) in double quotes.
So, try the following instead:
RewriteRule ^contact$ "mailto:myname#mydomain.com?subject=BusinessName BandB Enquiry&body=You can find our availability on line.\%0D\%0ADelete this content if inapplicable" [R,L]
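To see where %0D%0A comes from, here is a small sketch in Python (not part of the Apache configuration) that percent-encodes a two-line body with urllib.parse.quote. The helper name build_mailto and the address myname@example.com are placeholders chosen for this illustration.

```python
from urllib.parse import quote


def build_mailto(address: str, subject: str, body_lines: list[str]) -> str:
    """Build a mailto: URL whose body lines are separated by CR+LF."""
    # Join with carriage return (char 13) + line feed (char 10).
    body = "\r\n".join(body_lines)
    # quote() percent-encodes spaces as %20 and CR/LF as %0D%0A.
    return f"mailto:{address}?subject={quote(subject)}&body={quote(body)}"


if __name__ == "__main__":
    url = build_mailto(
        "myname@example.com",  # placeholder address
        "BusinessName BandB Enquiry",
        [
            "You can find our availability on line.",
            "Delete this content if inapplicable",
        ],
    )
    print(url)
```

The encoded line break in the printed URL is the same %0D%0A sequence that has to be written as \%0D\%0A inside the RewriteRule substitution.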

Can't create correct regex

I have an html text. With my regex:
r'(http[\S]?://[\S]+/favicon\.ico[\S^,]+)"'
and with re.findall(), I get this result from it:
['https://cdn.sstatic.net/Sites/stackoverflow/Img/favicon.ico?v=ec617d715196', 'https://stackoverflow.com/favicon.ico,https://cdn.sstatic.net/Sites/stackoverflow/Img/favicon.ico?v=ec617d715196']
But I don't want the second result in the list. I understand that it has a comma inside, but I have no idea how to exclude the comma in my regex. I use re.findall() in order to find the necessary link anywhere in the HTML text, because I don't know where it could be.
Note that [\S]+ contains a redundant character class; it is the same as \S+. In http[\S]?://, the [\S]? is most likely a mistake, as it matches any optional non-whitespace char. I doubt you meant to match an http§:// protocol. Just use s to match s, or S to match S. Also, in [\S^,]+ the ^ is not the first character of the class, so it is a literal caret and the class still matches commas.
You can use
https?://[^\s",]*/favicon\.ico[^",]+
See the regex demo.
Details:
https?:// - http:// or https://
[^\s",]* - zero or more chars other than whitespace, " and , chars
/favicon\.ico - a fixed /favicon.ico string
[^",]+ - one or more chars other than a " and , chars.
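A quick way to check the suggested pattern is to run it with re.findall() on a small input. The SAMPLE_HTML string below is made up for this sketch (the original HTML is not shown in the question), and find_favicons is a hypothetical helper name.

```python
import re

# The pattern suggested above: no whitespace, quote or comma before
# /favicon.ico, and no quote or comma in the trailing part.
FAVICON_RE = re.compile(r'https?://[^\s",]*/favicon\.ico[^",]+')

# Hypothetical input: one plain link and one comma-separated attribute value.
SAMPLE_HTML = (
    '<link rel="shortcut icon" '
    'href="https://cdn.sstatic.net/Sites/stackoverflow/Img/favicon.ico?v=ec617d715196">\n'
    '<meta content="https://stackoverflow.com/favicon.ico,'
    'https://cdn.sstatic.net/Sites/stackoverflow/Img/favicon.ico?v=ec617d715196">\n'
)


def find_favicons(html: str) -> list[str]:
    """Return every favicon URL that is followed by at least one extra char."""
    return FAVICON_RE.findall(html)


if __name__ == "__main__":
    for url in find_favicons(SAMPLE_HTML):
        print(url)
```

Because the final [^",]+ needs at least one character, a bare .../favicon.ico that is immediately followed by a comma or a quote is skipped, which is what removes the combined comma-separated result.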

How to include space and new line in regular expression?

I have an HTML file which contains the below code snippet.
<div class="col-sm-2 col-sm-offset-1">
<div class="countBox success" id="success">
<h2>467</h2>
Passed Tests
<span class="glyphicon glyphicon-eye-open"></span>
</div>
</div>
I have a regular expression (.*)</h2>\r\nPassed to get the value 467. It worked until yesterday, but it no longer works. I have tried replacing the single backslashes with double backslashes for the carriage return and newline, and I have tried "\s+" to cover the whitespace. Both attempts failed. Could anyone please guide me on how to get the value 467 from the above code snippet with a regular expression?
It is better to capture <h2>(\d+)</h2> so that only an h2 header containing a number matches. Also, \r\n is only one convention (Windows) for the end of a line; on Unix it is just \n. To be more platform independent, use \r?\n (making the \r optional). You also have to account for the whitespace in front of Passed, so a good (though probably not the best) regexp would be:
<h2>(\d+)<\/h2>\r?\n\s*Passed
See demo.
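The regexp can be checked against both line-ending conventions with a few lines of code. This is a minimal sketch in Python, which is an assumption since the question does not say which tool runs the regex; passed_count is a hypothetical helper name.

```python
import re
from typing import Optional

# \r?\n accepts both Windows (\r\n) and Unix (\n) line endings;
# \s* skips the indentation in front of "Passed".
PASSED_RE = re.compile(r"<h2>(\d+)</h2>\r?\n\s*Passed")


def passed_count(html: str) -> Optional[int]:
    """Return the number inside the <h2> directly above 'Passed', if any."""
    match = PASSED_RE.search(html)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    unix_html = '<div class="countBox success" id="success">\n  <h2>467</h2>\n  Passed Tests\n'
    windows_html = unix_html.replace("\n", "\r\n")
    print(passed_count(unix_html), passed_count(windows_html))
```

If the file's line endings changed from \r\n to \n (for example after being saved on a different system), that alone would explain why a pattern with a mandatory \r stopped matching.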

Removal of trailing dot in RewriteRule of .htaccess

The .htaccess rewrite rule applied in a restful database application:
RewriteRule ^author/([A-z.]+)/([A-z]+)$ get_author.php?first_name=$1&last_name=$2
applied to
http://localhost:8080/API/author/J./Doe
removes the period from "J." and the resulting name "J Doe" is obviously not in the database (while "J. Doe" is). This rewrite rule only removes a trailing period, e.g. "J.O" translates correctly to "J.O". I use XAMPP 7.0.6 plus Apache under Windows 10. What to do in order to NOT remove the trailing dot on the initial?
Update:
Apparently my question wasn't clear, I give it another try.
The regexp (RewriteRule) above is supposed to assign "J." to the variable $1. Instead it assigns "J" to $1, in other words, the regex drops the trailing dot. Secondly, the regex assigns "Doe" to the variable $2, this assignment is as expected and correct. The variables $1 (with incorrect value "J") and $2 (with correct value "Doe") are used in a database search. This search fails because of the missing dot. The database contains "J. Doe", but not "J Doe".
When a dot is not trailing, as in "J.O", the variable $1 gets the correct value "J.O". In other words, the regex does not remove all dots, only the trailing ones.
My question is: how can I tell (the rewrite engine of) .htaccess to apply the regexp correctly?
For comparison, the following piece of JS code does what I want:
var regexp = "^author/([A-z.]+)/([A-z]+)$";
var result = "author/J./Doe".match(regexp);
alert(result[1] + " " + result[2]);
This is apparently (still) a "feature": https://bz.apache.org/bugzilla/show_bug.cgi?id=20036
Problem: Apache strips all trailing dots and spaces unless the path segment is exactly "." or "..".
I ran into the problem because I tried to map a URL from get/a/b/c to get.php?param1=a&param2=b&param3=c, where c can legitimately have trailing dots. The issue is not actually related to mod_rewrite; it happens with regular URLs too (example URL of a file that is definitely not named this way: Example favicon file). Other servers don't do this (example: Stackoverflow favicon file), which turns this into a way to detect an Apache server when the HTTP Server header is stripped.
To work around this problem, I still map the URL using mod_rewrite, but then in the PHP script, I use the exact same regex to manually map the parameters:
if (preg_match('#/get/([^/]+)/([^/]+)/(.+)$#', $_SERVER['REQUEST_URI'], $matches)) {
    $param1 = $matches[1];
    $param2 = $matches[2];
    $param3 = $matches[3];
}
Instead of using the PATH_INFO, I use the REQUEST_URI because it's untouched.
This means if you absolutely need to pass trailing dots in a path string to a backend using apache, your best bet right now is to write an intermediate script that extracts the proper parameters and then does the proxy request for you.

Need a mod_rewrite .htaccess solution to replace %20 spaces with -'s in the finished URL

I need an .htaccess mod_rewrite solution that will take a .cgi search query like this:
www.mydomain.com/cgi-bin/finda/therapist.cgi?Therapy_Type=Pilates Training&City=Los Angeles&State=CA
and return matching results in the browser's address bar to look like this:
www.mydomain.com/therapists/Pilates-Training-Los-Angeles-CA.html
or better yet:
www.mydomain.com/therapists/pilates-training-los-angeles-ca.html
Notice the database includes values with one, two or three words + spaces...
For example:
Therapy_Type=Pilates Training <- includes a space
City=Los Angeles <- includes a space
State=CA <- no space
I used the tool at: http://www.generateit.net/mod-rewrite/ to generate the following RewriteRule:
RewriteEngine On
RewriteRule ^([^-]*)-([^-]*)-([^-]*)\.html$ /cgi-bin/finda/therapist.cgi?Therapy_Types=$1&City=$2&State=$3 [L]
This does work (finds the search matches) and generates the results page, but because the parameter values have spaces in them, we end up with a URL that looks like this:
www.mydomain.com/therapists/Pilates%20Training-Los%20Angeles-CA.html
I've spent days in this forum and others trying to find a solution to get rid of these %20 (encoded spaces) so the final returned URL will look like 1) or 2) above.
I know someone on here must know how to do this... Help ;-)
If you replace the %20 with -, then how would you know where the therapy type ends and the city starts?
pilates-training-los-angeles-ca
would be
type=pilates
city=training
state=los
So I don't think you want to replace the %20 with -. You could, however, replace it with another character, like _:
pilates_training-los_angeles-ca
You then would have to translate every _ to a space within your PHP script (or whatever language you are using server side).
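The round trip can be sketched as follows. Python is used purely for illustration (the answer mentions PHP or any other server-side language), the function names build_slug and parse_slug are hypothetical, and the sketch assumes that no value contains a hyphen or an underscore of its own.

```python
def build_slug(therapy_type: str, city: str, state: str) -> str:
    """Join the three values with '-' after turning inner spaces into '_'."""
    parts = (therapy_type, city, state)
    return "-".join(p.strip().lower().replace(" ", "_") for p in parts) + ".html"


def parse_slug(slug: str) -> dict:
    """Split on '-' and translate every '_' back to a space."""
    if slug.endswith(".html"):
        slug = slug[: -len(".html")]
    # Assumes exactly three '-'-separated fields, as in the rewrite rule.
    therapy_type, city, state = (p.replace("_", " ") for p in slug.split("-"))
    return {"Therapy_Type": therapy_type, "City": city, "State": state}


if __name__ == "__main__":
    slug = build_slug("Pilates Training", "Los Angeles", "CA")
    print(slug)
    print(parse_slug(slug))
```

Lowercasing in build_slug discards the original capitalisation, so the lookup on the server side would need to be case-insensitive for this variant to work.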

Resources