%20 in filename, browser sees whitespaces. How to avoid that? - .htaccess

I have imported images to my own server. One of the many filenames has %20 in them, e.g. 50128%202789%20001V%20500.jpg. However the browser sees it as 50128 2789 001V 500.jpg, so it won't display the image.
What solution can I use, to display the image properly?

"%20" is a percent-encoding for a space, and has special meaning in a URL. Basically, you can't use spaces in a URI or URL, and have to replace them with a code to comply with the rules. Things that read URLs (the server for example) translate them back.
Unfortunately, that means if you have a filename containing this sequence, it will be mistaken for a percent-encoded space, as you're seeing.
The solution is to also percent-encode the '%'. The percent-encoding for a '%' character is "%25".
In your example, the name "50128%202789%20001V%20500.jpg" has to be encoded to "50128%25202789%2520001V%2520500.jpg" so that those '%' characters are not mistaken for spaces.
There are of course other things that get encoded. The rules are defined in the URI specification.

Related

How to invalidate a cloudfront path that contains a tilde ~ character?

Trying to invalidate an AWS cloudfront path that contains a tilde ~ character results in an invalid argument error. A tilde is a valid URL character, and invoking things like encodeURI or encodeURIComponent against strings that contain it do not encode it.
What I've tried
I've tried encoding the tile as %7E in the invalidation URL. This does not yield an invalid argument error, but it does not invalidate the path for the desired file either.
Temporary workaround
I've been able to work around this by finding the first index of ~, replacing it with a *, and chopping off the rest of the string afterward. This creates the needed invalidation, though not the desired invalidation, as it can also invalidate paths that may be optimized by remaining cached.
AWS Support says that this is an issue with invalidating paths that contain special characters, that they are aware of the issue, and are actively working toward a solution.
The reason that %7E doesn't work is because cloudfront caches it separately from the tilde.

'%' getting interpreted as '0' when inputting URL into address bar

Percentage sign ('%') is getting interpreted as ('0') when inputting URL into address bar from Blue Prism for a SharePoint website
Issue is to do with the % symbol being interpreted as a command. You need to use curly brackets for it to be used successfully.
Something like this should do it
Replace(Replace("https://health.sharepoint.com/sites/in/Standard/Forms/My%20Claims.aspx","%","{%}"),")","{%}")
While Eoin2111's answer is correct, generally speaking when editing URL, you should make sure to encode it (remove spaces, special characters etc.).
One of the ways to encode URL is HttpUtility.UrlEncode(). The usage of this and some other useful methods is very well described in here.

Why does URL encoding exist for ASCII character set

It is clearly stated in W3Schools that
URLs can only be sent over the Internet using the ASCII character-set.
Why does URL encoding exist for ASCII characters like a , b , c when it can be sent over the internet without any URL encoding ???
Eg: Why encode 'a' when it can send over as 'a'
What are the possible reasons to encode ASCII characters ?? The only reason i can think of are hackers who are trying to make their URL as unreadable as possible to carry out XSS attacks
STD 66, Percent-Encoding:
A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component.
So percent-encoding is a kind of escape mechanism: Some characters have a special meaning in URI components (→ they are reserved). If you want to use such a character without its special meaning, you percent-encode it.
Unreserved characters (like a, b, c, …) can always be used directly, but it’s also allowed to percent-encode them. Such URIs would be equivalent:
URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource.
Why it’s allowed to percent-encode unreserved characters in the first place? The obsolete RFC 2396 contains (bold by me):
Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear.
I can’t think of an example for such a "context", but this sentence suggests that there may be some.
Also, maybe some people/implementations like to simply percent-encode everything (except for delimiters etc.), so they don’t have to check if/which characters would need percent-encoding in the corresponding component.
URL encoding exists for the full range of ASCII because it was easier to define an encoding that works for all characters than to define one that only works for the set of characters with special meanings.
URL encoding allows for characters that have special meaning in a URL to be included in a segment, without their special meaning. There are many examples, but the most common ones to require encoding include " ", "?", "=" and "&"
URL encoding was designed so it can encode any ASCII character.
While = is encoded as %3d, ? is encoded as %3f and & is encoded as %26, it makes sense for a to be encoded as %61 and b to be encoded as %62, as the hex number after the % represents the ASCII code of the character.

Replace character with a safe character and vice-versa

Here's my problem:
I need to store sentences "somewhere" (it doesn't matter where).
The sentences must not contain spaces.
When I extract the sentences from that "somewhere", I need to restore the spaces.
So, before storing the sentence "I am happy" I could replace the spaces with a safe character, such as &. In C#:
theString.Replace(' ', '&');
This would yield 'I&am&happy'.
And when retrieving the sentence, I would to the reverse:
theString.Replace('&', ' ');
But what if the original sentence already contains the '&' character?
Say I would do the same thing with the sentence 'I am happy & healthy'. With the design above, the string would come back as 'I am happy healthy', since the '&' char has been replaced with a space.
(Of course, I could change the & character to a more unlikely symbol, such as ¤, but I want this to be bullet proof)
I used to know how to solve this, but I forgot how.
Any ideas?
Thanks!
Fredrik
Maybe you can use url encoding (percent encoding) as an inspiration.
Characters that are not valid in a url are escaped by writing %XX where XX is a numeric code that represents the character. The % sign itself can also be escaped in the same way, so that way you never run into problems when translating it back to the original string.
There are probably other similar encodings, and for your own application you can use an & just as well as a %, but by using an existing encoding like this, you can probably also find existing functions to do the encoding and decoding for you.

Cyrillic characters in browser address bar

When I put cyrillic symbols in address bar like this:
http://ru2.php.net/manual-lookup.php?pattern=привет
it switches to
http://ru2.php.net/manual-lookup.php?pattern=%EF%F0%E8%E2%E5%F2
What does that characters -- %EF%F0%E8%E2%E5%F2 -- mean? And why is it happening?
The characters are getting URL encoded. A URL may only contain a subset of ASCII characters, so anything outside plain alphanumeric and some special characters must be URL encoded.
Some browsers display non-ASCII characters as human readable characters, but that's entirely up to them. In protocols, URLs are always URL encoded.

Resources