URL rewriting in address bar - browser

I have no idea where to find out why, if I paste this URL into the address bar of Chrome, Safari, or Firefox, it gets translated into a different URL with a couple of accented characters.
It leads to a cryptocurrency phishing site, so beware.
I'm trying to find the logic (JavaScript or whatever) that causes this translation.
The base URL is http://www.xn--shapehit-ez9c7y.com
I apologise if this is the wrong site to ask the question on.

This is called Punycode, which is a way to represent Unicode within the ASCII character set. This allows websites to have names with foreign characters, such as in Chinese or Arabic. While this is incredibly useful, it can also be used for deceptive impersonation (often maliciously, as noted in your question).
Different browsers treat Punycode differently. For example, Safari and Edge will not attempt to convert Punycode, and will show the full 'strange' URLs.
However, according to Sophos,
Chrome and Firefox won’t automatically decode punycode URLs if they mix multiple alphabets or languages, on the grounds that such text strings are highly unlikely in real life and therefore suspicious. But both Chrome and Firefox will autoconvert punycode URLs that contain all their characters in the same language.
A security researcher called Xudong Zheng actually registered the domain xn--80ak6aa92e.com, which translates to аррӏе in Cyrillic characters. When visited in Chrome or Firefox, it looked identical to apple.com in the address bar.
Fortunately his site simply warns of this forgery, but it could easily have been used maliciously.
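If you want to see the translation for yourself, here is a minimal sketch using Python's built-in punycode codec (the xn-- prefix is the IDNA marker for an encoded label, so it is stripped before decoding):

# Minimal sketch: decode the deceptive label with Python's built-in
# 'punycode' codec. The 'xn--' prefix is the IDNA ACE marker and is
# not part of the raw punycode data.
label = "xn--80ak6aa92e"
decoded = label[len("xn--"):].encode("ascii").decode("punycode")
print(decoded)  # the Cyrillic lookalike of 'apple'

# And back the other way:
print("xn--" + decoded.encode("punycode").decode("ascii"))  # xn--80ak6aa92e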
Hope this helps :)

Related

Why do domains containing one emoji redirect me to other unsafe, existing sites?

I was editing text in Vim and I typed gx on a play-button emoji to open it as a URL and see what would happen. Vim translated the UTF-8 emoji into its punycode form and wrapped it into a URL for my browser: xn--g1h.com. The request was redirected to another site, anti666.com, which is quite spooky. The site has no certificate, of course.
I tried wrapping other emojis' punycode into the same URL format (www.<punycode emoji>.com) with two different browsers (Firefox and Chromium), and I didn't notice any difference in results between them.
Sometimes I got 404 errors; sometimes it took me to other sites without certificates.
To take another example, xn--c1yn36f.com redirects me to 1redirb.com/.../....
I didn't try further because I thought it could be risky.
I think these are redirections and not aliases or anything like that, based on my research with whois.com, but that's just my own speculation.
My question is: how are these redirections possible, and why do they happen? Do these domains actually exist, or is there another explanation?
I would expect my browser to throw an error, not redirect me to another unsafe site.
Thanks.
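As an aside, you can decode these labels yourself instead of visiting them; a minimal sketch using Python's built-in punycode codec (it only prints the characters and makes no network request):

# Sketch: decode emoji domain labels back to the characters they encode.
# Strip the IDNA 'xn--' prefix first; the raw codec handles the rest.
for label in ("xn--g1h", "xn--c1yn36f"):
    raw = label[len("xn--"):]
    print(label, "->", raw.encode("ascii").decode("punycode"))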

Web pages served by local IIS showing black diamonds with question marks

I'm having an issue in a .NET application where pages served by local IIS display random characters (mostly black diamonds with white question marks in them). This happens in Chrome, Firefox, and Edge. IE displays the pages correctly for some reason.
The same pages in production and in lower pre-prod environments work in all my browsers. This is strictly a local issue.
Here's what I've tried:
Deleted code and re-cloned (also tried switching branches)
Disabled all browser extensions
Ran in incognito mode
Rebooted (you never know)
Deleted temporary ASP.NET files
Looked for corrupt fonts on machine but didn't find any
Other Information:
Running IIS 10.0.17134.1
.NET MVC application with Knockout
I realize there are several other posts regarding black diamonds with question marks, but none of them seem to address my issue.
Please let me know if you need more information.
Thanks for your help!
You are in luck. The explicit purpose of � is to indicate that character encodings are being misused. When users see that, they'll know that we've messed up and lost some of their text data, and we'll know that, at one or more points, our processing and/or configuration is wrong.
(Fonts are not at issue [unless there is no font available to render �]. When there is no font available for a character, it's usually rendered as a white-filled rectangle.)
Character encoding fundamentals are simple: use a sufficient character set (say Unicode), pick an appropriate encoding (say UTF-8), encode text with it to obtain bytes, tell every program and person that gets the bytes that they represent text and which encoding is used. The encoding might be understood from a standard, convention, or specification.
Your editor does the actual encoding.
If the file is part of a project or similar system, a project file might store the intended encoding for all or each text file in the project. If your editor is an IDE, it should understand how the project does that.
Your compiler needs to know the encoding of each text file you give it. A project system would communicate what it knows.
HTML provides an optional way to communicate the encoding. Example: <meta charset="utf-8">. An HTML-aware editor should not allow this indicator to be different than the encoding it uses when saving the file. An HTML-aware editor might discover this indicator when opening the file and use the specified encoding to read the file.
HTTP uses another optional way: the Content-Type response header. The web server emits this either statically or in conjunction with code that it runs, such as ASP.NET.
Web browsers use the HTTP way if given.
XHR (AJAX, etc) uses HTTP along with JavaScript processing. If needed the JavaScript processing should apply the HTTP and HTML rules, as appropriate. Note: If the content is JSON, the current RFC requires the encoding to be UTF-8.
No one or thing should have to guess.
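To make the failure mode concrete, here is a minimal sketch (in Python) of how a mismatch between the producing and consuming encodings yields the replacement character; on the wire, the server-side agreement is a header such as Content-Type: text/html; charset=utf-8:

# Sketch: how an encoding mismatch produces U+FFFD (the black diamond
# with a question mark). 'café' in Windows-1252 is b'caf\xe9'; the byte
# 0xE9 alone is invalid UTF-8, so a lenient decoder substitutes �.
data = "café".encode("cp1252")
print(data.decode("utf-8", errors="replace"))  # caf� (data is lost)
print(data.decode("cp1252"))                   # café (decoded as produced)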
Diagnostics
Which character encoding did you intend to use? This century, UTF-8 is so much the norm that if you choose to use a different one, you should have a good reason and document it (for others and your future self).
Compare the bytes in the file with the text you expect them to represent. Do they use the intended encoding? Use an editor or tool that shows bytes in hex (see the sketch after this list).
As suggested by @snakecharmerb, what does the server send? Use a web browser's F12 network tab.
What does the HTTP response header say, if anything?
What does the HTML meta tag say, if anything?
What is the HTML doctype, if any?
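For the byte-level check in the first diagnostic, a minimal sketch (in Python; 'page.html' is a placeholder path):

# Sketch: dump the first bytes of a file in hex to verify its encoding.
# A UTF-8 BOM, if present, is EF BB BF; 'é' is C3 A9 in UTF-8 but a
# single E9 in Windows-1252.
with open("page.html", "rb") as f:
    head = f.read(64)
print(head.hex(" "))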

Arabic and other Right-to-left slugs ok?

I'm creating a multilingual site, where an item has a slug for each of the site's languages.
For Arabic slugs (and, I assume, any other right-to-left language) it acts strangely when you try to highlight it: the cursor moves in the opposite direction while inside the RTL text, and so on. This isn't a terribly big deal, but it made me think that maybe it's not "normal" or "OK".
I've seen some sites with Arabic slugs, but I've also seen completely Arabic sites that still use English slugs.
Is there a benefit one way or the other? A suggested method? Is doing it the way I am ok? Any suggestions welcome.
I suppose that by "slug" you mean a direct permanent URL to a page. If you don't, you can ignore the rest of this answer :)
Such URLs will work, but avoid them if you can. The fact that the text is right-to-left is actually not the worst problem. The worst problem with any non-ASCII URL is that in a lot of contexts it will show up like this: https://ar.wikipedia.org/wiki/%D9%86%D9%87%D8%B1_%D9%81%D8%A7%D8%B1%D8%AF%D8%A7%D8%B1 (it's just a link to a random article in the Arabic Wikipedia). You will see a long trail of percent signs and numbers, even if the title is short, because each non-ASCII character turns into about six ASCII characters. This gets very ugly when you have to paste it in an email, for example.
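You can reproduce that expansion directly; a minimal sketch with Python's urllib, using the Wikipedia title from the link above:

# Sketch: round-trip a percent-encoded Arabic slug. Each Arabic letter
# is 2 bytes in UTF-8, hence 6 ASCII characters once escaped.
from urllib.parse import quote, unquote

encoded = "%D9%86%D9%87%D8%B1_%D9%81%D8%A7%D8%B1%D8%AF%D8%A7%D8%B1"
title = unquote(encoded)        # the readable Arabic title
print(title)
print(quote(title) == encoded)  # True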
I write a blog in Hebrew and I manually change the slug of every post to some ASCII name.

URL autofill in browsers

In browsers like Chrome and Firefox, when we type "F", the address bar automatically completes the URL to "facebook.com". How does this work, and what is this concept called? I want to learn about it. Can anyone help by suggesting some links?
A more efficient solution is to use a prefix tree (a trie) to store prefixes of words. That's also how spellcheck systems (such as the one in MS Word) usually work.
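Here is a minimal trie sketch (in Python, since the question is language-agnostic; the names are my own). Real browsers also rank the candidates (Firefox calls its scoring "frecency"), but the prefix lookup itself is the trie walk shown here:

# Minimal prefix tree (trie): insert visited URLs, then complete
# whatever prefix the user has typed so far.
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # a complete URL ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def complete(self, prefix):
        # Walk down to the prefix node, then collect every word below it.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return
            node = node.children[ch]
        stack = [(node, prefix)]
        while stack:
            node, word = stack.pop()
            if node.is_end:
                yield word
            for ch, child in node.children.items():
                stack.append((child, word + ch))

history = Trie()
for url in ("facebook.com", "facebook.com/groups", "fastmail.com"):
    history.insert(url)
print(list(history.complete("f")))  # all three stored URLs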

Space in URL; did the browser get smarter, or the server?

It looks like today you no longer have to encode spaces as %20 in your HTML links or image links. For example, suppose you have this image at 〔http://example.com/i/my house.jpg〕. Notice the space there. In your HTML code, you can just do this:
<img src="http://example.com/i/my house.jpg" alt="my house">
It works in all current versions of browsers. What I'm not sure about, though, is whether the browser encodes it before requesting the URL, or whether a particular server (Apache) does the right thing with paths containing spaces.
Addendum:
Sorry about the confusion. My real question is about the HTTP protocol.
I'll leave this one as-is and mark it answered.
I posted a new question here:
Does the HTTP protocol require spaces to be encoded in file paths?
The browser makes the correction.
You still have to encode the spaces though. Just because it works in the browsers you use doesn't make it valid, and doesn't mean it will work everywhere.
You can see a list of reserved characters and other characters that should be encoded here: http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
RFC1738 specifically states:
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
RFC2396 takes precedence over RFC1738 and expounds on space usage in URLs:
The space character is excluded because significant spaces may
disappear and insignificant spaces may be introduced when URI are
transcribed or typeset or subjected to the treatment of word-
processing programs. Whitespace is also used to delimit URI in many
contexts.
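In practice, then, you should encode the space yourself before putting the URL in markup; a minimal sketch with Python's urllib:

# Sketch: percent-encode a path containing a space. quote() escapes the
# space (and other unsafe characters) but leaves '/' alone by default.
from urllib.parse import quote

print(quote("/i/my house.jpg"))  # /i/my%20house.jpg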