Arabic and other Right-to-left slugs ok? - browser

I'm creating a multilingual site where an item has a slug for each of the site's languages.
For Arabic slugs (and, I assume, slugs in any other right-to-left language), the text behaves strangely when you try to highlight it: the cursor moves in the opposite direction while inside the RTL text, and so on. This isn't a terribly big deal, but it made me think that maybe it's not "normal" or "ok".
I've seen some sites with Arabic slugs, but I've also seen completely Arabic sites that still use English slugs.
Is there a benefit one way or the other? A suggested method? Is doing it the way I am ok? Any suggestions welcome.

I suppose that by "slug" you mean a direct permanent URL to a page. If you don't, you can ignore the rest of this answer :)
Such URLs will work, but avoid them if you can. The fact that it's right-to-left is actually not the worst problem. The worst problem with any non-ASCII URL is that in a lot of contexts it will show like this: https://ar.wikipedia.org/wiki/%D9%86%D9%87%D8%B1_%D9%81%D8%A7%D8%B1%D8%AF%D8%A7%D8%B1 (it's just a link to a random article in the Arabic Wikipedia). You will see a long trail of percent signs, letters and digits, even if the title is short, because each non-ASCII character turns into about six ASCII characters (an Arabic letter takes two bytes in UTF-8, and each byte is written as a percent sign followed by two hexadecimal digits). This gets very ugly when you have to paste it in an email, for example.
I write a blog in Hebrew and I manually change the slug of every post to some ASCII name.
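To make the "six ASCII characters per letter" point concrete, here is a minimal sketch using the standard `decodeURIComponent` and `encodeURIComponent` functions on the path segment from the Wikipedia link above. The variable names are only illustrative.

```javascript
// Path segment copied from the Wikipedia link quoted in the answer.
const encodedPath = "%D9%86%D9%87%D8%B1_%D9%81%D8%A7%D8%B1%D8%AF%D8%A7%D8%B1";

// decodeURIComponent turns the percent-escapes back into the original text.
const decodedPath = decodeURIComponent(encodedPath);

// encodeURIComponent goes the other way: UTF-8 bytes, then %XX per byte.
const reEncoded = encodeURIComponent(decodedPath);

console.log(decodedPath.length);        // 10 (nine Arabic letters plus "_")
console.log(encodedPath.length);        // 55 (9 * 6 escaped characters plus "_")
console.log(reEncoded === encodedPath); // true
```

The underscore stays as it is because it is already a plain ASCII character; only the nine Arabic letters are expanded.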

Related

Internationalization Web Number-Symbols

Do I need to use other number symbols when I want my webpage to be accessible in other countries? According to Microsoft, numbers take different shapes in different locales: https://learn.microsoft.com/en-us/globalization/locale/number-formatting#:~:text=formatting%20for%20details.-,The%20character%20used%20as%20the%20thousands%20separator,thousands%20separator%20is%20a%20space.
I have been searching for a few days for a clear answer, but I can't find one. Also, on most international websites and apps I only ever see the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, although the digits for the language actually look different. That unsettles me. I feel like many websites and apps just ignore this fact. Can anybody help me further? Also, do I need to know how to enable foreign symbols in HTML?
I do not know for sure what language you are translating/typing in HTML. But here is an example of what you can use as a guide to certain scripts in Arabic: https://sites.psu.edu/symbolcodes/languages/mideast/arabic/arabicchart/
You may also need to use a converter. For example, I type Chinese on my website by typing the characters into a character-to-Unicode converter. Then I copy and paste the resulting Unicode codes into my HTML text.
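As a rough idea of what such a converter does, the sketch below turns every non-ASCII character into an HTML numeric character reference. The function name `toNumericCharRefs` is made up for this example, and it assumes the goal is plain `&#x...;` references; a page that is saved and served as UTF-8 can usually contain the characters directly instead.

```javascript
// Hypothetical helper: replace each non-ASCII character with a hexadecimal
// numeric character reference. ASCII characters are left untouched, so this
// does not escape "&" or "<".
function toNumericCharRefs(text) {
  return Array.from(text, ch => {
    const codePoint = ch.codePointAt(0);
    return codePoint < 0x80
      ? ch
      : `&#x${codePoint.toString(16).toUpperCase()};`;
  }).join("");
}

// Arabic-Indic digits one, two, three (U+0661, U+0662, U+0663).
console.log(toNumericCharRefs("\u0661\u0662\u0663")); // &#x661;&#x662;&#x663;

// A Chinese character (U+4E2D).
console.log(toNumericCharRefs("\u4E2D")); // &#x4E2D;
```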

The structure of Arabic letters in Unicode

I got two different "versions" of Arabic letters on Wikipedia. The first example seems to be 3 sub-components in one:
"ـمـ".split('').map(x => x.codePointAt(0).toString(16))
[ '640', '645', '640' ]
Finding this "m medial" letter on this page gives me this:
ﻤ
fee4
The code points 640 and 645 are the "Arabic tatweel" ـ and "Arabic letter meem" م. What the heck? How does this work? I don't see anywhere in the information so far on Unicode Arabic how these glyphs are "composed". Why is it composed from these parts? Is there a pattern for the structure of all glyphs? (All the glyphs on the first Wikipedia page are built this way, but on the second page each one is a single code point.) Where do I find information on how to parse out the characters effectively in Arabic (or any other language for that matter)?
Arabic is a script with cursive joining; the shape of the letters changes depending on whether they occur initially, medially, or finally within a word. Sometimes you may want to display these contextual forms in isolation, for example to simply show what they look like.
The recommended way to go about this is by using special join-causing characters for the letters to connect to. One of these is the tatweel (also called kashida), which is essentially a short line segment with “glue” at each end. So if you surround the letter م with a tatweel character on both sides, the text renderer automatically selects its medial form as if it occurred in the middle of a word (ـمـ). The underlying character code of the م doesn’t change, only its visible glyph.
However, for historical reasons Unicode also contains a large set of so-called presentation forms for Arabic. These represent those same contextual letter shapes, but as separate character codes that do not change depending on their surroundings; putting the “isolated” presentation form of م between two tatweels does not affect its appearance, for instance: ـﻡـ
It is not recommended to use these presentation forms for actually writing Arabic. They exist solely for compatibility with old legacy encodings and aren’t needed for correctly typesetting Arabic text. Wikipedia just used them for demonstration purposes and to show off that they exist, I presume. If you encounter presentation forms, you can usually apply Unicode normalisation (NFKD or NFKC) to the string to get the underlying base letters. See the Unicode FAQ on presentation forms for more information.
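A small sketch of the normalisation step, using the built-in `String.prototype.normalize`. The `hex` helper is just for display and is not part of any library; the code points are written as escapes to avoid bidirectional rendering surprises in the source.

```javascript
// Display helper: list the code points of a string in hexadecimal.
const hex = s => Array.from(s, ch => ch.codePointAt(0).toString(16));

// U+FEE4 is the "meem medial form" presentation form from the question.
const medialForm = "\uFEE4";
const normalized = medialForm.normalize("NFKC");

console.log(hex(medialForm)); // [ 'fee4' ]
console.log(hex(normalized)); // [ '645' ]

// Tatweel + meem + tatweel already uses the base letter, so NFKC leaves it alone.
const joined = "\u0640\u0645\u0640";
console.log(hex(joined));                         // [ '640', '645', '640' ]
console.log(joined.normalize("NFKC") === joined); // true
```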

Permalinks: Why are hyphens (-) used over underscores (_) to replace spaces (and other unwanted characters)?

In some large web systems I have come across lately, friendly permalinks (the part of a URL path that is based on an often user-specified string rather than a numeric ID) have their spaces, and other unwanted or disallowed characters that would otherwise need to be URL-escaped, replaced by hyphens (-) rather than underscores (_).
An example:
in the URL http://example.com/blog/this-is-my-first-post, this-is-my-first-post is a friendly permalink. Using underscores, this would be http://example.com/blog/this_is_my_first_post
Is this only a personal preference, or is there a technical reason to use hyphens over underscores?
Hypothetical possibilities I thought of:
Maybe it matters for Search Engine Optimization?
Maybe it is actually important for how HTML paths are interpreted?
Maybe there is a historical reason?
What I do know:
Hyphens are treated as word-breaks in most (if not all?) computer systems/programs, e.g. use ctrl+left/ctrl+right to move in a sentence_that_uses_underscores vs a sentence-that-uses-hyphens.
In normal text that a user enters (e.g. names for objects or blogposts), usage of actual hyphens is higher than underscores.
Could someone shine some light on this?
Google has spoken:
Consider using punctuation in your URLs. The URL http://www.example.com/green-dress.html is much more useful to us than http://www.example.com/greendress.html. We recommend that you use hyphens (-) instead of underscores (_) in your URLs.
https://support.google.com/webmasters/answer/76329?hl=en
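For illustration, a hyphen-based permalink like the one in the question could be produced along these lines. This is only a sketch: `slugify` is a hypothetical helper, not taken from any of the systems mentioned, and it assumes ASCII-only titles (any other character is treated as a separator).

```javascript
// Hypothetical helper: build a hyphen-separated slug from a title.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // runs of anything else become one hyphen
    .replace(/^-+|-+$/g, "");    // trim hyphens left at either end
}

console.log(slugify("This is my first post")); // this-is-my-first-post
console.log(slugify("green_dress"));           // green-dress
```

Because the underscore is not in the allowed set, titles that already contain underscores also end up hyphenated, which keeps the separator consistent across the site.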

What kind of sign is "‎" and what is it used for

What kind of sign is "‎" and what is it used for (note there is a invisible sign there)?
I have searched through all my documents and found a lot of them. They messed up my .htaccess file. I think I got them when I copied web addresses from Google to use in redirects. So, as a warning, you may want to search through your own documents for this character too :)
It is U+200E LEFT-TO-RIGHT MARK. (A quick way to check out such things is to copy a string containing the character and paste it in the writing area in my Full Unicode input utility, then click on the “Show U+” button there, and use Fileformat.Info character search to check out the name and other properties of the character, on the basis of its U+... number.)
The LEFT-TO-RIGHT MARK sets the writing direction of directionally neutral characters. It does not affect e.g. English or Arabic words, but it may mess up text that contains parentheses for example – though for text in English, there should be no confusion in this sense.
But, of course, when text is processed programmatically, as when a web server processes a .htaccess file, these marks are ordinary character data and make a big difference.
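If you want to check copied text for these marks before it ends up in a configuration file, something like the following could work. It is a sketch with made-up function names, and it assumes that only U+200E and its right-to-left counterpart U+200F (RIGHT-TO-LEFT MARK) need to be caught; the sample line is invented.

```javascript
// Matches LEFT-TO-RIGHT MARK (U+200E) and RIGHT-TO-LEFT MARK (U+200F).
const DIRECTION_MARKS = /[\u200E\u200F]/g;

// Hypothetical helper: report where the invisible marks are.
function findDirectionMarks(text) {
  return Array.from(text.matchAll(DIRECTION_MARKS), m => ({
    index: m.index,
    codePoint: "U+" + m[0].codePointAt(0).toString(16).toUpperCase(),
  }));
}

// Hypothetical helper: remove the marks.
function stripDirectionMarks(text) {
  return text.replace(DIRECTION_MARKS, "");
}

// Invented sample: a redirect line with a stray U+200E pasted at the end.
const line = "Redirect 301 /old http://example.com/new\u200E";

console.log(findDirectionMarks(line).map(m => m.codePoint)); // [ 'U+200E' ]
console.log(stripDirectionMarks(line).endsWith("/new"));     // true
```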

How do I search Google for code and other programming related keywords? It seems to strip special characters

One of the problems I have with Google is that it seems to strip special characters like dots, commas and some other special characters, which are usually what I'm looking for when I'm trying to find anything programming-related
ex: django # sign returns irrelevant data. Perhaps you know a way (or an alternative/technique) to make this possible?
Related Questions
Effective Googling for short names
Why would M# be harder to Google than C#?
If you're looking for actual code examples, you can try code.google.com. Otherwise, the safest bet is to find the main website for whatever language you've got questions about and look around there, although a little digging is likely to turn it up on Google.
Have you tried http://www.google.com/codesearch?

Resources