I am currently setting up a multilingual website that needs to support several languages: English and Spanish for starters, and French in the future.
I have the content localized, but now I want to make sure that all meta tags, HTTP headers, the lang attribute on the body tag, xml:lang, etc. are correct for the culture selected at the time.
Is there an outline anywhere on the internet of what needs to be configured for multilingual HTML sites? Does anyone have any advice/tips on this?
As far as I know, there are no such "must haves". All of these are optional (maybe with the exception of the character encoding). What you might need:
Character encoding declaration
This one you do need for sure: a content type and character encoding declaration:
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
For obvious reasons UTF-8 is recommended.
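If you are using the HTML5 doctype, the shorter equivalent form works as well:
<meta charset="UTF-8">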
Language declaration
Language can be declared with the lang attribute, which is defined for most HTML elements, so you can set it document-wide, body-wide, or anywhere you please:
<html lang="de">
<body lang="es">
<p lang="pl">Jakiś tekst</p>
</body>
</html>
Of course it would be nice if these attributes made sense (unlike in the example above), so it is common to set lang on the body and on individual paragraphs (if a given paragraph happens to be in a different language than the body).
The content of the lang attribute is a two-letter ISO 639 language code (ISO 639-1, to be precise). The same applies to xml:lang (which can be declared on XHTML web sites).
Directionality
If you ever face the challenge of localizing a web site into Hebrew, Arabic or another right-to-left language, you will also need to apply the dir attribute. Just like lang, this attribute is optional and can be placed almost anywhere. If it is omitted, dir="ltr" is assumed. Example:
<body lang="ar" dir="RTL">
That's it, I'm afraid. Multilingual support for web sites is still something of a grey area, and browser support is not the best, to say the least. For example, you will need to take care of correct formatting of dates, numbers, currencies and similar artifacts yourself.
Also, remember not to use gems like CSS text-transform: uppercase | lowercase, JavaScript's uppercase- and lowercase-related functions, or JavaScript's toLocaleString(); these are very unreliable and don't work correctly across locales.
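Putting those pieces together, a minimal sketch of the document start for the Spanish version of the site (the values are purely illustrative):
<!DOCTYPE html>
<html lang="es" xml:lang="es" dir="ltr">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Página de ejemplo</title>
</head>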
EDIT: You can see the issue here (look in source).
EDIT2: Interesting, it does not appear to be an issue in the source, only in the console (and in Firebug as well).
I have the following markup in a file called test.html:
<!DOCTYPE html>
<html>
<head>
<title>Test Harness</title>
<link href='/css/main.css' rel='stylesheet' type='text/css' />
</head>
<body>
<h3>Test Harness</h3>
</body>
</html>
But in Chrome, I see:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
"
"
<title>Test Harness</title>
<link href='/css/main.css' rel='stylesheet' type='text/css' />
<h3>Test Harness</h3>
</body>
</html>
It looks like &#8203; is a zero-width space, but what is causing it? I am using Sublime Text 2 with UTF-8 encoding and Google App Engine with Jinja2 (but Jinja is simply loading test.html). Any thoughts?
Thanks in advance.
It is an issue in the source. The live example that you provided starts with the following bytes (i.e., they appear before <!DOCTYPE html>): 0xE2 0x80 0x8B. This can be seen e.g. using Rex Swain’s HTTP Viewer by selecting “Hex” under “Display Format”. Also note that validating the page with the W3C Markup Validator gives information that suggests that there is something very wrong at the start of the document, especially the message “Line 1, Column 1: Non-space characters found without seeing a doctype first.”
What happens in the validator and in the Chrome tools – as well as e.g. in Firebug – is that the bytes 0xE2 0x80 0x8B are taken as character data, which implicitly starts the body element (since character data cannot validly appear in the head element or before it), implying an empty head element before it.
The solution, of course, is to remove those bytes. Browsers usually ignore them, but you should not rely on such error handling, and the bytes prevent useful HTML validation. How you remove them, and how they got there in the first place, depends on your authoring environment.
Since the page is declared (in HTTP headers) as being UTF-8 encoded, those bytes represent the ZERO WIDTH SPACE (U+200B) character. It has no visible glyph and no width, so you won’t notice anything in the visual presentation even though browsers treat it as being data at the start of the body element. The notation &#8203; is a character reference for it, presumably used by browser tools to indicate the presence of a normally invisible character.
It is possible that the software that produced the HTML document was meant to insert ZERO WIDTH NO-BREAK SPACE (U+FEFF) instead. That would have been valid, since by a special convention, UTF-8 encoded data may start with this character, also known as byte order mark (BOM) when appearing at the start of data. Using U+200B instead of U+FEFF sounds like an error that software is unlikely to make, but human beings may be mistaken that way if they think of the Unicode names of the characters.
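If your editor will not show the character, one option is to strip it programmatically. A minimal Node.js sketch (the file name is illustrative, and it assumes the file is UTF-8 encoded):
// strip-zws.js: remove a leading U+200B (or U+FEFF BOM) from a file
var fs = require('fs');

var path = 'test.html';
var text = fs.readFileSync(path, 'utf8');
var first = text.charCodeAt(0);

if (first === 0x200B || first === 0xFEFF) {
    fs.writeFileSync(path, text.slice(1), 'utf8');
    console.log('Removed leading invisible character from ' + path);
}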
I understand that there is a bug in SharePoint 2013 where the HTML editor adds these characters into your content.
I've been dealing with this for a bit, and this is the solution I am using, which seems to be working. I added this JavaScript to a file referenced by my master page.
var elements = ["h1","h2","h3","h4","p","strong","label","span","a"];

function targetZWS() {
    for (var i = 0; i < elements.length; i++) {
        jQuery(elements[i]).each(function() {
            removeZWS(this);
        });
    }
}

function removeZWS(target) {
    jQuery(target).html(jQuery(target).html().replace(/\u200B/g, ''));
}

/* load functions */
$(document).ready(function() {
    _spBodyOnLoadFunctionNames.push("targetZWS");
});
Links I looked at while investigating this:
https://social.msdn.microsoft.com/Forums/sharepoint/en-US/23804eed-8f00-4b07-bc63-7662311a35a4/why-does-sharepoint-put-in-character-code-8203-in-a-richtext-field?forum=sharepointdevelopment
https://social.technet.microsoft.com/Forums/office/en-US/e87a82f0-1ab5-4aa7-bb7f-27403a7f46de/finding-8203-unicode-characters-in-my-source-code?forum=sharepointgeneral
http://www.sharepointpals.com/post/Removing-8203-in-RichTextHTML-field-Sharepoint
Try this script; it works for me:
$(document).ready(function() {
    var abc = document.body.innerHTML;
    var a = String(abc).replace(/\u200B/g, '');
    document.body.innerHTML = a;
});
I have experienced this in a major project I was working on.
The trick is to just:
copy the whole code into Notepad,
save it as a text file,
close the file, then open it again and copy your code back into your IDE environment,
and voilà, it's gone!
I was able to remove these in Sublime by selecting the characters surrounding it and copy/pasting into Find and Replace.
In my case, the symbol did not appear in the code editor (VS Code) and was visible only in Chrome's Elements tab. Deleting the tag after which the symbol appeared and retyping that tag by hand fixed it; apparently the symbol was carried along by ctrl+c / ctrl+v while transferring the code.
This “&#8203;” HTML character reference is the zero-width space.
It is easy to find in the Google Chrome browser's inspect-elements panel, but when you try to remove it from your code, most of the major IDEs do not show it (maybe because of my preferences).
I found the text editor Brackets, downloaded it and opened my code in it. It shows the character as a red dot. Just remove it and check that everything still works.
I found this solution in a blog post: What is “8203” HTML character? Why is it being injected into my HTML?
Thank you for saving me hours.
I cannot find where it's being injected on my page. I'll investigate it more later, but for now, I just threw this in my page so I can keep working.
$(function() {
    $('body').contents().eq(0).each(function() {
        if (this.nodeName.toString() == '#text' && this.data.trim().charCodeAt(0) == 8203) {
            $(this).remove();
        }
    });
});
I built a web site using JSF that supports multiple languages (German and English).
I just set the internationalization values for each language in properties files to be used in the xhtml. Now I have added Arabic and created lang_ar.properties for the Arabic values. It works fine, but the text direction is still wrong. Since Arabic is RTL, switching to Arabic should change the direction of the whole page content to RTL.
I searched a lot for a solution but didn't find what I am looking for.
If I use <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ar" dir="rtl">
it will change the direction of the page even when I use German or English. I don't want a separate page for the Arabic content; I want to use the same pages, which take their values from the properties files just like English and German, but change the direction when the Arabic properties are in use.
If I use the following:
<h:inputText ... dir="#{view.locale.language eq 'ar' ? 'rtl' : 'ltr'}" />
This works only for the plain text entered in the input field, not for the input field itself, so the input field stays LTR while the text inside it is RTL.
Can anyone help me?
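As a sketch only (untested, and not from the original post), the same conditional expression can in principle be placed on the page-level element in the Facelets template, so that the whole page flips when the view locale is Arabic:
<html xmlns="http://www.w3.org/1999/xhtml"
      lang="#{view.locale.language}"
      dir="#{view.locale.language eq 'ar' ? 'rtl' : 'ltr'}">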
Reading mode in Spartan/Edge seems to choose, somehow, which div on a site to display in reading mode. On many pages it does not find the appropriate div (bbc.co.uk, for example).
However, on our site it enables reading mode but then displays the completely wrong part of the page.
So, how can I tell it to take the right part, or at least how can I disable it on those pages?
You can find information on how to optimize reading view, as well as how to opt-out, here: http://dev.modern.ie/testdrive/demos/readingview/
07/10: Edit to include specific information
Specifically, you may be interested in optimizing your title, body, and image markup to ensure a good reading mode experience.
Title
Your page should include a <title> element in the head. In addition, you should include a <meta title=""> tag that matches the main heading in your content section.
Body text
Ensure your main content does not include a lot of deeply nested elements and that font-sizes and other styles are uniform. Style variations for things like pull quotes, etc. should still be fine.
Images
The first eligible image becomes the dominant image of the article. The dominant image is rendered as the first piece of content and given full column width. All following images are rendered as inline images within the article.
Images are recommended to be wrapped in <figure> tags with no more than two <figcaption> tags.
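As an illustration only (the element contents and file path are made up), image markup along those lines might look like:
<figure>
    <img src="/images/photo.jpg" alt="Descriptive alt text">
    <figcaption>A short caption for the image.</figcaption>
</figure>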
Opting out
Including this meta tag will disable reading mode in IE11 and, currently, Microsoft Edge.
<meta name="IE_RM_OFF" content="true">
Add the following tag:
<meta name="IE_RM_OFF" content="true">
Check the link below for more details:
http://dev.modern.ie/testdrive/demos/readingview/
I was recently browsing a local web design firm's portfolio and found that all their sites' code begins like this:
<meta name="keywords" content="a whole bunch of keywords for their site">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
...
I was able to determine that the pages were generated by Dreamweaver (at least in part).
Did Dreamweaver do this, or did their "developer" just paste the code at the top of the document?
My impulse is that this is bad practice and might work incorrectly on some platforms, but it got me wondering whether there may be a reason for it.
That is a terrible practice and invalid HTML. I bet that this would throw IE directly into quirks mode.
But as for your question, either the developer is a script kiddie and shoved the <meta> tag in there with little knowledge of the outcomes, or Dreamweaver did it. I hope it was Dreamweaver...
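For comparison, a sketch of where such a meta tag belongs (inside <head>, after the doctype; the title and keywords are illustrative):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
    <title>Example page</title>
    <meta name="keywords" content="a whole bunch of keywords for their site">
</head>
<body>
    ...
</body>
</html>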
FYI - I just had this issue: Dreamweaver does not put meta tags in the correct position automatically. The cursor must be placed in an editable region beforehand.
I've been doing some searching around and couldn't find this topic anywhere. My company wants to use an HTML doctype, but WordPress outputs XHTML by default. I've seen plugins, and I would use them, but this site will probably outlive the development of said plugins. Plus it's something else to account for when updating or building new sites.
If I use an XHTML doctype, how will HTML5 browsers render it? Will they be backwards compatible with old doctypes?
Edit 1: It is actually recommended that, in order to make the transition to HTML5 easier, you try to follow the XHTML structure when writing any HTML.
There will be additional options and types with XHTML in HTML5, but a lot of it is based on the structure in which you are writing your HTML. The X simply means that it is moving to more of an XML base.
To go along with Kayla's input, you will want to make sure that all tags are closed:
<br/> instead of <br>
You will also want to make sure to put quotation marks around any attribute values:
<a href="value"></a> instead of <a href=value></a>
Browsers have been slowly adopting the XHTML structure. This might mean that HTML formatted without end tags etc. might look a little different in IE 6 than in newer browser versions. Hope that helps!
It is not recommended to use the XHTML 1.0 or 1.1 doctypes for your HTML5 pages: for one, it's unnecessary, and two, your markup won't validate when you use the newer tags. Here is a quick guide on using XML syntax in HTML5, a.k.a. XHTML5.
Update: As noted below, check out the W3C Specification.
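For illustration (a rough, purely hypothetical sketch), an HTML5 document written with XHTML-style syntax keeps the HTML5 doctype while staying well-formed:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
    <meta charset="UTF-8" />
    <title>XHTML-style HTML5</title>
</head>
<body>
    <p>Void elements are self-closed and attribute values are quoted.<br /></p>
</body>
</html>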
I am not sure what you are asking. What do plugins have to do with DTD?
Yes, any browser that supports HTML5 is backwards compatible with (X)HTML; you can mix and match all you want. And basically, as long as you are writing tags like:
<div>Hi</div> or <p>There</p>
instead of
<DIV>Hi</DIV> or <P>There</P>
the rest is just semantics.
HTML5 began life specifically because browser manufacturers wanted to make sure that changes they introduced were backward compatible with existing web pages, in contrast to the now-defunct XHTML 2, which was shaping up to be non-backward compatible.
So yes, your XHTML doctype will work just fine in HTML5 browsers.
As far as I know, all modern browsers that are adding HTML5 support will continue to support HTML 4 and XHTML for the foreseeable future, so you should be fine.
If you're using WordPress, though, stick with XHTML. It'll be supported for a long time to come in all browsers, and most WordPress plugins are designed to output XHTML.
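For reference, the XHTML 1.0 Transitional doctype that older WordPress themes typically emit looks like this (the exact markup depends on your theme's header.php):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">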