meta http-equiv=content-language is obsolete multilingual site - meta-tags

I read the question "What is the HTML5 alternative to the obsolete meta http-equiv=content-language", but it does not answer my question. If I have a simple site with 4 index pages, each one in a different language, how do I specify the language? Does "Consider specifying the language on the root element instead" mean I have to specify it on every div?
<META HTTP-EQUIV="Content-Language" content="EN">

So, it must look like this:
<html lang="en-GB">

According to the W3C recommendation you should declare the primary
language for each Web page with the lang attribute inside the <html>
tag, like this:
<html lang="en">
...
</html>
In HTML5 you can specify a language on any element. This way you can have multiple languages on the same page and override the primary language for a specific element.
And here is a list of commonly used language codes (the lang attribute accepts any valid BCP 47 language tag, so this list is not exhaustive):
Code Language (region)
af Afrikaans
ar-ae Arabic (U.A.E.)
ar-bh Arabic (Bahrain)
ar-dz Arabic (Algeria)
ar-eg Arabic (Egypt)
ar-iq Arabic (Iraq)
ar-jo Arabic (Jordan)
ar-kw Arabic (Kuwait)
ar-lb Arabic (Lebanon)
ar-ly Arabic (Libya)
ar-ma Arabic (Morocco)
ar-om Arabic (Oman)
ar-qa Arabic (Qatar)
ar-sa Arabic (Saudi Arabia)
ar-sy Arabic (Syria)
ar-tn Arabic (Tunisia)
ar-ye Arabic (Yemen)
be Belarusian
bg Bulgarian
ca Catalan
cs Czech
da Danish
de German (Standard)
de-at German (Austria)
de-ch German (Switzerland)
de-li German (Liechtenstein)
de-lu German (Luxembourg)
el Greek
en English
en-au English (Australia)
en-bz English (Belize)
en-ca English (Canada)
en-gb English (United Kingdom)
en-ie English (Ireland)
en-jm English (Jamaica)
en-nz English (New Zealand)
en-tt English (Trinidad)
en-us English (United States)
en-za English (South Africa)
es Spanish (Spain)
es-ar Spanish (Argentina)
es-bo Spanish (Bolivia)
es-cl Spanish (Chile)
es-co Spanish (Colombia)
es-cr Spanish (Costa Rica)
es-do Spanish (Dominican Republic)
es-ec Spanish (Ecuador)
es-gt Spanish (Guatemala)
es-hn Spanish (Honduras)
es-mx Spanish (Mexico)
es-ni Spanish (Nicaragua)
es-pa Spanish (Panama)
es-pe Spanish (Peru)
es-pr Spanish (Puerto Rico)
es-py Spanish (Paraguay)
es-sv Spanish (El Salvador)
es-uy Spanish (Uruguay)
es-ve Spanish (Venezuela)
et Estonian
eu Basque
fa Farsi
fi Finnish
fo Faeroese
fr French (Standard)
fr-be French (Belgium)
fr-ca French (Canada)
fr-ch French (Switzerland)
fr-lu French (Luxembourg)
ga Irish
gd Gaelic (Scotland)
he Hebrew
hi Hindi
hr Croatian
hu Hungarian
id Indonesian
is Icelandic
it Italian (Standard)
it-ch Italian (Switzerland)
ja Japanese
ji Yiddish
ko Korean
ko Korean (Johab)
ku Kurdish
lt Lithuanian
lv Latvian
mk Macedonian (FYROM)
ml Malayalam
ms Malaysian
mt Maltese
nl Dutch (Standard)
nl-be Dutch (Belgium)
nb Norwegian (Bokmål)
nn Norwegian (Nynorsk)
no Norwegian
pa Punjabi
pl Polish
pt Portuguese (Portugal)
pt-br Portuguese (Brazil)
rm Rhaeto-Romanic
ro Romanian
ro-md Romanian (Republic of Moldova)
ru Russian
ru-md Russian (Republic of Moldova)
sb Sorbian
sk Slovak
sl Slovenian
sq Albanian
sr Serbian
sv Swedish
sv-fi Swedish (Finland)
th Thai
tn Tswana
tr Turkish
ts Tsonga
uk Ukrainian
ur Urdu
ve Venda
vi Vietnamese
xh Xhosa
zh-cn Chinese (PRC)
zh-hk Chinese (Hong Kong)
zh-sg Chinese (Singapore)
zh-tw Chinese (Taiwan)
zu Zulu

Related

Sentence segmentation with trailing whitespaces in stanza (stanford corenlp)

I am using the Stanza library for sentence segmentation:
import stanza
stanza.download('en')
snlp = stanza.Pipeline(lang="en",processors='tokenize')
doc = snlp(text)
doc_sents = [sentence.text for sentence in doc.sentences]
Output:
["Arthur's Magazine (1844–1846) was an American literary periodical published in Philadelphia in the 19th century.",
'Edited by T.S. Arthur, it featured work by Edgar A. Poe, J.H. Ingraham, Sarah Josepha Hale, Thomas G. Spear, and others.',
'In May 1846 it was merged into "Godey\'s Lady\'s Book".',
"First for Women is a woman's magazine published by Bauer Media Group in the USA.",
'The magazine was started in 1989.',
'It is based in Englewood Cliffs, New Jersey.',
'In 2011 the circulation of the magazine was 1,310,696 copies.']
But then I lose the trailing whitespace. Is there a way to keep it, similar to the behavior in spaCy using
[sent.text_with_ws for sent in doc.sents]
Any workaround will help as well. I need to keep the original whitespace so that indexes originally computed on the full text stay valid. Desired output:
["Arthur's Magazine (1844–1846) was an American literary periodical published in Philadelphia in the 19th century. ",
'Edited by T.S. Arthur, it featured work by Edgar A. Poe, J.H. Ingraham, Sarah Josepha Hale, Thomas G. Spear, and others. ',
'In May 1846 it was merged into "Godey\'s Lady\'s Book". ',
"First for Women is a woman's magazine published by Bauer Media Group in the USA. ",
'The magazine was started in 1989. ',
'It is based in Englewood Cliffs, New Jersey. ',
'In 2011 the circulation of the magazine was 1,310,696 copies.']
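One possible workaround, sketched under assumptions: Stanza tokens carry character offsets into the original text (start_char), so each sentence can be re-sliced from its own start up to the start of the next one, which keeps the whitespace in between. The helper name sentences_with_trailing_ws and the sample text are made up for illustration; the start offsets are hard-coded here so the sketch runs without Stanza, and whitespace before the first sentence is not preserved.

```python
def sentences_with_trailing_ws(text, starts):
    """Slice text so each piece runs from one sentence start to the next."""
    # The last sentence runs to the end of the text.
    bounds = list(starts) + [len(text)]
    return [text[bounds[i]:bounds[i + 1]] for i in range(len(starts))]

# With Stanza, the offsets could be collected like this (not run here):
#   doc = snlp(text)
#   starts = [s.tokens[0].start_char for s in doc.sentences]
text = "The magazine was started in 1989. It is based in Englewood Cliffs, New Jersey."
starts = [0, text.index("It is")]  # stand-in for the Stanza offsets

pieces = sentences_with_trailing_ws(text, starts)
for piece in pieces:
    print(repr(piece))
```

Because the pieces are plain slices of the original string, joining them gives back the full text (when the first sentence starts at offset 0), so indexes computed on the full text remain usable.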

I get data in English when I send https://maps.googleapis.com/maps/api/distancematrix/json?units=metric&mode=driving&language=iw

The duration text in the response comes back in English even though I request Hebrew (language=iw):
https://maps.googleapis.com/maps/api/distancematrix/json?units=metric&origins=31.923653790756987,35.04286816346929&destinations=31.905535,35.017173999999997&mode=driving&language=iw

How can I regroup data in Excel?

I'm trying to regroup data in Excel from Statistics Canada census. I downloaded the CSV file from here: https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/download-telecharger/comp/page_dl-tc.cfm?Lang=E
There are over 1,000,000 rows, but the problem is that instead of showing the dissemination area number once, followed by its data, the file has over 1,000 lines with the same dissemination area number, each carrying one piece of data.
E.g.
DA 24660001 : language: English
DA 24660001 : language: French
DA 24660001 : language: Spanish
DA 24660001 : language: German
DA 24660001 : language: Italian
DA 24660001 : language: Russian
Etc. etc. like 1000 times
I'd like to find a way to regroup those so I get DA 24660001 on one side and the languages on the other side, so I can finally get to DA 24660002 and DA 24660003 without having to browse through thousands of lines...
Can it be done?
Thanks
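If doing this outside Excel is an option, the regrouping can be sketched in a few lines of Python. This is not an Excel solution, only an illustration of the grouping idea. The column names "DA" and "language" and the sample rows are assumptions made for the example; the real census file uses different headers and many more columns.

```python
import csv
import io
from collections import defaultdict

def group_by_area(rows, key="DA", value="language"):
    """Collect all values that share the same dissemination area number."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row[value])
    return dict(grouped)

# Tiny in-memory stand-in for the downloaded CSV (hypothetical headers).
sample = io.StringIO(
    "DA,language\n"
    "24660001,English\n"
    "24660001,French\n"
    "24660002,English\n"
)

grouped = group_by_area(csv.DictReader(sample))
for area, languages in grouped.items():
    # One line per dissemination area, with its languages side by side.
    print(area, "; ".join(languages))
```

For the real file, the io.StringIO object would be replaced by an opened file handle, and the key and value arguments by the actual column names.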

German umlaut in search query string (example Yelp)

How do I replace a German umlaut in search strings (example: Yelp)?
Unfortunately, not all sites treat "ü" and "ue" the same way.
Use case: search query input for a web crawler is not recognized as intended.
Example:
Original search string: Asiatische Fusionsküche
Modified search string: Asiatische Fusionskueche
Stuttgart, Baden-Wuerttemberg
site: yelp.de
----> does return different results! (or better: no results)
I tried already:
1) ü in UTF
--> \u00fc
--> U+00FC
Was used like this:
String: "Asiatische Fusionsk\u00fcche"
String: "Asiatische FusionskU+00FCche"
2) ü in HTML
&uuml;
String: "Asiatische Fusionsk&uuml;che"
Any advice from experts?
Thanks
J.
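One thing that may be worth trying, as a sketch rather than a confirmed fix: when the search string ends up in a URL, the umlaut is usually sent percent-encoded as UTF-8 bytes instead of as \u00fc, U+00FC or an HTML entity. Python's urllib.parse does this encoding. Whether yelp.de expects exactly this form is an assumption; the code only shows what the encoded query looks like.

```python
from urllib.parse import quote, quote_plus

query = "Asiatische Fusionsküche"

# Percent-encode as UTF-8; spaces become %20 (typical for URL paths).
encoded_path = quote(query)

# Same encoding, but spaces become + (typical for query strings).
encoded_form = quote_plus(query)

print(encoded_path)
print(encoded_form)
```

The ü becomes %C3%BC in both variants, which is its two-byte UTF-8 representation.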

NumberFormat.parse() does not work for FRANCE Locale space as thousand separator

I have written the Java code below to see how locales behave with numbers. I am facing a problem with the FRENCH style.
double n = 123456789.123;
System.out.println("US "+ NumberFormat.getNumberInstance(Locale.US).format(n)); //###,###.###
System.out.println("FRENCH "+ NumberFormat.getNumberInstance(Locale.FRENCH).format(n)); // # ###,##
System.out.println("GERMAN "+ NumberFormat.getNumberInstance(Locale.GERMAN).format(n)); // ###.###,##
System.out.println(NumberFormat.getNumberInstance(Locale.US).parse("123,451.23"));
System.out.println(NumberFormat.getNumberInstance(Locale.GERMANY).parse("123.451,23"));
System.out.println(NumberFormat.getNumberInstance(Locale.FRANCE).parse("123 451,23"));
OUTPUT
US 123,456,789.123
FRENCH 123 456 789,123
GERMAN 123.456.789,123
123451.23
123451.23
123
As you can see, a space is used as the thousands separator for the FRENCH locale. But when I try to parse the number "123 451,23", it does not recognize the space as a thousands separator and stops at 123.
Is this the expected behavior ?
EDIT:
As a workaround I replaced the space with ".". The number is then in GERMANY format, and I parse it using that locale.
input = input.replace(" ", ".");
// Now "123 451,23" is "123.451,23", which is the same as the German format
System.out.println(NumberFormat.getNumberInstance(Locale.GERMANY).parse(input));
OUTPUT
123451.23
This is a known issue in old JDKs. Upgrade the JDK, or you will keep running into this issue.
