NW.js localization shows Japanese characters as boxes? - node.js

I am using Node.js on SUSE. When the system language is Japanese (my locale is ja_JP.UTF-8), node shows
Japanese characters as square boxes (for links to Japanese websites).
I even tried i18n localization with properties files for the Japanese language;
Node still displays all Japanese text as boxes. window.navigator.language does show "ja".
Everything works fine when the language is English or French.
I tried different fonts but observed the same issue.

While researching for the multilingual IDE (integrated development environment) I want to build, I found that Japanese has a few competing text encodings that are not UTF-8, and that kanji has more characters/symbols than most other written languages; a few sites noted encoding formats in use in Japan other than the UTF family.
http://icu-project.org/ gets its data from http://cldr.unicode.org/; as I understand it, a lot of Unix/Linux/Microsoft and other big-company software uses that data. Looking at Chrome and Node.js, I saw mentions of Japanese in a license file, but beyond that I did not find much more. Skimming a handful of the CLDR core files, the amount of data for Japanese seems thin compared to other languages.
With the above said, make sure you are setting the HTML header to UTF-8 correctly, and see if a lang attribute helps (note the language code is "ja", not "jp"):
<span lang="ja">some text here</span>
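A minimal test page, assuming the file is actually saved as UTF-8 (the meta tag only declares the encoding; it does not convert anything):

<!DOCTYPE html>
<html lang="ja">
<head>
  <meta charset="utf-8">
  <title>テスト</title>
</head>
<body>
  <span lang="ja">日本語のテキスト</span>
</body>
</html>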
I do not remember where I read it, but the term I was reaching for was not kanji; I believe it was tategaki, the vertical writing direction (not left to right or right to left, but top to bottom), and to display it correctly each character is placed into a "square box".
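If vertical layout is what you are after, CSS supports it directly; a minimal sketch (the class name is made up):

.tategaki {
  writing-mode: vertical-rl;  /* lines run top to bottom, columns right to left */
}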

Related

Internationalization Web Number-Symbols

Do I need to use different number symbols when I want my webpage to be accessible in other countries? According to Microsoft there are different shapes of digits: https://learn.microsoft.com/en-us/globalization/locale/number-formatting#:~:text=formatting%20for%20details.-,The%20character%20used%20as%20the%20thousands%20separator,thousands%20separator%20is%20a%20space.
I have been searching for a few days to get a clear answer but I can't find one. Also, on most international websites/apps I only ever see the digits 0,1,2,3,4,5,6,7,8,9, even where the language's own digits look different. That unsettles me; I feel like many websites/apps just ignore this fact. Can anybody help me further? Also, do I need to know how to activate foreign symbols in HTML?
I do not know for sure what language you are translating/typing in HTML, but here is an example of a guide to one script, Arabic: https://sites.psu.edu/symbolcodes/languages/mideast/arabic/arabicchart/
You may also need to use a converter. For example, I type Chinese on my website by entering the characters into a character-to-Unicode converter, then copying the resulting Unicode references into my HTML.
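For example, the Chinese greeting 你好 can be written with numeric character references once you know the code points (你 is U+4F60, 好 is U+597D):

&#x4F60;&#x597D;

though if the page is saved and served as UTF-8 you can simply type the characters directly. As for the digits question: in JavaScript, the built-in Intl.NumberFormat already picks locale-appropriate digits and separators, which is why most sites never hand-code them:

// The locale decides both the digit shapes and the separators.
new Intl.NumberFormat('de-DE').format(1234567.89);  // "1.234.567,89"
new Intl.NumberFormat('ar-EG').format(1234567.89);  // "١٬٢٣٤٬٥٦٧٫٨٩" (Arabic-Indic digits)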

Web pages served by local IIS showing black diamonds with question marks

I'm having an issue in a .NET application where pages served by local IIS display random characters (mostly black diamonds with white question marks in them). This happens in Chrome, Firefox, and Edge. IE displays the pages correctly for some reason.
The same pages in production and in lower pre-prod environments work in all my browsers. This is strictly a local issue.
Here's what I've tried:
Deleted code and re-cloned (also tried switching branches)
Disabled all browser extensions
Ran in incognito mode
Rebooted (you never know)
Deleted temporary ASP.NET files
Looked for corrupt fonts on machine but didn't find any
Other Information:
Running IIS 10.0.17134.1
.NET MVC application with Knockout
I realize there are several other posts regarding black diamonds with question marks, but none of them seem to address my issue.
Please let me know if you need more information.
Thanks for your help!
You are in luck. The explicit purpose of � is to indicate that character encodings are being misused. When users see that, they'll know that we've messed up and lost some of their text data, and we'll know that, at one or more points, our processing and/or configuration is wrong.
(Fonts are not at issue [unless there is no font available to render �]. When there is no font available for a character, it's usually rendered as a white-filled rectangle.)
Character encoding fundamentals are simple: use a sufficient character set (say Unicode), pick an appropriate encoding (say UTF-8), encode text with it to obtain bytes, tell every program and person that gets the bytes that they represent text and which encoding is used. The encoding might be understood from a standard, convention, or specification.
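For instance, Node can show you exactly which bytes an encoding produces for a given piece of text:

// In the Node REPL: UTF-8 encodes 日本 as six bytes
> Buffer.from('日本', 'utf8')
<Buffer e6 97 a5 e6 9c ac>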
Your editor does the actual encoding.
If the file is part of a project or similar system, a project file might store the intended encoding for all or each text file in the project. If your editor is an IDE, it should understand how the project does that.
Your compiler needs to know the encoding of each text file you give it. A project system would communicate what it knows.
HTML provides an optional way to communicate the encoding. Example: <meta charset="utf-8">. An HTML-aware editor should not allow this indicator to be different than the encoding it uses when saving the file. An HTML-aware editor might discover this indicator when opening the file and use the specified encoding to read the file.
HTTP uses another optional way: Content-Type response header. The web server emits this either statically or in conjunction with code that it runs, such as ASP.NET.
Web browsers use the HTTP way if given.
XHR (AJAX, etc) uses HTTP along with JavaScript processing. If needed the JavaScript processing should apply the HTTP and HTML rules, as appropriate. Note: If the content is JSON, the current RFC requires the encoding to be UTF-8.
No one or thing should have to guess.
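In an ASP.NET application, for example, the encodings can be pinned in web.config so nothing is left to guesswork (a sketch; merge into your existing config):

<configuration>
  <system.web>
    <globalization fileEncoding="utf-8" requestEncoding="utf-8" responseEncoding="utf-8" />
  </system.web>
</configuration>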
Diagnostics
Which character encoding did you intend to use? This century, UTF-8 is so much the norm that if you choose to use a different one, you should have a good reason and document it (for others and your future self).
Compare the bytes in the file with the text you expect it to represent. Does it use the intended encoding? Use an editor or tool that shows bytes in hex.
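For example, the UTF-8 encoding of é is the two bytes C3 A9; if a hex viewer shows C3 A9 but the page renders é, some step is reading UTF-8 bytes as Windows-1252. On Windows, PowerShell 5+ can dump the bytes (the path is just an example):

Format-Hex -Path .\Views\Home\Index.cshtml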
As suggested by @snakecharmerb, what does the server send? Use a web browser's F12 network tab.
What does the HTTP response header say, if anything?
What does the HTML meta tag say, if anything?
What is the HTML doctype, if any?
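What the server actually sends can also be checked from a shell; a sketch, substituting your local URL:

curl -sI http://localhost/ | findstr /i content-type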

Creating a Print Monitor / Print Handler

I'm having trouble getting started with building a Print Monitor / Print Handler for Windows using Visual Studio 2012 Ultimate with WDK 8. Basically, this is what I am trying to accomplish:
Create a print monitor (something an application can print to) that will generate a file with the content that should be printed (like the default XPS printer or a PDF printer), and then invokes the print handler
Create a print handler that will parse the generated file and do certain actions with it (check to see if certain text is present, upload the file online, etc)
I feel like the print handler part should not be too hard, but starting with the print monitor is what I'm stuck at. What would I do within VS12? I see options for "Printer Driver V4", "Printer Driver V4 Property Bag", and "Printer XPS Render Filter". Should I use one of those templates, and, if so, what would I do within them? Anything pointing me in the right direction would be appreciated!
EDIT:
Just some more clarification - I only need the text from the print output, but I've read in various places that asking for text-only output leads to no output at all from applications like Firefox, since they print text as glyphs.
I will be using the print handler to parse the text for keywords and then upload that information to a web server in a specific format. The print monitor just needs to capture and save the text information from whatever application is printing.
As you pointed out in your comments, some applications such as Firefox print using glyph indices instead of characters; in fact, quite a few do, and it's becoming more common. What you need is a print driver. The good news is Microsoft has already written it for you and provided sample source code in the WDK. Start by reviewing this to understand your options. Unidrv is perhaps a little simpler, but the Postscript driver has the advantage of generating output that can readily be transformed to PDF or other formats that retain text information (as opposed to raster page images that lose all text information). As far as I'm concerned, don't even think about XPS; it's just an all-around disaster.
To handle glyph indices, what you'll need to do is add code to the driver's OEMTextOut function that uses the font's cmap tables to translate glyph indices back into character codes. I'm unaware of any public domain libraries that parse font files, so you'll likely have to write your own code to do this. (Hint: If you support only OpenType/TrueType fonts, you'll cover 99% of all printing applications).
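The shape of that translation step, sketched in JavaScript for brevity (charToGlyph is a hypothetical forward cmap lookup you would implement by parsing the font file; the driver code itself would be C/C++):

// Build a glyph-index -> code-point table by enumerating the BMP
// through the font's forward char->glyph mapping and inverting it.
function buildGlyphToChar(font) {
  const glyphToChar = new Map();
  for (let cp = 0x20; cp <= 0xffff; cp++) {
    const gid = charToGlyph(font, cp);        // hypothetical cmap lookup
    if (gid !== 0 && !glyphToChar.has(gid)) { // glyph 0 is .notdef
      glyphToChar.set(gid, cp);
    }
  }
  return glyphToChar;
}
// OEMTextOut can then map each incoming glyph index back to a character.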
Getting the Microsoft sample code to build, install and run is mostly straightforward, but if you're new to the WDK and installing print drivers, plan on spending a week or more on just that. The glyph index translation part is far more complex and you should plan on spending a lot more time on that.

Document format for writing homework in Vim

I'm a college student majoring in CS, and that means I spend a lot of time poking around in Vim. I'm still a complete noob, but I love editing text in the terminal; it's more fun than writing documents has any right to be.
However, I'm curious if there's a basic, low-frills document format I can use (from within vim) to typeset my homework assignments. I'm familiar enough with LaTeX, and if it were possible I'd use it for everything, but it has two main disadvantages:
It takes a long time to write an entire LaTeX document, and
LaTeX doesn't handle code very well.
With that in mind, I'd like to know if some format exists which addresses both these needs and is still easy to hash out quickly from a terminal-based text editor. I use Vim for literally everything else I write, so keeping LibreOffice Writer around just for homework seems a bit overbearing to me.
Thanks!
I would tend towards something light like Markdown, but the needed capabilities depend on what requirements you have for the output (formatting and styling).
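For example, a homework write-up in Markdown stays perfectly readable in Vim and converts to HTML or PDF with a tool like pandoc; the file below is made up, with code in an indented block:

# Homework 2: Sorting

Quicksort runs in O(n log n) on average.

    function partition(a, lo, hi) { /* ... */ }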
I find the AsciiDoc project quite interesting. From their website:
AsciiDoc is a text document format for writing notes, documentation,
articles, books, ebooks, slideshows, web pages, blogs and UNIX man
pages. AsciiDoc files can be translated to many formats including
HTML, PDF, EPUB, man page. AsciiDoc is highly configurable: both the
AsciiDoc source file syntax and the backend output markups (which can
be almost any type of SGML/XML markup) can be customized and extended
by the user.
It even comes with a Vim syntax file.
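A tiny sketch of what an AsciiDoc homework file looks like (title and content made up); asciidoc or asciidoctor turns it into HTML or PDF:

= Homework 3: Graphs
== Problem 1

Dijkstra's algorithm with a binary heap runs in O((V + E) log V).

[source,js]
----
function relax(u, v, w) { /* ... */ }
----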

How to convert old Japanese text encodings?

I run a MacBook Pro under OS X 10.6.7, and I am competent in Unix. I have old Japanese text files in various encodings (EUC, SJS, New-JIS) that I can no longer read or display. The old program jconv.c does not help, since it only converts among these encodings. Is there a way to convert them (or any one of them) to the current "normal" Japanese text that can be seen in TextEdit, etc.? I have set the Terminal preferences to SJS and EUC (can't find New-JIS), among others, including UTF-8. Eleanor
I recommend you look into iconv for doing such conversions.
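A sketch, assuming one of the files is EUC-JP (iconv -l lists the encoding names your build supports; Shift JIS is SHIFT_JIS, and "New-JIS" is most likely ISO-2022-JP):

iconv -f EUC-JP -t UTF-8 old-euc.txt > new-utf8.txt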
nkf is a Linux command-line program which can meet your requirement.
I am sure it is available in Debian, and you can also download the source code from the net.
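nkf also auto-detects the input encoding, which helps when the files are mixed; for example:

nkf -g old.txt             # guess the input encoding
nkf -w old.txt > new.txt   # convert auto-detected input to UTF-8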
