why python shell is not displaying Ethiopic characters? [duplicate] - python-3.x

We have a project in Team Foundation Server (TFS) that has a non-English character (š) in it. When trying to script a few build-related things we've stumbled upon a problem - we can't pass the š letter to the command-line tools. The command prompt or what not else messes it up, and the tf.exe utility can't find the specified project.
I've tried different formats for the .bat file (ANSI, UTF-8 with and without BOM) as well as scripting it in JavaScript (which is Unicode inherently) - but no luck. How do I execute a program and pass it a Unicode command line?

Try:
chcp 65001
which will change the code page to UTF-8. Also, you need to use the Lucida Console font.

My background: I have used Unicode input/output in a console for years (and do it a lot daily; moreover, I develop support tools for exactly this task). There are very few problems, as long as you understand the following facts/limitations:
CMD and “console” are unrelated factors. CMD.exe is just one of the programs which are ready to “work inside” a console (“console applications”).
AFAIK, CMD has perfect support for Unicode; you can enter/output all Unicode chars when any codepage is active.
Windows’ console has A LOT of support for Unicode — but it is not perfect (just “good enough”; see below).
chcp 65001 is very dangerous. Unless a program was specially designed to work around defects in the Windows’ API (or uses a C runtime library which has these workarounds), it would not work reliably. Win8 fixes ½ of these problems with cp65001, but the rest is still applicable to Win10.
I work in cp1252. As I already said: To input/output Unicode in a console, one does not need to set the codepage.
The details
To read/write Unicode to a console, an application (or its C runtime library) should be smart enough to use not File-I/O API, but Console-I/O API. (For an example, see how Python does it.)
Likewise, to read Unicode command-line arguments, an application (or its C runtime library) should be smart enough to use the corresponding API.
Console font rendering supports only Unicode characters in BMP (in other words: below U+10000). Only simple text rendering is supported (so European — and some East Asian — languages should work fine — as far as one uses precomposed forms). [There is a minor fine print here for East Asian and for characters U+0000, U+0001, U+30FB.]
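As a rough illustration of the two rendering limits above (BMP only, precomposed forms only), the following Python sketch checks a string against them. The names `console_friendly` and `utf16_units` are made up for this example, and NFC normalization is only a heuristic for "precomposed", since some combining sequences have no precomposed form at all.

```python
import unicodedata


def console_friendly(text: str) -> bool:
    """Heuristic: True if text is NFC-normalized and entirely inside the BMP."""
    # NFC folds base letter + combining mark into one code point where possible.
    is_precomposed = unicodedata.normalize("NFC", text) == text
    # Code points at or above U+10000 need a surrogate pair in UTF-16.
    in_bmp = all(ord(ch) < 0x10000 for ch in text)
    return is_precomposed and in_bmp


def utf16_units(text: str) -> int:
    """Number of 16-bit code units the text occupies in UTF-16."""
    return len(text.encode("utf-16-le")) // 2
```

A precomposed "š" (U+0161) passes, while "s" followed by the combining caron U+030C, or any character outside the BMP, does not.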
Practical considerations
The defaults on Windows are not very helpful. For best experience, one should tune up 3 pieces of configuration:
For output: a comprehensive console font. For best results, I recommend my builds. (The installation instructions are present there — and also listed in other answers on this page.)
For input: a capable keyboard layout. For best results, I recommend my layouts.
For input: allow HEX input of Unicode.
One more gotcha with “Pasting” into a console application (very technical):
HEX input delivers a character on KeyUp of Alt; all the other ways to deliver a character happen on KeyDown; so many applications are not ready to see a character on KeyUp. (Only applicable to applications using Console-I/O API.)
Conclusion: many applications would not react to HEX input events.
Moreover, what happens with a “Pasted” character depends on the current keyboard layout: if the character can be typed without using prefix keys (but with arbitrary complicated combination of modifiers, as in Ctrl-Alt-AltGr-Kana-Shift-Gray*) then it is delivered on an emulated keypress. This is what any application expects — so pasting anything which contains only such characters is fine.
However, the “other” characters are delivered by emulating HEX input.
Conclusion: unless your keyboard layout supports input of A LOT of characters without prefix keys, some buggy applications may skip characters when you Paste via Console’s UI: Alt-Space E P. (This is why I recommend using my keyboard layouts!)
One should also keep in mind that the “alternative, ‘more capable’ consoles” for Windows are not consoles at all. They do not support Console-I/O APIs, so the programs which rely on these APIs to work would not function. (The programs which use only “File-I/O APIs to the console filehandles” would work fine, though.)
One example of such a non-console is a part of Microsoft’s PowerShell. I do not use it; to experiment, press and release WinKey, then type powershell.
(On the other hand, there are programs such as ConEmu or ANSICON which try to do more: they “attempt” to intercept Console-I/O APIs to make “true console applications” work too. This definitely works for toy example programs; in real life, this may or may not solve your particular problems. Experiment.)
Summary
set font, keyboard layout (and optionally, allow HEX input).
use only programs which go through Console-I/O APIs, and accept Unicode command-line arguments. For example, any cygwin-compiled program should be fine. As I already said, CMD is fine too.
UPD: Initially, for a bug in cp65001, I was mixing up Kernel and CRTL layers (UPD²: and Windows user-mode API!). Also: Win8 fixes one half of this bug; I clarified the section about “better console” application, and added a reference to how Python does it.

I had same problem (I'm from the Czech Republic). I have an English installation of Windows, and I have to work with files on a shared drive. Paths to the files include Czech-specific characters.
The solution that works for me is:
In the batch file, change the charset page
My batch file:
chcp 1250
copy "O:\VEŘEJNÉ\ŽŽŽŽŽŽ\Ž.xls" c:\temp
The batch file has to be saved in CP 1250.
Note that the console will not show characters correctly, but it will understand them...
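A small Python sketch of why the batch file's encoding has to match the code page chosen with chcp. The helper name `as_seen_by` is hypothetical, and cp437 stands in for "some other active code page" purely as an assumption for the demonstration.

```python
def as_seen_by(text: str, saved_as: str, active_code_page: str) -> str:
    """Bytes written in one code page, interpreted under another."""
    raw = text.encode(saved_as)
    return raw.decode(active_code_page, errors="replace")


# The file stores "Ž" as the single byte 0x8E when saved in CP 1250.
stored_byte = "Ž".encode("cp1250")

# Matching code page: the byte is read back as the intended letter.
matching = as_seen_by("Ž", "cp1250", "cp1250")

# Mismatched code page (assumed cp437): the same byte means a different letter.
mismatched = as_seen_by("Ž", "cp1250", "cp437")
```

With the code pages matched the path survives intact; with a mismatch the same byte names a different character, so the file would not be found.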

Check the language for non-Unicode programs. If you have problems with Russian in the Windows console, then you should set Russian here:

It is quite difficult to change the default code page of the Windows console. When you search the web you find different proposals; however, some of them may break your Windows installation entirely, i.e. your PC no longer boots.
The most secure solution is this one:
Go to your Registry key HKEY_CURRENT_USER\Software\Microsoft\Command Processor and add String value Autorun = chcp 65001.
Or you can use this small Batch-Script for the most common code pages.
@ECHO off
SET ROOT_KEY="HKEY_CURRENT_USER"
FOR /f "skip=2 tokens=3" %%i in ('reg query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /v OEMCP') do set OEMCP=%%i
ECHO System default values:
ECHO.
ECHO ...............................................
ECHO Select Codepage
ECHO ...............................................
ECHO.
ECHO 1 - CP1252
ECHO 2 - UTF-8
ECHO 3 - CP850
ECHO 4 - ISO-8859-1
ECHO 5 - ISO-8859-15
ECHO 6 - US-ASCII
ECHO.
ECHO 9 - Reset to System Default (CP%OEMCP%)
ECHO 0 - EXIT
ECHO.
SET /P CP="Select a Codepage: "
if %CP%==1 (
echo Set default Codepage to CP1252
reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 1252>nul" /f
) else if %CP%==2 (
echo Set default Codepage to UTF-8
reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 65001>nul" /f
) else if %CP%==3 (
echo Set default Codepage to CP850
reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 850>nul" /f
) else if %CP%==4 (
echo Set default Codepage to ISO-8859-1
reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 28591>nul" /f
) else if %CP%==5 (
echo Set default Codepage to ISO-8859-15
reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 28605>nul" /f
) else if %CP%==6 (
echo Set default Codepage to ASCII
reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 20127>nul" /f
) else if %CP%==9 (
echo Reset Codepage to System Default
reg delete "%ROOT_KEY%\Software\Microsoft\Command Processor" /v AutoRun /f
) else if %CP%==0 (
echo Bye
) else (
echo Invalid choice
pause
)
Using @chcp 65001>nul instead of chcp 65001 suppresses the output "Active code page: 65001" that you would otherwise get every time you start a new command-line window.
A full list of the available code page numbers can be found in Microsoft's Code Page Identifiers documentation.
Note, the settings will apply only for the current user. If you like to set it for all users, replace line SET ROOT_KEY="HKEY_CURRENT_USER" by SET ROOT_KEY="HKEY_LOCAL_MACHINE"
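To make the numbers in the menu above less opaque, here is a small Python sketch that pairs each code page identifier used by the script with the Python codec I assume to be its equivalent. The table `CODE_PAGES` and the helper `encode_for` are illustrative names, not part of any Windows tooling.

```python
# Code page identifiers from the batch menu, paired with Python codec names.
# The pairing is an assumption made for illustration.
CODE_PAGES = {
    1252: "cp1252",        # Windows Western European
    65001: "utf-8",        # UTF-8
    850: "cp850",          # OEM Western European (DOS Latin-1)
    28591: "iso-8859-1",   # ISO Latin-1
    28605: "iso-8859-15",  # ISO Latin-9 (adds the euro sign)
    20127: "ascii",        # 7-bit US-ASCII
}


def encode_for(code_page: int, text: str) -> bytes:
    """Encode text the way a program using the given code page would."""
    return text.encode(CODE_PAGES[code_page])
```

The same character can map to different bytes depending on the code page, and 7-bit ASCII cannot represent accented letters at all, which is why the choice in the menu matters.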

Actually, the trick is that the command prompt understands these non-English characters; it just can't display them correctly.
When I enter a path in the command prompt that contains non-English characters, it is displayed as "?? ?????? ?????". When you submit your command (cd "??? ?????? ?????" in my case), everything works as expected.
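One way to picture this behaviour, as a sketch only: the underlying string stays intact, while a lossy conversion to the display code page turns unsupported characters into question marks. The function name `displayed_as` and the choice of cp437 are assumptions; the real console may lose characters at the font stage instead.

```python
def displayed_as(text: str, display_code_page: str = "cp437") -> str:
    """Model a lossy display path: unsupported characters become '?'."""
    lossy = text.encode(display_code_page, errors="replace")
    return lossy.decode(display_code_page)


path = "Привет мир"          # the string the command actually receives
shown = displayed_as(path)   # what a limited display path could show
```

The variable `path` is unchanged by the conversion, which matches the observation that the command still works even though the screen shows question marks.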

On a Windows 10 x64 machine, I made the command prompt display non-English characters by:
Open an elevated command prompt (run CMD.EXE as administrator). Query your registry for available TrueType fonts to the console by:
REG query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont"
You'll see an output like:
0 REG_SZ Lucida Console
00 REG_SZ Consolas
936 REG_SZ *新宋体
932 REG_SZ *MS ゴシック
Now we need to add a TrueType font that supports the characters you need like Courier New. We do this by adding zeros to the string name, so in this case the next one would be "000":
REG ADD "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont" /v 000 /t REG_SZ /d "Courier New"
Now we implement UTF-8 support:
REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 65001 /f
Set default font to "Courier New":
REG ADD HKCU\Console /v FaceName /t REG_SZ /d "Courier New" /f
Set font size to 20:
REG ADD HKCU\Console /v FontSize /t REG_DWORD /d 20 /f
Enable quick edit if you like:
REG ADD HKCU\Console /v QuickEdit /t REG_DWORD /d 1 /f

I found this method as useful in new versions of Windows 10:
Turn on this feature: "Beta: Use Unicode UTF-8 for worldwide language support"
Control panel -> Regional settings -> Administrative tab -> Change system locale...

One really simple option is to install a Windows bash shell such as MinGW and use that:
There is a little bit of a learning curve as you will need to use Unix command line functionality, but you will love the power of it and you can set the console character set to UTF-8.
Of course you also get all the usual *nix goodies like grep, find, less, etc.

Starting June 2019, with Windows 10, you won't have to change the codepage.
See "Introducing Windows Terminal" (from Kayla Cinnamon) and the Microsoft/Terminal.
Through the use of the Consolas font, partial Unicode support will be provided.
As documented in Microsoft/Terminal issue 387:
There are 87,887 ideographs currently in Unicode. You need all of them too?
We need a boundary, and characters beyond that boundary should be handled by font fallback / font linking / whatever.
What Consolas should cover:
Characters that used as symbols that used by modern OSS programs in CLI.
These characters should follow Consolas' design and metrics, and properly aligned with existing Consolas characters.
What Consolas should NOT cover:
Characters and punctuation of scripts that beyond Latin, Greek and Cyrillic, especially characters need complex shaping (like Arabic).
These characters should be handled with font fallback.

As I haven't seen any full answers for Python 2.7, I'll outline the two important steps and an optional step that is quite useful.
You need a font with Unicode support. Windows comes with Lucida Console which may be selected by right-clicking the title bar of command prompt and clicking the Defaults option. This also gives access to colours. Note that you can also change settings for command windows invoked in certain ways (e.g, open here, Visual Studio) by choosing Properties instead.
You need to set the code page to cp65001, which is Microsoft's code page identifier for UTF-8. Do this by running chcp 65001 in command prompt. Once set, it remains this way until the window is closed. You'll need to redo this every time you launch cmd.exe.
For a more permanent solution, refer to this answer on Super User. In short, create a REG_SZ (String) entry using regedit at HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor and name it AutoRun. Change the value of it to chcp 65001. If you don't want to see the output message from the command, use @chcp 65001>nul instead.
Some programs have trouble interacting with this encoding, MinGW being a notable one that fails while compiling with a nonsensical error message. Nonetheless, this works very well and doesn't cause bugs with the majority of programs.

This problem is quite annoying. I usually have Chinese characters in my file names and file content. Note that I am using Windows 10; here is my solution:
To display the file name, such as dir or ls if you installed Ubuntu bash on Windows 10
Set the region to support non-utf 8 character.
After that, console's font will be changed to the font of that locale, and it also changes the encoding of the console.
After you have done previous steps, in order to display the file content of a UTF-8 file using command line tool
Change the page to utf-8 by chcp 65001
Change to the font that supports utf-8, such as Lucida Console
Use type command to peek the file content, or cat if you installed Ubuntu bash on Windows 10
Please note that, after setting the encoding of the console to UTF-8, I can't type Chinese characters in cmd using a Chinese input method.
The laziest solution: Just use a console emulator such as http://cmder.net/

For a similar problem, (my problem was to show UTF-8 characters from MySQL on a command prompt),
I solved it like this:
I changed the font of command prompt to Lucida Console. (This step must be irrelevant for your situation. It has to do only with what you see on the screen and not with what is really the character).
I changed the codepage to Windows-1253. You do this on the command prompt by "chcp 1253". It worked for my case where I wanted to see UTF-8.

A quick solution for .bat files, if your computer displays your path/file name correctly when you type it in a DOS window:
copy con temp.txt [press Enter]
Type the path/file name [press Enter]
Press Ctrl-Z [press Enter]
This way you create a .txt file - temp.txt. Open it in Notepad, copy the text (don't worry it will look unreadable) and paste it in your .bat file.
Executing the .bat created this way in a DOS window worked for me (Cyrillic, Bulgarian).

I see several answers here, but they don't seem to address the question - the user wants to get Unicode input from the command line.
Windows uses UTF-16 (two-byte code units) for its wide-character strings, so your program needs to get the arguments from the OS in that form. There are two ways to do this -
1) Microsoft has an extension that allows main to take a wide character array:
int wmain(int argc, wchar_t *argv[]);
https://msdn.microsoft.com/en-us/library/6wd819wh.aspx
2) Call the Windows API to get the Unicode version of the command line
wchar_t **win_argv = (wchar_t **)CommandLineToArgvW(GetCommandLineW(), &nargs);
https://learn.microsoft.com/en-us/windows/desktop/api/shellapi/nf-shellapi-commandlinetoargvw
Read this: http://utf8everywhere.org
for detailed info, particularly if you are supporting other operating systems.

A better cleaner thing to do: Just install the available, free, Microsoft Japanese language pack. (Other oriental language packs will also work, but I have tested the Japanese one.)
This gives you the fonts with the larger sets of glyphs, makes them the default behavior, changes the various Windows tools like cmd, WordPad, etc.

Changing the code page to 1252 works for me. My problem was that the section sign (§) was being converted to another symbol by DOS on Windows Server 2008.
I used CHCP 1252 and put a caret before the symbol in my BCP statement: ^§.

I got around a similar issue deleting Unicode-named files by referring to them in the batch file by their short (8 dot 3) names.
The short names can be viewed by doing dir /x. Obviously, this only works with Unicode file names that are already known.

For those using WSL who do not want the extra packages from Cygwin or Git, wsltty is available; it provides just the terminal, with UTF-8 support.

Related

Special character and encoding handling while running azure CLI commands in PowerShell [duplicate]

I've been forcing the usage of chcp 65001 in Command Prompt and Windows Powershell for some time now, but judging by Q&A posts on SO and several other communities it seems like a dangerous and inefficient solution. Does Microsoft provide an improved / complete alternative to chcp 65001 that can be saved permanently without manual alteration of the Registry? And if there isn't, is there a publicly announced timeline or agenda to support UTF-8 in the Windows CLI in the future?
Personally I've been using chcp 949 for Korean Character Support, but the weird display of the backslash \ and incorrect/incomprehensible displays in several applications (like Neovim), as well as characters that aren't Korean not being supported via 949 seems to become more of a problem lately.
Note:
This answer shows how to switch the character encoding in the Windows console to
(BOM-less) UTF-8 (code page 65001), so that shells such as cmd.exe and PowerShell properly encode and decode characters (text) when communicating with external (console) programs with full Unicode support, and in cmd.exe also for file I/O.[1]
If, by contrast, your concern is about the separate aspect of the limitations of Unicode character rendering in console windows, see the middle and bottom sections of this answer, where alternative console (terminal) applications are discussed too.
Does Microsoft provide an improved / complete alternative to chcp 65001 that can be saved permanently without manual alteration of the Registry?
As of (at least) Windows 10, version 1903, you have the option to set the system locale (language for non-Unicode programs) to UTF-8, but the feature is still in beta as of this writing.
To activate it:
Run intl.cpl (which opens the regional settings in Control Panel)
On the Administrative tab, click "Change system locale..." and enable "Beta: Use Unicode UTF-8 for worldwide language support".
This sets both the system's active OEM and the ANSI code page to 65001, the UTF-8 code page, which therefore (a) makes all future console windows, which use the OEM code page, default to UTF-8 (as if chcp 65001 had been executed in a cmd.exe window) and (b) also makes legacy, non-Unicode GUI-subsystem applications, which (among others) use the ANSI code page, use UTF-8.
Caveats:
If you're using Windows PowerShell, this will also make Get-Content and Set-Content and other contexts where Windows PowerShell defaults to the system's active ANSI code page, notably reading source code from BOM-less files, default to UTF-8 (which PowerShell Core (v6+) always does). This means that, in the absence of an -Encoding argument, BOM-less files that are ANSI-encoded (which is historically common) will then be misread, and files created with Set-Content will be UTF-8 rather than ANSI-encoded.
[Fixed in PowerShell 7.1] Up to at least PowerShell 7.0, a bug in the underlying .NET version (.NET Core 3.1) causes follow-on bugs in PowerShell: a UTF-8 BOM is unexpectedly prepended to data sent to external processes via stdin (irrespective of what you set $OutputEncoding to), which notably breaks Start-Job - see this GitHub issue.
Not all fonts speak Unicode, so pick a TT (TrueType) font, but even they usually support only a subset of all characters, so you may have to experiment with specific fonts to see if all characters you care about are represented - see this answer for details, which also discusses alternative console (terminal) applications that have better Unicode rendering support.
As eryksun points out, legacy console applications that do not "speak" UTF-8 will be limited to ASCII-only input and will produce incorrect output when trying to output characters outside the (7-bit) ASCII range. (In the obsolescent Windows 7 and below, programs may even crash).
If running legacy console applications is important to you, see eryksun's recommendations in the comments.
However, for Windows PowerShell, that is not enough:
You must additionally set the $OutputEncoding preference variable to UTF-8 as well: $OutputEncoding = [System.Text.UTF8Encoding]::new()[2]; it's simplest to add that command to your $PROFILE (current user only) or $PROFILE.AllUsersCurrentHost (all users) file.
Fortunately, this is no longer necessary in PowerShell Core, which internally consistently defaults to BOM-less UTF-8.
If setting the system locale to UTF-8 is not an option in your environment, use startup commands instead:
Note: The caveat re legacy console applications mentioned above equally applies here. If running legacy console applications is important to you, see eryksun's recommendations in the comments.
For PowerShell (both editions), add the following line to your $PROFILE (current user only) or $PROFILE.AllUsersCurrentHost (all users) file, which is the equivalent of chcp 65001, supplemented with setting preference variable $OutputEncoding to instruct PowerShell to send data to external programs via the pipeline in UTF-8:
Note that running chcp 65001 from inside a PowerShell session is not effective, because .NET caches the console's output encoding on startup and is unaware of later changes made with chcp; additionally, as stated, Windows PowerShell requires $OutputEncoding to be set - see this answer for details.
$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
For example, here's a quick-and-dirty approach to add this line to $PROFILE programmatically:
'$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding' + [Environment]::Newline + (Get-Content -Raw $PROFILE -ErrorAction SilentlyContinue) | Set-Content -Encoding utf8 $PROFILE
For cmd.exe, define an auto-run command via the registry, in value AutoRun of key HKEY_CURRENT_USER\Software\Microsoft\Command Processor (current user only) or HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor (all users):
For instance, you can use PowerShell to create this value for you:
# Auto-execute `chcp 65001` whenever the current user opens a `cmd.exe` console
# window (including when running a batch file):
Set-ItemProperty 'HKCU:\Software\Microsoft\Command Processor' AutoRun 'chcp 65001 >NUL'
Optional reading: Why the Windows PowerShell ISE is a poor choice:
While the ISE does have better Unicode rendering support than the console, it is generally a poor choice:
First and foremost, the ISE is obsolescent: it doesn't support PowerShell (Core) 7+, where all future development will go, and it isn't cross-platform, unlike the new premier IDE for both PowerShell editions, Visual Studio Code, which already speaks UTF-8 by default for PowerShell Core and can be configured to do so for Windows PowerShell.
The ISE is generally an environment for developing scripts, not for running them in production (if you're writing scripts (also) for others, you should assume that they'll be run in the console); notably, with respect to running code, the ISE's behavior is not the same as that of a regular console:
Poor support for running external programs, not only due to lack of supporting interactive ones (see next point), but also with respect to:
character encoding: the ISE mistakenly assumes that external programs use the ANSI code page by default, when in reality it is the OEM code page. E.g., by default this simple command, which tries to simply pass a string echoed from cmd.exe through, malfunctions (see below for a fix):
cmd /c echo hü | Write-Output
Inappropriate rendering of stderr output as PowerShell errors: see this answer.
The ISE dot-sources script-file invocations instead of running them in a child scope (the latter is what happens in a regular console window); that is, repeated invocations run in the very same scope. This can lead to subtle bugs, where definitions left behind by a previous run can affect subsequent ones.
As eryksun points out, the ISE doesn't support running interactive external console programs, namely those that require user input:
The problem is that it hides the console and redirects the process output (but not input) to a pipe. Most console applications switch to full buffering when a file is a pipe. Also, interactive applications require reading from stdin, which isn't possible from a hidden console window. (It can be unhidden via ShowWindow, but a separate window for input is clunky.)
If you're willing to live with that limitation, switching the active code page to 65001 (UTF-8) for proper communication with external programs requires an awkward workaround:
You must first force creation of the hidden console window by running any external program from the built-in console, e.g., chcp - you'll see a console window flash briefly.
Only then can you set [console]::OutputEncoding (and $OutputEncoding) to UTF-8, as shown above (if the hidden console hasn't been created yet, you'll get a handle is invalid error).
[1] In PowerShell, if you never call external programs, you needn't worry about the system locale (active code pages): PowerShell-native commands and .NET calls always communicate via UTF-16 strings (native .NET strings) and on file I/O apply default encodings that are independent of the system locale. Similarly, because the Unicode versions of the Windows API functions are used to print to and read from the console, non-ASCII characters always print correctly (within the rendering limitations of the console).
In cmd.exe, by contrast, the system locale matters for file I/O (with < and > redirections, but notably including what encoding to assume for batch-file source code), not just for communicating with external programs in-memory (such as when reading program output in a for /f loop).
[2] In PowerShell v4-, where the static ::new() method isn't available, use $OutputEncoding = (New-Object System.Text.UTF8Encoding).psobject.BaseObject. See GitHub issue #5763 for why the .psobject.BaseObject part is needed.
You can put the command chcp 65001 in your Powershell Profile, which will run it automatically when you open Powershell. However, this won't do anything for cmd.exe.
Microsoft is currently working on an improved terminal that will have full Unicode support. It is open source, and if you're using Windows 10 Version 1903 or later, you can already download a preview version.
Alternatively, you can use a third-party terminal emulator such as Terminus.
The Powershell ISE displays Korean perfectly fine. Here's a sample text file encoded in utf8 that would work:
PS C:\Users\js> cat .\korean.txt
The Korean language (South Korean: 한국어/韓國語 Hangugeo; North
Korean: 조선말/朝鮮말 Chosŏnmal) is an East Asian language
spoken by about 77 million people.[3]
Since the ISE comes with every version of Windows 10, I do not consider it obsolete. I disagree with whoever deleted my original answer.
The ISE has some limitations, but some scripting can be done with external commands:
echo 'list volume' | diskpart # as admin
cmd /c echo hi
EDIT:
If you have Windows 10 1903, you can download Windows Terminal from the Microsoft Store https://devblogs.microsoft.com/commandline/introducing-windows-terminal/, and Korean text would work in there. Powershell 5 would need the text format to be UTF8 with bom or UTF16.
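For readers who need to produce such a file, here is a minimal Python sketch of writing UTF-8 with a BOM and checking that the BOM is really there. The helper names and the file name `korean.txt` inside a temporary directory are chosen for illustration only.

```python
import codecs
import os
import tempfile


def write_with_bom(path: str, text: str) -> None:
    """Write text as UTF-8 with a leading byte order mark."""
    # The "utf-8-sig" codec prepends the BOM on write and strips it on read.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


def has_utf8_bom(path: str) -> bool:
    """True if the file starts with the three UTF-8 BOM bytes."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def demo():
    """Round-trip a short Korean string through a BOM-prefixed file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "korean.txt")
        write_with_bom(path, "한국어")
        with open(path, encoding="utf-8-sig") as f:
            return has_utf8_bom(path), f.read()
```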
EDIT2:
It seems like the ideals are windows terminal + powershell 7 or vscode + powershell 7, for both pasting characters and output.
EDIT3:
Even in the EDIT2 situations, some Unicode characters cannot be pasted, like ⇆ (U+21C6), or Unicode spaces. Only PS7 on macOS works.

How do I configure a terminal to read UTF-8 characters?

I am working on a project which accepts user input via the command line. I am using up-to-date Windows 10 and (after much running around in circles...) I am aware that it is notoriously bad when it comes to handling UTF-8 characters. Consequently, I looked to VS Code and the integrated terminal (PowerShell) to perform input into the program. Sadly, the terminal seemed unable to accept accented UTF-8 characters such as "ë". I then did more research and configured the settings.json for VS Code for UTF-8 BOM encoding. Still, the terminal failed to read accented characters. I am certain that my program is not the issue, nor is my font. I have reduced my code to a test algorithm that simply accepts input using readline-sync (which the developers confirm is compatible with UTF-8: https://github.com/anseki/readline-sync/issues/58) and "console.log"s it.
The test case I have been using is "Hëllo". When I input "Hëllo" into the VS Code terminal, my program outputs "H�llo". When I tried converting all of my apps to UTF-8 encoding using the administrative language settings for Windows 10 and subsequently input "Hëllo" via the command terminal, it output "Hllo". I also tried forcing CMD to use Code Page 65001 with chcp 65001 for UTF-8 encoding, but it still produced "Hllo".
Here is the code I used to configure the VS Code PowerShell terminal via settings.json:
{
"[powershell]": {
"files.encoding": "utf8bom",
"files.autoGuessEncoding": true
}
}
And here is the brief code I wrote to test my input/output and whether the "ë" is being read successfully (which it is not):
const rlSync = require('readline-sync');
const name = rlSync.question('Enter Player 1 Username (Case Sensitive): ');
console.log(name);
If y'all see any issues, please let me know!
I am looking for any way to properly configure my CLI to accept accented characters for use in my program. I do not mean to restrict this question to VS Code or Powershell. If there is a way to accomplish this with the basic Windows 10 CMD, I would love that. Thank you for any help y'all can provide! <3
Is there any particular reason you're using VSCode? I think you're looking for the System.Console InputEncoding/OutputEncoding - unfortunately my default encoding just works with "Hëllo" so couldn't accurately test, and I don't know if this works with VSCode.
Try this (one line at a time)
# store current encoding settings
$i = [System.Console]::InputEncoding
$o = [System.Console]::OutputEncoding
# set encoding to UTF8
[System.Console]::InputEncoding = [System.Text.Encoding]::UTF8
[System.Console]::OutputEncoding = [System.Text.Encoding]::UTF8
# test
"Hëllo"
# revert (if you want. if you don't want, I would at least note the default encoding)
[System.Console]::InputEncoding = $i
[System.Console]::OutputEncoding = $o
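The two symptoms described in the question ("H�llo" and "Hllo") are consistent with bytes from a legacy code page being decoded as UTF-8. The Python sketch below reproduces that effect; it is one plausible mechanism rather than a diagnosis of readline-sync, and both the function name `misread_as_utf8` and the assumed source encoding cp1252 are mine.

```python
def misread_as_utf8(text: str, actual_encoding: str = "cp1252",
                    errors: str = "replace") -> str:
    """Encode text in a legacy code page, then wrongly decode it as UTF-8."""
    raw = text.encode(actual_encoding)
    return raw.decode("utf-8", errors=errors)


# The lone byte 0xEB is not valid UTF-8 on its own.
replaced = misread_as_utf8("Hëllo")                  # bad byte becomes U+FFFD
dropped = misread_as_utf8("Hëllo", errors="ignore")  # bad byte is discarded
```

If both the terminal's input encoding and the program's decoder agree on UTF-8, the round trip is lossless, which is what setting InputEncoding and OutputEncoding aims for.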

linux console how to change the codepage to dos cp437

I want to view some ANSI art on the Linux local console. (My setup: Raspberry Pi 3 / newest Raspbian, no X11.)
I've tried many different settings in raspi-config, dpkg-reconfigure console-setup, /etc files and environment variables, but I have had no luck yet. Do I need a special PCF font to get it working?
A reliable way to enable it for remote terminals would also be great.
Thanks in advance.
It depends on what your data uses (see chart). Codes 0..31 are a problem unless you have a program that can map those codes to a printable value (as noted in Why does showconsolefont have different output in tmux?, the showconsolefont program does this mapping of 0..31).
Most of the usable fonts for the Linux console are "psf" fonts: having a header which tells which Unicode values each glyph corresponds to. Using that, along with a known character set (cp437), you could convert the data or "play" it using an application which knows how to do this:
You could convert it using iconv or recode, or
The line-drawing (128..255) could be done using luit in a UTF-8 console.
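The conversion step can also be sketched in Python, whose standard library ships a cp437 codec. This is an illustration of the idea rather than a replacement for iconv, recode or luit, and `cp437_to_text` is a name made up for the example.

```python
def cp437_to_text(data: bytes) -> str:
    """Decode raw CP437 bytes (e.g. ANSI art) into a Unicode string."""
    return data.decode("cp437")


# Bytes 0xC9 0xCD 0xCD 0xBB are the top edge of a double-line box in CP437.
top_edge = cp437_to_text(b"\xc9\xcd\xcd\xbb")

# Re-encode as UTF-8 for a UTF-8 console or a remote terminal.
top_edge_utf8 = top_edge.encode("utf-8")
```

Note that this codec maps bytes 0..31 to control characters, not to the CP437 glyphs (smileys, card suits and so on), so art that relies on those codes still needs a separate mapping, as mentioned above.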

D Language fails to display german Umlaute on Windows?

As you can see, D fails to output german Umlaute. At least on Windows. On Linux or BSD the same program outputs the string as I've saved it.
I already tried wstring or dstring, but the output is the same.
What am I doing wrong?
D will output UTF-8 regardless of the operating system. How the output will be interpreted depends on how it is displayed. In this particular case, it looks like your IDE is interpreting the output as if it was encoded in the Windows-1252 encoding.
For the standard Windows console, you could change the output encoding by calling SetConsoleOutputCP(65001), but note that this may have some undesired side effects (you should restore the codepage before your program exits, and batch files may not run while the console output codepage is set to 65001).
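The misinterpretation described above (UTF-8 output read as Windows-1252) is easy to reproduce. The following Python sketch is only an illustration of the effect, with `utf8_read_as_cp1252` being a hypothetical helper name; it does not involve D or the IDE itself.

```python
def utf8_read_as_cp1252(text: str) -> str:
    """Show what UTF-8 output looks like when decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


# Each umlaut is two bytes in UTF-8, so it shows up as two characters.
garbled_a = utf8_read_as_cp1252("ä")
garbled_u = utf8_read_as_cp1252("ü")

# The damage is reversible as long as no byte was dropped along the way.
restored = garbled_a.encode("cp1252").decode("utf-8")
```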
CyberShadow's post guided me to an acceptable answer. :-)
In Eclipse it is possible to change the output-encoding without changing global settings of the OS.
Go to Run --> Run-Configurations...
There select the Common-Tab and change the encoding to UTF-8. Now german Umlaute are displayed correctly. At least in Eclipse. :-)
Another possibility is to use https://babun.github.io/ . It is a Cygwin-based shell that outputs UTF-8.

Create/find batch script which performs a 'find and replace' within a directory and sub-directories, for excel files with varied names

I have many files (Excel/CSV) contained in varied layers of sub-folders under a single directory. Many of the files contain a single cell with a common text string, which all need to be updated to a new string. The files all have different names. How difficult would it be to create a (preferably batch) script to search the entire directory (inclusive of sub-directories) for a given text string, replace all instances with a new string, and leave everything else in its place.
I am new to scripting, I have been searching the site and haven't found a solution that has worked for me. I want to stress that I cannot download, install or run third-party software due to security measures, and so applications like FART are out. Is anybody able to provide any input for the creation of something like this, or link me to such a script that already exists? Thanks in advance!!
Robust text editing using pure batch is difficult and slow.
Unless your admin techs have disabled CSCRIPT, you can use my JREPL.BAT - a hybrid JScript/batch text processing utility.
There is no download or installation process required. JREPL.BAT is pure script that runs natively on any Windows machine from XP onward. Simply copy the script into a new file named JREPL.BAT on your local machine.
Once you have your own copy, then all you need is something like the following command, run from the command console:
for /r "c:\your\root\path" %F in (*.csv) do @jrepl "search string" "replace string" /L /F "%F" /O -
I used the /L switch to treat the search as a literal. You may want to drop the /L switch and do a regular expression search instead.
If you put the command within another script, then you will need to double the percents and use CALL JREPL.
@echo off
for /r "c:\your\root\path" %%F in (*.csv) do call jrepl "search string" "replace string" /L /F "%%F" /O -
Issue the following command from the console prompt to get full documentation:
jrepl /? | more
I configure my console window with a large buffer height so I can scroll back to see prior output, thus I don't need | more when I look at the help.
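For comparison, the same recursive literal search-and-replace over CSV files can be sketched in Python. This assumes Python happens to be available on the machine already, which the question does not state, and it handles text files only (not binary Excel workbooks). The function name `replace_in_tree`, its parameters and the default UTF-8 encoding are all assumptions.

```python
import os
import tempfile


def replace_in_tree(root, search, replacement,
                    extensions=(".csv",), encoding="utf-8"):
    """Replace a literal string in every matching file under root."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # newline="" keeps the file's original line endings untouched.
            with open(path, "r", encoding=encoding, newline="") as f:
                text = f.read()
            if search not in text:
                continue  # leave files without a match untouched
            with open(path, "w", encoding=encoding, newline="") as f:
                f.write(text.replace(search, replacement))
            changed.append(path)
    return changed


def demo():
    """Run the replacement on a throwaway directory tree."""
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "sub")
        os.makedirs(sub)
        path = os.path.join(sub, "data.csv")
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write("a,old,c\n")
        replace_in_tree(root, "old", "new")
        with open(path, encoding="utf-8", newline="") as f:
            return f.read()
```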
