C++ Win32: Converting scan code to Unicode character - visual-c++

When I switch to the Russian layout in Windows 7 and press the ; key on the keyboard, I get the Russian letter ж on the screen.
I am working on an application where I need to detect pressed keys and draw text on the screen. The requirement is to handle all supported languages. This is my code:
// I scan the keyboard for pressed keys
for (short key = KEY_SCAN_MIN; key <= KEY_SCAN_MAX; ++key)
{
    if (GetAsyncKeyState(key) & 0x8000)
    {
        // When I detect a pressed key, I convert the scan code into a virtual key.
        // The hkl parameter is the current keyboard layout, which is Russian.
        UINT virtualKey = MapVirtualKeyEx((UINT)key, MAPVK_VK_TO_CHAR, hkl);

        // Next I get the state of the keyboard and convert the virtual key
        // into a Unicode letter.
        if (!GetKeyboardState(kbrdState))
        {
            continue;
        }

        // unicode is defined as wchar_t unicode[2];
        int result = ToUnicodeEx(virtualKey, key, (BYTE*)kbrdState, unicode, 2, 0, hkl);
    }
}
Everything works great except for a couple of letters in Russian, and I cannot figure out why. One specific letter that does not work is ж. When I attempt to translate its scan code, the translation is б, which is a different Russian letter.
I have spent an entire day debugging this issue and have not gotten far. When I press this Russian key I get 168 for the scan code and 1078 for the virtual key. I did this small test to convert the letter back to a virtual key.
short test = VkKeyScanEx(L'ж', hkl);
The value of the variable test is 1078! I do not understand why converting the letter ж to a virtual key gives me 1078, but converting virtual key 1078 (using the same keyboard layout) gives me б.

I always use WM_CHAR instead of reading scan codes, as it does the translation work for you and returns the final character in UTF-16. It works with all languages, even ones where it takes more than one key press to produce a single character.
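For reference, a minimal sketch of that approach, assuming an ordinary Unicode window class (registered with RegisterClassW) and a message loop that calls TranslateMessage, which is what generates WM_CHAR from the raw key messages:

// Minimal sketch: let Windows do the layout translation and collect the
// final UTF-16 characters from WM_CHAR.
LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg)
    {
    case WM_CHAR:
    {
        // wParam is a UTF-16 code unit; the layout, shift state and dead
        // keys have already been applied for us.
        wchar_t ch = (wchar_t)wParam;
        // ... append ch to the text being drawn ...
        return 0;
    }
    }
    return DefWindowProcW(hwnd, msg, wParam, lParam);
}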

Related

New cipher problem, how to decode given enciphered code?

While trying to solve a Caesar cipher, I ran into a few problems.
#enciphered message = 'I wtvp olel decfnefcpd lyo lwrzctesxd'
plain = 'abcdefghijklmnopqrstuvwxyz'
cipher = 'lmnopqrstuvwxyzabcdefghijk'
cipher_text = input('Enter enciphered message: ')
clean_text = ' '
for i in cipher_text:
    if i != " ":
        clean_text = clean_text + plain[plain.index(cipher[(ord(i)-ord('a'))])]
    else:
        clean_text = clean_text + " "
print(clean_text)
Above is the code that I created, and this is the result I got:
Enter enciphered message: I wtvp olel decfnefcpd lyo lwrzctesxd
n hega zwpw opnqypqnao wjz whcknepdio
Here are my related questions:
Why wasn't it decoded properly? It should be 'I like data structures and algorithms'.
I'm also confused about the capital "I" at the beginning of the enciphered message. Do you have any insight on that?
Finally, I have no idea how to decode uppercase and lowercase at the same time; how should I do that?
(1) Why wasn't it decoded properly? Like, It should be 'I like data structures and algorithms'
First of all, the alphabet is already present in ASCII, so there is no need to redefine it in plain or cipher. The key is the offset from the plaintext character to the ciphertext character, wrapping around directly after the z.
So you generally convert a character to an index in the alphabet from 0..25, then add (for encryption) or subtract (for decryption) the key, modulo 26 (the size of the alphabet). To get the result, you convert back into a character. You're already doing the conversion to an index using ord(character) - ord('a'). The opposite can be done using chr.
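A minimal C++ sketch of that arithmetic (the helper name decrypt_char and the key value are mine; 11 is the offset implied by the plain/cipher tables above, since 'l' - 'a' = 11):

#include <iostream>
#include <string>

// Hypothetical helper: decrypt one lowercase letter with the given key.
char decrypt_char(char c, int key)
{
    int index = c - 'a';                   // character -> index 0..25
    int shifted = (index - key + 26) % 26; // subtract the key, wrap around
    return (char)('a' + shifted);          // index -> character
}

int main()
{
    const std::string cipher_text = "wtvp olel decfnefcpd lyo lwrzctesxd";
    for (char c : cipher_text)
        std::cout << (c == ' ' ? ' ' : decrypt_char(c, 11));
    std::cout << '\n'; // prints: like data structures and algorithms
}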
(2) I'm also confused about the capital "I" at the beginning of the enciphered message. Do you have any insight on that?
Well, there are several possibilities for single-character words. The word A would be a prime suspect.
(3) Finally I have no idea how to decode uppercase and lowercase at the same time; how should I do that?
The best way is to create a variable that indicates whether something is uppercase or not; is_uppercase would be a good name. Then convert the character to lowercase, perform the encryption / decryption operation, and convert the resulting character back to uppercase if required. That way your encryption / decryption operation is not affected at all and stays relatively simple.
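Sketched in C++, following the same arithmetic as the decrypt_char helper above (again, the names are mine):

#include <cctype>

// Hypothetical helper: decrypt a letter while preserving its case.
char decrypt_preserving_case(char c, int key)
{
    bool is_uppercase = std::isupper((unsigned char)c) != 0;
    char lower = (char)std::tolower((unsigned char)c);
    char result = (char)('a' + ((lower - 'a') - key + 26) % 26);
    return is_uppercase ? (char)std::toupper((unsigned char)result) : result;
}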

VBA Byte Array to String

Apologies if this question has been previously answered; I was unable to find an explanation. I've created a script in VBScript to encrypt a user input and match it against an already encrypted password. I ran into some issues along the way and managed to reduce the problem to the following.
I have a byte array (1 To 2) with values (16, 1). I am then defining a string with the value of the array as per below:
Dim bytArr(1 To 2) As Byte
Dim output As String
bytArr(1) = 16
bytArr(2) = 1
output = bytArr
Debug.Print output
The output I get is Ð (Eth), ASCII value 208. Could someone please explain how the byte array is converted to this character?
In VBA, byte arrays are special because, unlike arrays of other datatypes, a string can be directly assigned to a byte array. VBA strings are Unicode strings, so when a string is assigned to a byte array it stores two bytes for each character.
Although the glyphs appear to be the same, they are different characters; see charmap:
Ð is the Unicode character 'LATIN CAPITAL LETTER ETH' (U+00D0), shown in charmap in the DOS Western Europe character set at 0xD1 (decimal 209);
Đ is the Unicode character 'LATIN CAPITAL LETTER D WITH STROKE' (U+0110), shown in charmap in the Windows Central Europe character set at 0xD0 (decimal 208).
Put the above statements together, keeping in mind the endianness (byte order) of the computer architecture: Intel x86 processors are little-endian, so the byte array (0x10, 0x01) is the same as the Unicode string U+0110.
The two characters are being conflated by plain mojibake. For proof, use the Asc and AscW functions as follows: Debug.Print output, Asc(output), AscW(output), under different console code pages, e.g. under chcp 852 and chcp 1250.
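A small C++ sketch of that byte-order point (nothing VBA-specific here; it just reassembles the two bytes as one little-endian UTF-16 code unit):

#include <cstdint>
#include <cstdio>

int main()
{
    // The VBA byte array: bytArr(1) = 16 (0x10), bytArr(2) = 1 (0x01).
    uint8_t bytes[2] = { 0x10, 0x01 };

    // VBA strings are UTF-16 little-endian, so the low byte comes first.
    uint16_t codeUnit = (uint16_t)(bytes[0] | (bytes[1] << 8));

    printf("U+%04X\n", codeUnit); // prints U+0110 ('Đ'), not U+00D0 ('Ð')
}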

How do you make a function detect whether a string is binary safe or not

How does one detect if a string is binary safe or not in Go?
A function like:
IsBinarySafe(str) // returns true if it's safe and false if it's not
Everything after this is just things I have thought about or attempted in order to solve this:
I assumed that there must exist a library that already does this but had a tough time finding it. If there isn't one, how do you implement this?
I was thinking of some solutions but wasn't really convinced they were good.
One of them was to iterate over the bytes, and have a hash map of all the illegal byte sequences.
I also thought of maybe writing a regex with all the illegal strings but wasn't sure if that was a good solution.
I also was not sure if a sequence of bytes from other languages counted as binary safe. Say the typical golang example:
世界
Would:
IsBinarySafe("世界") // true or false?
Would it return true or false? I was assuming that all binary-safe strings should only use 1-byte characters. So I iterated over it in the following way:
const nihongo = "日本語abc日本語"
for i, w := 0, 0; i < len(nihongo); i += w {
    runeValue, width := utf8.DecodeRuneInString(nihongo[i:])
    fmt.Printf("%#U starts at byte position %d\n", runeValue, i)
    w = width
}
and returned false whenever the width was greater than 1. These are just some ideas I had in case there wasn't already a library for something like this, but I wasn't sure.
Binary safety has nothing to do with how wide a character is; it's mainly about checking for non-printable characters, more or less, like null bytes and such.
From Wikipedia:
Binary-safe is a computer programming term mainly used in connection
with string manipulating functions. A binary-safe function is
essentially one that treats its input as a raw stream of data without
any specific format. It should thus work with all 256 possible values
that a character can take (assuming 8-bit characters).
I'm not sure what your goal is; almost all languages handle UTF-8/16 just fine now. However, for your specific question there's a rather simple solution:
package main

import (
    "fmt"
    "unicode"
)

const s = "日本語abc日本語"

// IsAsciiPrintable checks if s is ASCII and printable, i.e. it rejects
// anything non-ASCII as well as tab, backspace, and other control characters.
func IsAsciiPrintable(s string) bool {
    for _, r := range s {
        if r > unicode.MaxASCII || !unicode.IsPrint(r) {
            return false
        }
    }
    return true
}

func main() {
    fmt.Printf("len([]rune(s)) = %d, len([]byte(s)) = %d\n", len([]rune(s)), len([]byte(s)))
    fmt.Println(IsAsciiPrintable(s), IsAsciiPrintable("test"))
}
playground
From unicode.IsPrint:
IsPrint reports whether the rune is defined as printable by Go. Such
characters include letters, marks, numbers, punctuation, symbols, and
the ASCII space character, from categories L, M, N, P, S and the ASCII
space character. This categorization is the same as IsGraphic except
that the only spacing character is ASCII space, U+0020.

Reading keyboard KBD.dll files and understanding modifier states

I am writing a virtual keyboard application that reads Windows KBD files to build the keyboard. I am having trouble understanding how to interpret the array of characters returned for each virtual key.
I am using the kbd.h files in this CodePlex project.
Here is the struct that I am not understanding.
typedef struct {
    PVK_TO_BIT pVkToBit;    // Virtual Keys -> Mod bits
    WORD       wMaxModBits; // max Modification bit combination value
    BYTE       ModNumber[]; // Mod bits -> Modification Number
} MODIFIERS, *KBD_LONG_POINTER PMODIFIERS;
When reading the documentation and analyzing the results with a US keyboard, this struct and its contained data make sense.
---------------
US
---------------
CharModifiers
0001 = SHIFT
0010 = CTRL
0100 = ALT
ModNumber
0000 = 0 = BASE
0001 = 1 = SHIFT
0010 = 2 = CTRL
0011 = 3 = SHIFT + CTRL
What this says is that for the array of characters returned for each virtual key (another kbd.h struct), the first one represents no modifiers, the second represents the value when SHIFT is held, and so on. This is accurate and maps perfectly to the array of characters returned for each virtual key.
However, if I load a German keyboard layout (KBDGR.dll), the PMODIFIERS data doesn't line up with the array of characters returned for each virtual key.
---------------
German
---------------
CharModifiers
0001 = SHIFT
0010 = CTRL
0100 = ALT
ModNumber
0000 = 0 = BASE = REALLY BASE
0001 = 1 = SHIFT = REALLY SHIFT
0011 = 3 = SHIFT + CTRL = REALLY ALTGR
0100 = 4 = ALT = REALLY CTRL
1111 = 15 = INVALID = INVALID
1111 = 15 = INVALID = INVALID
0010 = 2 = CTRL = REALLY SHIFT + CTRL
0101 = 5 = SHIFT + ALT = REALLY SHIFT + ALTGR
As you can see here, for example, 0010 should correlate with just a CTRL modifier; however, the character returned for the virtual key really represents SHIFT + CTRL.
What am I not understanding? I thought that the ModNumber array describes, for each index into a virtual key's character array, the modifier keys that index represents. This assumption worked correctly for the US keyboard layout, but why not for the German keyboard layout?
I emailed the makers of KbdEdit for their input, and they just replied with the answer!
The zero-based position within ModNumber array defines the modifier combination: eg, the last element "2" is at position 6, whose binary representation is 110, ie KBDCTRL | KBDALT, ie AltGr (www.kbdedit.com/manual/low_level_modifiers.html#AltGr). The value "2" means that AltGr mappings will appear at index 2 in all aVkToWchX[] arrays (for X>=3).
The position 3 corresponds to Shift+Ctrl (= 011 = KBDSHIFT | KBDCTRL) - you see that this combination is to be found at aVkToWchX[4] (for X>=5).
If you open the German layout in KbdEdit, you will see that indeed AltGr is at position 2, and Shift+Ctrl at position 4 (zero based) - see attached screenshot.
Hope this helps.
Regards,
Ivica
Thanks Ivica!
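In code form, the lookup described in the email works roughly like this. This is only a sketch: it assumes the modifier bits KBDSHIFT = 1, KBDCTRL = 2, KBDALT = 4 (the convention the email uses) and treats 0x0F as the sentinel for an unassigned combination; ColumnForModifiers is my own name.

#include <cstdio>

typedef unsigned char  BYTE;
typedef unsigned short WORD;

// Modifier bits as the email uses them (assumed).
enum { KBDSHIFT = 1, KBDCTRL = 2, KBDALT = 4 };
const BYTE SHFT_INVALID = 0x0F; // assumed sentinel for an unassigned combination

// The German ModNumber table from the question, indexed by modifier bitmask.
static const BYTE ModNumber[] = { 0, 1, 3, 4, 15, 15, 2, 5 };
static const WORD wMaxModBits = 7;

// The modifier bitmask is the POSITION in ModNumber[]; the stored VALUE is
// the column index into the per-key aVkToWchX[] character arrays.
BYTE ColumnForModifiers(WORD modBits)
{
    if (modBits > wMaxModBits)
        return SHFT_INVALID;
    return ModNumber[modBits];
}

int main()
{
    printf("AltGr (Ctrl+Alt) -> column %d\n", (int)ColumnForModifiers(KBDCTRL | KBDALT));   // 2
    printf("Shift+Ctrl       -> column %d\n", (int)ColumnForModifiers(KBDSHIFT | KBDCTRL)); // 4
}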
I've never figured this out either :) The description in the kbd.h file didn't make much sense either, so I didn't bother to understand it.
In the kbd.h header it states:
 *                         CONTROL MENU SHIFT
 *                            ^     ^    ^
 *    aModification[] = {     |     |    |
 *        0,            //    0     0    0    = 000 (none)
I believe the modifiers should be:
001 = SHIFT
010 = ALT
100 = CTRL
I agree that the list seems illogical based on the German layout, and I reproduced it.
But I worked mostly on figuring out the connection between scan codes (physical placement) and virtual keys. Each virtual key has the modifiers included; that way you can iterate over all modifier combinations.
Since I'm Norwegian, I handled KBDNO.dll first, comparing it to MKLC: http://msdn.microsoft.com/en-us/goglobal/bb964665.aspx
Since you asked about the German keyboard I also compared it, and it seems to match. Same with US.
Check out my "virtual keyboard" on my site: http://lars.werner.no/?page_id=922
The CKLL class could help you a lot in achieving what you are trying to do. The class isn't perfect, so you will have to put in some hours to get it there. Look at the keyboard as an array of scan codes with attached virtual keys based on modifiers :)
Sorry for not being more helpful at the moment, but it has been a while since I last programmed. Too little spare time for my hobby!

SQLite Query for a character with a prefix and a suffix

Okay, actually I'm writing a program for parsing Japanese/Chinese text, but I will try to map it to an English example. No, I don't want to use it to create password lists :).
Suppose there is a text without spaces (the space is not used in most East Asian languages), like:
helloiamwritingproperenglish!
Given a specific character position in the text, like the r in proper:
helloiamwritingproperenglish!
^
so the text can be decomposed into prefix + 'r' + suffix.
Additionally there is a dictionary stored in SQLite containing character combinations (words) like:
sqlite> SELECT writingKey FROM dic_writings;
writingKey
----------
A, Aa, ...
I want to find all words in the dictionary that consist of the selected character 'r' together with a (possibly empty) tail of the prefix and a (possibly empty) head of the suffix, like:
sqlite> FindCandidates('helloiamwritingp', 'r', 'operenglish!');
R, Pro, Rope, Prop, Proper
A query to find all words in the input text could be:
SELECT * FROM dic_writings WHERE (text LIKE ('%'||writingKey||'%'));
but this approach is not very fast, and I still need to filter the words containing the selected 'r' (checking for 'r' alone is not enough, actually). Does anybody have an idea? Thank you for your time!
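For illustration, here is one way the enumeration could be done in C++ against SQLite (my own sketch, not from an answer: FindCandidates and kMaxWord are hypothetical names, and it assumes a table dic_writings(writingKey TEXT) that is indexed). Instead of scanning the table with LIKE, it enumerates every candidate word that contains the selected character, which is bounded by the maximum word length, and probes each one with an exact lookup:

#include <sqlite3.h>
#include <cstddef>
#include <string>
#include <vector>

// Sketch: find all dictionary words formed by a (possibly empty) tail of
// `prefix`, the selected character `c`, and a (possibly empty) head of
// `suffix`.
std::vector<std::string> FindCandidates(sqlite3* db,
                                        const std::string& prefix,
                                        char c,
                                        const std::string& suffix)
{
    const std::size_t kMaxWord = 8; // hypothetical longest dictionary word
    std::vector<std::string> hits;

    // NOCASE makes the capitalized English example match; it can be
    // dropped for Japanese/Chinese.
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db,
        "SELECT 1 FROM dic_writings WHERE writingKey = ?1 COLLATE NOCASE",
        -1, &stmt, nullptr);

    for (std::size_t pre = 0; pre <= prefix.size() && pre < kMaxWord; ++pre) {
        for (std::size_t suf = 0; suf <= suffix.size() && pre + 1 + suf <= kMaxWord; ++suf) {
            std::string word = prefix.substr(prefix.size() - pre) + c + suffix.substr(0, suf);
            sqlite3_bind_text(stmt, 1, word.c_str(), -1, SQLITE_TRANSIENT);
            if (sqlite3_step(stmt) == SQLITE_ROW)
                hits.push_back(word);
            sqlite3_reset(stmt);
            sqlite3_clear_bindings(stmt);
        }
    }
    sqlite3_finalize(stmt);
    return hits;
}

Each probe is an indexed exact-match lookup, so the cost per position is at most kMaxWord * kMaxWord small queries rather than a full-table '%...%' scan.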
