Use Python string as byte string

I have the byte value of a character stored in a string. Say the character is 'H', whose byte value is 72; my string is therefore "72".
How do I convert this string ("72") into the corresponding character ('H'), based on the byte value (72) it represents, using Python 3.6?
Pseudocode:
s = "72"  # the byte value, stored as a string
print(decode_as_byte_value(s))  # decode_as_byte_value is the function I am looking for
Expected result:
H

ord('H')  # 72
chr(72)   # 'H'
It's as simple as that. Remember that chr() only takes an int and ord() only takes a one-character str, so convert your string first: chr(int("72")) gives 'H'.
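Here is a runnable version of the question's pseudocode, written as a minimal sketch. decode_as_byte_value is the asker's hypothetical name, not a built-in. The string "72" has to be turned into an int with int() before it reaches chr().

```python
def decode_as_byte_value(s: str) -> str:
    """Turn a decimal byte value stored in a string (e.g. "72") into its character."""
    value = int(s)     # "72" -> 72; raises ValueError for non-numeric text
    return chr(value)  # 72 -> 'H'; chr maps a code point to a one-character str


s = "72"
print(decode_as_byte_value(s))       # H
print(ord(decode_as_byte_value(s)))  # 72: ord() reverses chr()
```

If the value must be a real byte (0 to 255) rather than any code point, bytes([int(s)]) builds a one-byte bytes object instead and raises ValueError for anything outside that range.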

Please do not use this community for such syntax-based questions.
Still, your answer is:
# Get the code point of a character (its ASCII number, for ASCII characters)
number = ord(char)
# Get the character for a given code point / ASCII number
char = chr(number)
If this answers your question, please mark this response as accepted.

Related

Go string appears shorter than its first rune

I was running some fuzzing on my code and it found a bug. I have reduced it down to the following code snippet and I cannot see what is wrong.
Given the string
s := string("\xc0")
len(s) returns 1. However, if you loop through the string, the first rune has length 3.
for _, r := range s {
	fmt.Println("len of rune:", utf8.RuneLen(r)) // Will print 3
}
My assumptions are:
len(string) is returning the number of bytes in the string
utf8.RuneLen(r) is returning the number of bytes in the rune
I assume I am misunderstanding something, but how can the length of a string be less than the length of one of its runes?
Playground here: https://go.dev/play/p/SH3ZI2IZyrL
The explanation is simple: your input is not a valid UTF-8 encoded string.
fmt.Println(utf8.ValidString(s))
This outputs: false.
A for range loop over a string iterates over its runes, but if it encounters an invalid UTF-8 sequence, r is set to the Unicode replacement character 0xFFFD. From the Spec: For statements:
For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
This applies to your case: r is 0xFFFD, which takes 3 bytes in UTF-8 encoding.
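The same byte counts can be cross-checked from Python. Its UTF-8 codec with errors="replace" also substitutes U+FFFD for the invalid byte 0xC0. This only confirms the numbers; it is not a model of Go's range loop.

```python
invalid = b"\xc0"  # the same single byte as the Go string "\xc0"
print(len(invalid))  # 1: one byte, like len(s) in Go

# Strict decoding fails, the analogue of utf8.ValidString(s) == false.
try:
    invalid.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")

# With errors="replace", the invalid byte becomes U+FFFD, as r does in Go.
decoded = invalid.decode("utf-8", errors="replace")
print(decoded == "\ufffd")  # True

# U+FFFD needs 3 bytes in UTF-8 (EF BF BD): the 3 that utf8.RuneLen reported.
print(len(decoded.encode("utf-8")))  # 3
```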
If you instead use a valid string holding the rune '\xc0' (U+00C0):
s = string([]rune{'\xc0'})
Then the output is:
len of s: 2
runes in s: 1
len of rune: 2
UTF-8 bytes of s: [195 128]
Hexa UTF-8 bytes of s: c3 80
Try it on the Go Playground.
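For comparison, encoding the code point U+00C0 in Python gives the same two UTF-8 bytes as the Go output above. The hex line is built with a join instead of bytes.hex(sep), because the sep argument needs Python 3.8 or later.

```python
s = chr(0xC0)             # the single code point U+00C0
data = s.encode("utf-8")  # its UTF-8 encoding

print(len(data))          # 2: like "len of s: 2"
print(len(s))             # 1: like "runes in s: 1"
print(list(data))         # [195, 128]
print(" ".join(f"{b:02x}" for b in data))  # c3 80
```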

Padding a hexadecimal string with zeros to a 6 character length

I have this:
function dec2hex(IN)
    local OUT
    OUT = string.format("%x", IN)
    return OUT
end
and I need the result padded with leading zeros to a length of 6 characters.
I can't use String.Utils or PadLeft. It's within an app called Watchmaker, which uses a cut-down version of Lua.
String formats in Lua work mostly just like in C. So to pad a number with zeros, put a 0 flag and a width before the conversion, as in %0nx, where n is the minimum number of digits. For example
print(string.format("%06x", 16^4-1))
will print 00ffff.
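Python's printf-style % operator uses the same C width and zero-flag rules, so the padding looks the same there. The dec2hex below is only an illustrative Python port of the question's function.

```python
def dec2hex(value: int) -> str:
    """Format value as lowercase hex, zero-padded to at least 6 digits."""
    return "%06x" % value  # same spec as string.format("%06x", value) in Lua


print(dec2hex(16**4 - 1))       # 00ffff
print(format(0xffff, "06x"))    # 00ffff: the format() spelling of the same spec
print(dec2hex(0x1234567))       # 1234567: the width is a minimum, nothing is cut off
```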
See chapter 20 The String Library of “Programming in Lua”, the reference of string.format, and the C reference for the printf family of functions for details.
If you store your format string in a local variable, or wrap a literal in parentheses, you can call format as a method on it. The example from Henri's answer then becomes ("%06x"):format(0xffff):
print(("%06x"):format(0xffff)) -- Prints `00ffff`
You can also write number literals in hexadecimal, such as 0xffff, the same as in C.

What determines the position of a character when looping through UTF-8 strings?

I am reading the section on for statements in the Effective Go documentation and came across this example:
for pos, char := range "日本\x80語" {
	fmt.Printf("Character %#U, at position: %d\n", char, pos)
}
The output is:
Character U+65E5 '日', at position: 0
Character U+672C '本', at position: 3
Character U+FFFD '�', at position: 6
Character U+8A9E '語', at position: 7
What I don't understand is why the positions are 0, 3, 6, and 7. This tells me the first and second characters are 3 bytes long and the 'replacement rune' (U+FFFD) is 1 byte long, which I accept and understand. However, I thought rune was an int32 type, so each rune would be 4 bytes, not 3.
Why do the positions in a range differ from the amount of memory each value should consume?
string values in Go are stored as read-only byte slices ([]byte), where the bytes are the UTF-8 encoded bytes of the runes of the string. UTF-8 is a variable-length encoding: different Unicode code points may be encoded using a different number of bytes. For example, values in the range 0..127 are encoded as a single byte (whose value is the Unicode code point itself), but values greater than 127 use more than 1 byte. The unicode/utf8 package contains UTF-8 related utility functions and constants. For example, utf8.UTFMax reports the maximum number of bytes a valid Unicode code point may occupy in UTF-8 encoding (which is 4).
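Here is a small Python check of how many UTF-8 bytes a few sample code points take. It makes the variable-length encoding concrete, and the largest size matches utf8.UTFMax (4). The sample characters are arbitrary.

```python
samples = ["A", "\u00c0", "\u65e5", "\U0001F600"]  # 'A', U+00C0, '日', an emoji

for ch in samples:
    n = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {n} byte(s) in UTF-8")

# Code points 0..127 are a single byte equal to the code point itself.
print("A".encode("utf-8") == bytes([ord("A")]))  # True
```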
One thing to note here: not all possible byte sequences are valid UTF-8 sequences. A string may hold any byte sequence, even one that is invalid UTF-8. For example, the string value "\xff" represents an invalid UTF-8 byte sequence; for details, see How do I represent an Optional String in Go?
The for range construct, when applied to a string value, iterates over the runes of the string:
For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
The for range construct may produce 1 or 2 iteration values. When using 2, as in your example:
for pos, char := range "日本\x80語" {
	fmt.Printf("Character %#U, at position: %d\n", char, pos)
}
For each iteration, pos will be the byte index of the rune (character), and char will be the rune itself. As the quote above says, when an invalid UTF-8 sequence is encountered, char will be 0xFFFD (the Unicode replacement character), and the iteration will advance a single byte only.
To sum it up: The position is always the byte index of the rune of the current iteration (or more specifically: the byte index of the first byte of the UTF-8 encoded sequence of the rune of the current iteration), but if invalid UTF-8 sequence is encountered, the position (index) will only be incremented by 1 in the next iteration.
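The rule summed up above can be imitated in Python. range_string is a hypothetical helper, not a library function. It walks the UTF-8 bytes and reports each code point together with the byte index where it starts. On an invalid sequence it yields U+FFFD and advances one byte, following the Go rule. Run on the bytes of "日本\x80語", it reproduces the positions 0, 3, 6 and 7.

```python
def range_string(data: bytes):
    """Yield (byte_index, code_point) pairs, mimicking Go's `for i, r := range s`."""
    i = 0
    while i < len(data):
        for n in range(1, 5):  # a valid UTF-8 sequence is 1 to 4 bytes long
            try:
                ch = data[i:i + n].decode("utf-8")
            except UnicodeDecodeError:
                continue  # incomplete or invalid so far: try one more byte
            yield i, ord(ch)  # shortest valid slice = exactly one code point
            i += n
            break
        else:
            # No valid sequence starts here: report U+FFFD and advance one byte.
            yield i, 0xFFFD
            i += 1


data = "日本".encode("utf-8") + b"\x80" + "語".encode("utf-8")
for pos, cp in range_string(data):
    print(f"Character U+{cp:04X}, at position: {pos}")
```

Trying the shortest slice first matters. A slice that decodes as two characters would have had a shorter valid prefix, and that prefix would have been accepted earlier.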
A must-read blog post if you want to know more about the topic:
The Go Blog: Strings, bytes, runes and characters in Go
A rune is a code point, and a code point is just an integer. You could even store it in an int64 if you wanted to. (But Unicode has only 1,114,112 code points, so int32 is the right choice; no wonder rune is an alias for int32 in Go.)
Different encoding schemes encode code points in different ways. For example, a CJK character is usually encoded as 3 bytes in UTF-8 and as 2 bytes in UTF-16.
String literals in Go are UTF-8 encoded, because Go source code is UTF-8. Byte escapes such as \x80 can still put invalid UTF-8 into a string, as the example in the question shows.
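Here is a quick Python check of the last points, using '日' (U+65E5) as the sample CJK character. The code point is one small integer, while its encoded size depends on the scheme. The utf-16-le codec is used so that no byte-order mark is added.

```python
import sys

ch = "日"
print(hex(ord(ch)))                 # 0x65e5: the code point is just an integer
print(len(ch.encode("utf-8")))      # 3 bytes in UTF-8
print(len(ch.encode("utf-16-le")))  # 2 bytes in UTF-16 (the -le codec adds no BOM)

# The largest code point is U+10FFFF, so there are 0x110000 == 1,114,112 of them,
# which fits easily in a 32-bit integer such as Go's rune (int32).
print(sys.maxunicode == 0x10FFFF, 0x110000)  # True 1114112
```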

How can I convert a character code to a string character in Lua?

How can I convert a character code to a string character in Lua?
E.g.
d = 48
-- this is what I want
str_d = "0"
You are looking for string.char:
string.char (···)
Receives zero or more integers. Returns a string with length equal to the number of arguments, in which each character has the internal numerical code equal to its corresponding argument.
Note that numerical codes are not necessarily portable across platforms.
For your example:
local d = 48
local str_d = string.char(d) -- str_d == "0"
For ASCII characters, you can use string.char.
For UTF-8 strings, you can use utf8.char (introduced in Lua 5.3) to get the UTF-8 encoded character for a code point.
print(utf8.char(48)) -- 0
print(utf8.char(29790)) -- 瑞
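Python offers a rough analogue of the two Lua functions. bytes([...]) plays the role of string.char, producing one raw byte per argument. chr(cp).encode("utf-8") produces what utf8.char returns: the UTF-8 byte sequence for a code point. This mapping is an analogy, not an equivalence, because Python strings are sequences of code points rather than bytes.

```python
d = 48
print(bytes([d]))        # b'0': one raw byte, like string.char(48)
print(bytes([72, 105]))  # b'Hi': several arguments, several bytes

cp = 29790                            # U+745E
utf8_bytes = chr(cp).encode("utf-8")
print(len(utf8_bytes))                # 3: utf8.char(29790) is likewise a 3-byte string
print(utf8_bytes.decode("utf-8") == "\u745e")  # True
```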

transform string/char to uint8

Why does the expression:
test = cast(strtrim('3'), 'uint8')
produce 51?
This is also true for:
test = cast(strtrim('3'), 'int8')
Thanks.
Because 51 is the ASCII code for the character '3'.
If you want to convert the string to the number 3, you should use
uint8(str2double('3'))
Note that str2double ignores trailing spaces, so strtrim isn't necessary.
EDIT
When a string is used in a numeric operation, MATLAB automatically converts its characters to their ASCII values. For example
>> '1'+1
ans =
50
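Python draws the same distinction explicitly, which may help readers coming from there. ord() gives the character code that MATLAB's cast returns, and int() parses the text the way str2double does. Python never converts a character to its code implicitly, so '1' + 1 raises a TypeError instead of returning 50.

```python
print(ord("3"))      # 51: the character code, what cast('3', 'uint8') returns
print(int("3"))      # 3: the parsed number, like str2double('3')
print(int(" 3 "))    # 3: int() also tolerates surrounding whitespace
print(ord("1") + 1)  # 50: the explicit version of MATLAB's '1'+1

try:
    "1" + 1
except TypeError:
    print("TypeError: no implicit char-to-number conversion in Python")
```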
Because 51 is the ASCII value for the character '3'.
This is because MATLAB treats '3' as a character. By casting it to a signed or unsigned integer (8 bits in this case), you are asking MATLAB to convert the character '3' to its numeric (ASCII) code, which is 51. If you want to look at more conversions, here is a basic document.
