Simply, what is a way to print an ASCII character from binary values in Python 3?

For example if I had the value '00010010' how would a simple function just print it as "H"?
Other answers seem to be rather complicated or don't work at all.

You can use the chr and ord functions to convert between numbers and characters. In this case, given a binary number, you'll also need to call int with a base of 2 to convert the binary string to a Python integer.
For example:
>>> chr(int("00010010", 2))
'\x12'
This gives the ASCII character for the given input. Note that the binary "00010010" (decimal 18) does not correspond to an "H" character in ASCII; the value of "H" can be found with the ord function:
>>> bin(ord("H"))
'0b1001000'
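The two conversions above can be wrapped in small helpers. This is only a sketch; the names bits_to_char and char_to_bits and the default width of 8 are assumptions made for illustration, not part of the original answer.

```python
def bits_to_char(bits: str) -> str:
    """Return the character whose code point is the given binary string."""
    return chr(int(bits, 2))


def char_to_bits(ch: str, width: int = 8) -> str:
    """Return the code point of ch as a zero-padded binary string."""
    return format(ord(ch), f"0{width}b")


print(bits_to_char("01001000"))  # H
print(char_to_bits("H"))         # 01001000
```

Padding to 8 digits is what turns bin's '0b1001000' into the byte-style "01001000".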

Related

Compare Unicode code point range in Python3

I would like to check if a character is in a certain Unicode range or not, but seems I cannot get the expected answer.
char = "？" # the unicode value is 0xff1f
print(hex(ord(char)))
if hex(ord(char)) in range(0xff01, 0xff60):
    print("in range")
else:
    print("not in range")
It should print: "in range", but the results show: "not in range". What have I done wrong?
hex() returns a string. To compare integers you should simply use ord:
if ord(char) in range(0xff01, 0xff60):
You could've also written:
if 0xff01 <= ord(char) < 0xff60:
In general for such problems, you can try inspecting the types of your variables.
Writing 0xff01 without quotes gives you a number.
list(range(0xff01, 0xff60)) will give you a list of integers [65281, 65282, .., 65375]. range(0xff01, 0xff60) == range(65281, 65376) evaluates to True.
ord('？') gives you the integer 65311.
hex() takes an integer and converts it to a string such as '0xff1f'.
So you only need ord(); there is no need to call hex() on the result.
Just use ord:
if ord(char) in range(0xff01, 0xff60):
...
hex is not needed.
As mentioned in the docs:
Convert an integer number to a lowercase hexadecimal string prefixed with “0x”.
That already explains the problem: the result is a string, not the integer we want.
Whereas the ord function does what we want, as mentioned in the docs:
Given a string representing one Unicode character, return an integer representing the Unicode code point of that character. For example, ord('a') returns the integer 97 and ord('€') (Euro sign) returns 8364. This is the inverse of chr().
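As a small sketch of the point made in these answers, the check can be written as a helper that compares integers only. The function name code_point_in_range is hypothetical, and the character is written as the escape "\uff1f" so that it survives copy and paste.

```python
def code_point_in_range(ch: str, start: int, stop: int) -> bool:
    """True if the code point of ch lies in the half-open range [start, stop)."""
    return start <= ord(ch) < stop


char = "\uff1f"  # the character from the question, code point 0xff1f

print(code_point_in_range(char, 0xFF01, 0xFF60))  # True
# The original bug: a str is never equal to any int in the range.
print(hex(ord(char)) in range(0xFF01, 0xFF60))    # False
```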

Cannot decode using utf-8 after encoding with utf-8

In one situation I had to store data as utf-8, and now when I want to fetch the data and decode('utf-8') it, it simply does not work. Consider the line below as an example:
\x0d\x0a\xd8\xb3\xd8\xa7\xd9\x82\xdb\x8c\xe2\x80\x8c\xd9\x86\xd8\xa7\xd9\x85\xd9\x87
You can simply copy the line below to convert the string above to the human readable format:
b"\x0d\x0a\xd8\xb3\xd8\xa7\xd9\x82\xdb\x8c\xe2\x80\x8c\xd9\x86\xd8\xa7\xd9\x85\xd9\x87".decode("utf-8")
However, I could not find a way to convert the string to a bytestring without corrupting it. I tried the following methods, but all of them failed:
.decode("utf-8")
.decode()
.bytes()
Up until this point I could not find a solution on SO or in other places. I appreciate any help.
\x0d\x0a\xd8\xb3\xd8\xa7\xd9\x82\xdb\x8c\xe2\x80\x8c\xd9\x86\xd8\xa7\xd9\x85\xd9\x87
b'\x0d\x0a\xd8\xb3\xd8\xa7\xd9\x82\xdb\x8c\xe2\x80\x8c\xd9\x86\xd8\xa7\xd9\x85\xd9\x87'
The above lines (both given in the question) are particular instances of String and Bytes literals (respectively):
\xhh Character with hex value hh (2, 3)
2 Unlike in Standard C, exactly two hex digits are required.
3 In a bytes literal, hexadecimal and octal escapes denote the byte with the given value. In a string literal, these escapes denote a Unicode character with the given value.
Let's check the string defined in such a way (inside Python prompt):
>>> xstr = "\x0d\x0a\xd8\xb3\xd8\xa7\xd9\x82\xdb\x8c\xe2\x80\x8c\xd9\x86\xd8\xa7\xd9\x85\xd9\x87"
>>> xstr
'\r\nساÙ\x82Û\x8câ\x80\x8cÙ\x86اÙ\x85Ù\x87'
>>> print( xstr)
ساÙÛâÙاÙ
Ù
>>>
The print(xstr) output does not resemble a word in any known language. However, all of its characters belong (by definition) to the Unicode range r'[\u0000-\u00ff]', i.e. the first 256 Unicode code points, and voila: that is exactly iso-8859-1, aka 'latin1'.
We need to get an encoded version of the xstr string as a bytes object, e.g. using the str.encode method or the built-in bytes() function, and then decode those bytes as UTF-8 (the default for decode()):
print(bytes(xstr, 'latin1').decode())
print(xstr.encode('latin1').decode())
ساقی‌نامه
ساقی‌نامه
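The round trip described above can be sketched as a small function. The name recover_utf8 is hypothetical, and the sketch assumes every character of the input is below U+0100, which is what makes the latin1 encoding step lossless.

```python
def recover_utf8(mojibake: str) -> str:
    """Reinterpret a str whose characters are really UTF-8 byte values.

    Assumes every character is in U+0000..U+00FF; encoding as latin1
    then maps each character back to the single byte it stands for.
    """
    raw = mojibake.encode("latin1")  # one character -> one byte
    return raw.decode("utf-8")       # bytes -> the intended text


xstr = "\x0d\x0a\xd8\xb3\xd8\xa7\xd9\x82\xdb\x8c\xe2\x80\x8c\xd9\x86\xd8\xa7\xd9\x85\xd9\x87"
text = recover_utf8(xstr)
print(text)
# Encoding the recovered text as UTF-8 gives back the stored bytes.
print(text.encode("utf-8") == xstr.encode("latin1"))  # True
```

If the input contains a character above U+00FF, encode("latin1") raises UnicodeEncodeError, which is a useful signal that the string was not damaged in this particular way.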

Padding a hexadecimal string with zeros to a 6 character length

I have this:
function dec2hex(IN)
    local OUT
    OUT = string.format("%x",IN)
    return OUT
end
and need the result for IN padded with zeros to a string length of 6.
I can't use String.Utils or PadLeft. It's within an app called Watchmaker which uses a cut down version of Lua.
String formats in Lua work mostly just like in C. So to pad a number with zeros, just use %0n before the conversion character (x for hexadecimal), where n is the number of places. For example
print(string.format("%06x", 16^4-1))
will print 00ffff.
See chapter 20 The String Library of “Programming in Lua”, the reference of string.format, and the C reference for the printf family of functions for details.
You can also call format as a method on the format string itself, so @Henri's example becomes ("%06x"):format(0xffff):
print(("%06x"):format(0xffff)) -- Prints `00ffff`
You can also write number literals in hexadecimal, such as 0xffff, the same as in C.

Is there a built-in in Python 3 that checks whether a character is a "basic" algebraic symbol?

I know the string methods str.isdigit, str.isdecimal and str.isnumeric.
I'm looking for a built-in method that checks if a character is algebraic, meaning that it can be found in a declaration of a decimal number.
The above mentioned methods return False for '-1' and '1.0'.
I can use isdigit to retrieve a positive integer from a string:
string = 'number=123'
number = ''.join([d for d in string if d.isdigit()]) # returns '123'
But that doesn't work for negative integers or floats.
Imagine a method called isnumber that works like this:
def isnumber(s):
    for c in s:
        if c not in list('.+-0123456789'):
            return False
    return True
string1 = 'number=-1'
string2 = 'number=0.1'
number1 = ''.join([d for d in string1 if d.isnumber()]) # returns '-1'
number2 = ''.join([d for d in string2 if d.isnumber()]) # returns '0.1'
The idea is to test against a set of "basic" algebraic characters. The string does not have to contain a valid Python number. It could also be an IP address like 255.255.0.1.
Does a handy built-in that works approximately like that exist?
If not, why not? It would be much more efficient than a Python function and very useful. I've seen a lot of examples on Stack Overflow that use str.isdigit() to retrieve a positive integer from a string. Is there a reason why there isn't a built-in like that, although there are three different methods that do almost the same thing?
No such function exists. There are a bunch of odd characters that can be part of number literals in Python, such as o, x and b in the prefix of integers of non-decimal bases, and e to introduce the exponential part of a float. I think those plus the hex digits (0-9 and A-F) and sign characters and the decimal point are all you need.
You can put together a string with the right characters yourself and test against it (hexdigits already contains b and e in both cases):
from string import hexdigits
num_literal_chars = hexdigits + "oxOX.+-"
That will get a bunch of garbage though if you use it to test against mixed text and numbers:
string1 = "foo. bar. 0xDEADBEEF 10.0.0.1"
print("".join(c for c in string1 if c in num_literal_chars))
# prints "foo.ba.0xDEADBEEF10.0.0.1"
The fact that it gives you a bunch of junk is probably why no builtin function exists to do this. If you want to match a certain kind of number out of a string, write an appropriate regular expression to match that specific kind of number. Don't try to do it character-by-character, or try to match all the different kinds of Python numbers.
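As a sketch of the regular expression approach recommended above, the pattern below matches one specific kind of number: an optional sign, digits, and an optional fractional part. The pattern and the name extract_decimals are assumptions chosen for the examples in the question, not a general number parser.

```python
import re

# Assumed shape: optional sign, one or more digits, optional ".digits".
DECIMAL_RE = re.compile(r"[+-]?\d+(?:\.\d+)?")


def extract_decimals(text: str) -> list[str]:
    """Return every substring of text that looks like a signed decimal."""
    return DECIMAL_RE.findall(text)


print(extract_decimals("number=123"))  # ['123']
print(extract_decimals("number=-1"))   # ['-1']
print(extract_decimals("number=0.1"))  # ['0.1']
```

An IP address such as 10.0.0.1 would be split into '10.0' and '0.1' by this pattern, so a different kind of value needs its own expression.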

Erlang howto make a list from this binary <<"a,b,c">>

I have a binary <<"a,b,c">> and I would like to extract the information from this binary.
So I would like to have something like A=a, B=b and so on.
I need a general approach on this because the binary string always changes.
So it could be <<"aaa","bbb","ccc">>...
I tried to generate a list
erlang:binary_to_list(<<"a","b","c">>)
but I get a string as a result.
"abc"
Thank you.
You did use the right method.
binary_to_list(Binary) -> [char()]
Returns a list of integers which correspond to the bytes of Binary.
There is no string type in Erlang: http://www.erlang.org/doc/reference_manual/data_types.html#id63119. The console just displays the lists in string representation as a courtesy, if all elements are in printable ASCII range.
You should read Erlang's "Bit Syntax Expressions" documentation to understand how to work on binaries.
Do not convert the whole binary into a list if you don't need it in list representation!
To extract the first three bytes you could use
<<A, B, C, Rest/binary>> = <<"aaa","bbb","ccc">>.
If you want to iterate over the binary data, you can use binary comprehension.
<< <<(F(X))>> || <<X>> <= <<"aaa","bbb","ccc">> >>.
Pattern matching is possible, too:
test(<<A, Tail/binary>>, Accu) -> test(Tail, Accu+A);
test(_, Accu) -> Accu.
882 = test(<<"aaa","bbb","ccc">>, 0).
Pattern matching even works for reading one UTF-8 character at a time. So to convert a binary UTF-8 string into Erlang's "list of codepoints" format, you could use:
test(<<A/utf8, Tail/binary>>, Accu) -> test(Tail, [A|Accu]);
test(_, Accu) -> lists:reverse(Accu).
[97,97,97,600,99,99,99] = test(<<"aaa", 16#0258/utf8, "ccc">>, "").
(Note that <<"aaa","bbb","ccc">> = <<"aaabbbccc">>. Don't actually use the last code snippet but the linked method.)
