Compare Unicode code point range in Python3


I would like to check whether a character is in a certain Unicode range, but I cannot seem to get the expected answer.
char = "？" # the unicode value is 0xff1f
print(hex(ord(char)))
if hex(ord(char)) in range(0xff01, 0xff60):
    print("in range")
else:
    print("not in range")
It should print: "in range", but the results show: "not in range". What have I done wrong?

hex() returns a string. To compare integers you should simply use ord:
if ord(char) in range(0xff01, 0xff60):
You could've also written:
if 0xff01 <= ord(char) < 0xff60:
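As a minimal sketch, the chained comparison can be wrapped in a small helper. The function name in_code_point_range is hypothetical and not part of the original answer; the bounds are the ones from the question.

```python
def in_code_point_range(char, start, stop):
    """Return True if char's code point lies in the half-open range [start, stop)."""
    return start <= ord(char) < stop


# Same bounds as in the question: 0xff01 inclusive, 0xff60 exclusive
print(in_code_point_range("？", 0xFF01, 0xFF60))  # True  (fullwidth question mark, 0xff1f)
print(in_code_point_range("?", 0xFF01, 0xFF60))   # False (ASCII question mark, 0x3f)
```

The comparison works directly on integers, so no range object is needed.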

In general for such problems, you can try inspecting the types of your variables.
Typing 0xff01 without quotes gives you a number (an int literal).
list(range(0xff01, 0xff60)) will give you a list of integers [65281, 65282, .., 65375]. range(0xff01, 0xff60) == range(65281, 65376) evaluates to True.
ord('？') gives you the integer 65311.
hex() takes an integer and returns a string; for example, hex(65281) returns '0xff01'.
So, you simply need to use ord(), no need to hex() it.
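The type inspection suggested above might look like the following sketch. The variable names code_point and as_hex are only illustrative.

```python
char = "？"  # fullwidth question mark, code point 0xff1f

code_point = ord(char)    # an int
as_hex = hex(code_point)  # a str

print(type(code_point).__name__, code_point)  # int 65311
print(type(as_hex).__name__, as_hex)          # str 0xff1f

# A string never equals any integer in the range, so this membership test is False
print(as_hex in range(0xFF01, 0xFF60))      # False
print(code_point in range(0xFF01, 0xFF60))  # True
```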

Just use ord:
if ord(char) in range(0xff01, 0xff60):
    ...
hex is not needed.
As mentioned in the docs:
Convert an integer number to a lowercase hexadecimal string prefixed with “0x”.
That already explains the problem: hex returns a string, not the integer we want.
Whereas the ord function does what we want, as mentioned in the docs:
Given a string representing one Unicode character, return an integer representing the Unicode code point of that character. For example, ord('a') returns the integer 97 and ord('€') (Euro sign) returns 8364. This is the inverse of chr().

Related

Ocaml String.sub: How to get all characters until end of string

So I'm using the String.sub function and I'm wondering if there's any way to get all the characters in a string from a given index until the end of the string. String.sub takes a string, an int (the index where to start taking chars), and the number of chars to take, so I'm not sure what the easiest way is to say "all remaining chars", since the count could be arbitrarily large.

For String.sub you just have to subtract from the length of the string. There's no simpler way, i.e., there's no value for length that has a special meaning. A value larger than the remainder of the string is an error in OCaml (which tends to be strict when checking parameters).
Assume i is >= 0 and < length of the string:
String.sub s i (String.length s - i)
You can use Str.last_chars, but you still need to know how many characters you want. I.e., you still have to subtract from the length of the string.
Str.last_chars s (String.length s - i)

Is there a built-in in Python 3 that checks whether a character is a "basic" algebraic symbol?

I know the string methods str.isdigit, str.isdecimal and str.isnumeric.
I'm looking for a built-in method that checks if a character is algebraic, meaning that it can be found in a declaration of a decimal number.
The above mentioned methods return False for '-1' and '1.0'.
I can use isdigit to retrieve a positive integer from a string:
string = 'number=123'
number = ''.join([d for d in string if d.isdigit()]) # returns '123'
But that doesn't work for negative integers or floats.
Imagine a method called isnumber that works like this:
def isnumber(s):
    for c in s:
        if c not in list('.+-0123456789'):
            return False
    return True
string1 = 'number=-1'
string2 = 'number=0.1'
number1 = ''.join([d for d in string1 if d.isnumber()]) # returns '-1'
number2 = ''.join([d for d in string2 if d.isnumber()]) # returns '0.1'
The idea is to test against a set of "basic" algebraic characters. The string does not have to contain a valid Python number. It could also be an IP address like 255.255.0.1.
Does a handy built-in that works approximately like that exist?
If not, why not? It would be much more efficient than a Python function and very useful. I've seen a lot of examples on Stack Overflow that use str.isdigit() to retrieve a positive integer from a string. Is there a reason why there isn't a built-in like that, although there are three different methods that do almost the same thing?

No such function exists. There are a bunch of odd characters that can be part of number literals in Python, such as o, x and b in the prefix of integers of non-decimal bases, and e to introduce the exponential part of a float. I think those plus the hex digits (0-9 and A-F) and sign characters and the decimal point are all you need.
You can put together a string with the right characters yourself and test against it:
from string import hexdigits
num_literal_chars = hexdigits + "oxOX.+-"
That will get a bunch of garbage though if you use it to test against mixed text and numbers:
string1 = "foo. bar. 0xDEADBEEF 10.0.0.1"
print("".join(c for c in string1 if c in num_literal_chars))
# prints "foo.ba.0xDEADBEEF10.0.0.1"
The fact that it gives you a bunch of junk is probably why no builtin function exists to do this. If you want to match a certain kind of number out of a string, write an appropriate regular expression to match that specific kind of number. Don't try to do it character-by-character, or try to match all the different kinds of Python numbers.
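A rough sketch of the regular-expression approach follows. The two patterns and their names (SIGNED_DECIMAL, IPV4_LIKE) are assumptions chosen for illustration, not a complete grammar for Python numbers, and the second one does not check that each octet is at most 255.

```python
import re

# Hypothetical patterns for illustration; adjust them to the number format you need.
SIGNED_DECIMAL = re.compile(r"[-+]?\d+(?:\.\d+)?")      # e.g. -1, 0.1, +42
IPV4_LIKE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")  # four dot-separated digit groups

print(SIGNED_DECIMAL.findall("number=-1"))   # ['-1']
print(SIGNED_DECIMAL.findall("number=0.1"))  # ['0.1']
print(IPV4_LIKE.findall("foo. bar. 0xDEADBEEF 10.0.0.1"))  # ['10.0.0.1']
```

Because each pattern describes one specific kind of number, the surrounding text ("foo.", "0xDEADBEEF") no longer leaks into the result.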

Convert a string into an integer of its ascii values

I am trying to write a function that takes a string txt and returns an int built from the ASCII codes of that string's characters. It also takes a second argument, n, an int that specifies the number of digits each character should translate to. The default value of n is 3. n is always >= 3 and the input string is always non-empty.
Example outputs:
string_to_number('fff')
102102102
string_to_number('ABBA', n = 4)
65006600660065
My current strategy is to split txt into its characters by converting it into a list. Then, I convert the characters into their ord values and append these to a new list. I then try to combine the elements of this new list into a number (e.g. I would go from ['102', '102', '102'] to ['102102102']). Then I try to convert the first (and only) element of this list into an integer. My current code looks like this:
def string_to_number(txt, n=3):
    characters = list(txt)
    ord_values = []
    for character in characters:
        ord_values.append(ord(character))
    joined_ord_values = ''.join(ord_values)
    final_number = int(joined_ord_values[0])
    return final_number
The issue is that I get a TypeError. I can write code that successfully returns the integer for a single-character string, but for strings with more than one character I can't because of this error. Is there any way of fixing this? Thank you, and apologies if this is quite long.

Try this:
def string_to_number(text, n=3):
    return int(''.join('{:0>{}}'.format(ord(c), n) for c in text))
print(string_to_number('fff'))
print(string_to_number('ABBA', n=4))
Output:
102102102
65006600660065
Edit: without a comprehension, as the OP asked in a comment
def string_to_number(text, n=3):
    l = []
    for c in text:
        l.append('{:0>{}}'.format(ord(c), n))
    return int(''.join(l))
Useful link(s):
string formatting in python: contains pretty much everything you need to know about string formatting in python

The join method expects an iterable of strings, so you'll need to convert your ASCII codes into strings. This almost gets it done:
ord_values.append(str(ord(character)))
except that it doesn't respect your number-of-digits requirement.
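One possible way to also respect the number-of-digits requirement, sketched here as an assumption rather than as this answer's own solution, is to pad each converted code with str.zfill:

```python
def string_to_number(txt, n=3):
    ord_values = []
    for character in txt:
        # str() makes the code joinable; zfill() left-pads it with zeros to n digits
        ord_values.append(str(ord(character)).zfill(n))
    return int(''.join(ord_values))


print(string_to_number('fff'))        # 102102102
print(string_to_number('ABBA', n=4))  # 65006600660065
```

Note that int() drops the leading zero of the first padded code, which is why the second result starts with 65 rather than 0065.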

Simply, what is a way to print ascii character from binary values in python3?

For example if I had the value '00010010' how would a simple function just print it as "H"?
Other answers seem to be rather complicated or don't work at all.

You can use the built-in chr and ord functions to convert between numbers and characters. In this case, given a binary number, you'll also need to call int with a base of 2 to convert the binary string to a Python integer.
For example:
>>> chr(int("00010010", 2))
'\x12'
This gives the ascii character of the given input. Note that the binary "00010010" does not correspond to a "H" character in ASCII; the value of "H" can be found with the ord function:
>>> bin(ord("H"))
'0b1001000'
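Putting both directions together, a minimal sketch could look like this. The helper names binary_to_char and char_to_binary are hypothetical, and the 8-bit width is an assumption matching the question's input format.

```python
def binary_to_char(bits):
    """Interpret a string of 0s and 1s as a code point and return its character."""
    return chr(int(bits, 2))


def char_to_binary(char, width=8):
    """Return the character's code point as a zero-padded binary string."""
    return format(ord(char), "0{}b".format(width))


print(binary_to_char("01001000"))  # H
print(char_to_binary("H"))         # 01001000
```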

transform string/char to uint8

Why does the expression:
test = cast(strtrim('3'), 'uint8')
produce 51?
This is also true for:
test = cast(strtrim('3'), 'int8')
Thanks.

Because 51 is the ASCII code for the character '3'.
If you want to transform the string to numeric 3, you should use
uint8(str2double('3'))
Note that str2double will ignore trailing spaces, so that strtrim isn't necessary.
EDIT
When a string is used in a numeric operation, MATLAB automatically converts it to its ASCII value. For example
>> '1'+1
ans =
50

Because 51 is the ASCII value for the character '3'.

This is because MATLAB treats '3' as an ASCII character. By casting to a signed or unsigned integer (8 bits in this case) you are asking MATLAB to convert the ASCII character '3' to its numeric code, which is 51 in decimal. If you want to look at more conversions here is a basic document.
