Is there a built-in in Python 3 that checks whether a character is a "basic" algebraic symbol?

I know the string methods str.isdigit, str.isdecimal and str.isnumeric.
I'm looking for a built-in method that checks if a character is algebraic, meaning that it can be found in a declaration of a decimal number.
The above mentioned methods return False for '-1' and '1.0'.
I can use isdigit to retrieve a positive integer from a string:
string = 'number=123'
number = ''.join([d for d in string if d.isdigit()]) # returns '123'
But that doesn't work for negative integers or floats.
Imagine a method called isnumber that works like this:
def isnumber(s):
    for c in s:
        if c not in list('.+-0123456789'):
            return False
    return True
string1 = 'number=-1'
string2 = 'number=0.1'
number1 = ''.join([d for d in string1 if d.isnumber()]) # returns '-1'
number2 = ''.join([d for d in string2 if d.isnumber()]) # returns '0.1'
The idea is to test against a set of "basic" algebraic characters. The string does not have to contain a valid Python number. It could also be an IP address like 255.255.0.1.
Does a handy built-in that works approximately like that exist?
If not, why not? It would be much more efficient than a Python function and very useful. I've seen a lot of examples on Stack Overflow that use str.isdigit() to retrieve a positive integer from a string. Is there a reason why there isn't a built-in like that, although there are three different methods that do almost the same thing?

No such function exists. There are a bunch of odd characters that can be part of number literals in Python, such as o, x and b in the prefix of integers of non-decimal bases, and e to introduce the exponential part of a float. I think those plus the hex digits (0-9 and A-F) and sign characters and the decimal point are all you need.
You can put together a string with the right characters yourself and test against it:
from string import hexdigits
num_literal_chars = hexdigits + "oxOX.+-"
That will get a bunch of garbage though if you use it to test against mixed text and numbers:
string1 = "foo. bar. 0xDEADBEEF 10.0.0.1"
print("".join(c for c in string1 if c in num_literal_chars))
# prints "foo.ba.0xDEADBEEF10.0.0.1"
The fact that it gives you a bunch of junk is probably why no builtin function exists to do this. If you want to match a certain kind of number out of a string, write an appropriate regular expression to match that specific kind of number. Don't try to do it character-by-character, or try to match all the different kinds of Python numbers.
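As a rough illustration of the regular-expression approach suggested above, here is a minimal sketch. The pattern and the helper name extract_numbers are assumptions made for this example; the pattern only covers an optional sign, digits and an optional fractional part, so it would need adjusting for other formats such as hex literals or IP addresses.

```python
import re

# Optional sign, digits, optional fractional part. This is just one possible
# definition of "number"; adapt the pattern to the format you expect.
NUMBER_RE = re.compile(r"[-+]?\d+(?:\.\d+)?")


def extract_numbers(text):
    """Return every substring of text that matches NUMBER_RE."""
    return NUMBER_RE.findall(text)


print(extract_numbers("number=-1"))   # ['-1']
print(extract_numbers("number=0.1"))  # ['0.1']
print(extract_numbers("number=123"))  # ['123']
```

Matching whole numbers at once avoids the junk that a character-by-character filter collects from the surrounding text.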

Related

Compare Unicode code point range in Python3

I would like to check if a character is in a certain Unicode range or not, but seems I cannot get the expected answer.
char = "？" # the unicode value is 0xff1f
print(hex(ord(char)))
if hex(ord(char)) in range(0xff01, 0xff60):
    print("in range")
else:
    print("not in range")
It should print: "in range", but the results show: "not in range". What have I done wrong?
hex() returns a string. To compare integers you should simply use ord:
if ord(char) in range(0xff01, 0xff60):
You could've also written:
if 0xff01 <= ord(char) < 0xff60:
In general for such problems, you can try inspecting the types of your variables.
0xff01 written without quotes is an integer literal.
list(range(0xff01, 0xff60)) will give you a list of integers [65281, 65282, ..., 65375]. range(0xff01, 0xff60) == range(65281, 65376) evaluates to True.
ord('？') gives you the integer 65311.
hex() takes an integer and converts it to a string such as '0xff1f'.
So, you simply need to use ord(), no need to hex() it.
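The points above can be checked with a short sketch. The helper name in_code_point_range and its default bounds are assumptions for this example; the character is written as an escape so the snippet does not depend on the file's encoding.

```python
def in_code_point_range(char, start=0xFF01, stop=0xFF60):
    """Return True if char's code point lies in the half-open range [start, stop)."""
    return start <= ord(char) < stop


fullwidth_question = "\uff1f"  # FULLWIDTH QUESTION MARK, code point 0xFF1F

print(type(hex(ord(fullwidth_question))))       # <class 'str'>
print(hex(ord(fullwidth_question)))             # 0xff1f
print(in_code_point_range(fullwidth_question))  # True
print(in_code_point_range("?"))                 # False (ASCII '?' is 0x3F)
```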
Just use ord:
if ord(char) in range(0xff01, 0xff60):
    ...
hex is not needed.
As mentioned in the docs:
Convert an integer number to a lowercase hexadecimal string prefixed with “0x”.
That already explains the problem: hex returns a string, not the integer we want.
Whereas the ord function does what we want, as mentioned in the docs:
Given a string representing one Unicode character, return an integer representing the Unicode code point of that character. For example, ord('a') returns the integer 97 and ord('€') (Euro sign) returns 8364. This is the inverse of chr().

Hash function to see if one string is scrambled form/permutation of another?

I want to check if string A is just a reordered version of string B. For example, "abc" = "bca" = "cab"...
There are other solutions here: https://www.geeksforgeeks.org/check-if-two-strings-are-permutation-of-each-other/
However, I was thinking a hash function would be an easy way of doing this, but the typical hash function takes order into consideration. Are there any hash functions that do not care about character order?
Are there any hash functions that do not care about character order?
No, I don't know of any real-world hash functions that have this property, because this is not a problem they are designed to solve.
However, in this specific case, you can make your own "hash" function (a very very bad one) that will indeed ignore order: just sum ASCII codes of characters. This works due to the commutative property of addition (a + b == b + a)
def isAnagram(self, a, b):
    sum_a = 0
    sum_b = 0
    for c in a:
        sum_a += ord(c)
    for c in b:
        sum_b += ord(c)
    return sum_a == sum_b
To reiterate, this is absolutely a hack. It may pass a judge system whose input strings are limited in content (only lowercase ASCII characters, no spaces), but equal sums are necessary, not sufficient: "ad" and "bc" both sum to 197 without being permutations of each other. It will not work reliably on arbitrary strings.
For a fast check you could use a kind of hash function.
Candidates are:
xor all characters of a string
add all characters of a string
multiply all characters of a string (be careful, this might lead to overflow for large strings)
If the hash-value is equal, it could still be a collision of two not 'equal' strings. So you still need to make a dedicated compare. (e.g. sort the characters of each string before comparing them).
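A minimal sketch of that two-step idea: a cheap order-insensitive "hash" as a quick reject, followed by an exact comparison. The function names are made up for this example, and collections.Counter is used here for the exact check in place of sorting.

```python
from collections import Counter


def order_free_hash(s):
    """Sum of code points: ignores order, but collides easily."""
    return sum(ord(c) for c in s)


def is_permutation(a, b):
    """Return True if b is a reordering of a."""
    # Cheap rejections first: different length or different sum.
    if len(a) != len(b) or order_free_hash(a) != order_free_hash(b):
        return False
    # Equal hashes can still be a collision, so compare character counts.
    return Counter(a) == Counter(b)


print(order_free_hash("ad") == order_free_hash("bc"))  # True (a collision)
print(is_permutation("ad", "bc"))                      # False
print(is_permutation("abc", "bca"))                    # True
```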

Inner workings of map() in a specific parsing situation

I know there are already at least two topics that explain how map() works but I can't seem to understand its workings in a specific case I encountered.
I was working on the following Python exercise:
Write a program that computes the net amount of a bank account based on a
transaction log from console input. The transaction log format is
shown as follows:
D 100
W 200
D means deposit while W means withdrawal. Suppose the following input
is supplied to the program:
D 300
D 300
W 200
D 100
Then, the output should be:
500
One of the answers offered for this exercise was the following:
total = 0
while True:
    s = input().split()
    if not s:
        break
    cm, num = map(str, s)
    if cm == 'D':
        total += int(num)
    if cm == 'W':
        total -= int(num)
print(total)
Now, I understand that map applies a function (str) to an iterable (s), but what I'm failing to see is how the program identifies what is a number in the s string. I assume str converts each letter/number/etc in a string type, but then how does int(num) know what to pick as a whole number? In other words, how come this code doesn't produce some kind of TypeError or ValueError, because the way I see it, it would try and make an integer of (for example) "D 100"?
First,
cm,num = map(str,s)
could be simplified as
cm,num = s
since s is already a list of strings made of 2 elements (if the input is correct). No need to convert strings that are already strings. s is just unpacked into 2 variables.
the way I see it, it would try and make an integer of (for example) "D 100"?
No, it cannot, since num is the second element of the split list.
If the input is "D 100", then s is ['D', '100'], so cm is 'D' and num is '100'.
Since num represents an integer, int(num) converts it to its integer value.
The above code is completely devoid of error checking (number of parameters, parameter "types"), but with correct input it works. The map call is completely useless in this particular example.
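For illustration, here is a sketch of the same logic without map and with the error checking mentioned above. The function name net_amount and the choice to take a list of lines instead of reading from input() are assumptions made so the example is easy to run.

```python
def net_amount(lines):
    """Compute the net balance from lines such as 'D 300' or 'W 200'."""
    total = 0
    for line in lines:
        parts = line.split()
        if not parts:
            break  # a blank line ends the log, as in the original loop
        if len(parts) != 2:
            raise ValueError(f"expected '<D|W> <amount>', got {line!r}")
        command, amount_text = parts  # plain unpacking, no map needed
        amount = int(amount_text)     # raises ValueError for non-integer text
        if command == "D":
            total += amount
        elif command == "W":
            total -= amount
        else:
            raise ValueError(f"unknown command {command!r}")
    return total


print(net_amount(["D 300", "D 300", "W 200", "D 100"]))  # 500
```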
The reason is the .split() call in s = input().split(). It creates a list of the values D and 100 (['D', '100']), because by default split() splits on whitespace. The map function then applies str to both 'D' and '100'.
The map function is not really required, because both values are already of type str (strings).
The second question is how int(num) knows how to convert a string. This has to do with the second (implicit) argument, base. Just as .split() has a default separator, int has a default base.
The full call is equivalent to int(num, base=10). As long as num consists of the digits 0-9, optionally preceded by a sign, int can convert it to a base-10 integer. A string containing a decimal point, such as '1.5', raises ValueError. For more examples, check out the documentation of the built-in int.
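A quick way to see what int() accepts is to try a few strings. The helper try_int is a made-up name for this sketch; it only wraps the ValueError so the failing cases can be printed.

```python
print(int("100"))          # 100 (base defaults to 10)
print(int("  -42\n"))      # -42 (surrounding whitespace and a sign are allowed)
print(int("ff", base=16))  # 255


def try_int(text):
    """Return int(text), or None if text is not a valid integer literal."""
    try:
        return int(text)
    except ValueError:
        return None


print(try_int("1.5"))    # None: a decimal point is not accepted
print(try_int("D 100"))  # None: the unsplit line is not a number
```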

Representing a non-numeric string as an integer in Julia

I'm looking for a simple and invertible way to represent a Julia string by an integer (e.g. for cryptography). To be clear, I'm not considering string representations of integers like "123", but arbitrary strings like "Hello". The representation doesn't need to be human-readable, but it needs to be easily invertible back to a unique string (so not a hash). It doesn't need to be efficient; I'm just looking for something as simple as possible. (Also, it's fine if it only works on a small character set, e.g. lowercase Roman letters.)
One naive way would be to collect the string into a vector of chars, parse(Int, _) each char to an integer, and concatenate the integers. But this seems cumbersome, and I suspect that there's a built-in Julia function (or small composition of functions) that will get the job done more easily.
If your strings only use the digits 0-9 and the letters a-z and A-Z, then you can parse the string directly as a base-62 BigInt. Note that leading '0' characters do not survive the round trip, because they are leading zeros of the number:
julia> using Random
julia> s = randstring(123)
"RFXkzD6VpWcwvbsxOtdTxS4DGcgciKgDXECa9fEK0Djcdkcj5N75vIHEMVyuH9mcYgvFbLhbPdrKyPIO4JsK1DKgZIacov6WKDZdIpGJ5iJ15dpjmcCBCybMmxB"
julia> i = parse(BigInt, s, base=62)
12798646956721889529517502411501433963894611324020956397632780092623456213685688389093681112679380669903728068303911743800989012987014660454736389459814982802097607808640628339365945710572579898457023165244164689548286133
julia> string(i, base=62)
"RFXkzD6VpWcwvbsxOtdTxS4DGcgciKgDXECa9fEK0Djcdkcj5N75vIHEMVyuH9mcYgvFbLhbPdrKyPIO4JsK1DKgZIacov6WKDZdIpGJ5iJ15dpjmcCBCybMmxB"
I created a (somewhat complicated) implementation that works for ASCII strings:
stringToInt(str::String) = sum(i -> Int(str[end-i]) * 128^i, 0:length(str)-1)
function intToString(m::Int)
    chars = Char[]
    # ndigits(m, base=128) is the number of 7-bit characters encoded in m
    for n in ndigits(m, base=128)-1:-1:0
        d, m = divrem(m, 128^n)
        push!(chars, Char(d))
    end
    String(chars)
end
Let me know if you can think of a better one.
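For comparison, the same idea can be sketched in Python (not Julia) by treating the string's UTF-8 bytes as one big-endian integer. This is only an illustration of the approach with made-up function names; one known limitation is that a string starting with NUL characters does not round-trip, because leading zero bytes vanish in the integer.

```python
def string_to_int(text):
    """Interpret the UTF-8 bytes of text as one big-endian integer."""
    return int.from_bytes(text.encode("utf-8"), byteorder="big")


def int_to_string(number):
    """Inverse of string_to_int for strings that do not start with NUL."""
    length = (number.bit_length() + 7) // 8  # bytes needed to hold number
    return number.to_bytes(length, byteorder="big").decode("utf-8")


n = string_to_int("Hello")
print(int_to_string(n))  # Hello
```

Python integers are arbitrary precision, so this does not overflow for long strings the way a fixed-width Int would.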

Convert a string into an integer of its ascii values

I am trying to write a function that takes a string txt and returns an int built from the ASCII codes of that string's characters. It also takes a second argument, n, an int that specifies the number of digits that each character should translate to. The default value of n is 3. n is always >= 3 and the string input is always non-empty.
Example outputs:
string_to_number('fff')
102102102
string_to_number('ABBA', n = 4)
65006600660065
My current strategy is to split txt into its characters by converting it into a list. Then, I convert the characters into their ord values and append these to a new list. I then try to combine the elements in this new list into a number (e.g. I would go from ['102', '102', '102'] to ['102102102']). Then I try to convert the first element of this list (aka the only element) into an integer. My current code looks like this:
def string_to_number(txt, n=3):
    characters = list(txt)
    ord_values = []
    for character in characters:
        ord_values.append(ord(character))
    joined_ord_values = ''.join(ord_values)
    final_number = int(joined_ord_values[0])
    return final_number
The issue is that I get a TypeError. I can write code that successfully returns the integer of a single-character string, but when it comes to strings that contain more than one character, I can't because of this TypeError. Is there any way of fixing this? Thank you, and apologies if this is quite long.
Try this:
def string_to_number(text, n=3):
    return int(''.join('{:0>{}}'.format(ord(c), n) for c in text))

print(string_to_number('fff'))
print(string_to_number('ABBA', n=4))
Output:
102102102
65006600660065
Edit: without list comprehension, as OP asked in the comment
def string_to_number(text, n=3):
    l = []
    for c in text:
        l.append('{:0>{}}'.format(ord(c), n))
    return int(''.join(l))
Useful link(s):
string formatting in Python: contains pretty much everything you need to know about string formatting in Python
The join method expects an iterable of strings, so you'll need to convert your ASCII codes into strings. This almost gets it done:
ord_values.append(str(ord(character)))
except that it doesn't respect your number-of-digits requirement.
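One way to add the missing padding while keeping the structure of the question's code is str.zfill, which left-pads a string with zeros to a given width. This is a sketch of that variation, not the only way to do it.

```python
def string_to_number(txt, n=3):
    """Concatenate the zero-padded code points of txt and return them as an int."""
    # str(ord(c)) turns each code into a string; zfill(n) pads it to n digits.
    padded = [str(ord(character)).zfill(n) for character in txt]
    return int("".join(padded))


print(string_to_number("fff"))        # 102102102
print(string_to_number("ABBA", n=4))  # 65006600660065
```

The leading zeros of the first character are dropped by int(), which is why 'ABBA' with n=4 starts with 65 and not 0065.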

Resources