isalnum() in python not giving expected results - python-3.x

string = "xyz123"
print(string.isalnum()) # this returns 'True'
string = "xy 12"
print(string.isalnum()) # this returns 'False'
string = "xy"
print(string.isalnum()) # this returns 'True'
But 'xy' is not alphanumeric.
Python version 3.6.4

As @BugHunter has pointed out, isalnum() may not be helpful in your case.
You can try:
import re
bool(re.match('^(?=.*[a-zA-Z])(?=.*[0-9])', string))

Alphanumeric means consisting of letters, digits, or both, which is why you got that output: isalnum() returns True when every character is a letter or a digit and the string is not empty. A space is not alphanumeric, so "xy 12" gives False.
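If the goal is a string that contains only letters and digits and has at least one of each, one possible sketch combines isalnum() with two any() checks. The helper name has_letters_and_digits is made up for illustration. Note that this is stricter than the regex above, which also accepts strings with spaces or punctuation such as "xy 12".

```python
def has_letters_and_digits(s):
    """Return True if s is purely alphanumeric and mixes letters with digits.

    Hypothetical helper for illustration, not a built-in.
    """
    return (
        s.isalnum()                      # no spaces or punctuation allowed
        and any(c.isalpha() for c in s)  # at least one letter
        and any(c.isdigit() for c in s)  # at least one digit
    )


for sample in ("xyz123", "xy 12", "xy", "123"):
    print(repr(sample), has_letters_and_digits(sample))
```

With this definition only "xyz123" passes; "xy" fails because it contains no digit.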

Related

Compare Unicode code point range in Python3

I would like to check if a character is in a certain Unicode range or not, but seems I cannot get the expected answer.
char = "？" # the unicode value is 0xff1f
print(hex(ord(char)))
if hex(ord(char)) in range(0xff01, 0xff60):
    print("in range")
else:
    print("not in range")
It should print: "in range", but the results show: "not in range". What have I done wrong?
hex() returns a string. To compare integers you should simply use ord:
if ord(char) in range(0xff01, 0xff60):
You could've also written:
if 0xff01 <= ord(char) < 0xff60:
In general for such problems, you can try inspecting the types of your variables.
Writing 0xff01 without quotes gives you an integer literal.
list(range(0xff01, 0xff60)) will give you a list of integers [65281, 65282, .., 65375]. range(0xff01, 0xff60) == range(65281, 65376) evaluates to True.
ord('？') gives you the integer 65311.
hex() takes an integer and converts it to a string, e.g. hex(65311) returns '0xff1f'.
So, you simply need to use ord(), no need to hex() it.
Just use ord:
if ord(char) in range(0xff01, 0xff60):
    ...
hex is not needed.
As mentioned in the docs:
Convert an integer number to a lowercase hexadecimal string prefixed with “0x”.
That already describes the problem: the result is a string, not the integer we want.
Whereas the ord function does what we want, as mentioned in the docs:
Given a string representing one Unicode character, return an integer representing the Unicode code point of that character. For example, ord('a') returns the integer 97 and ord('€') (Euro sign) returns 8364. This is the inverse of chr().
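To make the type difference concrete, here is a small sketch of the check discussed above. The function name in_codepoint_range and its default bounds are assumptions chosen for illustration; the escape "\uff1f" stands for the fullwidth question mark from the question.

```python
def in_codepoint_range(char, start=0xFF01, stop=0xFF60):
    """Return True if the character's code point lies in [start, stop).

    Hypothetical helper; the defaults mirror the range used in the question.
    """
    return start <= ord(char) < stop


fullwidth_question = "\uff1f"  # FULLWIDTH QUESTION MARK, code point 0xFF1F

print(type(hex(ord(fullwidth_question))))  # hex() gives a str
print(type(ord(fullwidth_question)))       # ord() gives an int
print(in_codepoint_range(fullwidth_question))
print(in_codepoint_range("?"))  # ASCII question mark, code point 0x3F
```

Comparing the string returned by hex() against a range of integers can never succeed, which is why the original code always took the else branch.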

Python - Replacing repeated consonants with other values in a string

I want to write a function that, given a string, returns a new string in which occurrences of a sequence of the same consonant with 2 or more elements are replaced with the same sequence except the first consonant - which should be replaced with the character 'm'.
The explanation was probably very confusing, so here are some examples:
"hello world" should return "hemlo world"
"Hannibal" should return "Hamnibal"
"error" should return "emror"
"although" should return "although" (returns the same string because none of the characters are repeated in a sequence)
"bbb" should return "mbb"
I looked into using regex but wasn't able to achieve what I wanted. Any help is appreciated.
Thank you in advance!
Regex is probably the best tool for the job here. The 'correct' expression is
import re
test = """
hello world
Hannibal
error
although
bbb
"""
output = re.sub(r'(.)\1+', lambda g:f'm{g.group(0)[1:]}', test)
# '''
# hemlo world
# Hamnibal
# emror
# although
# mbb
# '''
The only really complicated part of this is the lambda that we pass as an argument. re.sub() can accept a function as its replacement: it gets passed a match object (on which we call .group(0) to get the full match, i.e. all of the repeated letters) and should return the string that replaces whatever was matched. Here, we use an f-string to return the character 'm' followed by the match from its second character onwards.
The regex itself is pretty straightforward as well: any character (.), then the same character (\1) again one or more times (+). If you wanted just word characters (letters, digits and underscore), so that duplicate whitespace is left alone, you could use (\w) instead of (.).
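Since the question asks about consonants specifically, a narrower variant may be closer to the intent. The sketch below assumes that "consonant" means any ASCII letter other than a, e, i, o, u (so y counts as a consonant); the names CONSONANT_RUN and replace_repeated_consonants are invented for this example.

```python
import re

# Assumption: a consonant is an ASCII letter that is not a, e, i, o or u.
# With IGNORECASE the backreference \1 also matches the other letter case.
CONSONANT_RUN = re.compile(r'([b-df-hj-np-tv-z])\1+', re.IGNORECASE)


def replace_repeated_consonants(text):
    """Replace the first letter of each run of a repeated consonant with 'm'."""
    return CONSONANT_RUN.sub(lambda m: 'm' + m.group(0)[1:], text)


for word in ("hello world", "Hannibal", "error", "although", "bbb", "book"):
    print(word, "->", replace_repeated_consonants(word))
```

Unlike (.) or (\w), this pattern leaves repeated vowels, digits and spaces alone, so "book" comes back unchanged.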

Is there a built-in in Python 3 that checks whether a character is a "basic" algebraic symbol?

I know the string methods str.isdigit, str.isdecimal and str.isnumeric.
I'm looking for a built-in method that checks if a character is algebraic, meaning that it can be found in a declaration of a decimal number.
The above mentioned methods return False for '-1' and '1.0'.
I can use isdigit to retrieve a positive integer from a string:
string = 'number=123'
number = ''.join([d for d in string if d.isdigit()]) # returns '123'
But that doesn't work for negative integers or floats.
Imagine a method called isnumber that works like this:
def isnumber(s):
    for c in s:
        if c not in list('.+-0123456789'):
            return False
    return True
string1 = 'number=-1'
string2 = 'number=0.1'
number1 = ''.join([d for d in string1 if d.isnumber()]) # returns '-1'
number2 = ''.join([d for d in string2 if d.isnumber()]) # returns '0.1'
The idea is to test against a set of "basic" algebraic characters. The string does not have to contain a valid Python number. It could also be an IP address like 255.255.0.1.
Does a handy built-in that works approximately like that exist?
If not, why not? It would be much more efficient than a Python function and very useful. I've seen a lot of examples on Stack Overflow that use str.isdigit() to retrieve a positive integer from a string. Is there a reason why there isn't a built-in like that, even though there are three different methods that do almost the same thing?
No such function exists. There are a bunch of odd characters that can be part of number literals in Python, such as o, x and b in the prefix of integers of non-decimal bases, and e to introduce the exponential part of a float. I think those plus the hex digits (0-9 and A-F) and sign characters and the decimal point are all you need.
You can put together a string with the right characters yourself and test against it:
from string import hexdigits
num_literal_chars = hexdigits + "oxOX.+-"
That will get a bunch of garbage though if you use it to test against mixed text and numbers:
string1 = "foo. bar. 0xDEADBEEF 10.0.0.1"
print("".join(c for c in string1 if c in num_literal_chars))
# prints "foo.ba.0xDEADBEEF10.0.0.1"
The fact that it gives you a bunch of junk is probably why no builtin function exists to do this. If you want to match a certain kind of number out of a string, write an appropriate regular expression to match that specific kind of number. Don't try to do it character-by-character, or try to match all the different kinds of Python numbers.
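As a rough illustration of the regular-expression approach, the sketch below extracts optionally signed decimal integers and floats. The pattern and the names SIGNED_DECIMAL and find_numbers are assumptions for this example; it deliberately ignores hex, exponent and other literal forms.

```python
import re

# Assumed pattern: optional sign, digits, optional fractional part.
SIGNED_DECIMAL = re.compile(r'[+-]?\d+(?:\.\d+)?')


def find_numbers(text):
    """Return all signed decimal numbers found in text, as strings."""
    return SIGNED_DECIMAL.findall(text)


print(find_numbers('number=123'))
print(find_numbers('number=-1'))
print(find_numbers('number=0.1'))
print(find_numbers('foo. bar. 42 and -3.5'))
```

Because the pattern describes one specific kind of number, surrounding text such as "foo. bar." is ignored. Something like an IP address would need its own pattern, since this one would split it into pieces.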

Convert a string into an integer of its ascii values

I am trying to write a function that takes a string txt and returns an int built from the ASCII codes of that string's characters. It also takes a second argument, n, an int that specifies the number of digits each character should translate to. The default value of n is 3. n is always >= 3 and the input string is always non-empty.
Example outputs:
string_to_number('fff')
102102102
string_to_number('ABBA', n = 4)
65006600660065
My current strategy is to split txt into its characters by converting it into a list. Then I convert the characters into their ord values and append these to a new list. I then try to combine the elements of this new list into a single number (e.g. going from ['102', '102', '102'] to ['102102102']). Then I try to convert the first (and only) element of this list into an integer. My current code looks like this:
def string_to_number(txt, n=3):
    characters = list(txt)
    ord_values = []
    for character in characters:
        ord_values.append(ord(character))
    joined_ord_values = ''.join(ord_values)
    final_number = int(joined_ord_values[0])
    return final_number
The issue is that I get a TypeError. I can write code that successfully returns the integer for a single-character string, but not for strings with more than one character, because of this TypeError. Is there any way of fixing this? Thank you, and apologies if this is quite long.
Try this:
def string_to_number(text, n=3):
    return int(''.join('{:0>{}}'.format(ord(c), n) for c in text))
print(string_to_number('fff'))
print(string_to_number('ABBA', n=4))
Output:
102102102
65006600660065
Edit: without list comprehension, as OP asked in the comment
def string_to_number(text, n=3):
    l = []
    for c in text:
        l.append('{:0>{}}'.format(ord(c), n))
    return int(''.join(l))
Useful link(s):
string formatting in python: contains pretty much everything you need to know about string formatting in python
The join method expects an iterable of strings (such as a list), so you'll need to convert your ASCII codes into strings. This almost gets it done:
ord_values.append(str(ord(character)))
except that it doesn't respect your number-of-digits requirement.
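One way to add the missing padding while keeping the asker's loop-and-join structure is str.zfill, which left-pads a string with zeros to a given width. This is only a sketch of that idea, reusing the function and parameter names from the question:

```python
def string_to_number(txt, n=3):
    """Concatenate each character's code point, zero-padded to n digits."""
    ord_values = []
    for character in txt:
        # str() avoids the TypeError in join(); zfill() pads to n digits.
        ord_values.append(str(ord(character)).zfill(n))
    return int(''.join(ord_values))


print(string_to_number('fff'))
print(string_to_number('ABBA', n=4))
```

Leading zeros of the first character are lost when the joined string is converted with int(), which matches the expected output 65006600660065 for 'ABBA' with n = 4.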

ValueError: invalid literal for int() with base 2 using Python 3

I have created my own (baby) version of AES, and everything is working correctly except for one thing.
Some binary numbers somehow pick up a 'b' within them, for example: b1b10101
I am not very clued up on how Python works with binary conversions, but when I try to convert to decimal using pepee = int(pepe,2), it throws the error mentioned in the title when the string contains 'b'.
I found one other answer for this error on here; however, the solution does not work for me. Using format(pepe,'b') throws an error for me.
I suspect it was written for Python 2.
I need to know how I can prevent these b's from occurring in my binary strings, or how I can convert them back to the original bit value.
Sample code:
subList2 = ['b1', 'b1', '00', '00']
subStr = ''.join(subList2)  # 'b1b10000'
subDec = int(subStr, 2)  # raises ValueError because of the 'b' characters
Please note that I did not intend these b's to appear in the string; they appear at runtime.
Have you tried making a small code snippet that just converts a binary string? Where do you get the binary strings from? If, for example, you make a binary string using bin(), the string will contain a 'b' character.
print(bin(10))
# Outputs: 0b1010
But if you use format(int, 'b') instead, it will not contain the 'b'.
# Set test to a binary string and print it
test = '101001'
print(test)
# Convert test from binary string to int and print it
test = int(test, 2)
print(test)
# Convert test from int to binary string and print it
test = format(test, 'b')
print(test)
OK, I got it working.
I had made an arithmetic error in my code that was producing negative numbers for the binary conversion, which created these 'b' characters in place of the negative numbers. Now it is fixed.
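The asker's conversion code is not shown, so the following is only a guess at how a negative number could turn into 'b1': bin(-1) returns '-0b1', and slicing off the first two characters (on the assumption that they are always '0b') leaves 'b1'. The helper names sliced_bits and to_bits, and the width of 2, are hypothetical.

```python
def sliced_bits(value):
    """Assumed buggy conversion: strips two characters, expecting '0b'."""
    return bin(value)[2:]


def to_bits(value, width=2):
    """Format a non-negative int as a zero-padded binary string."""
    if value < 0:
        # Fail early instead of silently producing a malformed bit string.
        raise ValueError("expected a non-negative integer, got {}".format(value))
    return format(value, "0{}b".format(width))


print(bin(-1))          # the minus sign comes before the '0b' prefix
print(sliced_bits(-1))  # the slice removes '-0' and keeps the 'b'
print(to_bits(3), to_bits(0))
```

Rejecting negative input at the point of conversion would surface an arithmetic mistake like this one sooner than a later int(..., 2) failure does.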
