SyntaxError: unexpected character after line continuation character re - python-3.x

I want to get the first two digits of the string "21N":
import re
str = "21N"
number = re.find(r\'d{1,2}', str)
but I get this error. How do I get the first two digits from the string?
Thank you very much.

Move the backslash so that it sits between the apostrophe and the d, and use re.findall, since re.find does not exist:
number = re.findall(r'\d{1,2}', str)

You have a couple of errors. re.find doesn't exist; you can use re.search instead. And your backslash needs to be inside, rather than outside, the opening quote.
So the following would work:
number = re.search(r'\d{1,2}', str)
But the {1,2} is actually unnecessary if you know you're looking for exactly 2 digits. Just use:
number = re.search(r'\d{2}', str)
And as an aside, don't use the variable name str, as it is a built-in type in Python.
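Note that re.search returns a match object (or None when nothing matches), not the matched text itself. A minimal sketch of pulling the digits out, assuming the variable is renamed to text (a name chosen here only for illustration) so the built-in str is not shadowed:

```python
import re

text = "21N"  # renamed from `str` so the built-in type is not shadowed

# re.search returns a Match object on success, or None if nothing matches.
match = re.search(r'\d{1,2}', text)
number = match.group() if match else None
print(number)  # 21

# re.findall returns a list of all non-overlapping matches as strings.
all_numbers = re.findall(r'\d{1,2}', text)
print(all_numbers)  # ['21']
```

Checking for None before calling .group() avoids an AttributeError when the string contains no digits.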

Related

How do I find/count the number of variables in a string using Python

Here is an example of the string:
Hi {{1}},
The status of your leave application has changed,
Leaves: {{2}}
Status: {{3}}
See you soon back at office by Management.
Expected Result:
Variables Count = 3
I tried Python's count() with if/else, but I'm looking for a more sustainable solution.
You can use regular expressions:
import re
PATTERN = re.compile(r'\{\{\d+\}\}', re.DOTALL)
def count_vars(text: str) -> int:
    return sum(1 for _ in PATTERN.finditer(text))
PATTERN defines the regular expression. It matches every substring that consists of one or more digits (\d+) enclosed in double curly brackets (\{\{ and \}\}). Curly brackets are special characters in regular expressions, so we must escape them with \. The re.DOTALL flag only changes how . matches, and this pattern contains no ., so the flag has no effect here and can be dropped. The finditer method iterates over all matches in the text and we simply count them.
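As a quick check against the template from the question, a sketch like the following could be used. The pattern and function are the ones above; TEMPLATE is just an illustrative name.

```python
import re

PATTERN = re.compile(r'\{\{\d+\}\}', re.DOTALL)

def count_vars(text: str) -> int:
    # Count every {{N}} placeholder, including repeated ones.
    return sum(1 for _ in PATTERN.finditer(text))

TEMPLATE = """Hi {{1}},
The status of your leave application has changed,
Leaves: {{2}}
Status: {{3}}
See you soon back at office by Management."""

print("Variables Count =", count_vars(TEMPLATE))  # Variables Count = 3

# If a repeated placeholder should count only once, count distinct matches.
distinct = len(set(PATTERN.findall(TEMPLATE)))
print(distinct)  # 3
```

Whether repeated placeholders should count once or every time depends on what "number of variables" means for your templates; the question does not say.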

Pathlib how to deal with folders that start with a number

Using Python 3 pathlib on Windows, is there a way to deal with folders that start with a number, other than adding an extra slash?
For example:
import pathlib
op = pathlib.Path("D:\Documents\01")
fn = "test.txt"
fp = op / fn
with fp.open("w", encoding="utf-8") as f:
    f.write(result)
Returns error: [Errno 22] Invalid argument: 'D:\\Documents\x01\\test.txt'
I would have thought that PureWindowsPath would take care of this. If I manually escape it with op = pathlib.Path("D:\Documents\\01"), then it is fine. Do I always have to manually add a backslash to avoid the escape?
"\01" is an octal escape sequence: a single character whose code point is 1, not "backslash, zero, one".
You can do, for example:
op = pathlib.Path("D:\Documents") / "01"
The extra backslash in "D:\Documents\\01" is there to tell Python that you don't want it to interpret \01 as an escape sequence.
From the comments chain:
It's the Python interpreter that's doing the escaping: \01 will always
be treated as an escape sequence (unless it's in a raw string literal
like r"\01"). pathlib has nothing to do with escaping in this case
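To see the difference, here is a small sketch comparing the three spellings. It uses PureWindowsPath so that it behaves the same on any operating system, and the variable names are only for illustration.

```python
from pathlib import PureWindowsPath

escaped = "D:\\Documents\01"   # \01 is an octal escape: one character, chr(1)
doubled = "D:\\Documents\\01"  # doubled backslash: a literal backslash
raw = r"D:\Documents\01"       # raw string: backslashes are kept as written

print(repr(escaped))   # 'D:\\Documents\x01'
print(doubled == raw)  # True

# Joining path components avoids writing the separator before "01" at all.
joined = PureWindowsPath(r"D:\Documents") / "01"
print(joined == PureWindowsPath(raw))  # True
print(joined / "test.txt")             # D:\Documents\01\test.txt
```

Raw strings are usually the least error-prone choice for Windows paths written as literals. The one catch is that a raw string literal cannot end in a single backslash.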

python converting strings into three blocks and if not two blocks

I want to write a function that takes a given string T and groups its digits into blocks of three.
However, I want to split the last block into two blocks if it can't be broken down into blocks of three.
For example, this is my code:
import re
def num_format(T):
    clean_number = re.sub('[^0-9]+', '', T)
    formatted_number = re.sub(r"(\d{3})(?=(\d{3})+(?!\d{3}))", r"\1-", clean_number)
    return formatted_number
num_format("05553--70002654")
this returns : '055-537-000-2654' as a result.
However, I want it to be '055-537-000-26-54'.
I used a regular expression, but have no idea how to split the last remaining digits into two blocks!
I would really appreciate help figuring this problem out!
Thanks in advance.
You can use
def num_format(T):
    clean_number = ''.join(c for c in T if c.isdigit())
    return re.sub(r'(\d{3})(?=\d{2})|(?<=\d{2})(?=\d{2}$)', r'\1-', clean_number)
See the regex demo.
Note you can get rid of all non-numeric chars using plain Python comprehension, the solution is borrowed from Removing all non-numeric characters from string in Python.
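The regex matches:
(\d{3}) - Group 1 (\1): three digits...
(?=\d{2}) - ...followed by two digits
| - or
(?<=\d{2})(?=\d{2}$) - a position that is preceded by two digits and followed by two digits at the end of the string. Group 1 does not take part in this alternative, so \1 is replaced by an empty string (Python 3.5+) and only the - is inserted.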
See the Python demo:
import re
def num_format(T):
    clean_number = ''.join(c for c in T if c.isdigit())
    return re.sub(r'(\d{3})(?=\d{2})|(?<=\d{2})(?=\d{2}$)', r'\1-', clean_number)
print(num_format("05553--70002654"))
# => 055-537-000-26-54
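The regex above is tuned to inputs shaped like the sample. With a six-digit input such as "055537", for instance, it returns '055-5-37'. If other lengths matter, a plain-Python version that spells out the rule may be easier to reason about. The sketch below is one possible reading of the requirement, not the answer's method: take blocks of three, and when exactly four digits are left, split them into two blocks of two. The name num_format_blocks is made up for this example.

```python
def num_format_blocks(text):
    # Keep only the digits, as in the answer above.
    digits = ''.join(c for c in text if c.isdigit())
    blocks = []
    # Peel off blocks of three while more than four digits remain.
    while len(digits) > 4:
        blocks.append(digits[:3])
        digits = digits[3:]
    if len(digits) == 4:
        # Four left over: split into two blocks of two.
        blocks.extend([digits[:2], digits[2:]])
    elif digits:
        # Two or three left over (or a short input): keep as one block.
        blocks.append(digits)
    return '-'.join(blocks)

print(num_format_blocks("05553--70002654"))  # 055-537-000-26-54
print(num_format_blocks("055537"))           # 055-537
print(num_format_blocks("05553700"))         # 055-537-00
```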

re.sub replacing string using original sub-string

I have a text file. I would like to remove all decimal points and the digits that follow them, unless the number is preceded by text.
e.g 12.29,14.6,8967.334 should be replaced with 12,14,8967
e.g happypants2.3#email.com should not be modified.
My code is:
import re
txt1 = "9.9,8.8,22.2,88.7,morris1.43#email.com,chat22.3#email.com,123.6,6.54"
txt1 = re.sub(r',\d+[.]\d+', r'\d+',txt1)
print(txt1)
Unless there is an easier way of doing this, how do I modify r'\d+' so that it just returns the number without the decimal part?
You need to make use of groups in your regex. You put the digits before the '.' into parentheses, and then you can use '\1' to refer to them later:
txt1 = re.sub(r',(\d+)[.]\d+', r',\1',txt1)
Note that in your attempted replacement code you forgot to replace the comma, so your numbers would have been glommed together. This still isn't perfect though; the first number, since it doesn't begin with a comma, isn't processed.
Instead of checking for a comma, the better way is to check word boundaries, which can be done using \b. So the solution is:
import re
txt1 = "9.9,8.8,22.2,88.7,morris1.43#email.com,chat22.3#email.com,123.6,6.54"
txt1 = re.sub(r'\b(\d+)[.]\d+\b', r'\1',txt1)
print(txt1)
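To make the difference between the two patterns concrete, the sketch below runs both on the sample string. The names by_comma and by_boundary are only for illustration.

```python
import re

txt1 = "9.9,8.8,22.2,88.7,morris1.43#email.com,chat22.3#email.com,123.6,6.54"

# Anchoring on the comma skips the first number, which has no comma before it.
by_comma = re.sub(r',(\d+)[.]\d+', r',\1', txt1)
print(by_comma)
# 9.9,8,22,88,morris1.43#email.com,chat22.3#email.com,123,6

# \b requires a word boundary before the digits, so "morris1.43" is left alone
# (there is no boundary between "s" and "1") while the first number is handled.
by_boundary = re.sub(r'\b(\d+)[.]\d+\b', r'\1', txt1)
print(by_boundary)
# 9,8,22,88,morris1.43#email.com,chat22.3#email.com,123,6
```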
Assuming these are the only two types of string present in your file, you can check for these conditions explicitly.
This may not be the most efficient way, but what I have done is split the string on commas and check whether each piece contains #email.com. If it does, I append it to a new list unchanged. Otherwise, converting the piece to float and then to int drops the decimal part.
If you want everything back in a single str variable, you can use .join().
Code:
txt1 = "9.9,8.8,22.2,88.7,morris1.43#email.com,chat22.3#email.com,123.6,6.54"
txt_list = []
for i in txt1.split(','):
    if '#email.com' in i:
        txt_list.append(i)
    else:
        txt_list.append(str(int(float(i))))
txt_new = ",".join(txt_list)
txt_new
Output:
'9,8,22,88,morris1.43#email.com,chat22.3#email.com,123,6'

How to replace ALL characters in a string with one character

Does anyone know a method that allows you to replace all the characters in a word with a single character?
If not, can anyone suggest a way to basically print _ (underscore) the number of times which is the length of the string itself without using any loops or ifs in the code?
mystring = '_'*len(mystring)
Of course, I'm guessing at the name of your string variable and the character that you want to use.
Or, if you just want to print it out, you can:
print('_'*len(mystring))
import re
str = "abcdefghi"
print(re.sub('[a-z]','_',str))
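Note that the character class [a-z] only replaces lowercase ASCII letters, so uppercase letters, digits, spaces and punctuation are left untouched. A small sketch comparing the approaches, with mystring as an assumed variable name and a made-up sample value:

```python
import re

mystring = "Hello, World 42"

# [a-z] only touches lowercase ASCII letters.
lower_only = re.sub('[a-z]', '_', mystring)
print(lower_only)  # H____, W____ 42

# '.' with re.DOTALL matches every character, including newlines.
every_char = re.sub('.', '_', mystring, flags=re.DOTALL)

# String repetition gives the same result without a regex.
repeated = '_' * len(mystring)
print(every_char == repeated)  # True
```

For this task the string repetition is the simplest option, since no pattern matching is actually needed.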
