I have a string: "0x1.9999999afpap-4". I would like a regular expression that extracts 1.9999999afpa from this string, that is, everything except the leading "0x" and the trailing "p-4". The solution should also work for other strings with different letters and lengths, for example extracting "1.999999999pp" from "0x1.999999999ppp-4".
Thanks.
Use a lookahead assertion, which checks that a "p" follows without including it in the match (note: you have to escape the dot to be accurate):
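test = "0x1.9999999afp-4"
import re
# Raw string avoids invalid-escape warnings; \w* takes letters, digits and "_",
# and (?=p) requires a "p" right after the match without including it.
new = re.search(r"1\.\w*(?=p)", test)
print(new.group())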
The output is now:
1.9999999af
How can I remove all characters inside angle brackets, including the brackets, from a string? How can I also remove all the text between "\r\n" and "." + any 3 characters? Is this possible? I am currently using the solution by @xkcdjerry.
For example:
import re
body = """Dear Students roads etc. you place a tree take a snapshot, then when you place a\r\nbuilding, take a snapshot. Place at least 5-6 objects and then have 5-6\r\nsnapshots. Please keep these snapshots with you as everyone will be asked\r\nto share them during the class.\r\n\r\nI am attaching one PowerPoint containing instructions and one video of\r\nexplanation for your reference.\r\n\r\nKind regards,\r\nTeacher Name\r\n zoom_0.mp4\r\n<https://drive.google.com/file/d/1UX-klOfVhbefvbhZvIWijaBdQuLgh_-Uru4_1QTkth/view?usp=drive_web>"""
d = re.compile("\r\n.+?\\....")
body = d.sub('', body)
a = re.compile("<.*?>")
body = a.sub('', body)
print(body)
For some reason the output is fine, except that
gle.com/file/d/1UX-klOfVhbefvbhZvIWijaBdQuLgh_-Uru4_1QTkth/view?usp=drive_web>
is attached to the end. How can I fix it?
Answer
Both problems can be solved with a regex. For the first one, put this into the shell:
import re
a=re.compile("<.*?>")
a.sub('',"Keep this part of the string< Remove this part>Keep This part as well")
Output:
'Keep this part of the stringKeep This part as well'
Second question:
import re
a = re.compile("\r\n.*?\\..{3}")
a.sub('',"Hello\r\nFilename.png")
Output:
'Hello'
Breakdown
Regex is a robust way of finding, replacing, and transforming substrings inside larger strings. For further reading, consult https://docs.python.org/3/library/re.html. Meanwhile, here is a breakdown of the regex syntax used in this answer:
. matches any character except a newline.
*? means zero or more of the preceding item, but as few as possible (a non-greedy match).
So .*? means any number of characters, but as few as possible.
Note: The second regex contains \\. because a literal . in a pattern must be escaped as \., and in an ordinary (non-raw) Python string that backslash must itself be escaped, giving \\. A raw string such as r"\." avoids the double escaping.
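To make the greedy versus non-greedy difference concrete, here is a small sketch on a made-up string, plus a check that the escaped and raw spellings of the dot pattern are identical:

```python
import re

text = "<a>keep<b>"  # made-up sample with two tags

# Greedy: .* runs on to the last ">", so a single match swallows everything.
greedy = re.sub("<.*>", "", text)
# Non-greedy: .*? stops at the first ">", so each tag is removed on its own.
lazy = re.sub("<.*?>", "", text)

print(repr(greedy))  # ''
print(repr(lazy))    # 'keep'

# "\\." in a normal string and r"\." in a raw string are the same two characters.
print("\\." == r"\.")  # True
```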
The methods:
re.compile(pattern: str) compiles a regex for further use.
regex.sub(repl:str,string:str) replaces every match of regex in string with repl.
Hope it helps.
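The answer does not explain the leftover gle.com/... fragment. My reading of the question's code (not stated in the thread) is that the order of the two substitutions causes it: the "\r\n ... . + 3 characters" pattern runs first and matches "\r\n<https://drive.goo", which removes the "<" so "<.*?>" has nothing to start from. A minimal sketch, using a hypothetical shortened body and hypothetical function names, that removes the <...> parts first:

```python
import re

# Hypothetical shortened version of the email body from the question.
body = "Kind regards\r\n zoom_0.mp4\r\n<https://drive.google.com/x>"

def clean(text):
    # 1. Remove <...> first, while the link is still intact.
    text = re.sub(r"<.*?>", "", text)
    # 2. Then remove a line break plus text up to a "." and three more characters.
    return re.sub(r"\r\n.+?\..{3}", "", text)

def clean_original_order(text):
    # The question's order: this pattern cuts "\r\n<https://drive.goo" out of
    # the link, so "<.*?>" no longer finds a "<" and the tail survives.
    text = re.sub(r"\r\n.+?\..{3}", "", text)
    return re.sub(r"<.*?>", "", text)

print(repr(clean(body)))                 # 'Kind regards\r\n'
print(repr(clean_original_order(body)))  # 'Kind regardsgle.com/x>'
```

Note that on the full email body the "\r\n.+?\..{3}" pattern also matches ordinary sentences that follow a line break (it stops at the first "."), so it may remove more text than intended.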
I need to split a string so that runs of alphabetic characters stay together as one unit and everything else is separated, as in the example below.
Example:
Some_var='12/1/20 Balance Brought Forward 150,585.80'
output_var=['12/1/20','Balance Brought Forward','150,585.80']
Yes, you can use a regex for this:
import re
Some_var = '12/1/20 Balance Brought Forward 150,585.80'
match = re.split(r"([0-9\s\\\/\.,-]+|[a-zA-Z\s\\\/\.,-]+)", Some_var)
print(match)
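The result also contains empty strings and trailing spaces (['', '12/1/20 ', '', 'Balance Brought Forward ', '', '150,585.80', '']), but you can drop and trim those and you are good to go.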
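The trimming step could look like this; it is a sketch that keeps the answer's pattern and simply strips each piece and discards the ones that end up empty:

```python
import re

Some_var = '12/1/20 Balance Brought Forward 150,585.80'
pattern = r"([0-9\s\\\/\.,-]+|[a-zA-Z\s\\\/\.,-]+)"

# re.split returns the captured pieces plus empty strings between adjacent
# matches; strip every piece and keep only the non-empty ones.
output_var = [part.strip() for part in re.split(pattern, Some_var) if part.strip()]
print(output_var)  # ['12/1/20', 'Balance Brought Forward', '150,585.80']
```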
split isn't gonna cut it. You might wanna look into Regular Expressions (abbreviated regex) to accomplish this.
Here's a link to the Python docs: re module
As for a pattern, you could try using something like this:
([0-9\s\\\/\.,-]+|[a-zA-Z\s\\\/\.,-]+)
then trim each part of the output.
There are some strings with the following pattern '{substring1}.{substring2}'. I only want to keep the substring1. For instance, for e23.hd, I only want to keep e23.
Here is some code for testing:
a = 'e23.hd'
import re
re.sub(".*","",a)
a
e23.hd
I tried to use .* to represent .{substring2}, but it does not seem to work.
Is there any reason you use regex? This can be solved without regex.
But if you really want to, here is the regex way (escape the dot, and assign the result, because re.sub returns a new string):
a = 'e23.hd'
import re
a = re.sub(r"\..*", "", a)  # \. is a literal dot; reassign, strings are immutable
print(a)
#'e23'
or without regex:
print(a.split(".")[0])
#'e23'
or, without regex, if there may be several "." and you only want to drop the part after the last one:
print(a.rsplit(".", 1)[0])
#'e23'
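The three approaches only differ once a string has more than one ".". A quick comparison on a hypothetical name, with str.partition added as one more non-regex option:

```python
import re

a = 'e23.v2.hd'  # hypothetical string containing two dots

print(re.sub(r"\..*", "", a))  # e23     (cuts at the first ".")
print(a.split(".")[0])         # e23     (cuts at the first ".")
print(a.rsplit(".", 1)[0])     # e23.v2  (cuts at the last ".")
print(a.partition(".")[0])     # e23     (first ".", returns a 3-tuple, not a list)
```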
I have a dynamically generated string like:
'\n\n\n0\n1\n\n\n\n\n\n'
or
'\r\n\r\n\r\n0\r\n\r\n1\r\n\r\n'
or
'\r\n\r\n\r\n1/2\r\n\r\n1/2\r\n\r\n'
I wonder what the best way is to extract only the numbers 1, 0, or 1/2 with Python 3.
What I am doing now is splitting the string on \r\n or \n and checking each element in the list. I don't like this approach; there should be a better, more elegant way to do it.
Thanks
Split on whitespace to get the individual words. Then turn each word into a number, specifically a Fraction, which understands "1/2". Then convert to floating point, in case you find that more convenient.
(No need to wrap it with list() if you're happy with an iterator.)
from fractions import Fraction

def get_nums(s):
    """Extracts fractional numbers from input string s."""
    return list(map(float, map(Fraction, s.split())))
It can all be done in a single list comprehension. Note that float() on its own cannot parse "1/2", so go through Fraction:
from fractions import Fraction
numbers = [float(Fraction(i)) for i in myString.split()]
The split method of the string class takes care of the excess whitespace for you.
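Running the Fraction-based idea on the three strings from the question shows that str.split() with no argument handles both "\n" and "\r\n". This is a sketch written as a list comprehension; float("1/2") alone would raise ValueError, which is why Fraction is needed:

```python
from fractions import Fraction

def get_nums(s):
    """Extracts fractional numbers from input string s."""
    # split() with no argument splits on any run of whitespace (including
    # "\r\n") and ignores leading and trailing whitespace.
    return [float(Fraction(word)) for word in s.split()]

print(get_nums('\n\n\n0\n1\n\n\n\n\n\n'))             # [0.0, 1.0]
print(get_nums('\r\n\r\n\r\n0\r\n\r\n1\r\n\r\n'))      # [0.0, 1.0]
print(get_nums('\r\n\r\n\r\n1/2\r\n\r\n1/2\r\n\r\n'))  # [0.5, 0.5]
```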
Hey guys, I tried looking at previous questions, but they don't answer it the way my teacher wants it answered. Basically, I need to get a string from user input and check whether it has:
at least one of [!,@,#,$,%,^,&,*,(,)] (a non-letter, non-numeric character).
Create a list for these special characters.
I have no idea how to write a function (def) to do this. Please help!
You should probably look into Regular expressions. Regular expressions allow you to do many string operations in a concise way. Specifically, you'll want to use re.findall() in order to find all special characters in your string and return them. You can check if the returned list has length 0 to check if there were any special characters at all.
With regards to building the regular expression to find special characters itself... I'm sure you can figure that part out ;)
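For reference, one possible way to fill in that part, following the re.findall suggestion and keeping the special characters in a list as the assignment asks. SPECIAL_CHARS and has_special_char are hypothetical names; re.escape keeps characters such as ^ and ) literal inside the character class:

```python
import re

# The special characters from the question, kept in a list.
SPECIAL_CHARS = ["!", "@", "#", "$", "%", "^", "&", "*", "(", ")"]

def has_special_char(text):
    """Return True if text contains at least one character from SPECIAL_CHARS."""
    # Build a character class such as [!@\#\$%\^\&\*\(\)] from the list.
    pattern = "[" + re.escape("".join(SPECIAL_CHARS)) + "]"
    # findall returns every special character found; an empty list means none.
    return len(re.findall(pattern, text)) > 0

print(has_special_char("hello"))   # False
print(has_special_char("hello!"))  # True
```

To check user input, pass the result of input("Enter String: ") to has_special_char.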
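Please try the code below (Python 3). It prints "Valid" only when the input consists entirely of letters, digits, and underscores, so "Invalid" means the input contains at least one other character, such as a special character or a space:
import re
inputstring = input("Enter String: ")
print(inputstring)
print("Valid" if re.match("^[a-zA-Z0-9_]*$", inputstring) else "Invalid")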