remove the substring starting with some given characters - python-3.x

There are some strings with the following pattern '{substring1}.{substring2}'. I only want to keep the substring1. For instance, for e23.hd, I only want to keep e23.
Here is some code for testing:
a = 'e23.hd'
import re
re.sub(".*","",a)
a
e23.hd
I tried to use .* to represent the .{substring2} part, but it does not seem to work.

Is there any reason you are using regex? This can be solved without it.
But if you really want to, here is the regex way:
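a = 'e23.hd'
import re
# re.sub returns a new string, so assign the result back;
# the dot is escaped so that it matches a literal "."
a = re.sub(r"\..*", "", a)
print(a)
#'e23'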
or without regex:
print(a.split(".")[0])
#'e23'
or, without regex, if multiple "." are possible and only the part after the last one should be dropped:
print(a.rsplit(".", 1)[0])
#'e23'

Related

How can I remove all characters inside angle brackets python?

How can I remove all characters inside angle brackets, including the brackets, in a string? How can I also remove all the text between "\r\n" and a "." followed by any 3 characters? Is this possible? I am currently using the solution by #xkcdjerry,
e.g.
body = """Dear Students roads etc. you place a tree take a snapshot, then when you place a\r\nbuilding, take a snapshot. Place at least 5-6 objects and then have 5-6\r\nsnapshots. Please keep these snapshots with you as everyone will be asked\r\nto share them during the class.\r\n\r\nI am attaching one PowerPoint containing instructions and one video of\r\nexplanation for your reference.\r\n\r\nKind regards,\r\nTeacher Name\r\n zoom_0.mp4\r\n<https://drive.google.com/file/d/1UX-klOfVhbefvbhZvIWijaBdQuLgh_-Uru4_1QTkth/view?usp=drive_web>"""
d = re.compile("\r\n.+?\\....")
body = d.sub('', body)
a = re.compile("<.*?>")
body = a.sub('', body)
print(body)
For some reason the output is fine, except that it has:
gle.com/file/d/1UX-klOfVhbefvbhZvIWijaBdQuLgh_-Uru4_1QTkth/view?usp=drive_web>
randomly attached to the end. How can I fix it?
Answer
Your problem can be solved by a regex:
Put this into the shell:
import re
a=re.compile("<.*?>")
a.sub('',"Keep this part of the string< Remove this part>Keep This part as well")
Output:
'Keep this part of the stringKeep This part as well'
Second question:
import re
# assign the compiled pattern; otherwise a would still be the angle-bracket regex
a = re.compile("\r\n.*?\\..{3}")
a.sub('', "Hello\r\nFilename.png")
Output:
'Hello'
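As for the stray gle.com/... text in the question: it appears to come from the order of the two substitutions. The file-name pattern runs first, and its lazy match also consumes the start of the URL line (\r\n<https://drive.goo), so the opening < is already gone when the bracket pattern runs. A minimal sketch that compares both orders, assuming a shortened, made-up body and a placeholder example.com URL:

```python
import re

# Shortened, hypothetical stand-in for the e-mail body in the question.
body = ("Kind regards,\r\nTeacher Name\r\n zoom_0.mp4\r\n"
        "<https://example.com/file/d/abc/view>")

file_re = re.compile(r"\r\n.*?\..{3}")  # "\r\n" up to a "." plus three characters
tag_re = re.compile(r"<.*?>")           # anything inside angle brackets

# Order used in the question: file names first, brackets second.
files_first = tag_re.sub("", file_re.sub("", body))

# Swapped order: brackets first, file names second.
tags_first = file_re.sub("", tag_re.sub("", body))

print(repr(files_first))  # 'Kind regards,\r\nTeacher Name/file/d/abc/view>'
print(repr(tags_first))   # 'Kind regards,\r\nTeacher Name\r\n'
```

Running the bracket substitution first leaves no URL for the file-name pattern to bite into.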
Breakdown
Regex is a robust way of finding, replacing, and mutating small strings inside bigger ones. For further reading, consult https://docs.python.org/3/library/re.html. Meanwhile, here is a breakdown of the regex syntax used in this answer:
. means any character (except a newline, by default).
*? means as many of the preceding element as needed, but as few as possible (non-greedy match).
So .*? means any number of characters, but as few as possible.
Note: The reason there is a \\. in the second regex is that a literal . in the match needs to be escaped by a \, which in turn needs to be escaped as \\ in a normal (non-raw) Python string.
The methods:
re.compile(pattern: str) compiles a regex for further use.
regex.sub(repl: str, string: str) replaces every match of regex in string with repl.
Hope it helps.
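To see why the non-greedy *? matters here, a small sketch comparing it with the greedy * on a made-up string that contains two bracketed parts (the sample text and variable names are only for illustration):

```python
import re

text = "Keep <remove this> and <this too> here"

greedy = re.compile(r"<.*>")   # runs from the first "<" to the last ">"
lazy = re.compile(r"<.*?>")    # stops at the nearest ">"

greedy_result = greedy.sub("", text)
lazy_result = lazy.sub("", text)

print(repr(greedy_result))  # 'Keep  here'
print(repr(lazy_result))    # 'Keep  and  here'
```

The greedy version also swallows the text between the two bracketed parts, which is usually not what is wanted.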

How to use the split() method with some condition?

There is one condition where I have to split my string in the manner that all the alphabetic characters should stay as one unit and everything else should be separated like the example shown below.
Example:
Some_var='12/1/20 Balance Brought Forward 150,585.80'
output_var=['12/1/20','Balance Brought Forward','150,585.80']
Yes, you could use a regex to handle this.
import re
Some_var = '12/1/20 Balance Brought Forward 150,585.80'
match = re.split(r"([0-9\s\\\/\.,-]+|[a-zA-Z\s\\\/\.,-]+)", Some_var)
print(match)
You will get some extra spaces and empty strings, but you can trim and filter those out and you are good to go.
split isn't gonna cut it. You might wanna look into Regular Expressions (abbreviated regex) to accomplish this.
Here's a link to the Python docs for the re module: https://docs.python.org/3/library/re.html
As for a pattern, you could try using something like this:
([0-9\s\\\/\.,-]+|[a-zA-Z\s\\\/\.,-]+)
then trim each part of the output.
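The trimming step could look like the sketch below. It swaps re.split for re.findall (a substitution, not what the answers above use) so that no empty strings appear between matches; the variable names are assumptions:

```python
import re

some_var = '12/1/20 Balance Brought Forward 150,585.80'

# Same two alternatives as above, without the capturing group:
# runs of digits/punctuation/spaces, or runs of letters/punctuation/spaces.
pattern = re.compile(r"[0-9\s\\/.,-]+|[a-zA-Z\s\\/.,-]+")

# Strip surrounding whitespace and drop anything that ends up empty.
parts = [piece.strip() for piece in pattern.findall(some_var)]
parts = [piece for piece in parts if piece]

print(parts)  # ['12/1/20', 'Balance Brought Forward', '150,585.80']
```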

Problem with regex fuzzy search with positive lookahead (AND) and {e=<3}

I've a problem with the Python regex fuzzy search.
This is working:
import regex
s = '2991 Nixon Avenue Chattanooga Tennessee'
regex.search(r"(?msi)(?=.*\bnixon\b)(?=.*\bchattanooga\b)",s)
This is not working (removed a t from Chattanooga): result None
import regex
s = '2991 Nixon Avenue Chatanooga Tennessee'
regex.search(r"(?msie)(?=.*\bnixon\b)(?=.*\bchattanooga\b){e=<3}",s)
What am I doing wrong here?
It looks like it has something to do with the positive lookahead and the word boundaries.
Note: This is just a simple example to get it working. In reality it is part of a more complex job.
As an aside, do I need to specify the fuzziness per regex item (nixon, chattanooga), or is it possible to do it for both at the same time, e.g. ((?=.*\bnixon)(?=.*\bchattanooga\b)){e=<3}?
I was applying the fuzziness to the lookahead itself instead of to its
contents.
If it's "Chattanooga" that's fuzzy, do:
regex.search(r"(?msie)(?=.*\bnixon\b)(?=.*\b(?:chattanooga){e<=3}\b)",s)

regular expression for stop position in python 3

I have a string: "0x1.9999999afpap-4". I would like a regular expression that extracts 1.9999999afpa from this string, that is, everything except the "0x" and the "p-4". I also hope this solution can be applied to other strings with random letters and lengths, such as extracting "1.999999999pp" from "0x1.999999999ppp-4".
Thanks.
Use a lookahead (note: you have to escape the dot so that it matches a literal "."):
test = "0x1.9999999afp-4"
import re
new = re.search(r"1\.\w*(?=[$p])", test)
print(new.group())
The output is now
1.9999999af

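For inputs whose body does not start with 1, a more general sketch is to anchor on the 0x prefix and the trailing p plus exponent, and capture whatever sits between them. This assumes every input has the shape 0x<body>p<signed integer>; the pattern and the helper name hex_body are hypothetical, not part of the answer above:

```python
import re

# "0x", then the body (greedy), then the last "p" followed by an optional sign and digits.
HEX_BODY = re.compile(r"^0x(.*)p[+-]?\d+$")

def hex_body(s):
    """Return the text between the '0x' prefix and the final 'p<exponent>', or None."""
    m = HEX_BODY.match(s)
    return m.group(1) if m else None

print(hex_body("0x1.9999999afpap-4"))  # 1.9999999afpa
print(hex_body("0x1.999999999ppp-4"))  # 1.999999999pp
```

Because the capture is greedy, only the final p before the exponent is treated as the delimiter, so any earlier p characters stay in the result.
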
Groovy - characters loss with stream.getText

I have this Groovy script that I'm testing:
InputStream is = awsS3Stream.getObjectContent();
def lines = is.getText("UTF-8");
println "lines:"+lines;
Pattern pattern = ~/type\"\:\"[A-Z][a-z]*\"/;
Matcher matcher = pattern.matcher(lines);
...
I noticed that, depending on the size of the awsS3Stream object, the variable lines may not have all of the text: the end of it is missing. I was hoping that using StringBuffer instead of String would solve the issue, but it did not. I hope someone knows a Groovy-based solution, as I'm not terribly familiar with Groovy. I much appreciate your time.
P.S. The issue I'm seeing is not related to the pattern. I don't need the pattern there to see that the variable lines doesn't always have all of the data.
Are you trying to match alphabetic strings with just one initial uppercase letter? If not, the problem is with your regexp. To match camel case strings with any number of capital letters, use this:
Pattern pattern = ~/type\"\:\"[A-Za-z]*\"/;
The issue was with the data going into s3, not how I retrieve it.
