How to use the split() method with some condition?

How to use the split() method with some condition? - python-3.x

There is one condition where I have to split my string in the manner that all the alphabetic characters should stay as one unit and everything else should be separated like the example shown below.
Example:
Some_var='12/1/20 Balance Brought Forward 150,585.80'
output_var=['12/1/20','Balance Brought Forward','150,585.80']

Yes, you could use some regex to get over this.
Some_var = '12/1/20 Balance Brought Forward 150,585.80'
match = re.split(r"([0-9\s\\\/\.,-]+|[a-zA-Z\s\\\/\.,-]+)", Some_var)
print(match)
You will get some extra spaces but you can trim that and you are good to go.

split isn't gonna cut it. You might wanna look into Regular Expressions (abbreviated regex) to accomplish this.
Here's a link to the Python docs: re module
As for a pattern, you could try using something like this:
([0-9\s\\\/\.,-]+|[a-zA-Z\s\\\/\.,-]+)
then trim each part of the output.

Related

How to get a substring with Regex in Python

I am trying to formnulate a regex to get the ids from the below two strings examples:
/drugs/2/drug-19904-5106/magnesium-oxide-tablet/details
/drugs/2/drug-19906/magnesium-moxide-tablet/details
In the first case, I should get 19904-5106 and in the second case 19906.
So far I tried several, the closes I could get is [drugs/2/drug]-.*\d but would return g-19904-5106 and g-19907.
Please any help to get ride of the "g-"?
Thank you in advance.

When writing a regex expression, consider the patterns you see so that you can align it correctly. For example, if you know that your desired IDs always appear in something resembling ABCD-1234-5678 where 1234-5678 is the ID you want, then you can use that. If you also know that your IDs are always digits, then you can refine the search even more
For your example, using a regex string like
.+?-(\d+(?:-\d+)*)
should do the trick. In a python script that would look something like the following:
match = re.search(r'.+?-(\d+(?:-\d+)*)', my_string)
if match:
my_id = match.group(1)
The pattern may vary depending on the depth and complexity of your examples, but that works for both of the ones you provided

This is the closest I could find: \d+|.\d+-.\d+

Stop matching after first occurence (single line)

How could I just match the first anchor tag and not all of them until the last one? Basically all of this: "<a...>...</a>" without the other ones? Would I need to sub the string before matching?
Here's what I got:
https://regex101.com/r/hXh2JI/1
Thank you!

Try
<a[^>]*>[^<]*</a>

I think this regex does what you're asking for. Add the global and multi line regex flags to capture all cases of <a> ... </a>.
Regex looks like this:
(<a>.*<\/a>)

Regex for specific permutations of a word

I am working on a wordle bot and I am trying to match words using regex. I am stuck at a problem where I need to look for specific permutations of a given word.
For example, if the word is "steal" these are all the permutations:
'tesla', 'stale', 'steal', 'taels', 'leats', 'setal', 'tales', 'slate', 'teals', 'stela', 'least', 'salet'.
I had some trouble creating a regex for this, but eventually stumbled on positive lookaheads which solved the issue. regex -
'(?=.*[s])(?=.*[l])(?=.*[a])(?=.*[t])(?=.*[e])'
But, if we are looking for specific permutations, how do we go about it?
For example words that look like 's[lt]a[lt]e'. The matching words are 'steal', 'stale', 'state'. But I want to limit the count of l and t in the matched word, which means the output should be 'steal' & 'stale'. 1 obvious solution is this regex r'slate|stale', but this is not a general solution. I am trying to arrive at a general solution for any scenario and the use of positive lookahead above seemed like a starting point. But I am unable to arrive at a solution.
Do we combine positive lookaheads with normal regex?
s(?=.*[lt])a(?=.*[lt])e (Did not work)
Or do we write nested lookaheads or something?
A few more regex that did not work -
s(?=.*[lt]a[tl]e)
s(?=.*[lt])(?=.*[a])(?=.*[lt])(?=.*[e])
I tried to look through the available posts on SO, but could not find anything that would help me understand this. Any help is appreciated.

You could append the regex which matches the permutations of interest to your existing regex. In your sample case, you would use:
(?=.*s)(?=.*l)(?=.*a)(?=.*t)(?=.*e)s[lt]a[lt]e
This will match only stale and slate; it won't match state because it fails the lookahead that requires an l in the word.
Note that you don't need the (?=.*s)(?=.*a)(?=.*e) in the above regex as they are required by the part that matches the permutations of interest. I've left them in to keep that part of the regex generic and not dependent on what follows it.
Demo on regex101
Note that to allow for duplicated characters you might want to change your lookaheads to something in this form:
(?=(?:[^s]*s){1}[^s]*)
You would change the quantifier on the group to match the number of occurrences of that character which are required.

How can I remove all characters inside angle brackets python?

How can I remove all characters inside angle brackets including the brackets in a string? How can I also remove all the text between ("\r\n") and ("."+"any 3 characters") Is this possible? I am currently using the solution by #xkcdjerry
e.g
body = """Dear Students roads etc. you place a tree take a snapshot, then when you place a\r\nbuilding, take a snapshot. Place at least 5-6 objects and then have 5-6\r\nsnapshots. Please keep these snapshots with you as everyone will be asked\r\nto share them during the class.\r\n\r\nI am attaching one PowerPoint containing instructions and one video of\r\nexplanation for your reference.\r\n\r\nKind regards,\r\nTeacher Name\r\n zoom_0.mp4\r\n<https://drive.google.com/file/d/1UX-klOfVhbefvbhZvIWijaBdQuLgh_-Uru4_1QTkth/view?usp=drive_web>"""
d = re.compile("\r\n.+?\\....")
body = d.sub('', body)
a = re.compile("<.*?>")
body = a.sub('', body)
print(body)```
For some reason the output is fine except that it has:
```gle.com/file/d/1UX-klOfVhbefvbhZvIWijaBdQuLgh_-Uru4_1QTkth/view?usp=drive_web>
randomly attached to the end How can I fix it.

Answer
Your problem can be solved by a regex:
Put this into the shell:
import re
a=re.compile("<.*?>")
a.sub('',"Keep this part of the string< Remove this part>Keep This part as well")
Output:
'Keep this part of the stringKeep This part as well'
Second question:
import re
re.compile("\r\n.*?\\..{3}")
a.sub('',"Hello\r\nFilename.png")
Output:
'Hello'
Breakdown
Regex is a robust way of finding, replacing, and mutating small strings inside bigger ones, for further reading,consult https://docs.python.org/3/library/re.html. Meanwhile, here are the breakdowns of the regex information used in this answer:
. means any char.
*? means as many of the before as needed but as little as possible(non-greedy match)
So .*? means any number of characters but as little as possible.
Note: The reason there is a \\. in the second regex is that a . in the match needs to be escaped by a \, which in its turn needs to be escaped as \\
The methods:
re.compile(patten:str) compiles a regex for farther use.
regex.sub(repl:str,string:str) replaces every match of regex in string with repl.
Hope it helps.

Select substring between two characters in Scala

I'm getting a garbled JSON string from a HTTP request, so I'm looking for a temp solution to select the JSON string only.
The request.params() returns this:
[{"insured_initials":"Tt","insured_surname":"Test"}=, _=1329793147757,
callback=jQuery1707229194729661704_1329793018352
I would like everything from the start of the '{' to the end of the '}'.
I found lots of examples of doing similar things with other languages, but the purpose of this is not to only solve the problem, but also to learn Scala. Will someone please show me how to select that {....} part?

Regexps should do the trick:
"\\{.*\\}".r.findFirstIn("your json string here")

As Jens said, a regular expression usually suffices for this. However, the syntax is a bit different:
"""\{.*\}""".r
creates an object of scala.util.matching.Regex, which provides the typical query methods you may want to do on a regular expression.
In your case, you are simply interested in the first occurrence in a sequence, which is done via findFirstIn:
scala> """\{.*\}""".r.findFirstIn("""[{"insured_initials":"Tt","insured_surname":"Test"}=, _=1329793147757,callback=jQuery1707229194729661704_1329793018352""")
res1: Option[String] = Some({"insured_initials":"Tt","insured_surname":"Test"})
Note that it returns on Option type, which you can easily use in a match to find out if the regexp was found successfully or not.
Edit: A final point to watch out for is that the regular expressions normally do not match over linebreaks, so if your JSON is not fully contained in the first line, you may want to think about eliminating the linebreaks first.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to use the split() method with some condition? - python-3.x

Yes, you could use some regex to get over this. Some_var = '12/1/20 Balance Brought Forward 150,585.80' match = re.split(r"([0-9\s\\\/\.,-]+|[a-zA-Z\s\\\/\.,-]+)", Some_var) print(match) You will get some extra spaces but you can trim that and you are good to go.

split isn't gonna cut it. You might wanna look into Regular Expressions (abbreviated regex) to accomplish this. Here's a link to the Python docs: re module As for a pattern, you could try using something like this: ([0-9\s\\\/\.,-]+|[a-zA-Z\s\\\/\.,-]+) then trim each part of the output.

Related

How to get a substring with Regex in Python

Stop matching after first occurence (single line)

Regex for specific permutations of a word

How can I remove all characters inside angle brackets python?

Select substring between two characters in Scala

Categories

Resources