Get the second-to-last occurrence of a character from a string - python-3.x

How do I get the second-to-last occurrence of a character in a string?
I want to use it to get aaa from www.example.com/example/abc/aaa

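I like regular expressions:
import re
sText = 'www.example.com/example/abc/aaa'  # the string to search
p = re.compile('a')  # the character to look for; use '/' to find the slashes
# span() returns (start, end) for each match, in order of appearance
all_positions = [oMatch.span() for oMatch in p.finditer(sText)]
second_last_position_span = all_positions[-2]  # IndexError if fewer than two matches
second_last_position_start_index = second_last_position_span[0]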
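When the character is fixed and no pattern matching is needed, the same index can probably be found without regular expressions. The following is a minimal sketch that calls str.rfind twice; the helper name second_last_index is made up for illustration and is not part of the answer above.

```python
def second_last_index(text, char):
    """Return the index of the second-to-last occurrence of char, or -1."""
    last = text.rfind(char)
    if last == -1:
        return -1
    # Search again, but only in the part before the last occurrence.
    return text.rfind(char, 0, last)


url = 'www.example.com/example/abc/aaa'
idx = second_last_index(url, '/')
print(idx)                     # 23, the '/' just before 'abc'
print(url[idx + 1:])           # abc/aaa
print(url.rsplit('/', 1)[-1])  # aaa, the text after the last '/'
```

Note that aaa in the question's example sits after the last slash, so rsplit('/', 1) may already be enough for that particular case.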

Related

how to get values using regex in python

here is my sample code
import re
string = '[P-123,SHA-123]'
pattern = re.compile(r"^\[(?P<curve>).*\]$", re.MULTILINE | re.IGNORECASE)
result = pattern.search(string)
print(result)
Expected output:
P-123
If you want to match that data format:
^\[(?P<curve>[A-Z]-\d+),[A-Z]+-\d+]\Z
Explanation
^ Start of string
\[ Match [
(?P<curve> Named capture group curve
[A-Z]-\d+ Match a single uppercase char, - and 1+ digits
) Close group
,[A-Z]+-\d+ Match a comma, 1+ uppercase chars, - and 1+ digits
] Match ]
\Z End of string (or use $ if a newline after is allowed)
The value is in the named capturing group curve. You could also use re.match instead of re.search, because the pattern has to match from the start of the string anyway.
Example code
import re
string = '[P-123,SHA-123]'
pattern = re.compile(r"\[(?P<curve>[A-Z]-\d+),[A-Z]+-\d+]\Z", re.MULTILINE | re.IGNORECASE)
result = pattern.match(string)
print(result.group("curve"))
Output
P-123
import re
string = '[P-123,SHA-123]'
# P, then any single character (here "-"), then zero or more digits
pattern = re.compile(r"(P.\d*)", re.MULTILINE | re.IGNORECASE)
result = pattern.search(string)
print(result[1])
You can try the regex \W([A-Z]-[0-9]*), which extracts a capital letter followed by - and then digits:
import re
string = '[P-123,SHA-123]'
pattern = re.compile(r"\W([A-Z]-[0-9]*)", re.MULTILINE | re.IGNORECASE)
result = pattern.search(string).group(1)
print(result)
Output
P-123
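The snippets above call .group() (or index the result) directly, which raises an error when the input does not match, because search and match return None in that case. A hedged sketch of a guarded version follows; the helper name extract_curve is hypothetical, and it uses re.fullmatch without flags instead of the \Z anchor, which is a substitution rather than what the answers wrote.

```python
import re

# Same group layout as the answer above; fullmatch requires the whole string to match.
CURVE_PATTERN = re.compile(r"\[(?P<curve>[A-Z]-\d+),[A-Z]+-\d+\]")


def extract_curve(text):
    """Return the 'curve' group, or None if text is not in the expected format."""
    match = CURVE_PATTERN.fullmatch(text)
    return match.group("curve") if match else None


print(extract_curve('[P-123,SHA-123]'))  # P-123
print(extract_curve('not a match'))      # None
```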

String replacement in python is replacing the entire string

When I do the string replacement I get an unexpected result. For example, my string is my_string = '15:15'. I want to replace the 15 after the colon with 30, so that I get '15:30'. The replacement works fine for all other values, for example '09:15' and '09:20'.
I have tried:
my_string = '15:15'
my_new_string = my_string.replace(my_string[-2:], '30')
What I am expecting is 15:30 but my actual output is 30:30
my_new_string = my_string.replace(my_string[-2:],'30') gets you 30:30, because you are replacing all occurrences of 15 -> 15:15 will become 30:30.
You could use str.split and str.format to get your new string:
my_string = '15:15'
my_new_string = '{}:{}'.format(my_string.split(':')[0], '30')
print(my_new_string)
Prints:
15:30
That is the expected behavior. Look at what the arguments for str.replace() mean:
replace(...)
S.replace(old, new[, count]) -> string
Return a copy of string S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
It does not replace a substring, rather all occurrences of what you pass as the first parameter.
By calling my_string.replace(my_string[-2:], '30') you're essentially calling '15:15'.replace('15', '30') -- which will replace all occurrences of "15" by "30" so you'll end up with '30:30'.
If you want to replace the last two characters, reverse your logic: keep everything up to the last two characters and then add the '30' string you want at the end:
my_new_string = my_string[:-2] + '30'
When you use my_string[-2:] you are getting the string '15'. str.replace then replaces all occurrences of 15 with 30, giving you '30:30'.
Instead, you can use my_string[2:] to get the string ':15' and replace it with ':30'. If you don't include the colon, you will replace both occurrences of 15 and get '30:30'.
my_new_string = my_string.replace(my_string[2:], ':30')
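To make the difference between the approaches above concrete, here is a small side-by-side sketch. The rpartition variant is an extra option not mentioned in the answers; it assumes the string contains a ':' separator.

```python
my_string = '15:15'

# str.replace substitutes every occurrence of the substring.
replaced_all = my_string.replace('15', '30')
print(replaced_all)   # 30:30

# Slicing keeps everything except the last two characters.
sliced = my_string[:-2] + '30'
print(sliced)         # 15:30

# rpartition splits on the last ':' regardless of how wide the fields are.
hours, sep, _minutes = my_string.rpartition(':')
partitioned = hours + sep + '30'
print(partitioned)    # 15:30
```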

How to better code, when looking for substrings?

I want to extract the currency (along with the $ sign) from a list, and create two different currency lists which I have done. But is there a better way to code this?
The list is as below:
['\n\n\t\t\t\t\t$59.90\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$55.00\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$38.50\n\t\t\t\t\n\n\n\t\t\t\t\t\t$49.90\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$49.00\n\t\t\t\t\n\n\n\t\t\t\t\t\t$62.00\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$68.80\n\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$49.80\n\t\t\t\t\n\n\n\t\t\t\t\t\t$60.50\n\t\t\t\t\t\n\n']
Python code:
pp_list = []
up_list = []
for u in usual_price_list:
    rep = u.replace("\n","")
    rep = rep.replace("\t","")
    s = rep.rsplit("$",1)
    pp_list.append(s[0])
    up_list.append("$"+s[1])
For this kind of problem, I tend to use the re module a lot, as it is more readable, more maintainable, and does not depend on which characters surround what you are looking for:
import re
pp_list = []
up_list = []
for u in usual_price_list:
    prices = re.findall(r"\$\d{2}\.\d{2}", u)
    length_prices = len(prices)
    if length_prices > 0:
        pp_list.append(prices[0])
    if length_prices > 1:
        up_list.append(prices[1])
Regular Expression Breakdown
$ is the end of string character, so we need to escape it
\d matches any digit, so \d{2} matches exactly 2 digits
. matches any character, so we need to escape it
If you want, you can change the number of digits for the cents: \d{1,2} matches one or two digits, and \d* matches zero or more digits.
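As a sketch of the variation described in the breakdown, the pattern below accepts any number of dollar digits and optional cents. It also pads a missing second price with None so both lists keep one entry per input string; that padding is an assumption added here, since the loop above simply skips the missing value. The sample data is a two-item subset of the question's list.

```python
import re

# "$", one or more digits, then optionally "." and one or two digits.
PRICE = re.compile(r"\$\d+(?:\.\d{1,2})?")

usual_price_list = [
    '\n\n\t\t\t\t\t$59.90\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
    '\n\n\t\t\t\t\t$68.80\n\t\t\t\t\n\n',
]

pp_list = []
up_list = []
for u in usual_price_list:
    prices = PRICE.findall(u)
    # Pad with None so both lists stay aligned with the input.
    prices += [None] * (2 - len(prices))
    pp_list.append(prices[0])
    up_list.append(prices[1])

print(pp_list)  # ['$59.90', '$68.80']
print(up_list)  # ['$68.00', None]
```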
As already pointed out, the re module is useful for this task. I would use re.split in the following way:
import re
data = ['\n\n\t\t\t\t\t$59.90\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$55.00\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$38.50\n\t\t\t\t\n\n\n\t\t\t\t\t\t$49.90\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$49.00\n\t\t\t\t\n\n\n\t\t\t\t\t\t$62.00\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$68.80\n\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$49.80\n\t\t\t\t\n\n\n\t\t\t\t\t\t$60.50\n\t\t\t\t\t\n\n']
prices = [re.split(r'[\n\t]+',i) for i in data]
prices0 = [i[1] for i in prices]
prices1 = [i[2] for i in prices]
print(prices0)
print(prices1)
Output:
['$59.90', '$55.00', '$38.50', '$49.00', '$68.80', '$49.80']
['$68.00', '$68.00', '$49.90', '$62.00', '', '$60.50']
Note that this works only if the strings contain nothing but \n and \t besides the prices, with at least one \n or \t before the first price and at least one \n or \t between prices.
[\n\t]+ denotes any string made from \n or \t with length 1 or greater, that is \n, \t, \n\n, \t\t, \n\t, \t\n and so on
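Since \n and \t are both whitespace, str.split() with no arguments may be a simpler alternative: it splits on any run of whitespace and drops the empty strings at the ends. This is a sketch under the same assumption as above, namely that the strings contain only whitespace and prices; the data is a two-item subset of the list.

```python
data = [
    '\n\n\t\t\t\t\t$59.90\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
    '\n\n\t\t\t\t\t$68.80\n\t\t\t\t\n\n',
]

# split() without arguments splits on runs of whitespace and discards empties.
prices = [item.split() for item in data]
print(prices)   # [['$59.90', '$68.00'], ['$68.80']]

# Use '' where a price is missing, mirroring the output shown above.
prices0 = [p[0] if p else '' for p in prices]
prices1 = [p[1] if len(p) > 1 else '' for p in prices]
print(prices0)  # ['$59.90', '$68.80']
print(prices1)  # ['$68.00', '']
```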

How can I create a new string from an original string replacing all non-instances of a character

So Let's say I have a random string "Mississippi"
I want to create a new string from "Mississippi" but replacing all the non-instances of a particular character.
For example if we use the letter "S". In the new string, I want to keep all the S's in "MISSISSIPPI" and replace all the other letters with a "_".
I know how to do the reverse:
word = "MISSISSIPPI"
word2 = word.replace("S", "_")
print(word2)
word2 gives me MI__I__IPPI
but I can't figure out how to get word2 to be __SS_SS____
(The classic Hangman Game)
You would need to use re.sub (a function of the re module; Python strings have no sub method) with a regular expression that uses a negated character set, such as
import re
line = re.sub(r"[^S]", "_", line)
This replaces any non S character with the desired character.
You could do this with str.maketrans() and str.translate() but it would be easier with regular expressions. The trick is you need to insert your string of valid characters into the regular expression programmatically:
import re
word = "MISSISSIPPI"
show = 'S' # augment as the game progresses
print(re.sub(r"[^{}]".format(show), "_", word))
A simpler way is to map a function across the string:
>>> ''.join(map(lambda w: '_' if w != 'S' else 'S', 'MISSISSIPPI'))
'__SS_SS____'
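For the Hangman use case, where the set of revealed letters grows over time, the same idea can be written with a generator expression and a set of guessed letters. This is only a sketch; the function name mask_word and the guessed parameter are made up for illustration.

```python
def mask_word(word, guessed):
    """Replace every character of word that is not in guessed with '_'."""
    return ''.join(ch if ch in guessed else '_' for ch in word)


word = "MISSISSIPPI"
print(mask_word(word, {'S'}))       # __SS_SS____
print(mask_word(word, {'S', 'P'}))  # __SS_SS_PP_
```

Unlike the regular-expression version, this needs no escaping if a guessed character happens to be special inside a character class.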

Replace substring that lies between two positions

I have a string S in Matlab. How can I replace a substring in S with some pattern P. I only know the first and the last index of substring in S. What is the approach?
How about that?
str = 'My dog is called Jim'; %// original string
a = 4; %// starting index
b = 6; %// last index
replace = 'hamster'; %// new pattern
newstr = [str(1:a-1) replace str(b+1:end)]
returns:
newstr = My hamster is called Jim
In case the pattern you want to substitute has the same number of characters as the new one, you can use simple indexing:
str(a:b) = 'cat'
returns:
str = My cat is called Jim
