How to write a higher-order function in Python? - python-3.x

I am trying to solve this question, on Codewars,
This kata is the first of a sequence of four about "Squared Strings".
You are given a string of n lines, each substring being n characters long: For example:
s = "abcd\nefgh\nijkl\nmnop"
We will study some transformations of this square of strings.
Vertical mirror: vert_mirror (or vertMirror or vert-mirror)
vert_mirror(s) => "dcba\nhgfe\nlkji\nponm"
Horizontal mirror: hor_mirror (or horMirror or hor-mirror)
hor_mirror(s) => "mnop\nijkl\nefgh\nabcd"
or printed:
vertical mirror |horizontal mirror
abcd --> dcba |abcd --> mnop
efgh hgfe |efgh ijkl
ijkl lkji |ijkl efgh
mnop ponm |mnop abcd
My Task:
--> Write these two functions
and
--> high-order function oper(fct, s) where
--> fct is the function of one variable f to apply to the string s (fct will be one of vertMirror, horMirror)
Examples:
s = "abcd\nefgh\nijkl\nmnop"
oper(vert_mirror, s) => "dcba\nhgfe\nlkji\nponm"
oper(hor_mirror, s) => "mnop\nijkl\nefgh\nabcd"
Note:
The form of the parameter fct in oper changes according to the language. You can see each form according to the language in "Sample Tests".
Bash Note:
The input strings are separated by , instead of \n. The output strings should be separated by \r instead of \n.
Here's the code below:
def vert_mirror(strng):
    # your code
    pass
def hor_mirror(strng):
    # your code
    pass
def oper(fct, s):
    # your code
    pass
I have tried reversing with [::-1], but it doesn't work.

The if statement at the bottom is only for testing; remove it if you want to use the code somewhere else.
def vert_mirror(string):
    rv = []
    separator = '\n'
    words = string.split(separator)
    for word in words:
        rv.append(word[::-1])
    rv = separator.join(rv)
    #return the representation of rv, bc \n will be displayed as a newline
    return repr(rv)
def hor_mirror(string):
    rv = []
    separator = '\n'
    words = string.split(separator)
    rv = words[::-1]
    rv = separator.join(rv)
    #return the representation of rv, bc \n will be displayed as a newline
    return repr(rv)
def oper(fct, s):
    return fct(s)
if __name__ == '__main__':
    s = "abcd\nefgh\nijkl\nmnop"
    print(oper(vert_mirror, s))
    print(oper(hor_mirror, s))
EDIT: I've just seen the note "The input strings are separated by , instead of \n. The output strings should be separated by \r instead of \n.", if you need to change separators, just change the value of "separator" accordingly.
Or remove the repr(), if you want the raw string.
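For the kata itself, the functions presumably need to return the plain string rather than its repr(), since the expected values in the task are ordinary strings. A minimal sketch along the same lines as the code above, assuming the input uses \n as the separator:

```python
def vert_mirror(strng):
    # Reverse the characters of every line, keeping the line order.
    return "\n".join(line[::-1] for line in strng.split("\n"))


def hor_mirror(strng):
    # Reverse the order of the lines, keeping each line unchanged.
    return "\n".join(strng.split("\n")[::-1])


def oper(fct, s):
    # Higher-order function: fct is itself a function, applied to s.
    return fct(s)


s = "abcd\nefgh\nijkl\nmnop"
print(oper(vert_mirror, s) == "dcba\nhgfe\nlkji\nponm")  # True
print(oper(hor_mirror, s) == "mnop\nijkl\nefgh\nabcd")   # True
```

Nothing special is needed to pass a function as an argument: oper receives the function object and simply calls it.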

Related

Split string with commas while keeping numeric parts

I'm using the following function to split strings with commas at each capital letter, as long as the capital is not preceded by a blank space.
def func(x):
    y = re.findall(r'[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*', x)
    return ','.join(y)
However, when I try to separate the next string it removes the part with numbers.
Input = '49ersRiders Mapple'
Output = 'Riders Mapple'
I tried the following code but now it removes the 'ers' part.
def test(x):
    y = re.findall(r'\d+[A-Z]*|[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*', x)
    return ','.join(y)
Output = '49,Riders Mapple'
The output I'm looking for is this:
'49ers,Riders Mapple'
Is it possible to add this indication to my regex?
Thanks in advance
Maybe naive but why don't you use re.sub:
import re
def func(x):
    return re.sub(r'(?<!\s)([A-Z])', r',\1', x)
inp = '49ersRiders Mapple'
out = func(inp)
print(out)
# Output
49ers,Riders Mapple
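One caveat with the negative lookbehind: it also succeeds at the very start of the string, so an input that begins with a capital gets a leading comma. A possible variant, offered as a sketch rather than a drop-in fix, uses a positive lookbehind (?<=\S) so that a comma is only inserted when a non-space character actually precedes the capital. The name split_on_capitals is made up for this example.

```python
import re


def split_on_capitals(x):
    # Insert a comma before a capital only if a non-space character precedes it.
    return re.sub(r'(?<=\S)([A-Z])', r',\1', x)


print(split_on_capitals('49ersRiders Mapple'))  # 49ers,Riders Mapple
print(split_on_capitals('RidersMapple'))        # Riders,Mapple
```

Note that runs of capitals (for example an all-caps acronym) would still be split letter by letter with either pattern.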
Here is a regex re.findall approach:
inp = "49ersRiders"
output = ','.join(re.findall('(?:[A-Z]|[0-9])[^A-Z]+', inp))
print(output) # 49ers,Riders
The regex pattern used here says to match:
(?:
[A-Z] a leading uppercase letter (try to find this first)
| OR
[0-9] a leading number (fallback for no uppercase)
)
[^A-Z]+ one or more non capital letters following
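On the full input from the question, '49ersRiders Mapple', the pattern above would return 'Riders ' and 'Mapple' as separate items, because [^A-Z]+ also consumes the space and then stops at the next capital. One way to keep space-separated words together, sketched here by combining the leading-digit alternative with the whitespace handling from the question's own pattern (split_keep_spaces is a hypothetical name):

```python
import re


def split_keep_spaces(x):
    # A leading capital or digit, then non-capital, non-space characters,
    # then any number of further words introduced by whitespace.
    pattern = r'(?:[A-Z]|[0-9])[^A-Z\s]*(?:\s+\S[^A-Z\s]*)*'
    return ','.join(re.findall(pattern, x))


print(split_keep_spaces('49ersRiders Mapple'))  # 49ers,Riders Mapple
print(split_keep_spaces('49ersRiders'))         # 49ers,Riders
```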

remove string which contains special character by python regular expression

my code:
s = '$ascv abs is good'
re.sub(p.search(s).group(),'',s)
output:
'$ascv abs is good'
the output I want:
'abs is good'
I want to remove any word that contains a special character, using a Python regular expression. I thought my code was right, but the output is wrong.
How can I fix my code to get the right output?
invalid_chars = ['#', '$'] # Characters you don't want in your text
# Determine if a word is free of the characters you don't want
def if_clean(word):
    for letter in word:
        if letter in invalid_chars:
            return False
    return True
def clean_text(text):
    text = text.split(' ') # Convert text to a list of words
    text_clean = ''
    for word in text:
        if if_clean(word):
            text_clean = text_clean+' '+word
    return text_clean[1:]
# This will print 'abs is good'
print(clean_text('$ascv abs is good'))
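Since the question asks about regular expressions: p is not shown in the question, but if p.search(s).group() returned '$ascv', passing that text straight to re.sub as a pattern would treat $ as an end-of-string anchor, which would explain why nothing was replaced. A hedged sketch of two regex-based alternatives follows. SPECIAL and drop_words_with_special_chars are names invented for this example, and "special character" is assumed to mean anything other than ASCII letters, digits and whitespace.

```python
import re

s = '$ascv abs is good'

# Escaping the literal text makes '$' match a dollar sign instead of
# acting as the end-of-string anchor.
print(re.sub(re.escape('$ascv') + r'\s*', '', s))  # abs is good

# Assumption: a "special character" is anything that is not an ASCII
# letter, digit or whitespace.
SPECIAL = re.compile(r'[^A-Za-z0-9\s]')


def drop_words_with_special_chars(text):
    # Keep only the words in which no special character is found.
    return ' '.join(w for w in text.split() if not SPECIAL.search(w))


print(drop_words_with_special_chars(s))  # abs is good
```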

Python regex multiple matches occurrences between two strings

I have a multi-line string with my start/end magic strings ("X" and "Y"). I'm trying to capture all occurrences but I'm experiencing some issues.
Here is the code
testString = '''AAAAAXBBBBBYCCCCCXDDDDDYEEEEEEXFFF
FFFYGGG
'''
pattern = re.compile(r'(.*)X(.*)Y(.*)', re.MULTILINE)
match = re.search(pattern, testString)
print match.group(1) # output: AAAAAXBBBBBYCCCCC
print match.group(2) # output: DDDDD
print match.group(3) # output: EEEEEEXFFF
Basically, I'm trying to capture all occurrences of the following (And I have to maintain text order):
Text before the magic start string (e.g.: AAAAA, CCCCC, EEEEEE)
Text between start/end magic strings (e.g.: BBBBB, DDDDD, FFF\nFFF)
Text after the magic end string (e.g.: CCCCC, GGG)
So I'm trying to print the following output: (what's in between brackets below is just a comment)
AAAAA (before magic string)
BBBBB (between magic strings)
CCCCC (before/after magic strings, it does not matter. Just the order matters.)
DDDDD (after magic string)
And so on. Printing them in that order would solve the issue. (Then I can pass each to other functions, ...etc.)
The code works nicely when the text is as simple as for example "AAXBBYCC", but with complicated strings I'm losing control.
Any ideas or alternative ways to do this?
You could capture one or more characters other than X or Y in group 1, match X, and then do the same in group 2 followed by Y. The text after the magic end string can be captured in a third group inside a lookahead, so that it is not consumed and can also serve as group 1 of the next match.
A negated character class ([^...]) also matches a newline, which is what allows the FFF\nFFF part to match.
([^XY]+)X([^XY]+)Y(?=([^XY]+))
([^XY]+)X Capture group 1, match 1+ times any char except X or Y, then match X
([^XY]+)Y Capture group 2, match 1+ times any char except X or Y, then match Y
(?= Positive lookahead, assert what is directly to the right is
([^XY]+) Capture group 3, match 1+ times any char except X or Y
) Close lookahead
import re
regex = r"([^XY]+)X([^XY]+)Y(?=([^XY]*))"
s = ("AAAAAXBBBBBYCCCCCXDDDDDYEEEEEEXFFF\n"
"FFFYGGG")
matches = re.findall(regex, s)
print(matches)
Output
[('AAAAA', 'BBBBB', 'CCCCC'), ('CCCCC', 'DDDDD', 'EEEEEE'), ('EEEEEE', 'FFF\nFFF', 'GGG')]
So I'm trying to print the following output: (what's in between brackets below is just a comment)
AAAAA (before magic string)
BBBBB (between magic strings)
CCCCC (before/after magic strings, it does not matter. Just the order matters.)
DDDDD (after magic string)
And so on.
Since it doesn't matter whether before or after start or end, it is as simple as:
import re
o = re.split("X|Y", testString)
print(*o, sep='\n')
Can't you just use:
pattern = re.compile(r'[^XY]+')
match = re.findall(pattern, testString)
print(match)
# ['AAAAA', 'BBBBB', 'CCCCC', 'DDDDD', 'EEEEEE', 'FFF\nFFF', 'GGG\n']
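If it later becomes useful to know which pieces were inside an X...Y pair, the split result can be labelled by position. This is only a sketch under the assumption that X and Y strictly alternate (X, Y, X, Y, ...); test_string, segments and labelled are illustrative names, and the trailing newline is stripped first.

```python
import re

test_string = "AAAAAXBBBBBYCCCCCXDDDDDYEEEEEEXFFF\nFFFYGGG\n"

# Assumption: X and Y strictly alternate, so segments at even positions
# lie outside an X...Y pair and segments at odd positions lie inside one.
segments = re.split(r"[XY]", test_string.strip())
labelled = [("between" if i % 2 else "outside", seg)
            for i, seg in enumerate(segments)]

# Print every segment in its original order, with its label.
for kind, seg in labelled:
    print(f"{seg!r} ({kind})")
```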

Groovy: tokenize string up to 3rd occurence of delimiter only

I want to tokenize a string only up to the 3rd occurrence of some delimiter and then return the rest of the string as the last element of the resulting array.
Example:
I have a String which looks like this:
String someString = '1.22.33.4'
Now I'm tokenizing it by the delimiter '.' like this:
def (a, b, c, d) = someString.tokenize('.')
And it works, but only if there are exactly 3 dots.
Now if someone puts in more dots, like:
String someString = '1.22.33.4.55'
Then it wouldn't work, because the number of variables won't match. So I want to make sure it only tokenizes up to the 3rd dot and then gives back whatever is left. What I want to achieve in this case would be:
a = 1, b=22, c=33, d=4.55
How to do that?
You can use the version of split that takes a second argument to limit the number of returned items. E.g.
def (a,b,c,d) = '1.22.33.4.55'.split("\\.", 4)
assert ["1","22","33","4.55"] == [a,b,c,d]
Not a one liner but it works:
String someString= '1.22.33.4.55'
def stringArray = someString.tokenize('.')
def (a,b,c) = stringArray
def d = stringArray.drop(3).join('.')
println "a=$a, b=$b, c=$c, d=$d"
result:
a=1, b=22, c=33, d=4.55

Matching key value pair from a python dictionary gives absurd results

I have created a python dictionary for expanding acronyms. For example, the dictionary has the following entry:
Acronym_dict = {
"cont":"continued"
}
The code for the dictionary lookup is as follows:
def code_dictionary(text, dict1=Acronym_dict):
    for word in text.split():
        for key in Acronym_dict:
            if key in text:
                text = text.replace(key, Acronym_dict[key],1)
    return text
The problem is that the code is replacing every string that contains substring 'cont' with continued. For example, continental is getting replaced by 'continuedinental' by the dictionary. This is something that I don't want. I know I can add space before and after each key in the dictionary but that will be time-consuming as the dictionary is quite long. Any other alternative?? Please suggest.
A few solutions:
Use regular expressions to find isolated words using \b (word boundary):
import re
Acronym_dict = {
    r'\bcont\b':'continued'
}
def code_dictionary(text, dict1=Acronym_dict):
    for key,value in dict1.items():
        text = re.sub(key,value,text)
    return text
s = 'to be cont in continental'
print(code_dictionary(s))
to be continued in continental
If you don't want to change your dictionary, build the regular expression from each key. Note that re.escape escapes any characters in the key that have a special meaning in a regular expression, so the key is matched literally:
import re
Acronym_dict = {
    'cont':'continued'
}
def code_dictionary(text, dict1=Acronym_dict):
    for key,value in dict1.items():
        regex = r'\b' + re.escape(key) + r'\b'
        text = re.sub(regex,value,text)
    return text
s = 'to be cont in continental'
print(code_dictionary(s))
to be continued in continental
Fanciest version, does all the acronym replacement in one call to re.sub:
import re
Acronym_dict = {'a':'aaa',
                'b':'bbb',
                'c':'ccc',
                'd':'ddd'}
def code_dictionary(text, dict1=Acronym_dict):
    # ORs all the keys together, longest match first.
    # E.g. generates r'\b(abc|ab|b)\b'.
    # Captures the value it matches.
    regex = r'\b(' + '|'.join([re.escape(key)
                               for key in
                               sorted(dict1,key=len,reverse=True)]) + r')\b'
    # Replace everything in the text in one regex.
    # Uses a callback to look up the value of the acronym.
    return re.sub(regex,lambda m: dict1[m.group(1)],text)
s = 'a abcd b abcd c abcd d'
print(code_dictionary(s))
aaa abcd bbb abcd ccc abcd ddd
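Because the dictionary is described as quite long, it may be worth building the combined pattern once instead of on every call. A minimal sketch of that idea, assuming a non-empty dictionary; make_expander and expand are hypothetical names, and the 'dept' entry is made up for the demonstration:

```python
import re


def make_expander(acronyms):
    # Build and compile the alternation once, longest keys first.
    # Assumption: acronyms is non-empty (an empty alternation would match
    # the empty string and the lookup below would fail).
    keys = sorted(acronyms, key=len, reverse=True)
    pattern = re.compile(r'\b(' + '|'.join(re.escape(k) for k in keys) + r')\b')

    def expand(text):
        # Look up each matched acronym in the dictionary.
        return pattern.sub(lambda m: acronyms[m.group(1)], text)

    return expand


expand = make_expander({'cont': 'continued', 'dept': 'department'})
print(expand('to be cont in continental dept'))
# to be continued in continental department
```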
Try this:
import re
Acronym_dict = {
    "cont":"continued"
}
def code_dictionary(text, dict1=Acronym_dict):
    # Use the dict1 argument (not the global) and escape each key
    for key in dict1:
        text = re.sub(r'\b' + re.escape(key) + r'\b', dict1[key], text)
    return text
if __name__ == "__main__":
    text = '''
abcd cont ajflkasdfla cont.
cont continental afakjsklfjakl jfalfj asl cont fjdlaskfjal fjal
cont
'''
    print(text)
    print('--------------------')
    print(code_dictionary(text))

Resources