Modification to Hamming distance/ Edit Distance

I am having trouble modifying the Hamming distance algorithm so that it treats my data differently in two ways:
Add 0.5 to the Hamming distance if a capital letter is switched for a lowercase letter, unless it is in the first position.
Examples: "Killer" and "killer" have a distance of 0; "killer" and "KiLler" have a Hamming distance of 0.5; "Funny" and "FAnny" have a distance of 1.5 (1 for the different letter, plus 0.5 for the different capitalization).
Treat b and d (and their capitalized counterparts) as the same letter.
Here is the code I have found that makes up the basic Hamming program:
def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__=="__main__":
    a = 'mark'
    b = 'Make'
    print(hamming_distance(a, b))
Any suggestions would be welcomed!

Here is a simple solution. For sure it could be optimized for better performance.
Note: I used Python 3, since Python 2 will retire soon.
def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    # b and d are interchangeable
    s1 = s1.replace('b', 'd').replace('B', 'D')
    s2 = s2.replace('b', 'd').replace('B', 'D')
    # add 1 for each different character
    hammingdist = sum(ch1 != ch2 for ch1, ch2 in zip(s1.lower(), s2.lower()))
    # add .5 for each lower/upper case difference (without first letter)
    for i in range(1, len(s1)):
        hammingdist += 0.5 * (s1[i] >= 'a' and s1[i] <= 'z' and\
                              s2[i] >= 'A' and s2[i] <= 'Z' or\
                              s1[i] >= 'A' and s1[i] <= 'Z' and\
                              s2[i] >= 'a' and s2[i] <= 'z')
    return hammingdist

def print_hamming_distance(s1, s2):
    print("hamming distance between", s1, "and", s2, "is",
          hamming_distance(s1, s2))

if __name__ == "__main__":
    assert hamming_distance('mark', 'Make') == 2
    assert hamming_distance('Killer', 'killer') == 0
    assert hamming_distance('killer', 'KiLler') == 0.5
    assert hamming_distance('bole', 'dole') == 0
    print("all fine")
    print_hamming_distance("organized", "orGanised")
    # prints: hamming distance between organized and orGanised is 1.5
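As a possible variation on the same rules, here is a minimal sketch that makes a single pass over the character pairs. The name `weighted_hamming` and the translation table `_BD` are made up for this illustration, and the sketch raises `ValueError` instead of asserting. It uses `str.islower()`, so unlike the ASCII range checks it would also count case differences in non-ASCII letters.

```python
# Hypothetical helper; maps b -> d and B -> D so the two letters compare equal.
_BD = str.maketrans("bB", "dD")

def weighted_hamming(s1, s2):
    """Hamming distance with +0.5 for case changes after the first position."""
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    s1, s2 = s1.translate(_BD), s2.translate(_BD)
    total = 0.0
    for i, (c1, c2) in enumerate(zip(s1, s2)):
        # 1 point when the letters differ, ignoring case
        if c1.lower() != c2.lower():
            total += 1
        # 0.5 points when the case differs, except in the first position
        if i > 0 and c1.isalpha() and c2.isalpha() and c1.islower() != c2.islower():
            total += 0.5
    return total

print(weighted_hamming("Funny", "FAnny"))  # 1.5
```

The `"Funny"` / `"FAnny"` pair from the question gives 1 for `u` versus `a` plus 0.5 for the case change.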

Related

Write a recursive function

I am trying to write a function that compares the letters in two strings.
If the two strings have the same letter at a position, the output gets '!' at that position, and '^' otherwise.
S1: 'ABACADABRA'
S2: 'ACABADACCD'
This is my code using iterative method:
def Compare_String_I(S1,S2):
    difference = ''
    if len(S1) == len(S2):
        # both strings have the same length here, so one range is enough
        for i in range(0, len(S1)):
            if S1[i] != S2[i]:
                difference += str('^')
            else:
                difference += str('!')
    return difference
I am trying to learn how to write the code recursively, but I am not sure how to do it.

How about this:
def compare_string(S1, S2):
    if len(S1) != len(S2):
        return
    diff = ''
    for s1, s2 in zip(S1, S2):
        diff += '!' if s1 == s2 else '^'
    return diff

print(compare_string('aaa', 'aba')) # !^!

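if len(s1) != len(s2):
    quit()
else:
    result = ""
    i = 0
    while i < len(s1):
        # '!' where the characters match, '^' where they differ
        # (str.replace would change every occurrence of the character, not just position i)
        result += "!" if s1[i] == s2[i] else "^"
        i += 1
    print(result)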
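Since the question asks specifically for recursion, here is one way it might look, as a minimal sketch. The function name `compare_string_recursive` is made up for this example. It handles the first pair of characters and recurses on the rest of both strings. Each call slices the strings, and Python's default recursion limit (about 1000 frames) caps the input length, so a loop remains the more practical choice for long strings.

```python
def compare_string_recursive(s1, s2):
    """Return '!' for each matching position and '^' for each differing one."""
    if len(s1) != len(s2):
        return None  # lengths differ: nothing to compare
    if not s1:
        return ''    # base case: both strings are empty
    head = '!' if s1[0] == s2[0] else '^'
    # recursive case: first position plus the result for the remaining characters
    return head + compare_string_recursive(s1[1:], s2[1:])

print(compare_string_recursive('ABACADABRA', 'ACABADACCD'))  # !^!^!!!^^^
```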

Is there a way to speed up these functions?

I have these functions. They work perfectly, but is there a way to speed them up? I tried splitting the dataset, but that takes the same time or longer than the original functions. I'm working with big arrays (1Mill+X2504X2). create_needed_pos takes around 350 sec for a 1.2millX2054X2 array, but my biggest is around 10billionx2054x2.
import numba as nb
import numpy as np

#nb.njit
def create_needed_pos(chr_pos, pos):
    needed_pos = nb.typed.List.empty_list(nb.int32)
    for i in range(len(chr_pos)):
        for k in range(len(pos)):
            if chr_pos[i] == pos[k]:
                if i == k == 1:
                    needed_pos = nb.typed.List([pos[k]])
                else:
                    needed_pos.append(pos[k])
    return needed_pos

#nb.njit
def create_mat(geno):
    # create matrix as np.uint8 (1 byte) instead of list of python integers (8 byte)
    # also no need to dynamically resize / increase list size
    geno_mat = np.zeros((len(geno[:, 0]), len(geno[1, :])), dtype=np.uint8)
    for i in np.arange(len(geno[:, 0])):
        for k in np.arange(len(geno[1, :])):
            g = geno[i, k]
            # nested ifs to avoid duplicate comparisons
            if g[0] == 0:
                if g[1] == 0:
                    geno_mat[i, k] = 2
                elif g[1] == 1:
                    geno_mat[i, k] = 1
                else:
                    geno_mat[i, k] = 9
            elif g[0] == 1:
                if g[1] == 0:
                    geno_mat[i, k] = 1
                elif g[1] == 1:
                    geno_mat[i, k] = 0
                else:
                    geno_mat[i, k] = 9
            else:
                geno_mat[i, k] = 9
    return geno_mat

Is there any way of eliminating the nested loops to prevent time execution error

How can I make this code more efficient, for example with a list comprehension or itertools? The program exceeds the time limit for large inputs.
def minion_game(string):
    n=0
    k=0
    v='AEIOU'
    for i in range(0,len(string)):
        for j in range(i+1,len(string)+1):
            a = string[i:j]
            #print(a)
            if (a[0] == 'A') or (a[0] == 'E') or (a[0] == 'I') or (a[0] == 'O') or (a[0] == 'U'):
                n+= 1
            else:
                k+=1
    if n>k:
        print('Kevin'+' '+str(n))
    elif n<k:
        print('Stuart'+' '+str(k))
    else:
        print('Draw')

if __name__ == '__main__':
    s = input()
    minion_game(s)
Please see the problem statement at this link:
https://solution.programmingoneonone.com/2020/06/hackerrank-the-minion-game-problem-solution-python.html
I would appreciate it if you could explain the solution, as I am totally new to programming.

Basically, what you have to do is:
def isVowel(c):
    if c in ['A', 'E', 'I', 'O', 'U']:
        return True
    return False

Kevin=0
Stuart=0
for i in range(len(s)): #s is the input string
    a=len(s)-i  # number of substrings that start at index i
    if isVowel(s[i]):
        Kevin+=a
    else:
        Stuart+=a
# Check who has scored more; that player is the winner.
This works because, for the string BANANA (n = 6):
B is a consonant, so we have to count all the substrings starting with B:
B, BA, BAN, ... That gives a total of (n - indexOf(B)) substrings = 6 - 0 = 6 pts for Stuart.
A is a vowel, so
all substrings starting at this A = n - indexOf(A) = 6 - 1 = 5, so 5 pts for Kevin.
You don't have to explicitly count how many times each substring appears in the string, because you iterate over every starting position anyway.
For example,
total pts for Kevin =
pts for A at: Index(1) + Index(3) + Index(5)
total pts = (6-1) + (6-3) + (6-5) = 9
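Putting the counting argument together, a complete version might look like the sketch below. The function name `minion_game_result` is made up here, and it returns the result string in the same format the question's code prints, rather than printing it. Each index `i` contributes `n - i` substrings, so the loop is linear in the length of the string instead of quadratic.

```python
def minion_game_result(s):
    """Score the Minion Game: vowel-initial substrings go to Kevin, the rest to Stuart."""
    vowels = set("AEIOU")
    kevin = stuart = 0
    n = len(s)
    for i, ch in enumerate(s):
        # every substring starting at index i ends at one of the n - i later positions
        if ch in vowels:
            kevin += n - i
        else:
            stuart += n - i
    if kevin > stuart:
        return f"Kevin {kevin}"
    if stuart > kevin:
        return f"Stuart {stuart}"
    return "Draw"

print(minion_game_result("BANANA"))  # Stuart 12
```

For BANANA, Stuart collects 6 + 4 + 2 = 12 points from B, N, N, against Kevin's 9.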

Codingame 'A child's play' process times out

I'm trying to solve the coding challenge A child's play on Codingame using Python.
My program passes the first two test cases, but when a test requires a lot of loop iterations it times out. What could I improve?
The full details of the challenge are needed to understand the problem completely, but I don't want to copy and paste them here because I'm not sure it's allowed.
I'll try to explain the problem in my own words. Given this input:
12 6
987
...#........
...........#
............
............
..#O........
..........#.
O is the character starting point.
# are the walls you can not step on
. is where the character can step
In this example w=12 (width of the matrix) and h=6 (height of the matrix).
n = 987 is the number of steps the character has to take.
Required Output:
In this case 7 1, the position of the character after the given number of moves.
Rules:
The character always starts by moving upwards.
When a wall is encountered, the character turns clockwise and keeps moving.
The walls are placed so that the character cannot get stuck and cannot exit the matrix.
When I run the program with that test case I get the right result.
With the following test case instead:
14 10
123456789
..#...........
....#..#......
.#O.....#.....
..............
..............
.......##...#.
............#.
.#........###.
.#.#..........
..............
I get:
Failure
Process has timed out. This may mean that your solution is not optimized enough to handle some cases.
This is the code I managed to write:
import math
import sys

def find_initial_position(maze, w, h):
    for i in range(0, h):
        for j in range(0,w):
            if maze[i][j] == "O":
                return [i, j]
    return -1

def can_move(maze, direction, x, y):
    if direction == "U":
        if maze[ x -1 ][ y ] == "#":
            return False
    elif direction == "R":
        if maze[ x ][ y + 1 ] == "#":
            return False
    elif direction == "D":
        if maze[ x +1 ][ y ] == "#":
            return False
    elif direction == "L":
        if maze[ x ][ y-1 ] == "#":
            return False
    return True

def turn_clockwise(direction):
    directions = ["U", "R", "D", "L"]
    return directions[ (directions.index(direction) + 1) % 4 ]

def move(direction, coordinates):
    if direction == "U":
        coordinates[0] -=1
    elif direction == "R":
        coordinates[1] +=1
    elif direction == "D":
        coordinates[0] +=1
    elif direction == "L":
        coordinates[1] -=1

def main():
    w, h = [int(i) for i in input().split()]
    n = int(input())
    maze = []
    direction = "U"
    position = [0, 0]
    for i in range(h):
        line = input()
        maze.append(line)
    position = find_initial_position(maze, w, h)
    for i in range(0, n):
        while not can_move(maze, direction, position[0], position[1]):
            direction = turn_clockwise(direction)
        move(direction, position)
    print( "%(x)d %(y)d" %{"x": position[1], "y": position[0]} )

main()

I streamlined your code a little bit and made it somewhat more readable, by:
making use of matrix multiplication with numpy to do the 90° clockwise turns;
using the built-in str.index() to find the initial position.
Result below...
But really, this is missing the point.
If you look at the "mazes" in all the test cases, what's happening is that the "robot" is bouncing cyclically between four # obstacles in a rectangular pattern (could also be a more complex pattern). So with your approach, you're computing and re-computing the same short sequence of moves, millions and billions of times; even though the longest possible cycle cannot possibly have more moves than the number of squares in your small maze (order of magnitude).
What you should try to do is keep a continuous log of all the moves done so far (position, direction). And if, or rather when, you end up in a (position, direction) where you've already been before, then you've covered one full cycle. No need to compute any more moves. Say your cyclic sequence is of length L and the total number of moves prescribed is n, then the final position will be sequence element number n mod L, after discounting any moves made before the cycle was entered (or something like that, off-by-one errors notwithstanding).
import sys
import numpy as np

def is_obstacle(maze, position):
    return maze[position[0]][position[1]] == '#'

def main():
    w, h = [int(i) for i in input().split()]
    n = int(input())
    # Load maze
    maze = []
    for i in range(h):
        line = input()
        maze.append(line)
        if 'O' in line:
            # Found initial position
            position = np.array([i, line.index('O')])
    # Initial direction (row -1 = up)
    direction = np.array([-1,0])
    # Define actions
    turn_clockwise = np.array([[0,-1],[1,0]])
    # Walk maze
    for i in range(n):
        while is_obstacle(maze, position + direction):
            # matrix product rotates the direction vector 90° clockwise
            direction = direction @ turn_clockwise
        position = position + direction
    print( "%(x)d %(y)d" %{"x": position[1], "y": position[0]} )

main()
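The cycle-detection idea described above is not implemented in that code, so here is a minimal sketch of how it could look in plain Python, without numpy. The function name `final_position` and its signature are assumptions made for this illustration. It takes the maze as a list of strings plus the step count `n`, and relies on the challenge's guarantee that the walls keep the character inside the grid. It records the step at which each (row, column, direction) state was first seen. When a state repeats, the answer is read from the recorded history instead of simulating the remaining steps.

```python
def final_position(maze, n):
    """Return (x, y) after n steps, skipping ahead once the walk starts repeating."""
    # locate the starting cell marked 'O'
    for r, line in enumerate(maze):
        c = line.find('O')
        if c != -1:
            row, col = r, c
            break
    dr, dc = -1, 0   # the character starts by moving up
    seen = {}        # (row, col, dr, dc) -> step at which the state first occurred
    history = []     # history[k] = (row, col) after k steps
    step = 0
    while step < n:
        state = (row, col, dr, dc)
        if state in seen:
            start = seen[state]
            cycle_len = step - start
            # position after n steps equals the one at the same offset into the cycle
            row, col = history[start + (n - start) % cycle_len]
            return col, row
        seen[state] = step
        history.append((row, col))
        while maze[row + dr][col + dc] == '#':
            dr, dc = dc, -dr   # turn 90 degrees clockwise
        row, col = row + dr, col + dc
        step += 1
    return col, row

maze = [
    "...#........",
    "...........#",
    "............",
    "............",
    "..#O........",
    "..........#.",
]
print(final_position(maze, 987))  # (7, 1)
```

In the first test case the walk settles into a 20-step rectangle, so only about 21 steps are simulated no matter how large `n` is.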

Iterating through a tuple list and somehow the 'a' is always treated as even number in the tuple (a,b)

The code works correctly when I change the compound statement to
if a % 2 == 0 and b % 2 == 0:
But since I am still learning, could someone please explain the error in the original code?
exm_list = [(4,8),(1,2),(4,5),(6,7),(10,20),(3,5),(3,2)]
for a,b in exm_list:
    if a and b % 2 == 0:
        print(f'{a,b} are the even numbers')
    else:
        print(f'one of {a,b} is the odd number')

The issue is that the condition never tests whether a is even; it only tests whether a is truthy. What you should write is the following:
exm_list = [(4,8),(1,2),(4,5),(6,7),(10,20),(3,5),(3,2)]
for a,b in exm_list:
    if a % 2 == 0 and b % 2 == 0:
        print(f'{a,b} are the even numbers')
    else:
        print(f'one of {a,b} is the odd number')
Let me know.

In your case
if a and b % 2 == 0:
is equivalent to
if bool(a) and bool(b % 2 == 0):
a is an integer, so bool(a) is True whenever a is not 0; that is why every nonzero a, odd or even, passes the first half of the test.
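To see the difference concretely, the short sketch below evaluates both conditions on a few pairs. The list `pairs` and the names `buggy` and `fixed` are made up for the illustration. The pair `(0, 2)` is included because 0 is the one integer that is falsy even though it is even.

```python
pairs = [(4, 8), (1, 2), (0, 2)]

# original condition: only checks that a is nonzero and b is even
buggy = [bool(a and b % 2 == 0) for a, b in pairs]
# corrected condition: checks that both a and b are even
fixed = [a % 2 == 0 and b % 2 == 0 for a, b in pairs]

print(buggy)  # [True, True, False]
print(fixed)  # [True, False, True]
```

The original condition accepts `(1, 2)` although 1 is odd, and rejects `(0, 2)` although both numbers are even.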
