Alien Dictionary
Link to the online judge -> LINK
Given a sorted dictionary of an alien language with N words and the first K letters of the standard alphabet, find the order of characters in the alien language.
Note: Many orders may be possible for a particular test case, so you may return any valid order. The output will be 1 if the order of the string returned by the function is correct, and 0 otherwise.
Example 1:
Input:
N = 5, K = 4
dict = {"baa","abcd","abca","cab","cad"}
Output:
1
Explanation:
Here the order of characters is
'b', 'd', 'a', 'c'. Note that the words are sorted,
and in the given language "baa" comes before
"abcd", therefore 'b' is before 'a' in the output.
Similarly, we can find the other orders.
My working code:
from collections import defaultdict

class Solution:
    def __init__(self):
        self.vertList = defaultdict(list)

    def addEdge(self,u,v):
        self.vertList[u].append(v)

    def topologicalSortDFS(self,givenV,visited,stack):
        visited.add(givenV)
        for nbr in self.vertList[givenV]:
            if nbr not in visited:
                self.topologicalSortDFS(nbr,visited,stack)
        stack.append(givenV)

    def findOrder(self,dict, N, K):
        list1 = dict
        for i in range(len(list1)-1):
            word1 = list1[i]
            word2 = list1[i+1]
            rangej = min(len(word1),len(word2))
            for j in range(rangej):
                if word1[j] != word2[j]:
                    u = word1[j]
                    v = word2[j]
                    self.addEdge(u,v)
                    break
        stack = []
        visited = set()
        vlist = [v for v in self.vertList]
        for v in vlist:
            if v not in visited:
                self.topologicalSortDFS(v,visited,stack)
        result = " ".join(stack[::-1])
        return result
#{
# Driver Code Starts
#Initial Template for Python 3
class sort_by_order:
    def __init__(self,s):
        self.priority = {}
        for i in range(len(s)):
            self.priority[s[i]] = i
    def transform(self,word):
        new_word = ''
        for c in word:
            new_word += chr( ord('a') + self.priority[c] )
        return new_word
    def sort_this_list(self,lst):
        lst.sort(key = self.transform)
if __name__ == '__main__':
    t=int(input())
    for _ in range(t):
        line=input().strip().split()
        n=int(line[0])
        k=int(line[1])
        alien_dict = [x for x in input().strip().split()]
        duplicate_dict = alien_dict.copy()
        ob=Solution()
        order = ob.findOrder(alien_dict,n,k)
        x = sort_by_order(order)
        x.sort_this_list(duplicate_dict)
        if duplicate_dict == alien_dict:
            print(1)
        else:
            print(0)
My problem:
The code runs fine for the test cases that are given in the example but fails for ["baa", "abcd", "abca", "cab", "cad"]
It throws the following error for this input:
Runtime Error:
Traceback (most recent call last):
File "/home/e2beefe97937f518a410813879a35789.py", line 73, in <module>
x.sort_this_list(duplicate_dict)
File "/home/e2beefe97937f518a410813879a35789.py", line 58, in sort_this_list
lst.sort(key = self.transform)
File "/home/e2beefe97937f518a410813879a35789.py", line 54, in transform
new_word += chr( ord('a') + self.priority[c] )
KeyError: 'f'
Running in some other IDE:
If I give this input explicitly in another IDE, the output I get is b d a c
Interesting problem. Your idea is correct: the constraints form a partially ordered set, so you can build a directed acyclic graph and find an ordered list of vertices using topological sort.
Your program fails because some letters may never be added to your vertList. vlist only contains letters that are the source of some edge, and a letter that takes part in no edge is never reached by the DFS, so it is missing from the returned order.
Spoiler: replacing the vlist line in findOrder with the following one solves the issue, because a DFS is then started from every one of the K letters:
vlist = [chr(ord('a') + v) for v in range(K)]
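For reference, here is a minimal standalone sketch of the same DFS approach with that fix applied, outside the judge's Solution class. The name alien_order is hypothetical; the sketch assumes the words are consistent (the constraint graph has no cycle, which it does not check) and that the alphabet is the first k lowercase letters. Unlike findOrder, it returns the letters without separating spaces.

```python
from collections import defaultdict

def alien_order(words, k):
    """Hypothetical helper: return one valid order of the first k letters.

    Assumes the input is consistent (no cycle); cycles are not detected."""
    graph = defaultdict(list)
    # The first differing character of each adjacent pair gives one
    # constraint u -> v ("u comes before v").
    for w1, w2 in zip(words, words[1:]):
        for a, b in zip(w1, w2):
            if a != b:
                graph[a].append(b)
                break

    visited = set()
    stack = []

    def dfs(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs(v)
        stack.append(u)  # post-order: u is pushed after all its successors

    # Start from every one of the k letters, not only edge sources,
    # so letters that appear in no constraint are still included.
    for i in range(k):
        letter = chr(ord('a') + i)
        if letter not in visited:
            dfs(letter)
    return ''.join(reversed(stack))

print(alien_order(["baa", "abcd", "abca", "cab", "cad"], 4))  # bdac
```

On the example from the question this gives the same order as the other IDE, b d a c, and for the two-word case discussed below it still returns all four letters.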
A simple failing example
Consider the input
2 4
baa abd
This will determine the following vertList
{"b": ["a"]}
The only constraint is that b must come before a in this alphabet. Your code returns the alphabet b a; since the letter d is not present, the driver code produces an error when it tries to check your solution. In my opinion, it should simply output 0 in this situation.
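A checker with that behaviour could look each letter up defensively before sorting. The sketch below is a hypothetical stand-in for the judge's driver (check_order is not its actual code): it returns 0 instead of raising KeyError when the returned order is missing a letter, and ignores the separating spaces the same way the driver does.

```python
def check_order(words, order):
    """Hypothetical checker: 1 if `words` is sorted under the alphabet `order`, else 0."""
    rank = {c: i for i, c in enumerate(order)}  # spaces get ranks too, but are never looked up
    # A letter missing from `order` means the answer is incomplete: report 0, don't crash.
    if any(c not in rank for w in words for c in w):
        return 0
    key = lambda w: [rank[c] for c in w]
    return 1 if sorted(words, key=key) == words else 0

print(check_order(["baa", "abd"], "b a"))  # 0, because 'd' is missing
```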
import re
import sys

def isValid(s):
    pattern_ = re.compile(r"[12][\d]{12}$")
    return pattern_.match(s)

loop = int(input())
output = []
for _ in range(0, loop):
    ele = int(input())
    output.append(ele)
entries = ''
for x in output:
    entries += str(x) + ''
print(output)  # ['0123456789012']
print(entries)  # 0123456789012
print(type(entries))  # str
print(type(output))  # list
# Driver Code
for _ in range(loop):
    for x in entries:
        if isValid(x):
            sys.stdout.write("Valid Number")
            break
        else:
            sys.stdout.write("Invalid Number")
            break
Phone numbers start with the digit 1 or 2, followed by exactly 12 digits, i.e. a phone number comprises 13 digits.
For each phone number, print "Valid" or "Invalid" on a new line.
The list is receiving the wrong input.
The output generated is:
2
0123456789012
1123456789012
[123456789012, 1123456789012]
123456789012 1123456789012
<class 'str'>
<class 'list'>
Invalid NumberInvalid Number
[Program finished]
Also, I searched Stack Overflow before posting, and this looks like a different issue. If an existing question matches this error, please redirect me there.
This is what it should look like:
2
1123456789012
0123456778901
Valid Number
Invalid Number
[Program finished]
import re

def isValid(s):
    # [12] matches '1' or '2'; [1|2] would also accept a literal '|'
    pattern_ = re.compile(r'[12][0-9]{12}$')
    return pattern_.match(s)

loop = int(input())  # number of times the loop runs
output = []
for _ in range(0, loop):
    output.append(input())  # keep the raw string so leading zeros survive
entries = ''
for x in output:
    entries += x + ''
result = []
# Driver Code
for val in output:
    if isValid(val):
        result.append('Valid Number')
    else:
        result.append('Invalid Number')
for i in range(len(result) - 1):
    print(result[i])
print(result[-1], end=" ")
This should work too.
print first converts the object to a string (if it is not already a string). It also separates multiple arguments with a space (the sep parameter) and writes a newline character at the end (the end parameter).
When using sys.stdout.write, you need to convert the object to a string yourself (by calling str, for example), and no newline character is added.
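The difference is easy to see by capturing what each call writes. This small sketch uses contextlib.redirect_stdout to collect the output in a string; capture, with_print and with_write are illustrative names, not part of the question's code.

```python
import io
import sys
from contextlib import redirect_stdout

def capture(fn):
    """Run fn and return everything it wrote to sys.stdout."""
    buf = io.StringIO()
    with redirect_stdout(buf):  # temporarily replaces sys.stdout with buf
        fn()
    return buf.getvalue()

def with_print():
    print("Valid Number")
    print("Invalid Number")
    print(1, 2)  # non-str arguments are converted and joined with sep=" "

def with_write():
    sys.stdout.write("Valid Number")
    sys.stdout.write("Invalid Number")
    sys.stdout.write(str(42) + "\n")  # write() needs a str; the "\n" is added by hand

printed = capture(with_print)
written = capture(with_write)
print(repr(printed))  # 'Valid Number\nInvalid Number\n1 2\n'
print(repr(written))  # 'Valid NumberInvalid Number42\n'
```

The second string shows exactly the glued-together "Valid NumberInvalid Number" seen in the question's output.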
May I also suggest rephrasing your question, as it's not a logic issue but a syntax issue.
Comment:
Checked with single and multiple inputs.
Works.
Try using the regex below:
def is_valid(s):
    pattern_ = re.compile(r'[12][0-9]{12}$')
    return pattern_.match(s)
I am not sure why you are appending the numbers to the entries variable. I have changed the code a bit, and the regex is working fine.
import re

def is_valid(s):
    pattern_ = re.compile(r'[12][0-9]{12}$')
    return pattern_.match(s)

loop = int(input())
output = []
for _ in range(0, loop):
    output.append(input())
entries = ''
for x in output:
    entries += x + ''
print(output)  # ['0123456789012']
print(entries)  # 0123456789012
print(type(entries))  # str
print(type(output))  # list
# Driver Code
for val in output:
    if is_valid(val):
        print('Valid Number')
    else:
        print('Invalid Number')
Input:
5
1234567891234
1893456879354
2897347838389
0253478642678
6249842352985
Output:
['1234567891234', '1893456879354', '2897347838389', '0253478642678', '6249842352985']
12345678912341893456879354289734783838902534786426786249842352985
<class 'str'>
<class 'list'>
Valid Number
Valid Number
Valid Number
Invalid Number
Invalid Number
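Two details behind these fixes can be checked in isolation. Reading each line with int(input()), as the question does, turns "0123456789012" into 123456789012, so the leading zero is gone before the regex ever sees it; keeping the string returned by input() avoids that. And a character class written as [1|2] also accepts a literal '|', while [12] accepts only '1' or '2'. The sketch below (is_valid_phone is a hypothetical name) additionally uses fullmatch, which rejects a trailing newline that match() with '$' would let through.

```python
import re

# [12] matches '1' or '2' only; [0-9]{12} requires exactly twelve more digits.
PHONE = re.compile(r'[12][0-9]{12}')

def is_valid_phone(s):
    # fullmatch() must consume the whole string, unlike match() + '$',
    # which also accepts a single trailing newline.
    return PHONE.fullmatch(s) is not None

raw = "0123456789012"
print(int(raw))                           # 123456789012 (leading zero lost)
print(is_valid_phone(raw))                # False
print(is_valid_phone("1123456789012"))    # True
print(is_valid_phone("|123456789012"))    # False
print(is_valid_phone("1123456789012\n"))  # False
```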
import sys
import re

def isValid(s):
    pattern_ = re.compile(r'[12][0-9]{12}$')
    return pattern_.match(s)

loop = int(input())
output = []
for _ in range(0, loop):
    output.append(input())
entries = ''
for x in output:
    entries += x + ''
print(output)  # ['0123456789012']
print(entries)  # 0123456789012
print(type(entries))  # str
print(type(output))  # list
# Driver Code
for val in output:
    if isValid(val):
        sys.stdout.write('Valid Number')
    else:
        sys.stdout.write('Invalid Number')
produces
1
1234567891234
['1234567891234']
1234567891234
<class 'str'>
<class 'list'>
Valid Number
[Program finished]
print always ends the line with a newline by default,
whereas sys.stdout.write does not.
That resolved the challenge.
Given a string, replace a character with the number of times it is repeated (followed by 'Z' and the character), but only if the character is repeated more than three times consecutively, like below:
Input: aaaaa Output: 5Za
Input: addeeeeuyyyyy Output: add4Zeu5Zy
I tried the following:
>>> from itertools import groupby
>>> strs="aaaaa"
>>> [[k, len(list(g))] for k, g in groupby(strs)]
[['a', 5]]
>>>
Along the same line of thought you were using:
from itertools import groupby

def condense(strs):
    new_str = ''
    for k, g in groupby(strs):
        length = len(list(g))
        if length > 3:
            new_str += f'{length}Z{k}'
        else:
            new_str += k*length
    return new_str

print(condense('aaaaa'))
print(condense('addeeeeuyyyyy'))
You got a good part of it: you need to implement the restriction of only abbreviating consecutive letters that occur 4+ times, and add the 'Z' to the output.
You could do it like so:
from itertools import groupby

def run_length_encode(data):
    result = []
    for letter, amount in ((k, len(list(g))) for k, g in groupby(data)):
        result.append(letter*amount if amount < 4 else f'{amount}Z{letter}')
    return ''.join(result)

data = "someaaaaabbbcdccdd"
print(data, run_length_encode(data), sep=" => ")
Output:
someaaaaabbbcdccdd => some5Zabbbcdccdd
You can find more solutions (e.g. regex-based ones) in this related post:
Run length encoding in Python
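As one possible regex version of the same rule (a sketch, not taken from that post; condense_re is a hypothetical name), a backreference can match a run of four or more identical characters, and re.sub can replace it with the count:

```python
import re

# (.)\1{3,} matches one character followed by at least three more copies
# of it, i.e. a run of four or more identical characters.
RUN = re.compile(r'(.)\1{3,}', re.DOTALL)  # DOTALL: '.' also matches '\n'

def condense_re(s):
    # Replace each long run with "<length>Z<char>"; shorter runs are untouched.
    return RUN.sub(lambda m: f'{len(m.group(0))}Z{m.group(1)}', s)

print(condense_re('aaaaa'))               # 5Za
print(condense_re('addeeeeuyyyyy'))       # add4Zeu5Zy
print(condense_re('someaaaaabbbcdccdd'))  # some5Zabbbcdccdd
```

The {3,} quantifier is greedy, so the whole run is consumed in one match and its length is read from the matched text.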
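s = input()
ini = ""   # character of the current run
ans = ""
c = 0      # length of the current run (0 also covers an empty input)
for i in range(len(s)):
    if ini == "":
        ini = s[i]
        c = 0
    if ini == s[i]:
        c = c + 1
    if ini != s[i]:
        # the run ended: compress it only if it is longer than three characters
        if c > 3:
            ans = ans + str(c) + "Z" + ini
        else:
            ans = ans + c*ini
        ini = s[i]
        c = 1
# flush the last run
if c != 0 and c <= 3:
    ans = ans + c*ini
elif c > 3:
    ans = ans + str(c) + "Z" + ini
print(ans)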
I have this code:
import itertools
for letters in itertools.product(charset, repeat=47):
    string = "".join(letters)
    print(string)
and the output from that is
aaaaaaaaaaaa
aaaaaaaaaaab
aaaaaaaaaaac
but I'm wondering how I can make it not generate the same character three times in a row, so that the output is
dddcccbbbaaa
dddcccbbbaab
dddcccbbbaac
and so on, without using something like this:
for letters in itertools.product(charset, repeat=47):
    string = "".join(letters)
    for i in range(1,len(string)-1):
        if string[i] is not string[i+1] is not string[i-1]:
            print(string)
        else:
            pass
Here's a slightly modified version of your code:
import itertools

def version1(charset, N):
    result = []
    for letters in itertools.product(charset, repeat=N):
        string = "".join(letters)
        for i in range(0, N-2):
            if string[i] == string[i+1] == string[i+2]:
                break
        else:  # did not find any ZZZ sequence
            result.append(string)
    return result
>>> charset = "abc"
>>> N = 5
>>> version1(charset, N)
['aabaa', 'aabab', 'aabac', 'aabba', 'aabbc', 'aabca', 'aabcb', 'aabcc', 'aacaa', 'aacab', 'aacac', 'aacba', 'aacbb', 'aacbc', 'aacca', 'aaccb', 'abaab', 'abaac', 'ababa', 'ababb', 'ababc', 'abaca', 'abacb', 'abacc', 'abbaa', 'abbab', 'abbac', 'abbca', 'abbcb', 'abbcc', 'abcaa', 'abcab', 'abcac', 'abcba', 'abcbb', 'abcbc', 'abcca', 'abccb', 'acaab', 'acaac', 'acaba', 'acabb', 'acabc', 'acaca', 'acacb', 'acacc', 'acbaa', 'acbab', 'acbac', 'acbba', 'acbbc', 'acbca', 'acbcb', 'acbcc', 'accaa', 'accab', 'accac', 'accba', 'accbb', 'accbc', 'baaba', 'baabb', 'baabc', 'baaca', 'baacb', 'baacc', 'babaa', 'babab', 'babac', 'babba', 'babbc', 'babca', 'babcb', 'babcc', 'bacaa', 'bacab', 'bacac', 'bacba', 'bacbb', 'bacbc', 'bacca', 'baccb', 'bbaab', 'bbaac', 'bbaba', 'bbabb', 'bbabc', 'bbaca', 'bbacb', 'bbacc', 'bbcaa', 'bbcab', 'bbcac', 'bbcba', 'bbcbb', 'bbcbc', 'bbcca', 'bbccb', 'bcaab', 'bcaac', 'bcaba', 'bcabb', 'bcabc', 'bcaca', 'bcacb', 'bcacc', 'bcbaa', 'bcbab', 'bcbac', 'bcbba', 'bcbbc', 'bcbca', 'bcbcb', 'bcbcc', 'bccaa', 'bccab', 'bccac', 'bccba', 'bccbb', 'bccbc', 'caaba', 'caabb', 'caabc', 'caaca', 'caacb', 'caacc', 'cabaa', 'cabab', 'cabac', 'cabba', 'cabbc', 'cabca', 'cabcb', 'cabcc', 'cacaa', 'cacab', 'cacac', 'cacba', 'cacbb', 'cacbc', 'cacca', 'caccb', 'cbaab', 'cbaac', 'cbaba', 'cbabb', 'cbabc', 'cbaca', 'cbacb', 'cbacc', 'cbbaa', 'cbbab', 'cbbac', 'cbbca', 'cbbcb', 'cbbcc', 'cbcaa', 'cbcab', 'cbcac', 'cbcba', 'cbcbb', 'cbcbc', 'cbcca', 'cbccb', 'ccaab', 'ccaac', 'ccaba', 'ccabb', 'ccabc', 'ccaca', 'ccacb', 'ccacc', 'ccbaa', 'ccbab', 'ccbac', 'ccbba', 'ccbbc', 'ccbca', 'ccbcb', 'ccbcc']
Your algorithm is not optimal. Look at the first string:
aaaaa
You know that you need len(charset) - 1 iterations (aaaab, aaaac) to arrive at:
aaaba
And then again len(charset) - 1 iterations to arrive at:
aaaca
But you can skip all those iterations, because of the aaa beginning.
Actually, when you find the sequence aaa, you can skip len(charset)**K - 1 tuples, where
K is the number of remaining characters. This does not change the big-O complexity,
but it will reduce the computation time for long sequences, depending on the
size of the charset and the number of characters in the strings.
Intuitively, if the charset has few characters, you will save a lot of computation.
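As a worked instance of that count, let f be the index just after the first ZZZ sequence (the value first_after_ZZZ returns), so K = N - f. With charset "abc" and N = 5, the string aaaaa has f = 3: the 3^2 = 9 strings starting with aaa run from aaaaa to aaacc, aaaaa itself has just been examined, and the remaining 8 (aaaab through aaacc) can be skipped.

```latex
\begin{aligned}
\text{to\_skip} &= |\text{charset}|^{K} - 1, \qquad K = N - f \\
|\text{charset}| = 3,\ N = 5,\ f = 3 &\;\Rightarrow\; K = 2,\quad \text{to\_skip} = 3^{2} - 1 = 8
\end{aligned}
```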
First, you need to find the position of the first letter after a ZZZ sequence (or -1 if there is none):
def first_after_ZZZ(string):
    for i in range(0, len(string)-2):
        if string[i] == string[i+1] == string[i+2]:
            return i+3
    return -1
>>> first_after_ZZZ("ababa")
-1
>>> first_after_ZZZ("aaaba")
3
>>> first_after_ZZZ("aaabaaabb")
3
We use this function in the previous code (intermediate step):
def version2(charset, N):
    result = []
    for letters in itertools.product(charset, repeat=N):
        string = "".join(letters)
        f = first_after_ZZZ(string)
        if f == -1:
            result.append(string)
    return result
>>> version2(charset, N) == version1(charset, N)
True
Now, we can skip some elements:
def version3(charset, N):
    result = []
    it = itertools.product(charset, repeat=N)
    for letters in it:
        string = "".join(letters)
        f = first_after_ZZZ(string)
        if f == -1:
            result.append(string)
        elif f < N:
            K = N - f  # K >= 1
            to_skip = len(charset)**K - 1
            next(itertools.islice(it, to_skip, to_skip), None)  # this will skip to_skip tuples
    return result
>>> version3(charset, N) == version1(charset, N)
True
Benchmark:
>>> from timeit import timeit as ti
>>> ti(lambda: version1(charset, 15), number=1)
13.14919564199954
>>> ti(lambda: version3(charset, 15), number=1)
6.94705574299951
This is impressive because the charset is small, but may be insignificant with a whole alphabet.
Of course, if you write your own implementation of product, you can skip the
tuples without generating them and this could be faster.
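Along the lines of that last remark, here is a sketch of a custom generator that prunes a branch as soon as it would create three equal characters in a row, so the rejected strings are never built at all. no_triples is a hypothetical name, and it uses recursive backtracking in place of itertools.product; because it tries the characters in charset order at each position, it yields the surviving strings in the same order as product does.

```python
def no_triples(charset, n):
    """Hypothetical generator: yield every length-n string over `charset`
    with no character repeated three times in a row."""
    prefix = []

    def extend():
        if len(prefix) == n:
            yield "".join(prefix)
            return
        for ch in charset:
            # Prune: appending ch would make the last three characters equal.
            if len(prefix) >= 2 and prefix[-1] == prefix[-2] == ch:
                continue
            prefix.append(ch)
            yield from extend()
            prefix.pop()  # backtrack before trying the next character

    yield from extend()

result = list(no_triples("abc", 5))
print(len(result), result[0], result[-1])  # 180 aabaa ccbcc
```

The recursion depth equals n, so repeat=47 from the question is no problem; the number of results still grows exponentially with n.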
Is there an easy way to modify this code which converts from base 2 into base 10 to work for converting base 16 into base 10? My objective is to build a dedicated function for conversion and not use any built-in Python features for the calculation. Thanks
BinaryVal = int(input('Enter:'))
DecVal = 0
for n in range(len(str(BinaryVal))):
    Power = len(str(BinaryVal))-(n+1)
    DecVal += int(str(BinaryVal)[n])*(2**Power)
print(DecVal)
Yikes.
int already can convert from any base to base 10 - just supply it as the second argument.
int('101010',2)
Out[64]: 42
int('2A',16)
Out[66]: 42
To convert a hexadecimal string to an int:
>>> hexstr = '101010'
>>> int(hexstr, 16)
1052688
The same -- without int constructor:
>>> import binascii
>>> int.from_bytes(binascii.unhexlify(hexstr), 'big')
1052688
The same -- similar to @SzieberthAdam's answer:
>>> hex2dec = {d: i for i, d in enumerate('0123456789abcdef')}
>>> sum(hex2dec[h] * 16**pos for pos, h in enumerate(reversed(hexstr.lower())))
1052688
or:
>>> from functools import reduce
>>> reduce(lambda n, h: n*16 + hex2dec[h], hexstr.lower(), 0)
1052688
that is equivalent to:
def hex2int(hexstr):
    n = 0
    for h in hexstr.lower():
        n = n*16 + hex2dec[h]
    return n
Example:
>>> hex2int('101010')
1052688
As an alternative, one could convert all digits to int first:
>>> reduce(lambda n, d: n*16 + d, map(hex2dec.get, hexstr.lower()))
1052688
Without an initial value, reduce raises TypeError for an empty string.
Well, here you go then:
>>> binary_num = '101010'
>>> sum(int(b)*2**i for i, b in enumerate(reversed(binary_num)))
42
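To answer the original question more directly, the positional loop from the question generalizes to any base once each character is mapped to its digit value. This is a minimal sketch under stated assumptions: to_base10 is a hypothetical name, digits are taken as 0-9 followed by a-z (case-insensitive, the same convention int() uses), and it accumulates with Horner's method, the same n*base + digit step as hex2int, instead of computing powers.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base10(digits, base):
    """Hypothetical helper: convert a digit string in `base` (2..36) to an int
    without calling int(s, base)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if not digits:
        raise ValueError("empty digit string")
    value = 0
    for ch in digits.lower():
        d = DIGITS.find(ch)
        if d == -1 or d >= base:
            raise ValueError(f"invalid digit {ch!r} for base {base}")
        value = value * base + d  # shift one place left, then add the new digit
    return value

print(to_base10('101010', 2))   # 42
print(to_base10('2A', 16))      # 42
print(to_base10('101010', 16))  # 1052688
```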