I am given a sample string AABCAAADA. I then split it into 3 parts: AAB, CAA, ADA.
I have nested these 3 elements into a list. In each part, I should check whether a duplicate character is present and delete the duplicate character. I know strings are immutable, but is there any trick to do that?
Below is the approach I tried, but I am unable to use del or the pop method to delete the duplicate character.
s='AABCAAADA'
x = int(input())
l=[]
#for i in range(0,len(s),x):
for j in range(0,len(s),3):
    l.append(s[j:j+3])
j=0
for i in range(0,len(s)//x):
    for j in range(0,len(l[j])-1):
        if(l[i][j] == l[i][j+1]):
            pass
            #need to remove the (j+1)th term if it is duplicate
The output should be AB, CA, AD.
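Because strings are immutable, the usual trick is not to delete characters in place but to build a new string that keeps only the first occurrence of each character. A minimal sketch of that idea, reusing the slicing from the attempt above; the chunk size is hard-coded to 3 here as an assumption, where the original reads it with input(). Note that comparing only neighbouring characters, as the attempt does, would miss the second A in ADA, so this sketch checks against everything kept so far.

```python
s = 'AABCAAADA'
size = 3  # assumed chunk size; the original reads x with int(input())

# Split the string into consecutive chunks of `size` characters
parts = [s[j:j + size] for j in range(0, len(s), size)]

result = []
for part in parts:
    new_part = ''  # strings are immutable, so collect the kept characters in a new string
    for ch in part:
        if ch not in new_part:  # keep only the first occurrence of each character
            new_part += ch
    result.append(new_part)

print(result)  # ['AB', 'CA', 'AD']
```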
delete duplicate character in nested list
from functools import reduce
l = ['AAB','CAA','ADA']
print([''.join(reduce(lambda a, b: a if b in a else a + b, s, '')) for s in l])
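Or, for Python 3.7+ (where dicts are guaranteed to keep insertion order; in CPython 3.6 this was only an implementation detail):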
print([''.join({a: 1 for a in s}) for s in l])
Both output:
['AB', 'CA', 'AD']
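The dict version works because dict keys are unique and keep the order in which they were first inserted. dict.fromkeys builds the same kind of dictionary without the dummy value 1; a small sketch:

```python
parts = ['AAB', 'CAA', 'ADA']

# dict.fromkeys(part) creates {char: None, ...}; repeated characters collapse onto
# the first key, and iteration follows first appearance (guaranteed since Python 3.7)
deduped = [''.join(dict.fromkeys(part)) for part in parts]
print(deduped)  # ['AB', 'CA', 'AD']
```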
So far, I have:
my_list = ['hello', 'oi']
comparison_list = ['this hellotext', 'this oitext']
for w in my_list:
    if w in comparison_list: print('yes')
However, nothing prints because no element in my_list equals any element in comparison_list.
So how do I check whether each word occurs as a substring of a string in comparison_list, rather than as an exact match?
Ideal output:
yes
yes
You are currently checking whether the complete string occurs as an element of the list. Instead, you can check whether the string occurs inside each comparison string and decide based on that. A simple approach is to rewrite the loop as follows:
for w in my_list:
    # Check every comparison string; any() is True if at least one value is true
    if any([True for word in comparison_list if w in word]):
        print('yes')
It's because you're comparing w to the list elements themselves. If you want to find w inside each string in your comparison_list, you can use any():
my_list = ['hello', 'oi', 'abcde']
comparison_list = ['this hellotext', 'this oitext']
for w in my_list:
    if any(w in s for s in comparison_list):
        print('yes')
    else:
        print('no')
I added a string to your list and handled the 'no' case in order to get an output for each element.
Output:
yes
yes
no
Edited Solution:
Apologies for older solution, I was confused somehow.
Using the re module, we can use re.search to determine whether any of the words is present in a string. To do this, we build one pattern by joining all the strings in my_list with | using str.join, and then iterate through comparison_list. Since | means "or" in a regular expression, re.search finds a match, and bool() returns True, as soon as any of the searched words is present. Note that this gives one result per string in comparison_list rather than per word, and that I am assuming my_list contains no regex special characters (if it might, escape each word with re.escape first).
import re
reg = '|'.join(my_list)
for item in comparison_list:
    print(bool(re.search(reg, item)))
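To see why special characters matter, here is a small comparison with and without re.escape. The word 'a.c' and the string 'this abctext' are hypothetical inputs chosen only for illustration: unescaped, the '.' matches any character, so 'a.c' wrongly matches 'abc'.

```python
import re

my_list = ['hello', 'oi', 'a.c']  # 'a.c' is a hypothetical word containing a regex metacharacter
comparison_list = ['this hellotext', 'this abctext']

unescaped = '|'.join(my_list)                # '.' matches any single character
escaped = '|'.join(map(re.escape, my_list))  # '.' is matched literally

results = [(item, bool(re.search(unescaped, item)), bool(re.search(escaped, item)))
           for item in comparison_list]
for row in results:
    print(*row)
# this hellotext True True
# this abctext True False
```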
A list of IP addresses is downloaded to a file, which I rename to Old_file. As the days go by, the device gets updated with more IPs (or some are deleted), so I download a new list of IP addresses to another file named New_file.
I then want to compare these two files and see what does not match:
Old_file = ['1.1.1.1',
            '1.1.1.2',
            '1.1.1.3',
            '1.1.1.4',
            '1.1.1.6']
New_file = ['1.1.1.1',
            '1.1.1.2',
            '1.1.1.3',
            '1.1.1.5',
            '1.1.1.6']
The result needs to be 1.1.1.4, and the comparison should stop there. It should never return e.g. 1.1.1.5 from Old_file (we need results from New_file only).
I hope this explains it.
Thanks in advance
Tony
For a simple element-wise comparison, you could do
def get_first_unequal(s0, s1):
    for e0, e1 in zip(s0, s1):  # assumes sequences are of equal length!
        if e0 != e1:
            print(f"unequal elements: '{e0}' vs. '{e1}'!")
            return (e0, e1)
    return None  # all equal

a = ['a', 'b', 'c']
b = ['a', 'b', 'd']
get_first_unequal(a, b)
# unequal elements: 'c' vs. 'd'!
# ('c', 'd')
# --> to get a list of all unequal pairs, you could also use
# [(e0, e1) for (e0, e1) in zip(s0, s1) if e0 != e1]
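If addresses can be added or removed anywhere in the list, a position-by-position zip stops lining up after the first change. A different technique from the one in this answer is a set membership test; a minimal sketch using the question's sample data written as Python strings:

```python
Old_file = ['1.1.1.1', '1.1.1.2', '1.1.1.3', '1.1.1.4', '1.1.1.6']
New_file = ['1.1.1.1', '1.1.1.2', '1.1.1.3', '1.1.1.5', '1.1.1.6']

old_set = set(Old_file)
new_set = set(New_file)

# Addresses that were in the old list but are missing from the new one (original order kept)
removed = [ip for ip in Old_file if ip not in new_set]
# Addresses that appear only in the new list
added = [ip for ip in New_file if ip not in old_set]

print(removed)  # ['1.1.1.4']
print(added)    # ['1.1.1.5']
```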
If you want something more sophisticated, difflib (as mentioned in the comments) might be the way to go, e.g. to compare two sequences (such as the lists of strings you read from the two text files you want to compare):
import difflib
a = ['a', 'b', 'c']
b = ['s', 'b', 'c', 'd']
delta = difflib.context_diff(a, b)
for d in delta:
    print(d)
gives (the file-header lines that context_diff emits first are omitted here)
*** 1,3 ****
! a
  b
  c
--- 1,4 ----
! s
  b
  c
+ d
To check the difference between two strings, you could do something like this (borrowing from here):
a = 'string1'
b = 'string 2'
delta = difflib.ndiff(a, b)
print(f"a -> b: {a} -> {b}")
for i, d in enumerate(delta):
    if d[0] == ' ':  # no difference
        continue
    elif d[0] == '-':
        print(f"Deleted '{d[-1]}' from position {i}")
    elif d[0] == '+':
        print(f"Added '{d[-1]}' to position {i-1}")
gives
a -> b: string1 -> string 2
Deleted '1' from position 6
Added ' ' to position 6
Added '2' to position 7
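difflib can also report which stretches of two lists differ, without printing a diff. A minimal sketch using difflib.SequenceMatcher.get_opcodes on the question's sample IP lists (written as strings here); each non-'equal' opcode says which slice of the first list became which slice of the second:

```python
import difflib

Old_file = ['1.1.1.1', '1.1.1.2', '1.1.1.3', '1.1.1.4', '1.1.1.6']
New_file = ['1.1.1.1', '1.1.1.2', '1.1.1.3', '1.1.1.5', '1.1.1.6']

matcher = difflib.SequenceMatcher(None, Old_file, New_file)
changes = []
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != 'equal':
        # Old_file[i1:i2] was replaced/deleted/inserted as New_file[j1:j2]
        changes.append((tag, Old_file[i1:i2], New_file[j1:j2]))

print(changes)  # [('replace', ['1.1.1.4'], ['1.1.1.5'])]
```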
If you're assuming that both files should be exactly identical, you can simply iterate over the characters of the first and compare them to the second. For example:
# check that they're the same length first
if len(Old_file) != len(New_file):
    print('not the same!')
else:
    for indx, char in enumerate(Old_file):
        try:
            # actually compare the characters
            old_char = char
            new_char = New_file[indx]
            assert old_char == new_char
        except IndexError:
            # the new file is shorter than the old file
            print('not the same!')
            break  # kill the loop
        except AssertionError:
            # the characters do not match
            print('not the same!')
            break  # kill the loop
It's worth noting that there are faster ways to do this. You could look into computing a checksum, though that would only tell you that the files differ, not which parts differ. If the files are large, checking one character at a time will perform poorly; in that case you can compare blocks of data at a time instead.
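As a sketch of the checksum idea, assuming SHA-256 from hashlib is an acceptable checksum: the function reads each file in fixed-size blocks, so large files are never loaded into memory at once. The file contents in the demo are hypothetical and written to a temporary directory only so the example runs on its own.

```python
import hashlib
import os
import tempfile

def file_digest(path, block_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size blocks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b'' (end of file)
        for block in iter(lambda: f.read(block_size), b''):
            h.update(block)
    return h.hexdigest()

# Demo with two throwaway files (hypothetical contents)
with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, 'Old_file')
    new_path = os.path.join(tmp, 'New_file')
    with open(old_path, 'w') as f:
        f.write('1.1.1.1\n1.1.1.4\n')
    with open(new_path, 'w') as f:
        f.write('1.1.1.1\n1.1.1.5\n')
    same = file_digest(old_path) == file_digest(new_path)

print(same)  # False: the files differ, but the digest does not say where
```

Equal digests mean the files are (with overwhelming probability) identical; unequal digests only tell you to look closer.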
Edit: re-reading your original question, you could definitely do this with a while loop. If you did, I would suggest basically the same strategy of checking each individual character. In that case you would manually need to increment the indx of course.
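A minimal sketch of the while-loop variant; first_mismatch is a hypothetical helper name, and it returns the index of the first differing entry instead of printing:

```python
def first_mismatch(old, new):
    """Return the index of the first differing entry, or None if the sequences are identical."""
    indx = 0
    while indx < len(old) and indx < len(new):
        if old[indx] != new[indx]:
            return indx
        indx += 1  # a while loop needs the index to be advanced by hand
    # One sequence is a prefix of the other: they differ at the first missing position
    return None if len(old) == len(new) else indx

Old_file = ['1.1.1.1', '1.1.1.2', '1.1.1.3', '1.1.1.4', '1.1.1.6']
New_file = ['1.1.1.1', '1.1.1.2', '1.1.1.3', '1.1.1.5', '1.1.1.6']
pos = first_mismatch(Old_file, New_file)
print(pos, Old_file[pos], New_file[pos])  # 3 1.1.1.4 1.1.1.5
```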
Changing duplicate characters in a string to ) and non-duplicates to (
I have tried two for loops, but it doesn't work. I am a beginner in coding, so I can't understand this complex code. Can someone explain it?
def duplicate_encode(word):
    return (lambda w: ''.join(('(', ')')[c in w[:i] + w[i+1:]] for i, c in enumerate(w)))(word.lower())
print(duplicate_encode("rEcede"))
Input: "Mercedes Bench"
Output: ()())()((()()(
As said in a comment, I think this is bad coding practice and should be avoided. But it can serve as an example of code reading. So I'll give it a try here. (First you should read about lambda if you're not familiar with it.)
First, look at the matching parentheses and try to find the "deepest" parts:
1. The top level is lambda w: ''.join(('(', ')')[c in w[:i] + w[i+1:]] for i, c in enumerate(w)), applied to word.lower().
2. Inside ''.join(...) we then have ('(', ')')[c in w[:i] + w[i+1:]] for i, c in enumerate(w).
3. enumerate(w), where w is a string, produces an enumerate object that can be iterated to get tuples of the form (i, c), where i is the index of the letter c. Try running for x in enumerate(w): print(x) for different strings w to get a feel for it.
4. ('(', ')')[c in w[:i] + w[i+1:]] for i, c in enumerate(w) is a generator expression: it iterates through the (index, letter) tuples of w and produces only ')' and '(' characters, which ''.join(...) then concatenates into the final output string. Let's break it down further.
5. The expression inside the square brackets, c in w[:i] + w[i+1:], always evaluates to either True or False (see 6 as to why). The square brackets index into the tuple ('(', ')'), and since True and False behave as the integers 1 and 0, ('(', ')')[False] returns '(' and ('(', ')')[True] returns ')' (something I learned right now by typing it out to see what happens).
6. For every letter in w there is a tuple (i, c) in the generator (see point 4). The expression c in w[:i] + w[i+1:] first takes two substrings of w: the first contains all the letters before position i (where the current letter is located), and the second contains all the letters after the current letter. These two substrings are concatenated, and the c in part then checks whether the current letter occurs in the resulting string, i.e. whether c appears anywhere else in the string. For example, for w = 'aba' and the second tuple from enumerate('aba'), (1, 'b'), w[:i] is 'aba'[:1], which is 'a', and w[i+1:] is 'aba'[2:], which is also 'a'. Concatenated we get 'aa', so c in w[:i] + w[i+1:], which here is 'b' in 'aa', evaluates to False, resulting in '('.
Effectively, the lambda is just a function that, for each letter at a given position, checks whether the same letter is present in the string with that position removed. It is then applied to the argument word.lower(), which ensures that case is ignored (e.g., 'A' and 'a' count as the same letter).
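The 'aba' walk-through from point 6 can be traced directly; this small sketch prints, for each position, the string with that position removed, the membership test, and the character picked from ('(', ')'):

```python
w = 'aba'
rows = []
for i, c in enumerate(w):
    rest = w[:i] + w[i+1:]      # the string with position i removed
    found = c in rest           # True if c appears somewhere else
    symbol = ('(', ')')[found]  # False -> index 0 -> '(', True -> index 1 -> ')'
    rows.append((i, c, rest, found, symbol))
    print(i, c, repr(rest), found, symbol)
# 0 a 'ba' True )
# 1 b 'aa' False (
# 2 a 'ab' True )
```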
This code replicates the logic of the lambda function. By separating it into distinct statements, it is easier to follow. Uncomment the print statements to see the whole process in detail.
def simple_duplicate_encode(word):
    word = word.lower()  # the lambda version is applied to word.lower(), so do the same here
    output = ""
    for i, c in enumerate(word):
        # print(i,c)
        i1 = word[:i]
        i2 = word[i+1:]
        # print(":{} = {}".format(i, word[:i]))
        # print("{}: = {}".format(i+1, word[i+1:]))
        is_duplicated = c in i1 + i2  # Check to see if the character c is in the rest of the string
        # print("Is duplicated:{}".format(is_duplicated))
        character = ('(',')')[is_duplicated]  # If is_duplicated = True the value is 1, else 0
        # print(character)
        output += character
    return output
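Both versions rebuild the rest of the string for every letter, which is quadratic in the length of the word. A different technique, not the one in the question, is to count every character once with collections.Counter and then look each letter up; a minimal sketch (counter_duplicate_encode is a hypothetical name):

```python
from collections import Counter

def counter_duplicate_encode(word):
    word = word.lower()     # ignore case, as the original does
    counts = Counter(word)  # one pass: character -> number of occurrences
    # ')' for characters that occur more than once, '(' otherwise
    return ''.join(')' if counts[c] > 1 else '(' for c in word)

print(counter_duplicate_encode("Mercedes Bench"))  # ()())()((()()(
print(counter_duplicate_encode("rEcede"))          # ()()()
```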
I am trying to solve a problem that can be modelled most simply as follows.
I have a large collection of letter sequences. The letters come from two lists: (1) member list (2) non-member list. The sequences are of different compositions and lengths (e.g. AQFG, CCPFAKXZ, HBODCSL, etc.). My goal is to insert the number '1' into these sequences when any 'member' is followed by any two 'non-members':
Rule 1: Insert '1' after the first member letter that is followed by 2 or more non-member letters.
Rule 2: Insert not more than one '1' per sequence.
The 'Members': A, B, C, D
'Non-members': E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z
In other words, once a member letter is followed by 2 non-member letters, insert a '1'. In total, only one '1' is inserted per sequence. Examples of what I am trying to achieve are this:
AQFG ---> A1QFG
CCPFAKXZ ---> CC1PFAKXZ
BDDCCA ---> BDDCCA1
HBODCSL ---> HBODC1SL
ABFCC ---> ABFCC
ACKTBB ---> AC1KTBB # there is no '1' to be inserted after BB
I assume the code will be something like this:
members = ['A','B','C','D']
non_members = ['E','F','G','H','I','J','K','L','M','N',
               'O','P','Q','R','S','T','U','V','W','X','Y','Z']
strings = ['AQFG', 'CCPFAKXZ', 'BDDCCA', 'HBODCSL', 'ABFCC']
for i in members:
    if i in strings:
        if member is followed by 2 non-members:  # Struggling here
            i.insert(index_member, '1')
            return i
return ''
EDIT
I have found that one solution could be to generate a list of all permutations of two 'non-member' items using itertools.permutations(non_members, 2), and then test for their presence in the string.
But is there a more elegant solution for this problem?
Generating all permutations is going to blow up the number of things you are checking. Instead, you need to change how you iterate, something like:
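members = 'ABCD'  # `in` works on strings as well as lists
non_members = 'EFGHIJKLMNOPQRSTUVWXYZ'
s = 'AQFG'
out = ""
look = 2
for i in range(len(s)-look):
    out += s[i]
    if (s[i] in members and
            s[i+1] in non_members and
            s[i+2] in non_members):
        out += '1' + s[i+1:]
        break
else:
    # no match: append the last `look` characters that the loop did not copy
    out += s[max(len(s)-look, 0):]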
This way you only need to go through the target string once and don't need to generate permutations, and the method can be extended to look further ahead much more easily than yours.
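The same look-ahead idea, wrapped in a function and checked against the examples from the question; a minimal sketch where insert_marker and the look parameter are hypothetical names. Note that under Rule 1, BDDCCA stays unchanged (the final A is not followed by any letters), which differs from the BDDCCA1 listed in the question.

```python
MEMBERS = set('ABCD')
NON_MEMBERS = set('EFGHIJKLMNOPQRSTUVWXYZ')

def insert_marker(s, look=2):
    """Insert '1' after the first member followed by `look` non-members (at most once)."""
    for i in range(len(s) - look):
        following = s[i + 1:i + 1 + look]  # the next `look` characters
        if s[i] in MEMBERS and all(ch in NON_MEMBERS for ch in following):
            return s[:i + 1] + '1' + s[i + 1:]
    return s  # no position matched: leave the sequence unchanged

for s in ['AQFG', 'CCPFAKXZ', 'BDDCCA', 'HBODCSL', 'ABFCC', 'ACKTBB']:
    print(s, '--->', insert_marker(s))
```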
I believe this can also be done with a regex.
import re
s = 'AQFG'
x = re.sub(r'([ABCD])([EFGHIJKLMNOPQRSTUVWXYZ])',r'\g<1>1\2',s)
print(x)
This will print A1QFG
Sorry, I missed that. re.sub takes an optional count argument that stops it after the given number of replacements have been made.
s = 'HBODCSL'
x = re.sub(r'([ABCD]+)([EFGHIJKLMNOPQRSTUVWXYZ])',r'\g<1>1\2',s,count=1)
print(x)
This will print HB1ODCSL
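The patterns above only check one following non-member, which is why HBODCSL becomes HB1ODCSL rather than the HBODC1SL the question asks for. A sketch that requires two non-members uses a lookahead, (?=...), which checks the next two letters without consuming them; insert_marker_re is a hypothetical name:

```python
import re

# A member letter, but only if the next two characters are both non-members
PATTERN = re.compile(r'([ABCD])(?=[E-Z]{2})')

def insert_marker_re(s):
    # count=1 stops after the first replacement, so at most one '1' is inserted
    return PATTERN.sub(r'\g<1>1', s, count=1)

for s in ['AQFG', 'CCPFAKXZ', 'HBODCSL', 'ABFCC', 'ACKTBB']:
    print(s, '--->', insert_marker_re(s))
```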
I am trying to write an Old Maid card game. I have now reached the stage of removing pairs: if there are cards with the same number (2-10) or the same letter (A, K, Q, J), both of them should be deleted. I have written a few lines of code, but they don't work. Could you tell me why and help me fix it?
How can I identify the same number with different suits and delete both of them in the same list?
def x(alist):
    n = '2345678910AKJQ'
    a=[]
    b=[]
    for i in alist:
        j = ''.join([k for k in i if k in n])
        if not j in b:
            a.append(i)
            b.append(j)
    return a
Create a defaultdict of lists, group the cards by rank (everything except the last character, which is the suit; I used standard letters rather than suit symbols), and use a list comprehension to keep the ranks that have only one card:
from collections import defaultdict
deck = ['10H','AS','AH','4C','4S','5D']
dd = defaultdict(list)
for d in deck:
    dd[d[:-1]].append(d[-1])  # key: rank (all but the last character), value: list of suits
print([k for k,v in dd.items() if len(v)==1])
result (on Python 3.7+, where dicts keep insertion order):
['10', '5']
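The list comprehension returns ranks rather than the remaining cards. A minimal sketch that keeps the cards themselves, assuming (as in Old Maid) that pairs are removed two at a time, so three of a kind leaves one card; remove_pairs is a hypothetical name, and which card of an odd group survives is an arbitrary choice here:

```python
from collections import defaultdict

def remove_pairs(hand):
    by_rank = defaultdict(list)
    for card in hand:
        by_rank[card[:-1]].append(card)  # rank is everything but the last (suit) character
    remaining = []
    for cards in by_rank.values():
        if len(cards) % 2 == 1:           # an odd count leaves one card after removing pairs
            remaining.append(cards[-1])
    return remaining

print(remove_pairs(['10H', 'AS', 'AH', '4C', '4S', '5D']))  # ['10H', '5D']
```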