comparing two arrays and get the values which are not common - python-3.x

I am doing this problem a friend gave me where you are given 2 arrays say (a[1,2,3,4] and b[8,7,9,2,1]) and you have to find not common elements.
Expected output is [3,4,8,7,9]. Code below.
def disjoint(e,f):
c = e[:]
d = f[:]
for i in range(len(e)):
for j in range(len(f)):
if e[i] == f[j]:
c.remove(e[i])
d.remove(d[j])
final = c + d
print(final)
print(disjoint(a,b))
I tried with nested loops and creating copies of given arrays to modify them then add them but...
def disjoint(e,f):
c = e[:] # list copies
d = f[:]
for i in range(len(e)):
for j in range(len(f)):
if e[i] == f[j]:
c.remove(c[i]) # edited this line
d.remove(d[j])
final = c + d
print(final)
print(disjoint(a,b))
when I try removing common element from list copies, I get different output [2,4,8,7,9]. why ??
This is my first question in this website. I'll be thankful if anyone can clear my doubts.

Using sets you can do:
a = [1,2,3,4]
b = [8,7,9,2,1]
diff = (set(a) | set(b)) - (set(a) & set(b))
(set(a) | set(b)) is the union, set(a) & set(b) is the intersection and finally you do the difference between the two sets using -.
Your bug comes when you remove the elements in the lines c.remove(c[i]) and d.remove(d[j]). Indeed, the common elements are e[i]and f[j] while c and d are the lists you are updating.
To fix your bug you only need to change these lines to c.remove(e[i]) and d.remove(f[j]).
Note also that your method to delete items in both lists will not work if a list may contain duplicates.
Consider for instance the case a = [1,1,2,3,4] and b = [8,7,9,2,1].

You can simplify your code to make it works:
def disjoint(e,f):
c = e.copy() # [:] works also, but I think this is clearer
d = f.copy()
for i in e: # no need for index. just walk each items in the array
for j in f:
if i == j: # if there is a match, remove the match.
c.remove(i)
d.remove(j)
return c + d
print(disjoint([1,2,3,4],[8,7,9,2,1]))
Try it online!
There are a lot of more effecient way to achieve this. Check this stack overflow question to discover them: Get difference between two lists. My favorite way is to use set (like in #newbie's answer). What is a set? Lets check the documentation:
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built-in dict, list, and tuple classes, and the collections module.)
emphasis mine
Symmetric difference is perfect for our need!
Returns a new set with elements in either the set or the specified iterable but not both.
Ok here how to use it in your case:
def disjoint(e,f):
return list(set(e).symmetric_difference(set(f)))
print(disjoint([1,2,3,4],[8,7,9,2,1]))
Try it online!

Related

comparing int value in list throws index out of range Error

I'm struggling to grasp the problem here. I already tried everything but the issue persist.
Basically I have a list of random numbers and when I try to compare the vaalues inside loop it throws "IndexError: list index out of range"
I even tried with range(len(who) and len(who) . Same thing. When put 0 instead "currentskill" which is int variable it works. What I don't understand is why comparing both values throws this Error. It just doesn't make sence...
Am I not comparing a value but the index itself ???
EDIT: I even tried with print(i) / print(who[i] to see if everything is clean and where it stops, and I'm definitelly not going outside of index
who = [2, 0, 1]
currentskill = 1
for i in who:
if who[i] == currentskill: # IndexError: list index out of range
who.pop(i)
The problem is once you start popping out elements list size varies
For eg take a list of size 6
But you iterate over all indices up to len(l)-1 = 6-1 = 5 and the index 5 does not exist in the list after removing elements in a previous iteration.
solution for this problem,
l = [x for x in l if x]
Here x is a condition you want to implement on the element of the list which you are iterating.
As stated by #Hemesh
The problem is once you start popping out elements list size varies
Problem solved. I'm just popping the element outside the loop now and it works:
def deleteskill(who, currentskill):
temp = 0
for i in range(len(who)):
if who[i] == currentskill:
temp = i
who.pop(temp)
There are two problems in your code:
mixing up the values and indexes, as pointed out by another answer; and
modifying the list while iterating over it
The solution depends on whether you want to remove just one item, or potentially multiple.
For removing just one item:
for idx, i in enumerate(who)::
if i == currentskill:
who.pop(idx)
break
For removing multiple items:
to_remove = []
for idx, i in enumerate(who)::
if i == currentskill:
to_remove.append[idx]
for idx in reversed(to_remove):
who.pop(idx)
Depending on the situation, it may be easier to create a new list instead:
who = [i for i in who if i != currentskill]
Your logic is wrong. To get the index as well as the value, use the built-in function enumerate:
idx_to_remove = []
for idx, i in enumerate(who)::
if i == currentskill:
idx_to_remove.append[idx]
for idx in reversed(idx_to_remove):
who.pop(idx)
Edited after suggestion from #sabik

How can i optimise my code and make it readable?

The task is:
User enters a number, you take 1 number from the left, one from the right and sum it. Then you take the rest of this number and sum every digit in it. then you get two answers. You have to sort them from biggest to lowest and make them into a one solid number. I solved it, but i don't like how it looks like. i mean the task is pretty simple but my code looks like trash. Maybe i should use some more built-in functions and libraries. If so, could you please advise me some? Thank you
a = int(input())
b = [int(i) for i in str(a)]
closesum = 0
d = []
e = ""
farsum = b[0] + b[-1]
print(farsum)
b.pop(0)
b.pop(-1)
print(b)
for i in b:
closesum += i
print(closesum)
d.append(int(closesum))
d.append(int(farsum))
print(d)
for i in sorted(d, reverse = True):
e += str(i)
print(int(e))
input()
You can use reduce
from functools import reduce
a = [0,1,2,3,4,5,6,7,8,9]
print(reduce(lambda x, y: x + y, a))
# 45
and you can just pass in a shortened list instead of poping elements: b[1:-1]
The first two lines:
str_input = input() # input will always read strings
num_list = [int(i) for i in str_input]
the for loop at the end is useless and there is no need to sort only 2 elements. You can just use a simple if..else condition to print what you want.
You don't need a loop to sum a slice of a list. You can also use join to concatenate a list of strings without looping. This implementation converts to string before sorting (the result would be the same). You could convert to string after sorting using map(str,...)
farsum = b[0] + b[-1]
closesum = sum(b[1:-2])
"".join(sorted((str(farsum),str(closesum)),reverse=True))

Efficiently Perform Nested Dictionary Lookups and List Appending Using Numpy Nonzero Indices

I have working code to perform a nested dictionary lookup and append results of another lookup to each key's list using the results of numpy's nonzero lookup function. Basically, I need a list of strings appended to a dictionary. These strings and the dictionary's keys are hashed at one point to integers and kept track of using separate dictionaries with the integer hash as the key and the string as the value. I need to look up these hashed values and store the string results in the dictionary. It's confusing so hopefully looking at the code helps. Here's a simplified version of code:
for key in ResultDictionary:
ResultDictionary[key] = []
true_indices = np.nonzero(numpy_array_of_booleans)
for idx in range(0, len(true_indices[0])):
ResultDictionary.get(HashDictA.get(true_indices[0][idx])).append(HashDictB.get(true_indices[1][idx]))
This code works for me, but I am hoping there's a way to improve the efficiency. I am not sure if I'm limited due to the nested lookup. The speed is also dependent on the number of true results returned by the nonzero function. Any thoughts on this? Appreciate any suggestions.
Here are two suggestions:
1) since your hash dicts are keyed with ints it might help to transform them into arrays or even lists for faster lookup if that is an option.
k, v = map(list, (HashDictB.keys(), HashDictB.values())
mxk, mxv = max(k), max(v, key=len)
lookupB = np.empty((mxk+1,), dtype=f'U{mxv}')
lookupB[k] = v
2) you probably can save a number of lookups in ResultDictionary and HashDictA by processing your numpy_array_of_booleans row-wise:
i, j = np.where(numpy_array_of_indices)
bnds, = np.where(np.r_[True, i[:-1] != i[1:], True])
ResultDict = {HashDictA[i[l]]: [HashDictB[jj] for jj in j[l:r]] for l, r in zip(bnds[:-1], bnds[1:])}
2b) if for some reason you need to incrementally add associations you could do something like (I'll shorten variable names for that)
from operator import itemgetter
res = {}
def add_batch(data, res, hA, hB):
i, j = np.where(data)
bnds, = np.where(np.r_[True, i[:-1] != i[1:], True])
for l, r in zip(bnds[:-1], bnds[1:]):
if l+1 == r:
res.setdefault(hA[i[l]], set()).add(hB[j[l]])
else:
res.setdefault(hA[i[l]], set()).update(itemgetter(*j[l:r])(hB))
You can't do much about the dictionary lookups - you have to do those one at a time.
You can clean up the array indexing a bit:
idxes = np.argwhere(numpy_array_of_booleans)
for i,j in idxes:
ResultDictionary.get(HashDictA.get(i)).append(HashDictB.get(j)
argwhere is transpose(nonzero(...)), turning the tuple of arrays into a (n,2) array of index pairs. I don't think this makes a difference in speed, but the code is cleaner.

setting an array element with a list

I'd like to create a numpy array with 3 columns (not really), the last of which will be a list of variable lengths (really).
N = 2
A = numpy.empty((N, 3))
for i in range(N):
a = random.uniform(0, 1/2)
b = random.uniform(1/2, 1)
c = []
A[i,] = [a, b, c]
Over the course of execution I will then append or remove items from the lists. I used numpy.empty to initialize the array since this is supposed to give an object type, even so I'm getting the 'setting an array with a sequence error'. I know I am, that's what I want to do.
Previous questions on this topic seem to be about avoiding the error; I need to circumvent the error. The real array has 1M+ rows, otherwise I'd consider a dictionary. Ideas?
Initialize A with
A = numpy.empty((N, 3), dtype=object)
per numpy.empty docs. This is more logical than A = numpy.empty((N, 3)).astype(object) which first creates an array of floats (default data type) and only then casts it to object type.

set from the union of elements contained in two lists

this is for a pre-interview questioner. i believe i have the answer just wanted to get confirmation that im right.
Part 1 - Tell me what this code does, and its big-O performance
Part 2 - Re-write it yourself and tell me the big-O performance of your solution
def foo(a, b):
""" a and b are both lists """
c = []
for i in a:
if is_bar(b, i):
c.append(i)
return unique(c)
def is_bar(a, b):
for i in a:
if i == b:
return True
return False
def unique(arr):
b = {}
for i in arr:
b[i] = 1
return b.keys()
ANSWERS:
It creates a set from the union of elements contained in two lists. It big O performance is O(n2)
my solution which i believe achieves O(n)
Set A = getSetA();
Set B = getSetB();
Set UnionAB = new Set(A);
UnionAB.addAll(B);
for (Object inA : a)
if(B.contains(inA))
UnionAB.remove(inA);
It seems like the original code is doing an intersection not a union. It's traversing all the elements in the first list (a) and checking if it exists in the second list (b), in which case it is adding it to list c. Then it is returning the unique elements from c. Performance of O(n^2) seems right.

Resources