Python pair sum problem in O(n) clarification - python-3.x

Given an integer array, output all of the unique pairs that sum up to a specific value k.
Here is the standard solution, in O(n) time:
# header assumed: the snippet was posted without its def line
def pair_sum(arr, k):
    if len(arr) < 2:
        return
    seen = set()
    output = set()
    for num in arr:
        target = k - num
        if target not in seen:
            seen.add(num)
        else:
            output.add((min(num, target), max(num, target)))
    return print('\n'.join(map(str, list(output))))
I have a few questions regarding this solution:
1) Why do we use a set to store the seen values of the array? Why not a list? Does this change anything?
2) Why the min(num, target), max(num, target)? Is this for the sake of consistent formatting, or is there a deeper reason behind it? At first, I thought it was there to deal with duplicate cases such as (1,3) and (3,1), but I don't think this solution ever runs into that?

1) A set is the faster way in Python to check whether a value exists: a membership test on a set takes O(1) time on average, while on a list it takes O(n), which would make the whole loop O(n^2). A set also does not store duplicates, which are not needed in this case.
2) The reason for (min(num, target), max(num, target)) is probably to add to the output set a tuple that holds both numbers in (min, max) order, so that the final print statement shows them in a consistent format.
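To make both points concrete, here is a minimal sketch of the same idea that returns the pairs instead of printing them. The name pair_sum_pairs and the sample input are made up for this illustration and are not part of the original solution.

```python
def pair_sum_pairs(arr, k):
    """Return the set of unique (small, large) pairs in arr that sum to k."""
    seen = set()    # membership tests on a set are O(1) on average
    pairs = set()   # a set silently drops repeated pairs
    for num in arr:
        target = k - num
        if target in seen:
            # store the pair in (min, max) order for a consistent format
            pairs.add((min(num, target), max(num, target)))
        else:
            seen.add(num)
    return pairs


result = pair_sum_pairs([1, 3, 2, 2, 3, 1], 4)
print(sorted(result))
```

Swapping seen for a list should give the same pairs, but every `in` check would then scan the list, so the loop would no longer be O(n).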

Related

Creating a 2D Array with all elements initially set to 'None'

I found the answer to this question very helpful, but I have never seen the keyword None used in such a way before and cannot understand what its function is in the block of code below:
def get_matrix(self, n, m):
    num = 1
    matrix = [[None for j in range(m)] for i in range(n)]  # <-- the line in question
    for i in range(len(matrix)):
        for j in range(len(matrix[i])):
            matrix[i][j] = num
            num += 1
    return matrix
If anyone is able to clarify, thank you in advance and I will rename the question to more accurately reflect the topic involved.
It's creating an n-by-m 2D list filled with None before populating it. The initial value doesn't really matter, since every cell is reassigned later. None is simply the conventional placeholder for "no value yet", and every cell refers to the same singleton None object, so no separate object is created per cell.
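As a small illustration of why the nested comprehension is used for this, the sketch below compares it with the shorter-looking [[None] * m] * n. The names make_grid, grid and shared are invented for this example.

```python
def make_grid(n, m):
    """Build an n-by-m list of lists where every row is a separate list."""
    return [[None for _ in range(m)] for _ in range(n)]


grid = make_grid(2, 3)
grid[0][0] = 1          # only the first row changes

# Pitfall: multiplying the outer list repeats a reference to ONE inner list.
shared = [[None] * 3] * 2
shared[0][0] = 1        # the change shows up in every "row"

print(grid)
print(shared)
```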
Related - How to define a two-dimensional array?

Get the value of a list that produces the maximum value of a calculation

I apologize if this is a duplicate, I tried my best to find an existing question but was unsuccessful.
Recently, I've run into a couple of problems where I've needed to find the element in a list that produces the max/min value when a calculation is performed. For example, a list of real numbers where you want to find out which element produces the highest value when squared. The actual value of the squared number is unimportant, I just need the element(s) from the list that produces it.
I know I can solve the problem by finding the max, then making a pass through the list to find out which values' square matches the max I found:
l = [-0.25, 21.4, -7, 0.99, -21.4]
max_squared = max(i**2 for i in l)
result = [i for i in l if i**2 == max_squared]
but I feel like there should be a better way to do it. Is there a more concise/one-step solution to this?
This will return just the element which gives the max when squared.
result = max(l, key=lambda k: k**2)
It does not get much better than your version if you need the values in a list, for example to see how often the maximum occurs. If you do not need that, you can remember the source element alongside the result:
l = [-0.25, 21.4, -7, 0.99, -21.4]
max_squared = max((i**2, i) for i in l)  # remember a tuple, with the result coming first
print(max_squared[1])  # print the source number (2nd element of the tuple)
Output:
21.4
Either way you only get one of the elements whose absolute value is 21.4, because max returns a single value, not two. If you need both, you still need to do:
print([k for k in l if abs(k) == abs(max_squared[1])])
to get
[21.4, -21.4]
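If this comes up often, the two steps can be wrapped in a small helper. This is only a sketch: all_argmax is a hypothetical name, not a built-in, and it assumes a non-empty input (max raises ValueError otherwise).

```python
def all_argmax(items, key):
    """Return every element of items whose key(...) equals the maximum."""
    items = list(items)
    scores = [key(x) for x in items]   # evaluate the calculation once per element
    best = max(scores)
    return [x for x, s in zip(items, scores) if s == best]


l = [-0.25, 21.4, -7, 0.99, -21.4]
result = all_argmax(l, key=lambda x: x**2)
print(result)
```

Computing the scores once avoids evaluating the calculation twice per element, which matters when it is more expensive than squaring.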

Comparing two arrays and getting the values which are not common

I am doing this problem a friend gave me where you are given 2 arrays, say a = [1,2,3,4] and b = [8,7,9,2,1], and you have to find the elements that are not common to both.
The expected output is [3,4,8,7,9]. Code below.
def disjoint(e,f):
    c = e[:]
    d = f[:]
    for i in range(len(e)):
        for j in range(len(f)):
            if e[i] == f[j]:
                c.remove(e[i])
                d.remove(d[j])
    final = c + d
    print(final)
print(disjoint(a,b))
I tried with nested loops and creating copies of given arrays to modify them then add them but...
def disjoint(e,f):
    c = e[:] # list copies
    d = f[:]
    for i in range(len(e)):
        for j in range(len(f)):
            if e[i] == f[j]:
                c.remove(c[i]) # edited this line
                d.remove(d[j])
    final = c + d
    print(final)
print(disjoint(a,b))
When I try removing the common element from the list copies, I get a different output: [2,4,8,7,9]. Why?
This is my first question in this website. I'll be thankful if anyone can clear my doubts.
Using sets you can do:
a = [1,2,3,4]
b = [8,7,9,2,1]
diff = (set(a) | set(b)) - (set(a) & set(b))
(set(a) | set(b)) is the union, set(a) & set(b) is the intersection and finally you do the difference between the two sets using -.
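Your bug comes from the lines c.remove(c[i]) and d.remove(d[j]). The common elements are e[i] and f[j], while c and d are the lists you are updating: once something has been removed from them, their indices no longer line up with those of e and f, so c[i] and d[j] may refer to a different element (or be out of range).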
To fix your bug you only need to change these lines to c.remove(e[i]) and d.remove(f[j]).
Note also that your method to delete items in both lists will not work if a list may contain duplicates.
Consider for instance the case a = [1,1,2,3,4] and b = [8,7,9,2,1].
You can simplify your code to make it work:
def disjoint(e,f):
    c = e.copy() # [:] works also, but I think this is clearer
    d = f.copy()
    for i in e: # no need for an index, just walk each item in the list
        for j in f:
            if i == j: # if there is a match, remove the match.
                c.remove(i)
                d.remove(j)
    return c + d
print(disjoint([1,2,3,4],[8,7,9,2,1]))
There are a lot of more efficient ways to achieve this. Check this Stack Overflow question to discover them: Get difference between two lists. My favorite way is to use a set (like in @newbie's answer). What is a set? Let's check the documentation:
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built-in dict, list, and tuple classes, and the collections module.)
emphasis mine
Symmetric difference is perfect for our need!
Returns a new set with elements in either the set or the specified iterable but not both.
Here is how to use it in your case:
def disjoint(e,f):
    return list(set(e).symmetric_difference(set(f)))
print(disjoint([1,2,3,4],[8,7,9,2,1]))
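One caveat with the set-based versions is that a set is unordered, so the result is not guaranteed to come back as [3,4,8,7,9]. If the order matters, a sketch along the following lines keeps the elements in their original order while still using sets for the membership tests. The name disjoint_ordered is made up for this example.

```python
def disjoint_ordered(e, f):
    """Elements that appear in exactly one of the two lists, in original order."""
    in_e, in_f = set(e), set(f)          # sets only for fast membership tests
    only_e = [x for x in e if x not in in_f]
    only_f = [x for x in f if x not in in_e]
    return only_e + only_f


result = disjoint_ordered([1, 2, 3, 4], [8, 7, 9, 2, 1])
print(result)
```

Unlike the remove-based loop, this drops every copy of a common value, so duplicates such as a = [1,1,2,3,4] do not leave a stray 1 behind.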

Choosing minimum numbers from a given list to give a sum N( repetition allowed)

How can I find the minimum number of elements, taken from a list (repetition allowed), that sum to a given number N?
For example, if list = [1,3,7,4] and N = 14, the function should return 2, as 7+7 = 14.
Again, if N = 11, the function should return 2, as 7+4 = 11. I think I have figured out the algorithm but am unable to implement it in code.
Please use Python, as that is the only language I understand (at present).
Sorry!
Since you mention dynamic programming in your question, and you say that you have figured out the algorithm, I will just include an implementation of the basic tabular method written in Python without too much theory.
The idea is to have a table that we use to compute all the values we need without having to do the same computations many times.
For every target value from 1 up to the requested one, the basic formula tries each number in the list and keeps the smallest count that reaches that target.
It should work, but you can of course make some optimizations, like sorting the list and/or looking for divisors in order to construct a smaller table and terminate faster.
Here is the code:
import sys

# num_list: list of numbers
# value: value for which we want to get the minimum number of addends
def min_sum(num_list, value):
    list_len = len(num_list)
    # We will use the typical dynamic programming table construct:
    # the index into the list will be the sum value we want,
    # and the value will be the
    # minimum number of items to sum.
    # Base case value = 0, first element of the list is zero
    value_table = [0]
    # Initialize all table values to MAX
    # for range i use value+1 because python range doesn't include the end
    # number
    for i in range(1, value+1):
        value_table.append(sys.maxsize)
    # try every combination that is smaller than <value>
    for i in range(1, value+1):
        for j in range(0, list_len):
            if num_list[j] <= i:
                tmp = value_table[i-num_list[j]]
                if (tmp != sys.maxsize) and (tmp + 1 < value_table[i]):
                    value_table[i] = tmp + 1
    return value_table[value]

## TEST ##
num_list = [1,3,16,5,3]
value = 22
print("Min Sum: ",min_sum(num_list,value)) # Outputs 3
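For comparison, here is a more compact sketch of the same table-filling idea, checked against the two examples from the question. It assumes the list holds positive integers, and it returns None when the target cannot be reached. The name min_addends is hypothetical.

```python
def min_addends(num_list, value):
    """Fewest numbers from num_list (repetition allowed) that sum to value.

    Assumes positive integers; returns None if value cannot be formed.
    """
    INF = float("inf")
    # table[t] = fewest addends needed to reach the sum t (0 needs none)
    table = [0] + [INF] * value
    for total in range(1, value + 1):
        for num in num_list:
            if num <= total and table[total - num] + 1 < table[total]:
                table[total] = table[total - num] + 1
    return None if table[value] == INF else table[value]


print(min_addends([1, 3, 7, 4], 14))  # 7 + 7
print(min_addends([1, 3, 7, 4], 11))  # 7 + 4
```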
It would be helpful if you included your algorithm in pseudocode; it will look very much like Python :-)
Another aspect: your first operation is a multiplication with one item from the list (7) and one outside of the list (2), whereas the second operation is 7+4, with both values in the list.
Is there a limitation on which operation or which items to use (from within or outside the list)?

Efficiently Perform Nested Dictionary Lookups and List Appending Using Numpy Nonzero Indices

I have working code to perform a nested dictionary lookup and append results of another lookup to each key's list using the results of numpy's nonzero lookup function. Basically, I need a list of strings appended to a dictionary. These strings and the dictionary's keys are hashed at one point to integers and kept track of using separate dictionaries with the integer hash as the key and the string as the value. I need to look up these hashed values and store the string results in the dictionary. It's confusing so hopefully looking at the code helps. Here's a simplified version of code:
for key in ResultDictionary:
    ResultDictionary[key] = []
true_indices = np.nonzero(numpy_array_of_booleans)
for idx in range(0, len(true_indices[0])):
    ResultDictionary.get(HashDictA.get(true_indices[0][idx])).append(HashDictB.get(true_indices[1][idx]))
This code works for me, but I am hoping there's a way to improve the efficiency. I am not sure if I'm limited due to the nested lookup. The speed is also dependent on the number of true results returned by the nonzero function. Any thoughts on this? Appreciate any suggestions.
Here are two suggestions:
1) Since your hash dicts are keyed with ints, it might help to transform them into arrays or even lists for faster lookup, if that is an option.
k, v = map(list, (HashDictB.keys(), HashDictB.values()))
mxk, mxv = max(k), len(max(v, key=len))  # largest key, length of the longest string
lookupB = np.empty((mxk+1,), dtype=f'U{mxv}')
lookupB[k] = v
2) You can probably save a number of lookups in ResultDictionary and HashDictA by processing your numpy_array_of_booleans row-wise:
i, j = np.where(numpy_array_of_booleans)
bnds, = np.where(np.r_[True, i[:-1] != i[1:], True])
ResultDict = {HashDictA[i[l]]: [HashDictB[jj] for jj in j[l:r]] for l, r in zip(bnds[:-1], bnds[1:])}
2b) If for some reason you need to incrementally add associations, you could do something like this (I'll shorten variable names for that):
from operator import itemgetter
res = {}
def add_batch(data, res, hA, hB):
    i, j = np.where(data)
    bnds, = np.where(np.r_[True, i[:-1] != i[1:], True])
    for l, r in zip(bnds[:-1], bnds[1:]):
        if l+1 == r:
            res.setdefault(hA[i[l]], set()).add(hB[j[l]])
        else:
            res.setdefault(hA[i[l]], set()).update(itemgetter(*j[l:r])(hB))
You can't do much about the dictionary lookups - you have to do those one at a time.
You can clean up the array indexing a bit:
idxes = np.argwhere(numpy_array_of_booleans)
for i,j in idxes:
    ResultDictionary.get(HashDictA.get(i)).append(HashDictB.get(j))
argwhere is transpose(nonzero(...)), turning the tuple of arrays into an (n,2) array of index pairs. I don't think this makes a difference in speed, but the code is cleaner.
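Here is a small self-contained sketch of the argwhere loop on toy data. The array mask and the dicts hash_a and hash_b are invented stand-ins for numpy_array_of_booleans, HashDictA and HashDictB, and a defaultdict replaces the pre-filled ResultDictionary, so rows without any True entry simply do not appear in the result.

```python
from collections import defaultdict

import numpy as np

# Toy stand-ins for the question's data (assumed shapes and contents).
mask = np.array([[True, False, True],
                 [False, False, False],
                 [False, True, False]])
hash_a = {0: "row0", 1: "row1", 2: "row2"}   # row index -> result key
hash_b = {0: "col0", 1: "col1", 2: "col2"}   # column index -> string to append

grouped = defaultdict(list)
# argwhere yields (row, column) pairs in row-major order;
# tolist() converts them to plain Python ints before the dict lookups.
for i, j in np.argwhere(mask).tolist():
    grouped[hash_a[i]].append(hash_b[j])

result = dict(grouped)
print(result)
```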
