Looking for a specific combination algorithm to solve a problem - python-3.x

Let’s say I have a purchase total and a CSV file full of purchases, some of which make up that total and some of which don’t. Is there a way to search the CSV to find the combination or combinations of purchases that add up to that total? For example, the purchase total is $155.00 and my CSV file has the purchases [$5.00, $40.00, $7.25, $100.00, $10.00]. Is there an algorithm that will tell me which combinations of purchases make up the total?
Edit: I am still having trouble with the solution you provided. When I load this spreadsheet with pandas and feed it into your code snippet, it only shows one solution equal to $110.04 when there are three. It is as if it stops early without finding the remaining solutions. This is the output I get in the terminal: [57.25, 15.87, 13.67, 23.25]. The output should be [10.24, 37.49, 58.21, 4.1], [64.8, 45.24] and [57.25, 15.87, 13.67, 23.25].
from collections import namedtuple
import pandas
df = pandas.read_csv('purchases.csv', parse_dates=["Date"])
values = df["Purchase"].to_list()
S = 110.04
Candidate = namedtuple('Candidate', ['sum', 'lastIndex', 'path'])
tuples = [Candidate(0, -1, [])]
while len(tuples):
    next = []
    for (sum, i, path) in tuples:
        # you may range from i + 1 if you don't want repetitions of the same purchase
        for j in range(i+1, len(values)):
            v = values[j]
            # you may check for strict equality if no purchase is free (0$)
            if v + sum <= S:
                next.append(Candidate(sum = v + sum, lastIndex = j, path = path + [v]))
            if v + sum == S :
                print(path + [v])
    tuples = next

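A DP-style solution, building the combinations level by level:
1. Let S be your goal sum.
2. Build all 1-combinations. Keep those whose sum is less than or equal to S. Whenever one equals S, output it.
3. Build all 2-combinations by extending the kept 1-combinations.
4. Repeat until no combination can be extended.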
from collections import namedtuple

values = [57.25, 15.87, 13.67, 23.25, 64.8, 45.24, 10.24, 37.49, 58.21, 4.1]
S = 110.04
# sum: total of the purchases taken, lastIndex: index of the last purchase added,
# path: the purchases taken so far
Candidate = namedtuple('Candidate', ['sum', 'lastIndex', 'path'])
tuples = [Candidate(0, -1, [])]
while tuples:
    next_tuples = []
    for (total, i, path) in tuples:
        # starting at i + 1 (rather than i) means the same purchase is never used twice
        for j in range(i + 1, len(values)):
            v = values[j]
            # keep only partial sums that can still reach S; you may use a strict <
            # here if no purchase is free ($0)
            if v + total <= S:
                next_tuples.append(Candidate(sum=v + total, lastIndex=j, path=path + [v]))
            # compare with a tolerance instead of ==, since the amounts are floats
            if abs(v + total - S) <= 1e-2:
                print(path + [v])
    tuples = next_tuples
More detail about the tuple structure:
What we want to do is to augment a tuple with a new value.
Assume we start with a tuple holding only one value, say the tuple associated with 40 in the example [5.00, 40.00, 7.25, 100.00, 10.00]:
its sum is trivially 40
the last index added is 1 (the index of 40 itself)
the used values are [40], since it is the sole value.
To generate the next tuples, we iterate from the position after the last index (index 2) to the end of the array.
So the candidates are 7.25, 100.00 and 10.00.
The new tuple associated with 7.25 is:
sum: 40 + 7.25
last index: 2 (7.25 has index 2 in the array)
used values: the tuple's values plus 7.25, so [40, 7.25]
The purpose of the last index is to avoid considering both [7.25, 40] and [40, 7.25], which are the same combination.
So, to generate tuples from an old one, only consider values occurring after its last index in the array.
At every step, we thus have tuples of the same size. Each of them records the values taken, the sum they amount to, and (through the last index) which values can still be added to extend it to a bigger size.
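The single expansion step described above can be sketched on its own. The helper name `extend` is hypothetical and does not come from the answer. It reuses the `Candidate` namedtuple and returns the tuples that are one element larger than the given one. It only uses values after the tuple's last index and keeps sums at or below the target.

```python
from collections import namedtuple

Candidate = namedtuple('Candidate', ['sum', 'lastIndex', 'path'])


def extend(candidate, values, target):
    """Return the candidates one element larger than `candidate`
    (hypothetical helper illustrating one step of the answer's loop)."""
    result = []
    # only values after lastIndex, so [40, 7.25] is built but never [7.25, 40]
    for j in range(candidate.lastIndex + 1, len(values)):
        v = values[j]
        if candidate.sum + v <= target:
            result.append(Candidate(candidate.sum + v, j, candidate.path + [v]))
    return result


values = [5.00, 40.00, 7.25, 100.00, 10.00]
start = Candidate(40.00, 1, [40.00])
for c in extend(start, values, 155.00):
    print(c)
# Candidate(sum=47.25, lastIndex=2, path=[40.0, 7.25])
# Candidate(sum=140.0, lastIndex=3, path=[40.0, 100.0])
# Candidate(sum=50.0, lastIndex=4, path=[40.0, 10.0])
```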
edit: to handle floats, you may replace the exact equality test between the running sum and S (v + sum == S) by abs(v + sum - S) <= 1e-2. A solution then counts as reached when the sum is very close to S (here the distance is arbitrarily set to 0.01).
edit2: the same code is at https://repl.it/repls/DrearyWindingHypertalk, and it does give:
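The float edit matters because most decimal amounts have no exact binary floating-point representation. Sums that are equal on paper may therefore differ in their last bits, and `==` fails. Here is a minimal illustration with the standard library. `math.isclose` is an alternative to the hand-written tolerance, and its `abs_tol` of 0.005 (half a cent) is an arbitrary choice. Comparing rounded integer cents is another option.

```python
import math

total = 0.1 + 0.2
print(total)                                    # 0.30000000000000004
print(total == 0.3)                             # False: exact equality fails
print(abs(total - 0.3) <= 1e-2)                 # True: the answer's tolerance test
print(math.isclose(total, 0.3, abs_tol=0.005))  # True
print(round(total * 100) == round(0.3 * 100))   # True: compare in integer cents
```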
[64.8, 45.24]
[57.25, 15.87, 13.67, 23.25]
[10.24, 37.49, 58.21, 4.1]
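As a cross-check, here is a minimal brute-force sketch that swaps in a different technique. It tries every combination with `itertools.combinations`, which is exponential in the number of purchases and only practical for small lists. It also converts the amounts to integer cents, so the comparison with the target is exact rather than tolerance-based. The function name `find_combinations` is hypothetical.

```python
from itertools import combinations


def find_combinations(values, target):
    """Return every combination of `values` whose total equals `target`.

    Amounts are converted to integer cents so that the equality test is exact.
    Brute force: all 2**len(values) - 1 non-empty subsets are examined.
    """
    cents = [round(v * 100) for v in values]
    goal = round(target * 100)
    found = []
    for size in range(1, len(values) + 1):
        for idx in combinations(range(len(values)), size):
            if sum(cents[i] for i in idx) == goal:
                found.append([values[i] for i in idx])
    return found


values = [57.25, 15.87, 13.67, 23.25, 64.8, 45.24, 10.24, 37.49, 58.21, 4.1]
for combo in find_combinations(values, 110.04):
    print(combo)
```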

Related

Is there an efficient way to lower the time complexity of this problem? Current T(n) = O(N^3)

I need to choose one value from each of the lists a, b and c such that the value of the expression abs(a - b) + abs(b - c) + abs(c - a) is minimized.
from typing import List

def minimumValue(n: int, a: List[int], b: List[int], c: List[int]) -> int:
    # Write your code here.
    ans = 10000000000000
    for i in range(n):
        for j in range(n):
            for k in range(n):
                ans = min(ans, abs(a[i] - b[j]) + abs(b[j] - c[k]) + abs(c[k] - a[i]))
    return ans
Here is an O(n log n) solution. Sort the three lists, and then do this:
1. Get the first value from each of the three lists.
2. Repeat while we have three values:
   - sort these three values (and keep track of which list each came from);
   - calculate the target expression with those three, and check whether it improves on the best result so far;
   - replace the least of the three values with the next value from the list it came from. If that list has no next value, return the result (quit).
Note also that the expression to evaluate equals (max(x,y,z) - min(x,y,z))*2. This is easy to compute when x, y and z are sorted, because it becomes (z - x)*2. To find the minimum this expression can take, we can leave out the *2 and only do that multiplication at the very end.
Here is the code for implementing that idea:
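from typing import List

def minimumValue(n: int, a: List[int], b: List[int], c: List[int]) -> int:
    # one iterator over each sorted list
    queues = map(iter, map(sorted, (a, b, c)))
    # each entry is [current value, iterator it came from]
    three = [[next(q), q] for q in queues]
    least = float("inf")
    while True:
        # sort by value only: on ties, comparing the iterators would raise TypeError
        three.sort(key=lambda pair: pair[0])
        least = min(least, three[2][0] - three[0][0])
        try:
            # advance the list that holds the smallest value
            three[0][0] = next(three[0][1])
        except StopIteration:
            return least * 2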
Initially sorting the three input lists takes O(n log n). The loop iterates at most 3n - 2 times, which is O(n), and each loop iteration runs in constant time.
So the overall complexity is determined by the initial sorting: O(n log n).
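To gain some confidence in the sorted-lists approach, it can be checked against the question's brute force on random inputs. The check also covers the identity |x - y| + |y - z| + |z - x| = 2(max - min) used above. This is a minimal sketch. The function names `min_sorted` and `min_brute`, the list sizes and the value range are arbitrary choices, not part of either answer.

```python
import random
from itertools import product


def min_sorted(a, b, c):
    """Sorted-lists approach from the answer above: O(n log n)."""
    queues = map(iter, map(sorted, (a, b, c)))
    three = [[next(q), q] for q in queues]
    least = float("inf")
    while True:
        three.sort(key=lambda pair: pair[0])
        least = min(least, three[2][0] - three[0][0])
        try:
            three[0][0] = next(three[0][1])
        except StopIteration:
            return least * 2


def min_brute(a, b, c):
    """The question's O(n^3) brute force."""
    return min(abs(x - y) + abs(y - z) + abs(z - x) for x, y, z in product(a, b, c))


random.seed(0)
for _ in range(200):
    n = random.randint(1, 8)
    a, b, c = ([random.randint(-20, 20) for _ in range(n)] for _ in range(3))
    x, y, z = a[0], b[0], c[0]
    # identity used by the answer
    assert abs(x - y) + abs(y - z) + abs(z - x) == 2 * (max(x, y, z) - min(x, y, z))
    assert min_sorted(a, b, c) == min_brute(a, b, c)
print("all checks passed")
```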
Without any further knowledge of, or assumptions about, the content of the 3 lists, and if you need the true minimum value (not an approximation), there's no choice other than brute force. Some optimisations are possible, but the complexity stays N^3 (with no speed-up in the worst case).
# replaces the triple loop inside minimumValue; ans is initialised as in the question
for i in range(n):
    for j in range(n):
        v = abs(a[i] - b[j])
        # the full expression is at least v, so skip the k loop when v can't improve ans
        if v < ans:
            for k in range(n):
                ans = min(ans, v + abs(b[j] - c[k]) + abs(c[k] - a[i]))

Sliding window over a string using python

I am working on a dataset as part of my course practice and am stuck on a particular step. I have tried this using R, but I want to do the same in Python. I am fairly new to Python, so I need help.
The dataset has a column named 'Seq' with 5000+ sequence records, and another column named 'MainSeq' whose values contain the 'Seq' substrings. I need to check where each seq occurs in its MainSeq, based on the given start position, and then output the 7 letters before and after each letter of the seq. For example:
I have a value in column 'MainSeq': 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
Column 'Seq' contains the value JKLMNO.
The start position of J is 10 and that of O is 15 (1-based).
I need to create a new column that, for each letter from J to O, takes that letter together with the 7 letters before and after it, i.e. a total length of 15:
CDEFGHI**J**KLMNOPQ
DEFGHIJ**K**LMNOPQR
EFGHIJK**L**MNOPQRS
FGHIJKL**M**NOPQRST
GHIJKLM**N**OPQRSTU
HIJKLMN**O**PQRSTUV
I know how to apply the logic to a specific seq, but since I have 5000+ seq records, I need a way to apply it to all of them.
seq = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
i = seq.index('J')
j = seq.index('O')
value = 7
for mid in range(i, 1 + j):
    print(seq[mid-value:mid+value+1])
I'm not sure this will do exactly what you want, you've not really supplied a lot of data to test with, but it might work or at least give you a start.
import pandas as pd

df = pd.DataFrame({'MainSeq': ['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'], 'Seq': 'JKLMNO'})

def get_sequences(seq, letters, value):
    # locate the whole substring once, so repeated letters map to the right positions
    start = seq.index(letters)
    # `value` letters on each side; max(0, ...) stops a negative start from wrapping around
    return [seq[max(0, i - value):i + value + 1] for i in range(start, start + len(letters))]

df['new_seq'] = df.apply(lambda row: get_sequences(row['MainSeq'], row['Seq'], 7), axis=1)
# one row per window
df = df.explode('new_seq')
print(df)
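The question says the start position is already given, so the search with `index` can be skipped entirely. Here is a minimal pure-Python sketch, assuming 1-based start positions as in the J = 10 example. The function name `windows` and the `flank` parameter are hypothetical.

```python
def windows(main_seq, start, length, flank=7):
    """Return one window per letter of the substring that begins at the
    1-based position `start` and has `length` letters.

    Each window holds the letter plus `flank` letters on each side; windows
    near the ends of main_seq are shorter instead of wrapping around.
    """
    first = start - 1  # convert to a 0-based index
    return [main_seq[max(0, i - flank):i + flank + 1]
            for i in range(first, first + length)]


for w in windows('ABCDEFGHIJKLMNOPQRSTUVWXYZ', 10, len('JKLMNO')):
    print(w)
```

With pandas, the same function could be applied row by row and then exploded. For example, `df.apply(lambda row: windows(row['MainSeq'], row['Start'], len(row['Seq'])), axis=1)`, where 'Start' is a hypothetical name for the start-position column.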

Conditionally use parts of a nested for loop

I've searched for this answer extensively, but can't seem to find an answer. Therefore, for the first time, I am posting a question here.
I have a function that uses many parameters to perform a calculation. Based on user input, I want to iterate through possible values for some (or all) of the parameters. If I wanted to iterate through all of the parameters, I might do something like this:
for i in range(low1, high1):
    for j in range(low2, high2):
        for k in range(low3, high3):
            for m in range(low4, high4):
                doFunction(i, j, k, m)
If I only wanted to iterate the 1st and 4th parameter, I might do this:
for i in range(low1, high1):
    for m in range(low4, high4):
        doFunction(i, user_input_j, user_input_k, m)
My actual code has almost 15 nested for-loops with 15 different parameters - each of which could be iterable (or not). So, it isn't scalable for me to use what I have and code a unique block of for-loops for each combination of a parameter being iterable or not. If I did that, I'd have 2^15 different blocks of code.
I could do something like this:
if use_static_j:
    # any one-element range works, so the j loop still runs exactly once
    low2 = 0
    high2 = 1
for i in range(low1, high1):
    for j in range(low2, high2):
        for k in range(low3, high3):
            for m in range(low4, high4):
                j1 = user_input_j if use_static_j else j
                doFunction(i, j1, k, m)
I'd just like to know if there is a better way. Perhaps using filter(), map(), or list comprehension... (which I don't have a clear enough understanding of yet)
As suggested in the comments, you could build an array of the parameter combinations and then call the function with each entry of the array. The easiest way to build the array is recursion over a list defining the range of each parameter. This code assumes a list of (start, stop, scale) tuples, so, for example, the third element in the list produces [3, 2.8, 2.6, 2.4, 2.2] (up to floating-point rounding). To use a static value, you would use the tuple (static, static+1, 1).
def build_param_array(ranges):
    start, stop, scale = ranges[0]
    # count down when stop < start, e.g. (15, 10, .2) -> 15, 14, 13, 12, 11
    step = -1 if stop < start else 1
    if len(ranges) == 1:
        return [[p * scale] for p in range(start, stop, step)]
    # build the combinations of the remaining parameters once, not once per p
    rest = build_param_array(ranges[1:])
    res = []
    for p in range(start, stop, step):
        for a in rest:
            res.append([p * scale] + a)
    return res

# range = (start, stop, scale)
ranges = [(1, 5, 1),
          (0, 10, .1),
          (15, 10, .2)
          ]
params = build_param_array(ranges)
for p in params:
    doFunction(*p)
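A common alternative is `itertools.product`. It is swapped in here rather than taken from the answer, and it yields the combinations lazily instead of building the whole array in memory. This minimal sketch assumes each parameter is described either by a range (iterated) or by a one-element tuple (static). The `doFunction` stand-in and the parameter values are hypothetical.

```python
from itertools import product


def doFunction(i, j, k, m):
    # hypothetical stand-in for the real calculation
    return i + j + k + m


use_static_j = True
user_input_j = 7

# one entry per parameter: a range to iterate, or a one-element tuple for a static value
param_values = [
    range(0, 3),                                        # i: iterated
    (user_input_j,) if use_static_j else range(0, 10),  # j: static or iterated
    range(1, 3),                                        # k: iterated
    (4,),                                               # m: static
]

# product(...) behaves like the nested for loops, with the last parameter varying fastest
for params in product(*param_values):
    doFunction(*params)
```

Adding a fifteenth parameter only adds one entry to the list, so there is no need for a separate block of loops per combination of static and iterated parameters.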

How to limit number of permutations

I have about 20 short strings that I want to permute. I only want permutations whose joined string has length 8.
I would like to avoid calculating every possible permutation, as seen below:
import itertools
p = itertools.permutations([s1, s2, s3, s4, s5, s6,...])
for i in p:
    s = ''.join(i)
    if len(s) == 8:
        print(s)
But that's too slow, right? How can I reduce the number of calculations (to save processing time and RAM)?
The first, obvious thing to do is filter out any strings with length > 8:
newList = [i for i in [s1, s2, s3, s4, s5, s6, ...] if len(i) <= 8]
Then, you can use the second argument of itertools.permutations to set the number of items you want. If you have no empty strings in your list, you'll never need more than 8 items, so we can use 8 as the second argument:
p = itertools.permutations(newList, 8)
However, if any of your strings are longer than one character, this won't get you what you want, since it will only return permutations of exactly 8 items. One way to resolve this is to iterate through the various lengths:
pList = [itertools.permutations(newList, length) for length in range(1, 9)]
Yet here you end up with an enormous number of permutations to filter through: P(20, 8) + P(20, 7) + ... + P(20, 1), roughly 5.5 billion, which is impractical to work with.
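The permutation count can be checked with `math.perm` (Python 3.8+). The same line with `math.comb` gives the number of combinations of up to 8 of the 20 strings.

```python
import math

n_perms = sum(math.perm(20, k) for k in range(1, 9))
n_combos = sum(math.comb(20, k) for k in range(1, 9))
print(n_perms)   # 5499702400
print(n_combos)  # 263949
```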
A different direction
Instead of using permutations, let's use combinations, of which there are far fewer ("only" 263,949). Recall that in combinations, the order of the combined items doesn't matter, while in permutations it does. Thus we can use the smaller set of combinations to filter for the length 8 that we want:
cList = (combo for length in range(1, 9)
         for combo in itertools.combinations(newList, length)
         if len(''.join(combo)) == 8)
Using () instead of [] will make this a generator rather than a list, to delay evaluation until we really need it. And now we are close!
We can get our final result by taking the permutations of the items in cList:
result = [''.join(perm) for combo in cList
          for perm in itertools.permutations(combo)]
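Putting the steps together, here is a self-contained sketch of the combinations-then-permutations approach. The function name `joined_permutations` and the sample strings are hypothetical. Like the answer, it assumes there are no empty strings. If the input contains duplicate strings, the result will contain duplicate permutations.

```python
import itertools


def joined_permutations(strings, target_len=8):
    """Return every ordering of every subset of `strings` whose joined length is target_len."""
    # strings longer than the target can never be part of a solution
    candidates = [s for s in strings if len(s) <= target_len]
    # pick subsets first (order ignored), keeping only those with the right total length;
    # at most target_len items are needed because no string is empty
    combos = (combo
              for size in range(1, target_len + 1)
              for combo in itertools.combinations(candidates, size)
              if sum(len(s) for s in combo) == target_len)
    # then order each surviving subset in every possible way
    return [''.join(perm) for combo in combos for perm in itertools.permutations(combo)]


print(joined_permutations(['abc', 'de', 'fgh', 'x', 'longerthan8']))
```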

python3 functional programming: Accumulating items from a different list into an initial value

I have some code that performs the following operation, however I was wondering if there was a more efficient and understandable way to do this. I am thinking that there might be something in itertools or such that might be designed to perform this type of operation.
So I have a list of integers that represents changes in the number of items from one period to the next.
x = [0, 1, 2, 1, 3, 1, 1]
Then I need a function to create a second list that accumulates the total number of items from one period to the next. This is like an accumulate function, but with elements from another list instead of from the same list.
So I can start off with an initial value y = 3.
The first value in the list is y = [3]. Then I would take the second element of x and add it, so 3 + 1 = 4. Note that I take the second element because we already know the first element of y. So the updated value of y is [3, 4]. Then the next iteration is 4 + 2 = 6, and so forth.
The code that I have looks like this:
def func():
    x = [0, 1, 2, 1, 3, 1, 1]
    y = [3]
    for k,v in enumerate(x):
        y.append(y[i] + x[i])
    return y
Any ideas?
If I understand you correctly, you want what itertools.accumulate does, but with an initial value added. You can do that pretty easily in a couple of ways.
The easiest might be to simply write a list comprehension around the accumulate call, adding the initial value to each output item:
y = [3 + val for val in itertools.accumulate(x)]
Another option would be to prefix the x list with the initial value, then skip it when accumulate includes it as the first value in the output:
acc = itertools.accumulate([3] + x)
next(acc) # discard the extra 3 at the start of the output.
y = list(acc)
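On Python 3.8 or later, `itertools.accumulate` also accepts an `initial` keyword argument, which covers the accumulate-based approach directly. The question skips x[0] because the first y value is already known, so this sketch passes `x[1:]`.

```python
import itertools

x = [0, 1, 2, 1, 3, 1, 1]

y1 = [3 + val for val in itertools.accumulate(x)]    # first approach above
y2 = list(itertools.accumulate(x[1:], initial=3))    # Python 3.8+
print(y1)  # [3, 4, 6, 7, 10, 11, 12]
print(y2)  # [3, 4, 6, 7, 10, 11, 12]
```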
I think two things need to be fixed.
First, the for loop header. I'm not sure where you got k, v from (maybe from an example using zip, which lets you iterate through 2 lists at once), but in any case you want to iterate through lists x and y by index. One approach is:
for i in range(len(x)):
Second, look at the first append as an example. You are adding the 2nd element (index 1) of x to the 1st element (index 0) of y, so you want staggered indices. This also changes the for loop header above (I'm going through this step by step), since the first element of x (0) will not be used:
for i in range(1, len(x)):
That change keeps you from getting an index-out-of-range error. Next, the staggered add:
def func():
    x = [0, 1, 2, 1, 3, 1, 1]
    y = [3]
    for i in range(1, len(x)):
        y.append(y[i-1] + x[i])
    return y
So, going back to the first append example: the for loop starts at index 1, where x[1] is 1 and y[1] doesn't exist yet. To create y[1], you append the sum of y[0] and x[1], giving 4. The loop continues until you've exhausted the values in x, and then returns the accumulated values in list y.
