I keep timing out calculating the mean of a subset - python-3.x
I have a working block of code, but the online judge on HackerEarth keeps returning a timing error. I'm new to coding, so I don't know the tricks to speed up my code. Any help would be much appreciated!
N, Q = map(int, input().split())
# N is the length of the array, Q is the number of queries
in_list = input().split()
# input is a list of integers separated by a space
array = list(map(int, in_list))
from numpy import mean
means = []
for i in range(Q):
    L, R = map(int, input().split())
    m = int(mean(array[L-1:R]))
    means.append(m)
for i in means:
    print(i)
Any suggestions would be amazing!
You probably need to avoid doing O(N) operations in the loop. Currently the slicing and the mean call (which needs to sum up the items in the slice) are both that slow. So you need a better algorithm.
I'll suggest that you do some preprocessing on the list of numbers so that you can figure out the sum of the values that would be in the slice (without actually doing a slice and adding them up). By using O(N) space, you can do the calculation of each sum in O(1) time (making the whole process take O(N + Q) time rather than O(N * Q)).
Here's a quick solution I put together, using itertools.accumulate to find a cumulative sum of the list items. I don't actually save the items themselves, as the cumulative sum is enough.
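from itertools import accumulate

N, Q = map(int, input().split())
# sums[i] is the sum of the first i items; the leading 0 covers the L == 1 case
sums = [0] + list(accumulate(map(int, input().split())))
for _ in range(Q):
    L, R = map(int, input().split())
    # L and R are 1-based and inclusive, as in the question's array[L-1:R];
    # // gives the same result as int(mean(...)) for non-negative values
    print((sums[R] - sums[L-1]) // (R - L + 1))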
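To check the prefix-sum idea without a judge, the same logic can be wrapped in a function and compared against the slow slicing approach. This is only a sketch: the name range_means and the sample data are made up for illustration, and it assumes 1-based inclusive L and R and non-negative values, as in the question.

```python
from itertools import accumulate

def range_means(array, queries):
    """Return the floored mean of array[L-1:R] for each 1-based (L, R) pair."""
    # prefix[i] holds the sum of the first i items, so prefix[0] == 0
    prefix = [0] + list(accumulate(array))
    return [(prefix[R] - prefix[L - 1]) // (R - L + 1) for L, R in queries]

if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]          # hypothetical sample input
    qs = [(1, 8), (2, 4), (5, 5), (3, 7)]
    fast = range_means(data, qs)
    # brute force: slice and sum for every query, O(N) work each time
    slow = [sum(data[L - 1:R]) // (R - L + 1) for L, R in qs]
    print(fast, fast == slow)
```

Building the prefix list costs O(N) once; every query afterwards is two lookups and a division.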
Related
Number of Different Subsequences GCDs
You are given an array nums that consists of positive integers. The GCD of a sequence of numbers is defined as the greatest integer that divides all the numbers in the sequence evenly. For example, the GCD of the sequence [4,6,16] is 2.
A subsequence of an array is a sequence that can be formed by removing some elements (possibly none) of the array. For example, [2,5,10] is a subsequence of [1,2,1,2,4,1,5,10].
Return the number of different GCDs among all non-empty subsequences of nums.
Example 1:
Input: nums = [6,10,3]
Output: 5
Explanation: The figure shows all the non-empty subsequences and their GCDs. The different GCDs are 6, 10, 3, 2, and 1.
from itertools import combinations
from typing import List
import math

class Solution:
    def countDifferentSubsequenceGCDs(self, nums: List[int]) -> int:
        s = set()
        g = 0
        for i in range(1, len(nums)+1):
            comb = combinations(nums, i)
            for i in comb:
                if len(i) == 1:
                    u = i[0]
                    s.add(u)
                else:
                    g = math.gcd(i[0], i[1])
                    s.add(g)
                    if len(i) > 2:
                        for j in range(2, len(i)):
                            g = math.gcd(g, i[j])
                            s.add(g)
                g = 0
        y = len(s)
        return y
I am getting TLE for this input. Can someone please help?
[5852,6671,170275,141929,2414,99931,179958,56781,110656,190278,7613,138315,58116,114790,129975,144929,61102,90624,60521,177432,57353,199478,120483,75965,5634,109100,145872,168374,26215,48735,164982,189698,77697,31691,194812,87215,189133,186435,131282,110653,133096,175717,49768,79527,74491,154031,130905,132458,103116,154404,9051,125889,63633,194965,105982,108610,174259,45353,96240,143865,184298,176813,193519,98227,22667,115072,174001,133281,28294,42913,136561,103090,97131,128371,192091,7753,123030,11400,80880,184388,161169,155500,151566,103180,169649,44657,44196,131659,59491,3225,52303,141458,143744,60864,106026,134683,90132,151466,92609,120359,70590,172810,143654,159632,191208,1497,100582,194119,134349,33882,135969,147157,53867,111698,14713,126118,95614,149422,145333,52387,132310,108371,127121,93531,108639,90723,416,141159,141587,163445,160551,86806,120101,157249,7334,60190,166559,46455,144378,153213,47392,24013,144449,66924,8509,176453,18469,21820,4376,118751,3817,197695,198073,73715,65421,70423,28702,163789,48395,90289,76097,18224,43902,41845,66904,138250,44079,172139,71543,169923,186540,77200,119198,184190,84411,130153,124197,29935,6196,81791,101334,90006,110342,49294,67744,28512,66443,191406,133724,54812,158768,113156,5458,59081,4684,104154,38395,9261,188439,42003,116830,184709,132726,177780,111848,142791,57829,165354,182204,135424,118187,58510,137337,170003,8048,103521,176922,150955,84213,172969,165400,111752,15411,193319,78278,32948,55610,12437,80318,18541,20040,81360,78088,194994,41474,109098,148096,66155,34182,2224,146989,9940,154819,57041,149496,120810,44963,184556,163306,133399,9811,99083,52536,90946,25959,53940,150309,176726,113496,155035,50888,129067,27375,174577,102253,77614,132149,131020,4509,85288,160466,105468,73755,4743,41148,52653,85916,147677,35427,88892,112523,55845,69871,176805,25273,99414,143558,90139,180122,140072,127009,139598,61510,17124,190177,10591,22199,34870,44485,43661,141089,55829,70258,198998,87094,157342,132616,66924,96498,88828,8920
4,29862,76341,61654,158331,187462,128135,35481,152033,144487,27336,84077,10260,106588,19188,99676,38622,32773,89365,30066,161268,153986,99101,20094,149627,144252,58646,148365,21429,69921,95655,77478,147967,140063,29968,120002,72662,28241,11994,77526,3246,160872,175745,3814,24035,108406,30174,10492,49263,62819,153825,110367,42473,30293,118203,43879,178492,63287,41667,195037,26958,114060,99164,142325,77077,144235,66430,186545,125046,82434,26249,54425,170932,83209,10387,7147,2755,77477,190444,156388,83952,117925,102569,82125,104479,16506,16828,83192,157666,119501,29193,65553,56412,161955,142322,180405,122925,173496,93278,67918,48031,141978,54484,80563,52224,64588,94494,21331,73607,23440,197470,117415,23722,170921,150565,168681,88837,59619,102362,80422,10762,85785,48972,83031,151784,79380,64448,87644,26463,142666,160273,151778,156229,24129,64251,57713,5341,63901,105323,18961,70272,144496,18591,191148,19695,5640,166562,2600,76238,196800,94160,129306,122903,40418,26460,131447,86008,20214,133503,174391,45415,47073,39208,37104,83830,80118,28018,185946,134836,157783,76937,33109,54196,37141,142998,189433,8326,82856,163455,176213,144953,195608,180774,53854,46703,78362,113414,140901,41392,12730,187387,175055,64828,66215,16886,178803,117099,112767,143988,65594,141919,115186,141050,118833,2849]
I'm going to add "an answer" here because most "not horribly slow" programs I've seen for this are way too elaborate.
Call the input xs. The fastest way I know of asks, for each integer j in 1 through max(xs), can j be the gcd of some non-empty subset of xs? Of course if max(xs) can be huge, that can be slow. But in the context you apparently took this from (LeetCode), it cannot be huge.
So, given j, how do we know whether some subset's gcd is j? Actually easy! We look at all and only the multiples of j in xs. The gcd of all of those is at least j. If, at any point along the way, their gcd so far is j, we found a subset whose gcd is j. Else the running gcd exceeds j after processing all of j's multiples, so no subset's gcd is j.
def numgcds(xs):
    from math import gcd
    limit = max(xs) + 1
    result = 0
    xsset = set(xs)
    for j in range(1, limit):
        g = 0
        for x in range(j, limit, j):
            if x in xsset:
                g = gcd(x, g)
                if g == j:
                    result += 1
                    break
    return result
Where L is max(xs), worst-case runtime is O(L * log(L)). Across outer loop iterations, the inner loop goes around (at worst) L times at first, then L/2 times, then L/3, and so on. That sums to L*(1/1 + 1/2 + 1/3 + ... + 1/L). The second factor (the sum of reciprocals) is the L'th "harmonic number", and is approximately the natural logarithm of L.
More Gonzo
I don't really like having the runtime depend on the largest integer in the input. For example, numgcds([20000000]) takes 20 million iterations of the outer loop to determine that there's only one gcd, and can take appreciable time (about 30 seconds on my box just now).
Instead, with more code, we can build some dicts that eliminate all searching. For each divisor d of an integer in xs, d2xs[d] is the list of multiples of d in xs. The keys of d2xs are the only possible gcds we need to check, and a key's associated values are exactly (no searching needed) the multiples of the key in xs.
The collection of all possible divisors of all integers in xs can be found by factoring each integer in xs, and generating all possible combinations of its factors' prime powers.
This is harder to code, but can run very much faster. numgcds([20000000]) is essentially instant. And it runs about 10 times faster for the largish example you gave.
def gendivisors(x):
    from collections import Counter
    from itertools import product
    from math import prod
    c = Counter(factor(x))
    pows = []
    for p, k in c.items():
        pows.append([p**i for i in range(k+1)])
    for t in product(*pows):
        yield prod(t)

def numgcds(xs):
    from math import gcd
    from collections import defaultdict
    d2xs = defaultdict(list)
    for x in xs:
        for d in gendivisors(x):
            d2xs[d].append(x)
    result = 0
    for j, mults in d2xs.items():
        g = 0
        for x in mults:
            g = gcd(x, g)
            if g == j:
                result += 1
                break
    return result
I'm not including code for factor(n) - pick your favorite. The code requires it return an iterable (list, generator iterator, tuple, doesn't matter) of all n's prime factors. Order doesn't matter. As special cases, list(factor(i)) should return [i] for i equal to 0 or 1. For ordinary cases, list(factor(p)) == [p] for a prime p, and, e.g., sorted(factor(20)) == [2, 2, 5].
Worst-case timing is much harder to nail, but the key bit is that a reasonable implementation of factor(n) will have worst-case time O(sqrt(n)).
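Since factor(n) is left to the reader, here is one possible stand-in based on plain trial division. It is a sketch rather than the answerer's own code: it only aims to meet the contract described above (prime factors with multiplicity, [i] for i equal to 0 or 1) and to stay within O(sqrt(n)) divisions.

```python
def factor(n):
    """Yield the prime factors of n with multiplicity, using trial division.

    Assumed contract, taken from the description above: factor(0) yields 0,
    factor(1) yields 1, and factor(20) yields 2, 2, 5 in some order.
    """
    if n < 2:
        yield n
        return
    d = 2
    while d * d <= n:
        while n % d == 0:
            yield d
            n //= d
        d += 1 if d == 2 else 2   # after 2, try odd candidates only
    if n > 1:
        yield n   # whatever remains after dividing out small factors is prime
```

With this definition in scope, gendivisors and the dict-based numgcds can be run as written.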
Construct powerset without complements
Starting from this question I've built this code:
import itertools
n = 4
nodes = set(range(0, n))
ss = set()
for i in range(1, n+1):
    ss = ss.union(set(itertools.combinations(range(0, n), i)))
ss2 = set()
for s in ss:
    cs = []
    for i in range(0, n):
        if not (i in s):
            cs.append(i)
    cs = tuple(cs)
    if not (s in ss2) and not (cs in ss2):
        ss2.add(s)
ss = ss2
The code constructs all subsets of S={0,1,...,n-1} (i) without complements (for example, for n=4, either (1,3) or (0,2) is contained; which one does not matter), (ii) without the empty set, but (iii) with S; the result is in ss.
Is there a more compact way to do the job? I do not care if the result is a set/list of sets/lists/tuples. (The result contains 2**(n-1) elements.)
Additional options:
favor whichever of a subset and its complement has fewer elements
output sorted by increasing size
When you exclude complements, you actually exclude half of the combinations. So you could imagine generating all combinations and then kicking out the last half of them. There you must be sure not to kick out a combination together with its complement, but the way you have them ordered, that will not happen.
Further along this idea, you don't even need to generate combinations that have a size that is more than n/2. For even values of n, you would need to halve the list of combinations with size n/2.
Here is one way to achieve all that:
import itertools
n = 4
half = n//2
# generate half of the combinations
ss = [list(itertools.combinations(range(0, n), i)) for i in range(1, half+1)]
# if n is even, kick out half of the last list
if n % 2 == 0:
    ss[-1] = ss[-1][0:len(ss[-1])//2]
# flatten
ss = [y for x in ss for y in x]
print(ss)
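The question also asks for S itself to be part of the result, which brings the count to 2**(n-1). A possible variant of the same idea that appends the full set is sketched below; the function name half_powerset is made up here, and n >= 1 is assumed.

```python
from itertools import combinations

def half_powerset(n):
    """Subsets of range(n): one from each {subset, complement} pair, plus the full set."""
    items = range(n)
    result = []
    for size in range(1, n // 2 + 1):
        combos = list(combinations(items, size))
        if 2 * size == n:
            # for even n the middle size holds both members of each pair;
            # in lexicographic order the first half are the ones containing item 0
            combos = combos[:len(combos) // 2]
        result.extend(combos)
    result.append(tuple(items))   # S stands in for the pair {empty set, S}
    return result

if __name__ == "__main__":
    print(half_powerset(4))
```

Apart from S, which is appended last, the output is ordered by increasing size and always keeps the smaller member of each pair, which matches the two additional options in the question.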
Big-O analysis of permutation algorithm
result = False
def permute(a, l, r, b):
    global result
    if l == r:
        if a == b:
            result = True
    else:
        for i in range(l, r+1):
            a[l], a[i] = a[i], a[l]
            permute(a, l+1, r, b)
            a[l], a[i] = a[i], a[l]
string1 = list("abc")
string2 = list("ggg")
permute(string1, 0, len(string1)-1, string2)
So basically I think that finding each permutation takes n^2 steps (times some constant) and to find all permutations should take n! steps. So does this make it O(n^2 * n!)? And if so, does the n! take over, making it just O(n!)? Thanks
Edit: this algorithm might seem weird for just finding permutations, and that is because I'm also using it to test for anagrams between the two strings. I just haven't renamed the method yet, sorry.
Finding each permutation doesn't take O(N^2). Creating each permutation happens in O(1) time. While it is tempting to say that this is O(N) because you assign a new element to each index N times per permutation, each permutation shares assignments with other permutations.
When we do:
a[l], a[i] = a[i], a[l]
permute(a, l+1, r, b)
all subsequent recursive calls of permute down the line have this assignment already in place.
In reality, assignments only happen each time permute is called, which is [equation missing] times. We can then determine the time complexity to build each permutation using some limit calculus. We take the number of assignments over the total number of permutations as N approaches infinity.
We have: [equation missing]
Expanding the sigma: [equation missing]
The limit of the sum is the sum of the limits: [equation missing]
At this point we evaluate our limits and all of the terms except the first collapse to zero. Since our result is a constant, we get that our complexity per permutation is O(1).
However, we're forgetting about this part:
if l==r:
    if a==b:
        result = True
The comparison of a == b (between two lists) occurs in O(N). Building each permutation takes O(1), but our comparison at the end, which occurs for each permutation, actually takes O(N). This gives us a time complexity of O(N) per permutation.
This gives you N! permutations times O(N) for each permutation, giving you a total time complexity of O(N!) * O(N) = O(N * N!).
Your final time complexity doesn't reduce to O(N!), since O(N * N!) grows faster than O(N!) by a factor of N, and only constant factors get dropped (same reason why O(N log N) != O(N)).
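The claim that building each permutation costs a constant amount of work on average can also be checked empirically. The sketch below is not part of the original answer: it re-implements the same swap-based recursion with counters (the helper name count_work is made up) and leaves out the a == b comparison, so only calls and swaps are measured.

```python
from math import factorial

def count_work(n):
    """Run the swap-based permutation recursion on n items and count its work.

    Returns (calls, swaps, leaves); no comparison against a second list is done.
    """
    a = list(range(n))
    counts = {"calls": 0, "swaps": 0, "leaves": 0}

    def permute(l, r):
        counts["calls"] += 1
        if l == r:
            counts["leaves"] += 1   # one complete permutation sits in `a`
            return
        for i in range(l, r + 1):
            a[l], a[i] = a[i], a[l]
            permute(l + 1, r)
            a[l], a[i] = a[i], a[l]
            counts["swaps"] += 2

    permute(0, n - 1)
    return counts["calls"], counts["swaps"], counts["leaves"]

if __name__ == "__main__":
    for n in range(1, 9):
        calls, swaps, leaves = count_work(n)
        # leaves should equal n!; the two ratios are the per-permutation cost
        print(n, leaves == factorial(n), calls / leaves, swaps / leaves)
```

The number of calls is N!/1! + N!/2! + ... + N!/N!, so calls per permutation stay below 2 and swaps per permutation stay below 4 for every N. Only the final O(N) list comparison scales with N.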
I want to minimize the runtime of this Python program
This is the code whose runtime I want to minimize. It goes through an array of numbers and finds the max between the current max_product and the next product.
def max_pairwise_product(numbers):
    n = len(numbers)
    max_product = 0
    for i in range(n):
        for j in range(i+1, n):
            max_product = max(max_product, numbers[i]*numbers[j])
    return max_product

if __name__ == '__main__':
    input_n = int(input())
    input_numbers = [int(x) for x in input().split()]
    print(max_pairwise_product(input_numbers))
Your code is trying to find the maximum product of any two different elements (by position) of a numeric array. You are currently doing that by calculating each product. This algorithm performs about n²/2 multiplications and comparisons, while all you actually need to do is much less.
For non-negative numbers, the two largest numbers in the original array will have the largest product. So all you need to do is:
Find the two largest integers in the array
Multiply them
You could do so by sorting the original array or just scanning through the array to find the two largest elements (which is a bit trickier than it sounds, because those two elements could have the same value but must not be the same element).
As a side note: In the future, please format your posts so that a reader can understand what your code does without jumping through hoops.
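The single-pass scan mentioned above could look roughly like this. It is a sketch that assumes all inputs are non-negative, and it returns 0 when fewer than two numbers are given, which mirrors what the question's loop does.

```python
def max_pairwise_product(numbers):
    """Largest product of two different positions, assuming non-negative numbers."""
    if len(numbers) < 2:
        return 0   # assumption: mirror the original, which returns 0 when no pair exists
    largest = second = 0   # safe starting values only because inputs are non-negative
    for x in numbers:
        if x > largest:
            largest, second = x, largest
        elif x > second:
            second = x   # also catches a repeat of the current largest value
    return largest * second

if __name__ == "__main__":
    print(max_pairwise_product([5, 1, 5, 3]))
```

The elif branch is what handles the tricky case: a second occurrence of the maximum is not greater than largest, but it is greater than second, so it still gets recorded.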
Sorting the numbers and multiplying the last two elements would give better time complexity than O(n^2).
Sort - O(n log n)
Multiplication - O(1)
def max_pairwise_product(numbers):
    n = len(numbers)
    max_product = 0
    numbers.sort()
    if ((numbers[n-1] > 0) and (numbers[n-2] > 0)):
        max_product = numbers[n-1]*numbers[n-2]
    return max_product

if __name__ == '__main__':
    input_n = int(input())
    input_numbers = [int(x) for x in input().split()]
    print(max_pairwise_product(input_numbers))
Add numbers to the beginning of lists
I have a list of lists, say X, that looks like this
X_train = [[4,3,1,5], [3,1,6,2], [5,0,49,4], ... , [3,57,3,3]]
I wrote this piece of code
for x in range(0,len(X_train)):
    X_train[x].insert(0, x+1)
For each list in X this code inserts the index value of the list + 1 at the beginning of the list. That is, running
for x in range(0,len(X_train)):
    X_train[x].insert(0, x+1)
print(X_train)
will produce the following output
[[1,4,3,1,5],[2,3,1,6,2],[3,5,0,49,4],...,[n,3,57,3,3]]
where n is the number of lists in X.
Question: Is there a faster way to do this? I would like to be able to do this for very large lists, e.g. a list with millions of sublists (if that's possible).
This is faster in my testing:
X = [[n, *l] for n, l in enumerate(X, 1)]
To my knowledge, the standard insert method in Python has a time complexity of O(n). Given your current implementation, your algorithm would have a time complexity of O(m x n), where m is the number of sublists and n is the number of elements in each sublist (I assume here that the number of sublist elements is always the same). You could use blist (a third-party package) instead of the standard lists; it has a time complexity of O(log n) for insertions. This means the total time reduces to O(m x log n). It's not that much of an improvement, though.
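To compare the approaches on your own machine, a small timeit harness like the one below may help. It is a sketch: the function names and the generated test data are hypothetical, each measurement includes the cost of building fresh data, and no particular timing result is implied.

```python
import timeit

def with_insert(rows):
    """Prepend the 1-based index to each sublist in place, as in the question."""
    for i in range(len(rows)):
        rows[i].insert(0, i + 1)
    return rows

def with_comprehension(rows):
    """Build new sublists with the 1-based index in front."""
    return [[n, *row] for n, row in enumerate(rows, 1)]

def make_rows(m, width=4):
    # hypothetical test data: m sublists of `width` small integers each
    return [[(i + j) % 7 for j in range(width)] for i in range(m)]

if __name__ == "__main__":
    m = 50_000
    for fn in (with_insert, with_comprehension):
        # fresh data on every run so the in-place version always starts clean;
        # the data-building time is therefore part of each measurement
        t = timeit.timeit(lambda: fn(make_rows(m)), number=3)
        print(fn.__name__, round(t, 3))
```

Note that the comprehension builds new sublists instead of changing the existing ones, which matters if other code holds references to the original sublists.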