Speed up calculating sum of cartesian products

Speed up calculating sum of cartesian products - python-3.x

I failed several test cases for an automated screening exam because my code ran too long. Is there a way to write this more efficiently?
The prompt was something like:
Write a program that takes as an input a list and returns the sum of all combinations of concatenating its elements pairwise.
For example, with the list [20, 5] this would be:
2020 + 205 + 520 + 55 = 2800
I still can't think of a way to do this without casting to string and back into int. The list comprehensions were previously nested for loops which performed worse but I still need more speed.
def concatenationsSum(a):
# turn into strings
a = [str(i) for i in a]
# concat
cartesian_product = [j + k for j in a for k in a]
# turn back into integers
total = [int(i) for i in cartesian_product]
return sum(total)

So i tried some optimazation in your code, your main bottleneck here is the casting to str and back to int so i modified that part
def concatenationsSum(a):
numDigits = {i: (10 ** (int(math.log10(i)) + 1)) for i in a}
cpro = product(a, a)
cartesian_product = [i * numDigits[x] + x for i, x in cpro]
return sum(cartesian_product)
here you can see i changed a few parts, i added a dictionary to lookup for each number the number of digits it needs to multiply, an example would be 5 returns 10 so when you have 20 and 5 you can do 20 * digits[5] + 5 = 205 that speeds up the whole proccess.
also no need to use a double for loop in the list comprehension python itertools provides product() which return the cartesian product.
Testing done: with small lists about 8 elements i went from 4.6e05 to 3.1e05 average and with bigger lists of 5400 elements it got from 11.7 seconds average to 5.3 seconds. That's about double the speed.

Related

Program does not run faster as expected when checking much less numbers for finding primes

I made a program to find primes below a given number.
number = int(input("Enter number: "))
prime_numbers = [2] # First prime is needed.
for number_to_be_checked in range(3, number + 1):
square_root = number_to_be_checked ** 0.5
for checker in prime_numbers: # Checker will become
# every prime number below the 'number_to_be_checked'
# variable because we are adding all the prime numbers
# in the 'prime_numbers' list.
if checker > square_root:
prime_numbers.append(number_to_be_checked)
break
elif number_to_be_checked % checker == 0:
break
print(prime_numbers)
This program checks every number below the number given as the input. But primes are of the form 6k ± 1 only. Therefore, instead of checking all the numbers, I defined a generator that generates all the numbers of form 6k ± 1 below the number given as the input. (I added 3 also in the prime_numbers list while initializing it as 2,3 cannot be of the form 6k ± 1)
def potential_primes(number: int) -> int:
"""Generate the numbers potential to be prime"""
# Prime numbers are always of the form 6k ± 1.
number_for_function = number // 6
for k in range(1, number_for_function + 1):
yield 6*k - 1
yield 6*k + 1
Obviously, the program should have been much faster because I am checking comparatively many less numbers. But, counterintuitively the program is slower than before. What could be the reason behind this?

In every six numbers, three are even and one is a multiple of 3. The other two are 6-coprime, so are potentially prime:
6k+0 6k+1 6k+2 6k+3 6k+4 6k+5
even even even
3x 3x
For the three evens your primality check uses only one division (by 2) and for the 4th number, two divisions. In all, five divisions that you seek to avoid.
But each call to a generator has its cost too. If you just replace the call to range with the call to create your generator, but leave the other code as is(*), you are not realizing the full savings potential.
Why? Because (*)if that's the case, while you indeed test only 1/3 of the numbers now, you still test each of them by 2 and 3. Needlessly. And apparently the cost of generator use is too high.
The point to this technique known as wheel factorization is to not test the 6-coprime (in this case) numbers by the primes which are already known to not be their divisors, by construction.
Thus, you must start with e.g. prime_numbers = [5,7] and use it in your divisibility testing loop, not all primes, which start with 2 and 3, which you do not need.

Using nested for loop along with square root will be heavy on computation, rather look at Prime Sieve Algorithm which is much faster but does take some memory.

One way to use the 6n±1 idea is to alternate step sizes in the main loop by stepping 2 then 4. My Python is not good, so this is pseudocode:
function listPrimes(n)
// Deal with low numbers.
if (n < 2) return []
if (n = 2) return [2]
if (n = 3) return [2, 3]
// Main loop
primeList ← [2, 3]
limit ← 1 + sqrt(n) // Calculate square root once.
index ← 5 // We have checked 2 and 3 already.
step ← 2 // Starting step value: 5 + 2 = 7.
while (index <= limit) {
if (isPrime(index)) {
primeList.add(index)
}
index ← index + step
step ← 6 - step // Alternate steps of 2 and 4
}
return primeList
end function

Number of Different Subsequences GCDs

Number of Different Subsequences GCDs
You are given an array nums that consists of positive integers.
The GCD of a sequence of numbers is defined as the greatest integer that divides all the numbers in the sequence evenly.
For example, the GCD of the sequence [4,6,16] is 2.
A subsequence of an array is a sequence that can be formed by removing some elements (possibly none) of the array.
For example, [2,5,10] is a subsequence of [1,2,1,2,4,1,5,10].
Return the number of different GCDs among all non-empty subsequences of nums.
Example 1:
Input: nums = [6,10,3]
Output: 5
Explanation: The figure shows all the non-empty subsequences and their GCDs.
The different GCDs are 6, 10, 3, 2, and 1.
from itertools import permutations
import math
class Solution:
def countDifferentSubsequenceGCDs(self, nums: List[int]) -> int:
s = set()
g =0
for i in range(1,len(nums)+1):
comb = combinations(nums,i)
for i in comb:
if len(i)==1:
u = i[0]
s.add(u)
else:
g = math.gcd(i[0],i[1])
s.add(g)
if len(i)>2:
for j in range(2,len(i)):
g = math.gcd(g,i[j])
s.add(g)
g = 0
y = len(s)
return y
I am getting TLE for this input. Can someone pls help?
[5852,6671,170275,141929,2414,99931,179958,56781,110656,190278,7613,138315,58116,114790,129975,144929,61102,90624,60521,177432,57353,199478,120483,75965,5634,109100,145872,168374,26215,48735,164982,189698,77697,31691,194812,87215,189133,186435,131282,110653,133096,175717,49768,79527,74491,154031,130905,132458,103116,154404,9051,125889,63633,194965,105982,108610,174259,45353,96240,143865,184298,176813,193519,98227,22667,115072,174001,133281,28294,42913,136561,103090,97131,128371,192091,7753,123030,11400,80880,184388,161169,155500,151566,103180,169649,44657,44196,131659,59491,3225,52303,141458,143744,60864,106026,134683,90132,151466,92609,120359,70590,172810,143654,159632,191208,1497,100582,194119,134349,33882,135969,147157,53867,111698,14713,126118,95614,149422,145333,52387,132310,108371,127121,93531,108639,90723,416,141159,141587,163445,160551,86806,120101,157249,7334,60190,166559,46455,144378,153213,47392,24013,144449,66924,8509,176453,18469,21820,4376,118751,3817,197695,198073,73715,65421,70423,28702,163789,48395,90289,76097,18224,43902,41845,66904,138250,44079,172139,71543,169923,186540,77200,119198,184190,84411,130153,124197,29935,6196,81791,101334,90006,110342,49294,67744,28512,66443,191406,133724,54812,158768,113156,5458,59081,4684,104154,38395,9261,188439,42003,116830,184709,132726,177780,111848,142791,57829,165354,182204,135424,118187,58510,137337,170003,8048,103521,176922,150955,84213,172969,165400,111752,15411,193319,78278,32948,55610,12437,80318,18541,20040,81360,78088,194994,41474,109098,148096,66155,34182,2224,146989,9940,154819,57041,149496,120810,44963,184556,163306,133399,9811,99083,52536,90946,25959,53940,150309,176726,113496,155035,50888,129067,27375,174577,102253,77614,132149,131020,4509,85288,160466,105468,73755,4743,41148,52653,85916,147677,35427,88892,112523,55845,69871,176805,25273,99414,143558,90139,180122,140072,127009,139598,61510,17124,190177,10591,22199,34870,44485,43661,141089,55829,70258,198998,87094,157342,132616,66924,96498,88828,89204,29862,76341,61654,158331,187462,128135,35481,152033,144487,27336,84077,10260,106588,19188,99676,38622,32773,89365,30066,161268,153986,99101,20094,149627,144252,58646,148365,21429,69921,95655,77478,147967,140063,29968,120002,72662,28241,11994,77526,3246,160872,175745,3814,24035,108406,30174,10492,49263,62819,153825,110367,42473,30293,118203,43879,178492,63287,41667,195037,26958,114060,99164,142325,77077,144235,66430,186545,125046,82434,26249,54425,170932,83209,10387,7147,2755,77477,190444,156388,83952,117925,102569,82125,104479,16506,16828,83192,157666,119501,29193,65553,56412,161955,142322,180405,122925,173496,93278,67918,48031,141978,54484,80563,52224,64588,94494,21331,73607,23440,197470,117415,23722,170921,150565,168681,88837,59619,102362,80422,10762,85785,48972,83031,151784,79380,64448,87644,26463,142666,160273,151778,156229,24129,64251,57713,5341,63901,105323,18961,70272,144496,18591,191148,19695,5640,166562,2600,76238,196800,94160,129306,122903,40418,26460,131447,86008,20214,133503,174391,45415,47073,39208,37104,83830,80118,28018,185946,134836,157783,76937,33109,54196,37141,142998,189433,8326,82856,163455,176213,144953,195608,180774,53854,46703,78362,113414,140901,41392,12730,187387,175055,64828,66215,16886,178803,117099,112767,143988,65594,141919,115186,141050,118833,2849]

I'm going to add "an answer" here because most "not horribly slow" programs I've seen for this are way too elaborate.
Call the input xs. The fastest way I know of asks, for each integer j in 1 through max(xs), can j be the gcd of some non-empty subset of xs? Of course if max(xs) can be huge, that can be slow. But in the context you apparently took this from (LeetCode), it cannot be huge.
So, given j, how do we know whether some subset's gcd is j? Actually easy! We look at all and only the multiples of j in xs. The gcd of all of those is at least j. If, at any point along the way, their gcd so far is j, we found a subset whose gcd is j. Else the running gcd exceeds j after processing all of j's multiples, so no subset's gcd is j.
def numgcds(xs):
from math import gcd
limit = max(xs) + 1
result = 0
xsset = set(xs)
for j in range(1, limit):
g = 0
for x in range(j, limit, j):
if x in xsset:
g = gcd(x, g)
if g == j:
result += 1
break
return result
Where L is max(xs), worst-case runtime is O(L * log(L)). Across outer loop iterations, the inner loop goes around (at worst) L times at first, then L/2 times, then L/3, and so on. That sums to L*(1/1 + 1/2 + 1/3 + ... + 1/L). The second factor (the sum of reciprocals) is the L'th "harmonic number", and is approximately the natural logarithm of L.
More Gonzo
I don't really like having the runtime depend on the largest integer in the input. For example, numgcds([20000000]) takes 20 million iterations of the outer loop to determine that there's only one gcd, and can take appreciable time (about 30 seconds on my box just now).
Instead, with more code, we can build some dicts that eliminate all searching. For each divisor d of an integer in xs, d2xs[d] is the list of multiples of d in xs. The keys of d2xs are the only possible gcds we need to check, and a key's associated values are exactly (no searching needed) the multiples of the key in xs.
The collection of all possible divisors of all integers in xs can be found by factoring each integer in xs, and generating all possible combinations of its factors' prime powers.
This is harder to code, but can run very much faster. numgcds([20000000]) is essentially instant. And it runs about 10 times faster for the largish example you gave.
def gendivisors(x):
from collections import Counter
from itertools import product
from math import prod
c = Counter(factor(x))
pows = []
for p, k in c.items():
pows.append([p**i for i in range(k+1)])
for t in product(*pows):
yield prod(t)
def numgcds(xs):
from math import gcd
from collections import defaultdict
d2xs = defaultdict(list)
for x in xs:
for d in gendivisors(x):
d2xs[d].append(x)
result = 0
for j, mults in d2xs.items():
g = 0
for x in mults:
g = gcd(x, g)
if g == j:
result += 1
break
return result
I'm not including code for factor(n) - pick your favorite. The code requires it return an iterable (list, generator iterator, tuple, doesn't matter) of all n's prime factors. Order doesn't matter. As special cases, list(factor(i)) should return [i] for i equal to 0 or 1.
For ordinary cases, list(factor(p)) == [p] for a prime p, and, e.g., sorted(factor(20)) == [2, 2, 5].
Worst-case timing is much harder to nail, but the key bit is that a reasonable implementation of factor(n) will have worst-case time O(sqrt(n)).

How do I make my program in python faster

I have a project where I need to find the optimal triplets of number (E, F, G) such that E and F is very close to eachother (the difference is smallest) and G is bigger than E and F. I have to make n numbers of such triplets.
The way I tought about it was to sort the given list of numbers then search for the smallest differences then those two will be my E and F after all the n pairs will be done I will search for every pair of E and F a G such that G is bigger than E and F. I know this is the greedy way but my code is very slow, it takes up to 1 minute when the list is like 300k numbers and i have to do 2k triplets. Any idea on how to improve the code?
guests is n (the number of triplets)
sticks is the list of all the numbers
# We sort the list using the inbuilt function sticks.sort()
save = guests # Begining to search for the best pairs of E and F
efficiency = 0 while save != 0:
difference = 1000000 # We asign a big value to difference each time
# Searching for the smallest difference between two elements
for i in range(0, length - 1):
if sticks[i+1] - sticks[i] < difference:
temp_E = i
temp_F = i + 1
difference = sticks[i+1] - sticks[i]
# Saving the two elements in list stick_E and stick_F
stick_E.append(sticks[temp_E])
stick_F.append(sticks[temp_F])
# Calculating the efficiency
efficiency += ((sticks[temp_F] - sticks[temp_E]) * (sticks[temp_F] - sticks[temp_E]))
# Deleting the two elements from the main list
sticks.pop(temp_E)
sticks.pop(temp_E)
length -= 2
save -= 1
# Searching for stick_G for every pair made for i in range(0, len(stick_F)):
for j in range(0, length):
if stick_F[i] < sticks[j]:
stick_G.append(sticks[j]) # Saves the element found
sticks.pop(j) # Deletes the element from the main list
length -= 1
break
> # Output the result to a local file print_to_file(stick_E, stick_F, stick_G, efficiency, output_file)
I commented the code the best I could so it would be easier for you to understand.

How to generate 1825 numbers with a step of 0.01 in a specific range

In the following code I want to get len(a) should be 1825 keeping step 0.01. But when I print len(a) it gives me 73. For getting length of 1825 I have to generate numbers from 2.275 to 3 with a step of 0.01 ,73 times. How can I do that? I tried to use np.linspace but that command doesn't work for this case.
a = np.arange(2.275, 3, 0.01)

Seems like you want to np.random.choice 1825 times
>>> a = np.arange(2.275,3,0.01)
>>> c = np.random.choice(a, 1825)
array([2.995, 2.545, 2.755, ..., 2.875, 2.275, 2.605])
>>> c.shape
(1825,)
Edit
If you want a repeated 25 times (i.e. 1825/73) in sequence, use np.tile()
target = 1825
n = target/len(a)
np.tile(a, int(n))
yields
array([2.275, 2.285, 2.295, ..., 2.975, 2.985, 2.995])

Here's a one liner, given a = np.arange(2.275, 3, 0.01) and n = 1825:
a = np.broadcast_to(a, (n // a.size + book(n % a.size), a.size)).ravel()[:n]
This uses np.broadcast_to to turn a into a matrix where it repeats itself enough times to fill 1825 elements. ravel then flattens the repeated list and the final slice chops off the unwanted elements. The ravel operation is what actually copies the list since the broadcast uses stride tricks to avoid copying the data.

find primes in a certain range efficiently

This is code an algorithm I found for Sieve of Eratosthenes for python3. What I want to do is edit it so the I can input a range of bottom and top and then input a list of primes up to the bottom one and it will output a list of primes within that range.
However, I am not quite sure how to do that.
If you can help that would be greatly appreciated.
from math import sqrt
def sieve(end):
if end < 2: return []
#The array doesn't need to include even numbers
lng = ((end//2)-1+end%2)
# Create array and assume all numbers in array are prime
sieve = [True]*(lng+1)
# In the following code, you're going to see some funky
# bit shifting and stuff, this is just transforming i and j
# so that they represent the proper elements in the array.
# The transforming is not optimal, and the number of
# operations involved can be reduced.
# Only go up to square root of the end
for i in range(int(sqrt(end)) >> 1):
# Skip numbers that aren’t marked as prime
if not sieve[i]: continue
# Unmark all multiples of i, starting at i**2
for j in range( (i*(i + 3) << 1) + 3, lng, (i << 1) + 3):
sieve[j] = False
# Don't forget 2!
primes = [2]
# Gather all the primes into a list, leaving out the composite numbers
primes.extend([(i << 1) + 3 for i in range(lng) if sieve[i]])
return primes

I think the following is working:
def extend_erathostene(A, B, prime_up_to_A):
sieve = [ True ]* (B-A)
for p in prime_up_to_A:
# first multiple of p greater than A
m0 = ((A+p-1)/p)*p
for m in range( m0, B, p):
sieve[m-A] = False
limit = int(ceil(sqrt(B)))
for p in range(A,limit+1):
if sieve[p-A]:
for m in range(p*2, B, p):
sieve[m-A] = False
return prime_up_to_A + [ A+c for (c, isprime) in enumerate(sieve) if isprime]

This problem is known as the "segmented sieve of Eratosthenes." Google gives several useful references.

You already have the primes from 2 to end, so you just need to filter the list that is returned.

One way is to run the sieve code with end = top and modify the last line to give you only numbers bigger than bottom:
If the range is small compared with it's magnitude (i.e. top-bottom is small compared with bottom), then you better use a different algorithm:
Start from bottom and iterate over the odd numbers checking whether they are prime. You need an isprime(n) function which just checks whether n is divisible by all the odd numbers from 1 to sqrt(n):
def isprime(n):
i=2
while (i*i<=n):
if n%i==0: return False
i+=1
return True

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Speed up calculating sum of cartesian products - python-3.x

Related

Program does not run faster as expected when checking much less numbers for finding primes

Number of Different Subsequences GCDs

How do I make my program in python faster

How to generate 1825 numbers with a step of 0.01 in a specific range

find primes in a certain range efficiently

Categories

Resources