Set partition with fixed size - python-3.x

I need to find the partitions of a set into blocks of fixed sizes. For example, take the set S = {1,2,3,4,5,6,7} and partition it into blocks of sizes (4, 2, 1). The answers are
([1,2,3,4][5,6][7])
([2,3,4,5][6,7][1])
([1,4,5,6][2,3][7])
.....................
.....................
Does anybody know how to solve this easily in Python? Please give me a clue.

You can permute your table in all 7*6*5*4*3*2*1 = 5040 ways and then cut each permuted table into parts.
For example:
def permute(table):
    # return the list of all permutations of table,
    # e.g. list(itertools.permutations(table))
    ...

def cut_into_parts(lengths_list, table):
    rlist = []
    for i in lengths_list[:-1]:
        rlist.append(table[:i])
        table = table[i:]
    rlist.append(table[:lengths_list[-1]])
    return rlist
I hope it is a helpful and easy way to do this.
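Concretely, the two pieces can be combined like this (a self-contained sketch; itertools.permutations stands in for permute, and cut_into_parts is simplified to plain slicing):

```python
from itertools import permutations

def cut_into_parts(lengths_list, table):
    # Slice the permuted sequence into consecutive blocks.
    rlist = []
    for i in lengths_list:
        rlist.append(table[:i])
        table = table[i:]
    return rlist

# Every permutation of the set, cut into blocks of sizes 4, 2, 1.
partitions = [cut_into_parts([4, 2, 1], list(p))
              for p in permutations([1, 2, 3, 4, 5, 6, 7])]
print(partitions[0])    # [[1, 2, 3, 4], [5, 6], [7]]
print(len(partitions))  # 5040 ordered results (duplicates when viewed as sets)
```

Note that this produces 5040 ordered results, so many of them are the same partition with the blocks' elements reordered; deduplicate (e.g. via frozensets) if each partition should appear once.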

Try the following function:
from itertools import permutations
def take(l, partition):
    if len(partition) == 1:
        for p in permutations(l):
            yield (p,)
    else:
        for p in permutations(l, partition[0]):
            for t in take([x for x in l if x not in p], partition[1:]):
                yield (p,) + t
Then take([1,2,3,4,5,6,7],(4,2,1)) should be what you are looking for.
EDIT: Different solution now that I understand the requirements better:
from itertools import permutations
def take(l, partition):
    offsets = [0]
    for x in partition:
        offsets.append(offsets[-1] + x)
    for p in permutations(l):
        yield frozenset(frozenset(p[offsets[i]:offsets[i+1]])
                        for i in range(len(offsets) - 1))

for x in frozenset(take([1, 2, 3, 4, 5], (3, 1, 1))):
    print([[z for z in y] for y in x])
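If every partition should come out only once (block contents unordered), a combinations-based recursion avoids generating all n! permutations in the first place. A sketch, with parts as a hypothetical helper name; it yields each partition exactly once when the block sizes are distinct:

```python
from itertools import combinations

def parts(s, sizes):
    # Recursively choose an unordered block of each requested size.
    if not sizes:
        yield ()
        return
    for block in combinations(s, sizes[0]):
        rest = [x for x in s if x not in block]
        for tail in parts(rest, sizes[1:]):
            yield (block,) + tail

result = list(parts([1, 2, 3, 4, 5, 6, 7], (4, 2, 1)))
print(result[0])    # ((1, 2, 3, 4), (5, 6), (7,))
print(len(result))  # C(7,4) * C(3,2) * C(1,1) = 35 * 3 * 1 = 105
```

With repeated block sizes (e.g. (3,1,1)), same-sized blocks can still appear in swapped order, so a frozenset-based deduplication as in the answer above would still be needed.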

Related

faster method for comparing two lists element-wise

I am building a relational DB using python. So far I have two tables, as follows:
>>> df_Patient.columns
Index(['NgrNr', 'FamilieNr', 'DosNr', 'Geslacht', 'FamilieNaam', 'VoorNaam',
       'GeboorteDatum', 'PreBirth'],
      dtype='object')
>>> df_LaboRequest.columns
Index(['RequestId', 'IsComplete', 'NgrNr', 'Type', 'RequestDate', 'IntakeDate',
       'ReqMgtUnit'],
      dtype='object')
The two tables are quite big:
>>> df_Patient.shape
(386249, 8)
>>> df_LaboRequest.shape
(342225, 7)
Column NgrNr on df_LaboRequest is a foreign key (FK) and references the column of the same name on df_Patient. In order to avoid any integrity error, I need to make sure that all the values under df_LaboRequest[NgrNr] are in df_Patient[NgrNr].
With list comprehension I tried the following (to pick up the values that would throw an error):
[x for x in list(set(df_LaboRequest['NgrNr'])) if x not in list(set(df_Patient['NgrNr']))]
Though this is taking ages to complete. Would anyone recommend a faster method (method as a general word, as a synonym for procedure, nothing to do with the Pythonic meaning of method) for such a comparison?
One-liners aren't always better.
Don't check for membership in lists. Why on earth would you create a set (which is the recommended data structure for O(1) membership checks) and then cast it to a list which has O(N) membership checks?
Build the set of df_Patient once, outside the list comprehension, and use that instead of creating the set in every iteration:
patients = set(df_Patient['NgrNr'])
lab_requests = set(df_LaboRequest['NgrNr'])
result = [x for x in lab_requests if x not in patients]
Or, if you like to use set operations, simply find the difference of both sets:
result = lab_requests - patients
Alternatively, use pandas isin() function.
patients = patients.drop_duplicates()
lab_requests = lab_requests.drop_duplicates()
result = lab_requests[~lab_requests.isin(patients)]
Let's test how much faster these changes make the code:
import pandas as pd
import random
import timeit
# Make dummy dataframes of patients and lab_requests
randoms = [random.randint(1, 1000) for _ in range(10000)]
patients = pd.DataFrame("patient{0}".format(x) for x in randoms[:5000])[0]
lab_requests = pd.DataFrame("patient{0}".format(x) for x in randoms[2000:8000])[0]
# Do it your way
def fun1(pat, lr):
    return [x for x in list(set(lr)) if x not in list(set(pat))]

# Do it my way: set operations
def fun2(pat, lr):
    pat_s = set(pat)
    lr_s = set(lr)
    return lr_s - pat_s

# Or explicitly iterate over the set
def fun3(pat, lr):
    pat_s = set(pat)
    lr_s = set(lr)
    return [x for x in lr_s if x not in pat_s]

# Or use pandas
def fun4(pat, lr):
    pat = pat.drop_duplicates()
    lr = lr.drop_duplicates()
    return lr[~lr.isin(pat)]

# Make sure all 4 functions return the same thing
assert set(fun1(patients, lab_requests)) == set(fun2(patients, lab_requests)) == set(fun3(patients, lab_requests)) == set(fun4(patients, lab_requests))
# Time it
timeit.timeit('fun1(patients, lab_requests)', 'from __main__ import patients, lab_requests, fun1', number=100)
# Output: 48.36615000000165
timeit.timeit('fun2(patients, lab_requests)', 'from __main__ import patients, lab_requests, fun2', number=100)
# Output: 0.10799920000044949
timeit.timeit('fun3(patients, lab_requests)', 'from __main__ import patients, lab_requests, fun3', number=100)
# Output: 0.11038020000069082
timeit.timeit('fun4(patients, lab_requests)', 'from __main__ import patients, lab_requests, fun4', number=100)
# Output: 0.32021789999998873
Looks like we have a ~150x speedup with pandas and a ~500x speedup with set operations!
I don't have pandas installed right now to try this, but you could try removing the list(...) cast. I don't think it adds anything to the program, and sets are much faster for lookups (e.g. x in set(...)) than lists.
You could also try doing this with the pandas API rather than lists and sets; sometimes that is faster. Try searching for unique. Then you could compare the sizes of the two columns and, if they are the same, sort them and do an equality check.
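A sketch of that pandas-native route (the Series here are small stand-ins for the question's columns):

```python
import pandas as pd

# Dummy stand-ins for df_LaboRequest['NgrNr'] and df_Patient['NgrNr'].
patient_ids = pd.Series(["p1", "p2", "p3", "p2"])
request_ids = pd.Series(["p2", "p4", "p4", "p1"])

# unique() deduplicates; isin() does a vectorized membership test.
missing = pd.Series(request_ids.unique())
missing = missing[~missing.isin(patient_ids.unique())]
print(missing.tolist())  # ['p4']
```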

What is the difference between list1 and list1[:] in Python?

I was trying to implement a backtracking solution to a Leetcode problem (https://leetcode.com/problems/subsets/) and ran into an unexpected bug in my code. In the first solution I do out_list.append(curr_array) in line 8, and it outputs an empty list.
class Solution:
    def subsets(self, nums):
        def backtrack(curr_array, curr_idx):
            if len(curr_array) > 0:
                out_list.append(curr_array)
            for idx in range(curr_idx, len(nums)):
                curr_array.append(nums[idx])
                backtrack(curr_array, idx + 1)
                curr_array.pop()
        out_list = []
        backtrack([], 0)
        return out_list
Whereas when I do out_list.append(curr_array[:]), I get the correct answer as output.
class Solution:
    def subsets(self, nums):
        def backtrack(curr_array, curr_idx):
            if len(curr_array) > 0:
                out_list.append(curr_array[:])
            for idx in range(curr_idx, len(nums)):
                curr_array.append(nums[idx])
                backtrack(curr_array, idx + 1)
                curr_array.pop()
        out_list = []
        backtrack([], 0)
        return out_list
I've been under the impression that list1[:] is the same thing as list1. Can you tell me what I'm missing here?
cur_array is a reference to the original list. When you append cur_array to out_list and then later modify cur_array, out_list changes as well.
cur_array[:] is a copy of cur_array (same as cur_array.copy()). When you append cur_array[:] to out_list and then later modify cur_array, out_list does not change because it has its own copy of cur_array.
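A minimal demonstration of the aliasing:

```python
curr = [1, 2]
out_alias = []
out_copy = []

out_alias.append(curr)    # stores a reference to the same list object
out_copy.append(curr[:])  # stores an independent copy

curr.append(3)            # mutate the original afterwards

print(out_alias)  # [[1, 2, 3]] -- the alias sees the change
print(out_copy)   # [[1, 2]]    -- the copy does not
```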
It looks like you have a misunderstanding about lists.
>>> a = []  # here we create a list
>>> id(a)
82700416
>>> id(a[:])
82700544
As you can see, slicing makes a copy of the list.

Creating tables in random order in Python

My aim is to create a multiplication table using two lists. I was successful in creating it, but I need the result in random order, not in sequence. My question is how to randomize the output below.
Is there any other method?
a = [2,3,4,5,6,7,8,9]
b = [12,13,14,15,16,17,19]
for i in b:
    for j in a:
        print(i, 'x', j, '=,')
This picks random pairs, removing each number once it is used (note that a plain nested loop with removal would exhaust the lists and crash, so a while loop is used instead):
from random import randint
a = [2,3,4,5,6,7,8,9]
b = [12,13,14,15,16,17,19]
while a and b:
    aNum = a.pop(randint(0, len(a) - 1))
    bNum = b.pop(randint(0, len(b) - 1))
    print(aNum, 'x', bNum, '=')
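If the goal is the full table in random order rather than one random pairing, a simpler sketch builds every pair once with itertools.product and shuffles them:

```python
import random
from itertools import product

a = [2, 3, 4, 5, 6, 7, 8, 9]
b = [12, 13, 14, 15, 16, 17, 19]

pairs = list(product(b, a))  # every (i, j) combination exactly once
random.shuffle(pairs)        # randomize the order in place
for i, j in pairs:
    print(i, 'x', j, '=', i * j)
```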

Using Shapely to return the coordinates of LineStrings that intersect

I've generated random streets using Shapely's LineString function using the following code:
from random import randint
from shapely.geometry import LineString

class StreetNetwork():
    def __init__(self):
        self.street_coords = []
        self.coords = {}

    def gen_street_coords(self, length, coordRange):
        min_, max_ = coordRange
        for i in range(length):
            street = LineString(((randint(min_, max_), randint(min_, max_)),
                                 (randint(min_, max_), randint(min_, max_))))
            self.street_coords.append(street)
If I use:
street_network = StreetNetwork()
street_network.gen_street_coords(10, [-50, 50])
I get an image showing the random street segments.
I've been looking at the following question which seems similar. I now want to iterate through my list of street_coords, and split streets into 2 if they cross with another street but I'm finding it difficult to find the co-ordinates of the point of intersection. However, as I am unfamiliar with using Shapely, I am struggling to use the "intersects" function.
It is rather simple to check the intersection of two LineString objects. To avoid getting empty geometries, I suggest checking for intersection first before computing it. Something like this:
from shapely.geometry import LineString, Point

def get_intersections(lines):
    point_intersections = []
    line_intersections = []  # if two lines are equal, the intersection is a complete line!
    lines_len = len(lines)
    for i in range(lines_len):
        # start j at i+1 so the same pair is not checked twice
        for j in range(i + 1, lines_len):
            l1, l2 = lines[i], lines[j]
            if l1.intersects(l2):
                intersection = l1.intersection(l2)
                if isinstance(intersection, LineString):
                    line_intersections.append(intersection)
                elif isinstance(intersection, Point):
                    point_intersections.append(intersection)
                else:
                    raise Exception('What happened?')
    return point_intersections, line_intersections
With the example:
l1 = LineString([(0,0), (1,1)])
l2 = LineString([(0,1), (1,0)])
l3 = LineString([(5,5), (6,6)])
l4 = LineString([(5,5), (6,6)])
my_lines = [l1, l2, l3, l4]
print(get_intersections(my_lines))
I got:
[<shapely.geometry.point.Point object at 0x7f24f00a4710>,
<shapely.geometry.linestring.LineString object at 0x7f24f00a4750>]
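To then split a street at a crossing, as the question asks, shapely.ops.split can cut a line at a point that lies on it. A sketch (assuming Shapely >= 1.6, where split is available):

```python
from shapely.geometry import LineString
from shapely.ops import split

l1 = LineString([(0, 0), (1, 1)])
l2 = LineString([(0, 1), (1, 0)])

crossing = l1.intersection(l2)  # Point(0.5, 0.5)
pieces = split(l1, crossing)    # collection of the two resulting segments
for segment in pieces.geoms:
    print(list(segment.coords))
```

Beware of floating-point precision: if the computed intersection point does not lie exactly on the line, split may leave it uncut, so snapping (shapely.ops.snap) can be needed first.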

Python/Pandas element wise union of 2 Series containing sets in each element

I have 2 pandas Series that I know are the same length. Each element of each Series is a set. I want to figure out a computationally efficient way to get the element-wise union of these two Series' sets. I've created a simplified version of the code with short fake Series to play with below. This implementation is a VERY inefficient way of doing it; there has GOT to be a faster way. My real Series are much longer, and I have to do this operation hundreds of thousands of times.
import pandas as pd
set_series_1 = pd.Series([{1,2,3}, {'a','b'}, {2.3, 5.4}])
set_series_2 = pd.Series([{2,4,7}, {'a','f','g'}, {0.0, 15.6}])
n = set_series_1.shape[0]
for i in range(0, n):
    set_series_1[i] = set_series_1[i].union(set_series_2[i])
print(set_series_1)
>>> set_series_1
0 set([1, 2, 3, 4, 7])
1 set([a, b, g, f])
2 set([0.0, 2.3, 15.6, 5.4])
dtype: object
I've tried combining the Series into a data frame and using the apply function, but I get an error saying that sets are not supported as dataframe elements.
pir4
After testing several options, I finally came up with a good one... pir4 below.
Testing
def jed1(s1, s2):
    s = s1.copy()
    n = s1.shape[0]
    for i in range(n):
        s[i] = s2[i].union(s1[i])
    return s

def pir1(s1, s2):
    return pd.Series([item.union(s2[i]) for i, item in enumerate(s1.values)], s1.index)

def pir2(s1, s2):
    return pd.Series([item.union(s2[i]) for i, item in s1.items()], s1.index)

def pir3(s1, s2):
    return s1.apply(list).add(s2.apply(list)).apply(set)

def pir4(s1, s2):
    return pd.Series([set.union(*z) for z in zip(s1, s2)])
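For reference, pir4 applied directly to the question's example Series; set.union(*z) unions each aligned pair of sets in a single call:

```python
import pandas as pd

set_series_1 = pd.Series([{1, 2, 3}, {'a', 'b'}, {2.3, 5.4}])
set_series_2 = pd.Series([{2, 4, 7}, {'a', 'f', 'g'}, {0.0, 15.6}])

# zip pairs elements positionally; set.union(*z) merges each pair.
result = pd.Series([set.union(*z) for z in zip(set_series_1, set_series_2)])
print(result[0])  # {1, 2, 3, 4, 7}
```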
