Creating custom combination from two lists - python-3.5

I am looking to use two lists:
L1 = ['a', 's', 'd']
L2 = [str(1), str(2)]
I need to create a third list:
L3 = [(a1, s1, d1), (a1, s1, d2), ... ]
Each tuple in L3 has size 3 and uses every element of L1 at most once, but it may repeat elements of L2.
For example, (a1, s2, d2) is allowed but (a1, a2, d1) is not.
I am working with large L1 and L2, so the example above is only an illustration. I am not sure how to approach this problem. I have looked at the permutations and combinations functions in itertools, but I cannot get them to produce the list L3 above. One brute-force solution is something like:
L3 = list(itertools.combinations(list(itertools.product(L1, L2)), 3))
and then to filter out results such as (('a', '1'), ('a', '2'), ('d', '2')), where an element of L1 repeats, but for large inputs that is not efficient.

Since it sounds like you want L1 to be the first elements of the tuples, I think we simply need to zip them, not iter-anything them. We only need to take the product of L2.
In [327]: [list(zip(L1, p)) for p in itertools.product(L2, repeat=len(L1))]
Out[327]:
[[('a', '1'), ('s', '1'), ('d', '1')],
[('a', '1'), ('s', '1'), ('d', '2')],
[('a', '1'), ('s', '2'), ('d', '1')],
[('a', '1'), ('s', '2'), ('d', '2')],
[('a', '2'), ('s', '1'), ('d', '1')],
[('a', '2'), ('s', '1'), ('d', '2')],
[('a', '2'), ('s', '2'), ('d', '1')],
[('a', '2'), ('s', '2'), ('d', '2')]]
where you can replace [ and ] with ( and ) to turn the listcomp into a genexp if you don't want to materialize the whole object at once.
If you want to merge your tuples' elements into one string, you could do that too:
In [338]: gen = (tuple(''.join(pair) for pair in zip(L1, p))
     ...:        for p in itertools.product(L2, repeat=len(L1)))
In [339]: for elem in gen:
     ...:     print(elem)
('a1', 's1', 'd1')
('a1', 's1', 'd2')
('a1', 's2', 'd1')
('a1', 's2', 'd2')
('a2', 's1', 'd1')
('a2', 's1', 'd2')
('a2', 's2', 'd1')
('a2', 's2', 'd2')
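The same idea can be wrapped in a small generator. This is only a sketch: the name labelled_tuples is made up for illustration, and it assumes every element of both lists is a string so the pairs can be concatenated.

```python
from itertools import product

def labelled_tuples(names, labels):
    """Yield one tuple per way of assigning a label to each name.

    Hypothetical helper, not part of itertools. The order of `names`
    is kept fixed; only the labels vary.
    """
    for combo in product(labels, repeat=len(names)):
        yield tuple(name + label for name, label in zip(names, combo))

L1 = ['a', 's', 'd']
L2 = ['1', '2']
L3 = list(labelled_tuples(L1, L2))
print(len(L3))        # len(L2) ** len(L1) = 2 ** 3 = 8
print(L3[0], L3[-1])
```

The result has len(L2) ** len(L1) tuples, so for large inputs it is probably better to iterate over the generator than to build the full list.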

Related

How to pair up elements using permutations in an rdd list of lists

I am trying to get a list of paired elements from each list of lists in rdd.
My data :
[['a','b','c'],['e','f','g','h'],['x','y','z']]
I want :
[('a','b'),('b','c'),('c','a'),('e','f'),('f','g'),('g','h'),('e','g'),('e','h')...... and all possible pairs]
>>> data = [[['a','b','c'],['e','f','g','h'],['x','y','z']]]
>>> df = spark.sparkContext.parallelize(data)
>>> import itertools
>>> df.map(lambda x: [list(itertools.combinations(i,2)) for i in x]).map(lambda x: list(itertools.chain.from_iterable(x))).foreach(lambda y: print(y))
result:
[('a', 'b'), ('a', 'c'), ('b', 'c'), ('e', 'f'), ('e', 'g'), ('e', 'h'), ('f', 'g'), ('f', 'h'), ('g', 'h'), ('x', 'y'), ('x', 'z'), ('y', 'z')]
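Note that itertools.combinations yields each unordered pair once, so ('c', 'a') from the question does not appear. If both orders are wanted, itertools.permutations is one option. A plain-Python sketch of the difference, without Spark (the variable names are illustrative):

```python
from itertools import chain, combinations, permutations

data = [['a', 'b', 'c'], ['e', 'f', 'g', 'h'], ['x', 'y', 'z']]

# Each unordered pair once per sublist: ('a', 'b') but not ('b', 'a').
unordered_pairs = list(chain.from_iterable(combinations(sub, 2) for sub in data))

# Every ordered pair per sublist: both ('a', 'b') and ('b', 'a').
ordered_pairs = list(chain.from_iterable(permutations(sub, 2) for sub in data))

print(len(unordered_pairs))  # 3 + 6 + 3 = 12
print(len(ordered_pairs))    # 6 + 12 + 6 = 24
```

The same generator expressions could be used inside the map() call above in place of itertools.combinations.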

How to get the expected output in list format

Below are two lists, lst1 and lst2, followed by the expected output.
lst1 = ['q','r','s','t','u','v','w','x','y','z']
lst2 =['1','2','3']
Output expected
[['q','1'], ['r','2'], ['s','3'], ['t','1'], ['u','2'], ['v','3'], ['w','1'], ['x','2'], ['y','3'], ['z','1']]
This is a very simple approach to this problem.
lst1 = ['q','r','s','t','u','v','w','x','y','z']
lst2 = ['1','2','3']
new_list = []
for x in range(len(lst1)):
    new_list.append([lst1[x], lst2[x % 3]])
print(new_list) # [['q', '1'], ['r', '2'], ['s', '3'], ['t', '1'], ['u', '2'], ['v', '3'], ['w', '1'], ['x', '2'], ['y', '3'], ['z', '1']]
You could also use a list comprehension in this case, like so:
new_list = [[lst1[x], lst2[x % 3]] for x in range(len(lst1))]
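The modulus 3 above is the length of lst2 written out by hand. A sketch that takes it from the list instead, so the same code keeps working if lst2 changes size (pair_cyclically is a hypothetical name used only here):

```python
def pair_cyclically(items, labels):
    """Pair each item with a label, wrapping around when labels run out."""
    return [[item, labels[i % len(labels)]] for i, item in enumerate(items)]

lst1 = ['q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
lst2 = ['1', '2', '3']
new_list = pair_cyclically(lst1, lst2)
print(new_list)
```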
You can use zip() and itertools.cycle().
from itertools import cycle
lst1 = ['q','r','s','t','u','v','w','x','y','z']
lst2 =['1','2','3']
result = [[letter, number] for letter, number in zip(lst1, cycle(lst2))]
print(result)
Expected output:
[['q', '1'], ['r', '2'], ['s', '3'], ['t', '1'], ['u', '2'], ['v', '3'], ['w', '1'], ['x', '2'], ['y', '3'], ['z', '1']]
Another solution would be to additionally use map().
result = list(map(list, zip(lst1, cycle(lst2))))
If you want tuples instead of lists, you could just do
from itertools import cycle
lst1 = ['q','r','s','t','u','v','w','x','y','z']
lst2 =['1','2','3']
result = list(zip(lst1, cycle(lst2)))
print(result)
which would give you
[('q', '1'), ('r', '2'), ('s', '3'), ('t', '1'), ('u', '2'), ('v', '3'), ('w', '1'), ('x', '2'), ('y', '3'), ('z', '1')]

How to pass a list of lists to a function expecting *iterables

I want the cartesian product of a bunch of lists.
from itertools import product
v1=['a','b','c']
v2=['1','2']
v3=['x','y','z']
list(product(v1,v2,v3))
This returns the desired result:
[('a', '1', 'x'),
('a', '1', 'y'),
('a', '1', 'z'),
('a', '2', 'x'),
('a', '2', 'y'),
('a', '2', 'z'),
('b', '1', 'x'),
('b', '1', 'y'),
('b', '1', 'z'),
('b', '2', 'x'),
('b', '2', 'y'),
('b', '2', 'z'),
('c', '1', 'x'),
('c', '1', 'y'),
('c', '1', 'z'),
('c', '2', 'x'),
('c', '2', 'y'),
('c', '2', 'z')]
However, I don't know the number of lists in advance. Suppose I have them stored as a list of lists vs, and I try to do this:
vs=[v1,v2,v3]
list(product(vs))
Of course, that doesn't give me what I want, because it treats vs as a single argument instead of multiple arguments.
[(['a', 'b', 'c'],), (['1', '2'],), (['x', 'y', 'z'],)]
Is there a way that I can pass a list of lists into product and have it operate on the sublists?
Try unpacking the list with a starred expression: list(product(*vs))
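A short sketch of the difference, reusing the lists from the question. The star unpacks vs so that each sublist becomes a separate positional argument to product:

```python
from itertools import product

vs = [['a', 'b', 'c'], ['1', '2'], ['x', 'y', 'z']]

# Without the star, product sees a single iterable whose items are the sublists.
wrapped = list(product(vs))

# With the star, this is the same call as product(vs[0], vs[1], vs[2]).
combos = list(product(*vs))

print(len(wrapped))  # 3
print(len(combos))   # 3 * 2 * 3 = 18
print(combos[0])
```

This works for any number of sublists, since the unpacking happens at call time.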

How to create a new RDD with all possible combinations of the elements of another RDD in PySpark?

Hi, I have created an RDD like below
rdd1=sc.parallelize(['P','T','K'])
rdd1.collect()
['P', 'T', 'K']
Now I want to create a new RDD, RDD2, containing all possible combinations of these elements, excluding combinations that repeat an element, such as (p,p), (k,k), (t,t).
My expected output when I run
RDD2.collect()
[
('P'),('T'),('K'),
('P','T'),('P','K'),('T','K'),('T','P'),('K','P'),('K','T'),
('P','T','K'),('P','K','T'),('T','P','K'),('T','K','P'),('K','P','T'),('K','T','P')
]
It seems that you want to generate all permutations of the elements in your rdd where each row contains unique values.
One way would be to first create a helper function to generate the desired combination of length n:
from functools import reduce
from itertools import chain
def combinations_of_length_n(rdd, n):
    # for n > 0
    return reduce(
        lambda a, b: a.cartesian(b).map(lambda x: tuple(chain.from_iterable(x))),
        [rdd]*n
    ).filter(lambda x: len(set(x))==n)
Essentially the function will do n Cartesian products of your rdd with itself and keep only the rows where all of the values are distinct.
We can test this out for n = [2, 3]:
print(combinations_of_length_n(rdd1, n=2).collect())
#[('P', 'T'), ('P', 'K'), ('T', 'P'), ('K', 'P'), ('T', 'K'), ('K', 'T')]
print(combinations_of_length_n(rdd1, n=3).collect())
#[('P', 'T', 'K'),
# ('P', 'K', 'T'),
# ('T', 'P', 'K'),
# ('K', 'P', 'T'),
# ('T', 'K', 'P'),
# ('K', 'T', 'P')]
The final output that you want is just the union of these intermediate results with the original rdd (with the values mapped to tuples).
rdd1.map(lambda x: tuple((x,)))\
    .union(combinations_of_length_n(rdd1, 2))\
    .union(combinations_of_length_n(rdd1, 3)).collect()
#[('P',),
# ('T',),
# ('K',),
# ('P', 'T'),
# ('P', 'K'),
# ('T', 'P'),
# ('K', 'P'),
# ('T', 'K'),
# ('K', 'T'),
# ('P', 'T', 'K'),
# ('P', 'K', 'T'),
# ('T', 'P', 'K'),
# ('K', 'P', 'T'),
# ('T', 'K', 'P'),
# ('K', 'T', 'P')]
To generalize for any max number of repetitions:
num_reps = 3
reduce(
    lambda a, b: a.union(b),
    [
        combinations_of_length_n(rdd1.map(lambda x: tuple((x,))), i+1)
        for i in range(num_reps)
    ]
).collect()
#Same as above
Note: Cartesian products are expensive operations and should be avoided when possible.
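There are several ways. One is to loop over the lengths, collect the permutations in a list, and then convert the list to an RDD. Note that the loop below starts at length 2, so the single-element tuples are not included.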
>>> rdd1.collect()
['P', 'T', 'K']
>>>
>>> import itertools
>>> l = []
>>> for i in range(2,rdd1.count()+1):
...     x = list(itertools.permutations(rdd1.toLocalIterator(),i))
...     l = l+x
...
>>> rdd2 = sc.parallelize(l)
>>>
>>> rdd2.collect()
[('P', 'T'), ('P', 'K'), ('T', 'P'), ('T', 'K'), ('K', 'P'), ('K', 'T'), ('P', 'T', 'K'), ('P', 'K', 'T'), ('T', 'P', 'K'), ('T', 'K', 'P'), ('K', 'P', 'T'), ('K', 'T', 'P')]
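For checking the expected result locally, the same permutations can be built in plain Python without Spark. This is a sketch that assumes the data fits in memory on the driver; all_permutations is a made-up name, and it starts at length 1 so that the single-element tuples from the question are included:

```python
from itertools import chain, permutations

def all_permutations(values):
    """Return permutations of every length from 1 to len(values).

    Hypothetical helper for illustration; elements are never repeated
    within a tuple because permutations() draws without replacement.
    """
    values = list(values)
    lengths = range(1, len(values) + 1)
    return list(chain.from_iterable(permutations(values, k) for k in lengths))

perms = all_permutations(['P', 'T', 'K'])
print(len(perms))  # 3 + 6 + 6 = 15
print(perms[:3])
```

The resulting list could then be handed to sc.parallelize, as in the session above. The number of permutations grows factorially with the input size, so this only suits small inputs.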

Type Conversion in Python using ord()

s = '4'
c = ord(s)
print ("After converting character to integer : ",end="")
print (c)
output: After converting character to integer : 52
I don't understand the value of output. Can someone please explain why 52 is printed?
If you want to convert a string to an integer, you need to use the int() function. The ord() function returns an integer representing the Unicode code point of the given character, which is why ord('4') gives 52 rather than 4.
Example
s = '4'
c = int(s)
print ("After converting character to integer : ",end="")
print (c)
Python’s built-in function chr() converts an integer to a character, while ord() does the reverse, i.e., converts a character to an integer.
For example :
print([(chr(i),i) for i in range(49, 123)])
output is :
[('1', 49), ('2', 50), ('3', 51), ('4', 52), ('5', 53), ('6', 54), ('7', 55), ('8', 56), ('9', 57), (':', 58), (';', 59), ('<', 60), ('=', 61), ('>', 62), ('?', 63), ('@', 64), ('A', 65), ('B', 66), ('C', 67), ('D', 68), ('E', 69), ('F', 70), ('G', 71), ('H', 72), ('I', 73), ('J', 74), ('K', 75), ('L', 76), ('M', 77), ('N', 78), ('O', 79), ('P', 80), ('Q', 81), ('R', 82), ('S', 83), ('T', 84), ('U', 85), ('V', 86), ('W', 87), ('X', 88), ('Y', 89), ('Z', 90), ('[', 91), ('\\', 92), (']', 93), ('^', 94), ('_', 95), ('`', 96), ('a', 97), ('b', 98), ('c', 99), ('d', 100), ('e', 101), ('f', 102), ('g', 103), ('h', 104), ('i', 105), ('j', 106), ('k', 107), ('l', 108), ('m', 109), ('n', 110), ('o', 111), ('p', 112), ('q', 113), ('r', 114), ('s', 115), ('t', 116), ('u', 117), ('v', 118), ('w', 119), ('x', 120), ('y', 121), ('z', 122)]
The ord() function takes a string argument of a single Unicode character and returns its integer Unicode code point value. It does the reverse of chr().
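A small sketch putting the three built-ins side by side for the character from the question. The last line relies on the digits '0' to '9' having consecutive code points, starting at 48:

```python
s = '4'

code_point = ord(s)              # Unicode code point of the character '4'
value = int(s)                   # numeric value of the digit string
round_trip = chr(code_point)     # chr() undoes ord()
offset = ord(s) - ord('0')       # distance from '0', which equals the digit value

print(code_point)   # 52
print(value)        # 4
print(round_trip)   # 4 (the string '4')
print(offset)       # 4
```

For converting digit strings to numbers, int() is the usual choice; ord() is mainly useful when the code point itself is what you need.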
