Combine all columns in a df with pandas (itertools)

Hello, I have a df such as:
COL1 COL2 COL3 COL4
A B C D
How can I get a new df with all combinations between columns, like this?
COL1_COL2 COL1_COL3 COL1_COL4 COL2_COL3 COL2_COL4 COL3_COL4
['A','B'] ['A','C'] ['A','D'] ['B','C'] ['B','D'] ['C','D']
I guess we could use itertools?

Indeed, itertools is useful here:
import pandas as pd
from itertools import combinations

# Each column of the original DataFrame as a pd.Series
columns = [df[c] for c in df.columns]
# One single-column DataFrame per pair of columns
column_pairs = [
    pd.DataFrame(
        columns=[pair[0].name + '_' + pair[1].name],
        data=pd.concat([pair[0], pair[1]], axis=1).apply(list, axis=1),
    )
    for pair in combinations(columns, 2)
]
pd.concat(column_pairs, axis=1)
produces
COL1_COL2 COL1_COL3 COL1_COL4 COL2_COL3 COL2_COL4 COL3_COL4
-- ----------- ----------- ----------- ----------- ----------- -----------
0 ['A', 'B'] ['A', 'C'] ['A', 'D'] ['B', 'C'] ['B', 'D'] ['C', 'D']
1 ['a', 'b'] ['a', 'c'] ['a', 'd'] ['b', 'c'] ['b', 'd'] ['c', 'd']
(I added another row to the original df with a, b, c, d, to make sure it works in this slightly more general case)
The code is fairly straightforward. columns is a list of the original dataframe's columns, each as a pd.Series. combinations(columns, 2) enumerates all pairs of those. For each pair, pd.DataFrame(columns=[pair[0].name + '_' + pair[1].name], data=pd.concat([pair[0], pair[1]], axis=1).apply(list, axis=1)) combines the first and second column of the tuple into a single-column df with the combined name and values. Finally, pd.concat(column_pairs, axis=1) joins them all together.
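The pairing logic can also be looked at in isolation, without pandas. The sketch below is not part of the original answer: it uses a plain dict of lists (a hypothetical stand-in named table) in place of the DataFrame, and shows which column pairs combinations produces and in what order.

```python
from itertools import combinations

# Hypothetical stand-in for the DataFrame: column name -> list of values
table = {
    'COL1': ['A', 'a'],
    'COL2': ['B', 'b'],
    'COL3': ['C', 'c'],
    'COL4': ['D', 'd'],
}

# combinations(..., 2) yields each unordered pair of column names once,
# in the order the names appear in the dict
paired = {
    left + '_' + right: [list(values) for values in zip(table[left], table[right])]
    for left, right in combinations(table, 2)
}

for name, values in paired.items():
    print(name, values)
```

Four columns give six pairs, which matches the six output columns shown above.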

Related

Set itertools product maximum repeat value per element

I want to generate different combinations of 3 elements a, b, and c. The length of these combinations needs to be 4. I want at most 4 occurrences of 'a' and at most 1 occurrence each of 'b' and 'c'. So, for example, we can have ['a','a','a','a'] or ['a','a','b','c'] but not ['a','b','b','b'].
There is a similar question in 1, but, as far as I know, with the last 'gen' function the length of a generated tuple is controlled by the product of the maximum numbers of repetitions (4 in my case). Also, the cases were limited to tuples with exactly 1 'b' and 1 'c', with the rest being 'a'. To address the latter, I replaced 'combinations' with 'combinations_with_replacement', but it still only produces tuples containing both 'b' and 'c', and there is no ['a','a','a','a'].
How can I tackle this problem?
Here is the code:
from itertools import combinations_with_replacement

def gen(ns, elems=None, C=None, out=None):
    if elems is None:
        elems = list(range(len(ns)))
    else:
        assert len(elems) == len(ns)
    if out is None:
        N = 1
        for n in ns:
            N *= n
        out = [elems[0]]*N
        C = range(N)
    if len(ns) == 1:
        yield out
    else:
        n = ns[-1]
        e = elems[-1]
        for c in combinations_with_replacement(C,n):
            out_ = out.copy()
            for i in c:
                out_[i] = e
            C_ = [i for i in C if i not in c]
            yield from gen(ns[:-1], elems[:-1], C_, out_)

for tmp_list in gen([4,1,1], elems=['a', 'b', 'c'], C=None, out=None):
    print(tmp_list)
output:
['c', 'b', 'a', 'a']
['c', 'a', 'b', 'a']
['c', 'a', 'a', 'b']
['b', 'c', 'a', 'a']
['a', 'c', 'b', 'a']
['a', 'c', 'a', 'b']
['b', 'a', 'c', 'a']
['a', 'b', 'c', 'a']
['a', 'a', 'c', 'b']
['b', 'a', 'a', 'c']
['a', 'b', 'a', 'c']
['a', 'a', 'b', 'c']
Please note that since I care about the execution time, I want to generate the tuples in a loop.
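One possible approach, offered only as a sketch and not taken from the thread, is a backtracking generator that tracks how many uses of each element remain. The function name limited_product and its parameters are hypothetical. It yields tuples lazily, so they can be consumed in a loop.

```python
def limited_product(max_counts, length):
    """Yield every tuple of the given length in which each element
    appears at most max_counts[element] times."""
    remaining = dict(max_counts)  # uses still available per element
    current = []                  # partial tuple being built

    def backtrack():
        if len(current) == length:
            yield tuple(current)
            return
        for elem in remaining:
            if remaining[elem] > 0:
                # Use one copy of elem, recurse, then undo the choice
                remaining[elem] -= 1
                current.append(elem)
                yield from backtrack()
                current.pop()
                remaining[elem] += 1

    yield from backtrack()

results = list(limited_product({'a': 4, 'b': 1, 'c': 1}, 4))
for tmp_tuple in results:
    print(tmp_tuple)
```

With these limits the generator produces 21 tuples: 1 with only 'a', 4 with a single 'b', 4 with a single 'c', and 12 containing both 'b' and 'c'.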

How do I remove all elements that occur more than once in a list? [duplicate]

Given a list of strings I want to remove the duplicates and original word.
For example:
lst = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']
The output should have the duplicates removed,
so something like this ['a', 'b', 'd']
I do not need to preserve the order.

Use a collections.Counter() object, then keep only those values with a count of 1:
from collections import Counter
[k for k, v in Counter(lst).items() if v == 1]
This is an O(N) algorithm; you just need to loop through the list of N items once, then a second loop over fewer items (< N) to extract those values that appear just once.
If order is important and you are using Python < 3.6, separate the steps:
counts = Counter(lst)
[k for k in lst if counts[k] == 1]
Demo:
>>> from collections import Counter
>>> lst = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']
>>> [k for k, v in Counter(lst).items() if v == 1]
['a', 'b', 'd']
>>> counts = Counter(lst)
>>> [k for k in lst if counts[k] == 1]
['a', 'b', 'd']
That the order is the same for both approaches is a coincidence; for Python versions before Python 3.6, other inputs may result in a different order.
In Python 3.6 the implementation for dictionaries changed and input order is now retained.

t = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']
print([a for a in t if t.count(a) == 1])

lst = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']
from collections import Counter
c = Counter(lst)
print([k for k,v in c.items() if v == 1 ])
collections.Counter will count the occurrences of each element, we keep the elements whose count/value is == 1 with if v == 1

@Padraic:
If your list is:
lst = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']
then
list(set(lst))
would return the following:
['a', 'c', 'b', 'e', 'd']
which is not the thing adhankar wants..
Filtering all duplicates completely can be easily done with a list comprehension:
[item for item in lst if lst.count(item) == 1]
The output of this would be:
['a', 'b', 'd']
item stands for every item in the list lst, but it is only appended to the new list if lst.count(item) equals 1, which ensures that the item exists only once in the original list lst.
Look up List Comprehension for more information: Python list comprehension documentation

You could make a secondary empty list and only append items that aren't already in it.
oldList = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']
newList = []
for item in oldList:
    if item not in newList:
        newList.append(item)
print(newList)
I don't have an interpreter with me, but the logic seems sound.
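Running that loop suggests it de-duplicates the list (one copy of every value is kept) rather than dropping values that occur more than once. The sketch below puts it side by side with a count-based filter; the variable names are illustrative only.

```python
from collections import Counter

old_list = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']

# The loop above keeps the first copy of every value (plain de-duplication)
deduplicated = []
for item in old_list:
    if item not in deduplicated:
        deduplicated.append(item)

# Keeping only values that occur exactly once needs the counts first
counts = Counter(old_list)
occurring_once = [item for item in old_list if counts[item] == 1]

print(deduplicated)    # ['a', 'b', 'c', 'd', 'e']
print(occurring_once)  # ['a', 'b', 'd']
```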

How to use the difference between two lists to transform the first into the second? Python

I have a large number of identical lists 'old' which I want to transform in the same way into a list 'new'. The way I want to do it, is to make an example of the desired list 'new'. Then I turn the difference between the two lists 'old' and 'new' into a rule, and then use that rule to turn my other lists 'old_2' into 'new_2'.
I cannot figure out how to do the first step and the second step does not give me the expected result. Is there an elegant way to do this?
import numpy
# 0 1 2 3 4 5
old_1 = ['A', 'B', 'C', 'D', 'E', 'F']
new = ['B', 'C', 'D', 'E', 'A']
# 01 Get the difference new - /- old_1 based on index positions of
# the list elements, to get something like this:
order = [1,2,3,4,0]
# 02 Then use this order to transform a second identical list, old_2.
# For this I wanted to use the following:
old_2 = ['A', 'B', 'C', 'D', 'E', 'F']
old_2 = numpy.array(old_2)
order = numpy.array(order)
inds = order.argsort()
print('inds =', inds) # As a check, this gives the wrong order: [4 1 0 2 3]
new_2 = old_2[inds]
# I expected this to result in what I want, which is:
print(new_2)
# ['C', 'B', 'D', 'E', 'A']
# But what I get in reality is this:
# inds = [4 1 0 2 3]
# ['E' 'B' 'A' 'C' 'D']
Any suggestions to get the desired result?
new_2 = ['B', 'C', 'D', 'E', 'A']

From what I understand, I tried to edit your code. Hopefully it helps.
import numpy as np

def get_order(new, old):
    order = []
    for element in new:
        order.append(old.index(element))
    return order

def main():
    old_1 = ['A', 'B', 'C', 'D', 'E', 'F']
    new = ['B', 'C', 'D', 'E', 'A']
    order = get_order(new, old_1)
    print(order)
    old_2 = ['A', 'B', 'C', 'D', 'E', 'F']
    old_2 = np.array(old_2)
    order = np.array(order)
    #inds = order.argsort()
    #print('inds =', inds) # As a check, this gives the wrong order: [4 1 0 2 3]
    new_2 = old_2[order]
    print(new_2)

if __name__ == '__main__':
    main()
Output
[1, 2, 3, 4, 0]
['B' 'C' 'D' 'E' 'A']
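Since old.index scans the list on every call, a dict lookup may be preferable when the lists are long. The following is a minimal sketch without numpy, assuming the values in old are unique; build_order and apply_order are hypothetical helper names, not part of the answer above.

```python
def build_order(new, old):
    # Map each value in old to its position (assumes values in old are unique)
    position = {value: index for index, value in enumerate(old)}
    return [position[value] for value in new]

def apply_order(order, items):
    # Pick the items at the recorded positions, in that order
    return [items[index] for index in order]

old_1 = ['A', 'B', 'C', 'D', 'E', 'F']
new = ['B', 'C', 'D', 'E', 'A']
order = build_order(new, old_1)

old_2 = ['A', 'B', 'C', 'D', 'E', 'F']
new_2 = apply_order(order, old_2)

print(order)  # [1, 2, 3, 4, 0]
print(new_2)  # ['B', 'C', 'D', 'E', 'A']
```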

How to combine rows in pandas

I have a dataset like this
df = pd.DataFrame({'a' : ['a', 'b' , 'b', 'a'], 'b': ['a', 'b' , 'b', 'a'] })
And I want to combine the first two rows and get a dataset like this
df = pd.DataFrame({'a' : ['a b' , 'b', 'a'], 'b': ['a b' , 'b', 'a'] })
There are no rules other than combining the first two rows. I do not know how to combine rows, so I 'created' a method that combines them using transpose(), as below
db = df.transpose()
db["new"] = db[0].map(str) +' '+ db[1]
db.drop([0, 1], axis=1, inplace=True) # remove these two columns
cols = db.columns.tolist() # re order
cols = cols[-1:] + cols[:-1]
db = db[cols]
df = db.transpose() # reverse operation
df.reset_index()
It works but i think there is an easier way

You can simply add the two rows
df.loc[0] = df.loc[0]+ df.loc[1]
df.drop(1, inplace = True)
You get
a b
0 ab ab
2 b b
3 a a
A bit more fancy looking :)
df.loc[0]= df[:2].apply(lambda x: ''.join(x))
df.drop(1, inplace = True)

Real combination in Groovy

Is there method or some smart way that's easy to read to make a combination of elements in Groovy? I'm aware of Iterable#combinations or GroovyCollections#combinations but it makes Partial permutation with repetition as I understand it so far. See example.
// Groovy combinations result
def e = ['a', 'b', 'c']
def result = [e, e].combinations()
assert [['a', 'a'], ['b', 'a'], ['c', 'a'], ['a', 'b'], ['b', 'b'], ['c', 'b'], ['a','c'], ['b', 'c'], ['c', 'c']] == result
// What I'm looking for
def e = ['a', 'b', 'c']
def result = ???
assert [['a', 'b'], ['a', 'c'], ['b', 'c']] == result
Feel free to post alternate solutions. I'm still looking for better readability (it's used in script for non-developers) and performance (w/o unnecessary iterations).

I'm not so sure about the readability, but this should do the trick.
def e = ['a', 'b', 'c']
def result = [e, e].combinations().findAll { a, b ->
a < b
}
assert [['a', 'b'], ['a', 'c'], ['b', 'c']] == result
Note that if an element occurs twice in the list, its combinations will also occur twice. Add a '.unique()' at the end if they are unwanted.

Here's a more generalized approach that allows you to specify the "r" value for your nCr combinations. It does this by storing permutations in Sets, with the Sets providing the uniqueness:
// returns combinations of the input list of the provided size, r
List combinationsOf(List list, int r) {
    assert (0..<list.size()).contains(r) // validate input
    def combs = [] as Set
    list.eachPermutation {
        combs << it.subList(0, r).sort { a, b -> a <=> b }
    }
    combs as List
}
// the test scenario...
def e = ['a', 'b', 'c']
def result = combinationsOf(e, 2)
assert [['a', 'b'], ['a', 'c'], ['b', 'c']] == result
