Apparently empty groups generated with itertools.groupby - python-3.x

I'm having some trouble with groupby from itertools:
from itertools import groupby
for k, grp in groupby("aahfffddssssnnb"):
    print(k, list(grp), list(grp))
The output is:
a ['a', 'a'] []
h ['h'] []
f ['f', 'f', 'f'] []
d ['d', 'd'] []
s ['s', 's', 's', 's'] []
n ['n', 'n'] []
b ['b'] []
It works as expected.
itertools._grouper objects seem to be readable only once (maybe they are iterators?),
but:
li = [grp for k, grp in groupby("aahfffddssssnnb")]
list(li[0])
[]
list(li[1])
[]
The groups seem to be empty, and I don't understand why.
This one works:
["".join(grp) for k, grp in groupby("aahfffddssssnnb")]
['aa', 'h', 'fff', 'dd', 'ssss', 'nn', 'b']
I am using Python 3.9.9.
I already asked this question on the comp.lang.python newsgroup without getting any answers.

grp is a sub-iterator over the same underlying iterator that was given to groupby. A new one is created for every key.
When you advance to the next key, the old grp is no longer usable, because the main iterator has already moved past that group.
This is stated clearly in the Python documentation:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list:
groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
    groups.append(list(g))  # Store group iterator as a list
    uniquekeys.append(k)
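To make the difference concrete, here is a minimal sketch using the string from the question. The names groups and stale are chosen only for this illustration: the first comprehension copies each group into a list while it is still the current one, the second stores the bare group iterators and reads them afterwards.

```python
from itertools import groupby

data = "aahfffddssssnnb"

# Copy each group into a list while groupby is still positioned on it.
groups = [(key, list(grp)) for key, grp in groupby(data)]
print(groups[:3])  # [('a', ['a', 'a']), ('h', ['h']), ('f', ['f', 'f', 'f'])]

# Store the bare group iterators instead; by the time they are read,
# groupby has already moved past them.
stale = [grp for key, grp in groupby(data)]
print(list(stale[0]))  # []
```

This is also why the "".join(grp) version in the question works: each group is consumed inside the comprehension, before groupby advances.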

Related

Find cartesian product of the elements in a program generated dynamic "sub-list"

I have a program which produces and modifies a list of "n" elements, with n remaining constant throughout a particular run of the program. (The value of "n" might change in the next run.)
Each member of the list is a "sub-list". These sub-lists not only have variable lengths, but are also dynamic and might keep changing while the program runs.
So, at some given point, my list would look something like this (assuming n=3):
[['1', '2'], ['a', 'b', 'c', 'd'], ['x', 'y', 'z']]
I want the output to be like the following:
['1ax', '1ay', '1az', '1bx', '1by', '1bz',
'1cx', '1cy', '1cz', '1dx', '1dy', '1dz',
'2ax', '2ay', '2az', '2bx', '2by', '2bz',
'2cx', '2cy', '2cz', '2dx', '2dy', '2dz']
i.e. a list with exactly 2 * 4 * 3 = 24 elements, where each element has length exactly 3 and contains exactly one member from each of the "sub-lists".
The easiest way is itertools.product:
from itertools import product
lst = [['1', '2'], ['a', 'b', 'c', 'd'], ['x', 'y', 'z']]
output = [''.join(p) for p in product(*lst)]
# OR
output = list(map(''.join, product(*lst)))
# ['1ax', '1ay', '1az', '1bx', '1by', '1bz',
# '1cx', '1cy', '1cz', '1dx', '1dy', '1dz',
# '2ax', '2ay', '2az', '2bx', '2by', '2bz',
# '2cx', '2cy', '2cz', '2dx', '2dy', '2dz']
A manual implementation specific to strings could look like this:
def prod(*pools):
    if pools:
        *rest, pool = pools
        for p in prod(*rest):
            for el in pool:
                yield p + el
    else:
        yield ""
list(prod(*lst))
# ['1ax', '1ay', '1az', '1bx', '1by', '1bz',
# '1cx', '1cy', '1cz', '1dx', '1dy', '1dz',
# '2ax', '2ay', '2az', '2bx', '2by', '2bz',
# '2cx', '2cy', '2cz', '2dx', '2dy', '2dz']
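Since the sub-lists in the question keep changing, note that itertools.product reads its inputs when it is called, so a result is only a snapshot of the sub-lists at that moment. A minimal sketch, assuming it is acceptable to recompute the product whenever the current state is needed (current_products is a hypothetical helper name, not part of itertools):

```python
from itertools import product

def current_products(sublists):
    """Join one element from each sub-list, using their contents right now."""
    return ["".join(p) for p in product(*sublists)]

lst = [['1', '2'], ['a', 'b', 'c', 'd'], ['x', 'y', 'z']]
before = current_products(lst)

lst[2].append('w')  # a sub-list changes while the program runs
after = current_products(lst)

print(len(before), len(after))  # 24 32
```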

Remove redundant sublists within list in python

Hello everyone, I have a list of lists such as:
list_of_values=[['A','B'],['A','B','C'],['D','E'],['A','C'],['I','J','K','L','M'],['J','M']]
and I would like to keep only the sublists with the most values, dropping any sublist whose values all appear in a longer one.
For instance, the values of sublist1 ['A','B'], A and B, are also present in sublist2 ['A','B','C'], so I remove sublist1.
The same goes for sublist4.
Sublist6 is also removed, because J and M are present in the longer sublist5.
In the end I should get:
list_of_no_redundant_values=[['A','B','C'],['D','E'],['I','J','K','L','M']]
Another example:
list_of_values=[['A','B'],['A','B','C'],['B','E'],['A','C'],['I','J','K','L','M'],['J','M']]
Expected output:
[['A','B','C'],['B','E'],['I','J','K','L','M']]
Does anyone have an idea?
mylist=[['A','B'],['A','C'],['A','B','C'],['D','E'],['I','J','K','L','M'],['J','M']]
def remove_subsets(lists):
    outlists = lists[:]
    for s1 in lists:
        for s2 in lists:
            if set(s1).issubset(set(s2)) and (s1 is not s2):
                outlists.remove(s1)
                break
    return outlists
print(remove_subsets(mylist))
This should result in [['A', 'B', 'C'], ['D', 'E'], ['I', 'J', 'K', 'L', 'M']]
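One caveat worth checking with the function above: if the input contains two sublists with the same items, each counts as a subset of the other, so both copies get removed. A possible variant is sketched below, under the assumption that one copy of such duplicates should be kept. The name remove_proper_subsets is made up for this sketch.

```python
def remove_proper_subsets(lists):
    """Drop sublists whose items all appear in a strictly larger sublist."""
    sets = [frozenset(s) for s in lists]
    seen = set()
    kept = []
    for sub, items in zip(lists, sets):
        if items in seen:
            continue  # same items as an earlier sublist: keep only the first
        seen.add(items)
        # '<' on sets tests for a proper (strict) subset
        if not any(items < other for other in sets):
            kept.append(sub)
    return kept

list_of_values = [['A', 'B'], ['A', 'B', 'C'], ['B', 'E'], ['A', 'C'],
                  ['I', 'J', 'K', 'L', 'M'], ['J', 'M']]
print(remove_proper_subsets(list_of_values))
# [['A', 'B', 'C'], ['B', 'E'], ['I', 'J', 'K', 'L', 'M']]
```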

Sliding Window algorithm in python 3.x. Extracting the before and after values of an element from a list

I have two lists, terms and key_terms. I need to extract the elements that come before and after each key term in the terms list. I have tried the code below; it works, but it has a bug.
terms=['b','a','f','s','w','c','g']
key_terms=['a','w','g']
context_terms=[]
for kt in key_terms:
    if(kt!=0):
        before=terms[(terms.index(kt))-1]
        if(terms.index(kt)==len(terms)-1):
            context_terms.append(before)
            break
        else:
            after=terms[(terms.index(kt))+1]
            context_terms.append(before)
            context_terms.append(after)
print(context_terms)
Output: ['b', 'f', 's', 'c', 'c']
The problem with the above is that if a key term appears twice in the terms list, the second occurrence is ignored.
terms=['b','a','f','s','a','c','g']
key_terms=['a','g']
context_terms=[]
for kt in key_terms:
    if(kt!=0):
        before=terms[(terms.index(kt))-1]
        if(terms.index(kt)==len(terms)-1):
            context_terms.append(before)
            break
        else:
            after=terms[(terms.index(kt))+1]
            context_terms.append(before)
            context_terms.append(after)
print(context_terms)
Output: ['b', 'f', 'c']
The correct output should be ['b', 'f', 's', 'c', 'c']
After some research I noticed that I have to use a sliding window. Can someone please help me? I can't understand how to apply a sliding window to my case. Thank you. (P.S. This is my first ever question, sorry if my issue is not clear.)
Try looping over terms instead of key_terms. For every element in terms which is present in key_terms, add the element prior to and next to it.
The pseudo-code would be:
for e in terms:
    if e present in key_terms:
        ans.add(element_to_left_of_e)
        ans.add(element_to_right_of_e)
Rather than looking up indices afterwards, it may be simpler to iterate over the indices directly:
for index in range(0, length of terms):
    if terms[index] present in key_terms:
        ans.add(terms[index-1])
        ans.add(terms[index+1])
Take care at the first and last positions, where there is no left or right neighbour.
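A runnable sketch of the index-based idea, with the first and last positions guarded. The function name context_of is made up for this example, and it collects results in a list (not a set) so that repeated neighbours are kept, matching the expected output in the question.

```python
def context_of(terms, key_terms):
    """Collect the neighbours of every occurrence of a key term."""
    keys = set(key_terms)  # fast membership test
    context = []
    for i, term in enumerate(terms):
        if term in keys:
            if i > 0:                   # there is a left neighbour
                context.append(terms[i - 1])
            if i + 1 < len(terms):      # there is a right neighbour
                context.append(terms[i + 1])
    return context

terms = ['b', 'a', 'f', 's', 'a', 'c', 'g']
key_terms = ['a', 'g']
print(context_of(terms, key_terms))  # ['b', 'f', 's', 'c', 'c']
```

Note that the result follows the order of terms, not the order of key_terms.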
If I understand your problem correctly, the following may help:
terms=['b','a','f','s','a','c','g']
key_terms=['a','g']
context_terms=[]
for k in key_terms:
    indices = [i for i, item in enumerate(terms) if item == k]
    for kt in indices:
        before=terms[kt - 1]
        if kt == len(terms)-1:
            context_terms.append(before)
            break
        else:
            after=terms[kt + 1]
            context_terms.append(before)
            context_terms.append(after)
print(context_terms)
Output: ['b', 'f', 's', 'c', 'c']

Retrieving repeated items in a list comprehension

Given a sorted list, I would like to retrieve the first repeated item in the list using list comprehension.
So I ran the line below:
list=['1', '2', '3', 'a', 'a', 'b', 'c']
print(k for k in list if k==k+1)
I expected the output "a". But instead I got:
<generator object <genexpr> at 0x0021AB30>
I'm pretty new at this, would someone be willing to clarify why this doesn't work?
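You seem to be confusing list elements with list indices. Note also that print(k for k in ...) is given a generator expression rather than a list, which is why a generator object is printed.
For example, a generator expression that yields every item of a list xs that is equal to its predecessor would look like this: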
g = (xs[k] for k in range(1, len(xs)) if xs[k] == xs[k - 1])
Since you are interested only in the first such item, you could write
next(xs[k] for k in range(1, len(xs)) if xs[k] == xs[k - 1])
However, you'll get a StopIteration exception if there is in fact no such item.
As general advice, prefer simple, readable functions over clever long one-liners,
especially when you are new to the language. Your task could be accomplished as follows:
def first_duplicate(xs):
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            return xs[k]
chars = ['1', '2', '3', 'a', 'a', 'b', 'c']
print(first_duplicate(chars)) # 'a'
P.S. Beware of using list as a variable name -- you're shadowing the built-in type.
If you want just the first repeated item in the list you can use the next function with a generator expression that iterates through the list zipped with itself but with an offset of 1 to compare adjacent items:
next(a for a, b in zip(lst, lst[1:]) if a == b)
so that given lst = ['1', '2', '3', 'a', 'a', 'b', 'c'], the above returns: 'a'.
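If the list might contain no adjacent repeats, next accepts a second argument that is returned instead of raising StopIteration. A minimal sketch, where the function name first_adjacent_duplicate and the choice of None as the fallback are assumptions made for this example:

```python
def first_adjacent_duplicate(items, default=None):
    """Return the first item equal to its successor, or default if there is none."""
    pairs = zip(items, items[1:])  # (item, next item) for each adjacent pair
    return next((a for a, b in pairs if a == b), default)

lst = ['1', '2', '3', 'a', 'a', 'b', 'c']
print(first_adjacent_duplicate(lst))              # a
print(first_adjacent_duplicate(['1', '2', '3']))  # None
```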

Any reason not to convert string to list this way?

I'm using PyCharm on Windows (and very new to Python)
I'm a 'what happens when I try this?' person and so I tried:
alist = []
alist += 'wowser'
which returns ['w', 'o', 'w', 's', 'e', 'r']
Is there any reason not to convert a string to a list of individual characters like this? I know I could use For loop method OR I could .append or +concatenate (both seem to be too tedious!!), but I can't find anything that mentions using += to do this. So, since I'm new, I figure I should ask why not to do it this way before I develop a bad habit that will get me into trouble in the future.
Thanks for your help!
I think this would help: Why does += behave unexpectedly on lists?
As for the question "Is there any reason not to convert a string to a list of individual characters like this": I think it depends on your purpose. It is quite convenient if you need to split a string into its letters. If you don't want to split it into letters, just don't use it.
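For comparison, here is a short sketch of the spellings involved. On a list, += extends the list in place with the items of whatever iterable is on the right, which is why a string gets split into characters; the variable names are only illustrative.

```python
alist = []
alist += 'wowser'          # in-place: behaves like alist.extend('wowser')

blist = []
blist.extend('wowser')     # the explicit method call

clist = list('wowser')     # the most direct way to say "list of characters"

try:
    dlist = [] + 'wowser'  # plain + requires a list on both sides
except TypeError as exc:
    error = str(exc)

print(alist == blist == clist)  # True
```

If the goal is simply a list of characters, list('wowser') states that intent most clearly; += is the natural choice when adding to a list that already exists.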
A string is a sequence, so it can be indexed and iterated much like a list.
>>> # This way you would do it with a list:
>>> list('wowser')
['w', 'o', 'w', 's', 'e', 'r']
>>> lst=list('wowser')
>>> a='w'
>>> a is lst[0]
True
>>> # The String Version:
>>> strng = 'wowser'
>>> a is strng[0]
True
>>> # Iterate over the string like doing it with lists:
>>> [print(char) for char in 'wowser']
w
o
w
s
e
r
>>> [print(char) for char in ['w', 'o', 'w', 's', 'e', 'r']]
w
o
w
s
e
r
w3schools.com
docs.python.org
