Find the lengths of all sublists containing a common repeated element - python-3.x

I need to find all the sublists of a list in which the element 'F' is repeated consecutively:
g = ['T','F','F','F','F','T','T','T','F','F','F','T']
In this case the list contains two sublists in which 'F' is repeated:
['F','F','F','F'] at indices 1, 2, 3, 4, which are consecutive, so the answer is 4
and
['F','F','F'] at indices 8, 9, 10, which are again consecutive, so the answer is 3
Note:
The list contains only the two elements 'T' and 'F', and the operation is always performed for the element 'F'.

You can get the lengths of consecutive sequences with itertools.groupby:
from itertools import groupby
data = ['T','F','F','F','F','T','T','T','F','F','F','T']
# Consecutive sequences of "F".
# "groupby(data)" produces an iterator that calculates on-the-fly.
# The iterator returns consecutive keys and groups from the iterable "data".
seqs = [list(g) for k, g in groupby(data) if k == 'F']
print(seqs)
# [['F', 'F', 'F', 'F'], ['F', 'F', 'F']]
seq_lens = [len(k) for k in seqs]
print(seq_lens)
# [4, 3]
You can also get the maximum length of such consecutive sequences:
max_len_seq = len(max(seqs, key=len))
print(max_len_seq)
# 4
See itertools.groupby for more info:
class groupby:
    # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
    # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
    ...
etc
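Since the question also mentions the indices of each run, here is a minimal sketch that reports where each run starts as well as its length. The helper name runs_of and its target parameter are made up for illustration; only itertools.groupby comes from the answer above.

```python
from itertools import groupby

def runs_of(data, target="F"):
    """Return (start_index, length) for each consecutive run of target."""
    runs = []
    position = 0  # index in data where the current group starts
    for key, group in groupby(data):
        length = sum(1 for _ in group)  # count the group without building a list
        if key == target:
            runs.append((position, length))
        position += length
    return runs

data = ['T','F','F','F','F','T','T','T','F','F','F','T']
print(runs_of(data))                                   # [(1, 4), (8, 3)]
print(max((n for _, n in runs_of(data)), default=0))   # 4
```

Passing default=0 to max avoids a ValueError when the list contains no 'F' at all.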

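You can keep a counter for the current run of the repeated letter. Traverse the list and increase the counter each time you find an 'F'. When you find a 'T', the run has ended: if the counter is bigger than 1 there was a repeat, so record its length, then reset the counter. Check once more after the loop in case the list ends with a run.
g = ['T','F','F','F','F','T','T','T','F','F','F','T']
fcount = 0
lengths = []
for e in g:
    if e == "F":
        fcount += 1
    else:
        # A 'T' ends the current run of 'F'; record it if it was a repeat.
        if fcount > 1:
            lengths.append(fcount)
        fcount = 0
# Record a run that reaches the end of the list.
if fcount > 1:
    lengths.append(fcount)
print(lengths)
# [4, 3]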

Related

Efficient way of calculating specific length combinations of adjacent data?

I have a list of elements, of which I'd like to determine all possible combinations that can be arranged - preserving their order - to arrive at 'n' groups
So as an example, if I have an ordered list of A, B, C, D, E, and only want 2 groups, the four solutions would be;
ABCD, E
ABC, DE
AB, CDE
A, BCDE
Now, with some help from another StackOverflow post I've come up with a workable brute-force solution that calculates all possible combinations of all possible groupings from which I simply extract those cases that meet my target number of groupings.
For reasonable numbers of elements, this is just fine, but as I extend the numbers of elements, the number of combinations increases very very quickly, and I was wondering if there might be a clever way to limit the solutions calculated to only those that meet my target groupings number?
Code so far is as follows;
import itertools
import string
import collections
def generate_combination(source, comb):
    res = []
    for x, action in zip(source, comb + (0,)):
        res.append(x)
        if action == 0:
            yield "".join(res)
            res = []
#Create a list of first 20 letters of the alphabet
seq = list(string.ascii_uppercase[0:20])
seq
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T']
#Generate all possible combinations
combinations = [list(generate_combination(seq,c)) for c in itertools.product((0,1), repeat=len(seq)-1)]
len(combinations)
524288
#Create a list that counts the number of groups in each solution,
#and counter to allow easy query
group_counts = [len(i) for i in combinations]
count_dic = collections.Counter(group_counts)
count_dic[1], count_dic[2], count_dic[3], count_dic[4], count_dic[5], count_dic[6]
(1, 19, 171, 969, 3876, 11628)
So as you can see, while over half a million combinations were calculated, if I had only wanted ones of length = 5, only 3,876 need have been calculated
Any suggestions?
A partition of seq into 5 parts is equivalent to a choice of 4 locations in range(1, len(seq)) at which to cut seq.
Thus you could use itertools.combinations(range(1, len(seq)), 4) to generate all the partitions of seq into 5 parts:
import itertools as IT
import string
def partition_into_n(iterable, n, chain=IT.chain, map=map):
    """
    Return a generator of all partitions of iterable into n parts.
    Based on http://code.activestate.com/recipes/576795/ (Raymond Hettinger)
    which generates all partitions.
    """
    s = iterable if hasattr(iterable, '__getitem__') else tuple(iterable)
    size = len(s)
    first, middle, last = [0], range(1, size), [size]
    getitem = s.__getitem__
    return (map(getitem, map(slice, chain(first, div), chain(div, last)))
            for div in IT.combinations(middle, n-1))
seq = list(string.ascii_uppercase[0:20])
ngroups = 5
for partition in partition_into_n(seq, ngroups):
    print(' '.join([''.join(grp) for grp in partition]))
print(len(list(partition_into_n(seq, ngroups))))
yields
A B C D EFGHIJKLMNOPQRST
A B C DE FGHIJKLMNOPQRST
A B C DEF GHIJKLMNOPQRST
A B C DEFG HIJKLMNOPQRST
...
ABCDEFGHIJKLMNO P Q RS T
ABCDEFGHIJKLMNO P QR S T
ABCDEFGHIJKLMNO PQ R S T
ABCDEFGHIJKLMNOP Q R S T
3876
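The same cut-point idea can be sketched more plainly, and the count checked without generating anything: choosing n-1 cut points out of len(seq)-1 gaps gives C(len(seq)-1, n-1) partitions. The function name partitions below is made up for this sketch, and math.comb assumes Python 3.8 or later.

```python
from itertools import combinations
from math import comb

def partitions(seq, n):
    """Yield every way to split seq into n non-empty contiguous parts."""
    size = len(seq)
    for cuts in combinations(range(1, size), n - 1):
        bounds = (0, *cuts, size)  # slice boundaries, including both ends
        yield [seq[a:b] for a, b in zip(bounds, bounds[1:])]

print(list(partitions("ABCDE", 2)))
# [['A', 'BCDE'], ['AB', 'CDE'], ['ABC', 'DE'], ['ABCD', 'E']]

# Number of partitions of 20 elements into 5 groups, computed directly.
print(comb(20 - 1, 5 - 1))  # 3876
```

The count matches count_dic[5] from the brute-force run in the question, so only the partitions that are actually wanted get generated.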

Check if element is occurring very first time in python list

I have a list with values occurring multiple times. I want to loop over the list and check whether each value is occurring for the very first time.
For example, let's say I have a list like
L = ['a','a','a','b','b','b','b','b','e','e','e'.......]
Now, at every first occurrence of an element, I want to perform some set of tasks.
How do I get the first occurrence of an element?
Thanks in advance!
Use a set to check if you had processed that item already:
visited = set()
L = ['a','a','a','b','b','b','b','b','e','e','e'.......]
for e in L:
    if e not in visited:
        visited.add(e)
        # process first time tasks
    else:
        # process not first time tasks
        pass
You can use unique_everseen from the itertools recipes.
This function returns a generator which yields only the first occurrence of each element.
Code
from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Example
lst = ['a', 'a', 'b', 'c', 'b']
for x in unique_everseen(lst):
    print(x)  # Do something with the element
Output
a
b
c
The function unique_everseen also accepts a key function for comparing elements. This is useful in many cases, for example if you also need to know the position of each first occurrence.
Example
lst = ['a', 'a', 'b', 'c', 'b']
for i, x in unique_everseen(enumerate(lst), key=lambda x: x[1]):
    print(i, x)
Output
0 a
2 b
3 c
Why not use this?
L = ['a','a','a','b','b','b','b','b','e','e','e'.......]
for idxL, L_idx in enumerate(L):
    if L.index(L_idx) == idxL:
        print("This is first occurrence")
For very long lists this is less efficient than tracking seen values in a set, because L.index rescans the list from the start on every iteration, but it is more direct to write.
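The set-based test and the index-based loop can be combined, so that each first occurrence is reported with its position in a single pass. This is only a sketch: the name first_occurrences is made up here, and it assumes the list items are hashable.

```python
def first_occurrences(items):
    """Yield (index, item) the first time each item is seen."""
    seen = set()
    for index, item in enumerate(items):
        if item not in seen:
            seen.add(item)
            yield index, item

lst = ['a', 'a', 'b', 'c', 'b']
print(list(first_occurrences(lst)))  # [(0, 'a'), (2, 'b'), (3, 'c')]
```

Each membership test on the set takes constant time on average, so the whole loop stays linear in the length of the list.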

Python: Symmetrical Difference Between List of Sets of Strings

I have a list that contains multiple sets of strings, and I would like to find the symmetric difference between each string and the other strings in the set.
For example, I have the following list:
targets = [{'B', 'C', 'A'}, {'E', 'C', 'D'}, {'F', 'E', 'D'}]
For the above, desired output is:
[2, 0, 1]
because in the first set, A and B are not found in any of the other sets, for the second set, there are no unique elements to the set, and for the third set, F is not found in any of the other sets.
I thought about approaching this backwards; finding the intersection of each set and subtracting the length of the intersection from the length of the list, but set.intersection(*) does not appear to work on strings, so I'm stuck:
set1 = {'A', 'B', 'C'}
set2 = {'C', 'D', 'E'}
set3 = {'D', 'E', 'F'}
targets = [set1, set2, set3]
>>> set.intersection(*targets)
set()
The issue you're having is that there are no strings shared by all three sets, so your intersection comes up empty. That's not a string issue, it would work the same with numbers or anything else you can put in a set.
The only way I see to do a global calculation over all the sets, then use that to find the number of unique values in each one is to first count all the values (using collections.Counter), then for each set, count the number of values that showed up only once in the global count.
from collections import Counter
def unique_count(sets):
    count = Counter()
    for s in sets:
        count.update(s)
    return [sum(count[x] == 1 for x in s) for s in sets]
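To check the counting idea against the example from the question, here is a self-contained variant. Building the Counter with itertools.chain.from_iterable is a substitution made for this sketch, not part of the answer above; the result is the same.

```python
from collections import Counter
from itertools import chain

def unique_count(sets):
    # Count how many sets each value appears in (values are unique within a set).
    count = Counter(chain.from_iterable(sets))
    # For each set, count the values that appear in no other set.
    return [sum(count[x] == 1 for x in s) for s in sets]

targets = [{'B', 'C', 'A'}, {'E', 'C', 'D'}, {'F', 'E', 'D'}]
print(unique_count(targets))  # [2, 0, 1]
```

Because every value is counted once per set that contains it, a count of exactly 1 means the value is unique to that set, however many sets there are.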
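Try something like the code below:
Take the symmetric difference of the given set with every other set, then intersect the result with the given set. Note that a chained symmetric difference keeps the elements that occur in an odd number of sets, so this gives the right answer only when no element occurs in three or more sets.
def symVal(index, targets):
    bseSet = targets[index]
    symSet = bseSet
    for j in range(len(targets)):
        if index != j:
            symSet = symSet ^ targets[j]
    print(len(symSet & bseSet))
for i in range(len(targets)):
    symVal(i, targets)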
Your code example doesn't work because it's finding the intersection of all of the sets, which is empty (since no element occurs in every set). You want to find the difference between each set and the union of all other sets. For example:
set1 = {'A', 'B', 'C'}
set2 = {'C', 'D', 'E'}
set3 = {'D', 'E', 'F'}
targets = [set1, set2, set3]
result = []
for set_element in targets:
    result.append(len(set_element.difference(set.union(*[x for x in targets if x is not set_element]))))
print(result)
(note that [x for x in targets if x is not set_element] is just the list of all other sets)

Which character comes first?

So the input is word and I want to know if a or b comes first.
I can use a_index = word.find('a') and compare this to b_index = word.find('b') and if a is first, a is first is returned. But if b isn't in word, .find() will return -1, so simply comparing b_index < a_index would return b is first. This could be accomplished by adding more if-statements, but is there a cleaner way?
function description:
input: word, [list of characters]
output: the character in the list that appears first in the word
Example: first_instance("butterfly", ['a', 'u', 'e']) returns u
You can create a function that takes word and a list of chars, converts those chars into a set for fast lookup, and, looping over word, returns the first letter found, e.g.:
# Chars can be any iterable whose elements are characters
def first_of(word, chars):
    # Remove duplicates and get O(1) lookup time
    lookup = set(chars)
    # Use optional default argument to next to return `None` if no matches found
    return next((ch for ch in word if ch in lookup), None)
Example:
>>> first_of('bob', 'a')
>>> first_of('bob', 'b')
'b'
>>> first_of('abob', 'ab')
'a'
>>> first_of("butterfly", ['a', 'u', 'e'])
'u'
This way you're only ever iterating over word once and short-circuit on the first letter found instead of running multiple finds, storing the results and then computing the lowest index.
Make a list of (position, char) pairs without the missing chars and then take the pair with the smallest position.
def first_found(word, chars):
    places = [x for x in ((word.find(c), c) for c in chars) if x[0] != -1]
    if not places:
        # no char was found
        return None
    else:
        return min(places)[1]
In any case you need to check the type of the input:
if isinstance(your_input, str):
    a_index = your_input.find('a')
    b_index = your_input.find('b')
    # Compare the a and b indexes (find() returns -1 if the character is missing)
elif isinstance(your_input, list):
    a_index = your_input.index('a')
    b_index = your_input.index('b')
    # Compare the a and b indexes (index() raises ValueError if the element is missing)
else:
    # Do something else
    pass
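EDIT:
def first_instance(word, lst):
    indexes = {}
    for c in lst:
        # Record only characters that occur in word: find() returns -1 for
        # a missing character, and -1 would otherwise win the min() below.
        if c not in indexes and c in word:
            indexes[c] = word.find(c)
    # None if no character from lst occurs in word
    return min(indexes, key=indexes.get, default=None)
It returns the character from the list lst that comes first in word, or None if none of them occurs.
If you need to return the index of this letter instead, replace the return statement with this (it returns -1 if none of the characters occurs):
    return min(indexes.values(), default=-1)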

testing if the values of a dictionary are non zero with all() function

I use Python 3
I want to check if all of my tested values in the nested dictionary are non 0.
So here is the simplified example dict:
d = {'a': {'1990': 10, '1991': 0, '1992': 30},
'b': {'1990': 15, '1991': 40, '1992': 0}}
and I want to test if for both dicts 'a' and 'b' the values of the keys '1990' and '1991' are not zero
for i in d:
    for k in range(2):
        year = 1990
        year = year + k
        if all((d[i][str(year)]) != 0):
            print(d[i])
so it should only return b, because a['1991'] = 0
But this is the first time I have worked with the all() function, and I get the error: TypeError: 'bool' object is not iterable
The error is in the if all() line.
Thank you very much!
This can be done a bit more generally with a list comprehension where you iterate over the items in dict d. A simple comprehension to iterate over the keys and values in our dictionary looks like this:
>>> [k for k, v in d.items()]
['a', 'b']
In the above k will contain the keys and v the values. The comprehension also has an if clause. With that you can filter out the items you don't want. So we define years = ('1990', '1991'). Now we can do another comprehension to test our year values.
To iterate over only 'a', we could do this:
>>> [d['a'][y] for y in years]
[10, 0]
>>> all([d['a'][y] for y in years])
False
Gluing the whole thing together:
>>> d={'a' :{ '1990': 10, '1991':0, '1992':30},'b':{ '1990':15, '1991':40, '1992':0}}
>>> years = ('1990', '1991')
>>> [k for k, v in d.items() if all([v[y] for y in years])]
['b']
See the python docs for more information on list comprehensions.
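As for the TypeError in the question: all() expects an iterable, but d[i][str(year)] != 0 evaluates to a single bool, which cannot be iterated. A minimal sketch of the same test with a generator expression and an explicit comparison follows; the variable name nonzero is made up for illustration.

```python
d = {'a': {'1990': 10, '1991': 0, '1992': 30},
     'b': {'1990': 15, '1991': 40, '1992': 0}}
years = ('1990', '1991')

# all() receives one boolean per year instead of a single bool.
nonzero = [name for name, values in d.items()
           if all(values[y] != 0 for y in years)]
print(nonzero)  # ['b']
```

The generator expression lets all() stop at the first zero it finds, and the explicit != 0 makes the intent clearer than relying on the truthiness of the numbers.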

Resources