Generating all the covering substrings of a string

How do you do the following: given a string, generate all the possible ways to parse that string into substrings (time is important; space is not a concern).
For example, given the string ABCD, I need to generate:
ABCD
A BCD
A BC D
A B CD
AB CD
AB C D
ABC D
A B C D
Probably a recursive solution, but I can't quite get it to work.

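Another solution in Python, without recursion (note that it enumerates the contiguous substrings themselves, not the ways to split the string):
def substrings(s):
    for k in range(1, len(s)+1):          # substring length
        for i in range(len(s)-k+1):       # start position
            yield s[i:i+k]
so that
>>> print(list(substrings("ABCD")))
['A', 'B', 'C', 'D', 'AB', 'BC', 'CD', 'ABC', 'BCD', 'ABCD']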

Python:
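def splitstring(s):
    result = [s]  # the unsplit string is always one option
    for i in range(1, len(s)):
        # take s[:i] as the first piece, then pair it with every split of the rest
        result.extend('%s %s' % (s[:i], x) for x in splitstring(s[i:]))
    return result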

To get a particular split:
First, decide at which indices to split;
Then, split the string at the given indices.
For a string of length n, a given set of split indices is an element of the powerset of {1, ..., n-1}.
In python:
from itertools import combinations, chain

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

def pairs(seq, end):
    "pairs([13,23,33], 55) --> (0,13) (13,23) (23,33) (33,55)"
    return zip(chain((0,), seq), chain(seq, (end,)))

def allsplits(s):
    "allsplits('abc') --> ['abc'] ['a', 'bc'] ['ab', 'c'] ['a', 'b', 'c']"
    for split_indices in powerset(range(1,len(s))):
        yield [s[i:j] for i,j in pairs(split_indices, len(s))]

print(list( allsplits('abcd') ))
# [['abcd'], ['a', 'bcd'], ['ab', 'cd'], ['abc', 'd'], ['a', 'b', 'cd'], ['a', 'bc', 'd'], ['ab', 'c', 'd'], ['a', 'b', 'c', 'd']]
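The powerset idea above can also be written with an integer bitmask instead of `combinations`: each integer from 0 to 2^(n-1) - 1 picks one subset of the n-1 cut positions. This is a sketch, not one of the original answers; `covering_splits` is a hypothetical name, and it yields space-separated strings to match the format in the question.

```python
from typing import Iterator

def covering_splits(s: str) -> Iterator[str]:
    """Yield every way to cut `s` into contiguous pieces, space-separated.

    There are n - 1 possible cut positions; bit i of `mask` set means
    "cut between s[i] and s[i + 1]", so masks 0 .. 2**(n-1) - 1 cover every split.
    """
    n = len(s)
    if n == 0:
        yield ""  # the empty string has exactly one (empty) split
        return
    for mask in range(2 ** (n - 1)):
        pieces = []
        start = 0
        for i in range(n - 1):
            if mask & (1 << i):          # cut after s[i]
                pieces.append(s[start:i + 1])
                start = i + 1
        pieces.append(s[start:])         # the last piece runs to the end
        yield " ".join(pieces)

for split in covering_splits("ABCD"):
    print(split)
# ABCD, A BCD, AB CD, A B CD, ABC D, A BC D, AB C D, A B C D (one per line)
```

Building each split costs O(n), so the total work is O(n * 2^(n-1)), proportional to the size of the output itself.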

Related

Set itertools product maximum repeat value per element

I want to generate different combinations of the 3 elements a, b, and c. Each combination must have length 4, with 'a' appearing at most 4 times and 'b' and 'c' each appearing at most once. So, for example, ['a', 'a', 'a', 'a'] or ['a', 'a', 'b', 'c'] are allowed, but not ['a', 'b', 'b', 'b'].
There is a similar question [1], but as far as I can tell, with the last 'gen' function there, the length of each generated sequence is the product of the maximum repetition counts (4 × 1 × 1 = 4 in my case). Also, the results are limited to tuples with exactly one 'b' and one 'c', the rest being 'a'. For the last issue, I replaced 'combinations' with 'combinations_with_replacement', but it still produces tuples with 4 elements and there is no ['a', 'a', 'a', 'a'].
How can I tackle this problem?
Here is the code:
from itertools import combinations_with_replacement

def gen(ns, elems=None, C=None, out=None):
    if elems is None:
        elems = list(range(len(ns)))
    else:
        assert len(elems) == len(ns)
    if out is None:
        N = 1
        for n in ns:
            N *= n
        out = [elems[0]]*N
        C = range(N)
    if len(ns) == 1:
        yield out
    else:
        n = ns[-1]
        e = elems[-1]
        for c in combinations_with_replacement(C,n):
            out_ = out.copy()
            for i in c:
                out_[i] = e
            C_ = [i for i in C if i not in c]
            yield from gen(ns[:-1], elems[:-1], C_, out_)

for tmp_list in gen([4,1,1], elems=['a', 'b', 'c'], C=None, out=None):
    print(tmp_list)
output:
['c', 'b', 'a', 'a']
['c', 'a', 'b', 'a']
['c', 'a', 'a', 'b']
['b', 'c', 'a', 'a']
['a', 'c', 'b', 'a']
['a', 'c', 'a', 'b']
['b', 'a', 'c', 'a']
['a', 'b', 'c', 'a']
['a', 'a', 'c', 'b']
['b', 'a', 'a', 'c']
['a', 'b', 'a', 'c']
['a', 'a', 'b', 'c']
Please note that since I care about the execution time, I want to generate the tuples in a loop.
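One way to approach this, sketched under the assumption that ordered sequences are wanted (the output above lists both ['c', 'b', 'a', 'a'] and ['b', 'c', 'a', 'a']), is backtracking with a per-element budget: only elements that still have copies left are ever appended, so nothing has to be generated and then filtered out. `capped_sequences` is a hypothetical helper, not part of `itertools`.

```python
from typing import Dict, Iterator, List

def capped_sequences(caps: Dict[str, int], length: int) -> Iterator[List[str]]:
    """Yield every ordered sequence of `length` elements in which each
    element e appears at most caps[e] times (backtracking, no filtering)."""
    remaining = dict(caps)      # how many more copies of each element may be used
    current: List[str] = []     # the sequence being built

    def build() -> Iterator[List[str]]:
        if len(current) == length:
            yield list(current)  # copy, because `current` keeps changing
            return
        for elem in caps:
            if remaining[elem] > 0:
                remaining[elem] -= 1
                current.append(elem)
                yield from build()
                current.pop()     # undo the choice before trying the next one
                remaining[elem] += 1

    yield from build()

results = list(capped_sequences({'a': 4, 'b': 1, 'c': 1}, 4))
print(len(results))  # 21
print(results[0])    # ['a', 'a', 'a', 'a']
```

If order does not matter, `{tuple(sorted(seq)) for seq in results}` reduces these 21 sequences to the 4 multisets aaaa, aaab, aaac and aabc.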

How do I remove all elements that occur more than once in a list? [duplicate]

Given a list of strings, I want to remove every duplicated word together with its original occurrence.
For example:
lst = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']
The output should have the duplicates removed,
so something like this ['a', 'b', 'd']
I do not need to preserve the order.

Use a collections.Counter() object, then keep only those values with a count of 1:
from collections import Counter
[k for k, v in Counter(lst).items() if v == 1]
This is an O(N) algorithm: one pass over the list of N items to count them, then a second pass over the distinct values (at most N) to extract those that appear just once.
If order is important and you are using Python < 3.6, separate the steps:
counts = Counter(lst)
[k for k in lst if counts[k] == 1]
Demo:
>>> from collections import Counter
>>> lst = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']
>>> [k for k, v in Counter(lst).items() if v == 1]
['a', 'b', 'd']
>>> counts = Counter(lst)
>>> [k for k in lst if counts[k] == 1]
['a', 'b', 'd']
That the order is the same for both approaches is a coincidence here; on Python versions before 3.6, other inputs may result in a different order.
In CPython 3.6 the dictionary implementation changed so that insertion order is retained, and from Python 3.7 on this is guaranteed by the language, so Counter keeps the order in which values were first seen.

t = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']
print([a for a in t if t.count(a) == 1])
This is short, but it is O(N^2): t.count(a) scans the whole list once for every element.

lst = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']
from collections import Counter
c = Counter(lst)
print([k for k,v in c.items() if v == 1 ])
collections.Counter counts the occurrences of each element; the condition if v == 1 keeps only the elements whose count is 1.

#Padraic:
If your list is:
lst = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']
then
list(set(lst))
would return something like the following (the order of a set is arbitrary):
['a', 'c', 'b', 'e', 'd']
which is not what adhankar wants.
Filtering all duplicates completely can be easily done with a list comprehension:
[item for item in lst if lst.count(item) == 1]
The output of this would be:
['a', 'b', 'd']
item stands for every item in the list lst, but it is only appended to the new list if lst.count(item) equals 1, which ensures that the item exists only once in the original list lst.
For more information, see the Python list comprehension documentation.

You could make a secondary empty list and only append items that aren't already in it.
oldList = ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'e']
newList = []
for item in oldList:
    if item not in newList:
        newList.append(item)
print(newList)  # ['a', 'b', 'c', 'd', 'e']: keeps one copy of each item
I don't have an interpreter with me, but the logic seems sound.

pyspark Udf is not working as expected when apply map transformation with broadcast?

I have two lists like below:
l=[['A', 'B', 'C'], ['A', 'C'], ['A', 'B', 'C'], ['A', 'B'],['B','C'],['B']]
x=[('A', 'B'), ('A', 'C')]
I want to remove from the list of lists l every element that does not contain all of the elements of at least one tuple in the list x. In other words, for each element of l that is kept, there should be at least one tuple in x all of whose items are present in that element.
Based on my last question, I was given the following solution in python:
print([l_ for l_ in l if any(all(e in l_ for e in x_) for x_ in x)])
which yields the desired output of:
[['A', 'B', 'C'], ['A', 'C'], ['A', 'B', 'C'], ['A', 'B']]
Now I am trying to replicate the same operation with a pyspark rdd, but I am not getting the expected result.
This is what I tried:
rddsort=sc.parallelize(l)
broadcastVar = sc.broadcast(x)
def flist(unique_product_List,x):
    filter_list = [
        l_ for l_ in unique_product_List
        if any(all(e in l_ for e in x_) for x_ in x)
    ]
    return filter_list
rddsort=rddsort.map(lambda x: flist(x[0],broadcastVar.value))
print(rddsort.collect())
I am getting a list of empty lists as the result:
[[], [], [], [], [], []]
But my expected result should be the same as above.

You need a filter on the RDD (not a map). A filter checks a condition on each row and removes the rows that don't match. Here the condition is that the row value (the list l_) contains all elements of at least one of the tuples in x.
l=[['A', 'B', 'C'], ['A', 'C'], ['A', 'B', 'C'], ['A', 'B'],['B','C'],['B']]
x=[('A', 'B'), ('A', 'C')]
rddsort=sc.parallelize(l)
rddsort=rddsort.filter(lambda l_: any(all(e in l_ for e in x_) for x_ in x))
print(rddsort.collect())
Output
[['A', 'B', 'C'], ['A', 'C'], ['A', 'B', 'C'], ['A', 'B']]
Update:
With broadcast variable in a function:
l=[['A', 'B', 'C'], ['A', 'C'], ['A', 'B', 'C'], ['A', 'B'],['B','C'],['B']]
x=[('A', 'B'), ('A', 'C')]
rddsort=sc.parallelize(l)
broadcastVar = sc.broadcast(x)
def flist(row):
    # keep the row if it contains every element of at least one broadcast tuple
    filter_flag = any(all(e in row for e in x_) for x_ in broadcastVar.value)
    return filter_flag
rddsort=rddsort.filter(flist)
print(rddsort.collect())
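The same predicate can also be written with set inclusion. A sketch, assuming rows and tuples hold hashable values: converting each row to a set once makes every membership test O(1) on average, which may matter if rows or tuples are long. `contains_any_group` is a hypothetical name; it is checked here on a plain Python list, and the Spark call in the comment assumes the `sc`/`broadcastVar` setup from the answer above.

```python
from typing import Iterable, Sequence, Tuple

def contains_any_group(row: Iterable[str], groups: Sequence[Tuple[str, ...]]) -> bool:
    """True if `row` contains every element of at least one tuple in `groups`."""
    row_set = set(row)  # build once per row; membership tests become O(1) on average
    return any(row_set.issuperset(group) for group in groups)

l = [['A', 'B', 'C'], ['A', 'C'], ['A', 'B', 'C'], ['A', 'B'], ['B', 'C'], ['B']]
x = [('A', 'B'), ('A', 'C')]

# Plain-Python check; with Spark the same function could be passed as
# rddsort.filter(lambda row: contains_any_group(row, broadcastVar.value))
print([row for row in l if contains_any_group(row, x)])
```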

how to return two list in two variables in python 3

I need to do this:
from collections import deque
def list3_to2(list1, list2, list3):
    Left = []
    Right = []
    q = deque()
    for a, b, c in list1, list2, list3:
        q.append(a)
        q.append(b)
        q.append(c)
    tmp = 1
    while q:
        if tmp % 2 == 0:
            Left.append(q.popleft())
        else:
            Right.append(q.popleft())
        tmp += 1
    return Left, Right
a = ['a', 'b', 'c']
b = ['d', 'e', 'f']
c = ['g', 'h', 'i']
l, r = list3_to2(a, b, c)
print(l)
print(r)
But instead of two lists in the result, I got four lists.
Output:
['b', 'd', 'f', 'h']
['a', 'c', 'e', 'g', 'i']
['b', 'd', 'f', 'h']
['a', 'c', 'e', 'g', 'i']
What am I doing wrong?
Basically, I need to transform 3 lists into 2 lists using a deque, keeping the correct order.

Thanks to everyone, I got it. My function just returns a tuple; that's why I got both lists in the variables l and r. I just need to type l = list3_to2(a, b, c)[0] and r = list3_to2(a, b, c)[1].
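Since `return Left, Right` returns a 2-tuple, `l, r = list3_to2(a, b, c)` already unpacks both lists with one call, whereas indexing with `[0]` and `[1]` runs the function twice. Below is a sketch that keeps the same dealing order but fills the deque with `extend`, so it does not rely on each input having exactly three items, as the original loop `for a, b, c in list1, list2, list3` does. It is an illustration, not the asker's final code.

```python
from collections import deque
from typing import List, Tuple

def list3_to2(list1: List[str], list2: List[str], list3: List[str]) -> Tuple[List[str], List[str]]:
    """Queue all items of the three lists in order, then deal them alternately:
    1st, 3rd, 5th, ... items go right; 2nd, 4th, ... items go left."""
    q = deque()
    for lst in (list1, list2, list3):
        q.extend(lst)  # works for lists of any length
    left: List[str] = []
    right: List[str] = []
    position = 1
    while q:
        item = q.popleft()
        if position % 2 == 0:
            left.append(item)
        else:
            right.append(item)
        position += 1
    return left, right  # a 2-tuple, unpacked by the caller

l, r = list3_to2(['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'])
print(l)  # ['b', 'd', 'f', 'h']
print(r)  # ['a', 'c', 'e', 'g', 'i']
```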

Real combination in Groovy

Is there a method or some smart, easy-to-read way to make combinations of elements in Groovy? I'm aware of Iterable#combinations and GroovyCollections#combinations, but as far as I understand, they produce partial permutations with repetition. See the example.
// Groovy combinations result
def e = ['a', 'b', 'c']
def result = [e, e].combinations()
assert [['a', 'a'], ['b', 'a'], ['c', 'a'], ['a', 'b'], ['b', 'b'], ['c', 'b'], ['a','c'], ['b', 'c'], ['c', 'c']] == result
// What I'm looking for
def e = ['a', 'b', 'c']
def result = ???
assert [['a', 'b'], ['a', 'c'], ['b', 'c']] == result
Feel free to post alternative solutions. I'm still looking for better readability (it's used in a script for non-developers) and performance (without unnecessary iterations).

I'm not so sure about the readability, but this should do the trick.
def e = ['a', 'b', 'c']
def result = [e, e].combinations().findAll { a, b ->
    a < b
}
assert [['a', 'b'], ['a', 'c'], ['b', 'c']] == result
Note that if an element occurs twice in the list, its combinations will also occur twice. Add .unique() at the end if they are unwanted.

Here's a more generalized approach that lets you specify the "r" value for your nCr combinations. It takes the first r elements of every permutation, sorts them, and stores them in a Set, which provides the uniqueness:
// returns combinations of the input list of the provided size, r
List combinationsOf(List list, int r) {
    assert (0..<list.size()).contains(r) // validate input
    def combs = [] as Set
    list.eachPermutation {
        combs << it.subList(0, r).sort { a, b -> a <=> b }
    }
    combs as List
}
// the test scenario...
def e = ['a', 'b', 'c']
def result = combinationsOf(e, 2)
assert [['a', 'b'], ['a', 'c'], ['b', 'c']] == result

Develop Reference
