Split '000111010010111' into ['000','111','0','1','00','1','0','111'] [duplicate] - python-3.x

Is there some concise way to split up a string of 0s and 1s into the homogeneous, contiguous segments of all 0s and all 1s? Example in the title.
I can of course do it with a nested loop, conditionals, and the .count() method, but this seems like something there'd be a library function for. I'm just not sure how to search for it if there is.

Yes, you can, using itertools.groupby:
from itertools import groupby
a = "000111010010111"
result = ["".join(list(group)) for key, group in groupby(a)]
What happened? We used itertools.groupby to group consecutive terms. A new group is created every time the key element changes (which is when a 0 turns into a 1 or a 1 turns into a 0 in your example). The inner lists are then joined to arrive at your desired output.
Output:
['000', '111', '0', '1', '00', '1', '0', '111']
This will work for any string (not just 1s and 0s) and will group items together based on their consecutive appearances.
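As a small illustration of the same grouping idea, the sketch below also keeps the key that groupby yields for each run and records the run length. The helper name runs is made up for this example and is not part of itertools.

```python
from itertools import groupby

def runs(s):
    """Return (character, run_length) pairs for the consecutive runs in s."""
    # groupby yields (key, group_iterator); the group has to be consumed to count it.
    return [(key, len(list(group))) for key, group in groupby(s)]

print(runs("000111010010111"))
```

Joining each group instead of counting it gives back the list of substrings from the answer above.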

This is a quick way of doing it with a generator function. It is easy to read and understand, with no cleverness involved.
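def split_me(s):
    if not s:  # nothing to yield for an empty string
        return
    temp = s[0]  # the run collected so far
    last = s[0]  # the previous character
    for char in s[1:]:
        if char == last:
            temp += char
        else:
            yield temp  # the run ended: emit it and start a new one
            temp = char
        last = char
    yield temp  # emit the final run
print(list(split_me('000111010010111')))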

From the docs for re.split:
If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list
import re
s = '000111010010111'
list(filter(None, re.split('(0+)', s)))
# ['000', '111', '0', '1', '00', '1', '0', '111']
s2 = '111110110110'
list(filter(None, re.split('(0+)', s2)))
# ['11111', '0', '11', '0', '11', '0']
The filter removes the empty strings that re.split produces at the beginning or end of the list.
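A variation on the regex idea that is not limited to 0s and 1s, and needs no filtering, is to match each run directly with a backreference. This is only a sketch; split_runs is a hypothetical helper name.

```python
import re

def split_runs(s):
    """Split s into maximal runs of one repeated character."""
    # (.) captures one character; \1* matches any further repeats of that same character.
    # re.DOTALL lets '.' match newlines too, so no character is skipped.
    return [m.group(0) for m in re.finditer(r"(.)\1*", s, flags=re.DOTALL)]

print(split_runs("000111010010111"))
```

Because every character belongs to exactly one match, there are no empty strings to remove afterwards.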

Related

Fill a dictionary from a list

Dears,
I would appreciate your support in producing the target output below.
The input is a list in which each four-digit number (e.g. '1368', '1568', '1768') should become a key of the output dictionary, and the one-digit number that follows each four-digit number should be added to that key's list of values, as shown below.
Example:
INPUT :
['1368', '1', '1368', '3', '1568', '1', '1568', '3', '1568', '2', '1768', '3', '1768', '2', '2368', '1', '2368', '3', '2368', '2']
OUTPUT
{'1368' :['1', '3'], '1568' :['1', '3','2'], '1768' :['3','2'], '2368' :['1','3','2'] }
/////////////////////////////////////////////////////
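Try this:
output = {}
# "input" is the list from the question; the walrus needs parentheses here
for key in (iterator := iter(input)):
    value = next(iterator)  # the item right after each key is its value
    output.setdefault(key, []).append(value)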
But I apologize - this is a "pythonism", as we in Python tend to avoid dealing with explicit indexes for sequences. There is no harm in the more readable:
output = {}
for i in range(0, len(input), 2):
    key, value = input[i], input[i+1]
    output.setdefault(key, []).append(value)
The "setdefault" method, on the other hand, is a valid "pythonism". It is equivalent to "if this key already exists in the dictionary, return its value; otherwise, set it to the new value (the second argument) and return that". It avoids the need for an "if" statement and an additional line.
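For comparison, here is a sketch of the same grouping using collections.defaultdict and zip to pair each key with the value that follows it. The function name pairs_to_dict is invented for this example, and it assumes the list strictly alternates key, value.

```python
from collections import defaultdict

def pairs_to_dict(items):
    """Group an alternating [key, value, key, value, ...] list by key."""
    grouped = defaultdict(list)
    # items[::2] are the keys; items[1::2] are the values that follow them.
    for key, value in zip(items[::2], items[1::2]):
        grouped[key].append(value)
    return dict(grouped)

data = ['1368', '1', '1368', '3', '1568', '1', '1568', '3', '1568', '2',
        '1768', '3', '1768', '2', '2368', '1', '2368', '3', '2368', '2']
print(pairs_to_dict(data))
```

defaultdict(list) creates the empty list on first access, which plays the same role as setdefault(key, []) above.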

Replace a specific character at a specific position without using loops - Python

I am trying to replace a specific character at a specific position in a string.
This needs to be accomplished without a for loop and without defining functions.
Here is an example - I need to replace the character after every 5 characters with x:
s = "0123456789012345689"
Output needs to be - "01234x67890x23456x89"
I have tried replace() and split(), but maybe I am not using them in the correct context. The string could be n characters long, so I can't hardcode the specific positions and break it down into substrings. Strings are supposed to be immutable in Python, so are there any other alternatives?
Any help would be appreciated. Thanks.
I guess, if you strictly want to avoid for loops, the easiest way is converting to a list, replacing items, and then converting back.
This would be:
s = "0123456789012345689"
lst = list(s) # result: ['0', '1', '2', ..., '9']
lst[0:-1:5] = ["x"]*len(lst[0:-1:5])
Result then is:
In [43]: lst
Out[43]: ['x', '1', '2', '3', '4', 'x', '6', '7', '8', '9', 'x', '1', '2', '3', '4', 'x', '6', '8', '9']
For getting it back to string you would simply use a join:
In [44]: "".join(lst)
Out[44]: 'x1234x6789x1234x689'
The part lst[0:-1:5] selects every 5th element in the list, beginning with the very first entry, denoted by 0:. If you want it to start at index 5 instead, then simply do lst[5:-1:5]. The -1 stop means "up to, but not including, the last element" (leave the stop out, as in lst[0::5], to include the final element as well); the last :5 stands for "every fifth".
Assigning ["x"]*len(lst[0:-1:5]) is needed because an assignment to a slice with a step must supply exactly as many items as the slice selects. Assigning a plain "x" would offer a single item for a slice of length len(lst[0:-1:5]) and raise a ValueError.
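The length requirement can be checked directly. The short sketch below uses a ten-character example string (chosen here for illustration, not taken from the question) and shows the failing and the working assignment.

```python
lst = list("0123456789")

try:
    lst[0:-1:5] = "x"  # offers one item for a slice that selects two
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Supplying exactly as many items as the slice selects works.
selected = lst[0:-1:5]  # ['0', '5']
lst[0:-1:5] = ["x"] * len(selected)
print("".join(lst))
```

The failed assignment leaves the list untouched, so the second assignment starts from the original characters.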
EDIT:
Giving it a second look, the expected outcome is actually to change every 6th character, not every 5th (while preserving the 5 characters in between the changed ones from the original string).
One would then, of course, need to adjust the slicing to select every 6th character:
lst[5:-1:6] = ["x"]*len(lst[5:-1:6])
^ ^
Result:
In [12]: "".join(lst)
Out[12]: '01234x67890x23456x9'
Every sixth character now is being replaced, while preserving the 5 ones in between from the original string.
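If the position should be configurable, the slicing can be wrapped in a small helper. This is a sketch only: the name replace_every_nth is made up, it assumes n is at least 1 and the replacement is a single character, and it leaves the slice stop open so that the last character can be replaced too.

```python
def replace_every_nth(text, n, replacement="x"):
    """Replace every n-th character of text (counting from 1) with replacement."""
    chars = list(text)
    # Start at index n - 1 and step by n; no stop, so the final character is included.
    count = len(chars[n - 1::n])
    chars[n - 1::n] = [replacement] * count
    return "".join(chars)

print(replace_every_nth("01234567890123456789", 6))
```

With a 20-character input this gives the output asked for in the question, and a string shorter than n is returned unchanged.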
My solution to this would be this:
def replace_position(message, replacement, n):
    lis = list(message)  # get a list of all characters.
    lis.pop(n)  # delete the chosen character from the list.
    lis.insert(n, replacement)  # insert the replacement at the same position.
    return ''.join(lis)
Example use:
>>> a = 'Hello world, the 6th character is an x!'
>>> replace_position(a, 'x', 6)
'Hello xorld, the 6th character is an x!'
However, this only replaces the character at that one position. To replace the character at every such interval, this is what I would do:
def replace_position(message, replacement, n):
    lis = list(message)
    for i in range(n, len(message), n + 1):
        lis.pop(i)
        lis.insert(i, replacement)
    return ''.join(lis)
The only difference is that we are now iterating over the entire message, and replacing each of them as we go. Similar use:
>>> a = 'Hello world, every 6th character is an x!'
>>> replace_position(a, 'x', 6)
'Hello xorld, xvery 6xh charxcter ix an x!'
Hope this helps.
Try
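s = "01234567890123456789012345"
pieces = []
# Each window of six characters starts at a multiple of 6. Using len(s) + 1 as
# the stop adds an empty last piece when len(s) is a multiple of 6, so the
# final character is still replaced by x.
for start in range(0, len(s) + 1, 6):
    pieces.append(s[start:start+5])  # keep the first five, drop the sixth
result = 'x'.join(pieces)
assert(result == '01234x67890x23456x89012x45')
We iterate over every window of six characters in the string and collect the first five characters of each window. We then join the pieces together, using the string x as a separator.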
To avoid using loops, you can convert the above into a list comprehension. I'll leave this step for you to complete.
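One possible way to complete that step is sketched below, using a generator expression inside join (a list comprehension would work the same way). The function name replace_sixth is hypothetical, and each window is assumed to start at a multiple of 6.

```python
def replace_sixth(s):
    """Replace every sixth character of s with 'x' by joining five-character pieces."""
    # Windows start at 0, 6, 12, ...; the first five characters of each are kept.
    # The stop is len(s) + 1 so a length that is a multiple of 6 still ends in 'x'.
    return "x".join(s[start:start + 5] for start in range(0, len(s) + 1, 6))

print(replace_sixth("01234567890123456789012345"))
```

Strictly speaking the iteration still happens inside the comprehension, so this avoids a for statement rather than iteration itself.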

How to separate individual characters from elements in a list?

I'm working on an assignment and the problem draws a grid of squares A-J and 1-7. A function exists which randomly generates co-ordinates, e.g.
[['I5'],
['E1', 'F1', 'E2', 'F2'],
['J5', 'J6'],
['G7', 'H7']]
The problem to solve requires a function to read the elements in each list and draw a tile there using Turtle.
How can I separate the letter from the number in each list?
Just for testing, I'm trying to print each co-ordinate (so that I can get a better understanding, the end result actually needs to be goto(x,x) and then call a function I've already defined to draw something):
for instructions in fixed_pattern_16:
    print(instructions[0][1])
Which outputs:
5
1
5
7
But because each list is a different length, I get an out of range error when trying to access elements at a position beyond the length of the shortest list. E.g.:
print(instructions[2][0])
Try regular expressions and some nested list comprehension:
import re
lists = [['I5'],['E1', 'F1', 'E2', 'F2'],['J5', 'J6'],['G7', 'H7']]
### General format to unpack the list of lists
for i in lists: # Go through each list element
    for x in i: # Go through each element of the element
        print(x) # Print that element to the console
### Flattening that gives us our list comprehension,
### which we can use to unpack this list of lists
[print(x) for i in lists for x in i]
### We want to match a single character followed by a single digit and capture the digit
### In short, \w matches one word character (a letter, digit, or underscore) and \d matches one digit
### Check out https://regexr.com/ for more info and an interactive canvas.
letter_number_pat = r'\w(\d)'
### We can use re.sub(<pattern>, <replacement>, <string>) to replace each match with
### the captured digit (\1, since it is the first capture group)
### The flattening list comprehension from above collects the results in a list
number_list = [re.sub(letter_number_pat, r'\1', x) for i in lists for x in i]
Input: number_list
Output: ['5', '1', '1', '2', '2', '5', '6', '7', '7']
You can get the unique values, in sorted order, by passing the list through the built-in set() and sorted() functions (sorted() already returns a list):
Input: sorted(set(number_list))
Output: ['1', '2', '5', '6', '7']
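Since the question asks for both parts of each co-ordinate, here is a minimal sketch that splits a label into its letter and its number with plain indexing and slicing. The helper name split_coordinate is made up, and it assumes every label is one letter followed by digits.

```python
def split_coordinate(coord):
    """Split a label such as 'E1' into ('E', 1)."""
    # The first character is the column letter; the rest is the row number.
    return coord[0], int(coord[1:])

pattern = [['I5'], ['E1', 'F1', 'E2', 'F2'], ['J5', 'J6'], ['G7', 'H7']]

# Nested loops visit every label, whatever the length of each inner list.
for group in pattern:
    for coord in group:
        letter, number = split_coordinate(coord)
        print(letter, number)
```

Looping over the inner lists directly, instead of indexing them with fixed positions, also avoids the out of range error described in the question.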

Remove Tuple if it Contains any Empty String Elements

There have been questions asked that are similar to what I'm after, but not quite, like Python 3: Removing an empty tuple from a list of tuples, but I'm still having trouble reading between the lines, so to speak.
Here is my data structure, a list of tuples containing strings
data
>>[
('1','1','2'),
('','1', '1'),
('2','1', '1'),
('1', '', '1')
]
What I want to do is if there is an empty string element within the tuple, remove the entire tuple from the list.
The closest I got was:
data2 = any(map(lambda x: x is not None, data))
I thought that would give me a list of Trues and Falses to see which ones to drop, but it was just a single bool. Feel free to scrap that approach if there is a better/easier way.
You can use filter. In the question you linked to, None sits in the position where you pass the function to filter results by. In your case:
list(filter(lambda t: '' not in t, data))
t ends up being each tuple in the list - so you filter to only results which do not have '' in them.
You can use a list comprehension as follows:
data = [ ('1','1','2'), ('','1', '1'), ('2','1', '1'), ('1', '', '1') ]
data2 = [t for t in data if '' not in t]
print(data2)
output:
[('1', '1', '2'), ('2', '1', '1')]
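To see why the attempt in the question produced a single bool, it may help to build the per-tuple flags explicitly: any() collapses a whole iterable into one value, whereas a comprehension keeps one flag per tuple. The variable names below are chosen for this sketch.

```python
data = [('1', '1', '2'), ('', '1', '1'), ('2', '1', '1'), ('1', '', '1')]

# One flag per tuple: True where the tuple contains an empty string.
has_empty = ['' in t for t in data]
print(has_empty)

# Keep only the tuples whose flag is False.
kept = [t for t, flag in zip(data, has_empty) if not flag]
print(kept)

# Empty strings are falsy, so all(t) is an equivalent test for tuples of strings.
kept_all = [t for t in data if all(t)]
```

The all(t) form would also drop tuples containing other falsy values such as None or 0, which may or may not be wanted.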

Going over a list to find which if any items repeat more than X times, then returning those items

I have a list:
my_list = ['2', '5', '7', '7', '5']
I need to be able to check if any item repeats X time in the list, and if so - which one(s). For instance, I'd like to check if any (and which) items repeat (2) times in the list above, in which case I would expect:
5, 7 # this can be in the form of a list, strings, or anything else.
What I have tried:
After looking over some previous posts on StackExchange, I first went ahead and used collections.Counter (not sure if this is a good approach), like so:
repetition = collections.Counter(my_list)
What this returns is a dictionary, like so:
{'5': 2, '7': 2, '2': 1}
Now I still need to check which item(s) repeat twice. After some more searching, I ended up with this:
def any(dict):
    repeating = []
    for element in dict.values():
        if element == 2:
            (...)
I'm uncertain, however, how to continue with this code. It seems I can only get the number of repetitions, in this case 2 (i.e. the value from the dictionary), but I am unable to figure out a simple way of getting the keys which have a value of 2.
Is there an easy way to do it? Or should I try a different approach?
Thank you.
You need to loop over the items of the dictionary so you have both the key and the value:
repeating = [key for key, value in repetition.items() if value >= 2]
I used a list comprehension here to do the looping; all keys that have a value of 2 or higher are selected.
Demo:
>>> from collections import Counter
>>> my_list = ['2', '5', '7', '7', '5']
>>> repetition = Counter(my_list)
>>> [key for key, value in repetition.items() if value >= 2]
['5', '7']
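The same comprehension can be wrapped in a function that takes the repeat count as a parameter. This is a sketch with invented names (items_repeating, exact); the exact flag switches between "exactly X times" and "X times or more", since the question can be read either way.

```python
from collections import Counter

def items_repeating(items, times, exact=False):
    """Return the items that occur `times` times (or at least that often)."""
    counts = Counter(items)
    if exact:
        return [item for item, n in counts.items() if n == times]
    return [item for item, n in counts.items() if n >= times]

my_list = ['2', '5', '7', '7', '5']
print(items_repeating(my_list, 2))
```

The result follows the order in which items first appear in the list, because Counter keeps insertion order on Python 3.7 and later.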
