How can I group a list of strings by another list of strings using Python? - python-3.x

I have two lists:
List 1
filenames = ['K853.Z', 'K853.N', 'K853.E', 'K400.Z', 'K400.N', 'K400.E']
List 2
l = ['K853', 'K400']
I want to iterate through the filenames list and group the strings by the l list.
I've tried the below:
for name in filenames:
    for i in l:
        if i in name:
            print(name)
But this just prints the first list. I've seen the Pandas groupby method but I can't figure out how to utilize it in this case.

You could use a nested list comprehension like this:
new = [[name for name in filenames if name.startswith(prefix)] for prefix in l]
This gives you a list of lists: the sublist at each index of new holds the filenames that start with the prefix at the same index of l.
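If it is more convenient to look groups up by prefix than by index, the same idea can be written with a dict. This is only a sketch: the helper name group_by_prefix is made up for illustration, and it assumes each filename matches at most one prefix.

```python
filenames = ['K853.Z', 'K853.N', 'K853.E', 'K400.Z', 'K400.N', 'K400.E']
l = ['K853', 'K400']


def group_by_prefix(names, prefixes):
    """Return {prefix: [names starting with that prefix]} (hypothetical helper)."""
    groups = {prefix: [] for prefix in prefixes}
    for name in names:
        for prefix in prefixes:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break  # assumption: a name belongs to one prefix only
    return groups


grouped = group_by_prefix(filenames, l)
print(grouped)
```

Prefixes with no matching filename keep an empty list, so every entry of l is present in the result.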

Related

Generate a list of strings from another list using python random and eliminate duplicates

I have the following list:
original_list = [('Anger', 'Envy'), ('Anger', 'Exasperation'), ('Joy', 'Zest'), ('Sadness', 'Suffering'), ('Joy', 'Optimism'), ('Surprise', 'Surprise'), ('Love', 'Affection')]
I am trying to create a random list made up of the 2nd elements of the tuples in the above list, using the random module, in such a way that tuples sharing the same first element are only considered once.
That is, the final list I am looking at, will be:
random_list = [Exasperation, Suffering, Optimism, Surprise, Affection]
So, in the new list random_list, the strings Envy and Zest are eliminated (as their first elements, Anger and Joy, appear in the original list twice). The process also has to randomize the result, i.e. each run should produce a different list of five elements.
Could somebody show me how to do this?
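You can use a dictionary to filter the duplicates from original_list (shuffled first with random.sample). When dict() sees the same key more than once, the last value wins, so which of the duplicates survives depends on the shuffle: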
import random
original_list = [
    ("Anger", "Envy"),
    ("Anger", "Exasperation"),
    ("Joy", "Zest"),
    ("Sadness", "Suffering"),
    ("Joy", "Optimism"),
    ("Surprise", "Surprise"),
    ("Love", "Affection"),
]
out = list(dict(random.sample(original_list, len(original_list))).values())
print(out)
Prints (for example):
['Optimism', 'Envy', 'Surprise', 'Suffering', 'Affection']
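The one-liner can be unpacked into an explicit loop, which may make the deduplication step easier to follow. This is a sketch of the same approach, not a different algorithm; the function name random_unique_seconds and its rng parameter are made up for illustration.

```python
import random

original_list = [
    ("Anger", "Envy"),
    ("Anger", "Exasperation"),
    ("Joy", "Zest"),
    ("Sadness", "Suffering"),
    ("Joy", "Optimism"),
    ("Surprise", "Surprise"),
    ("Love", "Affection"),
]


def random_unique_seconds(pairs, rng=random):
    """Pick one second element per distinct first element, in random order."""
    shuffled = rng.sample(pairs, len(pairs))  # shuffled copy, input untouched
    chosen = {}
    for first, second in shuffled:
        # A later pair with the same first element overwrites the earlier one.
        chosen[first] = second
    return list(chosen.values())


out = random_unique_seconds(original_list)
print(out)
```

Passing a seeded random.Random instance as rng would make the result reproducible, if that is ever needed for testing.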

Appending strings with parentheses in a list

I have a set of strings.
people = {'RAM_S', 'SHYAM', 'GEORGEY', 'MUFASSIR'}
I have an empty list : list_c = [].
I want to append items in people to list_c such that resulting list looks as follows:
list_c = [('RAM_S'),('SHYAM'),('GEORGEY'),('MUFASSIR')]
I'm unable to append elements with parentheses in the list.
Please suggest some way to do it.
Parentheses alone do not create a tuple in Python, so ('RAM_S') is just the string 'RAM_S'. If you need one-element tuples:
result = [(i,) for i in people]
OUTPUT:
[('MUFASSIR',), ('SHYAM',), ('GEORGEY',), ('RAM_S',)]
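A short check of the difference between a parenthesized string and a one-element tuple. The variable names below are illustrative, and sorted() is used only to make the order predictable, since a set has no defined order.

```python
people = {'RAM_S', 'SHYAM', 'GEORGEY', 'MUFASSIR'}

not_a_tuple = ('RAM_S')   # parentheses only group; this is still a str
one_tuple = ('RAM_S',)    # the trailing comma is what makes a tuple

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple

# Build the list of one-element tuples in a deterministic order.
list_c = [(name,) for name in sorted(people)]
print(list_c)
```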

How to filter a certain type of python list

I have a list of strings. Each string has the same length/number of characters in the format
xyzw01.ext or xyzv02.ext, etc.
For example
list 1: ['ABCJ01.ext','CDEJ02.ext','ADEJ01.ext','CDEJ01.ext','ABCJ02.ext','CDEJ03.ext']
list 2: ['ABCJ01.ext','ADEJ01.ext','CDEJ01.ext','RPNJ01.ext','PLEJ01.ext']
I would like from these lists to build new lists with only the strings with highest number.
So from list 1 I would like to get
['ADEJ01.ext','ABCJ02.ext','CDEJ03.ext']
while for list 2 I would like to get the same list since all numbers are 01.
Is there a "simple" way of achieving this?
You can use defaultdict and max
from collections import defaultdict
def fun(lst):
    res = defaultdict(list)
    for x in lst:
        res[x[:4]].append(x)
    return [max(res[prefix], key=lambda name: name[4:6]) for prefix in res]
lst = ['ABCJ01.ext','CDEJ02.ext','ADEJ01.ext','CDEJ01.ext','ABCJ02.ext','CDEJ03.ext']
lst2 = ['ABCJ01.ext','ADEJ01.ext','CDEJ01.ext','RPNJ01.ext','PLEJ01.ext']
print(fun(lst))
print(fun(lst2))
Output:
['ABCJ02.ext', 'CDEJ03.ext', 'ADEJ01.ext']
['ABCJ01.ext', 'ADEJ01.ext', 'CDEJ01.ext', 'RPNJ01.ext', 'PLEJ01.ext']
The easiest way is probably to use an intermediate data structure, like a dict: sort the list items into buckets based on the first part of their names, and then take the maximum item from each bucket. We can just use the built-in max() without a key, since plain lexicographic comparison finds the largest item for the data as given (same prefix, zero-padded two-digit numbers). If that's not sufficient, you could use another regex to extract the number from the item and use it as the key instead.
import re
def filter_list(lst):
    prefixes = {}
    for item in lst:
        # use regex to isolate the non-numeric characters at the start of the string
        prefix = re.match(r'^([^0-9]*)', item).group(1)
        # make a bucket based on each prefix, and put the item in it
        prefixes.setdefault(prefix, [])
        prefixes[prefix].append(item)
    # make a list comprehension taking the maximum item from each bucket
    return [max(value) for value in prefixes.values()]
>>> a = ['ABCJ01.ext','CDEJ02.ext','ADEJ01.ext','CDEJ01.ext','ABCJ02.ext','CDEJ03.ext']
>>> b = ['ABCJ01.ext','ADEJ01.ext','CDEJ01.ext','RPNJ01.ext','PLEJ01.ext']
>>> filter_list(a)
['ABCJ02.ext', 'CDEJ03.ext', 'ADEJ01.ext']
>>> filter_list(b)
['ABCJ01.ext', 'ADEJ01.ext', 'CDEJ01.ext', 'RPNJ01.ext', 'PLEJ01.ext']
In Python 3.7+, this orders the output by the first occurrence of each prefix (i.e. CDEJ03.ext will precede ADEJ01.ext in the output because CDEJ02.ext precedes ADEJ01.ext in the input).
To get the output in the same order as the selected items appear in the original list, dict order alone is not enough: assigning to a key that already exists does not move it, so the result has to be ordered explicitly, for example with sorted([max(value) for value in prefixes.values()], key=lst.index).
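One way to return the winners in the order they appear in the input is to sort them by their position in the original list. The sketch below is a variant of the bucket approach, not the answer's exact code; the function name filter_list_in_input_order is made up, and it assumes the items in the list are unique.

```python
import re


def filter_list_in_input_order(lst):
    """Keep the largest item per prefix, ordered by position in lst."""
    best = {}
    for item in lst:
        # Leading non-digit characters form the prefix, e.g. 'ABCJ'.
        prefix = re.match(r'[^0-9]*', item).group(0)
        if prefix not in best or item > best[prefix]:
            best[prefix] = item
    # lst.index is a linear scan; fine for short lists like these.
    return sorted(best.values(), key=lst.index)


a = ['ABCJ01.ext', 'CDEJ02.ext', 'ADEJ01.ext', 'CDEJ01.ext', 'ABCJ02.ext', 'CDEJ03.ext']
b = ['ABCJ01.ext', 'ADEJ01.ext', 'CDEJ01.ext', 'RPNJ01.ext', 'PLEJ01.ext']
print(filter_list_in_input_order(a))
print(filter_list_in_input_order(b))
```

For the first list this yields the order requested in the question: ADEJ01.ext, then ABCJ02.ext, then CDEJ03.ext.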

Python remove word from bigram in list without returning a new list

Just a quick side question. Is there a way, and if so how, to remove a specific word from a bigram in a list (it must be the same list!) that also contains single words? E.g.
In:
x = ['Peter Parker', 'Hugo', 'Fischerman']
Task: delete Parker from that same list.
Expected output:
x as ['Peter', 'Hugo', 'Fischerman']
I tried xx = [x.replace('Parker', '') for x in xx], but it seems to give me a new list instead.
Any ideas?
x = ['Peter Parker', 'Hugo', 'Fischerman']  # initialize list
for i in range(len(x)):
    # Replace item i in place, removing "Parker" and stripping leftover whitespace.
    # If "Parker" is not in the item, replace() leaves it unchanged.
    x[i] = x[i].replace("Parker", "").strip()
That should work, because it assigns into the existing list rather than building a new one. Omit the initialization line if your list already exists.
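If the list comprehension reads better to you, slice assignment is another way to keep the same list object. This is a sketch of that alternative; the name same_list exists only to demonstrate that the object identity is preserved.

```python
x = ['Peter Parker', 'Hugo', 'Fischerman']
same_list = x  # second reference to the same list object

# Assigning to the full slice replaces the contents of the existing list
# instead of rebinding the name x to a new list.
x[:] = [s.replace('Parker', '').strip() for s in x]

print(x)
print(same_list is x)
```

The comprehension still builds a temporary list, but every other reference to x sees the updated contents.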

Function to iterate through a nested list and append other lists isn't functioning properly

I am currently trying to write a function to iterate through a nested list and check if one item from the list, 'team', is already in a separate list 'teams'.
If it is not, I want to append a nested list, 'player_values' with a different item from the original nested list that was examined, in the form of a new list in the nested list.
If it is, I want to append the nested list 'player_values' with the item from the original nested list, but I want to add it to the most recent list in the nested list 'player_values' instead of creating a new list.
Currently, my code looks like this :
def teams_and_games(list, player, idx):
    teams = []
    player_values = []
    x = 0
    y = -1
    for rows in list:
        if player == list[x][BD.player_id] and list[x][BD.team] not in teams:
            teams.append(list[x][BD.team])
            player_values.append([list[x][idx]])
            x += 1
            y += 1
        elif player == list[x][BD.player_id]:
            player_values[y].append(list[x][idx])
            x += 1
    return player_values, teams
However, when I run the code in my main, using
values, teams = teams_and_games(NiceRow, name, BD.games)
print(values)
print(teams)
It only prints empty lists. The fact that it prints empty lists shows that it is returning the correct variables, but I can't figure out why the code in the function is failing to add anything to the lists. I have tried replacing .append with a simpler list += statement, but the result has been the same so far.
Ideally, I would be getting a nested list, containing an amount of lists equal to the number of items added to the other 'teams' list, and the list of teams in the order they were added.
The data I am working with is a nested list pulled from a .csv file, which has been formatted slightly using the .strip() and .split() commands. Each number has been converted to an int, and strings left as they are. The .CSV file it is from has 19 columns and ~80,000 rows, with each column always being either a string or an int.
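A minimal sketch of the behavior described above, iterating over the rows directly instead of tracking a manual index. If, in the original, x is only incremented inside the two branches, it would stay at 0 whenever the first row belongs to a different player, which would leave both lists empty. The column constants PLAYER_ID, TEAM and GAMES are hypothetical stand-ins for BD.player_id, BD.team and BD.games, and the sample rows are invented for illustration.

```python
# Hypothetical column positions standing in for BD.player_id, BD.team, BD.games.
PLAYER_ID, TEAM, GAMES = 0, 1, 2


def teams_and_games(rows, player, idx):
    """Collect a player's values per team, in the order teams first appear."""
    teams = []
    player_values = []
    for row in rows:
        if row[PLAYER_ID] != player:
            continue  # skip other players, but still move on to the next row
        if row[TEAM] not in teams:
            teams.append(row[TEAM])
            player_values.append([row[idx]])
        else:
            # As described in the question: add to the most recent sublist.
            player_values[-1].append(row[idx])
    return player_values, teams


# Invented sample data, not taken from the original CSV.
sample_rows = [
    ["smith01", "BOS", 10],
    ["jones02", "NYA", 7],
    ["smith01", "BOS", 12],
    ["smith01", "CHI", 5],
]
values, teams = teams_and_games(sample_rows, "smith01", GAMES)
print(values)
print(teams)
```

Looping with for row in rows means every row is examined exactly once, whether or not it matches the player.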
