I would like to get a rather unusual output, as the following desired output shows.
I have tried many approaches for a whole day, but I still have no idea how to get it.
Two lists:
a = ['11', '22', '33','44', '55', '66']
b = ['E', 'AA', 'AA','AA', 'SS', 'SS']
Desired output:
{'11': ['E'],
'22': ['AA','33','44'], # order does not matter
'33': ['AA','22','44'],
'44': ['AA','22','33'],
'55': ['SS','66'],
'66': ['SS','55']}
For better understanding, if a and b are:
a = ['11', '22', '33','44', '55', '66']
b = ['E', 'AA', 'AA','AA', 'SS', 'AA']
Desired output:
{'11': ['E'],
'22': ['AA','33','44','66'], # order does not matter
'33': ['AA','22','44','66'],
'44': ['AA','22','33','66'],
'55': ['SS'],
'66': ['AA','22','33','44']
}
Question:
Which data structure is suitable for getting my desired output: a tuple, a dictionary or a set?
My current status:
a = ['11', '22', '33','44', '55', '66']
b = ['E', 'AA', 'AA','AA', 'SS', 'SS']
# put them into a dictionary:
dict_1 = {}
for i, j in zip(a, b):
    dict_1.setdefault(i, []).append(j)
print(dict_1)
c = list(zip(a, b))
i = 0
# find the target tuples
while i < len(c):
    j = i + 1
    while i < j < len(c):
        if c[i][1] == c[j][1]:
            print('Got a repeat one')
            print(c[i], c[j])
        j += 1
    i += 1
Current output:
{'11': ['E'], '22': ['AA'], '33': ['AA'], '44': ['AA'], '55': ['SS'], '66': ['SS']}
Got a repeat one
('22', 'AA') ('33', 'AA')
Got a repeat one
('22', 'AA') ('44', 'AA')
Got a repeat one
('33', 'AA') ('44', 'AA')
Got a repeat one
('55', 'SS') ('66', 'SS')
Open question:
Once I have the target tuples, how can I combine them to get my desired output? I have tried using .append, but it is a mess.
Incorrect output while trying to collect targets:
['AA', '33', 'AA', '44', 'AA', '44', 'SS', '66']
['AA', '33', '44', 'SS', '66']
If anyone has a hint on this, I would much appreciate it. Thanks in advance!
Try:
a = ["11", "22", "33", "44", "55", "66"]
b = ["E", "AA", "AA", "AA", "SS", "AA"]
tmp = {}
for i, j in zip(a, b):
    tmp.setdefault(j, []).append(i)
out = {}
for k, v in tmp.items():
    for i in v:
        vc = v.copy()
        vc.remove(i)
        vc.append(k)
        out[i] = vc
print(out)
Prints:
{'11': ['E'],
'22': ['33', '44', '66', 'AA'],
'33': ['22', '44', '66', 'AA'],
'44': ['22', '33', '66', 'AA'],
'66': ['22', '33', '44', 'AA'],
'55': ['SS']}
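The answer above appends the label last. If you want the label first, as in the question's desired output, a variant in the spirit of the second answer's "collect, then subtract self" idea could look like the sketch below. It assumes, as that answer also does, that the entries of a are unique; group_partners is a hypothetical name.

```python
from collections import defaultdict


def group_partners(keys, labels):
    """Map each key to [its label, *other keys that share that label]."""
    # Collect the keys belonging to each label, preserving input order.
    by_label = defaultdict(list)
    for key, label in zip(keys, labels):
        by_label[label].append(key)
    # For every key: its label first, then the other keys with the same label.
    return {
        key: [label] + [other for other in by_label[label] if other != key]
        for key, label in zip(keys, labels)
    }


a = ['11', '22', '33', '44', '55', '66']
b = ['E', 'AA', 'AA', 'AA', 'SS', 'SS']
print(group_partners(a, b))
# {'11': ['E'], '22': ['AA', '33', '44'], '33': ['AA', '22', '44'],
#  '44': ['AA', '22', '33'], '55': ['SS', '66'], '66': ['SS', '55']}
```

Using lists instead of sets keeps the partners in input order, which makes the output deterministic.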
Here's a possible solution:
# Assumes the lists are equal length
# Input 1
#a = ['11', '22', '33','44', '55', '66']
#b = ['E', 'AA', 'AA','AA', 'SS', 'SS']
# Input 2
a = ['11', '22', '33','44', '55', '66']
b = ['E', 'AA', 'AA','AA', 'SS', 'AA']
# Dictionary with instances of a that
# correspond to b - assumes all elements
# of a are unique
a_in_b = {}
for el_a, el_b in zip(a, b):
    if el_b in a_in_b:
        # If already a key there, add to its set
        a_in_b[el_b].add(el_a)
    else:
        # Initialize a new set
        a_in_b[el_b] = set([el_a])
# Check it
#print(a_in_b)
# Now get the final structure
output = {}
for el_a, el_b in zip(a, b):
    output[el_a] = [el_b]
    rest = a_in_b[el_b] - set([el_a])
    if rest:
        # If not empty
        output[el_a] += list(rest)
print(output)
It first creates a dictionary that maps each element of b to the set of corresponding elements of a, then it constructs the desired output. It uses sets because of the set difference (the - operator) used later.
Note the assumptions highlighted in the code: a and b have the same length, and the entries of a are unique. If the entries are not unique, you would need to use a list instead of a set, and depending on your requirements you would have to change the - approach too. Also note that sets are unordered, so the order of the other entries in each list may vary (the question says order does not matter).
Also, consider renaming the variables to something that better reflects your application.
Related
I want to combine two lists into a dictionary, but keep all values of each key. See my desired output:
Two lists:
a = ['E', 'AA', 'AA','AA', 'S', 'P']
b = ['11', '22', '33','44', '55', '66']
The output should be a dictionary:
dict_1 = {'E': ['11'], 'AA': ['22', '33', '44'], 'S': ['55'], 'P': ['66']}
Problem:
I have the following code, but after trying many times (a whole afternoon), I still only get the following undesired output:
Current undesired output:
dict_1 = {'E': ['11'], 'AA': ['44'], 'S': ['55'], 'P': ['66']}
My code:
a = ['E', 'AA', 'AA','AA', 'S', 'P']
b = ['11', '22', '33','44', '55', '66']
c = []
dict_1 = {}
i = 0
while i < 6:
    j = i
    c = []
    while i <= j < 6:
        if a[i] == a[j]:
            c.append(b[j])
        j += 1
    dict_1[a[i]] = c
    # print(dict_1)
    i += 1
print(dict_1)
I'm new to Python, so the code is not elegant. I only want to update it so that I get my desired output.
If anyone has a hint on it, please feel free to comment or answer. Thanks!
You can use dict.setdefault:
a = ["E", "AA", "AA", "AA", "S", "P"]
b = ["11", "22", "33", "44", "55", "66"]
dict_1 = {}
for i, j in zip(a, b):
    dict_1.setdefault(i, []).append(j)
print(dict_1)
Prints:
{'E': ['11'], 'AA': ['22', '33', '44'], 'S': ['55'], 'P': ['66']}
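If you would rather keep the question's index-based approach, the reason it fails is that the outer loop runs again for every later 'AA', and the shorter list from that pass overwrites the earlier, complete one (the pass for the last 'AA' leaves only ['44']). Below is a minimal sketch of a fix. It assumes that a key which has already been collected can simply be skipped.

```python
a = ['E', 'AA', 'AA', 'AA', 'S', 'P']
b = ['11', '22', '33', '44', '55', '66']

dict_1 = {}
for i in range(len(a)):
    # An earlier occurrence already collected every value for this key;
    # redoing it from a later index would overwrite it with a shorter list.
    if a[i] in dict_1:
        continue
    # Gather every b value whose a entry matches a[i], from index i onward.
    dict_1[a[i]] = [b[j] for j in range(i, len(a)) if a[j] == a[i]]
print(dict_1)
# {'E': ['11'], 'AA': ['22', '33', '44'], 'S': ['55'], 'P': ['66']}
```

This version is quadratic in the list length. The setdefault answer above does the same job in a single pass.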
I have yet to learn the lambda concept in Python. I tried looking for answers, and every answer uses lambda. This is my code; can you please suggest a way to sort it by value?
sorted_dict = {'sir': '113', 'to': '146', 'my': '9', 'jesus': '4', 'saving': '275', 'changing': '72', 'apologize': '285', 'pain': '308', 'sisters': '27', 'forgiving': '36', 'can': '62', 'family': '77', 'sorry': '8', 'is': '360', 'too': '15', 'her': '37', 'wanted': '18', 'being': '44', 'into': '208', 'are': '17', 'just': '97', 'so': '148', 'now': '112', 'be': '19', 'right': '189', 'been': '105', 'no': '56', 'because': '74', 'forgive': '52', 'keep': '88', 'wish': '12', "i'm": '67', 'always': '53', 'ask': '29'}
new_list = list()
for key, value in sorted_dict.items():
    new_tup = (key, value)
    new_list.append(new_tup)
new_list = sorted(new_list)
How do I proceed further?
A lambda is often passed as the key function, which computes the value each element is sorted by.
The step of turning the dictionary into a list of tuples can be done directly with the dict method dict.items().
I used a lambda as the key to tell sorted() to sort on the value at index 1 of each tuple. Since the values are strings, int() converts them so that they sort numerically rather than alphabetically.
sorted_dict = {'sir': '113', 'to': '146', 'my': '9', 'jesus': '4', 'saving': '275', 'changing': '72', 'apologize': '285', 'pain': '308', 'sisters': '27', 'forgiving': '36', 'can': '62', 'family': '77', 'sorry': '8', 'is': '360', 'too': '15', 'her': '37', 'wanted': '18', 'being': '44', 'into': '208', 'are': '17', 'just': '97', 'so': '148', 'now': '112', 'be': '19', 'right': '189', 'been': '105', 'no': '56', 'because': '74', 'forgive': '52', 'keep': '88', 'wish': '12', "i'm": '67', 'always': '53', 'ask': '29'}
new_list = sorted_dict.items()
new_list = sorted(new_list, key=lambda x: int(x[1]))
print(new_list)
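To see why the int() conversion in the key matters, here is a sketch on a small subset of the question's dictionary (counts is a hypothetical name for that subset). It also shows reverse=True, which puts the largest counts first.

```python
counts = {'sir': '113', 'to': '146', 'my': '9', 'jesus': '4'}

# Sorting on the raw strings compares character by character: '113' < '146' < '4' < '9'.
as_text = sorted(counts.items(), key=lambda x: x[1])
# Converting each value to int gives numeric order instead.
as_numbers = sorted(counts.items(), key=lambda x: int(x[1]))
# reverse=True sorts from largest to smallest.
largest_first = sorted(counts.items(), key=lambda x: int(x[1]), reverse=True)

print(as_text)        # [('sir', '113'), ('to', '146'), ('jesus', '4'), ('my', '9')]
print(as_numbers)     # [('jesus', '4'), ('my', '9'), ('sir', '113'), ('to', '146')]
print(largest_first)  # [('to', '146'), ('sir', '113'), ('my', '9'), ('jesus', '4')]
```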
If you are familiar with other programming languages, you may have heard of what is called an "inline function".
A lambda is the Python equivalent of an "inline function":
it is a function that has no name and is restricted to a single expression.
Now, coming to the sorting problem: Python's sorted() function (and the list.sort() method) takes
the list to be sorted, and
an optional key argument: a function that defines what each element should be sorted by.
If it is a list of numbers, you don't need the key argument at all.
But in a case like yours, where it is a list of tuples or a list of dictionaries, you need to tell Python which part of each element to sort by.
That is done with the key argument.
The code below illustrates this:
In [1]: l1 = [('a',1), ('b', 3), ('c', 2)]
In [2]: def sortHelper(x):
   ...:     return x[1]
...:
In [3]: l1.sort(key=sortHelper)
In [4]: l1
Out[4]: [('a', 1), ('c', 2), ('b', 3)]
As you can see, sortHelper is just a one-line function, which could just as well be written as a lambda:
lambda x: x[1]
So it is common to use lambda functions, but it is not required; you can get the same result with normal Python functions.
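Applied to the question's data, the lambda-free version might look like the sketch below. count_of is a hypothetical helper name, and counts is a small subset of the question's dictionary. Since sorted() returns a list of tuples, dict() can turn the result back into a dictionary, which keeps the sorted insertion order in Python 3.7+.

```python
def count_of(item):
    """Return the numeric count from a (word, count) tuple."""
    # The counts are stored as strings, so convert them for numeric ordering.
    return int(item[1])


counts = {'sir': '113', 'to': '146', 'my': '9', 'jesus': '4'}

by_function = sorted(counts.items(), key=count_of)
by_lambda = sorted(counts.items(), key=lambda item: int(item[1]))

print(by_function == by_lambda)  # True
print(dict(by_function))         # {'jesus': '4', 'my': '9', 'sir': '113', 'to': '146'}
```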
This question already has answers here:
Sorting sub-lists into new sub-lists based on common first items
(4 answers)
Closed 2 years ago.
I have a text file whose lines are in the following format:
1 id:0 e1:"a" e2:"b"
0 id:0 e1:"4" e2:"c"
0 id:1 e1:"6" e2:"d"
2 id:2 e1:"8" e2:"f"
2 id:2 e1:"9" e2:"f"
2 id:2 e1:"d" e2:"k"
I have to extract a list of lists of (e1, e2) tuples, where id determines the index in the outer list and each inner list keeps the order of the lines. So in the above case my output would be:
[[("a","b"),("4","c")],[("6","d")],[("8","f"),("9","f"),("d","k")]]
The problem for me is that, to know where a new inner list begins, I need to check whether the id value has changed. Each id has a different number of elements: for example, id:0 has 2, id:1 has 1 and id:2 has 3. Is there an efficient way to check this condition on the next line while building the list?
You can use itertools.groupby() for the job:
import itertools
def split_by(
        items,
        key=None,
        processing=None,
        container=list):
    for key_value, grouping in itertools.groupby(items, key):
        if processing:
            grouping = (processing(group) for group in grouping)
        if container:
            grouping = container(grouping)
        yield grouping
to be called as:
from operator import itemgetter
list(split_by(items, itemgetter(0), itemgetter(slice(1, None))))
The items can easily be generated from the text above (assuming it is stored in a file whose path is held in the variable filename, e.g. data.txt):
def get_items():
    # with io.StringIO(text) as file_obj:  # to read from `text`
    with open(filename, 'r') as file_obj:  # to read from `filename`
        for line in file_obj:
            if line.strip():
                vals = line.replace('"', '').split()
                yield tuple(val.split(':')[1] for val in vals[1:])
Finally, to test all the pieces (where open(filename, 'r') in get_items() is replaced by io.StringIO(text)):
import io
import itertools
from operator import itemgetter
text = """
1 id:0 e1:"a" e2:"b"
0 id:0 e1:"4" e2:"c"
0 id:1 e1:"6" e2:"d"
2 id:2 e1:"8" e2:"f"
2 id:2 e1:"9" e2:"f"
2 id:2 e1:"d" e2:"k"
""".strip()
print(list(split_by(get_items(), itemgetter(0), itemgetter(slice(1, None)))))
# [[('a', 'b'), ('4', 'c')], [('6', 'd')], [('8', 'f'), ('9', 'f'), ('d', 'k')]]
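This efficiently iterates through the input without unnecessary memory allocation. Note that itertools.groupby() only groups consecutive items, which is exactly the "has the id changed?" check from the question. It therefore assumes that lines with the same id are adjacent, as they are in the file.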
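For comparison, the check the question describes (start a new inner list whenever the id changes) can also be written as a plain loop without itertools. This is a sketch, not part of the answer above. group_by_id is a hypothetical name, and like groupby() it assumes that lines with the same id are adjacent and that each line has exactly the id, e1 and e2 fields after the leading number.

```python
def group_by_id(lines):
    """Group (e1, e2) pairs into inner lists, opening a new list whenever id changes."""
    result = []
    previous_id = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # '1 id:0 e1:"a" e2:"b"' -> ['1', 'id:0', 'e1:a', 'e2:b']
        fields = line.replace('"', '').split()
        id_value, e1, e2 = (field.split(':', 1)[1] for field in fields[1:4])
        if id_value != previous_id:
            result.append([])  # the id changed: start a new inner list
            previous_id = id_value
        result[-1].append((e1, e2))
    return result


text = '''1 id:0 e1:"a" e2:"b"
0 id:0 e1:"4" e2:"c"
0 id:1 e1:"6" e2:"d"
2 id:2 e1:"8" e2:"f"
2 id:2 e1:"9" e2:"f"
2 id:2 e1:"d" e2:"k"'''

print(group_by_id(text.splitlines()))
# [[('a', 'b'), ('4', 'c')], [('6', 'd')], [('8', 'f'), ('9', 'f'), ('d', 'k')]]
```

Because it only iterates over lines, the same function also accepts an open file object, e.g. group_by_id(open('data.txt')).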
No other packages are required.
Load and parse the file:
Start with a text file formatted as shown in the question.
# parse text file into dict
with open('test.txt', 'r') as f:
    # clean each line and split it into a list (line[2:] drops the leading "N "; assumes a one-digit first column)
    text = [line[2:].replace('"', '').strip().split() for line in f.readlines()]
text = [[v.split(':') for v in t] for t in text]  # split each value in the list into a list
d = [{v[0]: v[1] for v in t} for t in text]  # convert lists to dicts
# text will appear as:
[[['id', '0'], ['e1', 'a'], ['e2', 'b']],
[['id', '0'], ['e1', '4'], ['e2', 'c']],
[['id', '1'], ['e1', '6'], ['e2', 'd']],
[['id', '2'], ['e1', '8'], ['e2', 'f']],
[['id', '2'], ['e1', '9'], ['e2', 'f']],
[['id', '2'], ['e1', 'd'], ['e2', 'k']]]
# d appears as:
[{'id': '0', 'e1': 'a', 'e2': 'b'},
{'id': '0', 'e1': '4', 'e2': 'c'},
{'id': '1', 'e1': '6', 'e2': 'd'},
{'id': '2', 'e1': '8', 'e2': 'f'},
{'id': '2', 'e1': '9', 'e2': 'f'},
{'id': '2', 'e1': 'd', 'e2': 'k'}]
Parse the list of dicts into the expected output.
Use .get to check whether a key exists: it returns the value for the key, or a specified default (None, if no default is given) when the key is missing.
Because dict.get returns the default instead of failing, this method never raises a KeyError.
If None can be a value in the dictionary, pass a different default to .get, for example:
test.get(v[0], 'something here')
test = dict()
for r in d:
    v = list(r.values())  # [id, e1, e2], relying on dict insertion order
    if test.get(v[0]) is None:
        test[v[0]] = [tuple(v[1:])]
    else:
        test[v[0]].append(tuple(v[1:]))
# test dict appears as:
{'0': [('a', 'b'), ('4', 'c')],
'1': [('6', 'd')],
'2': [('8', 'f'), ('9', 'f'), ('d', 'k')]}
# final output
final = list(test.values())
[[('a', 'b'), ('4', 'c')], [('6', 'd')], [('8', 'f'), ('9', 'f'), ('d', 'k')]]
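Here is a short sketch of the .get behaviour described above. The dictionary groups and its 'x' key holding None are hypothetical, chosen only to show how a custom default tells a missing key apart from a stored None.

```python
groups = {'0': [('a', 'b'), ('4', 'c')], 'x': None}

print(groups.get('0'))             # [('a', 'b'), ('4', 'c')]
print(groups.get('1'))             # None: the key is missing, but no KeyError is raised
print(groups.get('x'))             # None: the key exists and its value is None
# With a custom default, a missing key and a stored None can be told apart:
print(groups.get('1', 'missing'))  # missing
print(groups.get('x', 'missing'))  # None
# The `in` operator is another way to test for the key itself:
print('x' in groups, '1' in groups)  # True False
```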
Updated and reduced code:
In this case, text is a list of lists, and there's no need to convert it to dict d, as above.
For each list t in text, index [0] is always the key, and index [1:] are the values.
with open('test.txt', 'r') as f:
    text = [line[2:].replace('"', '').strip().split() for line in f.readlines()]  # clean each line and split it into a list
text = [[v.split(':')[1] for v in t] for t in text] # list of list of only value at index 1
# text appears as:
[['0', 'a', 'b'],
['0', '4', 'c'],
['1', '6', 'd'],
['2', '8', 'f'],
['2', '9', 'f'],
['2', 'd', 'k']]
test = dict()
for t in text:
    if test.get(t[0]) is None:
        test[t[0]] = [tuple(t[1:])]
    else:
        test[t[0]].append(tuple(t[1:]))
final = list(test.values())
Using defaultdict
This saves a few lines of code.
It again uses text as the list of lists from above:
from collections import defaultdict as dd
test = dd(list)
for t in text:
    test[t[0]].append(tuple(t[1:]))
final = list(test.values())
I am trying to remove consecutive duplicates (separated by the delimiter '>') from the journey column and also aggregate the values in the uu and convs columns.
INPUT
a=[['journey', 'uu', 'convs'],
['Ct', '10', '2'],
['Ct>Ct', '100', '3'],
['Ct>Pt>Ct', '200', '10'],
['Ct>Pt>Ct>Ct', '40', '5'],
['Ct>Pt>Bu', '1000', '8']]
OUTPUT
a=[['journey', 'uu', 'convs'],
['Ct', '110', '5'],
['Ct>Pt>Ct', '240', '15'],
['Ct>Pt>Bu', '1000', '8']]
I tried the following to split it, but it didn't work:
a='>'.join(set(a.split()))
You need to split your string on > and then use groupby to eliminate consecutive duplicate items in it. For example:
from itertools import groupby

x = ['Ct>Pt>Ct>Ct', '40', '5']
print(">".join([i for i, _ in groupby(x[0].split(">"))]))
# Ct>Pt>Ct
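You can then use this expression as the key (a lambda) of a second groupby to group the rows, and sum the elements at each index using zip. Note that groupby only groups adjacent rows, so rows that collapse to the same journey must be next to each other, as they are here: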
a=[['journey', 'uu', 'convs'],
['Ct', '10', '2'],
['Ct>Ct', '100', '3'],
['Ct>Pt>Ct', '200', '10'],
['Ct>Pt>Ct>Ct', '40', '5'],
['Ct>Pt>Bu', '1000', '8']]
from itertools import groupby
result = [a[0]] # Add header
groups = groupby(
    a[1:],
    key=lambda x: ">".join([i for i, _ in groupby(x[0].split(">"))])
)
# groups:
# ['Ct', [['Ct', '10', '2'], ['Ct>Ct', '100', '3']]]
# ['Ct>Pt>Ct', [['Ct>Pt>Ct', '200', '10'], ['Ct>Pt>Ct>Ct', '40', '5']]]
# ['Ct>Pt>Bu', [['Ct>Pt>Bu', '1000', '8']]]
for key, items in groups:
    row = [key]
    for i in zip(*items):
        # i is one column of the group: skip the journey column, sum the numeric ones
        if i[0].isdigit():
            row.append(str(sum(map(int, i))))
    result.append(row)
print(result)
Prints:
[['journey', 'uu', 'convs'],
['Ct', '110', '5'],
['Ct>Pt>Ct', '240', '15'],
['Ct>Pt>Bu', '1000', '8']]
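Because groupby() only merges adjacent rows, the answer above relies on rows with the same collapsed journey being next to each other. If that is not guaranteed, a dictionary can accumulate the totals instead. The sketch below assumes every data row has exactly the three columns journey, uu and convs, with integer strings in the last two. collapse and aggregate are hypothetical helper names.

```python
from itertools import groupby


def collapse(journey):
    """Drop consecutive repeats: 'Ct>Pt>Ct>Ct' -> 'Ct>Pt>Ct'."""
    return ">".join(step for step, _ in groupby(journey.split(">")))


def aggregate(rows):
    header, *body = rows
    # collapsed journey -> [uu total, convs total]; dicts keep first-seen order
    totals = {}
    for journey, uu, convs in body:
        total = totals.setdefault(collapse(journey), [0, 0])
        total[0] += int(uu)
        total[1] += int(convs)
    return [header] + [[key, str(uu_sum), str(convs_sum)]
                       for key, (uu_sum, convs_sum) in totals.items()]


a = [['journey', 'uu', 'convs'],
     ['Ct', '10', '2'],
     ['Ct>Ct', '100', '3'],
     ['Ct>Pt>Ct', '200', '10'],
     ['Ct>Pt>Ct>Ct', '40', '5'],
     ['Ct>Pt>Bu', '1000', '8']]
print(aggregate(a))
# [['journey', 'uu', 'convs'], ['Ct', '110', '5'], ['Ct>Pt>Ct', '240', '15'], ['Ct>Pt>Bu', '1000', '8']]
```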
array([
['192', '895'],
['14', '269'],
['1', '23'],
['1', '23'],
['50', '322'],
['19', '121'],
['17', '112'],
['12', '72'],
['2', '17'],
['5,250', '36,410'],
['2,546', '17,610'],
['882', '6,085'],
['571', '3,659'],
['500', '3,818'],
['458', '3,103'],
['151', '1,150'],
['45', '319'],
['44', '335'],
['30', '184']
])
How can I remove some of the rows so that the array is left like this:
Table3=array([
['192', '895'],
['14', '269'],
['1', '23'],
['50', '322'],
['17', '112'],
['12', '72'],
['2', '17'],
['5,250', '36,410'],
['882', '6,085'],
['571', '3,659'],
['500', '3,818'],
['458', '3,103'],
['45', '319'],
['44', '335'],
['30', '184']
])
I removed the rows at indices 2, 4 and 6. I am not sure how to do it; I have tried a few ways, but none of them work.
It seems like you actually deleted indices 2, 5, 10 and 15 (not 2, 4 and 6). To do this you can use np.delete: pass it a list of the indices you want to delete and apply it along axis=0. It returns a new array and leaves the original unchanged:
import numpy as np
# arr is the original 19-row array from the question
Table3 = np.delete(arr, [2, 5, 10, 15], axis=0)
>>> Table3
array([['192', '895'],
['14', '269'],
['1', '23'],
['50', '322'],
['17', '112'],
['12', '72'],
['2', '17'],
['5,250', '36,410'],
['882', '6,085'],
['571', '3,659'],
['500', '3,818'],
['458', '3,103'],
['45', '319'],
['44', '335'],
['30', '184']],
dtype='<U6')