I am trying to automate number groupings of several lists by exporting the data to MS Excel using openpyxl. The output is a list of lists with two numbers per element: the first is the matched number (0 to 99) and the second is the index at which they matched.
def variable_str_to_list_pairs_overlapping(s):
    # "0712" -> ["07", "71", "12"]
    return [''.join(pair) for pair in zip(s[:-1], s[1:])]

# list1 and list2 start out as strings of digits
list1 = variable_str_to_list_pairs_overlapping(list1)
list2 = variable_str_to_list_pairs_overlapping(list2)
lst_result = []
for i in range(len(list2)):
    if list1[i] == list2[i]:
        # [matched number, index of the match]
        lst_result.append([int(list1[i]), i])
print(lst_result)
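For reference, the same matching can be written more compactly with zip and enumerate. This is a sketch that assumes both inputs are strings of digits; overlapping_pairs and matching_pairs are hypothetical helper names, not part of the original code. Unlike the index loop above, zip stops at the shorter input instead of raising an IndexError.

```python
def overlapping_pairs(s):
    # hypothetical helper: "0712" -> ["07", "71", "12"]
    return [s[i:i + 2] for i in range(len(s) - 1)]


def matching_pairs(s1, s2):
    # hypothetical helper: [int(pair), index] wherever both inputs share the same pair
    return [[int(p1), i]
            for i, (p1, p2) in enumerate(zip(overlapping_pairs(s1), overlapping_pairs(s2)))
            if p1 == p2]


print(matching_pairs("12345", "19345"))  # [[34, 2], [45, 3]]
```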
Output:
[[7, 265], [8, 281], [2, 303], [8, 332], [7, 450], [1, 544], [0, 737],
 [9, 805], [2, 970], [4, 1103], [4, 1145], [8, 1303], [1, 1575], [4, 1592],
 [2, 1593], [3, 1948], [4, 2200], [5, 2419], [3, 2464], [9, 2477], [1, 2529],
 [6, 2785], [2, 2842], [8, 2843], [7, 2930], [3, 2991], [8, 3096], [3, 3248],
 [2, 3437], [7, 3438], [8, 3511], [0, 3522], [0, 3523], [5, 3590], [6, 3621],
 [1, 3622], [2, 3671], [6, 3835], [7, 3876]]
I'm looking to export the data to Excel in such a way that the first element is used as the row index and the second as the value inside the cell:
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
dest_filename = 'openpyxltest.xlsx'
for x in lst_result:
    ws.cell(row=x[0] + 1, column=2).value = x[1]
wb.save(filename=dest_filename)
Actual output: (screenshot not included)
Desired output: (screenshot not included)
What do I need to change in my code? Thank you in advance for the help, you guys are awesome! :)
You are overwriting the cells: you never change the column you write into, so every value for a given row lands in the same cell and only the last write survives.
You could solve this by using a defaultdict(list) to collect all values of one key into a list, sorting it if needed, and then creating the xlsx from the dictionary, like so:
lst_result = [[7, 265], [8, 281], [2, 303], [8, 332], [7, 450], [1, 544], [0, 737],
              [9, 805], [2, 970], [4, 1103], [4, 1145], [8, 1303], [1, 1575], [4, 1592],
              [2, 1593], [3, 1948], [4, 2200], [5, 2419], [3, 2464], [9, 2477], [1, 2529],
              [6, 2785], [2, 2842], [8, 2843], [7, 2930], [3, 2991], [8, 3096], [3, 3248],
              [2, 3437], [7, 3438], [8, 3511], [0, 3522], [0, 3523], [5, 3590], [6, 3621],
              [1, 3622], [2, 3671], [6, 3835], [7, 3876]]
from collections import defaultdict

# group all data points by their first value
grpd_data = defaultdict(list)
for k, v in lst_result:
    grpd_data[k].append(v)

# sort all grouped values (not needed here, as the inputs are already sorted)
# for l in grpd_data:
#     grpd_data[l].sort()
grpd_data looks like:
# defaultdict(<type 'list'>, {0: [737, 3522, 3523], 1: [544, 1575, 2529, 3622],
# 2: [303, 970, 1593, 2842, 3437, 3671], 3: [1948, 2464, 2991, 3248],
# 4: [1103, 1145, 1592, 2200], 5: [2419, 3590], 6: [2785, 3621, 3835],
# 7: [265, 450, 2930, 3438, 3876], 8: [281, 332, 1303, 2843, 3096, 3511],
# 9: [805, 2477]})
Then create the workbook:
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
dest_filename = 'openpyxltest.xlsx'
for x, data in grpd_data.items():
    # column A holds the key; columns B, C, ... hold its values
    ws.cell(row=x + 1, column=1).value = x
    for col, d in enumerate(data, 2):
        ws.cell(row=x + 1, column=col).value = d
wb.save(filename=dest_filename)
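A variation on the same idea uses openpyxl's Worksheet.append, which writes one row below the last used row. This sketch uses only the first five pairs from the question as sample data and sorts the keys so the rows come out in ascending order.

```python
from collections import defaultdict

from openpyxl import Workbook

# first five pairs from the question, as a small sample
lst_result = [[7, 265], [8, 281], [2, 303], [8, 332], [7, 450]]

grpd_data = defaultdict(list)
for k, v in lst_result:
    grpd_data[k].append(v)

wb = Workbook()
ws = wb.active
# each call to append() writes the next row: the key, then all of its values
for key in sorted(grpd_data):
    ws.append([key] + grpd_data[key])
wb.save('openpyxltest.xlsx')
```

Unlike the row=x + 1 placement above, append packs the rows one after another, so a key that never occurs does not leave an empty row. Use the explicit ws.cell version when the row number must equal key + 1.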
Output: (screenshot not included)
See: enumerate; How does collections.defaultdict work?
Suppose I have a tensor A = [[1,1,1], [2,2,2], [3,3,3]] and B = [1,2,3]. How do I get C = [[1,1,1], [2,2,2], [2,2,2], [3,3,3], [3,3,3], [3,3,3]] (row i of A repeated B[i] times), and how do I do this batch-wise?
My current element-wise solution (which takes forever):
import torch

def get_char_context(valid_embeds, words_lens):
    chars_contexts = []
    for ve, wl in zip(valid_embeds, words_lens):
        # repeat each word embedding e as many times as the word has characters (l)
        for idx, (e, l) in enumerate(zip(ve, wl)):
            if idx == 0:
                chars_context = e.view(1, -1).repeat(l, 1)
            else:
                chars_context = torch.cat((chars_context, e.view(1, -1).repeat(l, 1)), 0)
        chars_contexts.append(chars_context)
    return chars_contexts
I'm doing this to add BERT word embeddings to a character-level seq2seq task.
Use this:
import torch
# A is your tensor
B = torch.tensor([1, 2, 3])
C = A.repeat_interleave(B, dim = 0)
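Applied to the get_char_context function from the question, the loop could likely collapse to one repeat_interleave call per batch item. This is a sketch that assumes each ve is a 2D tensor of shape (num_words, dim) and each wl holds exactly num_words repeat counts; the tiny embeds and lens values are made up for illustration.

```python
import torch


def get_char_context(valid_embeds, words_lens):
    # For each batch item, repeat row k of `ve` wl[k] times along dim 0.
    # Assumes len(wl) == ve.shape[0] for every pair.
    return [ve.repeat_interleave(torch.as_tensor(wl, device=ve.device), dim=0)
            for ve, wl in zip(valid_embeds, words_lens)]


# made-up example: two batch items with 2 and 1 words
embeds = [torch.tensor([[1., 1.], [2., 2.]]), torch.tensor([[3., 3.]])]
lens = [[1, 2], [3]]
out = get_char_context(embeds, lens)
print(out[0])  # rows: [1, 1], [2, 2], [2, 2]
```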
EDIT:
The above works fine if A is a single 2D tensor. To repeat all (2D) tensors in a batch in the same manner, this is a simple workaround:
A = torch.tensor([[[1, 1, 1], [2, 2, 2], [3, 3, 3]],
                  [[1, 2, 3], [4, 5, 6], [2, 2, 2]]])  # A has 2 tensors, each of shape (3, 3)
B = torch.tensor([1, 2, 3])  # repetitions of each row, shared by every tensor in the batch
A1 = A.reshape(-1, A.shape[2])  # stack the rows of all batch items: shape (6, 3)
B1 = B.repeat(A.shape[0])  # [1, 2, 3, 1, 2, 3]
C = A1.repeat_interleave(B1, dim=0).reshape(A.shape[0], -1, A.shape[2])
C is:
tensor([[[1, 1, 1],
         [2, 2, 2],
         [2, 2, 2],
         [3, 3, 3],
         [3, 3, 3],
         [3, 3, 3]],

        [[1, 2, 3],
         [4, 5, 6],
         [4, 5, 6],
         [2, 2, 2],
         [2, 2, 2],
         [2, 2, 2]]])
As you can see, each tensor in the batch is repeated in the same manner.
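Since repeat_interleave takes a dim argument, the reshape workaround can probably be skipped when every item in the batch uses the same repeat counts: repeating along dim=1 repeats row k of every batch item B[k] times. A minimal sketch with the same A and B:

```python
import torch

A = torch.tensor([[[1, 1, 1], [2, 2, 2], [3, 3, 3]],
                  [[1, 2, 3], [4, 5, 6], [2, 2, 2]]])  # shape (2, 3, 3)
B = torch.tensor([1, 2, 3])  # one repeat count per row, shared across the batch

# len(B) must equal A.shape[1]; the batch dimension is left untouched
C = A.repeat_interleave(B, dim=1)
print(C.shape)  # torch.Size([2, 6, 3])
```

If each batch item needs different repeat counts, the output rows would have different lengths, so that case still needs a per-item loop or padding.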
Given a nested list, say:
a = [[1, 5, 100],
     [2],
     [5, 100]]
The desired result to be obtained is as follows:
[[1, 2, 5], [1, 2, 100], [5, 2, 100], [100, 2, 5]]
Here is my code, but it does not give the desired output, and I am unable to progress further:
arr = [[i] for i in a[0]]

def poss(j, arr, tmp):
    for i in range(len(tmp)):
        arr[i] = tmp[i] + [j]
    print(arr)

for i in a[1:]:
    tmp = [k for k in arr]  # shallow copy of arr (the inner lists are never mutated)
    for j in i:
        poss(j, arr, tmp)
Output for above code:
[[1, 2], [5, 2], [100, 2]]
[[1, 2, 5], [5, 2, 5], [100, 2, 5]]
[[1, 2, 100], [5, 2, 100], [100, 2, 100]]
I also suspect this code is inefficient on large data; is that so? I'm looking for better code to get the result.
This problem can be solved using Python's itertools module.
The itertools.combinations(iterable, r) function returns all r-length combinations of the input elements, without repeating any element.
import itertools

a = [[1, 5, 100],
     [2],
     [2, 100]]

dimx = max([len(el) for el in a])
# collect the unique values, keeping first-seen order
uniqueEls = {}
for el in a:
    for subel in el:
        uniqueEls[subel] = uniqueEls.get(subel, 0)
desiredArr = [list(x) for x in itertools.combinations(uniqueEls.keys(), dimx)]
print(desiredArr)
[[1, 5, 100], [1, 5, 2], [1, 100, 2], [5, 100, 2]]
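Note that these combinations do not reproduce the desired output from the question: [1, 5, 100], for instance, takes two values from the first sublist. A sketch that does, assuming the third sublist is [5, 100] as the asker's printed output suggests, picks one element from each sublist with itertools.product and drops picks that repeat a value. The number of products still grows multiplicatively with the sublist lengths, so this removes the manual bookkeeping, not the combinatorial cost.

```python
import itertools

# Assumption: the third sublist is [5, 100], matching the asker's printed output.
a = [[1, 5, 100],
     [2],
     [5, 100]]

# Cartesian product: one element from each sublist, then keep only
# the picks in which no value occurs twice.
result = [list(p) for p in itertools.product(*a) if len(set(p)) == len(p)]
print(result)  # [[1, 2, 5], [1, 2, 100], [5, 2, 100], [100, 2, 5]]
```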
How do I merge two arrays and group by key?
Example:
my_list = [3, 4, 5, 6, 4, 6, 8]
keys = [1, 1, 2, 2, 3, 5, 7]
Expected outcome:
[[1, 3, 4], [2, 5, 6], [3, 4], [5, 6], [7, 8]]
If I understand it right, the list of keys maps to the list of values. You can use the zip function to iterate through two lists at the same time, which is convenient in this case. Also check out defaultdict: it lets us fill a list without initialising it explicitly.
from collections import defaultdict

result = defaultdict(list)  # a dictionary that returns an empty list for missing keys
for key, val in zip(keys, my_list):
    result[key].append(val)
result
# defaultdict(<class 'list'>, {1: [3, 4], 2: [5, 6], 3: [4], 5: [6], 7: [8]})
You can then convert it to a list (though I'm not sure why you would want to) with:
final = []
for key, val in result.items():
    final.append([key] + val)  # put the key back in front of its values
final
# [[1, 3, 4], [2, 5, 6], [3, 4], [5, 6], [7, 8]]
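If the keys are already sorted, as in the example, itertools.groupby can build the same list directly without an intermediate dictionary. This is a swapped-in technique, not part of the answer above: groupby only merges consecutive equal keys, so unsorted keys would produce repeated groups.

```python
from itertools import groupby
from operator import itemgetter

my_list = [3, 4, 5, 6, 4, 6, 8]
keys = [1, 1, 2, 2, 3, 5, 7]

# group the (key, value) pairs by key; relies on equal keys being adjacent
final = [[key] + [val for _, val in group]
         for key, group in groupby(zip(keys, my_list), key=itemgetter(0))]
print(final)  # [[1, 3, 4], [2, 5, 6], [3, 4], [5, 6], [7, 8]]
```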
I think you have to write it on your own, so I made a function called merge_group. It pairs each key with its value and then merges adjacent entries that share a key (this assumes equal keys sit next to each other, as in the example):
my_list = [3, 4, 5, 6, 4, 6, 8]
keys = [1, 1, 2, 2, 3, 5, 7]

def merge_group(input_list: list, input_key: list):
    # pair every key with its value: [key, value]
    result = []
    i = 0
    while i < len(input_list):
        result.append([input_key[i], input_list[i]])
        i += 1
    # fold the next entry's values into the current one while the keys match
    j = 0
    while j + 1 < len(result):
        if result[j][0] == result[j + 1][0]:
            result[j] += result[j + 1][1:]
            del result[j + 1]
        else:
            j += 1
    return result

print(merge_group(my_list, keys))
How do I access a list inside a list?
For example:
data = [ [[list([1,2,3]), list([0,1])]], [[list([4,5,6]), list([1,1])]] ]
The output should be:
output = [[1,2,3],[4,5,6]]
Access the individual lists by index, as follows:
data[0]        # [[[1, 2, 3], [0, 1]]]
data[0][0]     # [[1, 2, 3], [0, 1]]
data[0][0][0]  # [1, 2, 3]
data[1][0][0]  # [4, 5, 6]
This is what I did:
>>> data = [ [[list([1,2,3]), list([0,1])]], [[list([4,5,6]), list([1,1])]] ]
>>> output = list([row[0][0] for row in data])
>>> print (output)
[[1, 2, 3], [4, 5, 6]]
Can anyone help me? This is what I want to do:
x = [[1,2,3,4,5],[6,7,8,9,10]]
y= [0,1]
desired_output = [
[[1,2,3,4,5],[0,1]],
[[6,7,8,9,10],[0,1]]
]
I tried putting it in a for loop:
>>> x = [[1,2,3,4,5],[6,7,8,9,10]]
>>> for value in x:
...     a = []
...     a += ([x,y])
...     print(a)
...
[[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], [0, 1]]
[[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], [0, 1]]
I also tried this:
>>> for value in x:
...     a = []
...     a += ([x,y])
...     print(a)
...
[[1, 2, 3, 4, 5], [0, 1]]
[[1, 2, 3, 4, 5], [0, 1]]
[[1, 2, 3, 4, 5], [0, 1]]
[[1, 2, 3, 4, 5], [0, 1]]
[[1, 2, 3, 4, 5], [0, 1]]
Thank you for helping. I need this to attach labels to my data for a neural network.
You can use a list comprehension and iterate over each sublist in x. Since you're inserting y into several sublists, you might want to insert a copy of the list rather than the original.
[[i, y[:]] for i in x]
Or,
[[i, y.copy()] for i in x]
[[[1, 2, 3, 4, 5], [0, 1]], [[6, 7, 8, 9, 10], [0, 1]]]
The copy is a safety precaution. To understand why, consider this example:
z = [[i, y] for i in x] # inserting y (reference copy)
y[0] = 12345
print(z)
[[[1, 2, 3, 4, 5], [12345, 1]], [[6, 7, 8, 9, 10], [12345, 1]]] # oops
Modifying the original y or the y in any other sublist will reflect changes across all sublists. You can prevent that by inserting a copy instead, which is what I've done at the top.
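Repeating the same experiment with y.copy() shows the difference: modifying y afterwards no longer affects z, because each sublist holds its own list object.

```python
x = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
y = [0, 1]

z = [[i, y.copy()] for i in x]  # each sublist gets its own copy of y
y[0] = 12345  # modify the original afterwards
print(z)  # [[[1, 2, 3, 4, 5], [0, 1]], [[6, 7, 8, 9, 10], [0, 1]]]
```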
Try this:
z = [None] * len(x)
for i in range(len(x)):
    z[i] = [x[i], y]