Effective ways to group things into lists - python-3.x

I am doing a K-means project and I have to do it by hand, which is why I am trying to figure out the best way to group items into lists (or a dictionary) according to their last value. Here is what I mean:
list_of_tuples = [("honey", 1), ("bee", 2), ("tree", 5), ("flower", 2), ("computer", 5), ("key", 1)]
My ultimate goal is to split the list into 3 different lists, each containing its respective elements:
# This is the goal
list_1 = ["honey", "key"]
list_2 = ["bee", "flower"]
list_3 = ["tree", "computer"]
I can use a lot of if statements and a for loop, but is there a more efficient way to do it?

If you're not opposed to using something like pandas, you could do something along these lines:
import pandas as pd
list_1, list_2, list_3 = pd.DataFrame(list_of_tuples).groupby(1)[0].apply(list).values
Result:
In [19]: list_1
Out[19]: ['honey', 'key']
In [20]: list_2
Out[20]: ['bee', 'flower']
In [21]: list_3
Out[21]: ['tree', 'computer']
Explanation:
pd.DataFrame(list_of_tuples).groupby(1) groups your list of tuples by the value at index 1, then you extract the values as lists of index 0 with [0].apply(list).values. This gives you an array of lists as below:
array([list(['honey', 'key']), list(['bee', 'flower']),
list(['tree', 'computer'])], dtype=object)

Something to that effect can be achieved with a dictionary and a for loop, using the second element of each tuple as the key:
list_of_tuples = [("honey",1),("bee",2),("tree",5),("flower",2),("computer",5),("key",1)]
dict_list = {}
for t in list_of_tuples:
    # create key and a single element list if key doesn't exist yet
    # append to existing list otherwise
    if t[1] not in dict_list:
        dict_list[t[1]] = [t[0]]
    else:
        dict_list[t[1]].append(t[0])
# dicts keep insertion order (Python 3.7+), so the lists come out
# in the order each key first appears: 1, 2, 5
list_1, list_2, list_3 = dict_list.values()
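If you'd rather not rely on the order in which keys first appear, here is a minimal sketch that swaps in collections.defaultdict(list) and sorts the keys explicitly before unpacking. The helper name group_by_last is hypothetical; the data is the list from the question.

```python
from collections import defaultdict


def group_by_last(pairs):
    """Group the first element of each pair by its last element (hypothetical helper)."""
    groups = defaultdict(list)  # a missing key starts out as a fresh empty list
    for item, key in pairs:
        groups[key].append(item)
    # Return the groups ordered by key so the unpacking order is explicit
    return [groups[k] for k in sorted(groups)]


list_of_tuples = [("honey", 1), ("bee", 2), ("tree", 5),
                  ("flower", 2), ("computer", 5), ("key", 1)]
list_1, list_2, list_3 = group_by_last(list_of_tuples)
print(list_1, list_2, list_3)
# ['honey', 'key'] ['bee', 'flower'] ['tree', 'computer']
```

Sorting costs a little extra, but it makes the mapping from key to list_1/list_2/list_3 independent of input order.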

Related

Python: how can I create a subset from an integer array list based on a range?

I am looking for a way to get subsets from an integer array based on a certain range.
For example
Input
array1=[3,5,4,12,34,54]
# Now getting a subset for every 3 elements
Output
subset= [(3,5,4), (12,34,54)]
I know it's probably simple, but I couldn't find the right way to get this output.
Any help is appreciated.
Thanks
Consider using a list comprehension:
>>> array1 = [3, 5, 4, 12, 34, 54]
>>> subset = [tuple(array1[i:i+3]) for i in range(0, len(array1), 3)]
>>> subset
[(3, 5, 4), (12, 34, 54)]
Links to other relevant documentation:
tuples
ranges
arr = [1,2,3,4,5,6]
sets = [tuple(arr[i:i+3]) for i in range(0, len(arr), 3)]
print(sets)
We take a slice of three values from the array and turn it into a tuple. The slice start comes from range(0, len(arr), 3), which steps by three, so a new tuple is created for every 3 items.
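The same slicing idea, wrapped in a small helper (the name chunks is hypothetical), also shows what happens when the length is not a multiple of 3: slicing past the end simply returns fewer items, so the last tuple is shorter instead of padded.

```python
def chunks(seq, size):
    """Split seq into tuples of `size` items; the last tuple may be shorter (hypothetical helper)."""
    # range(0, len(seq), size) yields the start index of each chunk: 0, size, 2*size, ...
    return [tuple(seq[i:i + size]) for i in range(0, len(seq), size)]


print(chunks([3, 5, 4, 12, 34, 54], 3))  # [(3, 5, 4), (12, 34, 54)]
print(chunks([3, 5, 4, 12, 34], 3))      # [(3, 5, 4), (12, 34)]
```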
You can use this code:
from itertools import zip_longest
input_list = [3,5,4,12,34,54]
# three references to the same iterator, so each output tuple takes 3 consecutive items
iterables = [iter(input_list)] * 3
slices = zip_longest(*iterables, fillvalue=None)
output_list = []
for chunk in slices:  # "chunk" avoids shadowing the built-in slice
    output_list.append(chunk)
print(output_list)
You could use the zip_longest function from itertools
https://docs.python.org/3.0/library/itertools.html#itertools.zip_longest
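Unlike slicing, zip_longest pads the last group when the input length is not a multiple of 3. A minimal sketch, loosely modeled on the grouper recipe in the itertools documentation (the function here is a hypothetical local helper, not an importable API):

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one (hypothetical helper)."""
    # n references to ONE iterator: each output tuple consumes n consecutive items
    args = [iter(iterable)] * n
    return list(zip_longest(*args, fillvalue=fillvalue))


print(grouper([3, 5, 4, 12, 34, 54], 3))           # [(3, 5, 4), (12, 34, 54)]
print(grouper([3, 5, 4, 12, 34], 3))               # [(3, 5, 4), (12, 34, None)]
print(grouper([3, 5, 4, 12, 34], 3, fillvalue=0))  # [(3, 5, 4), (12, 34, 0)]
```

Pick slicing if a shorter final group is fine, and zip_longest if every group must have exactly n entries.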

numpy selecting elements in sub array using slicing [duplicate]

I have a list like this:
a = [[4.0, 4, 4.0], [3.0, 3, 3.6], [3.5, 6, 4.8]]
I want an outcome like this (the first element of EVERY sublist):
4.0, 3.0, 3.5
I tried a[::1][0], but it doesn't work
You can take index [0] from each element in a list comprehension:
>>> [i[0] for i in a]
[4.0, 3.0, 3.5]
Use zip:
rows = a  # the list of lists from the question
columns = list(zip(*rows))  # transpose rows to columns (a list of tuples)
print(columns[0])  # print the first column
# you can also do more with the columns
print(columns[1])  # or print the second column
columns.append((7, 7, 7))  # add a new column to the end
backToRows = list(zip(*columns))  # now we are back to rows with a new column
print(backToRows)
You can also use numpy:
import numpy
a = numpy.array(a)
print(a[:, 0])
Edit:
In Python 3, a zip object is not subscriptable. It needs to be converted to a list before you can index it:
column = list(zip(*rows))
You could use this:
import numpy as np
a = ((4.0, 4, 4.0), (3.0, 3, 3.6), (3.5, 6, 4.8))
a = np.array(a)
a[:,0]
which returns array([4. , 3. , 3.5])
You can get it like this:
[ x[0] for x in a]
which will return a list of the first element of each list in a
I compared the 3 methods:
2D list: 5.323603868484497 seconds
Numpy library : 0.3201274871826172 seconds
Zip (Thanks to Joran Beasley) : 0.12395167350769043 seconds
import time
import numpy as np

D2_list=[list(range(100))]*100
t1=time.time()
for i in range(10**5):
    for j in range(10):
        b=[k[j] for k in D2_list]
D2_list_time=time.time()-t1
array=np.array(D2_list)
t1=time.time()
for i in range(10**5):
    for j in range(10):
        b=array[:,j]
Numpy_time=time.time()-t1
# the transpose is computed once here, outside the timed loop
D2_trans = list(zip(*D2_list))
t1=time.time()
for i in range(10**5):
    for j in range(10):
        b=D2_trans[j]
Zip_time=time.time()-t1
print('2D List:',D2_list_time)
print('Numpy:',Numpy_time)
print('Zip:',Zip_time)
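The zip method was the fastest here. Note, though, that D2_trans is computed once before the timer starts, so the zip timing measures only indexing into an already transposed list, while the other two methods extract the column on every iteration.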
It was quite useful when I had to do some column-wise processing for MapReduce jobs on cluster servers where numpy was not installed.
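If you have access to numpy:
import numpy as np
a_transposed = np.array(a).T  # a is a plain list, so convert it before using .T
# Get the first row of the transpose, i.e. the first column of a
print(a_transposed[0])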
The benefit of this method is that if you want the "second" element in a 2d list, all you have to do now is a_transposed[1]. The a_transposed object is already computed, so you do not need to recalculate.
Description
Finding the first element of each row in a 2-D list can be rephrased as finding the first column of the 2-D list. Because your data structure is a list of rows, an easy way to sample the value at the first index of every row is to transpose the matrix and take the first row of the result.
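Without numpy, the same transpose idea can be sketched in plain Python with zip. Note that the columns come back as tuples rather than lists, and that the data below is the list from the question.

```python
a = [[4.0, 4, 4.0], [3.0, 3, 3.6], [3.5, 6, 4.8]]

# zip(*a) passes each row as a separate argument, so zip pairs up
# the 0th items, then the 1st items, and so on: rows become columns.
columns = list(zip(*a))

print(columns[0])  # (4.0, 3.0, 3.5)  first column
print(columns[1])  # (4, 3, 6)        second column
```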
Try using:
for i in a:
    print(i[0])
i represents an individual row in a, so i[0] represents the 1st element of each row.

Appending value to a list based on dictionary key

I started writing Python scripts for my research this past summer, and have been picking up the language as I go. For my current work, I have a dictionary of lists, sample_range_dict, that is initialized with descriptor_cols as the keys and empty lists for values. Sample code is below:
import numpy as np
import pandas as pd
def rangeFunc(arr):
    return (np.max(arr) - np.min(arr))
df_sample = pd.DataFrame(np.random.rand(2000, 4), columns=list("ABCD")) #random dataframe for testing
col_list = df_sample.columns
sample_range_dict = dict.fromkeys(col_list, []) #creates dictionary where each key pairs with an empty list
rand_df = df_sample.sample(n=20) #make a new dataframe with 20 random rows of df_sample
I want to go through each column from rand_df and calculate the range of values, putting each range in the list with the specified column name (e.g. sample_range_dict["A"] = [range in column A]). The following is the code I initially thought to use for this:
for d in col_list:
    sample_range_dict[d].append(rangeFunc(rand_df[d].tolist()))
However, instead of each key having one item in the list, printing sample_range_dict shows each key having an identical list of 4 values:
{'A': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'B': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'C': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'D': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744]}
I've determined that the first value is the range for "A", second value is the range for "B", and so on. My question is about why this is happening, and how I could rewrite the code in order to get one item in the list for each key.
P.S. I'm looking to make this an iterative process, hence using lists instead of single numbers.
The issue is this line:
sample_range_dict = dict.fromkeys(col_list, [])
You only created one list. You don't have four lists with the same elements; you have one list, and four references to it. When you add to it via one reference, the element is visible through the other references, because it's the same list:
>>> a = dict.fromkeys(['x', 'y', 'z'], [])
>>> a['x'] is a['y']
True
>>> a['x'].append(5)
>>> a['y']
[5]
If you want each key to have a different list, either create a new list for each key:
>>> a = { k: [] for k in ['x', 'y', 'z'] }
>>> a['x'] is a['y']
False
>>> a['x'].append(5)
>>> a['y']
[]
Or use a defaultdict which will do it for you:
>>> from collections import defaultdict
>>> a = defaultdict(list)
>>> a['x'] is a['y']
False
>>> a['x'].append(5)
>>> a['y']
[]
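Applied to the original problem, a minimal sketch of the dict-comprehension fix looks like this. To keep it runnable without pandas, it assumes hypothetical integer sample data in place of rand_df and uses a renamed range_func equivalent to rangeFunc (max minus min); the outer loop stands in for the iterative process mentioned in the P.S.

```python
def range_func(values):
    """Range of a sequence: max minus min (same idea as rangeFunc in the question)."""
    return max(values) - min(values)


# Hypothetical stand-in for two rounds of rand_df column data
samples = [
    {"A": [1, 9, 5], "B": [2, 4, 3]},
    {"A": [3, 7, 6], "B": [0, 10, 5]},
]

col_list = ["A", "B"]
# The comprehension evaluates [] once per key, so every key gets its own list
sample_range_dict = {col: [] for col in col_list}

for sample in samples:  # one pass per sampling round
    for col in col_list:
        sample_range_dict[col].append(range_func(sample[col]))

print(sample_range_dict)  # {'A': [8, 4], 'B': [2, 10]}
```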

Group two dimensional list records Python [duplicate]

This question already has answers here:
Python summing values in list if it exists in another list
(5 answers)
Closed 4 years ago.
I have a list of lists (string, integer), e.g.:
my_list=[["apple",5],["banana",6],["orange",6],["banana",9],["orange",3],["apple",111]]
I'd like to sum the values of identical items and finally get this:
my2_list=[["apple",116],["banana",15],["orange",9]]
You can use itertools.groupby on the sorted list:
from itertools import groupby
my_list=[["apple",5],["banana",6],["orange",6],["banana",9],["orange",3],["apple",111]]
my_list2 = []
for i, g in groupby(sorted(my_list), key=lambda x: x[0]):
    my_list2.append([i, sum(v[1] for v in g)])
print(my_list2)
# [['apple', 116], ['banana', 15], ['orange', 9]]
Speaking of SQL Group By and pre-sorting:
The operation of groupby() is similar to the uniq filter in Unix. It
generates a break or new group every time the value of the key
function changes (which is why it is usually necessary to have sorted
the data using the same key function). That behavior differs from
SQL’s GROUP BY which aggregates common elements regardless of their
input order.
Emphasis Mine
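To see why the pre-sorting matters, here is a small sketch that runs groupby on the same data with and without sorting by the same key function (the helper first is just a named version of the lambda used above).

```python
from itertools import groupby

my_list = [["apple", 5], ["banana", 6], ["orange", 6],
           ["banana", 9], ["orange", 3], ["apple", 111]]


def first(pair):
    return pair[0]


# Unsorted: a new group starts whenever the key changes, so fruits repeat
unsorted_groups = [[k, sum(v for _, v in g)] for k, g in groupby(my_list, key=first)]
print(unsorted_groups)
# [['apple', 5], ['banana', 6], ['orange', 6], ['banana', 9], ['orange', 3], ['apple', 111]]

# Sorted by the same key first: each fruit forms exactly one group
sorted_groups = [[k, sum(v for _, v in g)]
                 for k, g in groupby(sorted(my_list, key=first), key=first)]
print(sorted_groups)
# [['apple', 116], ['banana', 15], ['orange', 9]]
```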
from collections import defaultdict
my_list= [["apple",5],["banana",6],["orange",6],["banana",9],["orange",3],["apple",111]]
result = defaultdict(int)
for fruit, value in my_list:
    result[fruit] += value
result = list(result.items())
print(result)
Or you can keep result as a dictionary.
Using Pandas and groupby:
import pandas as pd
>>> pd.DataFrame(my_list, columns=['fruit', 'count']).groupby('fruit').sum()
        count
fruit
apple     116
banana     15
orange      9
from itertools import groupby
[[k, sum(v for _, v in g)] for k, g in groupby(sorted(my_list), key = lambda x: x[0])]
# [['apple', 116], ['banana', 15], ['orange', 9]]
If you don't need sorted output, you can use a plain dictionary instead:
my_list=[["apple",5],["banana",6],["orange",6],["banana",9],["orange",3],["apple",111]]
my_dict1 = {}
for d in my_list:
    if d[0] in my_dict1:
        my_dict1[d[0]] += d[1]
    else:
        my_dict1[d[0]] = d[1]
my_list2 = [[k,v] for (k,v) in my_dict1.items()]
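A variation that swaps in collections.Counter (not used in the answers above) removes the if/else, because a missing key reads as 0. A minimal sketch on the same data; since Counter is a dict subclass, items come out in first-appearance order on Python 3.7+.

```python
from collections import Counter

my_list = [["apple", 5], ["banana", 6], ["orange", 6],
           ["banana", 9], ["orange", 3], ["apple", 111]]

totals = Counter()
for fruit, value in my_list:
    totals[fruit] += value  # a missing key counts as 0, like defaultdict(int)

my_list2 = [[k, v] for k, v in totals.items()]
print(my_list2)  # [['apple', 116], ['banana', 15], ['orange', 9]]
```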

Python: How to find the average on each array in the list?

Let's say I have a list with three arrays as follows:
[(1,2,0),(2,9,6),(2,3,6)]
Is it possible to get the average of each "slot" (index position) across the arrays in the list?
For example:
(1+2+2)/3, (2+9+3)/3, (0+6+6)/3
and turn the results into a new list with only 3 integers.
You can use zip to associate all of the elements in each of the interior tuples by index
tups = [(1,2,0),(2,9,6),(2,3,6)]
print([sum(x)/len(x) for x in zip(*tups)])
# [1.6666666666666667, 4.666666666666667, 4.0]
You can also do something like sum(x)//len(x) or round(sum(x)/len(x)) inside the list comprehension to get an integer.
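The two integer options give different results, because floor division always rounds down while round() goes to the nearest integer. A quick sketch on the question's data:

```python
tups = [(1, 2, 0), (2, 9, 6), (2, 3, 6)]

# zip(*tups) yields one tuple per "slot": (1, 2, 2), (2, 9, 3), (0, 6, 6)
slots = list(zip(*tups))

floor_avgs = [sum(s) // len(s) for s in slots]          # floor division: rounds down
rounded_avgs = [round(sum(s) / len(s)) for s in slots]  # nearest integer

print(floor_avgs)    # [1, 4, 4]
print(rounded_avgs)  # [2, 5, 4]
```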
Here are a couple of ways you can do it (note that these average each tuple, not each slot):
data = [(1,2,0),(2,9,6),(2,3,6)]
avg_array = []
for tu in data:
    avg_array.append(sum(tu)/len(tu))
print(avg_array)
Using a list comprehension:
data = [(1,2,0),(2,9,6),(2,3,6)]
comp = [ sum(i)/len(i) for i in data]
print(comp)
This can be achieved with something like the following.
Create an empty array. Loop through your current array and use the sum and len functions to calculate each average, then append it to your new array.
array = [(1,2,0),(2,9,6),(2,3,6)]
arraynew = []
for i in range(0,len(array)):
    arraynew.append(sum(array[i]) / len(array[i]))
print(arraynew)
As you were told in the comments, it's pretty easy with sum and len.
But in Python I would do something like this, assuming you want to keep decimal precision:
data = [(1, 2, 0), (2, 9, 6), (2, 3, 6)]
res = list(map(lambda l: round(sum(l) / len(l), 2), data))
Output:
[1.0, 5.67, 3.67]
But since you said you wanted 3 ints in your question, it would look like this:
res = list(map(lambda l: sum(l) // len(l), data))
Output:
[1, 5, 3]
Edit:
To average the values at the same index of each tuple, the most elegant method is the solution provided by @PatrickHaugh.
On the other hand, if you are not fond of list comprehensions and built-in functions such as zip, here's a slightly longer and less elegant version using a for loop:
arr = []
for i in range(len(data[0])):
    arr.append(sum(l[i] for l in data) // len(data))
print(arr)
Output:
[1, 4, 4]
