numpy selecting elements in sub array using slicing [duplicate] - python-3.x

I have a list like this:
a = [[4.0, 4, 4.0], [3.0, 3, 3.6], [3.5, 6, 4.8]]
I want an outcome like this (EVERY first element in the list):
4.0, 3.0, 3.5
I tried a[::1][0], but it doesn't work

You can get the index [0] from each element in a list comprehension
>>> [i[0] for i in a]
[4.0, 3.0, 3.5]

Use zip:
columns = zip(*rows) #transpose rows to columns
print columns[0] #print the first column
#you can also do more with the columns
print columns[1] # or print the second column
columns.append([7,7,7]) #add a new column to the end
backToRows = zip(*columns) # now we are back to rows with a new column
print backToRows
You can also use numpy:
a = numpy.array(a)
print a[:,0]
Edit:
zip object is not subscriptable. It need to be converted to list to access as list:
column = list(zip(*row))

You could use this:
a = ((4.0, 4, 4.0), (3.0, 3, 3.6), (3.5, 6, 4.8))
a = np.array(a)
a[:,0]
returns >>> array([4. , 3. , 3.5])

You can get it like
[ x[0] for x in a]
which will return a list of the first element of each list in a

Compared the 3 methods
2D list: 5.323603868484497 seconds
Numpy library : 0.3201274871826172 seconds
Zip (Thanks to Joran Beasley) : 0.12395167350769043 seconds
D2_list=[list(range(100))]*100
t1=time.time()
for i in range(10**5):
for j in range(10):
b=[k[j] for k in D2_list]
D2_list_time=time.time()-t1
array=np.array(D2_list)
t1=time.time()
for i in range(10**5):
for j in range(10):
b=array[:,j]
Numpy_time=time.time()-t1
D2_trans = list(zip(*D2_list))
t1=time.time()
for i in range(10**5):
for j in range(10):
b=D2_trans[j]
Zip_time=time.time()-t1
print ('2D List:',D2_list_time)
print ('Numpy:',Numpy_time)
print ('Zip:',Zip_time)
The Zip method works best.
It was quite useful when I had to do some column wise processes for mapreduce jobs in the cluster servers where numpy was not installed.

If you have access to numpy,
import numpy as np
a_transposed = a.T
# Get first row
print(a_transposed[0])
The benefit of this method is that if you want the "second" element in a 2d list, all you have to do now is a_transposed[1]. The a_transposed object is already computed, so you do not need to recalculate.
Description
Finding the first element in a 2-D list can be rephrased as find the first column in the 2d list. Because your data structure is a list of rows, an easy way of sampling the value at the first index in every row is just by transposing the matrix and sampling the first list.

Try using
for i in a :
print(i[0])
i represents individual row in a.So,i[0] represnts the 1st element of each row.

Related

Set decimal values to 2 points in list under list pandas

I am trying to set max decimal values upto 2 digit for result of a nested list. I have already tried to set precision and tried other things but can not find a way.
r_ij_matrix = variables[1]
print(type(r_ij_matrix))
print(type(r_ij_matrix[0]))
pd.set_option('display.expand_frame_repr', False)
pd.set_option("display.precision", 2)
data = pd.DataFrame(r_ij_matrix, columns= Attributes, index= Names)
df = data.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])])
df.set_properties(**{'text-align': 'center'})
df.set_caption('Table: Combined Decision Matrix')
You can solve your problem with the apply() method of the dataframe. You can do something like that :
df.apply(lambda x: [[round(elt, 2) for elt in list_] for list_ in x])
Solved it by copying the list to another with the desired decimal points. Thanks everyone.
rij_matrix = variables[1]
rij_nparray = np.empty([8, 6, 3])
for i in range(8):
for j in range(6):
for k in range(3):
rij_nparray[i][j][k] = round(rij_matrix[i][j][k], 2)
rij_list = rij_nparray.tolist()
pd.set_option('display.expand_frame_repr', False)
data = pd.DataFrame(rij_list, columns= Attributes, index= Names)
df = data.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])])
df.set_properties(**{'text-align': 'center'})
df.set_caption('Table: Normalized Fuzzy Decision Matrix (r_ij)')
applymap seems to be good here:
but there is a BUT: be aware that it is propably not the best idea to store lists as values of a df, you just give up the functionality of pandas. and also after formatting them like this, there are stored as strings. This (if really wanted) should only be for presentation.
df.applymap(lambda lst: list(map("{:.2f}".format, lst)))
Output:
A B
0 [2.05, 2.28, 2.49] [3.11, 3.27, 3.42]
1 [2.05, 2.28, 2.49] [3.11, 3.27, 3.42]
2 [2.05, 2.28, 2.49] [3.11, 3.27, 3.42]
Used Input:
df = pd.DataFrame({
'A': [[2.04939015319192, 2.280350850198276, 2.4899799195977463],
[2.04939015319192, 2.280350850198276, 2.4899799195977463],
[2.04939015319192, 2.280350850198276, 2.4899799195977463]],
'B': [[3.1144823004794873, 3.271085446759225, 3.420526275297414],
[3.1144823004794873, 3.271085446759225, 3.420526275297414],
[3.1144823004794873, 3.271085446759225, 3.420526275297414]]})

Python Multivariate Apply/Map/Applymap

Trying to use apply/map for multiple column values. works fine for one column. Made an example below of applying to a single column, need help making the commented out part work for multiple columns as inputs.
I need it to take 2 values on same row but from different columns as inputs to the function, perform a calc, and then place the result in the new column. If there is an efficent/optimized way to do this with apply/map/or both please let me know, Thanks!!!
import numpy as np
import pandas as pd
#gen some data to work with
df = pd.DataFrame({'Col_1': [2, 4, 6, 8],
'Col_2': [11, 22, 33, 44],
'Col_3': [2, 1, 2, 1]})
#Make new empty col to write to
df[4]=""
#some Function of one variable
def func(a):
return(a**2)
#applies function itemwise to coll & puts result in new column correctly
df[4] = df['Col_1'].map(func)
"""
#Want Multi Variate version. apply itemwise function (where each item is from different columns), compute, then add to new column
def func(a,b):
return(a**2+b**2)
#Take Col_1 value and Col_2 value; run function of multi variables, display result in new column...???
df[4] = df['Col_1']df['Col_2'].map(func(a,b))
"""
You can pass each row of a dataframe as a Series, using apply function. And then in the function itself you can use its value according to the requirement.
def func(df):
return (df['Col_1']**2+df['Col_2']**2)
df[4] = df.apply(func, axis = 1)
Do refer Documentation to explore.

Forming an array within an array using np.append()

I want to form an array within an array. So that if I call Array_1[0] will give me the first value and Array_1[1] will give me the second value. In this instance, the first value will be 1 and the second value will be 2. Similar to tuples within a list. I have written the following code but I end up just with just one array containing all elements.
import numpy as np
list_A = [1,3,5]
list_B = [2,4,6]
Array_1 = np.array([])
for i,j in zip(list_A,list_B):
Array_1 = np.append(arr = Array_1, values = [i,j])
print(Array_2.astype(int))
#output: [1,2,3,4,5,6]
#Desired output: [[1,2],[3,4],[5,6]]
Any ideas if the desired output can be achieved through a numpy array. (btw not using a list and tuples)

Effective ways to group things into list

I am doing a K-means project and I have to do it by hand, which is why I am trying to figure out what is the best ways to group things according to their last values into a list or a dictionary. Here is what I am talking about
list_of_tuples = [(honey,1),(bee,2),(tree,5),(flower,2),(computer,5),(key,1)]
Now my ultimate goal is to be able to sort out the list and have 3 different lists each with its respected element
"""This is the goal"""
list_1 = [honey,key]
list_2 = [bee,flower]
list_3 = [tree, computer]
I can use a lot of if statements and a for loop, but is there a more efficient way to do it?
If you're not opposed to using something like pandas, you could do something along these lines:
import pandas as pd
list_1, list_2, list_3 = pd.DataFrame(list_of_tuples).groupby(1)[0].apply(list).values
Result:
In [19]: list_1
Out[19]: ['honey', 'key']
In [20]: list_2
Out[20]: ['bee', 'flower']
In [21]: list_3
Out[21]: ['tree', 'computer']
Explanation:
pd.DataFrame(list_of_tuples).groupby(1) groups your list of tuples by the value at index 1, then you extract the values as lists of index 0 with [0].apply(list).values. This gives you an array of lists as below:
array([list(['honey', 'key']), list(['bee', 'flower']),
list(['tree', 'computer'])], dtype=object)
Something to the effect can be achieved with a dictionary and a for loop, using the second element of the tuple as a key value.
list_of_tuples = [("honey",1),("bee",2),("tree",5),("flower",2),("computer",5),("key",1)]
dict_list = {}
for t in list_of_tuples:
# create key and a single element list if key doesn't exist yet
# append to existing list otherwise
if t[1] not in dict_list.keys():
dict_list[t[1]] = [t[0]]
else:
dict_list[t[1]].append( t[0] )
list_1, list_2, list_3 = dict_list.values()

Group two dimensional list records Python [duplicate]

This question already has answers here:
Python summing values in list if it exists in another list
(5 answers)
Closed 4 years ago.
I have a list of lists (string,integer)
eg:
my_list=[["apple",5],["banana",6],["orange",6],["banana",9],["orange",3],["apple",111]]
I'd like to sum the same items and finally get this:
my2_list=[["apple",116],["banana",15],["orange",9]]
You can use itertools.groupby on the sorted list:
from itertools import groupby
my_list=[["apple",5],["banana",6],["orange",6],["banana",9],["orange",3],["apple",111]]
my_list2 = []
for i, g in groupby(sorted(my_list), key=lambda x: x[0]):
my_list2.append([i, sum(v[1] for v in g)])
print(my_list2)
# [['apple', 116], ['banana', 15], ['orange', 9]]
Speaking of SQL Group By and pre-sorting:
The operation of groupby() is similar to the uniq filter in Unix. It
generates a break or new group every time the value of the key
function changes (which is why it is usually necessary to have sorted
the data using the same key function). That behavior differs from
SQL’s GROUP BY which aggregates common elements regardless of their
input order.
Emphasis Mine
from collections import defaultdict
my_list= [["apple",5],["banana",6],["orange",6],["banana",9],["orange",3],["apple",111]]
result = defaultdict(int)
for fruit, value in my_list:
result[fruit] += value
result = result.items()
print result
Or you can keep result as dictionary
Using Pandas and groupby:
import pandas as pd
>>> pd.DataFrame(my_list, columns=['fruit', 'count']).groupby('fruit').sum()
count
fruit
apple 116
banana 15
orange 9
from itertools import groupby
[[k, sum(v for _, v in g)] for k, g in groupby(sorted(my_list), key = lambda x: x[0])]
# [['apple', 116], ['banana', 15], ['orange', 9]]
If you dont want the order to preserved, then plz use the below code.
my_list=[["apple",5],["banana",6],["orange",6],["banana",9],["orange",3],["apple",111]]
my_dict1 = {}
for d in my_list:
if d[0] in my_dict1.keys():
my_dict1[d[0]] += d[1]
else:
my_dict1[d[0]] = d[1]
my_list2 = [[k,v] for (k,v) in my_dict1.items()]

Resources