Set decimal values to 2 points in list under list pandas - python-3.x

I am trying to limit the values in a nested list to at most 2 decimal places. I have already tried setting the display precision, among other things, but cannot find a way that works.
r_ij_matrix = variables[1]
print(type(r_ij_matrix))
print(type(r_ij_matrix[0]))
pd.set_option('display.expand_frame_repr', False)
pd.set_option("display.precision", 2)
data = pd.DataFrame(r_ij_matrix, columns= Attributes, index= Names)
df = data.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])])
df.set_properties(**{'text-align': 'center'})
df.set_caption('Table: Combined Decision Matrix')

You can solve this with the DataFrame's apply() method, for example:
df.apply(lambda x: [[round(elt, 2) for elt in list_] for list_ in x])

Solved it by copying the list to another with the desired decimal points. Thanks everyone.
rij_matrix = variables[1]
rij_nparray = np.empty([8, 6, 3])
for i in range(8):
    for j in range(6):
        for k in range(3):
            rij_nparray[i][j][k] = round(rij_matrix[i][j][k], 2)
rij_list = rij_nparray.tolist()
pd.set_option('display.expand_frame_repr', False)
data = pd.DataFrame(rij_list, columns= Attributes, index= Names)
df = data.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])])
df.set_properties(**{'text-align': 'center'})
df.set_caption('Table: Normalized Fuzzy Decision Matrix (r_ij)')
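The triple loop above can probably be replaced by a single vectorized call. A minimal sketch, assuming the nested list is rectangular (the 8 x 6 x 3 shape used in the answer); the `rij_matrix` below is made-up sample data standing in for `variables[1]`:

```python
import numpy as np

# Hypothetical stand-in for variables[1]: an 8 x 6 x 3 nested list of floats.
rij_matrix = [[[0.123456 * (i + j + k + 1) for k in range(3)]
               for j in range(6)]
              for i in range(8)]

# np.round works element-wise on the whole array; tolist() restores nested lists.
rij_list = np.round(np.asarray(rij_matrix), 2).tolist()

print(rij_list[0][0])
```

The resulting `rij_list` can be passed to `pd.DataFrame` the same way as in the answer above.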

applymap seems to be a good fit here, with one caveat:
be aware that it is probably not the best idea to store lists as values in a DataFrame, because you give up most of pandas' functionality. Also, after formatting them like this, the values are stored as strings. This (if really wanted) should only be used for presentation.
df.applymap(lambda lst: list(map("{:.2f}".format, lst)))
Output:
                    A                   B
0  [2.05, 2.28, 2.49]  [3.11, 3.27, 3.42]
1  [2.05, 2.28, 2.49]  [3.11, 3.27, 3.42]
2  [2.05, 2.28, 2.49]  [3.11, 3.27, 3.42]
Used Input:
df = pd.DataFrame({
'A': [[2.04939015319192, 2.280350850198276, 2.4899799195977463],
[2.04939015319192, 2.280350850198276, 2.4899799195977463],
[2.04939015319192, 2.280350850198276, 2.4899799195977463]],
'B': [[3.1144823004794873, 3.271085446759225, 3.420526275297414],
[3.1144823004794873, 3.271085446759225, 3.420526275297414],
[3.1144823004794873, 3.271085446759225, 3.420526275297414]]})
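If the rounded values should stay numeric rather than become strings, one option is to round each inner list cell by cell. A minimal sketch, reusing one row of the sample frame above for brevity; it goes through `Series.map` column by column, which should behave the same across pandas versions:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [[2.04939015319192, 2.280350850198276, 2.4899799195977463]],
    'B': [[3.1144823004794873, 3.271085446759225, 3.420526275297414]]})

# Round every float inside every cell's list; the values remain floats.
rounded = df.apply(lambda col: col.map(lambda lst: [round(v, 2) for v in lst]))

print(rounded)
```

The caveat about storing lists in a DataFrame still applies; this only avoids the conversion to strings.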

Related

numpy selecting elements in sub array using slicing [duplicate]

I have a list like this:
a = [[4.0, 4, 4.0], [3.0, 3, 3.6], [3.5, 6, 4.8]]
I want an outcome like this (EVERY first element in the list):
4.0, 3.0, 3.5
I tried a[::1][0], but it doesn't work
You can get the index [0] from each element in a list comprehension
>>> [i[0] for i in a]
[4.0, 3.0, 3.5]
Use zip:
columns = list(zip(*rows)) #transpose rows to columns
print(columns[0]) #print the first column
#you can also do more with the columns
print(columns[1]) # or print the second column
columns.append([7,7,7]) #add a new column to the end
backToRows = list(zip(*columns)) # now we are back to rows with a new column
print(backToRows)
You can also use numpy:
import numpy
a = numpy.array(a)
print(a[:,0])
Edit:
In Python 3, a zip object is not subscriptable. It needs to be converted to a list before it can be indexed:
columns = list(zip(*rows))
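Putting the zip approach together for Python 3, a minimal sketch using the list from the question:

```python
a = [[4.0, 4, 4.0], [3.0, 3, 3.6], [3.5, 6, 4.8]]

# zip(*a) pairs up the i-th items of every row; list() makes the result indexable.
columns = list(zip(*a))
first_column = list(columns[0])

# Adding a column and transposing back gives the rows with one extra item each.
columns.append((7, 7, 7))
back_to_rows = [list(row) for row in zip(*columns)]

print(first_column)
print(back_to_rows)
```

Note that zip yields tuples, so the sketch converts them back to lists where a list is wanted.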
You could use this:
a = ((4.0, 4, 4.0), (3.0, 3, 3.6), (3.5, 6, 4.8))
a = np.array(a)
a[:,0]
which returns array([4. , 3. , 3.5])
You can get it like
[ x[0] for x in a]
which will return a list of the first element of each list in a
Compared the 3 methods
2D list: 5.323603868484497 seconds
Numpy library : 0.3201274871826172 seconds
Zip (Thanks to Joran Beasley) : 0.12395167350769043 seconds
import time
import numpy as np
D2_list=[list(range(100))]*100
# 1) list comprehension over the 2D list
t1=time.time()
for i in range(10**5):
    for j in range(10):
        b=[k[j] for k in D2_list]
D2_list_time=time.time()-t1
# 2) numpy column slicing
array=np.array(D2_list)
t1=time.time()
for i in range(10**5):
    for j in range(10):
        b=array[:,j]
Numpy_time=time.time()-t1
# 3) indexing into a list transposed once with zip
D2_trans = list(zip(*D2_list))
t1=time.time()
for i in range(10**5):
    for j in range(10):
        b=D2_trans[j]
Zip_time=time.time()-t1
print ('2D List:',D2_list_time)
print ('Numpy:',Numpy_time)
print ('Zip:',Zip_time)
The Zip method works best.
It was quite useful when I had to do some column wise processes for mapreduce jobs in the cluster servers where numpy was not installed.
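For anyone wanting to repeat the comparison, here is a rough sketch using the standard `timeit` module instead of manual `time.time()` bookkeeping. The helper function names are made up for illustration, and the numbers will vary by machine. Keep in mind that the zip variant excludes the one-time cost of building the transposed list, and that numpy slicing returns a view rather than a new list, so the three are not doing exactly the same work.

```python
import timeit

import numpy as np

# Build independent rows (the original benchmark repeats one shared row object).
rows = [list(range(100)) for _ in range(100)]
array = np.array(rows)
transposed = list(zip(*rows))


def column_by_comprehension(j):
    return [row[j] for row in rows]


def column_by_numpy(j):
    return array[:, j]


def column_by_zip(j):
    return transposed[j]


for func in (column_by_comprehension, column_by_numpy, column_by_zip):
    seconds = timeit.timeit(lambda: func(3), number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s")
```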
If you have access to numpy,
import numpy as np
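a_transposed = np.array(a).T  # convert the list of rows to an array, then transpose
# Get the first row of the transposed array, i.e. the first column of a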
print(a_transposed[0])
The benefit of this method is that if you want the "second" element in a 2d list, all you have to do now is a_transposed[1]. The a_transposed object is already computed, so you do not need to recalculate.
Description
Finding the first element of every row in a 2-D list can be rephrased as finding the first column of the 2-D list. Because your data structure is a list of rows, an easy way of sampling the value at the first index in every row is to transpose the matrix and take the first list.
Try using
for i in a:
    print(i[0])
i represents an individual row in a, so i[0] represents the first element of each row.

How to remove rows in pandas dataframe column that contain the hyphen character?

I have a DataFrame given as follows:
new_dict = {'Area_sqfeet': '[1002, 322, 420-500,300,1.25acres,100-250,3.45 acres]'}
df = pd.DataFrame([new_dict])
df.head()
I want to remove hyphen values and change acres to sqfeet in this dataframe.
How may I do it efficiently?
Use list comprehension:
mylist = ["1002", "322", "420-500","300","1.25acres","100-250","3.45 acres"]
# ['1002', '322', '420-500', '300', '1.25acres', '100-250', '3.45 acres']
Step 1: Remove hyphens
filtered_list = [i for i in mylist if "-" not in i] # remove hyphens
Step 2: Convert acres to sqfeet:
final_list = [i if 'acres' not in i else float(i.split('acres')[0])*43560 for i in filtered_list] # convert to sq foot
#['1002', '322', '300', 54450.0, 150282.0]
Also, if you want to keep "sqfeet" next to the converted values, use this:
final_list = [i if 'acres' not in i else "{} sqfeet".format(float(i.split('acres')[0])*43560) for i in filtered_list]
# ['1002', '322', '300', '54450.0 sqfeet', '150282.0 sqfeet']
It's not clear if this is homework, and you haven't shown us what you have already tried per https://stackoverflow.com/help/how-to-ask
Here's something that might get you going in the right direction:
import pandas as pd
col_name = 'Area_sqfeet'
# per comment on your question, you need to make a dataframe with more
# than one row, your original question only had one row
new_list = ["1002", "322", "420-500","300","1.25acres","100-250","3.45 acres"]
df = pd.DataFrame(new_list)
df.columns = ["Area_sqfeet"]
# once you have the df as strings, here's how to remove the ones with hyphens
df = df[~df["Area_sqfeet"].str.contains("-")]
print(df.head())
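Both steps (dropping the hyphenated entries and converting acres) can also be done with pandas string methods. A minimal sketch, assuming every remaining entry starts with a plain decimal number and that 1 acre = 43560 sq ft, the factor used in the list-comprehension answer; the variable names are illustrative only:

```python
import pandas as pd

SQFT_PER_ACRE = 43560  # same factor the list-comprehension answer uses

s = pd.Series(["1002", "322", "420-500", "300", "1.25acres", "100-250", "3.45 acres"])

# Drop the range-like entries that contain a hyphen.
s = s[~s.str.contains("-")]

# Pull out the leading number, then scale the entries that mention acres.
numbers = s.str.extract(r"([\d.]+)", expand=False).astype(float)
is_acres = s.str.contains("acres")
area_sqfeet = numbers.where(~is_acres, numbers * SQFT_PER_ACRE)

print(area_sqfeet.tolist())
```

Unlike the list-comprehension version, this leaves every value as a float, which is usually easier to work with afterwards.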

compare index and column in data frame with dictionary

I have a dictionary:
d = {'A-A': 1, 'A-B':2, 'A-C':3, 'B-A':5, 'B-B':1, 'B-C':5, 'C-A':3,
'C-B':4, 'C-C': 9}
and a list:
L = ['A', 'B', 'C']
I have a DataFrame:
df =pd.DataFrame(columns = L, index=L)
I would like to fill each row of df with values from the dictionary, based on the dictionary keys. For example:
   A  B  C
A  1  2  3
B  5  1  5
C  3  4  9
I tried doing that by:
df.loc[L[0]]=[1,2,3]
df.loc[L[1]]=[5,1,5]
df.loc[L[2]] =[3,4,9]
Is there another way to do this, especially when there is a large amount of data?
Thank you for help
Here is another way that I can think of:
import numpy as np
import pandas as pd
# given
d = {'A-A': 1, 'A-B':2, 'A-C':3, 'B-A':5, 'B-B':1, 'B-C':5, 'C-A':3,
'C-B':4, 'C-C': 9}
L = ['A', 'B', 'C']
# copy the dictionary values into a numpy array (relies on the dictionary's insertion order)
z = np.asarray(list(d.values()))
# reshape the array according to your DataFrame
z_new = np.reshape(z, (3, 3))
# copy it into your DataFrame
df = pd.DataFrame(z_new, columns = L, index=L)
This should do the trick, though it's probably not the best way:
for index in L:
    prefix = index + "-"
    df.loc[index] = [d.get(prefix + column, 0) for column in L]
Calculating the prefix separately beforehand is probably slower for a small list and probably faster for a large list.
Explanation
for index in L:
This iterates through all of the row names.
prefix = index + "-"
All of the keys for each row start with index + "-", e.g. "A-", "B-"… etc..
df.loc[index] =
Set the contents of the entire row.
[ for column in L]
The same as your comma thing ([1, 2, 3]) just for an arbitrary number of items. This is called a "list comprehension".
d.get( , 0)
This is the same as d[ ] but returns 0 if it can't find anything.
prefix + column
Sticks the column on the end, e.g. "A-" gives "A-A", "A-B"…
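A further option that avoids both the explicit loop and any reliance on the order of the dictionary is to split the keys into (row, column) pairs and let pandas pivot them. A minimal sketch, assuming no row or column label itself contains a hyphen:

```python
import pandas as pd

d = {'A-A': 1, 'A-B': 2, 'A-C': 3, 'B-A': 5, 'B-B': 1, 'B-C': 5,
     'C-A': 3, 'C-B': 4, 'C-C': 9}
L = ['A', 'B', 'C']

# Split each 'row-column' key into a (row, column) pair.
s = pd.Series(d)
s.index = pd.MultiIndex.from_tuples([tuple(key.split('-')) for key in s.index])

# unstack() pivots the inner level into columns; reindex enforces the order in L.
df = s.unstack().reindex(index=L, columns=L)

print(df)
```

Pairs missing from the dictionary would show up as NaN here rather than 0, so a `fillna(0)` may be wanted if the data is incomplete.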

Apply style to a DataFrame using index/column from a list of tuples in Python/Pandas

I have a list of tuples that represent DataFrame index row number and a column name, in a form:
[(12, 'col3'), (16, 'col7'), ...].
I need to be able to find rows/column values that correspond to those tuple values in another dataframe and mark them red for example. Usually I use
df.style.apply(...)
from here: https://pandas.pydata.org/pandas-docs/stable/style.html and it works but in this case I am not sure how to map those tuple values with a dataframe in a function. Any help is much appreciated.
You can use a custom function that uses at to set the styles at the positions given by tups:
tups = [(12, 'col3'), (16, 'col7'), ...]
def highlight(x):
    r = 'background-color: red'
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    #rewrite values by selecting by tuples
    for i, c in tups:
        df1.at[i, c] = r
    return df1
df.style.apply(highlight, axis=None)
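One thing to watch for is a tuple whose row or column does not exist in the frame being styled. A possible variation, sketched below, skips such pairs; `make_highlighter` is a hypothetical helper name and the sample frame is made up for illustration. The sketch only builds and prints the style frame, so it runs without the Styler's jinja2 dependency.

```python
import pandas as pd


def make_highlighter(tups, css='background-color: red'):
    """Return a function suitable for Styler.apply(..., axis=None)."""
    def highlight(x):
        # Start from an all-empty style frame with the same labels as the data.
        styles = pd.DataFrame('', index=x.index, columns=x.columns)
        for row, col in tups:
            # Skip pairs that do not exist in this frame instead of enlarging it.
            if row in styles.index and col in styles.columns:
                styles.at[row, col] = css
        return styles
    return highlight


# Hypothetical sample data; the labels only illustrate the idea.
df = pd.DataFrame({'col3': [1, 2, 3], 'col7': [4, 5, 6]}, index=[12, 16, 20])
style_frame = make_highlighter([(12, 'col3'), (16, 'col7'), (99, 'col3')])(df)

print(style_frame)
```

With jinja2 installed, it would be applied as `df.style.apply(make_highlighter(tups), axis=None)`.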

Common column names among data sets in Python

I have 6 data sets. Their names are: e10_all, e11_all, e12_all, e13_all, e14_all, and e19_all.
All have different numbers of columns and rows, but with some common columns. I need to append the rows of these columns together. First, I want to determine the columns that are common to all of the data sets, so I know which columns to select in my SQL query.
In R, I am able to do this using:
# Create list of dts
list_df = list(e10_all, e11_all, e12_all, e13_all, e14_all, e19_all)
col_common = colnames(list_df[[1]])
# Write for loop
for (i in 2:length(list_df)){
  col_common = intersect(col_common, colnames(list_df[[i]]))
}
# View the common columns
col_common
# Get as a comma-separated list
cat(noquote(paste(col_common, collapse = ',')))
I want to do the same thing, but in Python. Does anyone happen to know a way?
Thank you
It's not that different in pandas. Making some dummy dataframes:
>>> import pandas as pd
>>> e10_all = pd.DataFrame({"A": [1,2], "B": [2,3], "C": [2,3]})
>>> e11_all = pd.DataFrame({"B": [4,5], "C": [5,6]})
>>> e12_all = pd.DataFrame({"B": [1,2], "C": [3,4], "M": [8,9]})
Then your code would translate to something like
>>> list_df = [e10_all, e11_all, e12_all]
>>> col_common = set.intersection(*(set(df.columns) for df in list_df))
>>> col_common
{'C', 'B'}
>>> ','.join(sorted(col_common))
'B,C'
That second line turns each of the frames' columns into a set and then takes the intersection of all of them. A more literal translation of your code would work too, although we tend to avoid writing loops where we can avoid it, and we tend to loop over elements directly (for df in list_df[1:]:) rather than going via index. Still,
col_common = set(list_df[0].columns)
for i in range(1, len(list_df)):
    col_common = col_common.intersection(list_df[i].columns)
would get the job done.
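Since sets do not preserve order, a variation that keeps the column order of the first frame may be handy when building a query. The sketch below uses the dummy frames from above and also shows one way to append the rows of the shared columns with `pd.concat`, which is what the question ultimately wants; the name `combined` is illustrative only.

```python
import pandas as pd

e10_all = pd.DataFrame({"A": [1, 2], "B": [2, 3], "C": [2, 3]})
e11_all = pd.DataFrame({"B": [4, 5], "C": [5, 6]})
e12_all = pd.DataFrame({"B": [1, 2], "C": [3, 4], "M": [8, 9]})
list_df = [e10_all, e11_all, e12_all]

# Keep the first frame's column order; a column qualifies if every other frame has it.
col_common = [c for c in list_df[0].columns
              if all(c in df.columns for df in list_df[1:])]

# Stack the rows of the shared columns into one frame.
combined = pd.concat([df[col_common] for df in list_df], ignore_index=True)

print(','.join(col_common))
print(combined)
```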

Resources