Custom xticks in seaborn heatmap - python-3.x

I have the following heatmap (just a minimal working example; my real data is huge!):
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'set1': ['0', '2', '2'],
                   'set2': ['1', '2', '0'],
                   'set3': ['0', '2', '1'],
                   'set4': ['1', '4', '1']
                   }).T.astype(float)
sns.heatmap(df, yticklabels=df.index, xticklabels=df.columns)
How can I show xticks only for the columns where all values are >= 2? In this example that means showing only the '1' xtick: the '0' and '2' column labels should not appear, because '1' is the only column where every value is greater than or equal to 2. The problem is that with my real data the x axis gets too crowded, so I want to keep plotting everything but show only the xticklabels of those columns.

Mask the DataFrame
This removes the columns where not every value is >= the specified value, and plots only what remains.
# create a Boolean mask of df
mask = df.ge(2)
# apply the mask to df and dropna
dfm = df[mask].dropna(axis=1)
# plot the masked df
ax = sns.heatmap(dfm)
mask
          0     1      2
set1  False  True   True
set2  False  True  False
set3  False  True  False
set4  False  True  False
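For reference, with the sample data the masked frame dfm reduces to the single column that survives the dropna:
dfm
        1
set1  2.0
set2  2.0
set3  2.0
set4  4.0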
Mask the xtick labels
Labels for columns where not every value is >= the specified value are replaced with ''.
# create a Boolean mask of df
mask = df.ge(2).all()
# use the mask to update a list of labels
cols = [col if m else '' for (col, m) in zip(df.columns, mask)]
# plot with custom labels
ax = sns.heatmap(df, xticklabels=cols)
mask
0    False
1     True
2    False
dtype: bool
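With the sample data, the resulting label list therefore keeps only the middle column's name:
cols
['', 1, '']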

Are you looking to show the same heatmap, but only show xticklabels where ALL values are >= 2? One way to do this is to not pass df.columns to heatmap, but a masked list that keeps only the labels you want. See if this is what you are looking for...
df = pd.DataFrame({'set1': ['0', '2', '2'],
                   'set2': ['1', '2', '0'],
                   'set3': ['0', '2', '1'],
                   'set4': ['1', '4', '1']
                   }).T.astype(float)

cols = []  # declare a new list to be used for xticklabels
for col in df.columns:
    if col in set(df.columns).intersection(df[df >= 2].T.dropna().index):
        cols.append(col)  # if all values are >= 2, then add to label list
    else:
        cols.append('')   # else, add a blank
sns.heatmap(df, yticklabels=df.index, xticklabels=cols)  # plot using the new list
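A shorter equivalent of the intersection logic above (a sketch with the same behavior on this sample) checks each column directly:
cols = [col if (df[col] >= 2).all() else '' for col in df.columns]
sns.heatmap(df, yticklabels=df.index, xticklabels=cols)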

Related

Export Python dict of nested lists of varying lengths to CSV. If a nested list has > 1 entry, expand down the column before moving to the next key

I have the following dictionary of lists
d = {1: ['1','B1',['C1','C2','C3']], 2: ['2','B2','C15','D12'], 3: ['3','B3'], 4: ['4', 'B4', 'C4', ['D1', 'D2']]}
writing that to a csv using
with open('test.csv', "w", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(d.values())
gives me a csv that looks like
A  B   C                     D
1  B1  ['C1', 'C2', 'C3']
2  B2  C15                  D12
3  B3
4  B4  C4                   ['D1', 'D2']
If there is a multiple-item list in the value (a nested list?), I would like that list to be expanded down the column like this:
A  B   C    D
1  B1  C1
1      C2
1      C3
2  B2  C15  D12
3  B3
4  B4  C4   D1
4           D2
I'm fairly new to Python and can't seem to figure out a way to do what I need after a few days of sifting through forums and banging my head against the wall. I think I may need to break apart the nested lists, but I need to keep them tied to their respective "A" value. Columns A and B will always have 1 entry; columns C and D can have 1 to X entries.
Any help is much appreciated
It seems like it might be easier to build a list of lists, with empty strings in the appropriate places, than what you're doing. Here's something that might do:
import csv
from itertools import zip_longest

def condense(dct):
    # get the maximum number of columns of any list
    num_cols = len(max(dct.values(), key=len)) - 1
    # ignore the key, it's not really relevant
    for _, v in dct.items():
        # first, memorize the index of this list,
        # since we need to repeat it no matter what
        idx = v[0]
        # next, use zip_longest to make a correspondence.
        # We deliberately make a 2d list,
        # and we will later withdraw elements from it one by one.
        matrix = [([] if elem is None else
                   [elem] if not isinstance(elem, list) else
                   elem[:]  # shallow copy to avoid altering the original dict
                   ) for elem, _ in zip_longest(v[1:], range(num_cols), fillvalue=None)
                  ]
        # Now, output the top row of the matrix for as long as it has contents.
        while any(matrix):
            # If a column in the matrix is empty, we put an empty string.
            # Otherwise, we pop the front element as we pass through it,
            # progressively emptying the matrix top-to-bottom:
            # as we output a row, we also remove that row from the matrix.
            # *-notation is more convenient than concatenating these two lists.
            yield [idx, *((col.pop(0) if col else '') for col in matrix)]
        # e.g. for key 0 and a matrix that looks like this:
        # [['a1', 'a2'],
        #  ['b1'],
        #  ['c1', 'c2', 'c3']]
        # this would yield the following three lists before moving on:
        # ['0', 'a1', 'b1', 'c1']
        # ['0', 'a2', '',   'c2']
        # ['0', '',   '',   'c3']
        # where '' should parse into an empty column in the resulting CSV.
The biggest thing to note here is that I use isinstance(elem, list) as a shorthand to check whether the thing is a list (which you need to be able to do, one way or another, to flatten or wrap lists as we do here). If you have more complicated or more varied data structures, you'll need to adapt this check - maybe write a helper function isiterable() that tries to iterate through the object and returns a boolean based on whether doing so raised an error.
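For example, a minimal sketch of such a helper (the isiterable name is just the suggestion above, not an existing function) could be:
def isiterable(obj):
    # treat strings as scalars, even though they are technically iterable
    if isinstance(obj, str):
        return False
    try:
        iter(obj)  # raises TypeError for non-iterables
        return True
    except TypeError:
        return False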
That done, we can call condense() on d and have the csv module deal with the output.
headers = ['A', 'B', 'C', 'D']
d = {1: ['1','B1',['C1','C2','C3']], 2: ['2','B2','C15','D12'], 3: ['3','B3'], 4: ['4', 'B4', 'C4', ['D1', 'D2']]}
# condense(d) produces
# [['1', 'B1', 'C1', '' ],
# ['1', '', 'C2', '' ],
# ['1', '', 'C3', '' ],
# ['2', 'B2', 'C15', 'D12'],
# ['3', 'B3', '', '' ],
# ['4', 'B4', 'C4', 'D1' ],
# ['4', '', '', 'D2' ]]
with open('test.csv', "w", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(condense(d))
Which produces the following file:
A,B,C,D
1,B1,C1,
1,,C2,
1,,C3,
2,B2,C15,D12
3,B3,,
4,B4,C4,D1
4,,,D2
This is equivalent to your expected output. Hopefully the solution is sufficiently extensible for you to apply it to your non-MCVE problem.

Categorizing data based on a string in each row

I have the following dataframe (note the columns list must match the keys in raw_data):
import pandas as pd

raw_data = {'name': ['Willard', 'Nan', 'Omar', 'Spencer'],
            'Last_Name': ['Smith', 'Nan', 'Sheng', 'Poursafar'],
            'favorite_color': ['blue', 'red', 'Nan', 'green'],
            'Statues': ['Match', 'Mis-Match', 'Match', 'Mis_match']}
df = pd.DataFrame(raw_data, columns=['name', 'Last_Name', 'favorite_color', 'Statues'])
df
I want to do the following tasks:
Separate the rows that contain Match and Mis-Match.
Make a category that only contains people whose first name and last name are Nan and who love a color (any color except for Nan).
Can you guys help me?
Use boolean indexing:
df1 = df[df['Statues'] == 'Match']
df2 = df[df['Statues'] == 'Mis-Match']
If the missing values are not strings, use Series.isna and Series.notna:
df3 = df[df['name'].isna() & df['Last_Name'].isna() & df['favorite_color'].notna()]
If the Nans are strings, compare against 'Nan':
df3 = df[(df['name'] == 'Nan') &
         (df['Last_Name'] == 'Nan') &
         (df['favorite_color'] != 'Nan')]
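Note that the sample data mixes 'Mis-Match' and 'Mis_match', so an exact string comparison misses the last row. If the underscore is a typo (an assumption about your data), one option is to normalize the separator first:
# replace '_' with '-' so both spellings compare equal (assumes the mix is unintended)
df['Statues'] = df['Statues'].str.replace('_', '-')
df2 = df[df['Statues'] == 'Mis-Match']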

Trouble with a python loop

I'm having issues with a loop that should:
a. check whether a value in a DF row is greater than a value from a list;
b. if it is, concatenate the variable name and the value from the list as a string;
c. if it's not, pass until the loop conditions are met.
This is what I've tried.
import pandas as pd
import numpy as np
df = {'level': ['21', '22', '23', '24', '25', '26', '27', '28', '29', '30'],
      'variable': 'age'}
df = pd.DataFrame.from_dict(df)
knots = [0, 25]
df.assign(key = np.nan)
for knot in knots:
    if df['key'].items == np.nan:
        if df['level'].astype('int') > knot:
            df['key'] = df['variable'] + "_" + knot.astype('str')
        else:
            pass
    else:
        pass
However, this only leaves the key column full of NaN values. I'm not sure why it's not placing the concatenation.
You can do something like this inside the for loop. No need for any if conditions:
mask = df['level'].astype('int') > 25
df.loc[mask, 'key'] = df.loc[mask, 'variable'] + '_' + df.loc[mask, 'level']
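Two things also worth noting about the original attempt: df.assign(key=np.nan) returns a new DataFrame rather than modifying df, and knot.astype('str') fails because knot is a plain int. A minimal sketch of the full loop, assuming the goal is to tag each row with the knots its level exceeds:
df['key'] = np.nan                         # actually create the column on df
for knot in knots:
    mask = df['level'].astype(int) > knot  # rows whose level exceeds this knot
    df.loc[mask, 'key'] = df.loc[mask, 'variable'] + '_' + str(knot)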

Compare pandas dataframes and check overlaps?

I am trying my hand at spam filters. I tried several methods to label text files as spam. As a result, I have three dataframes. They basically look like this:
df_method_1 = pd.DataFrame({'file': ['A','B' ,'C'], 'spam': ['1', '0', '0']})
df_method_2 = pd.DataFrame({'file': ['A','B' ,'C'], 'spam': ['1', '1', '0']})
df_method_3 = pd.DataFrame({'file': ['A','B' ,'C'], 'spam': ['1', '1', '0']})
I am now trying to create a dataframe showing whether a file was labeled as spam, and if so, by which method.
In the best case, I can create a dataframe containing the following information:
df_summary = pd.DataFrame({'file': ['A', 'B', 'C'], 'spam': ['All methods', 'Method 2 & Method 3', 'No method']})
Obviously, I am after the information, not the actual strings.
I tried pandas.DataFrame.isin() to make it happen, but I failed. Any ideas how to do this?
How about merge()?
df1.merge(df2, on="file").merge(df3, on="file")
  file spam_x spam_y spam
0    A      1      1    1
1    B      0      1    1
2    C      0      0    0
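That merge only lines the labels up side by side; to get the summary strings from the question, a sketch along these lines (the m1/m2/m3 column names and the label helper are made up for illustration) could follow:
merged = df_method_1.merge(df_method_2, on='file').merge(df_method_3, on='file')
merged.columns = ['file', 'm1', 'm2', 'm3']   # rename spam_x/spam_y/spam for clarity
methods = ['Method 1', 'Method 2', 'Method 3']

def label(row):
    # collect the methods that flagged this file as spam ('1')
    hits = [m for m, v in zip(methods, row) if v == '1']
    if len(hits) == len(methods):
        return 'All methods'
    if not hits:
        return 'No method'
    return ' & '.join(hits)

merged['spam'] = merged[['m1', 'm2', 'm3']].apply(label, axis=1)
df_summary = merged[['file', 'spam']]
This reproduces 'All methods', 'Method 2 & Method 3', and 'No method' for files A, B, and C.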

Matrix input from a text file (Python 3)

I'm trying to find a way to input a matrix from a text file; for example, a text file would contain
1 2 3
4 5 6
7 8 9
And it would make a matrix with those numbers and put it in matrix = [[1,2,3],[4,5,6],[7,8,9]]
And then this has to be compatible with the way I print the matrix:
print('\n'.join([' '.join(map(str, row)) for row in matrix]))
So far, I tried this:
path = input('enter file location')
f = open(path, 'r')
matrix = [map(int, line.split(',')) for line in f if line.strip() != ""]
All it does is give me map objects, and it raises an error when I try to print the matrix.
What am I doing wrong? matrix should contain the matrix read from the text file, not map objects, and I don't want to use an external library such as numpy.
Thanks
You can use a list comprehension, like this:
myfile.txt:
1 2 3
4 5 6
7 8 9
>>> matrix = open('myfile.txt').read()
>>> matrix = [item.split() for item in matrix.split('\n')[:-1]]
>>> matrix
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
>>>
You can also create a function for this:
>>> def matrix(file):
... contents = open(file).read()
... return [item.split() for item in contents.split('\n')[:-1]]
...
>>> matrix('myfile.txt')
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
>>>
This works with both Python 2 (e.g. 2.7.10) and Python 3 (e.g. 3.6.4).
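Note that this leaves the entries as strings. Since the goal is [[1,2,3],[4,5,6],[7,8,9]], one extra line (an addition to the answer above) converts them:
matrix = [[int(x) for x in row] for row in matrix]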
rows = 3
cols = 3
with open('in.txt') as f:
    data = []
    for i in range(0, rows):
        data.append(list(map(int, f.readline().split()[:cols])))
print(data)
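If the dimensions are not known in advance, a sketch that reads every non-empty line and prints with the question's exact format:
with open('in.txt') as f:
    # one row of ints per non-empty line, split on whitespace
    matrix = [list(map(int, line.split())) for line in f if line.strip()]
print('\n'.join([' '.join(map(str, row)) for row in matrix]))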
