I have a dataframe with two categories, female and male, and 2 years. I want both categories in the same plot, but with a specific color per category (a barplot or a boxplot would both be fine).
My code is:
test =pd.DataFrame([[2021, 'female', 3], [2021, 'male', 1], [2021, 'female', 6],
[2021, 'female', 3], [2021, 'male', 4], [2021, 'female', 10],
[2020, 'female', 2], [2020, 'male', 9], [2020, 'male', 7],
[2020, 'female', 1], [2020, 'male', 5], [2020, 'male', 8]
], columns=['Year', 'category', 'value'])
plt.figure(figsize=(20,8))
g = sns.boxplot(y='value', x='category', data=test, hue='Year')
g.set_xticklabels(g.get_xmajorticklabels(), fontsize = 12)
And my current output is:
What I expect to have is something like this:
Changing libraries is not a problem, as long as I can get this plot in an automatic way.
I don't know of a library that does this kind of coloring out-of-the-box. With a seaborn boxplot, you could iterate through the generated boxes and assign individual colors and transparencies. A custom legend could show the transparency via shades of grey:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
test = pd.DataFrame([[2021, 'female', 3], [2021, 'male', 1], [2021, 'female', 6],
[2021, 'female', 3], [2021, 'male', 4], [2021, 'female', 10],
[2020, 'female', 2], [2020, 'male', 9], [2020, 'male', 7],
[2020, 'female', 1], [2020, 'male', 5], [2020, 'male', 8]
], columns=['Year', 'category', 'value'])
plt.figure(figsize=(20, 8))
sns.set_style('white')
ax = sns.boxplot(y='value', x='category', data=test, hue='Year')
ax.tick_params(axis='x', labelsize=14)
hue_values = np.sort(test['Year'].unique())
num_hue_values = len(hue_values)
colors = ['crimson', 'dodgerblue']
min_alpha = 0.3
max_alpha = 0.6
alphas = np.linspace(min_alpha, max_alpha, num_hue_values)
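# The boxes are assumed to be created category by category, one box per hue value.
# Note: with matplotlib 3.5 or newer the boxes may be stored in ax.patches instead of ax.artists.
for i, box in enumerate(ax.artists):
    box.set_facecolor(colors[i // num_hue_values])  # color depends on the category
    box.set_alpha(alphas[i % num_hue_values])       # transparency depends on the hue value
handles = [plt.Rectangle((0, 0), 0, 0, facecolor='black', edgecolor='black', alpha=alpha, label=label)
           for alpha, label in zip(alphas, hue_values)]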
ax.legend(handles=handles)
plt.show()
Here is another example with more hue categories:
tips = sns.load_dataset('tips')
plt.figure(figsize=(20, 8))
sns.set_style('white')
ax = sns.boxplot(y='total_bill', x='sex', data=tips, hue='day')
ax.tick_params(axis='x', labelsize=14)
hue_values = tips['day'].cat.categories
num_hue_values = len(hue_values)
colors = ['lime', 'purple']
min_alpha = 0.3
max_alpha = 0.6
alphas = np.linspace(min_alpha, max_alpha, num_hue_values)
for i, box in enumerate(ax.artists):
    box.set_facecolor(colors[i // num_hue_values])
    box.set_alpha(alphas[i % num_hue_values])
handles = [plt.Rectangle((0, 0), 0, 0, facecolor='black', edgecolor='black', alpha=alpha, label=label)
           for alpha, label in zip(alphas, hue_values)]
ax.legend(handles=handles)
sns.despine()
plt.show()
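The index arithmetic in the loops above can be checked without drawing anything. The following is only a sketch of that mapping, assuming the boxes are generated category by category with one box per hue value inside each category; the categories list and the style_for helper are hypothetical names introduced for illustration:

```python
# Hypothetical stand-ins for the plot's categories and hue levels.
categories = ['female', 'male']       # x-axis groups, in plot order
hue_values = [2020, 2021]             # hue levels, in plot order
colors = ['crimson', 'dodgerblue']    # one base color per category
min_alpha, max_alpha = 0.3, 0.6

n = len(hue_values)
# Evenly spaced alphas, the same values np.linspace(min_alpha, max_alpha, n) would give.
step = (max_alpha - min_alpha) / (n - 1) if n > 1 else 0.0
alphas = [min_alpha + k * step for k in range(n)]


def style_for(i):
    """Return (category, hue, color, alpha) for the i-th box."""
    # Integer division selects the category, the remainder selects the hue level.
    return categories[i // n], hue_values[i % n], colors[i // n], alphas[i % n]


styles = [style_for(i) for i in range(len(categories) * n)]
for style in styles:
    print(style)
```

If the boxes were created in a different order, the two index expressions would need to be swapped accordingly.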
I have a txt file including 9 columns and 6 rows. The first 8 columns are either of these values: "1" , "2" and "3". I named these columns from "A" to "H". I named the last column: "class".
The last column is a name : "HIGH". Here is the txt file (data.txt):
1,1,1,1,2,1,1,3,HIGH
1,1,1,2,2,1,1,3,HIGH
1,1,1,1,1,1,1,3,HIGH
1,1,1,2,1,1,1,3,HIGH
1,1,1,3,2,1,1,3,HIGH
1,1,1,2,1,2,1,3,HIGH
I am trying to count how often each value occurs in each column and print a list with 3 components: the counts of "1", "2" and "3" in that column, in that order. For example, in the first column (A) all values are "1", so I expect A : [6,0,0]. For the 8th column (H), where all values are "3", I expect H : [0,0,6]. The fourth column (D) has two "1", three "2" and one "3", so I expect D : [2,3,1]. I tried to do this with pandas and collections. Here is what I did:
import pandas as pd
from collections import Counter
df = pd.read_csv('data.txt')
df.columns = ['A','B','C','D','E','F','G','H','class']
X = df.ix[:, 0:8].values
y = df.ix[:, 8].values
deg = ['HIGH']
names = ['A','B','C','D','E','F','G','H']
for j in range(0, 8):
    freqs = Counter(X[y == deg[0], j])
    print(names[j],':',list(freqs.values()))
The output of the above code is a set of empty lists. Here is what it returns:
A : []
B : []
C : []
D : []
E : []
F : []
G : []
H : []
How can I modify the above code to get what I want?
Thanks!
Use pandas.Series.value_counts:
df.loc[:, :"H"].apply(pd.Series.value_counts).fillna(0).to_dict("list")
Output:
{'A': [6.0, 0.0, 0.0],
'B': [6.0, 0.0, 0.0],
'C': [6.0, 0.0, 0.0],
'D': [2, 3, 1],
'E': [3.0, 3.0, 0.0],
'F': [5.0, 1.0, 0.0],
'G': [6.0, 0.0, 0.0],
'H': [0.0, 0.0, 6.0]}
Define the following function:
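def cntInts(col):
    # count how often each value occurs in this column
    vc = col.value_counts()
    # counts for 1, 2 and 3, in that order (0 if absent), as plain Python ints
    return [ int(vc.get(i, 0)) for i in range(1,4) ]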
Then apply it and print results:
for k, v in df.loc[:, 'A':'H'].apply(cntInts).items():
    print(f'{k}: {v}')
For your data sample I got:
A: [6, 0, 0]
B: [6, 0, 0]
C: [6, 0, 0]
D: [2, 3, 1]
E: [3, 3, 0]
F: [5, 1, 0]
G: [6, 0, 0]
H: [0, 0, 6]
Or maybe it is enough to call just:
df.loc[:, 'A':'H'].apply(cntInts)
This time the result is a Series, which when printed yields:
A [6, 0, 0]
B [6, 0, 0]
C [6, 0, 0]
D [2, 3, 1]
E [3, 3, 0]
F [5, 1, 0]
G [6, 0, 0]
H [0, 0, 6]
dtype: object
Edit
Following your comments I suppose that there is something wrong with your data.
To trace the actual reason:
Define a string variable:
txt = '''1,1,1,1,2,1,1,3,HIGH
1,1,1,2,2,1,1,3,HIGH
1,1,1,1,1,1,1,3,HIGH
1,1,1,2,1,1,1,3,HIGH
1,1,1,3,2,1,1,3,HIGH
1,1,1,2,1,2,1,3,HIGH'''
Run:
import io
df = pd.read_csv(io.StringIO(txt), names=['A','B','C','D','E','F','G','H','class'])
Run my code on my data. The result should be just as expected.
Then read your input file (also into df) and run my code again.
Probably there is some difference between your data and mine.
Especially look for any extra spaces in your input file,
check also column types (after read_csv).
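As a rough way to look for such extra spaces, the file can be scanned with the standard csv module before handing it to pandas. This is only a sketch: the sample string (with deliberately padded fields in its second row) and the find_padded_fields helper are hypothetical, and for a real file you would pass its contents instead.

```python
import csv
import io

# Hypothetical sample: the second row contains stray spaces after two commas.
sample = "1,1,1,1,2,1,1,3,HIGH\n1, 1,1,2,2,1,1,3, HIGH\n"


def find_padded_fields(text):
    """Return (row_number, column_number, raw_field) for fields with surrounding whitespace."""
    problems = []
    for row_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        for col_no, field in enumerate(row, start=1):
            if field != field.strip():
                problems.append((row_no, col_no, field))
    return problems


problems = find_padded_fields(sample)
for row_no, col_no, field in problems:
    # repr() makes the otherwise invisible spaces visible
    print(f"row {row_no}, column {col_no}: {field!r}")
```

If padded fields do show up, passing skipinitialspace=True to read_csv is one way to drop spaces that follow a delimiter.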
A solution with collections: select all columns except the last, convert each column's Counter to a Series (so the output is a DataFrame), replace missing values with DataFrame.fillna, convert the values to integers, and finally convert to a dictionary with DataFrame.to_dict:
from collections import Counter
d = (df.iloc[:, :-1].apply(lambda x: pd.Series(Counter(x)))
.fillna(0)
.astype(int)
.to_dict("list"))
print (d)
{'A': [6, 0, 0], 'B': [6, 0, 0],
'C': [6, 0, 0], 'D': [2, 3, 1],
'E': [3, 3, 0], 'F': [5, 1, 0],
'G': [6, 0, 0], 'H': [0, 0, 6]}
A pandas-only solution with Series.value_counts:
d = (df.iloc[:, :-1].apply(pd.Series.value_counts)
.fillna(0)
.astype(int)
.to_dict("list"))
print (d)
{'A': [6, 0, 0], 'B': [6, 0, 0],
'C': [6, 0, 0], 'D': [2, 3, 1],
'E': [3, 3, 0], 'F': [5, 1, 0],
'G': [6, 0, 0], 'H': [0, 0, 6]}
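One caveat with the value_counts approaches above: a row only exists for a value that occurs somewhere in the data, so if no column contained a 2, for instance, every list would have two entries instead of three. A minimal sketch that pins the order and length with reindex, assuming the values of interest are exactly 1, 2 and 3 (the levels list and the inline txt sample are illustrative stand-ins for the real file):

```python
import io

import pandas as pd

txt = """1,1,1,1,2,1,1,3,HIGH
1,1,1,2,2,1,1,3,HIGH
1,1,1,1,1,1,1,3,HIGH
1,1,1,2,1,1,1,3,HIGH
1,1,1,3,2,1,1,3,HIGH
1,1,1,2,1,2,1,3,HIGH"""

names = list("ABCDEFGH") + ["class"]
# header=None keeps the first data row from being consumed as column names
df = pd.read_csv(io.StringIO(txt), header=None, names=names)

levels = [1, 2, 3]  # assumed set of values to report, in output order
counts = (
    df.loc[:, "A":"H"]
      .apply(lambda col: col.value_counts().reindex(levels, fill_value=0))
      .astype(int)
)
result = counts.to_dict("list")
print(result)
```

Because every column is reindexed to the same levels, no fillna step is needed afterwards.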
Working within python, since your end result is a dictionary:
from string import ascii_uppercase
from collections import Counter, defaultdict
from itertools import chain, product
import csv
d = defaultdict(list)
fieldnames = ascii_uppercase[:9]
# test.csv is your file above
with open('test.csv') as csvfile:
    reader = csv.DictReader(csvfile, fieldnames = list(fieldnames))
    reader = Counter(chain.from_iterable(row.items() for row in reader))
for col, value in product(fieldnames, ("1","2","3")):
    if col != fieldnames[-1]:
        d[col].append(reader.get((col,value), 0))
print(d)
defaultdict(list,
{'A': [6, 0, 0],
'B': [6, 0, 0],
'C': [6, 0, 0],
'D': [2, 3, 1],
'E': [3, 3, 0],
'F': [5, 1, 0],
'G': [6, 0, 0],
'H': [0, 0, 6]})
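A somewhat shorter standard-library variant is to transpose the rows with zip and count each column on its own. This is a sketch under the assumption that every row has the same number of fields; the inline data string stands in for the file, and the names and result variables are illustrative:

```python
import csv
import io
from collections import Counter

data = """1,1,1,1,2,1,1,3,HIGH
1,1,1,2,2,1,1,3,HIGH
1,1,1,1,1,1,1,3,HIGH
1,1,1,2,1,1,1,3,HIGH
1,1,1,3,2,1,1,3,HIGH
1,1,1,2,1,2,1,3,HIGH"""

names = "ABCDEFGH"
rows = list(csv.reader(io.StringIO(data)))
# zip(*rows) turns rows into columns; the slice drops the trailing 'class' column
columns = list(zip(*rows))[:len(names)]

result = {}
for name, column in zip(names, columns):
    freqs = Counter(column)
    # a Counter returns 0 for missing keys, so absent values need no special case
    result[name] = [freqs[value] for value in ("1", "2", "3")]

for name, counts in result.items():
    print(name, ':', counts)
```

The values stay strings here because csv.reader does no type conversion, which is why the lookup keys are "1", "2" and "3".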
I am trying to order 4 dictionary lists from lowest to highest and I am getting an invalid syntax error (I am new to bioinformatics).
I have tried inline sorting
lists = sorted(list_dct.items, key=lambda k: k['name'])
list_dct = [{'name': 0.5, 0, 0, 0.5},
{'name' : 0.25, 0.25, 0.25, 0.25},
{'name' : 0, 0, 0, 1},
{'name' : 0.25, 0, 0.5, 0.25}]
print(lists)
I am getting an invalid syntax message... I should get the lists sorted by row lowest to row highest
You need to construct your dictionaries correctly. I've chosen to make the values a list. Then sort them with a list comprehension:
list_dct = [{'name': [0.5, 0, 0, 0.5]},
{'name' : [0.25, 0.25, 0.25, 0.25]},
{'name' : [0, 0, 0, 1]},
{'name' : [0.25, 0, 0.5, 0.25]}]
sorted([ d.get('name') for d in list_dct ])
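1.) Define list_dct before calling sorted(); otherwise Python raises a NameError. The invalid syntax error itself comes from the dictionary literals, where the values have to be wrapped in a list.
2.) Sort the whole list_dct, not list_dct.items (a list has no items attribute).
3.) Pass a custom key= function that selects the 'name' value from each item being sorted.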
list_dct = [{'name': [0.5, 0, 0, 0.5]},
{'name' : [0.25, 0.25, 0.25, 0.25]},
{'name' : [0, 0, 0, 1]},
{'name' : [0.25, 0, 0.5, 0.25]}]
lists = sorted(list_dct, key=lambda k: k['name'])
from pprint import pprint
pprint(lists)
Prints:
[{'name': [0, 0, 0, 1]},
{'name': [0.25, 0, 0.5, 0.25]},
{'name': [0.25, 0.25, 0.25, 0.25]},
{'name': [0.5, 0, 0, 0.5]}]
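The order printed above follows from how Python compares lists: element by element, with the first differing position deciding. A small sketch of that behavior; the by_max ordering is a hypothetical alternative included only to show how a different key function changes the result:

```python
list_dct = [{'name': [0.5, 0, 0, 0.5]},
            {'name': [0.25, 0.25, 0.25, 0.25]},
            {'name': [0, 0, 0, 1]},
            {'name': [0.25, 0, 0.5, 0.25]}]

# Lists compare lexicographically: equal first elements, then 0 < 0.25 decides.
first_is_smaller = [0.25, 0, 0.5, 0.25] < [0.25, 0.25, 0.25, 0.25]
print(first_is_smaller)

# Same ordering as in the answer: compare the rows themselves.
by_row = sorted(list_dct, key=lambda d: d['name'])

# Hypothetical alternative: order by the largest value in each row instead.
by_max = sorted(list_dct, key=lambda d: max(d['name']))

for d in by_row:
    print(d)
```

sorted() is stable, so rows with equal keys (for example two rows whose maximum is 0.5) keep their original relative order.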
I have a dataframe with 12 different features. And I would like to plot histograms for each in one go on a panel 4x3.
test = pd.DataFrame({
'a': [10, 5, -2],
'b': [2, 3, 1],
'c': [10, 5, -2],
'd': [-10, -5, 2],
'aa': [10, 5, -2],
'bb': [2, 3, 1],
'cc': [10, 5, -2],
'dd': [-10, -5, 2],
'aaa': [10, 5, -2],
'bbb': [2, 3, 1],
'ccc': [10, 5, -2],
'ddd': [-10, -5, 2]
})
I can do it by writing something like the code below:
# plot
f, axes = plt.subplots(3, 4, figsize=(20, 10), sharex=True)
sns.distplot( test["a"] , color="skyblue", ax=axes[0, 0])
sns.distplot( test["b"] , color="olive", ax=axes[0, 1])
sns.distplot( test["c"] , color="teal", ax=axes[0, 2])
sns.distplot( test["d"] , color="grey", ax=axes[0, 3])
...
How can I loop and iterate through features in an elegant way instead? I'd like to assign the same four colors for each row.
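You can include everything in a for loop:
colors =["skyblue", "olive", "teal", "grey"]
f, axes = plt.subplots(3, 4, figsize=(20, 10), sharex=True)
# axes.flatten() walks the 3x4 grid row by row; i % 4 repeats the four colors in every row
# Note: distplot is deprecated in recent seaborn releases; histplot(..., kde=True) is the suggested axes-level replacement
for i, ax in enumerate(axes.flatten()):
    sns.distplot( test.iloc[:, i] , color=colors[i%4], ax=ax)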
Seaborn provides a FacetGrid for such purposes.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
test = pd.DataFrame({
'a': [10, 5, -2],
'b': [2, 3, 1],
'c': [10, 5, -2],
'd': [-10, -5, 2],
'aa': [10, 5, -2],
'bb': [2, 3, 1],
'cc': [10, 5, -2],
'dd': [-10, -5, 2],
'aaa': [10, 5, -2],
'bbb': [2, 3, 1],
'ccc': [10, 5, -2],
'ddd': [-10, -5, 2]
})
data = pd.melt(test)
data["hue"] = data["variable"].apply(lambda x: x[:1])
g = sns.FacetGrid(data, col="variable", col_wrap=4, hue="hue")
g.map(sns.distplot, "value")
plt.show()
I'm trying to show only 5 points along the x axis, with panning enabled - I want the user to pan through the other points beyond 5.
I have disabled zooming, but the problem is that the more data I add, the more points the x axis tries to show.
So if I load 20 points' worth of data, instead of showing 5 points and letting the user pan to the other 15, the x axis "zooms out" to show as many points as possible.
Also, how do I set the start position? If there are 20 points and only 5 are showing, how do I set the view to start at points 10 to 15, so that the user pans back for the first 10 points and forward for the remaining 5?
Use these options for your x axis:
xaxis: {
    panRange: [0, 20],
    min: 5,
    max: 10
}
panRange defines the borders of the range to pan and min & max define the start range.
Edit: You can specify an array with ticknames:
var ticks = [
    [1, 'one'],
    [2, 'two'],
    ...
];
and use it like this:
xaxis: {
    ...
    ticks: ticks,
    ...
}
See the code snippet below for a full example:
$(function() {
    var data = [
        [1, 2], [2, 3], [3, 1], [4, 4], [5, 2],
        [6, 3], [7, 3], [8, 2], [9, 1], [10, 1],
        [11, 3], [12, 4], [13, 2], [14, 2], [15, 4],
        [16, 3], [17, 3], [18, 1], [19, 4]
    ];
    var ticks = [
        [1, 'one'], [2, 'two'], [3, 'three'], [4, 'four'],
        [5, 'five'], [6, 'six'], [7, 'seven'], [8, 'eight'],
        [9, 'nine'], [10, 'ten'], [11, 'eleven'], [12, 'twelve'],
        [13, 'thirteen'], [14, 'fourteen'], [15, 'fifteen'], [16, 'sixteen'],
        [17, 'seventeen'], [18, 'eighteen'], [19, 'nineteen']
    ];
    var options = {
        series: {
            points: {
                show: true
            },
            lines: {
                show: true
            }
        },
        xaxis: {
            panRange: [0, 20],
            min: 5,
            max: 10,
            ticks: ticks,
            tickDecimals: 0
        },
        yaxis: {
            panRange: [0, 5],
            min: 0,
            max: 5,
            tickDecimals: 0
        },
        zoom: {
            interactive: false
        },
        pan: {
            interactive: true
        }
    };
    var plot = $.plot('#placeholder', [data], options);
});
#placeholder {
    width: 400px;
    height: 300px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script src="http://www.flotcharts.org/flot/jquery.flot.js"></script>
<script src="http://www.flotcharts.org/flot/jquery.flot.navigate.js"></script>
<div id="placeholder"></div>