Set hue using a range of values in Seaborn stripplot - colors

I am trying to set hue based on a range of values rather than unique values in seaborn stripplot. For example, different colors for different value ranges (1940-1950, 1950-1960 etc.).
sns.stripplot(x='Condition', y='IM', data=dd3, jitter=0.3, hue=dd3['Year Built'])
Thanks

It looks like you need to bin the data. Use pd.cut() as shown below: the years are binned into 20-year intervals (5 groups for years 1918 to 2018). Change the step in np.arange() to adjust the ranges.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
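x = np.random.randint(0, 100, size=100)
y = np.random.randint(0, 100, size=100)
year = np.random.randint(1918, 2019, size=100)
df = pd.DataFrame({
    'x': x,
    'y': y,
    'year': year
})
# Bin edges every 20 years; extend past the max so the newest years get a bin,
# and include_lowest=True so the oldest year is not dropped as NaN
bins = np.arange(year.min(), year.max() + 20, step=20)
df['year_bin'] = pd.cut(df['year'], bins, include_lowest=True)
sns.lmplot(data=df, x='x', y='y', hue='year_bin')
plt.show()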
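The same binning works for the stripplot in the question. Below is a minimal sketch, assuming the years live in a column named 'Year Built' as in dd3. The helper add_decade_bins is hypothetical: it builds decade edges and readable labels such as '1940-1950', and uses right=False so that a year like 1950 lands in '1950-1960' rather than '1940-1950'.

```python
import numpy as np
import pandas as pd


def add_decade_bins(df, col="Year Built", width=10):
    """Return a copy of df with a categorical 'decade' column (hypothetical helper)."""
    years = df[col]
    # Edges aligned to multiples of `width`, extended one step past the newest year
    start = int(np.floor(years.min() / width) * width)
    stop = int(np.floor(years.max() / width) * width) + width
    edges = np.arange(start, stop + 1, width)
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
    out = df.copy()
    # right=False makes the intervals [1940, 1950), [1950, 1960), ...
    out["decade"] = pd.cut(years, bins=edges, labels=labels, right=False)
    return out


demo = pd.DataFrame({"Year Built": [1941, 1949, 1950, 1967, 1999]})
binned = add_decade_bins(demo)
print(binned["decade"].tolist())
```

With the column in place, the original call becomes something like sns.stripplot(data=add_decade_bins(dd3), x='Condition', y='IM', hue='decade', jitter=0.3). Because pd.cut returns an ordered categorical, the legend should list the decades in chronological order.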

Related

How to create a line plot in python, by importing data from excel and using it to create a plot that shares a common X-Axis?

I am trying to create a plot using Python in Spyder. I have sample data in Excel, which I can import into Spyder. I want one column ('Frequency') on the x axis and the remaining columns ('C1', 'C2', 'C3', 'C4') plotted on the y axis. How do I do this? This is the data in Excel and how the plot looks in Excel: (https://i.stack.imgur.com/eRug5.png)
This is what I have so far. The commands below (also shown in the image) produce an empty plot.
data = data.head()
#data.plot(kind='line', x='Frequency', y=['C1','C2','C3','C4'])
df = pd.DataFrame(data, columns=["Frequency","C1", "C2","C3","C4"])
df.plot(x = "Frequency",y=["C1", "C2","C3","C4"])
Here is an example; change the column names to match your data:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'X_Axis':[1,3,5,7,10,20],
'col_2':[.4,.5,.4,.5,.5,.4],
'col_3':[.7,.8,.9,.4,.2,.3],
'col_4':[.1,.3,.5,.7,.1,.0],
'col_5':[.5,.3,.6,.9,.2,.4]})
# Reshape from wide to long: one row per (X_Axis, column) pair
dfm = df.melt('X_Axis', var_name='cols', value_name='vals')
g = sns.catplot(x="X_Axis", y="vals", hue='cols', data=dfm, kind='point')
plt.show()
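Why the melt step is needed: catplot draws one line per hue level, so the separate value columns have to become a single column that names the series. A small sketch of what melt produces, using a trimmed-down version of the example frame:

```python
import pandas as pd

wide = pd.DataFrame({"X_Axis": [1, 3, 5],
                     "col_2": [.4, .5, .4],
                     "col_3": [.7, .8, .9]})

# 3 x-values * 2 value columns = 6 rows; 'cols' holds the former column names
long = wide.melt("X_Axis", var_name="cols", value_name="vals")
print(long)
```

The 'cols' column is what gets passed as hue='cols', giving one colored line per original column.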
Alternatively, read the sheet and plot each column with matplotlib directly:
import pandas as pd
import matplotlib.pyplot as plt
path = r"C:\Users\Alisha.Walia\Desktop\Alisha\SAMPLE.xlsx"
data = pd.read_excel(path)
# The Excel header has a trailing space ("Frequency "), so strip the column names
data.columns = data.columns.str.strip()
# Set the style before plotting; it does not restyle axes that already exist
plt.style.use('ggplot')
for col in ["C1", "C2", "C3", "C4"]:
    # label= gives plt.legend() an entry for each curve
    plt.plot(data["Frequency"], data[col], label=col)
plt.title('SAMPLE')
plt.xlabel('Frequency 20Hz-200MHz')
plt.ylabel('Capacitance pF')
plt.xlim(5, 500)
plt.ylim(-20, 20)
plt.legend()
plt.show()
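A likely cause of the empty plot in the question: the Excel header appears to be 'Frequency ' with a trailing space (the answer above indexes data["Frequency "]), so pd.DataFrame(data, columns=["Frequency", ...]) creates a new, all-NaN 'Frequency' column, and there is nothing to draw. A sketch that strips the column names and then uses DataFrame.plot; the small frame below is hypothetical and stands in for pd.read_excel(path):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for pd.read_excel(path); note the trailing space in "Frequency "
data = pd.DataFrame({
    "Frequency ": [10, 50, 100, 500],
    "C1": [1.0, 2.0, 3.0, 4.0],
    "C2": [1.5, 2.5, 3.5, 4.5],
    "C3": [0.5, 1.0, 1.5, 2.0],
    "C4": [2.0, 1.0, 0.0, -1.0],
})

# Remove stray whitespace so "Frequency " becomes "Frequency"
data.columns = data.columns.str.strip()

# One line per y column, all sharing the Frequency column as x
ax = data.plot(x="Frequency", y=["C1", "C2", "C3", "C4"])
ax.set_ylabel("Capacitance pF")
plt.show()
```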

matplotlib x-axis messed up [duplicate]

I have a series whose index is datetime that I wish to plot. I want to plot the values of the series on the y axis and the index of the series on the x axis. The Series looks as follows:
2014-01-01 7
2014-02-01 8
2014-03-01 9
2014-04-01 8
...
I generate a graph using plt.plot(series.index, series.values).
The problem is that I would like to show only the year and month (yyyy-mm or 2016 March), but the x-axis labels include hours, minutes and seconds. How can I remove them to get the desired format?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# sample data
N = 30
drange = pd.date_range("2014-01", periods=N, freq="MS")
np.random.seed(365) # for a reproducible example of values
values = {'values':np.random.randint(1,20,size=N)}
df = pd.DataFrame(values, index=drange)
fig, ax = plt.subplots()
ax.plot(df.index, df.values)
ax.set_xticks(df.index)
# use formatters to specify major and minor ticks
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
_ = plt.xticks(rotation=90)
plt.show()
You can try something like this:
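import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
# 36 monthly values starting January 2014 (month-start dates; 'M' is deprecated in recent pandas)
df = pd.DataFrame({'values': np.random.randint(0, 1000, 36)},
                  index=pd.date_range(start='2014-01-01', periods=36, freq='MS'))
fig, ax1 = plt.subplots()
ax1.plot(df.index, df['values'])
# '%Y %B' gives labels such as '2014 March'
monthyearFmt = mdates.DateFormatter('%Y %B')
ax1.xaxis.set_major_formatter(monthyearFmt)
_ = plt.xticks(rotation=90)
plt.show()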
Matplotlib also has a built-in helper that rotates and right-aligns date tick labels:
fig.autofmt_xdate()
See the "Custom tick formatter" examples in the matplotlib documentation.
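Putting the pieces together, here is a sketch that combines a MonthLocator (where the ticks go), a DateFormatter (label text without the time of day) and fig.autofmt_xdate() (rotation). The series values and the 3-month tick interval are arbitrary choices for illustration:

```python
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Monthly series shaped like the one in the question (values are arbitrary)
series = pd.Series(np.arange(24),
                   index=pd.date_range("2014-01-01", periods=24, freq="MS"))

fig, ax = plt.subplots()
ax.plot(series.index, series.values)
# One tick every 3 months, labelled as year-month only
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
fig.autofmt_xdate()  # rotate and right-align the date labels

fig.canvas.draw()  # force tick labels to be computed
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)

# A formatter can also be called directly on a matplotlib date number
fmt = mdates.DateFormatter("%Y-%m")
print(fmt(mdates.date2num(dt.datetime(2014, 3, 1))))  # 2014-03
plt.show()
```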

How to make a scatter plot of two columns and divide the x axis into 3 columns f1, f2, and f3

I have a dataframe and want to draw a scatter plot divided into 2 regions: in region 1, plot only f_x_f1 vs A_x_f1, and in region 2, plot f_x_f2 vs A_x_f2.
I would appreciate it if someone could suggest a better solution to this problem.
Here is an example of my dataframe:
df=pd.DataFrame({'f_x_f1':[0.3,0.28,0.34],'A_x_f1':[0.003,0.28,0.034],'f1':[0.4,0.4,0.4],'f_x_f2':[0.91,0.88,0.96],'A_x_f2':[0.003,0.28,0.034],'f2':[1.3,1.3,1.3]})
Here is a solution using matplotlib subplots:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
### making some sample data
df = pd.DataFrame({"f_x_f1": np.random.randint(1, 100, 100),
                   "A_x_f1": np.random.randint(1, 100, 100),
                   "f_x_f2": np.random.randint(1, 100, 100),
                   "A_x_f2": np.random.randint(1, 100, 100)})
# One row of two axes: left panel for region 1, right panel for region 2
fig, ax = plt.subplots(nrows=1, ncols=2)
ax[0].scatter(df.f_x_f1, df.A_x_f1)
ax[0].set_title("f_x_f1 vs A_x_f1")
ax[1].scatter(df.f_x_f2, df.A_x_f2)
ax[1].set_title("f_x_f2 vs A_x_f2")
plt.show()
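If the two regions should share one x axis instead of being separate panels, a sketch along these lines may work. It assumes, since the question does not say so explicitly, that the constant columns f1 and f2 hold the right-hand edge of each region; it shades the regions with axvspan and marks the boundary with axvline, using the question's own dataframe:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'f_x_f1': [0.3, 0.28, 0.34], 'A_x_f1': [0.003, 0.28, 0.034], 'f1': [0.4, 0.4, 0.4],
                   'f_x_f2': [0.91, 0.88, 0.96], 'A_x_f2': [0.003, 0.28, 0.034], 'f2': [1.3, 1.3, 1.3]})

# Assumption: f1 and f2 are constant columns holding each region's right edge
f1 = df["f1"].iloc[0]
f2 = df["f2"].iloc[0]

fig, ax = plt.subplots()
ax.axvspan(0, f1, color="C0", alpha=0.1)   # shade region 1: [0, f1]
ax.axvspan(f1, f2, color="C1", alpha=0.1)  # shade region 2: [f1, f2]
ax.axvline(f1, color="grey", linestyle="--")  # boundary between the regions
ax.scatter(df["f_x_f1"], df["A_x_f1"], color="C0", label="f_x_f1 vs A_x_f1")
ax.scatter(df["f_x_f2"], df["A_x_f2"], color="C1", label="f_x_f2 vs A_x_f2")
ax.set_xlim(0, f2)
ax.legend()
plt.show()
```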

Tick labels only displayed in one subplot

I need to display custom shared x-axis tick labels on both subplots, using two datasets with different dates.
from pandas import DataFrame, date_range, Timedelta
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
#Dataset 1
rng1 = date_range(start='01-01-2015', periods=5, freq='1M')
df1 = DataFrame({'y':np.random.normal(size=len(rng1))}, index=rng1)
y1 = df1['y']
#Dataset 2
rng2 = date_range(start='01-01-2015', periods=5, freq='2M')
df2 = DataFrame({'y':np.random.normal(size=len(rng2))}, index=rng2)
y2 = df2['y']
#Figure
fig,(ax1,ax2) = plt.subplots(2,1,sharex=True)
y1.plot(ax=ax1)
y2.plot(ax=ax2)
plt.xticks(rotation=30)
ax1.xaxis.set_minor_formatter(plt.NullFormatter())
ax2.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%D'))
plt.show()
The code above creates a figure without x-axis labels on the upper subplot.
I expected that adding the code below would display the same x-axis labels on the upper subplot, but they do not show up. What am I doing wrong?
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%D'))
Setting sharex=False does not work because the dates are different for each dataset.
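First of all, if you want to use matplotlib.dates locators and formatters on a plot created via pandas, pass x_compat=True to the plot call. Otherwise pandas may use its own date units on the axis, which the matplotlib.dates locators and formatters do not interpret correctly.
Also, '%D' is not one of the format codes documented for Python's strftime (on some platforms it is an alias for '%m/%d/%y'). Maybe you meant '%b' (abbreviated month name)?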
There are then two options.
Use sharex=False, set the locators and formatters on both axes, and finally copy the limits of one plot to the other. In this case, since the lower plot covers the larger range, copy its limits to the upper plot:
ax1.set_xlim(ax2.get_xlim())
The other option is to use sharex=True and make the labels visible again:
plt.setp(ax1.get_xticklabels(), visible=True)
Unfortunately, this option was broken in the newest matplotlib version at the time of writing; I opened a bug report about it.
Full code for the first option (since the second one is not working):
from pandas import DataFrame, date_range, Timedelta
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
#Dataset 1
# Month-start frequencies ('M' is deprecated in recent pandas)
rng1 = date_range(start='2015-01-01', periods=5, freq='MS')
df1 = DataFrame({'y':np.random.normal(size=len(rng1))}, index=rng1)
y1 = df1['y']
#Dataset 2
rng2 = date_range(start='2015-01-01', periods=5, freq='2MS')
df2 = DataFrame({'y':np.random.normal(size=len(rng2))}, index=rng2)
y2 = df2['y']
#Figure
fig,(ax1,ax2) = plt.subplots(2,1,sharex=False)
y1.plot(ax=ax1, x_compat=True)
y2.plot(ax=ax2, x_compat=True)
plt.xticks(rotation=30)
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax1.xaxis.set_minor_locator(plt.NullLocator())
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax2.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax2.xaxis.set_minor_locator(plt.NullLocator())
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax1.set_xlim(ax2.get_xlim())
plt.show()
This seemed to work for me, following matplotlib's shared_axis_demo example:
fig = plt.figure()
ax1 = plt.subplot(211)
_ = plt.plot(y1)
plt.setp(ax1.get_xticklabels(), visible=True)
ax2 = plt.subplot(212, sharex=ax1)
_ = plt.plot(y2)
plt.show()
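Another way to get the labels back while keeping sharex=True: in recent matplotlib versions, plt.subplots(sharex=True) hides the inner tick labels through the tick parameters, so re-enabling them with tick_params may work where plt.setp(..., visible=True) did not. A sketch using plain Axes.plot (so x_compat is not needed), with random month-start data standing in for the question's datasets:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Two series with different date spacing, as in the question
y1 = pd.Series(rng.normal(size=5), index=pd.date_range("2015-01-01", periods=5, freq="MS"))
y2 = pd.Series(rng.normal(size=5), index=pd.date_range("2015-01-01", periods=5, freq="2MS"))

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(y1.index, y1.values)
ax2.plot(y2.index, y2.values)

# Shared x axes use the same locator and formatter, so setting them once is enough
ax2.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%b"))

# Undo the label hiding that sharex=True applied to the upper axes
ax1.tick_params(axis="x", labelbottom=True)

fig.canvas.draw()  # compute the tick label text
upper_labels = [t.get_text() for t in ax1.get_xticklabels()]
print(upper_labels)
plt.show()
```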

x-axis labels (dates) shifted in Python matplotlib

I'm a beginner in Python and I have the following problem. I would like to plot a dataset where the x axis shows dates. The dataset looks like this:
datum, start, end
2017.09.01 38086 37719,8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
2017.09.08 37764.7383 37596.75
2017.09.11 37615.5117 37895.8516
2017.09.12 37889.6016 38076.8789
2017.09.13 38089.1406 38119.0898
2017.09.14 38119.2617 38243.1992
2017.09.15 38243.7188 38325.9297
2017.09.18 38325.3086 38387.2188
2017.09.19 38387.2188 38176.0781
2017.09.20 38173.2109 38108.0391
2017.09.21 38107.2617 38109.2109
2017.09.22 38110.4609 38178.6289
2017.09.25 38121.9102 38107.8711
2017.09.26 38127.25 37319.2383
2017.09.27 37360.8398 37244.3008
2017.09.28 37282.1094 37191.6484
2017.09.29 37192.1484 37290.6484
The first column holds the x-axis labels (the dates).
With the following code, the x-axis labels are shifted:
import pandas as pd
import matplotlib.pyplot as plt
bux = pd.read_csv('C:\\Home\\BUX.txt',
sep='\t',
decimal='.',
header=0)
fig1 = bux.plot(marker='o')
fig1.set_xticklabels(bux.datum, rotation='vertical', fontsize=8)
In the resulting figure, the labels are offset: the second data row in the dataset is '2017.09.04 37707.3906 37465.2617', but the label '2017.09.04' appears at the third data point, whose start value is 37471.5117.
What should I do to get correct x-axis labels?
Thank you!
Agnes
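First, the second line (the first data row, 37719,8984) contains a comma instead of a decimal point; fix that in the file. Then convert the "datum," column to actual dates and plot the columns with matplotlib. Note that with sep=r'\s+' the header "datum, start, end" produces the column names "datum,", "start," and "end".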
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data/BUX.txt', sep=r'\s+')
df["datum,"] = pd.to_datetime(df["datum,"], format="%Y.%m.%d")
plt.plot(df["datum,"], df["start,"], marker="o")
plt.plot(df["datum,"], df["end"], marker="o")
plt.gcf().autofmt_xdate()
plt.show()
Thank you! It works perfectly. The key was converting the data to date format. Thank you again!
Agnes
Alternatively, once the dates are parsed, df.plot() puts them on the x axis correctly:
import pandas as pd
import matplotlib.pyplot as plt
import io
t="""
date start end
2017.09.01 38086 37719.8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
2017.09.08 37764.7383 37596.75
2017.09.11 37615.5117 37895.8516
2017.09.12 37889.6016 38076.8789
2017.09.13 38089.1406 38119.0898
2017.09.14 38119.2617 38243.1992
2017.09.15 38243.7188 38325.9297
2017.09.18 38325.3086 38387.2188
2017.09.19 38387.2188 38176.0781
2017.09.20 38173.2109 38108.0391
2017.09.21 38107.2617 38109.2109
2017.09.22 38110.4609 38178.6289
2017.09.25 38121.9102 38107.8711
2017.09.26 38127.25 37319.2383
2017.09.27 37360.8398 37244.3008
2017.09.28 37282.1094 37191.6484
2017.09.29 37192.1484 37290.6484
"""
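# sep=r'\s+' splits on runs of whitespace; the blank first line of t is skipped
data = pd.read_csv(io.StringIO(t), sep=r'\s+')
data['date'] = pd.to_datetime(data['date'], format='%Y.%m.%d')
data.plot(x='date', marker='o')
plt.show()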
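If editing the file by hand is not an option, the decimal comma and the trailing commas in the header can be repaired while loading. A sketch using the first three rows of the question's data; the helper name load_bux is hypothetical:

```python
import io

import pandas as pd

raw = """datum, start, end
2017.09.01 38086 37719,8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
"""


def load_bux(text):
    """Parse the whitespace-separated data into a date-indexed frame (hypothetical helper)."""
    df = pd.read_csv(io.StringIO(text), sep=r"\s+")
    # The header "datum, start, end" yields names like "datum,"; drop the commas
    df.columns = df.columns.str.rstrip(",")
    # Repair decimal commas such as "37719,8984" before converting to float
    for col in ["start", "end"]:
        df[col] = pd.to_numeric(df[col].astype(str).str.replace(",", ".", regex=False))
    df["datum"] = pd.to_datetime(df["datum"], format="%Y.%m.%d")
    return df.set_index("datum")


bux = load_bux(raw)
print(bux)
```

bux.plot(marker='o') then draws both columns against the real dates, so no set_xticklabels call is needed and the labels cannot drift from the points.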
