Plotting a time series graph from a pandas DataFrame using matplotlib - python-3.x

I have the following data in a csv file
SourceID BSs hour Type
7208 87 11 MAIN
11060 67 11 MAIN
3737 88 11 MAIN
9683 69 11 MAIN
I have the following Python code. I want to plot a graph with the following specifications.
For each SourceID and Type I want to plot a graph of BSs over time. I would prefer each SourceID and Type combination to be a subplot on a single figure. I have tried a lot of options using groupby, but can't seem to get it to work.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
COLLECTION = 'NEW'
DATA = r'C:\Analysis\Test\{}'.format(COLLECTION)
INPUT_FILE = DATA + r'\in.csv'
OUTPUT_FILE = DATA + r'\out.csv'
with open(INPUT_FILE) as fin:
    df = pd.read_csv(INPUT_FILE,
                     usecols=["SourceID", 'hour', 'BSs', 'Type'],
                     header=0)
df.drop_duplicates(inplace=True)
df.reset_index(inplace=True)

It's still not 100% clear to me what sort of plot you actually want, but my guess is that you're looking for something like this:
import numpy as np
from matplotlib import pyplot as plt
# group by SourceID and Type, find out how many unique combinations there are
grps = df.groupby(['SourceID', 'Type'])
ngrps = len(grps)
# make a grid of axes (nrows is the ceiling of ngrps / ncols)
ncols = int(np.sqrt(ngrps))
nrows = -(-ngrps // ncols)
# squeeze=False keeps ax a 2D array even when there is only one subplot
fig, ax = plt.subplots(nrows, ncols, sharex=True, sharey=True, squeeze=False)
# iterate over the groups, plot into each axis
for ii, (idx, rows) in enumerate(grps):
    rows.plot(x='hour', y='BSs', style='-s', ax=ax.flat[ii], legend=False,
              scalex=False, scaley=False)
# hide any unused axes
for aa in ax.flat[ngrps:]:
    aa.set_axis_off()
# set the axis limits
ax.flat[0].set_xlim(df['hour'].min() - 1, df['hour'].max() + 1)
ax.flat[0].set_ylim(df['BSs'].min() - 5, df['BSs'].max() + 5)
fig.tight_layout()
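The grid-size arithmetic in the answer relies on `-(-a // b)` acting as integer ceiling division. Here is a minimal sketch of just that step, pulled out into a helper so the row and column counts are easy to check. The name `grid_shape` is hypothetical and not part of the original answer.

```python
import math


def grid_shape(n_plots):
    """Return (nrows, ncols) for a near-square grid that holds n_plots axes.

    Hypothetical helper mirroring the arithmetic used in the answer.
    """
    ncols = max(1, int(math.sqrt(n_plots)))
    nrows = -(-n_plots // ncols)  # ceiling division without going through floats
    return nrows, ncols


for n in (1, 4, 5, 7, 12):
    nrows, ncols = grid_shape(n)
    # nrows * ncols - n is the number of axes that would be hidden
    print(n, (nrows, ncols), "unused axes:", nrows * ncols - n)
```

With 5 groups this gives a 3 x 2 grid, so one axis is left over, which is why the answer switches off the unused axes afterwards.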

Related

Need to force overlapping for seaborn's heatmap and kdeplot

I'm trying to combine seaborn's heatmap and kdeplot in one figure, but so far the result is not very promising since I cannot find a way to make them overlap. As a result, the heatmap is just squeezed to the left side of the figure.
I think the reason is that seaborn doesn't seem to recognize the x-axis as the same one in two charts (see picture below), although the data points are exactly the same. The only difference is that for heatmap I needed to pivot them, while for the kdeplot pivoting is not needed.
So the data for both axes come from the same dataset, but in different forms, as can be seen in the code below.
The dataset sample looks something like this:
X Y Z
7,75 280 52,73
3,25 340 54,19
5,75 340 53,61
2,5 180 54,67
3 340 53,66
1,75 340 54,81
4,5 380 55,18
4 240 56,49
4,75 380 55,17
4,25 180 55,40
2 420 56,42
2,25 380 54,90
My code:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize=(11, 9), dpi=300)
plt.tick_params(bottom='on')
# dataset is just a pandas frame with data
X1 = dataset.iloc[:, :3].pivot("X", "Y", "Z")
X2 = dataset.iloc[:, :2]
ax = sns.heatmap(X1, cmap="Spectral")
ax.invert_yaxis()
ax2 = plt.twinx()
sns.kdeplot(X2.iloc[:, 1], X2.iloc[:, 0], ax=ax2, zorder=2)
ax.axis('tight')
plt.show()
Please help me with placing kdeplot on top of the heatmap. Ideally, I would like my final plot to look something like this:
Any tips or hints will be greatly appreciated!
The question can be a bit hard to understand, because the dataset can't be "just some data". The X and Y values need to lie on a very regular grid. No X,Y combination can be repeated, but not all values appear. The kdeplot will then show where the used values of X,Y are concentrated.
Such a dataset can be simulated by first generating dummy data for a full grid, and then taking a subset.
Now, a seaborn heatmap uses categorical X and Y axes. Such axes are very hard to align with the kdeplot. To obtain a similar heatmap with numerical axes, ax.pcolor() can be used.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
xs = np.arange(2, 10, 0.25)
ys = np.arange(150, 400, 10)
# first create a dummy dataset over a full grid
dataset = pd.DataFrame({'X': np.repeat(xs, len(ys)),
                        'Y': np.tile(ys, len(xs)),
                        'Z': np.random.uniform(50, 60, len(xs) * len(ys))})
# take a random subset of the rows
dataset = dataset.sample(200)
fig, ax = plt.subplots(figsize=(11, 9), dpi=300)
# keyword arguments are required by pivot() in pandas 2.0 and later
X1 = dataset.pivot(index="X", columns="Y", values="Z")
collection = ax.pcolor(X1.columns, X1.index, X1, shading='nearest', cmap="Spectral")
plt.colorbar(collection, ax=ax, pad=0.02)
# default, cut=3, which causes a lot of surrounding whitespace
sns.kdeplot(x=dataset["Y"], y=dataset["X"], cut=1.5, ax=ax)
fig.tight_layout()
plt.show()

How to color the line graph according to conditions in a plot?

I am trying to find a way to plot my data.
I have a graph of a trajectory over time (x) and kilometers (y), and I need to mark with different colours where the availability parameter from the dataframe is 0 or 100.
I tried this, but I got a completely different result from what I expected:
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
# Read file, using ; as delimiter
filename = "H:\\run_linux\\river_km_calculations\\route2_8_07_23_07\\true_route2_8_07_23_07_test.csv"
df = pd.read_csv(filename, delimiter=';', parse_dates=['datetime']) #dtype={'lon_deg':'float', 'lat_deg':'float'})
df = df[189940:]
df.set_index('datetime', inplace=False)
plt.plot( df['datetime'], df['river_km'])
plt.show()
connection = 100
noconection = 0
def conditions(s):
    if (s['age_gps_data'] <= 1.5) or (s['age_gps_data'] >=0.5 ):
        return 100
    else:
        return 0
df['availability'] = df.apply(conditions, axis=1)
internet = np.ma.masked_where(df.availability == connection, df.availability)
nointernet = np.ma.masked_where((df.availability == noconection) , df.availability)
fig, ax = plt.subplots()
ax.plot(df.river_km, internet, df.river_km, nointernet)
plt.show()
How can I mark on a plot, with different colours, where availability is 0, where it is 100, and where there is no value for this parameter?
What I want to achieve should look like this:
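One possible approach, sketched under assumptions rather than taken from an answer on this page: mask the y-values (river_km) instead of the availability column, once per condition, and draw each masked copy in its own colour. The data below is made up to stand in for the CSV, and the names `km_on`, `km_off` and `km_missing` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the CSV in the question.
df = pd.DataFrame({
    "datetime": pd.date_range("2021-07-08", periods=8,
                              freq=pd.Timedelta(hours=1)),
    "river_km": [10.0, 11.0, 12.5, 13.0, 14.0, 15.5, 16.0, 17.0],
    "availability": [100, 100, 0, 0, np.nan, np.nan, 100, 0],
})

avail = df["availability"].to_numpy()
km = df["river_km"].to_numpy()

# Keep river_km only where the condition holds and mask it everywhere else.
km_on = np.ma.masked_where(avail != 100, km)
km_off = np.ma.masked_where(avail != 0, km)
km_missing = np.ma.masked_where(~np.isnan(avail), km)

fig, ax = plt.subplots()
ax.plot(df["datetime"], km_on, "g.-", label="availability = 100")
ax.plot(df["datetime"], km_off, "r.-", label="availability = 0")
ax.plot(df["datetime"], km_missing, ".-", color="gray", label="no value")
ax.set_xlabel("time")
ax.set_ylabel("river km")
ax.legend()
fig.autofmt_xdate()
```

Masked points are simply not drawn, so the line shows a small gap wherever the state changes; the markers keep isolated single points visible.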

Why is Python matplot not starting from the point where my Data starts [duplicate]

I am currently learning how to import data and work with it in matplotlib, and I am having trouble even though I copied the code exactly from the book.
This is what the plot looks like. My question is: how can I remove the white space at the start and the end of the x-axis?
Here is the code:
import csv
from matplotlib import pyplot as plt
from datetime import datetime
# Get dates and high temperatures from file.
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    #for index, column_header in enumerate(header_row):
        #print(index, column_header)
    dates, highs = [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
# Plot data.
fig = plt.figure(dpi=128, figsize=(10,6))
plt.plot(dates, highs, c='red')
# Format plot.
plt.title("Daily high temperatures, July 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()
There is an automatic margin set at the edges, which ensures that the data fits nicely within the axis spines. In this case such a margin is probably desired on the y axis. By default it is set to 0.05 in units of axis span.
To set the margin to 0 on the x axis, use
plt.margins(x=0)
or
ax.margins(x=0)
depending on the context. Also see the documentation.
In case you want to get rid of the margin in the whole script, you can use
plt.rcParams['axes.xmargin'] = 0
at the beginning of your script (same for y of course). If you want to get rid of the margin entirely and forever, you might want to change the corresponding lines in the matplotlib rc file:
axes.xmargin : 0
axes.ymargin : 0
Example
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
tips.plot(ax=ax1, title='Default Margin')
tips.plot(ax=ax2, title='Margins: x=0')
ax2.margins(x=0)
Alternatively, use plt.xlim(..) or ax.set_xlim(..) to manually set the limits of the axes such that there is no white space left.
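As a rough illustration of the set_xlim route, here is a sketch that pins the x limits to the first and last date. The `dates` and `highs` lists are made-up stand-ins for the values read from the CSV in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# Made-up data standing in for the dates/highs lists from the question.
dates = [datetime(2014, 7, 1) + timedelta(days=i) for i in range(31)]
highs = [60 + (i * 7) % 11 for i in range(31)]

fig, ax = plt.subplots(dpi=128, figsize=(10, 6))
ax.plot(dates, highs, c="red")
# Pin the x-axis to the data range so no white space is left on either side.
ax.set_xlim(dates[0], dates[-1])
fig.autofmt_xdate()
```

Unlike margins(x=0), explicit limits stay fixed if more data is plotted later, so they may need to be set again after adding lines.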
If you only want to remove the margin on one side but not the other, e.g. remove the margin from the right but not from the left, you can use set_xlim() on a matplotlib axes object.
import seaborn as sns
import matplotlib.pyplot as plt
import math
max_x_value = 100
x_values = [i for i in range (1, max_x_value + 1)]
y_values = [math.log(i) for i in x_values]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
sns.lineplot(ax=ax1, x=x_values, y=y_values)
sns.lineplot(ax=ax2, x=x_values, y=y_values)
ax2.set_xlim(-5, max_x_value) # tune the -5 to your needs

Creating a single barplot with different colors with pandas and matplotlib

I have two dfs, for which I want to create a single bar plot,
each bar needs its own color depending on which df it came from.
# Ages < 20
df1.tags = ['locari', 'ママコーデ', 'ponte_fashion', 'kurashiru', 'fashion']
df1.tag_count = [2162, 1647, 1443, 1173, 1032]
# Ages 20 - 24
df2.tags= ['instagood', 'ootd', 'fashion', 'followme', 'love']
df2.tag_count = [6523, 4576, 3986, 3847, 3599]
How do I create such a plot?
P.S. The original df is way bigger. Some words may overlap, but I want them to have different colors as well
Your data frame tag_counts are just simple lists, so you can use standard mpl bar plots to plot both of them in the same axis. This answer assumes that both dataframes have the same length.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Create dataframes
df1=pd.DataFrame()
df2=pd.DataFrame()
# Ages < 20
df1.tags = ['locari', 'blub', 'ponte_fashion', 'kurashiru', 'fashion']
df1.tag_count = [2162, 1647, 1443, 1173, 1032]
# Ages 20 - 24
df2.tags= ['instagood', 'ootd', 'fashion', 'followme', 'love']
df2.tag_count = [6523, 4576, 3986, 3847, 3599]
# Create figure
fig=plt.figure()
ax=fig.add_subplot(111)
# x-coordinates
ind1 = np.arange(len(df1.tag_count))
ind2 = np.arange(len(df2.tag_count))
width = 0.35
# Bar plot for df1 (align='edge' so the tick positions below hit the bar centres)
ax.bar(ind1,df1.tag_count,width,color='r',align='edge')
# Bar plot for df2
ax.bar(ind2+width,df2.tag_count,width,color='b',align='edge')
# Create new xticks
ticks=list(ind1+0.5*width)+list(ind2+1.5*width)
ticks.sort()
ax.set_xticks(ticks)
# Sort labels in an alternating way
labels = [None]*(len(df1.tags)+len(df2.tags))
labels[::2] = df1.tags
labels[1::2] = df2.tags
ax.set_xticklabels(labels)
plt.show()
This will return a plot like this
Note that to merge both tags into a single list I assumed that both lists have the same length.
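If the two frames do not have the same length, one alternative is to stack them into a single frame with a colour column and plot against integer positions. This is only a sketch under assumptions: the frames are built here as ordinary DataFrames with `tags` and `tag_count` columns (shortened, made-up subsets of the data above), and `combined` is a hypothetical name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Shortened stand-ins for the two frames, deliberately of different lengths.
df1 = pd.DataFrame({"tags": ["locari", "kurashiru", "fashion"],
                    "tag_count": [2162, 1173, 1032]})
df2 = pd.DataFrame({"tags": ["instagood", "ootd", "fashion", "love"],
                    "tag_count": [6523, 4576, 3986, 3599]})

# Tag each row with the colour of its source frame, then stack the frames.
combined = pd.concat([df1.assign(color="red"), df2.assign(color="blue")],
                     ignore_index=True)

fig, ax = plt.subplots()
# Plot against integer positions so a tag present in both frames
# ("fashion" here) still gets two separate bars.
positions = list(range(len(combined)))
bars = ax.bar(positions, combined["tag_count"].tolist(),
              color=combined["color"].tolist())
ax.set_xticks(positions)
ax.set_xticklabels(combined["tags"].tolist(), rotation=45, ha="right")
fig.tight_layout()
```

This groups the bars by source frame instead of interleaving them; sorting `combined` by `tag_count` before plotting would mix the two colours by size.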

How to plot the S&P500 and its SMA in two different windows and in one window

I am trying to plot the S&P500 and its SMA in two different windows with the following code, but it doesn't seem to work well. If I plot only one of them, it is OK.
import pandas_datareader.data as web
import datetime
import matplotlib.pyplot as pyplot
import talib
import pandas as pd
import numpy as np
start = datetime.datetime(2002, 1, 1)
## S&P 500
sp500 = web.DataReader("SP500", "fred", start)
head = sp500[-100:].dropna()
print(len(head))
## Transform DataFrame to nparray
my_array = head.to_numpy()  # as_matrix() was removed in pandas 1.0
## Transform column to row
x = my_array.T[0]
## Get rid off the NaN
y = x[~np.isnan(x)]
print(len(y))
## Compute SMA
my_sma=talib.SMA(y, timeperiod=5)
print(len(my_sma))
## Plot
pyplot.figure(1)
pyplot.subplot(211) ## upper window
head.plot(use_index=False)
pyplot.subplot(212) ## lower window
pd.Series(my_sma).plot(use_index=False)
And here is the plot.
Besides, I want to plot them in the same window, i.e. overlaid.
Sorry, I have changed my code a little so that it is better formed and it is easier to understand what I mean.
start = datetime.datetime(2002, 1, 1)
def computeSMA(data):
    head = data[-100:].dropna()
    ## Transform column to row (to_numpy() replaces as_matrix(), removed in pandas 1.0)
    x = head.to_numpy().T[0]
    ## Get rid of the NaN
    y = x[~np.isnan(x)]
    ## Compute SMA
    my_sma = talib.SMA(y, timeperiod=5)
    return my_sma
## S&P 500
sp500 = web.DataReader("SP500", "fred", start)
sp_sma = computeSMA(sp500)
## Plot
pyplot.figure(1)
sp500[-100:].dropna().plot()
pyplot.figure(2)
pd.Series(sp_sma).plot(use_index=False)
If I run the code, I get the following error:
  File "C:\Users\Administrator\Anaconda3\lib\site-packages\matplotlib\dates.py", line 401, in num2date
    return _from_ordinalf(x, tz)
  File "C:\Users\Administrator\Anaconda3\lib\site-packages\matplotlib\dates.py", line 254, in _from_ordinalf
    dt = datetime.datetime.fromordinal(ix).replace(tzinfo=UTC)
ValueError: ordinal must be >= 1
If I comment out the plotting of figure(2), I get the plot shown in 2:
If I comment out the plotting of figure(1), I get the plot shown in 3:
Besides, I want to plot the SP500 and its SMA on the same figure, with the date on the x-axis.
To plot two series on the same plot (note that this works for any number of time series):
import pandas as pd
import matplotlib.pyplot as pyplot
# Generate sample series (in your case these are s1=head and s2=pd.Series(my_sma))
s1 = pd.Series([1, 3, 8, 10])
s2 = pd.Series([2, 4, 9, 11])
# Create the plot.
pyplot.figure()
pyplot.plot(s1)
pyplot.plot(s2)
pyplot.show()
Result:
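To keep the dates on the x-axis as well, both series need to share the same DatetimeIndex. The sketch below is an assumption-laden variant, not the asker's code: it swaps talib.SMA for pandas' rolling mean, which should give the same simple moving average, and it uses made-up prices in place of the FRED download.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up prices on business days, standing in for the "SP500" column.
dates = pd.bdate_range("2017-01-02", periods=100)
rng = np.random.default_rng(0)
prices = pd.Series(2250 + rng.normal(0, 5, len(dates)).cumsum(),
                   index=dates, name="SP500")

# 5-period simple moving average; the result keeps the DatetimeIndex,
# so it lines up with the prices without any index juggling.
sma = prices.rolling(window=5).mean()

fig, ax = plt.subplots()
ax.plot(prices.index, prices.values, label="SP500")
ax.plot(sma.index, sma.values, label="SMA(5)")
ax.legend()
fig.autofmt_xdate()
```

Because the SMA is never converted to a bare array, there is no need to plot it with use_index=False, and both lines end up on the same date axis.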

Resources