Reduce xtick frequency when stepping with a Series

I am trying to plot a large dataset from a CSV file I gathered previously. When I plot the data using plt.step, every xtick is labelled, which makes the axis unreadable. How do I reduce the number of displayed xticks but keep the same graph?
I have tried using plt.xticks with np.arange, but I keep getting errors because I am using a DataFrame rather than an array. Additionally, the x values I'm working with are times in the following format (%H:%M:%S).
import pandas as pd
import numpy as np
import datetime
import matplotlib
matplotlib.use("TkAgg")
from matplotlib import pyplot as plt
df = pd.read_csv(thisFile)
plt.figure(figsize=(16,6))
plt.step(df.Time, df.HR)
print(df.Time)
Output
0 13:40:34
1 13:40:44
2 14:18:29
3 14:19:15
4 14:20:58
5 14:21:17
plt.ylabel('Heart Rate (BPM)')
plt.xlabel('Time')
plt.show()
The plot is mostly correct except for the xtick frequency. The tick labels appear too often and collide with each other, as shown in my graph output.
Ideally I would like the xticks to run from 12AM to 12PM with one xtick per hour.
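One possible approach, as a minimal sketch: the Time column is read as plain strings, so matplotlib treats every value as its own category and labels each one. Converting the strings to datetimes lets a date locator place one tick per hour. The HR values below are made up for illustration, and the axis limits assume that a full day (midnight to midnight) is what was meant.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; use "TkAgg" as in the question for a window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Times are taken from the printed output in the question; HR values are invented.
df = pd.DataFrame({
    "Time": ["13:40:34", "13:40:44", "14:18:29", "14:19:15", "14:20:58", "14:21:17"],
    "HR": [72, 75, 80, 78, 77, 76],
})

# Parse the strings into datetimes (pandas fills in 1900-01-01 as the date).
times = pd.to_datetime(df["Time"], format="%H:%M:%S")

fig, ax = plt.subplots(figsize=(16, 6))
ax.step(times, df["HR"])

# One major tick per hour, labelled as HH:MM.
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# Assumption: show the whole day the readings belong to.
day_start = times.iloc[0].normalize()
ax.set_xlim(day_start, day_start + pd.Timedelta(days=1))

ax.set_xlabel("Time")
ax.set_ylabel("Heart Rate (BPM)")
fig.savefig("heart_rate.png")
```

If the readings span several days, the date would need to be part of the parsed value as well, otherwise all days collapse onto the same 24-hour axis.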

Related

I am running regression code on student grades and hours studied. The table prints correctly, but the scatter plot is not showing

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# Load data
df = pd.read_csv('Grade_Set_1.csv')
print (df)
# Simple scatter plot
df.plot(kind='scatter', x='Hours_Studied', y='Test_Grade', title='Grade vs Hours Studied')
# check the correlation between variables
print(df.corr())
I got this KeyError:
KeyError: 'Hours_Studied'
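A KeyError like this means the DataFrame has no column with exactly that name. Without the file it is impossible to say why, but a common cause is stray whitespace or different spelling in the CSV header. A minimal sketch for checking, using an invented two-column CSV whose first header has a trailing space:

```python
import io
import pandas as pd

# Invented stand-in for Grade_Set_1.csv; note the trailing space after "Hours_Studied".
csv_text = "Hours_Studied ,Test_Grade\n2,57\n4,70\n6,83\n"
df = pd.read_csv(io.StringIO(csv_text))

# Inspect the exact column names first; repr() makes hidden spaces visible.
print([repr(name) for name in df.columns])

# Strip surrounding whitespace from every column name.
df.columns = df.columns.str.strip()

# The lookup that previously raised KeyError now succeeds.
hours = df["Hours_Studied"]
print(hours.tolist())
```

If the printed names differ in some other way (different case, another separator, a missing header row), the fix is to use the names exactly as printed or to adjust the read_csv arguments.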

Nyquist Plot using Python with certain parameters

I am trying to draw a Nyquist plot using Python, but I have no clue which parameters are required to plot that curve.
Here is a glimpse of the parameters that I have:
Channel_ID,Step_ID,Cycle_ID,Test_Time,EIS_Test_ID,EIS_Data_Point,Frequency,Zmod,Zphz,Zreal,Zimg,OCV,AC_Amp_RMS
4,7,1,36966.3072,0,0,200015.6,0.4933,70.9969,0.1606,0.4664,3.6231,0.35
4,7,1,36966.3072,0,1,158953.1,0.412,70.8901,0.1349,0.3893,3.6231,0.35
4,7,1,36966.3072,0,2,126234.4,0.3437,70.7115,0.1135,0.3244,3.6231,0.35
4,7,1,36966.3072,0,3,100265.6,0.2869,70.6312,0.0951,0.2706,3.6231,0.35
4,7,1,36966.3072,0,4,79640.63,0.2364,70.2418,0.0799,0.2224,3.6231,0.35
The rows above are the values of those parameters.
Based on the parameters Test_Time, Frequency, Zmod, Zphz, Zreal, Zimg, OCV and AC_Amp_RMS, where Zmod is the magnitude of the complex impedance formed by Zreal and Zimg, I need to draw a Nyquist plot. I have no clue how these parameters could be used for the plot.
PS: I tried to plot the curve using the real and imaginary parts, that is, Zreal and Zimg:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
train_df = pd.read_csv("above_data_with_around_100_rows.csv")
plt.figure()
plt.plot(train_df["Zreal"], train_df["Zimg"], "b")
plt.plot(train_df["Zreal"], -train_df["Zimg"], "r")
plt.show()
Is this useful for a Nyquist plot?
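A Nyquist plot shows the imaginary part of the impedance against its real part, one point per frequency, so Zreal and Zimg are the only columns needed; Zmod and Zphz are the same numbers in polar form. A minimal sketch using only the five rows quoted above (plotting -Zimg on the vertical axis is a common convention for impedance spectra, assumed here rather than taken from the question):

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The five sample rows from the question, reduced to the impedance columns.
csv_text = """Frequency,Zmod,Zphz,Zreal,Zimg
200015.6,0.4933,70.9969,0.1606,0.4664
158953.1,0.412,70.8901,0.1349,0.3893
126234.4,0.3437,70.7115,0.1135,0.3244
100265.6,0.2869,70.6312,0.0951,0.2706
79640.63,0.2364,70.2418,0.0799,0.2224
"""
eis = pd.read_csv(io.StringIO(csv_text))

# Complex impedance Z = Zreal + j*Zimg.
z = eis["Zreal"].to_numpy() + 1j * eis["Zimg"].to_numpy()

# Sanity check: magnitude and phase should reproduce Zmod and Zphz (in degrees).
print(np.abs(z))
print(np.degrees(np.angle(z)))

fig, ax = plt.subplots()
ax.plot(z.real, -z.imag, "o-")
ax.set_xlabel("Zreal")
ax.set_ylabel("-Zimg")
ax.set_aspect("equal", adjustable="datalim")  # equal scaling keeps arcs undistorted
fig.savefig("nyquist.png")
```

Only one curve is needed; plotting both Zimg and -Zimg, as in the attempt above, draws the same data twice mirrored about the real axis.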

Pandas - comparing average of hour periods against each other for a given date range

I'm trying to get used to working with datetime data in Pandas and plotting different comparisons for a given dataset. I'm using the London Air Quality dataset for Ozone to practice and am trying to replicate the chart below (which I created using a pivot table in Excel) with Pandas and matplotlib.
The chart plots the average of each hour's Ozone reading for each location across the entire dataset, to see whether one location is consistently higher than the others or whether different locations have the highest Ozone levels at different times of the day.
Essentially, I'm looking to plot the hourly average of Ozone for each location.
I've attempted to reshape the data into a MultiIndex format and then plot it, similar to what I'd do in Excel before plotting, but I am unsure whether this is the correct way to approach the problem. The code for reshaping is below. I am still getting used to reshaping, so I am open to other methods. Any assistance would be much appreciated!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
data = pd.read_csv('/Users/xx/Downloads/LaqnData.csv')
data['ReadingDateTime'] = pd.to_datetime(data['ReadingDateTime'])
data['Date'] = pd.to_datetime(data['ReadingDateTime']).dt.date
data['Time'] = pd.to_datetime(data['ReadingDateTime']).dt.time
data.set_index(['Date', 'Time'], inplace = True)
hourly_dataframe = data.pivot_table(columns = 'Site', values = 'Value', index = ['Date', 'Time'])
hourly_dataframe.fillna(method = 'ffill', inplace = True)
hourly_dataframe[hourly_dataframe < 0] = 0
I went to the site and downloaded a 24-hour reading for the following sites:
data.Site.unique()
array(['BX1', 'TH4', 'BT4', 'HI0', 'BL0', 'RD0'], dtype=object)
I adapted your code up to this point:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
data = pd.read_csv('/Users/xx/Downloads/LaqnData.csv')
data['ReadingDateTime'] = pd.to_datetime(data['ReadingDateTime'])
data.set_index('ReadingDateTime', inplace=True)  # needed so that data.index.hour works below
I then use the datetime index to pick out each hour in the groupby function.
data.groupby([data.index.hour, data['Site']])['Value'].mean().reset_index()  # Convert to a dataframe.
To plot, I chain unstack to the groupby function and plot directly.
data.groupby([data.index.hour, data['Site']])['Value'].mean().unstack().plot()
plt.xlabel('Hour of the day')
plt.ylabel('Ozone')
plt.title('Average hourly comparison')
plt.legend()  # If you want the legend to appear in the default location
If you are fussed about the legend location, this post explains it very well. In your case:
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15),
fancybox=True, shadow=True, ncol=6)
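To see what the groupby and unstack steps produce, here is a small self-contained sketch on invented readings (two sites, two hours, two days). It groups on the hour via the .dt accessor instead of the index, which is an alternative to setting a datetime index; the column names follow the question, the values do not come from the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Invented readings: hours 00:00 and 01:00 on two days, for two sites.
stamps = ["2019-01-01 00:00", "2019-01-02 00:00", "2019-01-01 01:00", "2019-01-02 01:00"]
readings = pd.DataFrame({
    "ReadingDateTime": pd.to_datetime(stamps * 2),
    "Site": ["BX1"] * 4 + ["TH4"] * 4,
    "Value": [10.0, 20.0, 30.0, 50.0, 1.0, 3.0, 5.0, 7.0],
})

# Mean per (hour of day, site), then one column per site.
hourly = (
    readings.groupby([readings["ReadingDateTime"].dt.hour, "Site"])["Value"]
    .mean()
    .unstack("Site")
)
print(hourly)

# One line per site, hour of day on the x-axis.
ax = hourly.plot()
ax.set_xlabel("Hour of the day")
ax.set_ylabel("Ozone")
ax.figure.savefig("hourly_ozone.png")
```

Each row of hourly is an hour of the day and each column a site, which is the same layout as the Excel pivot table described in the question.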

Y-axis values not showing in matplotlib.pyplot plot

My plot gives no indication of the order of magnitude of my y-values on the axis. How do I force Python to show some values on the y-axis?
import numpy as np
import matplotlib.pyplot as plt
BERfinal = [0.4967843137254903, 0.49215686274509757, 0.4938823529411763,
0.49170588235294116, 0.48852941176470605, 0.48203921568627417,
0.4797058823529405, 0.47454901960784257, 0.4795686274509802,
0.474901960784313, 0.4732549019607838, 0.4703137254901953,
0.4705490196078425]
x = np.linspace(-4,8,len(BERfinal))
plt.semilogy(x,BERfinal)
plt.title("BER vs SNR")
plt.ylabel("Bit Error Rate(BER)")
plt.xlabel("Signal-to-Noise Ratio(SNR)[dB]")
plt.xlim(-4,8)
I ended up playing around with:
plt.ylim(4.7*10**-1, 5*10**-1)
and changed the values until I found an appropriate range. It now shows 5x10^-1 on the y-axis.
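An alternative that avoids trial and error with the limits, sketched under the assumption that plain decimal labels are acceptable: the data only spans about 0.47 to 0.50, so no power of ten falls inside the range of the logarithmic axis. Placing the ticks explicitly and formatting them as decimals gives readable labels. The tick positions below are chosen by hand for this particular data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np

BERfinal = [0.4967843137254903, 0.49215686274509757, 0.4938823529411763,
            0.49170588235294116, 0.48852941176470605, 0.48203921568627417,
            0.4797058823529405, 0.47454901960784257, 0.4795686274509802,
            0.474901960784313, 0.4732549019607838, 0.4703137254901953,
            0.4705490196078425]
x = np.linspace(-4, 8, len(BERfinal))

fig, ax = plt.subplots()
ax.semilogy(x, BERfinal)

# Hand-picked major ticks covering the data range, labelled with two decimals.
ax.set_yticks([0.47, 0.48, 0.49, 0.50])
ax.yaxis.set_major_formatter(mticker.FormatStrFormatter("%.2f"))
# Suppress minor-tick labels so they cannot overlap the explicit ones.
ax.yaxis.set_minor_formatter(mticker.NullFormatter())

ax.set_title("BER vs SNR")
ax.set_ylabel("Bit Error Rate(BER)")
ax.set_xlabel("Signal-to-Noise Ratio(SNR)[dB]")
ax.set_xlim(-4, 8)
fig.savefig("ber_vs_snr.png")
```

With such a narrow range a linear y-axis (plt.plot instead of plt.semilogy) would look almost identical and label itself automatically.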

Pull out chunks of a plot made in python and re-display

I have made a plot in Jupyter with an x-axis spanning about 40 seconds. I want to pull out sections that are milliseconds long and re-display them as separate plots (so that they can be viewed more easily). How would I go about doing this?
You could use some subplots, and slice the original data arrays. For example:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,40,1000)
y = np.random.random(1000)
fig, [ax1,ax2,ax3] = plt.subplots(3,1)
ax1.plot(x,y)
ax2.plot(x[100:120],y[100:120])
ax3.plot(x[500:520],y[500:520])
plt.show()
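If the sections are easier to specify in seconds than in array indices, a boolean mask on the x values does the same job. A minimal sketch; the helper name window and the chosen start and stop times are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

def window(x, y, start, stop):
    """Return the samples of x and y with start <= x < stop (hypothetical helper)."""
    mask = (x >= start) & (x < stop)
    return x[mask], y[mask]

# 40 seconds of random data sampled roughly every millisecond.
rng = np.random.default_rng(0)
x = np.linspace(0, 40, 40001)
y = rng.random(x.size)

# Two sections, each 20 ms long, given in seconds.
x1, y1 = window(x, y, 10.000, 10.020)
x2, y2 = window(x, y, 25.500, 25.520)

fig, (ax_full, ax_a, ax_b) = plt.subplots(3, 1)
ax_full.plot(x, y)
ax_a.plot(x1, y1)
ax_b.plot(x2, y2)
fig.savefig("sections.png")
```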
